VIM - generate a reference section for markdown html

2020-11-17 @Technology

We can use VIM to automate the creation of a ‘Sources referenced’ section in Markdown html. As I generate most blog content via Markdown and tend to include such a section for my longer posts having numerous links, this sort of automation can save significant time. In addition, the exercise will showcase a number of VIM features not widely known to users.

The goal is to automate the transition from this:

... some text [link1](url) ...
... more text [link2](url) ...

To this:

... some text [link1][] ...
... more text [link2][] ...

### Sources referenced

- [link1][]
- [link2][]

[link1]: url
[link2]: url

As you see above, we shall not only generate the desired reference section, but convert all inline urls to reference format, resulting in just one list of url definitions at the very bottom, enabling us to reference the same urls on multiple occasions.

Let’s first cover the individual commands step-by-step, then towards the end consolidate our automation into a function.

  1. Create the reference section header.

    :$normal o### Sources referenced <a name='#ref'></a>

    Here we execute a normal mode command on the last line (address $) to open a new line below it containing the above section title and even an anchor, in case we later wish to link to it.

    NOTE: Since we intend to automate the whole procedure in a function, we favor command-mode commands over normal-mode keystrokes, which would better suit a mapping. In this case, however, inserting content is most easily handled from normal mode, so we do so with the normal directive.

    (All keystrokes that follow normal are executed verbatim, as if typed in normal mode.)

    As a bonus, since VIM stores the position where insert mode was last exited in the mark ^ (addressed as '^), the following commands conveniently leverage this address to operate on the regions before and after the new section heading.
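
    To check that the mark landed where expected, one can compare it against the last line right after running the step 1 command. This is only a quick sanity check, assuming nothing else has entered insert mode in between:

    ```vim
    " Print the line number held by the '^ mark, then the last line number.
    " Right after step 1 the two are expected to match.
    :echo line("'^") line('$')
    ```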

  2. Copy all lines containing inline urls to the new reference section.


    Here we execute a global copy command (t for short) on all lines matching the above regex, which ultimately matches all lines containing inline urls (precisely formatted [...](...)). Following the t we specify the destination address, in this case $ (the last line), which will copy the lines to just below the new section heading.
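
    The command for this step is missing from this copy of the post. A plausible reconstruction, assuming it matches inline links with the same pattern that steps 3 and 6 use, would be:

    ```vim
    " Reconstructed, not the author's verbatim command.
    " Copy (t) every line containing [text](url) to below the last line ($).
    :g/\v\[[^\]]+\]\([^)]+\)/t $
    ```

    Since :g marks the matching lines before running t on them, the freshly copied lines are not themselves copied again.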

  3. We can now work with these copied lines to extract the links and discard the surrounding text. The following command will, in one operation, convert each inline url to a reference link AND create a url definition.

    :'^,$s/\v[^\[]*(\[[^\]]+\])\(([^)]+)\)[^\[]*/- \1[]\r\1: \2\r/g

    Remember, we’re now working exclusively in the new bottom section, as indicated by the address '^,$ preceding the search/replace directive.

    The above regex is the most convoluted in this entire process. I’ll not detail it, but the replacement uses the back-references \1 and \2 to the two capture groups (link text and url), the first appearing twice since each inline link becomes two lines: a list entry and a url definition. Each \r in the replacement inserts a line break. The regex also accounts for multiple urls on the same line, via the global g suffix.

    (The \v simply enables VIM’s “very magic” regex mode, which changes which characters we need or need not escape, although we could easily structure the regex without it.)

    So we convert from this

    ... some text [link1](url) ... more text ...
    ... more text [link2](url) ... and more

    To this:

    - [link1][]
    [link1]: url
    - [link2][]
    [link2]: url
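
    For readers who want the pattern unpacked, here is an annotated sketch, along with what should be an equivalent form without \v. The rewrite is my own, not from the original post, and assumes the default 'magic' setting:

    ```vim
    " [^\[]*            text before the link (discarded)
    " (\[[^\]]+\])      group 1: the bracketed link text, e.g. [link1]
    " \(([^)]+)\)       literal parentheses; group 2: the url inside them
    " [^\[]*            text after the link, up to the next [ (discarded)
    "
    " Without \v: groups and + take a backslash, while ( and ) are literal.
    :'^,$s/[^\[]*\(\[[^\]]\+\]\)(\([^)]\+\))[^\[]*/- \1[]\r\1: \2\r/g
    ```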

  4. Delete all blank lines.


    :'^,$g/^$/d

    Here’s another global command executed in the new address space, deleting all blank (^$) lines with the d directive. The blank lines are left behind by the trailing \r in the previous substitution. Optional, but eliminating them makes the generated html link list continuous, without gaps.

  5. Move the link definitions to below the referenced links.

    Step 3 generated the list entries and their url definitions intermixed. We want to separate them; otherwise the resulting HTML would not render as one continuous list. The separation also looks more organized in the markdown.

    :'^,$g/^\[/m $

    This takes the lines that begin with a square bracket (the url definitions) and moves them to the bottom via the m (move) directive, $ being the address of the last line. We’re now left with this:

    - [link1][]
    - [link2][]
    [link1]: url
    [link2]: url

  6. Lastly, convert the original inline urls to a reference format.

    Since we concentrate all url definitions at the bottom, we now wish to convert the original URLs to a reference format, removing all duplication.


    :1,'^s/\v(\[[^\]]+\])\([^)]+\)/\1[]/g

    We now search/replace in the address space prior to the reference section (1,'^), converting each inline url [name](url) to [name][].

Let’s now create a function to consolidate all the above commands.

Place the below in a file ~/.vim/ftplugin/markdown/helpers.vim (or markdown/<anything>.vim). For the record, VIM’s ftplugin feature loads the appropriately named script for the respective file type. For markdown, this can be either ftplugin/markdown.vim or ftplugin/markdown/<anything>.vim.

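function! GenRefSection()
    $norm o### Sources referenced <a name='#ref'></a>
    " Step 2; pattern reconstructed from the regexes in steps 3 and 6.
    g/\v\[[^\]]+\]\([^)]+\)/t $
    silent! '^,$s/\v[^\[]*(\[[^\]]+\])\(([^)]+)\)[^\[]*/- \1[]\r\1: \2\r/g
    " Step 4: drop the blank lines left by the trailing \r above.
    '^,$g/^$/d
    '^,$g/^\[/m $
    silent! 1,'^s/\v(\[[^\]]+\])\([^)]+\)/\1[]/g
endfunction
command! GenRefSection call GenRefSection()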

The function mirrors the steps we’ve covered, without the leading :, since commands inside a function are already interpreted as command-mode commands. The silent! prefixes on the search/replace ops suppress any errors, such as when no inline urls are found.

To invoke it, type :GenRefSection in any markdown document (it should auto-complete).
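
Since the script lives under ftplugin, one could also scope the command to markdown buffers and bind it to a key. The lines below are a sketch of that idea rather than part of the original setup; the <leader>r key is an arbitrary, hypothetical choice:

```vim
" Hypothetical additions to the same ftplugin file.
" -buffer limits the command to the current (markdown) buffer.
command! -buffer GenRefSection call GenRefSection()
" Buffer-local normal-mode mapping that runs the command.
nnoremap <buffer> <leader>r :GenRefSection<CR>
```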
