Custom Note Tagging System with Ctags and Vim

This post is part of the Workflow series.

In the previous post I laid out my requirements for an ideal note-taking system and took the first steps in designing it for my preferred editor, Vim. My overall aim is to create an ecosystem of interconnected notes, so that the system becomes not only an extension or recording of my thoughts, but also a quasi-independent dialogue partner in the creative process of writing. The idea is that when you are going to write something, you start by opening a note on the topic of choice, and from there you can effortlessly follow links to other related notes to discover new lines of thought. To this end, I wanted to implement a tagging system that is tailored to the way I make notes, in addition to the search functions for file names and their contents that are discussed in the previous post.

For example, I was reading an article from Stiegler and encountered an interesting thought on capitalism and the Anthropocene, so I added the tags “@capitalism” and “@anthropocene”. At that point that specific place in the text is connected to all my other notes on capitalism or the Anthropocene and included in many possible trajectories through the note system.

Initially I was working on my own implementation by playing around with Python and Vimscript, but I have settled on a solution that is fast, cross platform, with minimal dependencies and perfectly integrated within Vim.

The credo of most of my posts on Vim so far has been that even though Vim is known as the programmer’s editor, non-technical writers should also leverage its power and endless options for customization. This post is written in the same vein because it repurposes a technical tool called ctags. The original purpose of ctags is to go over a code project, make an index of all function names and provide a link to the place where each of them is declared (a feature nowadays integrated in full-blown IDEs). This allows the programmer to easily navigate through a complex coding project. Over time this program has been extended to work with many other programming languages as well, even though it has retained the reference to the C programming language in its name. And here is the crux: later varieties of the ctags program, called Exuberant Ctags and Universal Ctags, allow you to define extensions for other languages yourself!

What this means is that I can define my own syntax for the tags I’m going to use and then let ctags create an index of those tags with links to their corresponding files. As icing on the cake: due to its strong roots in programming culture, Vim has native support for navigating with those tags! N.B. to be clear: even though Vim has native support for ctags, ctags itself is not a plugin but an external program that does not automatically ship with Vim.

In this post I’ll walk you through the process to set this up and explain the rationale of each step along the way. Note that even though I design this for Vim, this system works well for any editor that is smart about ctags or something equivalent. If you are not interested in the technical details at all, you can skip the next section. If you have no idea what I’m talking about and just want to see some pictures, start with the last section.

Defining and parsing the syntax of your tags

Because I write in Markdown the hashtag is not a candidate for our tag syntax because it already indicates a header. I chose to define tags instead as such: “@tag”. You can define your tags in a different way of course, but I suggest you keep it simple because you will have to write the rule that correctly parses your tags.

Different ctags versions offer different options, so a quick overview is in order. Historically, ctags first evolved into Exuberant Ctags and more recently into Universal Ctags. Exuberant Ctags added support for languages other than C and introduced two ways to define support for further languages yourself. The first and simplest option is to provide a regular expression, which is basically a rule to find your tag by matching a particular pattern in a string. The second option is to define your own full-blown parser. We are not creating a whole new language, but only need to find simple tags, so we are obviously going for the first option.

Because non-technical writers very likely do not know how to write regular expressions (regex for short), I’ll walk you through the process of writing one. Be aware that there are many dialects for writing regex, but that all versions of ctags use an incredibly old dialect called Extended Regular Expressions (ERE), which really limits what we can do. Another reason to keep things simple.

This is what such a reasoning process could look like. Let’s say we write two tags on one line, for example:

@meme-machine @vimlife

Our rule should recognize two single tags. Intuitively, the rule should be something like: find an “@” and then match all word characters until you encounter a character that clearly does not belong to the word.

This simple regex would be expressed as @(\w+):

@   find a literal "@"
(   start a "capture group", i.e. the part of the expression that we are interested in
\w  the "@" should be followed by a "word character" (letters, digits and the underscore)
+   there should be at least one word character after the "@", but there can be arbitrarily many more
)   close the capture group. The part within parentheses is the tag.
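As a quick check of what this first attempt captures on the example line, here is a minimal sketch in Python. Note the assumption: Python's `re` module is a different regex dialect from the one ctags uses, but the two agree on a pattern this simple.

```python
import re

# The example line with two tags on it.
line = "@meme-machine @vimlife"

# First attempt: "@" followed by one or more word characters.
first_attempt = re.compile(r"@(\w+)")

# findall returns the content of the capture group for every match.
tags = first_attempt.findall(line)
print(tags)  # ['meme', 'vimlife']
```

The first tag is cut off at the hyphen, because "-" is not a word character.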

So when we write @vimlife in our note, the regex will find vimlife as the tag. However, this is a bad regex. The first problem is that it will not match @meme-machine correctly. Because the - is not a word character, this regex will incorrectly return meme as a tag instead of meme-machine. We could improve on this regex by refining our rule: find an “@” and then match any character until you find a space or a newline.

This regex could be expressed as @(\w.*)\s:

@   find a literal "@"
(   start a "capture group", i.e. the part of the expression that we are interested in
\w  the "@" should be followed by a "word character" (letters, digits and the underscore)
.   a wildcard that matches any character whatsoever, including characters such as "-"
*   there can be 0 or more of those wildcard characters
)   close the capture group. The part within parentheses is the tag
\s  when we find a whitespace character we are at the end of the tag.

This avoids the previous problem, but introduces a new one. One feature that is common nowadays but absent in the regex dialect of ctags is lazy (non-greedy) matching. If the regex were lazy, the rule would stop matching at the first space, which separates the two tags. But unfortunately our regex is greedy, meaning it will make the match as long as possible. The combination .*\s matches everything up to the last whitespace character it can find, and the newline at the end of the line also counts as whitespace! As a result, @meme-machine @vimlife is considered to be a single tag, which is obviously not what we want.
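The difference between greedy and lazy matching can be seen in a small Python sketch. It assumes the line is seen together with its trailing newline, as described above, and it uses Python's `re` dialect, which (unlike the dialect of ctags) does support the lazy `*?` quantifier.

```python
import re

# A line as it appears in a file, i.e. terminated by a newline.
line = "@meme-machine @vimlife\n"

# Greedy: ".*" grabs as much as possible before the final whitespace.
greedy = re.search(r"@(\w.*)\s", line)
# Lazy: ".*?" stops at the first whitespace it can end on.
lazy = re.search(r"@(\w.*?)\s", line)

greedy_tag = greedy.group(1)
lazy_tag = lazy.group(1)
print(repr(greedy_tag))  # 'meme-machine @vimlife'
print(repr(lazy_tag))    # 'meme-machine'
```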

In modern regex dialects you could explicitly make the star match lazily by appending a question mark. The regex would then look like this: @(\w.*?)\s. But this is not possible in the ERE dialect of ctags. In other words, time to take a step back and re-evaluate how to solve this problem without lazy matching, which is better in any case because lazy matching can be computationally more expensive. Click away if you want to think about it yourself.

If not, my simple solution is @(\w\S*):

@   find a literal "@"
(   start a "capture group", i.e. the part of the expression that we are interested in
\w  the "@" should be followed by a "word character" (letters, digits and the underscore)
\S  any *non*-whitespace character (inverse of \s)
*   0 or more non-whitespace characters
)   close the capture group. The part within parentheses is the tag

This is a more efficient approach with the same effect as using lazy matching. Because a tag now by definition cannot contain any whitespace, the first tag is matched separately. I still enforce that the first character after the “@” has to be a word character, otherwise a lone “@” or, for example (and amusingly), the regex pattern itself would be a tag.
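A short Python sketch of this pattern on the cases just discussed. Again, Python's `re` is only a stand-in for the regex engine of ctags, so treat it as an illustration rather than a guarantee of how every ctags build behaves.

```python
import re

# "@", then one word character, then any run of non-whitespace characters.
tag_pattern = re.compile(r"@(\w\S*)")

two_tags = tag_pattern.findall("@meme-machine @vimlife")
lone_at = tag_pattern.findall("a lone @ is ignored")
pattern_itself = tag_pattern.findall(r"the pattern @(\w\S*) itself")

print(two_tags)        # ['meme-machine', 'vimlife']
print(lone_at)         # []
print(pattern_itself)  # []
```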

UPDATE: 15-1-2020

Of course, URLs that contain “@” will also be matched with the current regular expression. We can exclude these matches by requiring that “@” either occurs at the beginning of the line or is preceded by a whitespace character (i.e. “@” occurs at the beginning of a new word somewhere in a sentence). In other regex dialects you have the special \b sign to indicate word boundaries, but not in the POSIX ERE dialect. We can however write (^|[[:space:]])@(\w\S*):

(            open a group
^            match the beginning of the line
|            or instead match
[[:space:]]  any whitespace character
)            close the group
@            find a literal "@"
(            start a "capture group"; this is the part of the expression that we are interested in
\w           the "@" should be followed by a "word character" (letters, digits and the underscore)
\S           any *non*-whitespace character (inverse of \s)
*            0 or more non-whitespace characters
)            close the capture group. The part within the second pair of parentheses is the tag

I adjusted the code below to this new regex. Note especially that we now have two groups, and that we are interested in the second one only, so our back reference changes from \1 to \2.
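To illustrate why the second group is the one we want, here is a hedged Python sketch. Python's `re` does not understand `[[:space:]]`, so `\s` stands in for it; the helper name `find_tags` and the example addresses are made up for this illustration.

```python
import re

# Python spelling of (^|[[:space:]])@(\w\S*); \s stands in for [[:space:]].
boundary_pattern = re.compile(r"(^|\s)@(\w\S*)")

def find_tags(line):
    """Return the second capture group (the tag name) of every match."""
    return [m.group(2) for m in boundary_pattern.finditer(line)]

# Hypothetical line mixing real tags with a URL and an e-mail address.
line = "@capitalism see https://example.org/@someone and me@example.org @anthropocene"
found = find_tags(line)
print(found)  # ['capitalism', 'anthropocene']
```

Group 1 only holds the line start or the whitespace in front of the “@”, which is why the tag name moves to group 2.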

Installing and configuring ctags

There is still another problem left. Modern regex engines in programming languages offer the option to find all matches on a given line. However, when ctags applies a regex rule line by line, our pattern only matches the first tag on each line. This means that in @meme-machine @vimlife the second tag will never be registered.
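The effect can be mimicked with a small Python sketch. This is only a simulation of the behaviour described above (one match per line), not the actual ctags implementation; the function name is hypothetical.

```python
import re

pattern = re.compile(r"(^|\s)@(\w\S*)")

def first_tag_per_line(text):
    """Mimic a line-based regex rule: at most one tag is recorded per line."""
    found = []
    for line in text.splitlines():
        match = pattern.search(line)  # search stops at the first match
        if match:
            found.append(match.group(2))
    return found

note = "@meme-machine @vimlife\n@workflow\n"
per_line = first_tag_per_line(note)
print(per_line)  # ['meme-machine', 'workflow']
```

The tag vimlife is lost because it shares a line with an earlier tag.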

I thought about this for a bit, but long story short, this problem cannot in principle be solved with Exuberant Ctags when we take the regex route. So if you for some reason insist on using Exuberant Ctags rather than Universal Ctags the tagging system strictly requires you to only put one tag on each line. If that’s the way you want to go, then create a configuration file called .ctags in your home directory and write the following specification of our markdown tagging language.

--langdef=markdowntags
--langmap=markdowntags:.md
--regex-markdowntags=/(^|[[:space:]])@(\w\S*)/\2/t,tag,tags/

The first line defines the name of our language, the second line associates our new language with a file extension (I use .md for Markdown) and the third line specifies our regex pattern, a backreference to our capture group (\2) and lastly a specification of the type of tag this is. I just called it tag, t for short. As you might see, these options are flags that will be given to the ctags command. You can download Exuberant Ctags here or simply install it with your package manager of choice.

Despite the limitations of using regex only, the successor of Exuberant Ctags, called Universal Ctags, does have a way to return multiple tags per line through the use of an experimental feature. Using Universal Ctags has other benefits as well. The benefits as I perceive them are:

  • Support for even more languages, including Markdown!
  • Does not necessarily use a system wide configuration, so you can define your needs on a per project basis
    • The config file can thus be included in your GitHub repository and you’ll be set up immediately after cloning your repository on any computer.
  • Multiline support (this is actually what we abuse to find multiple tags on one line)

You can download the latest build of Universal Ctags for Windows on the project’s GitHub page. If you are using Windows, make sure you place the executable in a folder that is contained in the PATH variable, so that you can run ctags from the command line. On Linux just download the package with your package manager of choice. If you use the Arch User Repository (AUR) look for this package.

To avoid conflicts with Exuberant Ctags the configuration files are now located in a special directory. So after installing, create the directory .ctags.d/ and create the file md.ctags within that directory. The configuration syntax has slightly changed. The main change is that we will use a multiline regex now. Because programming languages that rely on brackets to indicate scopes can spread structures of interest over multiple lines, the usefulness of line-by-line regex is limited. The multiline feature addresses this, but it can also be used to find multiple matches within a single line. Have a look here for documentation, if you are interested. Otherwise, copy the following configuration to your configuration file in ./.ctags.d/md.ctags, relative to your project folder.

--langdef=markdowntags
--languages=markdowntags
--langmap=markdowntags:.md
--kinddef-markdowntags=t,tag,tags
--mline-regex-markdowntags=/(^|[[:space:]])@(\w\S*)/\2/t/{mgroup=1}

Note that you can’t call your custom language just “markdown” because that language definition already exists (unlike in Exuberant Ctags). By default Markdown headers etc. will be produced as tags, but I actually do not care about that and added the second line to explicitly indicate I want to use my own language definition and not the default language also mapped to the .md extension. Almost good to go!
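Why matching against the whole file finds several tags on one line can be sketched in Python. This is only an analogy under stated assumptions: it uses Python's `re` with `\s` instead of `[[:space:]]`, a made-up function name, and a line-number calculation based on the start of the second capture group; the internals of Universal Ctags may well differ.

```python
import re

# re.MULTILINE lets ^ match at the start of every line of the file.
pattern = re.compile(r"(^|\s)@(\w\S*)", re.MULTILINE)

def tags_with_line_numbers(text):
    """Scan the whole text at once and record (tag, line number) pairs."""
    results = []
    for match in pattern.finditer(text):
        # Count the newlines before the tag name to get a 1-based line number.
        line_number = text.count("\n", 0, match.start(2)) + 1
        results.append((match.group(2), line_number))
    return results

# Hypothetical note contents.
note = "# Notes\n@meme-machine @vimlife\nmail user@example.com\n@workflow\n"
whole_file = tags_with_line_numbers(note)
print(whole_file)  # [('meme-machine', 2), ('vimlife', 2), ('workflow', 4)]
```

Because the scan simply continues after each match instead of moving on to the next line, both tags on line 2 are found.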

Creating tags

Tags can now be created easily from the command line by changing your directory to your project folder (here, our notes repository), and then running ctags recursively on the current folder (recursively indicating that all subfolders will be taken into account as well):

ctags -R .

This will create a file named tags in your project folder. You can open it to check that everything worked out correctly. As you will see, the generation of tags is very fast, as this tool is designed to still work for very large and complex code projects where each file has many tags. We’ll have fewer files and significantly fewer tags per file.
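If you would rather inspect the result with a script than by eye, the following Python sketch groups file names per tag. It assumes the standard tab-separated tags format (tag name, file name, search address, extra fields) and skips the `!_TAG_` header lines; the sample entries and the function name are invented for illustration.

```python
from collections import defaultdict

def files_per_tag(tags_text):
    """Map every tag name to the sorted list of files it occurs in."""
    index = defaultdict(set)
    for line in tags_text.splitlines():
        # Skip empty lines and the metadata header written by ctags.
        if not line or line.startswith("!_TAG_"):
            continue
        name, filename, _rest = line.split("\t", 2)
        index[name].add(filename)
    return {name: sorted(files) for name, files in index.items()}

# Hypothetical contents of a tags file.
sample = (
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted, 2=foldcase/\n"
    "security\tcourses/privacy.md\t/^@security @society$/;\"\tt\n"
    "security\tcourses/security.md\t/^@security @jacobs$/;\"\tt\n"
    "workflow\tindex.md\t/^@workflow$/;\"\tt\n"
)
overview = files_per_tag(sample)
print(overview)
```

For a real notes folder you could pass in the contents of the generated file instead, for example `files_per_tag(open("tags", encoding="utf-8").read())`.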

So far this post has been completely editor agnostic. But the beauty of using ctags for our note taking tags is that Vim handles them exceptionally well.

The power of the whole command line is at your fingertips, because Vim can run external commands from within the editor. So you do not have to leave Vim to generate the tags. You can simply type :!ctags -R . (where the dot refers to the current directory).

This does however assume that Vim’s current directory is your project root folder. Verify this with the command :pwd. Alternatively, you could replace the dot with the path towards your notes directory. But the better option is to use Vim’s native cd (change directory) command and change the working directory to your notes folder. For example, type :cd ~/Documents/Notes. This also allows you to more efficiently search files by only considering your notes.

To make this whole process smooth we can easily make some mappings so we don’t have to bother typing commands anymore. Remember that <leader> is by default the backslash.

" Generate ctags
nnoremap <leader>tt :!ctags -R . <CR>

Alternatively, if you do not want to see the command output you can generate the tags silently, but a quirk of this is that you have to force a redraw of your screen afterwards. Try it without the redraw in terminal Vim, and you’ll see what I mean.

" Generate ctags silently
nnoremap <leader>tt :silent !ctags -R . <CR>:redraw!<CR>

As shown in the previous post on note taking in Vim, I have a mapping that immediately brings me to the index of my notes and also automatically changes my directory to the project root. I strongly recommend this. If you have an idea you quickly want to write down you can jump to your notes folder within a second and start writing.

" Go to index of notes and set working directory to my notes
nnoremap <leader>ni :e $NOTES_DIR/index.md<CR>:cd $NOTES_DIR<CR>

Alternatively, you can define a mapping that changes the working directory to the directory of the file you are currently editing (e.g. the index of your notes):

" Change directory to directory of current file
nnoremap <leader>cd :cd %:h<CR>

UPDATE 14/4/2020: I’ve received replies and emails specifically from MacOS users that my ctags extension does not work. I do not have access to a machine with MacOS and cannot reproduce the issue. I suspect that the universal-ctags build for MacOS uses a slightly different regex engine. Luckily, a helpful comment from Fernando (see the comments below) offers a fix that avoids \w and \S in the regex. I’ve had confirmation from at least one other MacOS user that this fixed his issue as well.

As said before, Vim has great support for handling ctags. Vim knows about the location of your tags file. If Vim doesn’t find your tags, check that you are in the right directory and also make sure that the tags option makes sense with :set tags?. Alternatively, set tags explicitly in your .vimrc or _vimrc (Windows) configuration file, for example:

set tags+=./tags;,tags

The semicolon allows Vim to recursively move up a file tree to look for a tags file in case it doesn’t find one, as explained here. You can now search tags with autocompletion with the tselect command, or ts for short.

I for example have a tag @workflow, so I would type :ts work<TAB>, which autocompletes to :ts workflow. This will open a menu with a numbered list of all files with the tag workflow. You can quickly jump to a file by entering its number.

Pro tip: make your search case insensitive! This makes autocompletion ignore the case, so that :ts Work<TAB> still autocompletes to :ts workflow. To achieve this, set this in your .vimrc:

" Ignore case in searches
set ignorecase

Another really nice feature is that you can search on the tag that is currently under your cursor (or one place to the right). You do this with the <Ctrl>-] command.

This will jump to the first encountered tag. What’s also really nice is that it jumps to the exact line where the tag is used, so you do not have to search further manually.

One interesting note here is that the way we use tags is really quite different than its regular use in programming. The base case in programming is that you define a function once and that it is called in many places. The desired default behavior is that from all those places where it is called, you can quickly jump to the place where that function is defined. It can however occur that you override a function definition, so that in fact you end up with an ambiguous tag where the same tag links to two different locations.

We however desire and exploit the ambiguity of tags.

The whole principle of rhizomatic navigation that I desire is exactly that tags are defined in multiple places. The tselect command already gives you all options for navigation. But if we want to find all files for the tag under the cursor rather than only the first one, we do not use <Ctrl>-] but g] instead. This shows all ambiguous tags, i.e. all the files in which the tag is “defined.”

It gets even better. Because tags are so well integrated in Vim, your fuzzy finder plugin will almost certainly also be able to search the tags file. I use CtrlP because it works well both on Linux and Windows. My previous post mentions my setup for CtrlP using ripgrep. When searching using <Ctrl>-P you can cycle backward and forward between searching files, buffers or tags (see :help ctrlp-mappings for the keys). Alternatively, you can directly invoke the :CtrlPTag command. Various autocompletion plugins will also be able to suggest and complete tags.

UPDATE 15/4/2020: You probably want to define a quick mapping for this, for example:

" Binding for searching tags ("search tag")
nnoremap <leader>st :CtrlPTag<CR>

One last trick before I share screenshots of an example workflow. If you follow a tag to another file, look around for a bit, and then want to go back to where you were before going down the rabbit hole, you can type <Ctrl>-t to go back through what is called the tag stack. The tag stack basically tracks the trajectory you’ve taken by following tags through your notes. A beacon of light in the mess of the creative mind.

Screenshots of example workflow

After opening gVim (the screenshots are from my Windows machine), I press \ni (Notes Index) to change the working directory to my notes and to open the index page.

Starting from my index page, I can’t quite remember the name of a tag, so I’ll decide to use fuzzy finding.

The detail shows the fuzzy nature of the tag search. I typed AC (randomly), but as you see also results like “Jacobs” and “aircraft” are displayed.

From the list of suggestions I chose “Jacobs”, which is the name of a university professor. This could be some author you are writing a paper about. As a result I’m now viewing lecture notes of a security course I followed, which discusses a range of topics. We hold our cursor on the tag “security”.

The command g] opens a list of all ambiguous tags. We see that another file is also about security. So let’s expand our horizon and enter its number to visit that file.

We have now reached another file with course notes on a highly related topic. It discusses security, but clearly from a more societal and philosophical perspective, i.e. the human side of computer security.

And so on. I might by now have a more specific idea to write about. If it’s a single concept I’ll make a small note in my “Zettelkasten” directory (for which I have another easy binding), where I might decide to explicitly link to all the files I’ve explored. If I add the security tag there as well, together with a new tag, I’ve opened up new lines of thought!

Conclusion and summary of used Vim mappings

Like with my previous post on this topic, I’m writing about this while exploring ideas so everything is WIP. It is possible to define multiple regex rules for our custom language, so it’s easy to add more features to this tagging system. I might for example explore the usefulness of tracking explicit markdown links to other files with this system.

Let me know if you have suggestions! Feedback is welcomed.

If at some point I haven’t changed my system in a long time I’ll likely bundle together a .vimrc with everything you need. The system so far depends heavily on native Vim mappings, so you do not need much at all (Keep It Simple Stupid)! With the code below you can install CtrlP using vim-plug. There are two external dependencies, Universal Ctags and ripgrep, which are both cross-platform, minimalistic and do not require configuration outside of what is provided below. Plug and play. For now, I’ll provide a quick summary of mentioned Vim bindings and settings (and some not mentioned) as requested here:

" Specify a directory for plugins
" - Avoid using standard Vim directory names like 'plugin'
call plug#begin('~/.vim/plugged')

" Fuzzy file finding
Plug 'kien/ctrlp.vim'

" Initialize plugin system
call plug#end()

" Ignore case in searches
set ignorecase

" Generate ctags for current working directory
nnoremap <leader>tt :silent !ctags -R . <CR>:redraw!<CR>

" Change directory to directory of current file
nnoremap <leader>cd :cd %:h<CR>

" Quickly create a new entry into the "Zettelkasten" 
nnoremap <leader>z :e $NOTES_DIR/Zettelkasten/

" Go to index of notes and set working directory to my notes
nnoremap <leader>ni :e $NOTES_DIR/index.md<CR>:cd $NOTES_DIR<CR>

" 'Notes Grep' with ripgrep (see grepprg)
" -i case insensitive
" -g glob pattern
" ! to not immediately open first search result
command! -nargs=1 Ngrep :silent grep! "<args>" -i -g '*.md' $NOTES_DIR | execute ':redraw!'
nnoremap <leader>nn :Ngrep 

" Open quickfix list in a right vertical split (good for Ngrep results)
command! Vlist botright vertical copen | vertical resize 50
nnoremap <leader>v :Vlist<CR>

" Make CtrlP and grep use ripgrep
if executable('rg')
    set grepprg=rg\ --color=never\ --vimgrep
    set grepformat=%f:%l:%c:%m
    let g:ctrlp_user_command = 'rg %s --files --color=never --glob ""'
    let g:ctrlp_use_caching = 0
endif

" Binding for searching tags ("search tag")
nnoremap <leader>st :CtrlPTag<CR>

" What to ignore while searching files, speeds up CtrlP
set wildignore+=*/.git/*,*/tmp/*,*.swp

" This step is probably not necessary for you
" but I'll add it here for completeness
set tags+=./tags;,tags



Comments

Nima on Saturday, Feb 29, 2020:

Thanks a lot for the great post.

As a suggestion I prefer fzf (e.g. vim-fzf) instead of CtrlP which is more neat and faster. The default command for tag is :Tag which can be easily mapped.

Edwin on Saturday, Feb 29, 2020
In reply to Nima

The main reason I went for CtrlP is that I wanted to use the exact same setup on Linux and Windows. Fzf is great on Linux but it’s not designed for Windows and didn’t work well for me on Windows (without suboptimal workarounds). I did find that the speedup with using ripgrep is significant though. Having said that, I could in a later post describe this same setup with fzf, just for completeness. Thanks for the feedback!

Xavi on Thursday, Mar 5, 2020:

Thanks for the post. Really cool! I’ve got a problem though. I did install universal ctags but using the md.ctags stated above:

--langdef=markdowntags
--languages=markdowntags
--langmap=markdowntags:.md
--kinddef-markdowntags=t,tag,tags
--mline-regex-markdowntags=/(^|[[:space:]])@(\w\S*)/\2/t/{mgroup=1}

it fails to parse the @tags. Ctags does pick the markdowntags extension but it seems as if the regex does not work properly?

The test.md file reads:

@tag1 @tag2 @ tag3

And when I run:

ctags --verbose test.md

It returns an empty tags file, the logs showing:

Considering option file /Users/xavi/.ctags.d/md.ctags: reading...
Option: --langdef=markdowntags
Add optlib parser: markdowntags
Option: --languages=markdowntags
Enabled languages: markdowntags
Option: --langmap=markdowntags:.md
Setting markdowntags language map: (removed from Markdown)
Option: --kinddef-markdowntags=t,tag,tags
Add kind[0] "t,tag,tags" to markdowntags
Option: --mline-regex-markdowntags=/(^|[[:space:]])@(\w\S*)/\2/t/{mgroup=1}
Entering configuration stage: loading file(s) under the current directory
Entering configuration stage: loading environment variable
Reading initial options from command line
Entering configuration stage: loading command line
Option: --options=/Users/xavi/.ctags.d/md.ctags
Considering option file /Users/xavi/.ctags.d/md.ctags: already considered
Reading command line arguments
Get file language for test.md
pattern: test.md
#candidates: 1
0: markdowntags (extension: "md")
#candidates after sorting and filtering: 1
0: markdowntags (extension: "md")
OPENING test.md as markdowntags language file [new]
Initialize parser: markdowntags
sorting tag file
system ("sort -u -o 'tags' 'tags'")

I know is a long shot, but any hint or thing I could test further?

Thanks a lot!!!!

Xavi.

Edwin on Friday, Mar 6, 2020
In reply to Xavi

Hey, thanks for dropping by. I had a look at your output trace and tested the regex again both on Windows and Unix. I know of some other people also using the same Regex so I don’t think that’s the issue. Your output trace also looks fine, so I’m also a bit at a loss. The only irregularity I noticed was that for some reason ctags tried to read your options file twice (in the same location though). That does not occur for me and I don’t know why it does for you. And even then, I would not expect it to make a difference.

For further testing, it’s perhaps a good idea to repeat the same experiment but do it in a subdirectory somewhere instead of your home folder (and have .ctags.d in that same directory). It’s a long shot, but if that would make a difference perhaps ctags is loading something strange under the hood that’s in some subdirectory of your home folder.

Is test.md file under ~/test.md by the way?

And just to be sure, if you run ctags --version it indeed says you use Universal Ctags? My build was from December but I downloaded the latest one, and also verified my code works in that case.

Fernando on Monday, Apr 13, 2020:

Hello, nice post, thanks.

As for Xavi, universal ctags didn’t work for me. An issue with \w and \S which weren’t recognized. And even then, the multitag per line didn’t work. But looking at the doc, i don’t know why it works for you, it shouldn’t. Maybe a difference in ctags from the december version as of today (april) ? I did, however, managed to make it work (not using the \w and \S characters, and using recursion by re-entering the toplevel parser):

--languages=mdtags
--langmap=mdtags:.md
--kinddef-mdtags=t,tag,tags
--_tabledef-mdtags=toplevel
--_mtable-regex-mdtags=toplevel/(^|[[:space:]])@([a-zA-Z0-9][^[:space:]]*)/\2/t/{mgroup=1}{tenter=toplevel}
--_mtable-regex-mdtags=toplevel/.//

I am using universal ctags v0.0.0(7a924d77),on macosx.

Hope this helps

Edwin on Monday, Apr 13, 2020
In reply to Fernando

Thanks a lot for your useful response! I actually received another mail today from someone on MacOS encountering issues with my setup, that I could not reproduce. When I have time I’ll update my universal ctags version to the most recent build and do some testing. If that doesn’t cause the issue, I might look into why specifically people on MacOS seem to be having issues (I only tested on Linux and Windows). In any case, great that you already found a fix and thanks a lot for sharing! :-)

Edwin on Monday, Apr 13, 2020
In reply to Fernando

My setup still works for me using the latest universal ctags build and interestingly I can’t seem to get your code to work (on Windows, in this case). If you have time, do you mind sharing why according to you my code should not work according to the documentation?

Greg on Thursday, Jun 4, 2020
In reply to Fernando

Thanks Fernando for your input!

Here is my working .ctags.d/md.ctags:

--langdef=mdtags
--languages=mdtags
--langmap=mdtags:.md
--kinddef-mdtags=t,tag,tags
--_tabledef-mdtags=toplevel
--_mtable-regex-mdtags=toplevel/(^|[[:space:]])@([a-zA-Z0-9][^[:space:]]*)/\2/t/{mgroup=1}{tenter=toplevel}
--_mtable-regex-mdtags=toplevel/.//

Fernando on Tuesday, Apr 14, 2020:

Sure. According to :

https://docs.ctags.io/en/latest/optlib.html#regular-expression-regex-engine

the regex should match only once - even though it doesn’t say so explicitly. However it does say:

“A more relevant use-case is when {_advanceTo=N[start|end]} is used in the experimental --_mtable-regex-, to “advance” back to the beginning of a match, so that one can generate multiple tags for the same input line(s).”

That is what I thought you used when you said in this page: ‘universal ctags does have a way to return multiple tags per line through the use of an experimental feature.’

But different behaviours on different platforms… :-(

Edwin on Thursday, Apr 16, 2020
In reply to Fernando

Thanks a lot for checking in again! I really wouldn’t have been aware of this issue if it weren’t for you, and I know for a fact you helped others, so hats off to that. I indeed see the confusion I may have caused with the word “experimental”. In the recent doc the mline option is presented as “new” but not as “experimental”. It is however discussed more or less in one breath with the experimental “_mtable” feature, and all further flags apply to both; that’s where I conflated them.

Anyways, you are absolutely right that the regex itself matches only once. The trick is that using mline-regex-markdowntags switches from the regular line based approach to a file-based approach. I have no clue about the internals and didn’t question them further as it got the job done for me. The doc on mline says:

This flag indicates the pattern should be applied to the whole file contents, not line by line. N is the number of a capture group in the pattern, which is used to record the line number location of the tag. In the above example 3 is specified. The start position of the regex capture group 3, relative to the whole file is used.

However, the doc also says that you must specify the mgroup variable. I just did some testing, and things still seem to work even if I delete that. So either it defaults to zero without the documentation saying so, or it’s… MAGIC. In any case, your solution still seems to be more systematic, and the {_advanceTo=N[start|end]} flag is logically more transparent. So I hope this feature will get out of its experimental phase soon :-)! For now, I refer Mac users to your solution (see the UPDATE in the blog post).
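
To make the difference between the line-based and the file-based approach concrete, here is a rough Python sketch. It says nothing about the ctags internals. The pattern and the function names are made up for illustration, and the only assumption is that a line-based regex yields at most one tag per line, whereas a whole-file scan resumes after each match and derives the line number from the offset of the capture group.

```python
import re

# Hypothetical tag pattern, modelled on the @tag convention of the post.
TAG = re.compile(r"(?:^|\s)@([A-Za-z0-9]\S*)")


def tags_first_match_per_line(text):
    """Line-based: apply the pattern once per line and keep the first hit only."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = TAG.search(line)
        if match:
            found.append((match.group(1), lineno))
    return found


def tags_whole_file(text):
    """File-based: scan the whole buffer and resume after every match."""
    found = []
    for match in TAG.finditer(text):
        # Line number = newlines before the start of capture group 1, plus one.
        lineno = text.count("\n", 0, match.start(1)) + 1
        found.append((match.group(1), lineno))
    return found


sample = "intro @capitalism and @anthropocene\nplain line\n@vim at line start\n"
print(tags_first_match_per_line(sample))
print(tags_whole_file(sample))
```

On this sample the line-based function misses the second tag on the first line (anthropocene), while the whole-file scan reports all three tags together with their line numbers.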

Robert on Wednesday, Apr 22, 2020:

Edwin,

Another great post.

One possible conflict I noticed. The @ is also used by vim-pandoc to cite an author. So if one is going to use the bibliography function of vim-pandoc they may want to use a different character.

Thank you

Edwin on Wednesday, Apr 22, 2020
In reply to Robert

Thanks for pointing this out! I do have vim-pandoc installed but never relied on its bibliography functionality. When I need that kind of functionality I usually switch to writing in LaTeX directly. I might check it out in the future though and now I can anticipate this potential issue thanks to you.

asdfasdfadsw on Wednesday, May 6, 2020:

Thanks a lot for this awesome tutorial, it helped me immensely! I’m using your system to manage quotations from books I’m reading, which results in quite big markdown files. I thought it might be helpful to mention that by using the multiline regex, ctags (at least my version of it) loses the ability to store the line number in the tags file. I therefore opted for the single-line version of the tags, so that I can search for a quote and jump directly to the line where I put the tag.

Edwin on Wednesday, May 6, 2020
In reply to asdfasdfadsw

Glad you liked it! Interesting, for me (using universal ctags on Windows and Linux) the tags jump to the correct line, wherever the tag is in the file. What version of ctags do you have?

Uncertainteee on Thursday, Aug 13, 2020:

To extend @Fernando and @Edwin’s idea, here is my .ctags that passed the following test.

#tag1_no-space_allowed #tag2
`#not-a-tag`
[link](awesome#notTag)
code-block-fence
#notTag
code-block-fence

--langdef=mdtags
--languages=mdtags
--langmap=mdtags:.md
--kinddef-mdtags=t,tag,tags
--_tabledef-mdtags=toplevel
--_tabledef-mdtags=codeblock

--_mtable-regex-mdtags=toplevel/```//{tenter=codeblock}
--_mtable-regex-mdtags=toplevel/(#[a-zA-Z0-9_-]+)[[:space:]]/\1/t/{mgroup=1}{tenter=toplevel}
--_mtable-regex-mdtags=toplevel/.//


--_mtable-regex-mdtags=codeblock/```//{tleave}
--_mtable-regex-mdtags=codeblock/.//
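
The way these two tables interact can be imitated in a few lines of Python. This is only a sketch of the idea, not of how ctags works internally. It assumes that each rule is tried in order at the current position (which the fallback `.` rule suggests), that tenter pushes a table and tleave pops it, and that `.` also consumes newlines. The names TABLES and scan are invented for this example.

```python
import re

# Each table is an ordered list of (pattern, emits_tag, action) rules.
# Actions imitate the flags in the snippet: ("enter", table), ("leave",) or None.
TABLES = {
    "toplevel": [
        (re.compile(r"```"), False, ("enter", "codeblock")),
        (re.compile(r"(#[a-zA-Z0-9_-]+)\s"), True, ("enter", "toplevel")),
        (re.compile(r".", re.DOTALL), False, None),  # fallback: skip one character
    ],
    "codeblock": [
        (re.compile(r"```"), False, ("leave",)),
        (re.compile(r".", re.DOTALL), False, None),  # fallback: skip one character
    ],
}


def scan(text):
    """Walk through text, trying the rules of the current table at each position."""
    stack = ["toplevel"]
    pos = 0
    tags = []
    while pos < len(text):
        for pattern, emits_tag, action in TABLES[stack[-1]]:
            match = pattern.match(text, pos)  # anchored at the current position
            if not match:
                continue
            if emits_tag:
                tags.append(match.group(1))
            if action and action[0] == "enter":
                stack.append(action[1])
            elif action and action[0] == "leave":
                stack.pop()
            pos = match.end()
            break
        else:
            break  # no rule matched; cannot happen here because of the fallback
    return tags


sample = (
    "#tag1_no-space_allowed #tag2\n"
    "`#not-a-tag`\n"
    "[link](awesome#notTag)\n"
    "```\n"
    "#notTag\n"
    "```\n"
)
print(scan(sample))
```

Run on the test text above (with the fences written out as three backticks), the sketch reports only #tag1_no-space_allowed and #tag2: the other candidates are either not followed by whitespace or sit inside the code block.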

FYI, I only found ctags yesterday. There is a code example here from the manual. I’m using the most current version of Universal Ctags on Arch Linux.

I also want to let you know that I really enjoyed your site and its style. I hope I can host a blog like this in the future.

Edwin on Thursday, Aug 13, 2020
In reply to Uncertainteee

Thanks for your contribution! I haven’t tested your solution myself yet, but I should soon re-evaluate my own post in light of all the suggestions (especially because I no longer understand why my solution works for me but not for some others).

I’m glad to hear that! Much appreciated.

Uncertainteee on Friday, Aug 14, 2020
In reply to Uncertainteee

There is a bug in the above snippet: --languages=mdtags would disable all other parsers. Use --languages=+mdtags instead. I also found the Markdown parser from universal-ctags, which is written as an optlib file and translated into C source code by the optlib2c script: https://github.com/universal-ctags/ctags/blob/master/optlib/markdown.ctags

Dmitry on Wednesday, Sep 30, 2020:

The link to the previous post is broken. Please fix it.

Edwin on Wednesday, Sep 30, 2020
In reply to Dmitry

Roger! Thanks for pointing that out. I changed the handling of relative URLs on this domain and didn’t realize that broke these links. It should be fixed now :-)

Bruce Dillahunty on Saturday, Jan 2, 2021:

Just FYI, your link to Fernando’s comment is also broken.

Edwin on Saturday, Jan 2, 2021
In reply to Bruce Dillahunty

Thanks for pointing that out, and also for your comment on the other Vim notetaking post! The link to Fernando’s comment pointed to the localhost domain, very silly… It should be fixed now. Yours and take care, Edwin.