You greedy, greedy RegEx

Remember when, not too long ago, I had to confess my regular expression deficiencies to you, dear reader? I had needlessly crafted an overly verbose regular expression, not realizing that the greedy nature of RegEx rendered part of it completely unnecessary.

It happened again.

This time it happened to me with a regular expression that I’ve used many times before. It has to do with my blog posting shortcut(s) and how I handle parsing markdown footnotes in iA Writer documents.

Here’s the regular expression I was using to find footnotes in my iA Writer texts:

\[\^(.+)]

My thinking when I wrote it was that it would find anything starting with [^, followed by one or more of any character, and ending in ]. Which it did, and it usually did it the way I expected it to. But then one day I typed the following paragraph:

Using [Audio Hijack](https://rogueamoeba.com/audiohijack/), we can record our mics, the soundboard, and the group call all on individual tracks.[^We don’t use the call recording file in the edited podcast because everyone records their own tracks locally for better sound quality, but it makes a great sync and emergency backup track.] And with [Loopback](https://rogueamoeba.com/loopback/) installed, the soundboard isn’t just local, it can be heard by everyone on the podcast by combining it with the microphone as the input source to the call.

Bonus points if you correctly guessed that the regular expression found the footnote starting with “We don’t use the call recording” and continued matching the pattern all the way until the closing bracket following the word “Loopback”, which was part of a markdown link.

The result is that I lost the first part of the sentence beginning with “And with Loopback installed”.
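To see the overreach for yourself, here is a minimal sketch in Python. Python is just my assumption for illustration (the shortcut itself doesn't use it), and the sample text is a shortened, hypothetical stand-in for the paragraph above.

```python
import re

# Shortened stand-in for the paragraph above (hypothetical sample text).
text = (
    "all on individual tracks.[^We don't use the call recording file.] "
    "And with [Loopback](https://rogueamoeba.com/loopback/) installed, "
    "the soundboard isn't just local."
)

# The original, greedy pattern: .+ grabs as much as it can,
# then backs up only as far as the LAST ] on the line.
greedy = re.compile(r"\[\^(.+)]")

match = greedy.search(text)
print(match.group(0))

# Removing the "footnote" also removes the start of the next sentence.
print(greedy.sub("", text))
```

The first print shows the match running from [^ all the way through [Loopback], and the second shows what is left of the paragraph once that match is removed.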

It’s actually fairly easy to fix this.

One interesting way to do it is to find anything starting with markdown footnote nomenclature ([^) followed by one or more characters that are NOT a closing bracket, followed by one closing bracket. By phrasing it this way, the closing bracket it matches as the end of the pattern will indeed be the first closing bracket it comes to after the start of the footnote, which is what I always intended.

It looks like this:

\[\^([^\]]+)]

This feels inelegant to me. I’d rather tell it what to look for instead of what NOT to look for, but this does work.
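As a quick check of the "anything but a closing bracket" version, here is a small Python sketch. Again, Python and the shortened sample text are my assumptions for illustration, not what the shortcut actually runs.

```python
import re

# Shortened stand-in for the problem paragraph (hypothetical sample text).
text = (
    "all on individual tracks.[^We don't use the call recording file.] "
    "And with [Loopback](https://rogueamoeba.com/loopback/) installed."
)

# [^\]]+ means "one or more characters that are not ]",
# so the match has to stop at the first closing bracket.
footnote = re.compile(r"\[\^([^\]]+)]")

notes = footnote.findall(text)     # the footnote text only
stripped = footnote.sub("", text)  # the paragraph with the footnote removed

print(notes)
print(stripped)
```

This time the Loopback link survives, and the only thing captured is the footnote itself.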

The more conventional way is probably just to tell .+ not to be greedy by putting a question mark right after the + that indicates “one or more of the preceding”:

\[\^(.+?)]

This results in the .+ match concluding at the first ] it encounters instead of the last one on the line.
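Here is a minimal side-by-side sketch of the greedy and non-greedy patterns, written in Python purely as an assumption for illustration. The sample text is hypothetical and contains two footnotes with a markdown link between them.

```python
import re

# Hypothetical sample: two footnotes with a markdown link in between.
text = (
    "on individual tracks.[^First note.] And with "
    "[Loopback](https://rogueamoeba.com/loopback/) installed.[^Second note.]"
)

# Greedy: one oversized match that swallows everything up to the last ].
greedy = re.findall(r"\[\^(.+)]", text)

# Non-greedy: .+? stops at the first ] that lets the pattern succeed.
lazy = re.findall(r"\[\^(.+?)]", text)

print(len(greedy))  # 1
print(lazy)         # ['First note.', 'Second note.']
```

One difference worth knowing if you are choosing between the two fixes: in Python, and in most regex flavors by default, . does not match a newline, so .+? will not find a footnote that wraps across lines, while the [^\]]+ version will.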

Regular expressions are weird. Sometimes you think your regular expression functions exactly as intended only because you haven’t yet thrown something at it that allows it to do what it is actually written to do.