Regular Expressions and Shortcuts, Part 3

This post is part of a series on Regular Expressions and their applications in the Shortcuts app.

Regular Expressions and Shortcuts, Part 1

Regular Expressions and Shortcuts, Part 2

Hi, it’s me again, the guy with the terrible regular expression that I keep yammering on and on about:

^(?:.+\/){1,}(.+)\.json$

Last time I explained how (.+\/){1,} works to match the directory portion of a file path, like these:

hugo-files/data/
hugo-files/
hugo-files/data/links/

I did not explain, however, why the first part of the regular expression contains ?: inside the first set of parentheses, like this:

(?:.+\/){1,}

To understand this, you have to understand the role of parentheses in regular expressions.

First, they do what I said last time they do: group things together so that a subsequent modifier applies to everything in the group. In our case, because .+\/ is inside parentheses and is one group, the term {1,} modifies .+\/ instead of just \/ (the forward slash).
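Here is a minimal sketch of that grouping effect, written in Python rather than Shortcuts. The one-letter directory name a and the {2} quantifier are stand-ins I picked so the difference is easy to see; the post's own pattern uses .+ and {1,}.

```python
import re

# Grouped: {2} repeats the whole "a/" unit, so this wants "a/a/".
grouped = re.compile(r"^(a\/){2}$")

# Ungrouped: {2} applies only to the slash, so this wants "a//".
ungrouped = re.compile(r"^a\/{2}$")

results = {
    "grouped a/a/": bool(grouped.search("a/a/")),
    "grouped a//": bool(grouped.search("a//")),
    "ungrouped a/a/": bool(ungrouped.search("a/a/")),
    "ungrouped a//": bool(ungrouped.search("a//")),
}

for label, matched in results.items():
    print(label, matched)
```

The grouped pattern accepts a/a/ and rejects a//, while the ungrouped one does the opposite.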

But parentheses also act as a capturing group, unless we tell them not to. A capturing group is a way of grabbing just a portion of the overall regular expression match and accessing it later.

Let’s modify my full original regular expression by removing ?: from the first set of parentheses. That gives us this:

^(.+\/){1,}(.+)\.json$

This gives us two capturing groups, which we can reference with $1 and $2. $0 is another reference, and it returns the full match rather than a portion of the match inside a capturing group.

Our list of files:

hugo-files/data/links/cars.json
hugo-files/data/links/podcasts.json
hugo-files/data/links/language.json
hugo-files/data/links/apps.json
hugo-files/data/links/apple.json

Our regular expression:

^(.+\/){1,}(.+)\.json$

$0 contains:

hugo-files/data/links/cars.json
hugo-files/data/links/podcasts.json
hugo-files/data/links/language.json
hugo-files/data/links/apps.json
hugo-files/data/links/apple.json

$1 contains:

hugo-files/data/links/
hugo-files/data/links/
hugo-files/data/links/
hugo-files/data/links/
hugo-files/data/links/

$2 contains:

cars
podcasts
language
apps
apple
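If you want to check those three lists yourself outside of Shortcuts, here is a rough Python sketch. Python's re module is not the engine Shortcuts uses, and it spells the references group(0), group(1) and group(2) instead of $0, $1 and $2, but for this pattern the groups behave the same way. The variable names are my own.

```python
import re

files = [
    "hugo-files/data/links/cars.json",
    "hugo-files/data/links/podcasts.json",
    "hugo-files/data/links/language.json",
    "hugo-files/data/links/apps.json",
    "hugo-files/data/links/apple.json",
]

# Two capturing groups: the path, then the bare file name.
pattern = re.compile(r"^(.+\/){1,}(.+)\.json$")

matches = [pattern.match(line) for line in files]

full = [m.group(0) for m in matches]   # the whole match, like $0
dirs = [m.group(1) for m in matches]   # first capturing group, like $1
names = [m.group(2) for m in matches]  # second capturing group, like $2

for whole, directory, name in zip(full, dirs, names):
    print(whole, "|", directory, "|", name)
```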

$0 is useful, because those are the matches to our entire regular expression – in other words, it’s working as expected.

$2 is useful, because it contains the list of file names without file extensions, which is what we set out to get. It’s our end goal.

$1, on the other hand, is pretty useless. Chances are, we know where our json files are kept, and even if we don’t, in our case we don’t care about anything except the file names. We can get rid of $1, and then the list of file names in $2 can be referenced with $1 instead.

But how?

Simple: we can tell the first set of parentheses in the regular expression that we don’t want to capture whatever matches it by putting (you guessed it) ?: immediately after the opening parenthesis, before the things to match.

Now we’re back to our original regular expression:

^(?:.+\/){1,}(.+)\.json$

$0 still contains our full matching file names, complete with paths:

hugo-files/data/links/cars.json
hugo-files/data/links/podcasts.json
hugo-files/data/links/language.json
hugo-files/data/links/apps.json
hugo-files/data/links/apple.json

But now $1 contains the file names without file extensions that we’re here for:

cars
podcasts
language
apps
apple
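The same check for the non-capturing version, again as a Python sketch rather than anything Shortcuts-specific. The groups attribute on a compiled Python pattern reports how many capturing groups it has, which makes the effect of ?: easy to confirm.

```python
import re

files = [
    "hugo-files/data/links/cars.json",
    "hugo-files/data/links/podcasts.json",
    "hugo-files/data/links/language.json",
    "hugo-files/data/links/apps.json",
    "hugo-files/data/links/apple.json",
]

# (?:...) still groups the path for {1,}, but it no longer captures.
pattern = re.compile(r"^(?:.+\/){1,}(.+)\.json$")

# Only one capturing group is left, so the file name is now group 1.
group_count = pattern.groups
names = [pattern.match(line).group(1) for line in files]

print(group_count)
print(names)
```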

That’s because the second set of parentheses, which is now our only capture group, matches everything after the last forward slash but before .json (and the end of the line of text). It’s this part of the expression:

(.+)

As I told you previously in this series, the . in .+ means “match any character”, and the + means “one or more times”.

The only reason (.+) captures only the portion of the file name we’re after, and not every character in the line, is that it’s bounded by other conditions on both sides: an expression that matches the file path on one side, and an expression that matches the .json file extension and end of line on the other.
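To see how much work those boundaries are doing, here is a small Python sketch that strips them away one at a time. The three variable names are just labels I made up for the experiment.

```python
import re

line = "hugo-files/data/links/cars.json"

# No boundaries at all: (.+) swallows the entire line.
unbounded = re.match(r"(.+)", line).group(1)

# Bounded only on the right: the extension is gone, the path is not.
right_only = re.match(r"^(.+)\.json$", line).group(1)

# Bounded on both sides: only the bare file name is left.
both_sides = re.match(r"^(?:.+\/){1,}(.+)\.json$", line).group(1)

print(unbounded)
print(right_only)
print(both_sides)
```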

Just remember that whenever you use parentheses to group parts of a regular expression, the group automatically becomes a capture group, even if grouping is all you wanted, unless you tell it otherwise with ?:.

You don’t have to use non-capturing groups in your regular expressions. You can simply ignore any capture groups you don’t care about. But if you only need one specific piece of information from the full pattern match, life is a lot easier if things that are grouped for syntax reasons rather than capture reasons aren’t captured in the first place.
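One place where that convenience shows up, at least in Python: re.findall returns a tuple per match when a pattern has several capturing groups, but plain strings when it has exactly one. This sketch is about Python's behavior only, not how Shortcuts hands groups back. The text variable is a made-up two-line sample.

```python
import re

text = "hugo-files/data/links/cars.json\nhugo-files/data/links/apps.json"

# Two capturing groups: every result is a (path, name) tuple,
# and we have to pick out the piece we actually want.
pairs = re.findall(r"^(.+\/){1,}(.+)\.json$", text, flags=re.MULTILINE)
names_from_pairs = [name for _path, name in pairs]

# One capturing group: findall returns the names directly.
names = re.findall(r"^(?:.+\/){1,}(.+)\.json$", text, flags=re.MULTILINE)

print(pairs)
print(names)
```

Both routes end up with the same names; the non-capturing version just skips the extra unpacking step.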

Before ending this installment of our Regular Expressions to Confuse You series, let me refer you to the excellent RegExBuddy Regular Expressions Quick Start. It’s super clear, super concise, and super useful. I learned a ton about regular expressions from this site.

RegExBuddy Regular Expressions Quick Start https://regular-expressions.mobi/quickstart.html

Next time we’ll take a break from regular expression theory and look at how the full regular expression I’ve been talking about in these first three posts is used inside of Shortcuts.