Regular Expressions and Shortcuts, Part 2

Last time, I presented a case in which I wanted to take a list of files, complete with file paths, and extract just the file names without the extensions.[1]

So basically, I get a list of file names that come back like this:

hugo-files/data/links/cars.json
hugo-files/data/links/podcasts.json
hugo-files/data/links/language.json
hugo-files/data/links/apps.json
hugo-files/data/links/security.json
hugo-files/data/links/linux.json
hugo-files/data/links/programming.json
hugo-files/data/links/apple.json

And I want to turn it into the following list instead, by getting rid of the directory paths and the .json file extensions:

cars
podcasts
language
apps
security
linux
programming
apple

I do this in my shortcut using a Match Text action with the following regular expression:

^(?:.+\/){1,}(.+)\.json$

It looks mind-bogglingly weird if you’re not used to regular expressions, and someone more skilled with them could probably do the same job with a more elegant version, but this works for me, and it’s really quite simple. Basically, it looks for strings that match the following:

  1. Some number of characters, followed by a forward slash (/), one or more times (hugo-files/ or hugo-files/data/ or hugo-files/data/links/ all match, for example).
  2. Some number of characters after the last forward slash, followed by the extension .json at the end of the string.
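To see the whole expression at work outside of Shortcuts, here is a minimal sketch in Python. Python's re module stands in for the Match Text action, which is an assumption: the two engines are not identical, though they should agree on a pattern this simple. The variable names are made up for illustration.

```python
import re

# The pattern from the post, unchanged.
PATTERN = r"^(?:.+\/){1,}(.+)\.json$"

# A few of the paths from the list above, one per line.
paths = """hugo-files/data/links/cars.json
hugo-files/data/links/podcasts.json
hugo-files/data/links/language.json"""

# re.MULTILINE lets ^ and $ anchor at every line, not only at the
# start and end of the whole string.
# With one capturing group, findall returns just that group's text.
names = re.findall(PATTERN, paths, flags=re.MULTILINE)
print(names)  # ['cars', 'podcasts', 'language']
```

The part in plain parentheses, (.+), is what comes back: the file name with the directories and the .json extension stripped.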

How does it do this?

First, the bookends: ^ signifies the start of the line, and $ signifies the end of the line.[2]

This means that a match must both meet the criteria specified above and sit on a line with no other text on it. My list of files above meets those criteria, because there’s a new line after each one.
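The "sometimes" caveat about ^ and $ is easy to see in Python, used here only as a stand-in for Shortcuts. In Python's re module the anchors apply to the whole string unless the re.MULTILINE flag is set; whether Match Text needs an equivalent setting is not something this sketch can show.

```python
import re

PATTERN = r"^(?:.+\/){1,}(.+)\.json$"

two_lines = "hugo-files/data/links/cars.json\nhugo-files/data/links/apps.json"

# Without the flag, ^ and $ only anchor at the ends of the whole string,
# and . will not cross the line break, so nothing matches.
whole_string = re.findall(PATTERN, two_lines)
print(whole_string)  # []

# With the flag, each line is checked on its own.
per_line = re.findall(PATTERN, two_lines, flags=re.MULTILINE)
print(per_line)  # ['cars', 'apps']

# Extra text after .json on the same line defeats the $ anchor.
trailing = re.findall(PATTERN, "hugo-files/data/links/cars.json.bak", flags=re.MULTILINE)
print(trailing)  # []
```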

Now let’s take the portion right after ^ and simplify it slightly, by dropping the ?:, to this:

(.+\/){1,}

Let’s go even further and ignore the parentheses and the {1,} for a minute:

.+\/

The . means any character (except, by default, a line break). The + means one or more times. So .+ matches a run of any characters on a line, whether that is one character or 1,000,000 or more.

Except that immediately after comes \/, which just means / (forward slash). The backslash in front of the forward slash is there to escape it. In some regular expression flavors, forward slashes mark the start and end of the pattern itself, so to tell the regular expression that “no, I really want to look for a forward slash”, you need to escape it with the \ (backslash). In flavors where / has no special meaning, the escape is unnecessary but harmless.

Characters that need to be escaped so that a regular expression searches for them literally are the following:

* ? + [ ( ) { } ^ $ | \ . /
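A quick way to feel the difference escaping makes is to try a pattern with and without the backslash. The sketch below uses Python's re module as a stand-in, so the details are Python's and may not carry over to every engine; in particular, Python gives / no special meaning at all.

```python
import re

unescaped = r"cars.json"   # . means "any character"
escaped = r"cars\.json"    # \. means a literal period

dot_is_wildcard = re.fullmatch(unescaped, "carsXjson") is not None
dot_is_literal = re.fullmatch(escaped, "carsXjson") is not None
print(dot_is_wildcard)  # True
print(dot_is_literal)   # False

# In Python, / and \/ are interchangeable because / is not special.
slash_plain = re.fullmatch(r"data/links", "data/links") is not None
slash_escaped = re.fullmatch(r"data\/links", "data/links") is not None
print(slash_plain, slash_escaped)  # True True

# re.escape adds the backslashes a literal string needs.
print(re.escape("cars.json"))  # cars\.json
```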

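So in our regular expression, the .+\/ portion matches a piece of the file path that ends in a slash, such as hugo-files/ in my example. (Because + is greedy, .+ grabs as much as it can, so on a full path it actually runs all the way to the last slash.)

However, we have parentheses around .+\/ and we follow them with {1,}: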
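How much of the path .+\/ takes on its own depends on greediness, which is worth checking by hand. This is a small Python sketch; the lazy form .+?\/ is not part of the post's expression and appears only for comparison.

```python
import re

path = "hugo-files/data/links/cars.json"

# Greedy: .+ takes as much as it can, then backs up to the last slash.
greedy = re.match(r".+\/", path).group()
print(greedy)  # hugo-files/data/links/

# Lazy (shown for contrast only): .+? stops at the first slash it can.
lazy = re.match(r".+?\/", path).group()
print(lazy)  # hugo-files/
```

Either way the full expression still ends up with the file name in its capture group, because what follows the slashes has to match too.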

(.+\/){1,}

The parentheses group .+\/ so that anything we do afterwards applies to that whole thing. And what we do next is {1,}, which is a fancy way of saying: do the preceding at least once, and possibly multiple times.

Because there’s no number after the comma in {1,}, the number of times the preceding pattern can repeat is open-ended.
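The effect of leaving out the upper bound can be checked with a small Python sketch. One swap is needed to make the counting visible: the group uses [^/]+ (any characters except a slash) instead of .+, because .+ can swallow slashes and would hide the limit. The helper name repeats_ok is made up for this example.

```python
import re

def repeats_ok(quantifier: str, text: str) -> bool:
    """Return True if text is made up entirely of 'name/' segments,
    repeated as many times as the quantifier allows."""
    pattern = r"(?:[^/]+\/)" + quantifier
    return re.fullmatch(pattern, text) is not None

three_segments = "hugo-files/data/links/"

print(repeats_ok("{1,}", three_segments))   # True: no upper limit
print(repeats_ok("{1,2}", three_segments))  # False: at most two segments
print(repeats_ok("+", three_segments))      # True: + means the same as {1,}
```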

That means that (.+\/){1,} matches any of the following:

hugo-files/data/links/
hugo-files/
hugo-files/data/

In each instance above, we have a pattern of any characters, one or more times, followed by a / (forward slash), repeating one or more times.

Next time I’ll explain more about grouping with parentheses and its multiple uses, and why in the full regular expression I added ?: in front of the portion we looked at here.
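Prefixes like the ones listed above can be tested directly, along with a couple that should fail. This is a Python sketch using re.fullmatch, which requires the pattern to cover the entire string; the last two candidates are extra examples added for contrast.

```python
import re

PREFIX = r"(.+\/){1,}"

candidates = [
    "hugo-files/data/links/",
    "hugo-files/",
    "hugo-files/data/",
    "cars.json",  # no slash at all
    "/",          # a slash with nothing in front of it
]

# Map each candidate to whether the whole string matches the pattern.
results = {text: re.fullmatch(PREFIX, text) is not None for text in candidates}
for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The first three print True. The last two print False, because every repetition needs at least one character followed by a slash.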


  1. Much like the iPad Files app shows them, actually.

  2. Well… sometimes. I’ll explain more about ^ and $ later.