File Name Parsing With Regular Expressions

I haven’t talked more about regular expressions like I promised I would, but I’ll partially rectify that today by writing about a regular expression I created last night for my Blog Post Publish shortcut. This one applies specifically to how I name my blog post files for WTF Weekly.

Since WTF Weekly post titles are just dated blurbs like WTF Weekly for Oct 12th, 2020, I decided to name the post Markdown files with consecutive numbers, starting at 1. I also make the post slug match this number so that the URLs are very simple, such as https://wtfweekly.me/41/ for the Oct 12th, 2020 post.

Originally, when I wrote my Blog Post Publish shortcut, I entered slugs and file names manually as user input. Partly that’s because I already do this for this site, where instead of numbering posts I give the files and slugs names related to the topic.[1] The main reason, though, is that I just hadn’t gotten as far as making my shortcut figure it out for me. But now I have.

Here’s a partial listing of WTF Weekly posts from the copy of the site on my Mac:

-rw-r--r--   1 scott  staff  4536 Aug 18 22:12 27.md
-rw-r--r--   1 scott  staff  4524 Aug 18 22:12 28.md
-rw-r--r--   1 scott  staff  4023 Aug 18 22:12 29.md
-rw-r--r--   1 scott  staff  6001 Aug 18 22:12 3.md
-rw-r--r--   1 scott  staff  4351 Sep  3 12:57 30.md
-rw-r--r--   1 scott  staff  5305 Aug 18 22:12 31.md
-rw-r--r--   1 scott  staff  4974 Aug 18 22:12 32.md
-rw-r--r--   1 scott  staff  6790 Aug 18 22:12 33.md
-rw-r--r--   1 scott  staff  3967 Aug 18 22:12 34.md
-rw-r--r--   1 scott  staff  6122 Aug 18 22:12 35.md
-rw-r--r--   1 scott  staff  3918 Aug 18 22:12 36.md
-rw-r--r--   1 scott  staff  5237 Aug 18 22:12 37.md
-rw-r--r--   1 scott  staff  5255 Aug 18 22:12 38-11082020024334.md
-rw-r--r--   1 scott  staff  4747 Sep  3 12:57 39-24082020224445.md
-rw-r--r--   1 scott  staff  8358 Aug 18 22:12 4.md
-rw-r--r--   1 scott  staff  5550 Sep  3 12:57 40-03092020123553.md
-rw-r--r--   1 scott  staff  5761 Oct 12 10:24 41-11102020220809.md

You may notice that parsing these won’t be as simple as looking for the highest number followed by .md, because at some point I started appending a hyphen and a date-time stamp to file names after the post number. I did this for reasons too banal to get into here, but I’ve done it in general on my sites for everything I upload. Really, it isn’t necessary for posts whose names are just sequential numbers, but that’s a whole side topic.

Anyway, the result is that I need to parse the file names for a number, optionally followed by a hyphen and a bunch of other numbers, and always ending with .md.

The first step for my shortcut in parsing the blog post number is to list the contents of the posts directory in ascending order. The way Secure Shellfish returns them is perfect because it sorts the numbers numerically: unlike the plain alphabetical listing above, it knows that 3.md and 30.md don’t belong next to each other. The bottom line is that I can always get the list in this order, grab the last item, and know it’s the one I need to increment by 1 for the new post.
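To see why the ordering matters, here’s a small Python sketch of a numeric-aware (“natural”) sort next to a plain string sort. I don’t know how Secure Shellfish sorts internally; this is just one common way to get the same effect, and `natural_key` is a hypothetical helper name. With a plain string sort, 9.md ends up last, so “grab the last item” would pick the wrong post.

```python
import re

def natural_key(name: str) -> list:
    """Sort key that compares runs of digits as numbers (a "natural" sort)."""
    # Splitting on a capturing group keeps the digit runs, so the result
    # alternates text, number, text, ... and stays comparable between names.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]

files = ["29.md", "3.md", "30.md", "4.md", "9.md",
         "40-03092020123553.md", "41-11102020220809.md"]

print(sorted(files)[-1])                   # 9.md (plain string order)
print(sorted(files, key=natural_key)[-1])  # 41-11102020220809.md
```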

Shortcut section for parsing blog post file names

Here’s the regular expression from that section, used to capture just the digits that come before the hyphen in each file name (if the file name has one):

(?m:^PostBasePath\/(\d+)-*\d*\.md$)

First off, PostBasePath is a shortcut variable. With the variable replaced by its contents, this is the actual RegEx:

(?m:^hugo-files\/content\/(\d+)-*\d*\.md$)

I could have made it more generic, but the file path is always hugo-files/content/, and I had a variable name to paste in already, so I used it.
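For readers more at home outside Shortcuts, the variable substitution amounts to pasting the path text into the pattern. This Python sketch assumes the variable holds hugo-files/content, as in the expanded RegEx above; `post_base_path` is a hypothetical name. The `re.escape` call is an extra safety step of my own, not something I’m claiming Shortcuts does: it makes any regex metacharacters in a path (such as a `.`) match literally. This particular path would work without it, since a hyphen outside brackets is already literal.

```python
import re

# Assumed value of the PostBasePath variable, based on the path given in the post.
post_base_path = "hugo-files/content"

# Shortcuts pastes the variable's text straight into the pattern. re.escape makes
# any regex metacharacters in the path match literally instead.
pattern = rf"(?m:^{re.escape(post_base_path)}/(\d+)-*\d*\.md$)"

print(re.findall(pattern, "hugo-files/content/41-11102020220809.md"))  # ['41']
```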

The outer (?m: ) just means that, for the enclosed expression, ^ and $ should be treated as start of line and end of line respectively, instead of the default start and end of the entire input. In our case, each match is a single line that starts with hugo-files and ends with .md.
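Python’s `re` module (3.6 and later) accepts the same scoped `(?m: )` syntax, so it’s an easy place to watch the flag at work. This sketch assumes Python behaves like the Shortcuts regex engine for this pattern; the three-line listing is a trimmed-down, made-up sample in the same format. Without the flag, ^ and $ only match at the very start and end of the whole text, so a multi-line listing yields nothing.

```python
import re

listing = "\n".join([
    "hugo-files/content/3.md",
    "hugo-files/content/40-03092020123553.md",
    "hugo-files/content/41-11102020220809.md",
])

with_m = r"(?m:^hugo-files\/content\/(\d+)-*\d*\.md$)"
without_m = r"^hugo-files\/content\/(\d+)-*\d*\.md$"

print(re.findall(with_m, listing))     # ['3', '40', '41']
print(re.findall(without_m, listing))  # []
```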

The part of the RegEx we’re really interested in is the following:

(\d+)-*\d*

Each file is named with one or more digits, followed by zero or more hyphens, followed by zero or more additional digits. The captured part is the digits before the hyphen. For my earlier naming format, that’s the whole file name minus the extension, because I wasn’t yet appending a hyphen and date-time stamp to file names.
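Here’s that core pattern checked against a few file names from the listing, as a Python sketch that drops the base path and anchors to the bare file name for simplicity. The `strict` pattern is a hypothetical alternative, not what my shortcut uses: `-*` and `\d*` also accept odd names like 42--.md, while `(?:-\d+)?` allows at most one hyphen and requires digits after it. For files named the way mine are, both give the same captures.

```python
import re

# The core of the post's pattern, anchored to a bare file name for this demo.
core = re.compile(r"^(\d+)-*\d*\.md$")

# A hypothetical stricter variant: an optional "-digits" suffix, hyphen only once.
strict = re.compile(r"^(\d+)(?:-\d+)?\.md$")

names = ["3.md", "37.md", "38-11082020024334.md", "41-11102020220809.md"]

for name in names:
    m = core.match(name)
    print(name, "->", m.group(1) if m else None)  # captures 3, 37, 38, 41

# The looser -* and \d* also accept odd names; the stricter variant rejects them.
print(bool(core.match("42--.md")), bool(strict.match("42--.md")))  # True False
```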

As of this writing, I have 41 posts on WTF Weekly, and the matches capture the numbers 1-41.[2] I then grab the last item from that list and increment it by 1 to get the slug and base file name for the post I’m publishing.
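The increment step can be sketched in Python as well. `next_post_number` is a hypothetical name, and this is not a transcription of my shortcut: instead of taking the last match, it takes the largest number, which gives the same answer whatever order the listing comes back in. The `default=0` makes an empty directory start at post 1.

```python
import re

PATTERN = r"(?m:^hugo-files\/content\/(\d+)-*\d*\.md$)"

def next_post_number(listing: str) -> int:
    """Return one more than the highest post number found in a directory listing."""
    numbers = [int(n) for n in re.findall(PATTERN, listing)]
    # The shortcut takes the last match, which relies on the listing being in
    # numeric order. max() gives the same result for any order; default=0 means
    # an empty directory yields post 1.
    return max(numbers, default=0) + 1

listing = "\n".join([
    "hugo-files/content/9.md",
    "hugo-files/content/41-11102020220809.md",
    "hugo-files/content/40-03092020123553.md",
])
print(next_post_number(listing))  # 42
```

On this deliberately unordered listing, taking the last match would give 40 + 1 = 41, which collides with an existing post. That’s why the order Secure Shellfish returns matters for the last-item approach.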

Regular expressions are super useful for anything involving text, and Shortcuts makes them easy to use. I have many, many shortcuts that use regular expressions to parse text.


  1. Numbering would actually work for this site too, as long as each section of posts had its own numbering sequence; I just never thought of doing it that way.

  2. I actually only have 40, but let’s not talk about that missing number. I have posts numbered 1-41, and that’s what matters here.