Whispering Podcast Transcripts

In what feels like a lifetime ago, I had a podcast called Pocket Sized Podcast, mostly about iOS apps and devices. At some point, for reasons I can’t even begin to recall, I joined a fledgling podcast network called Fiat Lux, which was later rebranded as Constellation by the two fairly angry guys running it. The whole thing was a giant fiasco full of insane stories, but it’s relevant to me now because podcast transcription is having a moment.

Fiat Lux/Constellation decided that the core feature of the podcast network would be incredibly detailed show notes on all podcasts. Unfortunately, they had some really bad ideas about exactly what those show notes should look like.1 None of us wanted anything to do with their plan, mainly because of how they presented it and the amount of shouting involved in their attempts to convince us.

If you’re going to try to herd cats, you’d better be a cat person is what I’m saying.

But making podcast episodes available as text IS a good idea, and several very popular podcasters I know of are looking at all kinds of options for creating good transcripts without spending hours and hours on them.

There are several paid and soon-to-be-paid options, such as Adobe Podcast (currently in beta, pricing to be determined) and Otter. But the completely free option that got my attention is a Mac port of OpenAI’s Whisper called Whisper.cpp.

Whisper runs locally on your own machine, and Whisper.cpp jettisons the Python runtime for C and C++, which has obvious positive performance implications. Better yet, it’s even optimized for Apple Silicon.

I heard about Whisper.cpp while listening to Rebuild from Tatsuhiko Miyagawa, a very enjoyable Japanese-language tech podcast. It’s actually one of my favorite podcasts in any language. Anyway, at the time Miyagawa-san was experimenting with Whisper.cpp on his new Apple Silicon Mac, and I filed that information away in my brain, figuring it would be some time before I got an Apple Silicon Mac of my own. It was, but now I have one, and so I recently jumped into Whisper.cpp experiments of my own.2

Whisper.cpp has several models you can download, depending on what kind of quality vs time tradeoffs you want to make. I’ve tested Whisper.cpp on Friends with Brews episodes using the ggml-base.en.bin, ggml-medium.en.bin, and ggml-large.bin models, with interestingly varying results.
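For reference, a run looks roughly like this (the episode filename is made up, and your paths will differ; the model download script and the -m/-f/-otxt flags come from the whisper.cpp repo):

```shell
# Download a model (this script ships with the whisper.cpp repo)
bash ./models/download-ggml-model.sh base.en

# whisper.cpp expects 16 kHz mono WAV input, so convert the episode first
ffmpeg -i episode-21.mp3 -ar 16000 -ac 1 -c:a pcm_s16le episode-21.wav

# Transcribe, writing a plain text transcript next to the input file
./main -m models/ggml-base.en.bin -f episode-21.wav -otxt
```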

The first thing I found is that the base model is FAST. I transcribed a 50-minute podcast episode in about a minute with decent results. I had to fix a few names and technical terms, but otherwise it was quite good.

I couldn’t tell a huge difference between the medium and large models with the two particular episodes I experimented with, but the time difference between all three models was noticeable. I tested all three models on Friends with Brews episode 21, which is 45 minutes and 29 seconds long and roughly 24 MB in size.

  • Base.en – 1.5 minutes
  • Medium.en – 10 minutes
  • Large – 19 minutes

Even with the large model, transcribing a podcast at better than 2x real-time speed is pretty good.

So that’s how long it takes, but what about the results? Compare for yourselves. The headers for each file are links so you can download the markdown files to compare locally if you like.

The end result is that I think it’s worth going with the medium or large models. It’ll cost you disk space – the base English model is 142 MB, the medium English model is 1.5 GB, and the large model is 2.9 GB. But I think it’s worth it in terms of results.

You may have to do some testing to decide between medium and large, though, even if you’re convinced that the base model isn’t the way to go. Generally I think I like the large model results better, but there are some instances where the medium model transcribed something more accurately.

Personally I’m using the large model, but that’s because I’m actually using yet another port, which I’ll talk about in another post very soon.

By the way, if you want to hear the Fiat Lux/Constellation stories, just slip @Vichudson1@appdot.net a nice glass of whiskey.3 He has a much better memory than I do about pretty much anything in the past, and especially about the saga of the world’s unhappiest podcast network.

Footnotes

  1. Including wanting markdown format in Google Docs specifically as opposed to any plaintext document format (like, I don’t know, .md?), but whatever.

  2. Whisper.cpp actually runs on Intel Macs too, but I didn’t realize it at the time. My late 2015 iMac probably would have barfed up a lung on it anyway.

  3. Fair warning, he’ll probably try to get you to buy him more than one.

Podcast Episode Image Script

As I mentioned in my last post, I want to use standard markdown (text only, no ability to use components) for Friends with Brews episodes so that I can use Astro’s ability to render full post body content of each episode in the RSS feed. This way the RSS feed shows each episode’s show notes instead of just a summary.

Step 1 of the Great Show Note Images odyssey is generating optimized versions of any images to be included in episode show notes. Figuring out which images those are is easy – I have a directory named src/images/episodes, and I’ll just dump my images in there.

From there it’s a matter of reading all the files in the directory, generating the desired sizes, and dumping them in public/images/episodes, which in the published site will be located at /images/episodes.

Because I’m not doing this inside an Astro file with pre-imported or pre-linked images, I can’t use the Astro Image component like I do for all the other images on the Friends with Brews website. I need something I can call from a JavaScript function. Fortunately, as I noted last time, I can use the eleventy-image plugin this way. Ben Holmes details how in his article Picture perfect image optimization for any web framework.

If you look at section 4 of his post, Using 11ty image with any framework, you can see a script Ben wrote to look in a directory and generate optimized images for each image file in the directory in the specified widths and formats using the Node package @11ty/eleventy-img.

I modified the script a little bit as shown below. I would have included it as a code block instead of an image, but for some reason it triggers modsecurity on my server and blocks the IP of whoever tries to load this page. Not exactly ideal.

Image optimization script
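For the curious, here’s a sketch of what my modified version of Ben’s script does – reconstructed from memory, not the exact file. The directory names match my setup, and the filenameFormat callback is what produces the name-width.format filenames in the listing that follows:

```javascript
// Sketch of an eleventy-img batch optimizer (run as an ESM script, e.g. optimize.mjs).
import fs from "node:fs/promises";
import path from "node:path";
import Image from "@11ty/eleventy-img";

const inputDir = "src/images/episodes";
const outputDir = "public/images/episodes";

const files = await fs.readdir(inputDir);

for (const file of files) {
  await Image(path.join(inputDir, file), {
    // "auto" also emits a copy at the original width
    widths: ["auto", 600, 1000, 1400],
    formats: ["avif", "webp", "jpeg"],
    outputDir,
    urlPath: "/images/episodes",
    // Name each output <original basename>-<width>.<format>
    filenameFormat: (id, src, width, format) => {
      const name = path.basename(src, path.extname(src));
      return `${name}-${width}.${format}`;
    },
  });
}
```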

If I run this script with the following images in src/images/episodes:

-rw-r--r--@ 571969 Oct  4  2020 18749389.jpg
-rw-r--r--@ 723612 Jan  4 23:20 ChonkChart-816246A9-8775-4561-B9DA-1D9E7E0413B1-20220323095602.png

the script generates the following images in public/images/episodes:

-rw-r--r--@ 11229 Jan  9 13:36 18749389-1000.avif
-rw-r--r--@ 54111 Jan  9 13:36 18749389-1000.jpeg
-rw-r--r--@ 29588 Jan  9 13:36 18749389-1000.webp
-rw-r--r--@ 15438 Jan  9 13:36 18749389-1400.avif
-rw-r--r--@ 85912 Jan  9 13:36 18749389-1400.jpeg
-rw-r--r--@ 44008 Jan  9 13:36 18749389-1400.webp
-rw-r--r--@ 41218 Jan  9 13:36 18749389-4000.avif
-rw-r--r--@ 331928 Jan  9 13:36 18749389-4000.jpeg
-rw-r--r--@ 150688 Jan  9 13:36 18749389-4000.webp
-rw-r--r--@ 6960 Jan  9 13:36 18749389-600.avif
-rw-r--r--@ 26053 Jan  9 13:36 18749389-600.jpeg
-rw-r--r--@ 16014 Jan  9 13:36 18749389-600.webp
-rw-r--r--@ 30453 Jan  9 13:36 ChonkChart-816246A9-8775-4561-B9DA-1D9E7E0413B1-20220323095602-600.avif
-rw-r--r--@ 78709 Jan  9 13:36 ChonkChart-816246A9-8775-4561-B9DA-1D9E7E0413B1-20220323095602-600.jpeg
-rw-r--r--@ 60118 Jan  9 13:36 ChonkChart-816246A9-8775-4561-B9DA-1D9E7E0413B1-20220323095602-600.webp
-rw-r--r--@ 38504 Jan  9 13:36 ChonkChart-816246A9-8775-4561-B9DA-1D9E7E0413B1-20220323095602-677.avif
-rw-r--r--@ 98443 Jan  9 13:36 ChonkChart-816246A9-8775-4561-B9DA-1D9E7E0413B1-20220323095602-677.jpeg
-rw-r--r--@ 78596 Jan  9 13:36 ChonkChart-816246A9-8775-4561-B9DA-1D9E7E0413B1-20220323095602-677.webp

This is good news. First of all, Ben did all of my work for me. Second, I can generate optimized images without having to know anything about them in advance. Now I just have the very little problem of replacing image links in my episode markdown file with picture elements that contain the sources for the different file types and srcsets for each of the different image sizes.

By the way, check out the file size differences on the optimized versions versus the originals in those file listings! Even the full width and height optimized images are quite a bit smaller in terms of file size than the originals, and the smaller ones are minute compared to the images I started with.

A couple of things I’d like to note about working with eleventy-img here:

  • Eleventy-img doesn’t stupidly try to generate image sizes larger than the original. If you specify widths: ["auto", 600, 1000, 1400] and one of the images is only 677 pixels wide, it will only generate the 600 and 677 pixel width versions (the “auto” option tells eleventy-img to also make a copy at the original size).
  • As you must have guessed by now, even though eleventy-img was developed as a plugin for the Eleventy SSG framework, it’s just JavaScript: it can be installed with npm and used with any other Node.js-compatible framework. That includes Astro.

Next time I write anything on this site, it may be completely incoherent, depending on what Rube Goldberg mechanism I come up with for getting the responsive HTML for images into my show notes markdown files in the correct location. My writing workflow allows for a few different possibilities, since I already know in advance the default width I want these images to be. Maybe next time I’ll detail that workflow and what options I think might be available to me, and then we can get into implementation.

More On Astro, Image Optimization, And Markdown

I’ve talked a lot about image optimization and RSS feed handling with Astro. I’m about to talk about it some more. I presume you’re like me and obsess endlessly about these topics, so you should enjoy this.1 In my case, I don’t know how much of it is enjoyment and how much of it is a compulsive search for a better way.

Now that Astro has support for full post content RSS feeds with the compiledContent property for markdown content, I modified Friends with Brews to use standard markdown files (.md) instead of MDX (.mdx). My reasoning was that, since I almost never include images in the show notes for Friends with Brews and since using standard markdown would let me use compiledContent to put full episode show notes in the podcast RSS feed, it was an automatic must-do.

Unfortunately for me, the second I made the switch, I needed to add an image to the show notes of an upcoming episode, which would make it the second time I’ve had to add an image to FwB show notes. The first time was back on episode 8, Satan is not normally depicted as being purple, and that was the critical piece of visual information known as the Chonk Chart.

If I have to do anything more than once, it means I will have to do it several times, and that means I have to support it properly. And that means being able to combine image optimization AND standard markdown in Astro, and that’s not currently possible – the Astro Image components are only supported in Astro and MDX files, for the obvious reason that they’re components, and markdown can’t execute components.

I can continue to process the usual images of episode brews (coffee, beer, tea) with Astro Image in my layout templates, as well as all the other images used on the site, so that’s not a problem. Those brew images are not actually in the show notes. Instead, I have a JSON file of brews that contains a bunch of information about them, including which episode(s) they’re associated with. So I only have to concern myself with how to optimize the inline images in the show notes markdown files.

The good news is that manually writing Picture elements complete with sources and srcsets and default images is simple. And if it’s simple for a human, it’s simple for a script. I can easily assume a certain standard width for images I’m going to put in my show notes, and then generate optimized images for that size plus 2x and 3x resolutions, as well as the original for linking to. Then the html for the Picture element associated with the optimized images can be generated fairly simply. Ben Holmes, who now works for Astro, has a post about this approach using eleventy-image. Because eleventy-image can be manipulated directly in JavaScript, it’s a great candidate for this.
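To show how simple that generation really is, here’s a hypothetical helper that builds a picture element string – the name-width.format naming scheme and the format list are assumptions modeled on my eleventy-img output, not a real API:

```javascript
// Build a <picture> element string for an image that has been optimized
// into /images/episodes/<name>-<width>.<format> variants.
// The naming scheme, widths, and formats are assumptions for illustration.
function pictureElement(name, widths, alt, defaultWidth = 600) {
  const formats = ["avif", "webp", "jpeg"]; // best-compressed format first

  // srcset listing every generated width for one format
  const srcset = (format) =>
    widths
      .map((w) => `/images/episodes/${name}-${w}.${format} ${w}w`)
      .join(", ");

  // every format except the last becomes a <source>;
  // the last (jpeg) becomes the <img> fallback
  const sources = formats
    .slice(0, -1)
    .map((f) => `<source type="image/${f}" srcset="${srcset(f)}">`)
    .join("\n  ");

  return `<picture>
  ${sources}
  <img src="/images/episodes/${name}-${defaultWidth}.jpeg"
       srcset="${srcset("jpeg")}" alt="${alt}">
</picture>`;
}
```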

So here’s the plan:

  • Continue to optimize permanent site images and episode brew images using the Astro Image components in my Astro layout files,
  • Use eleventy-image to optimize inline show notes images to predetermined widths and image formats,
  • Figure out how to insert the associated Picture elements for the optimized images into my markdown files. This step might take the most work.

If you noticed that the last step of the plan looks a bit like the Far Side cartoon of scientists diagramming the creation of the universe with an “and then a miracle occurs” step tagged onto the end, you’re right. The good news is, there are several places in my writing and publishing workflow where I can inject the HTML. I’ll write more about that as I start implementing a system.

Footnotes

  1. If you aren’t endlessly obsessed with these topics, we need to talk about that.

Astro RSS 1.2.0 Update

Earlier today I posted about the new compiledContent() property for use in Astro RSS. What I didn’t mention was that Astro RSS 1.1.0 had a bug in its XML parsing that ignored custom content (which in my case I am using for audio enclosures) and also choked on my link constructors for my post and audio file links.

Today Astro RSS 1.2.0 was released with a fix for this, thanks to a pull request from Matt Stein, so now my RSS layout for Siracusa Says looks like this:

import rss from "@astrojs/rss";
import config from "config";
import path from "path";
import sanitizeHtml from "sanitize-html";
import { rfc2822 } from "../components/utilities/DateFormat";

const episodeImportResult = import.meta.globEager("../content/episodes/*.md");
let episodes = Object.values(episodeImportResult);
episodes = episodes.sort(
  (a, b) =>
    new Date(b.frontmatter.pubDate).valueOf() -
    new Date(a.frontmatter.pubDate).valueOf()
);

export const get = () =>
  rss({
    title: config.get("title"),
    description: config.get("description"),
    site: config.get("url"),
    items: Array.from(episodes).map((episode) => ({
      title: episode.frontmatter.title,
      link: `${new URL(
        path.join(config.get("episodes.path"), episode.frontmatter.slug),
        config.get("url")
      )}`,
      pubDate: rfc2822(episode.frontmatter.pubDate),
      description: episode.frontmatter.description,
      customData: `<enclosure url="${new URL(
        path.join(
          config.get("episodes.audioPrefix"),
          episode.frontmatter.audiofile
        ),
        config.get("url")
      )}" length="${episode.frontmatter.bytes}" type="audio/mpeg" />`,
      content: sanitizeHtml(episode.compiledContent()),
    })),
  });

I noticed today that I wasn’t providing the domain in my enclosure URLs, just the path. This fixes that, and it also makes sure that my post links (which Astro RSS also uses for item entry GUIDs) are always correct. I hadn’t had any problems with them, but this is a safer way of making sure I never get extra slashes or other malformed URL issues.
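The pattern in the code above – path.join for the path portion, then the WHATWG URL constructor for the origin – can be illustrated in isolation (the domain and slug here are made up):

```javascript
import path from "node:path";

const base = "https://example.com/";

// path.join collapses the doubled slashes in the path portion,
// and new URL(path, base) attaches the origin cleanly.
const link = `${new URL(path.join("/episodes/", "/21/"), base)}`;
// "https://example.com/episodes/21/"
```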

Astro RSS Compiled Content

It’s been a while and I have lots of news, but just a short one today: Astro now supports full-content RSS feeds if you use md files for your content. It works like this:

export const get = () =>
  rss({
    title: config.get("title"),
    description: config.get("description"),
    site: config.get("url"),
    items: Array.from(episodes).map((episode) => ({
      title: episode.frontmatter.title,
      link: `${config.get("url")}${config.get("episodes.path")}${
        episode.frontmatter.slug
      }`,
      pubDate: rfc2822(episode.frontmatter.pubDate),
      description: episode.frontmatter.description,
      customData: `<enclosure url="${config.get("episodes.audioPrefix")}/${
        episode.frontmatter.audiofile
      }" length="${episode.frontmatter.bytes}" type="audio/mpeg" />`,
      content: sanitizeHtml(episode.compiledContent()),
    })),
  });

See the last line of code?

content: sanitizeHtml(episode.compiledContent()),

That’s directly telling Astro RSS that for a given item, the content is the result of calling the post’s compiledContent() method (run through sanitize-html for good measure).

You can find the Astro docs for it here: Including Full Post Content

There is one caveat I need to mention that directly affects this site. If you use mdx instead of md for your posts, like I do here, compiledContent() doesn’t work. Since I don’t like the tweak I’m currently using to get full-content RSS items on this site, my plan is to convert back to md and figure out a way to process images such that I get the benefit of Astro Image’s Picture and Image components while using standard markdown files.

The Mastodon Webfinger Domain Search Super Trick

I promised an article about whether or not Mastodon and the Fediverse were going to solve all our problems and make us happy humans with a long species survival probability, but work and other tech projects have intervened. More on that later.

In the meantime, if you ARE on Mastodon and don’t want to run your own instance but would like people to be able to search for you there using your own domain name instead of the domain name of the instance you’re on, Maarten Balliauw has a really cool trick for this:

In other words, if you want to be discovered on Mastodon using your own domain, you can do so by copying the contents of https://<your mastodon server>/.well-known/webfinger?resource=acct:<your account>@<your mastodon server> to https://<your domain>/.well-known/webfinger.

You can read Maarten’s post on the webfinger method on his blog.
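In practice that copy step is a one-liner (the server and account below are placeholders – substitute your own):

```shell
# Fetch the webfinger document from your Mastodon instance
curl -s "https://mastodon.example/.well-known/webfinger?resource=acct:you@mastodon.example" \
  -o webfinger

# Then upload the file so it's served at:
#   https://yourdomain.example/.well-known/webfinger
```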

Title Case

A good system should never make you remember its implementation details in order to use it. That extends to blogging platforms. Since my blogging platform is a self-created, self-hosted website built on Astro, the onus of making myself not have to remember how it works just to use it rests solely on me.

Today I started writing a post about Mastodon and the Fediverse and why it’s not the solution to all your problems that you might think it is, and I realized I couldn’t remember whether I write my blog post titles in Title Case, or with Just the first word capitalized, or in camelCase.1 It’s not the first time I’ve had to ask myself this question, and that means it’s a problem I don’t want to think about anymore. So here’s a very simple Title Case generator:

export function titleCase(str) {
  return str.replace(/\b[A-Za-z]/g, (x) => x.toUpperCase());
}

Running the first paragraph of this blog post through it results in this:

A Good System Should Never Make You Remember Its Implementation Details In Order To Use It. That Extends To Blogging Platforms. Since My Blogging Platform Is A Self-Created, Self-Hosted Website Built On Astro, The Onus Of Making Myself Not Have To Remember How It Works Just To Use It Rests Solely On Me.

So now my blog post titles are generated like this:

<h1>
  <a
    href={new URL(
      path.join(config.get("posts.path"), content.frontmatter.slug),
      config.get("url")
    )}
    >{titleCase(content.frontmatter.title)}
  </a>
</h1>

Speaking of camelCase, I also took the opportunity to add a camelCase function that outputs what we called camelCase when I was a juvenile programmer-wannabe.

export function camelize(str) {
  return str
    .toLowerCase()
    .replace(/[^a-zA-Z0-9]+(.)/g, (m, chr) => chr.toUpperCase());
}
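As a quick sanity check, here’s what the two functions produce on a sample string (both functions repeated so the example is self-contained):

```javascript
function titleCase(str) {
  return str.replace(/\b[A-Za-z]/g, (x) => x.toUpperCase());
}

function camelize(str) {
  return str
    .toLowerCase()
    .replace(/[^a-zA-Z0-9]+(.)/g, (m, chr) => chr.toUpperCase());
}

const sample = "the mastodon webfinger domain search super trick";
const titled = titleCase(sample);  // "The Mastodon Webfinger Domain Search Super Trick"
const cameled = camelize(sample);  // "theMastodonWebfingerDomainSearchSuperTrick"
```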

I’ll be back next time with some reasons why Mastodon and the Fediverse aren’t suitable for most people in their current incarnations.2

Footnotes

  1. You modern whippersnappers probably think ThisIsCamelCase, but when I was a boy, thisWasCamelCase. So put that in your programming pipe and smoke it.

  2. Hint: it’s not a technical problem, it’s a people problem.