OCR in CleanShot X

I love this modern era of computing, and do you know why? Text Recognition, also known as OCR, is built into so many apps and operating systems now, and it is very useful.

I’ve written about Image Search Text Recognition in Raycast and ScreenFloat before. Another Mac app that has Text Recognition using OCR technology (according to its own website) is CleanShot X.

CleanShot X Text Recognition

CleanShot X does Text Recognition a little differently than other apps. You use a keyboard shortcut to bring up a capture tool, exactly like a screenshot capture tool, and drag over the area you want text recognized in. In my case, I have ⌥⇧⌘O set as my keyboard shortcut.

CleanShot X OCR Keyboard Shortcuts

Let’s say I have a screenshot of the title of The Verge’s recent iPad Pro article, for some inexplicable reason.

The Verge Screen Shot

I can hit ⌥⇧⌘O, drag over the part of the image with the text I want to capture, and release. CleanShot X automatically detects all text in that region and copies it to the clipboard. Then I can (also inexplicably) open TextEdit and paste it in.

CleanShot X The Verge OCR Results

I like the simplicity of it: I get to define the region to look for text in, and it just copies all the text in that region to the clipboard without me having to pick and choose words or lines or paragraphs or whatever.

Here’s a little tip for you Windows users: Snipping Tool has Text Recognition built in too. You can fire it up, capture onscreen notes that people are typing in Teams meetings, and use the Text Recognition tool to grab the text for yourself in case the presenter forgets to send out a particular set of notes they’re typing up as they’re talking. It’s great. I do it all the time.

There are lots of Text Recognition examples in macOS and iOS and apps that run on those platforms, and I celebrate them all. We live in a golden age of utility software.

ChatGPT for macOS Will Not Sherlock Raycast AI for Me

Raycast

Part of the Raycast series

I’ve finally had a chance to play with the ChatGPT macOS app1 and I’m here to say it doesn’t swing the uppercut required to get me to stop paying for Raycast Advanced AI. Right now the one thing it has that Raycast AI does not is the ability to upload files for parsing, but that’s coming soon to Raycast AI. Raycast also keeps playing with ideas like support for local LLMs to augment their Advanced AI plan’s support for models like OpenAI GPT-4o, Anthropic Claude 3 Sonnet and Opus, and Perplexity Llama 3 Sonar Large.

Raycast local LLM prototype demo

Add to that things like Raycast AI Commands, which I posted about previously, and Raycast AI is still a very attractive option for integrating AI into workflows where it makes sense to do so. I feel like I have to add that caveat given that a lot of people want to dismiss the whole thing out of hand as some kind of scam. It’s not – but that doesn’t mean LLMs are applied optimally in a lot of cases and it doesn’t mean I trust the companies involved to take time to come up with correct and optimum use cases.2

Slight tangent – I think I view Raycast AI Commands as similar in purpose to things like Fabric and GPTScript, even if possibly different in scope and flexibility. Definitely more on that as I find time to investigate all of these further.

Footnotes

  1. ChatGPT Plus subscription required

  2. By the way, keep an eye on Pragmatic podcast for an upcoming episode on this very topic.

Show Me More, but Let Me Read It

Mac display settings are weird. I have a 5K Apple Studio Display. Until recently, I’d been using the default resolution, which is 2560 x 1440. Then I started using the Studio Display with my work laptop as well, a Lenovo, and I noticed that by default it just uses the full resolution of 5120 x 2880.

After seeing how great all that extra space on the PC was for remoting into servers and controlling large semiconductor test equipment UIs and being able to comfortably see everything without any scrolling, I started wondering why I was running the Mac at 2560 x 1440.

System Settings has a Displays section that, among other things, shows your resolution as five buttons, ranging from “Larger Text” on the low-resolution end, through “Default” for 2560 x 1440, up to “More Space” for the (apparently) highest resolution.

macOS Screen Resolution Buttons

The thing is, this isn’t actually the highest resolution. If you click the “Advanced” button on this screen and choose to display resolutions as a list, you see that “More Space” sets the monitor to 3200 x 1800, and that there’s another option for 5120 x 2880.

macOS Screen Resolution List

Here’s my gripe with macOS though – remember that “Larger Text” setting? It exists because Apple’s basic way of noticeably changing text size is changing monitor resolution. Which… is stupid. Screen resolution should be used to change how much you can see on screen at once, not tied directly to text size. Yes, there will be some correlation, but Apple makes it 1:1 instead of allowing a looser, more flexible relationship.

Yes, you can adjust the font size in Finder windows to a degree (but not enough for old eyes at 5120 x 2880), and you can adjust font size in some apps with ⌘+, but other apps either have their own way or don’t let you do it at all, and the desktop and menu bar are non-adjustable as far as I can tell.

Accessibility has a few features for this, and you can VERY slightly increase menu bar font size (but not menu bar icon sizes?), and I couldn’t find a way to change desktop font size. It’s very weird, and I’m sure an accessibility expert could point out all the things I’m missing, but the point is it’s non-discoverable and there’s no unified way to say “Guys, I’m looking at everything shrunk down because I’m on the highest resolution, just show me proportionally larger text EVERYWHERE. You’re the operating system, make it happen.”

I don’t know what Microsoft is doing, but they apparently realize that letting the text disappear into the distance at 5120 x 2880 is a bad idea, because the text is not that much smaller than it was for me on my previous QHD (2560 x 1440) work monitors. I really didn’t notice much of a difference for a lot of things other than suddenly having tons of viewing space. Certain apps like RDCMan and anything happening on remote machines at those higher resolutions are exceptions, of course.

Anyway, I’d love to know why this is the way it is, what I’m doing wrong (you’re the internet, isn’t it your God-given mission to make sure people know they’re wrong?), and how everyone else handles it without just saying “well, I guess I’ll just view everything at 1440 x 810” or some ridiculous thing. Let me know.

QuickTune Is Beautiful and MacStories Knows It

Mario Guzmán has made some beautiful apps in his time, and QuickTune tops the list in my opinion. QuickTune is a Tiger-ish retro music player that uses Apple Music but gives you a gorgeous brushed-metal QuickTime UI for controlling and visualizing your playlists.

I mean, look at this!

QuickTune

So it’s well deserved but still very cool that QuickTune was featured in MacStories yesterday. As John says,

I’m not usually nostalgic about apps. I appreciate classic designs from the past, but I find ‘new’ more exciting. However, for every rule, there’s an exception, and for me, it’s Mario Guzmán’s beautiful, pixel-perfect reimagining of classic Apple music apps.

I’m kind of the same way – while I do feel that flat modern design sensibility has caused the Mac UI to lose some soul, I generally don’t worry about it a lot. But still, the kind of apps Mario makes do hearken back to when beautiful apps mattered and there was a clear-cut distinction between Mac operating systems and everything else. I really like that.

The full MacStories article is here.

Raycast AI Commands

Raycast

Part of the Raycast series

Marc Magnin brought up a point I hadn’t considered when I asked if the ChatGPT desktop app might Sherlock Raycast Advanced AI for me: Raycast AI Commands.

Raycast AI Commands (documented here in the Raycast manual) are really just prompts for the LLM to perform an action with specific instructions. You can also customize them to use specific models available to Raycast Advanced AI subscribers, so you could use Anthropic Claude 3 Opus for one thing, OpenAI GPT-4o for another, and so on.
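To make that concrete, here’s a rough conceptual sketch in Python of what an AI Command boils down to: a prompt template plus a per-command model choice, with a {selection} placeholder filled in from whatever text you have selected. The field names and the render() helper are hypothetical illustrations, not Raycast’s actual implementation:

```python
# Hypothetical sketch only; these names are not Raycast's real internals.
command = {
    "name": "Regex Generator",
    "model": "gpt-4o",  # Advanced AI lets you pick a model per command
    "prompt": (
        "Generate a regular expression that matches the specific "
        "patterns in the text.\nText: {selection}\nRegex:"
    ),
}

def render(command: dict, selection: str) -> str:
    # Fill the {selection} placeholder, like Raycast does with your selection
    return command["prompt"].replace("{selection}", selection)

# The rendered prompt is what actually gets sent to the chosen LLM
prompt = render(command, "2024-05-21 10:42:07 ERROR something broke")
```

The nice part of this shape is that swapping models or tweaking instructions is just editing a record, which is also exactly why it becomes a maintenance chore over time.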

The reason it didn’t occur to me that I might miss these? I never use them. I haven’t tried to incorporate them into my workflow and as a result I have no idea if I would benefit from them or not. I will definitely have to do some testing and find out if I’m missing a useful tool or if I never figure out a use for them.

One annoyance about them is also a side-effect of one of their main features – you will have to manually edit your AI commands to update to newer LLMs when it becomes useful to do so.

RaycastAIEdit

Again, this isn’t a lazy miss because being able to choose your model is a feature, but it can eventually become a maintenance chore as well.

You may be wondering “what are these good for?” Imagine you constantly write new regex based on text patterns. You might benefit from the Regex Generator AI command, which tells the LLM the following:

Generate a regular expression that match the
specific patterns in the text. Return the regular
expression in a format that can be easily copied
and pasted into a regex-enabled text editor or
programming language. Then, give clear and
understandable explanations on what the regex is
doing and how it is constructed.
Text: {selection}
Regex:

Let’s say I have some Markdown links, a subset of which looks like this:

## The Links
- [Hibiscus Tea | Traditional Medicinals](https://www.traditionalmedicinals.com/collections/all/products/hibiscus-tea)
- [J-BASKET BANCHA KAWAYANAGI TEA 48/8.00 OZ - JFC International](https://www.jfc.com/product/item/28211)
- [Constellation](https://tv.apple.com/us/show/constellation/umc.cmc.3lvo8a7ezxpysdy3gou3fsns0)
- [3 Body Problem](https://www.netflix.com/title/81024821)
- [‎Friendly Streaming Browser](https://apps.apple.com/us/app/friendly-streaming-browser/id553245401?mt=12)
- [Emperor of the Fading Suns - Wikipedia](https://en.wikipedia.org/wiki/Emperor_of_the_Fading_Suns)
- [GOG.com](https://www.gog.com/)
- [WineHQ - Run Windows applications on Linux, BSD, Solaris and macOS](https://www.winehq.org/)
- [Whisky - Run Modern Windows Games on macOS](https://getwhisky.app/)
- [Run Microsoft Windows software on Mac and Linux | CodeWeavers](https://www.codeweavers.com/crossover/)
- [Run Windows on Mac with a virtual machine | Parallels Desktop](https://www.parallels.com/products/desktop/)

The Regex Generator AI Command comes up with this regular expression:

\[(.*?)\]\((https?:\/\/[^\s)]+)\)

Using BBEdit’s Pattern Playground, you can see in the Capture groups section of the window that it matches the entire Markdown link, capturing the link name as the first capture group and the URL as the second.

BBEditRegexPlayground

It’s not a horrible result given the text I gave it. It makes me want to play with this specific AI command more on various pieces of data such as log files and see what it does.
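If you want to sanity-check the generated pattern outside of BBEdit, a quick Python sketch works too. Nothing here is specific to Raycast; it’s just the regex above dropped into Python’s re module against a few of the sample links:

```python
import re

# The regex produced by the Regex Generator AI command
pattern = re.compile(r"\[(.*?)\]\((https?:\/\/[^\s)]+)\)")

links = """
- [Constellation](https://tv.apple.com/us/show/constellation/umc.cmc.3lvo8a7ezxpysdy3gou3fsns0)
- [3 Body Problem](https://www.netflix.com/title/81024821)
- [GOG.com](https://www.gog.com/)
"""

# findall returns one (name, URL) tuple per match, one element per capture group
for name, url in pattern.findall(links):
    print(f"{name} -> {url}")
```

The two capture groups come back as a tuple per link, which matches what Pattern Playground shows: group 1 is the link name, group 2 is the URL.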

Anyway, all this to say that Raycast AI has some advantages in how things can be massaged and customized to provide desirable responses, and in the way the Raycast developers have made it extendable and customizable by end users too.

More to come on this, probably.

Will ChatGPT Desktop Sherlock Raycast AI for Me?

Raycast

Part of the Raycast series

Interesting things are afoot at the company known as OpenAI. Apparently they’re in the Sherlocking business now, and their target is (unintentionally?) Raycast AI. I’m talking about an AI chat triggered by keyboard shortcut, with model selection, conversation history, contextual awareness and screenshot and image analysis, and more.

Sadly, I don’t have access to it yet – OpenAI is rolling it out to people gradually, even for those of us who have ChatGPT Plus.

Matt Birchler has a video overview of it, and I’m only going to pick one nit with Matt – Matt, how’d you manage to get access to it just by signing up as a newbie to ChatGPT Plus, while loyal old me who’s been handing over money for ages still can’t get it?!

Oh well. Here’s his video.

ChatGPT just released a Mac app, here’s how it works - YouTube

My thoughts on this are that Raycast might feel some pain with respect to their $8/mo Advanced AI option at some point. It’s true that Raycast Advanced AI gives you access to several non-OpenAI models, but personally I never use them. Raycast has to keep changing their allowed token limits whenever OpenAI or one of the others makes changes, and sometimes that only happens after customers ask why the token limits are still at the old values instead of following what the AI model providers are currently doing for a given model.

The way I usually use AI in Raycast is absolutely as a replacement for other third party apps like MacGPT, FridayGPT, etc. I use the hotkey (^G in my case) to open the separate AI chat window, which then basically behaves like a separate app with its own sidebar, history of chats, copy and paste, etc, etc. I very rarely use Raycast Quick AI, which is basically typing something in the standard Raycast text input and then hitting tab instead of return to submit it to the LLM of choice. This means that, for me, the OpenAI ChatGPT desktop app could well replace the Raycast Chat app.

The benefit of most 3rd party apps over Raycast AI is that you bring your own API key and get all the access you already pay for, whereas Raycast needs to provide their AI integration as a service, and having people pay monthly for something they’re also paying for an API key for won’t work… except that’s what most of us using Raycast Advanced AI are probably already doing.

The official ChatGPT desktop app, on the other hand, doesn’t use an API key, but instead uses your ChatGPT Plus subscription. I do pay for this still, in addition to an API key. And while it’s true that the Raycast Advanced AI option costs less than ChatGPT Plus, and it’s also true that it pays for itself for me, it still doesn’t make sense to pay for overlapping services unless there’s a tremendous need to. If I can get all the benefits of the Raycast AI Chat functionality in a desktop app that is included in the cost of my ChatGPT Plus subscription, I very probably will.

Right now I just need to wait for the slow rollout to turn its Eye of Sauron upon me so I can give it a shot and find out for sure.

Image Search Text Recognition in Raycast and ScreenFloat

Raycast

Part of the Raycast series

There I was, checking out Raycast’s updated website and looking at their tips videos, when I stumbled on the tip called Find Images by Text. Although I knew that Raycast would keep copied images in the clipboard history, what I did NOT know is that you can search those images not just by words matching the title, but words matching text IN the image.

Example: I hit ⌘ Space (that’s Command Space) to invoke Raycast, type ch (my alias for the Clipboard History function), and start typing “there I was”. Look at the results:

Raycast Clipboard History Image Text Recognition

The first result is a file called Image (1252x631), which is a screenshot of this blog post when I started writing it, taken with ScreenFloat and copied into the clipboard. Raycast sees the words “there I was” in the image and returns it as a match to my clipboard history search. Pretty cool.

Raycast will also let you designate folders to search for screenshots in, apart from images in clipboard history, and you can apply text recognition to those searches as well. Open Raycast Settings by toggling Raycast open and typing ⌘, (that’s Command comma), select Extensions, search for the Search Screenshots command, and then verify that Text Recognition is set to one of the accuracy level options rather than disabled.

Raycast Search Screenshots

All that Raycast functionality is great, but since I’m using ScreenFloat to take my screenshots, what if I don’t want to have to copy an image into the clipboard or save it in one of those files to find it by text recognition? ScreenFloat has search by text recognition covered too.

ScreenFloat saves screenshots you’ve taken in the Shots Browser. You can open the Shots Browser with your assigned keyboard shortcut (⇧⌘1 in my case), hit ⌘F to start a search, and type the word “float”. The results are anything that has the word “float” in the image name, or anything that has the text “float” in the image itself somewhere.

ScreenFloat Shots Browser Text Search

You can also see from the image above that ScreenFloat names the screenshots with the name of the app that was being screenshotted.1 You can rename images at any time in ScreenFloat, of course.

This is all pretty amazing and handy, and it sure makes finding things on my Mac a lot easier than before I used either of these tools. I love great software like these apps, and I love developing systems that mesh with my brain and that I can use and remember instinctively.

As Peter said when I showed him the ScreenFloat text recognition search, “I really, really, really wish Notion would add OCR.”

Me too, Peter. Me too.

Footnotes

  1. I genuinely have no clue if that’s a word.