Pseudo-Automating the Listened to Podcasts List on My /Now Page

Saturday, 03 Aug 2024

As you know, I have a /now page that I update on occasion to let anyone who cares know what kinds of things I’m watching, reading, and eating at some random point in my life. So far, it’s been a very manual update process because I haven’t had time to start automating any of it until now.

I’ve taken inspiration from Robb Knight’s video Using Eleventy to Gobble Up Everything I Do Online, particularly for the Overcast part of the automation process. I watched enough of the video to hear Robb mention that you can download an extended version of the Overcast OPML file, one that includes episode history, from your Overcast account. That was enough for me to decide to write a script that would download and parse it for me.

Enter overcast-history, my Python script. It checks when I last downloaded the OPML file, gets a new copy if needed, and parses it if a new copy was downloaded (or if I passed it the -f flag to force it to parse the local OPML file anyway).

You might be thinking “hold on here, Robb also wrote a Python script, don’t act like you’re inventing the wheel!”, and that’s a fair point. I actually thought he was manually downloading his OPML file until I finished the video today (after writing my own Python script). Now I realize he’s at a high level of automation on this task.

Another key difference between Robb’s approach and mine so far, besides the fact that our Python scripts are completely different¹, is that I believe he creates a JSON file with it and consumes that as part of his site build process to completely automatically update his listen history.

In contrast to Robb, I’m not very automated with my /now page yet. This Python script is part of a collection of tools for quickly automating certain aspects of updating my site, which I build locally and FTP to my server. I haven’t decided yet how much I want to automate the build process again.

Therefore, with the understanding that this is ONLY an example of how to grab and parse information off the internet, and with the understanding that my Python coding skills are shaky at best, here’s my approach to getting recently listened to podcast episodes from my Overcast history into a Markdown list.

overcast-history

You’ll see immediately that I’m a terrible Python programmer and that I have no idea what Python best practices are yet. I have 6 files to do this one simple task:

constants.py (purpose of which should be self-evident)
session.py (used to keep the overcast login active across modules)
main.py (entry point script that gets run directly to make it all happen)
oc_login.py (logs in to my Overcast account)
oc_history.py (handles downloading the extended OPML file from my Overcast account)
oc_opml_parse.py (parses the OPML file and gives me the recent list of podcast episodes I want)

# constants.py
ACCOUNT_URL = 'https://overcast.fm/account'
ACCOUNT_PATH = '/account'
LOGIN_URL = 'https://overcast.fm/login?then=account'
EMAIL = '[email protected]'
PASSWORD = 'xxxxxxxxxxxxxxxxxxxxxxxxxxx'
LOGIN_PATH = '/login'
OPML_AGE_LIMIT_DAYS = 2
OPML_LINK = 'https://overcast.fm/account/export_opml/extended'
SUCCESS = 200
TOO_MANY_REQUESTS = 429
OPML_FILE_PATH = 'overcast_history.opml'
NUMBER_OF_EPISODES = 10

Right away I’ve made you cry. Yes, I have my Overcast account password in my constants file. THIS WILL BE REMEDIED SOON! I plan to use keyring to fix this issue. Maybe. Probably.
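Here is a rough sketch of what the keyring fix might look like. It assumes the password was stored beforehand with keyring.set_password() under a service name I made up ("overcast.fm"), and it falls back to a hypothetical OVERCAST_PASSWORD environment variable when keyring isn't installed or has no usable backend. None of these names come from the actual script.

```python
import os

try:
    import keyring
    from keyring.errors import KeyringError
except ImportError:
    # keyring is a third-party package; fall back to the environment without it.
    keyring = None


def load_password(service, username, env_var="OVERCAST_PASSWORD"):
    """Return the stored password, or None if it can't be found anywhere.

    The service name, username, and env_var are all illustrative choices.
    """
    if keyring is not None:
        try:
            secret = keyring.get_password(service, username)
        except KeyringError:
            # No usable keyring backend (e.g. a headless machine).
            secret = None
        if secret:
            return secret
    # Fallback: read the password from an environment variable.
    return os.environ.get(env_var)
```

With something like this, constants.py would no longer need a PASSWORD entry; oc_login.py could call load_password('overcast.fm', const.EMAIL) right before posting the login form.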

# session.py
import requests

# One shared Session so the login cookies are reused by every module.
session = requests.Session()

This one creates a requests session object which can then be imported into any other modules that need to use requests to grab stuff. That’s it. There’s probably a way better way to do this that I should know about.

#!/Users/scott/Scripts/python/venv/bin/python
# main.py
import argparse
import os
from datetime import datetime, timedelta
import constants as const
from oc_history import load_oc_history
from oc_opml_parse import oc_opml_parse

p = argparse.ArgumentParser()
p.add_argument('-f', '--force', action='store_true', help='Force local OPML file parsing')

args = p.parse_args()

def file_is_old(file_path):
    # A missing file counts as stale so that it gets downloaded.
    if not os.path.exists(file_path):
        return True

    file_mod_date = os.path.getmtime(file_path)
    display_date = datetime.fromtimestamp(file_mod_date)
    print(f'OPML file created on {display_date.strftime("%Y-%m-%d")}')
    file_datetime = datetime.fromtimestamp(file_mod_date)
    print(f'file_datetime = {file_datetime}')
    stale_date = datetime.now() - timedelta(days=const.OPML_AGE_LIMIT_DAYS)
    print(f'stale_date = {stale_date}')

    return file_datetime < stale_date

def main():
    history_was_loaded = False
    if file_is_old(const.OPML_FILE_PATH):
        print(f'OPML file is older than {const.OPML_AGE_LIMIT_DAYS} days or doesn\'t exist. Downloading new data...')
        history_was_loaded = load_oc_history()
    else:
        print(f'OPML file is less than {const.OPML_AGE_LIMIT_DAYS} days old. Skipping download.')

    # Parse only after a fresh download, or when -f was passed.
    if history_was_loaded or args.force:
        print('Parsing OPML file...')
        if oc_opml_parse():
            print('Done!')
        else:
            print('You have to update your podcast list manually, dude.')
    else:
        print('No new Overcast history generated.')

if __name__ == "__main__":
    main()

I run main.py as the script entry point and it gets all the work going. It checks to see if the date of my copy of the OPML file is older than the value in the OPML_AGE_LIMIT_DAYS constant and redownloads it if so, using the load_oc_history() function from oc_history.py.

If a new OPML file was downloaded OR I ran main.py with the -f flag, then it parses the OPML file by running the oc_opml_parse() function in oc_opml_parse.py.
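The age check described above boils down to comparing the file's modification time against a cutoff. Here is a minimal standalone sketch of that idea using pathlib; the optional now parameter is my own addition (not in the script) so the function can be exercised without waiting two days.

```python
from datetime import datetime, timedelta
from pathlib import Path


def file_is_old(file_path, max_age_days, now=None):
    """Return True if the file is missing or older than max_age_days."""
    path = Path(file_path)
    if not path.exists():
        # Nothing on disk yet, so treat it as stale and trigger a download.
        return True
    now = now or datetime.now()
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    return now - modified > timedelta(days=max_age_days)


def should_parse(history_was_loaded, force):
    """Parse after a successful download, or whenever -f was passed."""
    return bool(history_was_loaded or force)
```

Note that with this logic a fresh file and no -f flag means nothing gets parsed at all, which matches the behaviour of main.py.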

# oc_login.py
import constants as const

from session import session

def oc_login():
    return oc_test_login()

def oc_enter_login():
    print('Attempting login')
    r = session.post(const.LOGIN_URL, data={'email': const.EMAIL, 'password': const.PASSWORD})
    print(f"Response {r.status_code}")
    if r.status_code == const.SUCCESS:
        print("Successfully logged in")
        return True
    else:
        print("Failed login attempt")
        return False

def oc_test_login():
    print('Testing login status')
    r = session.get(const.ACCOUNT_URL)

    # r.url is the final URL after redirects: /account means we're logged in,
    # /login means we got bounced to the login page.
    if const.ACCOUNT_PATH in r.url:
        print('Already logged in')
        return True
    elif const.LOGIN_PATH in r.url:
        print('Login required')
        if oc_enter_login():
            return True
    else:
        print(f"I have no idea what happened\n{r.url}")

    return False

Right now this doesn’t make sense, but if I actually store auth tokens somewhere later, maybe it will. Right now it always checks to see if I’m logged in or not by checking to see if I stayed on the /account page or got bounced back to the /login page. If I got bounced back, it logs me in.

The reason it doesn’t make sense is that I don’t persist any login tokens across script runs, so whenever I need to download an OPML file, it’s always going to have to log in to my Overcast account. I may just keep that workflow and simplify this script to skip the check entirely, and admit that it’s going to log in to the account every time.
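A simplified "just log in every time" version might look something like the sketch below. It takes the session as a parameter (my own change, to keep the example self-contained) and assumes that a successful login ends up on a URL containing /account, which is the same heuristic the existing check relies on. I haven't verified how Overcast responds to a bad password, so treat the success test as a guess.

```python
LOGIN_URL = 'https://overcast.fm/login?then=account'
ACCOUNT_PATH = '/account'
SUCCESS = 200


def oc_login(session, email, password):
    """Post the login form unconditionally and report whether it seemed to work.

    `session` is anything with a requests-style post() method.
    """
    r = session.post(LOGIN_URL, data={'email': email, 'password': password})
    # Assumption: a good login redirects to the account page, while a failed
    # one stays on the login page.
    return r.status_code == SUCCESS and ACCOUNT_PATH in r.url
```

Passing the session in also makes the function easy to try out with a stand-in object instead of hitting the real site.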

# oc_history.py
import os
import constants as const
from session import session
from oc_login import oc_login

def load_oc_history():
    if not oc_login():
        print("Couldn't log in to Overcast.fm account!")
        return False

    print("Loading history...")
    r = session.get(const.OPML_LINK)
    print(f"Response {r.status_code}")

    match r.status_code:
        case const.SUCCESS:
            print('OPML file downloaded')
            file_path = 'overcast_history.opml'
            try:
                with open(file_path, 'w', encoding='utf-8') as file:
                    file.write(r.text)
                print(f'OPML file saved to {os.path.abspath(file_path)}')
                return True
            except IOError as e:
                print(f'Error saving OPML file: {e}')
        case const.TOO_MANY_REQUESTS:
            print(r.headers)
            # Double quotes inside the f-string: reusing single quotes here
            # is a syntax error before Python 3.12.
            print(f'Too many requests - Retry-After = {r.headers.get("Retry-After")}')
        case _:
            print(f'Unexpected status code: {r.status_code}')

    return False

This is pretty simple. I download the OPML file and it either downloads ok or it doesn’t. It’s funny that I have the file name hardcoded here but I use constants for everything else. I’ll have to fix that.
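Fixing the hardcoded file name could be as small as pulling the save step into its own function that defaults to the constant. This is only a sketch; save_opml is a name I made up, and the constant is repeated here just so the example stands on its own.

```python
import os

# Same value as OPML_FILE_PATH in constants.py, repeated to keep this runnable.
OPML_FILE_PATH = 'overcast_history.opml'


def save_opml(text, file_path=OPML_FILE_PATH):
    """Write the downloaded OPML text to disk; return True on success."""
    try:
        with open(file_path, 'w', encoding='utf-8') as file:
            file.write(text)
    except OSError as e:
        # IOError is an alias of OSError in Python 3.
        print(f'Error saving OPML file: {e}')
        return False
    print(f'OPML file saved to {os.path.abspath(file_path)}')
    return True
```

The success branch of load_oc_history() could then shrink to a single return save_opml(r.text) line.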

# oc_opml_parse.py
import pyperclip
import xml.etree.ElementTree as ET
import constants as const
from datetime import datetime, timezone

def find_podcast_name(root, episode_id):
    # Walk every podcast until we find the one that owns this episode.
    for podcast in root.findall(".//outline[@type='rss']"):
        for ep in podcast.findall("outline[@type='podcast-episode']"):
            if ep.get('overcastId') == episode_id:
                return podcast.get('text')
    return "Unknown"

def oc_opml_parse():
    # Read the file once, inside the try, so a missing file is actually caught.
    # The encoding matches the one used when the file was written.
    try:
        with open(const.OPML_FILE_PATH, 'r', encoding='utf-8') as f:
            content = f.read()
    except FileNotFoundError:
        print(f"File not found: {const.OPML_FILE_PATH}")
        return None

    root = ET.fromstring(content)

    # Find all podcast episode entries
    episodes = root.findall(".//outline[@type='podcast-episode']")

    current_date = datetime.now(timezone.utc)

    # Filter episodes with played="1" that were updated recently
    # played_episodes = [ep for ep in episodes if ep.get('played') == '1']
    played_episodes = [
        ep for ep in episodes
        if ep.get('played') == '1' and
        (current_date - datetime.strptime(ep.get('userUpdatedDate'), "%Y-%m-%dT%H:%M:%S%z")).days <= (const.OPML_AGE_LIMIT_DAYS + 1)
    ]

    # Sort episodes by userUpdatedDate, most recent first
    played_episodes.sort(key=lambda ep: datetime.strptime(ep.get('userUpdatedDate'), "%Y-%m-%dT%H:%M:%S%z"), reverse=True)

    # Get the most recent episodes
    top_episodes = played_episodes[:const.NUMBER_OF_EPISODES]

    # Build the Markdown list
    episodes_list = ""
    for ep in top_episodes:
        episodes_list += f"- [{find_podcast_name(root, ep.get('overcastId'))} – {ep.get('title')}]({ep.get('overcastUrl')})\n"

        # print(f"Title: {ep.get('title')}")
        # print(f"Updated Date: {ep.get('userUpdatedDate')}")
        # print(f"URL: {ep.get('url')}")
        # print(f"Overcast URL: {ep.get('overcastUrl')}")
        # podcast_name = find_podcast_name(root, ep.get('overcastId'))
        # print(f"Podcast: {podcast_name}")
        # print("---")

    print(episodes_list)
    pyperclip.copy(episodes_list)

    return True

This is the longest one, and probably the one where my meager Pythoning should embarrass me the most. It parses the OPML file as XML and grabs information about any played podcast episodes newer than a certain date (hint: the value of OPML_AGE_LIMIT_DAYS plus 1 day), then sorts them by the userUpdatedDate value from each episode’s data. After that, it’s just creating a Markdown list of the episodes that match the date and listened-to criteria, and copying that list to the clipboard using pyperclip.
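For anyone who wants to poke at the filter, sort, and format steps without an Overcast account, here is a small self-contained sketch of the same idea. The sample OPML is made up; only the attribute names (type, played, userUpdatedDate, title, overcastUrl, text) are taken from the script. It differs in two ways: it walks each podcast once so the show name is already at hand, and it compares against a plain cutoff timestamp instead of using .days, so the boundary falls in a slightly different place.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

# Made-up sample data, shaped like the attributes the script reads.
SAMPLE_OPML = """<opml version="1.0"><body>
  <outline type="rss" text="Show A">
    <outline type="podcast-episode" overcastId="1" title="Old one" played="1"
             userUpdatedDate="2024-07-01T10:00:00-04:00" overcastUrl="https://overcast.fm/+a1"/>
    <outline type="podcast-episode" overcastId="2" title="Recent one" played="1"
             userUpdatedDate="2024-08-02T10:00:00-04:00" overcastUrl="https://overcast.fm/+a2"/>
  </outline>
  <outline type="rss" text="Show B">
    <outline type="podcast-episode" overcastId="3" title="Newest one" played="1"
             userUpdatedDate="2024-08-03T08:00:00-04:00" overcastUrl="https://overcast.fm/+b3"/>
    <outline type="podcast-episode" overcastId="4" title="Unplayed"
             userUpdatedDate="2024-08-03T09:00:00-04:00" overcastUrl="https://overcast.fm/+b4"/>
  </outline>
</body></opml>"""


def recent_played_markdown(opml_text, now, max_age_days=3, limit=10):
    """Return a Markdown list of recently played episodes, newest first."""
    root = ET.fromstring(opml_text)
    cutoff = now - timedelta(days=max_age_days)
    rows = []
    for podcast in root.iter("outline"):
        if podcast.get("type") != "rss":
            continue
        # Episodes are direct children of their podcast's outline element.
        for ep in podcast.findall("outline[@type='podcast-episode']"):
            if ep.get("played") != "1":
                continue
            updated = datetime.strptime(ep.get("userUpdatedDate"), DATE_FORMAT)
            if updated >= cutoff:
                rows.append((updated, podcast.get("text"), ep.get("title"), ep.get("overcastUrl")))
    rows.sort(key=lambda row: row[0], reverse=True)
    return "".join(f"- [{name} – {title}]({url})\n" for _, name, title, url in rows[:limit])
```

Calling recent_played_markdown(SAMPLE_OPML, datetime(2024, 8, 3, 18, 0, tzinfo=timezone.utc)) should list "Newest one" first and "Recent one" second, skipping both the month-old episode and the unplayed one.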

I have a Raycast Script Command I can run this from, but obviously in the future it would be better to integrate it more into the site build process itself.

I assume you’re a Python genius compared to me, so please let me know if you have any improvement suggestions beyond the ones I’ve already mentioned.

Footnotes

¹ I haven’t looked at his yet, but I assume they are different, since I assume he’s a much better Python programmer than I am!