Tooting toots with Tootly
A few years ago I got inspired by DF Tootbot and decided I'd like to have something of this sort for my own blog. I knew I'd need something along the lines of what he was using, simply because it's a rather universal approach built on an RSS feed. As for the technology stack, I knew I'd most likely want to tackle this task with something other than a bash script, and for storing data I was set early on to go with SQLite.
The basic idea is straightforward:
- Publish a new post to the blog
- Have a cronjob running "somewhere" to monitor the RSS feed
- Push the title and (shortened) URL of the new post to Twitter if it's new, or no-op if it was already pushed
- Profit
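The monitoring step from the list above can be sketched with nothing but the standard library. This is a simplified version, not the actual Tootly code; the function names are mine, and it assumes a standard RSS 2.0 layout with the newest item first:

```python
import urllib.request
import xml.etree.ElementTree as ET

def parse_latest_post(feed_xml):
    """Return (title, link, pubDate) of the newest item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    item = root.find("./channel/item")  # RSS 2.0 lists items newest-first
    if item is None:
        raise SystemExit("Feed couldn't be reached!")
    return (item.find("title").text,
            item.find("link").text,
            item.find("pubDate").text)

def fetch_latest_post(feed_url):
    """Download the feed and hand it over to the parser."""
    with urllib.request.urlopen(feed_url) as resp:
        return parse_latest_post(resp.read())
```

The cronjob then only has to call `fetch_latest_post` and compare the result with what's already in the database.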
The obvious benefit of using an RSS feed is that the underlying platform doesn't really matter. Whether I self-host my site with Jekyll or Hugo, or use something like Micro.blog or Ghost, I'll be able to use exactly the same script and will only need to ensure it's pointed at the correct feed.
Well, OK, maybe some slight adjustments to match the new platform would be necessary, but they should be rather tiny.
The obvious caveat of this approach is the delay between a post being published and the cronjob running. Additionally, one needs to keep track of the posts already published, be it in a flat file, a database, or what have you.
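That bookkeeping can be as small as one table. A sketch of what I mean, with SQLite via the standard library (the table and column names here are illustrative, not necessarily what Tootly uses):

```python
import sqlite3

def already_tooted(conn, link):
    """Return True if this post link was pushed before; record it otherwise."""
    conn.execute("CREATE TABLE IF NOT EXISTS posts (link TEXT PRIMARY KEY)")
    try:
        conn.execute("INSERT INTO posts (link) VALUES (?)", (link,))
        conn.commit()
        return False  # first time we see it, so go ahead and toot
    except sqlite3.IntegrityError:
        return True   # PRIMARY KEY violation means it was already published

conn = sqlite3.connect(":memory:")
print(already_tooted(conn, "https://example.com/post"))  # False on the first run
print(already_tooted(conn, "https://example.com/post"))  # True on the repeat
```

Letting the `PRIMARY KEY` constraint do the deduplication keeps the check and the insert atomic, so there's no race between "have I seen this?" and "remember it".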
The alternative? One I was entertaining is webhooks, at least when using Ghost. It's a very clean solution in the sense that no action is triggered until a new post is published. It has its own set of problems, though: there needs to be something listening for the webhook event, running constantly and reliably; if it isn't running constantly and reliably, the event might be missed, which would require some sort of retry logic.
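For completeness, the listening side could be as small as this stdlib sketch. The payload shape and port here are assumptions for illustration; Ghost's actual webhook payload is richer than this:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def extract_title(body):
    """Pull the post title out of a (simplified, assumed) webhook payload."""
    payload = json.loads(body or b"{}")
    return payload.get("post", {}).get("title")

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        title = extract_title(self.rfile.read(length))
        # This is where the tooting logic would be triggered.
        print("new post published:", title)
        self.send_response(200)
        self.end_headers()

def run(port=8080):
    """Start the listener; blocks forever."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Which nicely shows the catch: `run()` has to be up at the exact moment Ghost fires the event, or the post is never announced.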
The right solution depends on the importance of this entire endeavor, I suppose.
I used this tiny project (called Tweetly at the time) as a good way of building something in Python and figuring out ways of deploying it. I find Python projects particularly appealing to containerize, something I don't find appealing at all when it comes to, for example, Golang. After playing a bit with GitHub Actions, I had a pipeline ready that resulted in an image I'm able to use with all sorts of solutions (Docker, Podman, Kubernetes etc.).
It was fun to iterate a bit to keep up with some of the changes and adjustments. Here's one example:
```diff
     post_title = latest_post.find("title").text
     post_link = latest_post.find("link").text
     post_date = latest_post.find("pubDate").text
-    post_date = datetime.strptime(post_date[:-6], "%a, %d %b %Y %H:%M:%S").date()
+    post_date = datetime.strptime(post_date, "%a, %d %b %Y %H:%M:%S %Z").date()
     return post_title, post_date, post_link
 else:
     raise SystemExit("Feed couldn't be reached!")
```
It's a difference in the way the date was handled: in this particular example the switch to Ghost made the code simpler. No need for a hacky `[:-6]` when all I needed was `%Z` to handle timezones properly.
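To show the difference on a typical RFC 822 `pubDate` (the timestamps below are made up, and that the old feed used a numeric offset is my reading of the `[:-6]`):

```python
from datetime import datetime

# Old feed: numeric offset at the end, so strip the last 6 characters (" +0000").
old_style = "Mon, 01 Jan 2024 12:00:00 +0000"
d1 = datetime.strptime(old_style[:-6], "%a, %d %b %Y %H:%M:%S").date()

# Ghost feed: a timezone name at the end, which %Z parses directly.
new_style = "Mon, 01 Jan 2024 12:00:00 GMT"
d2 = datetime.strptime(new_style, "%a, %d %b %Y %H:%M:%S %Z").date()

print(d1, d2)  # both 2024-01-01
```

(For what it's worth, `email.utils.parsedate_to_datetime` handles both variants, but for a feed whose format you control, one `strptime` is plenty.)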
Later on, when Twitter announced their API changes, I started hacking in support for Mastodon. At that time I didn't know exactly how things would pan out, so I only ended up adding my Fosstodon account as just another endpoint to push new posts to. Thanks to the amazing Mastodon.py wrapper, it was a breeze.
Eventually Twitter became a shitshow and I moved on entirely to Mastodon, and so did my tiny Python project, this time called Tootly (as I dropped Twitter support completely). I find tooting a bit more flexible thanks to the higher character limit, so I decided to expand the script a bit to do two additional things:
- Include not only the title of the post, but also an excerpt limited to 200 characters
- Auto-tag the toot
The first point, I suppose, is rather straightforward. The second one requires some additional work: defining which exact words, case-insensitively, are going to be mapped to tags (of which only up to 8 will be published). Where? I'm using Pydantic's built-in BaseSettings to handle this task. These variables can be defined in code (as in the linked example), in the `.env` file, and/or as environment variables. The template is very simple:
```
ENV_NAME=""
FEED_URL=""
DB_URL=""
SHORTY_URL=""
MASTODON_TOKEN=""
MASTODON_URL=""
TAGS='[""]'
```
And here's a more real-life example:
```yaml
env:
  - name: ENV_NAME
    value: Kubernetes
  - name: FEED_URL
    value: https://chabik.com/rss/
  - name: DB_URL
    value: "sqlite:////database/tootly.db"
  - name: MASTODON_URL
    value: https://fosstodon.org
  - name: TAGS
    value: '["aix",
            "bash",
            "bastillebsd",
            "bcc",
            "bpfcc",
            [...]
            "tapestry",
            "ubuntu",
            "unix",
            "zfs",
            "zsh"]'
```
Yes, that Kubernetes in there is not accidental: I wrote a simple Helm chart and have been running Tootly as a CronJob on my NUC's Kubernetes ever since. But this is probably another story for some other time (:
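Back to the two additions for a moment. This is roughly how the excerpt and the `TAGS` mapping can work; the function names and the exact truncation behavior are mine, so treat it as a sketch rather than the actual Tootly code:

```python
import json

TAGS = '["bash", "unix", "zfs", "zsh"]'  # same JSON shape as the env variable

def make_excerpt(body, limit=200):
    """Trim the post body to the configured excerpt length."""
    return body[:limit]

def build_hashtags(text, tags_json, limit=8):
    """Map configured words found in the text to hashtags, case-insensitively."""
    words = set(text.lower().split())
    matched = [tag for tag in json.loads(tags_json) if tag.lower() in words]
    return ["#" + tag for tag in matched[:limit]]  # publish at most `limit` tags

print(build_hashtags("Fun with ZFS snapshots on UNIX", TAGS))
# -> ['#unix', '#zfs']
```

Splitting on whitespace keeps the matching crude but predictable: "zsh" in the config only fires when "zsh" appears as a standalone word in the toot text.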
Are there ready-to-go solutions for this kind of stuff? I bet there are, but where's the fun in using them instead of writing my own? I learned quite a few cool things one can do in Python and got to appreciate the language even more than before. It's one of those automations I can just leave be and trust that, even with my flaky connection at home, it's going to do its thing reliably. I can't ask for more than that.