Hannibal, Part 1: Software
A couple of weeks ago, I saw that someone had left a Chromebook in my building’s printer room. It was labeled “Free; working.” So, of course, I took it. What about my life couldn’t be improved by a dodgy laptop with an eBay value of about $50?
As it turns out, the answer is “Not a whole lot.” It took me a good few hours to decide what to actually do with it. But I had the machine, and I knew there had to be something, and eventually I thought that I could make it into a pretty decent picture frame. The plan was to have it connect to the internet and search through repositories of the great paintings of history, and then to give it voice-recognition capabilities. I wanted to be able to sit across the room and shout “Hannibal, show me impressionist art” and have it show me some.
Oh, and I named it Hannibal, after the leader of the A-Team. The A-Team spent most of their time on the run for a crime they didn’t commit. In other words, they were framed.
This post is going to be about the software, which is most of what I’ve managed to do so far. Once I get the hardware rearranged into something shaped less like a laptop and more like a painting, I’ll do a post on that too. But there wasn’t much point in spreading the guts of this thing all over my coffee table before I had at least figured out what kind of operating system I could run on it.
Getting Linux Running on a Chromebook
Yes, yes, I know that ChromeOS is a Linux derivative. The operative word here is “derivative.” I struggle to even call it a distro, given how different it is from the average Linux experience. It’s very locked-down, which is great for its intended use, but awful if you want to completely take apart a device and remake it into something entirely different. I needed more control.
I also needed something that would run on the incredibly weak processor that the Chromebook came with, and fit into its whopping 16GB hard drive with enough room to spare for a serious database of paintings. At this point, to my mind, it pretty obviously had to be Arch. Maybe I could have looked around some more, but Arch was the lightweight system I was most familiar with, and I’d had success resurrecting old laptops with it before. I felt optimistic about things, and I felt even better when I managed to bypass the Chromebook’s BIOS security and boot to my Arch USB stick. Things were going great… and then I had to run pacstrap.
For those who don’t know, pacstrap is the command that does the most to actually install Arch onto a new system. Its job is to download crucial packages from the Arch repositories and unpack them onto the hard drive. Normally this works great, but it does require an internet connection. Luckily, the default Arch ISO is supposed to be able to set this up out of the box on most systems.
It turns out that a Chromebook that I picked up for free in the printer room of a stunningly ugly building in Berkeley is not “most systems.” I knew for a fact that it had a WiFi chip, but Arch couldn’t see it at all. It did recognize something that called itself an ethernet adapter; when I poked around a bit more, I figured out that it was the SIM card slot and its attached cellular radio. Apparently it’s easier just to expose that as ethernet over internal USB.
I suppose I could have slotted a SIM card, but that would (1) cost me money and (2) seem like some kind of a defeat. Instead, I realized that, while the Arch ISO didn’t have the drivers for built-in WiFi, it clearly did have drivers for ethernet-over-USB. And the Chromebook had plenty of USB ports. So I emailed the front-desk administrator at my office, who emailed the head of IT, who dug up an actual USB-to-ethernet dongle, and I was off to the races.
(In retrospect, I could have also made a custom Arch install ISO with archiso, which is software that I have a fair amount of experience with. For whatever reason, it didn’t cross my mind until I wrote this. The ethernet dongle was easier anyway.)
The rest was a textbook Arch install. There was even a wiki page describing the quirks to watch out for on Chromebooks specifically. It was as easy as these things ever got, which is to say, it was eminently achievable but it still made me feel like a genius when I figured it out. The BIOS screens looked ugly, but there wasn’t much I could do about that. The important thing was that it ran, and I could start thinking about how to talk to it.
Arch’s Voice-Assistant Scene
I briefly toyed with the idea of writing my own voice assistant and just farming out the speech recognition to a pretrained neural net, but I didn’t have to bother. There turned out to be a whole world of voice-assistant software for Linux in general and Arch in particular. Many of them can be traced back to forks of Blather, which is now thoroughly deprecated, hasn’t merged a pull request in seven years, and is a hassle to run at the best of times. Its core ideas, however, are pretty solid, and the software I wound up going with, Kaylee (https://git.clarahobbs.com/clara/kaylee), is basically Blather but easier to set up. You write a list of the sentences you want to recognize, you associate each one with a shell command, and you let it run.
This was the first place where I had to manage my expectations. In a perfect world, Kaylee would have recognized everything said by anyone and just mined it for sentences that could have been construed as requests for a certain genre of art. This turns out to be both computationally expensive and potentially inaccurate. To achieve really snappy, accurate transcription on a single laptop processor, you want to scope your recognizer down to just a few words. Carnegie Mellon has the lmtool, which does exactly this: you give it a list of commands, and it gives you back a language model that can recognize every word in those commands and nothing more. Kaylee uses this automatically, and the accuracy is excellent.
The problem, of course, is that it can’t recognize anything that it hasn’t explicitly seen before, which means that niche artists are more or less out. Eventually I just got a list of the hundred most famous painters and programmed them in one by one. Kaylee doesn’t have a system for making choices from a list, so each one had to be an entirely different command. I wound up writing a script to manage them all and generate the necessary config files. With a little bit of manual editing, the output was good enough to use.
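For a sense of what that script does, here’s a minimal sketch. The trigger phrasing, the file names, and the Blather-style “sentence: command” output are all stand-ins of mine; Kaylee’s real config syntax may differ from version to version.

```python
#!/usr/bin/env python3
# Sketch of the config generator: one corpus file for the lmtool, one
# sentence-to-command mapping for the voice assistant. The output format
# here is Blather-style ("sentence: shell command") and may not match
# what Kaylee actually expects.

PAINTERS = ["pablo picasso", "claude monet", "frida kahlo"]  # ...and ninety-seven more

with open("corpus.txt", "w") as corpus, open("commands.conf", "w") as commands:
    for painter in PAINTERS:
        sentence = f"hannibal show me {painter}"
        # The lmtool takes a plain list of sentences and builds a tiny
        # language model covering exactly those words.
        corpus.write(sentence.upper() + "\n")
        # Each sentence maps to one shell command; search.py is a
        # hypothetical stand-in for the display script.
        commands.write(f"{sentence}: python search.py \"{painter}\"\n")
```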
(I was very interested to note how comparatively ancient all this software was. The lmtool’s most recent version is thirteen years old, and Kaylee’s development stopped only a year after Blather’s. My guess is that the world of speech recognition has, like most machine learning, moved into the realm of huge datacenters with nice APIs. There’s very little point in training anything more complex than what the lmtool offers when you can just pay fractions of a cent to get Google to do live transcription for you. Of course, I didn’t want to do that, and Kaylee served its purpose very well. That’s another potential reason why there hasn’t been much recent development here: the problem is solved. Kaylee is stable, effective, and easy to use. It even comes with a GUI, although I’ve never tested that. I thought again about writing my own, just so that there would be a more modern alternative out there, but what would be the point?)
The Artplication Programming Interface
With Arch installed, running, and hooked up to the internet, the next step was to find a nice API for art. Lucky for me, the Met had one! It offered instant access to thousands of paintings from across their collection, tagged with title, artist, and all the metadata I could possibly want. Response times were fast and I didn’t even have to sign up for an API key. The only problem was that, when I asked it for Picassos, it spat out two fifteenth-century Italian paintings, one Cézanne, and one error.
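For reference, the kind of query that produced that mess looks roughly like this. The endpoint paths are from the Met’s public API as I remember it, so double-check them before building on this:

```python
import requests

# The Met's public collection API needs no key: search for object IDs,
# then fetch each object's metadata individually.
BASE = "https://collectionapi.metmuseum.org/public/collection/v1"

result = requests.get(f"{BASE}/search", params={"q": "picasso", "hasImages": "true"}).json()
for object_id in (result.get("objectIDs") or [])[:4]:
    obj = requests.get(f"{BASE}/objects/{object_id}").json()
    print(obj.get("title"), "by", obj.get("artistDisplayName"))
```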
I knew for a fact that the Met had Picassos, so I eventually wrote them an email. It said “Please let me know what I’m doing wrong,” which is dev-speak for “I hate you and I hate the thing you’ve built.” Which was true. Apparently, paintings that still had some copyright attached to them weren’t included in the API. I’m still not entirely sure why this is. The Met does have the rights to distribute them online: you can find them in their search engine, and right-click to download the image. If I’d really wanted to, I could just have used a headless browser and built my own “API” out of that. That, however, would have run the risk of the software breaking a few years down the line, as well as being possibly too heavy for a Chromebook that was already going to be running speech-recognition software.
My friend Jake, who’s been tangentially featured here before, suggested that I try the National Gallery of Art instead. These guys are terrific. Rather than exposing an API, they make the gigachad move of publishing their 2+GB database as a series of CSVs in a fully public-domain GitHub repo. They get around copyright by not including the images, although they do include links telling you exactly where to go to download them. The links even let you specify the size you want, and the server automatically resizes them for you. (This isn’t documented anywhere, as far as I can tell, but it’s not hard to figure out.)
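For the curious: the links look like IIIF Image API base URLs, so, assuming that’s what they are, asking for a particular size is just a matter of filling in the standard path segments. The base URL below is a placeholder; the real ones come out of the CSVs.

```python
# "!w,h" asks the server to fit the image within w x h while preserving
# the aspect ratio (standard IIIF Image API syntax).
base = "https://api.nga.gov/iiif/00000000-0000-0000-0000-000000000000"  # placeholder ID
url = f"{base}/full/!1366,768/0/default.jpg"
```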
So my final art API works as follows. I have a local copy of the repo sitting on Hannibal’s 16GB SSD. When I start up the program, it first does a git pull in that repo. Then it reads the CSVs into memory using Pandas. This might seem like glorious overkill, and it is, but it lets me be very efficient about the necessary filtering and sorting: I have to filter to only paintings, and then only paintings that have an associated picture (which lives in a different file), and then convert everything to lowercase to avoid being messed up by weird capitalization. There’s also a list of all the exhibitions that every painting has appeared in, so I have to tag each painting with all those words so I can search those too. The final dataframe comes out to be about 22MB currently, so there’s plenty of room to scale as the collection grows. It fits comfortably in the Chromebook’s memory. Searching for paintings is done entirely locally, and returns a list of URLs. Then, I just download the relevant one whenever I need it, and I’m away. I even added a bleep sound to acknowledge new art requests.
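Condensed a lot (and with the exhibition tagging left out), the startup-and-search path looks something like the sketch below. The file layout and column names are my best reading of the repo, so treat them as assumptions:

```python
import subprocess
import pandas as pd

DATA = "/home/hannibal/opendata/data"  # local clone of the NGA repo; path is illustrative

# Refresh the local copy of the collection on startup.
subprocess.run(["git", "-C", DATA, "pull"], check=False)

objects = pd.read_csv(f"{DATA}/objects.csv", low_memory=False)
images = pd.read_csv(f"{DATA}/published_images.csv")

# Only paintings, and only ones with a published image (the image links
# live in a separate file, keyed by object ID).
paintings = objects[objects["classification"] == "Painting"].merge(
    images, left_on="objectid", right_on="depictstmsobjectid"
)
# Lowercase the searchable text so capitalization can't bite us.
paintings["attribution"] = paintings["attribution"].str.lower()

def search(term: str) -> list[str]:
    """Return image URLs for paintings whose attribution mentions the term."""
    hits = paintings[paintings["attribution"].str.contains(term.lower(), na=False)]
    return hits["iiifurl"].tolist()
```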
Putting Everything Together
Like all the best Unix projects, Hannibal is made out of lots of separate pieces that each do one small thing. There’s Arch, which runs everything; there’s X and i3, which take care of starting the programs and managing the single fullscreen Tk window that actually displays the art; and there are Kaylee and my script, which are supposed to work tightly together to turn voice input into image output.
My first thought was to do the most Unix thing possible and have my script take commands from standard input. Then, I could have Kaylee write its recognized artist names to standard output and pipe the one to the other. This ran into two problems:
- Kaylee starts a new shell to execute each command, so it doesn’t play nice with pipes.
- Reading asynchronously from standard input in Python sucks so bad.
I found some code snippets and even libraries online, but combining regular asynchronous programming with tkinter asynchronous programming was nasty. I turned instead to the Tom Scott method. My Kaylee instance was set up to write artist names to a file called searchkey. Ten times a second, the image script checks that file. If it’s been modified since the last check, it reads it and takes in the command.
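Here’s a minimal sketch of that loop, with the file path and the display hook as placeholders:

```python
import os
import tkinter as tk

SEARCHKEY = "/home/hannibal/searchkey"  # illustrative path

def show_art(artist: str) -> None:
    # Stand-in for the real lookup-and-display logic.
    print("would fetch and display:", artist)

root = tk.Tk()
last_mtime = 0.0

def poll() -> None:
    """Check searchkey for changes; Kaylee rewrites it on each recognition."""
    global last_mtime
    try:
        mtime = os.path.getmtime(SEARCHKEY)
    except FileNotFoundError:
        mtime = last_mtime  # no requests yet
    if mtime > last_mtime:
        last_mtime = mtime
        with open(SEARCHKEY) as f:
            show_art(f.read().strip())
    root.after(100, poll)  # reschedule: ten checks a second, no threads needed

poll()
root.mainloop()
```

The nice part of doing it with after() is that everything stays on tkinter’s event loop, so the display code never has to think about threads at all.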
I know that this is horrible code, and I’d never write something like it for deployment on a wide scale. The saving grace here is that it’s only ever supposed to run on one specific system. Hannibal instances aren’t deployed, they’re built from the ground up, and I can get away with a lot that I would never try if I expected anyone else to have to work with what I made.
Next Steps
To answer the obvious question: it does work! I can ask it for Picassos and get Picassos. (Interestingly enough, the NGA appears to focus on his more realistic artwork.) I can ask it for Frida Kahlo and get an apologetic bleep.
The next step is to build it into an actual picture frame. I’ve managed to take it apart down to its most important components, and I’ve got quite a lot to say about just that already, but this isn’t going to be a one-afternoon piece of work. I’m waiting for parts from China, not to mention access to some of the more high-powered equipment I’d like to use, so the project is on hold for the next few weeks. With luck, I’ll be able to build some of the hardware after Thanksgiving.