toward the promised land of playlists, through the valley of plists

I am trying to produce a SAX-based parser for Apple Property List documents. It’s been a frustrating, educational experience. BDFOY has a module, Mac::PropertyList, but it’s not what I want. First of all, it can’t just produce a simple Perl data structure from a deep plist. That’s fine: I could just patch it to do so, and it would only take a few lines. The bigger concern is that it does its parsing with regular expressions, and takes more than twelve hours to turn my iTunes Music Library file into an object. I’m not sure how much more than twelve hours, because I gave up at that point; I didn’t want my CPU pegged while I was on the bus back to work.

Maybe XML::SAX will not do any better. I’ll probably write a do-nothing SAX handler to see how long it takes just to dispatch all the events to no-op subs. I really hope it’s less than half a day, since the hard work should be getting done by expat.
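A rough sketch of that timing experiment follows. It uses Python's standard `xml.sax` module as a stand-in for Perl's XML::SAX, which is an assumption made only so the example is self-contained. The names `NoOpHandler`, `time_dispatch`, and `SAMPLE` are hypothetical, and the handler counts events rather than doing strictly nothing, so that there is something to report.

```python
import time
import xml.sax


class NoOpHandler(xml.sax.ContentHandler):
    """Receives every SAX event and does nothing with it except count it."""

    def __init__(self):
        super().__init__()
        self.events = 0

    def startElement(self, name, attrs):
        self.events += 1

    def endElement(self, name):
        self.events += 1

    def characters(self, content):
        # The parser may split one text node across several calls.
        self.events += 1


def time_dispatch(xml_bytes):
    """Parse xml_bytes and return (event count, elapsed seconds)."""
    handler = NoOpHandler()
    start = time.perf_counter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events, time.perf_counter() - start


# A tiny stand-in document; a real library file would be read from disk.
SAMPLE = b"<plist><dict><key>Name</key><string>Airbag</string></dict></plist>"
events, seconds = time_dispatch(SAMPLE)
print(f"{events} events in {seconds:.6f}s")
```

Run against the real library file, the elapsed time gives a floor for any SAX-based parser: whatever the handlers do on top can only add to it.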

Anyway, this has been my first experience with SAX, and, really, with much XML work at all. I know a fair bit about XML, but I’ve rarely handled it with anything other than a specialized module. (I have, for example, used XML::RSS a lot, but that mostly shields me from the XML.) SAX seems like a really cool system. I’m not entirely clear on how to do what I want, yet. My first pass at a solution nearly did what it needed to, but was completely hideous. I think my biggest problem is that I want to change handlers and perform a recursive descent. That is, I want each element to do something like say: “now that I’m handling a dict element, I want to use the dict parser until I see the end_element event, and then I will invoke some callback with the value generated by the dict parser.”
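One way to get a similar effect without actually swapping handlers is to keep a stack of open containers inside a single handler. Each `dict` or `array` start pushes a frame, and the matching end event pops it and hands the finished value to its parent. The sketch below again uses Python's `xml.sax` rather than XML::SAX, and `PlistHandler` and `parse_plist` are hypothetical names. It is a simplification of the recursive-descent idea, not an implementation of it, and it leaves `date` and `data` values as raw text.

```python
import xml.sax


class PlistHandler(xml.sax.ContentHandler):
    """Builds nested dicts and lists from plist XML using a stack of frames."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one [container, pending_key] frame per open dict/array
        self.text = []      # character data collected for the current element
        self.result = None  # the finished top-level value

    def startElement(self, name, attrs):
        self.text = []
        if name == "dict":
            self.stack.append([{}, None])
        elif name == "array":
            self.stack.append([[], None])

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        text = "".join(self.text)
        self.text = []
        if name == "plist":
            return
        if name == "key":
            self.stack[-1][1] = text  # remember the key until its value arrives
            return
        if name in ("dict", "array"):
            value = self.stack.pop()[0]
        elif name == "integer":
            value = int(text)
        elif name == "real":
            value = float(text)
        elif name in ("true", "false"):
            value = name == "true"
        else:
            value = text  # string, plus date and data left unconverted
        self._deliver(value)

    def _deliver(self, value):
        """The 'callback': give a finished value to the enclosing container."""
        if not self.stack:
            self.result = value
            return
        frame = self.stack[-1]
        if isinstance(frame[0], dict):
            frame[0][frame[1]] = value
            frame[1] = None
        else:
            frame[0].append(value)


def parse_plist(xml_bytes):
    handler = PlistHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.result


SAMPLE = b"""<plist version="1.0">
<dict>
  <key>Name</key><string>Airbag</string>
  <key>Track Number</key><integer>1</integer>
  <key>Tags</key><array><string>rock</string><true/></array>
</dict>
</plist>"""

print(parse_plist(SAMPLE))
```

The stack plays the role of the call stack in a recursive descent: the frame on top is "the dict parser" currently in charge, and popping it on the end event is the point where the callback with the generated value would fire.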

I think this is possible – and not a wildly misinformed plan – but I’m not sure how to do it based on the XML::SAX documentation alone. I think those docs assume that I’m familiar with SAX itself, which I’m not.

John R. suggested that I should just write some sort of XS interface to Apple’s plist parser, but I’m pretty sure that my C skills are nowhere near the required level. I guess that could be a reason to brush up on my C!

In the end, this really wouldn’t be so ridiculous, if Apple had just done something more reasonable for plists. Why aren’t they traditional Lisp-form-like plists? If they wanted to translate them to XML, why didn’t they translate them to sensible XML? It’s weird to imagine that the programmers in charge just failed to understand XML, but I guess I shouldn’t be surprised, anymore, at bad XML applications.

I just want my iPod to suggest that I should listen to all of “OK Computer” once in a while. Is that so wrong?

Written on January 17, 2006
🏷 parsing
🐫 perl
🧑🏽‍💻 programming
🏷 xml