reading perlpodspec

I’ve begun work on my Pod-munging grant. My first real task has been reviewing perlpodspec, which is the most useful document one can read about writing a Pod parser. It’s a bit unstructured, but for the most part it’s clear, unambiguous, and useful.

One of the things I’m doing to make my life easy is to avoid considering formatting codes at all. Formatting codes are those things in angle brackets in Pod, like I<< use the C<force> method >>. Since I’m only concerned about producing event streams and trees of Pod paragraphs, I can ignore formatting codes entirely without losing much. It would be possible, later, to build something atop this code to handle formatting codes, but I sure don’t plan on doing it!

The only tiny edge case I’ve wondered about is:

This is a B<

very

> big idea!

The spec does not address this edge case directly, but I’m pretty sure I can ignore it as garbage. (I’m working on a list of feedback on the spec, and this point will be in it.)
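To make the edge case concrete, here is a minimal sketch in Python (not the Pod::Eventual code) of what a paragraph-level parser sees. The function names are hypothetical, and the bracket check is deliberately naive: it ignores doubled delimiters like B<< ... >>.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs on runs of one or more blank lines."""
    return [p for p in re.split(r"\n(?:[ \t]*\n)+", text.strip("\n")) if p]

def bracket_balance(paragraph):
    """Naive count of formatting-code openers (like 'B<') minus '>' closers."""
    opens = len(re.findall(r"[A-Z]<", paragraph))
    closes = paragraph.count(">")
    return opens - closes

sample = "This is a B<\n\nvery\n\n> big idea!\n"
paragraphs = split_paragraphs(sample)

for p in paragraphs:
    # A nonzero balance means the code does not open and close in one paragraph.
    print(repr(p), bracket_balance(p))
```

The blank lines cut the text into three paragraphs, so the B< opened in the first can never be closed within it. A paragraph-level parser never has to notice, which is why ignoring formatting codes costs so little here.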

So, to see how much of a difference it makes to exclude formatting codes from my plans, let’s look at the spec. I printed out a copy this morning, and it was 24 pages. They break down roughly like this:

overview of paragraph syntax    |   2 pages
overview of defined =commands   |   2 pages
overview of formatting codes    |   3 pages
misc. notes on implementing     |   5 pages (about 2 on formatting codes)
specific quirks of the L<> code |   4 pages
notes on =over, =back, =item    |   3 pages
notes on =begin, =end, =for     |   5 pages

So, by totally ignoring formatting codes, I can ignore about nine pages of the spec. That’s more than a third of it. What’s better, those pages contain some of the most complicated parts of the spec. Dealing with things like L<> and S<> is not fun, and for the most part it’s only important to formatters. I’m not looking to implement formatting tools, so I can punt and let existing formatters do the job they’ve been designed to do.
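The arithmetic behind "about nine pages" can be checked with a quick tally. This is only a sketch: the page counts are the rough ones from the breakdown above, and the split of each section into formatting-code pages is my reading of it.

```python
# (total pages, pages about formatting codes) per section, from the rough breakdown
sections = {
    "paragraph syntax":     (2, 0),
    "defined =commands":    (2, 0),
    "formatting codes":     (3, 3),
    "implementation notes": (5, 2),
    "L<> quirks":           (4, 4),
    "=over, =back, =item":  (3, 0),
    "=begin, =end, =for":   (5, 0),
}

total_pages = sum(total for total, _ in sections.values())
formatting_pages = sum(fc for _, fc in sections.values())
share = formatting_pages / total_pages

print(total_pages, formatting_pages, f"{share:.1%}")
```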

The place where Pod::Elemental had previously gotten tied up was in the interaction of the Nester and Document classes. I believe that my closer reading of perlpodspec today has really helped me form a good plan on how to deal with this.

The things I’d been thinking of as “subdocuments” in Pod are better imagined as “regions,” which is a term that the spec itself uses. Regions can be isolated immediately after event production and then dealt with as atomic (or not) units within the pre-nested element collection – the Document. I’ll need to put this into practice, but I think it will work well.

So far, the only case that I’m finding somewhat annoying in my plans is this one:

=begin data

=head1 Not a header.

=end data

Since data doesn’t start with a colon, the contents of the region are not Pod. Because of that, =head1 can’t represent a header. Instead of being a data paragraph beginning with =head1, however, the paragraph is invalid Pod. This means that you can’t have Pod-like content inside a non-Pod region. I suppose this is an attempt to prevent errors like:

=begin y

=begin x

=end y

=end x

=end y

…but I can’t say I’m a big fan. Still, it’s a pretty unlikely use, so I think I can live with it.
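The rule as read above can be sketched as a small checker over a list of paragraphs. This is a Python illustration under assumptions, not Pod::Elemental's design: check_regions is a hypothetical name, a target counts as Pod only if it starts with a colon, and a mismatched =end is reported without closing anything.

```python
def check_regions(paragraphs):
    """Return (index, message) pairs for problems in =begin/=end nesting."""
    problems = []
    stack = []  # targets of the currently open =begin regions
    for i, para in enumerate(paragraphs):
        if not para.startswith("="):
            continue  # not a command paragraph
        words = para.split()
        command = words[0][1:]
        target = words[1] if len(words) > 1 else ""
        if command == "begin":
            stack.append(target)
        elif command == "end":
            if not stack:
                problems.append((i, "=end without =begin"))
            elif stack[-1] != target:
                problems.append((i, f"=end {target} does not match =begin {stack[-1]}"))
            else:
                stack.pop()
        elif stack and not stack[-1].startswith(":"):
            # Assumption from the reading above: commands are invalid in non-Pod regions.
            problems.append((i, f"={command} inside non-Pod region {stack[-1]}"))
    for target in stack:
        problems.append((len(paragraphs), f"unclosed =begin {target}"))
    return problems

data_region = ["=begin data", "=head1 Not a header.", "=end data"]
pod_region = ["=begin :data", "=head1 A real header.", "=end :data"]
tangled = ["=begin y", "=begin x", "=end y", "=end x", "=end y"]
```

With these inputs, the data region flags its =head1 paragraph, the :data region passes, and the tangled example is flagged at the first =end y, because =begin x is always treated as opening a nested region.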

My next step is probably going to be the creation of a “blank line” event for Pod::Eventual. I’m hoping to handle region checking in Eventual, if I can help it, so I’ll need to have blank lines on hand to reconstitute data paragraphs that were broken during a first-pass parse. If that works, Pod::Eventual can remain entirely event-based, with no concept of nesting.
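Here is a minimal sketch of why blank-line events would help, written in Python rather than against the real Pod::Eventual interface. The event shapes and function names are hypothetical. The point is only that if blank runs are emitted as events instead of being discarded, the original text can be rebuilt exactly from a flat event stream.

```python
def events(lines):
    """Yield ('text', str) for runs of non-blank lines and ('blank', str) for blank runs."""
    kind, buf = None, []
    for line in lines:
        this = "blank" if line.strip() == "" else "text"
        if kind is not None and this != kind:
            yield (kind, "".join(buf))
            buf = []
        kind = this
        buf.append(line)
    if buf:
        yield (kind, "".join(buf))

def reconstitute(evts):
    """Rebuild the original text from events, blank runs included."""
    return "".join(content for _, content in evts)

source = "line one\nline two\n\n\nline three\n"
evs = list(events(source.splitlines(keepends=True)))
rebuilt = reconstitute(evs)
```

Nothing here tracks nesting: a later pass that recognizes a non-Pod region could call something like reconstitute on the events between =begin and =end to recover the data paragraph as written.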

Written on May 23, 2009