| Subcribe via RSS

Perl Module Monday: HTTP Parsing Triple-Play

September 14th, 2009 Posted in CPAN, HTTP, Perl

For this week’s Module Monday, I’m going to break form a little bit and actually look at three modules. All of these address the same basic problem, which I wrote about yesterday: parsing HTTP messages.

Right after writing the previous post, I discovered (by means of my CPAN Twitter-bot) two other solutions to this problem, both using linked C/C++ code for speed. So let’s have a look at all of them:

  • HTTP::Parser is the first one I discovered, and the one I’ve stepped up to help maintain. It has a pretty straight-forward interface, but requires that the content be passed to it as strings (though it can handle incremental chunks). Unlike the code in HTTP::Daemon that I hope to eventually replace with this, it does not read directly from a socket or any other file-handle-like source. It uses integer return codes to signal when it is finished parsing a message, at which point you can retrieve a ready-to-use object that will be either a HTTP::Request or an HTTP::Response, depending on the message.
  • HTTP::Parser::XS is the one I discovered via the Twitter-bot, and is also the newest of the pack. Tatsuhiko Miyagawa took this and wrote a pure-Perl fallback, then integrated them into Plack (more on the overall Plack progress in this blog post). The interface is a little unusual, compared to the more minimal approach of the previous option, in that it stuffs most of the information into environment variables in accordance with the PSGI specification (though in this case it uses a hash-table which is passed by reference, rather than actual environment variables). Which is great for projects (like Plack) that are specifically built around PSGI, but may not be as great for more light-weight parsing needs. Also, being very new, the documentation is very spare. It also uses integer return-codes to signal progress, and the codes are very similar in nature to those used by HTTP::Parser (the meaning of -1 seems to differ).
  • HTTP::HeaderParser::XS is the third of the set, and the one I discovered most-recently, as a result of a reference to it in the POD docs of the previous module. This one is over a year old, but seems to have just the one release. It is based on a C++ state-machine, and also offers only sparse documentation.

So, as I move forward with making HTTP::Parser a more generally-useful piece of code, these are my competition and hopefully inspiration. I’d like to see the speed of XS code eventually, but would prefer to make PSGI support an option so that the code is useful in more contexts.

Suggestions always welcome!

Leave a Reply