
Parsing HTTP Headers

September 13th, 2009 Posted in CPAN, GitHub, HTTP, Perl

So, I’ve volunteered to co-maintain the HTTP::Parser CPAN module. I did this because I’ve been looking for something I can use in RPC::XML::Server instead of my current approach, which is to rely on the parsing capabilities built into HTTP::Daemon. This is somewhat clumsy, and definitely overkill; I only have to do this in cases where the code is not already running under HTTP::Daemon or Apache. If the code is already using HTTP::Daemon, then it has its own accept() loop it can use, and if the code is running under Apache then the request object has already parsed the headers.

My need comes when the code is in neither of these environments: it has to be able to take the socket it gets from a typical TCP/IP-based accept() and read the HTTP request off of it. To avoid duplicating code, I trick the socket into thinking that it’s an instance of HTTP::Daemon::ClientConn, which is itself just a GLOB that’s been blessed into that namespace for the sake of calling methods. So it works. But it makes the code dependent on having HTTP::Daemon loaded, even when the user is not utilising that class for the daemon functionality of the server. I’ve needed to drop this for a while now.

(I’m not impugning HTTP::Daemon or the libwww-perl package itself; both are excellent and I utilise them extensively within this module. But if you are not running your RPC server under HTTP::Daemon, then you probably would prefer to not have that code in memory since you aren’t really using it.)

Thing is, you can use the request and response objects without having to load the user-agent or daemon classes. But there isn’t an easy, clean way to use just the header-parsing part of the code by itself. The ClientConn class has a get_request() method that can be instructed to parse only the headers and return the HTTP::Request object without the body filled in. The content of the request can then be read off of the socket/object with sysread(). This is why I use the minor hack that I do.

What I want is to be able to do this parsing-out of headers without the ugly hack, and without loading all of HTTP::Daemon just so I can call one subroutine (albeit 200+ lines of subroutine). (And to be fair, I also call the read_buffer() routine after the header has been read, to get any content that was already read but is not part of the header.) So I came across HTTP::Parser. It has a lot of promise, but it’s not quite where I need it to be. For one thing, it won’t stop at just parsing the headers. This is something I need, for cases where the user wants to spool larger elements of a message to disk or for handling compressed content. But most of all, it seemed not to be in active maintenance: there were two bugs in RT that had been sitting there, with patches provided, for over a year.
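To make "stop at just parsing the headers" concrete, here is a minimal sketch in core Perl. It is not the code in HTTP::Daemon or HTTP::Parser; the name read_request_head, the returned hash layout, and the 16 KB header cap are all assumptions made up for illustration. It reads from a handle until the blank line that ends the header block, parses the request line and headers, and hands back any bytes that were read past that point, so the caller can decide what to do with the body.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any CPAN module). Reads from $fh until
# the blank line that ends the headers. Returns a hashref describing the
# request head, plus any body bytes that were already read off the handle.
sub read_request_head {
    my ($fh, $max_bytes) = @_;
    $max_bytes ||= 16 * 1024;    # arbitrary cap on header size (assumption)

    my ($buf, $end) = ('');
    while (1) {
        if ($buf =~ /\r?\n\r?\n/) { $end = $+[0]; last; }
        die "header block too large\n" if length($buf) > $max_bytes;
        my $n = sysread($fh, $buf, 4096, length($buf));
        die "read error: $!\n" unless defined $n;
        die "connection closed before end of headers\n" unless $n;
    }
    my $extra = substr($buf, $end);              # bytes past the headers
    my @lines = split /\r?\n/, substr($buf, 0, $end);

    my $request_line = shift @lines;
    die "empty request\n" unless defined $request_line;
    my ($method, $uri, $proto) =
        $request_line =~ m{^(\S+)[ \t]+(\S+)(?:[ \t]+(HTTP/\d+\.\d+))?\s*$}
        or die "malformed request line\n";

    my (%headers, $last);
    for my $line (@lines) {
        if (defined $last && $line =~ /^[ \t]+(.*)$/) {
            $headers{$last} .= " $1";            # folded continuation line
        }
        elsif ($line =~ /^([^:\s]+):[ \t]*(.*?)[ \t]*$/) {
            $last = lc $1;                       # header names are case-insensitive
            $headers{$last} =
                exists $headers{$last} ? "$headers{$last}, $2" : $2;
        }
        else {
            die "malformed header line\n";
        }
    }

    my %head = (
        method   => $method,
        uri      => $uri,
        protocol => $proto || 'HTTP/0.9',
        headers  => \%headers,
    );
    return (\%head, $extra);
}

# Demo: a pipe stands in for the socket returned by accept().
pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer,
    "POST /RPC2 HTTP/1.1\r\nHost: localhost\r\nContent-Length: 11\r\n\r\nhello ");
syswrite($writer, "world");
close $writer;

my ($req, $body) = read_request_head($reader);

# The caller now owns the body: here it is simply read into memory, but
# this is the point where it could be spooled to disk instead.
my $want = $req->{headers}{'content-length'} || 0;
while (length($body) < $want) {
    my $n = sysread($reader, $body, $want - length($body), length($body));
    last unless $n;
}
print "$req->{method} $req->{uri} ($want bytes of body)\n";
```

The leftover bytes returned alongside the head play the same role as the read_buffer() call mentioned above: whatever was pulled off the socket while looking for the end of the headers has to be handed to whoever reads the body.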

Fortunately, an e-mail to the author let me offer to help out, and he accepted. The code was not in any repository, so I set up a repo on GitHub for it here, and seeded it with the four CPAN releases so that there would be something of a history to fall back on. I’ve applied the patches (well, applied one, and implemented the other with a better solution) and pushed the changes.

Now I have to decide how to move forward with this: how to make it as efficient as (or more so than) the code in HTTP::Daemon, and how to make it into something I can use in RPC::XML::Server to eliminate the unsightly hack I currently have to rely on.


3 Responses to “Parsing HTTP Headers”

  1. Tatsuhiko Miyagawa Says:

    Heh, cool that we come across the same thing: today Kazuho Oku shipped HTTP::Parser::XS to CPAN, that stops after the header parsing is done, and yeah, I created Plack::HTTPParser::PP which is a copy of HTTP::Parser but has the same interface with his new XS module. Take a look.

    (There is also HTTP::HeaderParser::XS, which Catalyst and Perlbal use)


  2. rjray Says:

    I saw both HTTP::Parser::XS and HTTP::HeaderParser::XS after I took the steps to move in and help with HTTP::Parser.

    Ideally, I’d like to see this package be a wrapper that selects between a pure-Perl solution (perhaps HTTP::Parser::PP) and HTTP::Parser::XS, based on availability. That is, IDEALLY I’d like to see this become flexible-enough and useful-enough that Plack, Catalyst, etc. are able to plug it in.


  3. Dereferenced.com » Blog Archive » Perl Module Monday: HTTP Parsing Triple-Play Says:

    [...] little bit and actually look at three modules. All of these address the same basic problem, which I wrote about yesterday: parsing HTTP [...]

