
Perl Module Monday: HTTP::Tiny

October 17th, 2011 | 2 Comments | Posted in CPAN, HTTP, Perl

I’m still deep in the Stanford AI class, so this will be a light-weight posting. And since it’s going to be light-weight anyway, I’ll cover a module in the *::Tiny namespace: HTTP::Tiny.

HTTP::Tiny is a simple HTTP/1.1 client library with plenty of options. It handles HTTPS (if you have IO::Socket::SSL available) as well as HTTP requests, and does all the basic HTTP verbs. As is the case with most *::Tiny modules, the goal is to do as much as one can, without the overhead or dependency chain of a larger module. In this case HTTP::Tiny stands as a replacement for LWP::UserAgent, for those cases when you don’t need the full functionality that LWP provides.

The main methods of HTTP::Tiny that you’re likely to use (besides the constructor) are request() and get(), the latter being just a front-end to request() with the ‘method’ argument set to GET. There is also a method called mirror(), which is handy for making a local copy of a web resource on your filesystem. If the file already exists, mirror() even sets an “If-Modified-Since” header on the request. A nice touch to have added! The request() method accepts a very useful range of options that make it easy to pass specific headers, supply call-back subroutines for the request body, for processing the response, or for both, and provide trailer headers for chunked transfer-encoding. One thing I find curious, though, is why the author provides a short-hand method for the GET request but not for the other verbs. Since all are called using the same semantics, it seems to me that it would have made as much sense to provide head(), put(), etc.
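As a rough sketch of the request() options described above, here is one way to pass a custom header and process the response body through a call-back. The fetch_streaming() helper is my own name for illustration, not part of HTTP::Tiny, and the Accept header value is an arbitrary choice. The 'headers' and 'data_callback' options, and the 'status' and 'reason' fields of the response hash, are the ones HTTP::Tiny documents.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper (not part of HTTP::Tiny): GET $url with a custom
# header and count the body bytes as they arrive.
sub fetch_streaming {
    my ($http, $url) = @_;
    my $bytes = 0;

    my $response = $http->request('GET', $url, {
        headers       => { 'Accept' => 'text/html' },
        # Called once per chunk of the response body, instead of the
        # body being accumulated in $response->{content}.
        data_callback => sub {
            my ($chunk, $partial_response) = @_;
            $bytes += length $chunk;
        },
    });

    return ($response, $bytes);
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my ($response, $bytes) = fetch_streaming(HTTP::Tiny->new, $url);
    print "$response->{status} $response->{reason}: $bytes bytes\n";
}
```

Because the helper only needs an object with a request() method, the same wrapper pattern would serve for the missing head() or put() short-hands: pass a different verb as the first argument.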

Still, it’s a nice little approach to HTTP communication that doesn’t require as much setting-up of resources as LWP generally does. It doesn’t have the flexibility that LWP does, either, but sometimes you just don’t need that. You just need to get going in a few lines:

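use strict;
use warnings;
use HTTP::Tiny;

my $http = HTTP::Tiny->new();

for my $url (@ARGV)
{
    # Use everything after the last slash as the local file name
    (my $file = $url) =~ s{^.*/}{};
    if (! length $file)
    {
        warn "Skipping $url (no file component)\n";
        next;
    }
    # mirror() returns a response hash ref; report any failure
    my $response = $http->mirror($url, $file);
    warn "Failed to mirror $url: $response->{status} $response->{reason}\n"
        unless $response->{success};
}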

The above just mirrors all the URLs passed in via @ARGV, using the last file element of the URL as the file name to save to. It doesn’t have the progress-bar and summary that LWP’s “lwp-download” has, but it gets the job done.

So have a look; this could be a useful addition to your toolkit, sitting beside LWP and handling some of the simpler tasks for it.


Perl Module Monday: HTTP Parsing Triple-Play

September 14th, 2009 | No Comments | Posted in CPAN, HTTP, Perl

For this week’s Module Monday, I’m going to break form a little bit and actually look at three modules. All of these address the same basic problem, which I wrote about yesterday: parsing HTTP messages.

Right after writing the previous post, I discovered (by means of my CPAN Twitter-bot) two other solutions to this problem, both using linked C/C++ code for speed. So let’s have a look at all of them:

  • HTTP::Parser is the first one I discovered, and the one I’ve stepped up to help maintain. It has a pretty straight-forward interface, but requires that the content be passed to it as strings (though it can handle incremental chunks). Unlike the code in HTTP::Daemon that I hope to eventually replace with this, it does not read directly from a socket or any other file-handle-like source. It uses integer return codes to signal when it is finished parsing a message, at which point you can retrieve a ready-to-use object that will be either an HTTP::Request or an HTTP::Response, depending on the message.
  • HTTP::Parser::XS is the one I discovered via the Twitter-bot, and is also the newest of the pack. Tatsuhiko Miyagawa took this and wrote a pure-Perl fallback, then integrated them into Plack (more on the overall Plack progress in this blog post). The interface is a little unusual, compared to the more minimal approach of the previous option, in that it stuffs most of the information into environment variables in accordance with the PSGI specification (though in this case it uses a hash-table which is passed by reference, rather than actual environment variables). That is great for projects (like Plack) that are specifically built around PSGI, but may not be as great for more light-weight parsing needs. Also, being very new, the documentation is very sparse. It also uses integer return-codes to signal progress, and the codes are very similar in nature to those used by HTTP::Parser (the meaning of -1 seems to differ).
  • HTTP::HeaderParser::XS is the third of the set, and the one I discovered most recently, as a result of a reference to it in the POD docs of the previous module. This one is over a year old, but seems to have just the one release. It is based on a C++ state-machine, and also offers only sparse documentation.
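To make the "incremental chunks plus integer return codes" style concrete, here is a minimal pure-Perl sketch of the idea. To be clear, this is not the API of any of the three modules: the package name Sketch::IncrementalParser, its methods, and the particular code values (0, -1, -2) are all assumptions made up for illustration, and it only handles the header section of a message.

```perl
use strict;
use warnings;

package Sketch::IncrementalParser;    # hypothetical name, illustration only

# Illustrative return codes; the real modules define their own values.
use constant { DONE => 0, NEED_MORE => -1, BAD_MESSAGE => -2 };

sub new {
    my ($class) = @_;
    my $self = { buffer => '', headers => {}, start_line => undef };
    return bless $self, $class;
}

# Append a chunk of text and report the parser state as an integer.
sub add {
    my ($self, $chunk) = @_;
    $self->{buffer} .= $chunk;

    # The header section ends at the first blank line (CRLF CRLF).
    my $end = index $self->{buffer}, "\x0D\x0A\x0D\x0A";
    return NEED_MORE if $end < 0;

    my ($start_line, @lines) = split /\x0D\x0A/, substr($self->{buffer}, 0, $end);
    return BAD_MESSAGE unless defined $start_line && length $start_line;
    $self->{start_line} = $start_line;

    for my $line (@lines) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/
            or return BAD_MESSAGE;
        $self->{headers}{lc $name} = $value;
    }
    return DONE;
}

sub start_line { my ($self) = @_; return $self->{start_line} }
sub header     { my ($self, $name) = @_; return $self->{headers}{lc $name} }

package main;

# Feed the message in arbitrary pieces, as it might arrive off a socket.
my $parser = Sketch::IncrementalParser->new;
my $status;
for my $chunk ("GET / HTTP/1.1\x0D\x0AHo", "st: localhost\x0D\x0A", "\x0D\x0A") {
    $status = $parser->add($chunk);
    last if $status != Sketch::IncrementalParser::NEED_MORE();
}
print $parser->start_line, "\n" if $status == Sketch::IncrementalParser::DONE();
```

The caller keeps its own read loop and simply checks the code after each add(), which is what lets a parser of this style stay independent of sockets and file-handles.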

So, as I move forward with making HTTP::Parser a more generally-useful piece of code, these are my competition and hopefully inspiration. I’d like to see the speed of XS code eventually, but would prefer to make PSGI support an option so that the code is useful in more contexts.

Suggestions always welcome!


Parsing HTTP Headers

September 13th, 2009 | 3 Comments | Posted in CPAN, GitHub, HTTP, Perl

So, I’ve volunteered to co-maintain the HTTP::Parser CPAN module. I did this because I’ve been looking for something I can use in RPC::XML::Server instead of my current approach, which is to rely on the parsing capabilities built in to HTTP::Daemon. This is somewhat clumsy, and definitely over-kill; I only have to do this in cases where the code is not already running under HTTP::Daemon or Apache. If the code is already using HTTP::Daemon, then it has its own accept() loop it can use, and if the code is running under Apache then the request object has already parsed the headers.

My need comes when the code is not in either of these environments: it has to be able to take the socket it gets from a typical TCP/IP-based accept() and read off the HTTP request. To avoid duplicating code, I trick the socket into thinking that it’s an instance of HTTP::Daemon::ClientConn, which is itself just a GLOB that’s been blessed into that namespace for the sake of calling methods. So it works. But it makes the code dependent on having HTTP::Daemon loaded, even when the user is not utilising that class for the daemon functionality of the server. I’ve needed to drop this for a while, now.

(I’m not impugning HTTP::Daemon or the libwww-perl package itself– both are excellent and I utilise them extensively within this module. But if you are not running your RPC server under HTTP::Daemon, then you probably would prefer to not have that code in memory since you aren’t really using it.)

Thing is, you can use the request and response objects without having to load the user-agent or daemon classes. But there isn’t an easy, clean way to use just the header-parsing part of the code by itself. The ClientConn class has a get_request() method that can be instructed to parse only the headers and return the HTTP::Request object without the body filled in. The content of the request can then be read off of the socket/object with sysread(). This is why I use the minor hack that I do.

What I want is to be able to do this parsing-out of headers without the ugly hack, and without loading all of HTTP::Daemon just so I can call one subroutine (albeit 200+ lines of subroutine). (And to be fair, I also call the read_buffer() routine after the header has been read, to get any content that was already read but not part of the header.) So I came across HTTP::Parser. It has a lot of promise, but it’s not quite where I need it to be. For one thing, it won’t stop at just parsing the headers. This is something I need, for cases where the user wants to spool larger elements of a message to disk or for handling compressed content. But most of all, it seemed to not be in active maintenance: there were two bugs in RT that had been sitting there, with patches provided, for over a year.
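The behaviour I’m after could be sketched roughly like this: stop once the headers are complete, hand back whatever body bytes were already read, and let the caller fetch the rest. The read_headers_only() routine is a hypothetical name of my own, not anything from HTTP::Parser or HTTP::Daemon. The demo also assumes an in-memory file-handle and read() in place of a real socket and sysread(), and a deliberately tiny read size so that part of the body is left unread.

```perl
use strict;
use warnings;

# Hypothetical helper: read from $fh until the blank line that ends the
# headers. Returns the request line, a hash ref of headers (lower-cased
# names), and any body bytes that were read past the blank line.
sub read_headers_only {
    my ($fh) = @_;
    my $buffer = '';

    until ($buffer =~ /\x0D\x0A\x0D\x0A/) {
        # Small read size for the demo; append at the end of the buffer.
        my $got = read($fh, $buffer, 16, length $buffer);
        return unless $got;    # EOF or error before the headers ended
    }

    my ($head, $leftover) = split /\x0D\x0A\x0D\x0A/, $buffer, 2;
    my ($request_line, @lines) = split /\x0D\x0A/, $head;

    my %headers;
    for my $line (@lines) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $headers{lc $name} = $value;
    }
    return ($request_line, \%headers, $leftover);
}

my $message = "POST /RPC2 HTTP/1.1\x0D\x0A"
            . "Content-Length: 11\x0D\x0A"
            . "\x0D\x0A"
            . "hello world";
open my $fh, '<', \$message or die "Cannot open in-memory handle: $!";
binmode $fh;

my ($request_line, $headers, $body) = read_headers_only($fh);

# The caller decides what to do with the body: here, read the remainder.
my $remaining = $headers->{'content-length'} - length $body;
read($fh, $body, $remaining, length $body) if $remaining > 0;

print "$request_line\n", length($body), " body bytes\n";
```

At the point where the remainder is read, the caller could just as easily spool it to disk or run it through a decompressor, which is the whole reason for wanting the parser to stop after the headers.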

Fortunately, an e-mail to the author let me offer to help out, and he accepted. The code was not in any repository, so I set up a repo on GitHub for it here, and seeded it with the four CPAN releases so that there would be something of a history to fall back on. I’ve applied the patches (well, applied one, and implemented the other with a better solution) and pushed the changes.

Now, I have to decide how to move forward with this: how to make it as efficient as (or more so than) the code in HTTP::Daemon, and how to make it into something I can use in RPC::XML::Server to eliminate the unsightly hack I have to rely on currently.
