
Perl Module Monday: HTTP Parsing Triple-Play

September 14th, 2009 | No Comments | Posted in CPAN, HTTP, Perl

For this week’s Module Monday, I’m going to break form a little bit and actually look at three modules. All of these address the same basic problem, which I wrote about yesterday: parsing HTTP messages.

Right after writing the previous post, I discovered (by means of my CPAN Twitter-bot) two other solutions to this problem, both using linked C/C++ code for speed. So let’s have a look at all of them:

  • HTTP::Parser is the first one I discovered, and the one I’ve stepped up to help maintain. It has a pretty straight-forward interface, but requires that the content be passed to it as strings (though it can handle incremental chunks). Unlike the code in HTTP::Daemon that I hope to eventually replace with this, it does not read directly from a socket or any other file-handle-like source. It uses integer return codes to signal when it is finished parsing a message, at which point you can retrieve a ready-to-use object that will be either an HTTP::Request or an HTTP::Response, depending on the message.
  • HTTP::Parser::XS is the one I discovered via the Twitter-bot, and is also the newest of the pack. Tatsuhiko Miyagawa took this and wrote a pure-Perl fallback, then integrated them into Plack (more on the overall Plack progress in this blog post). The interface is a little unusual, compared to the more minimal approach of the previous option, in that it stuffs most of the information into environment variables in accordance with the PSGI specification (though in this case it uses a hash-table which is passed by reference, rather than actual environment variables). That is great for projects (like Plack) that are specifically built around PSGI, but may not be as great for more light-weight parsing needs. Also, being very new, its documentation is very sparse. It also uses integer return-codes to signal progress, and the codes are very similar in nature to those used by HTTP::Parser (though the meaning of -1 seems to differ).
  • HTTP::HeaderParser::XS is the third of the set, and the one I discovered most-recently, as a result of a reference to it in the POD docs of the previous module. This one is over a year old, but seems to have just the one release. It is based on a C++ state-machine, and also offers only sparse documentation.
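
To make the "incremental chunks plus integer return codes" idea concrete, here is a minimal sketch in plain Perl. It is a toy of my own for illustration, not the API of any of the three modules: the Toy::HeaderParser package, its add() and env() methods, and the choice of -1 for "need more data" and 0 for "done" are all assumptions. It fills a PSGI-style hash (passed around by reference) rather than building request objects.

```perl
use strict;
use warnings;

# Hypothetical toy parser (not any CPAN module's API): feed it chunks via
# add(); it returns -1 while the header block is incomplete and 0 once the
# blank line that ends the headers has been seen.
package Toy::HeaderParser;

sub new { return bless { buffer => '', env => {} }, shift }

sub add {
    my ( $self, $chunk ) = @_;
    $self->{buffer} .= $chunk;

    # Everything up to the first blank line is the header block.
    my ($head) = $self->{buffer} =~ /\A(.*?)\r?\n\r?\n/s
        or return -1;

    my ( $request_line, @fields ) = split /\r?\n/, $head;
    my ( $method, $uri, $protocol ) = split ' ', $request_line;

    my $env = $self->{env};
    @{$env}{qw(REQUEST_METHOD REQUEST_URI SERVER_PROTOCOL)} =
        ( $method, $uri, $protocol );

    # Simplification: every header becomes an HTTP_* key.
    for my $field (@fields) {
        my ( $name, $value ) = split /:\s*/, $field, 2;
        ( my $key = uc $name ) =~ tr/-/_/;
        $env->{"HTTP_$key"} = $value;
    }
    return 0;
}

sub env { return $_[0]{env} }

package main;

my $parser = Toy::HeaderParser->new;
my @chunks = ( "GET /rpc HT", "TP/1.1\r\nHost: example.com\r\n", "\r\n" );
my @codes  = map { $parser->add($_) } @chunks;
print "return codes: @codes\n";

my $env = $parser->env;
print "$_ = $env->{$_}\n" for sort keys %{$env};
```

The caller keeps feeding whatever arrives from the network and only looks at the hash once the return code says the headers are complete, which is the same calling pattern the bullets above describe.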

So, as I move forward with making HTTP::Parser a more generally-useful piece of code, these are my competition and hopefully inspiration. I’d like to see the speed of XS code eventually, but would prefer to make PSGI support an option so that the code is useful in more contexts.

Suggestions always welcome!


Parsing HTTP Headers

September 13th, 2009 | 3 Comments | Posted in CPAN, GitHub, HTTP, Perl

So, I’ve volunteered to co-maintain the HTTP::Parser CPAN module. I did this because I’ve been looking for something I can use in RPC::XML::Server instead of my current approach, which is to rely on the parsing capabilities built in to HTTP::Daemon. This is somewhat clumsy, and definitely over-kill; I only have to do this in cases where the code is not already running under HTTP::Daemon or Apache. If the code is already using HTTP::Daemon, then it has its own accept() loop it can use, and if the code is running under Apache then the request object has already parsed the headers.

My need comes when the code is not in either of these environments: it has to be able to take the socket it gets from a typical TCP/IP-based accept() and read off the HTTP request. To avoid duplicating code, I trick the socket into thinking that it’s an instance of HTTP::Daemon::ClientConn, which is itself just a GLOB that’s been blessed into that namespace for the sake of calling methods. So it works. But it makes the code dependent on having HTTP::Daemon loaded, even when the user is not utilising that class for the daemon functionality of the server. I’ve needed to drop this for a while now.

(I’m not impugning HTTP::Daemon or the libwww-perl package itself– both are excellent and I utilise them extensively within this module. But if you are not running your RPC server under HTTP::Daemon, then you probably would prefer to not have that code in memory since you aren’t really using it.)

Thing is, you can use the request and response objects without having to load the user-agent or daemon classes. But there isn’t an easy, clean way to use just the header-parsing part of the code by itself. The ClientConn class has a get_request() method that can be instructed to parse only the headers and return the HTTP::Request object without the body filled in. The content of the request can then be read off of the socket/object with sysread(). This is why I use the minor hack that I do.
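
For readers who have not seen the re-blessing trick, here is a rough sketch of the general shape, using only core Perl. Toy::ClientConn and its read_headers() method are hypothetical stand-ins (this is not the HTTP::Daemon::ClientConn code), and an in-memory file-handle plays the part of the socket, which is why the body is read with read() rather than sysread().

```perl
use strict;
use warnings;

# Hypothetical stand-in for a connection class; NOT HTTP::Daemon::ClientConn.
package Toy::ClientConn;

# Read the request line and the headers, stopping at the blank line so the
# body is left unread on the handle.
sub read_headers {
    my ($self) = @_;
    my $request_line = readline($self);
    return unless defined $request_line;
    $request_line =~ s/\r?\n\z//;

    my %headers;
    while ( defined( my $line = readline($self) ) ) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';
        my ( $name, $value ) = split /:\s*/, $line, 2;
        $headers{ lc $name } = $value;
    }
    return ( $request_line, \%headers );
}

package main;

my $raw = "POST /RPC2 HTTP/1.1\r\n"
        . "Content-Type: text/xml\r\n"
        . "Content-Length: 5\r\n"
        . "\r\n"
        . "hello";

# An in-memory handle stands in for the socket returned by accept().
open my $fh, '<', \$raw or die "open failed: $!";

# The trick: a file-handle is just a GLOB reference, so it can be blessed
# into a class purely for the sake of calling that class's methods on it.
bless $fh, 'Toy::ClientConn';

my ( $request_line, $headers ) = $fh->read_headers;

# The headers say how much body to expect; the caller reads it separately.
read $fh, my $body, $headers->{'content-length'};

print "$request_line\n";
print "body: $body\n";
```

The point of the sketch is the division of labour: one call parses only the headers, and the caller decides what to do with the content afterwards.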

What I want is to be able to do this parsing-out of headers without the ugly hack, and without loading all of HTTP::Daemon just so I can call one subroutine (albeit 200+ lines of subroutine). (And to be fair, I also call the read_buffer() routine after the header has been read, to get any content that was already read but not part of the header.) So I came across HTTP::Parser. It has a lot of promise, but it’s not quite where I need it to be. For one thing, it won’t stop at just parsing the headers. This is something I need, for cases where the user wants to spool larger elements of a message to disk or for handling compressed content. But most of all, it seemed to not be in active maintenance– there were two bugs in RT that had been sitting there, with patches provided, for over a year.

Fortunately, an e-mail to the author let me offer to help out, and he accepted. The code was not in any repository, so I set up a repo on GitHub for it here, and seeded it with the four CPAN releases so that there would be something of a history to fall back on. I’ve applied the patches (well, applied one, and implemented the other with a better solution) and pushed the changes.

Now, I have to decide how to move forward with this: how to make it as efficient as (or more so than) the code in HTTP::Daemon, and how to make it into something I can use in RPC::XML::Server to eliminate the unsightly hack I have to rely on currently.


Perl Module Monday: Plack

September 7th, 2009 | 1 Comment | Posted in GitHub, Perl, Web Services

This will be a slightly unusual installment of PMM, as I want to look at a module so new that it isn’t actually on CPAN yet, just GitHub: Plack. (When it makes it to CPAN, it should be here.)

Plack is a reference implementation of the burgeoning PSGI initiative. What is PSGI? Well, if you follow that link you’ll get a more complete explanation, but the short form is that it is a Perl alternative to Python’s WSGI (Web Server Gateway Interface) and Ruby’s Rack. The longer-form is that it’s a specification layer to decouple web applications from the specifics of how they’re being run, whether that’s CGI, FastCGI, Apache with mod_perl, etc. The longer explanation can be had at the link.
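
As a rough illustration of what that decoupling looks like in code: a PSGI application, as the specification describes it, is a code reference that receives an environment hash and returns a three-element array of status, headers and body. The sketch below calls such an app directly with a hand-built environment; the two keys supplied and the "/demo" path are assumptions for the example, and a real server would supply many more keys.

```perl
use strict;
use warnings;

# A PSGI application: a code reference taking the environment hash-ref.
my $app = sub {
    my ($env) = @_;
    my $body = "Hello from $env->{REQUEST_METHOD} $env->{PATH_INFO}\n";
    return [
        200,
        [ 'Content-Type' => 'text/plain', 'Content-Length' => length $body ],
        [$body],
    ];
};

# Stand-in for the server side: normally CGI, FastCGI, mod_perl, etc. would
# build the environment and deliver the response to the client.
my $res = $app->( { REQUEST_METHOD => 'GET', PATH_INFO => '/demo' } );

printf "%d %s", $res->[0], $res->[2][0];
```

Nothing in the app itself knows or cares which kind of server invoked it, which is the whole idea.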

Back to Plack: Plack is the first reference implementation of the PSGI spec, and already it can pass all of the Catalyst tests. And as of this commit, the plackup script can coerce an app written for Catalyst, CGI, etc. into running under different environments, thanks to the magic of PSGI.

I’ll be watching Plack very closely. I see a PSGI connector for my XML-RPC server in the not-too-distant future.


Muscle Memory, Part 1: The Strain of Repetitiveness

September 3rd, 2009 | No Comments | Posted in Metaprogramming, Perl

Earlier this morning, I worked a bit on my (other) hobby. Specifically, I fired up my airbrush[1] and painted the road wheels for a WWII Soviet tank that I’m working on.

Ask any modeler who builds armor subjects (assuming you know any, other than myself) and odds are that the road wheels are their least-favorite part of the model. They’re numerous, and worst of all, they’re numbingly repetitive. For this model, I had a total of 36 wheels to paint: on each side of the tank there are 12 road wheels in 6 pairs, plus 3 pairs of smaller wheels that act as return-rollers (keeping the tread from sagging too close to the tops of the road wheels) for a total of 18 per side. For some tank designs, the wheels are fairly simple, smooth affairs that are easy to paint. These, however, had a lot of tight corners and angles that I had to work the paint into. To be fair, this is not the worst-case I’ve dealt with; some years back I built a Panzerkampfwagen 35(t), which sports a numbing total of 24 wheels per side. At least those wheels were easier to paint than this morning’s were.

But it got me thinking about repetitive activity, and how it crops up in my coding. Like most dutiful Perl programmers, I use the “strict” and “warnings” pragmas almost religiously. I even set up templates in editors when and where I can, to ensure that these are always present in my modules. (Well, the use of “warnings” is a little more recent, so I still have some older code on CPAN that lacks the pragmata.)
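
For anyone wondering what those two repeated lines buy, here is a small sketch. The misspelled variable is deliberate, and the string eval is only there so the compile-time failure can be caught and printed instead of stopping the script.

```perl
use strict;
use warnings;

my $error;
{
    local $@;

    # Compile a snippet with a deliberately misspelled variable name.
    eval q{
        use strict;
        my $total = 1;
        $totl = 2;    # typo on purpose: never declared
        1;
    } or $error = $@;
}

print "strict caught: $error";
```

Without "strict", the typo would quietly create a new global variable and the bug would surface somewhere far from its cause.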

Some would look at my repetitive use of these, and point out the recent addition to CPAN, common::sense. In many ways, this is a useful tool. But it suffers from some drawbacks:

  • It isn’t part of the core, so it would be an additional dependency.
  • It includes features that are specific to 5.10, so if you’re trying to maintain compatibility for older Perls, it isn’t an option.
  • Most of all, it hides too much of what is being done.

That last point is the most salient to me (that, and the fact that I have modules being used by large-ish projects that are still using 5.6.1). People sometimes talk about “self-documenting” code, code that is very clear in its purpose just from reading it. Truly, a name like “common::sense” is pretty clear. What isn’t as clear is what the author defines as “common sense”, and whether that matches your definition of such. The pragma-module does do its thing fast and with less memory usage than loading the individual parts does. But the user has to ask themselves if their code is clearer and more self-documenting with or without it.

As programmers, we loathe repeating ourselves. We program our editors with cut-and-paste and macro-definition capabilities, just to save a few keystrokes here and there. But we also often find ourselves committing bug-fixes to our repositories with a commit-message that is some variation of “cut/paste error… oops!”

In reasonable, small doses, repeating yourself can be an acceptable thing. Some people in my hobby clean their airbrushes by just running paint thinner through until it comes out clear. But I disassemble and carefully clean mine after every use, even if I plan on immediately loading another color and using it again. A friend in my hobby club back in Denver once said that he does that for the simple reason that the 5 minutes or so that it takes lets him rest his mind and refocus his thoughts on what his next steps are going to be.

When I start a new module or application, putting in the repetitive parts (even if it means only loading a template and making small adjustments) helps me narrow my focus from the project as a whole, down to this one file in particular that I’m about to work on. So, maybe repeating yourself isn’t always a bad thing.

(Edit: This entry is not meant as a critique of common::sense, but rather an argument that repeating oneself is not always a bad thing.)

[1] Before anyone asks: no, I can’t do any custom work for your car or motorcycle. I lack the skill at this point, and my airbrushes are designed for working with model paints. The lacquers one uses for automotive work would be hard on the internal workings.


Perl Module Monday: Test::Formats

August 24th, 2009 | No Comments | Posted in CPAN, Perl, XML

I was on vacation most of last week, so this week’s installment of PMM is going to be both short and self-serving. For this week, I’m going to “cheat” and talk about one of my own modules: Test::Formats. (I promise to not make a regular habit of using this feature to promote my own projects.)

This is a pretty simple concept: Rather than using lengthy, confusing regular expressions to test the validity of generated XML documents, why not use the validation already built in to the parser itself? The module isn’t for use on snippets, but then those can usually be tested with much simpler, easier-to-read regexp’s.

The tests you would write with this module are tests of the XML your Perl generates, not necessarily the Perl itself. Alas, time constrains me from any useful examples, so I hope you’ll check out the module itself on CPAN. Next week will be better, I promise!


Perl Module Monday: IPC::Run3

August 17th, 2009 | No Comments | Posted in CPAN, Perl

For this week’s Module Monday, I’m looking at a recent discovery: IPC::Run3.

I came across this one while looking for best-practices tools to use when executing a sub-process and manipulating all of the file-handles, not just the input, or just the resulting output. I’m going to need this for an upcoming project, one that is needed at $DAY_JOB but for which I’ve been cleared to develop it as a CPAN module rather than an internal one.

What sets this module apart, in my consideration, is the ease with which it allows you to manipulate the input and capture the output. IPC::Open3 does very much the same sort of thing, and has the benefit of already being part of the core. But it uses only open file-handles as its currency, which leaves me doing much of the same open/write-or-read/close logic over and over. This module, in contrast, is very Perl-ish in how it regards each of the parameters for STDIN, STDOUT and STDERR. You can use file-handles, of course, but you can also pass the content for STDIN directly, save the results from the output streams directly, redirect them from/to /dev/null, etc.
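
To show the kind of repeated open/write/close logic in question, here is a sketch using the core IPC::Open3 module. The child command (the running perl itself, upper-casing its input) is just an assumption chosen so the example needs nothing else installed.

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use Symbol qw(gensym);

# Child process: the current perl, upper-casing whatever arrives on STDIN.
my @cmd   = ( $^X, '-ne', 'print uc' );
my $input = "hello\n";

# open3() deals only in file-handles, so everything else is manual:
# write the input, close the handle, slurp both output streams, reap.
my $err = gensym;    # STDERR needs a pre-made handle to stay separate
my $pid = open3( my $in, my $out, $err, @cmd );

print {$in} $input;
close $in;

my $output = do { local $/; <$out> };
my $errors = do { local $/; <$err> };
waitpid $pid, 0;

print $output;

# With IPC::Run3 (not a core module), its documented interface collapses
# all of the above into a single call along these lines:
#   run3 \@cmd, \$input, \$output, \$errors;
```

This is fine for a few bytes of data; with larger volumes the manual approach also has to worry about the child blocking on a full pipe, which is one more thing the handle-only interface leaves to the caller.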

Time and tuits permitting, I should have my new work on CPAN within the next 3-4 weeks. And when I do, IPC::Run3 will figure prominently in how it functions.


Inaugural Perl Module Monday: Test::XT

August 10th, 2009 | No Comments | Posted in CPAN, Perl

(This kicks off what I hope to be a regular, weekly series on my blog: focusing on a Perl module that’s unsung, or at least under-sung, and hopefully in doing so drawing some extra attention to a tool I feel can help other Perl developers.)

For my first “Perl Module Monday” post, I would like to introduce you to Adam Kennedy’s Test::XT. This module has been around for several months, but I only recently took the time to look at it, and see how I could utilize it.

When I first discovered the CPANTS effort, and the enormous amount of work its creators had put into it, I immediately set about improving my scoreboard. In CPAN circles, this was known as “gaming CPANTS”. And for good reason– a high score is an indicator of nothing more than the fact that your modules pass those particular metrics, none of which measure actual code quality. They only measure the quality of your distribution. I argued (which is almost too strong a word, as the discussion never really got that heated) that as more authors took the CPANTS guidelines to heart, the end result would be worthy in and of itself, a different sort of quality that stood on its own. Think of Ruby’s “gems”, and the perception of how effortless they are to install; many people have the (mistaken) impression that Perl modules are difficult, and that impression most likely came from one or two isolated incidents (whether personal or related anecdotally). And, at least in my case, it has led to better overall module development. I no longer release even the initial version of a module unless I’m pretty confident that it will meet at least the “required” metrics, if not the optional ones as well.

This dedication, though I pat myself on the back so publicly for it, has its price: a fair amount of duplicated effort. One example of this is the author-tests, or maintainer-tests if you prefer.

These are the tests that are really meant to be run only by us, the authors, on our own modules. You, the user, really have nothing to gain from watching them run, because if any of them fail you really don’t have a stake in it. These are the tests for the cleanness of the POD structure, tests of the integrity of the YAML metadata file, etc. If “META.yml” doesn’t pass that test, that’s a lot less meaningful to you than if the test script for the actual functionality has one or more failures.

This is where Adam K. stepped in with Test::XT. It generates these boilerplate author/maintainer tests for you. Which handily beats my old practice of copying from an existing project when creating a new one. The test-files that it generates include checks, based on documented environment variables, that prevent the test-suites from running unless you have specified that you (as the author or maintainer) want to run them. It looks at two variables, in fact, to let you choose whether to run them during author-initiated builds, during designated “integration” (nightly, hourly, etc.) builds, or both. The logic is set up in a way that ensures the dependent modules (Test::Pod, Perl::Critic, etc.) don’t get loaded even for the purpose of the “can-we-run-these-tests” test. Which helps to avoid failing the “list of prereqs does not match actual use” metric on CPANTS. (And yes, I still have some modules that fail that, as I haven’t back-ported this to everything yet!)
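
The general shape of such a guarded test file can be sketched as follows. This is hand-written for illustration and is not the code Test::XT actually generates; the author_tests_wanted() helper is hypothetical, and the variable names RELEASE_TESTING and AUTOMATED_TESTING are my assumption about which two variables are meant.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: decide from the environment whether the author
# tests should run at all. The two variable names are an assumption.
sub author_tests_wanted {
    my ($env) = @_;
    return ( $env->{RELEASE_TESTING} || $env->{AUTOMATED_TESTING} ) ? 1 : 0;
}

SKIP: {
    skip 'author tests not requested', 1
        unless author_tests_wanted( \%ENV );

    # The optional dependency is loaded only past the guard, and only at
    # run time, so an ordinary install never touches it.
    skip 'Test::Pod is not installed', 1
        unless eval { require Test::Pod; 1 };

    Test::Pod::pod_file_ok( $0, 'POD in this file is valid' );
}

done_testing;
```

Run normally, the file reports a single skipped test; run with one of the variables set, it loads Test::Pod and does the real check.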

It’s a simple module, not at all complex. I hope to offer some extensions or patches to it in the future, as it has been greatly helpful to me and I want to help make it even more so. So check it out– even if you aren’t a CPAN author you may find it useful for the tests you develop in your day-to-day work!


Stamp Out Tab-Stops! (An Unsung Holy War)

August 3rd, 2009 | 2 Comments | Posted in Perl, PHP

At $DAY_JOB, I have for the first time encountered a coding style that I have problems adapting to. It’s not that I can’t do it; I easily re-configured my editors and IDE settings to accommodate it. It just leaves a bad taste in my mouth every time I commit code with tab characters still present.

It’s a quiet holy war, nowhere nearly as loudly fought as Perl vs. Python (Perl, naturally) or Emacs vs. Vi/Vim (I take the Switzerland stance and use both while advocating neither). Do you expand your tabs to spaces? Do you even have tabs at all, have you instead configured your editor-of-choice to bypass them entirely? Do you prefer to retain tabs? Or do you, like the author of these styling guidelines, advocate mixing spaces and tabs together?

To me, that’s the worst of the options– even though you’re using spaces to make sure that sub-elements of a statement still line up, it only takes a few missed lines and a viewer (such as “less”, or converted-printing such as “enscript”) with tab settings that are different than yours. Then, you have source code whose flow is so much harder to read than it needs to be.
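
Here is a small sketch of that failure mode, using the core Text::Tabs module to render the same three lines at two different tab widths. The source lines are invented for the example: the second one follows the mixed style (tab to indent, spaces to align), while the third is a "missed" line where a second tab was typed in place of the spaces.

```perl
use strict;
use warnings;
use Text::Tabs qw(expand);

my @source = (
    "\tfoo(alpha,",
    "\t    beta,",     # tab for indentation, spaces for alignment
    "\t\tgamma);",     # missed line: a second tab instead of the spaces
);

# Render the same text the way viewers with different tab settings would.
for my $width ( 4, 8 ) {
    local $Text::Tabs::tabstop = $width;
    print "tab width $width:\n";
    print "$_\n" for expand(@source);
}
```

At a width of 4 all three arguments line up; at a width of 8 the "gamma" line jumps four columns to the right of the other two, even though not a byte of the file has changed.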

And what are the arguments for even using tabs? They’re a hold-over from the days of mechanical typewriters, and some cool steampunk hacks notwithstanding, we don’t use those anymore. Non-software-professional office workers don’t even use them anymore! The closest they come is in word-processing software such as OpenOffice.org or Microsoft Word. Neither of which, last I checked, were popular as source-code editors.

My coding style has evolved over the years, and the way I code now doesn’t always match what I wrote 15 years ago. Sometimes when I look at my older code it can be painful to see stylistic shortcomings. But when I look at code that uses tabs, let alone that mixes tabs and spaces, it’s often outright jarring. Block structures in Java, Perl, even PHP can be all over the place if the author wasn’t careful. And if there were multiple authors, it’s almost guaranteed to be that much worse.

I’ve heard people say, fairly-recently even, that a tab-stop saves bytes over using spaces for the same indentation. Why is this even remotely an argument when terabyte disks are under $200? Is the savings of a few bytes here and there worth the headache of reading code whose format is skewed by inconsistent tab layout?

I’ve also heard that it’s a matter of keeping people to a consistent style, and that not everyone has their editors set up to convert tabs. That’s an even weaker argument, since even as I started re-learning Vi/Vim, it took me mere minutes to find the proper option to make it always use spaces in place of tabs (that would be “expandtab”, in case you were wondering). In Emacs you can also configure it (because really, what can’t you configure in Emacs?), and likewise it is trivial to set up in Eclipse, Padre, jEdit, etc., whichever editor is your favorite. And consistency in style is so much more than just tabs versus spaces; it seems like an insult to the question of stylistic guidelines to reduce them to this argument. And with tools such as perltidy, there’s no reason to be that concerned about the relationship between editing and style-adherence.

So help me in my goal of stamping out every last tab-stop in source code that you find! End the tyranny of the mechanical typewriter!


GreaseMonkey: Hide the ToC on search.cpan.org Module Pages

July 31st, 2009 | No Comments | Posted in CPAN, GitHub, JavaScript

I love it when tools do exactly what they’re supposed to, do it effectively, and in doing so let me do things quickly and easily.

Like any serious Perl programmer, I use search.cpan.org several times a day. Indeed, I even have it as a search plug-in for Firefox. But when I am looking at the manual pages for modules, I find the auto-generated table of contents to be cumbersome. I always just scroll past it to get at the meat. Yes, I could click on the links to jump to sections, but then browser-back has to be clicked that many more times to get back to the main distribution page, search results page, etc.

So I did the obvious thing: I set out to write a GreaseMonkey script to scratch this particular itch. And in this case, JavaScript was just easy-enough to use, and GM just helpful-enough, that before I knew it the task was done. I had initially planned to just put together the basic framework, make sure it was locating the needed <div> tags, etc. But that worked right the first time, so I thought I’d go ahead and have it hide the ToC. That took even less time, so I went ahead and put in a clickable <span> tag to toggle the hidden/exposed state of the ToC. And before I realized it, I’d pretty much finished the whole task.

There were two “gotchas,” of sorts, that I had to spend a little time fixing:

  • Seems that you can’t set the style for CSS pseudo-selectors, “:hover” in this case, via JavaScript. So I solved this by creating an additional element, a <style> tag, to provide hover-styling for the clickable text.
  • The <span> tag displayed as a block object, and as such the “hot” area actually extended well to the right of the text. I solved this part by just wrapping the span in another <div>. Then the span was treated as inline, and the “hot” part is limited to the bounding-box of the text only.

So, the script is being tracked on GitHub, the project page is here. Also, I uploaded it to userscripts.org, and you can install it from its page there.


OSCON Day 3: All Over Except the Crying

July 24th, 2009 | No Comments | Posted in Conferences, Perl

(Just to be clear, there’s no crying at OSCON!)

As is usual, the last day of the show is a short one, with only one pair of session-tracks. I skipped the first one as I was deep in conversation with Andy Lester about designing plug-ins for ack. On a side-note, if you’re a developer and you aren’t using ack, you aren’t getting the most you could out of file searching, period. We talked at length about just where and when in the flow hooks should be made, things like that. It’s still quite early, and to be honest I don’t think we’ll have a truly definitive scheme of things until a few plug-ins that are inherently diverse have been written, each contributing their unique needs to the design.

The second set of sessions was a no-brainer: I went to “The Damian Channel” to see what Damian Conway has been up to lately. And what has he been up to? How about blending regular expressions and recursive-descent grammars into one combined entity? In a more general sense, the talk was about duality in expression and purpose, and in this the new module he introduced succeeds amazingly. As I said in my Twitter stream, I didn’t even TRY to take notes for fear I’d miss something important while looking at the keyboard. I don’t even know yet how I’d actually use this in anything, but I’m dying to try it out just to get a feel for it. It does require that you are running the 5.10.0 version of Perl, though. Not a problem personally, as my desktop and laptop both are based off of the most-current Ubuntu. However, none of my CPAN modules are utilizing 5.10.0 features because I’m not ready to force my users to move up. But that’s a topic for a later post.

As I’ve said more than a few times, I was just glad to be back at OSCON after a long absence. Regardless of where it’s held next year, I hope it doesn’t take me another 6 years to make it again.
