<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dereferenced.com &#187; Web</title>
	<atom:link href="http://www.dereferenced.com/topics/web/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dereferenced.com</link>
	<description>A preponderance of Perl, an excess of XML, and additional alliterations.</description>
	<lastBuildDate>Sun, 29 Aug 2010 23:16:38 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Dueling Twitter-Bots</title>
		<link>http://www.dereferenced.com/2009/11/19/dueling-twitter-bots/</link>
		<comments>http://www.dereferenced.com/2009/11/19/dueling-twitter-bots/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 00:19:13 +0000</pubDate>
		<dc:creator>rjray</dc:creator>
				<category><![CDATA[CPAN]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.dereferenced.com/?p=87</guid>
		<description><![CDATA[It looks as though I have some &#8220;competition&#8221; for my CPAN Twitter bot (@cpan_linked). The last few days I&#8217;ve been seeing posts from @cpanlive in my Perl search-column (I use Seesmic Desktop, with a permanent column for searches on &#8220;#perl&#8221;). This seems to be pretty much identical in intent to my bot, with some differences. [...]]]></description>
			<content:encoded><![CDATA[<p>It looks as though I have some &#8220;competition&#8221; for my CPAN Twitter bot (<a href="http://twitter.com/cpan_linked">@cpan_linked</a>). The last few days I&#8217;ve been seeing posts from <a href="http://twitter.com/cpanlive">@cpanlive</a> in my Perl search-column (I use <a href="http://seesmic.com/desktop.html">Seesmic Desktop</a>, with a permanent column for searches on &#8220;<a href="http://twitter.com/#search?q=%23perl">#perl</a>&#8221;). This seems to be pretty much identical in intent to my bot, with some differences. I&#8217;ll cover those and what I think of them:</p>
<h3>No URL-Shortening</h3>
<p>Where I do URL-shortening (currently through TinyURL due to some down-time with Metamark, though I think they&#8217;re back up now), <tt>@cpanlive</tt> doesn&#8217;t. I believe that if the status message were to exceed 140 characters, Twitter would notice this and scan the message for URLs to shorten automatically. In the end, I suppose it&#8217;s a matter of taste&#8211; with @cpanlive you&#8217;ll see the actual URL the majority of the time.</p>
<h3>Content</h3>
<p>I usually provide two links, the second being to the &#8220;Changes&#8221; (Changes, ChangeLog, etc.) file, or the README if I can&#8217;t find a change-log. <tt>@cpanlive</tt> provides just the main link. I also include the author&#8217;s name. This goes back to the use of URL-shortening&#8211; I&#8217;m careful to keep my status under 140 characters, but having pre-shortened the links gives me more room to play with.</p>
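<p>As a rough illustration of fitting a release announcement into 140 characters, here is a minimal Python sketch. The message layout, the function name <code>build_status</code>, and the fallback order (drop the change-log link first, then the author) are all assumptions made for the example; they are not taken from the bot&#8217;s actual code.</p>

```python
MAX_STATUS_LEN = 140  # Twitter's status limit at the time of writing


def build_status(dist: str, author: str, dist_url: str, changes_url: str) -> str:
    """Return the richest announcement that still fits in MAX_STATUS_LEN.

    The layouts below are hypothetical; they only illustrate the idea of
    dropping optional parts until the status fits.
    """
    candidates = [
        f"{dist} by {author} {dist_url} (changes: {changes_url})",
        f"{dist} by {author} {dist_url}",
        f"{dist} {dist_url}",
    ]
    for status in candidates:
        if MAX_STATUS_LEN >= len(status):
            return status
    raise ValueError("even the shortest form exceeds the limit")


status = build_status(
    "RPC-XML 0.69", "RJRAY", "http://tinyurl.com/abc", "http://tinyurl.com/def"
)
```

<p>With pre-shortened links the first, fullest layout normally fits, which is the extra room that shortening buys.</p>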
<h3>Hash-Tagging</h3>
<p>I don&#8217;t use any hash-tags, currently. <tt>@cpanlive</tt> uses both &#8220;#Perl&#8221; and &#8220;#CPAN&#8221;. On the one hand, I probably wouldn&#8217;t have even known about this bot if it weren&#8217;t for the tagging. On the other hand, this puts a lot of data from a single bot into the #perl search-stream. Most people know to follow a given bot if they want CPAN stream updates, and would prefer to not have them cluttering up #perl.</p>
<p>That said, I am considering adding #CPAN to the updates that <tt>@cpan_linked</tt> puts out, as well as the re-write that I&#8217;m (slowly) working on. I think that such data is more useful to users searching on #CPAN than those searching on #Perl.</p>
<h3>Speed and Pacing</h3>
<p>When <tt>@cpan_linked</tt> gets a cluster of several CPAN updates at once, it tries to spread them out over the next period between polls of <a href="http://search.cpan.org">search.cpan.org</a>&#8217;s RDF feed. Currently, I poll it every 15 minutes, so if I get 5 new items to post they get posted roughly 3 minutes apart. It looks like <tt>@cpanlive</tt> doesn&#8217;t do anything like this, as the updates seem to be in &#8220;clumps&#8221;, which is what I was trying to avoid. Again, a matter of taste. I didn&#8217;t want the bot to suddenly spew 10-20 updates into my Twitter stream, pushing everything else &#8220;below the fold&#8221; as it were. Other followers might not care one way or the other.</p>
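<p>The pacing arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name <code>spread_delays</code> is made up, and it assumes the first item goes out immediately with the rest evenly spaced across the poll interval, which may not match how the bot actually schedules its posts.</p>

```python
def spread_delays(item_count: int, poll_interval_s: float) -> list[float]:
    """Return, for each item, how many seconds to wait before posting it.

    Assumption: items are spaced evenly across one poll interval, with the
    first posted right away.
    """
    if item_count == 0:
        return []
    gap = poll_interval_s / item_count
    return [index * gap for index in range(item_count)]


# 5 new items, 15-minute polling interval: posts land 180 seconds apart.
delays = spread_delays(5, 15 * 60)
```

<p>Because the last delay is still inside the current interval, the queue is empty again by the time the next poll happens.</p>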
<p>Something I&#8217;ve noticed, though, is that <tt>@cpanlive</tt> seems to be about an hour or so behind <tt>@cpan_linked</tt>, on average. I assume that they&#8217;re polling the RDF source hourly, rather than the 15-minute interval I use.</p>
<h3>Conclusion</h3>
<p>Well, there&#8217;s no &#8220;conclusion&#8221; here, really. I mean, it&#8217;s not like I have an exclusive license to relay CPAN releases to Twitter. If such an exclusive right existed, it wouldn&#8217;t be mine in the first place! I do wonder about the reason for doing it over again, though it may just be someone&#8217;s project for learning how to use the Net::Twitter modules. It does push me to get cracking on my re-write, though, as I have other features planned that should make it even more useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dereferenced.com/2009/11/19/dueling-twitter-bots/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Perl Module Monday: Net::Twitter(::Lite)</title>
		<link>http://www.dereferenced.com/2009/09/28/perl-module-monday-nettwitterlite/</link>
		<comments>http://www.dereferenced.com/2009/09/28/perl-module-monday-nettwitterlite/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 05:42:59 +0000</pubDate>
		<dc:creator>rjray</dc:creator>
				<category><![CDATA[CPAN]]></category>
		<category><![CDATA[GitHub]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[module-monday]]></category>

		<guid isPermaLink="false">http://www.dereferenced.com/?p=62</guid>
		<description><![CDATA[(If I keep covering multiple modules in a post, I&#8217;m going to have to change the title and tag I use&#8230;)
I generally try to use these posts to highlight lesser-known modules, and I imagine that the Net::Twitter module is considerably higher-profile than most of my previous choices. But are you familiar with Net::Twitter::Lite, as well?
It&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>(If I keep covering multiple modules in a post, I&#8217;m going to have to change the title and tag I use&#8230;)</p>
<p>I generally try to use these posts to highlight lesser-known modules, and I imagine that the <a href="http://search.cpan.org/dist/Net-Twitter">Net::Twitter</a> module is considerably higher-profile than most of my previous choices. But are you familiar with <a href="http://search.cpan.org/dist/Net-Twitter-Lite">Net::Twitter::Lite</a>, as well?</p>
<p>It&#8217;s not unusual for CPAN to offer more than one solution to a given problem. The wide range of XML parsers is a testament to this. And when a subject is popular, the odds are even greater that people may choose to &#8220;roll their own&#8221; rather than trying to contribute to an existing effort. Fortunately, the interface to the social messaging service <a href="http://twitter.com">Twitter</a> has been spared this. Maybe it&#8217;s because the source code is <a href="http://github.com/semifor/Net-Twitter">hosted on GitHub</a>, and thus it is easier for people to contribute. Whatever the reason, the only real competition to Net::Twitter for basic Twitter API usage is Net::Twitter::Lite. And it&#8217;s not actually a competitor in the general sense.</p>
<p>Rather than representing a competing implementation, Net::Twitter::Lite came about as an (almost completely) interface-compatible alternative to Net::Twitter after Net::Twitter was refactored to use Moose internally. While N::T::Lite doesn&#8217;t have 100% of the features that Net::Twitter has, both modules strive for 100% coverage of Twitter&#8217;s API. Where N::T::Lite runs without the additional requirement of Moose, N::T gives you finer-grained control over which parts of the API are loaded and made available to connection objects.</p>
<p>I&#8217;ve used both modules, and can attest to the fact that the interface is kept consistent between them. At $DAY_JOB I authored a tool to echo data to a Twitter stream, for which N::T::L was the best choice as it had the fewest dependencies and our needs did not call for the additional functionality of N::T. My Twitter-bot (<a href="http://twitter.com/cpan_linked">cpan_linked</a>) was written with N::T in the pre-Moose days, and has not had a single problem since I seamlessly upgraded N::T to the Moose-based version. As I work on the next generation CPAN-bot, I&#8217;ll be using the OAuth support, as well as possibly the search API. Since it will be a long-running daemon, I&#8217;ll stick with the more-featureful N::T for it. But thanks to the diligence of the modules&#8217; authors, I could just as easily swap between them at will.</p>
<p>If you&#8217;re planning to interface to Twitter from Perl, these two modules should be your starting point. But be sure to look at the <a href="http://search.cpan.org/search?query=twitter&amp;mode=all">other Twitter-oriented modules</a>, just to be sure. There&#8217;s a lot of activity around this API, and Perl developers have kept on top of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dereferenced.com/2009/09/28/perl-module-monday-nettwitterlite/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Perl Module Monday: HTTP Parsing Triple-Play</title>
		<link>http://www.dereferenced.com/2009/09/14/perl-module-monday-http-parsing-triple-play/</link>
		<comments>http://www.dereferenced.com/2009/09/14/perl-module-monday-http-parsing-triple-play/#comments</comments>
		<pubDate>Tue, 15 Sep 2009 05:09:31 +0000</pubDate>
		<dc:creator>rjray</dc:creator>
				<category><![CDATA[CPAN]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[module-monday]]></category>

		<guid isPermaLink="false">http://www.dereferenced.com/?p=54</guid>
		<description><![CDATA[For this week&#8217;s Module Monday, I&#8217;m going to break form a little bit and actually look at three modules. All of these address the same basic problem, which I wrote about yesterday: parsing HTTP messages.
Right after writing the previous post, I discovered (by means of my CPAN Twitter-bot) two other solutions to this problem, both [...]]]></description>
			<content:encoded><![CDATA[<p>For this week&#8217;s Module Monday, I&#8217;m going to break form a little bit and actually look at <em>three</em> modules. All of these address the same basic problem, which I <a href="http://www.dereferenced.com/2009/09/13/parsing-http-headers/">wrote about yesterday</a>: parsing HTTP messages.</p>
<p>Right after writing the previous post, I discovered (by means of my <a href="http://twitter.com/cpan_linked">CPAN Twitter-bot</a>) two other solutions to this problem, both using linked C/C++ code for speed. So let&#8217;s have a look at all of them:</p>
<ul>
<li><a href="http://search.cpan.org/dist/HTTP-Parser/">HTTP::Parser</a> is the first one I discovered, and the one I&#8217;ve stepped up to help maintain. It has a pretty straight-forward interface, but requires that the content be passed to it as strings (though it can handle incremental chunks). Unlike the code in HTTP::Daemon that I hope to eventually replace with this, it does not read directly from a socket or any other file-handle-like source. It uses integer return codes to signal when it is finished parsing a message, at which point you can retrieve a ready-to-use object that will be either an HTTP::Request or an HTTP::Response, depending on the message.</li>
<li><a href="http://search.cpan.org/dist/HTTP-Parser-XS/">HTTP::Parser::XS</a> is the one I discovered via the Twitter-bot, and is also the newest of the pack. <a href="http://bulknews.typepad.com/blog/">Tatsuhiko Miyagawa</a> took this and wrote a <a href="http://github.com/miyagawa/Plack/blob/5f68ec28d2c103d93e31097e249fc7ad18433c86/lib/Plack/HTTPParser/PP.pm">pure-Perl fallback</a>, then integrated them into <a href="http://github.com/miyagawa/Plack">Plack</a> (more on the overall Plack progress in <a href="http://bulknews.typepad.com/blog/2009/09/plack-standalone-and-apache2-support.html">this blog post</a>). The interface is a little unusual, compared to the more minimal approach of the previous option, in that it stuffs most of the information into environment variables in accordance with the <a href="http://github.com/miyagawa/psgi-specs/blob/master/PSGI.pod">PSGI</a> specification (though in this case it uses a hash-table which is passed by reference, rather than actual environment variables). This is great for projects (like Plack) that are specifically built around PSGI, but may not be as great for more light-weight parsing needs. Also, being very new, the documentation is very sparse. It also uses integer return-codes to signal progress, and the codes are very similar in nature to those used by HTTP::Parser (the meaning of -1 seems to differ).</li>
<li><a href="http://search.cpan.org/dist/HTTP-HeaderParser-XS/">HTTP::HeaderParser::XS</a> is the third of the set, and the one I discovered most-recently, as a result of a reference to it in the POD docs of the previous module. This one is over a year old, but seems to have just the one release. It is based on a C++ state-machine, and also offers only sparse documentation.</li>
</ul>
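<p>To make the &#8220;incremental chunks plus integer return codes&#8221; style concrete, here is a small Python sketch of the general pattern. It is not the API of any of the three modules above: the class name, the method name <code>add</code>, and the two code values are hypothetical, and it handles only the header block of a message.</p>

```python
DONE = 0        # hypothetical code: a complete header block has been parsed
NEED_MORE = -1  # hypothetical code: the caller should feed more data


class IncrementalHeaderParser:
    """Accumulate string chunks until the blank line ending the headers arrives."""

    def __init__(self) -> None:
        self._buffer = b""
        self.start_line = None
        self.headers = None

    def add(self, chunk: bytes) -> int:
        """Feed one chunk; return DONE or NEED_MORE."""
        self._buffer += chunk
        end = self._buffer.find(b"\r\n\r\n")
        if end == -1:
            return NEED_MORE
        lines = self._buffer[:end].decode("iso-8859-1").split("\r\n")
        self.start_line = lines[0]
        self.headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            self.headers[name.strip().lower()] = value.strip()
        return DONE


parser = IncrementalHeaderParser()
first = parser.add(b"GET / HTTP/1.1\r\nHost: exa")   # header block incomplete
second = parser.add(b"mple.com\r\n\r\n")             # blank line arrives
```

<p>The caller owns the socket and the read loop; the parser only ever sees strings, which is the property that makes this style easy to drop into different servers.</p>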
<p>So, as I move forward with making HTTP::Parser a more generally-useful piece of code, these are my competition and hopefully inspiration. I&#8217;d like to see the speed of XS code eventually, but would prefer to make PSGI support an option so that the code is useful in more contexts.</p>
<p>Suggestions always welcome!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dereferenced.com/2009/09/14/perl-module-monday-http-parsing-triple-play/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parsing HTTP Headers</title>
		<link>http://www.dereferenced.com/2009/09/13/parsing-http-headers/</link>
		<comments>http://www.dereferenced.com/2009/09/13/parsing-http-headers/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 04:53:07 +0000</pubDate>
		<dc:creator>rjray</dc:creator>
				<category><![CDATA[CPAN]]></category>
		<category><![CDATA[GitHub]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://www.dereferenced.com/?p=51</guid>
		<description><![CDATA[So, I&#8217;ve volunteered to co-maintain the HTTP::Parser CPAN module. I did this because I&#8217;ve been looking for something I can use in RPC::XML::Server instead of my current approach, which is to rely on the parsing capabilities built in to HTTP::Daemon. This is somewhat clumsy, and definitely over-kill; I only have to do this in cases [...]]]></description>
			<content:encoded><![CDATA[<p>So, I&#8217;ve volunteered to co-maintain the <a href="http://search.cpan.org/dist/HTTP-Parser/">HTTP::Parser</a> CPAN module. I did this because I&#8217;ve been looking for something I can use in <a href="http://search.cpan.org/dist/RPC-XML/lib/RPC/XML/Server.pm">RPC::XML::Server</a> instead of my current approach, which is to rely on the parsing capabilities built in to <a href="http://search.cpan.org/dist/libwww-perl/lib/HTTP/Daemon.pm">HTTP::Daemon</a>. This is somewhat clumsy, and definitely over-kill; I only have to do this in cases where the code is <em>not</em> already running under HTTP::Daemon or Apache. If the code is already using HTTP::Daemon, then it has its own <code>accept()</code> loop it can use, and if the code is running under Apache then the request object has already parsed the headers.</p>
<p>My need comes when the code is not in either of these environments: it has to be able to take the socket it gets from a typical TCP/IP-based <code>accept()</code> and read off the HTTP request. To avoid duplicating code, I trick the socket into thinking that it&#8217;s an instance of HTTP::Daemon::ClientConn, which is itself just a GLOB that&#8217;s been blessed into that namespace for the sake of calling methods. So it works. But it makes the code dependent on having HTTP::Daemon loaded, even when the user is not utilising that class for the daemon functionality of the server. I&#8217;ve needed to drop this for a while, now.</p>
<p>(I&#8217;m not impugning HTTP::Daemon or the <a href="http://search.cpan.org/dist/libwww-perl/">libwww-perl</a> package itself&#8211; both are excellent and I utilise them extensively within this module. But if you are not running your RPC server under HTTP::Daemon, then you probably would prefer to not have that code in memory since you aren&#8217;t really using it.)</p>
<p>Thing is, you can use the request and response objects without having to load the user-agent or daemon classes. But there isn&#8217;t an easy, clean way to use just the header-parsing part of the code by itself. The ClientConn class has a <code>get_request()</code> method that can be instructed to parse only the headers and return the HTTP::Request object without the body filled in. The content of the request can then be read off of the socket/object with <code>sysread()</code>. This is why I use the minor hack that I do.</p>
<p>What I <em>want</em> is to be able to parse out the headers without the ugly hack, and without loading all of HTTP::Daemon just so I can call one subroutine (albeit 200+ lines of subroutine). (And to be fair, I also call the <code>read_buffer()</code> routine after the header has been read, to get any content that was already read but not part of the header.) So I came across HTTP::Parser. It has a lot of promise, but it&#8217;s not <em>quite</em> where I need it to be. For one thing, it won&#8217;t stop at just parsing the headers. This is something I need, for cases where the user wants to spool larger elements of a message to disk or for handling compressed content. But most of all, it seemed to not be in active maintenance&#8211; there were two bugs in <a href="http://rt.cpan.org">RT</a> that had been sitting there, with patches provided, for over a year.</p>
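<p>The behaviour described here (stop after the headers, and hand back whatever body bytes were already read) can be sketched in Python as follows. The function name <code>parse_head</code> and its return shape are assumptions for illustration only; this is neither HTTP::Daemon&#8217;s nor HTTP::Parser&#8217;s code, and it skips details such as folded headers and repeated header names.</p>

```python
def parse_head(buffer: bytes):
    """Parse only the request line and headers of an HTTP request.

    Returns None while the blank line ending the headers has not arrived;
    otherwise returns ((method, target, version), headers, leftover), where
    leftover holds any body bytes that were already in the buffer.
    """
    head, separator, leftover = buffer.partition(b"\r\n\r\n")
    if not separator:
        return None
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return (method, target, version), headers, leftover


raw = b"POST /RPC2 HTTP/1.1\r\nContent-Length: 11\r\n\r\nhello"
request_line, headers, leftover = parse_head(raw)
# The caller decides what to do with the rest of the body, e.g. spool it to disk.
still_to_read = int(headers["content-length"]) - len(leftover)
```

<p>Returning the leftover bytes separately plays the same role as the <code>read_buffer()</code> call mentioned above: nothing that was read past the headers is lost, and the body is never parsed unless the caller asks for it.</p>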
<p>Fortunately, an e-mail to the author let me offer to help out, and he accepted. The code was not in any repository, so I set up a repo on GitHub for it <a href="http://github.com/rjray/http-parser">here</a>, and seeded it with the four CPAN releases so that there would be something of a history to fall back on. I&#8217;ve applied the patches (well, applied one, and implemented the other with a better solution) and pushed the changes.</p>
<p>Now, I have to decide how to move forward with this, how to make it as efficient as (or more so than) the code in HTTP::Daemon, how to make it into something I can use in RPC::XML::Server to eliminate the unsightly hack I have to rely on currently.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dereferenced.com/2009/09/13/parsing-http-headers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Perl Module Monday: Plack</title>
		<link>http://www.dereferenced.com/2009/09/07/perl-module-monday-plack/</link>
		<comments>http://www.dereferenced.com/2009/09/07/perl-module-monday-plack/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 05:38:45 +0000</pubDate>
		<dc:creator>rjray</dc:creator>
				<category><![CDATA[GitHub]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Web Services]]></category>
		<category><![CDATA[module-monday]]></category>

		<guid isPermaLink="false">http://www.dereferenced.com/?p=48</guid>
		<description><![CDATA[This will be a slightly unusual installment of PMM, as I want to look at a module so new that it isn&#8217;t actually on CPAN yet, just GitHub: Plack. (When it makes it to CPAN, it should be here.)
Plack is a reference implementation of the burgeoning PSGI initiative. What is PSGI? Well, if you follow [...]]]></description>
			<content:encoded><![CDATA[<p>This will be a slightly unusual installment of PMM, as I want to look at a module so new that it isn&#8217;t actually on CPAN yet, just <a href="http://github.com/">GitHub</a>: <a href="http://github.com/miyagawa/Plack/tree/master">Plack</a>. (When it makes it to CPAN, it should be <a href="http://search.cpan.org/dist/Plack">here</a>.)</p>
<p>Plack is a reference implementation of the burgeoning <a href="http://github.com/miyagawa/psgi-specs/blob/master/PSGI.pod">PSGI</a> initiative. What is PSGI? Well, if you follow that link you&#8217;ll get a more complete explanation, but the short form is that it is a Perl alternative to Python&#8217;s <a href="http://www.wsgi.org/">WSGI</a> (Web Server Gateway Interface) and Ruby&#8217;s <a href="http://rack.rubyforge.org/">Rack</a>. The longer form is that it&#8217;s a specification layer to decouple web applications from the specifics of how they&#8217;re being run, whether that&#8217;s CGI, FastCGI, Apache with mod_perl, etc.</p>
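<p>Since PSGI is described here by analogy to WSGI, a tiny WSGI example in Python may help show what &#8220;decoupling&#8221; means in practice. This is a sketch of the WSGI side only, not of PSGI or Plack; the helper <code>call_app</code> is a made-up stand-in for a real server and builds a deliberately incomplete environment.</p>

```python
def app(environ, start_response):
    """A WSGI application: it sees only the environ dict and a callback."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call_app(application, path):
    """Hypothetical stand-in for a server: build an environ, call the app."""
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = status
        captured["headers"] = response_headers

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(application(environ, start_response))
    return captured["status"], body


status_line, payload = call_app(app, "/demo")
```

<p>The application never learns whether it was invoked by a CGI wrapper, a FastCGI process or an embedded interpreter; anything that can build the environment and collect the response can run it.</p>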
<p>Back to Plack: Plack is the first reference implementation of the PSGI spec, and already it can pass all of the Catalyst tests. And as of <a href="http://github.com/miyagawa/Plack/commit/6c51fb7c27d74fe25d9c460237c0ae664f33a2e7">this commit</a>, the plackup script can coerce an app written for Catalyst, CGI, etc. into running under different environments, thanks to the magic of PSGI.</p>
<p>I&#8217;ll be watching Plack very closely. I see a PSGI connector for my XML-RPC server in the not-too-distant future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dereferenced.com/2009/09/07/perl-module-monday-plack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
