| Subcribe via RSS

Perl Module Monday: IMDB::Film

October 3rd, 2011 | 2 Comments | Posted in CPAN, Perl

For this week’s PMM, I’m going to go with something a little more fun: the IMDB::Film module. Though, to be fair, I’ll be offering it up with some caveats and reservations.

Still, I’m a huge fan of movies; I try to see a new film every week or two, and my DVD collection has out-grown two different shelves. I’ve even gone so far as to get an Android app on my phone (Packrat) for the sole purpose of keeping track of my collection so that I don’t impulse-buy something I already have (usually because I’ve found it on sale). And don’t get me started on slowly replacing my most-favorite films with Blu-Ray copies! Anyway, I’ve also been a huge fan of the IMDb web site since it first got its start. But they don’t offer an API to their data (which I find strange, given their huge reliance on open-source software and user-generated content). Until and unless they see the error of their ways, we’ll have to get by with modules like IMDB::Film, which does a lot of the heavy-lifting when it comes to screen-scraping IMDb.

The IMDB::Film class (and the companion IMDB::Persons class) handles all the page-fetching and parsing that you would otherwise have to do, and presents you with a reasonably-encapsulated object representing an IMDb film (or person). Based on the criteria you give it, it either goes directly to the necessary page, or it does a search and returns you the first matching record (along with enough additional information to get the remaining matched records). For example, the snippet here:

use IMDB::Film;

my $film = IMDB::Film->new(crit => 'Harry Potter');

This returns as the match in $film, “Harry Potter and the Sorcerer’s Stone”. And calling $film->matched(), you get an array-reference to the 43 (!) total matches for the string, “Harry Potter”. Part of each hash-reference in those 43 slots is the IMDb key for the given title, meaning you can fetch the subsequent titles without first going to the search form:

my $other_film = IMDB::Film->new(crit => $film->matched->[0]->{id});

This will go directly to that page and fill in $other_film with the info from it. Read the docs for the class to see the other accessors you can call, and see the docs for the IMDB::Persons class for what you can do with it. In particular, the cast() method on a film object will give you a list-reference of hash-references, one key of which is the IMDb ID for each cast member. You can use this to get their page info with IMDB::Persons.

Now, the dreaded caveats and reservations:

  • The current version (0.51 as of this writing) has left some debugging lines in the code, so calls to new() (in both the ::Film and ::Persons classes) send cruft to STDOUT.
  • And, by the way, why call one class “Film” (singular) and the other class “Persons” (plural)? I consider that bad design.
  • The cast() method only lists the cast that are listed on the main page of the film’s IMDb entry. In the Harry Potter example, this means only the first 15 people, most of whom are actually minor players.
  • In general, there seems to be no deeper-drilling for any information— you can get the short bio for an actor, but not the full bio for example.
  • You can get URLs for certain of the data elements (images, etc.), but not for the full page itself. If I wanted to extract data for Tom Cruise, for example, then render that data along with a link back to the IMDb page for him, I cannot get that URL from the IMDB::Persons record for Tom Cruise. This despite the fact that it had to have fetched that URL to get the data.

There are other minor nits, but those are the high points. I will be watching this module, to see if any of these get addressed (and I opened an RT ticket for the errant debugging messages, hopefully that will be addressed in the next release). But while I may seem to be harsh on it, I still think it’s a useful little module, and worth playing around with. Scraping IMDb is no small task, and I’m glad someone is doing the grunt-work of keeping up with their content-layout changes.

Tags: , ,

Perl Module Monday: Object::Tiny (and friends)

September 26th, 2011 | 3 Comments | Posted in CPAN, Perl

Seriously, I’m going to have to create a new tag with “module” in the plural form, at this rate. This week’s post started out just looking at one module, but as it happens there are three variant forms of it that are just as interesting and useful as the original. So it would be highly unfair to leave any of them out.

Let’s start with the module at the core of it all, Object::Tiny. Object::Tiny is exactly as the name implies: tiny. With the POD it’s just under 9000 bytes, and not counting the POD the current version (1.08) is 553 bytes. I like the *::Tiny modules, I like seeing people getting the most functionality in a truly modular way with a minimum of code and no dependencies. Object::Tiny gives you a truly minimal way to create objects with simple (read-only) accessors. It also gives you a dead-simple constructor if you don’t already have one (well, most of the time— I’ll come back to this in a bit). And it does it all with an extremely small footprint and a minimum of intrusiveness.

Some might be wondering why you would need this— after all, the object you create from Object::Tiny is little more than a hash-reference with delusions of grandeur: the storage is just a basic hash reference (no extra keys or meta-data), the accessors are read-only by design (but see the bit on related modules further down) no additional methods besides an inheritable new() are provided, etc. But it’s a hash-reference that can call methods, for one thing. And your users don’t need to know that it is just a lowly hash-ref under the hood. Plus, there are plenty of applications for read-only data structures. Indeed, at its core, functional programming calls for immutable data. Writing methods to effect changes by returning new objects with the updated member values would be pretty trivial. You could, for example, define a clone() operation that allows updated values to be passed in as:

sub clone {
    my $self = shift;

    return __PACKAGE__->new(%{$self}, @_);
}

(For the less-experienced Perl users, this calls the new() of the package that the code is in, with the contents of the existing object flattened from a hash to an array. In addition to that, it also passes any arguments to clone() itself. Because of the way array/hash flattening/conversion works, anything in the arguments will override similar-named keys in the original hash, effectively “updating” those keys.)

Of course, this being CPAN, you don’t need to do that. You just need to consider one of the alternatives:

  • Object::Tiny::RW – This variant creates accessors that are read/write, by slightly altering the code that is generated. Accessors will then be able to accept an argument that, if present, becomes the new value for the key: $obj->foo(2) sets the “foo” key to 2. Of course, no type-checking is done.
  • Object::Tiny::Lvalue – This variant also creates read/write accessors, but rather than using the “$obj->foo(2)” syntax it creates the accessors as lvalue methods, allowing you to do the same thing with “$obj->foo = 2“.
  • Object::Tiny::XS – This variant breaks the “tiny” rule slightly by depending on Class::XSAccessor. It generates read-only accessors and the inheritable new() method using Class::XSAccessor. Interestingly, it does not generate read/write accessors like the other two variants do, even though Class::XSAccessor provides a simple alternative that does this. That might be worthy of a feature-request, if someone feels strongly about it.

Since the usage syntax of these is all identical (well, except for the lvalue variant), one could even prototype with one module and switch to another later on if so needed. Particularly with Object::Tiny vs. Object::Tiny::XS, since both adhere to the read-only model of the generated accessors.

One thing I have noticed in the logic of Object::Tiny that seems to be reflected in all three of the variants, is: the class is only added to your (calling) class’ inheritance hierarchy if you do not already have something in your @ISA variable. If you do, it will not add itself to your inheritance path. This means that you won’t get an inherited constructor in cases where you might be expecting to. For example, your class might utilize some form of an exporter and have that in the @ISA array. But it doesn’t provide you a constructor. If you were expecting Object::Tiny to provide the constructor, you may be surprised when it doesn’t. Mind you, the constructor it provides is dead-simple, but just be sure to know this behavior before it catches you off-guard. This behavior isn’t a bug, either, because it makes sense that if your class inherits from another class, it is probably either inheriting a constructor or providing its own constructor as an override. So I don’t consider this a problem with Object::Tiny, it’s just something to be aware of when using it.

Object::Tiny (and friends): for your minimalist class-construction needs!

Tags: , ,

Perl Module Monday: Desktop::Notify

September 19th, 2011 | No Comments | Posted in CPAN, Perl

Boy am I slipping (again). Been a few weeks since I wrote at all, and on top of that I can’t seem to type AT ALL right now, so who knows how long it’ll take me to write this (and whether it actually gets posted while it is still Monday).

For this week, I’m going to continue on the theme of my previous feature, only this time for Linux systems: Desktop::Notify. If you use a GNOME desktop that is fairly recent in release, then you probably are already familiar with the DBus messages that various applications pop up from time to time; chat clients, web browsers, etc. Many of the current-generation graphical apps have some sort of notification needs, and if they are running on a desktop that has DBus notifications they are probably using this system. So in comes this module, to let you also take part.

It’s quite an easy module to use, and the manual page is pretty reasonable given the overall simplicity. It’s even simple-enough to use in a one-liner:

perl -MDesktop::Notify -e 'Desktop::Notify->new->create(
    summary => "Desktop::Notify", body => "A notification...",
    timeout => 3000)->show'

(Broken up for line-length and clarity, of course.) Something like that could be easily incorporated into a shell script (though the “notify-send” utility can do that just as easily).

But if you are looking for a simple way to send messages to the user in a fairly unobtrusive fashion, have a look at this!

Tags: , ,

Perl Module Monday: Growl::Tiny

August 29th, 2011 | 2 Comments | Posted in CPAN, Perl

I’m a fairly-new MacOS user. I’ve had this MacBook Pro since just a few days before this last OSCON, and I’m growing more and more fond of it by the day. So for this week’s PMM, I’m going to look at a Mac-centric module, apologies in advance to those who have no use for this— there’s something similar for Linux desktops that I might explore in the future.

Growl::Tiny is the latest (it seems) entry into the Growl-glue arena, and certainly the most-recently-updated. As the author states, he had run into problems with the prerequisites for modules such as Mac::Growl, so he wrote this as a solution. In following with the “tiny” convention, the module has no prerequisites. It only requires that you have /usr/local/bin/growlnotify installed (as well as Growl itself, of course). I found that it built and installed just fine on my system, with only one hitch that I’ll come back to later.

So what does it do? Well, it makes Growl notifications! If you are a MacOS user and you don’t know about Growl, you should probably get over to http://growl.info/ first and check it out. Seriously, it’s a terribly useful utility. Once you’ve done that, this module will let you, in a fairly lightweight fashion, use Growl to send desktop notifications to your users. It’s the “lightweight” part that appeals to me, as the older Growl libraries seemed to be dependent on some other modules that haven’t been kept up-to-date with the changes in MacOS. This one gets around this by opting to use a command-line utility (the aforementioned growlnotify) instead of native bindings. It’s not without limitations, as the author points out in his documentation. But then, if you’re sending several notifications per second, you might need something more substantial than this anyway, so the limitations might not be of concern.

The only issue I had with it was that two tests will fail if you don’t set up Growl to listen for localhost-directed network connections. The author recommends using the network feature in the docs, but it shouldn’t be a requirement for building and installation. A more graceful way of detecting that network connections are not enabled, and skipping the tests, would be preferable.

Tags: , ,

Perl Module Monday: Carp::Always

August 22nd, 2011 | 1 Comment | Posted in CPAN, Perl

(With special bonus-module this week!)

Ever run into an error or warning in someone else’s module, and wished you could get more information about it without having to wade through their code? It’s probably almost certain that you have, at least once. At least once, you’ve seen a warning or had a program die, and wished that you could see a stack trace without having to go in and edit the module. After all, the module is probably installed into a system location, and even if you don’t need root/sudo to edit it, you then have the hassle of going back and undoing the changes (or re-doing them, if/when you update the module in question) after your debugging has been done.

Enter Carp::Always. Carp::Always plays around with $SIG{__WARN__} and $SIG{__DIE__}, and gives you stack-traces on every die and warn that come from code. I was introduced to this module by a commenter on my “Chasing an Elusive Warning” post last week. Alas, it didn’t help in that case, as the warning was coming from within Perl itself. But I found the module to be a nice concept, and it’s easy to use on a only-when-you-need-it basis. Have a script that is generating warnings you want more information about?

perl -MCarp::Always script.pl

Carp::Always quietly slips in and make the necessary alteration to the die/warn handlers, and your script runs and does its thing. And when the warnings (or termination) come, you should have your stack trace.

The author does point out that this module may not play well with other modules that alter $SIG{__WARN__} and $SIG{__DIE__}, so there is that to be aware of. But that aside, hopefully this can be of great aid to your next debugging session!

(And as a bonus, here’s another module that does the essentially the same thing: Devel::SimpleTrace. I actually know much less about this one, except that it lists Data::Dumper as a dependency, which leads me to believe that it tries to do some pretty-printing with the data in stack-frames. But unlike Carp::Always, I haven’t used it. Still, if you like the concept, it’s worth checking out both modules and seeing which one you like more.)

Tags: , ,

Perl Module Release: RPC-XML 0.76

August 21st, 2011 | No Comments | Posted in CPAN, Perl, Software
Version: 0.76
Released: Saturday August 20, 2011, 06:30:00 PM -0700

Changes:

  • etc/make_method
  • lib/RPC/XML/Server.pm

RT #70258: Fixed typos in docs pointed out by Debian team.

  • lib/Apache/RPC/Server.pm

Better version of the fix for infinite loops. This is the patch originally suggested by Eric Cholet, who found the bug.

  • t/00_load.t

RT #70280: This test was still testing RPC/XML/Method.pm. Rewrote to remove that but include the (forgotten) XMLLibXML.pm module. That test has to be conditional on the presence of XML::LibXML.

  • Makefile.PL
  • t/51_client_with_host_header.t

Clean up test suite to work with older Test::More. Also specify a minimum Test::More that supports subtest(). This is also a part of RT #70280.

  • t/11_base64_fh.t
  • t/20_xml_parser.t
  • t/21_xml_libxml.t
  • t/40_server.t

These tests had failures when run as root. Permissions-based negative tests were incorrectly passing.

  • t/10_data.t

Moved the 64-bit “TODO” tests to a SKIP block. Non-64-bit systems will skip, rather than fail, these tests.

  • lib/RPC/XML/Server.pm

RT #65616: Fix for slow methods killing servers. Applied and modified patch from person who opened the ticket.

  • MANIFEST
  • lib/RPC/XML.pm
  • t/10_data.t
  • t/14_datetime_iso8601.t (added)

RT #55628: Improve flexibility of date parsing. This adds the ability to pass any ISO 8601 string to the RPC::XML::datetime_iso8601 constructor.

Tags: , , ,

Elusive Warning Found and Quashed!

August 17th, 2011 | No Comments | Posted in Perl

After several hours of puzzling through the XS code for XML::LibXML (and WOW is my XS-fu rusty!), I gave up and filed an RT ticket on it. I gave it all the info I had, including the specific test-suite that was generating the error. One thing I didn’t provide was the location within the test suite, because that seemed to change from platform to platform.

Well, great kudos to Shlomi Fish, who has already found and quashed the bug! It was definitely more complex, and buried more deeply, than I thought. It was related to my use of the XML::LibXML::CallBack class, usage of which was new in this release (there’s an issue with the slightly out-of-date libxml2 package that MacOS X Snow Leopard has, that interfered with a simpler solution, which forced me to use the callback class). Anyway, thanks to everyone who responded! You got me deep-enough into the code that I was able to craft a better bug-report than I might have otherwise. And hopefully, Shlomi is as happy to have the bug be found and dealt with as I am!

Tags:

Chasing a Very Elusive Warning

August 16th, 2011 | 7 Comments | Posted in Perl

I am getting a warning in one of my test suites for RPC-XML. I only get this warning when I run “make test“. I do not get the warning if I run the test directly with “perl -Mblib t/…“, or if I run it via prove. The test this occurs in is t/21_xml_libxml.t, which tests the XML::LibXML-based parser, and the warning I get is:

Use of uninitialized value in subroutine entry at .../site_perl/5.14.1/darwin-2level/XML/LibXML.pm line 843.

Because this is occurring within Perl itself, I cannot seem to get it to give me a stack-trace in place of the warning. My attempts to check for undefined values have not found any, yet I get the warning. Running in the debugger does not generate the warning.

Because my tests pass, I have confidence that this parser is fine, despite this only-in-test-harness warning. But the OCD side of me is going crazy over having this still show up when I run my tests locally (and yes, it shows up on all my platforms: Darwin, 64-bit Linux and 32-bit Linux). So, if anyone reading this can give me a helpful hint as to how I can get more/better diagnostic information on this, I would be truly grateful.

Tags: ,

How To Tell That CPAN Testers Is Doing Its Job

August 16th, 2011 | 1 Comment | Posted in CPAN, Perl

…or, “How To Generate Well Over 100 Failure Reports With Just One Release.”

Ever wonder whether CPAN Testers was really gathering data for you? Ever wonder whether you’re getting the daily summary emails you signed up for? Here’s how you can tell, in just a few (moderately) easy steps:

  1. Start with a fairly-robust CPAN distribution. Let’s call it “RPC-XML“, just for argument’s sake.
  2. Have it include a test suite like this one, that test-loads all the classes.
  3. Remove one of the classes, let’s say, “RPC/XML/Method.pm“.
  4. This is the important part, don’t change the test suite.
  5. Dutifully run your tests, note 100% pass rate due to the fact that you have a perfectly usable RPC/XML/Method.pm from an existing installation somewhere under /usr/lib/perl or /usr/share/perl.
  6. Release, and wait. The reports will come rolling in within just a few short days.
  7. Now you know that the emails really do get sent reliably!

So, technically this is an issue with the use_ok() predicate in Test::More. And I’ve opened an issue on this, in GitHub. But this may very well be a case of, “Doctor, it hurts when I do this.” And if Schwern decides that it isn’t a bug, I’ll have a hard time arguing with him. But hopefully he’ll agree with my suggestion, and maybe someone, someday, will be spared falling into the same hole I’ve very recently fallen into.

Tags: ,

No PMM Post This Week

August 15th, 2011 | 1 Comment | Posted in Perl

I haven’t had time to research a new module for Perl Module Monday this week, sorry. I have a few modules I’d like to write about, but just haven’t had the time to look at them in-depth. I am looking at doing a meta-PMM post on export/import modules, given that my last few PMM’s have generated a reasonable amount of feedback. But I’m not ready to do that one yet, either, and I’d prefer to not do something as a half-measure.

Tags: ,