
Perl Module Monday: IMDB::Film

October 3rd, 2011 Posted in CPAN, Perl

For this week’s PMM, I’m going to go with something a little more fun: the IMDB::Film module. Though, to be fair, I’ll be offering it up with some caveats and reservations.

Still, I’m a huge fan of movies; I try to see a new film every week or two, and my DVD collection has outgrown two different shelves. I’ve even gone so far as to get an Android app on my phone (Packrat) for the sole purpose of keeping track of my collection, so that I don’t impulse-buy something I already have (usually because I’ve found it on sale). And don’t get me started on slowly replacing my favorite films with Blu-Ray copies!

Anyway, I’ve also been a huge fan of the IMDb web site since it first got its start. But they don’t offer an API to their data (which I find strange, given their huge reliance on open-source software and user-generated content). Unless and until they see the error of their ways, we’ll have to get by with modules like IMDB::Film, which does a lot of the heavy lifting when it comes to screen-scraping IMDb.

The IMDB::Film class (and the companion IMDB::Persons class) handles all the page-fetching and parsing that you would otherwise have to do, and presents you with a reasonably encapsulated object representing an IMDb film (or person). Based on the criteria you give it, it either goes directly to the necessary page, or it does a search and returns the first matching record (along with enough additional information to get the remaining matched records). For example, take this snippet:

use IMDB::Film;

my $film = IMDB::Film->new(crit => 'Harry Potter');

This puts the first match, “Harry Potter and the Sorcerer’s Stone”, in $film. Calling $film->matched() then gives you an array reference to all 43 (!) matches for the string “Harry Potter”. Each of the hash references in those 43 slots includes the IMDb key for its title, meaning you can fetch any of the matched titles without going through the search form again:

my $other_film = IMDB::Film->new(crit => $film->matched->[0]->{id});

This will go directly to that page and fill in $other_film with the info from it. Read the docs for the class to see the other accessors you can call, and see the docs for the IMDB::Persons class for what you can do with it. In particular, the cast() method on a film object will give you a list-reference of hash-references, one key of which is the IMDb ID for each cast member. You can use this to get their page info with IMDB::Persons.
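As a rough sketch of that cast-to-person hand-off: the snippet below assumes the hash-references from cast() carry id, name and role keys, and that status() reports whether a lookup succeeded, as the module docs describe. The cast_lines() helper and the command-line handling are my own illustration, not part of the module.

```perl
use strict;
use warnings;

# Turn a cast list (array-ref of hash-refs, assumed to have id, name
# and role keys) into printable lines. Hypothetical helper.
sub cast_lines {
    my ($cast) = @_;
    return map {
        sprintf '%s as %s [%s]', $_->{name}, $_->{role} || 'unknown', $_->{id}
    } @{ $cast || [] };
}

# Network part: runs only when a search string is given on the command line.
if (defined(my $crit = shift @ARGV)) {
    require IMDB::Film;
    require IMDB::Persons;

    my $film = IMDB::Film->new(crit => $crit);
    die "Lookup failed for '$crit'\n" unless $film->status;

    print $film->title . "\n";
    print "  $_\n" for cast_lines($film->cast);

    # Follow the first cast member's ID over to IMDB::Persons.
    my $first = $film->cast->[0];
    if ($first) {
        my $person = IMDB::Persons->new(crit => $first->{id});
        print 'First listed: ' . $person->name . "\n" if $person->status;
    }
}
```

Saved as, say, cast.pl, it would be run as `perl cast.pl 'Harry Potter'`. Without an argument it does nothing, which keeps the formatting helper usable on its own.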

Now, the dreaded caveats and reservations:

  • The current version (0.51 as of this writing) still has some debugging lines left in the code, so calls to new() (in both the ::Film and ::Persons classes) send cruft to STDOUT.
  • And, by the way, why call one class “Film” (singular) and the other class “Persons” (plural)? I consider that bad design.
  • The cast() method only returns the cast members listed on the main page of the film’s IMDb entry. In the Harry Potter example, this means only the first 15 people, most of whom are actually minor players.
  • In general, there seems to be no deeper drilling for any information: you can get the short bio for an actor, for example, but not the full bio.
  • You can get URLs for certain of the data elements (images, etc.), but not for the full page itself. If I wanted to extract data for Tom Cruise, for example, then render that data along with a link back to the IMDb page for him, I cannot get that URL from the IMDB::Persons record for Tom Cruise. This is despite the fact that the module had to fetch that URL to get the data.
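One possible workaround for the missing page URL is to rebuild it from the numeric ID. This is only a sketch: imdb_person_url() is a hypothetical helper, and the /name/nm<ID>/ pattern is an assumption based on how IMDb person pages are addressed, not something the module exposes, so it would break if the site changed its URL scheme.

```perl
use strict;
use warnings;

# Build a link back to a person's IMDb page from the numeric ID.
# ASSUMPTION: person pages live at /name/nm<7+ digit ID>/ on imdb.com.
sub imdb_person_url {
    my ($id) = @_;

    # Reject anything that is not a plain run of digits.
    return unless defined $id && $id =~ /\A\d+\z/;

    # Zero-pad to at least seven digits, the width used in IMDb IDs.
    return sprintf 'http://www.imdb.com/name/nm%07d/', $id;
}
```

Passing in the id value from a cast() entry (or whatever ID you used as the crit for IMDB::Persons) should give you something you can render as a link next to the scraped data.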

There are other minor nits, but those are the high points. I will be watching this module to see if any of these get addressed (I opened an RT ticket for the errant debugging messages, so hopefully that will be fixed in the next release). But while I may seem to be harsh on it, I still think it’s a useful little module, and worth playing around with. Scraping IMDb is no small task, and I’m glad someone is doing the grunt-work of keeping up with their content-layout changes.

2 Responses to “Perl Module Monday: IMDB::Film”

  2. Robert Sedlacek Says:

    There’s also TMDB, which provides an API and a bit more free licensing IIRC. I haven’t really used it yet though, so I can’t comment on the designs of the interfaces on CPAN.
