February 13th, 2014
| | Posted in CPAN, Perl
Or, How I Learned It (Mostly) Wasn’t My Fault
So, in my previous post I noted how my reward for getting back on the horse had been a slew of failure reports from the CPAN Testers service. This is a follow-up on what I’ve learned these last 6 days.
First, Do Not Panic
You’ve just gotten a barrage of FAIL reports from CPAN Testers. Don’t get yourself agitated over this. Think about it: You’ve just been handed (free of charge) the results of someone else taking the time to test your product for you, probably on a combination of architecture/OS/Perl that you didn’t have available to test yourself. This is a good thing for you. Once upon a time, you wouldn’t have had this until someone was actually using your code in the field. Then the report you got would have been from someone annoyed and/or angry over breakage. Instead, you’ve gotten some number of reports that look like this, which can hopefully help you zoom in on the issue.
I (kind of) panicked. Fortunately, after someone responded to my previous post I was able to calm down and start looking at the content of the reports more rationally.
Second, Figure Out What’s Gone Wrong
Look, you wrote the module in question. This means that you are almost certainly the best-suited to determine what went wrong. (Not always, as I’ll point out a little later, but probably.) Look at the reports carefully, see what tests failed. Look at those tests yourself and see what could have tripped them up. Look for counter-examples, or noteworthy differences. In my case, the majority of my test failures were caused by my server class (RPC::XML::Server) failing to instantiate because (it claimed) the address it was told to bind to was in use. This didn’t make sense, as I had code I was using that actually scanned for a port that wasn’t in use. The port that the server was trying to bind to had, just moments before, been judged available by virtue of the fact that a raw socket connection to it had failed. So what was wrong? Why was it failing? And more interestingly, why was it failing in these cases, but not in the dozens of PASS reports that I had also racked up?
I noticed that all the failing cases had something in common: a comment in the test output stream that a package called Acme::Override::INET was monkey-patching IO::Socket::IP in place of IO::Socket::INET. I had my first big clue.
Third, Prepare an Environment for Testing
If you aren’t using perlbrew, then either you aren’t testing as well as you could be or you’re spending way more effort on your testing than you need to be. I first started using it because the Perl that comes with Mac OS is older, and I wanted to be able to install and update CPAN modules more easily. Then I started keeping a half-dozen or so different additional installations of Perl around, at various versions with and without threading. If you’re developing modules, it’s an invaluable tool for testing your modules across different versions. It also meant that I could install Acme::Override::INET and IO::Socket::IP in a Perl other than my “main” one, so that I didn’t upset my primary work/development environment.
So I did just that; I picked a version of Perl that I had that was identical to one that had failed in one of the reports, and installed those two modules.
Fourth, Learn What You Can
Suddenly, I too was getting the exact same failures in the exact same places. This was great! I started throwing in some debugging statements, and verified that in each case a previous server allocation had succeeded but the next allocation (which was re-using the same port) would fail. For some reason, IO::Socket::IP seemed to be causing sockets (or at least the port allocations) to hold on beyond the life of the socket. I even managed to show that the error occurred over the process boundary, when I ran a test twice in rapid succession and the second run failed in the first server allocation! When I removed Acme::Override::INET and IO::Socket::IP, it immediately started working again. Same tests, same code, same Perl executable… the only difference was IO::Socket::IP.
(I should note here that un-doing the installation of Acme::Override::INET is tricky. It installs a “fake” IO/Socket/INET.pm file, but it uses Module::Build to do so. Unless you have M::B configured to use the same installation locations as ExtUtils::MakeMaker, M::B will install in a directory that is earlier in your search path than the directory that E::MM uses. So re-installing the IO module will not overwrite the IO::Socket::INET fake that Acme::Override::INET installed. I had to delete all four files that were installed.)
Fifth, Do Something with Your New Knowledge
So I opened up an RT ticket for IO::Socket::IP, and explained my situation at length in it. The module’s author replied and pointed out that the way I was selecting a port was much more work than I needed to do in that case. See, I’m not actually very adept at network programming. That’s actually an understatement… if it weren’t for HTTP in general, and HTTP-related Perl modules like LWP in particular, I would probably not have a CPAN module to my name that is so network-centric. Anyway, I at least felt at this point that there wasn’t anything wrong with RPC::XML::Server. But based on the suggestions from the IO::Socket::IP author, I changed most of my server-oriented tests to let the kernel allocate a random port instead of me selecting one. Now the tests all pass, even with the newer module and the monkey-patch installed. Mind you, there is still an issue with IO::Socket::IP, but I can’t help there. I can only hope that the information I’ve provided helps him out.
Sixth, and This Is Important, Be Content With It
It’s frustrating to know that there will still be some FAILs coming in, but I can’t help that. I can be happy with the fact that my tests are fixed, that there wasn’t actually anything wrong with my networking code (because I’d be close to lost if I had to go into a deep dive to troubleshoot that part). There are still things to fix and add to the RPC::XML package, but my server tests are all now a bit more robust than they were this time last week. I’ve found and reported an issue that will hopefully help another CPAN author. And, importantly for me, I’ve stayed engaged in my code for this last week. That’s been hard for me to do for a long time now, and it feels really good to have done it. Even though it’s only been about 5-6 hours of effort spread across that many days, I’m happy about it, and that will hopefully mean that the next 5-6 hours will come more easily.