
Understanding Encoding (Trying To, At Least)

March 3rd, 2014 | 2 Comments | Posted in Perl, XML

Earlier this evening there was a presentation at the San Francisco PM group, on “Unicode & Everything”. I wanted to go, but I had a conflict and had to miss it.

Character encoding is an area I’m weak in, and one that I need to be better at. My biggest module, RPC::XML, supposedly supports encodings other than us-ascii but in truth it’s pretty broken. I recently applied a patch that fixes the handling of UTF-8 content, but that’s not what I need. What I need is for it to properly handle content in (theoretically) any encoding. I don’t think that a talk focused on Unicode would have covered that, but I was hoping that I might be able to corner the speaker afterwards to bounce some questions off of him.

What it comes down to is this: my library creates requests and responses in well-formed XML, complete with an encoding attribute in the declaration line:

<?xml version="1.0" encoding="us-ascii"?>
<methodCall>
    <methodName>someName</methodName>
    <params>
        <param>
            <value><string>Some string data</string></value>
        </param>
        <param>
            <value><int>42</int></value>
        </param>
    </params>
</methodCall>

What matters here is not the structure of the XML (in this case); it’s the encoding="us-ascii" part, and this line:

<value><string>Some string data</string></value>

See, my library generates the XML around the “Some string data”, but the string data itself comes from whatever the user provides, and the user expects that to be in the encoding he or she specified. And here is where I start to get confused: I know that the boilerplate code is US-ASCII (in the range that makes it passable as UTF-8), but I suspect that I can’t just paste in a string encoded in ShiftJIS and slap on encoding="shiftjis" in the XML declaration. Or can I?

XML-RPC has a very limited vocabulary and set of data-types. The character range, funny-encoded-strings notwithstanding, is just basic ASCII. You have the tags, then strings, integers, doubles, date/time values (ISO8601) and base-64 data. Regardless of encoding, all of these except the strings stay in the ASCII range.

So for those reading this who are more adept at working with encodings than I am: how should I approach this? Is the magic sauce somewhere in Perl’s Encode module? I really want to get this part of the RPC::XML module working right, so I can move on to the next big hassle, data compression…
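For what it's worth, here is one possible shape of an answer, as a minimal sketch rather than anything RPC::XML actually does. It assumes the caller's data can first be decoded into Perl characters, and that the whole document (boilerplate plus payload) is then encoded exactly once into the encoding named in the declaration. The helper name build_string_value and the sample text are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of RPC::XML): wrap a character string in a
# minimal <value><string> fragment and serialize it in the named encoding.
sub build_string_value {
    my ($text, $encoding) = @_;

    # Escape XML metacharacters while the data is still a character string.
    my %escape = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    (my $escaped = $text) =~ s/([&<>])/$escape{$1}/g;

    my $xml = qq{<?xml version="1.0" encoding="$encoding"?>\n}
            . qq{<value><string>$escaped</string></value>\n};

    # Encode boilerplate and payload together, exactly once. FB_CROAK makes
    # characters that the target encoding cannot represent a fatal error.
    return encode($encoding, $xml, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Simulate user data arriving as Shift_JIS bytes (three katakana characters).
my $input_bytes = encode('shiftjis', "\x{30C6}\x{30B9}\x{30C8}");

# Decode to characters first, then let the helper encode the full document.
my $characters = decode('shiftjis', $input_bytes,
                        Encode::FB_CROAK | Encode::LEAVE_SRC);
my $document   = build_string_value($characters, 'Shift_JIS');
```

The point of the sketch is that the ASCII boilerplate and the user's string go through the same encoding step, so the declaration and the bytes on the wire cannot disagree.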

(I also need to figure out why my code-highlighting plugin isn’t doing its job…)

(I think I got it now. Something was missing from one of the theme files. Gotta love WordPress/PHP…)


Perl Module Release: RPC-XML 0.78

February 6th, 2014 | No Comments | Posted in CPAN, Perl, Software, XML
Version: 0.78
Released: Thursday February 6, 2014, 08:00:00 PM -0800

Changes:

  • lib/RPC/XML.pm

A patch to loop detection in smart_encode from Dag-Erling Smørgrav. Some other minor bits.

  • lib/RPC/XML/Procedure.pm

RT #83108: Fixed a spelling error. Some other fixes, too.

  • lib/RPC/XML.pm

RT #86187: Force key-ordering in struct as_string and serialize. Was getting some intermittent bug reports of failures in t/15_serialize.t that amounted to the keys in a fault struct not being in consistent order.

  • lib/RPC/XML.pm
  • t/15_serialize.t

Undo the previous change and fix the test. The previous change didn't feel right, so this rolls it back and fixes the problem at the level of the test, instead.

  • Makefile.PL
  • lib/RPC/XML.pm
  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Server.pm

Replace direct evals for loading optional modules with Module::Load. Required adding this to Makefile.PL because Module::Load is not core in 5.8.8. Also did some slight doc tweaking.
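As a rough illustration of that kind of change (a sketch only, not the code that went into the release), an optional module can be probed with Module::Load inside a block eval. The helper name have_optional is hypothetical, and Compress::Zlib is used here just as an example of an optional dependency.

```perl
use strict;
use warnings;
use Module::Load;    # exports load()

# Hypothetical helper: try to load an optional module at run time and report
# whether it is available, without resorting to a string eval.
sub have_optional {
    my ($module) = @_;
    return eval { load($module); 1 } ? 1 : 0;
}

# Example: decide once whether compression support can be offered.
my $can_compress = have_optional('Compress::Zlib');
```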

  • lib/RPC/XML.pm
  • lib/RPC/XML/Client.pm

Merge pull request #5 from alexrj/utf8-encode. Use utf8::encode() instead of utf8::downgrade().
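The difference between the two built-ins can be seen in a small standalone sketch (the sample string is made up, and this is not code from the module). utf8::downgrade can only succeed when every character fits in a single byte, while utf8::encode always works and replaces the characters, in place, with their UTF-8 octets.

```perl
use strict;
use warnings;

# A character string: U+00E9 fits in one byte, U+263A does not.
my $text = "caf\x{e9} \x{263A}";

# utf8::downgrade with a true second argument returns false instead of
# dying when the string contains a character above 0xFF.
my $copy       = $text;
my $downgraded = utf8::downgrade($copy, 1);

# utf8::encode converts the characters in place to UTF-8 octets.
my $bytes = $text;
utf8::encode($bytes);

printf "characters: %d, octets: %d, downgrade ok: %s\n",
    length($text), length($bytes), ($downgraded ? 'yes' : 'no');
```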

  • lib/RPC/XML.pm
  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Server.pm

Finish the utf8 encode vs. downgrade change from the previous commit. Changed in places that were overlooked, and adjusted the version number in all three modules.

  • lib/RPC/XML.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm

Merge pull request #6 from dctabuyz/master. Added 'no_blanks' libxml option to skip blank XML::LibXML::Text nodes.

  • lib/RPC/XML/Server.pm

Merge pull request #7 from kvar/master. Initialize $do_compress in RPC::XML::Server between requests.

  • lib/RPC/XML.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Server.pm

Bump version numbers on modules changed in github pulls.


Perl Module Release: RPC-XML 0.77

September 3rd, 2012 | No Comments | Posted in CPAN, Perl, Software, XML
Version: 0.77

Released: Monday September 3, 2012, 12:00:00 PM -0700

Changes:

  • t/15_serialize.t

Fix a test failure on Windows.

  • lib/RPC/XML.pm

RT #70408: Fix spelling error in man page, reported by Debian group.

  • t/90_rt54183_sigpipe.t

Fix to handle cases where server creation fails. Now skips the tests rather than dying.

  • lib/RPC/XML/Client.pm

RT #67486: Add port to Host header in client requests.

  • lib/RPC/XML/Server.pm

RT #65341: Added “use” of forgotten library File::Temp. This was causing failure when “message_file_thresh” kicked in.

  • t/10_data.t

RT #78602: Changed 64-bit test from use64bitint to longsize. On some systems (such as OS X), use64bitint can be true even when in 32-bit mode.
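Both values come from the core Config module. The snippet below is only a sketch of reading them, not the check used in t/10_data.t; the variable names are made up.

```perl
use strict;
use warnings;
use Config;

# Size of a C long in bytes, as recorded when this perl was built.
my $long_bytes = $Config{longsize};

# Whether perl was configured to use 64-bit integers internally.
my $uses_64bit_ints = (defined $Config{use64bitint}
                       && $Config{use64bitint} eq 'define') ? 1 : 0;

# The two can disagree, so a test about native longs checks the size itself.
my $long_is_64bit = ($long_bytes >= 8) ? 1 : 0;

print "longsize=$long_bytes use64bitint=$uses_64bit_ints\n";
```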

  • t/21_xml_libxml.t

Fix from Christian Walde, skip passed test on Windows.

  • lib/RPC/XML/Server.pm
  • t/40_server.t

Checkpoint refactoring and additional tests. Work is not complete here, but the Net::Server changes demand immediate attention.

  • t/20_xml_parser.t

RT #72780: Check for a possible parser failure. One instance of XML::Parser failing to parse the extern entities test. Cannot reproduce, so wrap it in a “skip” block for now.

  • lib/RPC/XML/Procedure.pm
  • t/30_method.t

RT #71452: Correct handling of dateTime parameters. Existing code in lib/RPC/XML/Procedure.pm did not properly handle parameters of the dateTime.iso8601 type. Also, there were no tests for these.

  • MANIFEST
  • t/30_method.t (deleted)
  • t/30_procedure.t (added)

Renamed t/30_method.t to t/30_procedure.t.

  • lib/RPC/XML/Server.pm

RT #77992: Make RPC::XML::Server work with Net::Server again, after the API changes of Net::Server 2.x.


Perl Module Release: RPC-XML 0.75

August 14th, 2011 | No Comments | Posted in CPAN, Perl, Software, XML

MetaCPAN.org: https://metacpan.org/release/RJRAY/RPC-XML-0.75

Version: 0.75

Released: Saturday August 13, 2011, 05:30:00 PM -0700

Changes:

  • MANIFEST

Somehow, t/13_no_deep_recursion.t never got added to MANIFEST.

  • lib/RPC/XML/Parser/XMLLibXML.pm

RT #65154: Fixed a cut/paste error in an error message.

  • lib/RPC/XML/Client.pm
  • t/51_client_with_host_header.t (added)

RT #68792: Merge pull request #2 from dragon3/master (https://github.com/dragon3). Allow setting of “Host” header, and test suite for it.

  • MANIFEST
  • t/51_client_with_host_header.t

Added new test suite to MANIFEST, fixed spelling. Also added “plan tests” line to the test suite.

  • lib/RPC/XML/Parser/XMLLibXML.pm
  • t/20_xml_parser.t
  • t/21_xml_libxml.t
  • t/41_server_hang.t

Merge pull request #3 from yannk/master (https://github.com/yannk). Expat parser subclass is protected against ext ent attack, libxml isn’t.

  • t/41_server_hang.t

Undo a change to this suite from yannk’s pull.

  • etc/make_method
  • lib/Apache/RPC/Server.pm
  • lib/Apache/RPC/Status.pm
  • lib/RPC/XML.pm
  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Function.pm
  • lib/RPC/XML/Method.pm
  • lib/RPC/XML/Parser.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • lib/RPC/XML/ParserFactory.pm
  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm

More perlcritic-driven clean-up. This is mostly POD sections, but also includes heavy re-working of etc/make_method and parts of lib/RPC/XML.pm.

  • lib/RPC/XML/Parser/XMLLibXML.pm
  • t/21_xml_libxml.t

Fixed external entity handling on MacOS. Also made small change to the test suite to be cleaner.

  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm

Took out warnings on external entities blocking. Now it blocks silently. Also cleaned up some docs.

  • t/15_serialize.t

Additions to increase code coverage in XML.pm.

  • lib/RPC/XML.pm

Turns out this wasn’t exporting RPC_I8.

  • lib/Apache/RPC/Server.pm
  • lib/Apache/RPC/Status.pm
  • lib/RPC/XML.pm
  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Function.pm
  • lib/RPC/XML/Method.pm
  • lib/RPC/XML/Parser.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • lib/RPC/XML/ParserFactory.pm
  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm
  • xt/02_pod_coverage.t

Made 5.8.8 the new minimum-required perl. Also dropped the utf8_downgrade hack, which affected an xt test.

  • lib/RPC/XML/Client.pm

Improved arguments-checking in send_request.

  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • lib/RPC/XML/Server.pm

Fixed error-cases in usage of File::Temp->new(). File::Temp::new croaks on errors, doesn’t return undef like I thought.
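Because the constructor croaks, the failure has to be caught with eval rather than by testing for undef. A minimal sketch of that pattern follows; the helper name open_temp is hypothetical and this is not the module's actual code.

```perl
use strict;
use warnings;
use File::Temp;

# Hypothetical helper: return (handle, undef) on success or (undef, message)
# on failure, trapping the exception that File::Temp->new() throws.
sub open_temp {
    my (%args) = @_;
    my $fh = eval { File::Temp->new(%args) };
    return (undef, "temp file creation failed: $@") unless defined $fh;
    return ($fh, undef);
}

my ($fh, $error) = open_temp();
print {$fh} "scratch data\n" if defined $fh;
```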

  • MANIFEST
  • lib/RPC/XML/Function.pm (deleted)
  • lib/RPC/XML/Method.pm (deleted)
  • lib/RPC/XML/Procedure.pm

Roll Method.pm and Function.pm into Procedure.pm. Remove Method.pm and Function.pm from distro.

  • lib/RPC/XML/Parser/XMLLibXML.pm

Fixed regexp for methodName validation.

  • t/10_data.t
  • t/11_base64_fh.t
  • t/12_nil.t
  • t/15_serialize.t
  • t/20_xml_parser.t
  • t/21_xml_libxml.t
  • t/25_parser_negative.t (added)
  • t/29_parserfactory.t
  • t/30_method.t
  • t/40_server.t
  • t/40_server_xmllibxml.t
  • t/50_client.t
  • t/BadParserClass.pm (added)
  • t/meth_good_1.xpl
  • t/namespace3.xpl
  • t/svsm_text.b64 (added)
  • t/util.pl

First round of Devel::Cover-inspired improvements. These are the changes to the test suites to increase coverage of the code.

  • lib/RPC/XML.pm
  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm

Fixes and such from Devel::Cover analysis.

  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm
  • t/30_method.t
  • t/meth_good_1.xpl
  • t/meth_good_2.xpl (added)
  • t/meth_good_3.xpl (added)

Fixes for file-based method loading/reloading. New tests in the suite, and re-working of the ugliest hacky part of this package.

  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm
  • t/30_method.t
  • t/meth_good_3.xpl

RPC::XML::Procedure test-coverage improvement. Also removed some unneeded code.

  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm
  • t/30_method.t
  • t/40_server.t

Last round of RPC::XML::Procedure test coverage. This is mostly in t/40_server.t, though some bugs were found and addressed in the modules and in t/30_method.t.

  • lib/Apache/RPC/Server.pm
  • lib/Apache/RPC/Status.pm
  • lib/RPC/XML.pm
  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Parser.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • lib/RPC/XML/ParserFactory.pm
  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm

Documentation clean-up and update.

  • lib/Apache/RPC/Server.pm
  • lib/Apache/RPC/Status.pm
  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm

Changes from new Perl::Critic::Bangs policies.

  • xt/01_pod.t
  • xt/02_pod_coverage.t
  • xt/03_meta.t
  • xt/04_minimumversion.t
  • xt/05_critic.t

Adjustments to reflect moving from t to xt. Also made changes to xt/02_pod_coverage.t to reflect changes to modules.

  • lib/RPC/XML/Client.pm

Removed some error checks that can never fail.

  • lib/RPC/XML/Server.pm
  • t/40_server.t

Code-coverage-driven changes and added tests.

  • etc/make_method

Fixes from new Perl::Critic::Bangs policies.

  • lib/RPC/XML/Server.pm

Removed usage of AutoLoader completely.

  • lib/RPC/XML/Server.pm
  • t/40_server.t
  • xt/02_pod_coverage.t

Removed some dead code and better did the aliases. This required a change in t/40_server.t for a private sub that no longer exists. Also updated xt/02_pod_coverage.t for private subs that have no pod.

  • lib/Apache/RPC/Server.pm

RT #67694: Fix a potential infinite-loop condition.


Perl Module Release: RPC-XML 0.74

January 23rd, 2011 | No Comments | Posted in CPAN, Perl, Software, XML
Version: 0.74

Released: Sunday January 23, 2011, 12:50:00 PM -0800

Changes:

  • t/90_rt54183_sigpipe.t

RT #56800: Make this suite skip all tests on Windows platforms.

  • lib/Apache/RPC/Server.pm

Clean up some run-time “use of undefined value” messages.

  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • t/90_rt58323_push_parser.t (added)

RT #58323: Started as making the parser interfaces correctly report errors when passed null-length strings or “0” values. Turned out that the error return interface from XMLLibXML.pm was not consistent with the rest of the system, so fixed that as well.

  • lib/RPC/XML/Server.pm
  • t/40_server.t

RT #58240: Applied a patch from Martijn van de Streek that adds access to the HTTP::Request object to called method code.

  • lib/RPC/XML.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • t/90_rt58065_allow_nil.t (added)

RT #58065: Allow the parsing of <nil /> tags when they are encountered, even if $RPC::XML::ALLOW_NIL is not set. Only limit the generation of these tags.

  • lib/RPC/XML/Server.pm
  • t/41_server_hang.t

This test sporadically fails, so enhance the error message for more info. Also alter the test slightly, hoping it fixes the random failures.

  • etc/make_method

Applied perlcritic to the make_method tool.

  • lib/RPC/XML.pm
  • t/10_data.t
  • t/20_xml_parser.t
  • t/21_xml_libxml.t

RT #62916: Previous adjustments to the dateTime.iso8601 stringification caused it to no longer fit the XML-RPC spec. Fixed.

  • lib/RPC/XML.pm
  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • lib/RPC/XML/ParserFactory.pm
  • lib/RPC/XML/Server.pm

Used warnings::unused to find unused variables not found by Perl::Critic.

  • t/10_data.t

Realized I had no boundary-tests for ints in smart_encode(). This revealed some problems with i8 values on my 32-bit system. Don’t want to introduce dependency on BigInt right now, so marked those tests “TODO”.


Perl Module Release: RPC-XML 0.73

March 16th, 2010 | No Comments | Posted in CPAN, Perl, Software, XML

Version: 0.73

Released: Tuesday March 16, 2010, 10:45:00 PM -0700

Changes:

  • MANIFEST
  • t/28_parser_bugs_50013.t (deleted)
  • t/90_rt50013_parser_bugs.t (added)

Rename of t/28_parser_bugs_50013.t to fit more universal scheme for test suites that directly address specific RT bugs.

  • lib/RPC/XML/Server.pm
  • t/90_rt54183_sigpipe.t (added)

RT #54183: Provide handling of SIGPIPE when sending the response to the client, in case they’ve terminated the connection.
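One common way to get this behavior on Unix-like systems, shown here as a sketch under assumptions rather than as the code in RPC::XML::Server, is to ignore SIGPIPE locally so that writing to a closed connection fails with an error instead of killing the process. The helper name send_response is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: write all of $data to $fh. Returns (1, undef) on
# success or (0, reason) if the peer has gone away or another error occurs.
sub send_response {
    my ($fh, $data) = @_;

    # With SIGPIPE ignored for the duration of this call, a write to a
    # closed pipe or socket fails with EPIPE rather than raising a signal.
    local $SIG{PIPE} = 'IGNORE';

    my $offset = 0;
    while ($offset < length $data) {
        my $written = syswrite $fh, $data, length($data) - $offset, $offset;
        return (0, "$!") unless defined $written;
        $offset += $written;
    }
    return (1, undef);
}
```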

  • MANIFEST

Forgot to add the new test suite to MANIFEST.

  • lib/RPC/XML/Server.pm

Forgot to update the module version number.

  • lib/RPC/XML.pm

Fix typo in reftype() call.

  • lib/RPC/XML.pm
  • t/90_rt54494_blessed_refs.t (added)

RT #54494: Fix handling of blessed references in smart_encode().
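The underlying issue with blessed references is that ref() returns the class name, not the data type. A small sketch using Scalar::Util shows how the real type can still be recovered; the helper name underlying_type and the class My::Class are made up, and this is not the smart_encode() code itself.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical helper: report the underlying data type of a value,
# whether or not it has been blessed into a class.
sub underlying_type {
    my ($value) = @_;
    return 'plain' unless ref $value;
    return reftype($value);    # 'HASH', 'ARRAY', 'SCALAR', ...
}

my $object = bless { key => 'value' }, 'My::Class';

print ref($object), "\n";               # class name
print underlying_type($object), "\n";   # underlying type
```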

  • lib/Apache/RPC/Server.pm
  • lib/Apache/RPC/Status.pm
  • lib/RPC/XML.pm
  • lib/RPC/XML/Client.pm
  • lib/RPC/XML/Function.pm
  • lib/RPC/XML/Method.pm
  • lib/RPC/XML/Parser.pm
  • lib/RPC/XML/Parser/XMLLibXML.pm
  • lib/RPC/XML/Parser/XMLParser.pm
  • lib/RPC/XML/ParserFactory.pm
  • lib/RPC/XML/Procedure.pm
  • lib/RPC/XML/Server.pm

Large-scale code clean-up driven by Perl::Critic. All critic flags down to severity 1 now removed.

  • MANIFEST

Forgot to add t/90_rt54494_blessed_refs.t when it was created.


Perl Module Release: RPC-XML 0.72

December 13th, 2009 | No Comments | Posted in CPAN, Perl, Software, XML
Version: 0.72

Released: Sunday December 13, 2009, 09:45:00 PM -0700

Changes:

  • Makefile.PL
  • t/40_server_xmllibxml.t

RT #52662: Fix requirement specification for XML::LibXML.

  • lib/RPC/XML.pm

Some more clean-up of the docs, removing a redundant section.

Perl Module Release: RPC-XML 0.71

December 7th, 2009 | No Comments | Posted in CPAN, Perl, Software, XML
Version: 0.71

Released: Monday December 7, 2009, 08:00:00 PM -0700

Changes:

  • MANIFEST
  • t/01_pod.t (deleted)
  • t/02_pod_coverage.t (deleted)
  • t/03_meta.t (deleted)
  • t/04_minimumversion.t (deleted)
  • t/05_critic.t (deleted)
  • xt/01_pod.t (added)
  • xt/02_pod_coverage.t (added)
  • xt/03_meta.t (added)
  • xt/04_minimumversion.t (added)
  • xt/05_critic.t (added)

Moved author-only tests to xt/, updated MANIFEST.

  • MANIFEST

Add test suite 28_parser_bugs_50013.t, which was omitted from last release.

  • xt/01_pod.t
  • xt/02_pod_coverage.t
  • xt/03_meta.t
  • xt/04_minimumversion.t
  • xt/05_critic.t

Re-engineered the author-only/release tests, since they're no longer in the t/ directory and thus should not interfere.

Perl Module Release: RPC-XML 0.70

December 6th, 2009 | No Comments | Posted in CPAN, Perl, Software, XML
Version: 0.70

Released: Sunday December 6, 2009, 10:00:00 PM -0700

Changes:

  • lib/RPC/XML.pm
  • t/10_data.t

RT #49406: Make Base64 data-type allow zero-length data.

  • lib/RPC/XML.pm
  • t/10_data.t

Hand-applied a patch (most likely from Bill Moseley) to extend the construction of dateTime.iso8601 data-types.

  • t/40_server.t

Fixed another corner-case for the url() test.

  • lib/RPC/XML.pm

Fixed a case from previous work that caused "undef" warnings.

  • lib/RPC/XML.pm
  • lib/RPC/XML/Parser.pm
  • t/28_parser_bugs_50013.t

RT #50013: Restore backwards-compatibility for projects that use RPC::XML::Parser directly.

  • lib/RPC/XML/Procedure.pm

RT #50143: Incorrectly called server_fault() as if it were a coderef.

  • lib/Apache/RPC/Server.pm

Applied patch from Frank Wiegand to fix a POD problem.

  • lib/RPC/XML.pm

Some additional regexp issues on dateTime.iso8601, to handle backwards-compatibility.

  • lib/RPC/XML/ParserFactory.pm

Fixed some minor doc errors.

  • lib/RPC/XML/Parser/XMLParser.pm

Moved the 'require' of some libraries to the point where they are first needed, to delay loading until/unless necessary.

  • lib/RPC/XML/Parser/XMLLibXML.pm (added)
  • t/21_xml_libxml.t (added)
  • t/29_parserfactory.t
  • t/40_server_xmllibxml.t (added)

Implement support for XML::LibXML in the parser-factory.

Idle Thoughts on Parsing XML (slightly Perlish)

October 7th, 2009 | No Comments | Posted in Perl, XML

(Side note: There was no Module Monday post this week, as I was too swamped to look for one to cover. Check back next week…)

I’m in the (achingly slow) process of writing a new XML-RPC parser using XML::LibXML. Because (according to their own docs) their SAX support is spotty, I’m letting the library parse the whole message into a DOM object and then using that object to get the request or response. This has proven to be a serious pain in the lower regions.

The XML::Parser approach I’ve had since RPC::XML’s inception is an event-based parser: I use a state-machine/stack approach and push/pop items as needed, based on whether my event is a tag-start, tag-end, text, etc. As a side effect, I validate the document, since the stack/state machine will throw an exception if some event doesn’t fit into what it is expecting.
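To make the stack idea concrete, here is a stripped-down sketch. It is not the RPC::XML parser: the events are fed in as a plain list instead of coming from XML::Parser callbacks, the table of allowed children covers only a small subset of XML-RPC, and the names %allowed and validate_events are invented for the example.

```perl
use strict;
use warnings;

# Which child elements each element may contain (a small XML-RPC subset).
# The empty string stands for the document root.
my %allowed = (
    ''         => { methodCall => 1 },
    methodCall => { methodName => 1, params => 1 },
    params     => { param => 1 },
    param      => { value => 1 },
    value      => { string => 1, int => 1 },
    methodName => {},
    string     => {},
    int        => {},
);

# Hypothetical driver: consume [type, name] events in document order and
# die on the first one that does not fit the current state of the stack.
sub validate_events {
    my (@events) = @_;
    my @stack = ('');
    for my $event (@events) {
        my ($type, $name) = @{$event};
        if ($type eq 'start') {
            die "unexpected <$name> inside <$stack[-1]>\n"
                unless $allowed{ $stack[-1] }{$name};
            push @stack, $name;
        }
        elsif ($type eq 'end') {
            die "mismatched </$name>\n" unless $stack[-1] eq $name;
            pop @stack;
        }
        # Text events would be collected here; they are ignored in this sketch.
    }
    die "unclosed <$stack[-1]>\n" if @stack > 1;
    return 1;
}
```

Validation falls out of the bookkeeping: the same push and pop that track where the parser is also reject any tag that shows up in the wrong place.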

Taking a DOM approach means more work, as not only am I drilling down for the data I need, I also have to do some checking for validity as well. (Some might point out that XML::LibXML supports checking document validity against any of a DTD, XML Schema or RelaxNG schema… I’m actually familiar with that. But there is no “real” (i.e., “official”) DTD or schema for XML-RPC for me to use in this case.)

So here’s my observation, which is probably blindingly-obvious to everyone else who’s worked with XML: SAX/event-based parsing is the way to go for processing a whole document, and DOM is better for cherry-picking pieces from different parts of it.

Like I said, probably pretty obvious to the rest of you, but it’s hitting me over the head pretty hard these days.
