<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Patrick Tulskie &#187; xml</title>
	<atom:link href="http://www.patricktulskie.com/tag/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.patricktulskie.com</link>
	<description>Building a Better Internet</description>
	<lastBuildDate>Wed, 16 Jun 2010 19:12:25 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>libxml-ruby vs nokogiri vs hpricot</title>
		<link>http://www.patricktulskie.com/2009/03/libxml-ruby-vs-nokogiri-vs-hpricot/</link>
		<comments>http://www.patricktulskie.com/2009/03/libxml-ruby-vs-nokogiri-vs-hpricot/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 05:05:26 +0000</pubDate>
		<dc:creator>Patrick Tulskie</dc:creator>
				<category><![CDATA[Comedy]]></category>
		<category><![CDATA[New Stuff]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[gems]]></category>
		<category><![CDATA[hpricot]]></category>
		<category><![CDATA[libxml]]></category>
		<category><![CDATA[libxml-ruby]]></category>
		<category><![CDATA[nokogiri]]></category>
		<category><![CDATA[parsing]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.patricktulskie.com/?p=115</guid>
		<description><![CDATA[Patrick talks about Ruby XML parser testing with libxml-ruby, Nokogiri, Hpricot, and REXML, with new results from the test suite written by Tenderlove (Aaron Patterson) and modified to address some of Why the Lucky Stiff's complaints about the tests.]]></description>
			<content:encoded><![CDATA[<p><em><strong>Update: Aaron told me that he is going to re-run the benchmarks this weekend, so we&#8217;ll get a more complete set of data from the machine that originally ran the tests.</strong></em></p>
<p>If you&#8217;re into parsing XML or HTML with Ruby, then chances are you&#8217;re familiar with the various gems for the job.  Lately, there have been a lot of claims flying around about which one is the fastest, and to settle it, Aaron Patterson (author of Nokogiri and Mechanize) wrote a test suite.</p>
<p>After its release, RubyInside wrote up how much faster Nokogiri was than Hpricot in the tests: <a title="Ruby XML Performance Shootout: Nokogiri vs LibXML vs Hpricot vs REXML - RubyInside" href="http://www.rubyinside.com/ruby-xml-performance-benchmarks-1641.html">Ruby XML Performance Shootout: Nokogiri vs LibXML vs Hpricot vs REXML</a>.  Later that day, I saw Why&#8217;s announcement of the new Hpricot release, <a title="hpricot 0.7" href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/331411">hpricot 0.7</a>, and decided to modify Aaron&#8217;s tests to use Hpricot.XML.  Here are the results:<br />
<span id="more-115"></span></p>
<p>The large-document tests were run at N=5 (the small-document tests at N=1944) to get a clearer picture of the differences between the gems.  At N=2, the results were pretty close, which indicated that a larger sample was needed.</p>
<pre><code>test_IO_parsing(XmlTruth::DOM::XML::LargeDocumentParsingTest) N=5
                  user     system      total        real   kBps
null          0.690000   0.070000   0.760000 (  0.768641) 46343.68
nokogiri      2.790000   0.130000   2.920000 (  3.015303) 11813.62
libxml-ruby   2.970000   0.140000   3.110000 (  3.130175) 11380.08
hpricot      13.660000   0.370000  14.030000 ( 14.088780) 2528.37
.
test_in_memory_parsing(XmlTruth::DOM::XML::LargeDocumentParsingTest) N=5
                  user     system      total        real   kBps
null          1.240000   0.010000   1.250000 (  1.260841) 28252.30
nokogiri      4.360000   0.060000   4.420000 (  4.444468) 8014.83
libxml-ruby   4.570000   0.050000   4.620000 (  4.641338) 7674.87
hpricot      13.750000   0.210000  13.960000 ( 14.045647) 2536.13
.
test_simple_xpath(XmlTruth::DOM::XML::LargeDocumentXPathSearchTest) N=5
                  user     system      total        real   kBps
nokogiri     44.430000   0.300000  44.730000 ( 44.972003) 792.09
libxml-ruby  40.950000   0.210000  41.160000 ( 41.300780) 862.49
hpricot      18.410000   0.090000  18.500000 ( 18.540239) 1921.32
.
test_IO_parsing(XmlTruth::DOM::XML::SmallDocumentParsingTest) N=1944
                  user     system      total        real   kBps
null          8.150000   0.130000   8.280000 (  8.326070) 4278.17
nokogiri     17.850000   0.100000  17.950000 ( 17.950534) 1984.36
libxml-ruby  19.010000   0.260000  19.270000 ( 19.370769) 1838.87
hpricot      25.320000   0.460000  25.780000 ( 25.827516) 1379.16
.
test_in_memory_parsing(XmlTruth::DOM::XML::SmallDocumentParsingTest) N=1944
                  user     system      total        real   kBps
null          3.960000   0.030000   3.990000 (  4.005522) 8892.82
nokogiri     18.140000   0.200000  18.340000 ( 18.403396) 1935.53
libxml-ruby  19.760000   0.230000  19.990000 ( 19.999905) 1781.03
hpricot      15.980000   0.150000  16.130000 ( 16.133157) 2207.90
.
Finished in 426.233021 seconds.

5 tests, 0 assertions, 0 failures, 0 errors</code></pre>
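<p>The kBps column can be checked against the real (wall-clock) column.  Multiplying the two gives roughly 35,620 kB on every row, which is consistent with kBps being total kilobytes processed divided by real seconds.  It also suggests that the iteration counts (N=5 and N=1944) were apparently chosen so that both document sizes push about the same volume of data through each gem.  The Ruby sketch below redoes that arithmetic on figures copied from the output.  The kB-per-second interpretation is inferred from the numbers, not taken from the suite&#8217;s source.</p>

```ruby
# Rows copied from the benchmark output above:
# [test, library, real seconds, kBps]
ROWS = [
  ['large IO',        'null',         0.768641, 46343.68],
  ['large IO',        'nokogiri',     3.015303, 11813.62],
  ['large IO',        'libxml-ruby',  3.130175, 11380.08],
  ['large IO',        'hpricot',     14.088780,  2528.37],
  ['large in-memory', 'null',         1.260841, 28252.30],
  ['large in-memory', 'nokogiri',     4.444468,  8014.83],
  ['large in-memory', 'libxml-ruby',  4.641338,  7674.87],
  ['large in-memory', 'hpricot',     14.045647,  2536.13],
  ['large XPath',     'nokogiri',    44.972003,   792.09],
  ['large XPath',     'libxml-ruby', 41.300780,   862.49],
  ['large XPath',     'hpricot',     18.540239,  1921.32],
  ['small IO',        'null',         8.326070,  4278.17],
  ['small IO',        'nokogiri',    17.950534,  1984.36],
  ['small IO',        'libxml-ruby', 19.370769,  1838.87],
  ['small IO',        'hpricot',     25.827516,  1379.16],
  ['small in-memory', 'null',         4.005522,  8892.82],
  ['small in-memory', 'nokogiri',    18.403396,  1935.53],
  ['small in-memory', 'libxml-ruby', 19.999905,  1781.03],
  ['small in-memory', 'hpricot',     16.133157,  2207.90]
].freeze

# Assumption: kBps = kilobytes processed / real (wall-clock) seconds.
# If so, multiplying the two back together recovers the data volume.
def data_volume_kb(real_seconds, kbps)
  real_seconds * kbps
end

ROWS.each do |test, lib, real, kbps|
  puts format('%-16s %-12s %9.1f kB', test, lib, data_volume_kb(real, kbps))
end

# Dividing by each test's iteration count gives the implied size of one document.
large_doc_kb = data_volume_kb(3.015303, 11813.62) / 5
small_doc_kb = data_volume_kb(18.403396, 1935.53) / 1944
puts format('one large document: ~%.0f kB, one small document: ~%.1f kB',
            large_doc_kb, small_doc_kb)
```

<p>Under that reading, one large test document is about 7 MB and one small document about 18 kB.</p>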
<p>You can find my fork of the test suite on GitHub: <a title="Patrick Tulskie's fork of XMLTruth on Github" href="http://github.com/PatrickTulskie/xml_truth/tree/master">Patrick Tulskie&#8217;s Fork of XMLTruth</a></p>
<p>From this small sample of tests, Nokogiri and libxml-ruby appear to perform similarly on most tests.  That makes sense, since Nokogiri is built on the operating system&#8217;s native libxml2, the same C library that libxml-ruby wraps.  Nokogiri clearly excels at parsing large documents, while Hpricot handles small, in-memory documents rather quickly and was also the fastest of the three on the large-document XPath test.</p>
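<p>To put rough numbers on those comparisons, the sketch below divides one gem&#8217;s kBps by another&#8217;s for each test, using figures copied from the output above.  A ratio above 1.0 means the first gem was faster.  Worked through by hand, Nokogiri lands within about 10% of libxml-ruby on every test.  It is roughly 4.7 times faster than Hpricot on large-document IO parsing and 3.2 times faster on large in-memory parsing.  Hpricot comes out ahead on the large-document XPath test (about 2.4 times) and on the small in-memory test (about 1.14 times).  The hash layout and the <code>speed_ratio</code> helper are illustrative and not part of the test suite.</p>

```ruby
# kBps figures copied from the benchmark output above (higher is faster).
KBPS = {
  'large IO'        => { 'nokogiri' => 11813.62, 'libxml-ruby' => 11380.08, 'hpricot' => 2528.37 },
  'large in-memory' => { 'nokogiri' =>  8014.83, 'libxml-ruby' =>  7674.87, 'hpricot' => 2536.13 },
  'large XPath'     => { 'nokogiri' =>   792.09, 'libxml-ruby' =>   862.49, 'hpricot' => 1921.32 },
  'small IO'        => { 'nokogiri' =>  1984.36, 'libxml-ruby' =>  1838.87, 'hpricot' => 1379.16 },
  'small in-memory' => { 'nokogiri' =>  1935.53, 'libxml-ruby' =>  1781.03, 'hpricot' => 2207.90 }
}.freeze

# Throughput ratio for one test: a value above 1.0 means gem a beat gem b.
def speed_ratio(test, a, b)
  KBPS.fetch(test).fetch(a) / KBPS.fetch(test).fetch(b)
end

KBPS.each_key do |test|
  puts format('%-16s nokogiri/libxml-ruby %.2f  nokogiri/hpricot %.2f',
              test,
              speed_ratio(test, 'nokogiri', 'libxml-ruby'),
              speed_ratio(test, 'nokogiri', 'hpricot'))
end
```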
<p>In real-world scenarios, Nokogiri looks like the better choice for parsing large XML or HTML documents from disk into a database, whereas Hpricot may be better suited to a web crawler, where a page&#8217;s DOM rarely exceeds 1 MB.</p>
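<p>The rule of thumb above can be written down as a small helper.  This is a hypothetical sketch: the <code>pick_parser</code> name and the 1 MB cutoff (taken from the crawler example) are illustrative assumptions.  The helper only returns a symbol naming the gem instead of calling Nokogiri or Hpricot, so it runs without either gem installed.</p>

```ruby
# Hypothetical cutoff based on the 1 MB figure in the text; it is a rule of
# thumb, not a threshold measured by the benchmark.
ONE_MB = 1024 * 1024

# Suggest a gem by input size: :nokogiri for large documents (e.g. files
# loaded from disk), :hpricot for small pages such as crawled HTML.
def pick_parser(size_in_bytes)
  size_in_bytes > ONE_MB ? :nokogiri : :hpricot
end

# Same decision for a file on disk, based on its size in bytes.
def pick_parser_for_file(path)
  pick_parser(File.size(path))
end

puts pick_parser(5 * ONE_MB)   # a large export file
puts pick_parser(150 * 1024)   # a typical crawled page
```

<p>Treat the cutoff as a starting point rather than a rule.  In these results Hpricot was the slowest gem on small documents read from IO, so a crawler that parses straight from a stream may not see the same benefit as one that parses pages already held in memory.</p>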
<p>Please post any other thoughts you might have in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.patricktulskie.com/2009/03/libxml-ruby-vs-nokogiri-vs-hpricot/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

