5/30/2011

Every technology has its purpose (about XPath)

Recently I had to fix a bug in our production environment caused by a performance issue. Basically, we were losing a couple of requests to timeouts because the process that parses an XML file was taking too long. The cause was a misuse of XPath combined with the size of the XML (about 700 child nodes under the root element). Just to be clear, the XML is parsed lazily and cached, and unless the file changes we never parse it again.

When I started testing on my old, heavily loaded laptop, the process took 25 seconds to parse the XML; after the fix it took less than a second.

Our xml has the following structure:

<root>
<element id="1">
   <param id="1"/>
   <param id="2"/>
   ...
   <section>
      <sec-element id="1"/>
      <sec-element id="2"/>
      ...
      <sec-element id="n"/>
   </section>
</element>
...
</root>

We map the 'element' nodes to Java objects, so we iterate over each element and read its data. To get the nodes under 'section', however, we evaluated the XPath expression "section/sec-element" on each 'element' node.
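
To make the approach concrete, here is a minimal sketch of what that per-element XPath lookup might look like, assuming the standard JAXP DOM and XPath APIs. The class and method names are hypothetical and the real parser did more than collect ids.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects the id of every 'sec-element' using XPath.
final class XPathSectionReader {

    static List<String> secElementIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        List<String> ids = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName("element");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            // One XPath evaluation per 'element' node: this is the step
            // that is repeated about 700 times on the production file.
            NodeList secs = (NodeList) xpath.evaluate(
                    "section/sec-element", element, XPathConstants.NODESET);
            for (int j = 0; j < secs.getLength(); j++) {
                ids.add(((Element) secs.item(j)).getAttribute("id"));
            }
        }
        return ids;
    }
}
```

The result is correct; the cost is in running a query engine once per 'element' just to list children whose location is already known.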

I guess I was lazy the day I coded the parser (yes, mea culpa) and I wanted to be sure the 'sec-element' nodes were under a 'section' node, so I decided to use XPath, and that was my mistake. Wikipedia says: "XPath, the XML Path Language, is a query language for selecting nodes from an XML document", but I wasn't really selecting nodes. I knew the nodes were there (and if I didn't, I should have used XML Schema to validate the structure). I wasn't searching for nodes with a given attribute or a specific attribute value; I just wanted all of them.

After replacing the XPath queries with a plain getElementsByTagName, the problem was solved.
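
A rough sketch of the fix, again assuming the standard DOM API and using hypothetical class and method names, could look like this. Note that getElementsByTagName matches every descendant with that tag name, so it does not check that the parent is a 'section'; that kind of structural check is what a schema is for.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects the id of every 'sec-element' with plain DOM calls.
final class DomSectionReader {

    static List<String> secElementIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> ids = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName("element");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            // Plain DOM lookup: no XPath expression is compiled or evaluated.
            // This returns all 'sec-element' descendants of this 'element'.
            NodeList secs = element.getElementsByTagName("sec-element");
            for (int j = 0; j < secs.getLength(); j++) {
                ids.add(((Element) secs.item(j)).getAttribute("id"));
            }
        }
        return ids;
    }
}
```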

Just remember: before using a technology or tool, make sure it is the right one for the job. It will save you a lot of problems and time.