Archive for August, 2011

Querying messy XML in Oracle

I found some XML stored in a XMLType column in an Oracle table that doesn’t conform to a structured XML schema. It looks like this:

<root>
  <fields>
    <field_abc>
      <name>field_abc</name>
      <value>1</value>
      <enabled>true</enabled>
    </field_abc>
  </fields>
  <fields>
    <field_def>
      <name>field_def</name>
      <value>2</value>
      <enabled>false</enabled>
    </field_def>
  </fields>
</root>

Note how all the nodes under /root/fields are named after the field name, rather than each node being named <field>. So I can’t just ask for all the nodes named field or all the field nodes in the fields collection, and I don’t know what the field names are or how many there are either.

I can’t easily change the format of the XML to conform to a schema since this table has been in production for a few years on a few dozen Oracle instances.

So how can I query and/or loop through all the children in the /root/fields path without knowing the child node names? It turns out to be fairly easy!

Example Query:

SELECT EXTRACTVALUE (VALUE (xml), '*/name') AS "name",
       EXTRACTVALUE (VALUE (xml), '*/value') AS "value",
       EXTRACTVALUE (VALUE (xml), '*/enabled') AS "enabled"
  FROM my_table_with_xml tbl,
       TABLE (XMLSEQUENCE (EXTRACT (tbl.xml_type_col, '/root/fields/*'))) xml
 WHERE tbl.primary_key = 1234;

Example Output:

name        value   enabled
---------------------------
field_abc   1       true
field_def   2       false

The trick is this line:

TABLE (XMLSEQUENCE (EXTRACT (tbl.xml_type_col, '/root/fields/*')))

What we do is we EXTRACT() all the nodes below the /root/fields path regardless of their names (* wildcard). Then XMLSEQUENCE() will split the extracted nodes into a VARRAY of XMLTypes. Finally, the TABLE() function allows us to query the VARRAY results by converting the multiple array values into a pseudo table containing multiple rows.

We could even apply a WHERE clause against the XML data to further restrict the results:

Example Query:

SELECT EXTRACTVALUE (VALUE (xml), '*/name') AS "name",
       EXTRACTVALUE (VALUE (xml), '*/value') AS "value",
       EXTRACTVALUE (VALUE (xml), '*/enabled') AS "enabled"
  FROM my_table_with_xml tbl,
       TABLE (XMLSEQUENCE (EXTRACT (tbl.xml_type_col, '/root/fields/*'))) xml
 WHERE tbl.primary_key = 1234
   AND EXTRACTVALUE (VALUE(xml), '*/enabled') = 'true';