Querying messy XML in Oracle
I found some XML stored in a XMLType column in an Oracle table that doesn’t conform to a structured XML schema. It looks like this:
<root> <fields> <field_abc> <name>field_abc</name> <value>1</value> <enabled>true</enabled> </field_abc> </fields> <fields> <field_def> <name>field_def</name> <value>2</value> <enabled>false</enabled> </field_def> </fields> </root> |
Note how all the nodes under /root/fields are named after the field name, rather than each node being named <field>. So I can’t just ask for all the nodes named field or all the field nodes in the fields collection, and I don’t know what the field names are or how many there are either.
I can’t easily change the format of the XML to conform to a schema since this table has been in production for a few years on a few dozen Oracle instances.
So how can I query and/or loop through all the children in the /root/fields path without knowing the child node names? It turns out to be fairly easy!
Example Query:
SELECT EXTRACTVALUE (VALUE (xml), '*/name') AS "name", EXTRACTVALUE (VALUE (xml), '*/value') AS "value", EXTRACTVALUE (VALUE (xml), '*/enabled') AS "enabled" FROM my_table_with_xml tbl, TABLE (XMLSEQUENCE (EXTRACT (tbl.xml_type_col, '/root/fields/*'))) xml WHERE tbl.primary_key = 1234; |
Example Output:
name value enabled --------------------------- field_abc 1 true field_def 2 false
The trick is this line:
TABLE (XMLSEQUENCE (EXTRACT (tbl.xml_type_col, '/root/fields/*'))) |
What we do is we EXTRACT() all the nodes below the /root/fields path regardless of their names (* wildcard). Then XMLSEQUENCE() will split the extracted nodes into a VARRAY of XMLTypes. Finally, the TABLE() function allows us to query the VARRAY results by converting the multiple array values into a pseudo table containing multiple rows.
We could even apply a WHERE clause against the XML data to further restrict the results:
Example Query:
SELECT EXTRACTVALUE (VALUE (xml), '*/name') AS "name", EXTRACTVALUE (VALUE (xml), '*/value') AS "value", EXTRACTVALUE (VALUE (xml), '*/enabled') AS "enabled" FROM my_table_with_xml tbl, TABLE (XMLSEQUENCE (EXTRACT (tbl.xml_type_col, '/root/fields/*'))) xml WHERE tbl.primary_key = 1234 AND EXTRACTVALUE (VALUE(xml), '*/enabled') = 'true'; |