Saturday 21 April 2007

Accessing a CDATA section in XML DOM from JavaScript

I had a hard time figuring out how to access a CDATA section in the XML DOM using JavaScript. I read many different comments on the subject, but I did not find a satisfying answer. Let's take a look at an XML example:
<page>
<uri>http://www.somepage.com/page1.html</uri>
<content>
<![CDATA[
<p>Paragraph 1</p>
<p>Paragraph 2 <a href="lnk.html">lnk</a></p>
]]>
</content>
</page>
I was trying to access the CDATA section the way I access all the text nodes:
xmlDocument.getElementsByTagName('content')[0].firstChild.nodeValue;
Unfortunately, it doesn't work. I wondered what I was doing wrong. I ran some tests and discovered that the <content> node actually has 3 children (I thought it had only 1). The first and third child nodes are whitespace-only text nodes; the middle one is the CDATASection node I was trying to access. So the final code looks like this:
xmlDocument.getElementsByTagName('content')[0].childNodes[1].nodeValue;
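
To check the three children for yourself, a small helper along these lines can list the type of each child node. This is only a sketch: the name describeChildren is made up for illustration, and the demo at the bottom assumes a browser that provides DOMParser.

```javascript
// Hypothetical helper (not from the original post): lists the type of each
// direct child of an element. Node type numbers follow the DOM spec:
// 1 = element, 3 = text, 4 = CDATA section, 8 = comment.
function describeChildren(element) {
  var names = { 1: 'element', 3: 'text', 4: 'cdata', 8: 'comment' };
  var result = [];
  for (var i = 0; i < element.childNodes.length; i++) {
    var type = element.childNodes[i].nodeType;
    result.push(names[type] || 'other(' + type + ')');
  }
  return result;
}

// Browser-only demo; skipped where DOMParser is not available.
if (typeof DOMParser !== 'undefined') {
  var xml = '<content>\n<![CDATA[<p>Paragraph 1</p>]]>\n</content>';
  var doc = new DOMParser().parseFromString(xml, 'application/xml');
  console.log(describeChildren(doc.documentElement));
}
```

In the browsers I tested, I would expect this to report a text node, the CDATA section, and another text node, which is why index 1 is needed there.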

I'm using this to parse a server response returned by an AJAX request (invoked using the prototype.js library). I am not an expert here, but I guess it may differ between XML DOM implementations.

11 comments:

KernelPanic said...

Sadly, it works differently in IE and FF.

Raoul Duke said...

I wrote a function to handle this issue:

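function getCDATA(element){

    // Rough IE check: relies on window.ActiveXObject being defined
    var ie = (typeof window.ActiveXObject != 'undefined');
    var returnText;

    if(ie){
        // hasChildNodes is a method, so it has to be called
        if(element.hasChildNodes()){
            // in IE the CDATA section is the first child
            returnText = element.childNodes[0].nodeValue;
        }
    }
    else{
        if(element.hasChildNodes()){
            // elsewhere a whitespace text node comes first, so the CDATA is second
            returnText = element.childNodes[1].nodeValue;
        }
    }

    return returnText;

}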

Fabrizio Picca said...

On which browser have you tested the script? It works like a charm in Firefox, but I get some problems with IE...

Anonymous said...

This was pretty helpful. It works the way you thought in IE but in FF I have to use .childNodes[1]. Thanks for the post though, got me looking in the right direction.

Anonymous said...

The reason you see the text showing up in the second child node is that your XML file, as written with a newline between the <content> tag and the CDATA markup, puts a newline "\n" text node into the DOM tree in Firefox as the first child node.

Start instead with a single line:

<content><![CDATA[Paragraph 1...

and your first method will work.

Or even better, keep your first text format and use "textContent":

xmlDocument.getElementsByTagName('content')[0].textContent;
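
A minimal sketch of how that could be wrapped up, assuming that older IE's MSXML nodes expose a "text" property instead of textContent. The helper name getElementText is made up for illustration. Since textContent also includes the surrounding newlines, the result is trimmed.

```javascript
// Hypothetical helper: returns the trimmed text of an element, CDATA included.
function getElementText(element) {
  var raw;
  if (typeof element.textContent === 'string') {
    raw = element.textContent;   // standard DOM property
  } else if (typeof element.text === 'string') {
    raw = element.text;          // assumption: MSXML node in older IE
  } else {
    raw = '';                    // nothing usable found
  }
  // strip the leading and trailing whitespace around the CDATA content
  return raw.replace(/^\s+|\s+$/g, '');
}

// Usage, assuming xmlDocument holds the parsed <page> example:
// var html = getElementText(xmlDocument.getElementsByTagName('content')[0]);
```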

Anonymous said...

textContent is exactly what I've been looking for.

Filip Czaja said...

I've tested this script on FF and Opera. It wasn't for commercial use, so I skipped IE. Sorry about that.

Anonymous said...

The solution you use isn't exactly a clean and proper one. What if, for some reason, the CDATA block is suddenly at index 2? The best way would be to take advantage of XPath. Here's how I go about it:

xmlDocument.selectSingleNode("//thexpath-to-your-node/text()").nodeValue

"selectSingleNode" is only available in IE, though using a cross browser XMLDOM implementation library like Sarissa would fix that, as well as "nodeValue".

Or, since you would be using Sarissa anyway, a cross-browser implementation of innerText is available as well, giving you: "element.innerText"

Cheers !
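
For browsers that implement the standard DOM XPath API instead of selectSingleNode, something like the following might serve as a rough equivalent. It is only a sketch: the helper name xpathString is made up, and it assumes the document object exposes evaluate(). It uses string(...) so that the whitespace text nodes and the CDATA content come back as one string, which is then trimmed.

```javascript
// Hypothetical helper: evaluates an XPath expression against an XML document
// and returns its string value with surrounding whitespace removed.
function xpathString(xmlDocument, expression) {
  var STRING_TYPE = 2; // same numeric value as XPathResult.STRING_TYPE
  var result = xmlDocument.evaluate(expression, xmlDocument, null, STRING_TYPE, null);
  return result.stringValue.replace(/^\s+|\s+$/g, '');
}

// Usage, assuming xmlDocument holds the parsed <page> example:
// var html = xpathString(xmlDocument, 'string(//content)');
```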

Rodolfo said...

This code also works on IE and FF, independent of \n

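function getNodeCDATA(node)
{
    if (node.hasChildNodes())
    {
        // crsXMLDOM.getText(node); // firefox & IE; crsXMLDOM is not defined in this snippet and its result was unused
        // walk the children until a CDATA section (nodeType 4) is found
        var node2 = node.firstChild;
        while ((node2) && (node2.nodeType != 4))
            node2 = node2.nextSibling;
        if ((node2) && (node2.nodeType == 4))
            return node2; // found CDATA
    }
    return null;
}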

Gilang said...

Works in Chrome...
It really helped me...

Thx bro.. :D

Unknown said...

Here is my solution to this problem:

var getContent = function(p_xmlNode){
    if(p_xmlNode.childNodes.length == 0){
        // getXmlString is a separate helper that is not shown here
        console.log('getContent() | No text found in "'+getXmlString(p_xmlNode)+'"');
        return;
    }
    var txt;
    for(var i=0; i<p_xmlNode.childNodes.length; i++){
        // nodeValue is null for element nodes, so fall back to '' before trimming
        txt = (p_xmlNode.childNodes[i].nodeValue || '').replace(/^\s+|\s+$/g,'');
        //console.log('['+i+'] = Node Name = '+p_xmlNode.childNodes[i].nodeName+' : Node Type = '+p_xmlNode.childNodes[i].nodeType+' : Is Empty = '+(txt == '')+' : '+txt);
        if(txt != ''){
            // return the untrimmed value of the first non-blank child
            return p_xmlNode.childNodes[i].nodeValue;
        }
    }
};