
Handling Reserved Characters in Your XML Data

ibm i, rpg, xml
Kato Integrations Team
March 24, 2015

One of the more common support questions we receive from RPG API Express users is:
“How can I use XML reserved characters in my XML request or response?”

The following XML produces parsing errors:

    <CompanyName>Smith & Doe Incorporated</CompanyName>

What do I do?

The questioner is referring to the characters “&” and “<”, although some people believe other characters are reserved as well. These two characters have special meaning in XML and cannot appear as-is within the content (the data) of an XML element. A parser reads “<” as the start of a tag and “&” as the start of an entity reference, so when either one shows up where the parser expects plain text, it can no longer make sense of the document.
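To see the failure for yourself, the sketch below feeds the snippet above to Python's standard-library `xml.etree.ElementTree` parser. Python is used here only for illustration; any conforming XML parser should reject the unescaped “&” in the same way. The helper name `parses_ok` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The unescaped "&" from the example above.
bad_xml = "<CompanyName>Smith & Doe Incorporated</CompanyName>"

def parses_ok(xml_text: str) -> bool:
    """Hypothetical helper: True if xml_text is well-formed, False if rejected."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as err:
        # The parser reports where it gave up, e.g. at the stray "&".
        print("Parse error:", err)
        return False

print(parses_ok(bad_xml))                                   # False
print(parses_ok("<CompanyName>Smith and Doe</CompanyName>"))  # True
```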

So how do you include those reserved characters in the XML without breaking the rules?

One way is to use XML entity references. Instead of “&” you write “&amp;”, and instead of “<” you write “&lt;”. When a parser processes the XML, it recognizes these entities and converts each one back to the single character it represents. So with XML entities, you simply replace each reserved character with its entity equivalent.
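Here is a minimal sketch of that replacement, written in Python for illustration rather than RPG. The function name `escape_xml_text` is hypothetical; in real Python code, `xml.sax.saxutils.escape` performs the same three replacements. Note the ordering: “&” has to be replaced first, or the “&” inside a freshly written “&lt;” would be escaped a second time.

```python
import xml.etree.ElementTree as ET

def escape_xml_text(value: str) -> str:
    """Hypothetical helper: replace reserved characters with entity references."""
    # "&" first, so the entities added below are not escaped again.
    value = value.replace("&", "&amp;")
    value = value.replace("<", "&lt;")
    # ">" is usually legal in content, but escaping it is harmless and
    # avoids trouble with the sequence "]]>".
    value = value.replace(">", "&gt;")
    return value

escaped = escape_xml_text("Smith & Doe Incorporated")
print(escaped)  # Smith &amp; Doe Incorporated

# The parser turns the entity back into the single character "&".
element = ET.fromstring(f"<CompanyName>{escaped}</CompanyName>")
print(element.text)  # Smith & Doe Incorporated
```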

Sounds easy enough, right?

But then you realize that you’ll need to scan the contents of every field that might contain a reserved character and replace it. Yes, you can use one of our subprocedures (or your own) to make the replacements easier. But you still have to insert the calls to that subprocedure in all the right places, which can add unwanted bulk and processing time to your program.
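The sketch below shows what that bookkeeping looks like, again in Python purely as an illustration. It assumes a hypothetical template in which markers of the form `.:Name:.` are replaced by field values; `fill_template` and `escape_xml_text` are made-up names, not RPG API Express subprocedures. Every substitution point has to pass through the escaping call.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical template: each .:Name:. marker is replaced by a field value.
TEMPLATE = (
    "<Customer>"
    "<CompanyName>.:Company_Name:.</CompanyName>"
    "<Street>.:Street:.</Street>"
    "</Customer>"
)

def escape_xml_text(value: str) -> str:
    """Hypothetical helper: replace reserved characters ("&" first)."""
    return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def fill_template(template: str, fields: dict) -> str:
    """Hypothetical helper: substitute each marker, escaping every value."""
    # Skip the escape call for even one field and the parse error returns.
    return re.sub(r"\.:(\w+):\.",
                  lambda m: escape_xml_text(fields[m.group(1)]),
                  template)

xml_text = fill_template(TEMPLATE, {"Company_Name": "Smith & Doe Incorporated",
                                    "Street": "1 Main St <Rear>"})
root = ET.fromstring(xml_text)
print(root.find("CompanyName").text)  # Smith & Doe Incorporated
print(root.find("Street").text)       # 1 Main St <Rear>
```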

So what if there was a way to avoid additional RPG coding altogether?

Well, there is: CDATA sections. A CDATA section is a construct defined in the XML specification that wraps data inside special markers, which tell the XML parser to suspend its normal rules while processing the wrapped data. For instance, the earlier invalid example becomes valid when written like this:

    <CompanyName><![CDATA[Smith & Doe Incorporated]]></CompanyName>
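As a quick check, the sketch below parses the CDATA version with Python's standard-library `xml.etree.ElementTree` (used only for illustration). The parser accepts the “&” and reports the element's text without the CDATA markers.

```python
import xml.etree.ElementTree as ET

cdata_xml = "<CompanyName><![CDATA[Smith & Doe Incorporated]]></CompanyName>"
element = ET.fromstring(cdata_xml)

# The CDATA markers are removed; the raw text, "&" included, remains.
print(element.text)  # Smith & Doe Incorporated

# "<" is accepted inside a CDATA section as well.
print(ET.fromstring("<Note><![CDATA[a < b]]></Note>").text)  # a < b
```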

By wrapping the original data inside <![CDATA[ and ]]>, you allow the parser to accept reserved characters such as “&” and “<” within the <CompanyName> element. The opening marker (<![CDATA[) signals a suspension of the normal rules, while the closing marker (]]>) signals that the normal rules are back in effect. The best part is that you can place both markers directly in your template file, so there’s no need for any RPG coding changes at all!

    <CompanyName><![CDATA[.:Company_Name:.]]></CompanyName>
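The sketch below mimics that template approach in Python, as an illustration only; it is not how RPG API Express fills templates. It assumes the `.:Name:.` marker is replaced by the raw field value, with no escaping at all. One caveat the XML specification imposes: a CDATA section ends at the first “]]>”, so a value containing that exact sequence would still break the document. The hypothetical `fill_template` below refuses such values instead of producing bad XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical template with the CDATA markers written around the field marker.
TEMPLATE = "<CompanyName><![CDATA[.:Company_Name:.]]></CompanyName>"

def fill_template(template: str, fields: dict) -> str:
    """Hypothetical helper: substitute .:Name:. markers with raw values."""
    def value_for(match):
        value = fields[match.group(1)]
        # "]]>" would end the CDATA section early, so it cannot appear inside.
        if "]]>" in value:
            raise ValueError(f"value for {match.group(1)!r} contains ']]>'")
        return value
    return re.sub(r"\.:(\w+):\.", value_for, template)

xml_text = fill_template(TEMPLATE, {"Company_Name": "Smith & Doe <Incorporated>"})
print(xml_text)
print(ET.fromstring(xml_text).text)  # Smith & Doe <Incorporated>
```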

So does this mean you can place any character you like inside a CDATA section?

In a word, no. The XML specification does not allow every possible character. It excludes the control characters whose value is below that of the space character (hex 20 in Unicode), with the exception of the whitespace characters tab, line feed, and carriage return. So remember that a CDATA section doesn’t give you permission to include any character; it gives you an easy way to include reserved characters that normally have special meaning to an XML parser. If you need to include characters that the XML specification forbids, then a technique like BASE64 encoding is needed (and BASE64 is a topic for a future post).

For now, simply remember that CDATA sections are an easy-to-use and effective way to avoid parsing errors when reserved XML characters appear in the content you compose in your XML.
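To make the rule concrete, the Python sketch below (illustration only) tests each character against the XML 1.0 `Char` production, which allows tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. A NUL (hex 00) is rejected even though “&” is fine. The helper names are hypothetical, and the Base64 step at the end is only one possible fallback; how the receiving side would decode it is left open here.

```python
import base64

def is_xml10_char(ch: str) -> bool:
    """Hypothetical helper: True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def find_illegal_chars(value: str) -> list:
    """Hypothetical helper: (index, hex code point) of each forbidden character."""
    return [(i, f"{ord(ch):#06x}") for i, ch in enumerate(value) if not is_xml10_char(ch)]

value = "Smith & Doe\x00Incorporated"   # contains a NUL (hex 00)
print(find_illegal_chars(value))         # [(11, '0x0000')]

# One possible fallback: send the bytes Base64-encoded instead of as raw text.
encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
print(encoded)
```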

Questions?

Get in touch with our support team!