Monday 9 February 2009

XML and the Copyright symbol, ©

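For anyone parsing an XML file and wondering what in the world an org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence error message means, check whether the XML file contains special characters such as the copyright symbol, ©. Without an encoding declaration the parser reads the file as UTF-8, its default. If the file was actually saved in a single-byte encoding such as ISO-8859-1, the © is stored as the lone byte 0xA9, which is not a valid UTF-8 sequence, and the parser chokes on it.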
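To see why a single © trips up a UTF-8 reader, here is a small sketch that compares how the symbol is stored in each encoding and then decodes the ISO-8859-1 byte strictly as UTF-8. The class and method names (`CopyrightBytes`, `strictUtf8`) are made up for illustration; only standard `java.nio.charset` calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CopyrightBytes {
    // Decodes bytes as UTF-8 and throws on malformed input instead of
    // silently substituting the U+FFFD replacement character.
    static String strictUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9";
        byte[] latin1 = copyright.getBytes(StandardCharsets.ISO_8859_1); // 0xA9
        byte[] utf8 = copyright.getBytes(StandardCharsets.UTF_8);        // 0xC2 0xA9
        System.out.printf("ISO-8859-1: %d byte(s), UTF-8: %d byte(s)%n",
                latin1.length, utf8.length);
        try {
            strictUtf8(latin1);
        } catch (CharacterCodingException e) {
            // 0xA9 is a UTF-8 continuation byte, so it cannot start a character.
            System.out.println("0xA9 on its own is not valid UTF-8: " + e);
        }
    }
}
```

The same byte the XML parser rejects is rejected here, which points at a mismatch between the file's real encoding and the one the parser assumed, not at the © itself.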

I solved my woes by changing the XML declaration to <?xml version="1.0" encoding="iso-8859-1"?>, so that the declared encoding matches the one the file was saved in and the character displays correctly. You can also filter the symbol out or replace it with its numeric character reference, &#169;.
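The fixes above can be sketched with the JDK's built-in DOM parser. This assumes the file's bytes really are ISO-8859-1; the class name `CopyrightXml`, the helper `rootText`, and the `<footer>` sample document are hypothetical. It parses the same content three ways: Latin-1 bytes with no encoding declaration, the same bytes with the iso-8859-1 declaration, and a pure-ASCII version that uses the character reference instead of the raw symbol.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CopyrightXml {
    // Returns the root element's text, or null if the parser rejects the input.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Depending on the parser version, the bad byte may surface as a
            // SAXParseException or as an IOException subclass.
            System.out.println("Rejected: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String body = "<footer>\u00A9 2009</footer>";

        // 1. Latin-1 bytes, no encoding declaration: read as UTF-8, so it fails.
        byte[] undeclared = ("<?xml version=\"1.0\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(undeclared));

        // 2. Same bytes, but the declaration names the encoding actually used.
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(declared));

        // 3. Pure ASCII: the symbol is written as the reference &#169;,
        //    which the parser expands back to U+00A9.
        byte[] escaped = "<?xml version=\"1.0\"?><footer>&#169; 2009</footer>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(rootText(escaped));
    }
}
```

Cases 2 and 3 both yield the text "© 2009". Note that character references are only expanded in text content and attribute values, not inside comments, CDATA sections, or element names.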