lxml will not parse unicode xml #17
cappy123abc wants to merge 1 commit into openlabs:develop from cappy123abc:develop
@cappy123abc Thanks for the PR. Changing the encoding in the XML header is fine, but my guess is that all the content inside the XML is still Unicode text encoded as UTF-8. There is a test file that generates XML and prints it to the terminal. Perhaps we could add one more test with a name containing Unicode characters like |
OK, I'll add a test. I was looking at the documentation at http://lxml.de/parsing.html, where the author writes: "You should generally avoid converting XML/HTML data to unicode before passing it into the parsers. It is both slower and error prone." That seems odd to me, since so much XML must contain international characters and Unicode is the most elegant way to deal with them. What does he expect, ASCII? :)
This is my first pull request... so be nice! I get the following error when trying to parse the returned XML: ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
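For context, this error usually means a `str` that still carries an `<?xml ... encoding="..."?>` declaration was handed to the parser. A minimal sketch of one way around it, assuming the returned XML arrives as a `str`, is to encode it back to bytes before parsing. The helper name `to_parser_bytes` and the sample document are hypothetical and not part of this PR or of lxml; the lxml call is guarded so the snippet also runs where lxml is not installed.

```python
import re

# Hypothetical helper for illustration; not part of lxml or this project.
# Matches the encoding named in a leading XML declaration, if there is one.
_DECL_ENCODING = re.compile(r'^<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']')


def to_parser_bytes(xml_text, default="utf-8"):
    """Return XML as bytes, encoded with its declared encoding (or a default)."""
    if isinstance(xml_text, bytes):
        return xml_text  # already bytes, nothing to do
    match = _DECL_ENCODING.match(xml_text)
    encoding = match.group(1) if match else default
    return xml_text.encode(encoding)


# Sample document (an assumption) with a non-ASCII name in the content.
doc = '<?xml version="1.0" encoding="UTF-8"?><name>Zo\u00eb</name>'
payload = to_parser_bytes(doc)

try:
    from lxml import etree
except ImportError:
    etree = None  # lxml not installed; the helper above still works

if etree is not None:
    # Bytes input lets the parser honour the declaration itself.
    root = etree.fromstring(payload)
    print(root.text)
```

This keeps the declaration and the actual byte encoding in agreement, which seems to be what the lxml documentation quoted above is recommending: hand the parser bytes and let it decode them.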