Here's an actual example:
Correct:
José
Byte stream as downloaded from getTransactionDetails:
4A 6F 73 C3 83 C2 A9
J o s Ã ©
Correct byte stream:
4A 6F 73 C3 A9
J o s é
The getTransactionDetails API's XML header claims that the encoding is UTF-8,
but the API is actually UTF-8-encoding the text twice.
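For illustration, here is a minimal Python sketch (the variable names are mine, and it assumes the intermediate mislabeling is Latin-1, which the byte pattern suggests) showing how "José" turns into the downloaded byte stream when it is encoded twice:

name = "José"
correct = name.encode("utf-8")                      # 4a 6f 73 c3 a9 -- the correct stream
# Treat those UTF-8 bytes as if they were Latin-1 text, then encode them again:
doubled = correct.decode("latin-1").encode("utf-8")
print(doubled.hex(" "))                             # 4a 6f 73 c3 83 c2 a9 -- what the API returns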
One frequent cause of this is a MySQL database whose tables are defined with the
default latin1 charset while the data stored in them is actually UTF-8. MySQL doesn't
care until you SELECT the data as UTF-8, at which point it encodes the already-UTF-8
bytes as UTF-8 again.
Some versions of the mysqldump tool can also write the dump incorrectly,
even if the tables are properly declared with the UTF-8 charset.
When restored from such a dump, the table will contain doubly-UTF-8-encoded text.
Workaround:
Decoding the downloaded "UTF-8" byte stream to Unicode,
then re-encoding the resulting code points as bytes and decoding those bytes as UTF-8 a second time, fixes the problem.
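A minimal Python sketch of that workaround (again assuming the intermediate mislabeling is Latin-1; the names are mine):

raw = bytes.fromhex("4a6f73c383c2a9")            # bytes as downloaded from the API
once = raw.decode("utf-8")                       # first decode -> 'JosÃ©', still mangled
fixed = once.encode("latin-1").decode("utf-8")   # turn the code points back into bytes, decode again
print(fixed)                                     # 'José'

If a character outside Latin-1 ever shows up after the first decode, the encode("latin-1") step will raise an error, which is a useful signal that the text was not double-encoded after all.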
07-12-2011 07:35 AM
Hey there,
Sorry for the delay, but I've got the development teams looking into this one. We'll post more once we hear back from them.
Thanks,
Michelle
Developer Community Manager
07-15-2011 10:39 AM
I was able to confirm that this isn't exactly what is happening, so unfortunately it isn't just a simple adjustment to stop the behavior. You are correct that our database stores the values in a western character set. We are currently configured to expect that all text is sent to us in a western encoding. You should be able to send characters such as é using extended western encoding. It is likely that we will move to support UTF-8 in the future, but it isn't something that we can provide a timeline on.
08-19-2011 04:40 PM