XMLSocket.onData And UTF-8 Zero Bytes
I'm currently playing around with Java and Flash's XMLSocket to get them to talk to each other. The Flash documentation doesn't mention which encoding XMLSocket uses, so I assumed that it sends out its data encoded in UTF-8, just as XML.load expects its XML files to be UTF-8 encoded (and won't take anything else, even if you tell it to!). This seems to be correct, because I've had Java send the string back as UTF-8 and Flash displayed it fine.
However, there is a slight complication. XMLSocket's EOF marker for both input and output is a single zero byte, but certain high-codepoint Unicode characters appear to get encoded in UTF-8 as sequences containing several zero bytes. When I have Java send back a byte sequence containing one of these multi-byte characters, XMLSocket's onData seems to fire for the zero bytes that are part of the UTF-8 encoded string, as well as for the zero-byte EOF marker.
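One way to check where the extra zero bytes come from is to count them on the Java side for each candidate encoding. The following is only a sketch: the class name ZeroByteCheck and its helper methods are made up for illustration, and it assumes the server holds the payload as a Java String before sending it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of any Flash or Java API.
class ZeroByteCheck {
    // Counts how many 0x00 bytes appear when s is encoded with the given charset.
    static int countZeroBytes(String s, Charset cs) {
        int zeros = 0;
        for (byte b : s.getBytes(cs)) {
            if (b == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    // Formats bytes as space-separated, two-digit lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "abc" followed by U+5639 and U+563B, written as escapes so the
        // source file's own encoding does not matter.
        String payload = "abc\u5639\u563b";
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE}) {
            System.out.println(cs + ": " + toHex(payload.getBytes(cs))
                    + " (zero bytes: " + countZeroBytes(payload, cs) + ")");
        }
    }
}
```

By the UTF-8 definition, only U+0000 itself encodes to a 0x00 byte, so a non-zero count for the payload would suggest that some other encoding, for example UTF-16, is being used somewhere along the way.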
Short example: say I have Flash send "abc" over XMLSocket. What it actually sends is "abc" followed by the zero-byte EOF marker. When encoded as UTF-8, the byte sequence (in hex) for this string is:
Code:
61 62 63 00

My Java server receives this sequence successfully and echoes it back to Flash identically. So Flash then receives:

Code:
61 62 63 00

XMLSocket reads this, fires onData when it hits the last zero byte, and all is fine.
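The zero-terminated exchange described above could be sketched on the Java side roughly as follows. This is a minimal illustration and not the actual server code: the class ZeroFraming and the methods frame and readMessage are hypothetical names, and it assumes that messages are UTF-8 text terminated by a single zero byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper illustrating zero-byte message framing.
class ZeroFraming {
    // Encodes the message as UTF-8 and appends one terminating zero byte.
    static byte[] frame(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        // copyOf pads the extra slot with 0, which serves as the terminator.
        return Arrays.copyOf(body, body.length + 1);
    }

    // Reads bytes up to (not including) the next zero byte and decodes them as UTF-8.
    // Returns null if the stream ends before any byte of a new message arrives.
    static String readMessage(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == 0) {
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            buf.write(b);
        }
        // Stream ended without a terminator: return what was read, or null if nothing.
        return buf.size() == 0 ? null : new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Simulate the echo: frame "abc", then read it back from a stream.
        InputStream in = new ByteArrayInputStream(frame("abc"));
        System.out.println(readMessage(in));
    }
}
```

An echo server built this way would pass each string returned by readMessage back through frame before writing it to the socket's output stream.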
But when I have Flash send a string that contains one or more high-codepoint Unicode characters, say "abc嘹嘻", then it ends up sending "abc嘹嘻