I'd look at the portion of the Python code that transmits the (UTF-8) string over the TCP socket and ensure that the translation is happening correctly. I don't know offhand what happens when a Unicode string is passed to a socket write call. You mention you changed String_write(), but did you also change String_read() to examine the returned string and treat it as Unicode where appropriate?
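For what it's worth, the usual failure mode at a socket boundary is relying on the interpreter's default (ASCII) codec instead of encoding and decoding explicitly. A minimal sketch of the symmetric pair of conversions involved; the helper names here are mine for illustration, not Pyxi's API:

```python
# Sketch: encode explicitly at the socket boundary rather than relying on
# the interpreter's default codec. The helper names are illustrative only.

def encode_for_wire(text):
    """Turn a Unicode string into the UTF-8 bytes that go on the wire."""
    return text.encode("utf-8")

def decode_from_wire(raw):
    """The symmetric step a String_read() would need: bytes back to Unicode."""
    return raw.decode("utf-8")

wire = encode_for_wire(u"stra\u00dfe")   # German 'strasse' with sharp s
assert wire == b"stra\xc3\x9fe"          # sharp s becomes two bytes in UTF-8
assert decode_from_wire(wire) == u"stra\u00dfe"
```

The point is that both directions must agree: if the write side encodes to UTF-8 but the read side hands the raw bytes back as if they were characters, every non-ASCII character comes back mangled.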
As for how to support it in the bigger scheme: the original Xanadu developers believed it ought to be transparent to the backend. To indicate whether the byte sequence in a particular document is 8- or 16-bit, or encoded in some manner, the front-end would add a link saying so. Of course, all front-ends must then query for and respect that link, but no such standardization has been done yet.
I wonder whether 16-bit characters ought to be handled with a distinct resource type (1 = bytes, 2 = links, 3 = words), so that it isn't even possible to address the bytes out of phase, as you could under a link-based scheme. I wouldn't use a different resource type for each encoding, though; just one for each physical chunk size.
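To illustrate the out-of-phase hazard in modern Python terms (this is a demonstration, not Xanadu code): if 16-bit text is stored as individually addressable bytes, a byte-granular span starting at an odd offset can still have even length, so it decodes without error into entirely different characters:

```python
# Demonstration (assumption: 16-bit text stored as raw little-endian bytes).
raw = u"hello".encode("utf-16-le")       # 10 bytes, 2 per character
assert raw == b"h\x00e\x00l\x00l\x00o\x00"

# A span taken one byte out of phase still has even length, so it decodes
# cleanly -- but every 16-bit code unit is now wrong.
shifted = raw[1:9].decode("utf-16-le")
assert shifted != u"hell"
```

A words-granularity resource type would make such a span unaddressable in the first place, rather than merely discouraged.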
-Jeff

Aaron Bingham wrote:
Hello,

Has anyone looked at supporting this? I looked at it briefly yesterday and came across a couple of problems in Pyxi, but after fixing those I was still getting incorrect data from the backend, so there must be a deeper issue here.

To get Pyxi to handle non-ASCII characters, I had to change x88.XuConn.write() to pass both regular and Unicode strings to String_write(). I also had to change the default encoding to utf-8. After doing this I was able to insert some German text, but when I reloaded it, regular ASCII characters had been substituted. For example, LATIN SMALL LETTER SHARP S (223) became LATIN CAPITAL LETTER C (67).

Any ideas?

Regards,
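A hedged observation on the symptom reported above: the UTF-8 encoding of LATIN SMALL LETTER SHARP S (223) is the byte pair 0xC3 0x9F, and 0xC3 with its high bit stripped is exactly 0x43, which is 'C' (67). So the substitution is consistent with something in the path masking the byte stream to 7 bits. A quick check of the arithmetic:

```python
# Hypothesis check: does 7-bit truncation of the UTF-8 lead byte explain
# sharp s (223) coming back as 'C' (67)?
encoded = u"\u00df".encode("utf-8")      # LATIN SMALL LETTER SHARP S
assert encoded == b"\xc3\x9f"            # a two-byte UTF-8 sequence
assert encoded[0] & 0x7F == ord("C")     # lead byte with high bit stripped
```

If that hypothesis holds, the place to look would be any layer between Pyxi and the backend that assumes 7-bit-clean text.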