Uniface 9 and UTF8 multibyte chars

Written by -GHAN- // Tags: utf8 params webinfo input multibyte ajax post get

Strange things happen out on the web. Everybody opens up to make international character sets possible. Web pages are shown in a bunch of languages and you can even google in Klingon!
Although Germans seem to have problems with speaking languages other than German, I'm fine with that and try to produce code which doesn't care whether the characters being transferred back and forth are standard, Chinese, Hebrew, Danish or whatever is needed.

Lately I found this little problem while working on my new look-up widgets. In the main column there was a "mju" (µ, char code hex B5). I found it interesting and tried to search for that value in the HTML page to double-check my widget.
After typing this char with [CTRL]+[M] (on my keyboard), the search result didn't show the expected value. Something had gone wrong there ...

UTF8, Unicode, Uniface?

The HTML code was set up to be UTF8, so the transmitted characters got encoded that way. This makes it a normal string with some hex values, each introduced by a % sign. But hey, what had happened to the mju? ... Shouldn't this become a %B5? Apparently not! The browsers turned it into a two-byte sequence, %C2%B5. And as Uniface received those values, it took them as two characters ;)
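To make the %C2%B5 a bit more tangible, here is a minimal JavaScript sketch of what happens to the mju when it is percent-encoded as UTF8. The helper name utf8Bytes is made up for this illustration and is not part of any toolkit.

```javascript
// The micro sign (mju), Unicode code point U+00B5.
const mju = "\u00B5";

// encodeURIComponent percent-encodes the UTF-8 bytes of the string.
const encoded = encodeURIComponent(mju);
console.log(encoded); // %C2%B5

// Hypothetical helper: list the UTF-8 bytes of a string.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// One character, but two bytes on the wire.
const bytes = utf8Bytes(mju);
console.log(bytes.map(b => b.toString(16).toUpperCase()).join(" ")); // C2 B5
```

So the single byte B5 is the code point, while C2 B5 is how UTF8 writes that code point down.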

I started to look at the code tables and found that this little char was UTF8 coded, while Uniface treated each byte as a Unicode character of its own. When I took a sharp look at the HTTP request header, it got pretty clear (to me) what happens. The browsers declare the request as UTF8 and transmit it. Uniface doesn't recognize that declaration when getting data from the WRD, as the WRD seems to drop that information :) So Uniface assumes normal single-byte chars and decodes each of them to Unicode. In detail: the transmission says "Content-Type: text/html; charset=UTF8", but if you debug at that point, that information has vanished. Maybe this could be the reason.
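The effect can be imitated in plain JavaScript. This is only a sketch of the symptom, assuming the receiver turns every percent-escaped byte into one character; it says nothing about how Uniface or the WRD actually work inside. The helper decodeBytewise is a made-up name for this illustration.

```javascript
// Hypothetical helper: decode percent-escapes one byte at a time,
// treating every byte as a character of its own (no UTF-8 awareness).
function decodeBytewise(text) {
  return text.replace(/%([0-9A-Fa-f]{2})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

const wire = "%C2%B5"; // what the browser sends for the mju

// Byte-wise reading: two characters, U+00C2 followed by U+00B5.
const wrong = decodeBytewise(wire);
console.log(wrong.length); // 2

// UTF-8 aware reading: the two bytes collapse into one character again.
const right = decodeURIComponent(wire);
console.log(right.length); // 1
```

The byte-wise result shows up as an "Â" in front of the "µ", which is the kind of strange extra char you may find in your fields.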


As mentioned in a previous version of this contribution, I sent a test set to Amsterdam and they confirmed this effect. After some weeks of digging for the needle in the haystack, a good hint came yesterday.

In my case the µ gets entered into an <input> field. This is passed to a JavaScript string and sent to Uniface via AJAX. I used to have an encodeURIComponent() running there. This JavaScript function converts every kind of extended char to percent-encoded UTF-8 bytes as described above. But for the tests I removed it to be sure about what is transmitted.

Finally a developer in Amsterdam gave me the hint to transfer the inputs with the POST method instead of GET. When transferring with POST, the transmitted content gets a special kind of encoding called 'application/x-www-form-urlencoded'. After telling the GHANIFIED! ToolKit to send the request with POST, Uniface got the chars right and my mju came up as expected.
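For those without the ToolKit, a POST of that kind could look roughly like the sketch below. It is not the ToolKit's code: the helper names buildFormBody and postForm, the field names and the URL are all assumptions made up for this example, and it uses the plain browser XMLHttpRequest object.

```javascript
// Hypothetical helper: build an application/x-www-form-urlencoded body
// from a plain object of field names and values.
function buildFormBody(fields) {
  return Object.keys(fields)
    .map(function (name) {
      return encodeURIComponent(name) + "=" + encodeURIComponent(fields[name]);
    })
    .join("&");
}

// Hypothetical helper: send the fields with POST instead of GET.
// Runs in a browser; the url is whatever your own service expects.
function postForm(url, fields, onDone) {
  const xhr = new XMLHttpRequest();
  xhr.open("POST", url, true);
  xhr.setRequestHeader(
    "Content-Type",
    "application/x-www-form-urlencoded; charset=UTF-8"
  );
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) {
      onDone(xhr.status, xhr.responseText);
    }
  };
  xhr.send(buildFormBody(fields));
}

// The body that would go over the wire for a search field holding the mju.
const body = buildFormBody({ search: "\u00B5", unit: "m" });
console.log(body); // search=%C2%B5&unit=m
```

A call would then be something like postForm("/your/service/url", { search: "µ" }, callback), with the URL replaced by your own. Note that the data now travels in the request body instead of in the query string.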


At the moment Uniface will NOT recognize UTF8 chars which are URL-encoded and transmitted with AJAX and the HTTP GET method. Multibyte chars are interpreted as multiple chars, which corrupts some chars here and there. The problem affects all major browsers except IE. IE seems to transmit single-byte chars and pass them to Uniface. So don't get the creeps when finding some strange chars in your fields - simply be aware of it ;)) As a workaround with Uniface, the best solution is to switch to the POST method.

However, Uniface 9.4 introduces two new functions to handle these issues. $encode() and $decode() are supposed to take care of this. They are as new as can be and not available in my Uniface 9.4RC1 ... ;)

I'll keep you updated on that.


1568 view(s) / 2010-02-26 15:40:07 / LAST UPDATED: 2010-04-09 07:47:25