I am converting UTF-8 to UTF-16 and I want to get the same result on Windows and on Linux.

Note: a lone "0xBF" byte is not valid UTF-8 (it is a continuation byte with no lead byte before it).

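C:
char str[] = { (char)0xBF, 0 };  /* lone continuation byte, invalid as UTF-8 */
/* nLen8: input length in bytes; pszUtf16 / nLen16: output buffer and its size in WCHARs */
MultiByteToWideChar(CP_UTF8, 0, str, nLen8, pszUtf16, nLen16);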


C#:
byte[] abCH = new byte[] { 0xBF };
string str = System.Text.Encoding.UTF8.GetString(abCH);
// Encoding.Unicode is UTF-16 little-endian
byte[] abCHUni = System.Text.Encoding.Unicode.GetBytes(str);


On Windows, the C and the C# code give the same result:
"0xBF" is converted to "0xFD 0xFF".
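
For readers wondering where "0xFD 0xFF" comes from: it is the code unit 0xFFFD (U+FFFD REPLACEMENT CHARACTER) stored low byte first, as UTF-16 little-endian does. A minimal sketch in C; the helper name put_utf16le is made up for illustration and is not part of any API mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store one UTF-16 code unit in little-endian byte order. */
static void put_utf16le(uint16_t unit, unsigned char out[2])
{
    assert(out != NULL);
    out[0] = (unsigned char)(unit & 0xFF);  /* low byte first */
    out[1] = (unsigned char)(unit >> 8);    /* high byte second */
}
```

Calling put_utf16le(0xFFFD, bytes) fills bytes with 0xFD followed by 0xFF, which matches the Windows output quoted above.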

On Linux (same executable run under Wine):
"0xBF" is not converted.

On Windows (calling MultiByteToWideChar built from the Wine source code):
"0xBF" is not converted.

Microsoft's and Wine's implementations of "MultiByteToWideChar" do not return the same result for invalid UTF-8 input.

Does anyone have the source code for Microsoft's "MultiByteToWideChar", or code that produces the same result on Windows and Linux for invalid UTF-8 input?

Regards,
Tomice

1 solution

Both behaviours are compliant. The Unicode specification allows a converter to signal an error, to drop the invalid code units, or to substitute a replacement character (usually U+FFFD REPLACEMENT CHARACTER).

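If you need identical behaviour, you can try the MB_ERR_INVALID_CHARS flag. With it, the call should fail on invalid input instead of converting it (on Windows it returns 0 and GetLastError() reports ERROR_NO_UNICODE_TRANSLATION).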
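
If relying on the flag is not an option, another way to get the same "fail on bad input" result on both platforms is to validate the bytes yourself before calling any converter. The following is only a sketch under that assumption: utf8_is_valid is a made-up helper, not a Windows or Wine function, and it is not guaranteed to agree with MB_ERR_INVALID_CHARS on every malformed input. It rejects stray continuation bytes (such as a lone 0xBF), truncated sequences, overlong forms, surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any Windows or Wine API.
 * Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
static int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char b = buf[i];
        size_t need, k;          /* need = continuation bytes expected */
        unsigned long cp, min;   /* min = smallest value allowed for this length */

        if (b < 0x80)                { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { need = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;   /* stray continuation byte (e.g. 0xBF) or 0xF8..0xFF */

        if (len - i - 1 < need) return 0;             /* truncated sequence */
        for (k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0; /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min) return 0;                       /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                  /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate code point */
        i += need + 1;
    }
    return 1;
}
```

A caller would run this first and treat a 0 result as the error case on both Windows and Linux, so the platform converter only ever sees well-formed input.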

The MS code is not public, so it won't help you. A look at the Wine sources shows that the function wine_utf8_mbstowcs (see http://source.winehq.org/source/libs/wine/utf8.c), which is called by MultiByteToWideChar, simply skips invalid codes.
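
If the goal is to get a replacement character on every platform, a self-written decoder is one option. The sketch below is neither the Microsoft nor the Wine implementation; utf8_to_utf16_replace is a hypothetical name, and the policy (one U+FFFD per byte that cannot be decoded) is an assumption chosen for simplicity. It produces 0xFFFD for a lone 0xBF, but for longer malformed sequences the number of replacement characters may differ from what Windows emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the Microsoft or Wine implementation.
 * Decodes src[0..len) as UTF-8 into UTF-16 code units in dst (capacity cap).
 * Each byte that does not start a well-formed sequence becomes one U+FFFD.
 * Returns the number of code units written, or (size_t)-1 if dst is too small. */
static size_t utf8_to_utf16_replace(const unsigned char *src, size_t len,
                                    uint16_t *dst, size_t cap)
{
    size_t i = 0, n = 0;
    assert(src != NULL && dst != NULL);
    while (i < len) {
        unsigned char b = src[i];
        uint32_t cp = 0xFFFD;  /* stays U+FFFD unless a sequence validates */
        uint32_t acc = 0, min = 0;
        size_t need = 0;       /* continuation bytes expected */
        size_t used = 1;       /* bytes consumed; 1 when replacing */

        if (b < 0x80)                { cp = b; }
        else if ((b & 0xE0) == 0xC0) { need = 1; acc = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { need = 2; acc = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { need = 3; acc = b & 0x07; min = 0x10000; }
        /* otherwise: stray continuation byte such as 0xBF, or 0xF8..0xFF */

        if (need > 0 && len - i - 1 >= need) {
            size_t k;
            for (k = 1; k <= need; k++) {
                if ((src[i + k] & 0xC0) != 0x80) break;
                acc = (acc << 6) | (src[i + k] & 0x3F);
            }
            /* accept only complete, shortest-form, non-surrogate, in-range values */
            if (k > need && acc >= min && acc <= 0x10FFFF &&
                !(acc >= 0xD800 && acc <= 0xDFFF)) {
                cp = acc;
                used = need + 1;
            }
        }

        if (cp >= 0x10000) {               /* encode as a surrogate pair */
            if (cap - n < 2) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (cap - n < 1) return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        }
        i += used;
    }
    return n;
}
```

Because the function never calls into the operating system, the same source should give the same output on Windows and Linux; the trade-off is that the replacement policy is yours to define and maintain.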
 
 
Comments
Tomice 21-Jan-15 7:26am    
Thanks a lot for your answer :) It helped me a lot.
I'll use WINE utf8 and add "U+FFFD REPLACEMENT CHARACTER" functionality.

Regards,
Tomice
Jochen Arndt 21-Jan-15 7:33am    
Thank you for the feedback and accepting the solution.
Tomice 21-Jan-15 7:47am    
Do you have a reddit account? I would like to send you some bits.
Jochen Arndt 21-Jan-15 7:58am    
No. Only a rarely used Google+ account. I always wonder how others find the time for social networking.
Tomice 21-Jan-15 8:09am    
Do you have a bitcoin address? I would like to send you some bits.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


