
Useful function for conversion between MBCS and WCS

19 Nov 2010, CPOL
Wrapping WideCharToMultiByte and MultiByteToWideChar
When I program for Windows CE or Windows Mobile, character encoding constantly bothers me. Sometimes I need to display a multibyte character set (a byte array, not a null-terminated string) on a Unicode platform, and MultiByteToWideChar cannot build a wide string from it directly. Sometimes I need to save a wide string to a file as ANSI without the trailing '\0', and WideCharToMultiByte cannot build such a multibyte character set directly either. So I wrote the code below to convert freely between the two encodings and between the two forms (set and string).

/******************************************************
*function:  convert a multibyte character set (or string) to a wide-character set
*param:      pwStr--[out] Receives a buffer allocated with new[] that holds the
*                         converted characters; the caller must delete[] it.
*            pStr--[in] Points to the multibyte character set (or string) to be converted.
*            len --[in] Specifies the size in bytes of the data pointed to by the pStr parameter,
*                       or it can be -1 if the string is null terminated.
*            IsEnd--[in] Specifies whether '\0' is appended to the converted array.
*return: the length of the converted set (or string) in WCHARs, not counting the
*        appended '\0'; 0 on empty input or failure (pwStr is then NULL)
*******************************************************/
int ToWideString( WCHAR* &pwStr, const char* pStr, int len, BOOL IsEnd)
{
    ASSERT_POINTER(pStr, char);
    ASSERT(len >= 0 || len == -1);
    pwStr = NULL;
    if (len == -1)
    {
        // Convert an explicit byte count so the terminator is never part of
        // the conversion and the output buffer size always matches.
        len = (int)strlen(pStr);
    }
    if (len == 0)
    {
        return 0;
    }
    // First call only measures the required number of WCHARs.
    int nWideLen = MultiByteToWideChar(CP_ACP, 0, pStr, len, NULL, 0);
    if (nWideLen <= 0)
    {
        return 0;
    }
    // new WCHAR[n] takes an element count, not a byte count.
    int nAlloc = IsEnd ? nWideLen + 1 : nWideLen;
    pwStr = new WCHAR[nAlloc];
    ZeroMemory(pwStr, nAlloc * sizeof(WCHAR));
    MultiByteToWideChar(CP_ACP, 0, pStr, len, pwStr, nWideLen);
    return nWideLen;
}
/******************************************************
*function:   convert a wide-character set (or string) to a multibyte character set
*param:      pStr--[out] Receives a buffer allocated with new[] that holds the
*                        converted bytes; the caller must delete[] it.
*            pwStr--[in] Points to the wide-character set (or string) to be converted.
*            len --[in] Specifies the size in WCHARs of the data pointed to by the pwStr parameter,
*                       or it can be -1 if the string is null terminated.
*            IsEnd--[in] Specifies whether '\0' is appended to the converted array.
*return:     the length of the converted set (or string) in bytes, not counting the
*            appended '\0'; 0 on empty input or failure (pStr is then NULL)
*******************************************************/
int ToMultiBytes( char* &pStr, const WCHAR* pwStr, int len, BOOL IsEnd)
{
    ASSERT_POINTER(pwStr, WCHAR);
    ASSERT(len >= 0 || len == -1);
    pStr = NULL;
    if (len == -1)
    {
        // Convert an explicit character count so the terminator is never part
        // of the conversion and the output buffer size always matches.
        len = (int)wcslen(pwStr);
    }
    if (len == 0)
    {
        return 0;
    }
    // First call only measures the required number of bytes.
    int nChars = WideCharToMultiByte(CP_ACP, 0, pwStr, len, NULL, 0, NULL, NULL);
    if (nChars <= 0)
    {
        return 0;
    }
    int nAlloc = IsEnd ? nChars + 1 : nChars;
    pStr = new char[nAlloc];
    ZeroMemory(pStr, nAlloc);
    WideCharToMultiByte(CP_ACP, 0, pwStr, len, pStr, nChars, NULL, NULL);
    return nChars;
}


How to Use
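// Multibyte to wide, with a terminating '\0'
const char *pStr = "test";
WCHAR* pwStr = NULL;
int nWideLen = ToWideString(pwStr, pStr, -1, TRUE);
delete[] pwStr;   // the buffer was allocated with new[]

// Wide to multibyte, with a terminating '\0'
const WCHAR* pwSrc = L"test";
char *pDst = NULL;
int nChars = ToMultiBytes(pDst, pwSrc, -1, TRUE);
delete[] pDst;    // the buffer was allocated with new[]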
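
Because both functions hand back a buffer allocated with new[], every caller has to remember the matching delete[]. The following is a minimal sketch of a variant that returns a std::wstring instead. The name ToWideStd is hypothetical and not part of the article. On Windows it uses the same measure-then-convert calls to MultiByteToWideChar; the non-Windows branch is only an assumption added so the sketch builds elsewhere, and it decodes with std::mbrtowc in the current C locale.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper, not part of the article: converts `len` bytes of
// multibyte text (or the whole null-terminated string when len is -1) and
// returns the result as a std::wstring, so there is no buffer to delete[].
// Returns an empty string on empty input or on a conversion error.
std::wstring ToWideStd(const char* pStr, int len)
{
    if (pStr == NULL) return std::wstring();
    if (len < 0) len = static_cast<int>(std::strlen(pStr));
    if (len == 0) return std::wstring();
#ifdef _WIN32
    // Pass 1 measures, pass 2 converts into the string's own storage.
    int n = MultiByteToWideChar(CP_ACP, 0, pStr, len, NULL, 0);
    if (n <= 0) return std::wstring();
    std::wstring out(static_cast<std::size_t>(n), L'\0');
    if (MultiByteToWideChar(CP_ACP, 0, pStr, len, &out[0], n) != n)
        return std::wstring();
    return out;
#else
    // Assumption for non-Windows builds: decode with the C locale's
    // multibyte encoding, one character at a time.
    std::wstring out;
    std::mbstate_t state = std::mbstate_t();
    const char* p = pStr;
    const char* end = pStr + len;
    while (p < end)
    {
        wchar_t wc = L'\0';
        std::size_t used = std::mbrtowc(&wc, p, static_cast<std::size_t>(end - p), &state);
        if (used == static_cast<std::size_t>(-1) || used == static_cast<std::size_t>(-2))
            return std::wstring();   // invalid or truncated sequence
        if (used == 0) used = 1;     // an embedded '\0' is assumed to take one byte
        out.push_back(wc);
        p += used;
    }
    return out;
#endif
}
```

A call such as ToWideStd("test", -1) yields a four-character wide string, and ToWideStd("test", 2) converts only the first two bytes, which corresponds to the "set" case with IsEnd set to FALSE. The trade-off is that the result always lives in a std::wstring, which may not suit code that must pass ownership of a raw buffer.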

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
China
In my first English lesson, I learned "What's your name? My name is Yubao li", hehe... Here, my name is DotCpp, which comes from the suffix of a C++ file.

I am from the beautiful city of Chongqing, China, famous for being hilly and foggy.

I am interested in C++ programming, but I am still a junior.

If you, like me, also come from China, you can visit my Baidu space:
hi.baidu.com/anglecloudy

Comments and Discussions

mbstowcs, wcstombs?
alejandro29A29-Dec-10 9:25

pasztorpisti9-Dec-10 4:34
One of your function calls is using CP_ACP (the current ANSI code page of the user's Windows) and the other is using CP_UTF8, making the two classes inconsistent. A "widechar" string is not a Unicode string. Some people use these words interchangeably. The Unicode character set is the one that contains all characters (code points). The size of this set is about 1 million. And there are encodings that convert a true Unicode string to a "binary format". These binary formats are usually code pages, UTF-8, UTF-16, or UTF-32. Only in UTF-32 can the unit used by the encoding (a 32-bit integer) represent every Unicode character. So in UTF-32 you don't need tricks like in UTF-8, where more than one encoding unit may be used to encode a Unicode character. Ages ago, in Unicode 1.0, there were no more Unicode characters than could be encoded in a wchar_t (see the UCS-2 encoding for more info). Because of this, a lot of people think that a wchar_t is a Unicode character; however, it is just a unit used within a UTF-16 encoded string, and as of Unicode 2.0 it is possible that not one but two wchar_ts encode a single Unicode character (namely a surrogate pair, which is used to represent Unicode characters above 0xFFFF).
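
To make the surrogate-pair point concrete, here is a minimal sketch of how a single code point maps to one or two UTF-16 units. The function name EncodeUtf16 is hypothetical and chosen only for illustration; the arithmetic follows the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration: encodes one Unicode code point as
// UTF-16 code units. Returns an empty vector for values that are not valid
// Unicode scalar values (above 0x10FFFF or inside the surrogate range).
std::vector<std::uint16_t> EncodeUtf16(std::uint32_t cp)
{
    std::vector<std::uint16_t> units;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    {
        return units;
    }
    if (cp < 0x10000)
    {
        // Fits in a single 16-bit unit.
        units.push_back(static_cast<std::uint16_t>(cp));
    }
    else
    {
        // Needs a surrogate pair: 20 bits split into high and low halves.
        cp -= 0x10000;
        units.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return units;
}
```

For example, U+0041 produces the single unit 0x0041, while U+1F600 produces the pair 0xD83D, 0xDE00. This is why the number of wchar_t units in a UTF-16 string can be larger than the number of characters it represents.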
Use CP_ACP with care, because its behaviour may change from one Windows installation to another depending on the user's locale settings. It is very good when you have to convert strings inside the program, for example for user input, but if you serialize the converted string, for example to a file in "ANSI" CP_ACP encoding, then you don't know which encoding/code page is actually used, and you will be in trouble if someone carries this file to another machine where CP_ACP refers to a different encoding.
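
One way to avoid that ambiguity is to serialize in a fixed encoding such as UTF-8. On Windows, passing CP_UTF8 instead of CP_ACP to WideCharToMultiByte is the usual route. The sketch below is not that API call; it is a small, portable illustration of what a fixed encoding means, using a hypothetical function EncodeUtf8 that always produces the same bytes for the same code point regardless of the machine's locale.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper for illustration: encodes one Unicode code point as
// UTF-8 bytes. Returns an empty string for values that are not valid
// Unicode scalar values (above 0x10FFFF or inside the surrogate range).
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    {
        return out;
    }
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);                          // 1 byte
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, U+00E9 always becomes the two bytes C3 A9 and U+20AC always becomes E2 82 AC, so a file written this way reads back the same on any machine, which is not guaranteed for text written with CP_ACP.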
I have wrote these two help class: class CAnsi { public: C...
qiuchengw2-Dec-10 15:34

a question..
alejandro29A29-Dec-10 9:26
