I have a Hebrew ANSI text file that I need to convert to a Unicode Hebrew file. The conversion runs, but I am not getting the expected output. Please let me know how to do it.

What I have tried:

C++
//code page
int nlanguageCodePage = this->GetCodepage(lpszOldFileName);

while (fgets(chAnsiBuff, NMLANG_MaxNBuf, pFile) != NULL)
{
    sUnicodeBuff = chAnsiBuff;

    //CONVERTING TO UNICODE
    nSize = MultiByteToWideChar(nlanguageCodePage, 0, sUnicodeBuff, -1, NULL, NULL);
    MultiByteToWideChar(nlanguageCodePage, 0, sUnicodeBuff, -1, chUniocodeBuff, nSize);

    // bom at starting
    if (nBOM == 0) { arcOut.Write(&bom, 2); }
    arcOut.WriteString(chUniocodeBuff);

    nBOM++;
}
Updated 15-Nov-16 22:52pm

1 solution

You are using the same buffer for input and output. That won't work. See the documentation of the MultiByteToWideChar function (Windows).

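It should be like this:
C++
// First call: query the required size in wide characters (includes the terminating null).
int nSize = MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, NULL, 0);
LPWSTR sUnicodeBuff = new WCHAR[nSize];
// Second call: convert into the separate output buffer.
MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, sUnicodeBuff, nSize);
// Use sUnicodeBuff here
delete [] sUnicodeBuff;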

However, when the ANSI input buffer has a fixed size, the same size can also be used for the output buffer, because the Unicode string will never have more wide characters than the input string has ANSI characters (bytes):
C++
WCHAR wUnicodeBuff[NMLANG_MaxNBuf];
while (fgets(chAnsiBuff, NMLANG_MaxNBuf, pFile) != NULL)
{
    MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, wUnicodeBuff, NMLANG_MaxNBuf);
 
    // bom at starting
    if (nBOM == 0) { arcOut.Write(&bom, 2); }
    arcOut.WriteString(wUnicodeBuff);
 
    nBOM++;
}

That should work. If the result is not as expected, check the other functions involved, such as arcOut.WriteString(), whether the BOM is correct, and whether your input file is really encoded with the code page nlanguageCodePage.
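
To check the last point independently of the Windows API, the expected result can be computed by hand for a small sample. The following is a minimal, portable sketch, assuming the input is Windows-1255 (code page 1255); it is not part of the original code. The function name DecodeCp1255Subset is hypothetical, and it only handles ASCII and the Hebrew letters alef to tav (bytes 0xE0 to 0xFA, which map to U+05D0 to U+05EA). It also illustrates why, for a single-byte code page, the output never needs more elements than the input has bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decodes only a subset of Windows-1255.
// ASCII bytes (0x00-0x7F) map to themselves, the Hebrew letters
// alef..tav (0xE0-0xFA) map to U+05D0-U+05EA. Every other byte
// becomes U+FFFD, so this is NOT a complete code page table.
std::u16string DecodeCp1255Subset(const std::string& ansi)
{
    std::u16string wide;
    wide.reserve(ansi.size());
    for (unsigned char b : ansi) {
        if (b < 0x80)
            wide.push_back(static_cast<char16_t>(b));
        else if (b >= 0xE0 && b <= 0xFA)
            wide.push_back(static_cast<char16_t>(0x05D0 + (b - 0xE0)));
        else
            wide.push_back(u'\uFFFD');
    }
    // Exactly one UTF-16 code unit per input byte.
    assert(wide.size() == ansi.size());
    return wide;
}
```

For example, the bytes EB E9 F6 E3 decode to U+05DB U+05D9 U+05E6 U+05D3 (כיצד). If MultiByteToWideChar produces something else for the same bytes, it is worth checking whether nlanguageCodePage is really 1255 and whether the file is actually encoded that way.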

[EDIT]
Another possible source of the problem is the arcOut.WriteString() call, if it converts the Unicode string back to ANSI. In that case you can use a binary write instead:
C++
int len = MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, wUnicodeBuff, NMLANG_MaxNBuf);

// bom at starting
if (nBOM == 0) { arcOut.Write(&bom, 2); }
// With -1 as the input length, len counts the terminating null; do not write it to the file.
if (len > 1)
    arcOut.Write(wUnicodeBuff, (len - 1) * sizeof(WCHAR));

nBOM++;

[/EDIT]
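
What the binary write is expected to put into the file can be sketched without arcOut. The following portable sketch builds the file content in memory, assuming UTF-16 little-endian output (the byte order of WCHAR on Windows). AppendUtf16Le and BuildUtf16LeFile are hypothetical helper names and not part of the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append one UTF-16 code unit as two bytes, low byte first (little-endian).
void AppendUtf16Le(std::vector<std::uint8_t>& out, char16_t unit)
{
    out.push_back(static_cast<std::uint8_t>(unit & 0xFF));
    out.push_back(static_cast<std::uint8_t>(unit >> 8));
}

// Build the bytes of a UTF-16LE file: the BOM once at the start,
// then the code units of every line. No terminating nulls are written.
std::vector<std::uint8_t> BuildUtf16LeFile(const std::vector<std::u16string>& lines)
{
    std::vector<std::uint8_t> out;
    AppendUtf16Le(out, 0xFEFF);  // BOM U+FEFF, stored as the bytes FF FE
    for (const std::u16string& line : lines)
        for (char16_t unit : line)
            AppendUtf16Le(out, unit);
    assert(out.size() % 2 == 0); // always whole code units
    return out;
}
```

A correct output file should therefore start with FF FE, and each Hebrew letter should appear as two bytes with 05 as the second one (for example D0 05 for alef). Single bytes in the E0 to FA range in the output would suggest that the text was converted back to ANSI on the way to the file.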
 
Comments
Member 12677926 16-Nov-16 5:34am    
It is still not working. Could you please put together a sample and let me know if it works?
Member 12677926 16-Nov-16 5:34am    
Thanks. Any samples would be appreciated.
Jochen Arndt 16-Nov-16 6:01am    
You should give a more detailed problem description ("not working as expected" does not tell others anything).

I have used MultiByteToWideChar quite often and never had problems (but not with Hebrew so far). So I expect the error source somewhere else.

Is MultiByteToWideChar() returning an error or is the file content not as expected?

What is the value of nlanguageCodePage?

How is WriteString() defined? Is it a library function or written by you?
I'm asking this because that function might convert the passed string back to ANSI using the current code page. If so, use a binary write:
arcOut.Write(wUnicodeBuff, (length_returned_by_MultiByteToWideChar - 1) * sizeof(WCHAR));
I will update my answer regarding this.

Finally you may give a short example (a Hebrew text line, the corresponding hex dump, and the hex dump from the output file).
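
For producing such hex dumps, a small helper like the following may be enough. It is only a sketch: HexDump is a hypothetical name, and the function expects bytes that were already read from the file in binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Format raw bytes as space-separated, two-digit uppercase hex values.
std::string HexDump(const unsigned char* data, std::size_t size)
{
    static const char digits[] = "0123456789ABCDEF";
    assert(data != nullptr || size == 0);
    std::string text;
    for (std::size_t i = 0; i < size; ++i) {
        if (i != 0)
            text.push_back(' ');
        text.push_back(digits[data[i] >> 4]);   // high nibble
        text.push_back(digits[data[i] & 0x0F]); // low nibble
    }
    return text;
}
```

Dumping the first bytes of the input line and of the output file side by side makes it possible to see which step changes the text.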
Member 12677926 16-Nov-16 6:06am    
I ported the translator from ANSI to Unicode. Previously, the locale was kept set to Hebrew, and the characters shown that way differ from the text shown in the Unicode file.
Member 12677926 16-Nov-16 6:10am    
I have an ANSI file that contains some Hebrew characters, as shown below:

ID_TEST ="כיצד מתחברים ניתן לקבוע חריץ באם נחוץ"

When I convert it to Unicode, it shows the same characters as the ANSI file, so the ANSI and Unicode text above are identical. Previously, however, when the locale was set to Hebrew, the text was displayed differently. The text shown with the Hebrew locale and the string converted from ANSI to Unicode are not the same.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


