Which data type should I use to read Chinese and English characters from a stream?

Should I use Byte or Char?
Comments
Lakamraju Raghuram 22-Mar-12 6:59am    
Which programming language are you talking about?
Balaji1982 22-Mar-12 7:06am    
C#

Not byte! How can it be? You should use System.String, which is a Unicode string, so it supports most languages at the same time. When you communicate through the network or any other kind of stream, all the text data is converted to/from an array of bytes anyway, but each character takes a different number of bytes, 1 to 4, because Unicode supports code points in the range 0 to 0x10FFFF. The particular representation depends on the UTF used for serialization. Internally, in memory, .NET (and Windows itself) uses UTF-16LE, where each character takes either one 2-byte word (one System.Char) or two such words, called a surrogate pair. Surrogate pairs are needed for characters beyond the Basic Multilingual Plane (BMP), which covers the first code points, 0 to 0xFFFF (excluding the special ranges reserved for the surrogates themselves).

All UTFs are equivalent in what they can represent. Despite their names showing a number of bits, they all support all code points. In files, they are usually detected by the BOM (byte order mark). Please see:
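To make the sizes above concrete, here is a minimal sketch that counts UTF-8 bytes and UTF-16 words for three sample characters. The class and method names (CodeUnitDemo, Utf8ByteCount, Utf16UnitCount) and the sample characters are only illustrative assumptions, not part of any library.

```csharp
using System;
using System.Text;

static class CodeUnitDemo
{
    // Number of bytes the text occupies when serialized as UTF-8.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Number of 16-bit words (System.Char values) the text occupies in memory.
    public static int Utf16UnitCount(string text)
    {
        return text.Length;
    }

    static void Main()
    {
        // Latin letter, Chinese character inside the BMP, Chinese character outside the BMP.
        string[] samples = { "A", "\u4E2D", "\U00020000" };
        foreach (string s in samples)
        {
            Console.WriteLine("U+{0:X}: {1} UTF-8 byte(s), {2} UTF-16 word(s)",
                char.ConvertToUtf32(s, 0), Utf8ByteCount(s), Utf16UnitCount(s));
        }
        // Prints:
        // U+41: 1 UTF-8 byte(s), 1 UTF-16 word(s)
        // U+4E2D: 3 UTF-8 byte(s), 1 UTF-16 word(s)
        // U+20000: 4 UTF-8 byte(s), 2 UTF-16 word(s)
    }
}
```

The last sample shows why a single Char is not always a whole character: a code point beyond the BMP occupies two Char values in a string.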

http://en.wikipedia.org/wiki/Unicode/,
http://en.wikipedia.org/wiki/Code_point/,
http://en.wikipedia.org/wiki/Byte_order_mark/,

http://unicode.org/,
http://unicode.org/faq/utf_bom.html.
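As a rough illustration of BOM detection, the sketch below lets System.IO.StreamReader pick the encoding from the BOM and falls back to UTF-8 when there is none. The class name BomDemo, the helper ReadAllText and the sample text are assumptions made up for this example; the in-memory stream just stands in for a file.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Reads all text from a stream. StreamReader inspects the first bytes for a BOM
    // and switches encoding accordingly; without a BOM it uses the UTF-8 fallback.
    public static string ReadAllText(Stream stream, out Encoding detected)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            string text = reader.ReadToEnd();
            detected = reader.CurrentEncoding; // only meaningful after the first read
            return text;
        }
    }

    static void Main()
    {
        // Build a fake "file": UTF-16LE BOM (FF FE) followed by UTF-16LE text.
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes("Hi \u4E2D\u6587");
        byte[] file = new byte[bom.Length + body.Length];
        bom.CopyTo(file, 0);
        body.CopyTo(file, bom.Length);

        Encoding detected;
        string text = ReadAllText(new MemoryStream(file), out detected);
        Console.WriteLine(text == "Hi \u4E2D\u6587"); // True
        Console.WriteLine(detected.EncodingName);
    }
}
```

Note that a BOM is optional, so data without one still needs an agreed encoding between sender and receiver.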

[EDIT]

In memory, you always work with strings. When you need to pass them over the network or persist them in a file, you choose some encoding, which represents the text as an array of bytes and vice versa. You need to choose only one of the UTFs. Prefer UTF-8. To do it directly, use the class System.Text.Encoding and/or its derived classes for each particular encoding. Please see:
http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx.

You can directly use the methods GetBytes (text to array of bytes), GetChars (array of bytes to Unicode characters) and GetString (array of bytes straight to a string).

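For example, to get a string from an array of bytes:
C#
byte[] data = //let's say, received from network...

//...
// Decode the UTF-8 bytes into a string; same result as
// new string(System.Text.Encoding.UTF8.GetChars(data)).
string value = System.Text.Encoding.UTF8.GetString(data);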
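When the bytes arrive from a stream in pieces, one multi-byte character can be split between two reads, so decoding each piece separately can corrupt it. The following is only a sketch of one way to handle that, assuming the data is UTF-8: a System.Text.Decoder keeps the incomplete bytes between calls. The names StreamTextDemo and ReadUtf8, the chunk size and the sample text are hypothetical; wrapping the stream in a StreamReader would be a simpler alternative.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamTextDemo
{
    // Decodes UTF-8 text from a stream in small chunks. The Decoder keeps state
    // between calls, so a character whose bytes straddle two chunks is still
    // decoded correctly.
    public static string ReadUtf8(Stream stream, int chunkSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        byte[] bytes = new byte[chunkSize];
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        var result = new StringBuilder();

        int read;
        while ((read = stream.Read(bytes, 0, bytes.Length)) > 0)
        {
            int charCount = decoder.GetChars(bytes, 0, read, chars, 0);
            result.Append(chars, 0, charCount);
        }
        return result.ToString();
    }

    static void Main()
    {
        string original = "Hello \u4E16\u754C"; // English plus two Chinese characters
        byte[] data = Encoding.UTF8.GetBytes(original); // 12 bytes

        // A chunk size of 4 splits the 3-byte Chinese characters across reads.
        string decoded = ReadUtf8(new MemoryStream(data), 4);
        Console.WriteLine(decoded == original); // True
    }
}
```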


—SA
 
Comments
Espen Harlinn 22-Mar-12 13:38pm    
Good reply :-D
Sergey Alexandrovich Kryukov 22-Mar-12 13:39pm    
Thank you, Espen.
--SA
ProEnggSoft 22-Mar-12 14:20pm    
Good explanation and links. 5!
Sergey Alexandrovich Kryukov 22-Mar-12 14:28pm    
Thank you.
--SA
Balaji1982 24-Mar-12 2:54am    
Hi SA,

Thanks for the info. If I have a byte array with English and Chinese characters in it, how do I preserve the encoding and at the same time extract the important information from the array?

Can I convert it into a char array and then into a string, or is there any other way to do this safely?
Hi,


Use NVARCHAR on the database side for inserting the data, and on the front end, when passing the value as a literal, you need to add the N prefix before the value!

See the link below!

http://forums.asp.net/t/1427585.aspx/1?C+datatype+for+all+world+languages

Happy coding!!!
 
Comments
Uday P.Singh 25-Mar-12 5:54am    
Where did the OP ask about database insertion?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


