Why won't Arabic words convert to bytes?

Question



Hi, I should mention that I'm Iraqi, so of course my application contains some Arabic text.
My application is a client-server chat. As you know, before sending anything over TCP we have to convert the string to an array of bytes.

C#

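byte[] DataBytes = new byte[Message.Length + ("<EOP>").Length + 1];
for (int x = 0; x < Message.Length; x++)
{
    // Convert.ToByte(char) throws OverflowException for any char above 255,
    // which includes every Arabic letter
    DataBytes[x] = Convert.ToByte(Message[x]);
}

// Index Message.Length is never written, so a 0 byte precedes "<EOP>"
DataBytes[Message.Length + 1] = Convert.ToByte('<');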
DataBytes[Message.Length + 2] = Convert.ToByte('E');
DataBytes[Message.Length + 3] = Convert.ToByte('O');
DataBytes[Message.Length + 4] = Convert.ToByte('P');
DataBytes[Message.Length + 5] = Convert.ToByte('>');

clientSocket.Send(DataBytes);

When I type an English word (like "Hello") and send it, the send succeeds.
But when I type an Arabic word (like "هلو") and send it, an error occurs saying "Value was either too large or too small for an unsigned byte".

What does this mean?
And how can I solve it?

Posted 10-Jan-13 9:11am

Rasool Ahmed

Updated 2-Jun-18 9:38am


Comments

Sergey Alexandrovich Kryukov 10-Jan-13 16:37pm

Do you understand that no Arabic character can fit in a single byte? Not only does your code make no sense; the whole idea is wrong.
First of all, get some idea of what Unicode is...
—SA
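To see what the error message means, here is a small sketch (the class and method names are illustrative, not from the thread) that tries the same Convert.ToByte call as the question on each character of "Hello" and "هلو". A .NET char is a UTF-16 code unit; Convert.ToByte(char) only accepts values 0 to 255 and throws an OverflowException for anything larger, while the three Arabic letters here are U+0647, U+0644 and U+0648.

```csharp
using System;

public static class ByteOverflowDemo
{
    // Returns true if Convert.ToByte(c) succeeds, false if it throws OverflowException.
    public static bool FitsInByte(char c)
    {
        try
        {
            Convert.ToByte(c);
            return true;
        }
        catch (OverflowException)
        {
            // Same exception as in the question:
            // "Value was either too large or too small for an unsigned byte."
            return false;
        }
    }

    public static void Main()
    {
        // "Hello" followed by the Arabic word "هلو", written as escapes
        foreach (char c in "Hello\u0647\u0644\u0648")
        {
            // Print each UTF-16 code unit as U+XXXX and whether it fits in one byte
            Console.WriteLine($"U+{(int)c:X4} fits in a byte: {FitsInByte(c)}");
        }
    }
}
```

The Latin letters all report True and the Arabic ones False, which is exactly where the loop in the question fails.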

2 solutions

Solution 1

You can try this:

C#

string s = ... // Your string source here
byte[] bytes = System.Text.Encoding.ASCII.GetBytes(s);

I can't guarantee it, though; it may work if you use a different encoding.
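Why ASCII produces the "???" that the original poster reports: Encoding.ASCII only covers code points 0 to 127, and by default it replaces every character it cannot encode with '?' (byte 0x3F). The information is lost at encoding time, so no decoding on the server can bring the Arabic text back. A minimal sketch of that round trip (the names are illustrative):

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Encode with ASCII, then decode with ASCII, as in Solution 1 and its reverse
    public static string AsciiRoundTrip(string s)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(s); // unencodable chars become 0x3F ('?')
        return Encoding.ASCII.GetString(bytes);
    }

    public static void Main()
    {
        string arabic = "\u0647\u0644\u0648"; // "هلو"
        Console.WriteLine(AsciiRoundTrip("Hello")); // Hello
        Console.WriteLine(AsciiRoundTrip(arabic));  // ???
        Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes(arabic))); // 3F-3F-3F
    }
}
```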

Posted 10-Jan-13 9:14am

OriginalGriff

Comments

[no name] 10-Jan-13 15:21pm

Buddy, I have tried it, but when the server receives it, it looks like "???" instead of "هلو".

OriginalGriff 10-Jan-13 15:29pm

Did you use the reverse process to convert it back? Remember, bytes are only 8-bit quantities, while Unicode characters are (generally) 16 or 32 bits, and a character can be spread over several "codepoints" (Wikipedia can help if you want to understand this).
So the characters have to be "translated" to bytes and then back again to reassemble the original input.
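One practical detail of "translating back" over TCP: a single Receive call can end in the middle of a multi-byte character. The sketch below assumes UTF-8 on both sides and simulates a split read by cutting the bytes by hand (SplitAt and the other names are hypothetical). Decoding each chunk separately with Encoding.UTF8.GetString corrupts the letter at the split, while one stateful System.Text.Decoder from Encoding.UTF8.GetDecoder() keeps the incomplete bytes until the next chunk arrives.

```csharp
using System;
using System.Text;

public static class SplitReadDemo
{
    // Decodes each chunk on its own (wrong when a character is split between chunks)
    public static string DecodeNaively(byte[][] chunks)
    {
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
            sb.Append(Encoding.UTF8.GetString(chunk));
        return sb.ToString();
    }

    // Decodes all chunks with one Decoder, which buffers a partial sequence between calls
    public static string DecodeWithDecoder(byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }

    // Hypothetical helper that simulates two socket reads
    public static byte[][] SplitAt(byte[] data, int at)
    {
        byte[] first = new byte[at];
        byte[] second = new byte[data.Length - at];
        Array.Copy(data, 0, first, 0, at);
        Array.Copy(data, at, second, 0, second.Length);
        return new[] { first, second };
    }

    public static void Main()
    {
        string original = "\u0647\u0644\u0648";         // "هلو"
        byte[] all = Encoding.UTF8.GetBytes(original);  // 6 bytes, 2 per letter
        byte[][] chunks = SplitAt(all, 3);              // split inside the second letter
        Console.WriteLine(DecodeNaively(chunks) == original);     // False
        Console.WriteLine(DecodeWithDecoder(chunks) == original); // True
    }
}
```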

[no name] 10-Jan-13 15:49pm

I have reversed the process on the other side, but the result is "???" instead of the word above.

OriginalGriff 10-Jan-13 15:54pm

What code did you use to reverse the process?
Did you try other encoding?

Sergey Alexandrovich Kryukov 10-Jan-13 16:44pm

Not exactly. A character can of course be spread across several "words" (code units), such as 8-bit ones in UTF-8 or 16-bit ones in a UTF-16 surrogate pair, but it is still one code point. That term means an abstract ordinal number, unrelated to the machine representation, which corresponds one-to-one to a character (itself an abstract cultural notion, unrelated to glyphs, graphics, etc.).

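(Also, I'm not talking about combining diacritics, where what the reader sees as one character is a combination of characters, and hence of code points. As far as I remember, the basic Arabic letters are not composed this way; the optional vowel marks (harakat), however, are combining characters.)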

—SA
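To make the difference between code units, code points and user-perceived characters concrete, here is a sketch (names are illustrative) that counts all three for "هلو", for an emoji outside the Basic Multilingual Plane, and for the Arabic letter beh (U+0628) followed by the combining vowel mark fatha (U+064E). Code points are counted by skipping the low half of each surrogate pair; user-perceived characters are the text elements reported by System.Globalization.StringInfo.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CountDemo
{
    // Number of code points in a UTF-16 string: a surrogate pair counts once
    public static int CodePointCount(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
                i++; // skip the low surrogate of the pair
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string[] samples =
        {
            "\u0647\u0644\u0648", // "هلو": three letters, one code point each
            "\U0001F600",         // grinning face emoji: one code point, a surrogate pair in UTF-16
            "\u0628\u064E",       // beh + fatha: two code points, one text element
        };
        foreach (string s in samples)
        {
            Console.WriteLine(
                $"UTF-16 units: {s.Length}, code points: {CodePointCount(s)}, " +
                $"text elements: {new StringInfo(s).LengthInTextElements}, " +
                $"UTF-8 bytes: {Encoding.UTF8.GetByteCount(s)}");
        }
    }
}
```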

[no name] 10-Jan-13 17:33pm

Look at my solution; I have used this in the past.

[no name] 10-Jan-13 15:57pm

I have used this:
byte[] bytes; //which comes from the client.
string s = System.Text.Encoding.ASCII.GetString(bytes);

Sergey Alexandrovich Kryukov 10-Jan-13 16:48pm

What are you doing?! Forget about ASCII! You are working with Arabic, and none of its characters fit in ASCII, which is only 7-bit.
Every Arabic character needs at least 16 bits!

Forget ASCII forever; it's gone. Not even all English text fits into ASCII: only the alphabet and part of the punctuation do, while some punctuation, such as '—' or '«»', is already beyond it...

You need to use one of the UTFs, most commonly UTF-8; all UTFs are equivalent in the text they can represent. Unlike ASCII.

Honestly, you are the one from Iraq; why should I have to explain to you how the Arabic writing system works in computers? You are supposed to know it better than I do.

—SA
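The claim that all UTFs can carry the same text while ASCII cannot can be checked with a short sketch (names are illustrative). Each Unicode encoding built into .NET round-trips the Arabic word together with the punctuation mentioned in the comment; only the number of bytes differs. Encoding.Unicode is .NET's name for UTF-16 (little-endian).

```csharp
using System;
using System.Text;

public static class UtfComparison
{
    // The Arabic word, a space, an em dash (U+2014), a space, and the guillemets « »
    public const string Sample = "\u0647\u0644\u0648 \u2014 \u00AB\u00BB";

    // True if decoding the encoded bytes gives back exactly the same string
    public static bool RoundTrips(Encoding encoding, string s)
        => encoding.GetString(encoding.GetBytes(s)) == s;

    public static void Main()
    {
        Encoding[] encodings = { Encoding.UTF8, Encoding.Unicode, Encoding.UTF32, Encoding.ASCII };
        foreach (Encoding e in encodings)
        {
            Console.WriteLine(
                $"{e.WebName}: {e.GetByteCount(Sample)} bytes, round-trips: {RoundTrips(e, Sample)}");
        }
    }
}
```

UTF-8 is usually the most compact of the three for text that mixes Arabic letters with ASCII markers such as "<EOP>".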

[no name] 10-Jan-13 17:23pm

byte[] bytes = Encoding.UTF32.GetBytes(iString);

Philippe Mori 10-Jan-13 21:37pm

You have the right idea, but ASCII encoding is plainly wrong. I would recommend UTF-8 encoding. At the other end, you do the reverse decoding and you should get the original string back.

UTF-8 is becoming the standard encoding on the web, for example. You can find information about it on Wikipedia.
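Applied to the question's code, the UTF-8 approach could look like the sketch below. It is an illustration under stated assumptions, not the original poster's code: BuildPacket and TryReadMessage are hypothetical helper names, and the socket calls are left out so the example runs on its own. The sender encodes the message and the "<EOP>" marker together; the receiver searches its byte buffer for the marker and decodes only the complete message in front of it. Searching bytes is safe here because "<EOP>" is pure ASCII, and in UTF-8 every byte of a non-ASCII character is 0x80 or above, so the marker can never match inside an Arabic letter.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Utf8Framing
{
    // UTF-8 bytes of the question's end-of-packet marker
    static readonly byte[] Marker = Encoding.UTF8.GetBytes("<EOP>");

    // Sender side: replaces the per-character Convert.ToByte loop
    public static byte[] BuildPacket(string message)
        => Encoding.UTF8.GetBytes(message + "<EOP>");

    // Receiver side: if the buffer holds a complete packet, decode the message,
    // remove the packet from the buffer and return true
    public static bool TryReadMessage(List<byte> buffer, out string message)
    {
        for (int i = 0; i + Marker.Length <= buffer.Count; i++)
        {
            bool match = true;
            for (int j = 0; j < Marker.Length && match; j++)
                match = buffer[i + j] == Marker[j];
            if (match)
            {
                message = Encoding.UTF8.GetString(buffer.GetRange(0, i).ToArray());
                buffer.RemoveRange(0, i + Marker.Length);
                return true;
            }
        }
        message = null;
        return false;
    }

    public static void Main()
    {
        byte[] packet = BuildPacket("\u0647\u0644\u0648"); // "هلو"
        // In the real program: clientSocket.Send(packet) on one side, and the
        // receiver appends the bytes from each Receive() call to its buffer
        var received = new List<byte>(packet);
        if (TryReadMessage(received, out string text))
            Console.WriteLine(text == "\u0647\u0644\u0648"); // True
    }
}
```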

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CafedeJamaica · Accepted Answer · 2013-01-10T11:24:00

Solution 2

Try this:

using System.Text;

// iString is the message to send; the receiver must decode with Encoding.UTF32.GetString
byte[] bytes = Encoding.UTF32.GetBytes(iString);
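This works because UTF-32 can represent every character, provided the server decodes with the same encoding. A sketch of the matching round trip, with iString standing in for the message as in the answer (the Decode helper is hypothetical): decoding with the same encoding restores "هلو", while decoding UTF-32 bytes as UTF-8 silently produces different text. UTF-32 also spends 4 bytes on every character, so this three-letter word becomes 12 bytes.

```csharp
using System;
using System.Text;

public static class Utf32RoundTrip
{
    // Encode on the "client" with one encoding, decode on the "server" with another
    public static string Decode(Encoding sentWith, Encoding readWith, string s)
        => readWith.GetString(sentWith.GetBytes(s));

    public static void Main()
    {
        string iString = "\u0647\u0644\u0648"; // "هلو"

        // Same encoding on both sides: the original text comes back
        Console.WriteLine(Decode(Encoding.UTF32, Encoding.UTF32, iString) == iString); // True

        // Mismatched encodings: no exception, just the wrong characters
        Console.WriteLine(Decode(Encoding.UTF32, Encoding.UTF8, iString) == iString);  // False

        Console.WriteLine(Encoding.UTF32.GetByteCount(iString)); // 12
    }
}
```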

Posted 10-Jan-13 11:24am

CafedeJamaica

Updated 10-Jan-13 11:25am

Comments

[no name] 11-Jan-13 3:41am

Yes, it works.
Thank you, all of you.