Hi guys,

I'm converting an old VB application into C#, and part of the system requires me to convert certain special characters to their ASCII equivalent.

In VB, the code is:

VB
sValue = Asc("œ")  'which gives 156
sValue = Asc("°")  'which gives 176
sValue = Asc("£")  'which gives 163


These are the correct values according to http://www.ascii-code.com/.


But when doing the same conversion in C#, the first of these values gives a strange answer.

Here is the code:

As ints:

C#
int i1 = (int)Convert.ToChar("œ");    // which gives 339
int i2 = (int)Convert.ToChar("°");    // which gives 176
int i3 = (int)Convert.ToChar("£");    // which gives 163

As bytes:

C#
byte i1 = (byte)Convert.ToChar("œ");    // which gives 83
byte i2 = (byte)Convert.ToChar("°");    // which gives 176
byte i3 = (byte)Convert.ToChar("£");    // which gives 163


What gives?! :( I suspect it has something to do with the sign bit, but I can't see what.

Many thanks
Updated 24-Sep-21 5:08am
Comments
Sergey Alexandrovich Kryukov 27-Dec-12 22:23pm    
Who told you it should be ASCII? ASCII won't work for you...
—SA

Richard is right. To get the same bytes in C# as the bytes in VB, use this:
C#
byte i1 = Encoding.Default.GetBytes("œ")[0];

The GetBytes method returns a byte array, and Encoding.Default.GetBytes("œ")[0] gives you its first element.

Hope this helps.
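A caveat on the answer above: Encoding.Default is the system ANSI code page only on .NET Framework. On .NET Core and .NET 5 or later it is always UTF-8, so the same line returns 197 (0xC5, the first UTF-8 byte of œ) instead of 156. The sketch below asks for Windows-1252 explicitly. It assumes .NET Core 3.0 or later (C# 9 top-level statements), where the legacy code pages must first be registered through CodePagesEncodingProvider, and it assumes the original VB code ran under the Windows-1252 ANSI code page. AnsiCode is a hypothetical helper name, not a framework API.

```csharp
using System;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+, where Windows code pages are not
// available until the code-pages provider has been registered.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Assumption: the VB Asc() results came from the Windows-1252 ANSI code page.
Encoding windows1252 = Encoding.GetEncoding(1252);

// Hypothetical helper: the single Windows-1252 byte for a one-character string.
int AnsiCode(string s) => windows1252.GetBytes(s)[0];

int oe = AnsiCode("œ");      // 0x9C = 156
int degree = AnsiCode("°");  // 0xB0 = 176
int pound = AnsiCode("£");   // 0xA3 = 163

Console.WriteLine($"{oe} {degree} {pound}"); // 156 176 163
```

Naming the code page explicitly makes the result independent of both the runtime and the machine's regional settings.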
 
Comments
Nick Fisher (Consultant) 28-Dec-12 5:45am    
Yes, this works now. Many thanks. Nick
Thomas Daniels 28-Dec-12 7:45am    
You're welcome!
Deki syahputra 19-Apr-15 21:38pm    
Many Thanks Bro.
Hello Nick,

What you refer to as being ASCII is *not* ASCII (see http://en.wikipedia.org/wiki/ASCII).
Only the 7-bit ASCII character encoding is unambiguously defined.

There exist several 8-bit extensions to the original 7-bit encoding.

Your page claims to list œ as part of Latin-1. But reading carefully, the page says

[...] The extended ASCII codes (character code 128-255)
There are several different variations of the 8-bit ASCII table. The table below is according to ISO 8859-1, also called ISO Latin-1. Codes 129-159 contain the Microsoft® Windows Latin-1 extended characters. [...]


Microsoft decided some years ago to "modify" the standard to fit their needs. See http://www.cs.tut.fi/~jkorpela/chars.html or, more specifically, http://www.cs.tut.fi/~jkorpela/chars.html#win.

Standard Latin-1 does *not* contain œ. It is included in Latin-9 (also known as ISO/IEC 8859-15); see also ISO Latin 9 as compared with ISO Latin 1 and http://en.wikipedia.org/wiki/ISO/IEC_8859-15.
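The difference between the three character sets can be checked directly. This sketch assumes .NET Core 3.0 or later (C# 9 top-level statements) with the code-pages provider registered; it uses code page 28591 for ISO-8859-1, 28605 for ISO-8859-15 and 1252 for Windows "Western European".

```csharp
using System;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+; ISO-8859-15 and Windows-1252 come
// from the code-pages provider, which has to be registered before use.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding latin1 = Encoding.GetEncoding(28591);      // ISO-8859-1 (Latin-1)
Encoding latin9 = Encoding.GetEncoding(28605);      // ISO-8859-15 (Latin-9)
Encoding windows1252 = Encoding.GetEncoding(1252);  // Windows-1252

// Latin-1 has no slot for œ, so encoding and decoding it does not give œ back.
bool latin1RoundTrips = latin1.GetString(latin1.GetBytes("œ")) == "œ";

// Latin-9 and Windows-1252 both contain œ, but at different positions.
byte oeLatin9 = latin9.GetBytes("œ")[0];           // 0xBD = 189
byte oeWindows1252 = windows1252.GetBytes("œ")[0]; // 0x9C = 156

Console.WriteLine($"Latin-1 round trip: {latin1RoundTrips}");
Console.WriteLine($"Latin-9: {oeLatin9}, Windows-1252: {oeWindows1252}");
```

Only the Windows-1252 value matches the 156 that the VB code returned.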

Now, how to solve your issue?
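Neither Latin-1 nor Latin-9 gives the values your VB code produced on Windows.
You need Encoding.GetEncoding(1252), which on a Windows system whose ANSI code page is 1252 (English and most Western European locales), running .NET Framework, gives the same result as Encoding.Default (as ProgramFOX described in Solution #3).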

Cheers
Andi
 
Comments
Sergey Alexandrovich Kryukov 27-Dec-12 22:19pm    
Exactly. This is some legacy trash called "extended ASCII". Practically, none of the modern systems support it, for a good reason.
Unicode representation of these characters should be used, that's it.
My 5.
—SA
Andreas Gieriet 27-Dec-12 22:43pm    
Hello Sergey,
thanks for your 5!
Cheers
Andi
Nick Fisher (Consultant) 28-Dec-12 5:45am    
Excellent answer, thanks. Nick
Andreas Gieriet 28-Dec-12 8:01am    
You are welcome!
Andi
Espen Harlinn 28-Dec-12 7:40am    
Good guess, a 5 :-D
Use the GetBytes method of the Encoding.ASCII encoding to convert the characters to ASCII.

Best regards
Espen Harlinn
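For reference, here is what Encoding.ASCII actually does with the three characters from the question. ASCII only covers code points 0 to 127, and .NET's Encoding.ASCII uses a replacement fallback by default, so every other character becomes '?' (byte 63). None of the VB values (156, 176, 163) come back. Written as C# 9 top-level statements.

```csharp
using System;
using System.Text;

// Encoding.ASCII only covers code points 0-127; by default its encoder
// replaces anything else with '?' (byte 63) instead of throwing.
byte[] ascii = Encoding.ASCII.GetBytes("œ°£");

Console.WriteLine(string.Join(" ", ascii)); // 63 63 63
```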
 
Comments
Andreas Gieriet 27-Dec-12 21:24pm    
Hello Espen,
this would remove diacritics by mapping the windows code page 1252 characters to 7-bit ASCII instead of converting to unicode encoding. See also Solution #3 and #4.
Cheers
Andi
Espen Harlinn 28-Dec-12 7:39am    
OP asked for ASCII, repeatedly ...

And as you wrote in your answer - you're doing a conversion to code page 1252, which is what OP actually needed, but it wasn't what he asked for.
Andreas Gieriet 28-Dec-12 8:06am    
Hello Espen,
I focussed more on his example code and felt that asking for ASCII is wrong...
It's interesting though, that converting to ASCII results in removing diacritics (œ --> o) - that was new to me.
Cheers
Andi
Sergey Alexandrovich Kryukov 27-Dec-12 22:21pm    
Sorry, but this won't work in this case. You probably answered formally, but did not look at the characters themselves. Please see the correct solution #4 and my comments.
(I did not vote this time.)
—SA
C# uses Unicode rather than ASCII to represent characters and strings.
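A short illustration: a C# char is a UTF-16 code unit, so converting it to int gives its Unicode code point (for characters in the Basic Multilingual Plane). It also explains why only œ looked wrong. U+00A0 to U+00FF coincide with Latin-1, and Windows-1252 agrees with Latin-1 in that range, so ° and £ get the same number either way, while œ (U+0153) lies outside it. Written as C# 9 top-level statements.

```csharp
using System;

// Implicit char -> int conversion yields the UTF-16 code unit value.
int oe = 'œ';      // U+0153
int degree = '°';  // U+00B0
int pound = '£';   // U+00A3

Console.WriteLine($"{oe} {degree} {pound}"); // 339 176 163
```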
 
byte i1 = (byte)Convert.ToChar("œ");

C# uses Unicode, and the Unicode value of 'œ' is 339 (U+0153) in both cases, whether you cast it to int or to byte.

A byte in C# is unsigned and can only hold values from 0 to 255 (2^8 = 256 values), but the Unicode value of "œ" is 339, so it overflows the byte range. Casts are not checked for overflow by default, so the value wraps around instead of raising an error:
339 - 256 = 83
and 83 is what ends up stored in the byte.
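The wrap-around can be reproduced directly. In an unchecked context (the C# default unless overflow checking is turned on, for example with the CheckForOverflowUnderflow project setting), narrowing a char to byte keeps only its low 8 bits, which is the value modulo 256. Subtracting 256 once gives the same result here only because 339 is below 512. Written as C# 9 top-level statements.

```csharp
using System;

char oe = 'œ'; // U+0153 = 339

// Unchecked narrowing keeps only the low 8 bits of the value.
byte truncated = unchecked((byte)oe);

// Both of these compute the same low 8 bits explicitly.
int lowBits = oe & 0xFF;
int remainder = oe % 256;

Console.WriteLine($"{truncated} {lowBits} {remainder}"); // 83 83 83
```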

You can make the conversion check for overflow and underflow instead:
byte i1 = checked((byte)Convert.ToChar("œ"));
With checked, the conversion throws a System.OverflowException at run time, which you can then handle with ordinary exception handling.
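A minimal sketch of handling that exception, written as C# 9 top-level statements; the variable names are illustrative. The character is stored in a non-constant local so that the overflow happens at run time rather than being reported by the compiler.

```csharp
using System;

char oe = 'œ'; // U+0153 = 339, outside the byte range 0-255
bool overflowed = false;

try
{
    // checked makes the narrowing conversion throw instead of truncating.
    byte b = checked((byte)oe);
    Console.WriteLine(b);
}
catch (OverflowException ex)
{
    overflowed = true;
    Console.WriteLine($"Overflow: {ex.Message}");
}
```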
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


