Hello,

I want to extract the Norwegian-language text from the websites below:

Edit: Rohan Leuva
[Links to website removed]


I have prepared a small Windows application with the code below:

string text;

// Dispose the client and the response stream once the page has been read.
using (WebClient web = new WebClient())
using (System.IO.Stream stream = web.OpenRead("http://www.xyz.com/"))
using (System.IO.StreamReader reader = new System.IO.StreamReader(stream, Encoding.Default, true))
{
    text = reader.ReadToEnd();
}

richTextBox1.Text = text;
===============================================================

With the code above, the special characters that appear in Norwegian text are not read correctly, even when using Encoding.Default.

Norwegian special characters:

«
»

å
æ
é
ø
à
æ


Kindly suggest how I should proceed. Also, my ultimate objective is to get plain text (without any tags) from these websites, so any additional help with that would also be appreciated.

Thanks in advance for taking the time to read my issue. I hope to get a suitable solution.

Regards,
Ankit
Posted
Updated 24-May-15 21:21pm
v2
Comments
Thanks7872 25-May-15 3:22am    
Why did you mention the site names? We don't need them at all.
Kenneth Haugland 25-May-15 4:13am    
You consistently spell Norwegian incorrectly, just for the record. There are just three characters that are special to this language, namely æ (Æ), ø (Ø) and å (Å). The « is the opening quotation mark of a quote, and » is the closing quotation mark. You might use " " instead. The rest of the letters are used all over Europe.
http://en.wikipedia.org/wiki/Apostrophe

1 solution

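On the .NET Framework, Encoding.Default is the ANSI code page of the machine the program runs on, so a page served in another encoding is decoded incorrectly. Try Encoding.UTF8 (or Encoding.Unicode if the page is served as UTF-16):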
C#
using (System.IO.StreamReader reader = new System.IO.StreamReader(stream, Encoding.UTF8, true))
{
    text = reader.ReadToEnd();
}
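
To see why the encoding argument matters, here is a minimal sketch that decodes the same UTF-8 bytes twice, once with the right encoding and once with a single-byte code page. The names PageReader, ReadAll and DownloadUtf8 are made up for illustration, ISO-8859-1 only stands in for a Western European ANSI code page, and DownloadUtf8 assumes the site really serves UTF-8.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PageReader
{
    // Decodes a whole stream using the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        // 'true' lets StreamReader switch encoding if a byte order mark is present.
        using (var reader = new StreamReader(stream, encoding, true))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: downloads a page and decodes it as UTF-8.
    // Assumption: the server actually sends UTF-8.
    public static string DownloadUtf8(string url)
    {
        using (var web = new WebClient())
        using (Stream stream = web.OpenRead(url))
        {
            return ReadAll(stream, Encoding.UTF8);
        }
    }

    static void Main()
    {
        // Simulate a UTF-8 response body without touching the network.
        string original = "\u00E6\u00F8\u00E5"; // "æøå"
        byte[] body = Encoding.UTF8.GetBytes(original);

        string asUtf8 = ReadAll(new MemoryStream(body), Encoding.UTF8);
        string asLatin1 = ReadAll(new MemoryStream(body), Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine(asUtf8 == original);   // True: the characters survive
        Console.WriteLine(asLatin1 == original); // False: each character becomes two wrong ones
    }
}
```

If the server declares its charset in the Content-Type response header, reading it from there is likely more robust than hard-coding UTF-8.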
 
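
On the second part of the question (plain text without tags), one rough approach is to strip the markup with regular expressions and then decode the HTML entities. This is only a sketch: the names HtmlText and ToPlainText are invented for illustration, and regex-based stripping will mishandle unusual or malformed markup, so a proper HTML parser library is probably the safer choice for real sites.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlText
{
    // Crude HTML-to-text conversion; assumes reasonably well-formed markup.
    public static string ToPlainText(string html)
    {
        // Drop <script> and <style> elements together with their contents.
        string s = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Drop HTML comments.
        s = Regex.Replace(s, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words stay apart.
        s = Regex.Replace(s, @"<[^>]+>", " ");

        // Turn entities such as &aring; or &amp; back into characters.
        s = WebUtility.HtmlDecode(s);

        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(s, @"\s+", " ").Trim();
    }

    static void Main()
    {
        string html = "<html><head><style>p{color:red}</style></head>" +
                      "<body><p>Bl&aring;b&aelig;r &amp; fl&oslash;te</p></body></html>";

        // Expected plain text: "Blåbær & fløte"
        Console.WriteLine(ToPlainText(html) == "Bl\u00E5b\u00E6r & fl\u00F8te"); // True
    }
}
```

The page still has to be decoded with the correct encoding first; otherwise the stripped text will contain the same garbled characters.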
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


