How do I extract Norwegian-language text (with special characters) from a website correctly?

Question

Hello,

I want to extract Norwegian-language text from the websites below:

Edit: Rohan Leuva
[Links to website removed]

I have written a small Windows application with the following code:

// Requires: using System.IO; using System.Net; using System.Text;
string text;
using (WebClient web = new WebClient())
using (Stream stream = web.OpenRead("http://www.xyz.com/"))
using (StreamReader reader = new StreamReader(stream, Encoding.Default, true))
{
    text = reader.ReadToEnd();
}

richTextBox1.Text = text;
===============================================================

With this code, some of the special characters used in Norwegian are not displayed correctly, even with Encoding.Default.
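A likely cause: on .NET Framework, Encoding.Default is the system's active ANSI code page (Windows-1252 on most Western European Windows installations), while many sites serve UTF-8. UTF-8 stores å, æ and ø as two bytes each, so decoding them with a single-byte code page turns each letter into two wrong characters. The sketch below reproduces this; the class and method names (MojibakeDemo, RoundTrip) are hypothetical, and ISO-8859-1 stands in for Windows-1252 because it is built into every .NET runtime and maps these bytes identically.

```csharp
using System;
using System.Text;

// Hypothetical helper showing what a wrong decoding encoding does to Norwegian text.
public static class MojibakeDemo
{
    // Encode the text as UTF-8 (assumed to be what the server sends),
    // then decode those bytes with the encoding the reader was given.
    public static string RoundTrip(string original, Encoding readAs)
    {
        byte[] bytesOnTheWire = Encoding.UTF8.GetBytes(original);
        return readAs.GetString(bytesOnTheWire);
    }
}
```

For example, RoundTrip("blåbær", Encoding.GetEncoding("ISO-8859-1")) returns "blÃ¥bÃ¦r", while RoundTrip("blåbær", Encoding.UTF8) returns "blåbær" unchanged. Seeing pairs like "Ã¥" in the output is a strong hint that UTF-8 bytes are being read with a single-byte encoding.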

Norwegian special characters:

«
»
…
å
æ
é
ø
à

Kindly suggest how I should proceed. Also, my ultimate objective is to get plain text (without any HTML tags) from these websites, so any help with that would also be appreciated.
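For the plain-text part, one rough approach is to strip the markup with regular expressions and then decode HTML entities, since Norwegian pages often write letters as entities such as &aring;, &aelig;, &oslash; or &laquo;. This is a minimal sketch, not a robust solution: regular expressions cannot parse arbitrary HTML, and a real parser such as HtmlAgilityPack handles malformed markup far better. The names HtmlText and ToPlainText are hypothetical; WebUtility.HtmlDecode is the standard System.Net method.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical regex-based HTML-to-text helper (sketch only).
public static class HtmlText
{
    public static string ToPlainText(string html)
    {
        // Remove <script> and <style> blocks together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words do not merge.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &aring; &oslash; &laquo; into real characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace (including non-breaking spaces) into one space.
        text = Regex.Replace(text, @"\s+", " ");
        return text.Trim();
    }
}
```

Entities are decoded only after the tags are removed, so an escaped &lt;b&gt; in the page's text survives as the literal characters "<b>" rather than being treated as a tag.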

Thanks in advance for taking the time to read my question. I hope to find a suitable solution.

Regards,
Ankit

Posted 24-May-15 21:09pm by ankswe. Updated 24-May-15 21:21pm by Thanks7872 (v2).

Comments

Thanks7872 25-May-15 3:22am

Why did you mention the site names? We don't need them at all.

Kenneth Haugland 25-May-15 4:13am

You consistently misspell Norwegian, just for the record. There are only three characters that are specific to this language, namely æ (Æ), ø (Ø) and å (Å). « is the opening quotation mark and » is the closing quotation mark; you might use " " instead. The rest of the letters are used all over Europe.
http://en.wikipedia.org/wiki/Apostrophe

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Mehdi Gholam · Answer 1 · 2015-05-24T21:37:00

Solution 1

Try Encoding.UTF8 or Encoding.Unicode (UTF-16) instead of Encoding.Default:

C#

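// true = detectEncodingFromByteOrderMarks: a byte order mark at the start of the stream overrides Encoding.UTF8
using (System.IO.StreamReader reader = new System.IO.StreamReader(stream, Encoding.UTF8, true))
{
    text = reader.ReadToEnd();
}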
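Hard-coding Encoding.UTF8 works only if the site actually serves UTF-8. Servers usually declare the encoding in the charset parameter of the HTTP Content-Type header (for example "text/html; charset=ISO-8859-1"), so a more general option is to read that value and fall back to UTF-8 when it is missing. The sketch below assumes that header is present; the names CharsetHelper and FromContentType are hypothetical, and it does not look at a <meta charset> tag inside the page, which some sites use instead.

```csharp
using System;
using System.Text;

// Hypothetical helper: choose a decoding Encoding from a Content-Type header value.
public static class CharsetHelper
{
    // Returns the encoding named by "charset=..." in the header, or UTF-8 when the
    // header is missing, has no charset, or names an encoding this runtime lacks.
    public static Encoding FromContentType(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                {
                    // Strip optional quotes, e.g. charset="utf-8".
                    string name = p.Substring("charset=".Length).Trim().Trim('"');
                    try
                    {
                        return Encoding.GetEncoding(name);
                    }
                    catch (ArgumentException)
                    {
                        // Unknown or unsupported name, e.g. windows-1252 on .NET Core
                        // when CodePagesEncodingProvider has not been registered.
                    }
                }
            }
        }
        return Encoding.UTF8;
    }
}
```

In the question's code, once web.OpenRead(...) has returned, the header value is available as web.ResponseHeaders["Content-Type"], and the result of FromContentType can be passed to the StreamReader in place of Encoding.UTF8.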
