
Remove all HTML tags and display only the plain text inside (in case the XML is not well formed)

15 Feb 2012 | CPOL
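I think a single Regex.Replace followed by HttpUtility.HtmlDecode would do: the regex strips comments and tags, and HtmlDecode turns the remaining entities into plain characters.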

C#
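using System.Text.RegularExpressions;
using System.Web;  // HttpUtility
string html = ...;
// Strip comments and tags (a '>' inside a quoted attribute value does not end the tag), then decode entities.
string textonly = HttpUtility.HtmlDecode(
         Regex.Replace(html, @"<!--[\S\s]*?-->|<(?:"".*?""|'.*?'|[\S\s])*?>", ""));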


Is there any HTML construct that this would not strip properly?
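One way to explore that question is to run the pattern over a handful of inputs. The following is only a sketch: the HtmlText class, the Strip method and the sample strings are hypothetical names and data chosen for illustration, and System.Net.WebUtility.HtmlDecode is swapped in for HttpUtility.HtmlDecode so that the program does not need a reference to System.Web.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlText
{
    // Same pattern as in the tip: comments first, then any <...> tag.
    // Quoted attribute values are consumed as a unit, so a '>' inside
    // them does not end the tag.
    private static readonly Regex TagOrComment = new Regex(
        @"<!--[\S\s]*?-->|<(?:"".*?""|'.*?'|[\S\s])*?>");

    // Hypothetical helper: strip tags and comments, then decode entities.
    public static string Strip(string html)
    {
        // Assumption of this sketch: WebUtility.HtmlDecode stands in for
        // HttpUtility.HtmlDecode to avoid the System.Web dependency.
        return WebUtility.HtmlDecode(TagOrComment.Replace(html, ""));
    }
}

static class Program
{
    static void Main()
    {
        string[] samples =
        {
            "<p>Tom &amp; Jerry</p>",        // plain tag plus entity
            "<a title=\"x > y\">link</a>",   // '>' inside a quoted attribute
            "<!-- <b>hidden</b> -->text",    // comment containing a tag
            "<script>alert(1)</script>Hi",   // script body is not a tag
            "a < b and c > d",               // bare '<' and '>' in text
        };

        foreach (string s in samples)
            Console.WriteLine("[{0}] -> [{1}]", s, HtmlText.Strip(s));
    }
}
```

The first three samples come out as expected (Tom & Jerry, link, text). The last two show constructs that are not handled well: the script tags are removed but the script body alert(1) stays in the output, and in "a < b and c > d" everything from the bare '<' up to the next '>' is treated as a tag and removed. The same applies to the contents of style elements, so a real HTML parser is probably the safer choice when such input is possible.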

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Founder eXternSoft GmbH
Switzerland
I feel comfortable on a variety of systems (UNIX, Windows, cross-compiled embedded systems, etc.) in a variety of languages, environments, and tools.
I have a particular affinity for computer language analysis and testing, as well as quality management.

More information about what I do for a living can be found at my LinkedIn Profile and on my company's web page (German only).
