I think the following combination of Regex.Replace and HttpUtility.HtmlDecode would do:
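    // requires: using System.Text.RegularExpressions; using System.Web;
    string html = ...;
    // Remove comments and tags (quoted attribute values may contain '>'), then decode entities.
    string textonly = HttpUtility.HtmlDecode(
        Regex.Replace(html, @"<!--[\S\s]*?-->|<(?:"".*?""|'.*?'|[\S\s])*?>", ""));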
Is there any HTML construct that this would not strip properly?
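To make that question easier to test, here is a minimal sketch that wraps the same pattern in a helper. The class and method names (HtmlText, Strip) are hypothetical, and I have swapped in System.Net.WebUtility.HtmlDecode for HttpUtility.HtmlDecode only so that the snippet compiles without a System.Web reference; treat it as an illustration rather than a hardened implementation.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Same pattern as above: HTML comments first, then tags whose
    // quoted attribute values may themselves contain '>'.
    private static readonly Regex TagOrComment = new Regex(
        @"<!--[\S\s]*?-->|<(?:"".*?""|'.*?'|[\S\s])*?>");

    // Hypothetical helper. WebUtility.HtmlDecode is an assumption swapped in
    // for HttpUtility.HtmlDecode to avoid depending on System.Web.
    public static string Strip(string html)
    {
        // Remove comments and tags, then turn entities such as &amp; into text.
        return WebUtility.HtmlDecode(TagOrComment.Replace(html, ""));
    }
}
```

With this, HtmlText.Strip("<p>Tom &amp; Jerry</p>") gives "Tom & Jerry", and "<a title=\"a>b\">x</a>" gives "x". One construct it does not handle well: "<script>alert(1)</script>done" comes out as "alert(1)done", because only the tags are removed and the content of script (and likewise style) elements is left in the text.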