The Html Agility Pack is equipped with a utility class called HtmlEntity. It has a static method with the following signature:
public static string DeEntitize(string text)
It supports well-known named entities (like &nbsp;) as well as numeric character references such as &#39;.
Once you've extracted the string from the document, use this method to convert the HTML-encoded entities back to text characters.
Don't HTML-decode the source before trying to load the document; you'll completely change the meaning of the markup.
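As a minimal sketch of that order of operations (load the raw markup first, decode only the extracted text), something like the following should work. The class name DeEntitizeExample, the helper ExtractDecodedText, and the sample markup are made up for illustration; only HtmlDocument, LoadHtml, SelectSingleNode, InnerText, and HtmlEntity.DeEntitize come from the Html Agility Pack.

```csharp
using System;
using HtmlAgilityPack;

public static class DeEntitizeExample
{
    // Hypothetical helper: parses the untouched markup, selects one node,
    // and decodes entities only in the text extracted from that node.
    public static string ExtractDecodedText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // load the source as-is, still entity-encoded

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return null; // nothing matched the XPath expression
        }

        // Decode after extraction, so the parser never sees decoded markup.
        return HtmlEntity.DeEntitize(node.InnerText);
    }

    public static void Main()
    {
        // Sample input (an assumption for this sketch): the encoded <b>
        // is literal text here, not a tag.
        string html = "<p>Tom &amp; Jerry&#39;s &lt;b&gt; tag</p>";
        Console.WriteLine(ExtractDecodedText(html, "//p"));
    }
}
```

Had the sample string been decoded before loading, the parser would have seen a real <b> element inside the paragraph instead of the literal text, which is exactly the change of meaning to avoid.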