To get your input, you may or may not need to parse HTML. (If you generate the HTML from data yourself, then instead of parsing you would generate both the HTML and the PDF from that data.) Ideally, your HTML is well-formed XML, in which case you can parse it with one of the .NET XML parsers. Unfortunately, not all Web pages are like that, so you may need an HTML parser that does not require well-formed XML. Try this one:
http://www.majestic12.co.uk/projects/html_parser.php
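For the well-formed case, here is a minimal C# sketch of reading HTML with a standard .NET XML parser (LINQ to XML). The class name XhtmlReader and the method ParagraphTexts are made up for illustration, and it assumes the input really is well-formed XML; XDocument.Parse throws an XmlException otherwise.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library mentioned in this answer.
public static class XhtmlReader
{
    // Returns the text content of every <p> element in a well-formed (X)HTML string.
    // Throws System.Xml.XmlException if the markup is not well-formed XML.
    public static List<string> ParagraphTexts(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        // Compare on LocalName so the lookup works both with and without
        // the XHTML namespace declared on the root element.
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "p")
                  .Select(e => e.Value) // concatenated text of the element and its children
                  .ToList();
    }
}
```

Calling XhtmlReader.ParagraphTexts on a page gives you the paragraph texts to pass on to the PDF-generation step. Note that named HTML entities such as &nbsp; are not defined in plain XML, so a page that uses them without a DTD will fail to parse this way; that is one more case where a tolerant HTML parser is the better choice.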
To work with PDF, use iText, or its .NET port, iTextSharp:
http://en.wikipedia.org/wiki/IText
http://itextpdf.com/
http://sourceforge.net/projects/itextsharp/
I included the reference to the Java iText site as well, because most of the documentation is there. If you understand C#, the Java-based API documentation should not be difficult to follow.
—SA