Introduction
Before I go into details, I want you to know what EfTidy actually is. EfTidy is a wrapper component for the Tidy library. If you don't know what Tidy is, here is a short description:
"TidyLib is an open source utility for tidying up HTML. Tidy is composed from an HTML parser and an HTML pretty printer. The parser goes to considerable lengths to correct common markup errors. It also provides advice on how to make your pages more accessible to people with disabilities, and can be used to convert HTML content into XML as XHTML. Tidy is W3C open source and available free. It has been successfully compiled on a large number of platforms, and is being integrated into many HTML authoring tools."
- By Mr. Dave Raggett
This is the .NET version of the EfTidyCom component (also present on The Code Project). Before moving further: this library is dedicated to the memory of my mother, the late Mrs. Saroj Gupta, whom I lost recently (29th January, 2008). I just want to say: Mummy, I love you.
There has been a lot of demand for a .NET version of the EfTidyCom library, as COM is losing focus and .NET seems to be the future. This library is written in VC++.NET (mixing managed and unmanaged code). You will find a reference and test cases in this article. Thank you, and please pray that my mother is happy wherever she is.
This is also an updated version of EfTidyCom. Some features (the Node and Attribute classes) have been removed, as I think they are not of much use.
Library Reference
EfTidy contains two classes:
- TidyNet [under the EfTidyNet namespace]
- TidyNetOpt [under the EfTidyNet::EfTidyOpt namespace]
EfTidy also contains four enumerations:
- ECharEncodingType
- EOutputType
- EIndentScheme
- EDoctypeModes
Now, let's take each class one by one.
1. TidyNet
First, let's look at each method and property of this class and what it does:
Property/Method name | Parameters | Get/Put | Description |
TidyFiletoMem | const String^ SFileName, String^ % SResult | n/a | Take input from a file and write the output to memory. |
TidyFileToFile | const String^ SsourceFileName, const String^ SDestFile | n/a | Take input from a file and write the output to a file. |
TidyMemToMem | String^ SsourceData, String^ % SResult | n/a | Take input from a buffer and write the output to memory. |
TidyMemtoFile | String^ SBuffer, String^ SDestFile | n/a | Take input from a buffer and write the output to a file. |
TotalWarnings | long % pVal | Get | Return the total number of warnings produced by any of the above four operations. |
TotalErrors | long % pVal | Get | Return the total number of errors produced by any of the above four operations. |
ErrorWarning | void | n/a (returns String^) | Return the buffer containing the human-readable errors and warnings. |
Option | void | Get (returns EfTidyOpt::TidyNetOpt^) | Access the options object used to configure the Tidy library. |
2. TidyNetOpt
Here is a list of the properties and methods of the TidyNetOpt class:
Property/Method name | Parameter | Get/Put | Description |
LoadConfigFile | String^ | n/a | Load option settings from a configuration file. |
ResetToDefaultValue | Void | n/a | Reset options to default settings. |
Doctype | String^ | Both | Doctype declaration generated by Tidy. |
TidyMark | BOOL | Both | Add a meta element indicating that the document was tidied. |
HideEndTag | BOOL | Both | Suppress optional end tags. |
EncloseText | BOOL | Both | If yes, text in the body is wrapped in <p>. |
EncloseBlockText | BOOL | Both | If yes, text in blocks is wrapped in <p>. |
LogicalEmphasis | BOOL | Both | Replace i with em and b with strong. |
DefaultAltText | String^ | Both | Default text for alt attribute. |
Clean | BOOL | Both | Replace presentational clutter by style rules. |
DropFontTags | BOOL | Both | Discard presentation tags. |
DropEmptyParas | BOOL | Both | Discard empty p elements. |
Word2000 | BOOL | Both | Draconian cleaning for Word 2000. |
FixBadComment | BOOL | Both | Fix comments with adjacent hyphens. |
FixBackslash | BOOL | Both | Fix URLs by replacing \ with /. |
NewEmptyTags | String^ | Both | Declared empty tags. |
NewInlineTags | String^ | Both | Declared inline tags. |
NewBlockLevelTags | String^ | Both | Declared block tags. |
NewPreTags | String^ | Both | Declared pre tags. |
OutputType | EOutputType | Both | Set the output type: XML, XHTML or plain HTML. |
InputAsXML | BOOL | Both | Treat input as XML. |
ADDXmlDecl | BOOL | Both | Add <?xml ?> for XML docs. |
AddXmlSpace | BOOL | Both | If set to yes, adds the xml:space attribute as needed. |
Bare | BOOL | Both | Make bare HTML. |
AssumeXmlProcins | BOOL | Both | If set to yes, PIs must end with ?>. |
CharEncoding | ECharEncodingType | Both | Set/Get in/out character encoding. |
InCharEncoding | ECharEncodingType | Both | Input character encoding (if different). |
OutCharEncoding | ECharEncodingType | Both | Output character encoding (if different). |
NumericsEntities | BOOL | Both | Use numeric entities for symbols. |
QuoteMarks | BOOL | Both | Output " marks as &quot;. |
QuoteNBSP | BOOL | Both | Output non-breaking space as entity. |
QuoteAmpersand | BOOL | Both | Output naked ampersand as &amp;. |
OutputTagInUpperCase | BOOL | Both | Output tags in upper not lower case. |
OutputAttrInUpperCase | BOOL | Both | Output attributes in upper not lower case. |
WrapScriptlets | BOOL | Both | Wrap within JavaScript string literals. |
WrapAttVals | BOOL | Both | Wrap within attribute values. |
WrapSection | BOOL | Both | Wrap within section tags. |
WrapAsp | BOOL | Both | Wrap within ASP pseudo elements. |
WrapJste | BOOL | Both | Wrap within JSTE pseudo elements. |
WrapPhp | BOOL | Both | Wrap within PHP pseudo elements. |
Indent | EIndentScheme | Both | Indent the content of appropriate tags. |
IndentSpace | long | Both | Indentation of n spaces. |
WrapLen | long | Both | Set wrap margin for output. |
TabSize | long | Both | Expand tabs to n spaces. |
IndentAttributes | long | Both | New-line + indent before each attribute. |
BreakBeforeBR | BOOL | Both | Output new-line before <br> or not. |
LiteralAttribs | BOOL | Both | If true, attributes may use new-lines. |
MarkUp | BOOL | Both | |
ShowWarnings | BOOL | Both | Turn warnings on or off. |
Quiet | BOOL | Both | No 'Parsing X', guessed DTD or summary. |
KeepTime | BOOL | Both | If yes, last modified time is preserved. |
ErrorFile | String^ | Both | File name to write errors to. |
GnuEmacs | BOOL | Both | If true, format error output for GNU Emacs. |
FixUrl | BOOL | Both | Applies URI encoding if necessary. |
BodyOnly | BOOL | Both | Output BODY content only. |
HideComments | BOOL | Both | Hides all (real) comments in output. |
DoctypeMode | EDoctypeModes | Both | Sets the doctype mode for output. |
Using the Code
I have used Test.htm (included with the project) to test EfTidyNet's responses. Here is what Test.htm contains:
<html>
<head><title>tidy Library</title></head>
<body>
<blockquote>
<p> </p> --(1)
<p><font size="5" color=
"#FF00FF">Tidy Library</font></p>
</blockquote>
<P><p><font size="5" color="#FF00FF"></font></p>
<table border="1" cellpadding="0" cellspacing="0"
style="border-collapse: collapse"
bordercolor="#111111" width="100%" id="AutoNumber1">
<tr>
<td width="50%" style="border-left-style: solid;
border-left-width: 1; border-right-style: none;
border-right-width: medium; border-top-style: solid;
border-top-width: 1; border-bottom-style:
none; border-bottom-width: medium"> --(2)
</td>
<td width="50%" style="border-left-style: none;
border-left-width: medium; border-right-style:solid;
border-right-width: 1; border-top-style: solid;
border-top-width: 1;border-bottom-style: none;
border-bottom-width: medium">
</td>
</tr>
</table>
<b>Tidy --- (3)
</h1> <tidy> ---(4)
</body>
</html>
In Test.htm, I have added the following mistakes:
- A dummy <tidy> tag at (4)
- A closing </h1> with no matching opening <h1> tag at (4)
- An empty paragraph <p> tag at (1)
- An unclosed <b> tag at (3)
Test Case # 1 using TidyNet
First, create an object of our component. Here is a listing of how to achieve that:
TidyNet objTidyNet = new TidyNet();
Now, clean the test.htm file using this object. The code listing for that is given below:
private void button1_Click(object sender, EventArgs e)
{
    int iTotalWarn = 0, iTotalErrs = 0;
    String SReturnData = "";
    String SError = "";
    TidyNet objTidyNet = new TidyNet();
    // Tidy the file on disk; the cleaned markup is returned in SReturnData.
    objTidyNet.TidyFiletoMem("C:\\MyProjects\\Test\\hello.htm",
        ref SReturnData);
    // Read back the warning count, the diagnostics text and the error count.
    objTidyNet.TotalWarnings(ref iTotalWarn);
    SError = objTidyNet.ErrorWarning();
    objTidyNet.TotalErrors(ref iTotalErrs);
}
Here is the result produced by Tidy. The listing shows what test1.htm (created by EfTidyNet) contains:
<html>
<head>
<meta name="generator"
content="HTML Tidy for Windows (vers 1st September 2004),
see www.w3.org">
<title>tidy Library</title>
</head>
<body>
<blockquote>
<p> </p>
<p><font size="5" color="#FF00FF">Tidy Library</font>
</p>
</blockquote>
<p><font size="5" color= "#FF00FF"> </font></p>
<table border="1" cellpadding="0" cellspacing="0"
style= "border-collapse: collapse" bordercolor="#111111"
width="100%" id= "AutoNumber1">
<tr>
<td width="50%" style= "border-left-style: solid;
border-left-width: 1; border-right-style: none;
border-right-width: medium; border-top-style: solid;
border-top-width: 1; border-bottom-style: none;
border-bottom-width: medium">
</td>
<td width="50%"
style= "border-left-style: none;border-left-width: medium;
border-right-style: solid; border-right-width: 1;
border-top-style: solid; border-top-width: 1;
border-bottom-style: none;border-bottom-width: medium">
</td>
</tr>
</table>
<b>Tidy</b> --(1)
</body>
</html>
If you look at the cleaned HTML page above, the dummy <tidy> tag and the </h1> have been removed near (1), and </b> has been added after Tidy at (1).
Here is the summary of errors and warnings produced by EfTidyNet, detailing each action it performed:
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 22 column 10 - Warning: discarding unexpected </h1>
line 23 column 1 - Error: <tidy> is not recognized!
line 23 column 1 - Warning: discarding unexpected <tidy>
line 15 column 1 - Warning: <table> proprietary attribute "bordercolor"
line 15 column 1 - Warning: <table> lacks "summary" attribute
Info: Document content looks like HTML Proprietary
5 warnings, 1 error were found!
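The text returned by ErrorWarning() is plain text with one message per line, so it can be broken into records if you want to show it in a grid or filter by severity. The following is a minimal sketch, assuming the "line N column M - Severity: message" layout seen in the summary above. The TidyMessage and TidyReport names are hypothetical helpers written for this illustration; they are not part of EfTidyNet.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper types for illustration; not part of EfTidyNet.
public sealed class TidyMessage
{
    public int Line;
    public int Column;
    public string Severity;   // "Warning" or "Error"
    public string Text;
}

public static class TidyReport
{
    // Matches lines such as:
    //   line 23 column 1 - Error: <tidy> is not recognized!
    private static readonly Regex MessagePattern = new Regex(
        @"^line\s+(\d+)\s+column\s+(\d+)\s*-\s*(Warning|Error):\s*(.*)$");

    // Parses diagnostics text in the layout shown in this article.
    // Lines that do not match (for example "Info: ..." or the final
    // "5 warnings, 1 error were found!" summary) are skipped.
    public static List<TidyMessage> Parse(string report)
    {
        var messages = new List<TidyMessage>();
        foreach (string raw in report.Split('\n'))
        {
            Match m = MessagePattern.Match(raw.Trim());
            if (!m.Success) continue;
            messages.Add(new TidyMessage
            {
                Line = int.Parse(m.Groups[1].Value),
                Column = int.Parse(m.Groups[2].Value),
                Severity = m.Groups[3].Value,
                Text = m.Groups[4].Value
            });
        }
        return messages;
    }
}
```

Calling TidyReport.Parse(SError) after the test case above would give one entry per "line ... column ..." message, which can then be grouped by Severity. A message that wraps onto a second line keeps only the text from its first line.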
Test Case # 2 using TidyNet with TidyNetOpt
Now, let's apply some options to Test.htm to get custom output. I am using these options:
- Clean = TRUE (to move inline styles into separate style classes)
- DoctypeMode = DoctypeUser (to enable the display string)
- Doctype = "Ef Tidy library" (display string)
- OutputType = XhtmlOut (output type)
- NewInlineTags = "tidy" (make our dummy <tidy> tag legal)
Here is the code listing to achieve the above:
private void TestCase2_Click(object sender, EventArgs e)
{
    int iTotalWarn = 0, iTotalErrs = 0;
    String SReturnData = "";
    String SError = "";
    TidyNet objTidyNet = new TidyNet();
    // Configure Tidy before running it.
    objTidyNet.Option.Clean(true);
    objTidyNet.Option.NewInlineTags("tidy");
    objTidyNet.Option.OutputType(EfTidyNet.EfTidyOpt.EOutputType.XhtmlOut);
    objTidyNet.Option.DoctypeMode(EfTidyNet.EfTidyOpt.EDoctypeModes.DoctypeUser);
    objTidyNet.Option.Doctype("Ef Tidy Library");
    // Tidy the file; the cleaned markup is returned in SReturnData.
    objTidyNet.TidyFiletoMem("C:\\MyProjects\\Test\\hello.htm", ref SReturnData);
    // Read back the warning count, the diagnostics text and the error count.
    objTidyNet.TotalWarnings(ref iTotalWarn);
    SError = objTidyNet.ErrorWarning();
    objTidyNet.TotalErrors(ref iTotalErrs);
}
Here is the result produced by Tidy. The listing shows what test1.htm (created by EfTidyNet) contains after applying our options:
<!DOCTYPE html PUBLIC "Ef Tidy library" ""> --(1)
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Windows (vers 1st September 2004),
see www.w3.org" />
<title>tidy Library</title>
<style type="text/css"> --(2)
table.c4 {border-collapse: collapse}
td.c3 {border-left-style: none;
border-left-width: medium; border-right-style: solid;
border-right-width: 1; border-top-style: solid;
border-top-width: 1;
border-bottom-style: none; border-bottom-width: medium}
td.c2 {border-left-style: solid; border-left-width: 1;
border-right-style: none;
border-right-width: medium; border-top-style: solid;
border-top-width: 1;
border-bottom-style: none; border-bottom-width: medium}
h2.c1 {color: #FF00FF}
</style>
</head>
<body>
<blockquote>
<p> </p>
<h2 class="c1">Tidy Library</h2>
</blockquote>
<h2 class="c1">
</h2>
<table border="1" cellpadding="0" cellspacing="0" class="c4"
bordercolor="#111111" width="100%" id="AutoNumber1">
<tr>
<td width="50%" class="c2"> </td> ----(3)
<td width="50%" class="c3"> </td>
</tr>
</table>
<b>Tidy <tidy></tidy></b> ----(4)
</body>
</html>
Now, let us see what Tidy cleans for us:
- In (1), our custom string "Ef Tidy Library" is visible.
- In (2) and (3), the inline styles are cleaned up and a class is created for each.
- In (4), our <tidy> tag becomes legal, though it does nothing in the actual HTML page.
Here is a summary of all the errors/warnings:
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 22 column 10 - Warning: discarding unexpected </h1>
line 23 column 1 - Warning: <tidy> is not approved by W3C
line 23 column 1 - Warning: missing </tidy> before </body>
line 22 column 2 - Warning: missing </b> before </body>
line 15 column 1 - Warning: <table> proprietary attribute "bordercolor"
line 15 column 1 - Warning: <table> lacks "summary" attribute
Info: Document content looks like HTML Proprietary
7 warnings, 0 errors were found!
This article gives only a small overview of the Tidy library and EfTidyNet. For more information on the Tidy library, visit the Tidy home page.
Author Comment
I know there is much scope for improvement in this component, and I promise these improvements will appear in the next version or update of the library. If you encounter any bugs, please let me know so that I can improve the code further.
Files Listed with the Project
EfTidy Version 1.0.2.0
- Source zip contains:
  - TidyLib (original Tidy library) 2009 March release source code
  - EfTidyNet source code with multilingual support
  - Source code updated for Visual Studio 2010
- Project zip contains:
  - Release version of EfTidyNet library
  - C# test project (with source)
  - Test.htm
EfTidy Version 1.0.1.3
- Source zip contains:
  - TidyLib (original Tidy library) 2009 March release source code
  - EfTidyNet source code with multilanguage support
- Project zip contains:
  - Release version of EfTidyNet library
  - C# test project (with source)
  - Test.htm
EfTidy Version 1.0.1.2
- Source zip contains:
  - TidyLib (original Tidy library) 2008 release source code
  - EfTidyNet source code with multilanguage support
  - Thanks to Wingogo and megger83 for bug reporting!
- Project zip contains:
  - Release version of EfTidyNet library
EfTidy Version 1.0.1.1
- Source zip contains:
  - TidyLib (original Tidy library) 2008 release source code
  - EfTidyNet source code with multilanguage support
  - EfTidyNet x64 version by Spike!
- Project zip contains:
  - Release version of EfTidyNet library
  - C# test project (with source)
  - Test.htm
EfTidy Version 1.0
- Source zip contains:
  - TidyLib (original Tidy library) source code
  - EfTidyNet source code
- Project zip contains:
  - Release version of EfTidyNet library
  - C# test project (with source)
  - Test.htm
Special Thanks
- Mr. Saurabh Gupta [Director Efextra eSolutions Pvt. Ltd.]
- Mr. Spike! for creating the x64 version of EfTidyNet
- Tidy SourceForge group for the Tidy library
Update History
- 06 September 2013: EfTidyNet version 1.0.2.0
- 20 July, 2009: EfTidyNet version 1.0.1.3
- 23rd June, 2008: EfTidyNet version 1.0.1.2
- 5th March, 2008: EfTidyNet version 1.0.1.1
- 15th February, 2008: EfTidyNet version 1.0