Context | Knowledge Center Site | Need | Allow for pages with predefined styles to be added | Feature | Support a template engine in combination with a set of stylesheets | Requirement | Implement an editor for manipulating text, images, and tables without interfering with stylesheets' fonts, colors, alignments, and overall structure. |
There are a number of available WYSIWYG editors, such as CKEditor and TinyMCE. However, with their full range of functionality, they are heavyweight and make the content layout less predictable. A much simpler tool is required, yet implementing one from scratch seems inefficient.
It was decided to use the ExtJS Ext.form.HtmlEditor component, since the system already uses the ExtJS framework. For this to work, however, the component had to be configured correctly and extended to handle:
The copy-and-paste problem turned out to be a little tricky. There are many bits and pieces of information on the web that, when collected, seem random and inconsistent; e.g.,
cleanWordHtml: function(html){
html = html.replace(/<!--[\s\S]*-->/g,"");
html = html.replace(/<o:p>\s*<\/o:p>/g, "") ;
html = html.replace(/<o:p>.*?<\/o:p>/g, " ") ;
html = html.replace( /\s*mso-[^:]+:[^;"]+;?/gi, "" ) ;
html = html.replace( /\s*MARGIN: 0cm 0cm 0pt\s*;/gi, "" ) ;
html = html.replace( /\s*MARGIN: 0cm 0cm 0pt\s*"/gi, "\"" ) ;
html = html.replace( /\s*TEXT-INDENT: 0cm\s*;/gi, "" ) ;
html = html.replace( /\s*TEXT-INDENT: 0cm\s*"/gi, "\"" ) ;
html = html.replace( /\s*TEXT-ALIGN: [^\s;]+;?"/gi, "\"" ) ;
html = html.replace( /\s*PAGE-BREAK-BEFORE: [^\s;]+;?"/gi, "\"" ) ;
html = html.replace( /\s*FONT-VARIANT: [^\s;]+;?"/gi, "\"" ) ;
html = html.replace( /\s*tab-stops:[^;"]*;?/gi, "" ) ;
html = html.replace( /\s*tab-stops:[^"]*/gi, "" ) ;
html = html.replace( /\s*face="[^"]*"/gi, "" ) ;
html = html.replace( /\s*face=[^ >]*/gi, "" ) ;
html = html.replace( /\s*FONT-FAMILY:[^;"]*;?/gi, "" ) ;
html = html.replace(/<(\w[^>]*) class=([^ |>]*)([^>]*)/gi, "<$1$3") ;
html = html.replace( /<(\w[^>]*) style="([^\"]*)"([^>]*)/gi, "<$1$3" ) ;
html = html.replace( /\s*style="\s*"/gi, '' ) ;
html = html.replace( /<SPAN\s*[^>]*>\s* \s*<\/SPAN>/gi, ' ' ) ;
html = html.replace( /<SPAN\s*[^>]*><\/SPAN>/gi, '' ) ;
html = html.replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3") ;
html = html.replace( /<SPAN\s*>(.*?)<\/SPAN>/gi, '$1' ) ;
html = html.replace( /<FONT\s*>(.*?)<\/FONT>/gi, '$1' ) ;
html = html.replace(/<\\?\?xml[^>]*>/gi, "") ;
html = html.replace(/<\/?\w+:[^>]*>/gi, "") ;
html = html.replace( /<H\d>\s*<\/H\d>/gi, '' ) ;
html = html.replace( /<H1([^>]*)>/gi, '' ) ;
html = html.replace( /<H2([^>]*)>/gi, '' ) ;
html = html.replace( /<H3([^>]*)>/gi, '' ) ;
html = html.replace( /<H4([^>]*)>/gi, '' ) ;
html = html.replace( /<H5([^>]*)>/gi, '' ) ;
html = html.replace( /<H6([^>]*)>/gi, '' ) ;
html = html.replace( /<\/H\d>/gi, '<br>' ) ;
html = html.replace( /<(U|I|STRIKE)> <\/\1>/g, ' ' ) ;
html = html.replace( /<(B|b)> <\/\b|B>/g, '' ) ;
html = html.replace( /<([^\s>]+)[^>]*>\s*<\/\1>/g, '' ) ;
html = html.replace( /<([^\s>]+)[^>]*>\s*<\/\1>/g, '' ) ;
html = html.replace( /<([^\s>]+)[^>]*>\s*<\/\1>/g, '' ) ;
html = html.replace( /(<P)([^>]*>.*?)(<\/P>)/gi, "<div$2</div>" ) ;
html = html.replace( /(<font|<FONT)([^*>]*>.*?)(<\/FONT>|<\/font>)/gi, "<div$2</div>") ;
html = html.replace( /size|SIZE = ([\d]{1})/g, '' ) ;
html = html.replace(/<!--(\w|\W)+?-->/gi, '');
html = html.replace(/<title>(\w|\W)+?<\/title>/gi, '');
html = html.replace(/\s?class=\w+/gi, '');
html = html.replace(/\s+style='[^']+'/gi, '');
html = html.replace(/<(meta|link|\/?o:|\/?style|\/?div|\/?st\d|\/?head|\/?html|body|\/?body|\/?span|!\[)[^>]*?>/gi, '');
html = html.replace(/(<[^>]+>)+ (<\/\w+>)+/gi, '');
html = html.replace(/\s+v:\w+=""[^""]+""/gi, '');
html = html.replace(/(\n\r){2,}/gi, '');
// // http://www.tim-jarrett.com/labs_javascript_scrub_word.php
html = html.replace(new RegExp(String.fromCharCode(8220), 'gi'), '"'); //"
html = html.replace(new RegExp(String.fromCharCode(8221), 'gi'), '"');
html = html.replace(new RegExp(String.fromCharCode(8216), 'gi'), "'");
html = html.replace(new RegExp(String.fromCharCode(8217), 'gi'), "'");
html = html.replace(new RegExp(String.fromCharCode(8211), 'gi'), "-");
html = html.replace(new RegExp(String.fromCharCode(8212), 'gi'), "--");
html = html.replace(new RegExp(String.fromCharCode(189), 'gi'), "1/2");
html = html.replace(new RegExp(String.fromCharCode(188), 'gi'), "1/4");
html = html.replace(new RegExp(String.fromCharCode(190), 'gi'), "3/4");
html = html.replace(new RegExp(String.fromCharCode(169), 'gi'), "(C)");
html = html.replace(new RegExp(String.fromCharCode(174), 'gi'), "(R)");
html = html.replace(new RegExp(String.fromCharCode(8230), 'gi'), "...");
return html;
}
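One inconsistency in the collected snippet shows up in its very first rule: the comment pattern uses a greedy `[\s\S]*`, while the refined rule set uses the lazy `[\s\S]*?`. The following is a minimal sketch of the difference; the sample string is made up for illustration and is not taken from a real Word paste.

```javascript
// Hypothetical sample: two comments around real content.
var sample = '<!--[if gte mso 9]>x<![endif]--><b>Keep me</b><!-- end -->';

// Greedy: [\s\S]* runs to the LAST "-->", so the content in between is lost.
var greedy = sample.replace(/<!--[\s\S]*-->/g, '');

// Lazy: [\s\S]*? stops at the FIRST "-->" that follows each "<!--".
var lazy = sample.replace(/<!--[\s\S]*?-->/g, '');

console.log(JSON.stringify(greedy)); // ""
console.log(JSON.stringify(lazy));   // "<b>Keep me</b>"
```

With more than one comment in the pasted markup, the greedy form can silently delete everything between the first and the last comment.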
I ended up refining and splitting the set into tag replacement and character replacement sets that work in the Ext HtmlEditor component (and do not interfere with its tags).
dirtyHtmlTags: [
    {regex: /<!--[\s\S]*?-->/gi, replaceVal: ""},
    {regex: /<\\?\?xml[^>]*>/gi, replaceVal: ""},
    {regex: /<\/?\w+:[^>]*>/gi, replaceVal: ""},
    {regex: /\s*MSO[-:][^;"']*/gi, replaceVal: ""},
    {regex: /\s*MARGIN[-:][^;"']*/gi, replaceVal: ""},
    {regex: /\s*PAGE[-:][^;"']*/gi, replaceVal: ""},
    {regex: /\s*TAB[-:][^;"']*/gi, replaceVal: ""},
    {regex: /\s*LINE[-:][^;"']*/gi, replaceVal: ""},
    {regex: /\s*FONT-SIZE[^;"']*/gi, replaceVal: ""},
    {regex: /\s*LANG=(["'])[^"']*?\1/gi, replaceVal: ""},
    {regex: /<(P|H\d)[^>]*>([\s\S]*?)<\/\1>/gi, replaceVal: "$2"},
    {regex: /\s*\w+=(["'])(( |\s|;)*|\s*;+[^"']*?|[^"']*?;{2,})\1/gi, replaceVal: ""},
    {regex: /<span[^>]*>( |\s)*<\/span>/gi, replaceVal: ""},
    //{regex: /<([^\s>]+)[^>]*>( |\s)*<\/\1>/gi, replaceVal: ""},
    // http://www.codinghorror.com/blog/2006/01/cleaning-words-nasty-html.html
    {regex: /<(\/?title|\/?meta|\/?style|\/?st\d|\/?head|\/?html|\/?body|!\[)[^>]*?>/gi, replaceVal: ""},
    {regex: /(\n(\r)?){2,}/gi, replaceVal: ""}
],
cleanHtml: function(html) {
    if (!html) return;
    Ext.each(this.dirtyHtmlTags, function(tag, idx){
        html = html.replace(tag.regex, tag.replaceVal);
    });
    // http://www.tim-jarrett.com/labs_javascript_scrub_word.php
    html = html.replace(new RegExp(String.fromCharCode(8220), 'gi'), '"'); //"
    html = html.replace(new RegExp(String.fromCharCode(8221), 'gi'), '"');
    html = html.replace(new RegExp(String.fromCharCode(8216), 'gi'), "'");
    html = html.replace(new RegExp(String.fromCharCode(8217), 'gi'), "'");
    html = html.replace(new RegExp(String.fromCharCode(8211), 'gi'), "-");
    html = html.replace(new RegExp(String.fromCharCode(8212), 'gi'), "--");
    html = html.replace(new RegExp(String.fromCharCode(189), 'gi'), "1/2");
    html = html.replace(new RegExp(String.fromCharCode(188), 'gi'), "1/4");
    html = html.replace(new RegExp(String.fromCharCode(190), 'gi'), "3/4");
    html = html.replace(new RegExp(String.fromCharCode(169), 'gi'), "(C)");
    html = html.replace(new RegExp(String.fromCharCode(174), 'gi'), "(R)");
    html = html.replace(new RegExp(String.fromCharCode(8230), 'gi'), "...");
    return Ext.ux.form.HtmlLintEditor.superclass.cleanHtml.call(this, html);
}
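The twelve character replacements in cleanHtml could probably be collapsed into one lookup table and a single pass. The sketch below is plain JavaScript with no Ext dependency; `CHAR_MAP`, `CHAR_REGEX`, and `replaceTypographicChars` are hypothetical names that are not part of the component, and only the code points and their replacements are taken from the code above.

```javascript
// Hypothetical lookup table: character code -> plain-text replacement.
// The pairs mirror the String.fromCharCode(...) calls in cleanHtml.
var CHAR_MAP = {
    8220: '"', 8221: '"',   // curly double quotes
    8216: "'", 8217: "'",   // curly single quotes
    8211: '-', 8212: '--',  // en dash, em dash
    189: '1/2', 188: '1/4', 190: '3/4',
    169: '(C)', 174: '(R)',
    8230: '...'             // ellipsis
};

// Build one character class out of all keys in the table.
var CHAR_REGEX = new RegExp('[' + Object.keys(CHAR_MAP).map(function (code) {
    return String.fromCharCode(Number(code));
}).join('') + ']', 'g');

// Replace every listed character in a single pass over the text.
function replaceTypographicChars(text) {
    return text.replace(CHAR_REGEX, function (ch) {
        return CHAR_MAP[ch.charCodeAt(0)];
    });
}

var input = '\u201CHello\u201D \u2013 it\u2019s \u00BD done\u2026';
console.log(replaceTypographicChars(input)); // "Hello" - it's 1/2 done...
```

Adding or dropping a character then means touching one table entry rather than a whole line of code. The `i` flag from the original calls is left out here because none of these characters has a case variant.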
These regular expressions seem to cover most cases and were tested successfully in the project I worked on. I thought I might share the info.
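For anyone who wants to spot-check the rules outside the editor, here is a minimal sketch that runs a subset of dirtyHtmlTags, copied unchanged and kept in the same order, over a Word-like fragment. The fragment and the `applyRules` helper are made up for illustration; this is not the test setup used in the project.

```javascript
// A subset of the dirtyHtmlTags rules, in their original order.
var rules = [
    {regex: /<!--[\s\S]*?-->/gi, replaceVal: ""},           // comments
    {regex: /<\/?\w+:[^>]*>/gi, replaceVal: ""},            // namespaced tags such as <o:p>
    {regex: /\s*MSO[-:][^;"']*/gi, replaceVal: ""},         // mso-* style declarations
    {regex: /\s*\w+=(["'])(( |\s|;)*|\s*;+[^"']*?|[^"']*?;{2,})\1/gi, replaceVal: ""}, // emptied attributes
    {regex: /<span[^>]*>( |\s)*<\/span>/gi, replaceVal: ""} // empty spans
];

// Hypothetical helper: apply each rule in turn, as cleanHtml does with Ext.each.
function applyRules(html, ruleList) {
    return ruleList.reduce(function (result, rule) {
        return result.replace(rule.regex, rule.replaceVal);
    }, html);
}

// Hypothetical Word-like fragment.
var wordFragment = '<!-- Word --><b style="mso-bidi-font-weight:normal">Title</b>' +
                   '<o:p></o:p><span> </span>';

var cleaned = applyRules(wordFragment, rules);
console.log(cleaned); // <b>Title</b>
```

The order matters: the mso rule first empties the style attribute, and only then can the empty-attribute rule remove it.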
See Also:
Introduction to Ranges
Intercepting the Clipboard data on Paste