How can I convert a Word document to HTML with all characters encoded as HTML entities? For example, if I have a space in the Word document, it should be shown as &nbsp; after conversion.

Hey Prasad,

You can use HtmlEncode and then replace the whitespace characters using an extension method.

HtmlEncode is only meant to encode characters for display in HTML. It specifically does not encode whitespace characters. You could use String.Replace() to encode the newlines and spaces (or Regex.Replace if you need better matching).
http://stackoverflow.com/questions/524528/asp-net-mvc-html-encode-new-lines
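
The encode-then-replace order described above can be sketched as follows. This is a minimal illustration in Python rather than .NET: html.escape stands in for HtmlEncode and str.replace / re.sub for String.Replace / Regex.Replace. The function name encode_with_whitespace is made up for this example.

```python
import html
import re


def encode_with_whitespace(text: str) -> str:
    """Escape HTML special characters, then make whitespace explicit."""
    # Step 1: escape &, <, > and quotes. Spaces and newlines pass through unchanged.
    encoded = html.escape(text)
    # Step 2: turn any style of line break into a <br> tag.
    encoded = re.sub(r"\r\n|\r|\n", "<br>", encoded)
    # Step 3: replace every remaining space with a non-breaking space entity.
    return encoded.replace(" ", "&nbsp;")


print(encode_with_whitespace("a < b\nx  y"))
# prints: a&nbsp;&lt;&nbsp;b<br>x&nbsp;&nbsp;y
```

The order matters: if the spaces were replaced first, the escaping step would turn the & of each &nbsp; into &amp; and the entity would show up as literal text.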

I hope this helps.
 
You should not convert all white space characters to non-breaking spaces. That will result in one huge, unbreakable line per paragraph! Also, your question has nothing to do with Unicode.

If Word's default 'Save as HTML' is not sufficient for your requirements, you should write some macros which wrap each paragraph, and in-line formatted blocks, with HTML tags to reflect what you want – e.g. Heading 2 -> <h2>, Normal -> <p>, Code block -> <pre>. You should also use HtmlEncode to encode characters which have special meaning in HTML – but spaces are not one of those.
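
As a rough sketch of that style-to-tag idea, here is a Python version that works on paragraphs already extracted from the document as (style name, text) pairs, not a Word macro. The STYLE_TO_TAG table, the paragraphs_to_html function and the sample data are all hypothetical; the fallback to <p> for unknown styles is an assumption, not something Word does.

```python
import html

# Hypothetical mapping from Word style names to HTML tags.
STYLE_TO_TAG = {
    "Heading 2": "h2",
    "Normal": "p",
    "Code block": "pre",
}


def paragraphs_to_html(paragraphs):
    """Wrap each (style, text) pair in the tag mapped to its style."""
    lines = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")  # assumption: unknown styles become <p>
        # Escape only the characters that are special in HTML; spaces stay as they are.
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


doc = [
    ("Heading 2", "Tips & tricks"),
    ("Normal", "Use x < y here."),
    ("Code block", "if (a < b)  return;"),
]
print(paragraphs_to_html(doc))
```

Note that the double space inside the code block is left alone: inside <pre> the browser preserves it, and in ordinary paragraphs normal spaces are what allow the line to wrap.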

Remember that not everything in a Word document is representable exactly in HTML, and the conventions for documents and web pages are different. Don't expect an exact conversion.
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


