Hi guys,

I'm hoping someone wouldn't mind writing me a complex regular expression, preferably in C#, for splitting HTML text.

Basically, I'm looking for an array that contains every formatting tag as a string and also every separate word. Maintaining the order of the words is important.

This may not be possible, but if it is, I could use some assistance.

Thanks
Carl
Comments
Christian Amado 21-Jul-14 10:28am    
What did you try?
PIEBALDconsult 21-Jul-14 11:00am    
http://blog.codinghorror.com/parsing-html-the-cthulhu-way/

1 solution

Don't do it - it's pretty horrible, and far, far too easy to break. And far, far too complicated to fix!

Look at parsing it properly instead - there are some good parsers out there, but this one will get you started: Parsing HTML Tags in C#
Or this: Another C# Legacy HTML Parser Using Tag Processing
Or this: AfterWork HTML Parser in C#
Or this: HTML Meta Tag Parser
Which one fits best depends on what exactly you are trying to do.
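
To make the idea concrete, here is a minimal sketch of the kind of output the question asks for, produced by a small character scanner rather than a regular expression. The class and method names (HtmlTokenizer, Tokenize) are made up for illustration and are not taken from any of the articles above. It assumes simple, well-behaved markup: it does not handle comments, script blocks, entities, or a '>' inside an attribute value, which is exactly where a real parser earns its keep.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only: not a full HTML parser.
public static class HtmlTokenizer
{
    // Returns tags and words as separate strings, in document order.
    public static List<string> Tokenize(string html)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < html.Length)
        {
            if (html[i] == '<')
            {
                int close = html.IndexOf('>', i);
                if (close >= 0)
                {
                    // Everything from '<' to the next '>' is one tag token.
                    tokens.Add(html.Substring(i, close - i + 1));
                    i = close + 1;
                    continue;
                }
                // No closing '>' anywhere: fall through and treat the rest as text.
            }

            // Collect the text run up to the next '<' (or the end of the input).
            int next = html.IndexOf('<', i + 1);
            if (next < 0) next = html.Length;
            string text = html.Substring(i, next - i);

            // Passing null as the separator list splits on any whitespace.
            foreach (string word in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                tokens.Add(word);
            }
            i = next;
        }
        return tokens;
    }
}
```

Calling HtmlTokenizer.Tokenize("<p>Hello <b>big</b> world</p>") gives seven strings in order: <p>, Hello, <b>, big, </b>, world, </p>. Call ToArray() on the result if you need an actual array. For anything beyond tidy input like this, one of the parsers linked above is the safer route.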
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


