I have a regular expression that I use in a replace to remove whitespace from an HTML document: /\s\s+/

Is there a way to remove all whitespace except from within pre tags?

E.g., I want this:
<html>
<head>
	<title>Rawr</title>
</head>
<body>

<pre>
	Some stuff
</pre>

</body>
</html>
To look like this:
<html><head><title>Rawr</title></head><body><pre>
	Some stuff
</pre></body></html>
Is this even possible?

Can't you do it in two steps? In other words, remove the whitespace and then replace "<pre>" with "<pre>\n" and "</pre>" with "\n</pre>".
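
A variant of this two-pass idea that also keeps the whitespace inside the pre block (not only the line breaks next to the tags) is to set the pre blocks aside first, strip, and then put them back. Below is a minimal sketch in Python. The function name, the `<!--PREn-->` marker format, and the choice to strip only whitespace that sits between two tags are assumptions made for illustration; they are not from the question.

```python
import re

# Matches a whole <pre>...</pre> element, including attributes on the opening tag.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>.*?</pre>", re.IGNORECASE | re.DOTALL)

def minify_two_pass(html: str) -> str:
    """Strip whitespace between tags, leaving <pre> contents untouched."""
    saved = []

    def stash(match):
        # Swap each <pre> element for a numbered marker and remember the original.
        saved.append(match.group(0))
        return f"<!--PRE{len(saved) - 1}-->"

    masked = PRE_BLOCK.sub(stash, html)

    # Pass 1: remove whitespace between tags. The <pre> contents are hidden
    # behind markers at this point, so they cannot be affected.
    masked = re.sub(r"(?<=>)\s+(?=<)", "", masked)

    # Pass 2: put the saved <pre> elements back in place of their markers.
    return re.sub(r"<!--PRE(\d+)-->", lambda m: saved[int(m.group(1))], masked)
```

Called on the sample document from the question, this should produce the single-line output shown there, with the tab before "Some stuff" intact. It would misbehave if the input already contained a comment that looks like one of the markers, and it does not handle nested or unclosed pre tags.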
 
 
You can match all whitespace that does not come after an opening PRE (one that isn't followed by a closing PRE) and that does not come before a closing PRE (one that isn't preceded by an opening PRE). That would require a negative lookbehind and a negative lookahead that themselves contain a negative lookahead and a negative lookbehind, respectively. I've never tried nesting lookarounds like that, so I'm not sure it would work, but it seems like the way to go. I'll let you work out the actual regex, but the pseudo-regex would be something like this:

(must not contain a PRE that is not followed by a /PRE)
(?<=(.|\n)*)
whitespace
(?=(.|\n)*)
(must not contain a /PRE that is not preceded by a PRE)


Here is what a negative lookbehind looks like:
(?<!\<PRE\>)

Here is what a negative lookahead looks like:
(?!\<\/PRE\>)

You'll have to do some testing to play with the nesting, as I am not quite sure how that works.

Check out this reference for more C# regex help.
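
If the engine in use does not accept unbounded lookbehinds (Python's `re` module, for example, only allows fixed-width ones), a different technique reaches the same goal without nested lookarounds: match either a whole pre element or a run of whitespace, and decide in a replacement callback which one was found. The sketch below is in Python rather than C#. The function name and the choice to strip only whitespace between two tags are assumptions for illustration.

```python
import re

# Branch 1 captures a complete <pre> element in group 1.
# Branch 2 matches whitespace that sits directly between two tags.
# A match at "<pre" consumes the whole element, so the scan resumes after
# "</pre>" and the whitespace inside is never seen by branch 2.
PRE_OR_GAP = re.compile(
    r"(<pre\b[^>]*>.*?</pre>)|(?<=>)\s+(?=<)",
    re.IGNORECASE | re.DOTALL,
)

def minify_single_pass(html: str) -> str:
    def keep_pre(match):
        # Group 1 is set only when the <pre> branch matched: return it unchanged.
        # Otherwise the match is inter-tag whitespace: replace it with nothing.
        return match.group(1) or ""

    return PRE_OR_GAP.sub(keep_pre, html)
```

The same alternation idea should carry over to .NET by passing a `MatchEvaluator` to `Regex.Replace`. As written, the pattern does not cope with nested or unclosed pre tags.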
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


