From the string below, how can I remove all the <ins>...</ins> and <script>...</script> elements, without affecting the table data, using a regular expression in C#.NET?

XML
<table> <tbody><tr> <th>Commodity</th><th>Price</th><th>Chg</th><th>%Chg</th><th>Open</th><th>High</th><th>Low</th><th>Time</th></tr><tr><td colspan="8" style="padding-removed 0%;"><ins class="adsbygoogle" style="display:inline-block;width:468px;height:15px" data-ad-client="ca-pub-2029169313533355" data-ad-slot="7818964028"></ins><script> (adsbygoogle = window.adsbygoogle || []).push({});</script></td></tr><tr id="GOLD"><td>GOLD</td><td>26082</td><td>70</td><td>0.27</td><td>26073</td><td>26096</td><td>26036</td><td>11:49:17</td></tr><tr id="GOLDM"><td>GOLDM</td><td>26120</td><td>62</td><td>0.24</td><td>26096</td><td>26135</td><td>26062</td><td>11:49:17</td></tr><tr id="SILVER"><td>SILVER</td><td>35960</td><td>-12</td><td>-0.03</td><td>36000</td><td>36044</td><td>35838</td><td>11:49:21</td></tr><tr id="SILVERM"><td>SILVERM</td><td>35990</td><td>-10</td><td>-0.03</td><td>35920</td><td>36071</td><td>35684</td><td>11:49:21</td></tr><tr id="COPPER"><td>COPPER</td><td>367.3</td><td>1.3</td><td>0.36</td><td>365.1</td><td>367.75</td><td>364.3</td><td>11:49:21</td></tr><tr id="COPPERM"><td>COPPERM</td><td>367.25</td><td>1.2</td><td>0.33</td><td>366.4</td><td>367.7</td><td>364.3</td><td>11:49:13</td></tr><tr id="ALUMINIUM"><td>ALUMINIUM</td><td>112</td><td>0.05</td><td>0.04</td><td>112</td><td>112.25</td><td>111.9</td><td>11:48:05</td></tr><tr id="ALUMINI"><td>ALUMINI</td><td>112.05</td><td>0.05</td><td>0.04</td><td>111.85</td><td>112.25</td><td>111.85</td><td>11:48:49</td></tr><tr><td colspan="8" style="padding-removed 0%;"><ins class="adsbygoogle" style="display:block" data-ad-client="ca-pub-2029169313533355" data-ad-slot="1912031226" data-ad-format="auto"></ins> <script> (adsbygoogle = window.adsbygoogle || []).push({})</script></td></tr><tr id="CRUDEOIL"><td>CRUDEOIL</td><td>3121</td><td>-17</td><td>-0.54</td><td>3120</td><td>3122</td><td>3107</td><td>11:49:21</td></tr><tr 
id="LEAD"><td>LEAD</td><td>114.15</td><td>-0.4</td><td>-0.35</td><td>114.3</td><td>114.45</td><td>113.9</td><td>11:49:16</td></tr><tr id="LEADMINI"><td>LEADMINI</td><td>114.25</td><td>-0.3</td><td>-0.26</td><td>114.1</td><td>114.45</td><td>113.8</td><td>11:48:59</td></tr><tr id="ZINC"><td>ZINC</td><td>127.15</td><td>0.45</td><td>0.36</td><td>126.65</td><td>127.35</td><td>126.5</td><td>11:49:12</td></tr><tr id="ZINCMINI"><td>ZINCMINI</td><td>127.2</td><td>0.45</td><td>0.36</td><td>126.6</td><td>127.35</td><td>126.5</td><td>11:49:18</td></tr><tr id="NICKEL"><td>NICKEL</td><td>899.7</td><td>-4.5</td><td>-0.5</td><td>901</td><td>902.9</td><td>895.5</td><td>11:49:16</td></tr><tr id="NICKELM"><td>NICKELM</td><td>899.5</td><td>-4.6</td><td>-0.51</td><td>903.1</td><td>903.1</td><td>895.6</td><td>11:49:20</td></tr><tr id="NATURALGAS"><td>NATURALGAS</td><td>173.2</td><td>-5.5</td><td>-3.08</td><td>176.3</td><td>176.3</td><td>172.8</td><td>11:48:37</td></tr><tr id="MENTHAOIL"><td>MENTHAOIL</td><td>801.7</td><td>-1.2</td><td>-0.15</td><td>799.8</td><td>803.9</td><td>798.6</td><td>11:44:38</td></tr><tr id="CRUDEOILM"><td>CRUDEOILM</td><td>3122</td><td>-16</td><td>-0.51</td><td>3124</td><td>3135</td><td>3107</td><td>11:49:19</td></tr></tbody></table>
Updated 10-Mar-15 1:09am
Comments
Sinisa Hajnal 10-Mar-15 6:57am    
What have you tried? There are plenty of resources on the net to help you write a Regex, and you can even test it against your own text. If your list of tags is limited to two, is there any reason to spend time writing a Regex instead of using String.Replace on the whole thing?
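For reference, a non-regex sketch along the lines of this suggestion. Plain String.Replace only works when the exact tag text is known in advance, so this version scans with IndexOf instead and handles any number of occurrences. RemoveElements is a hypothetical helper name; the sketch assumes same-named elements are never nested, and its "<ins" prefix test would also match a tag such as <insert>.

```csharp
using System;

// Hypothetical helper: removes every <tag ...>...</tag> element by scanning
// with IndexOf. Assumes same-named elements are never nested, and note that
// "<ins" would also match the start of a tag such as "<insert".
static string RemoveElements(string html, string tag)
{
    string open = "<" + tag;
    string close = "</" + tag + ">";
    int start = html.IndexOf(open, StringComparison.OrdinalIgnoreCase);
    while (start >= 0)
    {
        int end = html.IndexOf(close, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0)
        {
            break; // no closing tag: leave the rest of the text untouched
        }
        // Cut from the opening "<tag" through the end of "</tag>".
        html = html.Remove(start, end + close.Length - start);
        // Continue scanning from the same position for the next element.
        start = html.IndexOf(open, start, StringComparison.OrdinalIgnoreCase);
    }
    return html;
}

string row = "<td><ins class=\"ad\"></ins><script>push({});</script></td><td>GOLD</td>";
string cleaned = RemoveElements(RemoveElements(row, "ins"), "script");
Console.WriteLine(cleaned); // <td></td><td>GOLD</td>
```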
mayank.bhuvnesh 10-Mar-15 7:06am    
There may be any number of occurrences; suppose there are two.
I am trying to replace them with <ins.*/ins>, but it matches from the first <ins> to the last </ins>, removing all the table data in between. The same happens with the script tag.
Sinisa Hajnal 10-Mar-15 7:45am    
This says: find <ins, then take as many other characters as possible (.* is greedy) up to the last /ins>, and replace all of it. :) The solution below seems alright.
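A minimal sketch of the greedy vs. lazy difference described in this exchange, using the asker's pattern <ins.*/ins> and its lazy counterpart <ins.*?/ins>. The input string is a made-up, shortened stand-in for the question's markup: two ad placeholders with one data cell between them.

```csharp
using System;
using System.Text.RegularExpressions;

// Two ad placeholders with a data cell in between, as in the question.
string html = "<ins a=\"1\"></ins><td>GOLD</td><ins a=\"2\"></ins>";

// Greedy .* takes as much as it can, then backtracks only as far as the
// LAST "/ins>", so one match swallows the data cell as well.
var greedy = new Regex("<ins.*/ins>", RegexOptions.Singleline);

// Lazy .*? takes as little as it can, so each <ins>...</ins> pair is
// matched separately and the text between them survives.
var lazy = new Regex("<ins.*?/ins>", RegexOptions.Singleline);

string greedyResult = greedy.Replace(html, "");
string lazyResult = lazy.Replace(html, "");

Console.WriteLine($"greedy: '{greedyResult}'"); // greedy: ''
Console.WriteLine($"lazy:   '{lazyResult}'");   // lazy:   '<td>GOLD</td>'
```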

1 solution

Hi,

You can try this:
C#
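using System.Text.RegularExpressions;

// Lazy (.+?) stops at the first closing tag, so each <script>...</script> and
// <ins>...</ins> element is removed on its own; Singleline lets . cross newlines.
var regex = new Regex(
   "(\\<script(.+?)\\</script\\>)|(\\<ins(.+?)\\</ins\\>)", 
   RegexOptions.Singleline | RegexOptions.IgnoreCase
);

string output = regex.Replace(yourString, "");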


This worked for me. Hope this helps! :)
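A self-contained sketch of the same lazy-matching idea, written with a verbatim string (@"...") so the backslashes need no doubling, and run against a trimmed-down ad row and data row taken from the question. The \b word boundary is an addition of this sketch, not part of the answer; it keeps "<ins" from matching a tag such as <insert>. It assumes <ins> and <script> elements are not nested.

```csharp
using System;
using System.Text.RegularExpressions;

// Lazy .*? removes each <script>...</script> and <ins>...</ins> element
// separately; \b stops "<ins" from also matching a tag such as "<insert".
// Singleline lets . cross line breaks; assumes the elements are not nested.
var adTags = new Regex(
    @"<script\b.*?</script>|<ins\b.*?</ins>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);

// A trimmed-down ad row and data row taken from the question.
string html =
    "<tr><td colspan=\"8\"><ins class=\"adsbygoogle\" data-ad-slot=\"7818964028\"></ins>" +
    "<script> (adsbygoogle = window.adsbygoogle || []).push({});</script></td></tr>" +
    "<tr id=\"GOLD\"><td>GOLD</td><td>26082</td></tr>";

string output = adTags.Replace(html, "");
Console.WriteLine(output);
// <tr><td colspan="8"></td></tr><tr id="GOLD"><td>GOLD</td><td>26082</td></tr>
```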
 
Comments
mayank.bhuvnesh 10-Mar-15 7:58am    
Hi, I am receiving the error below:

parsing "(\<script(.+?)\))" - Too many )'s.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.ArgumentException: parsing "(\<script(.+?)\))" - Too many )'s.
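An exception like this is raised by .NET's regex parser when the pattern is parsed, before any matching happens. The sketch below is not tied to the exact text that was pasted; it shows two things that help when debugging it: the doubled backslashes in a regular C# string and a verbatim @"..." string yield the same pattern, and an unbalanced pattern makes the Regex constructor throw ArgumentException. IsValidPattern is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// In a regular C# string every backslash the regex should see is written
// as "\\"; a verbatim string (@"...") passes backslashes through as-is.
string escaped = "(\\<script(.+?)\\</script\\>)";
string verbatim = @"(\<script(.+?)\</script\>)";
Console.WriteLine(escaped == verbatim); // True

// Hypothetical helper: the Regex constructor parses the pattern up front and
// throws ArgumentException (RegexParseException, a subclass, on newer .NET)
// when it is malformed, for example when the parentheses do not balance.
static bool IsValidPattern(string pattern)
{
    try
    {
        _ = new Regex(pattern);
        return true;
    }
    catch (ArgumentException)
    {
        return false;
    }
}

Console.WriteLine(IsValidPattern(verbatim));           // True
Console.WriteLine(IsValidPattern("(<script(.+?)))")); // False
```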
[no name] 10-Mar-15 8:01am    
This is how I did:

using System.Text.RegularExpressions;

// Read the saved page; the using blocks dispose the client and the stream.
using (var wc = new System.Net.WebClient())
using (var reader = new System.IO.StreamReader(wc.OpenRead("C:/Test.html")))
{
    string str = reader.ReadToEnd();

    // Lazy (.+?) removes each <script>...</script> and <ins>...</ins> element separately.
    var regex = new Regex(
        "(\\<script(.+?)\\</script\\>)|(\\<ins(.+?)\\</ins\\>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase
    );

    string output = regex.Replace(str, "");
}

This is working for me
Matt T Heffron 10-Mar-15 12:57pm    
+5
[no name] 11-Mar-15 0:05am    
Thanks Matt :)
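A different technique, sketched for comparison rather than as the thread's answer: the markup in the question appears to be well-formed XML (every tag closed, attributes quoted), so it could be parsed with LINQ to XML and the ins and script elements removed by name, with no regex involved. This assumes well-formed input; much real-world HTML is not, and XElement.Parse throws an XmlException on it. RemoveElementsXml is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: removes every element whose local name is in `names`
// (case-insensitive). Only works on well-formed XML; XElement.Parse throws
// XmlException otherwise.
static string RemoveElementsXml(string markup, params string[] names)
{
    var wanted = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
    XElement root = XElement.Parse(markup);

    // Collect the matches first: removing nodes while a lazy Descendants()
    // query is still being enumerated is unsafe.
    List<XElement> doomed = root.Descendants()
        .Where(e => wanted.Contains(e.Name.LocalName))
        .ToList();

    foreach (XElement e in doomed)
    {
        e.Remove();
    }

    return root.ToString(SaveOptions.DisableFormatting);
}

string table =
    "<table><tbody><tr><td colspan=\"8\"><ins class=\"adsbygoogle\"></ins>" +
    "<script> (adsbygoogle = window.adsbygoogle || []).push({});</script></td></tr>" +
    "<tr id=\"GOLD\"><td>GOLD</td><td>26082</td></tr></tbody></table>";

string cleaned = RemoveElementsXml(table, "ins", "script");
Console.WriteLine(cleaned);
```

Cells emptied this way may be written back as self-closing tags (for example <td colspan="8" />), so check the output if it is going back into a browser.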

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


