I am trying to figure out how to retrieve text from a web page. The value is not in a textbox (I already know how to read those). Here is a sample of the page's HTML; I want to retrieve the word VALID.

HTML
Drivers License</a></td><td class="formSectionContent"><div class="summaryDetail"><table><tr><td class="alignRight"><label class="readOnly">Status:</label></td><td class="alignLeft">VALID&nbsp;(Non-<acronym title="Commercial Driver's License">CDL</acronym>)</td></tr><tr><td class="alignRight"><label class="readOnly">License Class:</label></td><td 


I need the word VALID

This is what the information looks like on the web page:

Status: VALID (Non-CDL)
Updated 14-Jul-12 4:17am
Comments
Salman Ali Hero 14-Jul-12 10:13am    
This looks like incomplete code
wiswalld 14-Jul-12 12:06pm    
<div id="viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_:resultsForm:_id139" class="panelSection"><div id="viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_:resultsForm:_id139-header" class="panelSection-header"><div id="viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_:resultsForm:_id139-header-opened" class="twistyOpen"><div>Results</div></div><div id="viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_:resultsForm:_id139-header-closed" class="twistyClosed"><div>Results</div></div></div><script type="text/javascript">jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-header-opened-link").click(function() { jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-body").toggle("blind"); jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-header-opened-link").toggle(); jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-header-closed-link").toggle(); } );jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-header-closed-link").click(function() { jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-body").toggle("blind"); jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-header-opened-link").toggle(); jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-header-closed-link").toggle(); } );</script><script type="text/javascript">jQuery(function(jQuery) { var panelSection = jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139")[0]; jQuery("#viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_\\:resultsForm\\:_id139-body script").each(function() { panelSection.appendChild(this); });});</script><div id="viewns_7_BMLR5Q8518NO70IAHI92FK30Q6_:resultsForm:_id139-body" class="panelSection-body"><div style="width: 100%;"><table class="formSection"><tr><td class="formSectionHeaderCriteria">Search Criteria</td><td class="formSectionContentCriteria"><p><span style="white-space: nowrap;"><label class="readOnly">Client ID (CID):</label> REMOVED BY ME</span> </p></td></tr><tr><td 
class="formSectionHeader">DMV Drivers License</td><td class="formSectionContent"><div class="summaryDetail"><table><tr><td class="alignRight"><label class="readOnly">Status:</label></td><td class="alignLeft">VALID (Non-CDL)   VALID (CDL)</td></tr><tr><td class="alignRight"><label class="readOnly">License Class:</label></td><td class="alignLeft">CDL *AM*   <label class="readOnly">Expiration:</label> 03/15/2019</td></tr><tr><td class="alignTopRight"><label class="readOnly">Name:</label></td><td>REMOVED BY ME   <label class="readOnly">Client ID:</label> REMOVED BY ME</td></tr><tr><td class="alignTopRight"><label class="readOnly">Birth Date:</label></td><td><
wiswalld 14-Jul-12 12:08pm    
It's a big section, and I removed some of it because it is sensitive.

1 solution

One way would be to use a regular expression. Something like the pattern below should work, though you may need to make it more robust for the real page. Another option: if the markup is well-formed, you might be able to use an XML reader to read through the HTML.

XML
<label.*?>Status:</label></td><td.*?>(.*?)</td>


C#
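// Singleline lets '.' span line breaks in the markup; Multiline only changes how ^ and $ behave.
System.Text.RegularExpressions.Regex regx = new System.Text.RegularExpressions.Regex("<label.*?>Status:</label></td><td.*?>(.*?)</td>",
                System.Text.RegularExpressions.RegexOptions.Singleline | System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            System.Text.RegularExpressions.Match m = regx.Match(htmlstr);
            if (m.Success)
            {
                // Groups[1] holds the text captured between <td ...> and </td>
                string status = m.Groups[1].Value;
            }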


Edit: sorry, I forgot the VB example...

VB
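' Singleline lets "." span line breaks in the markup; Multiline only changes how ^ and $ behave.
Dim regx As New System.Text.RegularExpressions.Regex("<label.*?>Status:</label></td><td.*?>(.*?)</td>",
                System.Text.RegularExpressions.RegexOptions.Singleline Or System.Text.RegularExpressions.RegexOptions.IgnoreCase)

            Dim m As System.Text.RegularExpressions.Match = regx.Match(htmlstr)
            If m.Success Then
                ' Groups(1) holds the text captured between <td ...> and </td>
                Dim status As String = m.Groups(1).Value
            End If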
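
To get from the captured cell down to just the word VALID, something along these lines might work. It is only a sketch: the module and function names (StatusExtractor, GetStatusText, GetStatusWord) are made up for illustration, the sample string in Main is adapted from the HTML in the question, and it assumes .NET 4 or later for System.Net.WebUtility.

```vb
Imports System.Net
Imports System.Text.RegularExpressions

Module StatusExtractor
    ' Same pattern as above; group 1 is the contents of the <td> that follows the "Status:" label.
    Private ReadOnly StatusPattern As New Regex(
        "<label.*?>Status:</label></td><td.*?>(.*?)</td>",
        RegexOptions.Singleline Or RegexOptions.IgnoreCase)

    ' Returns the cell text with inner tags removed and entities decoded,
    ' or Nothing when the pattern does not match.
    Function GetStatusText(html As String) As String
        Dim m As Match = StatusPattern.Match(html)
        If Not m.Success Then Return Nothing
        ' Drop nested tags such as <acronym ...> and </acronym>.
        Dim cell As String = Regex.Replace(m.Groups(1).Value, "<.*?>", "")
        ' &nbsp; decodes to a non-breaking space (character 160); turn it into a normal space.
        Return WebUtility.HtmlDecode(cell).Replace(ChrW(160), " "c).Trim()
    End Function

    ' Returns the first word of the status text, or Nothing when there is no match.
    Function GetStatusWord(html As String) As String
        Dim text As String = GetStatusText(html)
        If text Is Nothing Then Return Nothing
        Return Regex.Match(text, "\w+").Value
    End Function

    Sub Main()
        ' Sample adapted from the question; the real page will contain much more markup.
        Dim html As String = "<tr><td class=""alignRight""><label class=""readOnly"">Status:</label></td>" &
            "<td class=""alignLeft"">VALID&nbsp;(Non-<acronym title=""Commercial Driver's License"">CDL</acronym>)</td></tr>"
        Console.WriteLine(GetStatusText(html))
        Console.WriteLine(GetStatusWord(html))
    End Sub
End Module
```

If the page layout changes (for example, extra elements between the label and the value), the pattern will stop matching and both functions return Nothing, so check for that before using the result.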
 
Comments
wiswalld 16-Jul-12 10:57am    
So I found this:

Dim strAsHtmlTableData As String = ""
Dim intlndex As Integer = 0
For Each tblElements As System.Windows.Forms.HtmlElement In Me.WebBrowser1.Document.GetElementsByTagName("Client ID:")
Me.TextBox1.Text = tblElements.OuterHtml
intlndex += 1
Next

I am trying to get 784543117 out of the HTML below. I cut the text before and after the part that I need. I am not sure what the element name is; I tried "Client ID:".


nbsp;  <label class="readOnly">Client ID:</label> 784543117</td></tr><tr><td
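
For what it's worth, GetElementsByTagName expects an HTML tag name such as "label" or "td", not the text shown on the page, which is probably why "Client ID:" returns nothing. A rough sketch of one way to do it with the WebBrowser control: loop over the label elements, find the one whose text is "Client ID:", and read what follows it in the parent cell. The helper name GetTextAfterLabel and the module name are made up, and this assumes the document has finished loading (for example, it is called from the DocumentCompleted handler).

```vb
Imports System.Windows.Forms

Module ClientIdLookup
    ' Hypothetical helper: finds the <label> whose text equals labelText and returns
    ' the text that follows it inside the same parent element (here, the <td>).
    ' Returns Nothing when the document is missing or no such label exists.
    Function GetTextAfterLabel(doc As HtmlDocument, labelText As String) As String
        If doc Is Nothing Then Return Nothing
        For Each lbl As HtmlElement In doc.GetElementsByTagName("label")
            Dim caption As String = lbl.InnerText
            ' Skip labels with other captions, such as "Status:" or "Client ID (CID):".
            If caption Is Nothing OrElse caption.Trim() <> labelText Then Continue For
            If lbl.Parent Is Nothing OrElse lbl.Parent.InnerText Is Nothing Then Continue For
            ' The parent's InnerText contains the caption followed by the value.
            Dim cellText As String = lbl.Parent.InnerText
            Dim pos As Integer = cellText.IndexOf(labelText, StringComparison.Ordinal)
            If pos >= 0 Then Return cellText.Substring(pos + labelText.Length).Trim()
        Next
        Return Nothing
    End Function
End Module
```

Usage would be something like `Me.TextBox1.Text = GetTextAfterLabel(Me.WebBrowser1.Document, "Client ID:")`. Note that if other labels follow in the same cell, their text is included too, so you may need to trim the result further.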

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


