HTML
<div class="vote">
    <input type="hidden" name="_id_" value="1998690">
    <a class="vote-up-off" title="This answer is useful">up vote</a>
    <span itemprop="upvoteCount" class="vote-count-post ">50</span>
    <a class="vote-down-off" title="This answer is not useful">down vote</a>
    <span class="vote-accepted-on load-accepted-answer-date" title="loading when this answer was accepted...">accepted</span>
</div>
            
<td class="answercell">
    <div class="post-text" itemprop="text">
<p>Have you tried this?</p>
<pre><code>//myparent/mychild[text() = 'foo']
</code>


Alternatively, you can use the shortcut for the self axis:



<code>//myparent/mychild[. = 'foo']</code>



Here I need to get the text "//myparent/mychild[text() = 'foo']".

What I have tried:

C#
// Requires: using System.Net;
string htmlCode;
using (WebClient client = new WebClient())
{
    // Some sites reject requests that carry no User-Agent header.
    client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
    htmlCode = client.DownloadString("http://stackoverflow.com/questions/1998681/xpath-selection-by-innertext");
}

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);

// SelectSingleNode returns null when the XPath matches nothing,
// so check the node before reading InnerText.
HtmlAgilityPack.HtmlNode node = doc.DocumentNode.SelectSingleNode(textBox2.Text);
MessageBox.Show(node != null ? node.InnerText : "No node matches the XPath.");


I paste the XPath value into TextBox2, which should in turn give me the inner text corresponding to that XPath.

C#
Xpath = //*[@id="answer-1998690"]/table/tbody/tr[1]/td[2]/div/pre[1]/code/span

The web page from which I have tried to get the inner text is:
xml - XPath selection by innertext - Stack Overflow

I am new to XPath, so I don't yet know how to use it efficiently.
Posted
Updated 6-Jun-16 5:09am

1 solution

First of all, inner text is a property of an HTML element (an instance), not of a class. However, you can classify elements by CSS classes, which is routinely done in JavaScript.

But you need to do this in C#, which you already use to download the HTML. You certainly can; you just need to parse the downloaded HTML. Perhaps the most suitable tool is the open-source HTML Agility Pack, which supports exactly what you want: XPath queries. Please see:
HTML Agility Pack — Home.
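
As a minimal sketch of the idea above (not a definitive implementation), the following console program parses an HTML string with HTML Agility Pack and reads the inner text of the first node matching an XPath. The class name `XPathDemo`, the helper `GetInnerText`, and the sample markup are hypothetical and only stand in for the downloaded page.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathDemo
{
    // Hypothetical helper: returns the inner text of the first node
    // matching the XPath, or null when nothing matches.
    public static string GetInnerText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null if the XPath selects no node.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText;
    }

    public static void Main()
    {
        // Assumed sample markup, loosely modelled on the answer block in the question.
        string html =
            "<div id=\"answer-1\"><pre><code>//myparent/mychild[text() = 'foo']</code></pre></div>";

        string text = GetInnerText(html, "//div[@id='answer-1']//pre[1]/code");
        Console.WriteLine(text ?? "(no match)");
    }
}
```

One thing worth checking if the query returns nothing: an XPath copied from a browser's developer tools describes the browser's DOM, which may contain elements such as `tbody` or script-generated `span` wrappers that are probably absent from the raw HTML the parser sees. Shortening the path, as in the sketch, tends to be more robust.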

See also:
Web scraping — Wikipedia, the free encyclopedia,
Comparison of HTML parsers — Wikipedia, the free encyclopedia.

See also ScrapySharp, a Web scraping tool which contains a Web client used to simulate a browser and an extension of HTML Agility Pack: https://www.nuget.org/packages/ScrapySharp.

Note that you can use HTML Agility Pack or ScrapySharp to download resources from the Web directly, so you don't really need the WebClient class. It's still good to know that WebClient is a fairly rudimentary tool; a much more comprehensive facility for retrieving Web resources (for Web scraping and similar tasks) is the class System.Net.HttpWebRequest:
HttpWebRequest Class (System.Net).
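
A rough sketch of the direct download mentioned above, assuming HTML Agility Pack's own `HtmlWeb` loader is acceptable in place of `WebClient`. The class name `DirectDownloadDemo`, the helper `FirstMatchText`, and the `//title` query are illustrative choices only; the URL is the one from the question.

```csharp
using System;
using HtmlAgilityPack;

public static class DirectDownloadDemo
{
    // Hypothetical helper: inner text of the first match with HTML
    // entities decoded, or null when the XPath matches nothing.
    public static string FirstMatchText(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText);
    }

    public static void Main()
    {
        // HtmlWeb downloads and parses the page in one step.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load(
            "http://stackoverflow.com/questions/1998681/xpath-selection-by-innertext");

        Console.WriteLine(FirstMatchText(doc, "//title") ?? "(no match)");
    }
}
```

`InnerText` keeps entities such as `&amp;` as written in the source, which is why the sketch passes the result through `HtmlEntity.DeEntitize`.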

—SA
 
Comments
Mohideenmeera 6-Jun-16 12:14pm    
Hi,
Thanks for your valuable comments.
I have used HtmlAgilityPack to iterate over tables.
But in this case I need to get the inner text when I provide a complete XPath like the one below:
/html/body/table/tbody[2]/tr/td[2]/div/div[3]/div[2]/div/ol/div[2]/h3/a

This XPath should return the inner text. Could you please help with this by giving an example?

You could use the URL below as an example to try out:
URL: http://stackoverflow.com/questions/1998681/xpath-selection-by-innertext
Xpath : //*[@id="answer-1998690"]/table/tbody/tr[1]/td[2]/div/pre[2]/code/span
Sergey Alexandrovich Kryukov 6-Jun-16 12:19pm    
This is not "inner text". An element may have a "text node" in the DOM; this is what you need.
Please see, for example:
http://stackoverflow.com/questions/5033955/xpath-select-text-node.
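
To illustrate the distinction between inner text and text nodes, here is a small sketch, assuming HTML Agility Pack evaluates the XPath `text()` node test the way it is commonly used. The class `TextNodeDemo`, the helper `DirectText`, and the sample markup are hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class TextNodeDemo
{
    // Hypothetical helper: concatenates only the text nodes that are
    // direct children of the elements matched by elementXPath.
    public static string DirectText(string html, string elementXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(elementXPath + "/text()");
        if (nodes == null) return null;

        return string.Concat(nodes.Select(n => n.InnerText));
    }

    public static void Main()
    {
        string html = "<p>a<b>b</b>c</p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText includes the text of descendant elements such as <b>.
        Console.WriteLine(doc.DocumentNode.SelectSingleNode("//p").InnerText);

        // The text() test selects only the element's own text nodes.
        Console.WriteLine(DirectText(html, "//p"));
    }
}
```

If the element contains nested markup, the two results differ; if it contains only text, they should be the same.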

Just learn some DOM and XPath.

Are you going to accept my solution formally? I guess you have all you need.

—SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


