I want C# code for a link extractor. For example,
if I give www.yahoo.com as input,
the output should be all the internal links on www.yahoo.com, saved to a single text file.

I know I have to use the LinksExtractor class, but I can't work out how to do it.

e.g. Input: www.yahoo.com
Output:
www.yahoo.com/sports
www.yahoo.com/Business

1 solution

Let's start :)

Create a public List named links:

C#
List<string> links = new List<string>();


private void FindLinks(string url)
{
   links.Clear();

   WebBrowser webBrowser1 = new WebBrowser();
   // Subscribe before navigating so the completion event cannot be missed.
   webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
   webBrowser1.Navigate(url);
}


void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
   WebBrowser browser = (WebBrowser)sender;

   if (browser.Document != null)
   {
      HtmlElementCollection col = browser.Document.GetElementsByTagName("a");

      foreach (HtmlElement elem in col)
      {
         // Keep only hrefs that start with "http://".
         string href = elem.GetAttribute("href");
         if (href.StartsWith("http://"))
            links.Add(href);
      }
   }
}


Call the function like this:
FindLinks(@"http://www.google.gr/");


Navigation is asynchronous, so once DocumentCompleted has fired, all the links will be in the links list.
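
The question also asks for the links to be saved to one text file, which the code above does not do. A minimal sketch of that step follows, assuming one link per line is an acceptable format. LinkFileWriter and SaveLinks are hypothetical names, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LinkFileWriter
{
    // Hypothetical helper: writes one link per line to the given path,
    // overwriting the file if it already exists.
    public static void SaveLinks(IEnumerable<string> links, string path)
    {
        File.WriteAllLines(path, links);
    }
}
```

Because the WebBrowser loads the page asynchronously, a call such as LinkFileWriter.SaveLinks(links, "links.txt") probably belongs at the end of webBrowser1_DocumentCompleted rather than right after FindLinks returns, when the list is still empty. The file name links.txt is only an example.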
 
Comments
BobJanova 5-Aug-11 10:40am    
'Internal links' means those within the same website, so the href might not start with http:// (a well designed website will have mostly relative links within the site). The OP is not clear about what exactly 'internal' means but I think your check is not correct for the stated problem.
UJimbo 5-Aug-11 11:46am    
I'll try later on without the startswith part and see how it behaves
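
One way to handle the point raised in the comments is to resolve every href against the page URL and keep only those on the same host. The sketch below assumes that "internal" means "same host as the page", which the question does not confirm. InternalLinkFilter and GetInternalLinks are hypothetical names; the hrefs would come from elem.GetAttribute("href") in the handler above.

```csharp
using System;
using System.Collections.Generic;

public static class InternalLinkFilter
{
    // Hypothetical helper: resolves each href against pageUrl and returns the
    // distinct absolute URLs whose host matches the page's host.
    public static List<string> GetInternalLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        Uri baseUri = new Uri(pageUrl);
        HashSet<string> seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        List<string> result = new List<string>();

        foreach (string href in hrefs)
        {
            if (string.IsNullOrWhiteSpace(href))
                continue;

            // Resolves relative hrefs such as "/sports" against the page URL;
            // absolute hrefs are parsed as they are.
            Uri resolved;
            if (!Uri.TryCreate(baseUri, href, out resolved))
                continue;

            // Skip mailto:, javascript: and other non-web schemes.
            if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
                continue;

            // Assumption: "internal" means the same host as the page.
            if (!string.Equals(resolved.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase))
                continue;

            // Record each link only once.
            if (seen.Add(resolved.AbsoluteUri))
                result.Add(resolved.AbsoluteUri);
        }

        return result;
    }
}
```

Comparing hosts exactly means a subdomain such as sports.yahoo.com would count as external to www.yahoo.com; whether that is wanted depends on what the OP means by internal.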

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


