Hi there.
I want to write a program in C# that connects to a news web page and returns its topics (headlines) and, of course, their news summaries.
How can I write this code? I wrote the code below, but I don't know whether it's right.
It returns all links, but I want it to return just the topics and summaries!
Thanks
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public struct LinkItem
{
    public string Href;
    public string Text;

    public override string ToString()
    {
        return Href + "\n\t" + Text;
    }
}

static class LinkFinder
{
    public static List<LinkItem> Find(string file)
    {
        List<LinkItem> list = new List<LinkItem>();

        // 1.
        // Find all <a ...>...</a> elements in the page.
        // \b stops "<a" from also matching tags such as <abbr> or <article>.
        MatchCollection m1 = Regex.Matches(file, @"(<a\b[^>]*>.*?</a>)",
            RegexOptions.Singleline);

        // 2.
        // Loop over each match.
        foreach (Match m in m1)
        {
            string value = m.Groups[1].Value;
            LinkItem i = new LinkItem();

            // 3.
            // Get the href attribute.
            Match m2 = Regex.Match(value, @"href=""(.*?)""",
                RegexOptions.Singleline);
            if (m2.Success)
            {
                i.Href = m2.Groups[1].Value;
            }

            // 4.
            // Remove the tags (and the whitespace around them) to leave only the link text.
            string t = Regex.Replace(value, @"\s*<.*?>\s*", "",
                RegexOptions.Singleline);
            i.Text = t;

            list.Add(i);
        }
        return list;
    }

    static void Main(string[] args)
    {
        string s;
        using (WebClient w = new WebClient())
        {
            s = w.DownloadString("http://www.bbc.co.uk/news");
        }

        // Write every link to the file; the using block flushes and closes it.
        using (StreamWriter sw = new StreamWriter(@"E:\oop1.txt"))
        {
            foreach (LinkItem i in LinkFinder.Find(s))
            {
                sw.WriteLine(i);
            }
        }
    }
}
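A minimal sketch of one way to return only headlines and summaries with the same regex approach, assuming each story on the page is an h2 or h3 heading followed by a p paragraph holding the summary. That markup is an assumption, not something checked against the BBC page, and NewsItem and HeadlineFinder are hypothetical names. Regular expressions over HTML break whenever a site changes its markup, so treat this as an illustration only.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical type: one headline ("topic") and the summary that follows it.
public sealed class NewsItem
{
    public string Title { get; set; } = "";
    public string Summary { get; set; } = "";

    public override string ToString() => Title + "\n\t" + Summary;
}

// Hypothetical helper, not part of the original code.
public static class HeadlineFinder
{
    // ASSUMPTION: each story is an <h2> or <h3> heading followed by a <p>
    // summary, with no other heading in between. Check the real page's markup.
    private static readonly Regex StoryPattern = new Regex(
        @"<h[23]\b[^>]*>(?<title>(?:(?!</h[23]>).)*)</h[23]>" + // headline, never crossing its closing tag
        @"(?:(?!<h[23]\b).)*?" +                                 // anything except the start of another heading
        @"<p\b[^>]*>(?<summary>.*?)</p>",                         // first paragraph after the headline
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    private static readonly Regex TagPattern = new Regex(@"<[^>]+>");

    // Strips inner tags, decodes entities such as &amp; and trims whitespace.
    private static string CleanText(string html) =>
        WebUtility.HtmlDecode(TagPattern.Replace(html, "")).Trim();

    public static List<NewsItem> Find(string html)
    {
        var items = new List<NewsItem>();
        foreach (Match m in StoryPattern.Matches(html))
        {
            items.Add(new NewsItem
            {
                Title = CleanText(m.Groups["title"].Value),
                Summary = CleanText(m.Groups["summary"].Value)
            });
        }
        return items;
    }
}
```

To use it, call HeadlineFinder.Find(s) in Main instead of LinkFinder.Find(s) and write each NewsItem to the file. A heading with no paragraph before the next heading is skipped rather than paired with the wrong summary.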
Posted, updated 5-Oct-15 8:16am (v2)
Comments
sreeyush sudhakaran 5-Oct-15 6:52am    
You mean you want to get a specific HTML tag from the given URL?
Member 11383150 5-Oct-15 14:10pm    
Yeah, exactly! Please help me...
I really want to know how I can do that!
sreeyush sudhakaran 6-Oct-15 3:04am    
Use the HTML Agility Pack (HtmlAgilityPack) to get a specific tag from a website.

Refer to: http://www.codeproject.com/Articles/659019/Scraping-HTML-DOM-elements-using-HtmlAgilityPack-H
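A minimal sketch of the HtmlAgilityPack approach suggested in this comment. It assumes the third-party HtmlAgilityPack NuGet package is installed; AgilityTagReader and SelectTexts are hypothetical names, and the XPath "//h3" used in the example is only a guess at where headlines live, to be replaced after inspecting the real page.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical helper, not taken from the linked article.
public static class AgilityTagReader
{
    // Returns the trimmed, entity-decoded text of every node that matches
    // the XPath expression, e.g. "//h3" for all level-3 headings.
    public static List<string> SelectTexts(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```

For a live page, HtmlAgilityPack's HtmlWeb class can load a URL directly (new HtmlWeb().Load(url) returns an HtmlDocument). Querying the parsed DOM with XPath avoids the pitfalls of matching tags with regular expressions, such as nested or unusual markup.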
Thanks7872 5-Oct-15 6:55am    
Run this code. If it works as expected, it's right; otherwise, it's not.
Thava Rajan 5-Oct-15 6:56am    
Which part of the page do you want to read?

1 solution

Read the SiteMapper Tool article, especially the section "Traversing the Web Site".

Hope that helps.
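The SiteMapper article itself is not reproduced here. As a hedged, standard-library-only sketch of one step any traversal needs, the helper below turns raw href values (like the LinkItem.Href values the question's code collects) into absolute URLs on the same site, ready to be downloaded next. SiteLinks and SameSite are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not taken from the SiteMapper article.
public static class SiteLinks
{
    // Resolves raw href values against the page URL and keeps each
    // distinct http/https page on the same host, in first-seen order.
    public static List<Uri> SameSite(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var seen = new HashSet<string>();
        var result = new List<Uri>();

        foreach (string href in hrefs)
        {
            if (string.IsNullOrWhiteSpace(href) || href.StartsWith("#", StringComparison.Ordinal))
                continue; // empty or in-page anchor

            Uri target;
            Uri parsed;
            // On Unix, "/news" can parse as an absolute file:// URI,
            // so file URIs are treated as relative references instead.
            if (Uri.TryCreate(href, UriKind.Absolute, out parsed) && !parsed.IsFile)
                target = parsed;
            else if (Uri.TryCreate(href, UriKind.Relative, out parsed))
                target = new Uri(baseUri, parsed);
            else
                continue; // not a usable URL

            if (target.Scheme != Uri.UriSchemeHttp && target.Scheme != Uri.UriSchemeHttps)
                continue; // e.g. mailto: links
            if (!string.Equals(target.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase))
                continue; // another site

            // Ignore the #fragment so "page" and "page#top" count once.
            string key = target.GetLeftPart(UriPartial.Query);
            if (seen.Add(key))
                result.Add(new Uri(key));
        }
        return result;
    }
}
```

A crawler would keep a queue of these URLs, download each one the same way Main downloads the front page, and skip pages it has already visited.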
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


