
Reading Meta Tags of Any Page Programmatically Without Loading It in a Browser

11 Oct 2013 | CPOL | 4 min read

This article was originally at wiki.asp.net but has now been given a new home on CodeProject. Editing rights for this article have been set at Bronze or above, so please go in and edit and update this article to keep it fresh and relevant.

In this article I will show you how to read meta tags programmatically using C# and ASP.NET. What sets this article apart from others on the internet is that most available samples read and write meta tags from within the page itself, whereas here we download the contents of an arbitrary page dynamically and read the meta tags from it.

First things first: we need to download the content of a page without loading it in a browser. For this we will use the WebRequest class. The code below creates a request to "http://www.microsoft.com/en/us/default.aspx" using the default credentials.

 // Create a request for the URL.            
 WebRequest request = WebRequest.Create("http://www.microsoft.com/en/us/default.aspx");
 // If required by the server, set the credentials.   
 request.Credentials = CredentialCache.DefaultCredentials;
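
A request created this way can be tuned before it is sent. The following is only a minimal sketch and not part of the original sample: the class name PageRequestFactory, the helper CreatePageRequest, the 10-second timeout, and the user-agent string are all assumptions chosen for illustration.

```csharp
using System;
using System.Net;

public static class PageRequestFactory
{
    // Hypothetical helper: builds a GET request for the given URL without sending it.
    public static HttpWebRequest CreatePageRequest(string url)
    {
        // WebRequest.Create returns an HttpWebRequest for http:// and https:// URLs.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Timeout = 10000;                  // milliseconds; assumed value
        request.UserAgent = "MetaTagReader/1.0";  // assumed value
        request.Credentials = CredentialCache.DefaultCredentials;
        return request;
    }
}
```

Nothing goes over the network until GetResponse is called, so a helper like this can be built and inspected safely before the page is actually fetched.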

Now we are ready to get the response from the server. To receive it we use the WebResponse class, here cast to HttpWebResponse:

 // Get the response.   
 HttpWebResponse response = (HttpWebResponse)request.GetResponse();
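
GetResponse throws a WebException when the server answers with an error status or cannot be reached at all. A minimal sketch of handling that case might look like the following; the class ResponseHelper and the method TryGetResponse are hypothetical names introduced for this example, not part of the original sample.

```csharp
using System;
using System.Net;

public static class ResponseHelper
{
    // Hypothetical helper: returns the response, or null (with an error text) on failure.
    public static HttpWebResponse TryGetResponse(WebRequest request, out string error)
    {
        try
        {
            error = null;
            return (HttpWebResponse)request.GetResponse();
        }
        catch (WebException ex)
        {
            // ex.Response is non-null when the server replied with an error status.
            HttpWebResponse failed = ex.Response as HttpWebResponse;
            error = failed != null
                ? "HTTP " + (int)failed.StatusCode + " " + failed.StatusDescription
                : ex.Status.ToString();
            if (failed != null)
            {
                failed.Close();
            }
            return null;
        }
    }
}
```

The caller can then test the result for null and skip the parsing step when the page could not be downloaded.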

Once we have the response, we want to load it into an HTML DOM. Browsers represent an HTML page as a Document Object Model (DOM) tree, and mshtml exposes the same model to our code. In the next few lines we read the response as a string and write that string into an IHTMLDocument2 object.

        // Get the stream containing content returned by the server.
        Stream dataStream = response.GetResponseStream();
        // Open the stream using a StreamReader for easy access.
        StreamReader reader = new StreamReader(dataStream);
        // Read the content.
        string responseFromServer = reader.ReadToEnd();

        // Read the HTML into an HTML document to enable parsing.
        IHTMLDocument2 doc = new HTMLDocumentClass();
        doc.write(new object[] { responseFromServer });
        doc.close();
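
The stream-reading step above relies on the StreamReader default encoding and leaves the reader open. As a hedged sketch, the helper below reads a stream with an explicit character set and disposes the reader when done. The class StreamText, the method ReadAll, and the UTF-8 fallback are assumptions made for this example; the charset argument is meant to be the value of HttpWebResponse.CharacterSet.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamText
{
    // Hypothetical helper: reads a whole stream as text and disposes the reader.
    // charset may be null or empty, in which case the fallback encoding is used.
    public static string ReadAll(Stream stream, string charset)
    {
        Encoding encoding = Encoding.UTF8; // assumed fallback
        if (!string.IsNullOrEmpty(charset))
        {
            try
            {
                encoding = Encoding.GetEncoding(charset);
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: keep the fallback.
            }
        }
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the response from the earlier step, a call could look like StreamText.ReadAll(response.GetResponseStream(), response.CharacterSet), and the returned string would be passed to doc.write as before.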

Now that the entire page is loaded in memory as an HTML document, we iterate over its elements and retrieve the meta tags.


        // Loop through every element in the document and pick out the <meta> tags.
        foreach (IHTMLElement el in (IHTMLElementCollection)doc.all)
        {
            if (el.tagName == "META")
            {
                HTMLMetaElement meta = (HTMLMetaElement)el;
                // HTML-encode the value so markup inside it is not rendered by our page.
                Response.Write("Content " + Server.HtmlEncode(meta.content) + "<br/>");
            }
        }
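
The loop above prints only the content value of each meta tag. To show how names and contents can be paired, here is a rough sketch that deliberately swaps mshtml for regular expressions, so it runs without the COM reference. It is not the approach used in this article, and the class MetaTagScanner and the method Extract are hypothetical names. It assumes quoted attribute values and no ">" character inside a tag, so it is far less robust than a real DOM parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MetaTagScanner
{
    // Matches a whole <meta ...> tag; group 1 holds the attribute text.
    private static readonly Regex MetaTag =
        new Regex(@"<meta\b([^>]*)>", RegexOptions.IgnoreCase);

    // Matches attr="value" or attr='value'; group 2 or 3 holds the value.
    private static readonly Regex Attribute =
        new Regex(@"([\w:-]+)\s*=\s*(?:""([^""]*)""|'([^']*)')");

    // Returns name -> content for every meta tag that carries both.
    public static Dictionary<string, string> Extract(string html)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match tag in MetaTag.Matches(html))
        {
            string key = null;
            string content = null;
            foreach (Match attr in Attribute.Matches(tag.Groups[1].Value))
            {
                string attrName = attr.Groups[1].Value.ToLowerInvariant();
                string value = attr.Groups[2].Success
                    ? attr.Groups[2].Value
                    : attr.Groups[3].Value;
                if (attrName == "name" || attrName == "property" || attrName == "http-equiv")
                {
                    key = value;
                }
                else if (attrName == "content")
                {
                    content = value;
                }
            }
            if (key != null && content != null)
            {
                result[key] = content; // a later duplicate overwrites an earlier one
            }
        }
        return result;
    }
}
```

Calling MetaTagScanner.Extract(responseFromServer) would give a dictionary that can be queried by name, for example with the key "description". With mshtml, the same pairing is available through the name and content properties of HTMLMetaElement.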

Of course, you can do a lot more with the code above, but we will take that up in other articles. For your reference, the complete code is pasted below. For the sample to work, add a reference to mshtml as follows.

Steps:

1.) In Solution Explorer, select the project to which you want to add the parsing functionality.
2.) In the menu, click Project -> Add Reference.
3.) In the dialog box that appears, on the .NET tab, choose the Microsoft.mshtml assembly.
4.) Click the Select button, then click OK.

Now we can reference this assembly. Don't forget to import the namespaces the sample uses:

using System;
using System.IO;
using System.Net;
using mshtml;

        // Trace output (the name suggests this code lives in a Button2_Click handler).
        Response.Write("Button2_Click");

        // Create a request for the URL.
        WebRequest request = WebRequest.Create("http://www.microsoft.com/en/us/default.aspx");
        // If required by the server, set the credentials.
        request.Credentials = CredentialCache.DefaultCredentials;
        // Get the response.
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        // Log the status (console output does not appear in the rendered page).
        Console.WriteLine(response.StatusDescription);
        // Get the stream containing content returned by the server.
        Stream dataStream = response.GetResponseStream();
        // Open the stream using a StreamReader for easy access.
        StreamReader reader = new StreamReader(dataStream);
        // Read the content.
        string responseFromServer = reader.ReadToEnd();
        // Log the content.
        Console.WriteLine(responseFromServer);
        // Clean up the streams and the response.
        reader.Close();
        dataStream.Close();
        response.Close();

        // Read the HTML into an HTML document to enable parsing.
        IHTMLDocument2 doc = new HTMLDocumentClass();
        doc.write(new object[] { responseFromServer });
        doc.close();

        // Loop through every element in the document and pick out the <meta> tags.
        foreach (IHTMLElement el in (IHTMLElementCollection)doc.all)
        {
            if (el.tagName == "META")
            {
                HTMLMetaElement meta = (HTMLMetaElement)el;
                // HTML-encode the value so markup inside it is not rendered by our page.
                Response.Write("Content " + Server.HtmlEncode(meta.content) + "<br/>");
            }
        }

 

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States
The ASP.NET Wiki was started by Scott Hanselman in February of 2008. The idea is that folks spend a lot of time trolling the blogs, googling/live-searching for answers to common "How To" questions. There's piles of fantastic community-created and MSFT-created content out there, but if it's not found by a search engine and the right combination of keywords, it's often lost.

The ASP.NET Wiki articles moved to CodeProject in October 2013 and will live on, loved, protected and updated by the community.
This is a Collaborative Group
