I need some help finding a C# .NET solution for scraping an Ajax website.
Anyone??
Comments
David_Wimbley 27-Nov-12 15:30pm    
What exactly are you trying to grab from these websites? You have a number of options.

1) XPath over the page loaded as an XML document; this can be tricky with malformed HTML
2) Selenium (browser automation, but it has .NET bindings)
3) Html Agility Pack to load a website; it also handles malformed HTML

A non-C# solution, also based on browser automation, is Watir... it's Ruby.
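As a rough illustration of option 1, here is a minimal sketch using only System.Xml. It assumes the page is well-formed XHTML without a default xmlns (a namespaced document would also need an XmlNamespaceManager); XPathScrape and SelectText are hypothetical names. Note that it only sees the markup it is given, so it cannot see text that Ajax fills in later.

```csharp
using System;
using System.Xml;

public static class XPathScrape
{
    // Loads the markup as XML and returns the text of the first node matching
    // the XPath expression, or null if nothing matches.
    // Throws XmlException when the markup is not well-formed (e.g. unclosed tags).
    public static string SelectText(string xhtml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node?.InnerText;
    }
}
```

For example, SelectText("<html><body><span id='status'>OK</span></body></html>", "//span[@id='status']") returns "OK", while markup with an unclosed tag throws XmlException, which is the malformed-HTML problem that Html Agility Pack is meant to tolerate.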
Paw Jershauge 27-Nov-12 15:35pm    
Well, I'm not big on building websites anymore; I stopped at classic ASP ;) I'm more into WinForms. So let's see if I can explain myself, here goes:
I have a website that posts the status of some systems. The status message and associated information are loaded via Ajax, and therefore the normal HtmlElements won't hold the correct text in their InnerText property. Hope that makes sense ;)
ZurdoDev 27-Nov-12 15:55pm    
AJAX can easily return strings. What exactly is coming back from the AJAX call that can't go into the html elements? Something doesn't seem right here.
Paw Jershauge 27-Nov-12 15:58pm    
ryanb31, it's not that Ajax can't return the data; it does, and I can view the message in my browser. But when I look at the HTML source of the site, the message is not there; there's only a {{message}} variable or something in the place where the message text should be.
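The {{message}} placeholder suggests the page is a client-side template: the server sends markup with placeholders, and a script fills them in after the Ajax call returns, so a plain download (e.g. with WebClient) only ever sees the template. A minimal sketch for spotting this in downloaded HTML, assuming the double-brace syntax seen here; TemplateCheck and UnrenderedPlaceholders are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateCheck
{
    // Matches double-brace placeholders such as {{message}} or {{ item.status }}.
    private static readonly Regex Placeholder =
        new Regex(@"\{\{\s*([\w.]+)\s*\}\}", RegexOptions.Compiled);

    // Returns the placeholder names still present in the HTML, in order of appearance.
    // A non-empty result means the text has not been rendered yet.
    public static List<string> UnrenderedPlaceholders(string html)
    {
        var names = new List<string>();
        foreach (Match m in Placeholder.Matches(html))
            names.Add(m.Groups[1].Value);
        return names;
    }
}
```

If this returns names for the downloaded source, a raw HTTP download will not be enough and the page has to be rendered (or its Ajax endpoint queried) to get the real text.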

1 solution

Well, I believe I found a workaround for this issue.
I just use the WebBrowser control instead of WebClient and have the WebBrowser render the whole site before extracting the rendered document. It takes time, but it works.

Here's the code:
C#
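// Requires: using System; using System.Diagnostics; using System.Threading; using System.Windows.Forms;
// WebBrowser is an ActiveX control: call this on an STA thread (e.g. the WinForms UI thread).
public string GetHtmlAjax(Uri uri, int ajaxLoadTimeoutMs)
{
    using (WebBrowser wb = new WebBrowser())
    {
        wb.ScriptErrorsSuppressed = true;
        wb.Navigate(uri);
        // Pump messages until the initial page has loaded.
        while (wb.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();
        // Keep pumping messages for the timeout instead of one long Thread.Sleep,
        // which would block the message loop and stall the page's Ajax callbacks.
        Stopwatch sw = Stopwatch.StartNew();
        while (sw.ElapsedMilliseconds < ajaxLoadTimeoutMs)
        {
            Application.DoEvents();
            Thread.Sleep(10);
        }
        // Copy the rendered DOM out as a string: an HtmlDocument returned from here
        // would belong to a browser that is disposed when the using block exits.
        return wb.Document.GetElementsByTagName("html")[0].OuterHtml;
    }
}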
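One possible refinement, not part of the original answer: instead of always waiting for a fixed timeout, keep pumping messages until the page actually shows the data, with the timeout only as an upper bound. Polling and WaitUntil are hypothetical names in this sketch.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Polling
{
    // Repeatedly runs pump (e.g. Application.DoEvents) and then checks condition,
    // until the condition returns true or the timeout elapses.
    // Returns true if the condition was met, false on timeout.
    public static bool WaitUntil(Func<bool> condition, Action pump, TimeSpan timeout,
                                 int pollIntervalMs = 50)
    {
        var sw = Stopwatch.StartNew();
        while (true)
        {
            pump();
            if (condition()) return true;
            if (sw.Elapsed >= timeout) return false;
            Thread.Sleep(pollIntervalMs);
        }
    }
}
```

In the WebBrowser code, the fixed wait could then become something like Polling.WaitUntil(() => !(wb.Document.Body.InnerText ?? "").Contains("{{"), Application.DoEvents, TimeSpan.FromSeconds(10)). This assumes the placeholders disappear from the body text once the Ajax data has been rendered; on fast pages it returns early, and on slow ones it still gives up after the timeout.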
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


