I am trying to scrape a web page using C# and WinForms. I can load the page into a WebBrowser, invoke the post method for a form, and get the results. I'd like to then change the date and resubmit the form. Unfortunately, the control for picking the date is a read-only control driven by a lot of AJAX code (.axd files) that isn't easily accessible.

I thought it would be better to try to intercept the POST data before it goes to the server and change the date string.

I can't seem to find a way to get the POST data before it is sent to the server. Is there some way to grab it, modify it, and then send it on its way?

What I have tried:

public void CompletedHander(object sender,
      WebBrowserDocumentCompletedEventArgs e)
{
      WebBrowser wb = ((WebBrowser)sender);

      HtmlElement button = wb.Document.GetElementById("ctl00$WebSplitter1$tmpl1$ContentPlaceHolder1$HeaderBTN1$btnRetrieve");
      if (button != null)
      {
           _pushed = true;
           button.InvokeMember("click");
      }
}
Updated 24-May-17 6:46am

Having scraped hundreds of sites, I would disagree with the tactic of driving the page controls like a bot. Instead, you can construct the POST request yourself and populate the payload appropriately. For example, in Google Chrome, open the page you would launch the POST from, right-click and select "Inspect", then click the "Network" tab to see all requests. Perform the POST manually and click that request in the resulting list of network activity. Click the "Headers" tab on the right to see all headers. For a POST request you will see a section called "Form Data". That is the POST content you want to capture, and you can paste it as a template into your own POST request. For example, to insert your own date into the content, your routine might look something like:
C#
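public static HttpWebRequest CreatePOSTRequest(string url, DateTime someDate)
{
    // Create the request against the URL passed in by the caller.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "POST";
    // Paste the captured "Form Data" here as a template and splice in the date.
    // ToString() uses the current culture; pass an explicit format if the site expects one.
    string payload = "somepastedcontent;somedatevalue=" +
                      someDate.ToString() +
                      ";somemoretemplatestuff";
    byte[] byteArray = System.Text.Encoding.UTF8.GetBytes(payload);
    // Match the Content-Type shown in the captured request headers.
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = byteArray.Length;
    // Write the request body.
    using (System.IO.Stream dataStream = request.GetRequestStream())
    {
        dataStream.Write(byteArray, 0, byteArray.Length);
    }
    return request;
}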

Note, you would need to set the above request.ContentType to your target's content type listed in the Headers. Then to execute the above web request you do something like:
C#
string responseText;
// Dispose the response as well as the stream and reader so the connection is released.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream, System.Text.Encoding.UTF8))
{
    // _ConditionResponse is a post-processing helper that is not shown in this answer.
    responseText = _ConditionResponse(reader.ReadToEnd());
}
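
If the captured "Form Data" turns out to be a list of name/value pairs, the payload can be assembled from those pairs instead of a pasted string. The following is only a minimal sketch under that assumption: the `FormPayload` class, its method names, and the `M/d/yyyy` date format are hypothetical and should be adjusted to whatever the target page actually sends.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Net;

// Hypothetical helper, not part of the original answer.
public static class FormPayload
{
    // Joins name/value pairs into an application/x-www-form-urlencoded body.
    // Both names and values are URL-encoded so characters such as '$', '/',
    // '+' and '=' are not misread by the server.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }

    // Formats the date with the invariant culture so the result does not
    // depend on the machine's regional settings. The "M/d/yyyy" pattern is
    // an assumption about what the site expects.
    public static string FormatDate(DateTime date)
    {
        return date.ToString("M/d/yyyy", CultureInfo.InvariantCulture);
    }
}
```

The resulting string would take the place of `payload` in `CreatePOSTRequest`, with field names copied from the captured request.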


I would be happy to provide specific help if you provide the page you want to scrape. Otherwise good luck!
 
 
Comments
mjackson11 22-May-17 8:40am    
I tried that approach first. I was able to change the post data but the site comes back with an unspecified error. No code or explanation. Guessing the viewstate didn't match the post and it kicked it out. It also appears they put some sort of random number generator in the page to make it really difficult to simulate a POST.

Thinking I might have to manually go through all the java on the page and figure out how to simulate the date picker actions.
Robert Welliever 22-May-17 10:17am    
I have extensive experience with what you are describing, and I promise it's the correct solution. You just need to capture and parse the page prerequisite to your POST. What is the resource you are trying to scrape? If I can't give you an optimal solution twenty minutes after I read your message, you can have your money back.
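
As a rough illustration of "capture and parse the page prerequisite to your POST": on an ASP.NET Web Forms page the server-generated state usually travels in hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION`, so one approach is to GET the page first, pull every hidden input out of the HTML, and replay those values in the POST. The sketch below assumes the target page works that way. The `HiddenFields` class is hypothetical, and the regular expressions are a simplification that only handles double-quoted attributes; a proper HTML parser would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class HiddenFields
{
    // Matches every <input ...> tag in the page source.
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the HTML-decoded value of a double-quoted attribute, or null if absent.
    private static string Attr(string tag, string name)
    {
        Match m = Regex.Match(tag, @"\b" + name + @"\s*=\s*""([^""]*)""",
                              RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Collects name -> value for every <input type="hidden"> in the HTML.
    public static Dictionary<string, string> Extract(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            string tag = m.Value;
            if (!string.Equals(Attr(tag, "type"), "hidden",
                               StringComparison.OrdinalIgnoreCase))
                continue;
            string name = Attr(tag, "name");
            if (name != null)
                result[name] = Attr(tag, "value") ?? string.Empty;
        }
        return result;
    }
}
```

The extracted pairs, plus the date field and the submit button's name, would then be URL-encoded into the POST body. Any value the page generates in script (such as the random number mentioned above) would not be captured this way.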
Turned out there was a control I could use java to invoke which created the correct viewstate and post data.

public void CompletedHander(object sender,
      WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser wb = ((WebBrowser)sender);

    HtmlElement webDatePicker = wb.Document.GetElementById("ctl00_WebSplitter1_tmpl1_ContentPlaceHolder1_dtePickerBegin");

    // Ask the page's own client-side date picker object to set the date,
    // so the page generates matching viewstate and post data itself.
    string szJava = "a = $find(\"ctl00_WebSplitter1_tmpl1_ContentPlaceHolder1_dtePickerBegin\"); a.set_text(\"5/20/2017\");";
    object a = wb.Document.InvokeScript("eval", new object[] { szJava });
    if (webDatePicker != null)
        webDatePicker.InvokeMember("submit");

    // Click the retrieve button to post the form with the new date.
    HtmlElement button = wb.Document.GetElementById("ctl00$WebSplitter1$tmpl1$ContentPlaceHolder1$HeaderBTN1$btnRetrieve");
    if (button != null)
    {
         button.InvokeMember("click");
    }
}
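
To resubmit for other dates, the script string in the handler above could be generated rather than hard-coded. This is only a sketch: `DatePickerScript` and its methods are hypothetical names, and the `M/d/yyyy` format is assumed from the `5/20/2017` literal used above. `$find` and `set_text` are taken as-is from the handler.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original solution.
public static class DatePickerScript
{
    // Builds the JavaScript that sets the date picker's text to the given date.
    public static string Build(string controlId, DateTime date)
    {
        // Invariant culture keeps the output independent of regional settings.
        string text = date.ToString("M/d/yyyy", CultureInfo.InvariantCulture);
        return "var a = $find(\"" + Escape(controlId) + "\"); " +
               "a.set_text(\"" + Escape(text) + "\");";
    }

    // Escapes backslashes and double quotes for use inside a JavaScript string literal.
    private static string Escape(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"");
    }
}
```

The result would be passed to `wb.Document.InvokeScript("eval", new object[] { script })` in place of `szJava`.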
 
 
Comments
Robert Welliever 25-May-17 21:33pm    
It's good that you found your own solution, but I had to mention one thing to possibly save you from future embarrassment. You've mentioned Java twice, but Java never enters the picture. What you are writing in that code is bits of JavaScript that run in the page and work with its DOM objects.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


