C#
public async Task<string> GetHtmlSelenium(string url)
{
    string html = "";
    ChromeOptions options = new ChromeOptions();
    //options.AddArgument("--headless");
    options.AddArgument("--disable-gpu");
    options.AddArgument("--no-sandbox");
    options.AddArgument("--ignore-certificate-errors");
    options.AddArgument("--enable-logging");

    using (IWebDriver driver = new ChromeDriver(options))
    {
        driver.Navigate().GoToUrl(url);

        IList<IWebElement> parentElements = driver.FindElements(By.CssSelector(".competition_row.clickable"));

        foreach (IWebElement parentElement in parentElements)
        {
            try
            {
                parentElement.Click();
            }
            catch (Exception ex) { }

            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(25));
            wait.Until(ExpectedConditions.ElementIsVisible(By.Id("c87")));
            string pageSource = driver.PageSource;
        }
    }
    return html;
}


What I have tried:

I'm trying to scrape the schedule from www.livesoccertv.com/schedules/2023-08-20/. I can get the top half of the page fine, but from about halfway down (starting at the row "Africa - CAF Confederation Cup") the tables are collapsed and their data isn't in the HTML. The code I've posted seems to move through the elements fine, as I can see it scrolling down the page, but it never actually loads the data. What am I doing wrong? Thanks
Updated 24-Aug-23 2:30am
Comments
Richard Deeming 24-Aug-23 8:34am    
We get a lot of "watch sports-ball online!" spam, and a link to a "live soccer TV" site could make your question look like more of the same.

I've removed the link, but left the URL, since the markup of that specific page is pertinent to your question.

1 solution

Looking at the page markup, the table row 'Africa - CAF..' is followed by a span with an id of 'c88' -
HTML
<div class="competition_row clickable" onclick="showMatches('88','2023-08-20','Live')">
    <a href="/competitions/international/caf-confederation-cup/" class="flag eurl" title="Go to CAF Confederation Cup page"> </a>
    Africa - CAF Confederation Cup
    <span id="clive88" class="clive"></span>
</div>
<span id="c88"></span>


Your code, however, always waits for the element with the id 'c87' -
C#
wait.Until(ExpectedConditions.ElementIsVisible(By.Id("c87")));


Increase this and it should work. To do this dynamically, replace your single 'wait.Until(...)' statement with a loop that finds all the 'span' elements whose IDs start with 'c'. For each of those elements, the loop clicks it, waits until it becomes stale, and then adds the page source to 'htmlList' -
C#
public async Task<List<string>> GetHtmlSelenium(string url)
{
    List<string> htmlList = new List<string>();
    ChromeOptions options = new ChromeOptions();
    //options.AddArgument("--headless");
    options.AddArgument("--disable-gpu");
    options.AddArgument("--no-sandbox");
    options.AddArgument("--ignore-certificate-errors"); 
    options.AddArgument("--enable-logging");

    using (IWebDriver driver = new ChromeDriver(options))
    {
        driver.Navigate().GoToUrl(url);

        IList<IWebElement> parentElements = driver.FindElements(By.CssSelector(".competition_row.clickable"));

        foreach (IWebElement parentElement in parentElements)
        {
            try
            {
                parentElement.Click();
            }
            catch (Exception ex) { }

            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(25));

            //Find all span elements with IDs starting with 'c'...
            IList<IWebElement> spanElements = driver.FindElements(By.CssSelector("span[id^='c']"));

            foreach (IWebElement spanElement in spanElements)
            {
                try
                {
                    wait.Until(ExpectedConditions.ElementToBeClickable(spanElement));
                    spanElement.Click();
                    wait.Until(ExpectedConditions.StalenessOf(spanElement));
                    htmlList.Add(driver.PageSource);
                }
                catch (Exception ex) { }
            }
        }
    }
    return htmlList;
}
 
Comments
punk_legend 24-Aug-23 8:07am    
Thanks for the suggestion, but I get a timeout on the first wait.Until. No joy, I'm afraid.
Andre Oosthuizen 24-Aug-23 8:37am    
The 'wait.Until()' line waits for the element to become clickable, but we also need to check that the element is still part of the DOM when you try to interact with it. You can try the 'ExpectedConditions.PresenceOfAllElementsLocatedBy' method and see if it works -
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(25));

IList<IWebElement> spanElements = driver.FindElements(By.CssSelector("span[id^='c']"));

foreach (IWebElement spanElement in spanElements)
{
    try
    {
        wait.Until(ExpectedConditions.PresenceOfAllElementsLocatedBy(By.CssSelector("span[id^='c']")));
        spanElement.Click();
        wait.Until(ExpectedConditions.StalenessOf(spanElement));
        htmlList.Add(driver.PageSource);
    }
    catch (Exception ex) { }
}
punk_legend 24-Aug-23 9:04am    
I get an "element not interactable" exception on every spanElement.Click(). The behaviour is the same as with my original code: I can see it scrolling down the page, but it doesn't expand the menu.
Andre Oosthuizen 24-Aug-23 11:14am    
That is because you are trying to click the span element, which will not work. You need to read the id of the span, then click the div that goes with that span, as the div is what carries the 'onclick="showMatches('88','2023-08-20','Live')"' handler. See my first code block.
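A minimal sketch of the approach described in the comment above: click each 'competition_row' div rather than its span, then wait for the matching span to fill. It rests on assumptions not confirmed in this thread: that 'showMatches(id, ...)' populates the span with the id 'c' + id, that a span with child elements means the row is already expanded, and that the row divs stay attached to the DOM while the others load. The class and method names ('CompetitionExpander', 'GetCompetitionId', 'ExpandAllAndGetSource') are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

public static class CompetitionExpander
{
    // Pulls the first quoted argument out of an onclick value such as
    // showMatches('88','2023-08-20','Live'). Returns "" if the shape differs.
    public static string GetCompetitionId(string onclick)
    {
        Match m = Regex.Match(onclick ?? "", @"showMatches\(\s*'([^']+)'");
        return m.Success ? m.Groups[1].Value : "";
    }

    // Expands every collapsed row, then returns the page source once.
    public static string ExpandAllAndGetSource(IWebDriver driver)
    {
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(25));
        var js = (IJavaScriptExecutor)driver;

        foreach (IWebElement row in driver.FindElements(By.CssSelector("div.competition_row.clickable")))
        {
            string id = GetCompetitionId(row.GetDomAttribute("onclick"));
            if (id.Length == 0) continue;

            // ASSUMPTION: the matches are rendered as children of span#c<id>.
            By content = By.CssSelector("#c" + id + " *");
            if (driver.FindElements(content).Count > 0) continue; // already expanded

            // A JavaScript click avoids "element not interactable" on off-screen rows.
            js.ExecuteScript("arguments[0].click();", row);
            try
            {
                wait.Until(d => d.FindElements(content).Count > 0);
            }
            catch (WebDriverTimeoutException)
            {
                // Nothing loaded for this row within the timeout; move on.
            }
        }
        return driver.PageSource;
    }
}
```

Taking the page source once, after the loop, avoids collecting one near-identical copy of the page per row. If the site re-renders the row list on each click, the 'row' references could go stale and would need to be looked up again inside the loop.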
punk_legend 29-Aug-23 16:20pm    
I ended up using the developer tools to see what was being sent, which gave me the URL format. I populated that with data from the element and scraped those pages instead. Thanks for your help though.
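The final approach above (building the request URL from data on the element) might look roughly like the sketch below. The real URL format is not given in this thread, so 'UrlTemplate' is a placeholder that must be replaced with whatever the browser's developer tools show. The names 'MatchRequest', 'TryParse' and 'BuildUrl', and the reading of the three arguments as id, date and mode, are also assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchRequest
{
    // HYPOTHETICAL placeholder, not the site's real endpoint.
    // Replace with the URL format observed in the browser's network tab.
    public const string UrlTemplate =
        "https://example.invalid/matches?competition={0}&date={1}&mode={2}";

    // Splits showMatches('88','2023-08-20','Live') into its three quoted arguments.
    public static bool TryParse(string onclick, out string id, out string date, out string mode)
    {
        id = date = mode = "";
        Match m = Regex.Match(onclick ?? "",
            @"showMatches\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)");
        if (!m.Success) return false;

        id = m.Groups[1].Value;
        date = m.Groups[2].Value;
        mode = m.Groups[3].Value;
        return true;
    }

    // Returns the request URL for one competition row, or "" if onclick did not parse.
    public static string BuildUrl(string onclick)
    {
        if (!TryParse(onclick, out string id, out string date, out string mode)) return "";
        return string.Format(UrlTemplate,
            Uri.EscapeDataString(id),
            Uri.EscapeDataString(date),
            Uri.EscapeDataString(mode));
    }
}
```

The 'onclick' value for each row can be read from the first page load, so the rows never need to be clicked; each URL can then be fetched directly, for example with 'HttpClient'.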

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


