I'm trying to scrape the names of universities and their links from this website: **https://www.hec.gov.pk/english/universities/pages/recognised.aspx#k=**
What I want my scraper to do is scrape the university names from the first page, click the next button, scrape the next page, and so on.
However, it scrapes the data from the first page, clicks the next button, and then scrapes the first page again, in a loop. I'm guessing the XPath is not following the next URL.



You can see in the output that the university names from the first page repeat themselves.
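
One possible explanation for the repetition, offered as a guess: the only part of this URL that could change between pages is what follows the `#`, and that fragment is never sent to the server, so any plain HTTP client that re-downloads the address gets the same document every time. The sketch below illustrates this with the standard library only. `request_target` is a hypothetical helper name, and the `#k=#s=11` fragment is an invented example of what a later page might look like, not something taken from the site.

```python
from urllib.parse import urldefrag


def request_target(url):
    """Return the URL as an HTTP client would request it, without the fragment."""
    # Everything after '#' is handled inside the browser and is not transmitted.
    return urldefrag(url).url


first_page = "https://www.hec.gov.pk/english/universities/pages/recognised.aspx#k="
# Hypothetical fragment for a later page; the real value the site uses may differ.
later_page = "https://www.hec.gov.pk/english/universities/pages/recognised.aspx#k=#s=11"

print(request_target(first_page))
print(request_target(later_page))
# Both addresses collapse to the same request, so the server cannot tell them apart.
print(request_target(first_page) == request_target(later_page))
```

If that is what is happening here, the paging is done by JavaScript inside the browser, and the fix would be on the side of how the page content is read, not in the XPath.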

What I have tried:

Python
import requests
import lxml.html as lh
import pandas as pd
from webdriver_manager.chrome import ChromeDriverManager
from selenium import webdriver
import time

driver = webdriver.Chrome(ChromeDriverManager().install())
web = driver.get('https://www.hec.gov.pk/english/universities/pages/recognised.aspx#k=')
i = '1'
col = []
time.sleep(5)
for page in range(1, 24):
    i = str(int(i) + 1)
    button = driver.find_element_by_link_text(i)
    button.click()
    time.sleep(5)
    url = driver.current_url
    page = requests.get(url)
    doc = lh.fromstring(page.content)
    tr_elements = doc.xpath('//tr/td[5]')
    for t in tr_elements:
        name = t.text_content()
        col.append(name)
    print(col)
Updated 11-Mar-21 23:47pm
Comments
NotTodayYo 12-Mar-21 7:42am    
Debug it and find out what is happening.
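
A likely culprit, stated as a guess and not a confirmed diagnosis: `requests.get(driver.current_url)` starts a fresh HTTP request that knows nothing about the clicks made in the Selenium browser, so it keeps receiving the initial page. Reading `driver.page_source` instead parses what the browser is currently showing. The sketch below assumes that is the problem. `collect_names`, `extract_names` and `run` are hypothetical names, the page count of 24 comes from the loop in the question, and the numbered link text and the `//tr/td[5]` XPath are the asker's own and have not been checked against the live site.

```python
import time


def collect_names(driver, extract, last_page, pause=5.0):
    """Visit pages 1..last_page in the live browser and gather names from each."""
    # Page 1 is already loaded, so read it before clicking anything.
    names = list(extract(driver.page_source))
    for number in range(2, last_page + 1):
        # "link text" is the string value of selenium's By.LINK_TEXT locator.
        driver.find_element("link text", str(number)).click()
        time.sleep(pause)  # crude wait for the new rows to render
        # Parse the DOM the browser holds now, not a re-download of the URL.
        names.extend(extract(driver.page_source))
    return names


def extract_names(html):
    """Pull the text of the fifth cell of every table row (XPath from the question)."""
    import lxml.html as lh  # imported here so collect_names works without lxml

    doc = lh.fromstring(html)
    return [cell.text_content().strip() for cell in doc.xpath("//tr/td[5]")]


def run():
    """Drive a real Chrome session; assumes Selenium can locate a chromedriver."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get("https://www.hec.gov.pk/english/universities/pages/recognised.aspx#k=")
        time.sleep(5)
        return collect_names(driver, extract_names, last_page=24)
    finally:
        driver.quit()
```

Calling `run()` should return one flat list of names. The fixed `time.sleep` pauses are kept from the question for simplicity; Selenium's `WebDriverWait` would be a more reliable way to wait for each page to finish loading.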

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


