Retrieve URL not in HTML page?

Question

I am trying to programmatically retrieve an m3u8 file from a webpage using C#. The problem is that the m3u8 URL does not appear anywhere in the page's HTML. However, if you open the Network tab in the browser's dev tools and filter on m3u8, a few of them show up, so they are being loaded but are not part of the "page" itself. The file won't be named like <name>.m3u8, but the request URL will contain "m3u8" somewhere. So I need to find all the m3u8 requests and then filter them, which I can figure out myself; I am just trying to work out how to get at these files in the first place.

What I have tried:

I have tried using Selenium and was able to retrieve the main page content. I also tried some other methods using proxies to capture the traffic, but never really figured that out. I also tried Selenium's new Network option, but it may be a bit beyond me, and I can't find any good, complete examples.

Posted 27-Nov-20 4:13am

Member 569739

Updated 30-Nov-20 6:27am

Comments

Richard MacCutchan 27-Nov-20 11:10am

It is quite likely that the owners of the website do not want their files taken by anyone other than people who are known to them.

Afzaal Ahmad Zeeshan 27-Nov-20 22:00pm

So, ultimately the URL is somewhere in the page's content (either as a hyperlink in the HTML or inside the JavaScript). If the latter is the case, then a quick "contains" search of the JavaScript can show where the file is referenced.

But whether you will be able to download it is a different question altogether. :)
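
As a rough illustration of that kind of "contains" check in C#, here is a minimal sketch that scans a block of text (page HTML or a downloaded script) for absolute URLs with "m3u8" anywhere in them. The class and method names (`M3u8Search`, `FindM3u8Urls`) are made up for this example, and the regular expression is only an approximation: it will miss URLs that are built at run time or written with escaped slashes such as `https:\/\/`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class M3u8Search
{
    // Matches absolute http(s) URLs, stopping at whitespace, quotes,
    // angle brackets, or a backslash.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s""'<>\\]+", RegexOptions.IgnoreCase);

    // Returns the distinct URLs in the text that contain "m3u8" anywhere,
    // not only as a file extension.
    public static IReadOnlyList<string> FindM3u8Urls(string text)
    {
        return UrlPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(u => u.IndexOf("m3u8", StringComparison.OrdinalIgnoreCase) >= 0)
            .Distinct()
            .ToList();
    }
}
```

With Selenium, the text to search could be the driver's `PageSource`; for external scripts, each script file would have to be downloaded and searched separately.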

[no name] 28-Nov-20 1:16am

"It does not have it ... but some tool says it does". Maybe the tool is wrong.

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Yvan Rodrigues · Answer 1 · 2020-11-30T06:27:00

The files that you see in the Network Tools area of the browser are a recursive list of all files linked from the HTML page and all of its children.

For example, you load a single HTML page. The page will link to CSS stylesheets. Those stylesheets may include other stylesheets, fonts, background images, etc. The browser retrieves each one of those at its discretion.

The HTML will also likely include one or more JavaScript links, including jQuery, jQuery libraries, Bootstrap or Foundation etc. Each of those scripts is able to tell the browser to retrieve a remote file.

Then, even once everything that the HTML and its children link to has been loaded, you have AJAX: additional web server requests made by asynchronously running scripts, or in reaction to user input such as moving the mouse or clicking.

So to get the information you are looking for you would need to implement this recursive fetching and analysis behaviour, which is equivalent to writing a good chunk of a web browser itself. In that case you could look at the source code for Chromium or perhaps a text-based browser like Lynx.
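
To give a sense of what even the simplest part of that looks like, here is a minimal C# sketch of the recursive fetching step only. It assumes the m3u8 URL appears as a quoted string literal in the HTML or in a linked file, and it does not execute any JavaScript, so anything requested through AJAX or assembled at run time will not be found. The names `ResourceCrawler`, `CrawlAsync` and `HttpFetcher` are invented for this example, and the regular expressions are rough stand-ins for a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of any library.
public static class ResourceCrawler
{
    // Rough pattern for src="..." and href="..." attributes.
    private static readonly Regex LinkPattern =
        new Regex(@"(?:src|href)\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // Any quoted string without whitespace that contains "m3u8".
    private static readonly Regex M3u8Pattern =
        new Regex(@"[""']([^""'\s]*m3u8[^""'\s]*)[""']", RegexOptions.IgnoreCase);

    // Fetches 'start', records quoted strings containing "m3u8", then follows
    // src/href links up to maxDepth levels. 'fetch' is injected so the
    // download mechanism can be swapped out.
    public static async Task<HashSet<string>> CrawlAsync(
        Uri start, Func<Uri, Task<string>> fetch, int maxDepth)
    {
        var found = new HashSet<string>();
        var visited = new HashSet<Uri>();
        await VisitAsync(start, fetch, maxDepth, visited, found);
        return found;
    }

    private static async Task VisitAsync(
        Uri url, Func<Uri, Task<string>> fetch, int depth,
        HashSet<Uri> visited, HashSet<string> found)
    {
        // Stop at the depth limit and never fetch the same URL twice.
        if (depth < 0 || !visited.Add(url)) return;

        string body;
        try { body = await fetch(url); }
        catch (Exception) { return; } // skip resources that cannot be fetched

        foreach (Match m in M3u8Pattern.Matches(body))
            found.Add(m.Groups[1].Value);

        foreach (Match m in LinkPattern.Matches(body))
        {
            // Resolve relative links against the current URL; follow http(s) only.
            if (Uri.TryCreate(url, m.Groups[1].Value, out Uri child)
                && (child.Scheme == Uri.UriSchemeHttp || child.Scheme == Uri.UriSchemeHttps))
            {
                await VisitAsync(child, fetch, depth - 1, visited, found);
            }
        }
    }

    // A fetcher backed by HttpClient. It downloads every linked resource as
    // text, including images, which is wasteful but keeps the sketch short.
    public static Func<Uri, Task<string>> HttpFetcher(HttpClient client) =>
        url => client.GetStringAsync(url);
}
```

A call might look like `await ResourceCrawler.CrawlAsync(new Uri("https://example.com/page"), ResourceCrawler.HttpFetcher(new HttpClient()), 2)`, where the address is a placeholder. Because `href` links are followed too, a small depth limit is what keeps this from wandering across the whole site.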

Alternatively perhaps you could write a browser extension, which would likely give you access to the information that you see in the Network Tools panel.

In terms of being able to accomplish this with a small script of a couple of hundred lines of code, I would say it is close to impossible.
