
Python – Search Youtube for Video

5 Feb 2015 | GPL3 | 2 min read
This code is for Python 3.

I was surprised to discover that I couldn’t really find a good way to do this when I Googled for a solution. I just kept getting results for Google’s YouTube API, which is great… but also massive overkill for what I wanted to do. I just wanted to search for a YouTube video and return the top result. Here’s some simple code showing you how to do exactly that. If you don’t care how it works, just skip to “Using It”.

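import urllib.request
import urllib.parse
import re

# Turn the typed search text into a URL-encoded query string.
query_string = urllib.parse.urlencode({"search_query": input()})
# Fetch the search results page.
html_content = urllib.request.urlopen("http://www.youtube.com/results?" + query_string)
# Collect the 11-character video ID from every /watch link on the page.
search_results = re.findall(r'href="/watch\?v=(.{11})', html_content.read().decode())
# Print the URL of the first result.
print("http://www.youtube.com/watch?v=" + search_results[0])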

How It Works

First, let’s look at the anatomy of a YouTube search URL. Here’s the one I experimented with:

http://www.youtube.com/results?search_query=Epic+Rock+-+Ready+For+This+%282014%29%28Battle+Rock%29%28All+Good+Things%29

If you ignore the encoded query at the end, it’s really not that complicated. Any search is just http://www.youtube.com/results?search_query= with your search string, URL-encoded, tacked onto it. So that’s step one. The line that builds query_string takes whatever the user enters and changes it from a human-readable query into its URL-encoded form. It looks like this:

Before: Epic Rock - Ready For This (2014)(Battle Rock)(All Good Things)
After: Epic+Rock+-+Ready+For+This+%282014%29%28Battle+Rock%29%28All+Good+Things%29
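
The Before/After pair above can be reproduced on its own. This is a minimal sketch that assumes the search text is typed with a plain hyphen; the parse_qs call is not part of the original snippet and is only there to show that the encoding is reversible.

```python
import urllib.parse

# The example search from the article, typed with a plain hyphen.
query = "Epic Rock - Ready For This (2014)(Battle Rock)(All Good Things)"

# urlencode takes a mapping of parameter names to values and
# returns a single URL-encoded query string.
query_string = urllib.parse.urlencode({"search_query": query})
print(query_string)

# parse_qs reverses the encoding; each value comes back wrapped in a list.
decoded = urllib.parse.parse_qs(query_string)
print(decoded["search_query"][0])
```

Spaces become +, and the parentheses become %28 and %29, while the hyphen is left alone.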

The urllib.parse.urlencode function takes key-value pairs and returns them as a URL-encoded query string; in this case it returns search_query=Epic+Rock+-+Ready+For+This+%282014%29%28Battle+Rock%29%28All+Good+Things%29. The next line simply “browses” to the URL and returns a file-like response object.

Now we want to find the top result. As it turns out, the video results follow the syntax href="/watch?v=<11_CHARACTER_IDENTIFIER>". So we just want to search for all instances of that pattern, and we use a regular expression to do it. The html_content.read().decode() part simply reads the response and decodes its bytes into a text string for our regular expression to search.

The regular expression basically says: match href="/watch?v= literally, and then the .{11} part means match any 11 characters (. matches any character except a newline, and {11} repeats it exactly 11 times). The parentheses around the .{11} form what’s called a group. We’re really not interested in the href part of the expression; we just want the 11-character identifier of the YouTube video. Because the pattern contains a group, re.findall returns a list of just the group matches rather than the whole matches. So we’d get something like this: ['4fmwMXEUWOI', <another 11-character identifier>, another, etc].
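
To see what the group does without touching the network, here is a small sketch that runs the same pattern over a hand-written HTML fragment. The fragment is a hypothetical stand-in for the results page, and the second identifier is invented for illustration.

```python
import re

# Hypothetical stand-in for a slice of the results page.
# The second video ID is made up for this example.
sample_html = (
    '<a href="/watch?v=4fmwMXEUWOI">first result</a>\n'
    '<a href="/about">not a video link</a>\n'
    '<a href="/watch?v=abcdefghijk">second result</a>\n'
)

# With a group, findall returns only the captured 11 characters.
search_results = re.findall(r'href="/watch\?v=(.{11})', sample_html)
print(search_results)

# Without the group, findall returns the whole matched text instead.
whole_matches = re.findall(r'href="/watch\?v=.{11}', sample_html)
print(whole_matches)
```

The first list holds bare identifiers, ready to be appended to the watch URL; the second still carries the href prefix, which is why the group is worth having.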

The last line of the program also leverages the predictable nature of YouTube URLs. The video URL will always be: http://www.youtube.com/watch?v=<11-character identifier>. So we just concatenate the predictable part with the first result from our list.

Example Output

For this code snippet you might type in: Epic Rock - Ready For This (2014)(Battle Rock)(All Good Things)
The program would spit out: http://www.youtube.com/watch?v=4fmwMXEUW0I

Using It

To use the code, simply copy and paste it into your program. Replace “input()” in the line that builds query_string with whatever string you want to query YouTube for, and then change the last line to use the results as needed. The search_results variable will be a list of the 11-character video identifiers. If you just want the first one, it’s search_results[0].
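
If you would rather call this from other code than paste it inline, one possible arrangement is sketched below. The function names (build_search_url, extract_video_ids, top_result) are hypothetical and not part of the original snippet. Two behaviors are also my own additions: duplicate identifiers are dropped, and None is returned when nothing matches, since search_results[0] raises IndexError on an empty list.

```python
import re
import urllib.parse
import urllib.request

# Same pattern as the snippet above.
VIDEO_ID_PATTERN = re.compile(r'href="/watch\?v=(.{11})')


def build_search_url(query):
    """Return the results-page URL for a plain-text query."""
    query_string = urllib.parse.urlencode({"search_query": query})
    return "http://www.youtube.com/results?" + query_string


def extract_video_ids(html):
    """Return the video IDs found in html, in page order, without repeats."""
    # dict.fromkeys keeps the first occurrence of each ID and preserves order.
    return list(dict.fromkeys(VIDEO_ID_PATTERN.findall(html)))


def top_result(query):
    """Fetch the results page and return the first video URL, or None."""
    with urllib.request.urlopen(build_search_url(query)) as response:
        video_ids = extract_video_ids(response.read().decode())
    if not video_ids:
        # Nothing matched the pattern, so there is no first result to return.
        return None
    return "http://www.youtube.com/watch?v=" + video_ids[0]
```

Calling top_result("some search text") performs the network request; the other two functions are pure and can be checked against a saved page or a test string.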

Hope this saves someone else some time.

Grant

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)


Written By
United States
Grant is a specialist in computer security and networking. He holds a bachelor's degree in Computer Science and Engineering from the Ohio State University. Certs: CCNA, CCNP, CCDA, CCDP, Sec+, and GCIH.
