by Al Sweigart
This is what your program does:
Gets search keywords from the command line arguments.
Retrieves the search results page.
Opens a browser tab for each result.
This means your code will need to do the following:
Read the command line arguments from sys.argv.
Fetch the search result page with the requests module.
Find the links to each search result.
Call the webbrowser.open() function to open the web browser.
Open a new file editor window and save it as lucky.py.
Step 1: Get the Command Line Arguments and Request the Search Page
Before coding anything, you first need to know the URL of the search result page. By looking at the browser’s address bar after doing a Google search, you can see that the result page has a URL like https://www.google.com/search?q=SEARCH_TERM_HERE. The requests module can download this page and then you can use Beautiful Soup to find the search result links in the HTML. Finally, you’ll use the webbrowser module to open those links in browser tabs.
Make your code look like the following:
#! python3
# lucky.py - Opens several Google search results.

import requests, sys, webbrowser, bs4

print('Googling...')    # display text while downloading the Google page
res = requests.get('http://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()

# TODO: Retrieve top search result links.

# TODO: Open a browser tab for each result.
The user will specify the search terms using command line arguments when they launch the program. These arguments will be stored as strings in a list in sys.argv.
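For example, if you launched the program as lucky beautiful soup (an illustrative invocation), sys.argv would hold the script name followed by each argument. Here is a quick interactive sketch of what the program's code then does with that list:

>>> import sys
>>> sys.argv = ['lucky.py', 'beautiful', 'soup']    # simulating the command line arguments
>>> ' '.join(sys.argv[1:])    # skip sys.argv[0], the script's own name
'beautiful soup'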
Step 2: Find All the Results
Now you need to use Beautiful Soup to extract the top search result links from your downloaded HTML. But how do you figure out the right selector for the job? For example, you can’t just search for all <a> tags, because there are lots of links you don’t care about in the HTML. Instead, you must inspect the search result page with the browser’s developer tools to try to find a selector that will pick out only the links you want.
After doing a Google search for Beautiful Soup, you can open the browser’s developer tools and inspect some of the link elements on the page. They look incredibly complicated, something like this (with the long attribute values abbreviated): <a href="/url?..."><em>Beautiful Soup</em>: We called him Tortoise because he taught us.</a>
It doesn’t matter that the element looks incredibly complicated. You just need to find the pattern that all the search result links have. But this <a> element doesn’t have anything that easily distinguishes it from the nonsearch result <a> elements on the page.
Make your code look like the following:
#! python3
# lucky.py - Opens several Google search results.

import requests, sys, webbrowser, bs4

--snip--

# Retrieve top search result links.
soup = bs4.BeautifulSoup(res.text)

# Open a browser tab for each result.
linkElems = soup.select('.r a')
If you look up a little from the <a> element, though, there is an element like this: <h3 class="r">. Looking through the rest of the HTML source, it looks like the r class is used only for search result links. You don’t have to know what the CSS class r is or what it does. You’re just going to use it as a marker for the <a> element you are looking for. You can create a BeautifulSoup object from the downloaded page’s HTML text and then use the selector '.r a' to find all <a> elements that are within an element that has the r CSS class.
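One small caveat: depending on your installed version of Beautiful Soup, bs4.BeautifulSoup(res.text) may print a warning about no parser being explicitly specified. If that happens, you can silence it by naming a parser yourself (a minor addition to the code above; 'html.parser' comes with Python):

soup = bs4.BeautifulSoup(res.text, 'html.parser')    # explicit parser avoids the warning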
Step 3: Open Web Browsers for Each Result
Finally, we’ll tell the program to open web browser tabs for our results. Add the following to the end of your program:
#! python3
# lucky.py - Opens several Google search results.

import requests, sys, webbrowser, bs4

--snip--

# Open a browser tab for each result.
linkElems = soup.select('.r a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))
By default, you open the first five search results in new tabs using the webbrowser module. However, the user may have searched for something that turned up fewer than five results. The soup.select() call returns a list of all the elements that matched your '.r a' selector, so the number of tabs you want to open is either 5 or the length of this list (whichever is smaller).
The built-in Python function min() returns the smallest of the integer or float arguments it is passed. (There is also a built-in max() function that returns the largest argument it is passed.) You can use min() to find out whether there are fewer than five links in the list and store the number of links to open in a variable named numOpen. Then you can run through a for loop by calling range(numOpen).
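For example, here’s min() and max() in the interactive shell:

>>> min(5, 12)
5
>>> min(5, 3)    # only three links matched, so open only three tabs
3
>>> max(5, 3)
5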
On each iteration of the loop, you use webbrowser.open() to open a new tab in the web browser. Note that the href attribute’s value in the returned <a> elements does not have the initial http://google.com part, so you have to concatenate that to the href attribute’s string value.
Now you can instantly open the first five Google results for, say, Python programming tutorials by running lucky python programming tutorials on the command line! (See Appendix B for how to easily run programs on your operating system.)
Ideas for Similar Programs
The benefit of tabbed browsing is that you can easily open links in new tabs to peruse later. A program that automatically opens several links at once can be a nice shortcut to do the following:
Open all the product pages after searching a shopping site such as Amazon
Open all the links to reviews for a single product
Open the result links to photos after performing a search on a photo site such as Flickr or Imgur
Project: Downloading All XKCD Comics
Blogs and other regularly updating websites usually have a front page with the most recent post as well as a Previous button on the page that takes you to the previous post. Then that post will also have a Previous button, and so on, creating a trail from the most recent page to the first post on the site. If you wanted a copy of the site’s content to read when you’re not online, you could manually navigate to every page and save each one. But this is pretty boring work, so let’s write a program to do it instead.
XKCD is a popular geek webcomic with a website that fits this structure (see Figure 11-6). The front page at http://xkcd.com/ has a Prev button that guides the user back through prior comics. Downloading each comic by hand would take forever, but you can write a script to do this in a couple of minutes.
Here’s what your program does:
Loads the XKCD home page.
Saves the comic image on that page.
Follows the Previous Comic link.
Repeats until it reaches the first comic.
Figure 11-6. XKCD, “a webcomic of romance, sarcasm, math, and language”
This means your code will need to do the following:
Download pages with the requests module.
Find the URL of the comic image for a page using Beautiful Soup.
Download and save the comic image to the hard drive with iter_content().
Find the URL of the Previous Comic link, and repeat.
Open a new file editor window and save it as downloadXkcd.py.
Step 1: Design the Program
If you open the browser’s developer tools and inspect the elements on the page, you’ll find the following:
The URL of the comic’s image file is given by the src attribute of an <img> element.
The <img> element is inside a <div> element with the id attribute set to comic.
The Prev button has a rel HTML attribute with the value prev.
The first comic’s Prev button links to the http://xkcd.com/# URL, indicating that there are no more previous pages.
Make your code look like the following:
#! python3
# downloadXkcd.py - Downloads every single XKCD comic.

import requests, os, bs4

url = 'http://xkcd.com'               # starting url
os.makedirs('xkcd', exist_ok=True)    # store comics in ./xkcd
while not url.endswith('#'):
    # TODO: Download the page.

    # TODO: Find the URL of the comic image.

    # TODO: Download the image.

    # TODO: Save the image to ./xkcd.

    # TODO: Get the Prev button's url.

print('Done.')
You’ll have a url variable that starts with the value 'http://xkcd.com' and repeatedly update it (in a while loop) with the URL of the current page’s Prev link. At every step in the loop, you’ll download the comic at url. You’ll know to end the loop when url ends with '#'.
You will download the image files to a folder in the current working directory named xkcd. The call os.makedirs() ensures that this folder exists, and the exist_ok=True keyword argument prevents the function from throwing an exception if this folder already exists. The rest of the code is just comments that outline the rest of your program.
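You can see the effect of exist_ok in the interactive shell; note that the exact FileExistsError message will vary by operating system:

>>> import os
>>> os.makedirs('xkcd', exist_ok=True)    # creates the folder
>>> os.makedirs('xkcd', exist_ok=True)    # no error the second time
>>> os.makedirs('xkcd')                   # without exist_ok, this raises an exception
Traceback (most recent call last):
  ...
FileExistsError: ... 'xkcd'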
Step 2: Download the Web Page
Let’s implement the code for downloading the page. Make your code look like the following:
#! python3
# downloadXkcd.py - Downloads every single XKCD comic.

import requests, os, bs4

url = 'http://xkcd.com'               # starting url
os.makedirs('xkcd', exist_ok=True)    # store comics in ./xkcd
while not url.endswith('#'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text)

    # TODO: Find the URL of the comic image.

    # TODO: Download the image.

    # TODO: Save the image to ./xkcd.

    # TODO: Get the Prev button's url.

print('Done.')
First, print url so that the user knows which URL the program is about to download; then use the requests module’s requests.get() function to download it. As always, you immediately call the Response object’s raise_for_status() method to throw an exception and end the program if something went wrong with the download. Otherwise, you create a BeautifulSoup object from the text of the downloaded page.
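As a quick reminder of what raise_for_status() buys you, here is the kind of exception it raises when a download fails; the nonexistent URL here is purely hypothetical:

>>> import requests
>>> res = requests.get('http://xkcd.com/no_such_page/')    # hypothetical bad URL
>>> res.raise_for_status()
Traceback (most recent call last):
  ...
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://xkcd.com/no_such_page/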
Step 3: Find and Download the Comic Image
Make your code look like the following:
#! python3
# downloadXkcd.py - Downloads every single XKCD comic.

import requests, os, bs4

--snip--

    # Find the URL of the comic image.
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        comicUrl = 'http:' + comicElem[0].get('src')
        # Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()

    # TODO: Save the image to ./xkcd.

    # TODO: Get the Prev button's url.

print('Done.')
From inspecting the XKCD home page with your developer tools, you know that the <img> element for the comic image is inside a <div> element with the id attribute set to comic, so the selector '#comic img' will get you the correct <img> element from the BeautifulSoup object.
A few XKCD pages have special content that isn’t a simple image file. That’s fine; you’ll just skip those. If your selector doesn’t find any elements, then soup.select('#comic img') will return a blank list. When that happens, the program can just print an error message and move on without downloading the image.
Otherwise, the selector will return a list containing one <img> element. You can get the src attribute from this <img> element and pass it to requests.get() to download the comic’s image file.
Step 4: Save the Image and Find the Previous Comic
Make your code look like the following:
#! python3
# downloadXkcd.py - Downloads every single XKCD comic.

import requests, os, bs4

--snip--

    # Save the image to ./xkcd.
    imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
    for chunk in res.iter_content(100000):
        imageFile.write(chunk)
    imageFile.close()

    # Get the Prev button's url.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')

print('Done.')
At this point, the image file of the comic is stored in the res variable. You need to write this image data to a file on the hard drive.
You’ll need a filename for the local image file to pass to open(). The comicUrl will have a value like 'http://imgs.xkcd.com/comics/heartbleed_explanation.png', which you might have noticed looks a lot like a file path. And in fact, you can call os.path.basename() with comicUrl, and it will return just the last part of the URL, 'heartbleed_explanation.png'. You can use this as the filename when saving the image to your hard drive. You join this name with the name of your xkcd folder using os.path.join() so that your program uses backslashes (\) on Windows and forward slashes (/) on OS X and Linux. Now that you finally have the filename, you can call open() to open a new file in 'wb' “write binary” mode.
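Here is that filename logic in the interactive shell; on OS X and Linux, os.path.join() would produce 'xkcd/heartbleed_explanation.png' instead:

>>> import os
>>> comicUrl = 'http://imgs.xkcd.com/comics/heartbleed_explanation.png'
>>> os.path.basename(comicUrl)
'heartbleed_explanation.png'
>>> os.path.join('xkcd', os.path.basename(comicUrl))    # on Windows
'xkcd\\heartbleed_explanation.png'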
Remember from earlier in this chapter that to save files you’ve downloaded using Requests, you need to loop over the return value of the iter_content() method. The code in the for loop writes out chunks of the image data (at most 100,000 bytes each) to the file and then you close the file. The image is now saved to your hard drive.
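Incidentally, a with statement could close the file for you automatically; this is just a stylistic alternative to the open() and close() calls above, not a change to the program’s behavior:

with open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb') as imageFile:
    for chunk in res.iter_content(100000):
        imageFile.write(chunk)    # write each chunk of at most 100,000 bytes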
Afterward, the selector 'a[rel="prev"]' identifies the <a> element with the rel attribute set to prev, and you can use this <a> element’s href attribute to get the previous comic’s URL, which gets stored in url. Then the while loop begins the entire download process again for this comic.
The output of this program will look like this:
Downloading page http://xkcd.com...
Downloading image http://imgs.xkcd.com/comics/phone_alarm.png...
Downloading page http://xkcd.com/1358/...
Downloading image http://imgs.xkcd.com/comics/nro.png...
Downloading page http://xkcd.com/1357/...
Downloading image http://imgs.xkcd.com/comics/free_speech.png...
Downloading page http://xkcd.com/1356/...
Downloading image http://imgs.xkcd.com/comics/orbital_mechanics.png...
Downloading page http://xkcd.com/1355/...
Downloading image http://imgs.xkcd.com/comics/airplane_message.png...
Downloading page http://xkcd.com/1354/...
Downloading image http://imgs.xkcd.com/comics/heartbleed_explanation.png...
--snip--
This project is a good example of a program that can automatically follow links in order to scrape large amounts of data from the Web. You can learn about Beautiful Soup’s other features from its documentation at http://www.crummy.com/software/BeautifulSoup/bs4/doc/.
Ideas for Similar Programs
Downloading pages and following links are the basis of many web crawling programs. Similar programs could also do the following:
Back up an entire site by following all of its links.
Copy all the messages off a web forum.
Duplicate the catalog of items for sale on an online store.
The requests and BeautifulSoup modules are great as long as you can figure out the URL you need to pass to requests.get(). However, sometimes this isn’t so easy to find. Or perhaps the website you want your program to navigate requires you to log in first. The selenium module will give your programs the power to perform such sophisticated tasks.
Controlling the Browser with the selenium Module
The selenium module lets Python directly control the browser by programmatically clicking links and filling in login information, almost as though there is a human user interacting with the page. Selenium allows you to interact with web pages in a much more advanced way than Requests and Beautiful Soup; but because it launches a web browser, it is a bit slower and hard to run in the background if, say, you just need to download some files from the Web.
Appendix A has more detailed steps on installing third-party modules.
Starting a Selenium-Controlled Browser
For these examples, you’ll need the Firefox web browser. This will be the browser that you control. If you don’t already have Firefox, you can download it for free from http://getfirefox.com/.
Importing the modules for Selenium is slightly tricky. Instead of import selenium, you need to run from selenium import webdriver. (The exact reason why the selenium module is set up this way is beyond the scope of this book.) After that, you can launch the Firefox browser with Selenium. Enter the following into the interactive shell:
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> type(browser)
<class 'selenium.webdriver.firefox.webdriver.WebDriver'>
>>> browser.get('http://inventwithpython.com')
You’ll notice that when webdriver.Firefox() is called, the Firefox web browser starts up. Calling type() on browser reveals it’s of the WebDriver data type. And calling browser.get('http://inventwithpython.com') directs the browser to http://inventwithpython.com/. Your browser should look something like Figure 11-7.
Figure 11-7. After calling webdriver.Firefox() and get() in IDLE, the Firefox browser appears.