by Al Sweigart
Stepping through the program with the debugger is helpful but can also be slow. Often you’ll want the program to run normally until it reaches a certain line of code. You can configure the debugger to do this with breakpoints.
Breakpoints
A breakpoint can be set on a specific line of code and forces the debugger to pause whenever the program execution reaches that line. Open a new file editor window and enter the following program, which simulates flipping a coin 1,000 times. Save it as coinFlip.py.
import random heads = 0 for i in range(1, 1001): ➊ if random.randint(0, 1) == 1: heads = heads + 1 if i == 500: ➋ print('Halfway done!') print('Heads came up ' + str(heads) + ' times.')
The random.randint(0, 1) call ➊ will return 0 half of the time and 1 the other half of the time. This can be used to simulate a 50/50 coin flip where 1 represents heads. When you run this program without the debugger, it quickly outputs something like the following:
Halfway done! Heads came up 490 times.
If you ran this program under the debugger, you would have to click the Over button thousands of times before the program terminated. If you were interested in the value of heads at the halfway point of the program’s execution, when 500 of 1000 coin flips have been completed, you could instead just set a breakpoint on the line print('Halfway done!') ➋. To set a breakpoint, right-click the line in the file editor and select Set Breakpoint, as shown in Figure 10-5.
Figure 10-5. Setting a breakpoint
You don’t want to set a breakpoint on the if statement line, since the if statement is executed on every single iteration through the loop. By setting the breakpoint on the code in the if statement, the debugger breaks only when the execution enters the if clause.
The line with the breakpoint will be highlighted in yellow in the file editor. When you run the program under the debugger, it will start in a paused state at the first line, as usual. But if you click Go, the program will run at full speed until it reaches the line with the breakpoint set on it. You can then click Go, Over, Step, or Out to continue as normal.
If you want to remove a breakpoint, right-click the line in the file editor and select Clear Breakpoint from the menu. The yellow highlighting will go away, and the debugger will not break on that line in the future.
Summary
Assertions, exceptions, logging, and the debugger are all valuable tools to find and prevent bugs in your program. Assertions with the Python assert statement are a good way to implement “sanity checks” that give you an early warning when a necessary condition doesn’t hold true. Assertions are only for errors that the program shouldn’t try to recover from and should fail fast. Otherwise, you should raise an exception.
An exception can be caught and handled by the try and except statements. The logging module is a good way to look into your code while it’s running and is much more convenient to use than the print() function because of its different logging levels and ability to log to a text file.
The debugger lets you step through your program one line at a time. Alternatively, you can run your program at normal speed and have the debugger pause execution whenever it reaches a line with a breakpoint set. Using the debugger, you can see the state of any variable’s value at any point during the program’s lifetime.
These debugging tools and techniques will help you write programs that work. Accidentally introducing bugs into your code is a fact of life, no matter how many years of coding experience you have.
Practice Questions
Q:
1. Write an assert statement that triggers an AssertionError if the variable spam is an integer less than 10.
Q:
2. Write an assert statement that triggers an AssertionError if the variables eggs and bacon contain strings that are the same as each other, even if their cases are different (that is, 'hello' and 'hello' are considered the same, and 'goodbye' and 'GOODbye' are also considered the same).
Q:
3. Write an assert statement that always triggers an AssertionError.
Q:
4. What are the two lines that your program must have in order to be able to call logging.debug()?
Q:
5. What are the two lines that your program must have in order to have logging.debug() send a logging message to a file named programLog.txt?
Q:
6. What are the five logging levels?
Q:
7. What line of code can you add to disable all logging messages in your program?
Q:
8. Why is using logging messages better than using print() to display the same message?
Q:
9. What are the differences between the Step, Over, and Out buttons in the Debug Control window?
Q:
10. After you click Go in the Debug Control window, when will the debugger stop?
Q:
11. What is a breakpoint?
Q:
12. How do you set a breakpoint on a line of code in IDLE?
Practice Project
For practice, write a program that does the following.
Debugging Coin Toss
The following program is meant to be a simple coin toss guessing game. The player gets two guesses (it’s an easy game). However, the program has several bugs in it. Run through the program a few times to find the bugs that keep the program from working correctly.
import random guess = '' while guess not in ('heads', 'tails'): print('Guess the coin toss! Enter heads or tails:') guess = input() toss = random.randint(0, 1) # 0 is tails, 1 is heads if toss == guess: print('You got it!') else: print('Nope! Guess again!') guesss = input() if toss == guess: print('You got it!') else: print('Nope. You are really bad at this game.')
Chapter 11. Web Scraping
In those rare, terrifying moments when I’m without Wi-Fi, I realize just how much of what I do on the computer is really what I do on the Internet. Out of sheer habit I’ll find myself trying to check email, read friends’ Twitter feeds, or answer the question, “Did Kurtwood Smith have any major roles before he was in the original 1987 RoboCop?”[2]
Since so much work on a computer involves going on the Internet, it’d be great if your programs could get online. Web scraping is the term for using a program to download and process content from the Web. For example, Google runs many web scraping programs to index web pages for its search engine. In this chapter, you will learn about several modules that make it easy to scrape web pages in Python.
webbrowser. Comes with Python and opens a browser to a specific page.
Requests. Downloads files and web pages from the Internet.
Beautiful Soup. Parses HTML, the format that web pages are written in.
Selenium. Launches and controls a web browser. Selenium is able to fill in forms and simulate mouse clicks in this browser.
Project: mapit.py with the webbrowser Module
The webbrowser module’s open() function can launch a new browser to a specified URL. Enter the following into the interactive shell:
>>> import webbrowser >>> webbrowser.open('http://inventwithpython.com/')
A web browser tab will open to the URL http://inventwithpython.com/. This is about the only thing the webbrowser module can do. Even so, the open() function does make some interesting things possible. For example, it’s tedious to copy a street address to the clipboard and bring up a map of it on Google Maps. You could take a few steps out of this task by writing a simple script to automatically launch the map in your browser using the contents of your clipboard. This way, you only have to copy the address to a clipboard and run the script, and the map will be loaded for you.
This is what your program does:
Gets a street address from the command line arguments or clipboard.
Opens the web browser to the Google Maps page for the address.
This means your code will need to do the following:
Read the command line arguments from sys.argv.
Read the clipboard conte
nts.
Call the webbrowser.open() function to open the web browser.
Open a new file editor window and save it as mapIt.py.
Step 1: Figure Out the URL
Based on the instructions in Appendix B, set up mapIt.py so that when you run it from the command line, like so...
C:> mapit 870 Valencia St, San Francisco, CA 94110
... the script will use the command line arguments instead of the clipboard. If there are no command line arguments, then the program will know to use the contents of the clipboard.
First you need to figure out what URL to use for a given street address. When you load http://maps.google.com/ in the browser and search for an address, the URL in the address bar looks something like this: https://www.google.com/maps/place/870+Valencia+St/@37.7590311,-122.4215096,17z/data=!3m1!4b1!4m2!3m1!1s0x808f7e3dadc07a37:0xc86b0b2bb93b73d8.
The address is in the URL, but there’s a lot of additional text there as well. Websites often add extra data to URLs to help track visitors or customize sites. But if you try just going to https://www.google.com/maps/place/870+Valencia+St+San+Francisco+CA/, you’ll find that it still brings up the correct page. So your program can be set to open a web browser to 'https://www.google.com/maps/place/your_address_string' (where your_address_string is the address you want to map).
Step 2: Handle the Command Line Arguments
Make your code look like this:
#! python3 # mapIt.py - Launches a map in the browser using an address from the # command line or clipboard. import webbrowser, sys if len(sys.argv) > 1: # Get address from command line. address = ' '.join(sys.argv[1:]) # TODO: Get address from clipboard.
After the program’s #! shebang line, you need to import the webbrowser module for launching the browser and import the sys module for reading the potential command line arguments. The sys.argv variable stores a list of the program’s filename and command line arguments. If this list has more than just the filename in it, then len(sys.argv) evaluates to an integer greater than 1, meaning that command line arguments have indeed been provided.
Command line arguments are usually separated by spaces, but in this case, you want to interpret all of the arguments as a single string. Since sys.argv is a list of strings, you can pass it to the join() method, which returns a single string value. You don’t want the program name in this string, so instead of sys.argv, you should pass sys.argv[1:] to chop off the first element of the array. The final string that this expression evaluates to is stored in the address variable.
If you run the program by entering this into the command line...
mapit 870 Valencia St, San Francisco, CA 94110
... the sys.argv variable will contain this list value:
['mapIt.py', '870', 'Valencia', 'St, ', 'San', 'Francisco, ', 'CA', '94110']
The address variable will contain the string '870 Valencia St, San Francisco, CA 94110'.
Step 3: Handle the Clipboard Content and Launch the Browser
Make your code look like the following:
#! python3 # mapIt.py - Launches a map in the browser using an address from the # command line or clipboard. import webbrowser, sys, pyperclip if len(sys.argv) > 1: # Get address from command line. address = ' '.join(sys.argv[1:]) else: # Get address from clipboard. address = pyperclip.paste() webbrowser.open('https://www.google.com/maps/place/' + address)
If there are no command line arguments, the program will assume the address is stored on the clipboard. You can get the clipboard content with pyperclip.paste() and store it in a variable named address. Finally, to launch a web browser with the Google Maps URL, call webbrowser.open().
While some of the programs you write will perform huge tasks that save you hours, it can be just as satisfying to use a program that conveniently saves you a few seconds each time you perform a common task, such as getting a map of an address. Table 11-1 compares the steps needed to display a map with and without mapIt.py.
Table 11-1. Getting a Map with and Without mapIt.py
Manually getting a map
Using mapIt.py
Highlight the address.
Highlight the address.
Copy the address.
Copy the address.
Open the web browser.
Run mapIt.py.
Go to http://maps.google.com/.
Click the address text field.
Paste the address.
Press ENTER.
See how mapIt.py makes this task less tedious?
Ideas for Similar Programs
As long as you have a URL, the webbrowser module lets users cut out the step of opening the browser and directing themselves to a website. Other programs could use this functionality to do the following:
Open all links on a page in separate browser tabs.
Open the browser to the URL for your local weather.
Open several social network sites that you regularly check.
Downloading Files from the Web with the requests Module
The requests module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression. The requests module doesn’t come with Python, so you’ll have to install it first. From the command line, run pip install requests. (Appendix A has additional details on how to install third-party modules.)
The requests module was written because Python’s urllib2 module is too complicated to use. In fact, take a permanent marker and black out this entire paragraph. Forget I ever mentioned urllib2. If you need to download things from the Web, just use the requests module.
Next, do a simple test to make sure the requests module installed itself correctly. Enter the following into the interactive shell:
>>> import requests
If no error messages show up, then the requests module has been successfully installed.
Downloading a Web Page with the requests.get() Function
The requests.get() function takes a string of a URL to download. By calling type() on requests.get()’s return value, you can see that it returns a Response object, which contains the response that the web server gave for your request. I’ll explain the Response object in more detail later, but for now, enter the following into the interactive shell while your computer is connected to the Internet:
>>> import requests ➊ >>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt') >>> type(res)
The URL goes to a text web page for the entire play of Romeo and Juliet, provided on this book’s site ➊. You can tell that the request for this web page succeeded by checking the status_code attribute of the Response object. If it is equal to the value of requests.codes.ok, then everything went fine ➋. (Incidentally, the status code for “OK” in the HTTP protocol is 200. You may already be familiar with the 404 status code for “Not Found.”)
If the request succeeded, the downloaded web page is stored as a string in the Response object’s text variable. This variable holds a large string of the entire play; the call to len(res.text) shows you that it is more than 178,000 characters long. Finally, calling print(res.text[:250]) displays only the first 250 characters.
Checking for Errors
As you’ve seen, the Response object has a status_code attribute that can be checked against requests.codes.ok to see whether the download succeeded. A simpler way to check for success is to call the raise_for_status() method on the Response object. This will raise an exception if there was an error downloading the file and will do nothing if the download succeeded. Enter the following into the interactive shell:
>>> res = requests.get('http://inventwithpython.com/page_that_does
_not_exist') >>> res.raise_for_status() Traceback (most recent call last): File "
The raise_for_status() method is a good way to ensure that a program halts if a bad download occurs. This is a good thing: You want your program to stop as soon as some unexpected error happens. If a failed download isn’t a deal breaker for your program, you can wrap the raise_for_status() line with try and except statements to handle this error case without crashing.
import requests res = requests.get('http://inventwithpython.com/page_that_does_not_exist') try: res.raise_for_status() except Exception as exc: print('There was a problem: %s' % (exc))
This raise_for_status() method call causes the program to output the following: