Automate the Boring Stuff with Python
Now that the new ZIP file’s name is stored in the zipFilename variable, you can call zipfile.ZipFile() to actually create the ZIP file ➊. Be sure to pass 'w' as the second argument so that the ZIP file is opened in write mode.
Step 3: Walk the Directory Tree and Add to the ZIP File
Now you need to use the os.walk() function to do the work of listing every file in the folder and its subfolders. Make your program look like the following:
#! python3
# backupToZip.py - Copies an entire folder and its contents into
# a ZIP file whose filename increments.

--snip--

    # Walk the entire folder tree and compress the files in each folder.
➊   for foldername, subfolders, filenames in os.walk(folder):
        print('Adding files in %s...' % (foldername))
        # Add the current folder to the ZIP file.
➋       backupZip.write(foldername)
        # Add all the files in this folder to the ZIP file.
➌       for filename in filenames:
            newBase = os.path.basename(folder) + '_'
            if filename.startswith(newBase) and filename.endswith('.zip'):
                continue   # don't backup the backup ZIP files
            backupZip.write(os.path.join(foldername, filename))
    backupZip.close()
    print('Done.')

backupToZip('C:\\delicious')
You can use os.walk() in a for loop ➊, and on each iteration it will return the iteration’s current folder name, the subfolders in that folder, and the filenames in that folder.
In the for loop, the folder is added to the ZIP file ➋. The nested for loop can go through each filename in the filenames list ➌. Each of these is added to the ZIP file, except for previously made backup ZIPs.
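To see the three values that os.walk() yields on each iteration, here is a minimal sketch that walks a small throwaway tree. The listTree() helper and the folder and file names are made up for illustration; they are not part of backupToZip.py.

```python
import os
import tempfile

def listTree(root):
    """Return a (foldername, subfolders, filenames) tuple for every folder under root."""
    return [(foldername, sorted(subfolders), sorted(filenames))
            for foldername, subfolders, filenames in os.walk(root)]

# Build a tiny temporary tree so the example does not depend on real folders.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'cats'))
    open(os.path.join(root, 'spam.txt'), 'w').close()
    open(os.path.join(root, 'cats', 'zophie.jpg'), 'w').close()

    walked = listTree(root)
    for foldername, subfolders, filenames in walked:
        print(foldername, subfolders, filenames)
```

The root folder comes first, followed by each subfolder in turn, which is why the backup program can add a folder and then its files in a single pass.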
When you run this program, it will produce output that will look something like this:
Creating delicious_1.zip...
Adding files in C:\delicious...
Adding files in C:\delicious\cats...
Adding files in C:\delicious\waffles...
Adding files in C:\delicious\walnut...
Adding files in C:\delicious\walnut\waffles...
Done.
The second time you run it, it will put all the files in C:\delicious into a ZIP file named delicious_2.zip, and so on.
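The incrementing filename can be sketched on its own. The nextZipName() helper and its searchDir parameter below are hypothetical names for this illustration; the sketch assumes a name counts as taken whenever a file with that name already exists.

```python
import os
import tempfile

def nextZipName(folder, searchDir='.'):
    """Return the first '<base>_<N>.zip' name that does not yet exist in searchDir."""
    base = os.path.basename(os.path.abspath(folder))
    number = 1
    while True:
        zipFilename = base + '_' + str(number) + '.zip'
        if not os.path.exists(os.path.join(searchDir, zipFilename)):
            return zipFilename
        number += 1

with tempfile.TemporaryDirectory() as workDir:
    first = nextZipName('delicious', workDir)
    # Pretend the first backup has already been written.
    open(os.path.join(workDir, first), 'w').close()
    second = nextZipName('delicious', workDir)
    print(first, second)
```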
Ideas for Similar Programs
You can walk a directory tree and add files to compressed ZIP archives in several other programs. For example, you can write programs that do the following:
Walk a directory tree and archive just files with certain extensions, such as .txt or .py, and nothing else
Walk a directory tree and archive every file except the .txt and .py ones
Find the folder in a directory tree that has the greatest number of files or the folder that uses the most disk space
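As a rough sketch of the first idea, the following archives only files with chosen extensions. The zipByExtension() function, its parameters, and the sample filenames are assumptions made for this example, and storing paths relative to the walked folder is one possible design choice, not the only one.

```python
import os
import tempfile
import zipfile

def zipByExtension(folder, zipPath, extensions=('.txt', '.py')):
    """Write a new ZIP at zipPath containing only files that end with one of extensions."""
    added = []
    with zipfile.ZipFile(zipPath, 'w') as archive:
        for foldername, subfolders, filenames in os.walk(folder):
            for filename in filenames:
                if filename.lower().endswith(tuple(extensions)):
                    fullPath = os.path.join(foldername, filename)
                    # Store the path relative to folder rather than the absolute path.
                    arcname = os.path.relpath(fullPath, folder)
                    archive.write(fullPath, arcname)
                    added.append(arcname)
    return sorted(added)

with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    for name in ('notes.txt', 'script.py', 'photo.jpg'):
        open(os.path.join(src, name), 'w').close()
    zipPath = os.path.join(out, 'selected.zip')
    added = zipByExtension(src, zipPath)
    with zipfile.ZipFile(zipPath) as archive:
        names = sorted(archive.namelist())
    print(names)
```

Replacing the endswith() test with its negation would give the second idea: archive everything except those extensions.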
Summary
Even if you are an experienced computer user, you probably handle files manually with the mouse and keyboard. Modern file explorers make it easy to work with a few files. But sometimes you’ll need to perform a task that would take hours using your computer’s file explorer.
The os and shutil modules offer functions for copying, moving, renaming, and deleting files. When deleting files, you might want to use the send2trash module to move files to the recycle bin or trash rather than permanently deleting them. And when writing programs that handle files, it’s a good idea to comment out the code that does the actual copy/move/rename/delete and add a print() call instead so you can run the program and verify exactly what it will do.
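A variation on the comment-out-and-print advice is to put the destructive call behind a flag. The sketch below is only an illustration: renameTxtToBak() and its dryRun parameter are invented names, and the example works in a temporary folder so nothing real is touched.

```python
import os
import tempfile

def renameTxtToBak(folder, dryRun=True):
    """Rename every .txt file in folder to .bak, or only print the plan if dryRun is True."""
    planned = []
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith('.txt'):
            continue
        source = os.path.join(folder, filename)
        target = os.path.join(folder, filename[:-len('.txt')] + '.bak')
        planned.append((filename, os.path.basename(target)))
        if dryRun:
            print('Would rename %s to %s' % (source, target))
        else:
            os.rename(source, target)
    return planned

with tempfile.TemporaryDirectory() as workDir:
    open(os.path.join(workDir, 'spam.txt'), 'w').close()
    plan = renameTxtToBak(workDir)               # prints the plan, changes nothing
    afterDryRun = sorted(os.listdir(workDir))
    renameTxtToBak(workDir, dryRun=False)        # performs the rename
    afterRealRun = sorted(os.listdir(workDir))
```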
Often you will need to perform these operations not only on files in one folder but also on every folder in that folder, every folder in those folders, and so on. The os.walk() function handles this trek across the folders for you so that you can concentrate on what your program needs to do with the files in them.
The zipfile module gives you a way of compressing and extracting files in .zip archives through Python. Combined with the file-handling functions of os and shutil, zipfile makes it easy to package up several files from anywhere on your hard drive. These .zip files are much easier to upload to websites or send as email attachments than many separate files.
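A minimal round trip through the zipfile module might look like the following sketch. The filenames are placeholders, and passing zipfile.ZIP_DEFLATED assumes Python's zlib module is available, as it is in most installations.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workDir:
    # Create a small, repetitive text file that compresses well.
    srcPath = os.path.join(workDir, 'spam.txt')
    with open(srcPath, 'w') as textFile:
        textFile.write('Hello world!' * 100)

    # Compress it into a new archive.
    zipPath = os.path.join(workDir, 'example.zip')
    with zipfile.ZipFile(zipPath, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(srcPath, 'spam.txt')

    # Reopen the archive, inspect the entry, and extract it to a new folder.
    outDir = os.path.join(workDir, 'extracted')
    with zipfile.ZipFile(zipPath) as archive:
        info = archive.getinfo('spam.txt')
        archive.extractall(outDir)

    with open(os.path.join(outDir, 'spam.txt')) as textFile:
        roundTrip = textFile.read()

print('Original size: %s bytes, compressed size: %s bytes'
      % (info.file_size, info.compress_size))
```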
Previous chapters of this book have provided source code for you to copy. But when you write your own programs, they probably won’t come out perfectly the first time. The next chapter focuses on some Python modules that will help you analyze and debug your programs so that you can quickly get them working correctly.
Practice Questions
1. What is the difference between shutil.copy() and shutil.copytree()?
2. What function is used to rename files?
3. What is the difference between the delete functions in the send2trash and shutil modules?
4. ZipFile objects have a close() method just like File objects’ close() method. What ZipFile method is equivalent to File objects’ open() method?
Practice Projects
For practice, write programs to do the following tasks.
Selective Copy
Write a program that walks through a folder tree and searches for files with a certain file extension (such as .pdf or .jpg). Copy these files from whatever location they are in to a new folder.
Deleting Unneeded Files
It’s not uncommon for a few unneeded but humongous files or folders to take up the bulk of the space on your hard drive. If you’re trying to free up room on your computer, you’ll get the most bang for your buck by deleting the most massive of the unwanted files. But first you have to find them.
Write a program that walks through a folder tree and searches for exceptionally large files or folders—say, ones that have a file size of more than 100MB. (Remember, to get a file’s size, you can use os.path.getsize() from the os module.) Print these files with their absolute path to the screen.
Filling in the Gaps
Write a program that finds all files with a given prefix, such as spam001.txt, spam002.txt, and so on, in a single folder and locates any gaps in the numbering (such as if there is a spam001.txt and spam003.txt but no spam002.txt). Have the program rename all the later files to close this gap.
As an added challenge, write another program that can insert gaps into numbered files so that a new file can be added.
Chapter 10. Debugging
Now that you know enough to write more complicated programs, you may start finding not-so-simple bugs in them. This chapter covers some tools and techniques for finding the root cause of bugs in your program to help you fix bugs faster and with less effort.
To paraphrase an old joke among programmers, “Writing code accounts for 90 percent of programming. Debugging code accounts for the other 90 percent.”
Your computer will do only what you tell it to do; it won’t read your mind and do what you intended it to do. Even professional programmers create bugs all the time, so don’t feel discouraged if your program has a problem.
Fortunately, there are a few tools and techniques to identify what exactly your code is doing and where it’s going wrong. First, you will look at logging and assertions, two features that can help you detect bugs early. In general, the earlier you catch bugs, the easier they will be to fix.
Second, you will look at how to use the debugger. The debugger is a feature of IDLE that executes a program one instruction at a time, giving you a chance to inspect the values in variables while your code runs, and track how the values change over the course of your program. This is much slower than running the program at full speed, but it is helpful to see the actual values in a program while it runs, rather than deducing what the values might be from the source code.
Raising Exceptions
Python raises an exception whenever it tries to execute invalid code. In Chapter 3, you read about how to handle Python’s exceptions with try and except statements so that your program can recover from exceptions that you anticipated. But you can also raise your own exceptions in your code. Raising an exception is a way of saying, “Stop running the code in this function and move the program execution to the except statement.”
Exceptions are raised with a raise statement. In code, a raise statement consists of the following:
The raise keyword
A call to the Exception() function
A string with a helpful error message passed to the Exception() function
For example, enter the following into the interactive shell:
>>> raise Exception('This is the error message.')
Traceback (most recent call last):
  File "
If there are no try and except statements covering the raise statement that raised the exception, the program simply crashes and displays the exception’s error message.
Often it’s the code that calls the function, not the function itself, that knows how to handle an exception. So you will commonly see a raise statement inside a function and the try and except statements in the code calling the function. For example, open a new file editor window, enter the following code, and save the program as boxPrint.py:
def boxPrint(symbol, width, height):
    if len(symbol) != 1:
➊       raise Exception('Symbol must be a single character string.')
    if width <= 2:
➋       raise Exception('Width must be greater than 2.')
    if height <= 2:
➌       raise Exception('Height must be greater than 2.')
    print(symbol * width)
    for i in range(height - 2):
        print(symbol + (' ' * (width - 2)) + symbol)
    print(symbol * width)

for sym, w, h in (('*', 4, 4), ('O', 20, 5), ('x', 1, 3), ('ZZ', 3, 3)):
    try:
        boxPrint(sym, w, h)
➍   except Exception as err:
➎       print('An exception happened: ' + str(err))
Here we’ve defined a boxPrint() function that takes a character, a width, and a height, and uses the character to make a little picture of a box with that width and height. This box shape is printed to the screen.
Say we want the character to be a single character, and the width and height to be greater than 2. We add if statements to raise exceptions if these requirements aren’t satisfied. Later, when we call boxPrint() with various arguments, our try/except will handle invalid arguments.
This program uses the except Exception as err form of the except statement ➍. If an Exception object is raised in boxPrint() ➊➋➌, this except statement will store it in a variable named err. The Exception object can then be converted to a string by passing it to str() to produce a user-friendly error message ➎. When you run boxPrint.py, the output will look like this:
****
*  *
*  *
****
OOOOOOOOOOOOOOOOOOOO
O                  O
O                  O
O                  O
OOOOOOOOOOOOOOOOOOOO
An exception happened: Width must be greater than 2.
An exception happened: Symbol must be a single character string.
Using the try and except statements, you can handle errors more gracefully instead of letting the entire program crash.
Getting the Traceback as a String
When Python encounters an error, it produces a treasure trove of error information called the traceback. The traceback includes the error message, the line number of the line that caused the error, and the sequence of the function calls that led to the error. This sequence of calls is called the call stack.
Open a new file editor window in IDLE, enter the following program, and save it as errorExample.py:
def spam():
    bacon()

def bacon():
    raise Exception('This is the error message.')

spam()
When you run errorExample.py, the output will look like this:
Traceback (most recent call last):
  File "errorExample.py", line 7, in
From the traceback, you can see that the error happened on line 5, in the bacon() function. This particular call to bacon() came from line 2, in the spam() function, which in turn was called on line 7. In programs where functions can be called from multiple places, the call stack can help you determine which call led to the error.
The traceback is displayed by Python whenever a raised exception goes unhandled. But you can also obtain it as a string by calling traceback.format_exc(). This function is useful if you want the information from an exception’s traceback but also want an except statement to gracefully handle the exception. You will need to import Python’s traceback module before calling this function.
For example, instead of crashing your program right when an exception occurs, you can write the traceback information to a log file and keep your program running. You can look at the log file later, when you’re ready to debug your program. Enter the following into the interactive shell:
>>> import traceback
>>> try:
        raise Exception('This is the error message.')
    except:
        errorFile = open('errorInfo.txt', 'w')
        errorFile.write(traceback.format_exc())
        errorFile.close()
        print('The traceback info was written to errorInfo.txt.')

116
The traceback info was written to errorInfo.txt.
The 116 is the return value from the write() method, since 116 characters were written to the file. The traceback text was written to errorInfo.txt.
Traceback (most recent call last):
  File "
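The same idea can be wrapped in a small helper so that a program logs the traceback and carries on. This is a sketch under stated assumptions: runAndLog() is a hypothetical name, the log is opened in append mode so earlier entries are kept, and a temporary folder stands in for wherever you would keep errorInfo.txt.

```python
import os
import tempfile
import traceback

def runAndLog(task, logPath):
    """Call task(); if it raises, append the traceback to logPath and return False."""
    try:
        task()
        return True
    except Exception:
        with open(logPath, 'a') as errorFile:
            errorFile.write(traceback.format_exc())
        return False

def bacon():
    raise Exception('This is the error message.')

with tempfile.TemporaryDirectory() as workDir:
    logPath = os.path.join(workDir, 'errorInfo.txt')
    succeeded = runAndLog(bacon, logPath)
    with open(logPath) as errorFile:
        logged = errorFile.read()

print('Still running after the error; logged %s characters.' % len(logged))
```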
Assertions
An assertion is a sanity check to make sure your code isn’t doing something obviously wrong. These sanity checks are performed by assert statements. If the sanity check fails, then an AssertionError exception is raised. In code, an assert statement consists of the following:
The assert keyword
A condition (that is, an expression that evaluates to True or False)
A comma
A string to display when the condition is False
For example, enter the following into the interactive shell:
>>> podBayDoorStatus = 'open'
>>> assert podBayDoorStatus == 'open', 'The pod bay doors need to be "open".'
>>> podBayDoorStatus = 'I\'m sorry, Dave. I\'m afraid I can\'t do that.'
>>> assert podBayDoorStatus == 'open', 'The pod bay doors need to be "open".'
Traceback (most recent call last):
  File "
Here we’ve set podBayDoorStatus to 'open', so from now on, we fully expect the value of this variable to be 'open'. In a program that uses this variable, we might have written a lot of code under the assumption that the value is 'open'—code that depends on its being 'open' in order to work as we expect. So we add an assertion to make sure we’re right to assume podBayDoorStatus is 'open'. Here, we include the message 'The pod bay doors need to be "open".' so it’ll be easy to see what’s wrong if the assertion fails.
Later, say we make the obvious mistake of assigning podBayDoorStatus another value, but don’t notice it among many lines of code. The assertion catches this mistake and clearly tells us what’s wrong.
In plain English, an assert statement says, “I assert that this condition holds true, and if not, there is a bug somewhere in the program.” Unlike exceptions, your code should not handle assert statements with try and except; if an assert fails, your program should crash. By failing fast like this, you shorten the time between the original cause of the bug and when you first notice the bug. This will reduce the amount of code you will have to check before finding the code that’s causing the bug.
Assertions are for programmer errors, not user errors. For errors that can be recovered from (such as a file not being found or the user entering invalid data), raise an exception instead of detecting it with an assert statement.
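The distinction can be sketched with two small functions. Both function names and the sample inputs are invented for this illustration, and ValueError is used here as one of Python's built-in exception types in place of a plain Exception.

```python
def averageScore(scores):
    # Programmer error: our own code should never pass an empty list here,
    # so a failure means there is a bug to fix, not an input to recover from.
    assert len(scores) > 0, 'averageScore() needs at least one score.'
    return sum(scores) / len(scores)

def parseScore(text):
    # User error: bad input is expected from time to time, so raise an
    # exception that the calling code can handle with try and except.
    if not text.strip().isdigit():
        raise ValueError('Score must be a whole number, got %r.' % text)
    return int(text)

scores = []
for entry in ('90', 'eighty', '70'):
    try:
        scores.append(parseScore(entry))
    except ValueError as err:
        print('Skipping input: ' + str(err))

print(averageScore(scores))
```

The invalid entry is reported and skipped, while the assertion stays in place as a silent check that would crash the program only if the surrounding code were wrong.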
Using an Assertion in a Traffic Light Simulation
Say you’re building a traffic light simulation program. The data structure representing the stoplights at an intersection is a dictionary with keys 'ns' and 'ew', for the stoplights facing north-south and east-west, respectively. The values at these keys will be one of the strings 'green', 'yellow', or 'red'. The code would look something like this:
market_2nd = {'ns': 'green', 'ew': 'red'}
mission_16th = {'ns': 'red', 'ew': 'green'}
These two variables will be for the intersections of Market Street and 2nd Street, and Mission Street and 16th Street. To start the project, you want to write a switchLights() function, which will take an intersection dictionary as an argument and switch the lights.
At first, you might think that switchLights() should simply switch each light to the next color in the sequence: Any 'green' values should change to 'yellow', 'yellow' values should change to 'red', and 'red' values should change to 'green'. The code to implement this idea might look like this:
def switchLights(stoplight):
    for key in stoplight.keys():
        if stoplight[key] == 'green':
            stoplight[key] = 'yellow'
        elif stoplight[key] == 'yellow':
            stoplight[key] = 'red'
        elif stoplight[key] == 'red':
            stoplight[key] = 'green'

switchLights(market_2nd)
You may already see the problem with this code, but let’s pretend you wrote the rest of the simulation code, thousands of lines long, without noticing it. When you finally do run the simulation, the program doesn’t crash—but your virtual cars do!
Since you’ve already written the rest of the program, you have no idea where the bug could be. Maybe it’s in the code simulating the cars or in the code simulating the virtual drivers. It could take hours to trace the bug back to the switchLights() function.
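One way an assertion could surface this bug much earlier is sketched below. It assumes the rule the simulation should obey is that at least one direction is always red; the wording of the assertion message is illustrative.

```python
def switchLights(stoplight):
    for key in stoplight.keys():
        if stoplight[key] == 'green':
            stoplight[key] = 'yellow'
        elif stoplight[key] == 'yellow':
            stoplight[key] = 'red'
        elif stoplight[key] == 'red':
            stoplight[key] = 'green'
    # Assumed invariant: at least one direction must be red at all times.
    assert 'red' in stoplight.values(), 'Neither light is red! ' + str(stoplight)

market_2nd = {'ns': 'green', 'ew': 'red'}
try:
    switchLights(market_2nd)
    message = None
except AssertionError as err:
    # Caught here only so this demonstration can show the message;
    # in a real program the failed assertion should be left to crash.
    message = str(err)

print(message)
```

With the check in place, the very first call to switchLights() fails and points directly at the function that broke the rule, instead of the problem showing up later as crashed virtual cars.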