Automate the Boring Stuff with Python
2. What do you pass to csv.reader() and csv.writer() to create Reader and Writer objects?
3. What modes do File objects for Reader and Writer objects need to be opened in?
4. What method takes a list argument and writes it to a CSV file?
5. What do the delimiter and lineterminator keyword arguments do?
6. What function takes a string of JSON data and returns a Python data structure?
7. What function takes a Python data structure and returns a string of JSON data?
Practice Project
For practice, write a program that does the following.
Excel-to-CSV Converter
Excel can save a spreadsheet to a CSV file with a few mouse clicks, but if you had to convert hundreds of Excel files to CSVs, it would take hours of clicking. Using the openpyxl module from Chapter 12, write a program that reads all the Excel files in the current working directory and outputs them as CSV files.
A single Excel file might contain multiple sheets; you’ll have to create one CSV file per sheet. The filenames of the CSV files should be
This program will involve many nested for loops. The skeleton of the program will look something like this:
for excelFile in os.listdir('.'):
    # Skip non-xlsx files, load the workbook object.
    for sheetName in wb.get_sheet_names():
        # Loop through every sheet in the workbook.
        sheet = wb.get_sheet_by_name(sheetName)

        # Create the CSV filename from the Excel filename and sheet title.
        # Create the csv.writer object for this CSV file.

        # Loop through every row in the sheet.
        for rowNum in range(1, sheet.get_highest_row() + 1):
            rowData = []    # append each cell to this list
            # Loop through each cell in the row.
            for colNum in range(1, sheet.get_highest_column() + 1):
                # Append each cell's data to rowData.

            # Write the rowData list to the CSV file.

        csvFile.close()
Download the ZIP file excelSpreadsheets.zip from http://nostarch.com/automatestuff/, and unzip the spreadsheets into the same directory as your program. You can use these as the files to test the program on.
Chapter 15. Keeping Time, Scheduling Tasks, and Launching Programs
Running programs while you’re sitting at your computer is fine, but it’s also useful to have programs run without your direct supervision. Your computer’s clock can schedule programs to run code at some specified time and date or at regular intervals. For example, your program could scrape a website every hour to check for changes or do a CPU-intensive task at 4 AM while you sleep. Python’s time and datetime modules provide these functions.
You can also write programs that launch other programs on a schedule by using the subprocess and threading modules. Often, the fastest way to program is to take advantage of applications that other people have already written.
The time Module
Your computer’s system clock is set to a specific date, time, and time zone. The built-in time module allows your Python programs to read the system clock for the current time. The time.time() and time.sleep() functions are the most useful in the time module.
The time.time() Function
The Unix epoch is a time reference commonly used in programming: 12 AM on January 1, 1970, Coordinated Universal Time (UTC). The time.time() function returns the number of seconds since that moment as a float value. (Recall that a float is just a number with a decimal point.) This number is called an epoch timestamp. For example, enter the following into the interactive shell:
>>> import time
>>> time.time()
1425063955.068649
Here I’m calling time.time() on February 27, 2015, at 11:05 AM Pacific Standard Time, or 7:05 PM UTC. The return value is how many seconds have passed between the Unix epoch and the moment time.time() was called.
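As a quick check of that conversion, the sketch below turns the example timestamp back into a readable UTC time. It relies on time.gmtime() and time.strftime(), two standard library functions this chapter has not introduced yet, and the variable names are only illustrative.

```python
import time

# The epoch timestamp from the interactive shell example.
timestamp = 1425063955.068649

# time.gmtime() splits a timestamp into UTC year, month, day, hour, and so on.
utc_parts = time.gmtime(timestamp)

# time.strftime() formats those parts as text.
utc_text = time.strftime('%Y-%m-%d %H:%M:%S', utc_parts)
print(utc_text)  # 2015-02-27 19:05:55
```

The printed hour, 19, is 7 PM UTC, which matches the time given above.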
Note
The interactive shell examples will yield dates and times for when I wrote this chapter in February 2015. Unless you’re a time traveler, your dates and times will be different.
Epoch timestamps can be used to profile code, that is, measure how long a piece of code takes to run. If you call time.time() at the beginning of the code block you want to measure and again at the end, you can subtract the first timestamp from the second to find the elapsed time between those two calls. For example, open a new file editor window and enter the following program:
  import time

➊ def calcProd():
      # Calculate the product of the integers from 1 to 99,999.
      product = 1
      for i in range(1, 100000):
          product = product * i
      return product

➋ startTime = time.time()
  prod = calcProd()
➌ endTime = time.time()
➍ print('The result is %s digits long.' % (len(str(prod))))
➎ print('Took %s seconds to calculate.' % (endTime - startTime))
At ➊, we define a function calcProd() to loop through the integers from 1 to 99,999 and return their product. At ➋, we call time.time() and store it in startTime. Right after calling calcProd(), we call time.time() again and store it in endTime ➌. We end by printing the length of the product returned by calcProd() ➍ and how long it took to run calcProd() ➎.
Save this program as calcProd.py and run it. The output will look something like this:
The result is 456569 digits long.
Took 2.844162940979004 seconds to calculate.
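If you find yourself timing several functions this way, the two time.time() calls can be wrapped in a small helper. The following is a minimal sketch of that idea, not part of calcProd.py: the names time_call() and calc_prod() are hypothetical, and the loop is kept short so it finishes quickly.

```python
import time

def time_call(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start_time = time.time()
    result = func(*args)
    end_time = time.time()
    return result, end_time - start_time

def calc_prod(limit):
    # Multiply together the integers from 1 up to limit - 1.
    product = 1
    for i in range(1, limit):
        product = product * i
    return product

prod, seconds = time_call(calc_prod, 1000)
print('The result is %s digits long.' % len(str(prod)))
print('Took %s seconds to calculate.' % seconds)
```

The elapsed time will vary from run to run and from computer to computer.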
Note
Another way to profile your code is to use the cProfile.run() function, which provides a much more informative level of detail than the simple time.time() technique. The cProfile.run() function is explained at https://docs.python.org/3/library/profile.html.
The time.sleep() Function
If you need to pause your program for a while, call the time.sleep() function and pass it the number of seconds you want your program to stay paused. Enter the following into the interactive shell:
  >>> import time
  >>> for i in range(3):
➊         print('Tick')
➋         time.sleep(1)
➌         print('Tock')
➍         time.sleep(1)

  Tick
  Tock
  Tick
  Tock
  Tick
  Tock
➎ >>> time.sleep(5)
The for loop will print Tick ➊, pause for one second ➋, print Tock ➌, pause for one second ➍, print Tick, pause, and so on until Tick and Tock have each been printed three times.
The time.sleep() function will block—that is, it will not return and release your program to execute other code—until after the number of seconds you passed to time.sleep() has elapsed. For example, if you enter time.sleep(5) ➎, you’ll see that the next prompt (>>>) doesn’t appear until five seconds have passed.
Be aware that pressing CTRL-C will not interrupt time.sleep() calls in IDLE. IDLE waits until the entire pause is over before raising the KeyboardInterrupt exception. To work around this problem, instead of having a single time.sleep(30) call to pause for 30 seconds, use a for loop to make 30 calls to time.sleep(1).
>>> for i in range(30):
        time.sleep(1)
If you press CTRL-C sometime during these 30 seconds, you should see the KeyboardInterrupt exception raised right away.
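The same workaround can be packaged as a function so that any long pause is broken into short ones. This is only a sketch: sleep_in_steps() is a hypothetical helper, and the one-second default step is an assumption you can change.

```python
import time

def sleep_in_steps(total_seconds, step=1):
    """Pause for about total_seconds, one short sleep at a time."""
    steps_taken = 0
    remaining = total_seconds
    while remaining > 0:
        # Never sleep longer than what is left of the pause.
        pause = min(step, remaining)
        time.sleep(pause)
        remaining -= pause
        steps_taken += 1
    return steps_taken

steps = sleep_in_steps(2)  # two one-second pauses
print('Slept in %s steps.' % steps)
```

Because each individual time.sleep() call is short, a CTRL-C press has to wait at most one step before the exception is raised.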
Rounding Numbers
When working with times, you’ll often encounter float values with many digits after the decimal. To make these values easier to work with, you can shorten them with Python’s built-in round() function, which rounds a float to the precision you specify. Just pass in the number you want to round, plus an optional second argument representing how many digits after the decimal point you want to round it to. If you omit the second argument, round() rounds your number to the nearest whole integer. Enter the following into the interactive shell:
>>> import time
>>> now = time.time()
>>> now
1425064108.017826
>>> round(now, 2)
1425064108.02
>>> round(now, 4)
1425064108.0178
>>> round(now)
1425064108
After importing time and storing time.time() in now, we call round(now, 2) to round now to two digits after the decimal, round(now, 4) to round to four digits after the decimal, and round(now) to round to the nearest integer.
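Two details of round() are easy to miss, and the short sketch below shows them using the fixed timestamp from the example instead of a live time.time() call. First, calling round() without the second argument returns an integer rather than a float. Second, in Python 3 a value exactly halfway between two integers is rounded to the nearest even integer.

```python
now = 1425064108.017826  # the example timestamp, hard-coded for repeatability

print(round(now, 2))  # 1425064108.02
print(round(now))     # 1425064108

# With no second argument, round() returns an int, not a float.
print(type(round(now)))

# Halfway values go to the nearest even integer.
print(round(2.5))  # 2
print(round(3.5))  # 4
```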
Project: Super Stopwatch
Say you want to track how much time you spend on boring tasks you haven’t automated yet. You don’t have a physical stopwatch, and it’s surprisingly difficult to find a free stopwatch app for your laptop or smartphone that isn’t covered in ads and doesn’t send a copy of your browser history to marketers. (It says it can do this in the license agreement you agreed to. You did read the license agreement, didn’t you?) You can write a simple stopwatch program yourself in Python.
At a high level, here’s what your program will do:
Track the amount of time elapsed between presses of the ENTER key, with each key press starting a new “lap” on the timer.
Print the lap number, total time, and lap time.
This means your code will need to do the following:
Find the current time by calling time.time() and store it as a timestamp at the start of the program, as well as at the start of each lap.
Keep a lap counter and increment it every time the user presses ENTER.
Calculate the elapsed time by subtracting timestamps.
Handle the KeyboardInterrupt exception so the user can press CTRL-C to quit.
Open a new file editor window and save it as stopwatch.py.
Step 1: Set Up the Program to Track Times
The stopwatch program will need to use the current time, so you’ll want to import the time module. Your program should also print some brief instructions to the user before calling input(), so the timer can begin after the user presses ENTER. Then the code will start tracking lap times.
Enter the following code into the file editor, writing a TODO comment as a placeholder for the rest of the code:
#! python3
# stopwatch.py - A simple stopwatch program.

import time

# Display the program's instructions.
print('Press ENTER to begin. Afterwards, press ENTER to "click" the stopwatch. Press Ctrl-C to quit.')
input()                    # press Enter to begin
print('Started.')
startTime = time.time()    # get the first lap's start time
lastTime = startTime
lapNum = 1

# TODO: Start tracking the lap times.
We’ve now written the code to display the instructions, start the first lap, note the time, and set our lap count to 1.
Step 2: Track and Print Lap Times
Now let’s write the code to start each new lap, calculate how long the previous lap took, and calculate the total time elapsed since starting the stopwatch. We’ll display the lap time and total time and increase the lap count for each new lap. Add the following code to your program:
  #! python3
  # stopwatch.py - A simple stopwatch program.

  import time

  --snip--

  # Start tracking the lap times.
➊ try:
➋     while True:
          input()
➌         lapTime = round(time.time() - lastTime, 2)
➍         totalTime = round(time.time() - startTime, 2)
➎         print('Lap #%s: %s (%s)' % (lapNum, totalTime, lapTime), end='')
          lapNum += 1
          lastTime = time.time() # reset the last lap time
➏ except KeyboardInterrupt:
      # Handle the Ctrl-C exception to keep its error message from displaying.
      print('\nDone.')
If the user presses CTRL-C to stop the stopwatch, the KeyboardInterrupt exception will be raised, and the program will crash if its execution is not inside a try statement. To prevent crashing, we wrap this part of the program in a try statement ➊. We’ll handle the exception in the except clause ➏, so when CTRL-C is pressed and the exception is raised, the program execution moves to the except clause to print Done, instead of the KeyboardInterrupt error message. Until this happens, the execution is inside an infinite loop ➋ that calls input() and waits until the user presses ENTER to end a lap. When a lap ends, we calculate how long the lap took by subtracting the start time of the lap, lastTime, from the current time, time.time() ➌. We calculate the total time elapsed by subtracting the overall start time of the stopwatch, startTime, from the current time ➍.
Since the results of these time calculations will have many digits after the decimal point (such as 4.766272783279419), we use the round() function to round the float value to two digits at ➌ and ➍.
At ➎, we print the lap number, total time elapsed, and the lap time. Since the user pressing ENTER for the input() call will print a newline to the screen, pass end='' to the print() function to avoid double-spacing the output. After printing the lap information, we get ready for the next lap by adding 1 to the count lapNum and setting lastTime to the current time, which is the start time of the next lap.
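To see the lap arithmetic without pressing any keys, the calculation can be pulled out into a function that takes its timestamps as arguments. The sketch below does this; lap_report() is a hypothetical helper that is not part of stopwatch.py, and the timestamps are made-up values chosen to keep the subtraction easy to follow.

```python
def lap_report(lap_num, start_time, last_time, now):
    """Build the text the stopwatch prints for one lap."""
    lap_time = round(now - last_time, 2)     # time since the previous lap ended
    total_time = round(now - start_time, 2)  # time since the stopwatch started
    return 'Lap #%s: %s (%s)' % (lap_num, total_time, lap_time)

# Made-up timestamps: the stopwatch started at 100.0 seconds,
# the previous lap ended at 103.5, and ENTER was pressed at 105.75.
print(lap_report(2, 100.0, 103.5, 105.75))  # Lap #2: 5.75 (2.25)
```

The total time comes first and the lap time follows in parentheses, the same order the stopwatch program uses.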
Ideas for Similar Programs
Time tracking opens up several possibilities for your programs. Although you can download apps to do some of these things, the benefit of writing programs yourself is that they will be free and not bloated with ads and useless features. You could write similar programs to do the following:
Create a simple timesheet app that records when you type a person’s name and uses the current time to clock them in or out.
Add a feature to your program to display the elapsed time since a process started, such as a download that uses the requests module. (See Chapter 11.)
Intermittently check how long a program has been running and offer the user a chance to cancel tasks that are taking too long.
The datetime Module
The time module is useful for getting a Unix epoch timestamp to work with. But if you want to display a date in a more convenient format, or do arithmetic with dates (for example, figuring out what date was 205 days ago or what date is 123 days from now), you should use the datetime module.
The datetime module has its own datetime data type. datetime values represent a specific moment in time. Enter the following into the interactive shell:
  >>> import datetime
➊ >>> datetime.datetime.now()
➋ datetime.datetime(2015, 2, 27, 11, 10, 49, 55053)
➌ >>> dt = datetime.datetime(2015, 10, 21, 16, 29, 0)
➍ >>> dt.year, dt.month, dt.day
  (2015, 10, 21)
➎ >>> dt.hour, dt.minute, dt.second
  (16, 29, 0)
Calling datetime.datetime.now() ➊ returns a datetime object ➋ for the current date and time, according to your computer’s clock. This object includes the year, month, day, hour, minute, second, and microsecond of the current moment. You can also retrieve a datetime object for a specific moment by using the datetime.datetime() function ➌, passing it integers representing the year, month, day, hour, minute, and second of the moment you want. These integers will be stored in the datetime object’s year, month, day ➍, hour, minute, and second ➎ attributes.
A Unix epoch timestamp can be converted to a datetime object with the datetime.datetime.fromtimestamp() function. The date and time of the datetime object will be converted for the local time zone. Enter the following into the interactive shell:
>>> datetime.datetime.fromtimestamp(1000000)
datetime.datetime(1970, 1, 12, 5, 46, 40)
>>> datetime.datetime.fromtimestamp(time.time())
datetime.datetime(2015, 2, 27, 11, 13, 0, 604980)
Calling datetime.datetime.fromtimestamp() and passing it 1000000 returns a datetime object for the moment 1,000,000 seconds after the Unix epoch. Passing time.time(), the Unix epoch timestamp for the current moment, returns a datetime object for the current moment. So the expressions datetime.datetime.now() and datetime.datetime.fromtimestamp(time.time()) do the same thing; they both give you a datetime object for the present moment.
Note
These examples were entered on a computer set to Pacific Standard Time. If you’re in another time zone, your results will look different.
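If you want a result that does not depend on your computer’s time zone, fromtimestamp() accepts an optional tz argument. The sketch below passes datetime.timezone.utc, which the text above does not cover, to get the same moment expressed in UTC; the variable name is only illustrative.

```python
import datetime

# Passing tz=datetime.timezone.utc asks for UTC instead of local time.
utc_moment = datetime.datetime.fromtimestamp(1000000, tz=datetime.timezone.utc)
print(utc_moment)  # 1970-01-12 13:46:40+00:00
```

The hour is 13 rather than 5 because Pacific Standard Time is eight hours behind UTC.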
datetime objects can be compared with each other using comparison operators to find out which one precedes the other. The later datetime object is the “greater” value. Enter the following into the interactive shell:
➊ >>> halloween2015 = datetime.datetime(2015, 10, 31, 0, 0, 0)
➋ >>> newyears2016 = datetime.datetime(2016, 1, 1, 0, 0, 0)
  >>> oct31_2015 = datetime.datetime(2015, 10, 31, 0, 0, 0)
➌ >>> halloween2015 == oct31_2015
  True
➍ >>> halloween2015 > newyears2016
  False
➎ >>> newyears2016 > halloween2015
  True
  >>> newyears2016 != oct31_2015
  True
Make a datetime object for the first moment (midnight) of October 31, 2015 and store it in halloween2015 ➊. Make a datetime object for the first moment of January 1, 2016 and store it in newyears2016 ➋. Then make another object for midnight on October 31, 2015 and store it in oct31_2015. Comparing halloween2015 and oct31_2015 shows that they’re equal ➌. Comparing newyears2016 and halloween2015 shows that newyears2016 is greater (later) than halloween2015 ➍ ➎.
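Because datetime objects support the comparison operators, they also work with built-in functions that rely on comparisons, such as sorted() and max(). The following sketch reuses the dates from the examples above; the list and result names are made up for illustration.

```python
import datetime

halloween2015 = datetime.datetime(2015, 10, 31, 0, 0, 0)
newyears2016 = datetime.datetime(2016, 1, 1, 0, 0, 0)
dt = datetime.datetime(2015, 10, 21, 16, 29, 0)

moments = [newyears2016, dt, halloween2015]

# sorted() puts the earliest moment first, since earlier means "less than".
earliest_first = sorted(moments)

# max() returns the latest moment, since later means "greater than".
latest = max(moments)

print(earliest_first[0])  # 2015-10-21 16:29:00
print(latest)             # 2016-01-01 00:00:00
```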
The timedelta Data Type
The datetime module also provides a timedelta data type, which represents a duration of time rather than a moment in time. Enter the following into the interactive shell:
➊ >>> delta = datetime.timedelta(days=11, hours=10, minutes=9, seconds=8)
➋ >>> delta.days, delta.seconds, delta.microseconds
  (11, 36548, 0)
  >>> delta.total_seconds()
  986948.0
  >>> str(delta)
  '11 days, 10:09:08'
To create a timedelta object, use the datetime.timedelta() function. The datetime.timedelta() function takes keyword arguments weeks, days, hours, minutes, seconds, milliseconds, and microseconds. There is no month or year keyword argument because “a month” or “a year” is a variable amount of time depending on the particular month or year. A timedelta object has the total duration represented in days, seconds, and microseconds. These numbers are stored in the days, seconds, and microseconds attributes, respectively. The total_seconds() method will return the duration in number of seconds alone. Passing a timedelta object to str() will return a nicely formatted, human-readable string representation of the object.
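A timedelta object can be added to or subtracted from a datetime object, which is how you would answer the questions raised earlier about 205 days ago and 123 days from now. The sketch below works from a fixed starting date rather than the current time so that the results are repeatable; the variable names are only illustrative.

```python
import datetime

dt = datetime.datetime(2015, 10, 21, 16, 29, 0)

# Moving backward and forward in time from dt.
earlier = dt - datetime.timedelta(days=205)
later = dt + datetime.timedelta(days=123)
print(earlier)  # 2015-03-30 16:29:00
print(later)    # 2016-02-21 16:29:00

# Subtracting one datetime from another gives a timedelta.
gap = later - earlier
print(gap.days)  # 328
```

The calendar details, such as month lengths and the change of year, are handled for you.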