Automate the Boring Stuff with Python


by Al Sweigart

In this example, we pass keyword arguments to datetime.timedelta() to specify a duration of 11 days, 10 hours, 9 minutes, and 8 seconds, and store the returned timedelta object in delta ➊. This timedelta object's days attribute stores 11, and its seconds attribute stores 36548 (10 hours, 9 minutes, and 8 seconds, expressed in seconds) ➋. Calling total_seconds() tells us that 11 days, 10 hours, 9 minutes, and 8 seconds is 986,948 seconds. Finally, passing the timedelta object to str() returns a string that clearly explains the duration.
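The shell session this paragraph walks through isn't part of this excerpt. Here is a minimal reconstruction that uses only the values the paragraph states (delta is the variable name from the text):

```python
import datetime

# Keyword arguments to timedelta() describe the duration piece by piece.
delta = datetime.timedelta(days=11, hours=10, minutes=9, seconds=8)

print(delta.days)             # 11 (whole days in the duration)
print(delta.seconds)          # 36548 (leftover hours, minutes, and seconds)
print(delta.total_seconds())  # 986948.0 (the entire duration, as a float)
print(str(delta))             # 11 days, 10:09:08
```

Note that seconds holds only the part of the duration left over after the whole days, while total_seconds() folds the days back in.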

The arithmetic operators can be used to perform date arithmetic on datetime values. For example, to calculate the date 1,000 days from now, enter the following into the interactive shell:

>>> dt = datetime.datetime.now()
>>> dt
datetime.datetime(2015, 2, 27, 18, 38, 50, 636181)
>>> thousandDays = datetime.timedelta(days=1000)
>>> dt + thousandDays
datetime.datetime(2017, 11, 23, 18, 38, 50, 636181)

First, make a datetime object for the current moment and store it in dt. Then make a timedelta object for a duration of 1,000 days and store it in thousandDays. Add dt and thousandDays together to get a datetime object for the date 1,000 days from now. Python will do the date arithmetic to figure out that 1,000 days after February 27, 2015, will be November 23, 2017. This is useful because, to calculate 1,000 days from a given date by hand, you would have to remember how many days are in each month and factor in leap years and other tricky details. The datetime module handles all of this for you.
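Because datetime.datetime.now() returns a different value every time, the session above can't be reproduced exactly. This deterministic sketch fixes the starting moment at the value the session printed, so the result can be checked. The second calculation is an illustrative addition showing that the leap day February 29, 2016, is counted automatically.

```python
import datetime

# Use the moment printed in the shell session instead of now().
dt = datetime.datetime(2015, 2, 27, 18, 38, 50, 636181)
thousandDays = datetime.timedelta(days=1000)
print(dt + thousandDays)  # 2017-11-23 18:38:50.636181

# One day after February 28, 2016, is the leap day, not March 1.
feb28 = datetime.datetime(2016, 2, 28)
print(feb28 + datetime.timedelta(days=1))  # 2016-02-29 00:00:00
```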

timedelta objects can be added to or subtracted from datetime objects or other timedelta objects using the + and - operators. A timedelta object can be multiplied or divided by integer or float values with the * and / operators. Enter the following into the interactive shell:

➊ >>> oct21st = datetime.datetime(2015, 10, 21, 16, 29, 0)
➋ >>> aboutThirtyYears = datetime.timedelta(days=365 * 30)
>>> oct21st
datetime.datetime(2015, 10, 21, 16, 29)
>>> oct21st - aboutThirtyYears
datetime.datetime(1985, 10, 28, 16, 29)
>>> oct21st - (2 * aboutThirtyYears)
datetime.datetime(1955, 11, 5, 16, 29)

Here we make a datetime object for October 21, 2015 ➊ and a timedelta object for a duration of about 30 years (we’re assuming 365 days for each of those years) ➋. Subtracting aboutThirtyYears from oct21st gives us a datetime object for the date 30 years before October 21, 2015. Subtracting 2 * aboutThirtyYears from oct21st returns a datetime object for the date 60 years before October 21, 2015.
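The example above only multiplies a timedelta by an integer. A short sketch of the other operations mentioned earlier (multiplying by a float, dividing, and adding or subtracting two timedelta objects); the durations are made up for illustration:

```python
import datetime

hour = datetime.timedelta(hours=1)

print(hour * 1.5)                              # 1:30:00
print(datetime.timedelta(days=1) / 4)          # 6:00:00
print(hour + datetime.timedelta(minutes=15))   # 1:15:00
print(hour - datetime.timedelta(minutes=15))   # 0:45:00
```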

Pausing Until a Specific Date

The time.sleep() function lets you pause a program for a certain number of seconds. By using a while loop, you can pause your programs until a specific date. For example, the following code will continue to loop until Halloween 2016:

import datetime
import time
halloween2016 = datetime.datetime(2016, 10, 31, 0, 0, 0)
while datetime.datetime.now() < halloween2016:
    time.sleep(1)

The time.sleep(1) call will pause your Python program so that the computer doesn’t waste CPU processing cycles simply checking the time over and over. Rather, the while loop will just check the condition once per second and continue with the rest of the program after Halloween 2016 (or whenever you program it to stop).
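The loop above can be wrapped in a reusable function. This is a sketch: pauseUntil() and its checkInterval parameter are hypothetical names, not part of the time or datetime modules. Unlike the loop above, it sleeps only for the time that remains when that is shorter than the check interval, so it doesn't overshoot the target by up to a full interval.

```python
import datetime
import time

def pauseUntil(target, checkInterval=1):
    """Block until datetime.datetime.now() reaches target (hypothetical helper)."""
    while True:
        # Subtracting two datetime objects gives a timedelta.
        remaining = (target - datetime.datetime.now()).total_seconds()
        if remaining <= 0:
            return
        # Sleep in short chunks, never past the target.
        time.sleep(min(checkInterval, remaining))

# Usage: pause for roughly half a second, checking every 0.1 seconds.
target = datetime.datetime.now() + datetime.timedelta(seconds=0.5)
pauseUntil(target, checkInterval=0.1)
print(datetime.datetime.now() >= target)  # True
```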

Converting datetime Objects into Strings

Epoch timestamps and datetime objects aren't very friendly to the human eye. Use the strftime() method to display a datetime object as a string. (The f in the name of the strftime() method stands for format.)

The strftime() method uses directives similar to Python’s string formatting. Table 15-1 has a full list of strftime() directives.

Table 15-1. strftime() Directives

strftime directive   Meaning
%Y                   Year with century, as in '2014'
%y                   Year without century, '00' to '99' (strptime() reads these as 1969 to 2068)
%m                   Month as a decimal number, '01' to '12'
%B                   Full month name, as in 'November'
%b                   Abbreviated month name, as in 'Nov'
%d                   Day of the month, '01' to '31'
%j                   Day of the year, '001' to '366'
%w                   Day of the week, '0' (Sunday) to '6' (Saturday)
%A                   Full weekday name, as in 'Monday'
%a                   Abbreviated weekday name, as in 'Mon'
%H                   Hour (24-hour clock), '00' to '23'
%I                   Hour (12-hour clock), '01' to '12'
%M                   Minute, '00' to '59'
%S                   Second, '00' to '59'
%p                   'AM' or 'PM'
%%                   Literal '%' character

Pass strftime() a custom format string containing formatting directives (along with any desired slashes, colons, and so on), and strftime() will return the datetime object's information as a formatted string. Enter the following into the interactive shell:

>>> oct21st = datetime.datetime(2015, 10, 21, 16, 29, 0)
>>> oct21st.strftime('%Y/%m/%d %H:%M:%S')
'2015/10/21 16:29:00'
>>> oct21st.strftime('%I:%M %p')
'04:29 PM'
>>> oct21st.strftime("%B of '%y")
"October of '15"

Here we have a datetime object for October 21, 2015, at 4:29 PM, stored in oct21st. Passing strftime() the custom format string '%Y/%m/%d %H:%M:%S' returns a string containing 2015, 10, and 21 separated by slashes and 16, 29, and 00 separated by colons. Passing '%I:%M %p' returns '04:29 PM', and passing "%B of '%y" returns "October of '15". Note that strftime() is called as a method on the datetime object itself (oct21st.strftime()), so, unlike datetime.datetime.now(), its name isn't preceded by datetime.datetime.
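A few more directives from Table 15-1, applied to the same oct21st value. The format strings are illustrative; the weekday and month names shown assume the default C (English) locale, since %A, %a, %B, %b, and %p depend on the locale.

```python
import datetime

oct21st = datetime.datetime(2015, 10, 21, 16, 29, 0)

print(oct21st.strftime('%A, %b %d'))              # Wednesday, Oct 21
print(oct21st.strftime('Day %j of %Y'))           # Day 294 of 2015
print(oct21st.strftime('Weekday number %w'))      # Weekday number 3
print(oct21st.strftime('100%% done at %H:%M'))    # 100% done at 16:29
```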

Converting Strings into datetime Objects

If you have a string of date information, such as '2015/10/21 16:29:00' or 'October 21, 2015', and need to convert it to a datetime object, use the datetime.datetime.strptime() function. The strptime() function is the inverse of the strftime() method. A custom format string using the same directives as strftime() must be passed so that strptime() knows how to parse and understand the string. (The p in the name of the strptime() function stands for parse.)

Enter the following into the interactive shell:

➊ >>> datetime.datetime.strptime('October 21, 2015', '%B %d, %Y')
datetime.datetime(2015, 10, 21, 0, 0)
>>> datetime.datetime.strptime('2015/10/21 16:29:00', '%Y/%m/%d %H:%M:%S')
datetime.datetime(2015, 10, 21, 16, 29)
>>> datetime.datetime.strptime("October of '15", "%B of '%y")
datetime.datetime(2015, 10, 1, 0, 0)
>>> datetime.datetime.strptime("November of '63", "%B of '%y")
datetime.datetime(2063, 11, 1, 0, 0)

To get a datetime object from the string 'October 21, 2015', pass 'October 21, 2015' as the first argument to strptime() and the custom format string that corresponds to 'October 21, 2015' as the second argument ➊. The string with the date information must match the custom format string exactly, or Python will raise a ValueError exception.
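Because a mismatched string raises ValueError, one way to accept several date layouts is to try each format string in turn and catch the exception. This is a sketch: parseDate() and the FORMATS list are hypothetical. The last two lines show how strptime() interprets two-digit %y years.

```python
import datetime

def parseDate(text, formats):
    """Return a datetime for the first format string that matches text exactly."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue  # text doesn't match this format; try the next one
    raise ValueError('no format matched %r' % text)

FORMATS = ['%B %d, %Y', '%Y/%m/%d %H:%M:%S']
print(parseDate('October 21, 2015', FORMATS))     # 2015-10-21 00:00:00
print(parseDate('2015/10/21 16:29:00', FORMATS))  # 2015-10-21 16:29:00

# %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999.
print(datetime.datetime.strptime('68', '%y').year)  # 2068
print(datetime.datetime.strptime('69', '%y').year)  # 1969
```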

Review of Python’s Time Functions

Dates and times in Python can involve quite a few different data types and functions. Here’s a review of the three different types of values used to represent time:

A Unix epoch timestamp (used by the time module) is a float or integer value of the number of seconds since 12 AM on January 1, 1970, UTC.

A datetime object (of the datetime module) has integers stored in the attributes year, month, day, hour, minute, and second.

A timedelta object (of the datetime module) represents a time duration, rather than a specific moment.

Here’s a review of time functions and their parameters and return values:

The time.time() function returns an epoch timestamp float value of the current moment.

The time.sleep(seconds) function pauses the program for the number of seconds specified by the seconds argument.

The datetime.datetime(year, month, day, hour, minute, second) function returns a datetime object of the moment specified by the arguments. If hour, minute, or second arguments are not provided, they default to 0.

The datetime.datetime.now() function returns a datetime object of the current moment.

The datetime.datetime.fromtimestamp(epoch) function returns a datetime object of the moment represented by the epoch timestamp argument.

The datetime.timedelta(weeks, days, hours, minutes, seconds, milliseconds, microseconds) function returns a timedelta object representing a duration of time. The function’s keyword arguments are all optional and do not include month or year.

The total_seconds() method for timedelta objects returns the number of seconds the timedelta object represents.

The strftime(format) method returns a string of the time represented by the datetime object in a custom format that’s based on the format string. See Table 15-1 for the format details.

The datetime.datetime.strptime(time_string, format) function returns a datetime object of the moment specified by time_string, parsed using the format string argument. See Table 15-1 for the format details.
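A short sketch that moves one moment through the three kinds of values reviewed above, using the functions in this list plus subtraction of two datetime objects (which produces a timedelta):

```python
import datetime
import time

epoch = time.time()                              # Unix epoch timestamp (float)
moment = datetime.datetime.fromtimestamp(epoch)  # datetime object, local time
text = moment.strftime('%Y/%m/%d %H:%M:%S')      # string; microseconds dropped
parsed = datetime.datetime.strptime(text, '%Y/%m/%d %H:%M:%S')
lost = moment - parsed                           # timedelta object

print(text)
print(lost.total_seconds())  # the fraction of a second the format discarded
```

Because the format string has no directive for microseconds, the round trip through a string loses less than one second, which is what lost measures.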

Multithreading

To introduce the concept of multithreading, let’s look at an example situation. Say you want to schedule some code to run after a delay or at a specific time. You could add code like the following at the start of your program:

import time, datetime

startTime = datetime.datetime(2029, 10, 31, 0, 0, 0)
while datetime.datetime.now() < startTime:
    time.sleep(1)
print('Program now starting on Halloween 2029')
--snip--

This code designates a start time of October 31, 2029, and keeps calling time.sleep(1) until the start time arrives. Your program cannot do anything while waiting for the loop of time.sleep() calls to finish; it just sits around until Halloween 2029. This is because Python programs by default have a single thread of execution.

To understand what a thread of execution is, remember the Chapter 2 discussion of flow control, when you imagined the execution of a program as placing your finger on a line of code in your program and moving to the next line or wherever it was sent by a flow control statement. A single-threaded program has only one finger. But a multithreaded program has multiple fingers. Each finger still moves to the next line of code as defined by the flow control statements, but the fingers can be at different places in the program, executing different lines of code at the same time. (All of the programs in this book so far have been single-threaded.)

Rather than having all of your code wait until the time.sleep() function finishes, you can execute the delayed or scheduled code in a separate thread using Python's threading module. The separate thread will pause for the time.sleep() calls. Meanwhile, your program can do other work in the original thread.

To make a separate thread, you first need to make a Thread object by calling the threading.Thread() function. Enter the following code in a new file and save it as threadDemo.py:

  import threading, time
  print('Start of program.')

➊ def takeANap():
      time.sleep(5)
      print('Wake up!')

➋ threadObj = threading.Thread(target=takeANap)
➌ threadObj.start()

  print('End of program.')

At ➊, we define a function that we want to use in a new thread. To create a Thread object, we call threading.Thread() and pass it the keyword argument target=takeANap ➋. This means the function we want to call in the new thread is takeANap(). Notice that the keyword argument is target=takeANap, not target=takeANap(). This is because you want to pass the takeANap() function itself as the argument, not call takeANap() and pass its return value.

After we store the Thread object created by threading.Thread() in threadObj, we call threadObj.start() ➌ to create the new thread and start executing the target function in the new thread. When this program is run, the output will look like this:

Start of program.
End of program.
Wake up!

This can be a bit confusing. If print('End of program.') is the last line of the program, you might think that it should be the last thing printed. The reason Wake up! comes after it is that when threadObj.start() is called, the target function for threadObj is run in a new thread of execution. Think of it as a second finger appearing at the start of the takeANap() function. The main thread continues to print('End of program.'). Meanwhile, the new thread, which has been executing the time.sleep(5) call, pauses for 5 seconds. After it wakes from its 5-second nap, it prints 'Wake up!' and then returns from the takeANap() function. Chronologically, 'Wake up!' is the last thing printed by the program.

Normally a program terminates when the last line of code in the file has run (or the sys.exit() function is called). But threadDemo.py has two threads. The first is the original thread that began at the start of the program and ends after print('End of program.'). The second thread is created when threadObj.start() is called, begins at the start of the takeANap() function, and ends after takeANap() returns.

A Python program will not terminate until all its threads have terminated. When you ran threadDemo.py, even though the original thread had terminated, the second thread was still executing the time.sleep(5) call.
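Putting this together with the Halloween 2029 example, the waiting loop can run in its own thread so the main thread stays free. This is a sketch: runAt() is a hypothetical helper, the start time is set half a second ahead so the example finishes quickly, and join() (a standard Thread method not covered in this section) makes the main thread wait for the second thread before reading log.

```python
import datetime
import threading
import time

def runAt(startTime, func):
    """Start a thread that waits until startTime, then calls func() (hypothetical helper)."""
    def waitThenRun():
        while datetime.datetime.now() < startTime:
            time.sleep(0.1)  # short naps so the example finishes quickly
        func()
    threadObj = threading.Thread(target=waitThenRun)
    threadObj.start()
    return threadObj

log = []
startTime = datetime.datetime.now() + datetime.timedelta(seconds=0.5)
threadObj = runAt(startTime, lambda: log.append('scheduled code ran'))

print('Main thread keeps working...')  # normally printed before the scheduled code runs
threadObj.join()  # wait here until the second thread has finished
print(log)        # ['scheduled code ran']
```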

Passing Arguments to the Thread’s Target Function

If the target function you want to run in the new thread takes arguments, you can pass the target function’s arguments to threading.Thread(). For example, say you wanted to run this print() call in its own thread:

>>> print('Cats', 'Dogs', 'Frogs', sep=' & ')
Cats & Dogs & Frogs

This print() call has three regular arguments, 'Cats', 'Dogs', and 'Frogs', and one keyword argument, sep=' & '. The regular arguments can be passed as a list to the args keyword argument in threading.Thread(). The keyword argument can be specified as a dictionary to the kwargs keyword argument in threading.Thread().

Enter the following into the interactive shell:

>>> import threading
>>> threadObj = threading.Thread(target=print, args=['Cats', 'Dogs', 'Frogs'], kwargs={'sep': ' & '})
>>> threadObj.start()
Cats & Dogs & Frogs

To make sure the arguments 'Cats', 'Dogs', and 'Frogs' get passed to print() in the new thread, we pass args=['Cats', 'Dogs', 'Frogs'] to threading.Thread(). To make sure the keyword argument sep=' & ' gets passed to print() in the new thread, we pass kwargs={'sep': ' & '} to threading.Thread().

The threadObj.start() call will create a new thread to call the print() function, and it will pass 'Cats', 'Dogs', and 'Frogs' as arguments and ' & ' for the sep keyword argument.

This is an incorrect way to create the new thread that calls print():

threadObj = threading.Thread(target=print('Cats', 'Dogs', 'Frogs', sep=' & '))

What this ends up doing is calling the print() function and passing its return value (print()’s return value is always None) as the target keyword argument. It doesn’t pass the print() function itself. When passing arguments to a function in a new thread, use the threading.Thread() function’s args and kwargs keyword arguments.
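A small sketch that makes the difference between the right and wrong calls visible. record() is a hypothetical function that notes which thread ran it; threading.current_thread() and join() are standard threading features not covered in this section. args is given as a tuple here, which Thread() accepts as readily as a list.

```python
import threading

calls = []

def record(*words, sep=' '):
    # Note the name of the thread that actually executed this call.
    calls.append((threading.current_thread().name, sep.join(words)))

# Right: pass the function itself, plus its arguments via args and kwargs.
good = threading.Thread(target=record, args=('Cats', 'Dogs', 'Frogs'),
                        kwargs={'sep': ' & '}, name='worker')
good.start()
good.join()  # let the worker finish before the main thread touches calls

# Wrong: record(...) runs immediately in the main thread, and its return
# value (None) becomes the target, so the new thread does nothing.
bad = threading.Thread(target=record('Cats', 'Dogs', sep=' & '), name='idle')
bad.start()
bad.join()

print(calls)  # [('worker', 'Cats & Dogs & Frogs'), ('MainThread', 'Cats & Dogs')]
```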

Concurrency Issues

You can easily create several new threads and have them all running at the same time. But multiple threads can also cause problems called concurrency issues. These issues happen when threads read and write variables at the same time, causing the threads to trip over each other. Concurrency issues can be hard to reproduce consistently, making them hard to debug.

Multithreaded programming is its own wide subject and beyond the scope of this book. What you have to keep in mind is this: To avoid concurrency issues, never let multiple threads read or write the same variables. When you create a new Thread object, make sure its target function uses only local variables in that function. This will avoid hard-to-debug concurrency issues in your programs.
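The advice above is to keep shared variables out of thread target functions altogether. When threads genuinely must update the same value, the standard library's threading.Lock is one common safeguard; it isn't covered in this book, so treat the following as an illustrative sketch. The counter, thread count, and iteration count are arbitrary.

```python
import threading

counter = 0
counterLock = threading.Lock()

def addMany(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run this read-modify-write step.
        with counterLock:
            counter += 1

threads = [threading.Thread(target=addMany, args=(10000,)) for _ in range(4)]
for threadObj in threads:
    threadObj.start()
for threadObj in threads:
    threadObj.join()  # wait for every thread before reading counter

print(counter)  # 40000
```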

Note

A beginner’s tutorial on multithreaded programming is available at http://nostarch.com/automatestuff/.

Project: Multithreaded XKCD Downloader

In Chapter 11, you wrote a program that downloaded all of the XKCD comic strips from the XKCD website. This was a single-threaded program: It downloaded one comic at a time. Much of the program’s running time was spent establishing the network connection to begin the download and writing the downloaded images to the hard drive. If you have a broadband Internet connection, your single-threaded program wasn’t fully utilizing the available bandwidth.

A multithreaded program that has some threads downloading comics while others are establishing connections and writing the comic image files to disk uses your Internet connection more efficiently and downloads the collection of comics more quickly. Open a new file editor window and save it as multidownloadXkcd.py. You will modify this program to add multithreading. The completely modified source code is available to download from http://nostarch.com/automatestuff/.

Step 1: Modify the Program to Use a Function

This program will mostly be the same downloading code from Chapter 11, so I’ll skip the explanation for the Requests and BeautifulSoup code. The main changes you need to make are importing the threading module and making a downloadXkcd() function, which takes starting and ending comic numbers as parameters.

For example, calling downloadXkcd(140, 280) would loop over the downloading code to download the comics at http://xkcd.com/140, http://xkcd.com/141, http://xkcd.com/142, and so on, up to http://xkcd.com/279. Each thread that you create will call downloadXkcd() and pass a different range of comics to download.
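To see how one range of comic numbers might be divided among threads using the end-exclusive convention described above (downloadXkcd(140, 280) stops at comic 279), here is a sketch. splitRanges() is a hypothetical helper, the comic range 1 to 1399 and the chunk size of 100 are assumptions, and downloadXkcd() is a placeholder that skips the actual Requests and BeautifulSoup downloading.

```python
import threading

def downloadXkcd(startComic, endComic):
    # Placeholder body: the real function would download each comic in
    # range(startComic, endComic) using Requests and BeautifulSoup.
    for urlNumber in range(startComic, endComic):
        pass

def splitRanges(first, stop, chunkSize):
    """Return (start, end) pairs covering first..stop-1, each end exclusive."""
    return [(start, min(start + chunkSize, stop))
            for start in range(first, stop, chunkSize)]

ranges = splitRanges(1, 1400, 100)
print(ranges[:2], ranges[-1])  # [(1, 101), (101, 201)] (1301, 1400)

# One thread per chunk, each calling downloadXkcd() with its own range.
downloadThreads = []
for start, end in ranges:
    threadObj = threading.Thread(target=downloadXkcd, args=(start, end))
    downloadThreads.append(threadObj)
    threadObj.start()
```

Because each chunk's end is the next chunk's start, no comic number is skipped or downloaded twice.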

Add the following code to your multidownloadXkcd.py program:
