Automate the Boring Stuff with Python

by Al Sweigart


    #! python3
    # multidownloadXkcd.py - Downloads XKCD comics using multiple threads.

    import requests, os, bs4, threading, urllib.parse
    os.makedirs('xkcd', exist_ok=True)  # ➊ store comics in ./xkcd

    def downloadXkcd(startComic, endComic):  # ➋
        for urlNumber in range(startComic, endComic):  # ➌
            # Download the page.
            print('Downloading page http://xkcd.com/%s...' % (urlNumber))
            res = requests.get('http://xkcd.com/%s' % (urlNumber))  # ➍
            res.raise_for_status()

            soup = bs4.BeautifulSoup(res.text, 'html.parser')  # ➎

            # Find the URL of the comic image.
            comicElem = soup.select('#comic img')  # ➏
            if comicElem == []:
                print('Could not find comic image.')
            else:
                # urljoin() also resolves relative and protocol-relative src values.
                comicUrl = urllib.parse.urljoin(res.url, comicElem[0].get('src'))  # ➐
                # Download the image.
                print('Downloading image %s...' % (comicUrl))
                res = requests.get(comicUrl)  # ➑
                res.raise_for_status()

                # Save the image to ./xkcd.
                imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
                for chunk in res.iter_content(100000):
                    imageFile.write(chunk)
                imageFile.close()

    # TODO: Create and start the Thread objects.
    # TODO: Wait for all threads to end.

  After importing the modules we need, we make a directory to store comics in ➊ and start defining downloadXkcd() ➋. We loop through all the numbers in the specified range ➌ and download each page ➍. We use Beautiful Soup to look through the HTML of each page ➎ and find the comic image ➏. If no comic image is found on a page, we print a message. Otherwise, we get the URL of the image ➐ and download the image ➑. Finally, we save the image to the directory we created.

  Step 2: Create and Start Threads

  Now that we’ve defined downloadXkcd(), we’ll create the multiple threads that each call downloadXkcd() to download different ranges of comics from the XKCD website. Add the following code to multidownloadXkcd.py after the downloadXkcd() function definition:

    #! python3
    # multidownloadXkcd.py - Downloads XKCD comics using multiple threads.

    --snip--

    # Create and start the Thread objects.
    downloadThreads = []  # a list of all the Thread objects
    for i in range(0, 1400, 100):  # loops 14 times, creates 14 threads
        start = max(i, 1)  # there is no comic 0, so the first thread starts at 1
        downloadThread = threading.Thread(target=downloadXkcd, args=(start, i + 100))
        downloadThreads.append(downloadThread)
        downloadThread.start()

  First we make an empty list, downloadThreads; the list will help us keep track of the many Thread objects we’ll create. Then we start our for loop. Each time through the loop, we create a Thread object with threading.Thread(), append the Thread object to the list, and call start() to start running downloadXkcd() in the new thread. Since the for loop sets the i variable from 0 up to (but not including) 1400 in steps of 100, i will be set to 0 on the first iteration, 100 on the second iteration, 200 on the third, and so on. The range() call inside downloadXkcd() stops just before endComic, so we pass i + 100 as the second argument to make sure no comic is skipped between one thread's range and the next. XKCD has no comic 0, so start is 1 on the first iteration. The two arguments passed to downloadXkcd() will therefore be 1 and 100 on the first iteration, 100 and 200 on the second iteration, 200 and 300 on the third, and so on.

  As the Thread object’s start() method is called and the new thread begins to run the code inside downloadXkcd(), the main thread will continue to the next iteration of the for loop and create the next thread.
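  When a numbered range is divided among threads, it is worth checking that the pieces cover every number exactly once, because range(start, end) stops before end. The following is a minimal sketch of one way to compute the pairs. The name chunk_ranges() is a hypothetical helper, not part of the original program, and the sketch makes no network requests.

```python
def chunk_ranges(first, last, size):
    """Return (start, end) pairs covering first..last inclusive.

    Each end is exclusive, so a pair can be passed straight to range().
    """
    pairs = []
    for start in range(first, last + 1, size):
        end = min(start + size, last + 1)  # do not run past the last number
        pairs.append((start, end))
    return pairs


pairs = chunk_ranges(1, 10, 4)
print(pairs)

# Check that every number from 1 to 10 lands in exactly one chunk.
covered = [n for start, end in pairs for n in range(start, end)]
print(covered == list(range(1, 11)))
```

  Each pair could then be passed as the args tuple of a Thread object, one thread per pair.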

  Step 3: Wait for All Threads to End

  The main thread moves on as normal while the other threads we create download comics. But say there’s some code you don’t want to run in the main thread until all the threads have completed. Calling a Thread object’s join() method will block until that thread has finished. By using a for loop to iterate over all the Thread objects in the downloadThreads list, the main thread can call the join() method on each of the other threads. Add the following to the bottom of your program:

    #! python3
    # multidownloadXkcd.py - Downloads XKCD comics using multiple threads.

    --snip--

    # Wait for all threads to end.
    for downloadThread in downloadThreads:
        downloadThread.join()
    print('Done.')

  The 'Done.' string will not be printed until all of the join() calls have returned. If a Thread object has already completed when its join() method is called, then the method will simply return immediately. If you wanted to extend this program with code that runs only after all of the comics have downloaded, you could replace the print('Done.') line with your new code.
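  The same start-then-join pattern can be tried without downloading anything. In this sketch, the square() function and the results list are stand-ins invented for illustration; the point is only that the code after the join() loop runs once every worker has finished.

```python
import threading

results = []  # shared list that every thread appends to


def square(number):
    # Stand-in for slow work such as a download.
    results.append(number * number)


workers = []
for n in range(5):
    worker = threading.Thread(target=square, args=(n,))
    workers.append(worker)
    worker.start()

# Block until every worker has finished.
for worker in workers:
    worker.join()

# No worker is still running, so the results are complete.
print(sorted(results))
```

  The results are sorted before printing because threads may finish in any order.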

  Launching Other Programs from Python

  Your Python program can start other programs on your computer with the Popen() function in the built-in subprocess module. (The P in the name of the Popen() function stands for process.) If you have multiple instances of an application open, each of those instances is a separate process of the same program. For example, if you open multiple windows of your web browser at the same time, each of those windows is a different process of the web browser program. See Figure 15-1 for an example of multiple calculator processes open at once.

  Every process can have multiple threads. Unlike threads, a process cannot directly read and write another process’s variables. If you think of a multithreaded program as having multiple fingers following source code, then having multiple processes of the same program open is like having a friend with a separate copy of the program’s source code. You are both independently executing the same program.

  If you want to start an external program from your Python script, pass the program’s filename to subprocess.Popen(). (On Windows, right-click the application’s Start menu item and select Properties to view the application’s filename. On OS X, CTRL-click the application and select Show Package Contents to find the path to the executable file.) The Popen() function will then immediately return. Keep in mind that the launched program is not run in the same thread as your Python program.

  Figure 15-1. Six running processes of the same calculator program

  On a Windows computer, enter the following into the interactive shell:

    >>> import subprocess
    >>> subprocess.Popen('C:\\Windows\\System32\\calc.exe')

  On Ubuntu Linux, you would enter the following:

    >>> import subprocess
    >>> subprocess.Popen('/usr/bin/gnome-calculator')

  On OS X, the process is slightly different. See Opening Files with Default Applications.

  The return value is a Popen object, which has two useful methods: poll() and wait().

  You can think of the poll() method as asking your friend if she’s finished running the code you gave her. The poll() method will return None if the process is still running at the time poll() is called. If the program has terminated, it will return the process’s integer exit code. An exit code is used to indicate whether the process terminated without errors (an exit code of 0) or whether an error caused the process to terminate (a nonzero exit code—generally 1, but it may vary depending on the program).

  The wait() method is like waiting for your friend to finish working on her code before you keep working on yours. The wait() method will block until the launched process has terminated. This is helpful if you want your program to pause until the user finishes with the other program. The return value of wait() is the process’s integer exit code.

  On Windows, enter the following into the interactive shell. Note that the wait() call will block until you quit the launched calculator program.

    >>> calcProc = subprocess.Popen('c:\\Windows\\System32\\calc.exe')  # ➊
    >>> calcProc.poll() == None  # ➋
    True
    >>> calcProc.wait()  # ➌
    0
    >>> calcProc.poll()
    0

  Here we open a calculator process ➊. While it’s still running, we check if poll() returns None ➋. It should, as the process is still running. Then we close the calculator program and call wait() on the terminated process ➌. wait() and poll() now return 0, indicating that the process terminated without errors.
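  If you are not on Windows, a similar experiment can be run on any system by launching a second Python interpreter instead of the calculator. This is only a sketch: it assumes sys.executable points at a working Python, and it passes Popen() a list so that the -c option and a short piece of code can be given to the child (the list form is described in the next section).

```python
import subprocess
import sys

# Launch a child Python process that sleeps for two seconds and then exits.
child = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(2)'])

still_running = child.poll() is None  # poll() returns None while the child runs
exit_code = child.wait()              # blocks until the child has exited
print(still_running, exit_code, child.poll())

# A child that ends with an error reports a nonzero exit code.
failing = subprocess.Popen([sys.executable, '-c', 'import sys; sys.exit(3)'])
print(failing.wait())
```

  The first child exits normally, so wait() and the later poll() both report 0; the second child asks for exit code 3, and wait() hands that value back.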

  Passing Command Line Arguments to Popen()

  You can pass command line arguments to processes you create with Popen(). To do so, you pass a list as the sole argument to Popen(). The first string in this list will be the executable filename of the program you want to launch; all the subsequent strings will be the command line arguments to pass to the program when it starts. In effect, this list will be the value of sys.argv for the launched program.

  Most applications with a graphical user interface (GUI) don’t use command line arguments as extensively as command line–based or terminal-based programs do. But most GUI applications will accept a single argument for a file that the applications will immediately open when they start. For example, if you’re using Windows, create a simple text file called C:\hello.txt and then enter the following into the interactive shell:

    >>> subprocess.Popen(['C:\\Windows\\notepad.exe', 'C:\\hello.txt'])

  This will not only launch the Notepad application but also have it immediately open the C:\hello.txt file.
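  To see how the list becomes the launched program's sys.argv, you can launch a child Python process that simply prints its own sys.argv. This is a sketch under a few assumptions: the stdout=subprocess.PIPE and text=True arguments and the communicate() method are subprocess features not covered in this chapter (text=True needs Python 3.7 or later), and the argument strings are made up for illustration.

```python
import subprocess
import sys

# The child prints its own sys.argv so we can see what it received.
child_code = 'import sys; print(sys.argv)'

child = subprocess.Popen(
    [sys.executable, '-c', child_code, 'hello.txt', '--verbose'],
    stdout=subprocess.PIPE,  # capture the child's output instead of showing it
    text=True,               # decode the output as str rather than bytes
)
output, _ = child.communicate()  # wait for the child and collect its output
print(output.strip())
```

  When Python is started with -c, the first item of the child's sys.argv is '-c', followed by the extra strings from the list.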

  Task Scheduler, launchd, and cron

  If you are computer savvy, you may know about Task Scheduler on Windows, launchd on OS X, or the cron scheduler on Linux. These well-documented and reliable tools all allow you to schedule applications to launch at specific times. If you’d like to learn more about them, you can find links to tutorials at http://nostarch.com/automatestuff/.

  Using your operating system’s built-in scheduler saves you from writing your own clock-checking code to schedule your programs. However, use the time.sleep() function if you just need your program to pause briefly. Or instead of using the operating system’s scheduler, your code can loop until a certain date and time, calling time.sleep(1) each time through the loop.
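  A loop that waits for a certain date and time might look like the following sketch. The wait_until() name and its poll_seconds parameter are hypothetical, and the example waits only two seconds so that it finishes quickly.

```python
import datetime
import time


def wait_until(target, poll_seconds=1):
    """Sleep in short steps until the clock reaches target (a datetime)."""
    while datetime.datetime.now() < target:
        time.sleep(poll_seconds)


# Example: pause until two seconds from now, then carry on.
start = datetime.datetime.now()
wait_until(start + datetime.timedelta(seconds=2), poll_seconds=0.5)
elapsed = (datetime.datetime.now() - start).total_seconds()
print('Waited about %.1f seconds' % elapsed)
```

  Sleeping in short steps keeps the program responsive to CTRL-C while it waits, at the cost of waking up more often than a single long sleep would.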

  Opening Websites with Python

  The webbrowser.open() function can launch a web browser from your program to a specific website, rather than opening the browser application with subprocess.Popen(). See Project: mapit.py with the webbrowser Module for more details.

  Running Other Python Scripts

  You can launch a Python script from Python just like any other application. You just have to pass the python.exe executable to Popen() and the filename of the .py script you want to run as its argument. For example, the following would run the hello.py script from Chapter 1:

    >>> subprocess.Popen(['C:\\python34\\python.exe', 'hello.py'])

  Pass Popen() a list containing a string of the Python executable’s path and a string of the script’s filename. If the script you’re launching needs command line arguments, add them to the list after the script’s filename. The location of the Python executable on Windows is C:\python34\python.exe. On OS X, it is /Library/Frameworks/Python.framework/Versions/3.3/bin/python3. On Linux, it is /usr/bin/python3.

  Unlike importing the Python program as a module, when your Python program launches another Python program, the two are run in separate processes and will not be able to share each other’s variables.
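  Because the interpreter's location varies from machine to machine, one option is to use sys.executable, which holds the path of the Python interpreter that is running your program. The sketch below writes a small throwaway script (child_script.py is an invented name standing in for hello.py), runs it in a separate process, and shows that the child cannot see a variable defined in the parent. The stdout=subprocess.PIPE, text=True, and communicate() pieces are assumptions beyond what this chapter covers.

```python
import os
import subprocess
import sys
import tempfile

secret = 'only in the parent'  # the child process cannot see this variable

# Write a tiny script to a temporary folder (a stand-in for hello.py).
folder = tempfile.mkdtemp()
script = os.path.join(folder, 'child_script.py')
with open(script, 'w') as script_file:
    script_file.write("print('secret' in globals())\n")

# sys.executable is the path of the interpreter running this program.
child = subprocess.Popen([sys.executable, script],
                         stdout=subprocess.PIPE, text=True)
output, _ = child.communicate()
print(output.strip())  # the child reports whether it has a 'secret' variable
```

  The child prints False: it starts with its own fresh set of variables, whatever the parent has defined.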

  Opening Files with Default Applications

  Double-clicking a .txt file on your computer will automatically launch the application associated with the .txt file extension. Your computer will have several of these file extension associations set up already. Python can also open files this way with Popen().

  Each operating system has a program that performs the equivalent of double-clicking a document file to open it. On Windows, this is the start program. On OS X, this is the open program. On Ubuntu Linux, this is the see program. Enter the following into the interactive shell, passing 'start', 'open', or 'see' to Popen() depending on your system:

    >>> fileObj = open('hello.txt', 'w')
    >>> fileObj.write('Hello world!')
    12
    >>> fileObj.close()
    >>> import subprocess
    >>> subprocess.Popen(['start', 'hello.txt'], shell=True)

  Here we write Hello world! to a new hello.txt file. Then we call Popen(), passing it a list containing the program name (in this example, 'start' for Windows) and the filename. We also pass the shell=True keyword argument, which is needed only on Windows. The operating system knows all of the file associations and can figure out that it should launch, say, Notepad.exe to handle the hello.txt file.

  On OS X, the open program is used for opening both document files and programs. Enter the following into the interactive shell if you have a Mac:

  >>> subprocess.Popen(['open', '/Applications/Calculator.app/'])

  The Calculator app should open.
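  If one script has to run on more than one operating system, the choice between 'start', 'open', and 'see' can be made from sys.platform. The following is a sketch, not a complete solution: opener_command() is a hypothetical helper, and it only builds the arguments rather than launching anything.

```python
import sys


def opener_command(filename, platform=sys.platform):
    """Return (args, use_shell) for opening filename with its default app."""
    if platform.startswith('win'):
        return ['start', filename], True   # start needs shell=True
    if platform == 'darwin':               # the sys.platform value on OS X
        return ['open', filename], False
    return ['see', filename], False        # the program named above for Ubuntu


args, use_shell = opener_command('hello.txt')
print(args, use_shell)

# To actually open the file:
#     import subprocess
#     subprocess.Popen(args, shell=use_shell)
```

  Keeping the decision in one small function means the rest of the program can call Popen() the same way on every system.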

  The UNIX Philosophy

  Programs well designed to be launched by other programs become more powerful than their code alone. The Unix philosophy is a set of software design principles established by the programmers of the Unix operating system (on which the modern Linux and OS X are built). It says that it’s better to write small, limited-purpose programs that can interoperate, rather than large, feature-rich applications. The smaller programs are easier to understand, and by being interoperable, they can be the building blocks of much more powerful applications.

  Smartphone apps follow this approach as well. If your restaurant app needs to display directions to a café, the developers didn’t reinvent the wheel by writing their own map code. The restaurant app simply launches a map app while passing it the café’s address, just as your Python code would call a function and pass it arguments.

  The Python programs you’ve been writing in this book mostly fit the Unix philosophy, especially in one important way: They use command line arguments rather than input() function calls. If all the information your program needs can be supplied up front, it is preferable to have this information passed as command line arguments rather than waiting for the user to type it in. This way, the command line arguments can be entered by a human user or supplied by another program. This interoperable approach will make your programs reusable as part of another program.

  The sole exception is that you don’t want passwords passed as command line arguments, since the command line may record them as part of its command history feature. Instead, your program should call the input() function when it needs you to enter a password.

  You can read more about Unix philosophy at https://en.wikipedia.org/wiki/Unix_philosophy/.

  Project: Simple Countdown Program

  Just like it’s hard to find a simple stopwatch application, it can be hard to find a simple countdown application. Let’s write a countdown program that plays an alarm at the end of the countdown.

  At a high level, here’s what your program will do:

  Count down from 60.

  Play a sound file (alarm.wav) when the countdown reaches zero.

  This means your code will need to do the following:

  Pause for one second in between displaying each number in the countdown by calling time.sleep().

  Call subprocess.Popen() to open the sound file with the default application.

  Open a new file editor window and save it as countdown.py.

  Step 1: Count Down

  This program will require the time module for the time.sleep() function and the subprocess module for the subprocess.Popen() function. Enter the following code and save the file as countdown.py:

    #! python3
    # countdown.py - A simple countdown script.

    import time, subprocess

    timeLeft = 60  # ➊
    while timeLeft > 0:
        print(timeLeft, end='')  # ➋
        time.sleep(1)  # ➌
        timeLeft = timeLeft - 1  # ➍

    # TODO: At the end of the countdown, play a sound file.

  After importing time and subprocess, make a variable called timeLeft to hold the number of seconds left in the countdown ➊. It can start at 60—or you can change the value here to whatever you need or even have it get set from a command line argument.

  In a while loop, you display the remaining count ➋, pause for one second ➌, and then decrement the timeLeft variable ➍ before the loop starts over again. The loop will keep looping as long as timeLeft is greater than 0. After that, the countdown will be over.
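  One way to set the starting value from a command line argument is sketched below. The function names seconds_from_args() and count_down() are invented for this example, the optional sleep parameter exists only so the pause can be swapped out when experimenting, and each number is printed on its own line for simplicity.

```python
import sys
import time


def seconds_from_args(argv, default=60):
    """Return the countdown length from argv[1], or default if it is absent."""
    if len(argv) > 1:
        return int(argv[1])
    return default


def count_down(seconds, sleep=time.sleep):
    """Print seconds, seconds - 1, ..., 1, pausing one second after each."""
    while seconds > 0:
        print(seconds)
        sleep(1)
        seconds = seconds - 1


# In a real script you would pass sys.argv; fixed lists are used here.
print(seconds_from_args(['countdown.py', '5']))
print(seconds_from_args(['countdown.py']))
count_down(3)
```

  In a real script, count_down(seconds_from_args(sys.argv)) would let a user run something like countdown.py 90 for a 90-second countdown.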

  Step 2: Play the Sound File

  While there are third-party modules to play sound files of various formats, the quick and easy way is to just launch whatever application the user already uses to play sound files. The operating system will figure out from the .wav file extension which application it should launch to play the file. This .wav file could easily be some other sound file format, such as .mp3 or .ogg.

  You can use any sound file that is on your computer to play at the end of the countdown, or you can download alarm.wav from http://nostarch.com/automatestuff/.

  Add the following to your code:

    #! python3
    # countdown.py - A simple countdown script.

    import time, subprocess

    --snip--

    # At the end of the countdown, play a sound file.
    subprocess.Popen(['start', 'alarm.wav'], shell=True)

  After the while loop finishes, alarm.wav (or the sound file you choose) will play to notify the user that the countdown is over. On Windows, be sure to include 'start' in the list you pass to Popen() and pass the keyword argument shell=True. On OS X, pass 'open' instead of 'start' and remove shell=True.

  Instead of playing a sound file, you could save a text file somewhere with a message like Break time is over! and use Popen() to open it at the end of the countdown. This will effectively create a pop-up window with a message. Or you could use the webbrowser.open() function to open a specific website at the end of the countdown. Unlike some free countdown application you’d find online, your own countdown program’s alarm can be anything you want!

  Ideas for Similar Programs

  A countdown is a simple delay before continuing the program’s execution. This can also be used for other applications and features, such as the following:

  Use time.sleep() to give the user a chance to press CTRL-C to cancel an action, such as deleting files. Your program can print a “Press CTRL-C to cancel” message and then handle any KeyboardInterrupt exceptions with try and except statements.
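  The cancel-with-CTRL-C idea could be sketched as follows. The confirm_by_waiting() name is hypothetical, the three-second delay is arbitrary, and the action itself is left as a print() call rather than anything destructive.

```python
import time


def confirm_by_waiting(seconds, sleep=time.sleep):
    """Return True if the wait finishes, False if the user presses CTRL-C."""
    print('Press CTRL-C within %s seconds to cancel.' % seconds)
    try:
        sleep(seconds)
    except KeyboardInterrupt:
        print('Canceled.')
        return False
    return True


if confirm_by_waiting(3):
    print('Continuing with the action.')  # e.g. delete the files here
```

  Pressing CTRL-C during the pause raises KeyboardInterrupt inside time.sleep(), which the except clause catches so the program can skip the action instead of crashing.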

 
