Automate the Boring Stuff with Python

by Al Sweigart


  Every program that runs on your computer has a current working directory, or cwd. Any filenames or paths that do not begin with the root folder are assumed to be under the current working directory. You can get the current working directory as a string value with the os.getcwd() function and change it with os.chdir(). Enter the following into the interactive shell:

  >>> import os
  >>> os.getcwd()
  'C:\\Python34'
  >>> os.chdir('C:\\Windows\\System32')
  >>> os.getcwd()
  'C:\\Windows\\System32'

  Here, the current working directory is set to C:\Python34, so the filename project.docx refers to C:\Python34\project.docx. When we change the current working directory to C:\Windows\System32, project.docx is interpreted as C:\Windows\System32\project.docx.
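
  If you are not on Windows, or would rather not change into a system folder, the same idea can be tried in a throwaway directory. The following is only a sketch: it assumes the standard tempfile module is available, and the names original_cwd, resolved, and expected are made up for this illustration.

```python
import os
import tempfile

original_cwd = os.getcwd()          # remember where we started

# Use a temporary folder so the example does not depend on C:\Python34.
with tempfile.TemporaryDirectory() as temp_dir:
    os.chdir(temp_dir)
    # A bare filename is now resolved against the new working directory.
    resolved = os.path.abspath('project.docx')
    expected = os.path.join(os.getcwd(), 'project.docx')
    os.chdir(original_cwd)          # change back before the folder is deleted

print(resolved == expected)         # True
```

  Changing back to the original directory before the with block ends matters on Windows, where a folder that is some program's working directory cannot be deleted.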

  Python will display an error if you try to change to a directory that does not exist.

  >>> os.chdir('C:\\ThisFolderDoesNotExist')
  Traceback (most recent call last):
    File "", line 1, in <module>
      os.chdir('C:\\ThisFolderDoesNotExist')
  FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\ThisFolderDoesNotExist'

  Note

  While folder is the more modern name for directory, note that current working directory (or just working directory) is the standard term, not current working folder.

  Absolute vs. Relative Paths

  There are two ways to specify a file path.

  An absolute path, which always begins with the root folder

  A relative path, which is relative to the program’s current working directory

  There are also the dot (.) and dot-dot (..) folders. These are not real folders but special names that can be used in a path. A single period (“dot”) for a folder name is shorthand for “this directory.” Two periods (“dot-dot”) means “the parent folder.”

  Figure 8-2 is an example of some folders and files. When the current working directory is set to C:\bacon, the relative paths for the other folders and files are set as they are in the figure.

  Figure 8-2. The relative paths for folders and files in the working directory C:\bacon

  The .\ at the start of a relative path is optional. For example, .\spam.txt and spam.txt refer to the same file.
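
  One way to see what the dot and dot-dot names mean is to let Python simplify a path that contains them. The sketch below uses os.path.normpath(), which is not covered in this section; it works purely on the path string and does not check whether the folders exist. The folder names come from the figure, and the variable names are only for illustration.

```python
import os

folder = os.path.join('bacon', 'fizz')

# '.' means "this directory", so it drops out when the path is normalized.
same_folder = os.path.normpath(os.path.join(folder, '.'))

# '..' means "the parent folder", so it cancels the last folder name.
parent = os.path.normpath(os.path.join(folder, '..'))

# A leading '.' is optional: both spellings normalize to the same path.
with_dot = os.path.normpath(os.path.join('.', 'spam.txt'))

print(same_folder == folder)   # True
print(parent)                  # bacon
print(with_dot)                # spam.txt
```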

  Creating New Folders with os.makedirs()

  Your programs can create new folders (directories) with the os.makedirs() function. Enter the following into the interactive shell:

  >>> import os
  >>> os.makedirs('C:\\delicious\\walnut\\waffles')

  This will create not just the C:\delicious folder but also a walnut folder inside C:\delicious and a waffles folder inside C:\delicious\walnut. That is, os.makedirs() will create any necessary intermediate folders in order to ensure that the full path exists. Figure 8-3 shows this hierarchy of folders.

  Figure 8-3. The result of os.makedirs('C:\\delicious\\walnut\\waffles')
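
  Calling os.makedirs() a second time on a path that already exists raises FileExistsError unless you pass exist_ok=True. Here is a minimal sketch of that behavior, run inside a temporary folder (an assumption made so that nothing is left behind on your drive); the variable names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'delicious', 'walnut', 'waffles')

    os.makedirs(target)                  # creates delicious, walnut, and waffles
    created = os.path.isdir(target)

    os.makedirs(target, exist_ok=True)   # no error when the folders already exist

    try:
        os.makedirs(target)              # without exist_ok, a repeat call fails
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)                   # True True
```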

  The os.path Module

  The os.path module contains many helpful functions related to filenames and file paths. For instance, you’ve already used os.path.join() to build paths in a way that will work on any operating system. Since os.path is a module inside the os module, you can import it by simply running import os. Whenever your programs need to work with files, folders, or file paths, you can refer to the short examples in this section. The full documentation for the os.path module is on the Python website at http://docs.python.org/3/library/os.path.html.

  Note

  Most of the examples that follow in this section will require the os module, so remember to import it at the beginning of any script you write and any time you restart IDLE. Otherwise, you’ll get a NameError: name 'os' is not defined error message.

  Handling Absolute and Relative Paths

  The os.path module provides functions for returning the absolute path of a relative path and for checking whether a given path is an absolute path.

  Calling os.path.abspath(path) will return a string of the absolute path of the argument. This is an easy way to convert a relative path into an absolute one.

  Calling os.path.isabs(path) will return True if the argument is an absolute path and False if it is a relative path.

  Calling os.path.relpath(path, start) will return a string of a relative path from the start path to path. If start is not provided, the current working directory is used as the start path.

  Try these functions in the interactive shell:

  >>> os.path.abspath('.')
  'C:\\Python34'
  >>> os.path.abspath('.\\Scripts')
  'C:\\Python34\\Scripts'
  >>> os.path.isabs('.')
  False
  >>> os.path.isabs(os.path.abspath('.'))
  True

  Since C:\Python34 was the working directory when os.path.abspath() was called, the “single-dot” folder represents the absolute path 'C:\\Python34'.

  Note

  Since your system probably has different files and folders on it than mine, you won’t be able to follow every example in this chapter exactly. Still, try to follow along using folders that exist on your computer.

  Enter the following calls to os.path.relpath() into the interactive shell:

  >>> os.path.relpath('C:\\Windows', 'C:\\')
  'Windows'
  >>> os.path.relpath('C:\\Windows', 'C:\\spam\\eggs')
  '..\\..\\Windows'
  >>> os.getcwd()
  'C:\\Python34'
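
  The three functions fit together: os.path.relpath() turns an absolute path into a relative one, and os.path.abspath() turns it back. The sketch below should behave the same on any operating system because it builds its paths from whatever os.getcwd() returns. The spam and eggs folders are hypothetical and are never created; these functions only manipulate path strings.

```python
import os

cwd = os.getcwd()
nested = os.path.join(cwd, 'spam', 'eggs')    # hypothetical folders, not created

relative = os.path.relpath(nested, cwd)       # path from cwd down to nested
back_up = os.path.relpath(cwd, nested)        # path from nested back up to cwd
round_trip = os.path.abspath(relative)        # resolved against cwd again

print(relative == os.path.join('spam', 'eggs'))    # True
print(back_up == os.path.join('..', '..'))         # True
print(os.path.isabs(relative))                     # False
print(os.path.isabs(round_trip))                   # True
```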

  Calling os.path.dirname(path) will return a string of everything that comes before the last slash in the path argument. Calling os.path.basename(path) will return a string of everything that comes after the last slash in the path argument. The dir name and base name of a path are outlined in Figure 8-4.

  Figure 8-4. The base name follows the last slash in a path and is the same as the filename. The dir name is everything before the last slash.

  For example, enter the following into the interactive shell:

  >>> path = 'C:\\Windows\\System32\\calc.exe'
  >>> os.path.basename(path)
  'calc.exe'
  >>> os.path.dirname(path)
  'C:\\Windows\\System32'

  If you need a path’s dir name and base name together, you can just call os.path.split() to get a tuple value with these two strings, like so:

  >>> calcFilePath = 'C:\\Windows\\System32\\calc.exe'
  >>> os.path.split(calcFilePath)
  ('C:\\Windows\\System32', 'calc.exe')

  Notice that you could create the same tuple by calling os.path.dirname() and os.path.basename() and placing their return values in a tuple.

  >>> (os.path.dirname(calcFilePath), os.path.basename(calcFilePath))
  ('C:\\Windows\\System32', 'calc.exe')

  But os.path.split() is a nice shortcut if you need both values.

  Also, note that os.path.split() does not take a file path and return a list of strings of each folder. For that, use the split() string method and split on the string in os.path.sep. Recall from earlier that the os.sep variable (also available as os.path.sep) is set to the correct folder-separating slash for the computer running the program.

  For example, enter the following into the interactive shell:

  >>> calcFilePath.split(os.path.sep)
  ['C:', 'Windows', 'System32', 'calc.exe']

  On OS X and Linux systems, there will be a blank string at the start of the returned list:

  >>> '/usr/bin'.split(os.path.sep)
  ['', 'usr', 'bin']

  The split() string method will work to return a list of each part of the path. It will work on any operating system if you pass it os.path.sep.
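
  To compare the two kinds of splitting side by side without depending on Windows paths, you can build the path with os.path.join() first. This is only a sketch; the relative path here is made up from the folder names used above, and the variables head, tail, and parts are hypothetical.

```python
import os

# Build the path with os.path.join() so the separator matches this system.
path = os.path.join('Windows', 'System32', 'calc.exe')

head, tail = os.path.split(path)      # (dir name, base name)
parts = path.split(os.path.sep)       # every component of the path

print(tail)                           # calc.exe
print(head == os.path.dirname(path))  # True
print(parts)                          # ['Windows', 'System32', 'calc.exe']
```

  Because os.path.split() always returns exactly two strings, it can be unpacked directly into two variables, while the split() string method returns as many strings as there are path components.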

  Finding File Sizes and Folder Contents

  Once you have ways of handling file paths, you can then start gathering information about specific files and folders. The os.path module provides functions for finding the size of a file in bytes and the files and folders inside a given folder.

  Calling os.path.getsize(path) will return the size in bytes of the file in the path argument.

  Calling os.listdir(path) will return a list of filename strings for each file in the path argument. (Note that this function is in the os module, not os.path.)

  Here’s what I get when I try these functions in the interactive shell:

  >>> os.path.getsize('C:\\Windows\\System32\\calc.exe')
  776192
  >>> os.listdir('C:\\Windows\\System32')
  ['0409', '12520437.cpx', '12520850.cpx', '5U877.ax', 'aaclient.dll',
  --snip--
  'xwtpdui.dll', 'xwtpw32.dll', 'zh-CN', 'zh-HK', 'zh-TW', 'zipfldr.dll']

  As you can see, the calc.exe program on my computer is 776,192 bytes in size, and I have a lot of files in C:\Windows\System32. If I want to find the total size of all the files in this directory, I can use os.path.getsize() and os.listdir() together.

  >>> totalSize = 0
  >>> for filename in os.listdir('C:\\Windows\\System32'):
  ...     totalSize = totalSize + os.path.getsize(os.path.join('C:\\Windows\\System32', filename))
  ...
  >>> print(totalSize)
  1117846456

  As I loop over each filename in the C:\Windows\System32 folder, the totalSize variable is incremented by the size of each file. Notice how when I call os.path.getsize(), I use os.path.join() to join the folder name with the current filename. The integer that os.path.getsize() returns is added to the value of totalSize. After looping through all the files, I print totalSize to see the total size of the C:\Windows\System32 folder.
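
  The same loop can be wrapped in a function and tried on a folder whose contents you control. The sketch below is one possible variation, not the only way to do it: the function name total_file_size is made up, it counts only the files directly inside the folder (subfolders are skipped with os.path.isfile()), and it uses a temporary folder and the with statement so the test files are closed and cleaned up automatically.

```python
import os
import tempfile

def total_file_size(folder):
    """Return the combined size in bytes of the files directly inside folder."""
    total = 0
    for filename in os.listdir(folder):
        full_path = os.path.join(folder, filename)
        if os.path.isfile(full_path):        # skip subfolders
            total += os.path.getsize(full_path)
    return total

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'a.txt'), 'w') as f:
        f.write('12345')                     # 5 bytes
    with open(os.path.join(folder, 'b.txt'), 'w') as f:
        f.write('1234567')                   # 7 bytes
    os.mkdir(os.path.join(folder, 'subfolder'))
    size = total_file_size(folder)

print(size)                                  # 12
```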

  Checking Path Validity

  Many Python functions will crash with an error if you supply them with a path that does not exist. The os.path module provides functions to check whether a given path exists and whether it is a file or folder.

  Calling os.path.exists(path) will return True if the file or folder referred to in the argument exists and will return False if it does not exist.

  Calling os.path.isfile(path) will return True if the path argument exists and is a file and will return False otherwise.

  Calling os.path.isdir(path) will return True if the path argument exists and is a folder and will return False otherwise.

  Here’s what I get when I try these functions in the interactive shell:

  >>> os.path.exists('C:\\Windows')
  True
  >>> os.path.exists('C:\\some_made_up_folder')
  False
  >>> os.path.isdir('C:\\Windows\\System32')
  True
  >>> os.path.isfile('C:\\Windows\\System32')
  False
  >>> os.path.isdir('C:\\Windows\\System32\\calc.exe')
  False
  >>> os.path.isfile('C:\\Windows\\System32\\calc.exe')
  True

  You can determine whether there is a DVD or flash drive currently attached to the computer by checking for it with the os.path.exists() function. For instance, if I wanted to check for a flash drive with the volume named D: on my Windows computer, I could do that with the following:

  >>> os.path.exists('D:\\')
  False

  Oops! It looks like I forgot to plug in my flash drive.
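
  These checks are often combined into a single helper that reports what kind of thing a path points to. The describe() function below is a hypothetical example, not part of the os.path module, and it runs against a temporary folder so the results do not depend on your drive layout.

```python
import os
import tempfile

def describe(path):
    """Return 'file', 'folder', 'other', or 'missing' for path."""
    if os.path.isfile(path):
        return 'file'
    if os.path.isdir(path):
        return 'folder'
    if os.path.exists(path):
        return 'other'       # exists, but is neither a regular file nor a folder
    return 'missing'

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, 'hello.txt')
    hello_file = open(file_path, 'w')
    hello_file.close()       # an empty file is enough for this check

    results = [
        describe(folder),
        describe(file_path),
        describe(os.path.join(folder, 'some_made_up_folder')),
    ]

print(results)               # ['folder', 'file', 'missing']
```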

  The File Reading/Writing Process

  Once you are comfortable working with folders and relative paths, you’ll be able to specify the location of files to read and write. The functions covered in the next few sections will apply to plaintext files. Plaintext files contain only basic text characters and do not include font, size, or color information. Text files with the .txt extension or Python script files with the .py extension are examples of plaintext files. These can be opened with Windows’s Notepad or OS X’s TextEdit application. Your programs can easily read the contents of plaintext files and treat them as an ordinary string value.

  Binary files are all other file types, such as word processing documents, PDFs, images, spreadsheets, and executable programs. If you open a binary file in Notepad or TextEdit, it will look like scrambled nonsense, like in Figure 8-5.

  Figure 8-5. The Windows calc.exe program opened in Notepad

  Since every different type of binary file must be handled in its own way, this book will not go into reading and writing raw binary files directly. Fortunately, many modules make working with binary files easier—you will explore one of them, the shelve module, later in this chapter.

  There are three steps to reading or writing files in Python.

  Call the open() function to return a File object.

  Call the read() or write() method on the File object.

  Close the file by calling the close() method on the File object.
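
  The three steps above can be sketched in a few lines. This example assumes a temporary folder rather than your home folder, and it first creates hello.txt itself so there is something to read; the variable names are only for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'hello.txt')

    # Setup: create the file so there is something to read.
    setup_file = open(path, 'w')
    setup_file.write('Hello world!')
    setup_file.close()

    hello_file = open(path)          # Step 1: open() returns a File object
    content = hello_file.read()      # Step 2: read the contents as a string
    hello_file.close()               # Step 3: close the file
    is_closed = hello_file.closed    # True once close() has been called

print(content)                       # Hello world!
print(is_closed)                     # True
```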

  Opening Files with the open() Function

  To open a file with the open() function, you pass it a string path indicating the file you want to open; it can be either an absolute or relative path. The open() function returns a File object.

  Try it by creating a text file named hello.txt using Notepad or TextEdit. Type Hello world! as the content of this text file and save it in your user home folder. Then, if you’re using Windows, enter the following into the interactive shell:

  >>> helloFile = open('C:\\Users\\your_home_folder\\hello.txt')

  If you’re using OS X, enter the following into the interactive shell instead:

  >>> helloFile = open('/Users/your_home_folder/hello.txt')

  Make sure to replace your_home_folder with your computer username. For example, my username is asweigart, so I’d enter 'C:\\Users\\asweigart\\hello.txt' on Windows.

  Both these commands will open the file in “reading plaintext” mode, or read mode for short. When a file is opened in read mode, Python lets you only read data from the file; you can’t write or modify it in any way. Read mode is the default mode for files you open in Python. But if you don’t want to rely on Python’s defaults, you can explicitly specify the mode by passing the string value 'r' as a second argument to open(). So open('/Users/asweigart/hello.txt', 'r') and open('/Users/asweigart/hello.txt') do the same thing.
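
  You can confirm both claims from the shell: a File object remembers its mode, and writing to a file opened in read mode raises an error. The sketch below assumes a temporary folder and catches io.UnsupportedOperation, the exception Python raises for this case; the variable names are hypothetical.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'hello.txt')

    # Setup: create the file first.
    setup_file = open(path, 'w')
    setup_file.write('Hello world!')
    setup_file.close()

    hello_file = open(path)              # no mode given, so read mode is used
    default_mode = hello_file.mode
    try:
        hello_file.write('more text')    # not allowed in read mode
        write_allowed = True
    except io.UnsupportedOperation:
        write_allowed = False
    hello_file.close()

print(default_mode)                      # r
print(write_allowed)                     # False
```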

  The call to open() returns a File object. A File object represents a file on your computer; it is simply another type of value in Python, much like the lists and dictionaries you’re already familiar with. In the previous example, you stored the File object in the variable helloFile. Now, whenever you want to read from or write to the file, you can do so by calling methods on the File object in helloFile.

  Reading the Contents of Files

  Now that you have a File object, you can start reading from it. If you want to read the entire contents of a file as a string value, use the File object’s read() method. Let’s continue with the hello.txt File object you stored in helloFile. Enter the following into the interactive shell:

  >>> helloContent = helloFile.read()
  >>> helloContent
  'Hello world!'

  If you think of the contents of a file as a single large string value, the read() method returns the string that is stored in the file.

  Alternatively, you can use the readlines() method to get a list of string values from the file, one string for each line of text. For example, create a file named sonnet29.txt in the same directory as hello.txt and write the following text in it:

  When, in disgrace with fortune and men's eyes,
  I all alone beweep my outcast state,
  And trouble deaf heaven with my bootless cries,
  And look upon myself and curse my fate,

  Make sure to separate the four lines with line breaks. Then enter the following into the interactive shell:

  >>> sonnetFile = open('sonnet29.txt')
  >>> sonnetFile.readlines()
  ["When, in disgrace with fortune and men's eyes,\n", 'I all alone beweep my outcast state,\n', 'And trouble deaf heaven with my bootless cries,\n', 'And look upon myself and curse my fate,']

  Note that each of the string values ends with a newline character, \n, except for the last line of the file. A list of strings is often easier to work with than a single large string value.
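
  The relationship between read() and readlines() can be checked directly: joining the list from readlines() gives back the string from read(). The following sketch writes the four sonnet lines to a temporary file (an assumption, so that you do not have to create sonnet29.txt by hand) and reads it back both ways.

```python
import os
import tempfile

sonnet = ("When, in disgrace with fortune and men's eyes,\n"
          "I all alone beweep my outcast state,\n"
          "And trouble deaf heaven with my bootless cries,\n"
          "And look upon myself and curse my fate,")

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sonnet29.txt')

    sonnet_file = open(path, 'w')
    sonnet_file.write(sonnet)
    sonnet_file.close()

    sonnet_file = open(path)
    whole = sonnet_file.read()           # one large string
    sonnet_file.close()

    sonnet_file = open(path)
    lines = sonnet_file.readlines()      # one string per line
    sonnet_file.close()

print(len(lines))                        # 4
print(lines[0].endswith('\n'))           # True
print(lines[-1].endswith('\n'))          # False
print(''.join(lines) == whole)           # True
```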

  Writing to Files

  Python allows you to write content to a file in a way similar to how the print() function “writes” strings to the screen. You can’t write to a file you’ve opened in read mode, though. Instead, you need to open it in “write plaintext” mode or “append plaintext” mode, or write mode and append mode for short.

  Write mode will overwrite the existing file and start from scratch, just like when you overwrite a variable’s value with a new value. Pass 'w' as the second argument to open() to open the file in write mode. Append mode, on the other hand, will append text to the end of the existing file. You can think of this as appending to a list in a variable, rather than overwriting the variable altogether. Pass 'a' as the second argument to open() to open the file in append mode.

  If the filename passed to open() does not exist, both write and append mode will create a new, blank file. After reading or writing a file, call the close() method before opening the file again.

  Let’s put these concepts together. Enter the following into the interactive shell:

  >>> baconFile = open('bacon.txt', 'w')
  >>> baconFile.write('Hello world!\n')
  13
  >>> baconFile.close()
  >>> baconFile = open('bacon.txt', 'a')
  >>> baconFile.write('Bacon is not a vegetable.')
  25
  >>> baconFile.close()
  >>> baconFile = open('bacon.txt')
  >>> content = baconFile.read()
  >>> baconFile.close()
  >>> print(content)
  Hello world!
  Bacon is not a vegetable.

  First, we open bacon.txt in write mode. Since there isn’t a bacon.txt yet, Python creates one. Calling write() on the opened file and passing write() the string argument 'Hello world!\n' writes the string to the file and returns the number of characters written, including the newline. Then we close the file.

  To add text to the existing contents of the file instead of replacing the string we just wrote, we open the file in append mode. We write 'Bacon is not a vegetable.' to the file and close it. Finally, to print the file contents to the screen, we open the file in its default read mode, call read(), store the resulting string in content, close the file, and print content.

  Note that the write() method does not automatically add a newline character to the end of the string like the print() function does. You will have to add this character yourself.
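
  To see the difference between the two modes in one place, the sketch below appends to a file and then reopens it in write mode, which discards the earlier contents. It assumes a temporary folder so that no bacon.txt is left behind, and the 'Fresh start.' text and variable names are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'bacon.txt')

    bacon_file = open(path, 'w')                     # creates the file
    first_count = bacon_file.write('Hello world!\n')
    bacon_file.close()

    bacon_file = open(path, 'a')                     # keeps existing text
    second_count = bacon_file.write('Bacon is not a vegetable.')
    bacon_file.close()

    bacon_file = open(path)
    after_append = bacon_file.read()
    bacon_file.close()

    bacon_file = open(path, 'w')                     # starts from scratch
    bacon_file.write('Fresh start.\n')
    bacon_file.close()

    bacon_file = open(path)
    after_overwrite = bacon_file.read()
    bacon_file.close()

print(first_count, second_count)                     # 13 25
print(after_append == 'Hello world!\nBacon is not a vegetable.')   # True
print(after_overwrite == 'Fresh start.\n')           # True
```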

 
