Automate the Boring Stuff with Python

Page 44

by Al Sweigart

Dragging the Mouse

Dragging means moving the mouse while holding down one of the mouse buttons. For example, you can move files between folders by dragging the folder icons, or you can move appointments around in a calendar app.

PyAutoGUI provides the pyautogui.dragTo() and pyautogui.dragRel() functions to drag the mouse cursor to a new location or a location relative to its current one. The arguments for dragTo() and dragRel() are the same as moveTo() and moveRel(): the x-coordinate/horizontal movement, the y-coordinate/vertical movement, and an optional duration of time. (OS X does not drag correctly when the mouse moves too quickly, so passing a duration keyword argument is recommended.)

To try these functions, open a graphics-drawing application such as Paint on Windows, Paintbrush on OS X, or GNU Paint on Linux. (If you don’t have a drawing application, you can use the online one at http://sumopaint.com/.) I will use PyAutoGUI to draw in these applications.

With the mouse cursor over the drawing application’s canvas and the Pencil or Brush tool selected, enter the following into a new file editor window and save it as spiralDraw.py:

import pyautogui, time ➊ time.sleep(5) ➋ pyautogui.click() # click to put drawing program in focus distance = 200 while distance > 0: ➌ pyautogui.dragRel(distance, 0, duration=0.2) # move right ➍ distance = distance - 5 ➎ pyautogui.dragRel(0, distance, duration=0.2) # move down ➏ pyautogui.dragRel(-distance, 0, duration=0.2) # move left distance = distance - 5 pyautogui.dragRel(0, -distance, duration=0.2) # move up

When you run this program, there will be a five-second delay ➊ for you to move the mouse cursor over the drawing program’s window with the Pencil or Brush tool selected. Then spiralDraw.py will take control of the mouse and click to put the drawing program in focus ➋. A window is in focus when it has an active blinking cursor, and the actions you take—like typing or, in this case, dragging the mouse—will affect that window. Once the drawing program is in focus, spiralDraw.py draws a square spiral pattern like the one in Figure 18-2.

Figure 18-2. The results from the pyautogui.dragRel() example

The distance variable starts at 200, so on the first iteration of the while loop, the first dragRel() call drags the cursor 200 pixels to the right, taking 0.2 seconds ➌. distance is then decreased to 195 ➍, and the second dragRel() call drags the cursor 195 pixels down ➎. The third dragRel() call drags the cursor –195 horizontally (195 to the left) ➏, distance is decreased to 190, and the last dragRel() call drags the cursor 190 pixels up. On each iteration, the mouse is dragged right, down, left, and up, and distance is slightly smaller than it was in the previous iteration. By looping over this code, you can move the mouse cursor to draw a square spiral.

You could draw this spiral by hand (or rather, by mouse), but you’d have to work slowly to be so precise. PyAutoGUI can do it in a few seconds!

Note

You could have your code draw the image using the pillow module’s drawing functions—see Chapter 17 for more information. But using GUI automation allows you to make use of the advanced drawing tools that graphics programs can provide, such as gradients, different brushes, or the fill bucket.

Scrolling the Mouse

The final PyAutoGUI mouse function is scroll(), which you pass an integer argument for how many units you want to scroll the mouse up or down. The size of a unit varies for each operating system and application, so you’ll have to experiment to see exactly how far it scrolls in your particular situation. The scrolling takes place at the mouse cursor’s current position. Passing a positive integer scrolls up, and passing a negative integer scrolls down. Run the following in IDLE’s interactive shell while the mouse cursor is over the IDLE window:

>>> pyautogui.scroll(200)

You’ll see IDLE briefly scroll upward—and then go back down. The downward scrolling happens because IDLE automatically scrolls down to the bottom after executing an instruction. Enter this code instead:

>>> import pyperclip >>> numbers = '' >>> for i in range(200): numbers = numbers + str(i) + 'n' >>> pyperclip.copy(numbers)

This imports pyperclip and sets up an empty string, numbers. The code then loops through 200 numbers and adds each number to numbers, along with a newline. After pyperclip.copy(numbers), the clipboard will be loaded with 200 lines of numbers. Open a new file editor window and paste the text into it. This will give you a large text window to try scrolling in. Enter the following code into the interactive shell:

>>> import time, pyautogui >>> time.sleep(5); pyautogui.scroll(100)

On the second line, you enter two commands separated by a semicolon, which tells Python to run the commands as if they were on separate lines. The only difference is that the interactive shell won’t prompt you for input between the two instructions. This is important for this example because we want to the call to pyautogui.scroll() to happen automatically after the wait. (Note that while putting two commands on one line can be useful in the interactive shell, you should still have each instruction on a separate line in your programs.)

After pressing ENTER to run the code, you will have five seconds to click the file editor window to put it in focus. Once the pause is over, the pyautogui.scroll() call will cause the file editor window to scroll up after the five-second delay.

Working with the Screen

Your GUI automation programs don’t have to click and type blindly. PyAutoGUI has screenshot features that can create an image file based on the current contents of the screen. These functions can also return a Pillow Image object of the current screen’s appearance. If you’ve been skipping around in this book, you’ll want to read Chapter 17 and install the pillow module before continuing with this section.

On Linux computers, the scrot program needs to be installed to use the screenshot functions in PyAutoGUI. In a Terminal window, run sudo apt-get install scrot to install this program. If you’re on Windows or OS X, skip this step and continue with the section.

Getting a Screenshot

To take screenshots in Python, call the pyautogui.screenshot() function. Enter the following into the interactive shell:

>>> import pyautogui >>> im = pyautogui.screenshot()

The im variable will contain the Image object of the screenshot. You can now call methods on the Image object in the im variable, just like any other Image object. Enter the following into the interactive shell:

>>> im.getpixel((0, 0)) (176, 176, 175) >>> im.getpixel((50, 200)) (130, 135, 144)

Pass getpixel() a tuple of coordinates, like (0, 0) or (50, 200), and it’ll tell you the color of the pixel at those coordinates in your image. The return value from getpixel() is an RGB tuple of three integers for the amount of red, green, and blue in the pixel. (There is no fourth value for alpha, because screenshot images are fully opaque.) This is how your programs can “see” what is currently on the screen.

Analyzing the Screenshot

Say that one of the steps in your GUI automation program is to click a gray button. Before calling the click() method, you could take a screenshot and look at the pixel where the script is about to click. If it’s not the same gray as the gray button, then your program knows something is wrong. Maybe the window moved unexpectedly, or maybe a pop-up dialog has blocked the button. At this point, instead of continuing—and possibly wreaking havoc by clicking the wrong thing—your program can “see” that it isn’t clicking on the right thing and stop itself.

PyAutoGUI’s pixelMatchesColor() function will return True if the pixel at the given x- and y-coordinates on the screen matches the given color. The first and second arguments are integers for the x- and y-coordinates, and the third argument is a tuple of three integers for the RGB color the screen pixel must match. Enter the following into the interactive shell:

>>> import pyautogui >>> im = pyautogui.screenshot() ➊ >>> im.getpixel((50, 200)) (130, 135, 144) ➋ >>> pyautogui.pixelMatchesColor(50, 200, (130, 135, 144)) True ➌ >>> pyautogui.pixelMatchesColor(50, 200, (255, 135, 144)) False

&n
bsp; After taking a screenshot and using getpixel() to get an RGB tuple for the color of a pixel at specific coordinates ➊, pass the same coordinates and RGB tuple to pixelMatchesColor() ➋, which should return True. Then change a value in the RGB tuple and call pixelMatchesColor() again for the same coordinates ➌. This should return false. This method can be useful to call whenever your GUI automation programs are about to call click(). Note that the color at the given coordinates must exactly match. If it is even slightly different—for example, (255, 255, 254) instead of (255, 255, 255)—then pixelMatchesColor() will return False.

Project: Extending the mouseNow Program

You could extend the mouseNow.py project from earlier in this chapter so that it not only gives the x- and y-coordinates of the mouse cursor’s current position but also gives the RGB color of the pixel under the cursor. Modify the code inside the while loop of mouseNow.py to look like this:

#! python3 # mouseNow.py - Displays the mouse cursor's current position. --snip-- positionStr = 'X: ' + str(x).rjust(4) + ' Y: ' + str(y).rjust(4) pixelColor = pyautogui.screenshot().getpixel((x, y)) positionStr += ' RGB: (' + str(pixelColor[0]).rjust(3) positionStr += ', ' + str(pixelColor[1]).rjust(3) positionStr += ', ' + str(pixelColor[2]).rjust(3) + ')' print(positionStr, end='') --snip--

Now, when you run mouseNow.py, the output will include the RGB color value of the pixel under the mouse cursor.

Press Ctrl-C to quit. X: 406 Y: 17 RGB: (161, 50, 50)

This information, along with the pixelMatchesColor() function, should make it easy to add pixel color checks to your GUI automation scripts.

Image Recognition

But what if you do not know beforehand where PyAutoGUI should click? You can use image recognition instead. Give PyAutoGUI an image of what you want to click and let it figure out the coordinates.

For example, if you have previously taken a screenshot to capture the image of a Submit button in submit.png, the locateOnScreen() function will return the coordinates where that image is found. To see how locateOnScreen() works, try taking a screenshot of a small area on your screen; then save the image and enter the following into the interactive shell, replacing 'submit. png' with the filename of your screenshot:

>>> import pyautogui >>> pyautogui.locateOnScreen('submit.png') (643, 745, 70, 29)

The four-integer tuple that locateOnScreen() returns has the x-coordinate of the left edge, the y-coordinate of the top edge, the width, and the height for the first place on the screen the image was found. If you’re trying this on your computer with your own screenshot, your return value will be different from the one shown here.

If the image cannot be found on the screen, locateOnScreen() will return None. Note that the image on the screen must match the provided image perfectly in order to be recognized. If the image is even a pixel off, locateOnScreen() will return None.

If the image can be found in several places on the screen, locateAllOnScreen() will return a Generator object, which can be passed to list() to return a list of four-integer tuples. There will be one four-integer tuple for each location where the image is found on the screen. Continue the interactive shell example by entering the following (and replacing 'submit.png' with your own image filename):

>>> list(pyautogui.locateAllOnScreen('submit.png')) [(643, 745, 70, 29), (1007, 801, 70, 29)]

Each of the four-integer tuples represents an area on the screen. If your image is only found in one area, then using list() and locateAllOnScreen() just returns a list containing one tuple.

Once you have the four-integer tuple for the area on the screen where your image was found, you can click the center of this area by passing the tuple to the center() function to return x- and y-coordinates of the area’s center. Enter the following into the interactive shell, replacing the arguments with your own filename, four-integer tuple, and coordinate pair:

>>> pyautogui.locateOnScreen('submit.png') (643, 745, 70, 29) >>> pyautogui.center((643, 745, 70, 29)) (678, 759) >>> pyautogui.click((678, 759))

Once you have center coordinates from center(), passing the coordinates to click() should click the center of the area on the screen that matches the image you passed to locateOnScreen().

Controlling the Keyboard

PyAutoGUI also has functions for sending virtual keypresses to your computer, which enables you to fill out forms or enter text into applications.

Sending a String from the Keyboard

The pyautogui.typewrite() function sends virtual keypresses to the computer. What these keypresses do depends on what window and text field have focus. You may want to first send a mouse click to the text field you want in order to ensure that it has focus.

As a simple example, let’s use Python to automatically type the words Hello world! into a file editor window. First, open a new file editor window and position it in the upper-left corner of your screen so that PyAutoGUI will click in the right place to bring it into focus. Next, enter the following into the interactive shell:

>>> pyautogui.click(100, 100); pyautogui.typewrite('Hello world!')

Notice how placing two commands on the same line, separated by a semicolon, keeps the interactive shell from prompting you for input between running the two instructions. This prevents you from accidentally bringing a new window into focus between the click() and typewrite() calls, which would mess up the example.

Python will first send a virtual mouse click to the coordinates (100, 100), which should click the file editor window and put it in focus. The typewrite() call will send the text Hello world! to the window, making it look like Figure 18-3. You now have code that can type for you!

Figure 18-3. Using PyAutogGUI to click the file editor window and type Hello world! into it

By default, the typewrite() function will type the full string instantly. However, you can pass an optional second argument to add a short pause between each character. This second argument is an integer or float value of the number of seconds to pause. For example, pyautogui.typewrite('Hello world!', 0.25) will wait a quarter-second after typing H, another quarter-second after e, and so on. This gradual typewriter effect may be useful for slower applications that can’t process keystrokes fast enough to keep up with PyAutoGUI.

For characters such as A or !, PyAutoGUI will automatically simulate holding down the SHIFT key as well.

Key Names

Not all keys are easy to represent with single text characters. For example, how do you represent SHIFT or the left arrow key as a single character? In PyAutoGUI, these keyboard keys are represented by short string values instead: 'esc' for the ESC key or 'enter' for the ENTER key.

Instead of a single string argument, a list of these keyboard key strings can be passed to typewrite(). For example, the following call presses the A key, then the B key, then the left arrow key twice, and finally the X and Y keys:

>>> pyautogui.typewrite(['a', 'b', 'left', 'left', 'X', 'Y'])

Because pressing the left arrow key moves the keyboard cursor, this will output XYab. Table 18-1 lists the PyAutoGUI keyboard key strings that you can pass to typewrite() to simulate pressing any combination of keys.

You can also examine the pyautogui.KEYBOARD_KEYS list to see all possible keyboard key strings that PyAutoGUI will accept. The 'shift' string refers to the left SHIFT key and is equivalent to 'shiftleft'. The same applies for 'ctrl', 'alt', and 'win' strings; they all refer to the left-side key.

Table 18-1. PyKeyboard Attributes

Keyboard key string

Meaning

'a', 'b', 'c', 'A', 'B', 'C', '1', '2', '3', '!', '@', '#', and so on

The keys for single characters

'enter' (or 'return' or 'n')

The ENTER key

'esc'

The ESC key

'shiftleft', 'shiftright'

The left and right SHIFT keys

'altleft', 'altright'

The left and right ALT keys

'ctrlleft', 'ctrlright'

The left and right CTRL k
eys

'tab' (or 't')

The TAB key

'backspace', 'delete'

The BACKSPACE and DELETE keys

'pageup', 'pagedown'

The PAGE UP and PAGE DOWN keys

'home', 'end'

The HOME and END keys

'up', 'down', 'left', 'right'

The up, down, left, and right arrow keys

'f1', 'f2', 'f3', and so on

The F1 to F12 keys

'volumemute', 'volumedown', 'volumeup'

The mute, volume down, and volume up keys (some keyboards do not have these keys, but your operating system will still be able to understand these simulated keypresses)

'pause'

The PAUSE key

'capslock', 'numlock', 'scrolllock'

The CAPS LOCK, NUM LOCK, and SCROLL LOCK keys

'insert'

The INS or INSERT key

'printscreen'

The PRTSC or PRINT SCREEN key

'winleft', 'winright'

The left and right WIN keys (on Windows)

'command'

‹ Prev Next ›