
Automate the Boring Stuff with Python


by Al Sweigart


  Provider

  IMAP server domain name

  Gmail

  imap.gmail.com

  Outlook.com/Hotmail.com

  imap-mail.outlook.com

  Yahoo Mail

  imap.mail.yahoo.com

  AT&T

  imap.mail.att.net

  Comcast

  imap.comcast.net

  Verizon

  incoming.verizon.net

  Once you have the domain name of the IMAP server, call the imapclient.IMAPClient() function to create an IMAPClient object. Most email providers require SSL encryption, so pass the ssl=True keyword argument. Enter the following into the interactive shell (using your provider’s domain name):

  >>> import imapclient
  >>> imapObj = imapclient.IMAPClient('imap.gmail.com', ssl=True)

  In all of the interactive shell examples in the following sections, the imapObj variable will contain an IMAPClient object returned from the imapclient.IMAPClient() function. In this context, a client is the object that connects to the server.
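  If your program has to handle accounts at more than one provider, you could look up the server from the address itself. The following is a minimal sketch, not part of imapclient: the IMAP_SERVERS dictionary, the address domains it maps (such as att.net for AT&T), and the imap_server_for() and connect() helper names are all assumptions for illustration.

```python
# Hypothetical mapping from an address's domain to its provider's IMAP server,
# built from the server list above. The address domains are assumptions.
IMAP_SERVERS = {
    'gmail.com': 'imap.gmail.com',
    'outlook.com': 'imap-mail.outlook.com',
    'hotmail.com': 'imap-mail.outlook.com',
    'yahoo.com': 'imap.mail.yahoo.com',
    'att.net': 'imap.mail.att.net',
    'comcast.net': 'imap.comcast.net',
    'verizon.net': 'incoming.verizon.net',
}


def imap_server_for(address):
    """Return the IMAP server domain for an email address (KeyError if unknown)."""
    domain = address.rsplit('@', 1)[-1].lower()
    return IMAP_SERVERS[domain]


def connect(address):
    """Open an SSL connection to the IMAP server for this address (hypothetical helper)."""
    import imapclient  # third-party; imported here so the lookup works without it
    return imapclient.IMAPClient(imap_server_for(address), ssl=True)
```

  With a real account, imapObj = connect('my_email_address@gmail.com') would do the same thing as the shell example above.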

  Logging in to the IMAP Server

  Once you have an IMAPClient object, call its login() method, passing in the username (this is usually your email address) and password as strings.

  >>> imapObj.login('my_email_address@gmail.com', 'MY_SECRET_PASSWORD')
  'my_email_address@gmail.com Jane Doe authenticated (Success)'

  Warning

  Remember, never write a password directly into your code! Instead, design your program to accept the password returned from input().

  If the IMAP server rejects this username/password combination, Python will raise an imaplib.IMAP4.error exception. For Gmail accounts, you may need to use an application-specific password; for more details, see Gmail’s Application-Specific Passwords.
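  One way to follow the warning above is to prompt for the password each time the program runs. The sketch below swaps in the standard library's getpass.getpass(), which behaves like input() but does not echo what you type. The login_interactively() name is hypothetical, and its ask parameter exists only so you can try the function without a real server. It catches imaplib.IMAP4.error, assuming (as in recent imapclient versions) that imapclient's login errors derive from it.

```python
import getpass
import imaplib


def login_interactively(imap_obj, username, ask=getpass.getpass):
    """Prompt for a password and log in. Return True on success, False if rejected."""
    password = ask('Password for %s: ' % username)
    try:
        imap_obj.login(username, password)
    except imaplib.IMAP4.error as err:
        # Assumption: imapclient's login failures are subclasses of imaplib.IMAP4.error.
        print('Login failed:', err)
        return False
    return True
```

  With a real connection you would call login_interactively(imapObj, 'my_email_address@gmail.com') and type the password at the prompt.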

  Searching for Email

  Once you’re logged on, actually retrieving an email that you’re interested in is a two-step process. First, you must select a folder you want to search through. Then, you must call the IMAPClient object’s search() method, passing in a string of IMAP search keywords.

  Selecting a Folder

  Almost every account has an INBOX folder by default, but you can also get a list of folders by calling the IMAPClient object’s list_folders() method. This returns a list of tuples. Each tuple contains information about a single folder. Continue the interactive shell example by entering the following:

  >>> import pprint
  >>> pprint.pprint(imapObj.list_folders())
  [(('\\HasNoChildren',), '/', 'Drafts'),
   (('\\HasNoChildren',), '/', 'Filler'),
   (('\\HasNoChildren',), '/', 'INBOX'),
   (('\\HasNoChildren',), '/', 'Sent'),
  --snip--
   (('\\HasNoChildren', '\\Flagged'), '/', '[Gmail]/Starred'),
   (('\\HasNoChildren', '\\Trash'), '/', '[Gmail]/Trash')]

  This is what your output might look like if you have a Gmail account. (Gmail calls its folders labels, but they work the same way as folders.) Taking (('\\HasNoChildren',), '/', 'INBOX') as an example, the three values in each tuple are as follows:

  A tuple of the folder’s flags. (Exactly what these flags represent is beyond the scope of this book, and you can safely ignore this field.)

  The delimiter used in the name string to separate parent folders and subfolders.

  The full name of the folder.
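  Because every tuple has the same three-part shape, pulling out just the folder names takes only a loop. This is a sketch with hypothetical helper names (folder_names() and subfolders()); it assumes the tuples look like the output above, and it decodes the delimiter in case your imapclient version returns it as bytes.

```python
def folder_names(folders):
    """Return only the full folder names from list_folders()-style tuples."""
    return [name for flags, delimiter, name in folders]


def subfolders(folders, parent):
    """Return the names of folders directly inside parent, using each tuple's delimiter."""
    result = []
    for flags, delimiter, name in folders:
        if delimiter is None:  # flat namespace: no parent/child structure
            continue
        if isinstance(delimiter, bytes):
            delimiter = delimiter.decode()  # assumption: some versions return bytes
        prefix = parent + delimiter
        rest = name[len(prefix):]
        # Keep direct children only, not deeper descendants.
        if name.startswith(prefix) and delimiter not in rest:
            result.append(name)
    return result
```

  For example, subfolders(imapObj.list_folders(), '[Gmail]') would list Gmail's special folders such as '[Gmail]/Trash'.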

  To select a folder to search through, pass the folder’s name as a string into the IMAPClient object’s select_folder() method.

  >>> imapObj.select_folder('INBOX', readonly=True)

  You can ignore select_folder()’s return value. If the selected folder does not exist, Python will raise an imaplib.IMAP4.error exception.

  The readonly=True keyword argument prevents you from accidentally making changes or deletions to any of the emails in this folder during the subsequent method calls. Unless you want to delete emails, it’s a good idea to always set readonly to True.

  Performing the Search

  With a folder selected, you can now search for emails with the IMAPClient object’s search() method. The argument to search() is a list of strings, each formatted as one of IMAP’s search keys. Table 16-3 describes the various search keys.

  Table 16-3. IMAP Search Keys

  Search key

  Meaning

  'ALL'

  Returns all messages in the folder. You may run into imaplib size limits if you request all the messages in a large folder. See Size Limits.

  'BEFORE date', 'ON date', 'SINCE date'

  These three search keys return, respectively, messages that were received by the IMAP server before, on, or after the given date. The date must be formatted like 05-Jul-2015. Also, while 'SINCE 05-Jul-2015' will match messages on and after July 5, 'BEFORE 05-Jul-2015' will match only messages before July 5 but not on July 5 itself.

  'SUBJECT string', 'BODY string', 'TEXT string'

  Returns messages where string is found in the subject, body, or either, respectively. If string has spaces in it, then enclose it with double quotes: 'TEXT "search with spaces"'.

  'FROM string', 'TO string', 'CC string', 'BCC string'

  Returns all messages where string is found in the “from” email address, “to” addresses, “cc” (carbon copy) addresses, or “bcc” (blind carbon copy) addresses, respectively. If there are multiple email addresses in string, then separate them with spaces and enclose them all with double quotes: 'CC "firstcc@example.com secondcc@example.com"'.

  'SEEN', 'UNSEEN'

  Returns all messages with and without the Seen flag, respectively. An email obtains the Seen flag if it has been accessed with a fetch() method call (described later) or if it is clicked when you’re checking your email in an email program or web browser. It’s more common to say the email has been “read” rather than “seen,” but they mean the same thing.

  'ANSWERED', 'UNANSWERED'

  Returns all messages with and without the Answered flag, respectively. A message obtains the Answered flag when it is replied to.

  'DELETED', 'UNDELETED'

  Returns all messages with and without the Deleted flag, respectively. Email messages deleted with the delete_messages() method are given the Deleted flag but are not permanently deleted until the expunge() method is called (see Deleting Emails). Note that some email providers, such as Gmail, automatically expunge emails.

  'DRAFT', 'UNDRAFT'

  Returns all messages with and without the Draft flag, respectively. Draft messages are usually kept in a separate Drafts folder rather than in the INBOX folder.

  'FLAGGED', 'UNFLAGGED'

  Returns all messages with and without the Flagged flag, respectively. This flag is usually used to mark email messages as “Important” or “Urgent.”

  'LARGER N', 'SMALLER N'

  Returns all messages larger or smaller than N bytes, respectively.

  'NOT search-key'

  Returns the messages that search-key would not have returned.

  'OR search-key1 search-key2'

  Returns the messages that match either the first or second search-key.

  Note that some IMAP servers may have slightly different implementations for how they handle their flags and search keys. It may require some experimentation in the interactive shell to see exactly how they behave.

  You can pass multiple IMAP search key strings in the list argument to the search() method. The messages returned are the ones that match all the search keys. If you want to match any of the search keys, use the OR search key. For the NOT and OR search keys, one and two complete search keys follow the NOT and OR, respectively.
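  Search keys are plain strings, so you can build them with ordinary string formatting. The helpers below (imap_date(), quoted(), since(), before(), not_key(), and or_key()) are hypothetical names, not imapclient methods; they simply produce strings like the ones in Table 16-3. Month names are spelled out rather than taken from strftime('%b'), which can vary with the computer's locale. Note that quoted() does not escape double quotes inside the text.

```python
import datetime

# IMAP dates use English month abbreviations, whatever the local language is.
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def imap_date(day):
    """Format a date the way the search keys expect, e.g. 05-Jul-2015."""
    return '%02d-%s-%d' % (day.day, MONTHS[day.month - 1], day.year)


def quoted(text):
    """Wrap text containing spaces in double quotes (embedded quotes are not escaped)."""
    return '"%s"' % text if ' ' in text else text


def since(day):
    return 'SINCE ' + imap_date(day)


def before(day):
    return 'BEFORE ' + imap_date(day)


def not_key(key):
    """NOT takes exactly one complete search key."""
    return 'NOT ' + key


def or_key(key1, key2):
    """OR takes exactly two complete search keys."""
    return 'OR %s %s' % (key1, key2)


# Unread messages from January 2015 (SINCE is inclusive, BEFORE is exclusive).
JANUARY_2015_UNREAD = [since(datetime.date(2015, 1, 1)),
                       before(datetime.date(2015, 2, 1)),
                       'UNSEEN']
```

  You would then pass such a list straight to search(), for example imapObj.search(JANUARY_2015_UNREAD).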

  Here are some example search() method calls along with their meanings:

  imapObj.search(['ALL']). Returns every message in the currently selected folder.

  imapObj.search(['ON 05-Jul-2015']). Returns every message sent on July 5, 2015.

  imapObj.search(['SINCE 01-Jan-2015', 'BEFORE 01-Feb-2015', 'UNSEEN']). Returns every message sent in January 2015 that is unread. (Note that this means on and after January 1 and up to but not including February 1.)

  imapObj.search(['SINCE 01-Jan-2015', 'FROM alice@example.com']). Returns every message from alice@example.com sent since the start of 2015.

  imapObj.search(['SINCE 01-Jan-2015', 'NOT FROM alice@example.com']). Returns every message sent from everyone except alice@example.com since the start of 2015.

  imapObj.search(['OR FROM alice@example.com FROM bob@example.com']). Returns every message ever sent from alice@example.com or bob@example.com.

  imapObj.search(['FROM alice@example.com', 'FROM bob@example.com']). Trick example! This search will never return any messages, because messages must match all search keywords. Since there can be only one “from” address, it is impossible for a message to be from both alice@example.com and bob@example.com.

  The search() method doesn’t return the emails themselves but rather unique IDs (UIDs) for the emails, as integer values. You can then pass these UIDs to the fetch() method to obtain the email content.

  Continue the interactive shell example by entering the following:

  >>> UIDs = imapObj.search(['SINCE 05-Jul-2015'])
  >>> UIDs
  [40032, 40033, 40034, 40035, 40036, 40037, 40038, 40039, 40040, 40041]

  Here, the list of message IDs (for messages received July 5 onward) returned by search() is stored in UIDs. The list of UIDs returned on your computer will be different from the ones shown here; they are unique to a particular email account. When you later pass UIDs to other function calls, use the UID values you received, not the ones printed in this book’s examples.

  Size Limits

  If your search matches a large number of email messages, Python might raise an exception that says imaplib.error: got more than 10000 bytes. When this happens, you will have to disconnect and reconnect to the IMAP server and try again.

  This limit is in place to prevent your Python programs from eating up too much memory. Unfortunately, the default size limit is often too small. You can change this limit from 10,000 bytes to 10,000,000 bytes by running this code:

  >>> import imaplib
  >>> imaplib._MAXLINE = 10000000

  This should prevent this error message from coming up again. You may want to make these two lines part of every IMAP program you write.

  Using IMAPClient’s gmail_search() Method

  If you are logging in to the imap.gmail.com server to access a Gmail account, the IMAPClient object provides an extra search function that mimics the search bar at the top of the Gmail web page, as highlighted in Figure 16-1.

  Figure 16-1. The search bar at the top of the Gmail web page

  Instead of searching with IMAP search keys, you can use Gmail’s more sophisticated search engine. Gmail does a good job of matching closely related words (for example, a search for driving will also match drive and drove) and sorting the search results by most significant matches. You can also use Gmail’s advanced search operators (see http://nostarch.com/automatestuff/ for more information). If you are logging in to a Gmail account, pass the search terms to the gmail_search() method instead of the search() method, like in the following interactive shell example:

  >>> UIDs = imapObj.gmail_search('meaning of life')
  >>> UIDs
  [42]

  Ah, yes—there’s that email with the meaning of life! I was looking for that.

  Fetching an Email and Marking It As Read

  Once you have a list of UIDs, you can call the IMAPClient object’s fetch() method to get the actual email content.

  The list of UIDs will be fetch()’s first argument. The second argument should be the list ['BODY[]'], which tells fetch() to download all the body content for the emails specified in your UID list.

  Let’s continue our interactive shell example.

  >>> rawMessages = imapObj.fetch(UIDs, ['BODY[]'])
  >>> import pprint
  >>> pprint.pprint(rawMessages)
  {40040: {'BODY[]': 'Delivered-To: my_email_address@gmail.com\r\n'
                     'Received: by 10.76.71.167 with SMTP id '
  --snip--
                     '\r\n'
                     '------=_Part_6000970_707736290.1404819487066--\r\n',
           'SEQ': 5430}}

  Import pprint and pass the return value from fetch(), stored in the variable rawMessages, to pprint.pprint() to “pretty print” it, and you’ll see that this return value is a nested dictionary of messages with UIDs as the keys. Each message is stored as a dictionary with two keys: 'BODY[]' and 'SEQ'. The 'BODY[]' key maps to the actual body of the email. The 'SEQ' key is for a sequence number, which has a similar role to the UID. You can safely ignore it.

  As you can see, the message content in the 'BODY[]' key is pretty unintelligible. It’s in a format called RFC 822, which is designed for IMAP servers to read. But you don’t need to understand the RFC 822 format; later in this chapter, the pyzmail module will make sense of it for you.
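  To hand a raw message to a parser, you first need to dig its body out of the nested dictionary. The hypothetical raw_body() helper below does that. It tries both 'BODY[]' and b'BODY[]' as keys, on the assumption that newer imapclient versions may return bytes keys and values rather than the strings shown above.

```python
def raw_body(raw_messages, uid):
    """Return the RFC 822 message for uid from a fetch() result.

    Assumption: depending on the imapclient version, the inner keys may be
    str ('BODY[]') or bytes (b'BODY[]'), so both spellings are tried.
    """
    message_data = raw_messages[uid]
    for key in ('BODY[]', b'BODY[]'):
        if key in message_data:
            return message_data[key]
    raise KeyError('no BODY[] entry for UID %r' % uid)
```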

  When you selected a folder to search through, you called select_folder() with the readonly=True keyword argument. Doing this will prevent you from accidentally deleting an email—but it also means that emails will not get marked as read if you fetch them with the fetch() method. If you do want emails to be marked as read when you fetch them, you will need to pass readonly=False to select_folder(). If the selected folder is already in readonly mode, you can reselect the current folder with another call to select_folder(), this time with the readonly=False keyword argument:

  >>> imapObj.select_folder('INBOX', readonly=False)
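  Putting the select, search, and fetch steps together, a helper that fetches messages and lets the server mark them as read might look like the following sketch. The name fetch_and_mark_read() is hypothetical. It only relies on the select_folder(), search(), and fetch() methods described in this chapter, so you can also try it with a stand-in object instead of a live server.

```python
def fetch_and_mark_read(imap_obj, criteria, folder='INBOX'):
    """Search folder with criteria and fetch the matches, marking them as read.

    Selecting with readonly=False is what allows fetching 'BODY[]' to set
    the Seen flag; with readonly=True the messages would stay unread.
    """
    imap_obj.select_folder(folder, readonly=False)
    uids = imap_obj.search(criteria)
    if not uids:
        return {}  # nothing matched, so skip the fetch() call
    return imap_obj.fetch(uids, ['BODY[]'])
```

  For instance, fetch_and_mark_read(imapObj, ['UNSEEN']) would download every unread message in INBOX and mark each one as read.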

  Getting Email Addresses from a Raw Message

  The raw messages returned from the fetch() method still aren’t very useful to people who just want to read their email. The pyzmail module parses these raw messages and returns them as PyzMessage objects, which make the subject, body, “To” field, “From” field, and other sections of the email easily accessible to your Python code.

  Continue the interactive shell example with the following (using UIDs from your own email account, not the ones shown here):

  >>> import pyzmail
  >>> message = pyzmail.PyzMessage.factory(rawMessages[40041]['BODY[]'])

  First, import pyzmail. Then, to create a PyzMessage object of an email, call the pyzmail.PyzMessage.factory() function and pass it the 'BODY[]' section of the raw message. Store the result in message. Now message contains a PyzMessage object, which has several methods that make it easy to get the email’s subject line, as well as all sender and recipient addresses. The get_subject() method returns the subject as a simple string value. The get_addresses() method returns a list of addresses for the field you pass it. For example, the method calls might look like this:

  >>> message.get_subject()
  'Hello!'
  >>> message.get_addresses('from')
  [('Edward Snowden', 'esnowden@nsa.gov')]
  >>> message.get_addresses('to')
  [('Jane Doe', 'my_email_address@gmail.com')]
  >>> message.get_addresses('cc')
  []
  >>> message.get_addresses('bcc')
  []

  Notice that the argument for get_addresses() is 'from', 'to', 'cc', or 'bcc'. The return value of get_addresses() is a list of tuples. Each tuple contains two strings: The first is the name associated with the email address, and the second is the email address itself. If there are no addresses in the requested field, get_addresses() returns a blank list. Here, the 'cc' carbon copy and 'bcc' blind carbon copy fields both contained no addresses and so returned empty lists.
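  pyzmail is a third-party module. If it isn't available, the standard library's email package can pull out the same information. The sketch below is a swapped-in alternative rather than what this chapter uses; subject_and_addresses() is a hypothetical name, and its return value only mimics the shape of get_subject() and get_addresses().

```python
import email
import email.policy
import email.utils


def subject_and_addresses(raw, field):
    """Parse an RFC 822 message (str or bytes) and return (subject, addresses).

    addresses is a list of (name, address) tuples for field ('from', 'to',
    'cc' or 'bcc'), or an empty list if that header is absent.
    """
    if isinstance(raw, bytes):
        msg = email.message_from_bytes(raw, policy=email.policy.default)
    else:
        msg = email.message_from_string(raw, policy=email.policy.default)
    subject = msg['subject']
    if subject is not None:
        subject = str(subject)  # the default policy decodes encoded-word subjects
    headers = [str(value) for value in msg.get_all(field, [])]
    return subject, email.utils.getaddresses(headers)
```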

  Getting the Body from a Raw Message

  Emails can be sent as plaintext, HTML, or both. Plaintext emails contain only text, while HTML emails can have colors, fonts, images, and other features that make the email message look like a small web page. If an email is only plaintext, its PyzMessage object will have its html_part attribute set to None. Likewise, if an email is only HTML, its PyzMessage object will have its text_part attribute set to None.

  Otherwise, the text_part or html_part value will have a get_payload() method that returns the email’s body as a value of the bytes data type. (The bytes data type is beyond the scope of this book.) But this still isn’t a string value that we can use. Ugh! The last step is to call the decode() method on the bytes value returned by get_payload(). The decode() method takes one argument: the message’s character encoding, stored in the text_part.charset or html_part.charset attribute. This, finally, will return the string of the email’s body.

  Continue the interactive shell example by entering the following:

  ➊ >>> message.text_part != None
  True
  >>> message.text_part.get_payload().decode(message.text_part.charset)
  ➋ 'So long, and thanks for all the fish!\r\n\r\n-Al\r\n'
  ➌ >>> message.html_part != None
  True
  ➍ >>> message.html_part.get_payload().decode(message.html_part.charset)
  '<div dir="ltr"><div>So long, and thanks for all the fish!<br><br></div>-Al<br></div>\r\n'

  The email we’re working with has both plaintext and HTML content, so the PyzMessage object stored in message has text_part and html_part attributes not equal to None ➊ ➌. Calling get_payload() on the message’s text_part and then calling decode() on the bytes value returns a string of the text version of the email ➋. Using get_payload() and decode() with the message’s html_part returns a string of the HTML version of the email ➍.
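  The standard library can also do this decoding step for you. In the sketch below (again a stdlib alternative to pyzmail, with the hypothetical name text_and_html()), get_body() picks the plaintext or HTML part, and get_content() returns it already decoded using the part's charset, so there is no separate decode() call.

```python
import email
import email.policy


def text_and_html(raw):
    """Return (plaintext_body, html_body) from an RFC 822 message; either may be None."""
    if isinstance(raw, bytes):
        msg = email.message_from_bytes(raw, policy=email.policy.default)
    else:
        msg = email.message_from_string(raw, policy=email.policy.default)
    text_part = msg.get_body(preferencelist=('plain',))  # None if no text/plain part
    html_part = msg.get_body(preferencelist=('html',))   # None if no text/html part
    text = text_part.get_content() if text_part is not None else None
    html = html_part.get_content() if html_part is not None else None
    return text, html
```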

  Deleting Emails

  To delete emails, pass a list of message UIDs to the IMAPClient object’s delete_messages() method. This marks the emails with the Deleted flag. Calling the expunge() method will permanently delete all emails with the Deleted flag in the currently selected folder. Consider the following interactive shell example:

  ➊ >>> imapObj.select_folder('INBOX', readonly=False)
  ➋ >>> UIDs = imapObj.search(['ON 09-Jul-2015'])
  >>> UIDs
  [40066]
  >>> imapObj.delete_messages(UIDs)
  ➌ {40066: ('\\Seen', '\\Deleted')}
  >
