Figure 12-2 A Word document that has been modified and quick saved
First off, you need to understand why Quick Save exists. When a document gets big, saving it can be very time-consuming, tying up resources and basically slowing down the whole show. With Office’s Auto Save feature turned on, those saves can become distracting while you are working. So Quick Save was created to save documents quickly and painlessly with minimum disruption to the user. It does this by not rewriting the body of the document; instead, it appends the changes, along with information about where they belong, to the end of the file. Once a certain file size is exceeded, the save goes back, incorporates all the changes into the main body of the document, and shrinks the file size back down. From a forensic investigation standpoint, this can be a great thing because data that a user thinks is deleted actually still exists in the document.
Let’s go back to our example. If you open the file in a binary editor (I recommend XEmacs for non-forensics work), you can look for information that may have been “deleted” but not removed from the file. As you can see in Figure 12-3, a simple search of the document reveals information that appears as though it were still included in the document.
To confirm, we open Word and perform an undo to see what comes back. As you can see in Figure 12-4, the data deleted from the document in Figure 12-3 has been recovered.
This technique will work through multiple changes to the document and can actually go pretty far back in the revision history. You can typically find the data you are looking for using a keyword text search on the document with a tool such as EnCase or a binary editor, and then go back into Word to reconstruct the document.
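If you would rather script the keyword search than eyeball a binary editor, the idea can be sketched in a few lines of Python. This is only a minimal sketch: the function name `find_keyword` is made up for illustration, it assumes the keyword is plain ASCII, and it checks just two encodings (single-byte ASCII and UTF-16LE) because Office files commonly hold text in both forms.

```python
def find_keyword(data: bytes, keyword: str):
    """Return sorted (offset, encoding) pairs for every hit of keyword in data.

    Assumption: keyword contains only ASCII characters. The text is searched
    both as single-byte ASCII and as UTF-16LE (two bytes per character).
    """
    hits = []
    for encoding in ("ascii", "utf-16-le"):
        needle = keyword.encode(encoding)
        start = data.find(needle)
        while start != -1:
            hits.append((start, encoding))
            # Resume one byte further on so overlapping hits are not missed.
            start = data.find(needle, start + 1)
    return sorted(hits)
```

To use it, read a working copy of the document (never the original evidence) with `open(path, "rb").read()` and pass the bytes in. The offsets it returns tell you where to jump in your binary editor; you still go back into Word to reconstruct the document.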
Figure 12-3 Locating deleted data in a Word document
Figure 12-4 The data after a single undo
Word 97 MAC Address
If you are lucky enough to find a document that was created in Word 97, you can actually get the MAC address of the machine on which the document was created. A MAC address is like the fingerprint of a network card and is typically a number formatted like so: 00-09-5B-E6-24-5D. In the Word document, however, it’s formatted a bit differently. Take a look at Figure 12-5, which shows the MAC address in the document itself.
To find the MAC address in a document, open the file in a binary editor and do a search for PID. This will bring up the entry.
Let’s look at the PID_GUID for the Melissa virus document:
PID_GUID {572858EA-36DD-11D2-885F-004033E0078E}
Figure 12-5 The MAC address in a Word document
If you look at the last chunk of data, 004033E0078E, and break it down, you get 00-40-33-E0-07-8E; this is clearly the MAC address of the machine on which the document was created. It must be stated, however, that this number can be modified and is nonauthoritative.
You can sanity-check a MAC address by looking at its first three pairs of hex digits; this is the vendor ID. You can use any number of Internet database lookup sites to find out which manufacturer that vendor ID is assigned to. If you are certain that you know on what machine a document was created, you can use this information for cross-validation purposes. If the vendor ID and the actual maker of the card do not match, that is a red flag that tampering has occurred.
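The breakdown of the GUID and the vendor ID check can be sketched in Python. A GUID of this kind is a version 1 (time-based) UUID, and the final 12 hex digits are its node field, which Python's standard `uuid` module exposes directly. The function names below are made up for illustration, and the result is only as trustworthy as the GUID itself, which can be modified.

```python
import uuid

def mac_from_guid(guid_text):
    """Return the node field of a version 1 GUID formatted as a MAC address.

    Returns None for any other GUID version, because only version 1
    GUIDs embed the network card address in their last field.
    """
    guid = uuid.UUID(guid_text)  # accepts the surrounding braces
    if guid.version != 1:
        return None
    node = guid.node  # 48-bit integer
    octets = [(node >> shift) & 0xFF for shift in range(40, -1, -8)]
    return "-".join(f"{octet:02X}" for octet in octets)

def vendor_id(mac):
    """Return the first three pairs of a MAC address (the vendor ID)."""
    return mac[:8]
```

Feeding it the Melissa GUID gives back 00-40-33-E0-07-8E, and `vendor_id` trims that to 00-40-33, which is the piece you take to a lookup site.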
When opening an Office document, the program does a couple of very basic file size checks to make sure that nothing has been modified. If the document won’t even open in Office, that should be a red flag that modification of metadata has occurred.
Past Filenames
Older Office (pre-Office 2003) documents actually store every filename under which they have ever been saved in the file. This can be very handy if you are looking for directories to go after or network drives that may have been used, or if you need to subpoena removable media to conduct further investigation. The key to this technique is that the filenames are stored in Unicode instead of straight ASCII, so you need to use an application such as strings.exe from Sysinternals to extract the filenames. Running strings.exe with the -u argument will output only Unicode text strings from the document. Here’s an example of running the strings program on a Word document:
Strings -u tester.doc
Strings v2.1
Copyright (C) 1999-2003 Mark Russinovich
Systems Internals - www.sysinternals.com
…
D:\mystuff\test.doc
…
Times New Roman
Root Entry
C:\draft.doc
As you can see, multiple filenames and paths are stored in the document. You can then use your image to trace back these files, and if they point to network shares, you can use this data as a reason to conduct further discovery during litigation.
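If you don't have strings.exe handy, the same idea can be roughed out in Python. This is a sketch under assumptions, not a reimplementation of the Sysinternals tool: it only recognizes runs of printable ASCII characters stored as UTF-16LE (each character followed by a zero byte), and the names `unicode_strings` and `looks_like_path` as well as the minimum run length of 3 are choices made for illustration.

```python
import re

def unicode_strings(data: bytes, min_len: int = 3):
    """Return runs of at least min_len printable characters stored as UTF-16LE."""
    # One printable ASCII byte followed by a zero byte, repeated min_len+ times.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [match.group().decode("utf-16-le") for match in pattern.finditer(data)]

def looks_like_path(text: str) -> bool:
    """True for strings starting with a drive letter (C:) or a UNC prefix (\\\\)."""
    return re.match(r"^(?:[A-Za-z]:|\\\\)", text) is not None
```

Run `unicode_strings` over the raw bytes of the document, then filter the results with `looks_like_path` to pull out the candidate filenames and network shares from the rest of the noise.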
Working with Office Documents
When you’re working with Office documents, remember to be creative and always look beyond what you see when you open the document. You can pull a wealth of information from these documents if you know where to look for it. In fact, EnCase has built support for reading and searching Unicode text into the latest version to make this type of investigation easier. One caveat, however, is that the data is nonauthoritative by itself. If you base your court case solely upon this data, you are going to have a bad time. Use this information to corroborate evidence you’ve obtained from other sources or to develop new leads that you can follow. That said, a little bit of time with an Office document and a low-level editor can point you in the direction you need to go to investigate your case effectively.
TRACKING WEB USAGE
As an investigator, you will frequently find yourself reconstructing a user’s web activity. Lucky for you, it seems as though everyone who decides to write a forensic tool writes it in a way that reads a browser’s cookies and history. The process of going through the working files and reconstructing activity is actually pretty straightforward, and when properly validated it can be reasonably authoritative. To help you understand what we are going to be looking at, we’ll discuss what kinds of records a web browser keeps that denote user activity.
First, you have to look at what sites a user visited while using the browser. This information can be obtained from the history file, which stores information on every URL a user has loaded, going back for months. Even if a user has tried to cover her tracks by deleting the history, it may still be recoverable and useful in an investigation. Once you have the URLs that she has visited, you need a way to find out what she did while she was there. Conventionally, you can do this using two methods: by looking at the cookies for the site to determine user behavior or by reconstructing the web pages from the temporary Internet files. Let’s look at how to conduct an investigation for the two most popular browsers: Internet Explorer and Firefox/Netscape.
Internet Explorer Forensics
Internet Explorer (IE) has been the default web browser for the Microsoft Windows platform since Windows 95. In fact, later versions of Windows have built IE to interact very closely with the operating system, opening some interesting paths for forensic investigation of activity. Covering your tracks in IE is a nontrivial task. Even if you delete the history using the IE facilities, it can still be recovered because of its close interaction with the OS.
Viewing the History
The history utility in IE, shown in Figure 12-6, creates a convenient audit trail for what a user likes to do on the Internet. It can be used to show whether the user frequents certain types of sites, if she lands on a site inadvertently, and what she is doing when she visits a site. This information is useful in everything from policy violation cases all the way up to criminal activities.
EnCase comes with an EnScript feature that will automatically search the image for IE history and present it in a report format. If you use EnCase, this can greatly speed your investigation, although you should make sure you understand what the script does and how it does it.
Figure 12-6 Internet Explorer’s history utility
Table 12-1 Breakdown of File Entries in Windows XP
Luckily, as long as you know where to look, you can use tons of tools to make this job easy. For the sake of demonstration, we will use a freeware command-line utility from Foundstone called Pasco. While completely devoid of any kind of flash or bells and whistles that other commercial products have, it gets the job done. It takes an index.dat file and converts the data into a tab-delimited format. Once you have that, you can import it into Excel and slice and dice it as you see fit. Then the fun begins. If you do a search for index.dat, you will find about five to ten entries. As you can quickly see from looking at any one of them, several different types of entries are included. Table 12-1 shows a breakdown of those that exist in Windows XP, their location, and what each one does.
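The search for index.dat files can also be scripted. The following is a minimal sketch that assumes you have a working copy of the image mounted read-only at some directory; the function name `find_index_dat` is made up for illustration. The comparison is case-insensitive so that variations such as INDEX.DAT are not missed.

```python
from pathlib import Path

def find_index_dat(root):
    """Return every file named index.dat (any letter case) below root."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.name.lower() == "index.dat"
    )
```

Each path it returns is a candidate to feed to Pasco; the parent directory tells you which kind of index.dat you are looking at.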
If you are investigating an older version of Internet Explorer, here are some directories and file locations to look for that will hold the same information:
• C:\Windows\Cookies\index.dat
• C:\Windows\History\index.dat
• C:\Windows\History\MSHistXXXXXXXXXXXXXXXXXX\index.dat
• C:\Windows\History\History.IE5\index.dat
• C:\Windows\History\History.IE5\MSHistXXXXXXXXXXXXXXXXXX\index.dat
• C:\Windows\Temporary Internet Files\index.dat (only in Internet Explorer 4.x)
• C:\Windows\Temporary Internet Files\Content.IE5\index.dat
• C:\Windows\UserData\index.dat
• C:\Windows\Profiles
• C:\Windows\Profiles
• C:\Windows\Profiles
MSHistXXXXXXXXXXXXXXXXXX\index.dat
• C:\Windows\Profiles
• C:\Windows\Profiles
MSHistXXXXXXXXXXXXXXXXXX\index.dat
• C:\Windows\Profiles
• C:\Windows\Profiles
• C:\Windows\Profiles
Now that you know where to look, let’s examine how these interconnect and how you can use them to trace user activity. The first place you want to go is to the main history to locate what Web sites the user has visited. Here’s a listing of the History.IE5 directory:
As you can see, five different directories start with MSHist01 followed by a string of numbers. Let’s decipher the sequence that MS uses for this structure.
The number 2004062820040629, for example, looks pretty meaningless at first glance. If you break it up a bit, though, a pattern emerges: 2004-06-28 and 2004-06-29. If you look at the created time, this suspicion is verified. This is how you tell what dates the directory holds. For our purposes, let’s try to find an event that occurred on 2004-06-28, so we would use the index.dat in MSHist012004062120040628. You would go into the directory and actually extract the data from the file.
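The naming pattern described above is easy to decode programmatically. Here is a minimal sketch in Python; the function name `mshist_range` is made up for illustration, and it assumes the directory name is exactly the MSHist01 prefix followed by two eight-digit dates in YYYYMMDD form.

```python
import re
from datetime import date

# MSHist01 followed by two YYYYMMDD dates.
_MSHIST = re.compile(r"^MSHist01(\d{4})(\d{2})(\d{2})(\d{4})(\d{2})(\d{2})$")

def mshist_range(dirname):
    """Return (first_date, second_date) encoded in an MSHist01 directory name.

    Returns None when the name does not follow the pattern.
    """
    match = _MSHIST.match(dirname)
    if match is None:
        return None
    y1, m1, d1, y2, m2, d2 = (int(part) for part in match.groups())
    return date(y1, m1, d1), date(y2, m2, d2)
```

Running it across every MSHist01 directory in History.IE5 gives you a quick index of which date ranges are present before you start extracting data.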
C:/Documents and Settings
History File: index.dat
TYPE, URL, MODIFIED TIME, ACCESS TIME, FILENAME, DIRECTORY, HTTP HEADERS
URL, :2004062120040628:
This is one line from the raw output of Pasco. As you can see, several fields are stored in the record. You need to determine what each one represents, as shown in Table 12-2.
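If you prefer a script to Excel, the tab-delimited output can be loaded into Python. Treat this as a sketch under assumptions: it assumes the output contains a header row beginning with TYPE whose tab-separated column names match the fields listed above, with one record per line after it. The function name `read_pasco` is made up for illustration.

```python
import csv

def read_pasco(text):
    """Parse tab-delimited history output into a list of dicts keyed by column name.

    Assumption: a header row starting with "TYPE" precedes the records;
    anything before that row (such as a banner line) is skipped.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("TYPE"):
            header = [name.strip() for name in line.split("\t")]
            reader = csv.reader(lines[index + 1:], delimiter="\t",
                                quoting=csv.QUOTE_NONE)
            # Blank lines come back as empty lists and are dropped.
            return [dict(zip(header, row)) for row in reader if row]
    return []
```

Once the records are dictionaries, filtering by the URL or ACCESS TIME column is a one-line list comprehension, which is handy when the history runs to thousands of entries.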
For those who are unfamiliar with the command line, you can use the following command to dump the history into a text file that you can import into Excel:
Pasco
Hacking Exposed Page 27