Hacking Exposed

by Aaron Philipp

Figure 8-5 B-tree leaf node data showing node slack

However, not all is lost. Although the leaf node entry may be physically overwritten, other instances of the node data may still exist in unallocated space, in index nodes, and in nodes that have been removed from the tree at a higher level. If all the files in a node are deleted because their common parent (directory) has been deleted, it is not unusual to see “pruned” nodes with all of the records intact.

Within each file entry in the B-tree are numerous bit fields, pointers, keys, and data values that include important things like creation and modification dates, file ID numbers, locations for the data blocks that make up the file, and things like an icon’s location and color. Apple’s developer support and documentation are fairly comprehensive regarding these structures. The Apple Developer Tech Note 1150 and the reference “Inside Macintosh: Files” are great places to start digging deeper.

Recovering Deleted Files

If you are interested in recovering deleted files from an HFS or HFS+ file system, you can go about it in two primary ways. The first way is to aggregate the unallocated space and concatenate it, essentially creating one data stream comprising sequential (although not contiguous) unallocated clusters. This is relatively easy to do if you just invert (Boolean NOT) the bitmap that tells you which clusters are allocated and which ones are not. The second method of data recovery involves identifying file system metadata and mapping those entries to unallocated areas of the volume.

Newer versions of the Mac OS allow files to be “shredded,” or physically overwritten, with the rm -P option. If you see this in the .bash_history file, it is safe to assume that the locations referenced by the file at the time of its deletion have been overwritten.
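A quick way to check a shell history for this is sketched below in Python. It is only an illustration: the function name find_overwrite_deletes is made up for this example, and the logic simply flags any line where rm is followed by a short option cluster containing P. It will not catch aliases or scripts that call rm indirectly.

```python
import shlex

def find_overwrite_deletes(history_lines):
    """Return history lines that invoke rm with the -P (overwrite) flag."""
    hits = []
    for line in history_lines:
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Unbalanced quotes and the like: fall back to whitespace splitting.
            tokens = line.split()
        if "rm" not in tokens:
            continue
        for arg in tokens[tokens.index("rm") + 1:]:
            if arg == "--":
                break  # everything after "--" is a filename, not an option
            if arg.startswith("-") and not arg.startswith("--") and "P" in arg:
                hits.append(line.rstrip("\n"))
                break
    return hits
```

Feed it the lines of a history file (opened read-only from your working copy of the evidence) and review each hit by hand; a hit tells you where to look, not what was there.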

Concatenating Unallocated Space

This method gives you everything marked as unallocated. No file system information is directly associated with the data—it’s just a blob of data. This type of data is great if you are trying to recover data such as graphics, text, and other simple data formats. Fragmentation is eliminated, so if a deleted file were fragmented by an active file’s allocation, the data taken together would be contiguous.
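The bitmap inversion can be sketched in a few lines of Python. This is a minimal illustration, not a recovery tool: it assumes the allocation bitmap and the volume have already been read into memory, that one bit represents one allocation block, and that the most significant bit of each byte describes the lowest-numbered block. The function names are invented for the example.

```python
def unallocated_blocks(bitmap: bytes, total_blocks: int):
    """Yield the indices of allocation blocks whose bitmap bit is clear."""
    for block in range(total_blocks):
        byte = bitmap[block // 8]
        mask = 0x80 >> (block % 8)  # assumption: MSB = lowest-numbered block
        if not byte & mask:
            yield block

def concatenate_unallocated(volume: bytes, bitmap: bytes,
                            block_size: int, total_blocks: int) -> bytes:
    """Join every unallocated block, in order, into one data stream."""
    out = bytearray()
    for block in unallocated_blocks(bitmap, total_blocks):
        start = block * block_size
        out += volume[start:start + block_size]
    return bytes(out)
```

On a real image you would read and write in chunks rather than hold the whole volume in memory, but the logic is the same: walk the bitmap, skip the set bits, and copy the rest.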

Large Ranges of Null or Other Wasted Space

The aggregate of the unallocated space often consists of very large ranges of null or other wasted space. This can be a formatting pattern, a null, or some other repeating pattern. Depending on what you are looking for, filtering the blob can have many benefits. First, most grep implementations are line-based, so if you grep a blob of 2GB of null, grep will likely exhaust memory trying to buffer the first line. Because the blob contains no line breaks (just nulls), grep treats it as one enormous line, and memory is needlessly exhausted.

If you are looking for HTML or plaintext, you don’t really need to know about nulls, and a side benefit is that if your term exists as a UTF-16-encoded term, stripping the nulls allows you to find it without the need for a Unicode search term. If you think about RFC822 headers, URLs, e-mail addresses, contact information, log entries, and other simple data types, you realize that you really need to concern yourself only with printable ASCII characters. Filtering the unallocated space (depending on what you are looking for) saves storage, time, effort, memory, and aggravation. This is called “normalizing” the data that you are working with. If the byte 0xFF doesn’t appear in what you are looking for, then why copy and search it?

Normalize Data

You have several friends here to help you—tr is your friend, strings is your friend, and SMART is your friend. Preprocessing (normalizing) the data before searching it for simple data constructs (when done correctly) allows more data to be searched fast using less memory and storage. Why wouldn’t you do this?
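If you want to see what normalizing amounts to, here is a rough Python equivalent of deleting everything except printable ASCII, tabs, and newlines. It is a sketch under that one assumption about which bytes matter; the names normalize and normalize_stream are invented for the example, and your own filter should be driven by what you are actually searching for.

```python
import re

# Anything that is not printable ASCII, a tab, or a newline gets dropped.
NON_PRINTABLE = re.compile(rb"[^\x20-\x7e\t\n]")

def normalize(chunk: bytes) -> bytes:
    """Delete every byte that is not printable ASCII, tab, or newline."""
    return NON_PRINTABLE.sub(b"", chunk)

def normalize_stream(src, dst, chunk_size=1 << 20):
    """Filter a binary stream chunk by chunk so memory use stays flat."""
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(normalize(chunk))
```

Because the filter works byte by byte, chunk boundaries don't matter, and a UTF-16 string such as e, null, v, null, i, null, l, null collapses to plain "evil" that an ordinary ASCII search term will match.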

While this is great for data recovery, you aren’t helping much in a forensic analysis if all you can say is that the data exists in unallocated space. You want to be able to identify file attributes (name, date and time stamps, original location, and other info). For that, you need to use the second method of data recovery.

Scavenging for Unindexed Files and Pruned Nodes

Scavenging allocated leaf nodes for unindexed files and scavenging unallocated space for “pruned” index and leaf nodes is probably the best practical way to identify file system metadata and map those entries to unallocated areas of the volume.

Searching Unallocated Space for File Entries in Leaf Nodes

You could search unallocated space for file entries in leaf nodes. This presumes that you know what you are looking for, as in a filename or attribute data. When you don’t have that, though, you need to search for the places that are likely to have what you want to find: leaf nodes in the B-tree. You could use a search term like this:

\xff\x01\x00

This expression means, “Since I don’t know about the fLink and bLink, match any two double words (8 bytes) that are followed by \xff (leaf node), followed by \x01 (leaf level), followed by \x00 (the high byte of the record count).” The last byte is a safe bet because you will probably never see a node with more than 255 records in it. Of course, it helps if you know what you are looking for and where you are most likely to find it.

The search term \xff\x01\x00 is pretty loose and will likely generate hundreds, if not thousands, of false hits. Since you know that the data you are searching for is node header data, and you know its offset within the node, you can search for hits that exhibit sector boundary alignment. Since all file system constructs are based on a sector, this will reduce your hits to those most likely to be responsive and independent of cluster or node size.
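The alignment filter is easy to sketch in Python. The example below assumes a 512-byte sector and assumes the three signature bytes sit immediately after the 8 bytes of fLink and bLink at the start of a node; the function name candidate_leaf_nodes is made up for illustration. It searches for the signature, backs up 8 bytes, and keeps only the hits that land on a sector boundary.

```python
SIGNATURE = b"\xff\x01\x00"  # leaf kind, leaf level, high byte of record count
LINK_BYTES = 8               # fLink + bLink precede the signature

def candidate_leaf_nodes(blob: bytes, sector_size: int = 512):
    """Return sector-aligned offsets where a leaf node header may begin."""
    hits = []
    pos = blob.find(SIGNATURE)
    while pos != -1:
        node_start = pos - LINK_BYTES
        if node_start >= 0 and node_start % sector_size == 0:
            hits.append(node_start)
        pos = blob.find(SIGNATURE, pos + 1)  # allow overlapping hits
    return hits
```

Every offset returned is still only a candidate. Validate each one against the rest of the node structure before you treat it as a real leaf node.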

Whew. Let’s take a moment and review. We’ve discussed devices, partitions, file systems, trees, and nodes—just scratching the surface of each. It’s all important stuff, but way beyond the scope of this (or any single) chapter. Just like stacking matching bowls, you can see how these things relate to each other, as shown in Figure 8-6.

Figure 8-6 Summary of devices to data

A CLOSER LOOK AT MACINTOSH FILES

Now that we’ve taken a brief look at the file system structure, let’s take a closer look at the files themselves. One of the things that make Macintosh files unique is the resource fork. Nowadays, file systems are more extensible and attributes can often be easily added, but it wasn’t always this way. The Macintosh HFS filing system was one of the first file systems, if not the first, to embrace the concept of a file comprising multiple streams of data. This is commonplace now in NTFS, but the concept’s origins can be traced back to the Macintosh (as can the mouse, windows, task bars, and a great many other computer interface elements we see every day).

Archives

Relatively few major advances have occurred lately in compression technology. Compression algorithms are mature and reasonably well standardized. Years ago, StuffIt was the dominant Macintosh compression technology. With the Mac’s interoperability enhancements, Mac-centric formats such as BinHex have fallen by the wayside in favor of zip, gzip, and tarball formats. This trend seems to be accelerating now that the Mac has more POSIX underpinnings.

Date and Time Stamps

The Mac never really had a Y2K crisis. At worst, it faces a year 2040 crisis. Over the years, the Mac OS and ROMs have used different “time zero” references and have stored date and time stamps in different formats. The original date and time utilities (introduced with the original Macintosh 128K computer in 1984) used a long word to store seconds, starting at midnight, January 1, 1904, Local Time. This approach allows the correct representation of dates up to 6:28:15 A.M. on February 6, 2040. The current date and time utilities, documented in Inside Macintosh: Operating System Utilities (http://developer.apple.com/documentation/mac/osutilities/OSUtilities-2.html), use a 64-bit signed value, which covers dates from 30,081 B.C. to 29,940 A.D.
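To turn one of the original 32-bit values into something readable, add the stored seconds to the 1904 epoch. The Python sketch below does exactly that and nothing more: it assumes an unsigned 32-bit count and makes no time zone adjustment, so the result is in whatever zone the value was recorded in. The function names are invented for the example.

```python
from datetime import datetime, timedelta

MAC_EPOCH = datetime(1904, 1, 1)  # midnight, January 1, 1904

def mac_seconds_to_datetime(seconds: int) -> datetime:
    """Convert an unsigned 32-bit Mac timestamp to a naive datetime."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    return MAC_EPOCH + timedelta(seconds=seconds)

def datetime_to_mac_seconds(moment: datetime) -> int:
    """Convert a naive datetime back to seconds since the 1904 epoch."""
    return int((moment - MAC_EPOCH).total_seconds())
```

Passing the largest possible value, 0xFFFFFFFF, returns 6:28:15 A.M. on February 6, 2040, which is where the rollover date comes from.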

E-mail

In stark contrast to the Windows experience, e-mail has remained much easier to analyze on Macintosh machines. Although PST files can be created and used to store e-mail on Macs, the native e-mail store format for non-Microsoft MUAs (Mail User Agents) is plaintext. Remember that mail can exist in a wide variety of formats, including cached web pages (Yahoo!, Hotmail, and so on), PST files, mdir, mbox, and others.

The e-mail accounts for the Mac built-in mail client Mail are stored in /Users//Library/Mail. The files named mbox contain the actual plaintext e-mail data, rfc822 data, and attachments. Attachments may be compressed in a variety of formats, including TAR, ZIP, GZ, and BZ2; graphics are typically encoded using base64 or UU-encoding. Programs like SMART may be used to carve individual messages and attachments automatically from mbox and newsgroup files.
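Because mbox is plaintext, you can also get a quick inventory of a mailbox with nothing more than Python’s standard mailbox module. This sketch is not how SMART works; it is just one way to list the sender, subject, and date of each message, and the function name summarize_mbox is made up for the example. Run it against a working copy of the file, never the original evidence.

```python
import mailbox

def summarize_mbox(path):
    """Return (From, Subject, Date) header tuples for each message in an mbox."""
    box = mailbox.mbox(path, create=False)
    try:
        return [(msg.get("From"), msg.get("Subject"), msg.get("Date"))
                for msg in box]
    finally:
        box.close()
```

A header that is missing from a message comes back as None, which is itself worth noting when you review the results.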

/Users//Library/Caches/ contains recently cached images, movies, or other data viewed by Mail.app and Safari.app. The mail folder contains many subfolders (labeled 00-15) that appear to be recursive (of depth 3), but the layout is actually a hash table (a programming technique used for efficiency). The Safari folder is the same, except it is of depth 2. The MS Internet cache is contained in a standard .waf file.

Graphics

The Mac has been known as a graphically intense machine since its introduction and still enjoys a stronghold in prepress, layout, and graphic design. Contemporary Macintosh operating systems support a myriad of graphics formats, although most often we see the “standard” graphics file formats (GIF and JPEG), particularly when they come from the Internet. The “endian-ness” of the processor doesn’t affect the data format of the files. This is to say that even though memory and words are represented backward on a big-endian system, a GIF header will still be laid down on the disk in the same way, whether it is written to a FAT32 or HFS+ file system.

Web Browsing

Form follows function. Web browsing artifacts are similar to those found in the WinTel world. This is largely due to the standards in place for the various protocols that make the Internet work. HTTP is still a stateless protocol, cookies are still cookies, and HTML is still HTML. (In fact, this is far truer on the Mac compared to the Microsoft HTML implementation.)

/Users//Library/Safari contains the history files and bookmarks for the user. Also included is a folder of thumbnails named Icons. In Safari, when some Web sites are viewed, a thumbnail is displayed next to the URL that is relevant to the Web site being viewed (for example, Google uses its g logo, and CNN uses cnn in red on a white background).

/Users//Library/Cookies contains cookies of recently viewed Web sites. The cookies are stored in XML format. For more information about reviewing Web activity, see Chapter 12.

Resources

Resources are common objects whose templates are already defined elsewhere. A file’s resource fork (stream) can contain anything but is supposed to contain data and customizations of common objects that are unique to the file. A good example is language localization: a program is written once, and all the dialog boxes have their text and buttons in English. To localize the program, all you need to do is edit the resources (the words displayed in menu bars, dialog boxes, and so on), so instead of being labeled No, the button would say Nyet (No in Russian) without having to recompile or rewrite the actual executable code.

Virtual Memory

Almost every OS (and many applications) uses a backing store, virtual memory, swap file (or slice or partition), or some other method of caching memory to disk. As with other forensic investigations, these artifacts may contain a wealth of pertinent information. Preprocessing or normalizing the data prior to searching for simple data constructs can save you time, memory, and storage.

The swap files are located in /var/vm. This is where passwords temporarily stored in memory could be written to disk.

System Log and Other System Files

/var/log/ contains a lot of information and is an extremely important file. Some of the information it contains includes serial numbers of removable media (thumb drives, SmartMedia, and so on) and some names of mounted media such as CDs and floppy disks.

/var/log/daily.out contains snapshots of mounted volume names and the dates they were mounted, as well as the used disk space on each of the mounted volumes.

/var/spool/cups contains files that hold information about documents recently printed. This includes the name of the document printed and the user who printed it.

/Library/Receipts is a folder containing system information about updates. This is useful in detecting whether a user had the latest system patches or security updates installed.

/Users//.bash_history contains recent terminal commands issued by the user. Look for rm -P commands, which mean that the user intentionally attempted to wipe data from a drive.

The /var/vm folder contains another folder named app_profile. The files here that end with _names contain the names of applications that were recently opened. The files ending with _data contain temporary information useful to the applications in the _names documents.

/private/var/root/.bash_history contains recent terminal commands issued by the administrator. If this file exists, the user is probably familiar with Linux and should be considered to have at least an intermediate knowledge of OS X.

/Users//Library/Preferences/ contains preference files of programs installed on the computer. Even deleted programs will still have their preference files left behind if the program generated them. Inside the preferences folder are the following:

• com.apple.Preview.plist contains a list of recently viewed pictures and PDF documents.

• QuickTimeFavorites contains a list of recently viewed movies and the disk location of the movies.

/Users//Library/Logs/DiskUtility.log contains information about disks recently mounted using DiskUtility as well as disks erased or burned by this application.

MAC AS A FORENSICS PLATFORM

If you have the luxury (or budget) of getting a Mac, use it as a “base camp” for some exploration. A great way to start is to wipe a drive with null (this verifies the media as well as facilitates tighter compression); install the Mac OS of your choice; go through the initial first boot process using documented information for username, Internet settings, preferences, and so on; and then make a compressed image of the drive. This way, you can always return to a known state quickly and easily.

If you are using a Macintosh as your forensic platform of choice, you can minimize the possibility of an “Oops!” Whenever possible, use a hardware write-blocking device. Although not essential with a properly configured forensic acquisition and analysis platform and a well-trained examiner, you can think of write-blockers as airbags for your investigation. You wear a seatbelt, keep your eyes on the road, and hope you never see your airbag. You protect yourself by disabling disk arbitration and understanding the system configuration and behavior.

The Mac has been able to mount disk and file system images for quite some time. Numerous utilities let you mount images of CD-ROMs and many other types of file systems. Be aware that these files may contain a complete file system with many files, as well as their own slack space, deleted files, unallocated space, and directory structures. Type and creator codes of DIMG, DDSK, VMK, and IMG should be looked at very carefully, as should any very large file. Virtual PC provides hardware emulation and may mount and create disk images, as can Apple’s Disk Copy.

Images of HFS, HFS+, many of the FAT flavors, and several other file systems may be mounted to this system for analysis. Setting the permissions of the underlying image file(s) to read-only should prevent any modifications to the data contained within the image. It is a common practice to embed the Cyclic Redundancy Check (CRC) or md5sum of the data into the image file’s resource fork. This allows authentication information about the image to be integrated into the image without affecting the data (fork) of the image file.
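Computing those values is straightforward. The Python sketch below reads an image file in chunks and returns both its MD5 digest and its CRC-32; it does not write anything into a resource fork, and the function name image_digests and the 1MB chunk size are assumptions made for the example.

```python
import hashlib
import zlib

def image_digests(path, chunk_size=1024 * 1024):
    """Return (md5 hex digest, CRC-32) of a file, reading it in chunks."""
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as image:  # read-only; the image is never modified
        for chunk in iter(lambda: image.read(chunk_size), b""):
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return md5.hexdigest(), crc & 0xFFFFFFFF
```

Record the values when you create the image and recompute them before and after analysis; matching values show that the data fork has not changed.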

CHAPTER 9

DEFEATING ANTI-FORENSIC TECHNIQUES

An anti-forensic technique is any intentional or accidental change that can obscure, encrypt, or hide data from forensic tools. Very few anti-forensic techniques work the way a suspect might expect. Most suspects believe that by following the techniques illustrated in this chapter and in other publications, they can hide their tracks. Trying to do this, however, often merely helps the investigator know where to look for evidence. In fact, a suspect who tries to hide evidence can actually strengthen an investigator’s success in uncovering it.

Most forensic examination tools used today tend not to trust data or view it in the same ways they did when computer forensics was a new field. For example, earlier versions of open source forensic tools could miss files and data due to logical coding errors. Most of the concepts discussed in anti-forensic articles and covered in this chapter don’t affect modern tools, but you should still be aware of them, because you could be expected to know this information in court. Plus, if you design or create your own forensic tools, you’ll need to be aware of these issues.

OBSCURITY METHODS

An obscurity method is used by someone to try to obscure the true nature or meaning of some data, typically by changing its name or its contents. For our purposes, the term refers to a case in which someone has intentionally or accidentally changed the name or contents of a file, resulting in a file that will be either misinterpreted or disregarded in subsequent forensic analyses.
