CHAPTER 10
ENTERPRISE STORAGE ANALYSIS
You can apply most of the tools and techniques covered so far in this book to any type of investigation. Despite their growing scope and complexity, most investigations scale well with the tools provided in other chapters. However, when you’re forced to deal with terabytes of data on technology that was not designed for easy access from a desktop PC or laptop, you have to reevaluate your situation. This chapter describes how to deal with RAIDs (redundant arrays of inexpensive disks), SANs (storage area networks), tapes, and the large and expansive datasets that you will gather during a large investigation.
Many of the techniques discussed in this chapter also apply to electronic discovery. Electronic discovery relates to the collection, processing, review, and production of electronic documents in a lawsuit. Large datasets and wide ranges of system types are the rule, not the exception, in electronic discovery.
THE ENTERPRISE DATA UNIVERSE
First, let’s define the enterprise environment. In this book, enterprise environment refers to all of the systems, servers, and data that make up a company’s computing system. Most sets of data that are included in an enterprise scenario include x86-based servers (any operating system), non-x86 based servers, Network Attached Storage (NAS) systems, SAN systems, servers with RAIDs, tapes (lots of them), and hundreds to thousands of desktops and laptops. This list does not even include the portable devices now popular, such as PDAs, cell phones, and thumb drives, but those devices are covered in other chapters. Dealing with all of this data requires that you use tools that are made to handle searches of large quantities of data, so full-text indexing is also discussed here.
WORKING WITH RAID
RAID sets can be created by hardware or software: either a hardware-based RAID controller creates and maintains the RAID set, or software in the operating system does. A RAID allows several disks to be viewed by the operating system as a single disk. A RAID can pose a huge problem to an unprepared examiner, however, because most RAIDs are found on server systems. This means that data stored on RAIDs is typically valuable, and modifying the RAID can harm its owner, who could lose access to all of the data stored on it.
Unless configured as RAID 1, RAIDs will write data across all the disks that make up the RAID set. This means that you will have to image each disk in the RAID and keep the system powered down until imaging is complete; only then can you allow the owner to access his or her data again. SAN and NAS systems, which are covered next, both use RAID.
Acquiring a RAID
Acquiring a RAID is similar to imaging other types of media, except that you need to record the disks’ original sequence in the drive bays. If possible, you should also get the RAID configuration settings stored within either the hardware card or the operating system. The system owner should be able to provide this information to you.
The information you need for the manual reconstruction of a RAID set is the RAID type, number of disks, stripe size or chunk size, and the order of the disks. A hardware RAID controller will typically display a supplemental bootup screen where this information is available.
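To see why each of those four values matters, consider a minimal sketch of rebuilding a plain RAID 0 (striping, no parity) set by hand. It assumes equal-sized member images, a chunk size given in bytes, and images listed in their original disk order; the function name destripe_raid0 is hypothetical, and a real tool would be far faster and would also handle parity layouts such as RAID 5.

```shell
# Hypothetical helper: rebuild a RAID 0 volume from member images by
# interleaving fixed-size chunks in disk order. Run it on copies only.
# usage: destripe_raid0 CHUNK_BYTES OUTPUT IMAGE1 IMAGE2 [...]
destripe_raid0() {
    chunk=$1; out=$2; shift 2
    : > "$out"                                  # start with an empty output file
    size=$(wc -c < "$1")                        # assumes all members are the same size
    stripes=$(( (size + chunk - 1) / chunk ))   # number of chunks per member
    i=0
    while [ "$i" -lt "$stripes" ]; do
        for img in "$@"; do
            # append chunk number i of this member to the output
            dd if="$img" bs="$chunk" skip="$i" count=1 2>/dev/null >> "$out"
        done
        i=$((i + 1))
    done
}
```

If the chunk size or the disk order is wrong, the output will still be produced, but the file system inside it will not parse, which is why these details should be recorded at acquisition time.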
Once you have acquired the RAID set, you can either reassemble it in Windows with Guidance Software’s EnCase or within Linux using the raidtools set. It is difficult to replicate the hardware environment of the original RAID set. Additionally, re-creation using the original hardware could possibly overwrite the restored images in the process of initializing the RAID, resulting in high costs and potential failure.
Rebuilding RAIDs in EnCase
To rebuild a RAID in EnCase, you must select all of the RAID disks’ images that you have created in EnCase when adding evidence to the case. EnCase will “automagically” recognize individual images that are part of a RAID set and attempt to reconstruct the RAID. The newly reconstructed RAID will then appear to you as a single disk.
Rebuilding RAIDs in Linux
To rebuild a RAID in Linux, you must have created each of the images with the dd utility. Once you have these images, you must mount them with the local loopback using the following command:
mount -o loop,ro /path/to/image /path/where/to/mount
Here, -o loop is for local loopback and ro is for read-only. If the mount was successful, you will have a read-only version of the image that will be treated as a disk on the Linux system. This means that you can use the raidtools program that comes with Linux to rebuild the RAID array. Since the RAID images are marked as read-only, you do not have to worry about raidtools or any other RAID toolkit overwriting or changing the image.
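Even with read-only mounts, it is worth being able to demonstrate that an image did not change while you worked with it. The following is a minimal sketch of one way to do that, assuming the sha256sum utility is available on your Linux system; the function name image_unchanged and the file name disk0.dd are hypothetical.

```shell
# Hypothetical helper: succeed only if IMAGE still matches the SHA-256
# digest that was recorded before the image was mounted.
# usage: image_unchanged IMAGE RECORDED_DIGEST
image_unchanged() {
    image=$1; recorded=$2
    current=$(sha256sum "$image" | cut -d' ' -f1)
    [ "$current" = "$recorded" ]
}

# Example use (disk0.dd is a placeholder name):
#   before=$(sha256sum disk0.dd | cut -d' ' -f1)
#   ... mount the image, rebuild the RAID, examine it ...
#   image_unchanged disk0.dd "$before" && echo "disk0.dd unchanged"
```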
WORKING WITH NAS SYSTEMS
NAS allows remote systems to access disk sets created within them either as a network share or as a physical disk in the case of iSCSI (Internet SCSI). NAS systems are normally singular units that provide a large data volume as a single disk or a set of shares to a network. NAS systems can be as small as 200GB but can grow to several terabytes in size.
Acquiring NAS
Unless the NAS you are working with supports iSCSI, you cannot make a direct connection to the NAS to create a true and correct image. Instead, you must shut down the NAS system and image each drive. This is not something to be taken lightly, as it could mean that you would have to shut down the NAS system for an entire day. Make sure that you plan ahead, and give the NAS owner time to prepare for the downtime.
Since the first edition of this book, some NAS systems such as NetApp have added a command shell that you can log into. If you are able to do this, mount a remote drive and use the dd syntax from Chapter 4 to image the NAS system to the network share you have mounted. You should increase the block size to make use of the network frame sizes, and use the split command to break the image into segments so that a network error that aborts your dd command does not cost you the entire transfer.
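As a rough sketch of that dd and split combination, the helpers below stream a source through dd with a 64KB block size and cut the stream into fixed-size segments. The function names, the block size, and the segment naming are assumptions chosen for illustration; whether dd and split are available in a given NAS command shell depends on the vendor.

```shell
# Hypothetical helper: stream SOURCE through dd with a large block size and
# cut the stream into segments named PREFIXaa, PREFIXab, and so on.
# usage: image_in_segments SOURCE PREFIX SEGMENT_SIZE
image_in_segments() {
    src=$1; prefix=$2; seg=$3
    dd if="$src" bs=65536 2>/dev/null | split -b "$seg" - "$prefix"
}

# Hypothetical helper: rejoin the segments in name order to rebuild the image.
# usage: join_segments PREFIX OUTPUT
join_segments() {
    prefix=$1; out=$2
    cat "$prefix"* > "$out"
}
```

Choose an output name for join_segments that does not begin with the segment prefix, so that the output file is not matched by the wildcard.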
If you do not have to image a NAS system forensically, do not do it. Request a backup tape set instead.
WORKING WITH SAN SYSTEMS
A SAN is a series of hard drives combined into different disk sets and made available as physical disks to remote systems through some type of storage network, typically Fibre Channel. SANs can be a potential nightmare to deal with in the field. A SAN is either a single system or a set of systems interconnected on a dedicated network, normally fiber, to create large sets of disks that can be assigned to specific servers or shared among multiple servers. A single SAN disk can span terabytes of data, which will quickly exceed the capacity of any single drive you put in front of it. Removing disks from a live SAN may cause the SAN to lose its index to data and cause the SAN owner to lose the data. So, if possible, do not remove the disks from the SAN itself.
If you do not have to image a SAN system forensically, do not do it. Request a backup tape set instead.
Acquiring a SAN System
If you are required to make an image of a SAN network, you will need to gather some facts by asking the following questions:
• What type of network is connecting the SAN to the systems using it?
• On that network, are any ports free on the switch (fiber switch or Ethernet switch for iSCSI)?
• What type of adapter cards will you need? (Fiber adapter cards are not sold in stores, so make sure to ask this question early on.)
Next you need to do some research. Your best bet in creating an image of a dataset this large would be to bring a RAID set of your own on which to store all of this data. The only way to mount multi-terabyte volumes and acquire terabytes of data with high throughput—and without ever modifying the evidence—is to use a Linux system. If you use Windows to do this, the first thing Windows will do is touch—write to—the SAN disk. And, as you know, modifying evidence is always a bad idea.
Which distribution of Linux you choose to use depends on the adapter card you must install. Red Hat offers good support for adapter cards and is probably the best solution. Next, on most networks, the operational staff will have to add your system to its SAN so that you can do your job. Then you will have access to the SAN disks you need to image.
Make sure that any other system that could have access to the SAN is shut down; otherwise, the data can be modified as you collect it.
After you have completed all of this, your adapter card will provide mappings to SAN disks as SCSI devices. Use dd as you normally would, or use SMART. Hashing and verification will take some time, so plan on this taking at least a day to complete.
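Because a second read of a multi-terabyte disk is expensive, one option is to hash the data in the same pass that images it. The sketch below assumes sha256sum is installed; the function name acquire_with_hash, the 1MB block size, and the device and file names in the comment are placeholders.

```shell
# Hypothetical helper: copy SOURCE to DEST with dd and print the SHA-256
# digest of the bytes that were read, hashing the stream in the same pass.
# usage: acquire_with_hash SOURCE DEST
acquire_with_hash() {
    src=$1; dest=$2
    dd if="$src" bs=1048576 2>/dev/null | tee "$dest" | sha256sum | cut -d' ' -f1
}

# Example use (both names are placeholders):
#   acquire_with_hash /dev/sdb /mnt/evidence/san_disk0.dd > san_disk0.sha256
```

You still need to hash the finished image file and compare the two digests to verify that what landed on your storage matches what was read from the SAN.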
Many things can go wrong. We do not recommend that you attempt to image a SAN disk unless you are comfortable with the situation and are able to test all of your equipment beforehand. Remember that you don’t want to harm what could be a million-dollar system.
WORKING WITH TAPES
Tapes, specifically backup tapes, come in a wide variety of flavors and have changed a great deal in the last decade. In fact, if you are given a tape that was written to more than five years ago, you may not be able to find a drive to read it. Tapes are slow and prone to breakage. Additionally, tapes are written with proprietary software in proprietary formats, and usually large sets make up a single backup. If you are doing any type of long-term work with an enterprise, you will encounter tapes. Your ability to work with those tapes and show competence doing so is important.
The current dominant tape formats, as of the year 2009, are Super Digital Linear Tape (SDLT) and Linear Tape-Open (LTO). Many other formats, including 8mm, 4mm, quarter-inch cartridge (QIC), 16-track, and 32-track, among others, come in a large variety of formats themselves, including Advanced Intelligent Tape (AIT); Exabyte; IBM 3840; DLT 3000, 4000, 6000, 7000; SDLT1, SDLT220, SDLT320, SDLT600 and the value series DLT-VS; LTO1, LTO2, LTO3, LTO4; DDS-1, 2, 3, and 4; StorageTek, and more. You could spend lots of time and money trying to prepare to handle every type of tape media that exists. Instead, if you understand the basics of how tapes operate and how to interact with them, you will be well served in your efforts to deal with them.
If you receive more than 20 tapes to examine, you should look into getting a tape robot, also called an autoloader. It used to be that any type of tape robot was expensive and required large, complicated systems to work with, but this is no longer the case; the rapid change of tape technologies has created a large market for refurbished and used, low-priced tape robots. Most tape robots, and hopefully all tape robots you are required to handle, support a SCSI connection to your system. With the fall in prices of tape robots and the wide availability of SCSI components, a desktop system can quickly be adapted to handle any tape production. Your only concern, then, becomes how to store all of this data, a topic that is covered in Chapter 3.
Tapes have some inherent qualities that make forensic analysis easier. Almost all tapes have a write-protect tab: once you set the tab in the proper position (this varies per tape), your drive will be unable to write to the tape. This is not a unique ability, but it means that you do not have to worry about tape write-blockers or about modifying evidence on a tape while reading it. Before loading any tape, check the write-protect tab and set it properly.
Reading Tapes
When you are handed a tape, ask the following questions:
• Where did the tape come from?
• Who wrote the tape?
• What software wrote it?
• What type of drive wrote this tape?
While the model number on the tape can help you answer most of your drive-related questions, many times you cannot discover what software wrote the tape or even what is stored on it. If you cannot even determine what software wrote the data, let alone what is on the tape, the evidence becomes worthless and you cannot defend against its production to another party in a lawsuit. If this occurs, the opposing party can request to be given the tapes themselves in an attempt to access the data. The act of doing so can be seen as a “waiver of privilege.” See Chapter 15 for details of privilege and evidence production.
Identifying Tapes
We have found it possible and beneficial to access the tape from its raw device—the actual path to the tape drive itself. This method lets you read data directly from the tape without having to use any translation or interpretation software in the middle. This can be done on Windows and UNIX systems.
Accessing Raw Tapes on Windows
Before accessing the raw tape drive in Windows, you should first install Cygwin, a free UNIX emulation environment for Windows, which is found at www.cygwin.com/. UNIX utilities such as dd, covered in Chapter 4, were made to access tape devices and read the data out in blocks. Other utilities, such as type and more, will try to send the tape device control signals that it does not support, and your attempts to access the information will fail.
A plus side to accessing raw tape devices in Windows is that the Windows drivers automatically detect the block sizes of the tapes and any other tape-level settings, so you do not have to spend your time trying out different options.
After you have installed Cygwin, you can access the tape device by executing the following command:
dd if=/dev/st0 | less
Cygwin maps the standard Linux location of st0 or “standard tape 0” to the Windows physical, raw device \\.\Tape0. This is the actual location the operating system uses to access the tape drive. The number at the end of st and Tape will grow to reflect each drive you attach to your system, so st1 = tape 1, st2 = tape 2, and so on. When the command has been successfully executed, abort it after the first screen of data has passed.
You may optionally choose to write out the data to a file by executing the following:
dd if=/dev/st0 > tape0
This command will write the tape’s data out to a file called tape0. You are looking for data typically located in the first five lines of the file or screen. Most backup software identifies itself here with strings such as arcserve, netbackup, tar, and so on. If you are imaging one of the newer tape formats, make sure to pass the sync option (conv=sync) to dd, or else the data speed of the tape output may cause dd to give an input/output error.
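If you have many tape dumps to triage, the check for an identifying string can be scripted. The following is a minimal sketch that scans the first 64KB of a dump file for a few markers; the function name and the marker list are illustrative assumptions (ustar is the magic string in POSIX tar headers), and real backup formats may identify themselves differently.

```shell
# Hypothetical helper: print, in lowercase, the first known backup-software
# marker found in the first 64KB of a tape dump; print nothing if none match.
# The marker list is illustrative, not exhaustive.
# usage: identify_backup_format DUMPFILE
identify_backup_format() {
    dd if="$1" bs=65536 count=1 2>/dev/null |
        LC_ALL=C grep -a -i -o -E 'arcserve|netbackup|ustar' |
        head -n 1 | tr 'A-Z' 'a-z'
}
```

An empty result does not mean the tape is blank; it only means that none of the listed markers appeared at the start of the dump.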
You can see a listing of all of the tape drives remapped for you by typing the following command:
mount
Accessing Raw Tapes on UNIX
Accessing the raw tape device under UNIX is actually the same as doing it in Windows using Cygwin. You use the same dd command with both operating systems. However, if you receive errors, you will have to attempt to guess the block size used to write to the tape. The easiest remedy to try is the following dd command:
dd if=/dev/st0 bs=0 > tape0
Setting bs, or block size, to 0 tells dd to detect the block size automatically. This usually solves the problem, although some dd builds (GNU dd among them) reject bs=0 as an invalid number. If it does not work, you will need to install and use the MTX toolkit (available at http://mtx.opensource-sw.net/) to check the status of the tape drive. The MTX toolkit allows you to control attached tape robots manually and access tape devices at the raw SCSI level. Because no single solution to this problem exists, our recommendation would be to start at bs=64 and work your way up in powers of two.
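Working up through block sizes by hand is tedious, so the search can be sketched as a loop. The helper below is an illustration only: the function name and the default upper bound of 262144 bytes are assumptions, and on a real drive you would probably want to rewind the tape between attempts (for example with mt -f /dev/st0 rewind).

```shell
# Hypothetical helper: try block sizes 64, 128, 256, ... up to MAX_BS and
# print the first one with which dd can read a block from DEVICE.
# usage: probe_block_size DEVICE [MAX_BS]
probe_block_size() {
    dev=$1; max=${2:-262144}      # the default upper bound is an assumption
    bs=64
    while [ "$bs" -le "$max" ]; do
        if dd if="$dev" of=/dev/null bs="$bs" count=1 2>/dev/null; then
            echo "$bs"            # this size produced a successful read
            return 0
        fi
        bs=$((bs * 2))            # next power of two
    done
    return 1                      # no size in the range worked
}
```

A successful read only shows that the drive accepted that size; compare the resulting dump against what you expect before relying on it.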
If you are trying to acquire a modern tape, such as linear tape-open (LTO) or SDLT, you will need to pass in the sync option to dd:
dd if=/dev/st0 bs=0 conv=sync > tape0
Commercial Tools for Accessing Tapes
The only commercial tool for identifying and accessing tapes that we have used with good results is eMag Solutions’ MediaMerge for PC (MM/PC), available at www.emaglink.com/MMPC.htm. MM/PC is one of few specialized tools available for accessing tapes in their raw form. This tool attempts to detect the tapes’ format automatically and can read, extract, and—for some formats—catalog the contents of tapes without the original backup software that wrote the tapes.
MM/PC also lets you view the contents of a tape within a GUI environment. With the MM/PC environment, you can see the ASCII and hex values of the data on the tape and skip through the file records on the tape to view the type of data within it. MM/PC also offers a “forensic option” that allows you, if MM/PC supports the format, to inventory and capture the data on the tape without restoring the tape itself.
Preserving Tapes
You might think that a tape cannot be imaged, but this is not true. Using the dd tool, you can take an exact image of any tape that your tape drives can read. Imaging tapes can be useful when a tape has been reused multiple times in a backup. If the new backup did not completely overwrite the old one, the older data that remains on the tape can be recovered by imaging the tape.
Imaging Tapes
The dd command can be used to image an entire tape, including every block that is readable on the tape. Use the following command: