dd if=/dev/nst0 bs=0 conv=noerror >> tape0.image
Note that nst0 is used instead of st0. The device nst0 means “non-rewinding.” When you access st0, it will automatically rewind the tape before processing your request. Using nst0, you can continue to read further into the tape with every consecutive execution of dd. To ensure that you have a complete image of the tape, you must execute the dd command multiple times to guarantee that you have hit the end of the tape and not just a blank file marker. A good rule of thumb is to keep going until you have seen five end-of-tape error messages. This command is the same on Windows or UNIX systems.
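The repeated runs described above can be scripted. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, and it treats a pass that adds no bytes to the image as the stand-in for an end-of-tape message, stopping after five such passes in a row. Adjust that test to match what your tape drive and dd build actually report.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def append_one_pass(command: Sequence[str], image: Path) -> int:
    """Run one read command, append its stdout to image, return bytes added."""
    before = image.stat().st_size if image.exists() else 0
    with image.open("ab") as out:
        # check=False: a failing pass (for example at end of tape) is expected.
        subprocess.run(list(command), stdout=out, check=False)
    return image.stat().st_size - before


def image_until_end(read_pass: Callable[[], int],
                    empty_limit: int = 5,
                    max_passes: int = 10_000) -> int:
    """Call read_pass until empty_limit consecutive passes add no data.

    Assumption: an empty pass stands in for an end-of-tape error message.
    Returns the total number of bytes collected.
    """
    total = 0
    empty = 0
    for _ in range(max_passes):
        added = read_pass()
        if added > 0:
            total += added
            empty = 0  # data found, so the earlier empty pass was only a file marker
        else:
            empty += 1
            if empty >= empty_limit:
                break
    return total
```

To drive a real tape, read_pass could be something like `lambda: append_one_pass(dd_arguments, Path("tape0.image"))`, where dd_arguments is a list holding the dd command line you would otherwise type by hand.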
Creating Too Much Evidence
One of the major misconceptions about collecting large amounts of data from many systems is that forensic imaging is the only option available to you. Another, larger and more damaging, misconception is that forensic imaging should be applied broadly across any potentially relevant system.
Truth is, if you are collecting a large amount of data from a large number of systems, it is usually in support of some kind of legal action at the request of the court. When you are doing anything in support of a legal action, you must be aware of the ramifications of your decisions in court. In a nutshell, creating too much evidence is almost as bad as not preserving any at all. Why? Because when you create a forensic image without either a court order or a specific understanding that the system involved is relevant to the case, you remove several of the protections on which your legal counsel depends.
Specifically, you have cost your client’s counsel most of his or her “overly burdensome request” argument, which protects your client from having to spend exorbitant amounts of money to provide evidence to the court. For example, suppose you were approached in either an internal or external capacity to collect data from 100 users. If you decided on your own, without a request from counsel or the court, to incur the expense of creating forensic images of all 100 users’ machines, you have basically cost your client unnecessary money, time, and possibly important defense tactics. If the opposing counsel were aware of the images’ existence (and such information is normally discovered during depositions), the opposing counsel could make a motion for the court to order their production. Because your side would incur no additional costs in creating the images in response to the order, your counsel would have lost a major tactic in defending his or her side from overly broad evidence productions. (Evidence productions are discussed in detail in Chapter 15.)
Live File Collections
A live file collection is a fancy way of saying that you are copying active files, while preserving their MAC times, to a central location for preservation, review, and production to opposing counsel. The MAC times of a file are its Modification, Access, and Creation or Change times, depending on the operating system. Active files are files that are known to the operating system and are not marked for deletion.
This is not a technically difficult problem, but it is a logistical and organizational one. You need to discover where the relevant data exists, which users are relevant, which departments are relevant, which servers they use, and other similar information. After you have defined the scope of where your data lies, you need to deploy a tool that users can run to collect their own data or one that can access their systems over the network to allow the data to be copied.
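To make the idea concrete, here is a minimal Python sketch of a live file collection. It is an illustration only, not a replacement for a tested collection tool, and the function names are hypothetical. It relies on the standard library's shutil.copy2, which applies the source file's modification and access times to the copy. It does not carry over creation time, and reading a file may itself update the access time on the source system.

```python
import os
import shutil
from pathlib import Path


def collect_files(source: Path, destination: Path) -> list[Path]:
    """Copy every regular file under source to destination, keeping timestamps."""
    copied = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src = Path(root) / name
            dst = destination / src.relative_to(source)
            dst.parent.mkdir(parents=True, exist_ok=True)
            # copy2 copies the data, then applies the source's modification and
            # access times to the destination; creation time is not carried over.
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied


def times_match(src: Path, dst: Path) -> bool:
    """True when the copy has the same modification time (to the second)."""
    return int(src.stat().st_mtime) == int(dst.stat().st_mtime)
```

Running times_match over each source and destination pair after the copy gives a quick check that the modification times survived the trip to the central location.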
Collecting Live Data from Windows Systems
Two tools that we have used to collect data in this fashion are Pixelab’s XXCopy, found at www.xxcopy.com, and Microsoft’s RoboCopy, available in the Windows resource kits. Pixelab provides a free version of XXCopy, while RoboCopy is available only as a free tool within the resource kit.
Preserving Files with XXCopy
XXCopy is a DOS-based tool that takes a large variety of command-line parameters for a wide range of results. We recommend the following syntax:
C:\>xxcopy c: z: /s /h /tca /tcc /tcw
This would tell XXCopy to copy all of the files including all subdirectories, /s, and all hidden files, /h, from the c: drive to the z: drive. The switches /tca, /tcc, and /tcw tell XXCopy to preserve the access, creation, and modification times of the files. It does so by taking the original times from the source files and applying them to the destination files.
Preserving Files with Microsoft RoboCopy
RoboCopy, short for Robust File and Folder Copy, preserves all of the modification, access, and creation times by default in its most recent version. The syntax for executing it is as follows:
C:\>robocopy c: z: /e
This tells RoboCopy to copy all files from the c: drive to the z: drive for all subdirectories in c:.
FULL-TEXT INDEXING
When you encounter large datasets (such as those greater than 300 gigabytes), you may find yourself with a “needle in the haystack problem.” The search features of most of the forensic and system tools we have discussed so far are not designed for continuous searches against large amounts of data. What you need is a way to take all of the data, which in some environments can grow to terabytes in size, and place it in some kind of search tree. The exact type of tree structure used varies by vendor.
Binary Search Trees
A binary search tree is a structure that allows your system to store dataset information in two subtrees, one left and one right. Data is sorted by key into one of the subtrees, and the key is used to determine in which subtree the search should continue. The search process continues to divide the information into two parts, narrowing the search to one part in sequence, until the sought item is found. The binary search tree enables you to search terabytes of data in seconds instead of hours or days.
The overall benefit is that you can search a dataset of any size in, at worst, about log base 2 of n steps, where n is the size of your data (this bound assumes the tree is kept balanced). This means that instead of searching through 500 gigabytes of data sequentially, you can find the specific words that make up your search in 39 steps, and you can search through 5 terabytes of data in 43 steps!
Compare this to the steps required to perform multiple sequential searches through the dataset: assuming we are examining 64k-byte blocks from a drive, it would take 7,812,500 steps to search 500 gigabytes of data completely and 78,125,000 steps to search through 5 terabytes of data completely. While the indexer must also perform these steps initially, for each subsequent search of the data it would only have to perform the log base 2 n searches.
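The figures above can be reproduced with a few lines of Python. This is a worked check rather than a benchmark, and it rests on two assumptions: n is taken to be the dataset size in bytes, and a "64k-byte" block is taken to be 64,000 bytes, which is the value that matches the sequential step counts quoted in the text.

```python
import math

BLOCK_SIZE = 64_000  # bytes; assumed meaning of "64k-byte" (matches the text's figures)


def tree_steps(n: int) -> int:
    """Worst-case comparisons in a balanced binary search over n items."""
    return math.ceil(math.log2(n))


def sequential_steps(total_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks that must be read to scan the whole dataset once."""
    return -(-total_bytes // block_size)  # integer ceiling division


GB_500 = 500 * 10**9  # 500 gigabytes
TB_5 = 5 * 10**12     # 5 terabytes

for label, size in (("500 GB", GB_500), ("5 TB", TB_5)):
    print(label, tree_steps(size), "tree steps,",
          sequential_steps(size), "sequential steps")
```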
This means you can search terabytes of data in seconds. Now you can perform all the searches requested without telling the requester that it will take a year and that you’ll call when you’re done. The general rule we follow is that if you plan to search the data only once, perform a linear search; if you plan to search it more than once, create a full-text index using a binary search tree.
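The "index once, search many times" rule can be illustrated with a toy full-text index in Python. This is a simplified sketch, not how any of the products discussed here are implemented: a sorted word list searched with binary search (the standard bisect module) stands in for the binary search tree, and the function names and sample documents are made up for the example.

```python
import bisect
import re
from collections import defaultdict

Index = tuple[list[str], list[set[str]]]


def build_index(documents: dict[str, str]) -> Index:
    """One linear pass over all text: map each word to the files containing it."""
    postings: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            postings[word].add(name)
    entries = sorted(postings.items())  # sorted by word, ready for binary search
    return [word for word, _ in entries], [files for _, files in entries]


def search(index: Index, word: str) -> set[str]:
    """Binary search the sorted word list: about log2(number of words) steps."""
    words, files = index
    word = word.lower()
    position = bisect.bisect_left(words, word)
    if position < len(words) and words[position] == word:
        return files[position]
    return set()


documents = {
    "a.txt": "Wire the funds today",
    "b.txt": "Funds arrived; no wire needed",
    "c.txt": "Lunch?",
}
index = build_index(documents)  # pay the linear cost once
print(search(index, "funds"))   # every later search is a short lookup
```

The expensive step, build_index, reads every document once, just as a linear search would. Each call to search afterward touches only a handful of index entries, which is why the index pays for itself as soon as you run a second query.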
Missing Data When Indexing
As is often the case, you pay a price when using additional functionality, and in the case of indexing, the price is encoded files. When indexing, you must first create a full-text index of all of the data. This means that the indexer, the program that creates the index, must be able to distinguish words from your data to be included in the binary tree. While this is not a problem for source code, basic e-mail, and text files, it is a problem for Office documents, e-mail attachments, e-mail container files (such as Microsoft Outlook PST files), and any other file that is not made up of plain ASCII characters. You need to use an indexing program that will convert these file types for you, or you will have to convert the file types yourself. We will cover both types of indexing.
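Before feeding files to an indexer that expects plain text, it can help to sort out which files will need conversion. The Python sketch below is a rough heuristic under stated assumptions, not a file-type identifier: the function name is hypothetical, only the first part of each file is sampled, and "plain ASCII" is taken to mean printable characters plus tab, line feed, and carriage return.

```python
from pathlib import Path

# Printable ASCII plus tab, line feed, and carriage return.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def needs_conversion(path: Path, sample_size: int = 65536) -> bool:
    """Return True when the sampled bytes are not all plain ASCII text."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    # Any byte outside the plain-text set suggests an encoded or binary format.
    return any(byte not in TEXT_BYTES for byte in sample)
```

Files flagged by this check, such as Office documents or PST containers, would then go to a converter or to an indexer that handles those formats itself.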
Glimpse
Glimpse is a free full-text indexing program that is packaged with a search interface called Webglimpse, available at www.webglimpse.org/. Glimpse will give you incredibly advanced indexing options such as merging indexes, appending and deleting files from indexes, and even using regular expressions against indexed data. Webglimpse even offers free and low-cost options for support. However, Glimpse does not automatically convert files for you. Glimpse expects that the files fed to the program will be in ASCII text form.
The Webglimpse package addresses the file-conversion challenge with a user-customizable listing of programs to call to convert your files. The Webglimpse Web site even provides links and instructions on how to convert the most popular file formats that users encounter. Adding file types and finding conversion utilities as well as testing their accuracy are up to you. Glimpse is not for the novice user, but with some experience and work, you’ll find that Glimpse is a great free indexing system that will make quick work of your data searches.
Webglimpse acts like a familiar web search engine to search the data. In fact, Webglimpse was created primarily to allow for indexed searches of Web sites. The search interface will allow you to select and search across your indexes and will highlight the hits in an abstract of the file on the Web page that it returns. Anyone with a web browser can access the Webglimpse interface, and multiple users can search it at once. Also, Glimpse supports the ability to search the index directly from the command line, making for some great automation possibilities. Glimpse code must be compiled, but it can be compiled either on Linux or through Cygwin on Windows.
dtSearch
dtSearch leads the market of mid-range cost indexing systems and can be found at www.dtsearch.com. dtSearch has several configurations of its indexing system, including just the dtSearch engine for implementation into other products. dtSearch has support for the most popular data formats such as PST files, Office documents, and zip files and will create a full-text index of the data.
dtSearch allows you to search your indexes via a GUI. You pick the index you would like to search and enter in keywords, and then dtSearch will generate a list of files that match your query. Selecting a file will bring up a preview of it with the search strings highlighted within. dtSearch does not have a command-line interface and is available for Windows only. Another product, called the dtSearch engine, does support Linux as well as Windows.
AccessData’s Forensic Toolkit
AccessData’s Forensic Toolkit (FTK), found at www.accessdata.com, makes use of the dtSearch indexing engine. In addition to the standard file types supported by dtSearch, FTK offers internal conversions. FTK also allows you to index whole images: just feed FTK an image from Guidance Software’s EnCase, ASR Data’s SMART, or a dd image, and it will build a full-text index of all of the files and the unallocated space. When you are dealing with a large case, this can be a very useful feature that will quickly pay back its cost.
FTK allows you to search indexes through its GUI. You pick the index you want to search, enter keywords, and FTK will generate a list of files that match your query. Selecting a file will bring up a preview of it with the search strings highlighted within. FTK does not have a command-line interface and is available for Windows only.
Paraben’s Text Searcher
Paraben’s Text Searcher, found at www.paraben-forensics.com, also makes use of the dtSearch indexing engine. Text Searcher allows you to search your indexes through its GUI. You pick the index you would like to search, enter keywords, and Text Searcher will generate a list of files that match your query. Selecting a file will open a preview of it with the search strings highlighted within. Text Searcher does not have a command-line interface and is available for Windows only.
Verity
Verity, found at www.verity.com, is the 500-pound gorilla of indexers. Verity’s engines and enterprise-ready indexing systems do not come cheap, but they can handle the largest and most complex situations. Verity’s product line is not a simple desktop-driven application that can be installed in an hour. Rather, the company’s server systems require configuration and customization to create the results you desire. You may find few situations that demand a system as intensive as Verity, but in the event that you do, it is well worth the cost.
EnCase
EnCase, discussed throughout this book, has introduced full-text indexing of images as of version 6.
MAIL SERVERS
When dealing with the types of data and systems we describe in this chapter, it is only a matter of time before you have to deal with mail servers. Mail clients have data files designed to be accessed and have open interfaces to them, such as PST and NSF, with well-documented APIs and tools for using them. Mail servers, on the other hand, are designed to be accessed only by their own systems. Microsoft Exchange, Lotus Domino Mail Server (with Lotus Notes), Novell GroupWise, Netscape iPlanet, and others all contain proprietary methods for storing and accessing e-mail stored on the mail server.
Microsoft Exchange
Exchange servers keep their e-mail data in a file called priv.edb. The .edb, or Exchange database format, is a Microsoft database with no known published structure. If you have the time, you can access the Microsoft Developers Network documents, located at http://msdn.microsoft.com, and try to reverse-engineer a solution yourself, but chances are you do not have that kind of time. Being able to search the .edb directly allows you to pull relevant e-mails from the current system and from any backup of the system that may relate to your investigation.
Ontrack PowerControls
Ontrack PowerControls, available at www.ontrack.com/powercontrols/, is an excellent tool for accessing, searching, and extracting e-mails from an .edb. Not only will PowerControls allow you to access an .edb on the disk, but a licensed version of the software will also allow you to extract .edb files directly from tapes written by Veritas NetBackup, Veritas Backup Exec, Computer Associates BrightStor ARCserve, Legato NetWorker, IBM Tivoli, and NT Backup, the free backup utility that has come with Windows Server since Windows NT. PowerControls does the job well, although it is a bit more expensive than the other options. PowerControls has two limitations in the enterprise environment: the lack of automation capabilities and the inability to control tape robots while using the extract .edb tape tool in the extraction wizards.
Paraben’s Network E-mail Examiner
On the commercial but lower cost end is Paraben’s Network E-mail Examiner, or NEMX, available at www.paraben-forensics.com. NEMX will allow you to access, search, and extract messages from an .edb. However, it does not give you the tape-restoration abilities that are found in PowerControls. NEMX, like PowerControls, does not provide for any automation ability.
Recovery Manager for Exchange
Quest Software’s Recovery Manager for Exchange, at www.quest.com/recovery-manager-for-exchange/, offers functionality that other tools in the market do not offer. While Recovery Manager for Exchange (RME) allows you to access, search, and extract messages from an .edb, it also allows the software to act as a virtual Exchange server. Why is this useful? If you are restoring Exchange servers from a set of tapes, which is typically the case when you are asked to do this type of work, you can point the backup software at the system with RME installed and running in emulation mode. In emulation mode, RME will take the restored data as if it were an Exchange server and write it to an .edb locally, where you can extract the messages. If you are dealing with a software package that will not allow you to restore an Exchange agent–based backup without restoring it to an Exchange server, the emulation mode is immensely useful. In addition, you can interoperate with the native backup software and take advantage of its abilities to use tape robots in automating the restoration using emulation capabilities.
Microsoft Backup
If you are looking to make a backup of an Exchange server and collect its data, Microsoft Backup is able to create a .bkf file that contains the running Exchange server’s .edb. You can then extract the .edb from the .bkf without Exchange and open it with the tools we mention here.
Microsoft Exchange Server
The final option you have is to install a new Microsoft Exchange Server, at www.microsoft.com/exchange/default.mspx, on a Windows server system. (MS Exchange will not install on Windows XP.) You can then re-create the configuration to emulate the name and domain of the original server. In doing so, you can either restore messages to the Exchange system using the backup software or you can have it access a restored .edb by loading the Exchange Server in what is called recovery mode. Recovery mode allows the Exchange Server to start up and access the .edb as it would normally, allowing you access to the messages within it. Exchange does not provide the ability to search across the text of mail in mailboxes, so you will have to export out each user’s mailbox into a PST and search it afterward to determine whether it contains the evidence you’re looking for. You can, however, search by sender, recipient, and date. This is by far the least preferred solution, and unless you already have licenses for Exchange, this is also likely the most expensive option.
Lotus Domino Mail Server and Lotus Notes
IBM’s Lotus Notes mail client has a corresponding mail server called Domino. Lotus Notes client and server both store their data in Notes Storage Facility (NSF) files. Lotus supports real encryption—the Lotus server and client use public key encryption algorithms that cannot be easily broken. Thankfully for us, the option to use encryption is not the default configuration. If you encounter an encrypted NSF, you should inform your client that the encrypted data might not be recoverable.