In addition, the recent growth in outsourcing business functions to overseas entities that can provide the necessary support at much lower cost has increased the risk of losing customer data. Outsourced functions are typically the more data-intensive parts of the business, often involving customer or employee information. Outsourcing functions that handle sensitive customer information creates greater opportunities for the misappropriation and misuse of that information, because the outsourcing company typically has limited control over the outsourced company’s security and data-handling procedures.
What to Understand
When a theft of customer data is confirmed or even suspected, standard protocols are recommended for identifying, reviewing, and evaluating the data in question to determine whether any of the information has been put at risk, and if so, to determine how the information was removed and by whom. As with the potential theft of most IP in electronic format, what you look for will depend on the type and format of the stolen information and how that information was maintained and secured.
You must determine whether the information was held in electronic format, hard-copy format, or both, and whether the theft could have been perpetrated simply by copying and removing the relevant IP. If the suspected theft appears to have occurred electronically, you must know the location, or locations, where the information was held. Was it stolen from the hard drive of a laptop computer? From a portable storage device? Through direct interaction with the entity’s internal network?
Next, become familiar with any security procedures that protect the information, both to evaluate the relative risk of a perpetrator gaining access to it and to narrow down the potential source of the theft. If the data’s network location had secure or limited access, find out who had access and whether the security procedures could have been, or in fact were, overridden to reach the customer data in question.
Historically, most IP theft has been committed by insiders. However, the definition of an “insider” has expanded significantly in recent years as customer, supplier, and vendor portals, business partnering, and outsourcing have grown. And while “insider” used to mean the employee five cubicles down the hall, today’s workforce also includes virtual and mobile employees who can reach the same corporate information and IP through dial-up connections, virtual private networks (VPNs), and web-based portals. Hence, the list of potential suspects and access points for the theft may be far longer than you might think.
What to Look For
The particular circumstances surrounding the customer data and the suspected theft determine what you should look for in an investigation. Several aspects, however, are common to most thefts of customer data.
Access of Customer Data
Unlike other types of IP, customer data is usually stored in a central location. For most companies, this means some type of mainframe, database, or enterprise resource planning (ERP) system. These systems are attractive from a corporate standpoint because they centralize all the information in one place. That same centralization makes them dangerous: if access permissions are applied incorrectly, anyone who wants the information can reach all of it in one place. These systems hold the crown jewels of the company’s clientele: customer information that can yield a competitive advantage, and identities that can be exploited for ID theft.
Determining When and How Customer Information Is Accessed
When facing the theft of customer data, you must identify where the data was stored. If you are a third party conducting the investigation, discuss the matter with the CIO/CTO and find out who is the custodian of the information system. Map out the system with the IT staff and learn how employees access and use the system.
Even if you are familiar with a company’s system, take the time to go through this process. Today’s data management systems are so complex that almost every installation has its own nuances. Once you know the type of system you are dealing with, the access parameters granted to each employee, and how those employees access the system, you can begin your analysis.
We have identified three main types of systems: relational databases, mainframe systems, and ERP systems such as SAP. Each system requires some general considerations on your part, and each warrants further analysis.
Relational Databases
Relational databases such as Microsoft SQL Server, MySQL, and Microsoft Access are the default choice for organizations that want to store customer data but do not want or need to spend large sums on more complicated solutions. Your first task, if the database supports it and the administrator has turned on logging, is to review the access logs for evidence of who accessed the data and when. Logs can provide a clear sign that something inappropriate has happened. Unfortunately, you will rarely be lucky enough to find exactly what you need in one place. In addition, many databases run with their default configuration, which may mean that access is not logged at all.
The good news about relational databases is that employees can rarely access data and remove it from the system without leaving a trace. With a large-scale database such as Microsoft SQL Server, most employees use some kind of intermediary file to copy data off the system, and the key is to find that dump file. For instance, a tech-savvy employee may use an SQL management tool to dump the tables into a .sql file, which lets her reconstruct the database later on a different machine. More common, however, are .csv and .tab files, which hold comma- and tab-separated values, two common formats in data processing.
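Because an employee can rename an export to disguise it, it can help to test what a file contains rather than trusting its extension. The following is a minimal sketch built on Python's standard csv.Sniffer, which guesses a delimiter from a text sample. It is a heuristic (ordinary prose with one comma on every line can fool it), so treat hits as leads for manual review. The UTF-8 decoding and the 64K sample size are assumptions.

```python
import csv
from typing import Optional


def sniff_delimiter(sample: str, delimiters: str = ",\t") -> Optional[str]:
    """Return the delimiter if the sample looks like comma- or tab-separated
    rows, or None if no delimiter appears consistently across lines."""
    lines = [line for line in sample.splitlines() if line.strip()]
    if len(lines) < 2:
        return None  # a single line is too little to judge consistency
    try:
        dialect = csv.Sniffer().sniff("\n".join(lines), delimiters=delimiters)
    except csv.Error:
        return None  # the sniffer found no consistent delimiter
    return dialect.delimiter


def file_delimiter(path: str, max_chars: int = 65536) -> Optional[str]:
    """Sniff the start of a file, whatever its extension (assumed UTF-8)."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        text = handle.read(max_chars)
    if len(text) == max_chars:
        text = text.rsplit("\n", 1)[0]  # drop a line cut off by the read limit
    return sniff_delimiter(text)
```

Run file_delimiter over files with unfamiliar or missing extensions. A result of "," or "\t" marks a file that deserves a closer look.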
Typically, you will identify the type of database, determine the common ways information is exported from it, and then look for those types of files. For instance, to find comma-separated files, you would search the machine for filenames ending in .csv. Files such as .sql and .xml files often have recognizable file signatures (characteristic header bytes) that you can use to narrow down potentially relevant files. Also, don’t forget to search by date range. If you know the timeframe in which the data may have been copied from the computer, look for files created in that timeframe. The number of files may be small enough that you can review each one to see what it contains.
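A combined extension and date-range search can be sketched in a few lines of Python. Several details are assumptions. The extension list is only a starting point. The sketch filters on last-modified time, because os.stat reports creation time (st_ctime) only on Windows. It compares naive local times. Run it against a mounted, read-only copy of the forensic image rather than the live machine.

```python
import os
from datetime import datetime
from typing import Iterable, List, Tuple

# Assumed starting list of export extensions; extend it for the case at hand.
EXPORT_EXTENSIONS = (".csv", ".tab", ".tsv", ".sql", ".xml")


def find_exports(root: str, start: datetime, end: datetime,
                 extensions: Iterable[str] = EXPORT_EXTENSIONS) -> List[Tuple[str, datetime]]:
    """List files under root with a wanted extension whose last-modified
    time falls between start and end (inclusive), oldest first."""
    wanted = {ext.lower() for ext in extensions}
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in wanted:
                continue
            path = os.path.join(dirpath, name)
            try:
                modified = datetime.fromtimestamp(os.stat(path).st_mtime)
            except OSError:
                continue  # unreadable entry; record it separately in a real review
            if start <= modified <= end:
                hits.append((path, modified))
    return sorted(hits, key=lambda hit: hit[1])
```

For example, `find_exports("/mnt/evidence", datetime(2008, 3, 1), datetime(2008, 3, 31))` (a hypothetical mount point and date range) returns a chronological list that you can review file by file.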
Mainframe Systems
Mainframe systems, such as Tandem machines or IBM mainframes running DB2, are still in use all over the world. They have faithfully provided the information backbone for the largest companies and will continue to do so for a long time to come. Now for the bad news: mainframes can be difficult to work with from an investigative standpoint. They are, at best, proprietary, arcane, and incompatible with nearly every modern forensic tool. If you are a mainframe forensics expert, congratulations: your services will always be in demand. If you are not, a few pointers may help you get the information you need.
First, find the custodian of the mainframe and learn how reports are run. Mainframe systems typically use a traditional dumb-terminal architecture, and users usually connect with VT100-type terminal software such as HyperTerminal. Terminal software is great for accessing the system in adverse network conditions, but it is poorly suited to running reports, so most mainframe systems provide a secondary mechanism for them. Some have proprietary tools that connect to the mainframe and download comma-separated value (CSV) or tab-delimited files. Others let you execute commands that place reports on predesignated shared drives.
The good news about mainframes is that the access model is generally very rigid, and the logging is extensive. In addition, the reporting facility is usually something that an end user cannot change. The user can’t, for instance, change the location of a saved report to hide it.
Once you understand exactly how the mainframe works and how it is accessed, you can use this information to find the relevant log entries and report files. Even if you can’t find the report files (if they were deleted, for example), you can use the log entries to build a preliminary timeline and then use that timeline to search the individual’s computer for activity. More often than not, you will find secondary and tertiary data points that fill out a more complete picture of what happened.
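Here is a sketch of turning log entries into a preliminary timeline. The log format is entirely hypothetical (pipe-separated timestamp, user ID, action, and target). Every mainframe writes its own format, so adapt the parsing to whatever the custodian shows you. The sketch sorts one user's entries chronologically and pads the first and last times into a search window for the workstation.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    when: datetime
    user: str
    action: str
    target: str


def parse_entries(lines: Iterable[str], fmt: str = "%Y-%m-%d %H:%M:%S") -> List[LogEntry]:
    """Parse hypothetical 'timestamp|user|action|target' lines, skipping
    any line that does not fit that layout."""
    entries = []
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 4:
            continue
        try:
            when = datetime.strptime(parts[0], fmt)
        except ValueError:
            continue
        entries.append(LogEntry(when, parts[1].upper(), parts[2], parts[3]))
    return entries


def timeline_for(entries: Iterable[LogEntry], user: str) -> List[LogEntry]:
    """One user's entries in chronological order (user IDs compared case-insensitively)."""
    return sorted((e for e in entries if e.user == user.upper()), key=lambda e: e.when)


def search_window(timeline: List[LogEntry],
                  margin: timedelta = timedelta(hours=1)) -> Tuple[datetime, datetime]:
    """Padded start and end times to use when searching the user's workstation."""
    if not timeline:
        raise ValueError("no log entries for this user")
    return timeline[0].when - margin, timeline[-1].when + margin
```

The margin is a deliberate choice. The workstation's clock may not agree with the mainframe's, so a padded window is less likely to miss related files.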
ERP Systems
ERP systems are like a modern hybrid of database and mainframe systems, with a dash of web server thrown in. They usually have a database backend that users cannot reach directly, so users work through some type of web interface instead. This web interface applies the same kinds of input restrictions and access controls you’d see in a mainframe system, but with a few key differences. For instance, the reporting facilities on an ERP system are generally more flexible than those on a mainframe. That flexibility is convenient for end users, who may need to create many types of reports, but it complicates the investigator’s job, because there is no single location or report format to look for.
At the risk of sounding like a broken record, remember that if you need to investigate something involving one of these systems, talk to the custodian of the system first. Even if the ERP system is running on software you have used, some integration details can make the system totally different from what you have seen in the past. Typically, however, the reports generated by these systems are a bit more user-friendly than reports from a mainframe or relational database. This is good news for the investigator because it usually means that the filenames will be similar and you can use file header information as a search term to find a file on the hard drive.
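A header-based search might look like the sketch below. It reads only the first few kilobytes of each file and looks for a known header string, so renamed or extension-less reports are still found. The header text is an assumption: take it from a known-good report that the custodian generates. "Customer Master Report" in the usage note is hypothetical. The sketch checks UTF-16LE as well as UTF-8 because Windows applications often save Unicode text in UTF-16LE.

```python
import os
from typing import Iterable, List


def files_with_header(root: str, header: str, max_bytes: int = 4096,
                      encodings: Iterable[str] = ("utf-8", "utf-16-le")) -> List[str]:
    """Return files under root whose first max_bytes contain header in any
    of the given encodings, regardless of file extension."""
    needles = [header.encode(enc) for enc in encodings]
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    head = handle.read(max_bytes)
            except OSError:
                continue  # unreadable entry; note it separately
            if any(needle in head for needle in needles):
                hits.append(path)
    return sorted(hits)
```

For example, `files_with_header("/mnt/evidence", "Customer Master Report")` lists every file on the mounted image that begins like the hypothetical report.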
In addition to the server-side log files, these systems open another avenue of audit because they are web-based: the web browser history and cache. Even if a suspect tries to cover his tracks by deleting a report after copying it, he may forget about the Internet history and cache. As discussed in Chapter 9, you can use these artifacts to re-create the web pages the user accessed. With this data, you can often show not only what data was taken but also the steps the individual took to reach it, demonstrating that the access was deliberate and not the result of an innocent mistake such as clicking the wrong button.
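The text does not specify a browser, so this sketch assumes a Chromium-based browser such as Google Chrome. Its History file is an SQLite database with a urls table and a visits table, and visit_time counts microseconds since January 1, 1601 (UTC). Schemas change across browser versions, so check the layout of the file you are examining. Query a copy of the History file taken from your forensic image, not the original. The host fragment you pass in, such as the ERP server's name, comes from your investigation.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Chromium timestamps count microseconds from this epoch.
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_utc(microseconds: int) -> datetime:
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def visits_to_host(history_db: str, host_fragment: str):
    """Return (url, title, visit time in UTC) for every visit whose URL
    contains host_fragment, oldest first. Opens the database read-only."""
    uri = Path(history_db).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            """SELECT urls.url, urls.title, visits.visit_time
               FROM visits JOIN urls ON visits.url = urls.id
               WHERE urls.url LIKE ?
               ORDER BY visits.visit_time""",
            (f"%{host_fragment}%",),
        ).fetchall()
    finally:
        con.close()
    return [(url, title, webkit_to_utc(t)) for url, title, t in rows]
```

Each visit to a report page, placed in time order, is a step in the sequence described above. Compare those times with the server-side logs.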
The Data Was Accessed. Now What?
After you’ve found the reports, correlated the logs, and reconstructed the dump files and Internet cache, you may come to the unsettling conclusion that customer data was in fact accessed with the intent of copying it from the machine. If you do find these electronic reports, you’ll need to determine whether and how the data was removed from the computer. Treat the reports as proprietary information: once you identify the files that contain the data, you can use the techniques discussed in this chapter to determine how they were transmitted and where they were copied.
One last note before leaving the topic of dump files and reports: these files are digital evidence, just like anything else. Make sure that you have forensic copies of the data and that a proper chain of custody is initiated. If the reports are stored on a file share, for instance, make your copies with a tool that preserves file metadata, such as Robocopy. When it’s not possible to image or copy the data forensically, do the best you can to collect the records as they would be obtained in the normal course of business, and have the data custodian validate your process. If tape backups of the system exist, make sure the backups covering the relevant timeframe are preserved until the matter has been resolved.
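Whatever tool makes the copy, a hash manifest gives you a record that the copy matches the source, which supports the chain of custody. The sketch below uses Python's hashlib to hash every file in the source and copy trees, writes a CSV manifest, and returns the paths that are missing or differ. It verifies file contents only, not timestamps or other metadata. Write the manifest outside both trees so that it is not hashed itself. The manifest's column names are an assumption.

```python
import csv
import hashlib
import os
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large reports need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: str) -> Dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            hashes[os.path.relpath(full, root)] = sha256_of(full)
    return hashes


def verify_copy(source_root: str, copy_root: str, manifest_path: str) -> List[str]:
    """Write a manifest comparing both trees; return paths that do not match."""
    source, copy = hash_tree(source_root), hash_tree(copy_root)
    problems = []
    with open(manifest_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["relative_path", "source_sha256", "copy_sha256", "match"])
        for rel in sorted(set(source) | set(copy)):
            src_hash, copy_hash = source.get(rel, ""), copy.get(rel, "")
            ok = bool(src_hash) and src_hash == copy_hash
            writer.writerow([rel, src_hash, copy_hash, ok])
            if not ok:
                problems.append(rel)
    return problems
```

An empty return value means every file was copied intact. Keep the manifest with your chain-of-custody paperwork.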
Technology
Businesses and individuals use technology to gain advantages over their competitors. Technology drives how we communicate (via e-mail, telecommunications, and so on); how we create, process, disseminate, and store information (in word processing documents, spreadsheets, graphical presentations, and so on); how we manage aspects of our businesses (such as accounting or inventory); and how efficiently we conduct business.
Through proprietary software code, computer programs, and specialized databases and spreadsheet applications, among other items, businesses build tailored and valuable tools, such as specialized software and pricing models, that promote efficiency and enhance productivity. These tools also create distinct competitive advantages. As with other forms of IP, technology often has significant value not only to the company but also to its competitors and to others who may have similar uses for it in different applications.
What to Understand
When technology theft is suspected, as with most forms of IP theft, your first step as investigator is to understand in what form the technology exists. Unlike customer data and other types of IP that may exist in hard-copy form, technology derives its value to the company from its electronic form. In addition, whether it is source code, a web-based program, or a specialized computer application, the type of technology will dictate how the company maintains it, where the access points are at which someone could have misappropriated it, and how it was misappropriated. For example, the theft of source code implies a level of sophistication that most employees in a typical company do not possess. In contrast, a specialized pricing model in spreadsheet format may be widely distributed electronically throughout an organization, reaching sales agents and others in multiple regions, cities, or countries.
Unlike some forms of IP, technology is typically stolen in electronic form. That means someone had to gain access to it, copy it to a computer hard drive or an external or removable storage device, and possibly transmit it via e-mail or the Web, all of which can leave forensic evidence.
What to Look For
Detecting source code and program theft is a topic worthy of an entire book (and, in fact, multiple books already exist). If you write code, read on. If you do not, you may want to refer this issue to someone who has experience with large-scale software and knows how to handle change-management systems and tools. That said, here are a couple of places to start.
Copying Source Code
The simplest way source code can be stolen is by cut-and-paste. A function, a module, or an entire routine can be cut from one program and copied into another. If you have a piece of software with hundreds of thousands of lines of code, finding bits of code cut and pasted into another program can be a daunting task. Thankfully, some developer tools that have been around for decades can help you out.
Finding Cuts-and-Pastes
Your first task is a simple filename and hash comparison. It’s not uncommon for someone to copy code files verbatim and drop them into their own program. In fact, academic studies have found that a simple filename comparison can identify copied code in 60 percent of the cases studied. The task is relatively simple: create a hash set of the known source code files and apply it to the code you want to check. Any match is a file copied verbatim from the known source. Next, confirm that the matching files aren’t public-domain or compiler-supplied files, which could legitimately appear in both programs. If you don’t get any matches on custom code, move on to the next step.
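Here is a minimal sketch of that filename and hash comparison, assuming both source trees are available as ordinary directories. It uses SHA-256, although any strong hash would do. It skips zero-length files, which would otherwise all match one another. Files with matching hashes are verbatim copies. Files that share only a name are leads for the content comparisons that follow.

```python
import hashlib
import os
from typing import Dict, List, Set, Tuple


def file_hash(path: str) -> str:
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def index_tree(root: str) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Map hash -> relative paths, plus the set of lowercase filenames."""
    by_hash: Dict[str, List[str]] = {}
    names: Set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            names.add(name.lower())
            if os.path.getsize(full) == 0:
                continue  # empty files would all share one hash
            by_hash.setdefault(file_hash(full), []).append(os.path.relpath(full, root))
    return by_hash, names


def compare_trees(known_root: str, suspect_root: str):
    """Return (hash_hits, name_hits) for the suspect tree.
    hash_hits: (suspect path, [matching known paths]) for identical content.
    name_hits: suspect paths that share only a filename with the known tree."""
    known_hashes, known_names = index_tree(known_root)
    hash_hits, name_hits = [], []
    for dirpath, _dirnames, filenames in os.walk(suspect_root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, suspect_root)
            digest = file_hash(full) if os.path.getsize(full) else None
            if digest in known_hashes:
                hash_hits.append((rel, sorted(known_hashes[digest])))
            elif name.lower() in known_names:
                name_hits.append(rel)
    return sorted(hash_hits), sorted(name_hits)
```

Review every hash hit against the public-domain and compiler-supplied files mentioned above before drawing conclusions.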
Performing Content Searches on Source Code
If the source code you are comparing is written in Java, you are in luck. A tool called PMD includes a copy/paste detector (CPD) designed to find cut-and-pasted code in large-scale projects. You can download PMD at http://PMD.sourceforge.net; it is open source and free to use. Simply point it at the two source trees and watch it go.
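If the source code isn’t Java-based, the situation gets trickier. We typically use the UNIX utilities CMP and DIFF. CMP is a standard UNIX utility that compares two files byte by byte and reports where they differ: by default, the position of the first difference, or with the -l option, every differing byte. That makes it more informative than a hash, which tells you only whether two files are bit-for-bit identical. CMP works best for complete files that may have been only slightly modified in place. An insertion or deletion shifts every byte that follows it, so DIFF is the better tool in that case.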
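The following Python sketch of a CMP-style byte-by-byte comparison shows what such a comparison tells you that a hash does not. It does not reimplement CMP's options. It reports the first differing byte (numbered from 1, as CMP does), how many overlapping bytes differ, and the difference in length. It reads both files into memory, which is fine for source files.

```python
from typing import NamedTuple, Optional


class ByteComparison(NamedTuple):
    first_difference: Optional[int]  # 1-based byte number; None if identical
    differing_bytes: int             # differing positions within the overlap
    length_difference: int           # len(b) - len(a)


def compare_byte_strings(a: bytes, b: bytes) -> ByteComparison:
    first = None
    differing = 0
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            differing += 1
            if first is None:
                first = offset + 1  # number bytes from 1, like CMP
    if first is None and len(a) != len(b):
        # No difference in the overlap, so one file is a prefix of the other.
        first = min(len(a), len(b)) + 1
    return ByteComparison(first, differing, len(b) - len(a))


def compare_files(path_a: str, path_b: str) -> ByteComparison:
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return compare_byte_strings(fa.read(), fb.read())
```

A small differing_bytes count with no length difference points to in-place edits. A large count after a single early difference usually means something was inserted or deleted, and DIFF will show it better.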
If the source code is embedded in other files and you can’t do a file-by-file analysis, the situation is trickier still. Be prepared, because this will take time. The best way to review embedded source code is to use DIFF. Although DIFF was written to find differences in source code, you can also use it to review similarities. You can run DIFF in myriad ways, varying what it looks for and what it outputs, but we generally suggest running it with its defaults and then reviewing the results for similarities.
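Python's standard difflib module offers a similarity-oriented alternative to reading DIFF output. Its SequenceMatcher.get_matching_blocks() method lists the runs two sequences share. This sketch uses difflib in place of DIFF. It strips leading and trailing whitespace from each line, an assumption meant to catch re-indented pastes, and it reports shared runs of at least min_lines lines, with line numbers counted from 1. Treat the threshold as a tuning knob, because closing braces and blank lines match trivially and make very short runs noisy.

```python
import difflib
from typing import List, Sequence, Tuple


def shared_runs(lines_a: Sequence[str], lines_b: Sequence[str],
                min_lines: int = 3) -> List[Tuple[int, int, int]]:
    """Return (line_in_a, line_in_b, length) for each run of at least
    min_lines identical lines, after stripping surrounding whitespace."""
    a = [line.strip() for line in lines_a]
    b = [line.strip() for line in lines_b]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(block.a + 1, block.b + 1, block.size)
            for block in matcher.get_matching_blocks()
            if block.size >= max(1, min_lines)]


def shared_runs_in_files(path_a: str, path_b: str, min_lines: int = 3):
    with open(path_a, encoding="utf-8", errors="replace") as fa, \
         open(path_b, encoding="utf-8", errors="replace") as fb:
        return shared_runs(fa.read().splitlines(), fb.read().splitlines(), min_lines)
```

One design caveat: the matching blocks SequenceMatcher returns must appear in the same order in both files. If copied functions were rearranged, run the comparison on smaller pieces, such as one function at a time, so the reordering does not hide matches.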
If you are running Windows and want access to the UNIX utilities, we recommend installing Cygwin, which provides a bash shell from which you can run these applications at the command line.
Comparing Without Source Code
If you don’t have access to both source trees, your job gets considerably more difficult. First (and this sounds obvious), ask for the source code. If you can’t get it, or a suspected code thief refuses to give it to your client, you can still help by giving your client enough evidence to compel the other side to produce the code. By using screenshots to show how layouts and other information are similar, you can build a compelling argument that some source code similarity is reasonably likely to exist.
Hacking Exposed Page 35