Standards Need to Emerge for Collecting and Processing Electronically Stored Evidence (ESE)

Most litigators and their litigation support staff who have been practicing over the past 5-10 years could probably teach a class on the process of preservation, collection, processing, review and production of paper evidence. Or, at the very least, they could stand at a whiteboard and draw a workflow diagram of the basic steps.

However, with the dramatic and accelerating increase in the amount of Electronically Stored Information (ESI), which I like to call Electronically Stored Evidence (ESE), the attendant technical issues and the associated changes to the Federal Rules of Civil Procedure (FRCP), very few, if any, of those same litigators and their staff can now describe even the most basic workflow for getting ESE ready for trial. Therefore, although many (myself included) are talking about their importance, eDiscovery standards of any substance are a long way off.

This is certainly not the fault of the lawyers, as they have never been required to have much of a true understanding of the technology of processing evidence in order to be successful litigators. However, the bar has now been raised, and litigators cannot even provide adequate representation without an in-depth understanding of these new issues.

Maybe we should consider requiring a license or some type of certification to practice law when eDiscovery is involved? Or has ESE become so intertwined in our matters that there isn’t a case without eDiscovery, and therefore every lawyer who wants to litigate anything should have to be certified?

As a place to start this discussion and debate, we need to begin identifying the basic components of ESE: how it is stored, how to preserve it, how to extract it (the new word for collection), how to process it, how to review it, and how to produce it.

Wouldn’t it be great if, 5 years from now, litigators could stand at a whiteboard and diagram and explain the basic “standard” components of the workflow for processing Electronically Stored Evidence (ESE)?

Eric P. Blank addresses these issues in an excellent article titled “The Need for E-Discovery Standards: A Call From the Trenches,” posted on October 5, 2009 on the EDD Update Blog.

Eric P. Blank is the founder and managing attorney of Blank Law + Technology PS. His practice focuses on electronic discovery counseling, e-security response planning and implementation, investigations and computer forensics. Mr. Blank has conducted more than 300 investigations into computer and software-related torts and employee misconduct since 2001 and has frequently been a court-appointed special master or neutral in e-discovery matters.

The full text of Mr. Blank’s post is as follows:

Most discussion about standards in electronic discovery focuses on the big-picture issues of scope, cost and cost shifting.

These are important questions eloquently argued in the courts. However, they overlook the mundane, pick-and-shovel e-discovery concerns that affect every case. I’m talking about the elementary technical issues of preservation, extraction, processing, review and production.

I’m talking about extracting data from electronic storage media, processing the data and its metadata into a document review software application platform, supporting the review and producing the data as discovery or evidence.

Outside the e-discovery world, the first stage of this process is known as Extract, Transform, Load (ETL). Identifying and overcoming the challenges of ETL have occupied computer scientists for decades. Principal obstacles to effective ETL include widely diverse and poorly documented storage repositories, asynchronous multimedia platforms, constantly evolving software, hardware and software anomalies, and human error, usually with respect to initial planning.
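To make the ETL stages concrete, here is a minimal sketch (not from Mr. Blank’s post) of an extract-transform-load pass over a file repository. The function names and the SQLite schema are illustrative assumptions; a real e-discovery pipeline would handle far more metadata and error cases.

```python
import hashlib
import os
import sqlite3


def extract(root):
    """Walk a storage repository and yield file paths (Extract)."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def transform(path):
    """Normalize each item into a record with basic metadata (Transform)."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "path": path,
        "size": len(data),
        # Hash supports de-duplication and later authentication of the item.
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def load(records, db_path):
    """Insert the normalized records into a review database (Load)."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS items (path TEXT, size INTEGER, sha1 TEXT)"
    )
    con.executemany("INSERT INTO items VALUES (:path, :size, :sha1)", records)
    con.commit()
    return con
```

Even this toy version surfaces the obstacles listed above: `extract` assumes a readable filesystem, `transform` assumes files fit in memory, and `load` assumes one schema fits every file type.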

E-discovery vendors on the ground face those obstacles and more. Consider, as just a few of many examples, the following:

Mobile phones and PDAs: In some models, data can be extracted through forensic imaging. In others, such as many of those without SIM cards, data can only be pulled through live file extraction. Click here and here to read my earlier blog posts about the difference between forensic imaging and live file extraction. In any case, the question is this: Should data extraction scope be defined by current technical capabilities, or should there be a single common standard – such as live files only – for those instances when mobile phones and PDAs are subject to e-discovery?

A multitude of file types: Extraction and processing applications address dozens, sometimes hundreds, of file types. These file types are usually associated with, and identified by, a particular file extension, such as .doc or .xls. However, custom extensions are easy to apply – documents I create might have a .epb file extension, for example – and it is also simple to apply a nonstandard extension to a particular file type (e.g., a .doc extension to a PDF file). These are often missed, or improperly processed, by extraction and processing software.

Computer forensics software in the hands of an experienced technician can reveal documents by file type without relying on extension format and such, but doing so is costly and time consuming. What checks should be done for mislabeled or unusual file extensions? When are such checks required?
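The check Mr. Blank describes, identifying a file by its content rather than its extension, rests on “magic bytes” at the start of the file. Below is a hedged sketch with a deliberately tiny signature table (real forensic tools ship libraries of hundreds of signatures); the `SIGNATURES` and `EXPECTED` mappings are illustrative assumptions, not any product’s actual tables.

```python
# Hypothetical, minimal signature table keyed by leading "magic bytes".
SIGNATURES = {
    b"%PDF": "pdf",
    b"\xd0\xcf\x11\xe0": "legacy-office",   # .doc/.xls compound file
    b"PK\x03\x04": "zip",                   # .docx/.xlsx zip container
}

# What content type each extension is expected to carry.
EXPECTED = {
    "pdf": "pdf",
    "doc": "legacy-office",
    "xls": "legacy-office",
    "docx": "zip",
    "xlsx": "zip",
}


def sniff_type(data: bytes) -> str:
    """Identify a file by its leading bytes instead of its extension."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """Flag files whose content signature contradicts their extension,
    e.g. a PDF that has been renamed with a .doc extension."""
    ext = filename.rsplit(".", 1)[-1].lower()
    expected = EXPECTED.get(ext)
    return expected is not None and sniff_type(data) != expected
```

For example, `extension_mismatch("report.doc", b"%PDF-1.4 ...")` flags the PDF masquerading as a Word file, which is exactly the case extension-driven processing software misses.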

Metadata: Most of us think of metadata in basic terms such as the putative author, creation date, modification date, last-access date and so forth. However, metadata varies widely across data types. Microsoft Office documents, for example, have more than 100 metadata fields. It is also possible to create custom fields with many document types. Nearly all of these, such as the ubiquitous P-size and L-size, are nearly never important in civil litigation.

“Nearly never” is not, however, the same as “never.” Such data can be extracted, but as a rule it is not supported by processing software, which renders it unavailable at the attorney review level. Is it possible to agree on which metadata fields should be preserved and processed? When should they be processed? Which fields are important forensically? When should all fields be preserved?
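To see where those Office metadata fields live, consider that a modern .docx file is a zip container whose core fields (author, dates, last modifier) sit in `docProps/core.xml`. The sketch below, my illustration rather than anything from the original post, reads just that core set; the dozens of extended and custom fields Mr. Blank mentions live in other parts of the package.

```python
import xml.etree.ElementTree as ET
import zipfile


def docx_core_metadata(path):
    """Return the core metadata fields of a .docx file as a dict.

    Office stores fields such as creator and lastModifiedBy in the
    docProps/core.xml entry of the zip container.
    """
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("docProps/core.xml")
    root = ET.fromstring(xml_bytes)
    # Strip XML namespaces, keeping {local-name: text}.
    return {el.tag.split("}")[-1]: el.text for el in root}
```

A processing platform has to make exactly the choice Mr. Blank raises: which of these fields to carry through to the review database, and which to drop.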

Rapid technological change: Software is updated all the time. This affects how metadata is produced and the appearance of electronic documents. Processing software hasn’t kept up, and it is also inconsistent. For example, the last-access date on a Word 2007 document running in Windows Vista is affected differently than that of an Office XP Word document running on Vista. Both documents, however, are processed the same, as if the metadata meant the same thing, when it does not. How should inconsistencies like this be addressed? What should the typical approach be?

Webmail: Screenshots of Web-based email services such as Hotmail are a common and inexpensive workaround to downloading actual Hotmail files. Which method is preferred? Is either method not preferred? As third-party cloud data repositories multiply, what constitutes best practices with regard to extraction methods will become a critical question.

Capture rates: What percentage capture rate is acceptable for processing software? Many files are not processed by even the best technology and must be laboriously hand processed. In a million-item processing job, a 1 percent miss rate equals 10,000 documents not processed and therefore unavailable for review. Is 99 percent acceptable? Is 98 percent? Note: If you think that the processing rate for your document review software is 100 percent, you’re kidding yourself.
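The arithmetic behind that point is worth making explicit, since miss rates that sound small become large manual-review queues at scale. A trivial sketch (my helper name, not the post’s):

```python
def missed_items(total_items, capture_rate):
    """Number of items left for manual hand-processing when automated
    processing captures only the given fraction of a job."""
    return round(total_items * (1 - capture_rate))


# In a million-item job, each percentage point of miss rate
# is 10,000 documents that never reach the review platform.
```

So a 99 percent capture rate on a million items still leaves 10,000 documents to be handled by hand, and 98 percent leaves 20,000.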

Searching: Keyword searching, including keyword searches supported by “fuzzy” search techniques, is giving way to conceptual searching, which is the future of document search and review. Conceptual searching, however, involves proprietary algorithms and processes with a wide range of accuracy. What standards must conceptual searching meet to be accepted? How are these standards applied? When, if ever, is conceptual searching disallowed?
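As a small illustration of the “fuzzy” keyword expansion mentioned above (conceptual search engines are proprietary, so this is only a stand-in using edit-distance similarity from the Python standard library):

```python
import difflib


def fuzzy_hits(term, vocabulary, cutoff=0.8):
    """Return vocabulary words whose string similarity to the search
    term meets the cutoff, approximating a fuzzy keyword expansion.
    The cutoff choice directly changes recall, which is one reason
    search accuracy varies so widely between tools."""
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)
```

A search for "contract" would also surface near-misses such as "contracts", while a conceptual engine would go further and match semantically related documents that share no keywords at all.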

File format: In e-discovery today, most documents are produced in .Tiff format. Putting aside the larger question of whether .Tiff should be the standard for producing electronic documents, what about documents such as spreadsheets that don’t translate well into .Tiff files? In what format should presentation-type documents be produced? As slide shows? As workbook copies with notes and presenters’ comments? How are native files to be tracked and authenticated as a best practice?

Today, e-discovery consultants decide many of these questions on their own or after consulting with litigation counsel. In essence, a consultant decides when it is and isn’t practical to extract files from a system, whether to image a particular hard drive and whether to put aside as unreadable a back-up tape from a set of tapes that must be searched.

Much of the time, the consultant makes the “right” decision, as subsequently decided by the court, the client or the opposing party. It’s a rare consultant, however, who won’t admit that adopting e-discovery standards would bring enormous benefit to the practical challenges of data extraction, processing and production.

I’ll be discussing these and other issues in the future. Any of the problems mentioned above could be an entire article. I look forward to working with the legal and technical community to address these “technical” standards – as opposed to the widely discussed “strategic” standards which may ultimately be addressed by changes in the Federal Rules of Civil Procedure.

About Charles Skamser
Charles Skamser is an internationally recognized technology sales, marketing and product management leader with over 25 years of experience in Information Governance, eDiscovery, Machine Learning, Computer Assisted Analytics, Cloud Computing, Big Data Analytics, IT Automation and ITOA. Charles is the founder and Senior Analyst for eDiscovery Solutions Group, a global provider of information management consulting, market intelligence and advisory services specializing in information governance, eDiscovery, Big Data analytics and cloud computing solutions. Previously, Charles served in various executive roles with disruptive technology start-ups and well-known industry technology providers. Charles is a prolific author and a regular speaker on the technology that the Global 2000 require to manage the accelerating increase in Electronically Stored Information (ESI). Charles holds a BA in Political Science and Economics from Macalester College.