Commonly referred to as a "digital fingerprint," a hash value is a fixed-length code computed from the contents of a computer file. A hash value gives a digital file an identifier that corresponds to its contents and is, for practical purposes, unique. If the contents change, the file's hash value changes as well, indicating that the file is no longer the same as it was before. In e-discovery, one can compare hash values computed before and after collection to verify that a file was not altered in the process.
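To understand how MD5 hashing relates to e-discovery, one must first know what a computer hash is. A hash function is an algorithm (not encryption, since its output cannot be reversed to recover the file) that takes the bits of a file and outputs a short, fixed-length text string that is, for practical purposes, unique to those contents.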
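As a minimal sketch of this idea, the following Python computes the MD5 hash of a file with the standard library's hashlib module. The function name md5_of_file and the chunk size are illustrative choices, not part of any particular e-discovery tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hash of a file's contents as a 32-character hex string.

    md5_of_file and chunk_size are illustrative names, not a standard API.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file twice on the same unmodified file returns the same string each time, which is what makes the value usable as a fingerprint.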
Many hash algorithms have been created over the years, but the one most commonly used in e-discovery today is MD5 ("MD" being short for "message digest"). An MD5 hash value might look something like:
a558c8b8295854fa69a2ad9a7cc75ab7
While the above sequence might look like a random assortment of letters and numbers, it is in fact a 32-character hexadecimal value (128 bits) that represents the contents of a single computer file. If even one character of the data in a file is modified or deleted, its MD5 hash value will be completely different from the original. If a file is defensibly collected and processed, its hash value will not change, even if the file name has been modified, because the name is not part of the hashed contents.
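Both behaviors described above can be checked with a short Python sketch. The sample text, the file names, and the helper md5_hex are made up for illustration; only hashlib, os, and tempfile from the standard library are used.

```python
import hashlib
import os
import tempfile


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of a byte string as hex (illustrative helper)."""
    return hashlib.md5(data).hexdigest()


original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox jumps over the lazy cog"  # one character changed

original_hash = md5_hex(original)
modified_hash = md5_hex(modified)
print(original_hash)
print(modified_hash)
print("same hash after edit:", original_hash == modified_hash)

# Renaming a file leaves its contents, and therefore its hash, untouched.
with tempfile.TemporaryDirectory() as folder:
    old_path = os.path.join(folder, "memo.txt")
    new_path = os.path.join(folder, "memo_renamed.txt")
    with open(old_path, "wb") as handle:
        handle.write(original)
    with open(old_path, "rb") as handle:
        hash_before_rename = md5_hex(handle.read())
    os.rename(old_path, new_path)
    with open(new_path, "rb") as handle:
        hash_after_rename = md5_hex(handle.read())

print("same hash after rename:", hash_before_rename == hash_after_rename)
```

The two printed hashes share no obvious pattern even though the inputs differ by a single letter, while the rename comparison reports a match.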
Why do hash codes matter for e-discovery?
- Data Integrity: Computing an MD5 hash for each document ensures that any change to the document produces a different hash value, thus exposing attempts to manipulate potentially relevant evidence.
- De-Duplication: Accurate MD5 hashing in the collection and processing phases of e-discovery allows duplicate files and known system files to be identified and removed, which in turn lowers e-discovery costs by reducing data volumes in advance of attorney review.
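The two uses listed above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how any specific processing platform works: file contents are represented as in-memory byte strings keyed by made-up file names, and the helpers changed_items and group_duplicates are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of a byte string as hex (illustrative helper)."""
    return hashlib.md5(data).hexdigest()


def changed_items(recorded_hashes, current_contents):
    """Return names whose current hash differs from the hash recorded earlier."""
    return sorted(
        name
        for name, data in current_contents.items()
        if recorded_hashes.get(name) != md5_hex(data)
    )


def group_duplicates(contents):
    """Group names by content hash, keeping only groups with more than one file."""
    groups = defaultdict(list)
    for name, data in contents.items():
        groups[md5_hex(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical collection: two files with identical contents, one distinct file.
collected = {
    "contract_v1.docx": b"Agreement between A and B",
    "copy_of_contract.docx": b"Agreement between A and B",
    "notes.txt": b"Call counsel on Monday",
}

# Hashes recorded at collection time.
recorded = {name: md5_hex(data) for name, data in collected.items()}

# Simulate one file being altered after collection.
after_processing = dict(collected)
after_processing["notes.txt"] = b"Call counsel on Tuesday"

print(changed_items(recorded, after_processing))  # ['notes.txt']
print(group_duplicates(collected))  # [['contract_v1.docx', 'copy_of_contract.docx']]
```

In this sketch only one copy from each duplicate group would need to go to review, and any name returned by changed_items flags a file whose contents no longer match what was collected.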
How does FRE 902 account for hash codes?
The Federal Rules of Evidence were amended in 2017 to recognize this practical (and cost-effective) means of validating defensible e-discovery collection. FRE 902 governs types of evidence that are self-authenticating. It covers government documents, certified public or business records, and newspapers, relieving attorneys of the need to authenticate these types of documents in court with expert testimony.
Two categories of electronically stored information (ESI) qualify as self-authenticating as well. In the past, attorneys needed to call qualified witnesses to authenticate ESI, but provisions (13) and (14) make it easier for litigators to authenticate ESI using hash values such as MD5.
(13) Certified Records Generated by an Electronic Process or System. A record generated by an electronic process or system that produces an accurate result, as shown by a certification of a qualified person that complies with the certification requirements of Rule 902(11) or (12). The proponent must also meet the notice requirements of Rule 902(11).
(14) Certified Data Copied from an Electronic Device, Storage Medium, or File. Data copied from an electronic device, storage medium, or file, if authenticated by a process of digital identification, as shown by a certification of a qualified person that complies with the certification requirements of Rule 902(11) or (12). The proponent also must meet the notice requirements of Rule 902(11).
The courts' adoption of practical standards of verification reflects their increasing commitment, spelled out in the 2015 Amendments to the FRCP, to achieving the "just, speedy, and inexpensive determination" of civil matters, and bodes well for e-discovery and IT professionals looking for common-sense ways to demonstrate that their e-discovery collection processes are defensible without relying on expensive expert witnesses.
If you want to understand how digital forensics can play an even more important role for your organization, check out our recent report co-sponsored by EDRM, Internal Investigations Benchmarking Report.