August 16, 2017

Court Permits Combination of Predictive Coding and Keyword Search

Focusing on precision rather than recall, district court finds that process complies with discovery obligations.

On April 18, the U.S. District Court for the Northern District of Indiana issued a discovery order in In re Biomet M2a Magnum Hip Implant Products Liability Litigation,[1] finding that defendant Biomet's discovery process, which combined keyword search with predictive coding, fulfilled its discovery obligations. However, the court accepted Biomet's reliance on precision measurements rather than recall measurements, leading to a potentially substantial underestimation of the proportion of relevant documents that Biomet did not produce.


In response to the plaintiffs' discovery demands, Biomet collected 6 terabytes of data and filtered the resulting 19.5 million documents with keyword searches to identify approximately 3 million documents for review.[2] Biomet then performed a predictive coding review on these 3 million records to identify documents for production. The plaintiffs objected to this approach, arguing that Biomet should have applied predictive coding to all 19.5 million documents and should be required to do so to find any remaining relevant documents. The plaintiffs alleged that the use of keywords before applying predictive coding polluted the results of the process. The plaintiffs also argued that Biomet should have allowed them to participate in a joint review of the documents used to train the predictive coding software. Biomet did offer the plaintiffs the opportunity to propose additional keyword searches and invited them to review samples of the output of the predictive coding system.

Court's Opinion and Biomet's Statistical Claim

The court rejected the plaintiffs' arguments, focusing its analysis on whether Biomet had satisfied its obligations under Federal Rules of Civil Procedure 26(b) and 34(b)(2) and the Seventh Circuit Principles Relating to the Discovery of Electronically Stored Information. The court found nothing in the duty of cooperation that requires the parties to jointly review data. It also dismissed the plaintiffs' argument that limiting the document population with keywords before applying predictive coding necessarily diluted the value of the latter process. Finally, the court considered the cost of the review of all 19.5 million documents that the plaintiffs proposed, finding that the cost was not proportional to the "comparatively modest" increase in relevant documents that would be found, based on the statistical testing performed by Biomet.[3]

Biomet's brief in support of its process was the source of the statistical claim that only 0.94% of documents not hit by its keyword searches were relevant. Its expert characterized this as a "very low number of potentially responsive documents" missed compared with the 16% relevance of the keyword search results, a characterization the court echoed in its order. While the 0.94% figure is small when measured against the 16% relevance of the keyword search results, it represents a much larger number of actual documents than the percentages seem to indicate, because it applies to the far larger set of documents that the keywords did not hit. Biomet's measurement showing 0.94% relevance equates to approximately 86,000–210,000 missed responsive documents. Compared with the approximately 180,000–230,000 relevant documents the keywords did retrieve, the keyword searches potentially excluded more responsive documents than they retrieved.
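The arithmetic behind that comparison can be sketched in a few lines of Python. This is an illustration only, not a calculation that appears in the court's order or Biomet's brief. The two document-count ranges are the ones quoted above; the function name `implied_recall` and the size of the unsearched set (taken here as 19.5 million minus 3 million, ignoring de-duplication) are assumptions introduced for the example.

```python
def implied_recall(found_relevant: float, missed_relevant: float) -> float:
    """Recall = relevant documents found / all relevant documents (found + missed)."""
    return found_relevant / (found_relevant + missed_relevant)


# Ranges quoted in the article.
MISSED_LOW, MISSED_HIGH = 86_000, 210_000  # relevant documents the keywords missed
FOUND_LOW, FOUND_HIGH = 180_000, 230_000   # relevant documents the keywords retrieved

# Pair opposite ends of the two ranges to bracket the implied keyword recall.
worst_case = implied_recall(FOUND_LOW, MISSED_HIGH)
best_case = implied_recall(FOUND_HIGH, MISSED_LOW)

# Assumption (not stated in the article): the set the keywords did not hit is
# roughly 19.5 million - 3 million documents, ignoring de-duplication.
ASSUMED_NULL_SET = 19_500_000 - 3_000_000
point_estimate_missed = round(0.0094 * ASSUMED_NULL_SET)

print(f"implied keyword recall: {worst_case:.0%} to {best_case:.0%}")
print(f"0.94% of {ASSUMED_NULL_SET:,} documents = {point_estimate_missed:,}")
```

Under these assumptions the keyword step alone would have captured somewhere between roughly 46% and 73% of the relevant documents, and the point estimate of about 155,000 missed documents falls inside the 86,000–210,000 range quoted above.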


Courts continue to issue orders and opinions allowing (and occasionally requiring) the use of predictive coding as a means of reducing the cost of discovery. The court in Biomet accepted the notion that predictive coding is a reasonable method by which a party may meet its discovery obligations and that cost shifting can be an appropriate means of addressing proportionality concerns. It made clear that cooperation does not require complying with the requesting party's demand for a specific process, and it was also not convinced that keyword search and predictive coding cannot be used together, as the plaintiffs argued.

It is clear, however, that the court did not base its reasonableness assessment on a measure of the level of recall[4] of Biomet's process. Instead, it focused on comparative costs and Biomet's assertions that the keyword search results had a greater proportion of relevant documents than the documents that were not hit by the keyword searches. This focus on precision rather than recall led the court to approve Biomet's process, which may well have left behind more relevant documents than it found.
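Why a favorable precision comparison says little about recall can be shown with a small Python sketch. The counts below are hypothetical and chosen only to mirror the rough proportions discussed in this article (about 16% relevance among keyword hits, about 1% among the rest); they are not Biomet's data. The class name `SearchOutcome` and the label "elusion" for the relevance rate of the unretrieved set are introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchOutcome:
    """Document counts for one search; every value used below is hypothetical."""
    hits_relevant: int  # relevant documents among the keyword hits
    hits_total: int     # all documents the keywords hit
    null_relevant: int  # relevant documents the keywords did not hit
    null_total: int     # all documents the keywords did not hit

    @property
    def precision(self) -> float:
        # Share of retrieved documents that are relevant.
        return self.hits_relevant / self.hits_total

    @property
    def elusion(self) -> float:
        # Share of unretrieved documents that are relevant.
        return self.null_relevant / self.null_total

    @property
    def recall(self) -> float:
        # Share of all relevant documents that were retrieved.
        return self.hits_relevant / (self.hits_relevant + self.null_relevant)


# Hypothetical example: a small, rich hit set and a large, sparse remainder.
example = SearchOutcome(hits_relevant=160, hits_total=1_000,
                        null_relevant=200, null_total=20_000)

print(f"precision {example.precision:.1%}, "
      f"elusion {example.elusion:.1%}, "
      f"recall {example.recall:.1%}")
```

In this hypothetical, the hit set is sixteen times richer in relevant documents than the remainder, yet the search still retrieves fewer than half of the relevant documents, because the remainder is twenty times larger.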

It is critical to remember that the standards for discovery are reasonableness and proportionality, not perfection. Courts' rules do not require 100% recall of relevant documents, but producing parties should not rely solely on the type of comparative precision measurements that the court accepted in Biomet. They should instead focus on achieving reasonable recall rates while defensibly managing costs and risks given the specifics of each case. Strategies to achieve this may include limiting the scope of collection, applying keyword searches, using predictive coding, and employing other methods depending on the matter.

[1] In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig., No. 3:12-MD-2391 (N.D. Ind. Apr. 18, 2013) (order regarding discovery of ESI), available here.

[2] Biomet also used de-duplication to reduce the number of documents for review.

[3] Biomet order, supra note 1, at 5.

[4] Recall is the proportion of all relevant documents in the population being searched that a search actually retrieves. A related measure, precision, is the proportion of documents retrieved by a given search that ultimately prove to be relevant.

Copyright © 2017 by Morgan, Lewis & Bockius LLP. All Rights Reserved.


About this Author

Stephanie Tess Blair, Morgan Lewis Law firm, eData Attorney

Stephanie A. "Tess" Blair is a partner in and leader of Morgan Lewis's eData Practice. As leader of the eData Practice, Ms. Blair works with Morgan Lewis attorneys and clients to develop and implement strategies and cutting-edge technologies for successfully managing complex litigation matters, with an emphasis on electronic discovery. As a nationally recognized thought leader in electronic discovery, Ms. Blair has developed industry-leading “best practices” designed to provide clients with state-of-the-art records and discovery management, knowledge sharing, and collaboration resources....

Graham B. Rollins, Business Attorney, Morgan Lewis Law Firm

Graham B. Rollins is an associate in Morgan Lewis's eData practice. eData is an innovative practice founded by Morgan Lewis to address the impact of growing volumes of electronic data on business and legal strategies. The practice works with in-house Legal, IT, and Records Management departments as well as outside technology vendors to apply best practices to help clients engage in effective pre-litigation risk management and defensible discovery response.

The eData Practice plays a large role in defending clients in corporate matters, including complex commercial litigation such as intellectual property litigation, products liability, mass torts, antitrust, M&A, regulatory, white collar, compliance, construction and insurance coverage.

Mr. Rollins focuses his practice on developing discovery management strategies for clients and firm attorneys, with an emphasis on electronic discovery. He provides advice on all phases of discovery, including pre-litigation discovery counseling regarding information management and records retention policies, and discovery response planning and execution for all phases of discovery.