Plaintiffs and Courts Are Increasingly Adopting Predictive Coding Because of Its Reliability: Should Your Company Consider It for Your Litigation?

We’ve been hearing a lot about predictive coding from the Sedona Conference, the Da Silva Moore case, and articles by predictive coding vendors describing how the technology works and what it offers. So is predictive coding a win-win for everyone? The Plaintiffs’ Bar certainly wants to use predictive coding to find responsive documents that keyword and Boolean searches often miss; corporate counsel are attracted to the cost savings that come from reducing the number of non-responsive documents that must be reviewed manually; and litigation support companies undoubtedly have a financial interest in having both sides use their various predictive coding technologies and services. It is easy to see why plaintiffs and investigators would favor predictive coding, given its purported accuracy in identifying responsive documents, in complex litigation or white-collar investigations.

Predictive coding is more relevant now than ever because of the enormous amount of electronically stored information generated every day. As companies’ technological capabilities increase exponentially, so does the amount of data they create and must store and maintain. To put this in perspective for a single large business: in 2012, Walmart collected more than 2.5 petabytes of data every hour, the equivalent of about 20 million filing cabinets’ worth of text.1 In fact, 90% of the data in the world today has been created in the last two years alone.2 Although no clear guidelines have been set regarding the volume of documents needed to justify predictive coding, some experts suggest that it becomes effective and cost-efficient in matters involving approximately 75,000 documents,3 and others put the threshold at 100,000 documents or more.4

One reason predictive coding has been adopted slowly is that the parties involved often fear the unknown and are unwilling to experiment with new “black box” technology in litigation large enough to require and justify it. Companies that sell predictive coding software have worked hard to dispel the fear that predictive coding eliminates review by human attorneys. Rest assured, manual review is still required to check for privilege and confidentiality before responsive documents are handed over to the other side. Even with this manual safety net in place, it is critical to have a robust clawback agreement going into any litigation.

Predictive coding claims to improve the accuracy with which relevant documents are identified through the use of sophisticated algorithms. Essentially, a human reviewer, generally an attorney who is intimately familiar with the case, “trains” the computer to find relevant documents by assigning each document in the sample set a score that allows the computer to weigh the responsiveness of other documents more accurately. As new keywords emerge during discovery, users will need to ensure that their predictive coding software can adapt to and integrate keyword changes on an ongoing basis.5 Some predictive coding experts, and even a Virginia court,6 have concluded that predictive coding can find about 75% of relevant and responsive documents, whereas keyword searches find only about 20% and linear human review around 60%.7
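To make the training step concrete, here is a minimal sketch in Python. It is not any vendor’s algorithm (the article does not describe one): it assumes a simple multinomial naive Bayes classifier as a stand-in, simplifies the reviewer’s scores to responsive/non-responsive codes, and uses hypothetical function names (`train`, `responsiveness_score`, `rank_documents`) and made-up sample documents throughout.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(seed_set):
    """Fit a toy multinomial naive Bayes model on attorney-coded seed documents.

    seed_set: list of (text, is_responsive) pairs coded by the reviewing attorney.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_responsive in seed_set:
        word_counts[is_responsive].update(tokenize(text))
        doc_counts[is_responsive] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def responsiveness_score(model, text):
    """Estimated probability (0 to 1) that an unreviewed document is responsive."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in (True, False):
        # Class prior from the seed set, with add-one smoothing.
        lp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training carry no signal
                lp += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_prob[label] = lp
    return _sigmoid(log_prob[True] - log_prob[False])

def rank_documents(model, documents):
    """Order unreviewed documents from most to least likely responsive."""
    return sorted(documents, key=lambda d: responsiveness_score(model, d), reverse=True)

# Hypothetical seed set coded by an attorney familiar with the case.
seed_set = [
    ("hip implant failure reports from surgeons", True),
    ("metal ion levels in implant recipients", True),
    ("holiday party catering invoice", False),
    ("parking garage access badge renewal", False),
]
model = train(seed_set)
unreviewed = ["cafeteria menu for next week", "surgeon complaint about implant loosening"]
for doc in rank_documents(model, unreviewed):
    print(f"{responsiveness_score(model, doc):.2f}  {doc}")
# prints:
# 0.73  surgeon complaint about implant loosening
# 0.50  cafeteria menu for next week
```

The highest-scoring documents would go to attorneys first. A real workflow would also feed newly coded documents back into the seed set and retrain; that loop is omitted here.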

Acceptance of predictive coding has been slow. Until 2012, no court had validated the use of predictive coding in e-discovery. In the landmark case of Da Silva Moore v. Publicis Groupe, e-discovery pioneer Judge Andrew Peck judicially approved predictive coding, or computer-assisted review, for use in appropriate cases to search for relevant electronically stored information (ESI).8 In his opinion, Judge Peck promoted the use of predictive coding in matters “where it will help secure the just, speedy, and inexpensive determination of cases in our e-discovery world.”9 From Da Silva Moore we can extract the criteria used to determine whether a case is appropriate for predictive coding: “(1) the parties’ agreement, (2) the vast amount of ESI to be reviewed (over three million documents), (3) the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches), (4) the need for cost-effectiveness and proportionality under Rule 26(b)(2)(C), and (5) the transparent process proposed by [the parties].”10

Judge Peck explained that keyword searches “are not overly useful,” but that “[keywords] along with predictive coding and other methodology, can be very instructive.”11 One issue that remains unclear is how much transparency will apply to the seed sets, or “training sets,” of documents marked as responsive or non-responsive. Although the documents would not be binding, Judge Peck suggested that defendants would have to disclose their seed set, “including the seed documents marked as nonresponsive to the plaintiff’s counsel” so that plaintiffs can say, “Well, of course you are not getting any [relevant] documents—you’re not appropriately training the computer.”12

Some data suggests that keyword searches by themselves are often ineffective and over-inclusive: they return large numbers of documents that contain the search terms but are irrelevant (false positives), and those documents are very expensive to review manually.13 However, keyword searches still have a place in the discovery process, as parties use keyword searches with connectors “to find documents for the expanded seed set to train the predictive coding software.”14 Keyword searches are also being used to cull the initial universe of documents so that predictive coding can be applied to a more manageable pool.
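The false-positive problem and the culling step can be sketched with a toy Boolean keyword filter and the standard precision and recall measures; recall (the share of relevant documents found) is the measure behind figures such as the 75%/20%/60% comparison cited earlier. The document IDs, texts, query terms, and relevance codes are hypothetical, and `matches` and `precision_recall` are illustrative helpers, not part of any e-discovery product.

```python
import re

def matches(text, any_of=(), all_of=(), none_of=()):
    """Toy Boolean query: (any_of OR ...) AND (all_of AND ...) AND NOT (none_of OR ...).

    An empty any_of places no OR constraint. Terms are matched as whole words.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    if any_of and not any(term in words for term in any_of):
        return False
    if not all(term in words for term in all_of):
        return False
    return not any(term in words for term in none_of)

def precision_recall(hits, relevant):
    """Precision: share of hits that are relevant. Recall: share of relevant documents hit."""
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection and attorney relevance codes.
documents = {
    1: "implant revision surgery report",
    2: "implant recall notice to distributors",
    3: "dental implant coupon for employees",
    4: "implant sales team pizza order",
    5: "patient complaint about hip pain after surgery",
}
relevant = {1, 2, 5}

# Keyword cull: keep only documents that hit the (hypothetical) query.
hits = {doc_id for doc_id, text in documents.items() if matches(text, any_of=("implant", "recall"))}
precision, recall = precision_recall(hits, relevant)
print("false positives:", sorted(hits - relevant))  # irrelevant hits that still need review
print("missed:", sorted(relevant - hits))           # relevant documents the query never found
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints:
# false positives: [3, 4]
# missed: [5]
# precision=0.50 recall=0.67
```

Adding `none_of=("dental",)` would exclude document 3, but no tweak to these terms recovers document 5, which uses none of them; finding documents like that is what predictive coding is said to do better.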

Judge Peck, endorsing predictive coding in appropriate cases, further opined that “what the Bar should take away from this Opinion is that computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review.”15

So what is the predictive coding buzz all about? Although both parties are encouraged to work together throughout the discovery process, the courts have not yet decided whether a producing party must disclose the sample set documents or the responsiveness scores its key reviewers assigned to them. Da Silva Moore did not address whether parties must disclose the seed set documents used to train the predictive coding program, because the defendants volunteered this information.16 Parties have coordinated keyword search methodologies for many years, but transparency about seed sets could give the other side an inside look at how the producing party grades, scores, or weighs the responsiveness of documents in the training set. Recently, in Gordon v. Kaleida Health, plaintiffs relied on Da Silva Moore to seek defendants’ seed set of documents used to “train the computer.”17 Defendants opposed the request, arguing that “ESI production is within the sound discretion of the producing party.”18 The “seed set” issue remains in a holding pattern because the courts have not had to intervene: the parties worked out the issues themselves (defendants in Da Silva Moore volunteered to provide the seed set, and defendants in Gordon agreed to meet and confer with plaintiffs’ experts).

Plaintiffs are recognizing the enhanced reliability of predictive coding technology in complex litigation. In the recent multidistrict litigation In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig.,19 the court considered whether defendants could filter ESI through keyword searches and then apply predictive coding to the remaining data.20 Biomet disregarded plaintiffs’ request not to begin the discovery process and used keyword searches to cull the universe of documents and attachments from 19.5 million to 2.5 million before applying predictive coding to the reduced pool.21 Plaintiffs argued that the keyword searches “tainted” the discovery process and that all of the discarded material therefore had to be reexamined.22 In essence, plaintiffs asserted that the most reliable route to full and accurate disclosure was the “find more like this” approach used to train the predictive coding program.23 Without endorsing predictive coding as fully as the Da Silva Moore court had, the court held that Biomet had fully complied with Federal Rules of Civil Procedure 26(b) and 34(b)(2) and that reexamining all collected documents would be overly burdensome; plaintiffs would therefore bear any costs of retesting the entire pool of documents using predictive coding alone.24 The court further stated that cooperation between parties does not require “counsel from both sides to sit in adjoining seats while rummaging through millions of files that haven’t been reviewed for confidentiality or privilege.”25 Similarly, in Kleen Prods., LLC v. Packaging Corp. of Am.,26 plaintiffs initially demanded the use of predictive coding technology, but after extensive negotiations they consented to standard Boolean searches.27
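Using only the Biomet figures reported above, the scale of the keyword cull works out as follows:

```latex
\text{share removed by keyword cull}
  = \frac{19.5\,\text{M} - 2.5\,\text{M}}{19.5\,\text{M}}
  = \frac{17}{19.5} \approx 0.872
\qquad
\text{share passed to predictive coding}
  = \frac{2.5\,\text{M}}{19.5\,\text{M}} \approx 0.128
```

In other words, roughly 87% of the collection never reached the predictive coding step, and that discarded material is what plaintiffs argued should be reexamined.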

Although predictive coding has grown more popular in the courts since the Da Silva Moore decision, courts are also willing to weigh its costs and benefits against the volume of documents to be reviewed. In EORHB, Inc. v. HOA Holdings, LLC, the Delaware Court of Chancery had initially ordered the parties to show cause why they should not use a single vendor to conduct document review with predictive coding.28 However, the court recently reversed course and entered an order that no longer required plaintiffs to use predictive coding because of the “low volume of relevant documents.”29

Predictive coding is not going away any time soon, particularly now that plaintiffs have joined the charge led by early adopter Judge Andrew Peck. It will be interesting to see how the courts handle open questions such as the transparency of seed sets. Companies facing similar circumstances in discovery should consider using predictive coding in matters involving voluminous amounts of documents (think millions). Doing so can reduce the cost of manual document review by narrowing the documents an attorney must review to those most likely to be relevant. A cost-benefit analysis is also recommended, since predictive coding vendors often charge a premium for their services.
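The cost-benefit analysis recommended above can be framed as simple arithmetic. The sketch below is a hypothetical model, not a pricing method from the article or any vendor: `review_cost`, `predictive_coding_saves_money`, the per-document review rate, the share of documents still needing attorney review, and the flat vendor fee are all illustrative assumptions.

```python
def review_cost(total_docs, share_reviewed, cost_per_doc, vendor_fee=0.0):
    """Attorney review cost plus any flat technology fee (hypothetical model).

    share_reviewed: fraction of the collection that still needs attorney review
    (1.0 means full linear review).
    """
    if not 0.0 <= share_reviewed <= 1.0:
        raise ValueError("share_reviewed must be between 0 and 1")
    return total_docs * share_reviewed * cost_per_doc + vendor_fee

def predictive_coding_saves_money(total_docs, share_reviewed_with_pc, cost_per_doc, vendor_fee):
    """True if predictive-coding-assisted review, including the vendor premium,
    costs less than linear review of every document."""
    linear = review_cost(total_docs, 1.0, cost_per_doc)
    assisted = review_cost(total_docs, share_reviewed_with_pc, cost_per_doc, vendor_fee)
    return assisted < linear

# Illustrative inputs only: $1.00 per document reviewed, 20% of documents still
# reviewed by attorneys, and a $150,000 flat vendor fee. None of these are real quotes.
for volume in (50_000, 1_000_000):
    print(volume, predictive_coding_saves_money(volume, 0.20, 1.00, 150_000))
# prints:
# 50000 False
# 1000000 True
```

With these made-up inputs the break-even point is 150,000 / (0.8 × $1.00) = 187,500 documents; different rates move it substantially, which helps explain why the volume thresholds experts cite vary.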


[1] McAfee, Andrew, and Erik Brynjolfsson, “Big Data: The Management Revolution,” Harvard Business Review, October 2012. Available at <http://hbr.org/2012/10/big-data-themanagement-revolution>. Last accessed July 7, 2013.

[2] IBM, “Big Data at the Speed of Business.” Available at <http://www-01.ibm.com/software/data/bigdata/>. Last accessed July 7, 2013.

[3] Sohn, Edward, “Predictive Coding Today: Before You Jump In, What Should You Consider?” The Metropolitan Corporate Counsel, May 24, 2013. Available at <http://www.metrocorpcounsel.com/articles/23960/predictive-coding-today-youjump-what-should-you-consider>. Last accessed July 7, 2013.

[4] Looby, Joe, “E-Discovery Steps Outside Of The Black Box,” The Metropolitan Corporate Counsel, Nov. 20, 2012. Available at <http://www.metrocorpcounsel.com/articles/21330/e-discovery-steps-outside-black-box>. Last accessed July 7, 2013.

[5] Sohn, Edward, “Predictive Coding Today: Before You Jump In, What Should You Consider?” The Metropolitan Corporate Counsel, May 24, 2013. Available at <http://www.metrocorpcounsel.com/articles/23960/predictive-coding-today-youjump-what-should-you-consider>. Last accessed July 7, 2013.

[6] Global Aerospace v. Landow Aviation, No. CL 61040 (Va. Cir. Ct., Loudoun County, Apr. 23, 2012).

[7] Looby, Joseph H., “E-Discovery – Taking Predictive Coding Out of the Black Box,” FTI Journal, Nov. 2012. Available at <http://www.fticonsulting.com/global2/critical-thinking/ftijournal/predictive-coding.aspx>. Last accessed July 7, 2013.

[8] Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012).

[9] Id. at 183.

[10] Id. at 192.

[11] Id. at 185.

[12] Id.

[13] Id. at 190.

[14] Id.

[15] Id. at 193.

[16] Id.

[17] No. 08-CV-378S(F), 2013 WL 2250579 (W.D.N.Y. May 21, 2013).

[18] Id. at 2.

[19] No. 3:12-MD-2391, 2013 WL 1729682, at 1 (N.D. Ind. Apr. 18, 2013).

[20] Id. at 1.

[21] Id. at 2.

[22] Id.

[23] Id.

[24] Id. at 3.

[25] Id. at 5.

[26] No. 10-C-5711, 2012 U.S. Dist. LEXIS 139632, at 6 (N.D. Ill. Sept. 28, 2012).

[27] Id. at 19-20.

[28] No. 7409-VCL, 2013 WL 1960621 (Del. Ch. May 6, 2013).

[29] Id.

