It is not possible to talk about eDiscovery or document review heading into 2021 without mention of technology-assisted review. In its broadest use as a technical term, TAR can refer to virtually any manner of technical assistance. In its narrower use, TAR refers to techniques that involve the use of technology to predict the decision a human expert would make about a document. In this narrower sense, TAR often comes with a version number – TAR 1.0, TAR 2.0 and, more recently, TAR 3.0.
While some are inclined to advocate for the superiority of a single approach, each version in fact has its merits and place. Understanding the underlying process and technology is necessary to select the right approach for a specific discovery need, as is recognizing the variables to weigh when choosing a TAR workflow for a specific matter.
The best solution to any discovery challenge can usually be identified by considering several factors, as well as the interaction of those factors with the nature of the set of documents that needs to be searched. Some of the separate, yet interacting considerations include:
Time: Time ought to be considered both in terms of how long it will take to achieve key milestones – starting review, understanding the contents of a document collection and, ultimately, production – and in terms of how long it will take subject matter experts to train a system, when that is required.
Cost: Setting aside the hard costs associated with in-house staff as well as attorneys, vendors and document reviewers, it is necessary to consider the opportunity cost involved with diverting subject matter experts away from other tasks to train a predictive model. Additionally, it is worth considering whether the approach selected allows the early estimation of the number of documents that will need to be reviewed, as well as the number of responsive documents expected to be found, which helps the team plan its review as efficiently and cost-effectively as possible.
Knowledge about the Matter: The degree to which facts of a matter are known prior to document review can impact the ability to train a model, where that is required. Additionally, prior knowledge may impact how quickly a team needs to have access to “the right documents” to inform both tactical and strategic decisions. Knowledge of the case or about the information contained in the document collection may impact a team’s tolerance for finding surprises in the data relatively late in the review process.
Standards for Quality: As TAR becomes more prevalent in compliance and discovery arenas, so does the practice of setting targets for both precision and recall when assessing whether discovery obligations have been satisfied. Where minimum thresholds for acceptable quality are known, they can influence the selection of both technology and workflow.
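The precision and recall targets mentioned above can be made concrete with a small calculation. The sketch below uses hypothetical validation-sample counts; the function name and numbers are illustrative, not drawn from any particular review platform.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from review validation counts.

    Precision: of the documents coded responsive, how many actually were.
    Recall: of the truly responsive documents, how many were found.
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical validation sample: 80 documents correctly coded responsive,
# 20 coded responsive that were not, and 10 responsive documents missed.
p, r = precision_recall(80, 20, 10)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.80, recall=0.89
```

A matter with an agreed recall floor (say, 75%) would compare the measured recall against that threshold before certifying the review as complete.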
Facts about the Document Collection: Some of the important factors to consider about the document collection itself include its completeness – that is, is all of the data that needs to be evaluated available or is a TAR solution expected to accommodate the rolling ingestion of new data? Additionally, the richness, or prevalence of responsive material in a document population, can influence the performance of different technologies and workflows and greatly impact time-to-completion.
The TAR Landscape
Having evaluated the case and document set that needs to be reviewed, along with other variables outlined above, teams can make an informed decision about which TAR solution is the best fit.
Predictive coding, or TAR 1.0, leverages examples of both relevant and nonrelevant training documents – the training set – to prime a system to classify documents. Typically, the training set is coded by a subject matter expert (SME) so that the system can replicate the expert’s knowledge. A hallmark of TAR 1.0 solutions is that the training is a finite process that precedes the scoring or coding of all documents. The predictive model and scores associated with it are frozen once training is complete, so changes to either the SME’s understanding of relevance or the set of documents needing to be evaluated will require building a new model.
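The one-time-training workflow can be sketched in miniature. The example below is a toy illustration, not any vendor’s implementation: a naive Bayes text classifier is trained once on a hypothetical SME-coded training set, then frozen and used to rank the collection. Real TAR 1.0 systems use far more sophisticated features and models; only the train-then-freeze structure matters here.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(training_set):
    """training_set: list of (text, label) pairs, label 'R' or 'NR'."""
    counts = {"R": Counter(), "NR": Counter()}
    doc_counts = Counter()
    for text, label in training_set:
        counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set(counts["R"]) | set(counts["NR"])
    return counts, doc_counts, vocab

def score(model, text):
    """Log-odds that a document is responsive, under the frozen model."""
    counts, doc_counts, vocab = model
    total = sum(doc_counts.values())
    log_odds = math.log(doc_counts["R"] / total) - math.log(doc_counts["NR"] / total)
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # ignore terms never seen in training
        p_r = (counts["R"][tok] + 1) / (sum(counts["R"].values()) + len(vocab))
        p_nr = (counts["NR"][tok] + 1) / (sum(counts["NR"].values()) + len(vocab))
        log_odds += math.log(p_r) - math.log(p_nr)
    return log_odds

# Hypothetical SME-coded training set. Training is finite: once train()
# returns, the model is frozen and simply applied to the collection.
training = [
    ("merger agreement draft terms", "R"),
    ("acquisition price negotiation", "R"),
    ("office party catering menu", "NR"),
    ("fantasy football league standings", "NR"),
]
model = train(training)

collection = ["merger price terms", "catering for the party"]
ranked = sorted(collection, key=lambda d: score(model, d), reverse=True)
print(ranked[0])  # the merger document ranks first
```

Because the model never updates after training, a shift in the SME’s view of relevance, or newly ingested documents, would require assembling a new training set and rebuilding.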
An advantage TAR 1.0 solutions have over traditional linear review is that responsive documents are front-loaded during the review process, providing important information to teams as quickly as possible.
In the second generation of technology-assisted review solutions, TAR 2.0, the underlying technique of continuous active learning (CAL) was specifically adopted to improve upon the challenges that one-time training presented for TAR 1.0. The continuous component reflects that the predictive model updates throughout the review based on every coding decision humans make, and the active component refers to the system using the updated model to promote the documents with the highest probability of being responsive to the top of the review queue.
TAR 2.0 solutions allow review to begin immediately, without prior training of a model. Additionally, while it may be preferable to have SMEs involved in the early review, this is not a strict requirement, as the model will eventually smooth over inconsistent decisions. The low upfront training investment in TAR 2.0 is considered an advantage over the TAR 1.0 process, especially when this decreases the burden and opportunity cost of having SMEs code documents as part of initial training.
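The continuous active learning loop described above can be sketched as follows. This is a deliberately simplified toy: documents, the scoring model (a running term-weight tally) and the simulated reviewer are all hypothetical, and real CAL systems use much richer classifiers. The point is the loop structure: review starts immediately with no prior training, every coding decision updates the model, and the updated model promotes likely-responsive documents to the front of the queue.

```python
from collections import Counter

def score(text, term_weights):
    """Net weight of a document's terms under the current model."""
    return sum(term_weights[t] for t in text.split())

def cal_review(collection, oracle, batch_size=1):
    """collection: dict of doc_id -> text; oracle simulates a reviewer."""
    term_weights = Counter()
    unreviewed = dict(collection)
    review_order = []
    while unreviewed:
        # Active step: promote the highest-scoring unreviewed documents.
        batch = sorted(unreviewed,
                       key=lambda d: score(unreviewed[d], term_weights),
                       reverse=True)[:batch_size]
        for doc_id in batch:
            text = unreviewed.pop(doc_id)
            review_order.append(doc_id)
            # Continuous step: every coding decision updates the model.
            delta = 1 if oracle(doc_id) else -1
            for token in text.split():
                term_weights[token] += delta
    return review_order

docs = {
    "d1": "lunch menu", "d2": "menu specials",
    "d3": "merger terms", "d4": "merger price",
}
responsive = {"d3", "d4"}
order = cal_review(docs, lambda d: d in responsive)
print(order)  # → ['d1', 'd3', 'd4', 'd2']
```

After the first (nonresponsive) document is coded, the model demotes the similar "menu" document and the two responsive merger documents jump the queue, illustrating how CAL front-loads responsive material without an upfront training phase.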
Now that TAR 2.0 is well-established, innovators are considering what the next generation of TAR might be. Some believe the best evolution of TAR is to put advanced tools in the hands of everyone, but TAR 3.0 might better be viewed as the opportunity to combine the advantages of continuous active learning with techniques that minimize the risk of two kinds of surprises: in content and in cost.
- To minimize surprises in content, TAR 3.0 solutions should be designed to give the system access to a diverse population of documents early in the process. Minimizing surprises only comes with robust knowledge of the document population, and this can be achieved through rigorous sampling and validation.
- To minimize surprises in cost, TAR 3.0 solutions should incorporate methods for determining overall richness to support principled review cutoff and to allow teams to predict the overall volume to enable staffing efficiencies.
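The richness estimation in the second point can be sketched with a simple random sample and a normal-approximation confidence interval. Everything here is simulated for illustration: the collection, its true 15% richness, and the sample size are hypothetical, and real validation protocols may use different interval methods.

```python
import math
from random import Random

def estimate_richness(sample_labels, population_size, z=1.96):
    """Estimate prevalence from a simple random sample.

    sample_labels: list of booleans (True = coded responsive).
    Returns the point estimate and an approximate 95% interval for the
    number of responsive documents in the full collection.
    """
    n = len(sample_labels)
    p = sum(sample_labels) / n
    margin = z * math.sqrt(p * (1 - p) / n)  # normal approximation
    low, high = max(0.0, p - margin), min(1.0, p + margin)
    return p, (low * population_size, high * population_size)

# Simulated collection of 100,000 documents with 15% true richness.
rng = Random(42)
population = [rng.random() < 0.15 for _ in range(100_000)]
sample = rng.sample(population, 1_500)  # documents coded by reviewers

richness, (low, high) = estimate_richness(sample, len(population))
print(f"estimated richness: {richness:.1%}")
print(f"expected responsive documents: {low:,.0f} to {high:,.0f}")
```

An estimate like this, made early, lets a team predict total review volume, staff accordingly, and later defend a principled cutoff once the expected number of responsive documents has been found.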
The hallmark of TAR 3.0 solutions should be the enrichment of CAL with statistically sound methods that provide early access to the full range of documents in a collection, including examples of responsive material.
Over the past few decades, the ability to create and store documents digitally has resulted in an explosion of discoverable data, and that has yielded an array of innovative tools for collecting, producing and reviewing that data quickly and efficiently. Ever-changing technology touches every stage of the discovery life cycle today, and nowhere is that clearer than with document review.
Any conversation about discovery or document review today includes technology-assisted review. As such, it’s important to understand the variables and the different solutions available as part of each version of TAR when considering which approach is optimal for each unique matter.