Legal Analytics Tools Are Not Created Equal

Legal analytics tools are having a tremendous impact on nearly every facet of the business and practice of law. In today’s increasingly data-driven legal marketplace, analytics is making significant improvements in litigation outcomes, legal workflows, and business and legal decision-making. New analytics tools are helping legal professionals become more efficient, effective and competitive, and provide better service to their clients. While the benefits of legal analytics are clear, the capabilities of specific analytics solutions may not be.

To put it bluntly, just because you have an “analytics” solution in your toolbox does not guarantee the results will be helpful.

Legal analytics is all about identifying and comparing sets of cases that are similar to the one you are working on, finding out what worked (and what did not), and making decisions or predictions based on those outcomes. The challenge is that if you compare your case against an incomplete or wrong set of cases, or against cases with missing or erroneous data, the resulting insights could actually hinder or harm your case.

In order to work effectively, legal analytics requires two things:

  • A large body of clean and accurate data
  • Practice-specific tags that enable users to get granular and hone in on the most relevant cases

The Importance of Clean Data

Complete and accurate data is fundamental if legal analytics is to meet the high expectations of the legal market. If the underlying data is flawed, even the best A.I. technology or tagging protocols won’t be able to prevent misleading insights.

Most legal analytics tools rely heavily on PACER because of its large, comprehensive and up-to-date body of federal litigation data, comprising millions of cases and legal documents. PACER continues to grow by around 2 million cases and tens of millions of documents each year. However helpful PACER is, it is now 20 years old and not without its flaws.

For instance, misspelled or inaccurate attorney and law firm data in PACER docket case headers can cause analytics tools to overlook critical cases handled by a given firm. In fact, about 45% of district court cases in PACER have missing, misspelled or wrong attorney information; in some district courts, that figure reaches nearly 60%. To cite just one example, there are more than 100 variations of Quinn Emanuel in PACER. If your analytics tool cannot recognize and reconcile these inconsistencies, you could end up making critical decisions based on incomplete information.
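Reconciling name variants is essentially an entity-resolution problem. The Python sketch below shows one simple approach, assuming a hypothetical curated directory of canonical firm names: normalize each raw docket-header name (case, punctuation, “&” versus “and”, corporate suffixes such as LLP), then fall back to fuzzy string matching to absorb misspellings. The variant spellings below are invented for illustration, not actual PACER entries, and a production tool would need far more than this.

```python
import difflib
import re
from typing import Optional

# Hypothetical curated directory of canonical (already normalized) firm names.
CANONICAL_FIRMS = [
    "quinn emanuel urquhart sullivan",
    "kirkland ellis",
]

# Connector words and corporate-form suffixes that do not identify a firm.
NOISE_TOKENS = {"and", "the", "llp", "llc", "lp", "pc"}


def normalize(name: str) -> str:
    """Lowercase, drop periods so 'L.L.P.' becomes 'llp', split on
    non-alphanumerics (dropping '&' and commas), and remove noise tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower().replace(".", ""))
    return " ".join(t for t in tokens if t not in NOISE_TOKENS)


def reconcile(raw_name: str, cutoff: float = 0.85) -> Optional[str]:
    """Map a raw docket-header firm name to a canonical firm, or None."""
    key = normalize(raw_name)
    if key in CANONICAL_FIRMS:
        return key
    # Fuzzy fallback for misspellings such as a doubled letter.
    matches = difflib.get_close_matches(key, CANONICAL_FIRMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw in [
    "Quinn Emanuel Urquhart & Sullivan, LLP",
    "QUINN EMANUEL URQUHART AND SULLIVAN L.L.P.",
    "Quinn, Emanuel, Urquhart & Sullivan",
    "Quinn Emmanuel Urquhart Sullivan",  # misspelled
    "Smith & Jones",                     # not in the directory
]:
    print(f"{raw!r} -> {reconcile(raw)!r}")
```

The `cutoff` parameter trades recall against precision: set it too low and distinct firms with similar names may be merged, which is why low-confidence matches are better routed to human review than accepted automatically.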

Historical information about attorneys in PACER is a particularly vexing problem. For instance, its inability to identify lawyers working on a case pro hac vice often prevents them from getting credit for their work, which can lead to misinformation about their professional expertise. Similarly, when attorneys change firms, PACER automatically attributes all their past cases to their new firm, which distorts the firm’s expertise. If you are using analytics to determine the strength of your opponents or develop case strategy, such inaccuracies can lead to misguided conclusions and devastating results.
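Avoiding that distortion means crediting each case to the firm an attorney belonged to when the case was handled, not to the attorney’s current firm. The Python sketch below assumes a hypothetical employment-history record for each attorney (firm plus start and end dates) and uses the filing date as the point of attribution; the attorney, firms and dates are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Stint:
    """One period of an attorney's employment at a firm (hypothetical schema)."""
    firm: str
    start: date
    end: Optional[date] = None  # None means the attorney is still at this firm


def firm_on(history: list, when: date) -> Optional[str]:
    """Return the firm the attorney belonged to on a given date, if known."""
    for stint in history:
        if stint.start <= when and (stint.end is None or when <= stint.end):
            return stint.firm
    return None


def credit_firms(cases: list, histories: dict) -> Counter:
    """Credit each (attorney, filing date) appearance to the firm at filing time."""
    counts = Counter()
    for attorney, filed in cases:
        firm = firm_on(histories.get(attorney, []), filed)
        if firm is not None:
            counts[firm] += 1
    return counts


histories = {
    "J. Doe": [
        Stint("Firm A", date(2010, 1, 1), date(2016, 6, 30)),
        Stint("Firm B", date(2016, 7, 1)),
    ],
}
cases = [
    ("J. Doe", date(2012, 3, 5)),
    ("J. Doe", date(2015, 9, 1)),
    ("J. Doe", date(2019, 2, 14)),
]
print(credit_firms(cases, histories))  # Counter({'Firm A': 2, 'Firm B': 1})
```

A lookup keyed only on the attorney’s current firm would credit all three appearances to Firm B. Using the filing date is itself a simplification, since attorneys can also change firms while a case is pending.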

Many analytics tools rely on PACER’s Nature of Suit (NOS) codes to classify cases and perform basic filtering. Unfortunately, NOS codes are not always applied correctly, which can distort analytics outcomes by either wrongly including or omitting cases from the results. The landmark copyright case Oracle America, Inc. v. Google Inc. is a good example. It is misclassified in PACER as a patent case and therefore would not show up in a search for copyright cases.

In addition, PACER does not have NOS codes for certain practice areas, such as commercial litigation and trade secrets. As a result, these cases are filed under a range of other NOS codes, with little consistency. An attorney seeking to identify all commercial cases before Judge Otero in the Central District of California (C.D. Cal.), for example, needs a tool that identifies commercial cases by other techniques rather than relying on NOS codes alone.

Analytics tools that rely heavily on PACER data but do nothing to fix its inaccuracies will produce misleading information that could result in misguided counsel and legal strategies, increased risk and exposure, and potentially adverse litigation outcomes. At the very least, your analytics solution should be able to “read” case documents and accurately classify cases based on their full content, not just the docket.
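To illustrate why content-based classification matters, the Python sketch below tags cases from the text of their filings as well as from the docket’s NOS label. The keyword rules, NOS labels and three-case mini-docket are all hypothetical, and a real tool would rely on trained classifiers refined through attorney review rather than a handful of regular expressions. The point is only that a filter on NOS labels alone misses a copyright claim filed under a patent code, much as in the Oracle v. Google example above.

```python
import re

# Hypothetical keyword rules standing in for a real, attorney-validated classifier.
TAG_RULES = {
    "Copyright": re.compile(r"\bcopyright infringement\b|\b17 U\.S\.C\.", re.I),
    "Patent": re.compile(r"\bpatent infringement\b|\b35 U\.S\.C\.", re.I),
    "Commercial": re.compile(r"\bbreach of contract\b|\bunjust enrichment\b", re.I),
    "Trade Secrets": re.compile(r"\btrade secrets?\b", re.I),
}


def tags_for(case: dict) -> set:
    """Combine the docket's NOS label with tags inferred from the filing text."""
    tags = {case["nos"]}
    for tag, pattern in TAG_RULES.items():
        if pattern.search(case["text"]):
            tags.add(tag)
    return tags


# Hypothetical mini-docket; case-1 carries a copyright claim under a patent NOS label.
cases = [
    {"id": "case-1", "nos": "Patent",
     "text": "Plaintiff alleges patent infringement and copyright infringement under 17 U.S.C. 501."},
    {"id": "case-2", "nos": "Other Contract",
     "text": "Counts: breach of contract; unjust enrichment."},
    {"id": "case-3", "nos": "Patent",
     "text": "Plaintiff alleges patent infringement under 35 U.S.C. 271."},
]

nos_only = [c["id"] for c in cases if c["nos"] == "Copyright"]
by_content = [c["id"] for c in cases if "Copyright" in tags_for(c)]
commercial = [c["id"] for c in cases if "Commercial" in tags_for(c)]
print(nos_only)    # []
print(by_content)  # ['case-1']
print(commercial)  # ['case-2']
```

The same approach can surface commercial cases, which have no dedicated NOS code: in this toy data, the “Commercial” tag finds case-2 even though its docket label is a generic contract code.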

Real Analytics Requires Depth

The next step toward deeper, more valuable legal analytics is customization that enables you to find and analyze cases like yours. Most tools provide some form of basic filtering that lets you focus on a particular time period, district, judge, law firm, case type and more. Where many analytics tools fall short is in their lack of practice-specific case tags, which are not part of PACER metadata. Without such tags, attorneys in specialized practice areas find it extremely difficult, or even impossible, to locate relevant cases or filter out irrelevant ones.

For instance, employment lawyers may need to find cases involving “Title VII Discrimination,” while commercial attorneys might be looking for cases involving “Unjust Enrichment.” Without these data tags, isolating such cases among tens of thousands becomes a largely manual and error-prone process.

Likewise, if your analytics tool does not let you exclude “Hurricanes” for insurance cases or “Internet File-Sharing” for copyright cases, your results could lead you to the wrong conclusions. For example, PACER data shows that the Eastern District of Louisiana (E.D. La.) has seen more insurance cases than any other district (7,500+ cases, or 8%). However, once hurricane-related cases are removed, the district’s share plummets to 1% (1,100 cases). A motion to transfer to the seemingly insurance-friendly E.D. La. might therefore be disastrous.
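The arithmetic behind the Eastern District of Louisiana example is easy to reproduce: a district’s share is its case count divided by the total, and an exclusion tag changes both the numerator and the denominator. The Python sketch below uses a hypothetical `tags` field and ten toy cases across three made-up districts, not actual PACER figures, to show how a district can fall from first place to a tie for last once tagged cases are dropped.

```python
from collections import Counter


def toy_cases(district, n, tags=()):
    """Build n hypothetical case records for one district."""
    return [{"district": district, "tags": set(tags)} for _ in range(n)]


def district_shares(cases, exclude_tags=frozenset()):
    """Return each district's share of the cases that carry none of exclude_tags."""
    kept = [c for c in cases if not (c["tags"] & exclude_tags)]
    counts = Counter(c["district"] for c in kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {district: n / total for district, n in counts.items()}


# Ten toy insurance cases; five in District X are tagged as hurricane-related.
cases = (
    toy_cases("X", 5, {"Hurricane"})
    + toy_cases("X", 1)
    + toy_cases("Y", 3)
    + toy_cases("Z", 1)
)

print(district_shares(cases))                              # {'X': 0.6, 'Y': 0.3, 'Z': 0.1}
print(district_shares(cases, exclude_tags={"Hurricane"}))  # {'X': 0.2, 'Y': 0.6, 'Z': 0.2}
```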

Even the best machine learning algorithms won’t grasp such nuances on their own; they need guidance from lawyers to fill in the gaps and “teach” the A.I. Without practice-specific insights governing the creation of filters and data tags, building a comparative analytics case set becomes much more challenging and time-consuming.

Skilled human intervention is also required to capture and corroborate key decision-making elements such as case timing, damages, findings and remedies, along with other information that could shape your litigation strategy. This information is not included in PACER headers and is therefore invisible to tools that do not dive deep into the documents.

With a growing number of analytics products available, it is easy for firms and in-house counsel to be misled by marketing hype. To find the best analytics solution, attorneys need to put each offering through its paces, examining not only the technology but also the underlying data and how it is cleaned, structured and tagged. Promises of instant or complete industry coverage, or of access to the world’s largest litigation database, sound impressive but may not deliver the accuracy and depth that lawyers require.

Instead, litigators should develop detailed questions across a range of practices and run them through different analytics products before making a commitment. Otherwise, “mile-wide, inch-deep” analytics could lead to skewed, misleading and erroneous conclusions that increase risk and exposure for your firm, company or clients.
