Moving from Science Fiction to Ediscovery Reality with Legal AI

By Cat Casey DISCO ediscovery and Legal Tech

Legal practitioners today have all faced that moment of existential dread when receiving the frantic call from a client facing a case that has the potential to bankrupt their organization, drown counsel in a volume of data hard to fathom, or damage the company’s reputation beyond repair. Whether your client is facing time-sensitive regulatory scrutiny, dealing with hundreds (or thousands) of custodians in disparate jurisdictions, or realizing the volume of potentially relevant data is in the tens of thousands of GB, the sweaty-palm-inducing anxiety is often the same. Thankfully, whether it is a massive multinational investigation or a complex regulatory investigation, there are best practices and technologies for these "bet-the-farm" cases to mitigate risk and manage the massive data volumes involved.

What factors make a case complex?

While there are likely countless flavors of complex ediscovery matters, this discussion focuses on ones that have the following key factors: terabytes of data, short timelines, a large number of custodians subject to global data privacy considerations, and a variety of typical and atypical data sources. Matters can range from complex government investigations like Foreign Corrupt Practices Act (FCPA) matters, international investigations, or lightning-fast pre-merger HSR second requests; to multinational investigations spanning the globe; to multi-billion dollar litigations spanning a variety of social, mobile, and collaborative data sources.

While the matters will differ greatly as they progress, the first several stages (and ediscovery in particular) face similar challenges and can likewise benefit from similar best practices to mitigate risk and reduce time and money to insight.

Unique challenges of complex cases

Complex matters often have much higher visibility within an organization and potentially in the market at large. This increased visibility, combined with the increased ediscovery cost, size, and risk, causes in-house counsel and outside counsel alike to apply increased scrutiny to their provider selection and to the workflow and technology deployed to manage the matter.

Global data privacy concerns

Large and complex matters today often contain data from custodians and servers in a variety of geographic locations. The physical location of either the custodian or data may obligate both the service provider and the counsel to employ heightened levels of care and notice in handling potentially personally identifiable information (PII) due to regional data privacy regulations.

While in the past practitioners really only had to operate with a heightened level of care when dealing with the EU and a select few other nations, today data privacy regulations proliferate in all corners of the globe, including the U.S. It is imperative to understand every jurisdiction in which potentially relevant information resides and what level of care is necessary to ensure compliance with one or multiple disparate standards.

Large pool of custodians

Large and complex cases often involve far more custodians than the 10-20 you might find on a run-of-the-mill case. When dealing with custodian counts in the hundreds or thousands, it is extremely important to prioritize key custodians, to understand where all potential data for each custodian resides, and where you are in the process of locating and/or preserving it.

Massive data volumes

Complex, large-volume matters require a level of planning, budgeting, and prioritization that you might get away with skipping on a more straightforward, small-volume case. The margin for error when potentially terabytes of data are in scope is much smaller. Leveraging technology to help you do more with less is of paramount importance.

Variety of data types

A matter may qualify as complex not due to the number of custodians or data volume, but rather due to the variety of challenging and atypical data sources potentially in scope. Establishing an understanding early of the atypical methods employed within the business units or groups of custodians at issue and how the data sources are being used is helpful in prioritizing and determining if the additional spend to parse a certain collaboration or short form tool is justified and when.

High cost/high stakes

Complex matters often face the double-edged sword of high stakes and high cost. Whether driven up by data volumes or by data types, costs for complex matters often range into the millions while the amount at issue may reach multi-billion levels. As such, tensions are often high, and the need for transparent metrics and tracking, as well as a clear playbook and operational protocol, is acute. Planning early is key.

Tight timelines

A final factor that may make a matter fall into the category of complex is tight timelines. Often in dealing with a regulator or complex multi-party litigation there is little room for negotiation on timelines. As a result, there is pressure to extract insight quickly regardless of how daunting the data volume, which means complex cases often benefit from employing advanced analytics and AI.

Best practices

Now that we have mapped out some of the variables that mark a case as complex, let’s unpack some of the best practices to manage heightened risks associated with them.

Scoping, preservation, and collection

The large data volumes, variety of data types, and volume of custodians make early planning and understanding of your data universe imperative. When dealing with geographically dispersed and/or large numbers of custodians with a variety of data sources, effectively scoping the entire potentially relevant data universe and quickly implementing legal hold and collection protocols are of key importance.

As early as the preliminary scoping call, legal practitioners should seek to answer the following questions:

Who are the key custodians and where are they located?
How do these custodians and their teams communicate and what applications or tools do they use?
What data sources do you anticipate being in scope and relevant (e.g., email, collaboration tools, shares, physical devices)?
Who handles information governance for your organization?
Where are the various data types located? On-premise, in a private cloud or hosted by an external cloud provider or application? If a third party manages that relationship, what sort of contract is in place?
What legal hold technology and protocols are in place and who is in charge of legal hold for the organization?

In complex matters, it is prudent to preserve broadly at the outset of the matter. These cases can often extend over a long period of time, and the likelihood that a key custodian may leave the organization or that a device or piece of information may be replaced or deleted absent preservation is high. The cost of preservation, while not insubstantial, is far lower than the cost of paying sanctions or receiving an adverse inference for spoliation.

Prioritization and triangulation

Due to the massive data volume, number of custodians, and disparate data types, prioritization is especially important in complex ediscovery matters. Regardless of the stakes, legal practitioners faced with huge complex matters are still bound by the confines of the space/time continuum. Whether you have a team of 50 or 500 reviewers attacking the millions of potentially in-scope documents, there are still time and cost pressures even on the most high-profile cases.

An effective way to triangulate on the most relevant data from varied data sources quickly is to prioritize key custodians and data sources early. This can be accomplished by using insights gained from digging into each successive data set to refine and reduce the scope of potential data to review in subsequent data sets. This approach allows practitioners to parse large and disparate data sets without exceeding budget or missing deadlines.

Practitioners can conduct social network analysis on an email set or collaboration tool to see which custodians are communicating most with key custodians and prioritize their data first. Digging into how key custodians use different methods of communication can reveal relevant data earlier in a review, while unstructured data analytics like concept clustering or data visualization can refine and prioritize the most likely relevant data. With this approach, you can use insights about key people, periods of time, and concepts gained in one data set to infer scope, terms, and prioritization of other data sets without breaking the bank.
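
As a rough illustration of the social network analysis idea, the Python sketch below counts how often each custodian exchanges messages with an already-identified key custodian and ranks them accordingly. The function name, the message format, and the sample names are all hypothetical; a real review platform would work from parsed email metadata and far richer signals.

```python
from collections import Counter

def rank_custodians(messages, key_custodians):
    """Rank non-key custodians by messages exchanged with key custodians.

    messages: iterable of (sender, recipients) pairs (hypothetical format).
    key_custodians: names already identified as key to the matter.
    """
    key = set(key_custodians)
    contact_counts = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            # Count a contact only when exactly one side is a key custodian.
            if sender in key and recipient not in key:
                contact_counts[recipient] += 1
            elif recipient in key and sender not in key:
                contact_counts[sender] += 1
    # Most frequent contacts first; these are candidates to collect first.
    return contact_counts.most_common()

# Made-up sample data for illustration only.
sample_messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("carol", ["dave"]),
    ("dave", ["alice", "bob"]),
    ("bob", ["alice", "carol"]),
]
ranking = rank_custodians(sample_messages, {"alice"})
print(ranking)
```

With "alice" as the only key custodian, "bob" surfaces as the most frequent contact, so his data would be a sensible candidate to prioritize next.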

Leveraging AI to accelerate time to insight

The time constraints and massive costs associated with complex ediscovery matters necessitate the use of technology to reduce time to key facts, to eliminate non-relevant information quickly, and to amplify the insights and decisions made by the legal team across the data universe efficiently. One benefit to having more data than it is humanly possible to review without bending space/time or bankrupting an organization is that the use of advanced analytics is not only accepted but expected.

Give up eyes on every document

Since you will not face an uphill battle to move away from eyes on every document, the case team is free to explore advanced AI-powered workflows, deployment of continuous active learning (CAL), and other advanced analytics to accelerate time to evidence.
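
To make the CAL idea more concrete, here is a minimal Python sketch of the general loop: score unreviewed documents, send the highest-scoring batch to a reviewer, retrain on the new decisions, and repeat. Everything in it is an assumption for illustration: the simple word-weight model, the `review` callback standing in for a human reviewer, the sample documents, and the stopping rule (stop when a batch contains nothing relevant). It is not how any particular platform implements CAL, and real workflows typically validate completeness with statistical sampling rather than a rule this simple.

```python
import math
from collections import Counter

def train(labeled, documents):
    """Build per-word log-odds weights from the documents reviewed so far."""
    relevant_words, other_words = Counter(), Counter()
    for doc_id, is_relevant in labeled.items():
        words = set(documents[doc_id].lower().split())
        (relevant_words if is_relevant else other_words).update(words)
    vocabulary = set(relevant_words) | set(other_words)
    # Add-one smoothing keeps the ratio finite for words seen on one side only.
    return {w: math.log((relevant_words[w] + 1) / (other_words[w] + 1))
            for w in vocabulary}

def score(text, weights):
    """Sum the weights of the distinct words in a document."""
    return sum(weights.get(w, 0.0) for w in set(text.lower().split()))

def continuous_active_learning(documents, review, seed_ids, batch_size=2):
    """Review in ranked batches, retraining after every batch."""
    labeled = {doc_id: review(doc_id) for doc_id in seed_ids}
    while len(labeled) < len(documents):
        weights = train(labeled, documents)
        unreviewed = [d for d in documents if d not in labeled]
        unreviewed.sort(key=lambda d: score(documents[d], weights), reverse=True)
        decisions = {d: review(d) for d in unreviewed[:batch_size]}
        labeled.update(decisions)
        # Simplified stopping rule (an assumption): quit when a batch is all non-relevant.
        if not any(decisions.values()):
            break
    return labeled

# Made-up documents and reviewer decisions for illustration only.
documents = {
    "d1": "payment to foreign official approved",
    "d2": "lunch menu for friday",
    "d3": "wire payment official consultant invoice",
    "d4": "office party friday lunch",
    "d5": "consultant invoice payment approved",
    "d6": "parking garage closed friday",
    "d7": "friday parking reminder",
    "d8": "menu update for cafeteria",
}
truly_relevant = {"d1", "d3", "d5"}
labeled = continuous_active_learning(
    documents, review=lambda d: d in truly_relevant, seed_ids=["d1", "d2"]
)
found = sorted(d for d, is_relevant in labeled.items() if is_relevant)
print(found, f"{len(labeled)} of {len(documents)} reviewed")
```

In this toy set, the loop surfaces all three relevant documents after reviewing six of the eight, leaving two low-ranked documents unreviewed, which is the basic trade CAL offers at scale.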

Ensure your partner can handle atypical data

As more and more complex cases entail atypical data, it is important to understand your service partner's ability to create an easily reviewable version of key data types and to process the data in an efficient and defensible manner. Request examples of what the data will look like in their review platform, along with case studies or references that can speak to their abilities with the relevant data (Slack, Bloomberg, texts, etc.).

Partner with a provider that can handle the case

It is important to collaborate with a service provider that has the experience and scalability to handle the increased scrutiny, data volumes, and complex workflows attendant with large bet-the-farm cases. You do not necessarily want to go with the cheapest provider or one with limited experience or infrastructure to support a large matter, because transferring a case midway is painful and any cost savings are more than offset by additional project management costs, missed timelines, and mistakes that a more experienced provider would not make.

A cloud-native provider, like DISCO, is able to leverage elastic computing and parallelized compute to handle matters far larger than the norm without experiencing latency (the spinning wheel of death) or reduced functionality.

At the end of the day, the best practices for a big matter are not too different from those for a smaller one, but the stakes for not employing them are graver. Pressure-cooker levels of stress, timelines that are outlandish, and data volumes and types that make your head spin are just the beginning when facing a complex ediscovery case. So establish a plan early, leverage all the tech possible, and most importantly work with the right partner to help you navigate the choppy waters of complex ediscovery.

Best Practices for “Bet the Farm” Cases