Using Predictive Coding to Find Privileged Content

June 6, 2014

Predictive coding, also known as technology-assisted review (TAR), is gaining wider acceptance in litigation and regulatory investigations, but lawyers generally insist that every document deemed responsive still be reviewed individually to guard against inadvertent disclosure of privileged material. The stakes for correctly identifying privileged documents are higher than for any other category of documents, and because of the potential effect on a matter’s outcome, protecting privileged information tends to be the most expensive and time-consuming e-discovery task. Currently, it involves running multiple iterations of search terms, followed by linear eyes-on review of the results and rigorous quality control.

Privilege is hard to identify using common predictive coding technology for several reasons. Among them: text in a document claiming that it is “privileged and confidential” proves nothing about whether the document is actually privileged, and documents that are in fact privileged often will not contain those words.
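To make the point about the “privileged and confidential” label concrete, here is a minimal sketch that measures how a plain phrase search performs against a handful of documents. The four documents, their privilege labels, and the function names are all invented for illustration; they are not drawn from any real matter or review tool.

```python
# Hypothetical mini-corpus: (text, is_actually_privileged).
# All texts and labels are invented for illustration.
DOCS = [
    ("PRIVILEGED AND CONFIDENTIAL: office holiday party schedule", False),
    ("Privileged and confidential. Counsel's advice on the pending claim", True),
    ("Per our attorney, we should not sign until the indemnity clause changes", True),
    ("Forwarding the vendor price list. Privileged and confidential", False),
]

PHRASE = "privileged and confidential"


def keyword_hits(docs, phrase=PHRASE):
    """Flag each document that contains the phrase (case-insensitive)."""
    return [phrase in text.lower() for text, _ in docs]


def precision_recall(docs, hits):
    """Compare phrase hits against the true privilege labels."""
    tp = sum(1 for (_, priv), hit in zip(docs, hits) if hit and priv)
    fp = sum(1 for (_, priv), hit in zip(docs, hits) if hit and not priv)
    fn = sum(1 for (_, priv), hit in zip(docs, hits) if not hit and priv)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


hits = keyword_hits(DOCS)
precision, recall = precision_recall(DOCS, hits)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy corpus the phrase flags two documents that are not privileged and misses one that is, which is the pattern the paragraph above describes.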

New developments in machine learning technology promise to identify privileged documents more accurately. Among these developments is finite state machine modeling. Like other approaches, this mathematical method starts with one or more seed sets of documents used to train the tool to detect similar text in the broader population, a process that is then refined. But distinctive advances in finite state machine technology enable it to be far more accurate in identifying privileged content. Perfection is elusive and will remain so, but new technologies promise to significantly reduce the time and cost of e-discovery document review.
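The seed-set step described above can be sketched in a few lines. This is only an illustration of the general idea of training on labeled seed documents and scoring the wider population. It uses a simple word-count (naive Bayes) scorer, not finite state machine modeling, and the seed documents, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical seed set: documents a reviewer has already labeled.
SEED_SET = [
    ("counsel advises we settle the claim before trial", "privileged"),
    ("our attorney reviewed the draft and gave legal advice on liability", "privileged"),
    ("lunch menu for the quarterly sales meeting", "not_privileged"),
    ("shipping schedule and invoice for the march order", "not_privileged"),
]


def tokenize(text):
    """Lowercase and split on whitespace; real tools do far more."""
    return text.lower().split()


def train(seed_set):
    """Count words per label and documents per label in the seed set."""
    word_counts = {"privileged": Counter(), "not_privileged": Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["privileged"]) | set(word_counts["not_privileged"])
    return word_counts, doc_counts, vocab


def score(model, text):
    """Return log-odds of 'privileged'; a positive value favors privileged."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        lp = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in the seed set
                # Add-one smoothing avoids log(0) for words unseen in a label.
                lp += math.log((counts[word] + 1) / (total_words + len(vocab)))
        log_prob[label] = lp
    return log_prob["privileged"] - log_prob["not_privileged"]


model = train(SEED_SET)
likely_privileged = score(model, "legal advice from counsel on the claim")
likely_routine = score(model, "invoice for the sales meeting")
print(likely_privileged > 0, likely_routine < 0)
```

The refinement the article mentions would correspond to reviewing the scored documents, correcting the labels, adding them to the seed set, and calling `train` again.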
