With the debate over Predictive Coding reaching a fever pitch, an interesting thread of discussion is beginning to emerge: do litigators and other users need to understand what I am going to refer to as Predictive Coding Theory?
In a May 4, 2012 blog post titled “3 Drawbacks To Predictive Coding”, Sandra E. Serkes, President and CEO of Valora Technologies, writes, “What is missing [in regards to the Predictive Coding debate], is a discussion of the specific weaknesses of the overall Predictive Coding technique.” She then goes on to indicate that “Predictive Coding tagging algorithms are not transparent”.
To put this into more technical terms, do we need to know what probability theories and related dimension reduction systems are being used as the foundational algorithms that Predictive Coding systems rely on to identify relevant documents?
For those of you who are interested in a more detailed overview of Predictive Coding, I suggest that you read Ralph Losey's March 25, 2012 blog post titled “Predictive Coding Based Legal Methods for Search and Review”. He does an excellent job of discussing the basic technical mechanics and some of the underlying theories of Predictive Coding.
Getting back to my question about how much we need to know about Predictive Coding, I am in the process of developing some unique insight. Over the past 30 days, in preparation for adding a Predictive Coding module to the DCIG/eDSG 2012 Early Case Assessment Buyers Guide, I have been interviewing product managers from some of the Predictive Coding vendors, along with current users of Predictive Coding systems, to develop a list of criteria for reviewing and ranking the platforms for our buyers guide. One of the questions that I have been asking is: what probability theories and related dimension reduction systems are being used as the foundational algorithms for your Predictive Coding platform to identify relevant documents?
So far, I haven’t gotten a straight answer. Most of the product managers either don’t understand the question or want to move the discussion up a couple of layers in the technology stack to talk about indexing, semantic search, clustering, relevance ranking, sampling and presentation of results. There is no doubt that these are all very pertinent topics to a prospective buyer of Predictive Coding technology. However, none of them answers the question about the transparency of exactly how these systems are identifying relevant documents.
Whether or not litigators need to understand Predictive Coding theory and the underlying probability theories and related dimension reduction systems is debatable. However, I believe that a minimum level of transparency from the Predictive Coding vendors would at least give buyers the opportunity to understand what they are buying and then compare the various offerings.
In 1969, Edgar F. Codd and some of his associates, whom I have actually had the honor of knowing, first formulated and proposed the theory of the relational database. I am not sure that it met the level of skepticism and resistance to blind adoption that we are currently seeing with Predictive Coding. However, it was new, and therefore many did require an explanation of the underlying theories and mathematics. Eventually, the discipline normalized, everyone just assumed that relational databases worked, and there was no longer any need to question “how they worked”.
A similar vetting process would be very healthy for Predictive Coding. Check back to my blog in the coming weeks for updates and more information on this topic.