Cloud Computing Architecture and eDiscovery

Cloud computing is now the de facto Information Technology (IT) architecture that enterprises are either already using or plan to use in the near future. The goal of this blog post is to provide an overview of cloud computing, its effect on the practice of eDiscovery, and what eDiscovery in the cloud really means.

From a purely conceptual standpoint, cloud computing is actually a marketing term for technologies that provide computation, software, data access, and storage services without requiring end-user knowledge of the physical location and configuration of the system that delivers the services. From an end-user standpoint, not having to worry about where your data is located is a tremendous benefit. From an eDiscovery collection perspective, however, not knowing where data may be located could prove to be an issue, or at the very least a concern.

Cloud computing is also a delivery model for IT services based on Internet protocols, and it typically involves provisioning dynamically scalable and often virtualized resources. It is a natural byproduct of the ease of access to remote computing sites that the Internet provides. This may take the form of web-based tools or applications that users access through a web browser as if the programs were installed locally on their own computers. Salesforce.com is the best-known example of this type of cloud computing application. Several eDiscovery vendors also now offer a web-based option, and most, if not all, of the remaining vendors will be doing so in 2012.

At the foundation of cloud computing is the broader concept of infrastructure convergence: services delivered through shared data centers that appear to users as a single point of access for their computing needs. This type of data center environment allows enterprises to get their applications up and running faster, with easier manageability and less maintenance, and enables IT to adjust resources (such as servers, storage, and networking) more rapidly to meet fluctuating and unpredictable business demand. From a purely conceptual standpoint, using infrastructure convergence to meet the inevitable demands of eDiscovery processing would seem to be the natural next step. In practice, however, much of the legacy eDiscovery technology is locked into appliances and complex software configurations that don’t lend themselves to the advantages of virtualized computing, so only a few eDiscovery technology vendors are positioned to truly take advantage of cloud computing and the flexibility of infrastructure convergence.

Once an enterprise decides to go down the cloud computing path, it can implement infrastructure convergence and shared resources as an internal private cloud, outsource its IT infrastructure to a third-party public cloud through a Cloud Service Provider (CSP), or choose a hybrid approach that uses both public and private cloud infrastructures. However, as I stated in the previous paragraph, only a few eDiscovery technology vendors are positioned to truly take advantage of cloud computing and the flexibility of infrastructure convergence. Therefore, at this point, even an enterprise that decides to implement cloud computing may have to leave its eDiscovery processing behind and continue to collect and process data outside the cloud, unless it embraces the new generation of eDiscovery platforms that can “live and work” in the virtual world of the cloud.

Amazon Web Services (AWS)

One of the first and better-known Cloud Service Providers (CSPs) is Amazon Web Services (AWS). Launched in July 2002, Amazon Web Services is a collection of remote computing services (also called web services) that together make up a cloud computing platform, offered over the Internet by Amazon.com. The most central and well-known of these services are Amazon EC2 and Amazon S3. Most of these services are not exposed directly to end users, but instead offer functionality that other developers can use. In June 2007, Amazon claimed that more than 330,000 developers had signed up to use Amazon Web Services. Amazon Web Services’ offerings are accessed over HTTP, using Representational State Transfer (REST) and SOAP interfaces. All services are billed on usage, but how usage is measured for billing varies from service to service. Please note that as of the writing of this blog post, AWS had not responded to numerous requests to officially comment on how it currently handles eDiscovery requests from its clients.
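
To make the REST access model a little more concrete, here is a minimal Python sketch of how a plain HTTP GET for a single object stored in Amazon S3 could be prepared. The bucket name and object key are hypothetical, the request is only built and never sent, and authentication (which non-public data requires) is deliberately left out, so treat this as an illustration of the idea rather than a working client.

```python
from urllib.parse import quote
from urllib.request import Request


def build_s3_get_request(bucket: str, key: str) -> Request:
    """Prepare (but do not send) an HTTP GET for one stored object."""
    # Virtual-hosted-style address: the bucket name is part of the host name
    # and the object key is the URL path.
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
    return Request(url, method="GET")


# Hypothetical bucket and key, used for illustration only.
request = build_s3_get_request("example-bucket", "custodians/jsmith/mailbox.pst")
print(request.get_method(), request.full_url)
```

The point for eDiscovery is that the caller addresses data by name over HTTP and never learns which physical server or data center holds it.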

CLOUD ARCHITECTURE LAYERS

Cloud computing architecture is categorized into three (3) layers: Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS).

Software-as-a-Service (SaaS)
Software-as-a-Service (SaaS) is the best known of these layers because it is the most visible to users. Simply put, SaaS enables software vendors to deliver software as a service over the Internet, eliminating the need to install and run the application on the user’s own computers and simplifying maintenance and support. SaaS is actually a more mature delivery architecture than many realize and is an integral part of cloud computing. According to a Gartner Group estimate, SaaS sales reached $10B in 2010 and were projected to increase to $12.1B in 2011, up 20.7% from 2010. Gartner Group estimates that SaaS revenue will more than double its 2010 level by 2015, reaching a projected $21.3B. Customer relationship management (CRM) continues to be the largest market for SaaS. SaaS revenue within the CRM market was forecast to reach $3.8B in 2011, up from $3.2B in 2010.
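
For readers who want to check the arithmetic behind these figures, the following short Python sketch computes the growth rates implied by the rounded dollar amounts quoted above. The function names are my own, and the compound annual rate is derived here from the quoted 2010 and 2015 figures; it is not a number published in the estimate.

```python
def percent_growth(start: float, end: float) -> float:
    """Simple percentage change from start to end."""
    return (end - start) / start * 100.0


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate (in percent) that turns start into end."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0


# Rounded figures quoted in the paragraph above, in billions of US dollars.
saas_2010, saas_2011, saas_2015 = 10.0, 12.1, 21.3
crm_2010, crm_2011 = 3.2, 3.8

print(f"SaaS 2010 to 2011: {percent_growth(saas_2010, saas_2011):.1f}%")
print(f"CRM SaaS 2010 to 2011: {percent_growth(crm_2010, crm_2011):.1f}%")
print(f"SaaS 2015 as a multiple of 2010: {saas_2015 / saas_2010:.2f}x")
print(f"Implied yearly rate 2010 to 2015: "
      f"{compound_annual_growth(saas_2010, saas_2015, 5):.1f}%")
```

With the rounded figures, the 2010 to 2011 step works out to about 21%, slightly different from the quoted 20.7%, presumably because the published percentage was calculated from unrounded numbers.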

As indicated earlier in this post, a number of eDiscovery tool vendors offer SaaS delivery options. However, don’t confuse SaaS delivery with providing eDiscovery in the Cloud. There is a major difference. Since it is highly unlikely that the eDiscovery platform is in the same physical location as the data, eDiscovery SaaS providers require users to physically collect data and move it to the data center (physical location) that houses the eDiscovery platform. Once loaded onto this platform, the data is processed, and users can then access it over the Internet. I contend that this approach of moving data to the eDiscovery platform is not that different from what has occurred over the past 5-10 years with other enterprise data, and that it is not eDiscovery in the cloud. True eDiscovery in the Cloud requires the eDiscovery software to reside in the cloud, alongside the data. This implementation would in fact be considered SaaS, but it is much different from the current generation of eDiscovery SaaS platforms.
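
To put a rough number on what “moving the data to the platform” can cost, here is a back-of-the-envelope Python sketch that estimates how long a collection would take to upload to a vendor’s data center. The collection size, link speed and efficiency factor are all hypothetical values chosen for illustration, not measurements from any vendor.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move data_gb gigabytes over a network link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable after protocol overhead and contention.
    """
    total_bits = data_gb * 8e9                   # decimal gigabytes to bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600.0


# Hypothetical collection: 500 GB moved over a 100 Mbps connection.
hours = transfer_hours(data_gb=500, link_mbps=100)
print(f"Estimated transfer time: {hours:.1f} hours")
```

When the eDiscovery software resides in the same cloud as the data, this transfer step largely disappears, which is the practical heart of the distinction drawn above.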

Platform-as-a-Service (PaaS)
Platform-as-a-Service (PaaS) is a category of cloud computing services that provide a computing platform and a solution stack as a service. In the classic layered model of cloud computing, the PaaS layer lies between the SaaS and the IaaS layers. PaaS vendor offerings can be extensive and may include a complete application hosting, development, testing, and deployment environment, along with integrated services for scalability, maintenance, and versioning. PaaS offerings may also include facilities for application design, application development, testing, deployment and hosting, as well as application services such as team collaboration, web service integration and marshalling, database integration, security, scalability, storage, persistence, state management, application versioning, application instrumentation and developer community facilitation.

It is within the Platform-as-a-Service (PaaS) layer that eDiscovery services belong. In fact, this may be a good time to coin the term eDiscovery-as-a-Service (eDaaS). Unfortunately, as of the writing of this blog post, no eDiscovery vendors offer eDaaS. However, I am aware of several vendors that are working on offerings to be released in early 2012. And since providing eDaaS as a standard option for any PaaS offering makes so much sense and could provide a first-mover and key competitive advantage for Cloud Service Providers (CSPs), I predict that we will see several eDaaS offerings before the end of 2012. I also predict that once the eDaaS offerings hit the market, the legacy eDiscovery platform providers will be forced to re-evaluate the value propositions of their non-eDaaS offerings in the cloud.

Please note that I am working on a research paper investigating how the CSPs support the eDiscovery requirements of their client bases and what next-generation tools (eDaaS) are going to be available to assist the CSPs with these requirements.

Infrastructure-as-a-Service (IaaS)
Infrastructure-as-a-Service (IaaS) is the least glamorous of the cloud computing layers, but it provides the real technical “infrastructure” that enables cloud computing to exist. Simply stated, IaaS provides a virtualized processing environment running on shared physical hardware, along with raw (block) storage and networking. Rather than purchasing servers, software, data-center space or network equipment, enterprise clients buy those resources as a fully outsourced service, with the ability to scale up processing, storage and even networking as required. There are many more technical details to IaaS. However, for the purposes of this post, my definition is adequate to get my point across.
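
As a simple illustration of what “scaling up processing as required” could mean for an eDiscovery job, the Python sketch below estimates how many virtual machine instances would need to run in parallel to process a collection before a deadline. The throughput figure, the data volume and the function name are all hypothetical assumptions, and the sketch ignores real-world factors such as startup time and uneven workloads.

```python
import math


def instances_needed(data_gb: float, gb_per_instance_hour: float,
                     deadline_hours: float) -> int:
    """Estimate how many parallel instances finish the job by the deadline."""
    # Total work, expressed as hours of processing on a single instance.
    total_instance_hours = data_gb / gb_per_instance_hour
    # Spread that work across the available hours, rounding up to whole
    # instances, and always provision at least one.
    return max(1, math.ceil(total_instance_hours / deadline_hours))


# Hypothetical job: 2,000 GB of data, 25 GB processed per instance per hour.
overnight = instances_needed(data_gb=2000, gb_per_instance_hour=25, deadline_hours=8)
relaxed = instances_needed(data_gb=2000, gb_per_instance_hour=25, deadline_hours=24)
print(f"Instances for an 8-hour deadline: {overnight}")
print(f"Instances for a 24-hour deadline: {relaxed}")
```

With purchased hardware the enterprise would have to own enough servers for the tightest deadline; with IaaS it can rent the larger number only for the hours it needs them.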

CONCLUSION
Cloud computing is now the de facto Information Technology (IT) architecture that enterprises are either already using or plan to use in the near future. Cloud computing architecture is categorized into three (3) layers: Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS). It is within the PaaS layer that eDiscovery services, or eDiscovery-as-a-Service (eDaaS), belong. Unfortunately, as of the writing of this blog post, no eDiscovery vendors offer eDaaS. However, I am aware of several vendors that are working on offerings to be released in early 2012. And since providing eDaaS as a standard option for any PaaS offering makes so much sense and could provide a first-mover and key competitive advantage for Cloud Service Providers (CSPs), I predict that we will see several eDaaS offerings before the end of 2012.

About Charles Skamser
Charles Skamser is an internationally recognized technology sales, marketing and product management leader with over 25 years of experience in Information Governance, eDiscovery, Machine Learning, Computer Assisted Analytics, Cloud Computing, Big Data Analytics, IT Automation and ITOA. Charles is the founder and Senior Analyst for eDiscovery Solutions Group, a global provider of information management consulting, market intelligence and advisory services specializing in information governance, eDiscovery, Big Data analytics and cloud computing solutions. Previously, Charles served in various executive roles with disruptive technology startups and well-known industry technology providers. Charles is a prolific author and a regular speaker on the technology that the Global 2000 require to manage the accelerating increase in Electronically Stored Information (ESI). Charles holds a BA in Political Science and Economics from Macalester College.