Managing Data Deletion at a Granular Level; Executive Interview with C2C Systems CTO Part III

Most companies recognize the benefits of deleting data when it no longer serves any business purpose or when the legal requirements to retain it have been met. However, the act of deleting data still gives many organizations pause. In this third blog entry in my interview series with C2C Systems’ CTO Ken Hughes, he discusses C2C’s policy management features and the granular ways in which users may manage deletion in their data stores.

Charles: A concern for many organizations when deleting emails is the risk that they may delete a potentially relevant email. Does C2C offer the option to do “intelligent PST ingestion” that deletes emails based on an agreed-upon set of rules and provides an audit trail that can explain why each email was deleted?

Ken: Absolutely. You have to set up a policy where you can show the ability to search, select the data and then optionally delete it. This will actually broaden your view of the way C2C can manage data, as it is very much done with eDiscovery in mind.

The key behind this is not just finding data to archive, but finding it so you are aware of its existence. In this way, when you go out and delete data, those actions are justifiable or defensible at a later date.

Some of the criteria that C2C uses when deleting data include measures to make sure no deletions occur during any type of litigation. However, you want to make sure that you are only supplying relevant data. Data retention and deletion is something that you do as part of your information governance, your general day-to-day process.

To give you a view of C2C’s policies, a policy is basically a definition that allows you to select the set of data that matches the criteria you have defined. If you look at a typical retention policy, it might be to delete anything in the user’s Exchange mailbox after six months. Then if the data is in a PST file that resides on a laptop or on the network, delete that after three years.
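Editor's illustration: the two-policy example Ken describes can be sketched in a few lines of Python. This is only a minimal sketch and not C2C's implementation; the `RetentionPolicy` and `Message` classes, the store labels "mailbox" and "pst", and the day counts used to approximate six months and three years are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    store: str          # hypothetical store label: "mailbox" or "pst"
    max_age: timedelta  # items older than this are past retention


@dataclass(frozen=True)
class Message:
    store: str
    delivered: datetime


# The two example policies from the interview; day counts are approximations.
POLICIES = [
    RetentionPolicy(store="mailbox", max_age=timedelta(days=182)),   # ~six months
    RetentionPolicy(store="pst", max_age=timedelta(days=3 * 365)),   # ~three years
]


def is_expired(msg: Message, now: datetime, policies=POLICIES) -> bool:
    """Return True if a policy for the message's store says it is past retention."""
    return any(
        p.store == msg.store and now - msg.delivered > p.max_age
        for p in policies
    )
```

A year-old message would be flagged in a mailbox but not in a PST, which is the distinction the two policies are meant to draw.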

So the way that you would achieve that is you basically implement a couple of policies. On the Exchange mailbox, you may have a very granular list of criteria that allow you to select the mail that you want to address with this particular policy.

C2C has split its criteria into four areas. The first one is store. So whether the data is in a PST, whether you are looking at a mailbox, or whether you’re looking at specifically named mailboxes, we can select just based on those criteria.

We can also select based on the folder that the message might be in. So if the message is in your “Sent” items or in a “Projects” folder, or if corporate policy is to create a folder for items you want to retain longer, like a seven-year retention folder, we can also select on the particular folder that it is in.
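Editor's illustration: selecting by store and by folder amounts to combining a few optional filters. The sketch below shows one way that could look; the `Item` and `Selector` names and their fields are hypothetical and are not taken from C2C's product.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Item:
    store_type: str  # hypothetical labels: "pst" or "mailbox"
    mailbox: str     # owner of the store
    folder: str      # folder the message sits in


@dataclass(frozen=True)
class Selector:
    # None means "do not filter on this field".
    store_type: Optional[str] = None
    mailboxes: Optional[FrozenSet[str]] = None
    folders: Optional[FrozenSet[str]] = None

    def matches(self, item: Item) -> bool:
        if self.store_type is not None and item.store_type != self.store_type:
            return False
        if self.mailboxes is not None and item.mailbox not in self.mailboxes:
            return False
        if self.folders is not None:
            # Compare folder names case-insensitively.
            wanted = {f.casefold() for f in self.folders}
            if item.folder.casefold() not in wanted:
                return False
        return True
```

For example, `Selector(store_type="mailbox", folders=frozenset({"Sent Items", "Projects"}))` would pick up mail in those two folders of any mailbox while ignoring PST files entirely.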

Charles: People sometimes think it is clever to change file names, especially the extension of their PST archives. Do you have some way of searching beyond just file extensions to be able to find these PSTs?

Ken: In version 6.5, C2C just looks at file extensions. It has been able to examine the header of a PST regardless of its extension, but determining whether any given file is a PST file is something C2C is looking at for the next version.
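Editor's illustration: a header check of the kind discussed here can be approximated by reading a file's first four bytes, since Microsoft's published PST file format specifies that the header begins with the bytes "!BDN". The sketch below is a cheap heuristic under that assumption, not a full validation and not C2C's method; the function names are hypothetical.

```python
from pathlib import Path

# First four bytes of the PST header (dwMagic) per Microsoft's MS-PST format.
PST_MAGIC = b"!BDN"


def looks_like_pst(path) -> bool:
    """Heuristic: True if the file starts with the PST signature bytes."""
    try:
        with open(path, "rb") as fh:
            return fh.read(4) == PST_MAGIC
    except OSError:
        # Unreadable or missing files are treated as "not a PST".
        return False


def find_disguised_psts(root):
    """List files under root that carry the PST signature but not a .pst extension."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() != ".pst" and looks_like_pst(p)
    )
```

A signature match only suggests that a file is worth a closer look; a renamed file with a matching header would still need to be opened and parsed to confirm that it is a usable PST.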

What C2C is looking at is really the metadata of the individual messages. There is a whole raft of criteria here. Many of our competitors will let you carry out actions on messages based on their size or age, whereas C2C allows you to select data based on virtually any criteria of the message.

If a message was delivered more than six months ago, we would use that as the criterion. We can also look at attachments, so you can look for any word in the attachment. We do full text indexing on those also. That would be one policy.
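Editor's illustration: a policy that combines message age with a word search in attachments could be sketched as below. This is a simplification made for the example: it scans already-extracted attachment text with a regular expression rather than querying a full text index, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Attachment:
    filename: str
    text: str  # text already extracted from the attachment (extraction not shown)


@dataclass
class Message:
    delivered: datetime
    attachments: List[Attachment] = field(default_factory=list)


def older_than(msg: Message, now: datetime, days: int) -> bool:
    return now - msg.delivered > timedelta(days=days)


def attachment_mentions(msg: Message, word: str) -> bool:
    """True if any attachment contains the word as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return any(pattern.search(a.text) for a in msg.attachments)


def matches_policy(msg: Message, now: datetime, days: int = 182,
                   word: Optional[str] = None) -> bool:
    """Age test (default ~six months), optionally narrowed by an attachment keyword."""
    if not older_than(msg, now, days):
        return False
    return word is None or attachment_mentions(msg, word)
```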

You can also search in PST files. So besides the PST files loaded in someone’s Outlook, we can look for what we call uncoupled or orphaned PSTs, which are just files on a disk or on the file server that no one is using. Such a file is not opened in an Outlook profile; it is just sitting on a disk somewhere. Once we have found all those, we can process any of those locations.
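Editor's illustration: finding orphaned PSTs comes down to comparing the PST files found on disk with the ones that Outlook profiles actually reference. The sketch below assumes the list of attached PST paths has already been gathered by some other means (not shown), and the function name is hypothetical.

```python
from pathlib import Path


def find_orphaned_psts(root, attached_paths):
    """Return .pst files under root that are not in the set of attached PSTs.

    attached_paths is assumed to be the PST paths referenced by Outlook
    profiles, collected elsewhere.
    """
    attached = {Path(p).resolve() for p in attached_paths}
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".pst"
        and p.resolve() not in attached
    )
```

The resulting list could then be fed to whatever policy applies to PST files, in the same way as any other location.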

We can then schedule when you want to perform this task, perhaps every day, so you can find all this mail. But when you find that mail, you then want to do something with it. Some people want to do more than archive it. Many people will say, “OK, go and grab all your email data, bring it back to the archive, and then we can manage it.”

C2C is about consistent data management regardless of the location or the state of the message. So that data would have to be in the archive for us to manage it, to apply retention, or to do discovery on it.

We can just go ahead and create a report. We can archive the messages, or we can archive just the attachments. We can move or copy them to, say, a compliance officer’s mailbox. We can go ahead and delete them, we can delete items from the archive, or we can unarchive them; or we can just generate a report for someone.

There are multiple things you can do when you have matched the data set. The question is, “What do you want to do with it?” C2C has a range of actions that allow you to carry out these wishes.

The first and most common scenario might be to just delete it. But as you can see, C2C gets a bit more granular than just deleting the message. Do we want to move it to the trash? Do we want to move it to the end user’s deleted items? Or do we NOT want to delete the message, but just delete the attachments?

So if you are doing data management and there are a lot of AVI files or MP3 files that are probably not needed inside of Exchange, just go ahead and delete those attachments. In this way you can delete data, track what data you are deleting and have a good explanation of why you deleted what you deleted.
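Editor's illustration: deleting only media attachments while keeping a record of what was removed and why might look like the sketch below. It is a minimal sketch under stated assumptions: the `Message` class, the `on_legal_hold` flag, the policy name and the audit record fields are all hypothetical, and messages on hold are simply skipped to reflect the earlier point that nothing should be deleted during litigation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePath
from typing import List

MEDIA_EXTENSIONS = frozenset({".avi", ".mp3"})


@dataclass
class Message:
    message_id: str
    attachments: List[str]       # attachment file names
    on_legal_hold: bool = False  # hypothetical flag; held messages are never touched


def strip_attachments(messages, policy_name, extensions=MEDIA_EXTENSIONS, now=None):
    """Remove matching attachments in place and return an audit trail of each removal."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    audit = []
    for msg in messages:
        if msg.on_legal_hold:
            continue
        kept = []
        for name in msg.attachments:
            if PurePath(name).suffix.lower() in extensions:
                audit.append({
                    "when": timestamp,
                    "message_id": msg.message_id,
                    "attachment": name,
                    "policy": policy_name,
                    "action": "delete_attachment",
                })
            else:
                kept.append(name)
        msg.attachments = kept
    return audit
```

Each audit entry names the policy that triggered the removal, which is the kind of record that lets someone explain later why a given attachment is gone.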

In Part IV of this interview series Ken explains how C2C does search using a combination of both centralized and distributed search methodologies.

About Charles Skamser
Charles Skamser is an internationally recognized technology sales, marketing and product management leader with over 25 years of experience in Information Governance, eDiscovery, Machine Learning, Computer Assisted Analytics, Cloud Computing, Big Data Analytics, IT Automation and ITOA. Charles is the founder and Senior Analyst for eDiscovery Solutions Group, a global provider of information management consulting, market intelligence and advisory services specializing in information governance, eDiscovery, Big Data analytics and cloud computing solutions. Previously, Charles served in various executive roles with disruptive technology start-ups and well-known industry technology providers. Charles is a prolific author and a regular speaker on the technology that the Global 2000 require to manage the accelerating increase in Electronically Stored Information (ESI). Charles holds a BA in Political Science and Economics from Macalester College.