Data Loss Prevention through Big Data analysis

With Stratfor still in the news after a massive security breach (you can read about it here), information security professionals should be looking into Big Data technologies.  While it’s relatively easy to secure a web server, a database, or an isolated service against most common automated attack vectors, securing a global enterprise – with its multiple offices and networks worldwide, thousands of employees, contractors, and vendors, and millions of documents – is an entirely different kind of challenge.


With Data Loss Prevention (DLP), the harsh business realities are:

  • employees will cut corners
  • everybody is “online”
  • policies will [sometimes] be ignored
  • honest mistakes will be made
  • a complex system is only as strong as its weakest link
  • there is no such thing as commercially feasible, 100% reliable, DRM (Digital Rights Management), and there is no protection against analog copying
  • data loss *will* occur
  • a disgruntled employee with sufficient privileges can do a lot of damage
  • a network admin has a lot of power
  • real damage often occurs when a break-in remains undetected for a long time (i.e. it simply takes a while to copy half a terabyte of data somewhere)
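The last point above – that real damage comes from exfiltration going unnoticed – is exactly where batch log analysis pays off. Below is a minimal, illustrative sketch (the hosts and byte counts are invented for the example; in practice the per-host daily aggregates would come from a Hadoop job over flow or proxy logs) that flags hosts whose outbound volume spikes far above their historical baseline:

```python
from statistics import mean, stdev

# Hypothetical per-host daily outbound byte totals (illustrative data).
# At enterprise scale these aggregates would be produced by a MapReduce
# or Hive job over NetFlow/proxy logs stored in HDFS.
history = {
    "host-a": [1.2e9, 1.1e9, 1.3e9, 1.2e9, 1.25e9],
    "host-b": [2.0e8, 2.1e8, 1.9e8, 2.0e8, 2.05e8],
}
today = {"host-a": 1.22e9, "host-b": 4.8e10}  # host-b suddenly moves ~48 GB

def flag_anomalies(history, today, sigmas=3.0):
    """Flag hosts whose volume today exceeds mean + sigmas * stdev of history."""
    flagged = []
    for host, observed in today.items():
        past = history.get(host)
        if not past or len(past) < 2:
            continue  # not enough baseline data to judge
        if observed > mean(past) + sigmas * stdev(past):
            flagged.append(host)
    return flagged

print(flag_anomalies(history, today))  # → ['host-b']
```

A simple z-score threshold like this is crude, but even crude baselining shrinks the detection window from "months" to "the next batch run" – which is the whole point of goal #1 above.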

While it’s virtually impossible to protect data at a large organization without instituting so many regulations that effective collaboration grinds to a halt, it is absolutely possible to ensure that:

  1. each security incident is detected early
  2. sufficient data exists for intrusion forensics
  3. you have sufficient visibility into the flow of data inside and outside your organization
  4. your “awareness” serves as a deterrent against voluntary unauthorized data sharing

… and that’s where technologies like Hadoop, HBase, Hive, Nutch etc. can be extremely helpful.  From pattern analysis (logs, processes, communications) to visualizing deficiencies in your existing data management workflow (e.g. Document X is located outside of proper directory structure), you can employ arrays of inexpensive servers to give you Google-like power over your data.
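As a toy illustration of the “document outside of proper directory structure” check, the sketch below filters a file inventory against an approved-location policy (the patterns and paths are hypothetical; in a real deployment the inventory would come from a filesystem crawl stored in HDFS or HBase, with this logic run as a distributed job):

```python
import fnmatch

# Hypothetical policy: documents may only live under these trees.
APPROVED_PATTERNS = [
    "/corp/finance/*",
    "/corp/hr/*",
    "/corp/engineering/*",
]

# Hypothetical inventory rows from a filesystem crawl.
inventory = [
    "/corp/finance/2011/q4-report.xls",
    "/corp/engineering/specs/widget.doc",
    "/tmp/exports/customer-list.csv",   # misplaced copy
    "/home/jdoe/payroll-backup.xls",    # misplaced copy
]

def misplaced(paths, patterns=APPROVED_PATTERNS):
    """Return documents that sit outside every approved directory."""
    return [p for p in paths
            if not any(fnmatch.fnmatch(p, pat) for pat in patterns)]

for doc in misplaced(inventory):
    print("outside approved structure:", doc)
```

Run across millions of documents, a report like this is both a cleanup tool and a deterrent: employees who know misplaced copies get flagged make fewer of them (goal #4 above).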

by Michael Alatortsev

If your organization is currently struggling with data, feel free to reach out to us by emailing





Technologist, parallel entrepreneur. Interests: travel, photography, big data, analytics, predictive modeling.

