Tackling Data Security for an Ever-Growing Data Footprint

Business Problem

Organizations generate huge volumes of data every day in structured, semi-structured and unstructured formats. They store it on Big Data and cloud-based platforms in addition to conventional on-premise storage such as relational databases and file servers. According to published statistics, about 80% of this data is unstructured, with limited capability for searching, querying and analysis. Typical examples of such data include word-processing documents, emails, PDFs, spreadsheets, presentations, audio files and images.

Documents in these formats are spread across various storage systems inside and outside the organization, including document management systems, end-user systems, email servers and external storage. This diversity of formats and locations makes it difficult for organizations to keep track of the data and to enforce consistent data protection controls across these data sets.

Key questions organizations face today include: “Where is all this data stored? Which of it is critical and sensitive? Who has access to it? What protection is currently applied to it?”

The result is a growing number of data blind spots, which exposes organizations to the risk of non-compliance with data protection regulations and increases the likelihood of data breaches.

Regardless of the format and location of the data, organizations are continuously pressed by regulators and internal security policies to deploy the following set of controls to remain compliant and protected against data loss:

  • Identifying information assets, irrespective of their format and location, and establishing ownership at the earliest stage of their lifecycle, ideally at the point of creation
  • Classifying this data by its sensitivity and criticality to the organization
  • Establishing data handling procedures based on the value of the data to ensure appropriate handling and due care
  • Controlling access to critical and sensitive data and monitoring it for unauthorized access and unauthorized modification
  • Deploying appropriate data protection controls at every stage of the data lifecycle, commensurate with the value of the data, to ensure its integrity and confidentiality
  • Retaining the identified data sets as required by regulatory and business requirements, and making them available when needed
  • Declassifying data at the appropriate point in its lifecycle so that its protection level can be reduced and its further retention decided
  • Securely deleting data so that no traces are left, and keeping records of the secure deletion

While these controls are relatively easy to implement on structured data, deploying them on unstructured data is an uphill task. Some of these controls can be deployed only if you know where this data resides inside and outside the organization. The major challenge remains “discovering such data and its repositories”.

Given these challenges, security leaders find it increasingly difficult to centrally define, manage and monitor consistent policy enforcement across these data sets and to manage data protection for unstructured data. This is leading to a siloed and inconsistent approach to data protection.


Given the difficulty of this enforcement and the limitations of available technology, organizations should adopt the following approach to minimize the risk of data loss:

  • Tailor data protection policies and processes to suit the requirements of the various data formats
  • Identify the organization's critical and sensitive data and its source of collection or creation, irrespective of format. This data can be further classified into customer-related data (PII, PHI, demographics, credit card numbers, SSNs, etc.) and internal business data (business plans, intellectual property, source code, trade secrets and other proprietary information)
  • Create an inventory of all this data, its format and its storage location. Use data discovery utilities that can perform file analysis and search cloud and big data platforms to discover and tag the data
  • Leverage the Master Data Management program to help identify these silos
  • Create metadata for all identified data, and validate the automatic tagging and classification performed by the discovery utilities to ensure the classification is accurate
  • Perform a risk assessment of these data silos to determine the protection they need, and evaluate available market products against the data protection requirements
  • Explore alternative controls, including native platform capabilities, to meet the data protection requirements
  • Use the integrated data leakage protection capabilities of perimeter security devices to detect and prevent attempted data breaches. Also use DLP suites to monitor data at each possible stage of its lifecycle
  • Identify the business owners and other stakeholders of this data and train them in secure data handling
  • Track this data regularly and enforce reclassification to ensure appropriate protection. Declassify data at the relevant stage to reduce the potential liability from damage
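To make the "discover and tag" and "classify" steps above more concrete, here is a minimal Python sketch of what a discovery utility does at its core. It is an illustration under stated assumptions, not a description of any specific product: the names `scan_repository`, `classify_text` and `InventoryEntry`, the two detection patterns (US-style SSNs and card numbers validated with the Luhn checksum), and the "Confidential"/"Internal" labels are all hypothetical choices made for this example. It only reads plain-text files from a local folder; real discovery tools also parse PDFs, Office documents and email stores, and search cloud and big data platforms.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical detection rules; real discovery tools ship far richer rule sets.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit counting from the rightmost one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class InventoryEntry:
    """One row of the data inventory: location, format, tags, classification."""
    path: str
    file_format: str
    tags: set[str] = field(default_factory=set)
    classification: str = "Internal"  # hypothetical default label


def classify_text(text: str) -> set[str]:
    """Tag text with the kinds of sensitive data it appears to contain."""
    tags: set[str] = set()
    if SSN_PATTERN.search(text):
        tags.add("SSN")
    for match in CARD_PATTERN.finditer(text):
        # The Luhn check filters out most random digit runs.
        if luhn_valid(match.group()):
            tags.add("CREDIT_CARD")
            break
    return tags


def scan_repository(root: str, extensions=(".txt", ".csv", ".log")) -> list[InventoryEntry]:
    """Walk a folder tree and build an inventory of tagged text files."""
    inventory: list[InventoryEntry] = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        tags = classify_text(text)
        entry = InventoryEntry(str(path), path.suffix.lower(), tags)
        if tags:
            entry.classification = "Confidential"
        inventory.append(entry)
    return inventory
```

In keeping with the recommendation above, the automatically assigned `classification` field should be treated as a first pass that data owners review and validate, not as a final classification decision.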


Given the continuing growth of unstructured data, security leaders should adopt this holistic, risk-driven approach to securing unstructured data. Such an approach will enable the business to harness these data sets and the analytical capabilities of these data platforms to better understand customers, products, services and the business in general, securely and in line with regulators' expectations.

Authored by Prashant Deo

TCS Enterprise Security and Risk Management

There is 1 Comment

A complex and demanding issue which faces all businesses. I proffer that one school of thought lies with the 'User'. Consider a given platform, let's say email. If no one sent emails, no space would be used. However, we know that's not practicable.
As an Information Security Manager, I noticed that staff don't empty the 'Deleted Items' folder in their email. Likewise, are staff emptying their 'Recycle Bins'? It's all relative.
It's about discipline and management; education of users is critical. How many times are documents placed in the 'Cloud', deposited once and never seen again? 'Out of sight, out of mind.' The Cloud may seem a never-ending repository; however, the business is paying for a certain number of terabytes.
This applies to system storage or even the Cloud, depending on what the business decides. Therefore, if the business decides to clear all systems and the Cloud of redundant data, it would free up space and save money.
Educate the business, educate the users. Food for thought, maybe.