GDPR – Practical challenges of data classification

Organizations have become dependent on customer data, which is now treated as one of their most valuable assets. That data is accessible from almost anywhere: legacy applications, web interfaces, big data platforms, file shares and, most importantly, messaging systems. Leakage of key information is a serious risk, because failing to conform to local regulation exposes the company to heavy penalties and regulatory action. Classifying, labeling and protecting customer data are therefore key responsibilities of data owners.

From a data privacy perspective, companies also want to prepare for the EU data privacy regulation, GDPR, and ensure that the privacy of their customers is protected. I wrote this article to explain some of the practical challenges I faced at different stages of a data classification project.

General challenges

Data falls into two broad types: structured and unstructured. Structured data is data processed within applications and databases. Unstructured data is the folders and files created by users, including data extracts made for business purposes and stored on laptops or shared drives.
The organization's data classification policy guides the business and IT in classifying data, for example as personally identifiable or price sensitive. However, the lack of a suitable policy can be a big challenge for both. Sometimes the policy is too generic to be relevant to the business; sometimes security policies address classification from several perspectives, not just data privacy. Business users create huge volumes of unstructured data without any inventory, which leaves the organization managing a large body of unknown information, paying for its storage and bearing the risk of protecting it. Lack of awareness about creating or extracting data and keeping it in folders is a major problem for the business.
The business should know how its data is to be classified: which types of data count as highly confidential, confidential, internal, system or public, with a proper justification such as price-sensitive information, personally identifiable information or another category. The absence of a specification of the data attributes used to classify structured or unstructured data is another obstacle for a data classification program.
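As a rough illustration of what such a specification could look like, the sketch below encodes the five levels named above and a small policy table that maps a data category to a level and a justification. The category names, the level assigned to each category and the relative order of "system" and "internal" are assumptions made for the example, not part of any real policy.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels named in the article, ordered from least to most restrictive."""
    PUBLIC = 0
    SYSTEM = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    HIGHLY_CONFIDENTIAL = 4


# Hypothetical policy table: data category -> (level, justification).
POLICY = {
    "personal_data": (Level.HIGHLY_CONFIDENTIAL, "personally identifiable information"),
    "price_sensitive": (Level.HIGHLY_CONFIDENTIAL, "price-sensitive information"),
    "internal_report": (Level.INTERNAL, "for internal use only"),
    "press_release": (Level.PUBLIC, "already published"),
}


def classify(categories):
    """Return the strictest (level, justification) among the given categories.

    Unknown categories are not guessed: they raise an error so that the
    data owner has to extend the policy instead of relying on a default.
    """
    if not categories:
        raise ValueError("at least one data category is required")
    unknown = [c for c in categories if c not in POLICY]
    if unknown:
        raise KeyError(f"no policy entry for: {unknown}")
    return max((POLICY[c] for c in categories), key=lambda pair: pair[0])


if __name__ == "__main__":
    level, reason = classify(["internal_report", "personal_data"])
    print(level.name, "-", reason)
```

Raising an error for an unknown category is a deliberate choice in this sketch: it surfaces the gaps in the attribute specification rather than hiding them.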
Structured data:
The lack of a centralised configuration management database (CMDB) is the major issue for structured data, because the CMDB is what maps critical business processes to applications and databases. The organisation must maintain a CMDB to identify highly confidential data. In practice, however, the CMDB is often outdated or holds no information at all, which makes it difficult to extract and classify critical data.
Legacy applications are another major problem, because their database structures were not built in a standard format. Understanding the data fields, populating their descriptions and classifying them is a challenge for the data owner, and it leads to reliance on the knowledge of individual people.
An application may interface with one or many databases, so it is important to identify the flow of customer data. Without a mapping of interfacing databases to applications, the project has to rely on the application support team's knowledge, which can delay classification and leave unknown interfacing databases undiscovered. The absence of labels at the database column or field level is also very challenging: unless a description exists for every field, classification rests on assumptions about what a field contains, and those assumptions can be wrong.
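One way to reduce guesswork at the field level is to triage columns before anyone classifies them. The sketch below is a simple heuristic, not a complete method: it flags columns whose name or description hints at personal data, accepts columns that at least have a description, and sends everything else to the data owner for review. The pattern list, the function name and the sample schema are all hypothetical.

```python
import re

# Hypothetical hints for personal data; a real list would come from the
# organization's classification policy.
PII_PATTERNS = [r"name", r"e?mail", r"phone", r"address", r"birth|dob", r"passport", r"iban"]
PII_REGEX = re.compile("|".join(PII_PATTERNS), re.IGNORECASE)


def triage_columns(schema, descriptions):
    """Split columns into likely-PII, described, and needs-review lists.

    schema:       {table_name: [column_name, ...]}
    descriptions: {(table_name, column_name): "free-text description"}
    """
    likely_pii, described, needs_review = [], [], []
    for table, columns in schema.items():
        for column in columns:
            key = (table, column)
            text = f"{column} {descriptions.get(key, '')}"
            if PII_REGEX.search(text):
                likely_pii.append(key)
            elif key in descriptions:
                described.append(key)
            else:
                # No label and no description: do not assume, ask the owner.
                needs_review.append(key)
    return likely_pii, described, needs_review


if __name__ == "__main__":
    schema = {"CUST01": ["CUST_NAME", "FLD_17", "ORDER_TOTAL"]}
    descriptions = {("CUST01", "ORDER_TOTAL"): "order value in EUR"}
    pii, ok, review = triage_columns(schema, descriptions)
    print("likely PII:  ", pii)
    print("described:   ", ok)
    print("needs review:", review)
```

A name match is only a hint. A cryptic legacy field such as the hypothetical FLD_17 may still hold personal data, which is why it lands in the review list instead of being assumed harmless.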
Unstructured data:
Unstructured data is the other major type. Enterprises accumulate file shares, documents and similar material in their IT systems over many years, which creates a huge volume of data. Business users' lack of awareness of the data classification policy leads to large amounts of data accumulating without an inventory.
Managing the inventory of unstructured data is a laborious task because of the number of folders and files amassed over the years. If the organization follows no classification or reconciliation process, a huge volume of unidentified files may be sitting in storage.
Automation tools can help an organization classify data, but they come at a cost. If the company chooses manual classification, it faces a huge task in creating and managing the data inventory, and the classification owner should be clear about the level at which classification will be done: the folder or the individual file. A further challenge with user-created file shares is the maintenance cost of storage; classification helps remove unwanted data, which can reduce that cost.
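Even without a commercial tool, a first inventory of a file share can be scripted. The following is a minimal sketch using only the Python standard library: it records one entry per file and then aggregates per folder, which supports the folder-level versus file-level decision described above. The function names and record fields are illustrative assumptions.

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def build_inventory(root):
    """Walk a file share and return one record per file."""
    records = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            records.append({
                "path": path,
                "folder": folder,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            })
    return records


def summarize_by_folder(records):
    """Aggregate file count and total size per folder."""
    summary = defaultdict(lambda: {"files": 0, "size_bytes": 0})
    for record in records:
        entry = summary[record["folder"]]
        entry["files"] += 1
        entry["size_bytes"] += record["size_bytes"]
    return dict(summary)


if __name__ == "__main__":
    inventory = build_inventory(".")
    for folder, stats in sorted(summarize_by_folder(inventory).items()):
        print(folder, stats["files"], "files,", stats["size_bytes"], "bytes")
```

The per-folder totals and the last-modified dates also give a starting point for spotting old, unwanted data that adds to the storage cost.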

Suggestions and conclusion

The penalties and legal consequences of not following GDPR are severe, so data classification is imperative for an organization. Here are some suggestions that would help set up an approach to the data classification process.
  • Publishing the data classification policy is a crucial step; the policy should be detailed and include examples. Senior management support in enforcing the policy is important.
  • Standardise the data classification process with a proper template to identify, extract, label and classify data. Standard templates and data attributes help the data owner meet data privacy objectives.
  • Set up a data classification committee. It helps the organization sail through the process by disseminating knowledge of the importance of data classification to business users and by organizing the ongoing classification work.
  • Create awareness among business users of their responsibility for managing customer data. This helps the organization keep only the minimum required data and manage data privacy with due diligence.
  • Ensure that the responsibilities of data custodians, data owners, data privacy owners and auditors for data classification are defined.
  • Manage the metadata and inventory of every data source, with business ownership assigned; this simplifies the data classification process.
  • Finally, internal and external audit reviews are required to certify the data classification process and its outcome.
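To make the idea of a standard template concrete, here is one possible shape for a classification record, with a check that reports mandatory fields left empty. The field names and the rule that only the audit date is optional are assumptions for illustration; an organization would define its own template.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ClassificationRecord:
    """Hypothetical template: one record per data source."""
    data_source: str
    business_owner: str
    custodian: str
    level: str
    justification: str
    last_audit: Optional[str] = None  # ISO date of the last review, if any


def missing_fields(record):
    """Return the names of mandatory fields that were left empty."""
    return [
        f.name
        for f in fields(record)
        if f.name != "last_audit" and not getattr(record, f.name).strip()
    ]


if __name__ == "__main__":
    record = ClassificationRecord(
        data_source="crm_database",
        business_owner="Head of Sales",
        custodian="",
        level="highly confidential",
        justification="personally identifiable information",
    )
    print(missing_fields(record))
```

A record with no named owner or custodian is exactly the kind of gap an audit review would look for.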
I hope you enjoyed reading the article and gained some insight into the challenges of data classification.
Authored By - Arunkumar Durairaj
TCS Cyber Security Community