Enterprises today are capturing huge volumes of varied data at a very fast pace. According to an IDC report, “The volume of data stored in the Global StorageSphere is doubling approximately every four years.”
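The IDC figure can be restated as an annual growth rate. Assume smooth exponential growth with a doubling time of four years. This simplifies what the report describes as an approximate trend. The stored volume $V$ after $t$ years and the implied annual growth rate $g$ are then:

$$
V(t) = V_0 \cdot 2^{t/4}, \qquad g = 2^{1/4} - 1 \approx 0.189
$$

That is roughly 19% growth per year, or about $2^{10/4} \approx 5.7$ times the starting volume after a decade.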
Unfortunately, most of the data that is captured and stored is never used. This unused data is known as “dark” or “dusty” data. Research advisory firm Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”
Many companies now hold enormous amounts of dark data. Industry analyst firm Forrester says that “73% of the data in enterprises is never used for any strategic purposes.”
Some estimates go further. According to Rahul Telang, professor of information systems at Carnegie Mellon University, “Over 90% of the data in business is dark data.”
Why do enterprises carry dark data? Why hoard data that is never used? Firstly, many organizations say they keep dark data to comply with regulations. For example, according to the Office of the Privacy Commissioner of Canada (OPC), insurance providers should keep insurance records for a minimum of three years. As an article from Mondaq notes, the OPC found that “a three-year retention period was rationally linked to fraud detection and prevention in the insurance industry.”
Similarly, the U.S. Food and Drug Administration’s regulations in Title 21 of the Code of Federal Regulations require pharmaceutical companies in the U.S. to retain records pertaining to drug production, control or distribution for at least one year after the expiration date of the batch.
Secondly, dark data arises from organizational silos and a lack of collaboration between teams. The result is poor management of unused data stored in legacy and retired applications, dormant content servers, log files, customer complaints, geolocation data, data integration payloads, departed employees’ mailboxes, shared network drives and many other repositories.
Thirdly, out of risk aversion, many companies adopt a “retain everything” mindset, keeping everything from production reports and contracts to invoices and maintenance records. Enterprises often steer clear of potential risk, even when the potential benefits of an action equal or exceed the potential loss.
So, how can enterprises avoid or reduce dark data? The approach typically involves five steps.
- Firstly, identify your data strategy based on the level of digitization of the products and services your firm sells, the degree of regulation in your industry, your delivery or distribution channels and logistics, and so on. In other words, understand the role and importance of data in your industry and inside your enterprise.
- Secondly, do not capture data you do not need. Data capture, storage and processing should serve your business objectives (i.e., operations, compliance and decision making). If the data in question does not contribute to these goals, question whether to collect it at all. These first two steps are proactive: they keep new dark data from accumulating in your company.
- Thirdly, profile and catalog your existing enterprise data across the entire data life cycle. Base the catalog on metadata, classifying datasets into reference data (such as plants, product categories and currency), master data (such as vendors, customers and products) and transactional data (such as orders and invoices). Reference data and master data are relatively static, whereas dark data is typically some form of transactional data, generated by business events. Although data cataloging is generally time-consuming, it can be automated by deploying spiders (crawlers) that capture metadata from various data stores. In short, this step identifies and profiles the specific data objects you should focus on to derive business value.
- Fourthly, assign a data owner or custodian for transactional data. Empower your data custodian to act with intensity and purpose, because reducing your company’s dark data footprint requires collaborating with multidisciplinary teams.
- Finally, define a retention and disposition schedule for your transactional data. Because the data is already cataloged to support discovery and governance, you can build rules that efficiently find unused transactional data assets. For data objects that fall outside the retention and disposition schedule, archive, back up or, if needed, purge the data.
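The cataloging step can be sketched as a small metadata spider. The following Python is a minimal sketch, not a production catalog. It walks a directory tree, which stands in for a shared network drive, and records basic file-system metadata. It then assigns each file to reference, master or transactional data using hypothetical keyword rules. `CLASS_KEYWORDS`, `classify` and `crawl` are illustrative names, not part of any cataloging product.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical keyword rules, checked in order. Reference data is checked
# first so that "product_categories" is not mistaken for product master data.
CLASS_KEYWORDS = {
    "reference": ("currenc", "plant", "categor"),
    "master": ("vendor", "customer", "product"),
    "transactional": ("order", "invoice"),
}


@dataclass
class CatalogEntry:
    path: str
    size_bytes: int
    modified: datetime
    data_class: str


def classify(filename: str) -> str:
    """Assign a data class from the file name using CLASS_KEYWORDS."""
    lowered = filename.lower()
    for data_class, keywords in CLASS_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return data_class
    return "unclassified"  # left for a data custodian to classify by hand


def crawl(root: str) -> list[CatalogEntry]:
    """Walk a directory tree and capture basic metadata for every file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable files and broken symlinks
            entries.append(
                CatalogEntry(
                    path=path,
                    size_bytes=st.st_size,
                    modified=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
                    data_class=classify(filename),
                )
            )
    return entries
```

For example, `crawl("/mnt/shared")` (a hypothetical mount point) would list every file under that path with its size, modification time and class. Keyword matching on file names is a deliberately crude stand-in. Real catalogs usually classify from schemas, owners and lineage.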
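The retention and disposition rules from the final step can likewise be sketched as a function that maps each cataloged record to an action. This is a hypothetical sketch. The schedule values loosely echo the insurance and pharmaceutical examples cited earlier, but they are illustrative only. The idle-time thresholds (`idle_days`, `purge_after_days`) are assumptions that a data custodian would set, not regulatory figures.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule (years), keyed by record type. The values
# loosely echo the examples in the text and are illustrative, not legal advice.
RETENTION_YEARS = {
    "insurance_record": 3,
    "drug_production_record": 1,
}


@dataclass
class ScheduledRecord:
    name: str
    record_type: str
    retention_start: date  # when the retention clock starts, e.g. record creation
    last_accessed: date    # ideally from query logs or audit trails; file-system
                           # access times may be unreliable (noatime/relatime mounts)


def disposition(
    record: ScheduledRecord,
    today: date,
    schedule: dict[str, int] = RETENTION_YEARS,
    idle_days: int = 365,             # assumed: unused this long counts as dark
    purge_after_days: int = 3 * 365,  # assumed: idle this long may be purged
) -> str:
    """Return 'retain', 'review', 'archive' or 'purge' for one record."""
    years = schedule.get(record.record_type)
    if years is None:
        return "review"  # no rule yet: route to the data custodian
    # Approximate a year as 365.25 days; a real schedule would use calendar rules.
    retention_end = record.retention_start + timedelta(days=round(365.25 * years))
    if today < retention_end:
        return "retain"  # still inside the mandated retention period
    idle = (today - record.last_accessed).days
    if idle < idle_days:
        return "review"  # past its schedule but still in use
    if idle < purge_after_days:
        return "archive"  # dark data: move to cheaper storage or back up
    return "purge"  # long-dark data past retention: candidate for deletion
```

For instance, take a drug production record whose retention clock started on 1 January 2022 and which was last read on 1 June 2022. Evaluated on 1 January 2024 with the default thresholds, it would come back as `"archive"`.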
Gartner predicted that by 2021, “more than 80% of organizations will fail to develop a consolidated data security policy across silos, leading to potential noncompliance, security breaches, and financial liabilities.” Just as an unused machine or idle inventory is a liability, any unused data asset is both a liability and a risk.
In today’s big-data world, data that is well managed, used and governed can be a huge business asset. But data is an asset only within a certain range of volume. Too little data impairs data-driven business performance, while too much data, especially dark data, can hamper the enterprise. A good data strategy therefore strikes the right balance between having enough data and accumulating dark data.