If you have ever used a computer, you are most likely guilty of creating dark data. Also known as “dusty data”, this fairly new term describes valuable data collecting dust on our machines and in our archives. With technology evolving fast, new data is created at high speed, and the risk of neglecting old data grows with it. Time to dust off the digital shelves!
The creation of dark data
Similar to big data, dark data is hard to manage. The main reason is that it usually sits outside the control of normal data storage or backup mechanisms. It resides on our machines, in old archives, possibly even on CDs or DVDs. Usually this data was copied by end users during their years of employment, or simply created in the wrong location. Working documents, or living documents, are a great example of new data that is valuable to the organization. These local copies are often created quickly to draft something, but in the end are never copied to our network shares. The result is that we miss out on valuable information and risk losing it when the hardware fails.
While many documents stored on local drives or in archives might be considered useless, some of that data might fill a gap needed for proper trend analysis. Another risk is unneeded work, such as repeating an analysis that was already done in the past and could easily have been reused. Then there is the risk of users keeping several copies of the same data element, with online versions mixed with offline ones.
Limiting the impact
We could actively search systems and archives for old data, compare the hashes of the documents found, merge duplicates, and update our links to point to the right location. Additionally, we could offer proper versioning and inform users about what it makes possible. This way they understand the benefits of using a (slow) network drive, such as being able to restore a document to a previous version.
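As a rough illustration of the hash comparison, the sketch below walks one or more directories and groups files whose contents are identical. The function names `file_sha256` and `find_duplicates` and the choice of SHA-256 are assumptions made for this example, not part of any particular tool. It only reports duplicates; merging them and fixing links remains a manual decision.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Group files under the given directories by content hash.

    Returns a dict mapping each hash to a sorted list of paths,
    keeping only hashes shared by more than one file.
    """
    by_hash = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            try:
                by_hash[file_sha256(path)].append(path)
            except OSError:
                # Unreadable files (permissions, broken links) are skipped.
                continue
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_duplicates(["/home/alice", "/mnt/archive"])` (hypothetical paths) would return one entry per set of identical files. On large archives, comparing file sizes first and hashing only files of equal size would avoid reading most of the data.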
Another option is to inform our users about the risks of storing files locally, for example through software automation: search for newly created Word and Excel documents or presentations, notify the user about them, and explain the risks.
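Such an automated check could look roughly like the sketch below. It is a minimal example under several assumptions: the extension list, the seven-day window, and the names `find_recent_documents` and `build_reminder` are all made up for illustration, and the modification time is used as a stand-in for the creation time, which is not available in a portable way. How the reminder reaches the user (e-mail, pop-up, login script) is left open.

```python
import time
from pathlib import Path

# Assumed set of Word, Excel and PowerPoint extensions; adjust as needed.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def find_recent_documents(root, max_age_days=7, now=None):
    """Return office documents under root modified within the last max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    recent = []
    for path in Path(root).rglob("*"):
        try:
            if (path.is_file()
                    and path.suffix.lower() in OFFICE_EXTENSIONS
                    and path.stat().st_mtime >= cutoff):
                recent.append(path)
        except OSError:
            # Files that cannot be inspected are skipped.
            continue
    return sorted(recent)


def build_reminder(paths):
    """Build a plain-text reminder for the user, or an empty string if nothing was found."""
    if not paths:
        return ""
    lines = ["These documents exist only on your local drive:"]
    lines += [f"  {path}" for path in paths]
    lines.append("Local files are not backed up; please move them to the network share.")
    return "\n".join(lines)
```

A scheduled task could run `build_reminder(find_recent_documents(Path.home()))` once a week and show the result only when it is not empty.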