How long should you keep your data for?
January 16th, 2022

Here’s a brief look at the best practices and major legal directives pertaining to data retention in the age of cloud storage.

Cloud storage has revolutionized the way organizations collect, process, and store data. With practically limitless online storage available and the costs per gigabyte constantly dropping, it should not come as a surprise that the amount of data generated is doubling every two years.

Given the ubiquity of cloud storage, it can be tempting to keep data indefinitely. However, new legislation like GDPR and CCPA makes it clear that you should not keep personal data longer than necessary. The real question is, how long is that?

Understanding data retention schedules

Data retention practices play a significant role in GDPR compliance in the form of the storage limitation principle. California’s similar CCPA legislation adopts the same principle, but neither law directly answers the question of how long you should actually keep data.

You should retain personal data only for the shortest time possible. This means that it should be kept only for as long as it is needed to fulfill its purpose. If the data no longer serves its original purpose, it should be deleted. For this reason, every organization must create its own data retention policies that serve its specific use cases.

Some types of personally identifiable information are governed by other laws, which may seem to conflict with privacy regulations like GDPR and CCPA. For example, the Sarbanes-Oxley Act (SOX) requires US companies to retain accounting data for at least five years, while the Fair Labor Standards Act (FLSA) requires employers to retain employment records for three years. In both cases, the statutory retention schedule takes precedence over GDPR and CCPA, since you are legally required to retain the data.
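As a rough illustration, a retention schedule can be written down as a lookup from record category to minimum retention period. The sketch below is a minimal Python example under stated assumptions: the category names, the RETENTION_YEARS table, and both helper functions are hypothetical, and the periods simply mirror the SOX and FLSA figures mentioned above rather than constituting legal advice.

```python
from datetime import date

# Hypothetical schedule: minimum retention in years per record category.
# The figures mirror the examples in the text; confirm them with legal counsel.
RETENTION_YEARS = {
    "accounting": 5,  # SOX example
    "employment": 3,  # FLSA example
}


def earliest_deletion_date(category: str, created: date) -> date:
    """Return the first date on which a record of this category may be deleted."""
    years = RETENTION_YEARS[category]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # The record was created on 29 February and the target year is not a leap year.
        return created.replace(year=created.year + years, month=3, day=1)


def must_retain(category: str, created: date, today: date) -> bool:
    """Return True while the minimum retention period is still running."""
    return today < earliest_deletion_date(category, created)
```

A scheduled job could call must_retain for each record and flag only those that return False for review and deletion.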

The law also gives extra leeway for the storage of personal data for archival purposes that are deemed in the public interest. For example, data pertaining to statistical or historical research purposes can generally be kept indefinitely. The only exception is subject access requests (SARs) under GDPR and CCPA, which also give individuals the right to request the deletion of their data in certain circumstances. In these cases, data may need to be redacted to remove any personally identifiable information if you still want to keep it.
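Redaction of this kind can be sketched in a few lines of Python. The field names in PII_FIELDS and the redact helper are hypothetical and would need to match your own data model. Note that removing direct identifiers does not by itself guarantee anonymity, since the remaining fields may still identify a person in combination.

```python
# Hypothetical set of field names treated as personally identifiable.
PII_FIELDS = {"name", "email", "phone", "address"}


def redact(record: dict) -> dict:
    """Return a copy of the record with the PII fields removed."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


# Example: only the non-identifying fields survive.
survey_row = {"name": "Jane Doe", "email": "jane@example.com", "age_band": "30-39", "region": "CA"}
redacted_row = redact(survey_row)
```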

Why you need a data archiving solution

Considering the complexity and diversity of modern data storage solutions, it can be difficult to determine the right archival systems and processes. When data retention periods expire, it is vital that you do not continue processing the data sets for any purpose other than archival. You are also obligated to apply the appropriate technical and organizational measures to protect the data.

Fortunately, many cloud storage vendors, such as AWS, provide data archiving services for long-term retention. Traditional options, such as NAS systems and tape drives, also remain in widespread use. However, the cloud does offer a higher degree of scalability and flexibility, especially now that Web 3.0 decentralized storage systems are starting to appear. Archived data is not meant to be readily available, and it should not be accessible to any third-party tools and organizations. Instead, it is intended for use where long retention periods are a legal requirement.

Your data retention policies should always account for cloud storage and archival, and you should typically refrain from relying solely on public cloud services. For example, your solution may automatically archive old data by moving it to a dedicated public or private cloud archiving platform or even an in-house solution such as a NAS or on-premises data center. These cold-site backups are there if you need them, but they are not readily accessible for everyday use.
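The automatic archiving step might look something like the following minimal sketch, which assumes the live data and the archive are both reachable as local or mounted directories (for example, a NAS share). The archive_old_files function and its parameters are hypothetical. A real deployment would more likely use the vendor's own lifecycle or archiving tooling.

```python
import shutil
import time
from pathlib import Path


def archive_old_files(source: Path, archive: Path, max_age_days: int) -> list[Path]:
    """Move files not modified within max_age_days from source to archive."""
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    # Take a sorted snapshot of the directory so moving files does not disturb iteration.
    for path in sorted(source.iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            target = archive / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Using the last-modified time as the age signal is only one possible choice. A policy tied to a record's creation date or to the end of its purpose would need that date stored alongside the data.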

Above all, defining a proper data retention period means maintaining visibility and control over your data. Therefore, it is imperative that you avoid so-called data graveyards, in which you end up with massive repositories of unused and unnecessary data. If there is any chance you may need to retain personal data for longer, then it should typically be anonymized.

Mineral is a storage connector that centralizes and optimizes your cloud storage solutions to match your unique workflow requirements. Sign up for our early access program today to find out how it works.
