Protecting PDFs for GDPR

To comply with the General Data Protection Regulation (GDPR), companies of all sizes and industries have been defining processes and systems to manage personal data in a more structured and reliable way than ever before.

Obviously, the first things that come to mind are customer databases, CRM systems, and marketing automation tools. But what about documents?

Personal data and confidential information are contained not only in databases and internal systems but also in all kinds of business documents.

The PDF Association refers to the PDF file format as “digital paper” and the “de facto standard for electronic documents.” In order to make sure PDF documents are included in your internal data protection compliance processes, there are a few things you should take care of. Below are five tips for preparing PDF documents for GDPR compliance.

1) Raise awareness among staff that not all PDFs are the same

Not all PDFs are searchable. Scanned documents, for example, usually are not. To retrieve data from such documents, they must first be converted to searchable PDFs using optical character recognition (OCR). So-called “digital-born” PDFs are generally searchable; however, some of the information in them may still not be found through a full-text search, for example because it sits in vector graphics that look like text, in screenshots, or in other images that contain text.
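
One way to find candidates for OCR is to check how much text a PDF text extractor returns for each page. The sketch below assumes you already have one string of extracted text per page from whichever extraction tool you use; the function name `pages_needing_ocr` and the 20-character threshold are illustrative assumptions, not a standard. It only flags pages with essentially no text layer, so text inside a screenshot on an otherwise text-bearing page will not be caught.

```python
from typing import List

# Illustrative threshold (an assumption, tune it for your documents): pages
# with fewer visible characters than this are treated as probably image-only.
MIN_TEXT_CHARS = 20


def pages_needing_ocr(page_texts: List[str], min_chars: int = MIN_TEXT_CHARS) -> List[int]:
    """Return 0-based indices of pages whose extracted text is (nearly) empty.

    `page_texts` holds one string per page, as returned by some PDF text
    extractor. A page with almost no extractable characters is likely a scan
    or an image and should be sent to OCR before it can be searched.
    """
    flagged = []
    for index, text in enumerate(page_texts):
        # Ignore whitespace so that blank lines and spaces don't count as text.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    pages = [
        "Invoice for Jane Doe, 12 Example Street",  # digital-born page with a text layer
        "",                                         # scanned page: no text layer at all
        "   \n  ",                                  # whitespace only
    ]
    print(pages_needing_ocr(pages))  # [1, 2]
```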

Fundamentally, the GDPR aims to strengthen individuals' control over the use of their personal data. This means that your organization must have the systems and best practices in place to handle subject access requests (Art. 15 of GDPR), requests to exercise the “right to be forgotten” (Art. 17 of GDPR), and objections to data processing (Art. 21 of GDPR). To comply, you have to be sure that you can find all personal data relating to a data subject.

Moreover, the GDPR imposes rigorous obligations on data controllers and processors to safeguard privacy rights, including providing data subjects, upon request, with a copy of their personal data being processed (Art. 15 of GDPR). To do so, data controllers must implement appropriate technical and organizational measures to ensure that the processing of personal data complies with the Regulation (Art. 24 of GDPR), and must provide information to data subjects in a concise, transparent, intelligible and easily accessible form (Art. 12 of GDPR), including when they exercise the right to data portability (Art. 20 of GDPR).

2) Make data in PDF documents “discoverable”

The primary challenge organizations face is a lack of sufficient insight into their information holdings, which may span email, file shares, content repositories and, yes, in many cases, paper-based files. Not surprisingly, the PwC Pulse Survey found that initial investments are expected to go into data discovery best practices and tools.

Digitizing paper documents and converting previously scanned, non-searchable PDFs into searchable documents ensures that documents in digital archives and repositories can be retrieved quickly by searching for specific names or other personal information. The conversion can be automated, whether for individual needs or for organization-wide processing of large volumes of documents: the process has to be set up only once and then runs automatically.
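
Once documents are searchable, answering a subject access request comes down to finding every document that mentions the person. The sketch below assumes the text has already been extracted (and OCR'd where necessary) into a dictionary keyed by document ID; `normalize` and `find_mentions` are hypothetical helper names, and a real deployment would more likely rely on a search index or eDiscovery tool. Stripping accents widens the matches, which suits a "find everything" request but can also produce false positives.

```python
import re
import unicodedata
from typing import Dict, List


def normalize(text: str) -> str:
    """Strip accents, collapse whitespace and case-fold, so that
    'Jürgen  MÜLLER' and 'jurgen muller' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).casefold().strip()


def find_mentions(corpus: Dict[str, str], terms: List[str]) -> Dict[str, List[str]]:
    """Map each document ID to the search terms that occur in its text.

    Terms must match whole words: 'Jane Doe' matches 'JANE  DOE,' but not
    'Janet Doeling'.
    """
    patterns = []
    for term in terms:
        norm = normalize(term)
        if norm:
            # (?<!\w) / (?!\w): the match may not touch other word characters.
            patterns.append((term, re.compile(r"(?<!\w)" + re.escape(norm) + r"(?!\w)")))

    hits: Dict[str, List[str]] = {}
    for doc_id, text in corpus.items():
        haystack = normalize(text)
        found = [term for term, pattern in patterns if pattern.search(haystack)]
        if found:
            hits[doc_id] = found
    return hits


if __name__ == "__main__":
    corpus = {
        "contract.pdf": "Agreement between ACME Ltd and JANE  DOE, dated 2019.",
        "newsletter.pdf": "Annual report, edited by Janet Doeling.",
        "scan_ocr.pdf": "Reply to: jane.doe@example.com",
    }
    print(find_mentions(corpus, ["Jane Doe", "jane.doe@example.com"]))
    # {'contract.pdf': ['Jane Doe'], 'scan_ocr.pdf': ['jane.doe@example.com']}
```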

3) Minimize the sensitive data you store and share

The GDPR imposes specific requirements related to data minimization (Art. 5(1)(c) and Art. 25 of GDPR). It is therefore prudent practice either to apply data anonymization best practices so that data subjects are no longer identifiable, or to redact documents that may contain confidential, sensitive, or personally identifiable information.

Since PDF documents often contain confidential and personally identifiable information, it is recommended that any personal, sensitive, and confidential information be made unrecognizable in these documents, whether they are stored in a digital archive or shared with third parties. True redaction permanently removes the information from the document and makes it irretrievable, rather than merely covering it up.
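
Redacting a PDF properly requires a tool that deletes the underlying text, not a black box drawn over it. The principle of removing rather than covering can still be illustrated on plain extracted text. The sketch below is illustrative only: the e-mail pattern is deliberately simplified, the list of names must be supplied by you, and `find_candidates` and `redact` are hypothetical helpers, not part of any PDF product.

```python
import re
from typing import Iterable, List, Tuple

# Deliberately simple e-mail pattern; it misses some valid addresses and is
# only meant to illustrate automatic detection of redaction candidates.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_candidates(text: str, names: Iterable[str]) -> List[Tuple[int, int]]:
    """Return sorted, non-overlapping (start, end) spans to redact."""
    spans = [m.span() for m in EMAIL_RE.finditer(text)]
    for name in names:
        if not name:
            continue  # an empty name would match everywhere
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        spans.extend(m.span() for m in pattern.finditer(text))

    # Merge overlapping or touching spans so each region is replaced once.
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text: str, names: Iterable[str], marker: str = "[REDACTED]") -> str:
    """Return a copy of `text` with every candidate span replaced by `marker`.

    The original characters are dropped, not hidden, so they cannot be
    recovered from the returned string.
    """
    pieces = []
    cursor = 0
    for start, end in find_candidates(text, names):
        pieces.append(text[cursor:start])
        pieces.append(marker)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    print(redact("Contact Jane Doe at jane.doe@example.com.", ["Jane Doe"]))
    # Contact [REDACTED] at [REDACTED].
```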

4) Beware of the “hidden” data

In many cases, sensitive data is in plain sight within the document, but it can also be “hidden” in metadata, attached files, and comments. Instead of going through all these areas manually and redacting or removing sensitive information piece by piece, you can use appropriate software tools to remove it quickly and easily. “Sanitizing” PDF documents in this way, by removing hidden data with just a few clicks, helps keep administrative burden and compliance costs within reason.
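
Before sanitizing, it can help to know which files carry such hidden data at all. The sketch below is a rough triage heuristic, not a sanitizer: it searches the raw bytes of a PDF for a few PDF name objects (`/Author`, `/Metadata`, `/EmbeddedFiles`, `/FileAttachment`, `/Annots`, `/JavaScript`) that typically indicate metadata, attachments, comments, or scripts. The marker list and the function name are assumptions for illustration; the actual removal should be done with a PDF tool that parses and rewrites the file.

```python
import re
from typing import Dict, List

# Illustrative, non-exhaustive list of PDF name objects that often point to
# "hidden" content worth reviewing before a document is archived or shared.
HIDDEN_DATA_MARKERS: Dict[bytes, str] = {
    b"/Author": "author name in the document information dictionary",
    b"/Metadata": "XMP metadata stream",
    b"/EmbeddedFiles": "embedded file attachments",
    b"/FileAttachment": "file attachment annotations",
    b"/Annots": "annotations such as comments",
    b"/JavaScript": "embedded JavaScript",
}


def scan_for_hidden_data(pdf_bytes: bytes) -> List[str]:
    """Return descriptions of the markers found in the raw bytes of a PDF.

    Heuristic only: names stored inside compressed object streams, or written
    with #xx escapes, are not visible to a plain byte search.
    """
    found = []
    for marker, description in HIDDEN_DATA_MARKERS.items():
        # The lookahead stops '/Author' from matching a longer name such as '/AuthorX'.
        if re.search(re.escape(marker) + rb"(?![A-Za-z0-9])", pdf_bytes):
            found.append(description)
    return found


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            findings = scan_for_hidden_data(handle.read())
        print(path, findings or "no obvious hidden data")
```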

5) Protect the information in documents from unauthorized access

If certain personal information needs to be retained, the necessary measures must be taken, in line with Art. 25 of GDPR, to limit access to the processed personal data to authorized personnel only. One way to comply is to protect these documents with a password, so that only those who know the correct password can open them. Be aware, however, that the content of these documents can then no longer be found through keyword search. When handling requests such as subject access (Art. 15 GDPR), the “right to be forgotten” (Art. 17 GDPR), or objections to data processing (Art. 21 GDPR), where finding all personal data of a data subject is crucial for compliance, password-protected documents will need to be processed separately.
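
Because encrypted files drop out of keyword searches, it is worth knowing which PDFs in a repository are password-protected so they can be routed to a separate process. A minimal sketch, assuming the files sit on a local file system: it flags any PDF whose raw bytes contain an `/Encrypt` key. This is a heuristic and the helper names are hypothetical. Note also that a PDF encrypted with only an owner (permissions) password may still open without a prompt, so the check can flag more files than strictly need separate handling.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

# An encrypted PDF refers to its encryption dictionary with the /Encrypt key in
# the trailer (or cross-reference stream dictionary), which is not encrypted.
ENCRYPT_RE = re.compile(rb"/Encrypt(?![A-Za-z0-9])")


def is_probably_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic check for an /Encrypt entry in the raw PDF bytes."""
    return ENCRYPT_RE.search(pdf_bytes) is not None


def triage(paths: Iterable[Path]) -> Tuple[List[Path], List[Path]]:
    """Split PDFs into (indexable, encrypted); the second list needs separate handling."""
    indexable: List[Path] = []
    encrypted: List[Path] = []
    for path in paths:
        target = encrypted if is_probably_encrypted(path.read_bytes()) else indexable
        target.append(path)
    return indexable, encrypted


if __name__ == "__main__":
    ok, protected = triage(Path(p) for p in sys.argv[1:])
    print(f"{len(ok)} file(s) can go to the search index")
    for path in protected:
        print(f"process separately (encrypted): {path}")
```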
