At Pismo, we use Google Cloud Data Loss Prevention (DLP) to inspect our log data for any sensitive information it may contain. This inspection helps us prevent customer data leaks and comply with privacy regulations such as the European Union's General Data Protection Regulation (GDPR).
Log data versus privacy
Collecting and analysing log data is essential for troubleshooting and performance tuning. On the Pismo platform, we can raise the log's level of detail whenever we need to inspect specific events or transactions, so a single transaction may generate hundreds of lines of log data. The extra detail helps us quickly diagnose bugs and unexpected system behaviour. However, it also increases the theoretical risk of sensitive data leaking through the logs. Such leaks shouldn't happen, of course. Our code goes through several validation stages, and at least two engineers besides the author review it. Our team strives to make sure all of our APIs comply with privacy guidelines. Even so, a bug in an API could still cause a leak. To mitigate this risk and catch bugs that could compromise privacy, we use Google Cloud DLP.
Google Cloud DLP
DLP, a managed service that Google offers as an API, is designed to help discover, classify and protect sensitive data. We built a worker that submits new log files to DLP for inspection. DLP looks for character strings whose exposure could violate privacy laws such as the GDPR and the Brazilian LGPD. For instance, it searches for sequences that look like a name, address or phone number (passwords and other more critical data cannot leak this way because they are always encrypted). When DLP finds a suspect string, it masks its characters so that they cannot be read. Our tests have shown that DLP is quite effective at detecting possible data leaks in log data. It does produce some false positives, but this is a minor problem that we can handle efficiently. All in all, it is a handy tool for improving privacy on our platform.
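The worker is not described in detail here. As a minimal sketch, assuming it is written in Python with the `google-cloud-dlp` client library, it could split each new log file into batches of lines and send each batch to DLP's `deidentify_content` method with a character-masking transformation. The helper names (`batch_lines`, `build_mask_request`, `mask_log_file`), the project ID, the batch size, the `#` masking character and the three built-in infoTypes chosen are all illustrative assumptions, not Pismo's actual configuration.

```python
"""Hypothetical sketch of a log-inspection worker (not Pismo's actual code).

Assumptions: the google-cloud-dlp client library is used, every character of
a finding is masked with '#', and log files are sent in fixed-size batches.
PROJECT_ID, INFO_TYPES and BATCH_LINES are illustrative values.
"""
from typing import Iterator, List

PROJECT_ID = "my-project"  # hypothetical project ID
# Built-in DLP infoTypes resembling the data mentioned in the article.
INFO_TYPES = ["PERSON_NAME", "STREET_ADDRESS", "PHONE_NUMBER"]
BATCH_LINES = 500  # hypothetical batch size to keep each request small


def batch_lines(lines: List[str], size: int = BATCH_LINES) -> Iterator[str]:
    """Yield consecutive groups of log lines, each joined into one text item."""
    for start in range(0, len(lines), size):
        yield "\n".join(lines[start:start + size])


def build_mask_request(text: str, project: str = PROJECT_ID) -> dict:
    """Build a DeidentifyContentRequest, as a dict, that masks every finding."""
    return {
        "parent": f"projects/{project}/locations/global",
        "inspect_config": {"info_types": [{"name": name} for name in INFO_TYPES]},
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "character_mask_config": {
                                "masking_character": "#",
                                "number_to_mask": 0,  # 0 masks all characters
                            }
                        }
                    }
                ]
            }
        },
        "item": {"value": text},
    }


def mask_log_file(path: str) -> Iterator[str]:
    """Send each batch of a log file to DLP and yield the masked text.

    Needs the google-cloud-dlp package and Application Default Credentials.
    """
    # Imported here so the request builders above work without the package.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    with open(path, encoding="utf-8") as log_file:
        lines = log_file.read().splitlines()
    for text in batch_lines(lines):
        response = client.deidentify_content(request=build_mask_request(text))
        yield response.item.value
```

Building the request as a plain dictionary keeps the masking logic testable without credentials; only `mask_log_file` actually contacts DLP.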
DLP also classifies each masked string by how likely it is to be sensitive data. We use the data it generates for two purposes. First, it feeds a dashboard where we monitor our systems: if an API generates suspect log data, its development team is warned and can fix any bugs responsible. Second, we use DLP data to calculate one of our service quality scores. Seeing this score improve over time gives us confidence that we are minimising software bugs and thus preserving our customers' privacy.
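How findings are aggregated and scored is not specified above. The sketch below assumes each finding has already been reduced to a pair of an API name and one of DLP's likelihood levels (`VERY_UNLIKELY` to `VERY_LIKELY`). It counts findings at `LIKELY` or above per API for the dashboard and derives an illustrative score: the share of monitored APIs with no such findings. The threshold, the API names, the function names and the score formula are hypothetical and are not Pismo's actual metric.

```python
"""Hypothetical aggregation of DLP findings for a dashboard and a score.

Assumptions: each finding is an (api_name, likelihood) pair, where likelihood
is one of DLP's Likelihood names. The alert threshold and the score formula
are illustrative only.
"""
from collections import Counter
from typing import Iterable, List, Tuple

# DLP likelihood levels, ordered from least to most suspicious.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
ALERT_THRESHOLD = "LIKELY"  # hypothetical: warn teams at LIKELY or above


def is_alert(likelihood: str, threshold: str = ALERT_THRESHOLD) -> bool:
    """Return True if the likelihood is at or above the alert threshold."""
    return LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(threshold)


def alerts_per_api(findings: Iterable[Tuple[str, str]]) -> Counter:
    """Count high-suspicion findings per API, e.g. for a dashboard panel."""
    return Counter(api for api, likelihood in findings if is_alert(likelihood))


def privacy_score(apis: List[str], findings: Iterable[Tuple[str, str]]) -> float:
    """Hypothetical score: fraction of APIs with no high-suspicion findings."""
    flagged = alerts_per_api(findings)
    clean = sum(1 for api in apis if flagged[api] == 0)
    return clean / len(apis) if apis else 1.0
```

For example, with the findings `("payments", "LIKELY")`, `("payments", "POSSIBLE")`, `("accounts", "VERY_LIKELY")` and `("cards", "UNLIKELY")` across four monitored APIs (`payments`, `accounts`, `cards`, `statements`), two APIs are flagged and the score is 0.5.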