How software can lead to improved data privacy (no, really!), and why we are excited about it
Anagh Prasad & Arpit Maheshwari, Jul’21
In the three decades since Tim Berners-Lee pioneered the World Wide Web, the internet has changed the way we socialize, work, shop, learn, and entertain ourselves at an unprecedented rate. At the heart of much of the internet’s power lies data: data that billions of users generate all the time by leaving fingerprints on browsers & apps (the web never forgets), interacting with online products, services, and ads (read: cookies), or directly creating content themselves (Facebook et al.).
While this data can make life easier and more convenient for consumers, it often comes with hidden costs that are not readily apparent. People are now questioning who really owns this data, & how much sharing is too much. Sample this: 79% of people feel concerned about how their data is used, & 81% believe that the potential risks of using personal data for personalization outweigh the benefits (Pew survey, US adults, 2019).
Policymakers have tried to move in tandem with public expectations. Since the EU’s roll-out of GDPR in May 2018 – a watershed moment in personal data privacy – 60+ jurisdictions around the world have enacted or proposed modern data privacy laws. CCPA has been in place since 2020, and several US states – at least 15 at last count – are expected to follow suit with their own legislation soon. Gartner estimates that 65% of the world’s population will have their personal information covered under modern privacy laws by 2023. The presence of regulations with serious consequences, monetary or reputational, for non-compliance and breaches means privacy will be one of the dominant themes of the internet’s next decade.
101: Personally Identifiable Information
At the core of issues and regulations around privacy is the concept of Personally Identifiable Information (PII). PII, in a broad sense, refers to data that could be used, by reasonable means, to identify a particular individual; it falls into three categories:
- Personal Information: This is information that directly identifies the end user, e.g. name, government IDs (SSN, Aadhaar, driving license number, etc.), phone number, bank account number, email address, and so on. Personal Information is typically the day-0 agenda item for any discussion around privacy, as it is considered the most sensitive and, in turn, in need of priority protection.
- Browsing behaviour: A key component of the web is the cookie, a small file that helps browsers remember state information such as usernames and passwords, shopping cart items, browsing history, and so on. Cookies are the reason a bunch of sneakers seems to follow you around on your online journey after you visit a footwear website. Much of online advertising relies on third-party cookies, which are issued by entities other than the domain a user visits (e.g. Google issuing its own cookie while you shop on a sneakers website) and which aggregate user information to allow precise targeting. Cookies have been an important focus area for privacy legislation & proactive moves by big tech. “Privacy-first” browsers like Safari, Firefox, & Brave (together accounting for about 25% of browser market share) have already disabled third-party cookies by default; Apple has banned the use of third-party trackers for all iOS apps, and competitors like Google might need to follow suit soon.
- User Generated Content (UGC): Digital platforms like Facebook, Twitter, TikTok, etc. derive the bulk of their content from their own users. Any online activity by a user, be it creating or posting new content, re-sharing content by others, or liking or swiping on a piece of content, can be classified as UGC. There is an increasing expectation that platforms guarantee ownership of UGC to its true owners, i.e., users; e.g., by being transparent about how UGC is used and shared, or by providing options to opt out and/or delete data collected in the past.
What is the Status Quo on Policy?
Most of the initial policy interventions around the world, pioneered by GDPR & then CCPA, revolve around collecting, processing & securing PII. While the specifics differ, most legal frameworks globally share the following guiding principles when it comes to PII:
- No entity should be able to collect or process PII without explicit user consent.
- There should be transparency around what personal information an entity stores, how it processes it, & who it shares it with.
- Users should have an easy way to revoke consent & also the right to delete any PII that the transacting entity or its partners might have stored in the past.
- Whenever personal data is being used for a business purpose, records of the timing, nature, & purpose of use should be well documented & preserved.
- Entities collecting PII should anonymize the data (i.e. replace it with tokens that cannot be traced back to any user’s identity) wherever possible, and must take accountability for protecting PII from third-party attacks.
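To make the last principle concrete, here is a minimal Python sketch of replacing PII fields with tokens. It is an illustration under stated assumptions, not a compliance recipe: the names `SECRET_KEY`, `pseudonymize`, and `tokenize_record` are hypothetical, and keyed hashing of this kind is usually described as pseudonymization rather than full anonymization, since anyone holding the key can recompute the token for a known value.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real system would load it from a
# key-management service rather than hard-code it.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for `value` that cannot be recomputed without `key`."""
    # Normalize so that "Jane@Example.com" and " jane@example.com " map to one token.
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]


def tokenize_record(record: dict, pii_fields: set) -> dict:
    """Copy `record`, replacing the values of the listed PII fields with tokens."""
    return {
        name: pseudonymize(str(value)) if name in pii_fields else value
        for name, value in record.items()
    }


user = {"email": "Jane@Example.com", "phone": "+1-555-0100", "plan": "premium"}
safe = tokenize_record(user, pii_fields={"email", "phone"})
```

Because the token is stable for a given key, downstream analytics can still join records on `safe["email"]` without ever seeing the address itself.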
Where Can There be Problems?
First, it is important to realize that data security is a necessary but insufficient condition for compliance with privacy regulations. Security is about preventing unauthorized access to data, whereas privacy is about the collection and usage of data. An organization may commit costly privacy breaches while being fully in control of the data it collects, e.g. by securely collecting data it was not supposed to collect in the first place. This is why data privacy is different from, though closely linked to, data security.
Consider the following examples of privacy breaches for each type of data that we described earlier.
- Unencrypted personal information is collected, or is stored for longer than needed.
- Third-party cookies collect and transmit data about user behaviour without informed consent. The majority of cookies are third-party, and website owners themselves are often unaware of the number of cookies their websites deploy. Unless disabled by the browser or the user, an average website today deploys more than seven third-party trackers, a number that in some cases runs into the hundreds.
- PII from UGC is inadvertently used for purposes for which consent was not provided. As an example, Anheuser-Busch was sued by Kayla Kraft after it used a photograph she had uploaded as part of a contest in its advertising material.
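The third-party cookie case above can be illustrated with a short Python sketch that flags cookies whose domain differs from the site being visited. The function names and the `.example` domains are hypothetical, and the "last two labels" rule is a deliberate simplification: a real audit tool would use the Public Suffix List to find the registrable domain.

```python
from urllib.parse import urlsplit


def site_of(host: str) -> str:
    """Approximate the registrable domain as the last two labels of the host.

    Assumption: this is wrong for suffixes such as .co.uk; it is only meant
    to show the idea of comparing the cookie's site with the page's site.
    """
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_url: str, cookie_domain: str) -> bool:
    """True when the cookie belongs to a different site than the visited page."""
    page_host = urlsplit(page_url).hostname or ""
    return site_of(page_host) != site_of(cookie_domain)


def third_party_sites(page_url: str, cookie_domains: list) -> list:
    """Return the distinct third-party sites that set cookies on the page."""
    return sorted(
        {site_of(d) for d in cookie_domains if is_third_party(page_url, d)}
    )


cookies_seen = [
    ".sneakers.example",
    "cdn.sneakers.example",
    "ads.tracker.example",
    ".tracker.example",
]
flagged = third_party_sites("https://shop.sneakers.example/cart", cookies_seen)
```

Here `flagged` contains only `tracker.example`, since the other cookies belong to the site the user is actually visiting.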
What’s the Software Opportunity?
The proliferation of digital data, the increasing complexity of running businesses across multiple geographies with varying privacy laws, and generally understaffed privacy and compliance teams mean that businesses will need software to ensure privacy compliance.
Even though Privacy-SaaS is a market in its infancy, it has already produced a slew of promising startups: OneTrust (end-to-end workflow management), BigID (scanning/identifying sensitive data), Privacera (data access control), Transcend (consent management), and Securiti (predicting/managing data risks), to name a few. OneTrust, the emerging leader across multiple sub-categories, is already a $5B+ company built in less than 5 years.
We believe the last 4-5 years constitute the first wave of Privacy-SaaS, driven by enterprises’ need to keep up with the rapidly changing regulatory landscape. The category’s leading tools today help their customers navigate regulations by (a) tackling privacy debt in their systems, or (b) building efficient workflows for handling user requests (read: DSRs). This wave by itself has a long way to go, with tens of billions of dollars in upside yet to be unlocked, driven by evolving data privacy regulations (65% of the world’s population is expected to be covered by data privacy regulations in 2023 vs 10% as of 2020).
At the same time, we believe that a second wave for Privacy-SaaS is also springing up. This wave is characterized by a proactive approach in the way businesses collect, organize, store, process, and sell/export user data. It will evolve in sync with more businesses realizing that data privacy isn’t merely a compliance topic, but a key feature that users will use to judge them against competitors. The next decade’s most popular platforms and tools will be privacy-first by design, and multiple large businesses will be created to enable that shift.
Themes we are Excited About
At Stellaris, we are looking to explore investments in (a) startups riding on the first wave’s proliferation with a differentiated offering, as well as (b) early-movers from the second wave.
In the near term, we are particularly enthused about the following three opportunities:
1. Simplified Privacy-SaaS for SMBs:
SMBs need Privacy-SaaS that is designed for them: integrated, flexible, and self-serve to the extent possible. The opportunity is particularly pronounced in the US market today, as it has lagged the EU by a couple of years in regulations and most US SMBs are in the very early stages of comprehending the rapidly evolving expectations around data privacy. As with other software categories, we expect SMBs to want all-in-one solutions as opposed to best-of-breed tools for specific problems, and we expect this category to be amenable to inside sales as additional clarity emerges around privacy regulations.
2. “Intelligent” Privacy-SaaS for enterprises:
Enterprise data infrastructure today is characterized by high complexity and several degrees of internal as well as external dependencies. Automating privacy workflows on top of this complexity is hard, and hence a potential domain for specialized AI-led / RPA solutions to emerge. Building such solutions might require a new player to focus on a narrow problem to start with and expand its offerings later. Example problems include pattern recognition for discovering personal data, predictive risk assessment in a development environment, automated privacy audits of external integrations, etc.
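As a rough illustration of what "pattern recognition for discovering personal data" means at its simplest, the Python sketch below scans free text with a few regular expressions. The pattern set and the `find_pii` name are assumptions made for this example; the patterns are intentionally narrow, and the opportunity described above lies precisely in going beyond such rules with context-aware or learned detectors.

```python
import re

# Illustrative patterns only: each one covers a single common format and will
# miss many real-world variants.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\+?\d{1,3}[ -]\d{3}[ -]\d{3}[ -]\d{4}\b"),
}


def find_pii(text: str) -> list:
    """Return (label, matched_text) pairs for every pattern hit in `text`."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


sample = "Contact jane@example.com or +1 555 010 9999; SSN 123-45-6789."
found = find_pii(sample)
```

Running this over the sample string yields one email, one SSN-formatted number, and one phone number; pointing the same function at log files or database exports is where false positives and missed formats quickly show up.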
3. Privacy tools for application developers:
As privacy takes centre stage, the responsibility to develop privacy-first applications will expand beyond data privacy officers and compliance teams to encompass developers and product managers. Ultimately, this space will see a shift left akin to the one in security. We are excited about companies building tools (widgets, kits, APIs etc.) that make it easier for developers to build privacy-first software: e.g., tools for personal data anonymization / obfuscation, integrating data access controls, storing consent traces, and so on.
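To give a flavour of what a developer-facing building block for "storing consent traces" might look like, here is a minimal in-memory sketch in Python. `ConsentEvent` and `ConsentLedger` are hypothetical names, not any vendor's API, and a production version would persist events to durable, tamper-evident storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentEvent:
    """One immutable entry in the trace: who, for what purpose, granted or revoked."""
    user_id: str
    purpose: str
    granted: bool
    recorded_at: datetime


@dataclass
class ConsentLedger:
    """Append-only log; the latest event per (user, purpose) decides current consent."""
    events: list = field(default_factory=list)

    def record(self, user_id: str, purpose: str, granted: bool) -> ConsentEvent:
        event = ConsentEvent(user_id, purpose, granted, datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def has_consent(self, user_id: str, purpose: str) -> bool:
        # Walk backwards so the most recent decision wins; default to no consent.
        for event in reversed(self.events):
            if event.user_id == user_id and event.purpose == purpose:
                return event.granted
        return False

    def history(self, user_id: str) -> list:
        """Full trace for one user, e.g. to answer a data subject request."""
        return [e for e in self.events if e.user_id == user_id]


ledger = ConsentLedger()
ledger.record("user-1", "marketing_emails", granted=True)
ledger.record("user-1", "marketing_emails", granted=False)  # user revokes
```

Keeping revocations as new events rather than overwriting earlier ones preserves the record of when consent was given and withdrawn, which is the kind of trace the guiding principles earlier in this post call for.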