Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer or any other entities with which I am affiliated.


While deciding what to write this week, I came across a LinkedIn post by Mike Privette summarizing recent cybersecurity funding rounds from his newsletter, Return on Security (which I recommend checking out). One topic stood out: data security.

It struck me that, outside of Cyera and perhaps Securiti a few years ago, there haven’t been many compelling data security startups lately. Varonis and BigID still feel like the last major entrants. I don’t blame founders — this has been a difficult space historically, both in terms of fundraising and exits. A big reason for that is technical: data sensitivity and risk are highly dependent on organizational context, which makes building effective products hard.

Yet the data landscape has changed dramatically. The rise of the modern data stack with companies like dbt Labs, Snowflake, Fivetran, and Databricks has reshaped how data is stored and analyzed. The AI boom has only intensified the importance of data as a core business asset. In light of this, I believe data security is overdue for a rethink.

Historically, data security companies have fallen into two categories: data loss prevention (DLP) and data classification. With the move to cloud and remote work, DLP has shifted into more of a network security challenge. Companies like Zscaler and Cloudflare have stepped in here, focusing on how data flows through networks and SaaS applications. It used to be simpler when data stayed in one place — security teams could label it and monitor its flow through a well-defined perimeter. Today, data flows from everywhere, especially corporate laptops and SaaS products.

Data classification has always been difficult, largely because it suffers from false positives. BigID’s insight was that some automation is better than manual labeling, even if imperfect. That’s true to an extent, but classification remains hard because context matters so much. What counts as sensitive varies by organization and use case: two companies can hold the same type of data, such as personal information, and assign it very different levels of risk. Different types of data also carry different risks, and the ways organizations handle them vary accordingly. If your tool can’t reliably identify your most sensitive data, it loses a lot of its value. That said, I’m optimistic that AI will help here, because it’s better than humans at parsing context.

More recently, a new category has emerged (or maybe been rebranded) called data security posture management (DSPM). I actually don’t think DSPM is a bad idea. Ever since CSPM tools like Wiz took off, my perspective has shifted. When a new technical domain gains traction (the cloud in CSPM’s case), posture tools that assess risk in that domain can be valuable early on. The problem is that they hit a ceiling. Dashboards are easy to replicate, and posture tools often lack a defensible moat. That’s why Wiz has since repositioned as a CNAPP and expanded into detection and response — surfacing risk alone isn’t enough.

While data isn’t new, the way we use and manage it has changed. It’s decentralized. It moves faster. AI has made it more valuable. In particular, fine-tuning large language models using organizational data is now seen as a competitive advantage. This has turned even seemingly mundane datasets into critical business assets.

Companies also make more data-driven decisions today, which has elevated the importance of analytics. This is a shift that security hasn’t fully caught up with. Understanding how and why data is used, not just where it lives, is increasingly necessary for managing risk and delivering business value.

Traditionally, most enterprise data was operational, e.g., user info, transactions, inventory, and payments. This is the data that powers applications in real time and typically lives in SQL databases. But more recently, analytical data has gained prominence. It’s not needed in real time, but it helps leaders understand how their business works. It lives in different tools, serves different users, and is stored in a different stack, powered by tools like Snowflake and Databricks. Analytics also sits at the heart of modern AI workflows.

This distinction matters. First, operational data doesn’t always have analytical value, but sensitive data often does. Second, most security programs are still optimized to protect operational data. Third, the increased use of analytics forces us to reconsider how we evaluate data risk. Finally, analytics can actually improve privacy, for instance, through aggregation techniques that reduce the exposure of individual records.
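To make the aggregation point concrete, here is a minimal sketch of what I have in mind. It is purely illustrative: the record fields (`region`, `spend`), the function name, and the minimum group size of 3 are all assumptions of mine, not taken from any particular product. The idea is that analysts see per-group averages, and any group too small to hide an individual is suppressed.

```python
from collections import defaultdict

# Assumed threshold: groups smaller than this are withheld, because an
# "average" over one or two people effectively reveals individual records.
MIN_GROUP_SIZE = 3


def aggregate_spend(records, min_group_size=MIN_GROUP_SIZE):
    """Return average spend per region, dropping groups that are too small."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum of spend, row count]
    for record in records:
        bucket = totals[record["region"]]
        bucket[0] += record["spend"]
        bucket[1] += 1
    return {
        region: round(total / count, 2)
        for region, (total, count) in totals.items()
        if count >= min_group_size
    }


# Hypothetical operational records; the analyst never sees user_id or raw rows.
records = [
    {"user_id": 1, "region": "EU", "spend": 10.0},
    {"user_id": 2, "region": "EU", "spend": 20.0},
    {"user_id": 3, "region": "EU", "spend": 30.0},
    {"user_id": 4, "region": "APAC", "spend": 99.0},
]

result = aggregate_spend(records)
print(result)  # {'EU': 20.0}; APAC is suppressed because it has a single record
```

A threshold like this is a blunt instrument, and it is not a privacy guarantee on its own, but it shows how an analytics layer can expose less than the operational data underneath it.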

All of this makes data security more complex. Our tools and approaches need to evolve.

So what might that look like? Personally, I think Cyera is headed in the right direction. (For transparency, I’ve only seen a demo, not used the product.) I like that they’re starting with risk profiling and bringing in people with context to help classify data. My guess is they’re using AI behind the scenes to eventually automate much of this. Interestingly, this mirrors how LLMs work. We train them with human input, then let them scale. Security is starting to learn from adjacent fields.

But Cyera and other DSPMs can’t stop at classification. As we saw with CSPMs, posture tools eventually need to enable action. That’s the real opportunity. I haven’t yet seen a compelling product that enables effective remediation of data risk. It’s a hard challenge both technically and organizationally.

The edge cases are easy. If data isn’t sensitive, use it freely. If it’s extremely sensitive with no analytical value, don’t use it. The real difficulty lies in data that is both sensitive and valuable. What is the right strategy there? How does that compare to data with medium sensitivity but high utility? These are not just technical questions. They involve negotiation across teams with different incentives.

Access control is part of the answer, but not the whole solution. The goal is least privilege access that doesn’t block legitimate use. That means teasing apart specific use cases, which takes work.

One way to better manage this complexity is to introduce a simple risk framework that categorizes data into usage-based buckets. For example, some data should never be used for analytics under any circumstances. Other data can be used freely. A third category might allow use in an aggregate form only. These kinds of boundaries can help organizations apply consistent policies while still enabling value, and they give security teams a clear basis for recognizing and remediating data security issues. More importantly, they provide a starting point for tooling and products to help enforce them.
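Here is a rough sketch of how those three buckets could be expressed in code. Everything in it is hypothetical: the column names, the `UsagePolicy` enum, and the `check_query` helper are my own illustrations rather than any vendor's API, and a real implementation would have to hook into the warehouse's query or access layer.

```python
from enum import Enum


class UsagePolicy(Enum):
    NEVER = "never"                    # must not be used for analytics at all
    AGGREGATE_ONLY = "aggregate_only"  # usable only inside aggregates
    FREE = "free"                      # usable without restriction


# Hypothetical column-to-bucket mapping; each organization would define its own.
COLUMN_POLICY = {
    "ssn": UsagePolicy.NEVER,
    "purchase_amount": UsagePolicy.AGGREGATE_ONLY,
    "product_category": UsagePolicy.FREE,
}


def check_query(columns, is_aggregate):
    """Return a list of policy violations for a proposed analytics query."""
    violations = []
    for column in columns:
        # Unclassified columns fall back to the most restrictive bucket.
        policy = COLUMN_POLICY.get(column, UsagePolicy.NEVER)
        if policy is UsagePolicy.NEVER:
            violations.append(f"{column}: not allowed in analytics")
        elif policy is UsagePolicy.AGGREGATE_ONLY and not is_aggregate:
            violations.append(f"{column}: aggregate use only")
    return violations


print(check_query(["product_category", "purchase_amount"], is_aggregate=False))
print(check_query(["product_category", "purchase_amount"], is_aggregate=True))
```

Defaulting unknown columns to the strictest bucket is a design choice I made for the sketch. It is safer, but it also creates friction for analysts, which is exactly the kind of tradeoff each organization has to negotiate.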

Even figuring out which data has analytical value is tough and varies across organizations. Risk tolerance also differs. How do you build a product that adapts to this variability? What does it even mean to “respond” to a data risk?

Right now, security doesn’t have great answers. There are very few tools designed to assess risk inside platforms like Snowflake or Databricks. Security teams don’t always understand how those systems are used or where the data goes. Just look at the recent Snowflake security issues.

In some cases, making progress might mean rethinking data models or abstracting parts of them. This could require application changes or clever design work. Either way, there are significant technical tradeoffs. Security buyers may not be ready for this. More technical teams might be.

Honestly, I have more questions than answers. But that’s also what makes this space exciting. There’s real room to build something impactful. Right now, most of this work is done manually by engineers. That makes it expensive and slow.

If someone builds a product with a strong, opinionated solution that actually works, the value could be enormous.
