When Data Becomes the Mission
Why state and local government, cities, and research universities are reorganizing infrastructure around data itself
If you remember one thing from this article, let it be this: infrastructure used to organize around applications. Increasingly, it organizes around data.

If you spend enough time around enterprise infrastructure, you start to notice something about how conversations begin.
Someone asks about storage. Not in a philosophical way. In a practical way.
How much capacity do we have left?
What’s the refresh cycle?
Is this staying on premises or moving to cloud?
What’s the backup strategy?
For years, that framing made perfect sense. Infrastructure was the foundation, and the job of infrastructure teams was to keep the lights on and the foundation solid.
But lately, in conversations with customers across state and local government, cities, and research universities, something feels different.
Because eventually someone says something like this:
“We have this data… but we can’t actually use it.”
And that is when the real conversation begins.
Why the public sector reveals the truth about data
There’s a perspective I heard recently that stuck with me.
The public sector isn’t a niche market.
It’s a microcosm of the entire enterprise technology world.
At first that sounds counterintuitive. The stereotype is that government IT has been quietly living under a rock since the previous century, next to a beige server and a stack of COBOL manuals.
But if you look closely, the opposite is true.
State agencies, cities, and research institutions operate in environments that combine nearly every architectural challenge the private sector faces — all at once.
- Massive datasets
- Highly distributed users
- Strict security requirements
- Long retention policies
- Global collaboration
And an absolute requirement that systems remain available when people need them most.
In other words, the public sector experiences the full spectrum of data challenges simultaneously.
If you want to stress-test a data architecture, put it inside government.
Think about it.
A state government may run thousands of systems across dozens of agencies, each serving different missions but increasingly sharing the same underlying data.
A city manages infrastructure at the physical edge of society — traffic, water, SCADA, emergency services — where real-time decisions depend on accurate information.
Universities generate some of the largest research datasets on earth while collaborating across institutions and countries.
Each of these environments demands something slightly different from infrastructure.
But they all demand the same thing from data:
Security.
Integrity.
Mobility.
Context.
Availability.
And when those requirements collide in one environment, something interesting happens.
The solutions that work there tend to work everywhere.
A laboratory for the modern data enterprise
This is why many technology leaders quietly view the public sector as something more than a vertical market.
It’s a laboratory for enterprise-scale data architecture.
If a platform can operate in a world where:
- sensitive personal data must remain protected
- systems span thousands of locations
- regulatory oversight is constant
- and uptime has real public consequences
…then that architecture will almost certainly succeed in commercial environments.
Banks, manufacturers, healthcare providers, and global enterprises face the same challenges.
Just rarely all at once.
Government simply compresses those problems into a single environment.
Solve the data problem for government, and you solve it for the enterprise.
That’s one reason the shift toward data-centric platforms is becoming so important.
When organizations treat infrastructure as a place to store files, they solve only a small part of the problem.
But when they treat data as the central operational asset — something that must be understood, governed, protected, and made usable across environments — the architecture begins to look very different.
And the public sector, with all its complexity, becomes the place where those architectures are tested first.
Which brings us back to the shift we’re seeing across the industry.
Because once you start looking at infrastructure through the lens of data itself, something else becomes obvious.
The center of gravity has moved.
When multiple systems depend on the same dataset, the data becomes part of the operating foundation.
And once that happens, moving it — or even restructuring it — becomes dramatically harder.
Which brings us to the concept that explains a lot of what is happening right now.
The quiet physics of data gravity
The first time I heard the term “data gravity” wasn’t in a conference keynote or a vendor presentation.
It was in 2015, when a recruiter from a startup called DataGravity (whose technology was later acquired by HyTrust) reached out and asked if I would be interested in interviewing.
At the time, the idea sounded fascinating — and slightly theoretical. The company was built around the premise that data itself was becoming the most valuable asset in the data center, and that infrastructure needed to understand the content, context, and behavior of data, not just store it. The name alone hinted at something deeper: the idea that as datasets grow, they start exerting a kind of gravitational pull on the systems around them.
Back then, it felt like an interesting concept.
Today it feels like a description of reality.
The term “data gravity” itself was coined by Dave McCrory on his blog back in 2010, and it turns out to be a remarkably accurate way to describe modern infrastructure.
The idea is simple.
As datasets grow, they become harder to move. More applications depend on them. More workflows connect to them. More policies govern them.
Eventually, the architecture starts organizing around the data itself.
Not because someone designed it that way.
Because the physics of large systems leave you very little choice.
Imagine trying to relocate a state Medicaid dataset that has been integrated with multiple benefit programs, identity verification systems, and fraud detection tools.
Technically possible? Sure.
Operationally trivial? Not even close.
The larger and more interconnected the dataset becomes, the stronger its gravitational pull.
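A back-of-envelope sketch makes the point. Every number below is an illustrative assumption (dataset size, link speed, real-world efficiency), not a measurement from any real agency:

```python
# Back-of-envelope: how long does it take to move a large dataset?
# All figures are illustrative assumptions, not real agency numbers.

dataset_tb = 500        # assumed dataset size, in terabytes
link_gbps = 10          # assumed dedicated network link, in gigabits/sec
efficiency = 0.7        # assumed real-world throughput vs. line rate

effective_gbps = link_gbps * efficiency
dataset_gigabits = dataset_tb * 8 * 1000   # TB -> gigabits (decimal units)

days = dataset_gigabits / effective_gbps / 86_400
print(f"{dataset_tb} TB over a {link_gbps} Gbps link: "
      f"~{days:.1f} days of continuous transfer")
# ~6.6 days of saturating a dedicated link, before any validation
# or cutover work even starts.
```

And the bytes are the easy part. Re-pointing every integration, revalidating every workflow, and keeping both copies in sync during the cutover are what actually keep the data where it is.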
Compute moves closer to the data.
Applications move closer to the data.
Infrastructure reorganizes around the data.
This is why organizations that once talked primarily about storage capacity are now talking about data platforms.
The center of gravity moved.
When data stops being passive
The moment data becomes operational, everything changes.
For years, most organizations treated data as something that accumulated quietly inside systems. Applications produced it. Storage kept it safe. Backups made sure it could be restored.
But that model starts to break down when the data itself becomes part of real-time decision making.
You can see this most clearly in environments that generate enormous volumes of information.
Cities now run infrastructure that continuously streams telemetry — traffic sensors, utility meters, environmental monitors, emergency response platforms. A water meter that once reported usage monthly might now generate thousands of readings per year. A traffic system that once relied on static timing can adapt dynamically to real-time conditions.
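To see the multiplier at work, here's a minimal sketch; the meter count is an assumed figure for illustration, not from any particular city:

```python
# How one telemetry "improvement" multiplies data volume.
# The meter count is an illustrative assumption.

meters = 200_000               # assumed water meters in a mid-sized city

monthly = meters * 12          # old model: one reading per month
hourly = meters * 24 * 365     # new model: one reading per hour

print(f"monthly reporting: {monthly:>13,} readings/year")
print(f"hourly reporting:  {hourly:>13,} readings/year  ({hourly // monthly}x)")
# monthly reporting:     2,400,000 readings/year
# hourly reporting:  1,752,000,000 readings/year  (730x)
```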
Each improvement creates more data.
More importantly, it creates operational dependence on that data.
Universities experience the same phenomenon in a different form. Research environments produce extraordinary datasets across genomics, climate science, and artificial intelligence. Sequencing a single human genome generates roughly 100 gigabytes of raw data, and large research programs may create terabytes or petabytes of new information every week.
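The arithmetic compounds quickly. Here's a sketch with an assumed sequencing throughput (the weekly sample count is hypothetical):

```python
# From per-sample size to program-level data growth.
# The weekly sample count is a hypothetical assumption.

gb_per_genome = 100       # ~raw data for one human genome, per the figure above
genomes_per_week = 500    # assumed throughput of a large sequencing program

tb_per_week = gb_per_genome * genomes_per_week / 1000
print(f"~{tb_per_week:.0f} TB of new raw data per week")  # ~50 TB/week
# Roughly 2.6 PB/year from sequencing alone, before imaging,
# simulation, and instrument data are layered on top.
```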
In those environments the challenge isn’t just storing data.
It’s feeding it fast enough to the systems that depend on it.
Modern research clusters and GPU environments can process enormous volumes of information, but only if the underlying data pipeline keeps up. When storage cannot deliver data fast enough, expensive compute resources sit idle and discovery slows down.
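Here's a minimal sketch of that bottleneck; the GPU count, per-GPU consumption, and storage throughput are all assumed figures for illustration:

```python
# Can the storage pipeline keep a GPU cluster busy?
# All hardware figures are illustrative assumptions.

gpus = 64                      # assumed GPUs in the research cluster
gb_per_sec_per_gpu = 2.0       # assumed data each GPU consumes while training
storage_gb_per_sec = 80.0      # assumed aggregate storage read throughput

demand = gpus * gb_per_sec_per_gpu
busy = min(1.0, storage_gb_per_sec / demand)

print(f"demand {demand:.0f} GB/s vs. supply {storage_gb_per_sec:.0f} GB/s"
      f" -> GPUs busy ~{busy:.0%} of the time")
# demand 128 GB/s vs. supply 80 GB/s -> GPUs busy ~62% of the time.
# Over a third of an expensive cluster idles, waiting on storage.
```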
And that reveals an important truth about modern infrastructure.
When systems depend on data in real time, the question stops being where the infrastructure lives.
The question becomes whether the data is available, trustworthy, and recoverable.
That distinction also explains why ransomware has become so disruptive to public institutions.
Attackers understand that the real leverage is not the servers or the network.
It’s the data.
When access to data disappears, the services built on top of it disappear as well.
Which brings us back to the deeper shift happening across the industry.
If data has become this central to operations, services, and discovery, then managing it as a passive byproduct of infrastructure is no longer enough.
Infrastructure alone is no longer the strategic layer.
The strategic layer is the data itself.
Organizations still need performance, availability, and resilience. Those fundamentals have not changed.
What has changed is the expectation that infrastructure should also help organizations understand, govern, protect, and use their data more effectively.
That is a very different problem than simply storing it.
And it is the reason the conversation is evolving from storage management to data management platforms.
The real punch line
Public sector organizations didn’t set out to become data enterprises.
Over time the data accumulated.
Then the dependencies formed.
And eventually everything started orbiting the datasets that mattered most.
Data has gravity.
Data has risk.
Data has power.
Infrastructure still matters.
But increasingly, the real mission is something else entirely.
The mission is the data.
Appreciate you reading.
Dmitry Gorbatov
© 2025 Dmitry Gorbatov | #dmitrywashere