ML models don't just "learn patterns". They memorize fragments of training data. Names, addresses, proprietary text — anything can resurface through model outputs.

The model becomes an attack surface.

Once deployed, you can't DELETE FROM weights WHERE data = 'sensitive'. The information is dissolved into billions of parameters. Removing it is an open research problem called machine unlearning.

That's why I run inference locally. Not for performance. Not for cost. Because where the model runs is a data governance decision.

Differential privacy, federated learning, privacy-preserving ML are important techniques. But they are mitigations inside an architecture already decided.

The real question comes earlier: who controls the infrastructure where your data becomes embeddings?

If the answer is "someone else's GPU cluster" — your data governance policy has a gap the size of a model checkpoint.