Silent Data Corruption
Silent data corruption is incorrect data that passes through pipelines and validation without triggering failures. It is especially damaging because it produces plausible outputs that are trusted until discrepancies surface downstream.
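To make the definition concrete, here is a minimal Python sketch of corruption passing validation. It assumes a hypothetical order feed with `order_id` and `amount_usd` fields and a purely structural check. The check accepts a correct batch and also a batch where amounts were written in cents, so nothing fails even though the total is inflated 100x.

```python
# Minimal sketch: a structural check that accepts a batch whose values are wrong.
# The field names ("order_id", "amount_usd") and this check are hypothetical.

EXPECTED_SCHEMA = {"order_id": str, "amount_usd": float}


def passes_schema(rows):
    """Return True if every row has exactly the expected fields with the expected types."""
    return all(
        set(row) == set(EXPECTED_SCHEMA)
        and all(isinstance(row[field], typ) for field, typ in EXPECTED_SCHEMA.items())
        for row in rows
    )


# Correct batch: amounts in dollars.
good_batch = [{"order_id": "a1", "amount_usd": 12.50}, {"order_id": "a2", "amount_usd": 8.00}]

# Corrupted batch: an upstream change started writing cents into the dollars column.
bad_batch = [{"order_id": "a1", "amount_usd": 1250.0}, {"order_id": "a2", "amount_usd": 800.0}]

print(passes_schema(good_batch), passes_schema(bad_batch))  # True True
print(sum(row["amount_usd"] for row in bad_batch))  # 2050.0: plausible-looking, but 100x too high
```

Both batches are structurally identical, which is exactly why this kind of error travels downstream unnoticed.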
Common failure signals
- Plausible but shifted distributions (e.g., sudden step changes)
- Duplicates or missing records without job errors
- Incorrect joins or mapping tables producing valid-looking results
- Unit or scale changes (e.g., cents vs. dollars) that do not violate the schema
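Several of these signals can be checked directly at each pipeline hop. The Python sketch below is illustrative, not a standard API: `reconcile_keys` and `scale_factor` are hypothetical helpers, and the candidate factors (100, 1000) and the 10% tolerance are assumed values to tune per dataset. Key reconciliation surfaces dropped records and duplicated ones (including those created by a join that fans out); comparing medians against a trusted baseline surfaces cents/dollars-style unit changes.

```python
from collections import Counter
from statistics import median


def reconcile_keys(source_keys, target_keys):
    """Compare record keys across one pipeline hop.

    Returns (missing, unexpected, duplicated): keys that were dropped, keys that
    appeared from nowhere, and keys written more than once (e.g. by join fan-out).
    """
    source = set(source_keys)
    counts = Counter(target_keys)
    target = set(counts)
    duplicated = {key for key, n in counts.items() if n > 1}
    return source - target, target - source, duplicated


def scale_factor(baseline, current, factors=(100, 1000), rel_tol=0.1):
    """Return the factor (or its inverse) by which the current median is off, else None.

    The factors and tolerance are assumptions; a ratio near 100 suggests cents were
    loaded where dollars were expected, a ratio near 0.01 suggests the reverse.
    """
    ratio = median(current) / median(baseline)
    for factor in factors:
        for candidate in (factor, 1 / factor):
            if abs(ratio - candidate) <= rel_tol * candidate:
                return candidate
    return None


source = ["a1", "a2", "a3"]
loaded = ["a1", "a2", "a2"]  # a3 silently dropped, a2 duplicated
print(reconcile_keys(source, loaded))  # ({'a3'}, set(), {'a2'})
print(scale_factor([12.5, 8.0, 10.0, 9.5], [1250.0, 800.0, 1000.0, 950.0]))  # 100
```

Medians are used rather than means so that a handful of genuine outliers does not masquerade as a unit change.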
Often confused with
- Schema drift (drift changes the structure of the data; corruption affects the correctness of values within an unchanged structure)
- Outliers (corruption can resemble an outlier pattern, but it is systematic rather than isolated)
- Data quality in general (corruption is the "passed checks but wrong" subset)
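The outlier distinction can be illustrated with a small Python sketch, assuming a single numeric column and a trusted history of earlier values. Within-batch outlier detection (here the median-absolute-deviation modified z-score with the commonly used 3.5 cutoff) flags an isolated bad value but finds nothing when every row has shifted together. Only a batch-level comparison against history catches the systematic case. The function names and the 2x threshold are hypothetical.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return values whose modified z-score (based on median absolute deviation) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # at least half the values equal the median; no spread to score against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def systematic_shift(history, batch, max_ratio=2.0):
    """Flag a batch whose median moved more than max_ratio-fold (up or down) from history."""
    ratio = median(batch) / median(history)
    return ratio > max_ratio or ratio < 1 / max_ratio


history = [10.0, 9.5, 10.5, 10.2, 9.8]
one_bad_row = [10.0, 9.7, 10.3, 55.0, 9.9]  # a single isolated outlier
all_rows_scaled = [1000.0, 970.0, 1030.0, 1020.0, 990.0]  # every row multiplied by 100

print(robust_outliers(one_bad_row), systematic_shift(history, one_bad_row))  # [55.0] False
print(robust_outliers(all_rows_scaled), systematic_shift(history, all_rows_scaled))  # [] True
```

The scaled batch is internally consistent, so a check that only looks inside the batch sees nothing wrong; that is the "systematic" property that separates corruption from ordinary outliers.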
Where it shows up in Analytical Reliability
- Data Movement Reliability: transformations introduce subtle logic errors
- Semantic Reliability: measures compute correctly over wrong data
- Change Reliability: code/config changes introduce corruption while tests pass
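For Change Reliability, one common safeguard is to run the old and new versions of a transformation on the same input and diff the outputs, since a schema-level test passes for both. The Python sketch below assumes a hypothetical revenue-by-region job with a store-to-region mapping table. In the "new" version the mapping contains a duplicated store row, so the join fans out and inflates one region without raising any error.

```python
import math
from collections import defaultdict


def revenue_by_region(orders, store_to_region):
    """Naive nested-loop join of (store, amount) orders to (store, region) rows, summed by region."""
    totals = defaultdict(float)
    for store, amount in orders:
        for mapped_store, region in store_to_region:
            if mapped_store == store:
                totals[region] += amount  # a duplicated mapping row adds the amount twice
    return dict(totals)


def diff_outputs(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ between versions."""
    return {
        key: (old.get(key, 0.0), new.get(key, 0.0))
        for key in sorted(set(old) | set(new))
        if not math.isclose(old.get(key, 0.0), new.get(key, 0.0), rel_tol=1e-9, abs_tol=1e-9)
    }


orders = [("s1", 100.0), ("s2", 50.0), ("s3", 25.0)]
mapping_v1 = [("s1", "north"), ("s2", "south"), ("s3", "south")]
mapping_v2 = mapping_v1 + [("s2", "south")]  # hypothetical config change re-adds s2

old = revenue_by_region(orders, mapping_v1)
new = revenue_by_region(orders, mapping_v2)
print(diff_outputs(old, new))  # {'south': (75.0, 125.0)}
```

Both outputs have the same keys and value types, so a structural test would pass for either version; only the value-level diff surfaces the regression.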