Schema Drift
Schema drift is any unplanned change to the structure of data that alters the behavior of downstream systems. In analytical systems it often propagates silently: pipelines keep reporting success while the resulting semantic failures surface much later.
Common failure signals
- New, removed, or renamed columns that are not reflected downstream
- Type changes (string <-> number, timestamp precision changes)
- Nested structure changes (JSON shape drift)
- Unexpected nulls introduced by upstream changes
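The signals above can be checked mechanically by comparing an expected structure with an observed one. The following is a minimal sketch, not a production validator: it assumes records arrive as plain Python dicts, and the helper names `infer_schema` and `diff_schemas`, along with the sample records, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from typing import Any


def infer_schema(record: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Flatten a record into {dotted.path: type name}, descending into nested dicts."""
    schema: dict[str, str] = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested structure: record each leaf under its dotted path.
            schema.update(infer_schema(value, prefix=f"{path}."))
        else:
            schema[path] = type(value).__name__
    return schema


def diff_schemas(expected: dict[str, str], observed: dict[str, str]) -> dict[str, list[str]]:
    """Report paths that were added, removed, or changed type."""
    shared = expected.keys() & observed.keys()
    return {
        "added": sorted(observed.keys() - expected.keys()),
        "removed": sorted(expected.keys() - observed.keys()),
        "retyped": sorted(p for p in shared if expected[p] != observed[p]),
    }


# Hypothetical sample data: a baseline record and a drifted one.
baseline = {"order_id": 1, "amount": 19.99, "customer": {"id": 7, "region": "EU"}}
drifted = {
    "order_id": "1",                                  # type change: int -> str
    "total": 19.99,                                   # "amount" renamed to "total"
    "customer": {"id": 7, "geo": {"region": "EU"}},   # nested shape change
    "coupon": None,                                   # new, null-valued field
}

report = diff_schemas(infer_schema(baseline), infer_schema(drifted))
print(report)
```

In this sketch a rename appears as a removed path paired with an added one (`amount` and `total`), and a null arriving in an existing column would appear as a type change to `NoneType`. Inferring a schema from a single record is a simplification; a real check would compare against a declared schema or a sample of records.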
Often confused with
- Breaking changes that fail a job immediately (schema drift often does not)
- Data quality issues (schema drift is structural; quality is value correctness)
- Contract violations (schema drift may cause them, but not always)
Where it shows up in Analytical Reliability
- Data Movement Reliability: ingestion/transform steps keep running while structure changes
- Semantic Reliability: relationships and measures end up referencing structures that differ from the ones they were defined against
- Change Reliability: upstream changes considered "safe" introduce drift when downstream consumers are not updated to match
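To illustrate the Data Movement Reliability case, here is a small sketch of how a transform step can keep running after an upstream rename while quietly producing a wrong result. The function names, column names, and rows are all hypothetical, and the tolerant `dict.get` access pattern is just one common way this happens.

```python
def total_revenue(rows):
    """Tolerant transform: a missing "amount" key silently counts as 0."""
    return sum(row.get("amount", 0) for row in rows)


def total_revenue_checked(rows, required=("amount",)):
    """Same transform, but it fails loudly when an expected column is absent."""
    for i, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            raise KeyError(f"row {i} is missing expected columns: {missing}")
    return sum(row["amount"] for row in rows)


# Hypothetical data before and after an upstream rename of "amount" to "total".
before = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 5.0}]
after = [{"order_id": 1, "total": 10.0}, {"order_id": 2, "total": 5.0}]

print(total_revenue(before))  # correct total
print(total_revenue(after))   # job "succeeds", but revenue is reported as 0

try:
    total_revenue_checked(after)
except KeyError as exc:
    print(f"drift detected: {exc}")
```

The tolerant version never raises, so the job reports success while the measure is wrong. The checked version turns the same drift into an immediate failure, which is usually easier to diagnose than a silent semantic error discovered later.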