WhyDidItFail Base concepts

Reconciliation Failure

A reconciliation failure occurs when two systems that should agree do not. It signals that data was lost, duplicated, or transformed incorrectly between stages, even when every pipeline reports success.

Common failure signals

  • Row counts or totals differ between source and destination
  • Aggregates reconcile at a high level but diverge by segment (region, product, customer)
  • Late-arriving data causes "eventual" reconciliation that never fully converges
  • Backfills correct one window but introduce drift in another
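
The first two signals can be checked together. The following is a minimal sketch, assuming both systems can be read as lists of dict rows; the function names, the `region` and `amount` fields, and the sample data are illustrative and not part of any particular tool. It shows a case where row counts and grand totals agree while individual segments diverge.

```python
from collections import defaultdict


def totals_by_segment(rows, segment_key, value_key):
    """Sum value_key per distinct segment_key value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[segment_key]] += row[value_key]
    return dict(totals)


def reconcile(source_rows, dest_rows, segment_key, value_key, tolerance=0.0):
    """Compare row counts, grand totals, and per-segment totals."""
    src = totals_by_segment(source_rows, segment_key, value_key)
    dst = totals_by_segment(dest_rows, segment_key, value_key)
    segment_deltas = {}
    # Use the union of segments so a segment missing on one side is reported.
    for segment in sorted(set(src) | set(dst)):
        delta = dst.get(segment, 0.0) - src.get(segment, 0.0)
        if abs(delta) > tolerance:
            segment_deltas[segment] = delta
    return {
        "row_count_delta": len(dest_rows) - len(source_rows),
        "grand_total_delta": sum(dst.values()) - sum(src.values()),
        "segment_deltas": segment_deltas,
    }


# Hypothetical sample data: one EU row landed in the US segment.
source = [
    {"region": "EU", "amount": 100},
    {"region": "EU", "amount": 50},
    {"region": "US", "amount": 200},
]
dest = [
    {"region": "EU", "amount": 100},
    {"region": "US", "amount": 50},
    {"region": "US", "amount": 200},
]

report = reconcile(source, dest, "region", "amount")
print(report)
```

Here the row count and grand total deltas are both zero, yet `segment_deltas` reports EU short by 50 and US over by 50. A check that stops at the top-level total would pass this data.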

Often confused with

  • Expected timing differences (freshness lag): a reconciliation failure persists beyond the normal lag
  • Metric definition drift: reconciliation compares like-for-like; drift changes "what is being measured"
  • Sampling artifacts: reconciliation checks should apply the same logic and scope on both sides
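
The distinction from freshness lag can be made mechanical by tracking how long a mismatch has persisted. This is a rough sketch under the assumption that the team has an agreed expected lag for the pipeline; `classify_mismatch`, the label strings, and the two-hour window are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def classify_mismatch(first_seen, now, expected_lag):
    """Label a mismatch by its age relative to the expected freshness lag."""
    age = now - first_seen
    if age <= expected_lag:
        return "within_lag"  # may still converge on its own
    return "reconciliation_failure"  # has outlived normal lag


# Assumed value: this pipeline normally catches up within two hours.
expected_lag = timedelta(hours=2)
first_seen = datetime(2024, 1, 1, 8, 0)

early = classify_mismatch(first_seen, datetime(2024, 1, 1, 9, 0), expected_lag)
late = classify_mismatch(first_seen, datetime(2024, 1, 1, 12, 0), expected_lag)
print(early, late)
```

A mismatch observed one hour after it first appeared is still inside the window; the same mismatch four hours later is treated as a failure rather than lag.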

Where it shows up in Analytical Reliability

  • Data Movement Reliability: gaps between stages (ingest -> transform -> serve) reveal silent loss/duplication
  • Semantic Reliability: models compute correctly over data that no longer matches upstream truth
  • Change Reliability: "safe" refactors or mapping edits break parity between systems
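
For the Change Reliability case, one way to test whether a refactor is actually "safe" is to run the old and new logic over the same input and diff the outputs by key. The sketch below assumes each row carries a unique key; `parity_diff`, the `id` and `country` fields, and the sample rows are hypothetical.

```python
def parity_diff(old_rows, new_rows, key):
    """Diff two outputs by key; assumes key is unique within each side."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    shared = old.keys() & new.keys()
    return {
        "missing": sorted(old.keys() - new.keys()),      # dropped by the change
        "unexpected": sorted(new.keys() - old.keys()),   # introduced by the change
        "changed": sorted(k for k in shared if old[k] != new[k]),
    }


# Hypothetical outputs before and after a mapping edit.
before = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "FR"},
    {"id": 3, "country": "US"},
]
after = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "EU"},  # mapping edit changed this value
    {"id": 4, "country": "US"},  # id 3 is gone, id 4 is new
]

diff = parity_diff(before, after, "id")
print(diff)
```

An empty result in all three lists indicates parity on the sampled input. Because rows are indexed by key, duplicated keys would be collapsed, so duplication needs a separate row-count check.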

Related concepts