Most multi-language projects settle for behavioral equivalence: run the same test against every implementation, check that the outputs match, ship. This works, but it has a quiet weakness. Behavioral equivalence is defined by the test author, and the test author can be wrong. Verdicts can match when implementations share the same bug. And the moment you change one implementation, you have to remember to update the others, with no mechanical way to verify alignment.
We wanted something stronger: a verifiable structural witness, not just a behavioral test.
The Canonical Hash
The insight that unlocked the design: if two implementations produce the same canonical gate-vector bytes for the same fixture, they have performed the same structural evaluation for that declared domain — regardless of language.
We defined a canonical output format called the gate vector:
gate_id:bool|gate_id:bool|...
Sorted lexicographically by gate ID, lowercase boolean, joined by |. A fully-passing WCGD admission produces:
F01:true|F02:true|F03:true|F04:true|F05:true|F06:true|F07:true|F08:true|F09:true|F10:true|F11:true|F12:true|FAP:true
Then we hash it: SHA-256 of the UTF-8 encoding of that string, truncated to the first 16 hex characters (64 bits). That's the output hash.
The output hash for a fully-passing WCGD fixture is 50652a5823a2c420. In Python, in Swift, in Rust. Same bytes. Getting there was not simple.
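The serialization and hashing rules above are small enough to sketch in a few lines. The following is a minimal Python illustration, not the project's actual code: the function names `gate_vector` and `output_hash` are hypothetical, and the sort relies on gate IDs being plain ASCII so that Python's string ordering matches byte-wise lexicographic order.

```python
import hashlib


def gate_vector(results):
    """Serialize {gate_id: bool} as 'gate_id:bool|...' sorted by gate ID."""
    parts = [
        f"{gate_id}:{'true' if passed else 'false'}"  # lowercase booleans
        for gate_id, passed in sorted(results.items())
    ]
    return "|".join(parts)


def output_hash(vector):
    """First 16 hex characters of SHA-256 over the UTF-8 bytes."""
    return hashlib.sha256(vector.encode("utf-8")).hexdigest()[:16]


# A fully passing WCGD admission: gates F01..F12 plus FAP.
wcgd_admitted = {f"F{n:02d}": True for n in range(1, 13)}
wcgd_admitted["FAP"] = True

vector = gate_vector(wcgd_admitted)
digest = output_hash(vector)
print(vector)
print(digest)
```

The printed vector should be byte-for-byte the string shown earlier, and the digest can then be compared directly against the published output hash.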
What the Hash Actually Witnesses
A matching hash is not magic. It is a compact witness that several structural decisions aligned at once:
Gate identity: Both implementations resolved the same gate IDs. A different ID produces a different vector string.
Field mapping: Both implementations read the same fixture field for each gate. A wrong field mapping produces a different boolean, different string, different hash.
Sort order: Both sorted gates lexicographically. A different order produces a different string.
Boolean representation: Both used lowercase true/false. Python's str(True) produces "True" — capital T. That was one of our early bugs: Python and Swift diverged because of one character. Fixing it required adding .lower() to Python serialization and codifying the requirement as CR-LAW-04 — canonical bool representation is lowercase.
Once all four levels align, the hash is forced to be identical. The definition leaves nothing to interpretation. The bytes either match or they don't.
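The boolean-case bug is easy to reproduce in isolation. This sketch assumes nothing beyond Python's standard library; `short_hash` is a hypothetical helper that applies the same truncation rule as the output hash.

```python
import hashlib


def short_hash(text):
    """First 16 hex characters of SHA-256 over the UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


naive = f"F01:{str(True)}"              # str(True) yields "True"
canonical = f"F01:{str(True).lower()}"  # lowercased to "true"

# One differing character is enough to change the witness.
diverged = short_hash(naive) != short_hash(canonical)
print(naive, canonical, diverged)
```

A single capital letter in one gate entry changes the whole digest, which is why the representation has to be pinned down as a rule rather than left to each language's default.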
The Registry Is the Law
Every gate has a canonical morphism ID, defined in a single JSON file:
{
  "morphism_id": "morph.wcgd.f07.numeric_tolerance",
  "gate_id": "F07",
  "fixture_field": "numeric_tolerance",
  "morphism_type": "DECLARE"
}
No implementation is allowed to define gate identities locally. If Swift uses makai.gate.f07 instead of morph.wcgd.f07.numeric_tolerance, that is a compliance violation — not a naming preference — and it surfaces immediately as a hash mismatch. This happened exactly once during development. The registry caught the bug automatically.
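A registry lookup keyed on canonical morphism IDs can be sketched as follows. This is an illustration under assumptions: the file is taken to be a top-level JSON array of entries shaped like the one above, and `load_registry` is a hypothetical name, not the project's loader.

```python
import json

# Assumed layout: a JSON array of entries like the one shown above.
REGISTRY_JSON = """
[
  {
    "morphism_id": "morph.wcgd.f07.numeric_tolerance",
    "gate_id": "F07",
    "fixture_field": "numeric_tolerance",
    "morphism_type": "DECLARE"
  }
]
"""


def load_registry(text):
    """Return a dict keyed on morphism_id, rejecting duplicate IDs."""
    registry = {}
    for entry in json.loads(text):
        morphism_id = entry["morphism_id"]
        if morphism_id in registry:
            raise ValueError(f"duplicate morphism ID: {morphism_id}")
        registry[morphism_id] = entry
    return registry


registry = load_registry(REGISTRY_JSON)
is_canonical = "morph.wcgd.f07.numeric_tolerance" in registry
is_local_alias = "makai.gate.f07" in registry
print(is_canonical, is_local_alias)
```

Because the only way to resolve a gate is through the canonical key, a locally invented ID such as makai.gate.f07 simply fails to resolve.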
The three languages bind to the registry in three different ways:
Python loads it at runtime as a dict keyed on canonical morphism IDs.
Swift encodes it as a static array in PathBEvaluator.expectedGates. In the current implementation, convergence tests enforce that it stays synchronized with the canonical JSON.
Rust takes the most structurally robust approach: the registry is compiled into const arrays at build time via build.rs:
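// build.rs: runs during cargo build
use std::{env, path::PathBuf};

fn main() {
    // Assumption: manifest_dir comes from the CARGO_MANIFEST_DIR variable
    // that Cargo sets for build scripts.
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").expect("set by cargo");
    let registry_path = PathBuf::from(&manifest_dir)
        .join("../../compiler_registry/morphism_registry.json");
    // Re-run this script whenever the registry file changes.
    println!("cargo:rerun-if-changed={}", registry_path.display());
    // reads JSON, generates:
    // pub const WCGD_MORPHISMS: &[(&str, &str, &str, &str)] = &[
    //     ("morph.wcgd.f01.spec_identity", "F01", "spec_id_stable", "VERIFY"),
    //     ...
    // ];
}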
If morphism_registry.json changes and the field names no longer match the fixture structs, cargo build fails — before any test runs, before any binary ships. You cannot accidentally ship a Rust binary that disagrees with the registry.
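Where the binding is a hand-maintained static table, as described for Swift above, alignment has to be enforced by a test rather than by the build. A minimal sketch of such a synchronization check, written in Python for brevity: `STATIC_GATES`, `registry_rows`, and `drift` are hypothetical names, and the registry is again assumed to be a JSON array of entries.

```python
import json

# Hypothetical static table standing in for a hand-maintained list.
STATIC_GATES = [
    ("morph.wcgd.f07.numeric_tolerance", "F07", "numeric_tolerance", "DECLARE"),
]

REGISTRY_JSON = """
[
  {
    "morphism_id": "morph.wcgd.f07.numeric_tolerance",
    "gate_id": "F07",
    "fixture_field": "numeric_tolerance",
    "morphism_type": "DECLARE"
  }
]
"""


def registry_rows(text):
    """Flatten registry entries into comparable 4-tuples."""
    return {
        (e["morphism_id"], e["gate_id"], e["fixture_field"], e["morphism_type"])
        for e in json.loads(text)
    }


def drift(static_rows, text):
    """Report rows missing from, or extra in, the static table."""
    canonical = registry_rows(text)
    static = set(static_rows)
    return {
        "missing": sorted(canonical - static),
        "extra": sorted(static - canonical),
    }


report = drift(STATIC_GATES, REGISTRY_JSON)
in_sync = not report["missing"] and not report["extra"]
print(in_sync)
```

A check like this only runs when the tests run, which is the practical difference from the build-time approach: drift is caught later in the pipeline.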
The Convergence Matrix
We track cross-language convergence in a JSON file. Each entry records whether two implementations produce identical output hashes across all fixture classes. The snippet below shows three of the five fixture classes:
{
  "path_a_lang": "python",
  "path_b_lang": "rust",
  "status": "ADMITTED",
  "fixture_results": {
    "wcgd_admitted": { "convergence": true, "hash_a": "50652a5823a2c420", "hash_b": "50652a5823a2c420" },
    "wcgd_refused": { "convergence": true, "hash_a": "4cd3fd807d0169bb", "hash_b": "4cd3fd807d0169bb" },
    "sc_admitted": { "convergence": true, "hash_a": "3e4aa232097f1392", "hash_b": "3e4aa232097f1392" }
  }
}
The matrix covers all six language pairings, three self-pairs and three cross-language pairs (Python×Python, Swift×Swift, Rust×Rust, Python×Swift, Python×Rust, Swift×Rust), and all five fixture classes (WCGD admitted, refused, gap, and Sensor Commissioning admitted and refused). Six pairings times five fixture classes gives thirty independently checkable convergence entries.
All thirty: ADMITTED.
The matrix is not a test suite. It is a structural record. Each entry is independently verifiable: anyone with the three CLI binaries can run the same fixture and compare hashes. The hashes are the canonical output of the system, defined precisely enough that there is no room for disagreement.
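The arithmetic and the shape of a matrix entry can be sketched like this. It is an illustration under assumptions: the fixture keys `wcgd_gap` and `sc_refused` are guessed from the naming pattern of the three shown above, the non-admitted status string `REFUSED` is hypothetical, and the hashes are placeholders, not real outputs.

```python
from itertools import combinations_with_replacement

LANGS = ["python", "swift", "rust"]
# "wcgd_gap" and "sc_refused" are assumed names following the visible pattern.
FIXTURES = ["wcgd_admitted", "wcgd_refused", "wcgd_gap", "sc_admitted", "sc_refused"]


def build_entry(lang_a, lang_b, hashes):
    """Compare two implementations across every fixture class."""
    results = {}
    for fixture in FIXTURES:
        hash_a = hashes[(lang_a, fixture)]
        hash_b = hashes[(lang_b, fixture)]
        results[fixture] = {
            "convergence": hash_a == hash_b,
            "hash_a": hash_a,
            "hash_b": hash_b,
        }
    converged = all(r["convergence"] for r in results.values())
    return {
        "path_a_lang": lang_a,
        "path_b_lang": lang_b,
        # "REFUSED" is a hypothetical label for a non-converging pair.
        "status": "ADMITTED" if converged else "REFUSED",
        "fixture_results": results,
    }


# Three languages with self-pairs allowed yields six pairings.
pairs = list(combinations_with_replacement(LANGS, 2))

# Placeholder hashes: every language reports the same value per fixture.
hashes = {(lang, fx): f"placeholder-{fx}" for lang in LANGS for fx in FIXTURES}

matrix = [build_entry(a, b, hashes) for a, b in pairs]
total_checks = sum(len(e["fixture_results"]) for e in matrix)
print(len(pairs), total_checks)
```

In a real run the placeholder table would be filled by invoking each CLI binary on each fixture and recording the hash it prints.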
MW-13: The Compiler Never Throws
One law applies to all three implementations equally: MW-13 — MakaiCompile never throws to its caller. Every evaluation always produces a complete verdict, even when something goes wrong. An unregistered morphism ID doesn't crash — it emits a GateResult with passed: false and refusal_code: "MORPHISM_ID_UNREGISTERED". The caller always gets a verdict. This matters for production systems where you want compile results to be inspectable even when the registry is partially invalid.
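The never-throws rule amounts to turning every failure path into a result value. A minimal sketch, assuming a `GateResult` with the two fields named above: the `UNKNOWN` gate ID, the `EVALUATION_ERROR` code, and the truthiness check used as the gate predicate are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateResult:
    gate_id: str
    passed: bool
    refusal_code: Optional[str] = None


# Tiny in-memory registry keyed on canonical morphism IDs.
REGISTRY = {
    "morph.wcgd.f07.numeric_tolerance": {
        "gate_id": "F07",
        "fixture_field": "numeric_tolerance",
    },
}


def evaluate(morphism_id, fixture):
    """Always return a GateResult; never raise to the caller."""
    entry = REGISTRY.get(morphism_id)
    if entry is None:
        # "UNKNOWN" is a hypothetical placeholder gate ID.
        return GateResult("UNKNOWN", False, "MORPHISM_ID_UNREGISTERED")
    try:
        # Assumed predicate: the gate passes when the field is truthy.
        value = fixture[entry["fixture_field"]]
        return GateResult(entry["gate_id"], bool(value))
    except Exception:
        # Hypothetical catch-all refusal code for unexpected failures.
        return GateResult(entry["gate_id"], False, "EVALUATION_ERROR")


unregistered = evaluate("makai.gate.f07", {})
ok = evaluate("morph.wcgd.f07.numeric_tolerance", {"numeric_tolerance": True})
broken = evaluate("morph.wcgd.f07.numeric_tolerance", None)
print(unregistered, ok, broken)
```

The caller can then inspect `refusal_code` on any failed gate instead of wrapping the whole compile in its own error handling.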
Behavioral Parity vs. Structural Admission
Behavioral parity — the weaker claim — means two implementations produce the same verdict on the same inputs. It can be achieved by coincidence, by sharing underlying code, or by writing tests that happen to cover the cases where implementations agree. It breaks silently when one is updated.
Language-independent structural admission — the stronger claim — means each implementation independently evaluates a shared morphism registry, and the results are witnessable as identical because the canonical form is defined to be unambiguous. If the hashes match across all fixture classes, the implementations read the same morphisms, mapped to the same gates, applied the same fixture fields, and serialized in the same canonical order.
The system has moved from "these things behave the same" to "these things are structurally the same computation, verified independently by three language runtimes and three SHA-256 implementations."
That is the milestone.
What's Next
Live sensor data. Sensor Commissioning is currently running against synthetic fixtures. The next phase connects it to real sensor readings, where gate values come from actual commissioning events.
Postgres as the live registry source. Right now the JSON is authoritative and Postgres mirrors it. Eventually the direction reverses: the database is the source of truth, the JSON is generated from it. The compiler registry becomes live, auditable, database-backed.
More domains. WCGD Phase F and Sensor Commissioning are the first two structural domains. The morphism registry is designed to be extended — new domains add entries, never mutate existing ones. The convergence infrastructure is ready for them.
MakaiCompile is part of the MakaiWay structural information system.