Byzantine-safe math benchmarks with cryptographic proof. Run the Colab notebook, then share or upload results here.
Hash-chained evidence · Patent Pending 63/896,282
Grade-school math (GSM8K) with a 3-model consensus protocol: if ≥2 models agree → release answer; else constitutional halt. Every problem is committed in a SHA-256 hash chain; final hash commits the full run.
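The protocol above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual implementation: the function names, the record layout, the all-zero genesis value, and the use of `None` as the halt marker are all assumptions made for the example.

```python
import hashlib
import json
from collections import Counter

HALT = None  # hypothetical marker for a "constitutional halt" (no answer released)

def consensus(answers, quorum=2):
    """Return the answer that at least `quorum` models agree on, else HALT."""
    if not answers:
        return HALT
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes >= quorum else HALT

def chain_hash(prev_hash, record):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def run_chain(problems, genesis="0" * 64):
    """Commit each problem's outcome in order; return (entries, final_hash)."""
    entries, prev = [], genesis
    for problem_id, answers in problems:
        record = {"id": problem_id, "answers": answers, "released": consensus(answers)}
        prev = chain_hash(prev, record)  # each hash depends on all earlier records
        entries.append({**record, "hash": prev})
    return entries, prev

def verify_chain(entries, final_hash, genesis="0" * 64):
    """Recompute every link; any edited record breaks the chain from that point on."""
    prev = genesis
    for entry in entries:
        record = {k: v for k, v in entry.items() if k != "hash"}
        prev = chain_hash(prev, record)
        if prev != entry["hash"]:
            return False
    return prev == final_hash

# Example: three model answers per problem (answers kept as strings).
entries, final_hash = run_chain([
    ("q1", ["18", "18", "20"]),  # two models agree, so "18" is released
    ("q2", ["3", "4", "5"]),     # no two models agree, so the result is HALT
])
```

Because each hash folds in the previous one, publishing only the final hash is enough to detect a later change to any single record, provided the verifier recomputes the chain from the stored entries.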
notebooks/gsm8k_byzantine_demo.ipynb: run in Google Colab with API keys in Secrets. GSM8K_SAMPLES = 1319 in Config. After running the Colab notebook, save the generated gsm8k_byzantine_run_*.json and optionally upload it to Drive or commit it to docs/results/. Headline metrics to paste here when you have a run:
Progress Prize 3 · Byzantine consensus submission (v4/v5) with code-verified weighting and constitutional halt.
Strategy and notebooks: AIMO3/ — see AIMO3_SCORE_47_STRATEGY.md and aevion-aimo3-submission-v5-47.ipynb.