Why it matters
The leaderboard has no flag from Southeast Asia.
Korea has A.X 4.0 on KMMLU. India has Sarvam-M on MILU. Japan has Swallow on its own leaderboard. Southeast Asia — 678 million people, an AI market projected to reach $80 billion by 2031 — has no entry on any frontier benchmark. Every inference in the region runs on a model trained somewhere else.
Malaysia has committed RM2 billion to sovereign AI infrastructure. SAINS was mandated to deliver DeepSAR, a Bidayuh and Iban language model. No corpus was assembled. No model shipped. The gap is structural, not technical — the only path to this language is through the community that speaks it.
Architecture
How it’s built.
LAMBA-a1 Pro is the frontier tier. LAMBA-a1 Flash is a 27B model shipped as open weights for sovereign, on-premise deployment. Both are built on the same three foundations.
A sparse mixture-of-experts design gives the inference cost of a small model with the capacity of a large one: only three billion of the thirty-five billion parameters are active for any given token. Those economics are what make a sovereign deployment affordable to run, not just to license. Flash takes this further: a 27B open-weight model deployable on local infrastructure, with no dependence on a foreign API.
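To illustrate why only a fraction of the parameters runs per token, here is a minimal sketch of top-k routing in a sparse mixture-of-experts layer, written in plain Python. The expert count (64), k (4) and the scalar "experts" are hypothetical values chosen for illustration. They are not Qwen3.6-35B-A3B's actual configuration. The router scores every expert, but only the k highest-scoring experts do any computation. In typical MoE models the active-parameter figure (3B of 35B here) also includes shared, non-expert weights such as attention, so it is not simply k divided by the expert count.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    """Hypothetical toy sparse MoE layer over scalar inputs.

    Each 'expert' is just y = w * x and the router is one weight per expert.
    Real layers use matrices, but the routing idea is the same:
    score all experts, run only the top-k.
    """

    def __init__(self, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.expert_weights = [rng.uniform(-1, 1) for _ in range(num_experts)]
        self.router_weights = [rng.uniform(-1, 1) for _ in range(num_experts)]
        self.experts_run = 0  # counts expert forward passes actually executed

    def forward(self, x):
        # Router: one cheap score per expert for this token.
        logits = [r * x for r in self.router_weights]
        # Keep only the k highest-scoring experts.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        # Renormalise gate weights over the selected experts only.
        gates = softmax([logits[i] for i in top])
        out = 0.0
        for gate, i in zip(gates, top):
            self.experts_run += 1  # unselected experts never run
            out += gate * self.expert_weights[i] * x
        return out


layer = ToyMoELayer(num_experts=64, top_k=4)
tokens = [0.5, -1.2, 2.0, 0.1]
outputs = [layer.forward(t) for t in tokens]
print(layer.experts_run)  # 16: 4 tokens x 4 experts, not 4 x 64 = 256

# Active-parameter share quoted in the text: 3B of 35B per token.
print(f"{3 / 35:.1%}")  # 8.6%
```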
Double Helix training: two strands that spiral upward. Supervised fine-tuning widens the distribution, surfacing latent capability from trillions of tokens of pretraining; reinforcement learning narrows it toward what survives execution, such as code that runs and maths that verifies. Neither alone is sufficient. On each turn, the model's own verified reasoning is folded back into the next.
After OpenVLThinker (arXiv:2503.17352) and SASR (ICLR 2026).
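The alternating loop above can be sketched as a small orchestration function. This sketch is hypothetical throughout. `sample`, `verify`, `sft_step` and `rl_step` stand in for a sampler, verifier and trainer that the source does not describe, and the toy stubs below only record their calls. The sketch shows only the shape of one plausible cycle. Each turn runs SFT on the current pool, then an RL step against a verifiable reward. It then folds back only the self-generated outputs that pass verification, and repeats this for five turns.

```python
from typing import Callable, Iterable, List, Set, Tuple

Example = Tuple[str, str]  # (prompt, response)


def double_helix(
    prompts: List[str],
    seed_data: Set[Example],
    sample: Callable[[str], Iterable[str]],
    verify: Callable[[str, str], bool],
    sft_step: Callable[[Set[Example]], None],
    rl_step: Callable[[List[str], Callable[[str, str], bool]], None],
    turns: int = 5,
) -> Set[Example]:
    """Alternate SFT and RL, folding verified self-generated outputs back in.

    Hypothetical sketch: every callable is a stand-in, not LAMBA's trainer.
    """
    pool = set(seed_data)
    for _ in range(turns):
        sft_step(pool)            # strand 1 (SFT): in a real run, widens the distribution
        rl_step(prompts, verify)  # strand 2 (RL): in a real run, narrows it via verifiable reward
        # Fold back: only responses that pass the verifier join the next turn's pool.
        for p in prompts:
            for r in sample(p):
                if verify(p, r):
                    pool.add((p, r))
    return pool


# --- toy stand-ins so the loop runs end to end ---
def verify_sum(prompt: str, response: str) -> bool:
    a, b = prompt.split("+")
    return response.strip() == str(int(a) + int(b))


def noisy_sampler(prompt: str) -> List[str]:
    a, b = prompt.split("+")
    right = int(a) + int(b)
    return [str(right), str(right + 1)]  # one correct and one wrong candidate


sft_calls: List[int] = []  # pool size seen by each SFT step
rl_calls: List[int] = []   # number of prompts seen by each RL step
final_pool = double_helix(
    prompts=["2+3", "10+7"],
    seed_data={("1+1", "2")},
    sample=noisy_sampler,
    verify=verify_sum,
    sft_step=lambda pool: sft_calls.append(len(pool)),
    rl_step=lambda ps, v: rl_calls.append(len(ps)),
    turns=5,
)
print(sorted(final_pool))  # [('1+1', '2'), ('10+7', '17'), ('2+3', '5')]
print(sft_calls)           # [1, 3, 3, 3, 3]
```

The wrong candidates never enter the pool, and because the pool is a set, re-verified duplicates do not inflate it on later turns.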
For a low-resource language shaped by a small corpus, sycophancy is not a quality problem — it is language corruption. A model that agrees in order to please will invent grammar and bend vocabulary. LAMBA is trained to challenge a false claim in Bidayuh Bau rather than echo it. The language is a responsibility, not a feature.
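The source does not say how the model is trained to push back. One common approach is preference data: pairs in which a correction is preferred over agreement. Methods such as DPO consume pairs of this kind. The sketch below is a hypothetical illustration. The class names, the prompt templates and the filtering rule are all assumptions, and the angle-bracket strings are placeholders for real Bidayuh Bau content written by a native speaker.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FalseClaim:
    claim: str       # a user assertion a fluent speaker knows to be wrong
    correction: str  # the accurate statement, written by a native speaker


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str    # response that challenges the false claim
    rejected: str  # sycophantic response that echoes it


def build_pairs(items: List[FalseClaim]) -> List[PreferencePair]:
    """Turn verified false claims into (chosen, rejected) preference pairs.

    Hypothetical sketch: templates and filter are illustrative assumptions.
    """
    pairs: List[PreferencePair] = []
    for item in items:
        # Skip items whose correction merely restates the claim: they would
        # teach the model nothing about pushing back.
        if item.claim.strip().lower() in item.correction.strip().lower():
            continue
        prompt = f"{item.claim} Is that correct?"
        chosen = f"No, that is not correct. {item.correction}"
        rejected = f"Yes, that is correct. {item.claim}"
        pairs.append(PreferencePair(prompt, chosen, rejected))
    return pairs


items = [
    FalseClaim(
        claim="<false claim about a Bau verb form>",
        correction="<native speaker's correction of that verb form>",
    ),
    FalseClaim(claim="<claim X>", correction="<claim X>"),  # degenerate, filtered out
]
pairs = build_pairs(items)
print(len(pairs))  # 1
```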
Corpus
What it learns from.
More than forty-one datasets, distilled to roughly 400–600K curated examples after deduplication and quality filtering, across thirteen categories including reasoning, coding, mathematics, science, Malay, multilingual instruction, tool use, cybersecurity and long-context tasks.
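A minimal sketch of the deduplication and quality-filtering step follows. It assumes each example is a dict with `category`, `prompt` and `response` fields. Those field names, the case-and-whitespace normalisation and the 20-character minimum are all illustrative assumptions, since the source does not specify its filters. It catches exact duplicates after normalisation only. Production pipelines usually layer near-duplicate detection, such as MinHash, on top of exact hashing.

```python
import hashlib
import re
import unicodedata
from collections import Counter
from typing import Dict, Iterable, List

Example = Dict[str, str]  # hypothetical schema: category, prompt, response


def normalise(text: str) -> str:
    """Unicode-normalise (NFKC), lowercase and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(ex: Example) -> str:
    """Exact-duplicate key over the normalised prompt and response."""
    key = normalise(ex["prompt"]) + "\x00" + normalise(ex["response"])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def passes_quality(ex: Example, min_chars: int = 20) -> bool:
    """Hypothetical quality gate: reject very short responses."""
    return len(ex["response"].strip()) >= min_chars


def curate(datasets: Iterable[List[Example]]) -> List[Example]:
    """Merge datasets, keeping the first copy of each example that passes the gate."""
    seen = set()
    kept: List[Example] = []
    for rows in datasets:
        for ex in rows:
            fp = fingerprint(ex)
            if fp in seen or not passes_quality(ex):
                continue
            seen.add(fp)
            kept.append(ex)
    return kept


ds_a = [
    {"category": "maths", "prompt": "What is 12 * 12?",
     "response": "12 * 12 = 144, since 12 * 10 + 12 * 2 = 144."},
    {"category": "coding", "prompt": "Reverse a list in Python.",
     "response": "ok"},  # too short: fails the quality gate
]
ds_b = [
    {"category": "maths", "prompt": "what is 12 * 12?",
     "response": "12 * 12  = 144, since 12 * 10 + 12 * 2 = 144."},  # duplicate after normalising
    {"category": "malay", "prompt": "Terjemahkan 'good morning' ke Bahasa Melayu.",
     "response": "'Good morning' dalam Bahasa Melayu ialah 'selamat pagi'."},
]

kept = curate([ds_a, ds_b])
print(len(kept))  # 2
print(Counter(ex["category"] for ex in kept))  # per-category counts of the curated set
```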
And one corpus that cannot be assembled from outside.
The Bidayuh Bau corpus is built by hand by a native speaker, sourced from the living community, and held closed — never publicly released. Ethnologue lists the language with no signs of digital support. There is no dataset to download and no crawl to scrape. The only path to this data is through the people who speak it.
Benchmarks
The targets.
Projected single-pass scores following the five-turn Double Helix training run, measured against the base model. The methodology will appear in the technical paper.
| Benchmark | Base (Qwen3.6-35B-A3B) | LAMBA-a1 Pro (projected) |
|---|---|---|
| **Coding & engineering** | | |
| SWE-bench Verified | 73.4% | 82–87% |
| TerminalBench 2.0 | 51.5% | 60–68% |
| **Science & reasoning** | | |
| GPQA Diamond | 86.0% | 85–90% |
| MATH-500 | 97.0% | 97–98% |
| AIME 2025 | 90–93% | 90–94% |
| **Tool use** | | |
| MCP-Atlas | 62.8% | 71–77% |
| BFCL v3 | 72–76% | 74–80% |
| tau-bench | 67% | 66–73% |
| **Regional: where no frontier model competes** | | |
| Malay MMLU | 70–76% | 84–89% |
| Bau-Jagoi | — | World’s first |
Baselines are the Qwen3.6-35B-A3B base model. GPQA Diamond is held at or above its baseline as a training-health guardrail. No model has ever been evaluated on Bau-Jagoi: there is no prior score to beat, and LAMBA-a1 will be the first.
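This sketch assumes that "single-pass" means pass@1, where one sampled answer per problem is graded once. Under that assumption, the score can be estimated from several samples per problem with the unbiased pass@k estimator of Chen et al. (2021), which reduces to c/n at k = 1. The `guardrail_ok` function is a hypothetical illustration of gating a checkpoint so that GPQA Diamond does not drop below its baseline. The per-problem results and candidate scores below are made up.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n = samples drawn for one problem, c = how many were correct.
    For k = 1 this reduces to c / n.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: List[Tuple[int, int]]) -> float:
    """Mean pass@1 over a benchmark, given (n, c) for each problem."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


def guardrail_ok(baseline_pct: float, candidate_pct: float) -> bool:
    """Hypothetical gate: True if the checkpoint scores at or above the baseline."""
    return candidate_pct >= baseline_pct


# Made-up results: three problems, four samples each.
score = benchmark_pass_at_1([(4, 4), (4, 2), (4, 0)])
print(f"{score:.1%}")  # 50.0%

# GPQA Diamond baseline from the table above; candidate scores are made up.
print(guardrail_ok(86.0, 87.5))  # True
print(guardrail_ok(86.0, 85.2))  # False
```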
Launch
Three artifacts. Same day.
At launch, the work goes public all at once: a corpus paper, a model paper, and open weights. The frontier Pro tier is offered as a hosted service; LAMBA-a1 Flash ships as open weights for sovereign, on-premise deployment.
Contact
Building toward launch.
Training partnerships, sovereign-deployment pilots, and B2G enquiries — [email protected].