How We Built It
The technical deep dive. For developers, IT folks, and the curious.
Tech Stack
Frontend
Astro + React
Static-first with React islands for interactivity. SEO-friendly HTML output.
Hosting
Static Site
No backend needed. Simple, fast, scalable.
Styling
CSS with CSS Variables
No frameworks. Just clean CSS with a design system.
Why Astro?
We migrated from a React SPA to Astro for better SEO. Here's what changed:
Our React SPA served the same empty HTML shell for every URL. Google couldn't see unique titles, descriptions, or content - just JavaScript that needed to run first.
Astro pre-renders thousands of static HTML pages with full SEO meta tags. React components only hydrate where interactivity is needed (tabs, quizzes, progress tracking).
Each generated page has unique titles, descriptions, Open Graph tags, Twitter cards, and canonical URLs. Secondary and O-Level pages now get the same SEO treatment as Primary pages.
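As a rough illustration of the per-page tags described above, the sketch below builds a title, description, canonical link, Open Graph tags, and Twitter card tags from one metadata record. The `PageMeta` shape, the `buildHeadTags` helper, and the example URL are assumptions made for this example, not the site's actual code.

```typescript
// Hypothetical page metadata; field names are illustrative, not the site's schema.
interface PageMeta {
  title: string;
  description: string;
  path: string; // site-relative path, e.g. "/primary/p5/math"
  image?: string; // optional absolute image URL for social cards
}

// Escape characters that would break HTML text or attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Join the site origin and a page path without doubling or dropping slashes.
function canonicalUrl(siteUrl: string, path: string): string {
  return siteUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Build the unique head tags for one pre-rendered page.
function buildHeadTags(meta: PageMeta, siteUrl: string): string[] {
  const title = escapeHtml(meta.title);
  const description = escapeHtml(meta.description);
  const url = escapeHtml(canonicalUrl(siteUrl, meta.path));
  const card = meta.image ? "summary_large_image" : "summary";
  const tags: string[] = [
    `<title>${title}</title>`,
    `<meta name="description" content="${description}">`,
    `<link rel="canonical" href="${url}">`,
    `<meta property="og:title" content="${title}">`,
    `<meta property="og:description" content="${description}">`,
    `<meta property="og:url" content="${url}">`,
    `<meta name="twitter:card" content="${card}">`,
    `<meta name="twitter:title" content="${title}">`,
    `<meta name="twitter:description" content="${description}">`,
  ];
  if (meta.image) {
    const image = escapeHtml(meta.image);
    tags.push(`<meta property="og:image" content="${image}">`);
    tags.push(`<meta name="twitter:image" content="${image}">`);
  }
  return tags;
}
```

Because every page passes its own record through the same function, no two URLs share a title or canonical link unless their metadata is identical.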
Architecture Diagram
All content flows through numbered folders. The site reads generated markdown through static symlinks, while React islands provide tabs, source toggles, and model-labelled quiz/paper variants.
Raw Sources
MOE syllabus PDFs and collected exam-paper PDFs.
1-0-* Markdown OCR
Convert PDFs to markdown; normalize bundled paper files without changing original OCR output.
1-1-* Understanding Layer
Interpret syllabuses, extract templates, and enrich syllabus notes with exam evidence.
2-0 / 2-1 / 2-2 Exam-Derived Track
Generate topic quizzes and papers from real exam-derived templates when evidence passes.
3-0 / 3-1 LLM Template Track
Infer missing question-bank templates from syllabus and adjacent exam evidence.
4-0 Pure AI Track
Generate syllabus-first topic quizzes and practice papers from Stage 4 templates.
5-1 / 5-2 Parallel Model Runners
From 1 June 2026, Sec/O/A quiz and paper variants run through DeepSeek V4 Pro, Gemma 4 31B, and Qwen3.6 Plus with separate suffixes.
_deepseek / _gemma / _qwen Quality Verification
Check level fit, exam style, marks, timing, answers, and syllabus alignment.
6-0 Benchmarking
Score generated cheatsheets, parent guides, quizzes, and papers by model, then materialize level views grouped by LLM and subject.
9-0 / 9-1 Astro Website
Static pages plus React islands load manifests and display source/model labels.
tuitiongowhere-astro
Current Sec/O/A generation contract: each model generates 20 quiz questions per topic, 5 exam-derived papers per assessment where evidence exists, and 5 pure AI practice papers.
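The generation contract can be pictured as a job plan per model. The sketch below is a minimal illustration only: the `planJobs` function, the `Assessment` shape, and the output naming scheme are hypothetical, and it assumes the 5 pure AI practice papers are counted per assessment, which the text does not state explicitly.

```typescript
interface ModelRunner {
  name: string;
  suffix: string; // appended to output names to preserve model provenance
}

interface Assessment {
  id: string;
  hasExamEvidence: boolean; // whether exam-derived templates passed the evidence gate
}

interface GenerationJob {
  kind: "quiz" | "exam-derived-paper" | "pure-ai-paper";
  model: string;
  outputName: string; // hypothetical naming scheme for this sketch
  count: number;
}

const RUNNERS: ModelRunner[] = [
  { name: "DeepSeek V4 Pro", suffix: "_deepseek" },
  { name: "Gemma 4 31B", suffix: "_gemma" },
  { name: "Qwen3.6 Plus", suffix: "_qwen" },
];

const QUIZ_QUESTIONS_PER_TOPIC = 20;
const PAPERS_PER_ASSESSMENT = 5;

// Expand topics and assessments into one job list covering every model.
function planJobs(topics: string[], assessments: Assessment[]): GenerationJob[] {
  const jobs: GenerationJob[] = [];
  for (const runner of RUNNERS) {
    for (const topic of topics) {
      jobs.push({
        kind: "quiz",
        model: runner.name,
        outputName: `${topic}-quiz${runner.suffix}`,
        count: QUIZ_QUESTIONS_PER_TOPIC,
      });
    }
    for (const assessment of assessments) {
      // Exam-derived papers are only planned where evidence exists.
      if (assessment.hasExamEvidence) {
        jobs.push({
          kind: "exam-derived-paper",
          model: runner.name,
          outputName: `${assessment.id}-exam${runner.suffix}`,
          count: PAPERS_PER_ASSESSMENT,
        });
      }
      jobs.push({
        kind: "pure-ai-paper",
        model: runner.name,
        outputName: `${assessment.id}-ai${runner.suffix}`,
        count: PAPERS_PER_ASSESSMENT,
      });
    }
  }
  return jobs;
}
```

Keeping the suffix in the output name means the website can label each variant by model without a separate lookup table.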
The Pipeline
Stages 3 and 5 provide dual-source choice where both exam-derived and LLM-generated material exist.
Collection
Gather raw materials: 202 MOE syllabus PDFs and 10,769 exam papers from 100+ Singapore schools. Convert everything to markdown via OCR.
Analysis
An LLM interprets syllabuses into per-level files. We extract 850+ question templates from exam papers, then enrich the syllabuses with exam insights.
Exam-Derived
Generate quizzes and exams using templates extracted from real papers. Used for P3-P6 and Sec/O/A groups with sufficient evidence.
Available: Evidence-gated
LLM-Inferred
An LLM generates templates from the syllabus. Covers P1-P2 (no exam data) and supplements P3-P6/Sec/O/A with fresh variations.
Available: All levels
Quality Verification
Review generated quizzes and papers for age suitability, past-exam style, mark difficulty, hidden timing fit, answer-key consistency, and syllabus alignment.
Reports: Stage 6
Model Benchmark
Gemma 4 26B A4B evaluates generated cheatsheets, parent guides, quizzes, and papers with 0.0-10.0 scores for syllabus alignment, answer quality, notation, paper format, timing, difficulty, and missing-image flags. Stage 9-1 derives per-level benchmark views grouped by LLM and subject, with a review area for scores below 8.0.
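A minimal sketch of how a per-level benchmark view might be derived: filter scores to one level, group them by LLM and subject, and collect anything below 8.0 for review. The `BenchmarkScore` record and `summarizeLevel` function are assumptions for illustration; the real manifests may be shaped differently.

```typescript
// Hypothetical score record; one row per evaluated artifact.
interface BenchmarkScore {
  level: string;
  model: string; // the LLM that generated the artifact
  subject: string;
  artifact: string; // e.g. a quiz or paper identifier
  score: number; // 0.0 to 10.0
}

interface GroupSummary {
  model: string;
  subject: string;
  mean: number;
  needsReview: BenchmarkScore[];
}

const REVIEW_THRESHOLD = 8.0;

// Group one level's scores by model and subject, flagging scores below the threshold.
function summarizeLevel(scores: BenchmarkScore[], level: string): GroupSummary[] {
  const groups: Record<string, BenchmarkScore[]> = {};
  for (const entry of scores) {
    if (entry.level !== level) continue;
    if (entry.score < 0 || entry.score > 10) {
      throw new RangeError(`Score out of range for ${entry.artifact}: ${entry.score}`);
    }
    const key = `${entry.model}|${entry.subject}`;
    const bucket = groups[key] ?? [];
    groups[key] = bucket;
    bucket.push(entry);
  }
  return Object.keys(groups).map((key) => {
    const bucket = groups[key] ?? [];
    const first = bucket[0];
    const total = bucket.reduce((sum, entry) => sum + entry.score, 0);
    return {
      model: first ? first.model : "",
      subject: first ? first.subject : "",
      mean: bucket.length > 0 ? total / bucket.length : 0,
      needsReview: bucket.filter((entry) => entry.score < REVIEW_THRESHOLD),
    };
  });
}
```

A score of exactly 8.0 is not flagged here, matching the "below 8.0" wording.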
Dashboard: /benchmark and /benchmark/<level>
Syllabus Processing
202 MOE syllabus PDFs converted to structured markdown, then split into 128 per-level files.
Each level gets exactly what it needs. The Math syllabus covers P1-P6, so we split it into 6 separate files, one per level, for easier consumption.
Science is not taught in P1-P2. Our pipeline respects this - no science content for those levels.
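The per-level split and the P1-P2 Science rule can be sketched as follows. This is an illustration under assumptions: the `SubjectSyllabus` shape, the `firstLevel` field, and the file naming are hypothetical, and the real pipeline works on LLM-interpreted markdown rather than a ready-made map of sections.

```typescript
const PRIMARY_LEVELS = ["P1", "P2", "P3", "P4", "P5", "P6"] as const;
type PrimaryLevel = (typeof PRIMARY_LEVELS)[number];

// Hypothetical input: one subject syllabus with its text already grouped by level.
interface SubjectSyllabus {
  subject: string;
  firstLevel: PrimaryLevel; // earliest level at which the subject is taught
  sections: Partial<Record<PrimaryLevel, string>>;
}

interface LevelFile {
  fileName: string;
  level: PrimaryLevel;
  body: string;
}

// Emit one file per level, skipping levels before the subject starts.
function splitByLevel(syllabus: SubjectSyllabus): LevelFile[] {
  const firstIndex = PRIMARY_LEVELS.indexOf(syllabus.firstLevel);
  const files: LevelFile[] = [];
  PRIMARY_LEVELS.forEach((level, index) => {
    const body = syllabus.sections[level];
    if (index < firstIndex || body === undefined) return;
    files.push({
      fileName: `${syllabus.subject.toLowerCase()}-${level.toLowerCase()}.md`,
      level,
      body,
    });
  });
  return files;
}
```

With `firstLevel` set to P3 for Science, any stray P1 or P2 text in the source is dropped instead of becoming a page.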
Exam Paper Analysis
We analyzed exam papers from Singapore schools to understand real assessment patterns.
We extract question patterns, not questions. Templates teach AI to generate unlimited fresh practice - no recycled content.
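To make the pattern-versus-question distinction concrete, here is a toy template with named slots that yields a fresh question on each call. It is a deliberately simplified stand-in: in the pipeline an LLM writes questions from extracted templates, whereas this sketch fills slots with a seeded random number generator. The `QuestionTemplate` shape and all names are hypothetical.

```typescript
// Hypothetical template: a sentence pattern plus inclusive integer ranges per slot.
interface QuestionTemplate {
  id: string;
  pattern: string; // slots are written as {name}
  ranges: Record<string, [number, number]>;
  answer: (values: Record<string, number>) => number;
}

interface GeneratedQuestion {
  templateId: string;
  text: string;
  values: Record<string, number>;
  answer: number;
}

// Small linear congruential generator so runs are reproducible for a given seed.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Pick a value for every slot, fill the pattern, and compute the answer key.
function instantiate(template: QuestionTemplate, rng: () => number): GeneratedQuestion {
  const values: Record<string, number> = {};
  for (const name of Object.keys(template.ranges)) {
    const range = template.ranges[name];
    if (range === undefined) continue;
    const [min, max] = range;
    values[name] = min + Math.floor(rng() * (max - min + 1));
  }
  const text = template.pattern.replace(/\{(\w+)\}/g, (match: string, name: string) => {
    const value = values[name];
    return value === undefined ? match : String(value); // unknown slots stay visible
  });
  return { templateId: template.id, text, values, answer: template.answer(values) };
}
```

One template produces as many distinct questions as there are slot combinations, while a copied exam question can only be shown once.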
Some secondary PDFs bundle many schools into one file. We keep the original OCR intact, then create per-school files beside the original bundle markdown: 83 bundles normalized into 844 per-school/front-matter markdown files. Page validation matched exactly: 29,120 source pages and 29,120 split pages.
Some compiled PDFs place marking schemes much later than the question paper. The split groups delayed answer keys back into the same school file instead of blindly cutting only contiguous ranges.
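The regrouping and the page-count check can be sketched together. The example assumes segment boundaries and school labels have already been detected after OCR; the `Segment` shape and `splitBundle` function are hypothetical and only show the bookkeeping, not the detection itself.

```typescript
// Hypothetical detected segment of a bundled PDF (page numbers are 1-based, inclusive).
interface Segment {
  school: string | null; // null marks bundle front matter
  kind: "front-matter" | "paper" | "answer-key";
  startPage: number;
  endPage: number;
}

interface SplitFile {
  name: string;
  pages: number[];
}

// Group segments by school, so a late answer key joins its paper, then validate page totals.
function splitBundle(totalPages: number, segments: Segment[]): SplitFile[] {
  const pagesByFile: Record<string, number[]> = {};
  const assigned: Record<number, boolean> = {};
  let assignedCount = 0;
  for (const segment of segments) {
    const name = segment.school ?? "front-matter";
    if (segment.kind === "answer-key" && pagesByFile[name] === undefined) {
      throw new Error(`Answer key for ${name} has no matching paper`);
    }
    const pages = pagesByFile[name] ?? [];
    pagesByFile[name] = pages;
    for (let page = segment.startPage; page <= segment.endPage; page++) {
      if (page < 1 || page > totalPages || assigned[page]) {
        throw new Error(`Page ${page} is out of range or assigned twice`);
      }
      assigned[page] = true;
      assignedCount++;
      pages.push(page);
    }
  }
  // Source pages and split pages must match exactly.
  if (assignedCount !== totalPages) {
    throw new Error(`Page mismatch: ${totalPages} source pages, ${assignedCount} split pages`);
  }
  return Object.keys(pagesByFile).map((name) => ({ name, pages: pagesByFile[name] ?? [] }));
}
```

Grouping by school rather than by contiguous range is what lets pages 8-9 of a bundle rejoin a paper that ended on page 4.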
Generated Content
From the pipeline, we generate textbooks, cheatsheets, parent guides, quizzes, and exam papers.
Total: 2,900+ generated markdown files excluding audio, with Secondary/O-Level/A-Level content rolling out through the same pipeline.
Current Cost
We have paid around $200 out of pocket so far for AI subscriptions and model/API credits used to build the site.
This is a rounded running cost for OCR, extraction, generation, and experimentation. It does not include unpaid engineering time, and it will change as more verification and audio work runs.
AI Models Used
Used for primary content and earlier quiz/paper generation where quality mattered more than cost.
From 1 June 2026, Secondary, O-Level, and A-Level quizzes and papers are generated in parallel with DeepSeek V4 Pro, Gemma 4 31B, and Qwen3.6 Plus to reduce cost and preserve model provenance on the website. In practice, different LLMs run at different speeds, and their effective handling of long syllabus and exam-paper context can differ even when the prompt contract is the same.
Fast, cost-effective for bulk processing. Good at recognizing question structure.
Geometry proofs, physics problems, multi-step calculations. When you need to show your working.
Singapore English (en-SG) and Mandarin Chinese (zh-CN) voices. Custom pitch and rate settings per character for natural-sounding dialogue.
Lessons Learned
Match Model to Task
Don't use one AI for everything. We use Mistral for English OCR, Gemini for Chinese, Claude for early generation, and DeepSeek V4 Pro, Gemma 4 31B, and Qwen3.6 Plus for lower-cost quiz/paper generation.
Model Speed Is Not Uniform
Parallel generation does not finish evenly. Some models return quickly while others take much longer, and long input context can affect quality, formatting, or retry rates differently per model.
Understand Local Context
Singapore education has unique features: no P1-P2 exams, Science starts at P3, specific assessment types. Respect these constraints.
Templates Beat Examples
Extract patterns, not just content. A good template generates infinite practice questions. Raw examples are one-and-done.
Normalize Bundles Before Analysis
Some sources package many schools into one PDF. Split those bundles by school after OCR, and keep answer keys with the matching paper before extracting templates.
Static for SEO
A client-rendered SPA that serves the same empty HTML shell for every URL gives search engines little to index. Pre-render HTML with Astro, and use React islands only where interactivity is needed.
Open Source
We're considering open sourcing the tools and processes used to build this. Stay tuned.
Questions?
Curious about the technical details? Want to contribute? Get in touch.
Email Us