How We Built It

The technical deep dive. For developers, IT folks, and the curious.


Tech Stack

Frontend

Astro + React

Static-first with React islands for interactivity. SEO-friendly HTML output.

Hosting

Static Site

No backend needed. Simple, fast, scalable.

Styling

CSS with CSS Variables

No frameworks. Just clean CSS with a design system.

Why Astro?

We migrated from a React SPA to Astro for better SEO. Here's what changed:

🔍
The Problem: SPAs are invisible to search engines

Our React SPA served the same empty HTML shell for every URL. Google couldn't see unique titles, descriptions, or content - just JavaScript that needed to run first.

๐Ÿ๏ธ
The Solution: Astro's Island Architecture

Astro pre-renders thousands of static HTML pages with full SEO meta tags. React components only hydrate where interactivity is needed (tabs, quizzes, progress tracking).

✅
What Google Now Sees

Each generated page has unique titles, descriptions, Open Graph tags, Twitter cards, and canonical URLs. Secondary and O-Level pages now get the same SEO treatment as Primary pages.
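
As a rough illustration of the per-page metadata listed above, here is a minimal sketch in Python. The helper name `build_head_tags` and the example URL are hypothetical; the real site emits these tags from Astro, not from this code.

```python
from html import escape


def build_head_tags(title: str, description: str, canonical_url: str) -> list[str]:
    """Return the SEO tags for one page as HTML strings (hypothetical helper)."""
    # Escape once so titles containing &, <, or quotes stay valid HTML.
    t, d, u = (escape(v, quote=True) for v in (title, description, canonical_url))
    return [
        f"<title>{t}</title>",
        f'<meta name="description" content="{d}">',
        f'<link rel="canonical" href="{u}">',
        # Open Graph tags
        f'<meta property="og:title" content="{t}">',
        f'<meta property="og:description" content="{d}">',
        f'<meta property="og:url" content="{u}">',
        # Twitter card tags
        '<meta name="twitter:card" content="summary">',
        f'<meta name="twitter:title" content="{t}">',
        f'<meta name="twitter:description" content="{d}">',
    ]


if __name__ == "__main__":
    for tag in build_head_tags(
        "Fractions & Decimals",
        "P5 Math notes and quizzes",
        "https://example.com/p5/math/fractions",  # placeholder URL
    ):
        print(tag)
```

The point is that every page gets its own values for these fields at build time, so a crawler never has to run JavaScript to find them.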

4,000+ Static HTML Pages
~10s Build Time
100% SEO Coverage

Architecture Diagram

All content flows through numbered folders. The site reads generated markdown through static symlinks, while React islands provide tabs, source toggles, and model-labelled quiz/paper variants.

1

Raw Sources

MOE syllabus PDFs and collected exam-paper PDFs.

1-0-*
→
1.1

Markdown OCR

Convert PDFs to markdown; normalize bundled paper files without changing original OCR output.

1-1-*
→
2

Understanding Layer

Interpret syllabuses, extract templates, and enrich syllabus notes with exam evidence.

2-0 / 2-1 / 2-2
3

Exam-Derived Track

Generate topic quizzes and papers from real exam-derived templates when evidence passes.

3-0 / 3-1
4

LLM Template Track

Infer missing question-bank templates from syllabus and adjacent exam evidence.

4-0
↓
5

Pure AI Track

Generate syllabus-first topic quizzes and practice papers from Stage 4 templates.

5-1 / 5-2

Parallel Model Runners

From 1 June 2026, Sec/O/A quiz and paper variants run through DeepSeek V4 Pro, Gemma 4 31B, and Qwen3.6 Plus with separate suffixes.

_deepseek / _gemma / _qwen
→
6

Quality Verification

Check level fit, exam style, marks, timing, answers, and syllabus alignment.

6-0
→
9

Benchmarking

Score generated cheatsheets, parent guides, quizzes, and papers by model, then materialize level views grouped by LLM and subject.

9-0 / 9-1
→

Astro Website

Static pages plus React islands load manifests and display source/model labels.

tuitiongowhere-astro

Current Sec/O/A generation contract: each model generates 20 quiz questions per topic, 5 exam-derived papers per assessment where evidence exists, and 5 pure AI practice papers.
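
The generation contract above can be sketched as a small job planner. The counts and the `_deepseek` / `_gemma` / `_qwen` suffixes come from this page; the `Job` type, the `plan_jobs` helper, and the output file naming are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Suffixes as listed in the diagram; the mapping structure itself is an assumption.
MODEL_SUFFIXES = {
    "DeepSeek V4 Pro": "_deepseek",
    "Gemma 4 31B": "_gemma",
    "Qwen3.6 Plus": "_qwen",
}
QUIZ_QUESTIONS = 20       # quiz questions per topic, per model
EXAM_DERIVED_PAPERS = 5   # only where exam evidence exists
PURE_AI_PAPERS = 5


@dataclass(frozen=True)
class Job:
    model: str
    kind: str          # "quiz", "exam_derived_paper", or "pure_ai_paper"
    output_name: str   # hypothetical file name
    count: int


def plan_jobs(slug: str, has_exam_evidence: bool) -> list[Job]:
    """List the jobs each model runs for one topic or assessment (hypothetical)."""
    jobs = []
    for model, suffix in MODEL_SUFFIXES.items():
        jobs.append(Job(model, "quiz", f"{slug}_quiz{suffix}.md", QUIZ_QUESTIONS))
        if has_exam_evidence:
            jobs.append(Job(model, "exam_derived_paper",
                            f"{slug}_exam{suffix}.md", EXAM_DERIVED_PAPERS))
        jobs.append(Job(model, "pure_ai_paper",
                        f"{slug}_practice{suffix}.md", PURE_AI_PAPERS))
    return jobs


if __name__ == "__main__":
    for job in plan_jobs("sec2-algebra", has_exam_evidence=True):
        print(job)
```

Keeping the suffix in the file name is what lets the website label each variant with the model that produced it.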

The Pipeline

Stages 3 and 5 provide dual-source choice where both exam-derived and LLM-generated material exist.

1

Collection

Gather raw materials: 202 MOE syllabus PDFs and 10,769 exam papers from 100+ Singapore schools. Convert everything to markdown via OCR.

→
2

Analysis

LLM interprets syllabuses into per-level files. Extract 850+ question templates from exam papers. Enrich syllabuses with exam insights.

↙ ↘
3

📚 Exam-Derived

Generate quizzes and exams using templates extracted from real papers. Used for P3-P6 and Sec/O/A groups with sufficient evidence.

Available: Evidence-gated
VS
5

🤖 LLM-Inferred

LLM generates templates from syllabus. Covers P1-P2 (no exam data) and supplements P3-P6/Sec/O/A with fresh variations.

Available: All levels
→
6

Quality Verification

Review generated quizzes and papers for age suitability, past-exam style, mark difficulty, hidden timing fit, answer-key consistency, and syllabus alignment.

Reports: Stage 6
→
9

Model Benchmark

Gemma 4 26B A4B evaluates generated cheatsheets, parent guides, quizzes, and papers with 0.0-10.0 scores for syllabus alignment, answer quality, notation, paper format, timing, difficulty, and missing-image flags. Stage 9-1 derives per-level benchmark views grouped by LLM and subject, with a review area for scores below 8.0.

Dashboard: /benchmark and /benchmark/<level>
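
To make the Stage 9-1 grouping concrete, here is a minimal sketch of a per-level view. The 8.0 review threshold and the grouping by LLM and subject are from the description above; the record fields, the `level_view` helper, and the use of a simple mean are assumptions.

```python
from collections import defaultdict
from statistics import mean

REVIEW_THRESHOLD = 8.0  # scores below this land in the review area


def level_view(records: list[dict]) -> tuple[dict, list[str]]:
    """Group 0.0-10.0 scores by (model, subject) and flag low scorers.

    Each record is assumed to look like:
    {"model": ..., "subject": ..., "item": ..., "score": ...}
    """
    grouped = defaultdict(list)
    review = []
    for record in records:
        grouped[(record["model"], record["subject"])].append(record["score"])
        if record["score"] < REVIEW_THRESHOLD:
            review.append(record["item"])
    # Averaging is an assumption; the real dashboard may aggregate differently.
    averages = {key: round(mean(scores), 1) for key, scores in grouped.items()}
    return averages, review


if __name__ == "__main__":
    sample = [
        {"model": "gemma", "subject": "math", "item": "quiz-01", "score": 9.0},
        {"model": "gemma", "subject": "math", "item": "paper-02", "score": 7.0},
        {"model": "qwen", "subject": "physics", "item": "quiz-07", "score": 8.5},
    ]
    print(level_view(sample))
```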

Syllabus Processing

202 MOE syllabus PDFs converted to structured markdown, then split into 128 per-level files.

202 Syllabus PDFs
128 Per-Level Files
P1-A Levels Covered
📚
Level-specific files

Each level gets exactly what it needs. The Math syllabus covers P1-P6 in a single document, so we split it into six separate files, one per level.

🔬
Science starts at P3

Science is not taught in P1-P2. Our pipeline respects this - no science content for those levels.
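
A rule like this is easy to encode as a filter. The sketch below assumes a hypothetical `subjects_for_primary_level` helper and an illustrative subject list; only the "Science starts at P3" rule comes from this page.

```python
SCIENCE_START_LEVEL = 3  # Science is first taught in P3


def subjects_for_primary_level(
    level: int,
    subjects: tuple[str, ...] = ("English", "Math", "Science"),  # illustrative list
) -> list[str]:
    """Return the subjects that should get generated content for a primary level."""
    if not 1 <= level <= 6:
        raise ValueError(f"primary levels run from 1 to 6, got {level}")
    return [s for s in subjects if s != "Science" or level >= SCIENCE_START_LEVEL]


if __name__ == "__main__":
    for level in range(1, 7):
        print(f"P{level}:", subjects_for_primary_level(level))
```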

Exam Paper Analysis

We analyzed 10,769 exam papers from 100+ Singapore schools to understand real assessment patterns.

~38,000 Questions Analyzed
850+ Question Templates
P3-P6 Exam-Derived Content

We extract question patterns, not questions. Templates teach AI to generate unlimited fresh practice - no recycled content.
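
To show the difference between a pattern and a question, here is a deliberately simplified sketch. In the real pipeline the templates guide an LLM; this stand-in fills a fixed pattern with random numbers instead, and the `Question` type and `fraction_of_amount` function are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    whole: int
    numerator: int
    denominator: int
    answer: int


def fraction_of_amount(rng: random.Random) -> Question:
    """One pattern ("find a fraction of a quantity") yielding many fresh questions."""
    denominator = rng.choice([2, 3, 4, 5, 8, 10])
    numerator = rng.randint(1, denominator - 1)
    # Pick a whole that divides evenly so the answer is always an integer.
    whole = denominator * rng.randint(2, 12)
    answer = whole * numerator // denominator
    text = (
        f"A shop has {whole} pens. It sells {numerator}/{denominator} of them. "
        "How many pens does it sell?"
    )
    return Question(text, whole, numerator, denominator, answer)


if __name__ == "__main__":
    rng = random.Random(2026)
    for _ in range(3):
        q = fraction_of_amount(rng)
        print(q.text, "->", q.answer)
```

A raw example question can only be reused as-is, while the pattern keeps producing new instances with a known answer.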

📦
Bundled-source normalization

Some secondary PDFs bundle many schools into one file. We keep the original OCR intact, then create per-school files beside the original bundle markdown: 83 bundles normalized into 844 per-school/front-matter markdown files. Page validation matched exactly: 29,120 source pages and 29,120 split pages.

🧾
Answer keys stay with the school

Some compiled PDFs place marking schemes much later than the question paper. The split groups delayed answer keys back into the same school file instead of blindly cutting only contiguous ranges.
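
The page validation and the delayed answer-key grouping can be sketched together. The `SchoolSplit` type and `validate_split` helper are hypothetical; the idea they illustrate is that a school may own several non-contiguous page ranges, and that every source page must be assigned exactly once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchoolSplit:
    school: str
    # Inclusive, 1-based (start, end) ranges. More than one range means the
    # answer key sat later in the bundle than the question paper.
    page_ranges: tuple[tuple[int, int], ...]


def pages_of(split: SchoolSplit) -> list[int]:
    return [p for start, end in split.page_ranges for p in range(start, end + 1)]


def validate_split(total_pages: int, splits: list[SchoolSplit]) -> int:
    """Check every source page is assigned exactly once; return the page count."""
    assigned = sorted(p for s in splits for p in pages_of(s))
    expected = list(range(1, total_pages + 1))
    if assigned != expected:
        missing = sorted(set(expected) - set(assigned))
        duplicated = sorted({p for p in assigned if assigned.count(p) > 1})
        raise ValueError(f"page mismatch: missing={missing}, duplicated={duplicated}")
    return len(assigned)


if __name__ == "__main__":
    bundle = [
        SchoolSplit("School A", ((1, 3), (8, 9))),    # paper, then delayed answer key
        SchoolSplit("School B", ((4, 7), (10, 10))),
    ]
    print(validate_split(10, bundle), "pages accounted for")
```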

Generated Content

From the pipeline, we generate textbooks, cheatsheets, parent guides, quizzes, and exam papers.

85 Textbooks
85 Cheatsheets
85 Parent Guides
1,117 Quiz Files
1,542 Exam Files
1,384 Audio Transcripts

Total: 2,900+ generated markdown files excluding audio, with Secondary/O-Level/A-Level content rolling out through the same pipeline.
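
For anyone checking the arithmetic, the total follows directly from the counts listed above (audio transcripts excluded):

```python
# File counts as listed on this page, excluding audio transcripts.
counts = {
    "textbooks": 85,
    "cheatsheets": 85,
    "parent_guides": 85,
    "quiz_files": 1117,
    "exam_files": 1542,
}

total_markdown = sum(counts.values())
print(total_markdown)  # 2914, which is where "2,900+" comes from
```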

Current Cost

We have paid around $200 out of pocket so far for AI subscriptions and model/API credits used to build the site.

~$200 Paid So Far
AI/API Credits + Subscriptions
$0 Student Price

This is a rounded running cost for OCR, extraction, generation, and experimentation. It does not include unpaid engineering time, and it will change as more verification and audio work runs.

AI Models Used

🧠
Claude Sonnet 4 for earlier content generation

Used for primary content and earlier quiz/paper generation where quality mattered more than cost.

⚙️
DeepSeek V4 Pro + Gemma 4 31B + Qwen3.6 Plus for quiz and paper generation

From 1 June 2026, Secondary, O-Level, and A-Level quizzes and papers are generated in parallel with these models to reduce cost and preserve model provenance on the website. In practice, different LLMs run at different speeds, and their effective handling of long syllabus and exam-paper context can differ even when the prompt contract is the same.

🔧
Claude Haiku 4.5 for template extraction

Fast, cost-effective for bulk processing. Good at recognizing question structure.

📐
DeepSeek R1 for complex reasoning

Geometry proofs, physics problems, multi-step calculations. When you need to show your working.

🔊
Azure Neural TTS for audio lessons

Singapore English (en-SG) and Mandarin Chinese (zh-CN) voices. Custom pitch and rate settings per character for natural-sounding dialogue.
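
Per-character pitch and rate are typically expressed in SSML with `<voice>` and `<prosody>` elements. The sketch below only builds the SSML string and does not call Azure. The character names, the specific voice names, and the pitch and rate values are assumptions for illustration; this page does not state which voices or settings are used.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class CharacterVoice:
    voice: str   # neural voice name (assumed example)
    pitch: str   # relative pitch, e.g. "+5%"
    rate: str    # relative speaking rate, e.g. "-10%"


# Hypothetical cast; values are illustrative, not the site's real settings.
CAST = {
    "teacher": CharacterVoice("en-SG-WayneNeural", pitch="-2%", rate="-5%"),
    "student": CharacterVoice("en-SG-LunaNeural", pitch="+5%", rate="+0%"),
}


def to_ssml(lines: list[tuple[str, str]], cast: dict[str, CharacterVoice]) -> str:
    """Turn (speaker, text) pairs into one SSML document."""
    parts = []
    for speaker, text in lines:
        v = cast[speaker]
        parts.append(
            f"<voice name={quoteattr(v.voice)}>"
            f"<prosody pitch={quoteattr(v.pitch)} rate={quoteattr(v.rate)}>"
            f"{escape(text)}</prosody></voice>"
        )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-SG">' + "".join(parts) + "</speak>"
    )


if __name__ == "__main__":
    print(to_ssml([("teacher", "What is 3/4 of 12?"), ("student", "Nine!")], CAST))
```

Storing the settings per character, rather than per line, keeps each voice consistent across a whole lesson.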

Lessons Learned

Match Model to Task

Don't use one AI for everything. We use Mistral for English OCR, Gemini for Chinese OCR, Claude for early generation, and DeepSeek V4 Pro, Gemma 4 31B, and Qwen3.6 Plus for lower-cost quiz/paper generation.

Model Speed Is Not Uniform

Parallel generation does not finish evenly. Some models return quickly while others take much longer, and long input context can affect quality, formatting, or retry rates differently per model.
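
One way to cope with uneven finish times is to wait with a deadline and treat the stragglers separately. This is a minimal sketch using `asyncio`; `call_model` is a stand-in that sleeps instead of calling a real API, and the model names and latencies are made up.

```python
import asyncio


async def call_model(name: str, delay: float) -> str:
    """Stand-in for a real API call; the delay simulates model latency."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_all(latencies: dict[str, float], timeout: float) -> tuple[list[str], list[str]]:
    """Run all models in parallel; return (finished, timed_out) model names."""
    tasks = {
        asyncio.create_task(call_model(name, delay)): name
        for name, delay in latencies.items()
    }
    done, pending = await asyncio.wait(tasks.keys(), timeout=timeout)
    # Cancel stragglers so they can be retried or reported separately.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    finished = sorted(tasks[t] for t in done)
    timed_out = sorted(tasks[t] for t in pending)
    return finished, timed_out


if __name__ == "__main__":
    finished, timed_out = asyncio.run(
        run_all({"fast-model": 0.01, "slow-model": 2.0}, timeout=0.2)
    )
    print("finished:", finished)
    print("needs retry:", timed_out)
```

Separating the two lists means a slow model delays only its own variants, not the whole batch.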

Understand Local Context

Singapore education has unique features: no P1-P2 exams, Science starts at P3, specific assessment types. Respect these constraints.

Templates Beat Examples

Extract patterns, not just content. A good template generates infinite practice questions. Raw examples are one-and-done.

Normalize Bundles Before Analysis

Some sources package many schools into one PDF. Split those bundles by school after OCR, and keep answer keys with the matching paper before extracting templates.

Static for SEO

An SPA that serves the same empty HTML shell for every URL gives search engines almost nothing to index. Pre-render HTML with Astro and use React islands only where interactivity is needed.

Open Source

We're considering open-sourcing the tools and processes used to build this site. Stay tuned.

Questions?

Curious about the technical details? Want to contribute? Get in touch.

Email Us

[email protected]