AI model distillation (learning)¶
| Field | Value |
|---|---|
| Status | Active |
| Type | Personal learning / research |
Context for LLMs: If this wiki is pasted into a chat and the user says distillation or the distillation project, they mean this page and raw/ai-model-distillation-learning-plan.md (basename ai-model-distillation-learning-plan).
Description¶
A personal learning project on model distillation: training smaller or cheaper student models to approximate a larger teacher (or ensemble), including classic knowledge distillation, logits matching, and related compression ideas. Goal is hands-on understanding of how distilled models behave, fail, and trade off quality vs. cost — not a product launch.
Structured multi-phase plan (full verbatim checklist): raw/ai-model-distillation-learning-plan.md.
Learning plan (progress)¶
| Phase | Focus | Milestone | Status |
|---|---|---|---|
| 1 | Foundation: Moonshot (Alex Wissner-Gross), paper(s), distillation-minded hands-on (see Boxy below) | Grounding + one applied thread | Done |
| 2 | Generalize beyond the podcast via LLM prompts; save generalization doc; tie to your skills | Personalized 2–3 page summary | Not started |
| 3 | Read 8–12 resources: Hinton KD, distilling step-by-step, HF guides, OpenAI distillation tutorial, DistillKit/notebooks, search + alerts | Notion/Zotero (or equivalent) with highlights + snippets | Not started |
| 4 | Build a narrow student (OpenAI distillation path or open-source teacher → HF/DistillKit); deploy + case study | Live demo + 1-page “cost / niche metric” writeup | Not started |
Phase 2 (next) — outline¶
- One broad prompt: mechanisms (response / logit KD / synthetic data), economics, beyond LLMs, iterated loops, examples (e.g. Phi, OpenAI distillation API), 3–5 student niche ideas.
- 3–5 follow-ups (e.g. apply to medicine, law, coding).
- Deliverable: full thread saved + 2–3 page personalized summary.
Phase 3 — outline¶
- Papers: Hinton 2015 “Distilling the Knowledge…”; “Distilling step-by-step” (Google + Snorkel, 2023); HF KD + CV distillation material.
- Hands-on tutorials: OpenAI distillation guide (GPT-4o → mini); Snorkel/Labelbox intros; Arcee DistillKit / Nebius notebooks.
- Search habit: HF + synthetic-data queries; arXiv (“scaling laws”, “reinforcement-aware KD”); alerts/RSS for “model distillation.”
Phase 4 — outline¶
- Pick a narrow problem where you have edge (domain reviewer, tutor, clause explainer, etc.).
- Fast path: OpenAI API, synthetic data from teacher, distill to mini, fine-tune, held-out eval.
- OSS path: Large teacher on Groq/HF, synthetic rationales, Transformers + DistillKit / distillation
Trainer, optional quantize/prune. - Ship: Space/Vercel/Replicate, short social post + 1-page case study.
Boxy — on-device video/audio experiment (Phase 1 hands-on)¶
Boxy was a generic native iOS POC exploring on-device processing of audio extracted from imported video files — informed by this distillation study, not a wiki project page and not documented with a product use case here.
What it tested:
- SwiftUI app: user imports video clips; all analysis stays on device.
- Speech path: WhisperKit transcription, including smaller / distilled Whisper variants and perf tuning.
- Parallel audio classifier: custom detector trained offline, shipped as Core ML with a matched mel frontend; sliding windows over clip audio alongside the transcript.
- Output: rules-based merge of transcript text + detected audio-event timeline (turn-taking / quiet-region hints where the rules allowed).
- Offline training stack (historical): local FastAPI pipeline — Distil-Whisper LoRA, small CNN classifier, optional Core ML export.
Caveats captured: several minutes of analysis per clip on device; external testing via TestFlight when needed.
Lesson for distillation work: the local / student-model path was valuable for feasibility, latency, and privacy posture experiments; quality tradeoffs vs. larger cloud models are exactly what Phase 2–4 should generalize.
Origin¶
- Sources: Listened repeatedly to Moonshot episodes featuring Alex Wissner-Gross’s explanation of model distillation, plus reading a paper on the topic.
- Applied thread: Boxy (above) — distillation-minded on-device audio from video.
Key features / scope¶
- Theory + practice: core papers, podcasts, and targeted experiments.
- Boxy detail stays high-level on this page; no separate project page.
Tech stack¶
- Study: general PyTorch / HF / PEFT ecosystem; Whisper family and distilled variants in the Boxy POC.
- Phase 4: OpenAI distillation API vs. DistillKit / HF — choose when you start build.
- Add dedicated personal repo or notebook links in
raw/when you want them canonical.
Raw sources¶
raw/ai-model-distillation-learning-plan.md— phased checklist (Phases 2–4 verbatim + Phase 1 note).
Related¶
- llm-maintained-context — wiki maintenance practice used to track this work.
- ../What-Im-Working-On — public snapshot.
- glossary — Boxy, Distillation terms.
Known issues¶
- Phase 2–4 deliverables (generalization doc, Zotero/Notion library, shipped student + case study) not linked here yet.