CS 7180 (AI Special Topics): Neural Mechanics

Modern AI systems are powerful but opaque: even their creators do not fully understand what the billions of artificial neurons are doing inside. This class teaches methods for probing neural networks to uncover what concepts they have learned and where those concepts are encoded.

Each interdisciplinary team, made up of a domain-expert PhD student, one or more CS/ML graduate students, and a Bau Lab mentor, will aim to produce a publication-quality research paper on how LLMs encode concepts from a field outside CS, an area often neglected in interpretability research. Teams will target venues such as NeurIPS, ICLR, or ICML. Prerequisites: linear algebra, probability, deep learning, Python, and PyTorch.

Syllabus

This schedule is provisional and subject to change.

Week	Tuesday	Thursday	Assignments / Notes
0		Introduction [slides] [video] Course structure, goals, and introduction to mechanistic interpretability. Thu Jan 8 Read: Cummings (FINER), Hamming (video), Nielsen	Form teams; establish team Google Drive; brainstorm 2-3 candidate concepts
1	Foundations [slides] [colab] [video] Logit lens, intermediate representations, and the vocabulary of mechanistic interpretability. Tue Jan 13 Read: Primer, Logit Lens, Latent Language	Project Pitches [slides] Thu Jan 15	Hand in pitch Google Doc; each team presents their pitch
2	Steering [slides] [video] Controlling model behavior by manipulating distributed neural representation vectors. Tue Jan 20 Read: Piantadosi, Superposition, ITI	Project Updates Thu Jan 22	First plot-a-thon slides; logit lens or Neuronpedia explorations
3	Evaluation Methodology [slides] [dataset builder] [evaluation] [video] Creating evaluation datasets; measuring LLM behavior with cloze tasks, LLM-as-judge, model-written evaluations. Tue Jan 27 Read: Model-Written Evals, LLM-as-Judge, LAMA	Project Updates Thu Jan 29	Plot-a-thon: benchmark findings and model choices; create GitHub
4	Representation Geometry [slides] [puns] [video] What does a concept look like? PCA visualization, linear directions, and geometric structure. Tue Feb 3 Read: Geometry of Truth, Sentiment, Vector Arithmetic	Project Updates Thu Feb 5	Push interactive visualizations to project website
5	Causal Localization [slides] [colab] [video] Where are facts and functions computed? Causal tracing, activation patching, and function vectors. Tue Feb 10 Read: ROME, Function Vectors, Entity Tracking	Project Updates Thu Feb 12	Start Overleaf; causal mediation experiments
6	Probes: Is the information there? Training classifiers to decode concepts, plus methodological pitfalls. Tue Feb 17 Read: TCAV, Probing Control Tasks	Project Updates Thu Feb 19	Reproducible experiment infrastructure; trained concept probe
7	Attribution: Which input tokens matter? Integrated gradients, faithful attribution, and RAG analysis. Tue Feb 24 Read: Integrated Gradients, MIRAGE	Project Updates Thu Feb 26	Draft intro framing your research story; input attribution
—	Spring Break Mar 2–6		No class
8	Circuits: Reverse-engineering end-to-end algorithms: induction heads, ACDC, and automated circuit discovery. Tue Mar 10 Read: Induction Heads, ACDC, Faithfulness	Project Updates Thu Mar 12	Refine hypotheses; repeat experiments; circuits analysis
9	Training Dynamics & Model Editing: How circuits emerge during training; surgical fact editing with MEMIT. Tue Mar 17 Read: Grokking, MEMIT, In-Context Algebra	Project Updates Thu Mar 19	Triangulate: scale or diversity experiments
10	Human Understanding & Self-Description: Can interpretability help humans? Can models interpret themselves? Tue Mar 24 Read: Bridging the Human-AI Gap, Patchscopes, Neologism	Project Updates Thu Mar 26	More triangulation; finalize experiment results
11	How to Write a Paper Tue Mar 31	Peer Review Workshop Thu Apr 2	Introduction + methods draft for peer review
12	Plot-a-thon: Results Discussion Tue Apr 7	Peer Review Workshop Thu Apr 9	Complete paper draft for peer review
13	Guest Lecture Tue Apr 14	Paper Editing Workshop Thu Apr 16	Editing and refinement
14	Final Presentations Tue Apr 21		Final paper due Wed Apr 22

Office Hours

Name	Day	Time	Location
David	TBD	TBD	TBD
Nikhil	TBD	TBD	TBD
