CS 7180 (AI Special Topics): Neural Mechanics

Northeastern University, Spring 2026

David Bau
Instructor
Nikhil Prakash

Modern AI systems are powerful but opaque: even their creators do not fully understand what the billions of artificial neurons are doing inside. This class teaches methods for probing neural networks to uncover what concepts they have learned and where those concepts are encoded.

Students work in interdisciplinary teams, each made up of a domain-expert PhD student, one or more CS/ML graduate students, and a Bau Lab mentor. Each team aims to produce a publication-quality research paper on how LLMs encode concepts from a field outside CS, an area often neglected in interpretability research, targeting venues such as NeurIPS, ICLR, or ICML. Prerequisites: linear algebra, probability, deep learning, Python, PyTorch.

Time: Tuesday 11:45 am – 1:25 pm, Thursday 2:50 pm – 4:30 pm
Location: Hayden Hall 321

Syllabus

This schedule is provisional and subject to change.
Each Tuesday or Thursday session is listed with its topic; assignments and notes follow on the indented line beneath it.

Week 0
  Thu Jan 8: Introduction. Course structure, goals, and introduction to mechanistic interpretability.
    Form teams; establish team Google Drive; brainstorm 2-3 candidate concepts

Week 1
  Tue Jan 13: Foundations. Logit lens, intermediate representations, and the vocabulary of mechanistic interpretability.
    Read: Primer, Logit Lens, Latent Language
  Thu Jan 15: Project Pitches
    Hand in pitch Google Doc; each team presents their pitch

Week 2
  Tue Jan 20: Steering. Controlling model behavior by manipulating distributed neural representation vectors.
    Read: Piantadosi, Superposition, ITI
  Thu Jan 22: Project Updates
    First plot-a-thon slides; logit lens or Neuronpedia explorations

Week 3
  Tue Jan 27: Evaluation Methodology. Creating evaluation datasets; measuring LLM behavior with cloze tasks, LLM-as-judge, model-written evaluations.
    Read: Model-Written Evals, LLM-as-Judge, LAMA
  Thu Jan 29: Project Updates
    Plot-a-thon: benchmark findings and model choices; create GitHub

Week 4
  Tue Feb 3: Representation Geometry. What does a concept look like? PCA visualization, linear directions, and geometric structure.
    Read: Geometry of Truth, Sentiment, Vector Arithmetic
  Thu Feb 5: Project Updates
    Push interactive visualizations to project website

Week 5
  Tue Feb 10: Causal Localization. Where are facts and functions computed? Causal tracing, activation patching, and function vectors.
    Read: ROME, Function Vectors, Entity Tracking
  Thu Feb 12: Project Updates
    Start Overleaf; causal mediation experiments

Week 6
  Tue Feb 17: Probes. Is the information there? Training classifiers to decode concepts, plus methodological pitfalls.
    Read: TCAV, Probing Control Tasks
  Thu Feb 19: Project Updates
    Reproducible experiment infrastructure; trained concept probe

Week 7
  Tue Feb 24: Attribution. Which input tokens matter? Integrated gradients, faithful attribution, and RAG analysis.
    Read: Integrated Gradients, MIRAGE
  Thu Feb 26: Project Updates
    Draft intro framing your research story; input attribution

Spring Break
  Mar 2–6: No class

Week 8
  Tue Mar 10: Circuits. Reverse-engineering end-to-end algorithms: induction heads, ACDC, and automated circuit discovery.
    Read: Induction Heads, ACDC, Faithfulness
  Thu Mar 12: Project Updates
    Refine hypotheses; repeat experiments; circuits analysis

Week 9
  Tue Mar 17: Training Dynamics & Model Editing. How circuits emerge during training; surgical fact editing with MEMIT.
    Read: Grokking, MEMIT, In-Context Algebra
  Thu Mar 19: Project Updates
    Triangulate: scale or diversity experiments

Week 10
  Tue Mar 24: Human Understanding & Self-Description. Can interpretability help humans? Can models interpret themselves?
    Read: Bridging the Human-AI Gap, Patchscopes, Neologism
  Thu Mar 26: Project Updates
    More triangulation; finalize experiment results

Week 11
  Tue Mar 31: How to Write a Paper
  Thu Apr 2: Peer Review Workshop
    Introduction + methods draft for peer review

Week 12
  Tue Apr 7: Plot-a-thon: Results Discussion
  Thu Apr 9: Peer Review Workshop
    Complete paper draft for peer review

Week 13
  Tue Apr 14: Guest Lecture
  Thu Apr 16: Paper Editing Workshop
    Editing and refinement

Week 14
  Tue Apr 21: Final Presentations
    Final paper due Wed Apr 22

Grading

Office Hours

Names Day Time Location
David TBD TBD TBD
Nikhil TBD TBD TBD

Supplementary Materials