Homogenized Transformers

Date/time
February 18, 2026
16:45 - 17:45

Speaker
Hugo Koubbi (Sorbonne université)

Event category
Séminaire des doctorants


Abstract

We study the residual stream of multi-head Transformers in which the attention weights are i.i.d.\ random matrices across layers and heads. We identify critical scaling laws linking the depth $L$, the residual scale $\eta$, and the number of heads $H$, and show that different joint limits yield distinct homogenized effective models. To formalize these limits, we leverage the theory of stochastic modified equations. We apply this framework to Transformers at initialization and derive effective dynamics that clarify the roles of additional parameters, including the inverse temperature $\beta$, the embedding dimension $d$, and the context length $n$.
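The setup described in the abstract can be sketched numerically. Below is a minimal, hypothetical simulation of a residual stream updated by $H$ attention heads with i.i.d.\ random weight matrices across layers and heads; the parameter names ($L$, $H$, $\eta$, $\beta$, $d$, $n$) follow the abstract, but the precise attention parametrization and the choice of residual scaling $\eta = 1/\sqrt{L}$ are illustrative assumptions, not the talk's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 16, 8             # embedding dimension, context length
L, H = 64, 4             # depth, number of heads
eta = 1.0 / np.sqrt(L)   # residual scale (one candidate critical scaling)
beta = 1.0               # inverse temperature

def softmax(a, axis=-1):
    """Row-wise softmax, shifted for numerical stability."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

# Token embeddings at initialization.
X = rng.standard_normal((n, d)) / np.sqrt(d)

for _ in range(L):
    update = np.zeros_like(X)
    for _ in range(H):
        # Fresh i.i.d. attention and value matrices per layer and head.
        W = rng.standard_normal((d, d)) / np.sqrt(d)
        V = rng.standard_normal((d, d)) / np.sqrt(d)
        A = softmax(beta * X @ W @ X.T)  # (n, n) attention weights
        update += A @ X @ V
    # Residual update, averaged over heads and scaled by eta.
    X = X + eta * update / H

print(float(np.linalg.norm(X)))  # size of the final residual stream
```

Varying the joint scaling of `L`, `eta`, and `H` in such a simulation is one way to probe empirically which limits produce nondegenerate effective dynamics.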