Memorization or Generalization? Exploring Transformer-based Large Language Models and, possibly, novel approaches
Transformer-based Large Language Models demonstrate extraordinary capabilities and are therefore changing how the ML/NLP/NN communities conduct research. Entire lines of research are being neglected because Transformer-based LLMs appear to be the ultimate solution. However, it is already emerging that a large part of the capabilities of LLMs depends on their ability to memorize. Moreover, the claim that deep neural networks need to memorize long-tailed data to achieve near-optimal generalization error has attracted considerable discussion. In this talk, we report our experience in the flourishing research area of LLMs, exploring how these models memorize and how they generalize from training data.
- Speaker: Fabio Massimo Zanzotto
- Venue: Meston G05