Dovonon et al., 2024

Setting the record straight on transformer oversmoothing

Document ID: 1238794199568978372
Author: Dovonon G; Bronstein M; Kusner M
Publication year: 2024
Publication venue: arXiv preprint arXiv:2401.04301

Snippet

Transformer-based models have recently become wildly successful across a diverse set of domains. At the same time, recent work has shown empirically and theoretically that Transformers are inherently limited. Specifically, they argue that as model depth increases …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/08—Learning methods
          - G06N3/04—Architectures, e.g. interconnection topology
          - G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/04—Inference methods or devices
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
            - G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
              - G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis

Similar Documents

Publication	Publication Date	Title
Touvron et al.	2021	Going deeper with image transformers
Wu et al.	2021	Incremental learning via rate reduction
Tirer et al.	2023	Perturbation analysis of neural collapse
Zanette et al.	2021	Design of experiments for stochastic contextual linear bandits
Karakida et al.	2019	Pathological spectra of the Fisher information metric and its variants in deep neural networks
Zając et al.	2023	Prediction error-based classification for class-incremental learning
Zhang et al.	2020	Learning to search efficient DenseNet with layer-wise pruning
Aikawa et al.	2024	Improving the efficiency of training physics-informed neural networks using active learning
Aich et al.	2023	Efficient controllable multi-task architectures
Lee et al.	2018	Adaptive network sparsification with dependent variational beta-Bernoulli dropout
Simon et al.	2022	Towards a robust differentiable architecture search under label noise
Nishiyama et al.	2025	Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions
Back	2018	Radial basis functions
Witzgall	2021	Deep Rapid Class Augmentation: A New Progressive Learning Approach that Eliminates the Issue of Catastrophic Forgetting
Müller et al.	2019	Gain estimation of linear dynamical systems using Thompson sampling
Tanaka et al.	2012	Adaptive kernel principal components tracking
Abiyev et al.	2012	Differential evaluation learning of fuzzy wavelet neural networks for stock price prediction
van Wyk et al.	2016	Analysis of activation functions for particle swarm optimised feedforward neural networks
Wang et al.	2024	Incremental online learning of randomized neural network with forward regularization
Dash et al.	2018	Gold price prediction using an evolutionary pi-sigma neural network
Jie et al.	2021	Differentiable Neural Architecture Search with Morphism-based Transformable Backbone Architectures
Lee et al.	2018	Adaptive network sparsification via dependent variational beta-Bernoulli dropout
Júnior et al.	2018	Data-driven fuzzy modelling methodologies for multivariable nonlinear systems
Joshi et al.	2024	Design of Interacting Particle Systems for Fast Linear Quadratic RL