Setting the record straight on transformer oversmoothing
Dovonon et al., 2024
- Document ID
- 1238794199568978372
- Author
- Dovonon G
- Bronstein M
- Kusner M
- Publication year
- 2024
- Publication venue
- arXiv preprint arXiv:2401.04301
Snippet
Transformer-based models have recently become wildly successful across a diverse set of domains. At the same time, recent work has shown empirically and theoretically that Transformers are inherently limited. Specifically, they argue that as model depth increases …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Dovonon et al. | Setting the record straight on transformer oversmoothing | |
| Touvron et al. | Going deeper with image transformers | |
| Wu et al. | Incremental learning via rate reduction | |
| Tirer et al. | Perturbation analysis of neural collapse | |
| Zanette et al. | Design of experiments for stochastic contextual linear bandits | |
| Karakida et al. | Pathological spectra of the fisher information metric and its variants in deep neural networks | |
| Zając et al. | Prediction error-based classification for class-incremental learning | |
| Zhang et al. | Learning to search efficient densenet with layer-wise pruning | |
| Aikawa et al. | Improving the efficiency of training physics-informed neural networks using active learning | |
| Aich et al. | Efficient controllable multi-task architectures | |
| Lee et al. | Adaptive network sparsification with dependent variational beta-bernoulli dropout | |
| Simon et al. | Towards a robust differentiable architecture search under label noise | |
| Nishiyama et al. | Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions | |
| Back | Radial basis functions | |
| Witzgall | Deep Rapid Class Augmentation; A New Progressive Learning Approach that Eliminates the Issue of Catastrophic Forgetting | |
| Müller et al. | Gain estimation of linear dynamical systems using thompson sampling | |
| Tanaka et al. | Adaptive kernel principal components tracking | |
| Abiyev et al. | Differential evaluation learning of fuzzy wavelet neural networks for stock price prediction | |
| van Wyk et al. | Analysis of activation functions for particle swarm optimised feedforward neural networks | |
| Wang et al. | Incremental online learning of randomized neural network with forward regularization | |
| Dash et al. | Gold price prediction using an evolutionary pi-sigma neural network | |
| Jie et al. | Differentiable Neural Architecture Search with Morphism-based Transformable Backbone Architectures | |
| Lee et al. | Adaptive network sparsification via dependent variational beta-bernoulli dropout | |
| Júnior et al. | Data-driven fuzzy modelling methodologies for multivariable nonlinear systems | |
| Joshi et al. | Design of Interacting Particle Systems for Fast Linear Quadratic RL |