Dovonon et al., 2024

Setting the record straight on transformer oversmoothing

Document ID
1238794199568978372
Author
Dovonon G
Bronstein M
Kusner M
Publication year
2024
Publication venue
arXiv preprint arXiv:2401.04301

Snippet

Transformer-based models have recently become wildly successful across a diverse set of domains. At the same time, recent work has shown empirically and theoretically that Transformers are inherently limited. Specifically, they argue that as model depth increases …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/04 Architectures, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/12 Computer systems based on biological models using genetic models
    • G06N3/126 Genetic algorithms, i.e. information processing using digital simulations of the genetic system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6232 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G06K9/6247 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models
    • G06N5/04 Inference methods or devices

Similar Documents

Publication, Title
Dovonon et al. Setting the record straight on transformer oversmoothing
Touvron et al. Going deeper with image transformers
Wu et al. Incremental learning via rate reduction
Tirer et al. Perturbation analysis of neural collapse
Zanette et al. Design of experiments for stochastic contextual linear bandits
Karakida et al. Pathological spectra of the Fisher information metric and its variants in deep neural networks
Zając et al. Prediction error-based classification for class-incremental learning
Zhang et al. Learning to search efficient DenseNet with layer-wise pruning
Aikawa et al. Improving the efficiency of training physics-informed neural networks using active learning
Aich et al. Efficient controllable multi-task architectures
Lee et al. Adaptive network sparsification with dependent variational beta-Bernoulli dropout
Simon et al. Towards a robust differentiable architecture search under label noise
Nishiyama et al. Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions
Back Radial basis functions
Witzgall Deep Rapid Class Augmentation; A New Progressive Learning Approach that Eliminates the Issue of Catastrophic Forgetting
Müller et al. Gain estimation of linear dynamical systems using Thompson sampling
Tanaka et al. Adaptive kernel principal components tracking
Abiyev et al. Differential evaluation learning of fuzzy wavelet neural networks for stock price prediction
van Wyk et al. Analysis of activation functions for particle swarm optimised feedforward neural networks
Wang et al. Incremental online learning of randomized neural network with forward regularization
Dash et al. Gold price prediction using an evolutionary pi-sigma neural network
Jie et al. Differentiable Neural Architecture Search with Morphism-based Transformable Backbone Architectures
Lee et al. Adaptive network sparsification via dependent variational beta-Bernoulli dropout
Júnior et al. Data-driven fuzzy modelling methodologies for multivariable nonlinear systems
Joshi et al. Design of Interacting Particle Systems for Fast Linear Quadratic RL