Possibly useful materials and tutorials for learning RWKV.
RWKV: Parallelizable RNN with Transformer-level LLM Performance.
- 🌟 (2023-05) RWKV: Reinventing RNNs for the Transformer Era (arXiv)
- (2023-03) Resurrecting Recurrent Neural Networks for Long Sequences (arXiv)
- (2023-02) SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks (arXiv)
- (2022-08) Simplified State Space Layers for Sequence Modeling (ICLR 2023)
- 🌟 (2021-05) An Attention Free Transformer (arXiv)
- (2021-10) Efficiently Modeling Long Sequences with Structured State Spaces (ICLR 2022)
- (2020-08) Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (ICML 2020)
- (2018) Parallelizing Linear Recurrent Neural Nets Over Sequence Length (ICLR 2018)
- (2017-09) Simple Recurrent Units for Highly Parallelizable Recurrence (EMNLP 2018)
- (2017-10) MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks (NeurIPS 2017)
- (2017-06) Attention Is All You Need (NeurIPS 2017)
- (2016-11) Quasi-Recurrent Neural Networks (ICLR 2017)
- Introducing RWKV - An RNN with the advantages of a transformer (Hugging Face)
- Now that we have the Transformer framework, can RNNs be discarded entirely? (Zhihu)
- What is the simplest effective form of an RNN? (Zhihu)
- 🌟 The RNN/CNN duality of RWKV (Zhihu)
- Does an RNN's hidden layer need nonlinearity? (Zhihu)
- Google's new work tries to "resurrect" RNNs: can RNNs shine again? (苏剑林 / Su Jianlin)
- 🌟 How the RWKV language model works (Johan Sokrates Wind)
- 🌟 The RWKV language model: An RNN with the advantages of a transformer (Johan Sokrates Wind)
- The Unreasonable Effectiveness of Recurrent Neural Networks (Andrej Karpathy's blog)