Building & training a transformer on transcripts of the first 325 episodes of the Lex Fridman Podcast so it can answer questions.
The text is encoded with byte pair encoding (BPE) to get a vocabulary of 1,000 tokens.
After encoding, the number of tokens is roughly 60% of the original character count. A sketch of how such a vocabulary can be trained follows the example below.
Here's an example of the encoding process:
tokens = encode("I think this is going to be awesome.")
>>>
tensor([360, 237, 153, 61, 158, 61, 158, 253, 194, 186, 280, 53, 75, 169,
67, 183, 11], device='cuda:0')
len("I think this is going to be awesome.") # 36
len(tokens) # 17
decode(tokens)
>>>
"I think this is going to be awesome."
It's not very good yet, but it can mimic some English (a sketch of the top-k sampling behind prompt_model follows the sample output below).
prompt = "What do you think about language models?"
answer = prompt_model(model, prompt, max_new_tokens=800, topk=2)
print(answer)
>>>
I think that the sort that.
But know?
And there's a lot one the because the but the comple to the of the somether and of comple
of of the because a look, the so the blange,
but I don't some the sort of an and that the be there any had the to,
but I'm unders to don't there there to the some of the sorther.
And that the some that the bractive,
but that.
But the because actory the be the because this to that start of the some the call the of the
and there's they're going the be exconce,
the same that the some to through an that and of it
of they're good, when the ARLOL the good the bedher a conver of of a conver the be of the see
of they're good on That think to, I don't going of,
the can the say, they like,
they they world, you can toper one of the becople
freed that the sorld?
Yeah, they
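With topk=2, only the two most likely next tokens are ever considered at each step, which partly explains the repetitive output. A hedged sketch of what a generation function like prompt_model can look like: the signature, block_size, and the assumption that the model returns raw logits are all illustrative.
import torch

@torch.no_grad()
def prompt_model(model, prompt, max_new_tokens=800, topk=2, block_size=256):
    # illustrative sketch: model is assumed to return logits of shape (batch, time, vocab_size)
    idx = encode(prompt).unsqueeze(0)                 # (1, T) token ids
    for _ in range(max_new_tokens):
        logits = model(idx[:, -block_size:])          # crop context to the last block_size tokens
        logits = logits[:, -1, :]                     # keep only the final position
        v, _ = torch.topk(logits, topk)
        logits[logits < v[:, [-1]]] = float("-inf")   # mask everything outside the top k
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)        # append the sampled token and continue
    return decode(idx[0])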
You can find my notes on the implementation details here: 🤖 Transformer blogpost.
The implementation is based on the "Attention Is All You Need" paper and the "Let's build GPT" tutorial by Andrej Karpathy.
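As an illustration of the core building block from those sources, here is a minimal single head of causal self-attention; the names and dimensions are illustrative, not necessarily the exact module used in this project.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionHead(nn.Module):
    """One head of masked (causal) self-attention, as in decoder-only GPT models."""
    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                                         # x: (batch, time, n_embd)
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5       # scaled dot-product scores
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))  # causal mask
        wei = F.softmax(wei, dim=-1)
        return wei @ v                                            # (batch, time, head_size)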
The transcribed subtitles for the first 325 episodes of the Lex Fridman Podcast come from Andrej Karpathy's Lexicap project, which used OpenAI's Whisper model to transcribe them. I cleaned the data with some regular expressions to get one big corpus of text for training the transformer model.
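The exact cleanup steps aren't documented here, but a hypothetical example of this kind of regex-based cleaning (stripping WebVTT-style subtitle timestamps and collapsing whitespace) could look like this:
import re

def clean_transcript(raw):
    # hypothetical cleanup: drop timestamp markers like "00:01:23.456 --> 00:01:25.000"
    text = re.sub(r"\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}", " ", raw)
    text = re.sub(r"\s+", " ", text)                  # collapse runs of whitespace
    return text.strip()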
The model was trained for ~5 hours on a GPU.
Vaswani et al.: Attention Is All You Need - Link
Andrej Karpathy: Let's build GPT: from scratch, in code, spelled out - Link
Rasa: Rasa Algorithm Whiteboard - Transformers & Attention 1: Self Attention - Link
Thumbnail: Link
AI Coffee Break with Letitia: Positional embeddings in transformers EXPLAINED - Demystifying positional encodings - Link