Modelzoo

zhezhaoa edited this page Oct 1, 2023 · 47 revisions

With the help of UER, we pre-trained models with different properties (e.g. models based on different corpora, encoders, and targets). All pre-trained weights introduced in this section are in UER format and can be loaded by UER directly. More pre-trained weights will be released in the near future. Unless otherwise noted, Chinese pre-trained models use the BERT tokenizer with models/google_zh_vocab.txt as the vocabulary (the one used in the original BERT project), and models/bert/base_config.json is the default configuration file. Commonly used vocabulary and configuration files are included in the models/ folder, so users do not need to download them.

In addition, we use scripts/convert_xxx_from_uer_to_huggingface.py to convert pre-trained weights into the format supported by Huggingface Transformers, and upload them to the Huggingface model hub (uer). In the rest of this section, we provide download links for the pre-trained weights and show how to use them. Because of space constraints, more details of each pre-trained weight are discussed on its Huggingface model hub page. We provide the link to that page when we introduce the pre-trained weight.

Chinese RoBERTa Pre-trained Weights

This is the set of 24 Chinese RoBERTa weights. CLUECorpusSmall is used as the training corpus. Configuration files are in the models/bert/ folder. We only provide configuration files for the Tiny, Mini, Small, Medium, Base, and Large models. To load other models, we need to modify emb_size, feedforward_size, hidden_size, heads_num, and layers_num in the configuration file. Notice that emb_size = hidden_size, feedforward_size = 4 * hidden_size, and heads_num = hidden_size / 64. More details of these pre-trained weights are discussed here.
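The relations above can be sketched in a few lines of Python. This is only an illustration: the helper names derive_sizes and patch_config are hypothetical and not part of UER, and the sketch assumes that every other field of a shipped configuration file can be kept unchanged.

```python
import json


def derive_sizes(layers_num: int, hidden_size: int) -> dict:
    """Return the five size fields for one L/H combination.

    Uses the relations stated above: emb_size = hidden_size,
    feedforward_size = 4 * hidden_size, heads_num = hidden_size / 64.
    """
    if hidden_size % 64 != 0:
        raise ValueError("hidden_size must be a multiple of 64")
    return {
        "emb_size": hidden_size,
        "feedforward_size": 4 * hidden_size,
        "hidden_size": hidden_size,
        "heads_num": hidden_size // 64,
        "layers_num": layers_num,
    }


def patch_config(template: dict, layers_num: int, hidden_size: int) -> dict:
    """Copy a template config and overwrite only the five size fields."""
    patched = dict(template)
    patched.update(derive_sizes(layers_num, hidden_size))
    return patched


if __name__ == "__main__":
    # Size fields for the 6/512 weight, which has no shipped config file.
    print(json.dumps(derive_sizes(6, 512), indent=2))
```

In practice one would load an existing file such as models/bert/tiny_config.json with json.load, pass it to patch_config, and write the result to a new file that is then given to --config_path.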

The pre-trained Chinese weight links of different layers (L) and hidden sizes (H):

H=128 H=256 H=512 H=768
L=2 2/128 (Tiny) 2/256 2/512 2/768
L=4 4/128 4/256 (Mini) 4/512 (Small) 4/768
L=6 6/128 6/256 6/512 6/768
L=8 8/128 8/256 8/512 (Medium) 8/768
L=10 10/128 10/256 10/512 10/768
L=12 12/128 12/256 12/512 12/768 (Base)

Taking the Tiny weight as an example, we download it through the above link and put it in the models/ folder. We can either conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt --vocab_path models/google_zh_vocab.txt \
                      --dataset_path dataset.pt --processes_num 8 --data_processor mlm

python3 pretrain.py --dataset_path dataset.pt --pretrained_model_path models/cluecorpussmall_roberta_tiny_seq512_model.bin \
                    --vocab_path models/google_zh_vocab.txt --config_path models/bert/tiny_config.json \
                    --output_model_path models/output_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 5000 --save_checkpoint_steps 2500 --batch_size 64 \
                    --data_processor mlm --target mlm

or use it on a downstream classification dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_tiny_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt --config_path models/bert/tiny_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 3e-4 --epochs_num 8 --batch_size 64
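The pre-training command above is written for 8 GPUs (--world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7). The sketch below shows how the two flags could be adapted to another GPU count and how many samples a run would consume. It assumes a single machine with ranks numbered from 0 and assumes that --batch_size is counted per GPU; the helper names are hypothetical.

```python
def distributed_args(num_gpus: int) -> list:
    """Build the --world_size / --gpu_ranks arguments for one machine."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    ranks = [str(rank) for rank in range(num_gpus)]
    return ["--world_size", str(num_gpus), "--gpu_ranks", *ranks]


def samples_seen(total_steps: int, batch_size: int, num_gpus: int) -> int:
    """Samples consumed by a run, assuming batch_size is per GPU."""
    return total_steps * batch_size * num_gpus


if __name__ == "__main__":
    print(" ".join(distributed_args(4)))
    print(samples_seen(total_steps=5000, batch_size=64, num_gpus=8))
```

When fewer GPUs are used, total_steps or batch_size may need to be increased to process a comparable number of samples.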

In the fine-tuning stage, pre-trained models of different sizes usually require different hyper-parameters. The following example uses grid search to find the best hyper-parameters:

python3 finetune/run_classifier_grid.py --pretrained_model_path models/cluecorpussmall_roberta_tiny_seq512_model.bin \
                                        --vocab_path models/google_zh_vocab.txt \
                                        --config_path models/bert/tiny_config.json \
                                        --train_path datasets/book_review/train.tsv \
                                        --dev_path datasets/book_review/dev.tsv \
                                        --learning_rate_list 3e-5 1e-4 3e-4 --epochs_num_list 3 5 8 --batch_size_list 32 64

We can reproduce the experimental results reported here with the above grid search script.
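To make the size of this search concrete, the sketch below enumerates the combinations implied by the three lists passed to the grid search script. It is not the implementation of finetune/run_classifier_grid.py; the function names and the evaluate callback (which would return a dev-set score for one setting) are hypothetical.

```python
from itertools import product


def build_grid(learning_rates, epochs_nums, batch_sizes):
    """Return every (learning rate, epochs, batch size) combination."""
    return [
        {"learning_rate": lr, "epochs_num": ep, "batch_size": bs}
        for lr, ep, bs in product(learning_rates, epochs_nums, batch_sizes)
    ]


def best_setting(grid, evaluate):
    """Pick the setting with the highest score returned by evaluate."""
    return max(grid, key=evaluate)


# The same lists as in the command above: 3 * 3 * 2 = 18 fine-tuning runs.
GRID = build_grid([3e-5, 1e-4, 3e-4], [3, 5, 8], [32, 64])

if __name__ == "__main__":
    print(len(GRID))
```

Each combination corresponds to one full fine-tuning run, so the cost of the search grows with the product of the list lengths.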

Chinese RoBERTa-WWM Pre-trained Weights

This is the set of 7 Chinese RoBERTa-WWM weights. CLUECorpusSmall is used as the training corpus. Configuration files are in the models/bert/ folder. Whole word masking (WWM) performs better according to our experimental results. The details of the Xlarge pre-trained weight are discussed here. The details of the other pre-trained weights are discussed here.

The pre-trained Chinese weight links of different sizes:

Link
L=2/H=128 (Tiny)
L=4/H=256 (Mini)
L=4/H=512 (Small)
L=8/H=512 (Medium)
L=12/H=768 (Base)
L=24/H=1024 (Large)
L=36/H=1536 (Xlarge)

Taking the RoBERTa-WWM Tiny weight as an example, we download it through the above link and put it in the models/ folder. We can either conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt --vocab_path models/google_zh_vocab.txt \
                      --dataset_path dataset.pt --processes_num 8 --data_processor mlm

python3 pretrain.py --dataset_path dataset.pt --pretrained_model_path models/cluecorpussmall_roberta_wwm_tiny_seq512_model.bin \
                    --vocab_path models/google_zh_vocab.txt --config_path models/bert/tiny_config.json \
                    --output_model_path models/output_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 5000 --save_checkpoint_steps 2500 --batch_size 64 \
                    --data_processor mlm --target mlm

or use it on a downstream classification dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_tiny_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt --config_path models/bert/tiny_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 3e-4 --epochs_num 8 --batch_size 64

The following example uses grid search to find the best hyper-parameters for the RoBERTa-WWM model:

python3 finetune/run_classifier_grid.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_tiny_seq512_model.bin \
                                        --vocab_path models/google_zh_vocab.txt \
                                        --config_path models/bert/tiny_config.json \
                                        --train_path datasets/book_review/train.tsv \
                                        --dev_path datasets/book_review/dev.tsv \
                                        --learning_rate_list 3e-5 1e-4 3e-4 --epochs_num_list 3 5 8 --batch_size_list 32 64

We can reproduce the experimental results reported here with the above grid search script.

Chinese word-based RoBERTa Pre-trained Weights

This is the set of 5 Chinese word-based RoBERTa weights. CLUECorpusSmall is used as the training corpus. Configuration files are in the models/bert/ folder. Google sentencepiece is used as the tokenizer tool, with models/cluecorpussmall_spm.model as the sentencepiece model. Most Chinese pre-trained weights are based on Chinese characters. Compared with character-based models, word-based models are faster (because of the shorter sequence length) and perform better according to our experimental results. More details of these pre-trained weights are discussed here.
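The sequence-length argument can be illustrated with a toy comparison. The word segmentation below is written by hand and is only hypothetical; the pieces actually produced by models/cluecorpussmall_spm.model may differ.

```python
# Toy illustration of why word-based input is shorter than character-based input.
sentence = "我喜欢自然语言处理"

# Character-based tokenization: one token per Chinese character.
char_tokens = list(sentence)

# Hypothetical word-level segmentation (hand-written, not sentencepiece output).
word_tokens = ["我", "喜欢", "自然语言", "处理"]

# Sanity check: the word pieces must reassemble the original sentence.
assert "".join(word_tokens) == sentence

if __name__ == "__main__":
    print(len(char_tokens), len(word_tokens))
```

The encoder processes fewer positions for the same text, which is where the speed-up comes from.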

The pre-trained Chinese weight links of different sizes:

Link
L=2/H=128 (Tiny)
L=4/H=256 (Mini)
L=4/H=512 (Small)
L=8/H=512 (Medium)
L=12/H=768 (Base)

Taking the word-based Tiny weight as an example, we download it through the above link and put it in the models/ folder. We can either conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt --spm_model_path models/cluecorpussmall_spm.model \
                      --dataset_path dataset.pt --processes_num 8 --data_processor mlm

python3 pretrain.py --dataset_path dataset.pt --pretrained_model_path models/cluecorpussmall_word_roberta_tiny_seq512_model.bin \
                    --spm_model_path models/cluecorpussmall_spm.model --config_path models/bert/tiny_config.json \
                    --output_model_path models/output_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 5000 --save_checkpoint_steps 2500 --batch_size 64 \
                    --data_processor mlm --target mlm

or use it on a downstream classification dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_word_roberta_tiny_seq512_model.bin \
                                   --spm_model_path models/cluecorpussmall_spm.model \
                                   --config_path models/bert/tiny_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 3e-4 --epochs_num 8 --batch_size 64

The following example uses grid search to find the best hyper-parameters for the word-based model:

python3 finetune/run_classifier_grid.py --pretrained_model_path models/cluecorpussmall_word_roberta_tiny_seq512_model.bin \
                                        --spm_model_path models/cluecorpussmall_spm.model \
                                        --config_path models/bert/tiny_config.json \
                                        --train_path datasets/book_review/train.tsv \
                                        --dev_path datasets/book_review/dev.tsv \
                                        --learning_rate_list 3e-5 1e-4 3e-4 --epochs_num_list 3 5 8 --batch_size_list 32 64

We can reproduce the experimental results reported here with the above grid search script.

Chinese GPT-2 Pre-trained Weights

This is the set of Chinese GPT-2 pre-trained weights. Configuration files are in models/gpt2/ folder.

The link and detailed description (Huggingface model hub) of different pre-trained GPT-2 weights:

Model link Description link
CLUECorpusSmall GPT-2-xlarge https://huggingface.co/uer/gpt2-xlarge-chinese-cluecorpussmall
CLUECorpusSmall GPT-2-large https://huggingface.co/uer/gpt2-large-chinese-cluecorpussmall
CLUECorpusSmall GPT-2-medium https://huggingface.co/uer/gpt2-medium-chinese-cluecorpussmall
CLUECorpusSmall GPT-2 https://huggingface.co/uer/gpt2-chinese-cluecorpussmall
CLUECorpusSmall GPT-2-distil https://huggingface.co/uer/gpt2-distil-chinese-cluecorpussmall
Poem GPT-2 https://huggingface.co/uer/gpt2-chinese-poem
Couplet GPT-2 https://huggingface.co/uer/gpt2-chinese-couplet
Lyric GPT-2 https://huggingface.co/uer/gpt2-chinese-lyric
Ancient GPT-2 https://huggingface.co/uer/gpt2-chinese-ancient

Notice that extended vocabularies (models/google_zh_poem_vocab.txt and models/google_zh_ancient_vocab.txt) are used in the Poem and Ancient GPT-2 models. The CLUECorpusSmall GPT-2-distil model uses the models/gpt2/distil_config.json configuration file. models/gpt2/config.json is used for the other weights.
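The note above can be restated as a small lookup, which may help when scripting over several GPT-2 weights. This is only a sketch: the function files_for is hypothetical, the model names are the labels used in the table above, and the Huggingface model hub page of each weight remains the authoritative description.

```python
# Defaults taken from this page: the standard Chinese vocabulary and
# the general GPT-2 configuration file.
DEFAULT_VOCAB = "models/google_zh_vocab.txt"
DEFAULT_CONFIG = "models/gpt2/config.json"

# Exceptions listed in the note above.
VOCAB_OVERRIDES = {
    "Poem GPT-2": "models/google_zh_poem_vocab.txt",
    "Ancient GPT-2": "models/google_zh_ancient_vocab.txt",
}
CONFIG_OVERRIDES = {
    "CLUECorpusSmall GPT-2-distil": "models/gpt2/distil_config.json",
}


def files_for(model_name: str) -> dict:
    """Return the vocabulary and configuration paths for a GPT-2 weight."""
    return {
        "vocab_path": VOCAB_OVERRIDES.get(model_name, DEFAULT_VOCAB),
        "config_path": CONFIG_OVERRIDES.get(model_name, DEFAULT_CONFIG),
    }


if __name__ == "__main__":
    print(files_for("Poem GPT-2"))
```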

Taking the CLUECorpusSmall GPT-2-distil weight as an example, we download it through the above link and put it in the models/ folder. We can either conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt \
                      --vocab_path models/google_zh_vocab.txt \
                      --dataset_path dataset.pt --processes_num 8 \
                      --seq_length 128 --data_processor lm 

python3 pretrain.py --dataset_path dataset.pt \
                    --pretrained_model_path models/cluecorpussmall_gpt2_distil_seq1024_model.bin \
                    --vocab_path models/google_zh_vocab.txt \
                    --config_path models/gpt2/distil_config.json \
                    --output_model_path models/book_review_gpt2_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 10000 --save_checkpoint_steps 5000 --report_steps 1000 \
                    --learning_rate 5e-5 --batch_size 64

or use it on a downstream classification dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_gpt2_distil_seq1024_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/gpt2/distil_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 3e-5 --epochs_num 8 --batch_size 64

The GPT-2 model can be used for text generation. First, we create story_beginning.txt and enter the beginning of the text. Then we use scripts/generate_lm.py to generate text:

python3 scripts/generate_lm.py --load_model_path models/cluecorpussmall_gpt2_distil_seq1024_model.bin \
                               --vocab_path models/google_zh_vocab.txt \
                               --config_path models/gpt2/distil_config.json \
                               --test_path story_beginning.txt --prediction_path story_full.txt \
                               --seq_length 128
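The prompt file used above can also be created from a script. The sketch below assumes that the file should hold the beginning of the text on a single UTF-8 encoded line; the helper write_prompt and the sample sentence are hypothetical.

```python
from pathlib import Path


def write_prompt(path, beginning: str) -> Path:
    """Write the beginning of the text to a UTF-8 file on one line."""
    target = Path(path)
    # Strip surrounding whitespace and end the line with a single newline.
    target.write_text(beginning.strip() + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # The file name matches the --test_path argument in the command above.
    write_prompt("story_beginning.txt", "很久很久以前，")
```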

Chinese ALBERT Pre-trained Weights

This is the set of Chinese ALBERT pre-trained weights. Configuration files are in models/albert/ folder.

The link and detailed description (Huggingface model hub) of different pre-trained ALBERT weights:

Model link Description link
CLUECorpusSmall ALBERT-base https://huggingface.co/uer/albert-base-chinese-cluecorpussmall
CLUECorpusSmall ALBERT-large https://huggingface.co/uer/albert-large-chinese-cluecorpussmall

Taking the CLUECorpusSmall ALBERT-base weight as an example, we download it through the above link and put it in the models/ folder. The following example fine-tunes ALBERT-base on a downstream dataset and then runs inference with the fine-tuned model:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_albert_base_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt --config_path models/albert/base_config.json \
                                   --train_path datasets/book_review/train.tsv \
                                   --dev_path datasets/book_review/dev.tsv \
                                   --test_path datasets/book_review/test.tsv \
                                   --learning_rate 2e-5 --epochs_num 3 --batch_size 64

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/albert/base_config.json \
                                          --test_path datasets/book_review/test_nolabel.tsv \
                                          --prediction_path datasets/book_review/prediction.tsv \
                                          --labels_num 2

Chinese T5 Pre-trained Weights

This is the set of Chinese T5 pre-trained weights. Configuration files are in models/t5/ folder.

The link and detailed description (Huggingface model hub) of different pre-trained T5 weights:

Model link Description link
CLUECorpusSmall T5-small https://huggingface.co/uer/t5-small-chinese-cluecorpussmall
CLUECorpusSmall T5-base https://huggingface.co/uer/t5-base-chinese-cluecorpussmall

Taking the CLUECorpusSmall T5-small weight as an example, we download it through the above link and put it in the models/ folder. We can conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt \
                      --vocab_path models/google_zh_with_sentinel_vocab.txt \
                      --dataset_path dataset.pt \
                      --processes_num 8 --seq_length 128 \
                      --dynamic_masking --data_processor t5

python3 pretrain.py --dataset_path dataset.pt \
                    --pretrained_model_path models/cluecorpussmall_t5_small_seq512_model.bin \
                    --vocab_path models/google_zh_with_sentinel_vocab.txt \
                    --config_path models/t5/small_config.json \
                    --output_model_path models/book_review_t5_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 10000 --save_checkpoint_steps 5000 --report_steps 1000 \
                    --learning_rate 5e-4 --batch_size 64 \
                    --span_masking --span_geo_prob 0.3 --span_max_length 5

or use it on a downstream dataset:

python3 finetune/run_text2text.py --pretrained_model_path models/cluecorpussmall_t5_small_seq512_model.bin \
                                  --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                  --config_path models/t5/small_config.json \
                                  --train_path datasets/tnews_text2text/train.tsv \
                                  --dev_path datasets/tnews_text2text/dev.tsv \
                                  --seq_length 128 --tgt_seq_length 8 --learning_rate 3e-4 --epochs_num 3 --batch_size 32

python3 inference/run_text2text_infer.py --load_model_path models/finetuned_model.bin \
                                         --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                         --config_path models/t5/small_config.json \
                                         --test_path datasets/tnews_text2text/test_nolabel.tsv \
                                         --prediction_path datasets/tnews_text2text/prediction.tsv \
                                         --seq_length 128 --tgt_seq_length 8 --batch_size 32

Users can download the tnews dataset in text2text format from here.

Chinese T5-v1_1 Pre-trained Weights

This is the set of Chinese T5-v1_1 pre-trained weights. Configuration files are in models/t5-v1_1/ folder.

The link and detailed description (Huggingface model hub) of different pre-trained T5-v1_1 weights:

Model link Description link
CLUECorpusSmall T5-v1_1-small https://huggingface.co/uer/t5-v1_1-small-chinese-cluecorpussmall
CLUECorpusSmall T5-v1_1-base https://huggingface.co/uer/t5-v1_1-base-chinese-cluecorpussmall

Taking the CLUECorpusSmall T5-v1_1-small weight as an example, we download it through the above link and put it in the models/ folder. We can conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt \
                      --vocab_path models/google_zh_with_sentinel_vocab.txt \
                      --dataset_path dataset.pt \
                      --processes_num 8 --seq_length 128 \
                      --dynamic_masking --data_processor t5

python3 pretrain.py --dataset_path dataset.pt \
                    --pretrained_model_path models/cluecorpussmall_t5-v1_1_small_seq512_model.bin \
                    --vocab_path models/google_zh_with_sentinel_vocab.txt \
                    --config_path models/t5-v1_1/small_config.json \
                    --output_model_path models/book_review_t5-v1_1_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 10000 --save_checkpoint_steps 5000 --report_steps 1000 \
                    --learning_rate 5e-4 --batch_size 64 \
                    --span_masking --span_geo_prob 0.3 --span_max_length 5

or use it on a downstream dataset:

python3 finetune/run_text2text.py --pretrained_model_path models/cluecorpussmall_t5-v1_1_small_seq512_model.bin \
                                  --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                  --config_path models/t5-v1_1/small_config.json \
                                  --train_path datasets/tnews_text2text/train.tsv \
                                  --dev_path datasets/tnews_text2text/dev.tsv \
                                  --seq_length 128 --tgt_seq_length 8 --learning_rate 3e-4 --epochs_num 3 --batch_size 32

python3 inference/run_text2text_infer.py --load_model_path models/finetuned_model.bin \
                                         --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                         --config_path models/t5-v1_1/small_config.json \
                                         --test_path datasets/tnews_text2text/test_nolabel.tsv \
                                         --prediction_path datasets/tnews_text2text/prediction.tsv \
                                         --seq_length 128 --tgt_seq_length 8 --batch_size 32

PEGASUS Pre-trained Weights

This is the set of PEGASUS pre-trained weights. Configuration files are in models/pegasus/ folder.

The link and detailed description (Huggingface model hub) of PEGASUS weights:

Model link Description link
CLUECorpusSmall PEGASUS-base https://huggingface.co/uer/pegasus-base-chinese-cluecorpussmall
CLUECorpusSmall PEGASUS-large https://huggingface.co/uer/pegasus-large-chinese-cluecorpussmall

BART Pre-trained Weights

This is the set of BART pre-trained weights. Configuration files are in models/bart/ folder.

The link and detailed description (Huggingface model hub) of BART weights:

Model link Description link
CLUECorpusSmall BART-base https://huggingface.co/uer/bart-base-chinese-cluecorpussmall
CLUECorpusSmall BART-large https://huggingface.co/uer/bart-large-chinese-cluecorpussmall

Fine-tuned Chinese RoBERTa Weights

This is the set of fine-tuned Chinese RoBERTa weights. SBERT ChineseTextualInference NLI uses the models/sbert/base_config.json configuration file. The rest use the models/bert/base_config.json configuration file.

The link and detailed description (Huggingface model hub) of different fine-tuned RoBERTa weights:

Model link Description link
JD full sentiment classification https://huggingface.co/uer/roberta-base-finetuned-jd-full-chinese
JD binary sentiment classification https://huggingface.co/uer/roberta-base-finetuned-jd-binary-chinese
Dianping sentiment classification https://huggingface.co/uer/roberta-base-finetuned-dianping-chinese
Ifeng news topic classification https://huggingface.co/uer/roberta-base-finetuned-ifeng-chinese
Chinanews news topic classification https://huggingface.co/uer/roberta-base-finetuned-chinanews-chinese
CLUENER2020 NER https://huggingface.co/uer/roberta-base-finetuned-cluener2020-chinese
Extractive QA https://huggingface.co/uer/roberta-base-chinese-extractive-qa
ChineseTextualInference SBERT NLI https://huggingface.co/uer/sbert-base-chinese-nli

One can load these pre-trained models for pre-training, fine-tuning, and inference.

Chinese Pre-trained Weights Besides Transformer

This is the set of pre-trained weights for models other than Transformer.

The link and detailed description of different pre-trained weights:

Model link Configuration file Model details Training details
CLUECorpusSmall LSTM language model models/rnn_config.json --embedding word --remove_embedding_layernorm --encoder lstm --target lm steps: 500000, learning rate: 1e-3, batch size: 64*8 (the number of GPUs), sequence length: 256
CLUECorpusSmall GRU language model models/rnn_config.json --embedding word --remove_embedding_layernorm --encoder gru --target lm steps: 500000, learning rate: 1e-3, batch size: 64*8 (the number of GPUs), sequence length: 256
CLUECorpusSmall GatedCNN language model models/gatedcnn_9_config.json --embedding word --remove_embedding_layernorm --encoder gatedcnn --target lm steps: 500000, learning rate: 1e-4, batch size: 64*8 (the number of GPUs), sequence length: 256
CLUECorpusSmall ELMo models/birnn_config.json --embedding word --remove_embedding_layernorm --encoder bilstm --target bilm steps: 500000, learning rate: 5e-4, batch size: 64*8 (the number of GPUs), sequence length: 256

Chinese Pre-trained Weights from Other Organizations

Model link Description Description link
Google Chinese BERT-Base Configuration file: models/bert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/bert
Google Chinese ALBERT-Base Configuration file: models/albert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/albert
Google Chinese ALBERT-Large Configuration file: models/albert/large_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/albert
Google Chinese ALBERT-Xlarge Configuration file: models/albert/xlarge_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/albert
Google Chinese ALBERT-Xxlarge Configuration file: models/albert/xxlarge_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/albert
HFL Chinese BERT-wwm Configuration file: models/bert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/ymcui/Chinese-BERT-wwm
HFL Chinese BERT-wwm-ext Configuration file: models/bert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/ymcui/Chinese-BERT-wwm
HFL Chinese RoBERTa-wwm-ext Configuration file: models/bert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/ymcui/Chinese-BERT-wwm
HFL Chinese RoBERTa-wwm-large-ext Configuration file: models/bert/large_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer https://github.com/ymcui/Chinese-BERT-wwm

English Pre-trained Weights from Other Organizations

Model link Description Description link
English BERT-Base-uncased Configuration file: models/bert/base_config.json; Vocabulary: models/google_uncased_en_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/bert
English BERT-Base-cased Configuration file: models/bert/base_config.json; Vocabulary: models/google_cased_en_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/bert
English BERT-Large-uncased Configuration file: models/bert/large_config.json; Vocabulary: models/google_uncased_en_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/bert
English BERT-Large-cased Configuration file: models/bert/large_config.json; Vocabulary: models/google_cased_en_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/bert
English BERT-Large-WWM-uncased Configuration file: models/bert/large_config.json; Vocabulary: models/google_uncased_en_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/bert
English BERT-Large-WWM-cased Configuration file: models/bert/large_config.json; Vocabulary: models/google_cased_en_vocab.txt; Tokenizer: BertTokenizer https://github.com/google-research/bert
English RoBERTa-Base Configuration file: models/xlm-roberta/base_config.json; Vocabulary: models/huggingface_gpt2_vocab.txt models/huggingface_gpt2_merges.txt; Tokenizer: BPETokenizer https://huggingface.co/roberta-base
English RoBERTa-Large Configuration file: models/xlm-roberta/large_config.json; Vocabulary: models/huggingface_gpt2_vocab.txt models/huggingface_gpt2_merges.txt; Tokenizer: BPETokenizer https://huggingface.co/roberta-large