Training Data Extraction Attacks on Large Language Models: A Deeper Look into GPT-2 XL, GPT-2 IMDB, and LLaMA

Final Report: here

In this paper, we follow the approach of

Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel
USENIX Security Symposium, 2021
https://arxiv.org/abs/2012.07805

to reproduce the Training Data Extraction Attack with several ablations. In addition, we apply the attack to the GPT-2 IMDB model, a version of GPT-2 fine-tuned on a movie review dataset, and to the LLaMA 7B model, a general-purpose 7-billion-parameter LLM achieving performance on par with GPT-3. We choose these models to explore the effectiveness of the method on a model fine-tuned on a relatively small dataset and on a state-of-the-art model that is larger than any used in the original paper. We also develop and test a modification of the attack that uses DetectGPT, an approach for determining whether a given text came from a particular language model, in the hope of achieving performance comparable to the original methods.
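
As a rough illustration of the DetectGPT idea mentioned above, the sketch below computes a perturbation discrepancy: the log-probability of a text under a model minus the mean log-probability of slightly perturbed copies of that text. The `log_prob` and `perturb` callables, and the toy stand-ins used to exercise them, are hypothetical placeholders. DetectGPT itself relies on a mask-filling model to produce perturbations, and this sketch is not the code used in this repository.

```python
import random
from statistics import mean
from typing import Callable


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str, random.Random], str],
    n_perturbations: int = 20,
    seed: int = 0,
) -> float:
    """Return log p(text) minus the mean log p over perturbed copies of text."""
    rng = random.Random(seed)
    perturbed = [perturb(text, rng) for _ in range(n_perturbations)]
    return log_prob(text) - mean(log_prob(p) for p in perturbed)


# Toy stand-ins, assumptions for illustration only (not a real language model).
KNOWN_WORDS = {"the", "movie", "was", "great"}


def toy_log_prob(text: str) -> float:
    # Penalize every word outside a tiny "known" vocabulary by one unit.
    return -float(sum(word not in KNOWN_WORDS for word in text.split()))


def toy_perturb(text: str, rng: random.Random) -> str:
    # Replace one randomly chosen word with an out-of-vocabulary token.
    words = text.split()
    words[rng.randrange(len(words))] = "zzz"
    return " ".join(words)


score = perturbation_discrepancy("the movie was great", toy_log_prob, toy_perturb)
```

A larger discrepancy means the original text sits at a sharper local peak of the model's log-probability than its perturbed neighbors, which is the signal DetectGPT relies on.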

View Detection Sample Results: The top 100 samples for each model and sorting method, along with whether each sample was a hit, can be viewed in the files named "analysis_[MODEL_NAME]_[SORTING_METHOD].txt" (for example, analysis_gpt2xl_perp.txt).

The following is from the original paper's README.

This repository contains code for extracting training data from GPT-2, following the approach outlined in the paper cited above.

WARNING: The experiments in our paper relied on different non-public codebases, and also involved a large amount of manual labor. The code in this repository is thus not meant to exactly reproduce the paper's results, but instead to illustrate the paper's approach and to help others perform similar experiments.
The code in this repository has not been tested at the scale considered in the paper (600,000 generated samples) and might find memorized content at a lower (or higher) rate!

Installation

You will need transformers, pytorch and tqdm. The code was tested with transformers v3.0.2 and torch v1.5.1.

Extracting Data

Simply run

python3 extraction.py --N 1000 --batch-size 10

to generate 1000 samples with GPT-2 (XL). The samples are generated with top-k sampling (k=40) and an empty prompt.
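
For readers unfamiliar with top-k sampling, the following is a minimal sketch of a single sampling step using only the standard library. It keeps the k highest-scoring tokens, renormalizes them with a softmax, and draws one. The function name `top_k_sample` and the toy logits are illustrative assumptions; the actual script presumably delegates this to the transformers library.

```python
import math
import random


def top_k_sample(logits, k=40, rng=None):
    """Sample one token index from the k highest-scoring logits."""
    if rng is None:
        rng = random.Random()
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the kept logits (subtract the max for stability).
    max_logit = logits[top[0]]
    weights = [math.exp(logits[i] - max_logit) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy vocabulary of four tokens; with k=2 only indices 1 and 3 can be drawn.
toy_logits = [0.0, 5.0, 1.0, 4.0]
rng = random.Random(0)
draws = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(100)]
```

Truncating to the top k tokens removes the low-probability tail, which tends to keep generations fluent while still allowing some diversity between samples.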

The generated samples are ranked according to four membership inference metrics introduced in our paper:

The log perplexity of the GPT-2 (XL) model.
The ratio of the log perplexities of the GPT-2 (XL) model and the GPT-2 (S) model.
The ratio of the log perplexities for the generated sample and the same sample in lower-case letters.
The ratio of the log perplexity of GPT-2 (XL) and the sample's entropy estimated by Zlib.
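
The four metrics above can be sketched as follows, assuming the per-token log-probabilities of a sample are already available from each model. The helper names, the use of the zlib-compressed size in bytes as the entropy estimate, and the sign and orientation of each score (chosen here so that a larger score means "more suspicious") are assumptions for illustration, not necessarily what extraction.py does.

```python
import zlib


def log_perplexity(token_log_probs):
    """Mean negative log-likelihood per token (natural log)."""
    return -sum(token_log_probs) / len(token_log_probs)


def zlib_entropy(text):
    """Size in bytes of the zlib-compressed UTF-8 text, a crude entropy proxy."""
    return len(zlib.compress(text.encode("utf-8")))


def membership_scores(text, lp_xl, lp_s, lp_lower):
    """Combine log perplexities into four scores (larger = more suspicious).

    lp_xl:    log perplexity of the sample under the large model
    lp_s:     log perplexity of the sample under the small model
    lp_lower: log perplexity of the lower-cased sample under the large model
    """
    return {
        "perplexity": -lp_xl,
        "small_vs_xl": lp_s / lp_xl,
        "lower_vs_original": lp_lower / lp_xl,
        "zlib_vs_xl": zlib_entropy(text) / lp_xl,
    }


# Hypothetical numbers: the large model finds the text much easier than the small one.
scores = membership_scores("An example generated sample.", lp_xl=2.0, lp_s=4.0, lp_lower=3.0)
```

The ratio-based scores are meant to filter out text that is simply easy for any model (or trivially compressible), leaving samples that the large model finds unusually likely.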

The top 10 samples according to each metric are printed out. These samples are likely to contain verbatim text from the GPT-2 training data.
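
Selecting the top samples for one metric amounts to a sort by score. The sketch below assumes one score per sample, with larger scores ranked first; the function name `top_n` and the example data are hypothetical.

```python
def top_n(samples, scores, n=10):
    """Return the n (sample, score) pairs with the highest scores, best first."""
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return [(samples[i], scores[i]) for i in order[:n]]


# Hypothetical generated samples and scores for a single metric.
example_samples = ["sample a", "sample b", "sample c"]
example_scores = [0.1, 0.9, 0.5]
best = top_n(example_samples, example_scores, n=2)
for rank, (text, score) in enumerate(best, start=1):
    print(f"{rank}: score={score:.3f} {text!r}")
```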

Conditioning on Internet text

In our paper, we found that prompting GPT-2 with small snippets of text taken from the Web increased the chance of the model generating memorized content.

To reproduce this attack, first download a slice of the Common Crawl dataset:

./download_cc.sh

This will download a sample of the Crawl from May 2021 (~350 MB) to a file called commoncrawl.warc.wet.

Then, we can run the extraction attack with Internet prompts:

python3 extraction.py --N 1000 --internet-sampling --wet-file commoncrawl.warc.wet
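
Conceptually, each Internet prompt is a short window cut from the downloaded text at a random position. A minimal sketch of that step is shown here, assuming the crawl has already been read into one string. The function name `sample_prompt`, the character-based window, and its default length are assumptions; how extraction.py parses the WET file and how long its prompts are is not shown in this README.

```python
import random


def sample_prompt(corpus, length=100, rng=None):
    """Return a random contiguous snippet of `length` characters from corpus."""
    if len(corpus) < length:
        raise ValueError("corpus is shorter than the requested prompt length")
    if rng is None:
        rng = random.Random()
    start = rng.randrange(len(corpus) - length + 1)
    return corpus[start:start + length]


# Stand-in for text extracted from commoncrawl.warc.wet.
toy_corpus = "The quick brown fox jumps over the lazy dog. " * 20
prompt = sample_prompt(toy_corpus, length=50, rng=random.Random(0))
```

The snippet would then be tokenized and passed to the model as the generation prefix in place of the empty prompt.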

Sample outputs

Some interesting data that we extracted from GPT-2 can be found in Samples.md.

Note that these were found among 600,000 generated samples. If you generate a much smaller number of samples (10,000 for example), you will be less likely to find memorized content.

Citation

If this code is useful in your research, you are encouraged to cite our academic paper:

@inproceedings{carlini21extracting,
  author = {Carlini, Nicholas and Tramer, Florian and Wallace, Eric and Jagielski, Matthew and Herbert-Voss, Ariel and Lee, Katherine and Roberts, Adam and Brown, Tom and Song, Dawn and Erlingsson, Ulfar and Oprea, Alina and Raffel, Colin},
  title = {Extracting Training Data from Large Language Models},
  booktitle = {USENIX Security Symposium},
  year = {2021},
  howpublished = {arXiv preprint arXiv:2012.07805},
  url = {https://arxiv.org/abs/2012.07805}
}

brian-lou/Training-Data-Extraction-Attack-on-LLMs
