Curl is a framework for Privacy-Preserving Machine Learning (PPML) that builds on top of CrypTen and PyTorch. CrypTen relies on expensive polynomial approximations to evaluate non-linear functions such as logarithm and square root. In contrast, Curl approximates non-linearities with lookup tables (LUTs) encoded with Discrete Wavelet Transforms (DWT), which results in faster evaluation and better approximations.
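As a rough plaintext illustration of the wavelet-encoded LUT idea, the sketch below compresses a table of logarithm values by keeping only the Haar approximation coefficients, then answers a query by dropping the low bits of the table index. This is not Curl's secure protocol or its API: the function names (`build_lut`, `haar_compress`, `lookup`), the interval, and the bit widths are all assumptions chosen for the example, and the averaging form of the Haar transform is used for simplicity.

```python
import math


def build_lut(func, lo, hi, bits):
    """Sample `func` at the midpoints of 2**bits equal cells covering [lo, hi)."""
    size = 1 << bits
    step = (hi - lo) / size
    return [func(lo + (i + 0.5) * step) for i in range(size)]


def haar_compress(table, levels):
    """Keep only the Haar approximation coefficients after `levels` steps.

    Averaging form of the Haar transform: each step replaces adjacent
    pairs by their mean, halving the table. Detail coefficients are dropped.
    The table length is assumed to be divisible by 2**levels.
    """
    coeffs = list(table)
    for _ in range(levels):
        coeffs = [(coeffs[i] + coeffs[i + 1]) / 2 for i in range(0, len(coeffs), 2)]
    return coeffs


def lookup(compressed, x, lo, hi, bits, levels):
    """Map x to a full-resolution index, then drop the `levels` low bits."""
    size = 1 << bits
    index = int((x - lo) / (hi - lo) * size)
    index = min(max(index, 0), size - 1)  # clamp to the table range
    return compressed[index >> levels]


# Hypothetical parameters: log on [1, 9), 1024 samples compressed to 64 entries.
LO, HI, BITS, LEVELS = 1.0, 9.0, 10, 4
full_table = build_lut(math.log, LO, HI, BITS)
small_table = haar_compress(full_table, LEVELS)

if __name__ == "__main__":
    x = 3.7
    approx = lookup(small_table, x, LO, HI, BITS, LEVELS)
    print("table sizes:", len(full_table), "->", len(small_table))
    print("absolute error at x = 3.7:", abs(approx - math.log(x)))
```

The compressed table is 16 times smaller than the original, and each entry is the mean of the 16 samples it replaces. In a secure setting the index would be secret-shared and the lookup performed obliviously, which this sketch does not attempt to model.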
With this approach, Curl can evaluate Large Language Models (LLMs) such as GPT-2, GPT Neo, and BERT (tiny, base, large). Curl's goal and model are similar to CrypTen's:
Its goal is to make secure computing techniques accessible to Machine Learning practitioners. It currently implements Secure Multiparty Computation as its secure computing backend and offers three main benefits to ML researchers:
It is machine learning first. The framework presents the protocols via a CrypTensor object that looks and feels exactly like a PyTorch Tensor. This allows the user to use automatic differentiation and neural network modules akin to those in PyTorch.
CrypTen is library-based. It implements a tensor library just as PyTorch does. This makes it easier for practitioners to debug, experiment on, and explore ML models.
The framework is built with real-world challenges in mind. CrypTen does not scale back or oversimplify the implementation of the secure protocols.
Curl will appear in the proceedings of the Conference on Applied Machine Learning in Information Security (CAMLIS) 2024. The preprint can be accessed here; you can cite this work as follows:
@InProceedings{CAMLIS:SMUJRSV24,
author = "Manuel B. Santos and
Dimitris Mouris and
Mehmet Ugurbil and
Stanislaw Jarecki and
José Reis and
Shubho Sengupta and
Miguel de Vega",
title = "{Curl: Private LLMs through Wavelet-Encoded Look-Up Tables}",
pages = {1--31},
booktitle = {Proceedings of the Conference on Applied Machine Learning in Information Security},
address = {Arlington, Virginia, USA},
month = {October 24--25,},
year = 2024,
}
The original CrypTen paper can be accessed here (documented here); you can cite this work as follows:
@InProceedings{crypten2020,
author={B. Knott and S. Venkataraman and A.Y. Hannun and S. Sengupta and M. Ibrahim and L.J.P. van der Maaten},
title={CrypTen: Secure Multi-Party Computation Meets Machine Learning},
booktitle={arXiv 2109.00984},
year={2021},
}
CrypTen currently runs on Linux and Mac with Python 3.7. We also support computation on GPUs. Windows is not supported. To install Curl, follow the instructions in the CONTRIBUTING.md file.
CrypTen has a series of tutorials built on Jupyter notebooks in the tutorials directory, as well as examples in the examples directory.
We extend these with our LLM applications in the LLMs directory, which you can run as:
❯❯ python examples/llms/launcher.py --world_size 2 --tensor_size 1,10 --multiprocess --model GPT2
To see the full list of arguments and available LLMs, run the script with the --help flag:
❯❯ python examples/llms/launcher.py --help
This is software for a research prototype and not production-ready code. This repository builds upon CrypTen and PyTorch.