A privacy-preserving framework for anonymizing the user trajectories contained in a population of users. Researchers and engineers can plug any dataset and model into the existing system.
It processes GPS location time series with a Trajectory Generative Adversarial Network to generate synthetic trajectory data.
This is a TensorFlow 2 project. If you plan to train models on an NVIDIA GPU, we recommend installing the dependencies with conda: each TensorFlow version works only with specific CUDA versions (see the TensorFlow docs for tested configurations), and conda can install an isolated CUDA version in each environment, which prevents conflicts.
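conda env create -f environment.yml
conda activate <env-name>
where `<env-name>` is the `name:` field defined in `environment.yml` (running `conda activate` with no argument only activates the `base` environment).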
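After activating the environment, a quick way to check that TensorFlow can see the GPU, and which CUDA/cuDNN versions it was built against, is the sketch below. It relies on `tf.config.list_physical_devices` and `tf.sysconfig.get_build_info` (TensorFlow 2.3 and later); the `gpu_report` helper itself is hypothetical and not part of this repo.

```python
"""Hypothetical helper: report what the active TensorFlow build can see."""


def gpu_report():
    """Return the TF version, build-time CUDA/cuDNN versions and visible GPUs.

    Returns None when TensorFlow cannot be imported (e.g. wrong conda env).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # get_build_info() exists in TF >= 2.3; CPU-only builds may lack the CUDA keys.
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "cuda_build": build.get("cuda_version"),
        "cudnn_build": build.get("cudnn_version"),
        "gpus": [dev.name for dev in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(gpu_report())
```

An empty `gpus` list on a GPU machine often points to a mismatch between the installed driver/CUDA runtime and the versions TensorFlow was built with, which is exactly what the isolated conda environment is meant to avoid.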
To add a new dataset:
- Add two environment variables to a `.env` file in this (root) directory: `xxx_INPUT_DIR`, the absolute path of the directory that holds the raw input dataset on your system, and `xxx_INPUT_FILE`, the file path where you want the preprocessed data saved (e.g. a CSV file). Replace `xxx` with a three-letter "nickname" for your dataset.
- Create a new `.py` file in `src/datasets/` for your dataset.
- Write a class that subclasses `Dataset` (from `src/datasets/base.py`) and implements a `preprocess()` method that reads in the raw data and returns a `pandas.DataFrame`.
- Import your class into `src/datasets/__init__.py` and add the class name to the `DATASETS` list.
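The dataset steps could look roughly like the following sketch. The real `Dataset` base class in `src/datasets/base.py` is not reproduced here, so the example defines a minimal stand-in with only an abstract `preprocess()`. The `GeoCsvDataset` class, the `GEO` nickname, and the raw layout (one CSV per user with `lat`, `lon` and `datetime` columns) are all hypothetical, and the sketch writes the CSV itself, whereas the real base class may handle saving.

```python
import os
from abc import ABC, abstractmethod
from pathlib import Path

import pandas as pd


class Dataset(ABC):
    """Minimal stand-in for the base class in src/datasets/base.py (assumption)."""

    @abstractmethod
    def preprocess(self) -> pd.DataFrame:
        """Read the raw data and return it as a DataFrame."""


class GeoCsvDataset(Dataset):
    """Hypothetical dataset whose raw input is one CSV file per user."""

    nickname = "GEO"  # three-letter nickname used in the .env variable names

    def __init__(self):
        # Both variables are expected to come from the root .env file,
        # already loaded into the process environment.
        self.input_dir = Path(os.environ[f"{self.nickname}_INPUT_DIR"])
        self.input_file = Path(os.environ[f"{self.nickname}_INPUT_FILE"])

    def preprocess(self) -> pd.DataFrame:
        frames = []
        for path in sorted(self.input_dir.glob("*.csv")):
            df = pd.read_csv(path, parse_dates=["datetime"])
            df["uid"] = path.stem  # the file name identifies the user
            frames.append(df)
        data = pd.concat(frames, ignore_index=True)
        # Order points chronologically within each user's trajectory.
        data = data.sort_values(["uid", "datetime"]).reset_index(drop=True)
        # Save the preprocessed result where xxx_INPUT_FILE points.
        data.to_csv(self.input_file, index=False)
        return data


# In src/datasets/__init__.py the new class would then be registered, e.g.:
DATASETS = [GeoCsvDataset]
```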
To add a new model:
- Create a new `.py` file in `src/models/` and write a class that inherits from `TrajectoryModel` (in `src/models/base.py`) and implements at least the `train`, `predict`, `save` and `restore` abstract methods. If it is a supervised model (like MARC), also add an `evaluate` method that reports metrics on the test set.
- Import your model class into `src/models/__init__.py` and add the class name to the `MODELS` list.
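A skeleton of the model steps might look like this sketch. Because `src/models/base.py` is not shown, `TrajectoryModel` below is a minimal stand-in and its method signatures (`train(data, epochs)`, `predict(data)`, `save(path)`, `restore(path)`) are assumptions. `NoiseBaseline` is a toy, hypothetical model that only jitters coordinates with Gaussian noise to illustrate the interface; it is not one of the repo's models. A supervised model such as MARC would add an `evaluate` method as well.

```python
import json
import random
from abc import ABC, abstractmethod
from pathlib import Path

import pandas as pd


class TrajectoryModel(ABC):
    """Minimal stand-in for src/models/base.py; signatures are assumed."""

    @abstractmethod
    def train(self, data: pd.DataFrame, epochs: int) -> None: ...

    @abstractmethod
    def predict(self, data: pd.DataFrame) -> pd.DataFrame: ...

    @abstractmethod
    def save(self, path: str) -> None: ...

    @abstractmethod
    def restore(self, path: str) -> None: ...


class NoiseBaseline(TrajectoryModel):
    """Toy unsupervised model: perturbs lat/lon with seeded Gaussian noise."""

    def __init__(self, scale: float = 0.001, seed: int = 0):
        self.scale = scale  # noise standard deviation, in degrees
        self.seed = seed

    def train(self, data, epochs):
        # Nothing to learn; a real model would fit its weights here.
        pass

    def predict(self, data):
        rng = random.Random(self.seed)  # same seed -> same synthetic output
        out = data.copy()
        out["lat"] = [v + rng.gauss(0.0, self.scale) for v in out["lat"]]
        out["lon"] = [v + rng.gauss(0.0, self.scale) for v in out["lon"]]
        return out

    def save(self, path):
        Path(path).mkdir(parents=True, exist_ok=True)
        params = {"scale": self.scale, "seed": self.seed}
        (Path(path) / "params.json").write_text(json.dumps(params))

    def restore(self, path):
        params = json.loads((Path(path) / "params.json").read_text())
        self.scale, self.seed = params["scale"], params["seed"]


# In src/models/__init__.py the new class would then be registered, e.g.:
MODELS = [NoiseBaseline]
```

Declaring the four methods with `@abstractmethod` means a subclass that forgets one of them fails at instantiation time with a `TypeError`, rather than later in the middle of a training run.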
To train a model, use the CLI script's `train` command:
$ python mobility_cli.py
Usage: mobility_cli.py [OPTIONS] COMMAND [ARGS]...
Command line interface for the mobility learning framework.
Options:
--help Show this message and exit.
Commands:
evaluate Use SAVED_MODEL to predict the labels of DATASET.
predict Use trained MODEL saved in SAVED_PATH to make predictions based...
train Train MODEL on DATASET stored in DATASET_PATH for EPOCHS.
$ python mobility_cli.py train --help
Usage: mobility_cli.py train [OPTIONS] [LSTMTrajGAN|MARC] [MDCLausanne|GeoLifeBeijing|FourSquareNYC|PrivamovLyon] EPOCHS
Train MODEL on DATASET stored in DATASET_PATH for EPOCHS.
Options:
--help Show this message and exit.
$ python mobility_cli.py train LSTMTrajGAN GeoLifeBeijing 200
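To train one model on every dataset listed in the usage line, a small driver could call the CLI once per dataset. This is a hypothetical helper, not part of the repo; it assumes only the `train MODEL DATASET EPOCHS` argument order shown above and that it is run from the repository root.

```python
import subprocess
import sys

# Dataset choices as printed by `mobility_cli.py train --help`.
CLI_DATASETS = ["MDCLausanne", "GeoLifeBeijing", "FourSquareNYC", "PrivamovLyon"]


def train_commands(model, epochs, datasets=CLI_DATASETS):
    """Build one `mobility_cli.py train` command per dataset."""
    return [
        [sys.executable, "mobility_cli.py", "train", model, name, str(epochs)]
        for name in datasets
    ]


def train_all(model, epochs, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in train_commands(model, epochs):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failed training run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    train_all("LSTMTrajGAN", 200)
```

Using `sys.executable` rather than a bare `python` keeps every run inside the currently activated conda environment.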
To generate predictions with a trained model, use the CLI script's `predict` command:
$ python mobility_cli.py predict --help
Usage: mobility_cli.py predict [OPTIONS] [LSTMTrajGAN|MARC] SAVED_PATH [MDCLausanne|GeoLifeBeijing|FourSquareNYC|PrivamovLyon] OUTPUT_PATH
Use trained MODEL saved in SAVED_PATH to make predictions based on DATASET
and write to OUTPUT_PATH as CSV.
Options:
--help Show this message and exit.
$ python mobility_cli.py predict LSTMTrajGAN experiments/LSTMTrajGAN_GeoLifeBeijing/2021-07-24T23:55:37/saved_model/ GeoLifeBeijing outputs/LSTMTrajGAN_GeoLifeBeijing_predictions.csv
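The example path suggests that each training run is stored under `experiments/<MODEL>_<DATASET>/<ISO timestamp>/saved_model/`. Assuming that layout holds, the hypothetical helpers below pick the most recent run and build the matching `predict` call in the `MODEL SAVED_PATH DATASET OUTPUT_PATH` order from the usage line. ISO 8601 timestamps in the same format sort chronologically as plain strings, so no date parsing is needed.

```python
import sys
from pathlib import Path


def latest_saved_model(model, dataset, root="experiments"):
    """Return the newest <root>/<model>_<dataset>/<timestamp>/saved_model dir."""
    runs = sorted(
        p for p in (Path(root) / f"{model}_{dataset}").iterdir()
        if (p / "saved_model").is_dir()  # skip runs that never saved a model
    )
    if not runs:
        raise FileNotFoundError(f"no saved runs for {model} on {dataset}")
    return runs[-1] / "saved_model"


def predict_command(model, dataset, root="experiments", out_dir="outputs"):
    """Build the CLI call: predict MODEL SAVED_PATH DATASET OUTPUT_PATH."""
    saved = latest_saved_model(model, dataset, root)
    output = Path(out_dir) / f"{model}_{dataset}_predictions.csv"
    return [sys.executable, "mobility_cli.py", "predict",
            model, str(saved), dataset, str(output)]
```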
To evaluate a trained model on a dataset, use the CLI script's `evaluate` command:
$ python mobility_cli.py evaluate --help
Usage: mobility_cli.py evaluate [OPTIONS] [LSTMTrajGAN|MARC] SAVED_PATH [MDCLausanne|GeoLifeBeijing|FourSquareNYC|PrivamovLyon]
Use SAVED_MODEL to predict the labels of DATASET.
Options:
--help Show this message and exit.
Date | Note | Author |
---|---|---|
3/31 | Created repo; initialization & config.py | jeffmur |
4/1 | Project structure and importing fixes | alexkyllo |
4/5 | Optimize freqMatrix function | alexkyllo |
4/6 | Appended Ali's LSTM-AE, updated req.txt | jeffmur |
7/17 | Added LSTMTrajGAN and MARC models | alexkyllo |
7/24 | GeoLife and Privamov Ready for training | jeffmur |
7/25 | Add training instructions to README | alexkyllo |
- Port LSTM-TrajGAN to TensorFlow 2 so it can be run in the same environment
- Preprocessing code for MDC data so it can be fed into LSTM-TrajGAN
- Port MARC reidentifier model to TF2
- Preprocessing code for GeoLife data so it can be fed into LSTM-TrajGAN
- Preprocessing code for Privamov data so it can be fed into LSTM-TrajGAN
- Post-processing code to output LSTM-TrajGAN generated trajectories to CSV
- Train LSTM-TrajGAN on MDC Lausanne dataset
- Train LSTM-TrajGAN on Foursquare NYC dataset
- Train LSTM-TrajGAN on GeoLife Beijing dataset
- Train LSTM-TrajGAN on Privamov Lyon dataset
- Train MARC on MDC dataset
- Train MARC on FourSquare NYC dataset
- Train MARC on GeoLife dataset
- Train MARC on Privamov dataset
- Compare MARC performance on real vs. generated MDC trajectories for LSTM-TrajGAN
- Get outputs from Yuting's LSTM-AE model on MDC, FourSquare, Privamov and GeoLife datasets
- Compare MARC performance on real vs. generated trajectories for LSTM-AE
- Evaluate realism of generated trajectories using distribution and distance comparisons
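For the last item, one common way to compare real and generated trajectories is to compare the distributions of a per-step statistic. The sketch below is an assumption, not the project's evaluation code: it computes haversine step lengths for every trajectory, bins real and synthetic values into the same histogram (the kilometre bin edges are arbitrary), and reports the Jensen-Shannon divergence, which is 0 for identical distributions and at most ln 2 with natural logarithms.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def step_lengths(trajectories):
    """Distances between consecutive points of each [(lat, lon), ...] trajectory."""
    return [
        haversine_km(*a, *b)
        for traj in trajectories
        for a, b in zip(traj, traj[1:])
    ]


def histogram(values, edges):
    """Normalised counts per bin; values beyond the last edge go in the last bin."""
    if not values:
        raise ValueError("no steps to compare")
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = 0
        while i < len(counts) - 1 and v >= edges[i + 1]:
            i += 1
        counts[i] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 whenever x > 0 here.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def step_length_jsd(real, synthetic, edges=(0, 0.1, 0.5, 1, 5, 10)):
    """Compare step-length distributions of real vs. generated trajectories."""
    return js_divergence(
        histogram(step_lengths(real), edges),
        histogram(step_lengths(synthetic), edges),
    )
```

The same pattern extends to other per-point statistics (e.g. visit frequencies per grid cell); distance-based comparisons between matched real and generated trajectories would complement it.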