Kaggle - Open Problems - Multimodal Single Cell Integration - 2nd Place Solution

This repository contains the 2nd place solution for the Kaggle Open Problems - Multimodal Single-Cell Integration competition.

It contains two parts, one from senkin13 and one from tmp.

If you run into any trouble with the setup or code, or have any questions, please contact tmp at baosenguo@163.com and senkin13 at senkin13@hotmail.com.

tmp's part

OVERVIEW

This pipeline mainly consists of the following parts:

Preprocessing
Feature engineering
Modeling

This simple solution produced a robust result: 1st on the public leaderboard and 2nd on the private leaderboard.

Preprocessing

using raw count:
normalization:
transformation:
standardization:
batch-effect correction:

Feature engineering

decomposition
- pca (64)
- ipca (128)
- factor analysis (64)
feature selection
- Features highly correlated with the targets are selected; 245 features are selected in total.
cell-type (one-hot)
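
The feature-selection and one-hot steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the notebooks: the function name `select_correlated_features`, the "strongest absolute correlation with any target" scoring rule, the synthetic data, and the example cell-type labels are all hypothetical.

```python
import numpy as np


def select_correlated_features(X, Y, n_keep):
    """Return sorted indices of the n_keep columns of X most correlated with Y.

    X: (n_cells, n_features) inputs; Y: (n_cells, n_targets) targets.
    """
    # Center and L2-normalize columns so a dot product is a Pearson correlation.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / (np.linalg.norm(Xc, axis=0) + 1e-12)
    Yc = Yc / (np.linalg.norm(Yc, axis=0) + 1e-12)
    corr = Xc.T @ Yc  # shape (n_features, n_targets)
    # Assumption: score a feature by its strongest absolute correlation
    # with any single target.
    score = np.abs(corr).max(axis=1)
    return np.sort(np.argsort(score)[::-1][:n_keep])


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
# Synthetic targets driven by features 3 and 17 plus a little noise.
Y = np.stack([X[:, 3], X[:, 17]], axis=1) + 0.1 * rng.normal(size=(500, 2))
selected = select_correlated_features(X, Y, n_keep=2)

# One-hot encoding of a categorical cell-type column (labels are illustrative).
cell_type = np.array(["HSC", "EryP", "HSC", "MasP"])
categories, codes = np.unique(cell_type, return_inverse=True)
one_hot = np.eye(len(categories))[codes]
```

In a full pipeline the selected columns and the one-hot block would be concatenated with the decomposition outputs to form the final feature matrix.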

Modelling

Both the mlp and lgb models use the same features introduced above.

mlp (a simple MLP performs best: a single model with 1 seed scores 0.815 on the public leaderboard and 0.772 on the private leaderboard)
lgb
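
For context on the scores quoted above: the competition scored submissions by the Pearson correlation between predicted and true values, computed per cell and then averaged. A minimal sketch of that metric for local validation follows; the function name `mean_rowwise_pearson` is hypothetical, and the sketch assumes no row is constant (a constant row would give a division by zero).

```python
import numpy as np


def mean_rowwise_pearson(y_true, y_pred):
    """Average over rows (cells) of the Pearson correlation across targets."""
    yt = y_true - y_true.mean(axis=1, keepdims=True)
    yp = y_pred - y_pred.mean(axis=1, keepdims=True)
    num = (yt * yp).sum(axis=1)
    den = np.linalg.norm(yt, axis=1) * np.linalg.norm(yp, axis=1)
    return float(np.mean(num / den))


rng = np.random.default_rng(0)
y_true = rng.normal(size=(8, 5))
# Pearson correlation ignores a positive rescaling and a shift of each row.
perfect = mean_rowwise_pearson(y_true, 2.0 * y_true + 3.0)
inverted = mean_rowwise_pearson(y_true, -y_true)
```

Because the metric is invariant to a per-row shift and positive scale, only the relative ordering and spacing of a cell's predicted values matter.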

Local CV

random 5-fold cv
split according to "day"
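
The two validation schemes can be sketched as below. The README does not say exactly how the "day" split was built, so holding out one day at a time is an assumption here, and the helper names `random_kfold` and `split_by_day` are hypothetical. scikit-learn's `KFold` and `GroupKFold` offer comparable splitters.

```python
import numpy as np


def random_kfold(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for k in range(n_splits):
        valid_idx = np.sort(folds[k])
        train_idx = np.sort(
            np.concatenate([f for i, f in enumerate(folds) if i != k])
        )
        yield train_idx, valid_idx


def split_by_day(day):
    """Yield (held_out_day, train_idx, valid_idx), holding out one day at a time."""
    day = np.asarray(day)
    for d in np.unique(day):
        yield d, np.flatnonzero(day != d), np.flatnonzero(day == d)


# Toy "day" column for ten cells.
day = np.array([2, 2, 3, 3, 3, 4, 4, 4, 4, 2])
random_folds = list(random_kfold(len(day)))
day_folds = list(split_by_day(day))
```

The day-based split checks how well a model generalizes to a time point it has not seen, which a random split cannot show.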

Code

dataset preparation
- /tmp/data/prepare.ipynb
- /tmp/data/preprocess.ipynb
training
- /tmp/model/lgb.ipynb
- /tmp/model/mlp.ipynb
- /tmp/model/blending.ipynb
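
The README does not describe how blending.ipynb combines the models. As a generic illustration only, a weighted average of per-model predictions might look like the sketch below. Standardizing each row before averaging, the weights, and the helper names `standardize_rows` and `blend` are all assumptions, not the authors' method.

```python
import numpy as np


def standardize_rows(pred):
    """Shift and scale each row to zero mean and unit standard deviation."""
    mean = pred.mean(axis=1, keepdims=True)
    std = pred.std(axis=1, keepdims=True)
    return (pred - mean) / (std + 1e-12)


def blend(preds, weights):
    """Weighted average of row-standardized prediction matrices."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    stacked = np.stack([standardize_rows(p) for p in preds])
    # Contract the model axis: (n_models,) x (n_models, n_cells, n_targets).
    return np.tensordot(weights, stacked, axes=1)


rng = np.random.default_rng(0)
mlp_pred = rng.normal(size=(6, 4))          # stand-in for MLP predictions
lgb_pred = 10.0 * rng.normal(size=(6, 4))   # stand-in for LightGBM predictions
blended = blend([mlp_pred, lgb_pred], weights=[0.6, 0.4])
same = blend([mlp_pred, mlp_pred], weights=[1.0, 1.0])
```

Row standardization is used in this sketch because a correlation-based score ignores per-row shift and scale, so it puts models with different output ranges on a common footing before averaging.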

requirements

python 3.7.5
pandas 1.3.5
numpy 1.20.3
torch 1.9.0
scikit-learn 1.0.2

senkin13's part

HARDWARE:

(The following specs were used to create the original solution)

Windows 10 (4 TB boot disk, 64 vCPUs, 300 GB memory) 1 x NVIDIA TITAN RTX

SOFTWARE

(Python packages are detailed separately in requirements.txt)

Python 3.8.10, CUDA 11.3, cuDNN 7.6.5.32, NVIDIA drivers v.466.47

DATA SETUP

(assumes the Kaggle API is installed)

shell

Below are the shell commands used in each step, as run from the top-level directory:

mkdir -p input features model sub

Download all competition data to input/.

DATA PROCESSING

preprocess_cite.ipynb
preprocess_multi.ipynb

TRAIN & PREDICTION

cite_lgb_transformed_sparse_matrix.ipynb
cite_lgb_raw_clr_pca.ipynb
cite_lgb_raw_sparse_matrix.ipynb
cite_lgb_raw_target.ipynb
multi_lgb.ipynb
multi_nn.ipynb

ENSEMBLE

move tmp's tmp_cite_ensemble.joblib to ensemble/
ensemble.ipynb
