Feature Engine

Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's API: fit() learns the transformation parameters from the data, and transform() applies them to the data.

Feature-engine features in the following resources

Blogs about Feature-engine

Documentation

Feature-engine's transformers currently include functionality for:

Missing Data Imputation
Categorical Encoding
Discretisation
Outlier Capping or Removal
Variable Transformation
Variable Creation
Variable Selection
Datetime Features
Time Series
Preprocessing
Scikit-learn Wrappers

Imputation Methods

MeanMedianImputer
RandomSampleImputer
EndTailImputer
AddMissingIndicator
CategoricalImputer
ArbitraryNumberImputer
DropMissingData

Encoding Methods

OneHotEncoder
OrdinalEncoder
CountFrequencyEncoder
MeanEncoder
WoEEncoder
PRatioEncoder
RareLabelEncoder
DecisionTreeEncoder
StringSimilarityEncoder

Discretisation methods

EqualFrequencyDiscretiser
EqualWidthDiscretiser
DecisionTreeDiscretiser
ArbitraryDiscretiser

Outlier Handling methods

Winsorizer
ArbitraryOutlierCapper
OutlierTrimmer

Variable Transformation methods

LogTransformer
LogCpTransformer
ReciprocalTransformer
ArcsinTransformer
PowerTransformer
BoxCoxTransformer
YeoJohnsonTransformer

Variable Creation:

MathFeatures
RelativeFeatures
CyclicalFeatures

Feature Selection:

DropFeatures
DropConstantFeatures
DropDuplicateFeatures
DropCorrelatedFeatures
SmartCorrelatedSelection
SelectByShuffling
SelectBySingleFeaturePerformance
SelectByTargetMeanPerformance
RecursiveFeatureElimination
RecursiveFeatureAddition
DropHighPSIFeatures
SelectByInformationValue

Datetime

DatetimeFeatures

Time Series

LagFeatures
WindowFeatures
ExpandingWindowFeatures

Preprocessing

MatchCategories
MatchVariables

Wrappers:

SklearnTransformerWrapper

Installation

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/feature-engine/feature_engine.git

Example Usage

>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()

Out[1]:
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64

>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()

Out[2]:
A       10
B       10
Rare     3
Name: var_A, dtype: int64

Find more examples in our Jupyter Notebook Gallery or in the documentation.

Contribute

Details about how to contribute can be found on the Contribute page.

Briefly:

Fork the repo
Clone your fork to your local computer: git clone https://github.com/<YOURUSERNAME>/feature_engine.git
Navigate into the repo folder: cd feature_engine
Install Feature-engine in developer (editable) mode: pip install -e .
Optional: Create and activate a virtual environment with any tool of choice
Install Feature-engine dependencies: pip install -r requirements.txt and pip install -r test_requirements.txt
Create a feature branch with a meaningful name for your feature: git checkout -b myfeaturebranch
Develop your feature, tests and documentation
Make sure the tests pass
Make a PR

Thank you!!

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation, first make sure the dependencies are installed. From the root directory, run: pip install -r docs/requirements.txt

Now you can build the docs using: sphinx-build -b html docs build

License

BSD 3-Clause

Sponsor us

Sponsor us and further support our mission to democratize machine learning and programming tools through open-source software.
