This repository contains an implementation of TPOT for obtaining optimal pipelines with the use of genetic algorithms.
If you want to know more about TPOT, how it works, and what its components are, I recommend reading the blog post: TPOT: Pipelines Optimization with Genetic Algorithms
- main.py: Contains the implementation of TPOT Classifier
- optimal_pipeline.py: Contains the optimal pipeline suggested by TPOT Classifier once the optimization has run.
I recommend working in a virtual environment; in this case I am using pipenv. To install the dependencies listed in the Pipfile, type:
pipenv install
and then
pipenv shell
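Once inside the environment, a quick check that the key packages are importable can save a failed run later. The sketch below uses only the standard library. It assumes the Pipfile installs TPOT and scikit-learn (import names tpot and sklearn); the REQUIRED list and the missing_packages helper are hypothetical, so adjust the names to match the actual Pipfile.

```python
import importlib.util

# Assumption: these are the top-level import names the project needs;
# the actual Pipfile may list others.
REQUIRED = ["tpot", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```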
To optimize the pipeline with TPOT Classifier, first comment out the train_suggested_tpot() call in main.py:
if __name__ == "__main__":
    automl = AutoML()
    automl.load_data()
    automl.pipeline_optimization()
    # automl.train_suggested_tpot()
then run:
python -Bi main.py
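For orientation, here is a rough sketch of what the two methods called above might look like. It is not the code in main.py: the class name AutoMLSketch, the breast cancer dataset, and every hyperparameter value are assumptions chosen for illustration. It also assumes a classic TPOT release whose TPOTClassifier accepts these arguments and provides export(). Imports sit inside the methods so the module loads even before the dependencies are installed.

```python
class AutoMLSketch:
    """Hypothetical stand-in for the AutoML class in main.py."""

    def load_data(self):
        # Assumption: a scikit-learn toy dataset; the repository may use another.
        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import train_test_split

        x, y = load_breast_cancer(return_X_y=True)
        self.x_train, self.x_test, self.y_train, self.y_test = train_test_split(
            x, y, test_size=0.2, random_state=42
        )

    def pipeline_optimization(self):
        # Assumption: classic TPOT API; small values keep the search short.
        from tpot import TPOTClassifier

        self.model = TPOTClassifier(
            generations=5,        # number of evolutionary iterations
            population_size=20,   # candidate pipelines kept per generation
            cv=5,                 # cross-validation folds used for scoring
            random_state=42,
            verbosity=2,
        )
        self.model.fit(self.x_train, self.y_train)
        print(f"Test acc: {self.model.score(self.x_test, self.y_test)}")
```

Larger generations and population_size values search more pipelines at the cost of run time.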
Once the optimization has finished, type the following in the Python console:
automl.model.export('optimal_pipeline.py')
This command overwrites the file optimal_pipeline.py. Open optimal_pipeline.py and copy the pipeline definition, which looks like this:
# Average CV score on the training set was: 0.9347254053136407
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    VarianceThreshold(threshold=0.2),
    ZeroCount(),
    GradientBoostingClassifier(learning_rate=1.0, max_depth=10, max_features=0.9000000000000001, min_samples_leaf=16, min_samples_split=3, n_estimators=100, subsample=0.7000000000000001)
)
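Copying by hand works, but the assignment can also be pulled out programmatically. The following is a minimal sketch using only the standard library's ast module. The helper name extract_pipeline_source is hypothetical, and it assumes the exported file assigns the pipeline to a top-level variable called exported_pipeline, as in the snippet above.

```python
import ast


def extract_pipeline_source(source, name="exported_pipeline"):
    """Return the source text of the first top-level `name = ...` assignment."""
    tree = ast.parse(source)  # parses only; the exported file is never executed
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    return ast.get_source_segment(source, node)
    raise ValueError(f"No top-level assignment to {name!r} found")


# Example use (assumes optimal_pipeline.py exists in the working directory):
# with open("optimal_pipeline.py") as fh:
#     print(extract_pipeline_source(fh.read()))
```

Because the file is parsed rather than imported, this works even when the exported script refers to a data path that does not exist on your machine.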
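Paste the pipeline into the corresponding method in main.py. If main.py does not already import them, the pipeline also needs make_pipeline (sklearn.pipeline), PolynomialFeatures (sklearn.preprocessing), VarianceThreshold (sklearn.feature_selection), GradientBoostingClassifier (sklearn.ensemble), and ZeroCount (tpot.builtins). For example: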
def pipeline_suggested_by_tpot(self):
    # Copied from optimal pipeline suggested by tpot in file "optimal_pipeline.py"
    # Initialize
    exported_pipeline = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
        VarianceThreshold(threshold=0.2),
        ZeroCount(),
        GradientBoostingClassifier(learning_rate=1.0, max_depth=10, max_features=0.9000000000000001, min_samples_leaf=16, min_samples_split=3, n_estimators=100, subsample=0.7000000000000001)
    )
    # Init training
    exported_pipeline.fit(self.x_train, self.y_train)
    print(f"Train acc: {exported_pipeline.score(self.x_train, self.y_train)}")
    print(f"Test acc: {exported_pipeline.score(self.x_test, self.y_test)}")
The last step is to run main.py with the optimization call commented out and the training call enabled:
if __name__ == "__main__":
    automl = AutoML()
    automl.load_data()
    # automl.pipeline_optimization()
    automl.train_suggested_tpot()
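Switching stages by commenting lines in and out is easy to get wrong. One possible alternative, sketched here with the standard library's argparse, is to pick the stage from the command line. The --stage flag and the helpers parse_stage and run_stage are hypothetical and not part of the repository; the method names are the ones main.py already calls.

```python
import argparse


def parse_stage(argv=None):
    """Read the hypothetical --stage flag; defaults to the optimization stage."""
    parser = argparse.ArgumentParser(description="Run one stage of the workflow.")
    parser.add_argument(
        "--stage",
        choices=["optimize", "train"],
        default="optimize",
        help="optimize: search with TPOT; train: fit the pasted pipeline",
    )
    return parser.parse_args(argv).stage


def run_stage(automl, stage):
    """Load the data, then run exactly one of the two stages."""
    automl.load_data()
    if stage == "optimize":
        automl.pipeline_optimization()
    else:
        automl.train_suggested_tpot()


# In main.py this could replace the commented-out calls:
# if __name__ == "__main__":
#     automl = AutoML()
#     run_stage(automl, parse_stage())
```

With this in place, python -Bi main.py --stage optimize would still leave automl available in the console for the export step, and python main.py --stage train would run the final training.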
That is it!
Feel free to fork the project and add your own suggestions.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/YourGreatFeature)
- Commit your Changes (git commit -m 'Add some YourGreatFeature')
- Push to the Branch (git push origin feature/YourGreatFeature)
- Open a Pull Request
If you have any questions, feel free to reach out to me at:
Distributed under the MIT License. See LICENSE.md for more information.