This repository explores different solutions for the Severstal competition hosted by Kaggle. Kaggle is a platform that provides datasets from real-world machine learning problems and engages a large community of people. Severstal is a Russian company operating in the steel and mining industry. It has built a vast industrial data lake and in 2019 turned to machine learning to improve automation, increase efficiency, and maintain high quality in its production.
The goal is to detect steel defects with segmentation models. The solutions are based on PyTorch, with fastai as the high-level deep learning framework.
In this repository you will find the Jupyter Notebooks used to build the steel_segmentation library with nbdev, along with the training notebooks.
In the steel_deployment repository you can find a Binder/Voila web app for the deployment of the models built with this library (still updating).
To install this package directly from GitHub:
pip install git+https://github.com/marcomatteo/steel_segmentation.git
To install and edit this package:
git clone https://github.com/marcomatteo/steel_segmentation.git
pip install -e steel_segmentation
The library is based on nbdev, a tool that builds a Python package from Jupyter Notebooks.
pip install nbdev
To build the library and the documentation and to run the tests, use these commands:
nbdev_clean_nbs
nbdev_build_lib
nbdev_test_nbs
nbdev_build_docs
This environment works on macOS and Linux. On Windows, WSL with Ubuntu 20.04 is recommended.
Training on native Windows needs one extra package to resolve ipykernel issues:
conda install pywin32
To download the competition data, first copy your Kaggle API token (kaggle.json) into ~/.kaggle:
!mkdir ~/.kaggle
!cp ../kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json
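The three notebook commands above can also be sketched in plain Python, which may help on systems where the shell commands are unavailable. This is a minimal sketch: the function name `install_kaggle_token` and its parameters are hypothetical and not part of steel_segmentation or the Kaggle client.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, kaggle_dir=None):
    """Copy a Kaggle API token into kaggle_dir and restrict its permissions.

    Hypothetical helper: mirrors `mkdir ~/.kaggle`, `cp`, and `chmod 600`.
    """
    token_path = Path(token_path)
    # Default to ~/.kaggle, the directory used in the commands above.
    kaggle_dir = Path(kaggle_dir) if kaggle_dir is not None else Path.home() / ".kaggle"
    # Unlike plain `mkdir`, this does not fail if the directory already exists.
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    # Owner read/write only, the equivalent of `chmod 600`.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

Calling `install_kaggle_token("../kaggle.json")` would reproduce the default behaviour of the commands above; the permission bits only have their usual meaning on POSIX systems.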
Now that you're authenticated with the Kaggle API (the kaggle package is required, so run pip install kaggle first), download and unzip the data:
!kaggle competitions download -c severstal-steel-defect-detection -p {path}
!mkdir data
!unzip -q -n {path}/severstal-steel-defect-detection.zip -d {path}
All of the experiments are based on Jupyter Notebooks. The nbs folder contains all the notebooks used to build the steel_segmentation library (still updating):
- Exploratory Data Analysis: data analysis, plots and utility functions.
- Transforms: leveraging the mid-level API of fastai for a custom data loading pipeline.
- Optimizer utility functions
- Loss functions
- Metrics
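To make the Metrics item concrete: the competition is scored with the mean Dice coefficient, so a per-mask Dice function is the natural building block. The following is a dependency-free sketch on flat binary masks, not the implementation in the notebooks (which work on PyTorch tensors); the function name and signature are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for flat binary masks (sequences of 0/1).

    Illustrative sketch only, not the library's metric.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Pixels that are 1 in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks empty: the competition defines Dice as 1 in this case.
        return 1.0
    return 2.0 * intersection / total
```

For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])` has one overlapping pixel out of three positive pixels in total, giving 2/3.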
Training scripts in the scripts folder:
- segmentation_train.py: trains segmentation models from the qubvel repository.
- create_submission.py: creates a Kaggle submission from a trained segmentation model and saves the CSV in data/submissions/.
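A submission for this competition stores each predicted mask as a run-length encoded string: pairs of start position and run length, with pixels numbered from 1, top to bottom and then left to right. The sketch below shows that encoding in pure Python. It is not taken from create_submission.py; the name `rle_encode` and the list-of-rows input format are assumptions made for illustration.

```python
def rle_encode(mask):
    """Run-length encode a 2D binary mask given as a list of rows.

    Pixels are numbered from 1 in column-major order (down each column,
    then across columns). Returns "start length start length ...".
    Illustrative sketch only.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    runs = []
    run_start = None  # 1-based position where the current run began
    run_length = 0
    position = 0
    for col in range(width):
        for row in range(height):
            position += 1
            if mask[row][col]:
                if run_start is None:
                    run_start = position
                    run_length = 0
                run_length += 1
            elif run_start is not None:
                # A zero pixel closes the current run.
                runs.extend([run_start, run_length])
                run_start = None
    if run_start is not None:
        # Close a run that reaches the last pixel.
        runs.extend([run_start, run_length])
    return " ".join(str(value) for value in runs)
```

A run is allowed to continue from the bottom of one column into the top of the next, because the encoding operates on the flattened pixel order. An empty mask encodes to an empty string.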
Model | Public score | Private score | Private LB percentile |
---|---|---|---|
PyTorch UNET-ResNet18 | 0.87530 | 0.85364 | 85th |
PyTorch UNET-ResNet34 | 0.88591 | 0.88572 | 46th |
FastAI UNET-ResNet34 | 0.88648 | 0.88830 | 23rd |
PyTorch FPN-ResNet34 | 0.89054 | 0.88911 | 19th |
Ensemble UNET-ResNet34_FPN-ResNet34 | 0.89184 | 0.89262 | 16th |
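The table does not say how the ensemble row was produced. One common approach, shown here purely as an assumption, is to average the per-pixel probabilities of the individual models and threshold the result. The function name, the flat-list input format, and the 0.5 threshold are all hypothetical.

```python
def ensemble_masks(prob_maps, threshold=0.5):
    """Average per-pixel probabilities from several models, then binarize.

    prob_maps: list of flat probability lists, one per model.
    Assumed technique for illustration; the repository may ensemble differently.
    """
    if not prob_maps:
        raise ValueError("at least one probability map is required")
    size = len(prob_maps[0])
    if any(len(probs) != size for probs in prob_maps):
        raise ValueError("all probability maps must have the same length")
    n_models = len(prob_maps)
    # zip(*prob_maps) groups the predictions of all models for each pixel.
    averaged = [sum(values) / n_models for values in zip(*prob_maps)]
    return [1 if prob >= threshold else 0 for prob in averaged]
```

With two models predicting `[0.9, 0.2, 0.6]` and `[0.7, 0.4, 0.3]`, the averages are roughly 0.8, 0.3 and 0.45, so only the first pixel is kept at the default threshold.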