This repository explores different solutions for the Severstal competition hosted by Kaggle. Kaggle is a platform that provides datasets from real-world machine learning problems and engages a large community of practitioners. Severstal is a Russian company operating in the steel and mining industry. It has built a vast industrial data lake and, in 2019, turned to machine learning to improve automation, increase efficiency, and maintain high quality in its production.

The goal is to detect steel defects with segmentation models. The solutions are based on PyTorch, with fastai as the high-level deep learning framework.

This repository contains the Jupyter Notebooks used to build the steel_segmentation library with nbdev, as well as the training notebooks.

The steel_deployment repository contains a Binder/Voila web app for deploying the models built with this library (work in progress).

Install

To install this package directly from GitHub, run:

pip install git+https://github.com/marcomatteo/steel_segmentation.git

Editable install

To install and edit this package:

git clone https://github.com/marcomatteo/steel_segmentation.git
pip install -e steel_segmentation

The library is based on nbdev, a tool that builds a Python package from Jupyter Notebooks. Install it with:

pip install nbdev

To build the library, run the tests, and generate the documentation, use these commands:

nbdev_clean_nbs
nbdev_build_lib
nbdev_test_nbs
nbdev_build_docs

This environment works on macOS and Linux. On Windows, WSL with Ubuntu 20.04 is recommended.

On Windows, training needs one additional package to fix ipykernel issues:

conda install pywin32

Download the dataset

To download the Kaggle competition data you need a Kaggle account to generate the API credentials (if this is your first time using the API, follow this link). Download kaggle.json, copy it into the repository directory, and then move it to ~/.kaggle with owner-only permissions:

!mkdir -p ~/.kaggle
!cp ../kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json
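Where shell escapes are not available (for example in a plain Python script), the same three steps can be done from Python. The sketch below uses only the standard library; install_kaggle_credentials is a hypothetical helper, not part of steel_segmentation or of the kaggle client, and it assumes a POSIX system where chmod permissions apply.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(src: Path, kaggle_dir: Path = Path.home() / ".kaggle") -> Path:
    """Hypothetical helper: copy kaggle.json into kaggle_dir with 0600 permissions."""
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p ~/.kaggle`
    dest = kaggle_dir / "kaggle.json"
    shutil.copyfile(src, dest)  # equivalent of `cp ../kaggle.json ~/.kaggle/kaggle.json`
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)  # equivalent of `chmod 600`
    return dest
```

Calling install_kaggle_credentials(Path("../kaggle.json")) from the nbs folder mirrors the shell commands; running it twice is harmless because the directory creation tolerates an existing folder.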

Once you are authenticated with the Kaggle API (install the client first with pip install kaggle), download and unzip the data:

!mkdir -p data
!kaggle competitions download -c severstal-steel-defect-detection -p {path}
!unzip -q -n {path}/severstal-steel-defect-detection.zip -d {path}
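After unzipping, the defect masks are not stored as images: the competition's train.csv lists each mask as a run-length encoded (RLE) string of space-separated "start length" pairs. The sketch below decodes one such string into a NumPy mask. It assumes the Kaggle convention for this competition (1-indexed start positions, pixels numbered top to bottom and then left to right, 256x1600 images); rle_decode is a hypothetical helper and not necessarily how the Transforms notebook implements it.

```python
import numpy as np


def rle_decode(rle: str, height: int = 256, width: int = 1600) -> np.ndarray:
    """Hypothetical helper: decode a 'start length ...' RLE string into a (height, width) uint8 mask.

    Assumes 1-indexed starts and pixels counted top-to-bottom, then left-to-right.
    """
    flat = np.zeros(height * width, dtype=np.uint8)
    values = [int(v) for v in rle.split()]
    starts, lengths = values[0::2], values[1::2]
    for start, length in zip(starts, lengths):
        flat[start - 1 : start - 1 + length] = 1  # shift the 1-indexed start to 0-indexed
    # Pixels are numbered column by column, so reshape in Fortran (column-major) order.
    return flat.reshape((height, width), order="F")
```

For example, after loading the file with pandas, rle_decode(df.loc[0, "EncodedPixels"]) would return a 256x1600 array with ones on the defect pixels; the EncodedPixels column name is assumed here from the competition's CSV layout.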

Library notebooks

All experiments are based on Jupyter Notebooks. The nbs folder contains all the notebooks used to build the steel_segmentation library (work in progress):

Exploratory Data Analysis: data analysis, plots, and utility functions.
Transforms: leveraging the mid-level API of fastai to build a custom data loading pipeline.
Optimizer utility functions
Loss functions
Metrics
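The competition is scored with the mean Dice coefficient, 2|X ∩ Y| / (|X| + |Y|), computed between predicted and ground-truth masks. As a rough illustration of what such a metric measures, here is a NumPy sketch for binary masks; dice_coefficient and mean_dice are hypothetical names, and the rule that two empty masks score 1.0 follows the competition's stated convention rather than this library's code.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2|X ∩ Y| / (|X| + |Y|) for binary masks; 1.0 when both masks are empty."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # competition convention: empty prediction on an empty mask is perfect
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)


def mean_dice(preds, targets) -> float:
    """Average Dice over paired masks, e.g. one pair per (image, class)."""
    scores = [dice_coefficient(p, t) for p, t in zip(preds, targets)]
    return float(np.mean(scores))
```

Because empty pairs score 1.0, a model that predicts no defect on defect-free images is rewarded, which is why false positives on clean images are costly under this metric.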

Training

Training scripts in the scripts folder:

segmentation_train.py: trains segmentation models from the qubvel repository.
create_submission.py: creates a Kaggle submission from a trained segmentation model and saves the CSV in data/submissions/.
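Kaggle expects the predicted masks in the same RLE format used by train.csv, with an empty string when a class has no defect. The sketch below shows one way to encode a binary mask before writing it to the submission CSV; rle_encode is a hypothetical helper and not necessarily what create_submission.py does. It assumes column-major pixel order and 1-indexed starts, as in the competition data.

```python
import numpy as np


def rle_encode(mask: np.ndarray) -> str:
    """Hypothetical helper: encode a binary (height, width) mask as 'start length ...'.

    Pixels are flattened column-major (order="F") and starts are 1-indexed.
    """
    flat = mask.flatten(order="F").astype(np.uint8)
    # Pad with zeros so runs touching either border still produce start and end transitions.
    padded = np.concatenate([[0], flat, [0]])
    # Positions where the value changes; thanks to the leading pad these are already 1-indexed.
    changes = np.flatnonzero(padded[1:] != padded[:-1]) + 1
    starts, ends = changes[0::2], changes[1::2]
    lengths = ends - starts
    return " ".join(f"{start} {length}" for start, length in zip(starts, lengths))
```

Padding the flattened mask with a zero on each side guarantees that every run has a matching start and end transition, even when it touches the first or last pixel, so no special cases are needed.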

Results