PyCaret, Skorch Using Pipeline

Page content

개요

  • Scikit-Learn의 Pipeline은 강력하다.
  • PyCaret, Skorch에도 사용이 가능하다.
  • Google Colab에서 시도해보자.

필수 라이브러리 설치

  • pycaret을 설치 한 후에는 반드시 런타임 재시작을 클릭한다.
!pip install pycaret
Collecting pycaret
  Downloading pycaret-2.3.5-py3-none-any.whl (288 kB)
.
.
Successfully installed Boruta-0.3 Mako-1.1.6 PyYAML-6.0 alembic-1.4.1 databricks-cli-0.16.2 docker-5.0.3 funcy-1.17 gitdb-4.0.9 gitpython-3.1.24 gunicorn-20.1.0 htmlmin-0.1.12 imagehash-4.2.1 imbalanced-learn-0.7.0 joblib-1.0.1 kmodes-0.11.1 lightgbm-3.3.1 mlflow-1.22.0 mlxtend-0.19.0 multimethod-1.6 pandas-profiling-3.1.0 phik-0.12.0 prometheus-flask-exporter-0.18.7 pyLDAvis-3.2.2 pycaret-2.3.5 pydantic-1.8.2 pynndescent-0.5.5 pyod-0.9.6 python-editor-1.0.4 querystring-parser-1.2.4 requests-2.26.0 scikit-learn-0.23.2 scikit-plot-0.3.7 scipy-1.5.4 smmap-5.0.0 tangled-up-in-unicode-0.1.0 umap-learn-0.5.2 visions-0.7.4 websocket-client-1.2.3
!pip install -U skorch
Requirement already satisfied: skorch in /usr/local/lib/python3.7/dist-packages (0.11.0)
Requirement already satisfied: tabulate>=0.7.7 in /usr/local/lib/python3.7/dist-packages (from skorch) (0.8.9)
Requirement already satisfied: scikit-learn>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from skorch) (0.23.2)
Requirement already satisfied: tqdm>=4.14.0 in /usr/local/lib/python3.7/dist-packages (from skorch) (4.62.3)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.7/dist-packages (from skorch) (1.19.5)
Requirement already satisfied: scipy>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from skorch) (1.5.4)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.19.1->skorch) (1.0.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.19.1->skorch) (3.0.0)
from pycaret.datasets import get_data
data = get_data("electrical_grid")
tau1 tau2 tau3 tau4 p1 p2 p3 p4 g1 g2 g3 g4 stabf
0 2.959060 3.079885 8.381025 9.780754 3.763085 -0.782604 -1.257395 -1.723086 0.650456 0.859578 0.887445 0.958034 unstable
1 9.304097 4.902524 3.047541 1.369357 5.067812 -1.940058 -1.872742 -1.255012 0.413441 0.862414 0.562139 0.781760 stable
2 8.971707 8.848428 3.046479 1.214518 3.405158 -1.207456 -1.277210 -0.920492 0.163041 0.766689 0.839444 0.109853 unstable
3 0.716415 7.669600 4.486641 2.340563 3.963791 -1.027473 -1.938944 -0.997374 0.446209 0.976744 0.929381 0.362718 unstable
4 3.134112 7.608772 4.943759 9.857573 3.525811 -1.125531 -1.845975 -0.554305 0.797110 0.455450 0.656947 0.820923 unstable

PyTorchModel

  • sktorch 라이브러리는 PyTorch 모델과 함께 작동한다.
  • MLP 모델을 작성하는 클래스를 설계한다.
import torch.nn as nn

class Net(nn.Module): 
  def __init__(self, num_inputs=12, num_units_d1 = 200, num_units_d2 = 100):
    super(Net, self).__init__() 

    self.dense0 = nn.Linear(num_inputs, num_units_d1)
    self.nonlin = nn.ReLU()
    self.dropout = nn.Dropout(0.5)
    self.dense1 = nn.Linear(num_units_d1, num_units_d2)
    self.output = nn.Linear(num_units_d2, 2)
    self.softmax = nn.Softmax(dim=-1)

  def forward(self, X, **kwargs):
    X = self.nonlin(self.dense0(X))
    X = self.dropout(X)
    X = self.nonlin(self.dense1(X))
    X = self.softmax(self.output(X))
    return X

Skorch Classifier

  • NeuralNetClassifier 클래스를 PyTorch 클래스와 연동한다.
  • Optimizer 기본값인 SGD를 사용한다. 만약 다른 Optimizer로 변경을 원하면 다음 링크에서 확인한다.
  • Sktorch 5 폴드 교차검증을 수행한다.
    • 학습 데이터는 80%, 나머지 20%는 검증 데이터로 활용한다.
from skorch import NeuralNetClassifier 

net = NeuralNetClassifier(
    module = Net, 
    max_epochs = 30, 
    lr = 0.1, 
    batch_size = 32, 
    train_split = None
)

PyCaret과 신경망 학습 방법

  • SKORCH NN model을 초기화 했다면, 이번에는 PyCaret과 함께 모델을 학습할 수 있다.
  • PyCaret은 기본적으로 Pandas DataFrame을 메인 객체로 사용하다.
  • 그런데, sktorch model을 사용하기 위해서는 pipeline을 구성할 때는 DataFrameTransformer() 함수를 사용해야 한다.
from skorch.helper import DataFrameTransformer
import numpy as np
from sklearn.pipeline import Pipeline

nn_pipe = Pipeline(
    [("transform", DataFrameTransformer()), 
     ("net", net), ]
)

PyCaret Setup

  • Skorch API 대신 PyCaret 모델을 사용해본다.
  • log_experimentTrue를 사용하게 되면 MLFlow를 사용할 수 있다.
  • silentTrue인 경우 중간에 발생하는 press enter to continue 입력 단계를 피할 수 있다.
from pycaret.classification import *
target = "stabf"
clf1 = setup(data = data, 
            target = target,
            train_size = 0.8,
            fold = 5,
            session_id = 123,
            log_experiment = True, 
            experiment_name = 'electrical_grid_1', 
            silent = True)
Description Value
0 session_id 123
1 Target stabf
2 Target Type Binary
3 Label Encoded stable: 0, unstable: 1
4 Original Data (10000, 13)
5 Missing Values False
6 Numeric Features 12
7 Categorical Features 0
8 Ordinal Features False
9 High Cardinality Features False
10 High Cardinality Method None
11 Transformed Train Set (8000, 12)
12 Transformed Test Set (2000, 12)
13 Shuffle Train-Test True
14 Stratify Train-Test False
15 Fold Generator StratifiedKFold
16 Fold Number 5
17 CPU Jobs -1
18 Use GPU False
19 Log Experiment True
20 Experiment Name electrical_grid_1
21 USI 9626
22 Imputation Type simple
23 Iterative Imputation Iteration None
24 Numeric Imputer mean
25 Iterative Imputation Numeric Model None
26 Categorical Imputer constant
27 Iterative Imputation Categorical Model None
28 Unknown Categoricals Handling least_frequent
29 Normalize False
30 Normalize Method None
31 Transformation False
32 Transformation Method None
33 PCA False
34 PCA Method None
35 PCA Components None
36 Ignore Low Variance False
37 Combine Rare Levels False
38 Rare Level Threshold None
39 Numeric Binning False
40 Remove Outliers False
41 Outliers Threshold None
42 Remove Multicollinearity False
43 Multicollinearity Threshold None
44 Remove Perfect Collinearity True
45 Clustering False
46 Clustering Iteration None
47 Polynomial Features False
48 Polynomial Degree None
49 Trignometry Features False
50 Polynomial Threshold None
51 Group Features False
52 Feature Selection False
53 Feature Selection Method classic
54 Features Selection Threshold None
55 Feature Interaction False
56 Feature Ratio False
57 Interaction Threshold None
58 Fix Imbalance False
59 Fix Imbalance Method SMOTE

PyCaret Train Model

  • Random Forest 모델을 사용해본다.
model = create_model("rf")
Accuracy AUC Recall Prec. F1 Kappa MCC
0 0.9244 0.9796 0.9667 0.9189 0.9422 0.8331 0.8353
1 0.9275 0.9793 0.9549 0.9330 0.9438 0.8417 0.8422
2 0.9225 0.9810 0.9608 0.9211 0.9406 0.8294 0.8309
3 0.9081 0.9738 0.9461 0.9130 0.9293 0.7983 0.7993
4 0.9044 0.9738 0.9471 0.9071 0.9267 0.7894 0.7909
Mean 0.9174 0.9775 0.9551 0.9186 0.9365 0.8184 0.8197
SD 0.0093 0.0031 0.0079 0.0087 0.0071 0.0206 0.0206

PyCaret Train Skorch Model

  • 이번에는 Skorch Model을 Pycaret 함수에 넣어서 확인해본다.
skorch_model = create_model(nn_pipe)
Accuracy AUC Recall Prec. F1 Kappa MCC
0 0.8831 0.9644 0.9500 0.8769 0.9120 0.7389 0.7441
1 0.8550 0.9437 0.9569 0.8385 0.8938 0.6685 0.6831
2 0.8369 0.9280 0.9638 0.8146 0.8829 0.6202 0.6446
3 0.8506 0.9347 0.8668 0.8957 0.8810 0.6805 0.6812
4 0.8081 0.9411 0.9765 0.7789 0.8666 0.5400 0.5859
Mean 0.8468 0.9424 0.9428 0.8409 0.8873 0.6496 0.6678
SD 0.0245 0.0123 0.0390 0.0421 0.0151 0.0666 0.0519

Comparing Models

  • 두 모델 중 어떤 모델이 더 좋은지 확인해본다.
best_model = compare_models(include=[skorch_model, model], sort = "AUC")
Model Accuracy AUC Recall Prec. F1 Kappa MCC TT (Sec)
1 Random Forest Classifier 0.9174 0.9775 0.9551 0.9186 0.9365 0.8184 0.8197 2.114
0 NeuralNetClassifier 0.8426 0.9400 0.9547 0.8281 0.8861 0.6355 0.6565 11.878

Hyperparameter Grid

  • Hyperparameter 튜닝을 적용하도록 한다.
  • 모형 튜닝을 위한 parameter 값은 다음 명령어를 통해서 확인할 수 있다.
skorch_model.get_params().keys()
dict_keys(['memory', 'steps', 'verbose', 'transform', 'net', 'transform__float_dtype', 'transform__int_dtype', 'transform__treat_int_as_categorical', 'net__module', 'net__criterion', 'net__optimizer', 'net__lr', 'net__max_epochs', 'net__batch_size', 'net__iterator_train', 'net__iterator_valid', 'net__dataset', 'net__train_split', 'net__callbacks', 'net__predict_nonlinearity', 'net__warm_start', 'net__verbose', 'net__device', 'net___kwargs', 'net__classes', 'net__callbacks__epoch_timer', 'net__callbacks__train_loss', 'net__callbacks__train_loss__name', 'net__callbacks__train_loss__lower_is_better', 'net__callbacks__train_loss__on_train', 'net__callbacks__valid_loss', 'net__callbacks__valid_loss__name', 'net__callbacks__valid_loss__lower_is_better', 'net__callbacks__valid_loss__on_train', 'net__callbacks__valid_acc', 'net__callbacks__valid_acc__scoring', 'net__callbacks__valid_acc__lower_is_better', 'net__callbacks__valid_acc__on_train', 'net__callbacks__valid_acc__name', 'net__callbacks__valid_acc__target_extractor', 'net__callbacks__valid_acc__use_caching', 'net__callbacks__print_log', 'net__callbacks__print_log__keys_ignored', 'net__callbacks__print_log__sink', 'net__callbacks__print_log__tablefmt', 'net__callbacks__print_log__floatfmt', 'net__callbacks__print_log__stralign'])
net.get_params().keys()
dict_keys(['module', 'criterion', 'optimizer', 'lr', 'max_epochs', 'batch_size', 'iterator_train', 'iterator_valid', 'dataset', 'train_split', 'callbacks', 'predict_nonlinearity', 'warm_start', 'verbose', 'device', '_kwargs', 'classes', 'callbacks__epoch_timer', 'callbacks__train_loss', 'callbacks__train_loss__name', 'callbacks__train_loss__lower_is_better', 'callbacks__train_loss__on_train', 'callbacks__valid_loss', 'callbacks__valid_loss__name', 'callbacks__valid_loss__lower_is_better', 'callbacks__valid_loss__on_train', 'callbacks__valid_acc', 'callbacks__valid_acc__scoring', 'callbacks__valid_acc__lower_is_better', 'callbacks__valid_acc__on_train', 'callbacks__valid_acc__name', 'callbacks__valid_acc__target_extractor', 'callbacks__valid_acc__use_caching', 'callbacks__print_log', 'callbacks__print_log__keys_ignored', 'callbacks__print_log__sink', 'callbacks__print_log__tablefmt', 'callbacks__print_log__floatfmt', 'callbacks__print_log__stralign'])
import torch.optim as optim

custom_grid = {
	'net__max_epochs':[20, 30],
	'net__lr': [0.01, 0.05, 0.1],
	'net__module__num_units_d1': [50, 100, 150],
	'net__module__num_units_d2': [50, 100, 150],
	'net__optimizer': [optim.Adam, optim.SGD, optim.RMSprop]
	}
  • 이번에는 hyperparameter 모델을 적용하여 모델을 빠르게 만들어 본다.
tuned_skorch_model = tune_model(skorch_model, custom_grid = custom_grid)
Accuracy AUC Recall Prec. F1 Kappa MCC
0 0.8762 0.9667 0.9686 0.8562 0.9089 0.7182 0.7316
1 0.8675 0.9477 0.8784 0.9106 0.8942 0.7171 0.7179
2 0.8375 0.9452 0.7835 0.9535 0.8602 0.6706 0.6891
3 0.8575 0.9522 0.8208 0.9490 0.8803 0.7066 0.7180
4 0.7975 0.9315 0.9726 0.7704 0.8597 0.5127 0.5602
Mean 0.8472 0.9487 0.8848 0.8879 0.8807 0.6650 0.6834
SD 0.0280 0.0114 0.0763 0.0684 0.0192 0.0781 0.0631

References

  1. https://pycaret.org/
  2. https://www.analyticsvidhya.com/blog/2020/05/pycaret-machine-learning-model-seconds/
  3. https://github.com/skorch-dev/skorch
  4. https://towardsdatascience.com/skorch-pytorch-models-trained-with-a-scikit-learn-wrapper-62b9a154623e
  5. https://towardsdatascience.com/pycaret-skorch-build-pytorch-neural-networks-using-minimal-code-57079e197f33