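Overview
- Scikit-Learn's Pipeline is powerful: it chains preprocessing steps and a model into a single estimator.
- It can also be used with PyCaret and Skorch.
- Let's try it in Google Colab.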
Installing the Required Libraries
- After installing pycaret, be sure to click "Restart runtime".
Installation output (abridged):
```
Collecting pycaret
Downloading pycaret-2.3.5-py3-none-any.whl (288 kB)
...
Successfully installed Boruta-0.3 Mako-1.1.6 PyYAML-6.0 alembic-1.4.1 databricks-cli-0.16.2 docker-5.0.3 funcy-1.17 gitdb-4.0.9 gitpython-3.1.24 gunicorn-20.1.0 htmlmin-0.1.12 imagehash-4.2.1 imbalanced-learn-0.7.0 joblib-1.0.1 kmodes-0.11.1 lightgbm-3.3.1 mlflow-1.22.0 mlxtend-0.19.0 multimethod-1.6 pandas-profiling-3.1.0 phik-0.12.0 prometheus-flask-exporter-0.18.7 pyLDAvis-3.2.2 pycaret-2.3.5 pydantic-1.8.2 pynndescent-0.5.5 pyod-0.9.6 python-editor-1.0.4 querystring-parser-1.2.4 requests-2.26.0 scikit-learn-0.23.2 scikit-plot-0.3.7 scipy-1.5.4 smmap-5.0.0 tangled-up-in-unicode-0.1.0 umap-learn-0.5.2 visions-0.7.4 websocket-client-1.2.3
Requirement already satisfied: skorch in /usr/local/lib/python3.7/dist-packages (0.11.0)
Requirement already satisfied: tabulate>=0.7.7 in /usr/local/lib/python3.7/dist-packages (from skorch) (0.8.9)
Requirement already satisfied: scikit-learn>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from skorch) (0.23.2)
Requirement already satisfied: tqdm>=4.14.0 in /usr/local/lib/python3.7/dist-packages (from skorch) (4.62.3)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.7/dist-packages (from skorch) (1.19.5)
Requirement already satisfied: scipy>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from skorch) (1.5.4)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.19.1->skorch) (1.0.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.19.1->skorch) (3.0.0)
```
```python
from pycaret.datasets import get_data

data = get_data("electrical_grid")
```
|  | tau1 | tau2 | tau3 | tau4 | p1 | p2 | p3 | p4 | g1 | g2 | g3 | g4 | stabf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.959060 | 3.079885 | 8.381025 | 9.780754 | 3.763085 | -0.782604 | -1.257395 | -1.723086 | 0.650456 | 0.859578 | 0.887445 | 0.958034 | unstable |
| 1 | 9.304097 | 4.902524 | 3.047541 | 1.369357 | 5.067812 | -1.940058 | -1.872742 | -1.255012 | 0.413441 | 0.862414 | 0.562139 | 0.781760 | stable |
| 2 | 8.971707 | 8.848428 | 3.046479 | 1.214518 | 3.405158 | -1.207456 | -1.277210 | -0.920492 | 0.163041 | 0.766689 | 0.839444 | 0.109853 | unstable |
| 3 | 0.716415 | 7.669600 | 4.486641 | 2.340563 | 3.963791 | -1.027473 | -1.938944 | -0.997374 | 0.446209 | 0.976744 | 0.929381 | 0.362718 | unstable |
| 4 | 3.134112 | 7.608772 | 4.943759 | 9.857573 | 3.525811 | -1.125531 | -1.845975 | -0.554305 | 0.797110 | 0.455450 | 0.656947 | 0.820923 | unstable |
PyTorch Model
- The skorch library works with PyTorch models.
- Define a class that builds the MLP model.
```python
import torch.nn as nn


class Net(nn.Module):
    """MLP with two hidden layers for the 12 numeric grid features."""

    def __init__(self, num_inputs=12, num_units_d1=200, num_units_d2=100):
        super(Net, self).__init__()
        self.dense0 = nn.Linear(num_inputs, num_units_d1)
        self.nonlin = nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.dense1 = nn.Linear(num_units_d1, num_units_d2)
        self.output = nn.Linear(num_units_d2, 2)  # two classes: stable / unstable
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, X, **kwargs):
        # **kwargs absorbs any extra inputs skorch passes as separate keys
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = self.nonlin(self.dense1(X))
        X = self.softmax(self.output(X))  # class probabilities
        return X
```
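As a quick sanity check on model size, you can work out the number of trainable parameters in `Net` from the layer widths. Each `nn.Linear(n_in, n_out)` holds `n_in * n_out` weights plus `n_out` biases. ReLU, Dropout and Softmax add no parameters. The plain-Python sketch below does this arithmetic for the default widths 12 → 200 → 100 → 2. `count_linear_params` is a hypothetical helper, not part of PyTorch or skorch.

```python
def count_linear_params(widths):
    """Count weights + biases for a chain of fully connected layers.

    widths lists the layer sizes, e.g. [12, 200, 100, 2] for Net's defaults.
    (Hypothetical helper for illustration only.)
    """
    total = 0
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


print(count_linear_params([12, 200, 100, 2]))  # 22902
print(count_linear_params([12, 50, 50, 2]))    # 3302
```

With PyTorch installed, `sum(p.numel() for p in Net().parameters())` should report the same 22,902. The second call shows how much smaller the network gets with the narrowest hidden sizes used in the tuning grid later in this post.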
Skorch Classifier
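- Connect the NeuralNetClassifier class to the PyTorch class by passing it as `module`.
- Use the default optimizer, SGD. To use a different optimizer, pass it through the `optimizer` argument (see the skorch documentation).
- 5-fold cross-validation is run by PyCaret (`fold=5` in `setup`), so `train_split=None` turns off skorch's own internal validation split.
- 80% of the data is used for training and the remaining 20% is held out as a test set (`train_size=0.8` in `setup`).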
```python
from skorch import NeuralNetClassifier

net = NeuralNetClassifier(
    module=Net,         # optimizer defaults to torch.optim.SGD
    max_epochs=30,
    lr=0.1,
    batch_size=32,
    train_split=None,   # no internal validation split; PyCaret handles CV
)
```
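Why can `Net` end in `nn.Softmax` when training needs a log-likelihood loss? `NeuralNetClassifier` uses `torch.nn.NLLLoss` as its default criterion. With that criterion, skorch takes the log of the module's output before computing the loss, so it expects probabilities. The plain-Python sketch below reproduces the computation for a small batch, with no PyTorch involved: softmax, then log, then the mean negative log-probability of the true class. The function names and the toy numbers are illustrative assumptions.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def nll_from_probs(prob_rows, targets):
    # Mean of -log p(true class), i.e. NLLLoss applied to log(probabilities).
    losses = [-math.log(probs[t]) for probs, t in zip(prob_rows, targets)]
    return sum(losses) / len(losses)


logits = [[2.0, 0.0], [0.5, 1.5]]  # raw scores: 2 samples, 2 classes
targets = [0, 1]                   # index of the true class per sample
probs = [softmax(row) for row in logits]
loss = nll_from_probs(probs, targets)
print(round(loss, 4))  # 0.2201
```

A common alternative is to drop the Softmax layer and pass `criterion=torch.nn.CrossEntropyLoss`. That loss applies log-softmax to the raw scores itself.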
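Training the Neural Network with PyCaret
- Once the skorch NN model is initialized, it can be trained with PyCaret.
- PyCaret uses a pandas DataFrame as its main data object.
- To use the skorch model, however, the pipeline must start with a `DataFrameTransformer()` step. This step converts the DataFrame into a dict of NumPy arrays: all numeric columns are combined under the key `X` and cast to float32 by default, which matches `Net.forward(X, ...)`.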
```python
from skorch.helper import DataFrameTransformer
from sklearn.pipeline import Pipeline

# DataFrame -> dict of arrays -> skorch network
nn_pipe = Pipeline([
    ("transform", DataFrameTransformer()),
    ("net", net),
])
```
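The sketch below is a simplified, pure-Python imitation of `DataFrameTransformer` that shows what the `transform` step hands to the network. It assumes that numeric columns are stacked row-wise into one feature matrix under the key `"X"`, and that each categorical column is passed as its own key. `mimic_dataframe_transformer` is a hypothetical helper written for this post, not skorch's implementation. The real class works on a pandas DataFrame and also casts dtypes. The grid data has only numeric features, so `Net.forward` receives just `X` and `**kwargs` stays empty.

```python
def mimic_dataframe_transformer(columns, categorical=()):
    """Simplified stand-in for skorch's DataFrameTransformer (illustration only).

    columns: dict mapping column name -> list of values (all the same length).
    categorical: names of columns to pass through as separate keys.
    """
    numeric_names = [name for name in columns if name not in categorical]
    n_rows = len(next(iter(columns.values())))
    # Stack the numeric columns into one row-major matrix under the key "X".
    out = {"X": [[float(columns[name][i]) for name in numeric_names]
                 for i in range(n_rows)]}
    # Each categorical column becomes its own entry (received via **kwargs).
    for name in categorical:
        out[name] = list(columns[name])
    return out


sample = {"tau1": [2.959060, 9.304097], "p1": [3.763085, 5.067812]}
result = mimic_dataframe_transformer(sample)
print(result)  # {'X': [[2.95906, 3.763085], [9.304097, 5.067812]]}
```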
PyCaret Setup
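- Instead of calling the skorch API directly, run the experiment through PyCaret.
- Setting `log_experiment=True` logs the experiment to MLflow.
- Setting `silent=True` skips the interactive "press enter to continue" step that asks you to confirm the inferred data types.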
```python
from pycaret.classification import *

target = "stabf"
clf1 = setup(data=data,
             target=target,
             train_size=0.8,
             fold=5,
             session_id=123,
             log_experiment=True,
             experiment_name='electrical_grid_1',
             silent=True)
```
|  | Description | Value |
|---|---|---|
| 0 | session_id | 123 |
| 1 | Target | stabf |
| 2 | Target Type | Binary |
| 3 | Label Encoded | stable: 0, unstable: 1 |
| 4 | Original Data | (10000, 13) |
| 5 | Missing Values | False |
| 6 | Numeric Features | 12 |
| 7 | Categorical Features | 0 |
| 8 | Ordinal Features | False |
| 9 | High Cardinality Features | False |
| 10 | High Cardinality Method | None |
| 11 | Transformed Train Set | (8000, 12) |
| 12 | Transformed Test Set | (2000, 12) |
| 13 | Shuffle Train-Test | True |
| 14 | Stratify Train-Test | False |
| 15 | Fold Generator | StratifiedKFold |
| 16 | Fold Number | 5 |
| 17 | CPU Jobs | -1 |
| 18 | Use GPU | False |
| 19 | Log Experiment | True |
| 20 | Experiment Name | electrical_grid_1 |
| 21 | USI | 9626 |
| 22 | Imputation Type | simple |
| 23 | Iterative Imputation Iteration | None |
| 24 | Numeric Imputer | mean |
| 25 | Iterative Imputation Numeric Model | None |
| 26 | Categorical Imputer | constant |
| 27 | Iterative Imputation Categorical Model | None |
| 28 | Unknown Categoricals Handling | least_frequent |
| 29 | Normalize | False |
| 30 | Normalize Method | None |
| 31 | Transformation | False |
| 32 | Transformation Method | None |
| 33 | PCA | False |
| 34 | PCA Method | None |
| 35 | PCA Components | None |
| 36 | Ignore Low Variance | False |
| 37 | Combine Rare Levels | False |
| 38 | Rare Level Threshold | None |
| 39 | Numeric Binning | False |
| 40 | Remove Outliers | False |
| 41 | Outliers Threshold | None |
| 42 | Remove Multicollinearity | False |
| 43 | Multicollinearity Threshold | None |
| 44 | Remove Perfect Collinearity | True |
| 45 | Clustering | False |
| 46 | Clustering Iteration | None |
| 47 | Polynomial Features | False |
| 48 | Polynomial Degree | None |
| 49 | Trignometry Features | False |
| 50 | Polynomial Threshold | None |
| 51 | Group Features | False |
| 52 | Feature Selection | False |
| 53 | Feature Selection Method | classic |
| 54 | Features Selection Threshold | None |
| 55 | Feature Interaction | False |
| 56 | Feature Ratio | False |
| 57 | Interaction Threshold | None |
| 58 | Fix Imbalance | False |
| 59 | Fix Imbalance Method | SMOTE |
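The shapes in this table follow directly from the `setup` arguments. `train_size=0.8` on 10,000 rows leaves 8,000 training rows and 2,000 hold-out rows. `fold=5` then splits the 8,000 training rows, so each CV round fits on 6,400 rows and validates on 1,600. The sketch below reproduces that arithmetic in plain Python. `split_sizes` is a hypothetical helper, and it ignores stratification, so treat it as an approximation of what PyCaret and `StratifiedKFold` actually do.

```python
def split_sizes(n_rows, train_size, n_folds):
    """Approximate row counts for a holdout split followed by k-fold CV."""
    n_train = int(round(n_rows * train_size))
    n_test = n_rows - n_train
    n_valid_per_fold = n_train // n_folds        # rows scored in each fold
    n_fit_per_fold = n_train - n_valid_per_fold  # rows the model is fit on
    return n_train, n_test, n_fit_per_fold, n_valid_per_fold


print(split_sizes(10_000, 0.8, 5))  # (8000, 2000, 6400, 1600)
```

With `batch_size=32`, each of those fits sees 6,400 / 32 = 200 mini-batches per epoch.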
PyCaret Train Model
```python
model = create_model("rf")
```
|  | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9244 | 0.9796 | 0.9667 | 0.9189 | 0.9422 | 0.8331 | 0.8353 |
| 1 | 0.9275 | 0.9793 | 0.9549 | 0.9330 | 0.9438 | 0.8417 | 0.8422 |
| 2 | 0.9225 | 0.9810 | 0.9608 | 0.9211 | 0.9406 | 0.8294 | 0.8309 |
| 3 | 0.9081 | 0.9738 | 0.9461 | 0.9130 | 0.9293 | 0.7983 | 0.7993 |
| 4 | 0.9044 | 0.9738 | 0.9471 | 0.9071 | 0.9267 | 0.7894 | 0.7909 |
| Mean | 0.9174 | 0.9775 | 0.9551 | 0.9186 | 0.9365 | 0.8184 | 0.8197 |
| SD | 0.0093 | 0.0031 | 0.0079 | 0.0087 | 0.0071 | 0.0206 | 0.0206 |
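The `Mean` and `SD` rows summarize the five fold scores. The sketch below recomputes them from the Accuracy column with Python's `statistics` module. The reported SD matches the population standard deviation, which divides by n rather than n − 1. This is inferred from the numbers, not taken from PyCaret's source.

```python
import statistics

# Random forest accuracy on folds 0-4, copied from the results table
fold_accuracy = [0.9244, 0.9275, 0.9225, 0.9081, 0.9044]

mean_acc = statistics.mean(fold_accuracy)
pop_sd = statistics.pstdev(fold_accuracy)    # divides by n
sample_sd = statistics.stdev(fold_accuracy)  # divides by n - 1

print(round(mean_acc, 4), round(pop_sd, 4), round(sample_sd, 4))  # 0.9174 0.0093 0.0104
```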
PyCaret Train Skorch Model
- This time, pass the skorch model (the `nn_pipe` pipeline) to PyCaret's `create_model` function.
```python
skorch_model = create_model(nn_pipe)
```
|  | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.8831 | 0.9644 | 0.9500 | 0.8769 | 0.9120 | 0.7389 | 0.7441 |
| 1 | 0.8550 | 0.9437 | 0.9569 | 0.8385 | 0.8938 | 0.6685 | 0.6831 |
| 2 | 0.8369 | 0.9280 | 0.9638 | 0.8146 | 0.8829 | 0.6202 | 0.6446 |
| 3 | 0.8506 | 0.9347 | 0.8668 | 0.8957 | 0.8810 | 0.6805 | 0.6812 |
| 4 | 0.8081 | 0.9411 | 0.9765 | 0.7789 | 0.8666 | 0.5400 | 0.5859 |
| Mean | 0.8468 | 0.9424 | 0.9428 | 0.8409 | 0.8873 | 0.6496 | 0.6678 |
| SD | 0.0245 | 0.0123 | 0.0390 | 0.0421 | 0.0151 | 0.0666 | 0.0519 |
Comparing Models
- Check which of the two models performs better, ranked by AUC.
```python
best_model = compare_models(include=[skorch_model, model], sort="AUC")
```
|  | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Random Forest Classifier | 0.9174 | 0.9775 | 0.9551 | 0.9186 | 0.9365 | 0.8184 | 0.8197 | 2.114 |
| 0 | NeuralNetClassifier | 0.8426 | 0.9400 | 0.9547 | 0.8281 | 0.8861 | 0.6355 | 0.6565 | 11.878 |
Hyperparameter Grid
- Next, apply hyperparameter tuning.
- The parameter names available for tuning can be listed with the following command:
```python
skorch_model.get_params().keys()
```
```
dict_keys(['memory', 'steps', 'verbose', 'transform', 'net', 'transform__float_dtype', 'transform__int_dtype', 'transform__treat_int_as_categorical', 'net__module', 'net__criterion', 'net__optimizer', 'net__lr', 'net__max_epochs', 'net__batch_size', 'net__iterator_train', 'net__iterator_valid', 'net__dataset', 'net__train_split', 'net__callbacks', 'net__predict_nonlinearity', 'net__warm_start', 'net__verbose', 'net__device', 'net___kwargs', 'net__classes', 'net__callbacks__epoch_timer', 'net__callbacks__train_loss', 'net__callbacks__train_loss__name', 'net__callbacks__train_loss__lower_is_better', 'net__callbacks__train_loss__on_train', 'net__callbacks__valid_loss', 'net__callbacks__valid_loss__name', 'net__callbacks__valid_loss__lower_is_better', 'net__callbacks__valid_loss__on_train', 'net__callbacks__valid_acc', 'net__callbacks__valid_acc__scoring', 'net__callbacks__valid_acc__lower_is_better', 'net__callbacks__valid_acc__on_train', 'net__callbacks__valid_acc__name', 'net__callbacks__valid_acc__target_extractor', 'net__callbacks__valid_acc__use_caching', 'net__callbacks__print_log', 'net__callbacks__print_log__keys_ignored', 'net__callbacks__print_log__sink', 'net__callbacks__print_log__tablefmt', 'net__callbacks__print_log__floatfmt', 'net__callbacks__print_log__stralign'])
```
For comparison, the parameters of the `net` step on its own have the same names without the `net__` prefix:
```
dict_keys(['module', 'criterion', 'optimizer', 'lr', 'max_epochs', 'batch_size', 'iterator_train', 'iterator_valid', 'dataset', 'train_split', 'callbacks', 'predict_nonlinearity', 'warm_start', 'verbose', 'device', '_kwargs', 'classes', 'callbacks__epoch_timer', 'callbacks__train_loss', 'callbacks__train_loss__name', 'callbacks__train_loss__lower_is_better', 'callbacks__train_loss__on_train', 'callbacks__valid_loss', 'callbacks__valid_loss__name', 'callbacks__valid_loss__lower_is_better', 'callbacks__valid_loss__on_train', 'callbacks__valid_acc', 'callbacks__valid_acc__scoring', 'callbacks__valid_acc__lower_is_better', 'callbacks__valid_acc__on_train', 'callbacks__valid_acc__name', 'callbacks__valid_acc__target_extractor', 'callbacks__valid_acc__use_caching', 'callbacks__print_log', 'callbacks__print_log__keys_ignored', 'callbacks__print_log__sink', 'callbacks__print_log__tablefmt', 'callbacks__print_log__floatfmt', 'callbacks__print_log__stralign'])
```
- Constructor arguments of the PyTorch module are not listed here. However, skorch forwards any `module__`-prefixed parameter to `Net`'s constructor, so the grid below can tune `net__module__num_units_d1` and `net__module__num_units_d2`.
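The double-underscore names follow scikit-learn's nested-parameter convention. A Pipeline splits each key at the first `__`: the part before it names the step, and the rest is handed to that step's `set_params`. In `net__module__num_units_d1`, the step `net` receives `module__num_units_d1`, and skorch in turn routes it to the module's constructor. The plain-Python sketch below imitates only the first, Pipeline-level split. `route_params` is a hypothetical helper, not a scikit-learn function.

```python
def route_params(params):
    """Group 'step__rest' keys by step, the way a Pipeline forwards set_params.

    Only the first '__' is split; the remainder goes to that step, which may
    split it again (skorch routes 'module__*' to the PyTorch module).
    Hypothetical helper for illustration only.
    """
    routed = {}
    for key, value in params.items():
        step, _, rest = key.partition("__")
        routed.setdefault(step, {})[rest] = value
    return routed


params = {
    "net__lr": 0.05,
    "net__module__num_units_d1": 150,
    "transform__float_dtype": "float32",
}
print(route_params(params))
# {'net': {'lr': 0.05, 'module__num_units_d1': 150}, 'transform': {'float_dtype': 'float32'}}
```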
```python
import torch.optim as optim

custom_grid = {
    'net__max_epochs': [20, 30],
    'net__lr': [0.01, 0.05, 0.1],
    'net__module__num_units_d1': [50, 100, 150],
    'net__module__num_units_d2': [50, 100, 150],
    'net__optimizer': [optim.Adam, optim.SGD, optim.RMSprop],
}
```
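This grid has 2 × 3 × 3 × 3 × 3 = 162 combinations. As far as I know, PyCaret 2.x's `tune_model` defaults to a scikit-learn random search with `n_iter=10` and optimizes Accuracy unless `optimize` is set. Under that assumption, it evaluates only 10 sampled combinations, each with the 5-fold CV from `setup`. That comes to roughly 50 network fits instead of 810. The sketch below counts the grid and draws 10 distinct combinations the way a random search would. It is an illustration, not PyCaret's internal code: the optimizer classes are replaced by strings so it runs without PyTorch, and `sample_grid` is a hypothetical helper.

```python
import itertools
import random

# Same grid as custom_grid, with optimizer classes replaced by their names.
grid = {
    "net__max_epochs": [20, 30],
    "net__lr": [0.01, 0.05, 0.1],
    "net__module__num_units_d1": [50, 100, 150],
    "net__module__num_units_d2": [50, 100, 150],
    "net__optimizer": ["Adam", "SGD", "RMSprop"],
}


def sample_grid(grid, n_iter, seed=123):
    """Draw n_iter distinct combinations uniformly from the full grid."""
    keys = list(grid)
    all_combos = list(itertools.product(*(grid[k] for k in keys)))
    rng = random.Random(seed)
    picked = rng.sample(all_combos, n_iter)  # sampling without replacement
    return len(all_combos), [dict(zip(keys, combo)) for combo in picked]


n_total, candidates = sample_grid(grid, n_iter=10)
print(n_total)              # 162
print(len(candidates) * 5)  # 50 fits with 5-fold CV
```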
- Now tune the skorch model over this grid with `tune_model`.
```python
tuned_skorch_model = tune_model(skorch_model, custom_grid=custom_grid)
```
|  | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.8762 | 0.9667 | 0.9686 | 0.8562 | 0.9089 | 0.7182 | 0.7316 |
| 1 | 0.8675 | 0.9477 | 0.8784 | 0.9106 | 0.8942 | 0.7171 | 0.7179 |
| 2 | 0.8375 | 0.9452 | 0.7835 | 0.9535 | 0.8602 | 0.6706 | 0.6891 |
| 3 | 0.8575 | 0.9522 | 0.8208 | 0.9490 | 0.8803 | 0.7066 | 0.7180 |
| 4 | 0.7975 | 0.9315 | 0.9726 | 0.7704 | 0.8597 | 0.5127 | 0.5602 |
| Mean | 0.8472 | 0.9487 | 0.8848 | 0.8879 | 0.8807 | 0.6650 | 0.6834 |
| SD | 0.0280 | 0.0114 | 0.0763 | 0.0684 | 0.0192 | 0.0781 | 0.0631 |