Competition
Intro
import os
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
Check File Size
- Check the size of each dataset folder in this competition
- train_tfrecords = 4.5GB
- test_tfrecords = 0.5GB
- train (image data) = 6.5GB
- test (image data) = 0.8GB
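import os

def get_folder_size(file_directory):
    # Walk bottom-up so every subdirectory's total is known before its parent's
    dir_sizes = {}
    for root, dirs, files in os.walk(file_directory, topdown=False):
        # Size of the entries that sit directly inside this directory
        size = sum(os.path.getsize(os.path.join(root, name)) for name in files + dirs)
        # Add the totals already accumulated for the subdirectories
        size += sum(dir_sizes[os.path.join(root, name)] for name in dirs)
        dir_sizes[root] = size
        # Report the size in whole megabytes
        print("{} is {} MB".format(root, round(size / 2**20)))

base_dir = '../input/ranzcr-clip-catheter-line-classification'
get_folder_size(base_dir)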
../input/ranzcr-clip-catheter-line-classification/test is 805 MB
../input/ranzcr-clip-catheter-line-classification/test_tfrecords is 555 MB
../input/ranzcr-clip-catheter-line-classification/train_tfrecords is 4563 MB
../input/ranzcr-clip-catheter-line-classification/train is 6592 MB
../input/ranzcr-clip-catheter-line-classification is 12524 MB
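If the sizes are needed as values rather than printed lines, the same check can be sketched with `pathlib`. This is only an illustrative variant: the helper name `folder_sizes_mb` is hypothetical, and it counts file sizes only (not directory entries), so its totals may differ slightly from the numbers printed above.

```python
from pathlib import Path

def folder_sizes_mb(base_dir):
    """Return {subfolder name: total size of its files in MB} for each direct subfolder."""
    sizes = {}
    for child in Path(base_dir).iterdir():
        if child.is_dir():
            # rglob("*") visits everything below the subfolder; keep regular files only
            total_bytes = sum(p.stat().st_size for p in child.rglob("*") if p.is_file())
            sizes[child.name] = total_bytes / 2**20
    return sizes

# Example call (assumes the competition data is mounted at this path):
# folder_sizes_mb('../input/ranzcr-clip-catheter-line-classification')
```

Returning a dictionary makes it easy to sort the folders by size or feed the values into a plot.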
Check train file
train = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train.csv', index_col = 0)
test = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/sample_submission.csv', index_col = 0)
display(train.head())
display(test.head())
| StudyInstanceUID | ETT - Abnormal | ETT - Borderline | ETT - Normal | NGT - Abnormal | NGT - Borderline | NGT - Incompletely Imaged | NGT - Normal | CVC - Abnormal | CVC - Borderline | CVC - Normal | Swan Ganz Catheter Present | PatientID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.2.826.0.1.3680043.8.498.26697628953273228189375557799582420561 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ec89415d1 |
| 1.2.826.0.1.3680043.8.498.46302891597398758759818628675365157729 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | bf4c6da3c |
| 1.2.826.0.1.3680043.8.498.23819260719748494858948050424870692577 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3fc1c97e5 |
| 1.2.826.0.1.3680043.8.498.68286643202323212801283518367144358744 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | c31019814 |
| 1.2.826.0.1.3680043.8.498.10050203009225938259119000528814762175 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 207685cd1 |

| StudyInstanceUID | ETT - Abnormal | ETT - Borderline | ETT - Normal | NGT - Abnormal | NGT - Borderline | NGT - Incompletely Imaged | NGT - Normal | CVC - Abnormal | CVC - Borderline | CVC - Normal | Swan Ganz Catheter Present |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.2.826.0.1.3680043.8.498.46923145579096002617106567297135160932 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2.826.0.1.3680043.8.498.84006870182611080091824109767561564887 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2.826.0.1.3680043.8.498.12219033294413119947515494720687541672 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2.826.0.1.3680043.8.498.84994474380235968109906845540706092671 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2.826.0.1.3680043.8.498.35798987793805669662572108881745201372 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Definitions of Variables
- What’s inside the data?
- StudyInstanceUID - unique ID for each image
- ETT - Abnormal - endotracheal tube placement abnormal
- ETT - Borderline - endotracheal tube placement borderline abnormal
- ETT - Normal - endotracheal tube placement normal
- NGT - Abnormal - nasogastric tube placement abnormal
- NGT - Borderline - nasogastric tube placement borderline abnormal
- NGT - Incompletely Imaged - nasogastric tube placement inconclusive due to imaging
- NGT - Normal - nasogastric tube placement normal
- CVC - Abnormal - central venous catheter placement abnormal
- CVC - Borderline - central venous catheter placement borderline abnormal
- CVC - Normal - central venous catheter placement normal
- Swan Ganz Catheter Present - whether a Swan-Ganz (pulmonary artery) catheter is present in the image
- PatientID - unique ID for each patient in the dataset
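The label columns above fall into groups by device (ETT, NGT, CVC) plus the single Swan Ganz flag. A minimal sketch of that grouping, using only the column names listed above, might look like the following. The helper `group_labels_by_device` is a hypothetical name introduced for illustration, and it assumes the device prefix is the text before `" - "`.

```python
# Label columns as listed above (StudyInstanceUID and PatientID are identifiers, not labels)
LABEL_COLUMNS = [
    "ETT - Abnormal", "ETT - Borderline", "ETT - Normal",
    "NGT - Abnormal", "NGT - Borderline", "NGT - Incompletely Imaged", "NGT - Normal",
    "CVC - Abnormal", "CVC - Borderline", "CVC - Normal",
    "Swan Ganz Catheter Present",
]

def group_labels_by_device(columns):
    """Map each device prefix (the text before ' - ') to its label columns."""
    groups = {}
    for col in columns:
        # A column without ' - ' (the Swan Ganz flag) forms its own group
        device = col.split(" - ")[0]
        groups.setdefault(device, []).append(col)
    return groups

groups = group_labels_by_device(LABEL_COLUMNS)
for device, cols in groups.items():
    print(device, len(cols))
```

With the `train` DataFrame loaded earlier, something like `train[groups["CVC"]].sum()` would then give the number of positive labels per CVC column.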