Ranzcr

RANZCR EDA Tutorial

Course Promotion

Competition

Intro

import os

import pandas as pd

from matplotlib import pyplot as plt
import seaborn as sns

Check File Size

  • Check the size of each dataset folder in this competition
    • train_tfrecords = about 4.5GB
    • test_tfrecords = about 0.5GB
    • train (image data) = about 6.5GB
    • test (image data) = about 0.8GB
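import os

def get_folder_size(file_directory):
    # Walk bottom-up so each subfolder's total is known before its parent is visited.
    dir_sizes = {}
    for root, dirs, files in os.walk(file_directory, topdown=False):
        # Size of the entries directly inside this folder (files and directory entries).
        size = sum(os.path.getsize(os.path.join(root, name)) for name in files + dirs)
        # Add the totals already computed for the subfolders.
        size += sum(dir_sizes.get(os.path.join(root, name), 0) for name in dirs)
        dir_sizes[root] = size
        # Report the size in MB (1 MB = 2**20 bytes), rounded to the nearest integer.
        print("{} is {} MB".format(root, round(size / 2**20)))

base_dir = '../input/ranzcr-clip-catheter-line-classification'
get_folder_size(base_dir)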
../input/ranzcr-clip-catheter-line-classification/test is 805 MB
../input/ranzcr-clip-catheter-line-classification/test_tfrecords is 555 MB
../input/ranzcr-clip-catheter-line-classification/train_tfrecords is 4563 MB
../input/ranzcr-clip-catheter-line-classification/train is 6592 MB
../input/ranzcr-clip-catheter-line-classification is 12524 MB
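The MB figures printed above can be converted to GB for easier reading. The following is a minimal sketch, not part of the original notebook: the helper name `mb_to_gb` is hypothetical, the sizes are copied by hand from the output above, and 1 GB is assumed to be 1024 MB.

```python
# Folder sizes in MB, copied by hand from the output printed above.
folder_sizes_mb = {
    "test": 805,
    "test_tfrecords": 555,
    "train_tfrecords": 4563,
    "train": 6592,
}

def mb_to_gb(size_mb):
    """Convert MB to GB, assuming 1 GB = 1024 MB (hypothetical helper)."""
    return round(size_mb / 1024, 2)

for name, size_mb in folder_sizes_mb.items():
    print(f"{name}: {size_mb} MB is about {mb_to_gb(size_mb)} GB")
```

The image folders and the TFRecord folders hold the same data in two formats, so only one of the two usually needs to be read during training.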

Check train file

  • Let’s describe the train data
train = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train.csv', index_col = 0)
test = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/sample_submission.csv', index_col = 0)
display(train.head())
display(test.head())
StudyInstanceUID | ETT - Abnormal | ETT - Borderline | ETT - Normal | NGT - Abnormal | NGT - Borderline | NGT - Incompletely Imaged | NGT - Normal | CVC - Abnormal | CVC - Borderline | CVC - Normal | Swan Ganz Catheter Present | PatientID
1.2.826.0.1.3680043.8.498.26697628953273228189375557799582420561 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ec89415d1
1.2.826.0.1.3680043.8.498.46302891597398758759818628675365157729 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | bf4c6da3c
1.2.826.0.1.3680043.8.498.23819260719748494858948050424870692577 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3fc1c97e5
1.2.826.0.1.3680043.8.498.68286643202323212801283518367144358744 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | c31019814
1.2.826.0.1.3680043.8.498.10050203009225938259119000528814762175 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 207685cd1
StudyInstanceUID | ETT - Abnormal | ETT - Borderline | ETT - Normal | NGT - Abnormal | NGT - Borderline | NGT - Incompletely Imaged | NGT - Normal | CVC - Abnormal | CVC - Borderline | CVC - Normal | Swan Ganz Catheter Present
1.2.826.0.1.3680043.8.498.46923145579096002617106567297135160932 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1.2.826.0.1.3680043.8.498.84006870182611080091824109767561564887 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1.2.826.0.1.3680043.8.498.12219033294413119947515494720687541672 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1.2.826.0.1.3680043.8.498.84994474380235968109906845540706092671 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1.2.826.0.1.3680043.8.498.35798987793805669662572108881745201372 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
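The two outputs above differ only in the PatientID column: train.csv has it and sample_submission.csv does not. The sketch below checks this with the column names typed in by hand from those headers. The helper `label_columns` is hypothetical and not part of the original notebook; in the notebook itself, `train.columns` and `test.columns` could be passed instead of the hand-written lists.

```python
# Column names copied from the two headers displayed above.
submission_columns = [
    "ETT - Abnormal", "ETT - Borderline", "ETT - Normal",
    "NGT - Abnormal", "NGT - Borderline", "NGT - Incompletely Imaged",
    "NGT - Normal",
    "CVC - Abnormal", "CVC - Borderline", "CVC - Normal",
    "Swan Ganz Catheter Present",
]
train_columns = submission_columns + ["PatientID"]

def label_columns(columns, id_columns=("PatientID",)):
    """Return the prediction targets: every column that is not an ID column."""
    return [col for col in columns if col not in id_columns]

targets = label_columns(train_columns)

# The targets in train.csv should match the submission columns, in the same order.
print(len(targets), "target columns; match submission:", targets == submission_columns)
```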

Definitions of Variables

  • What’s inside the data?
    • StudyInstanceUID - unique ID for each image
    • ETT - Abnormal - endotracheal tube placement abnormal
    • ETT - Borderline - endotracheal tube placement borderline abnormal
    • ETT - Normal - endotracheal tube placement normal
    • NGT - Abnormal - nasogastric tube placement abnormal
    • NGT - Borderline - nasogastric tube placement borderline abnormal
    • NGT - Incompletely Imaged - nasogastric tube placement inconclusive due to imaging
    • NGT - Normal - nasogastric tube placement normal
    • CVC - Abnormal - central venous catheter placement abnormal
    • CVC - Borderline - central venous catheter placement borderline abnormal
    • CVC - Normal - central venous catheter placement normal
    • Swan Ganz Catheter Present - a Swan-Ganz (pulmonary artery) catheter is present in the image
    • PatientID - unique ID for each patient in the dataset
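Given these definitions, a natural next step is to count positives per label, labels per image, and images per patient. The sketch below is illustrative only: it rebuilds the five `train.head()` rows shown earlier as a toy DataFrame (UIDs omitted for brevity), so the counts describe those five rows and not the full dataset. The variable names are assumptions, not taken from the original notebook.

```python
import pandas as pd

label_cols = [
    "ETT - Abnormal", "ETT - Borderline", "ETT - Normal",
    "NGT - Abnormal", "NGT - Borderline", "NGT - Incompletely Imaged",
    "NGT - Normal",
    "CVC - Abnormal", "CVC - Borderline", "CVC - Normal",
    "Swan Ganz Catheter Present",
]

# Toy sample: the five rows of train.head() shown earlier (UIDs omitted).
values = [
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
]
sample = pd.DataFrame(values, columns=label_cols)
sample["PatientID"] = ["ec89415d1", "bf4c6da3c", "3fc1c97e5", "c31019814", "207685cd1"]

# Number of positive images per label (column-wise sum of the 0/1 flags).
positives_per_label = sample[label_cols].sum()

# Number of labels set per image; values above 1 show the task is multi-label.
labels_per_image = sample[label_cols].sum(axis=1)

# Number of images per patient; on the full data a patient may have several.
images_per_patient = sample["PatientID"].value_counts()

print(positives_per_label)
print(labels_per_image.tolist())
print(images_per_patient.max())
```

On the full data, the same three lines can be run on `train` directly. The images-per-patient count is relevant when building validation splits, since images from one patient may need to stay in the same fold.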