Chapter 1.6 Google Colab with Kaggle

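I. Overview

  • Having briefly covered data visualization and transformation, we move straight on to real-world data.
  • The theory may feel a little thin, but a course cannot teach everything.
    • In the end, you have to study on your own.
  • Keep in mind that the goal of this course is to build a Python portfolio using Kaggle data.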

II. Installing the Kaggle API

  • To use the Kaggle API in Google Colab, run the following code.
!pip install kaggle
Requirement already satisfied: kaggle in /usr/local/lib/python3.6/dist-packages (1.5.6)
Requirement already satisfied: python-slugify in /usr/local/lib/python3.6/dist-packages (from kaggle) (4.0.0)
Requirement already satisfied: certifi in /usr/local/lib/python3.6/dist-packages (from kaggle) (2020.4.5.1)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.6/dist-packages (from kaggle) (2.8.1)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.6/dist-packages (from kaggle) (1.12.0)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from kaggle) (2.23.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from kaggle) (4.41.1)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from kaggle) (1.24.3)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.6/dist-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->kaggle) (2.9)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->kaggle) (3.0.4)
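
The pip log above shows which version is installed, but it can also be checked from Python. The following is a minimal sketch that assumes Python 3.8 or later, because `importlib.metadata` is not available on the Python 3.6 runtime shown in the log. The helper name `installed_version` is made up for this example.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when the package is not installed.
print("kaggle version:", installed_version("kaggle"))
```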

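III. Downloading the Kaggle Token

  • Download an API token from Kaggle.
  • Click [Kaggle]-[My Account]-[API]-[Create New API Token] and a kaggle.json file is downloaded.
  • Move this file to your desktop, run the code below, and select kaggle.json in the upload dialog that appears.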

from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  
# Move kaggle.json into ~/.kaggle/ and restrict it to owner read/write (chmod 600) so the Kaggle CLI can use it.
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
Saving kaggle.json to kaggle.json
uploaded file "kaggle.json" with length 64 bytes
!ls -1ha ~/.kaggle/kaggle.json
/root/.kaggle/kaggle.json
  • If there is no error message, kaggle.json was uploaded successfully.
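
"No error message" only tells us that the file was moved. As a rough sketch, the token can also be checked from Python: the file should be valid JSON with the `username` and `key` fields that Kaggle writes into kaggle.json, and it should not be readable by other users. The function name `check_kaggle_token` is hypothetical and is not part of the Kaggle API.

```python
import json
import os
import stat


def check_kaggle_token(path):
    """Return a list of problems found with the token file (empty list means OK)."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return ["file not found: " + path]

    problems = []
    with open(path, encoding="utf-8") as f:
        try:
            token = json.load(f)
        except json.JSONDecodeError:
            return ["file is not valid JSON"]

    # kaggle.json is expected to hold the account name and the API key.
    for field in ("username", "key"):
        if not isinstance(token, dict) or not token.get(field):
            problems.append("missing field: " + field)

    # chmod 600 means no permission bits are set for group or others.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append("permissions too open: " + oct(mode))
    return problems


print(check_kaggle_token("~/.kaggle/kaggle.json"))
```

An empty list means the checks passed. It does not prove that the key itself is still valid on Kaggle's side.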

IV. Loading Kaggle Data

  • First, list the Kaggle competitions.
!kaggle competitions list
Warning: Looks like you're using an outdated API Version, please consider updating (server 1.5.6 / client 1.5.4)
ref                                               deadline             category            reward  teamCount  userHasEntered  
------------------------------------------------  -------------------  ---------------  ---------  ---------  --------------  
digit-recognizer                                  2030-01-01 00:00:00  Getting Started  Knowledge       3174           False  
titanic                                           2030-01-01 00:00:00  Getting Started  Knowledge      23383            True  
house-prices-advanced-regression-techniques       2030-01-01 00:00:00  Getting Started  Knowledge       5379            True  
connectx                                          2030-01-01 00:00:00  Getting Started  Knowledge        388           False  
nlp-getting-started                               2030-01-01 00:00:00  Getting Started      Kudos       1710            True  
competitive-data-science-predict-future-sales     2020-12-31 23:59:00  Playground           Kudos       7188           False  
siim-isic-melanoma-classification                 2020-08-17 23:59:00  Featured           $30,000        597           False  
global-wheat-detection                            2020-08-04 23:59:00  Research           $15,000        695           False  
open-images-object-detection-rvc-2020             2020-07-31 16:00:00  Playground       Knowledge         22           False  
open-images-instance-segmentation-rvc-2020        2020-07-31 16:00:00  Playground       Knowledge          3           False  
hashcode-photo-slideshow                          2020-07-27 23:59:00  Playground       Knowledge         33           False  
prostate-cancer-grade-assessment                  2020-07-22 23:59:00  Featured           $25,000        526           False  
alaska2-image-steganalysis                        2020-07-20 23:59:00  Research           $25,000        464           False  
halite                                            2020-06-30 23:59:00  Featured             Kudos          0           False  
m5-forecasting-accuracy                           2020-06-30 23:59:00  Featured           $50,000       4708            True  
m5-forecasting-uncertainty                        2020-06-30 23:59:00  Featured           $50,000        566           False  
trends-assessment-prediction                      2020-06-29 23:59:00  Research           $25,000        584           False  
jigsaw-multilingual-toxic-comment-classification  2020-06-22 23:59:00  Featured           $50,000       1257           False  
tweet-sentiment-extraction                        2020-06-16 23:59:00  Featured           $15,000       1839           False  
trec-covid-information-retrieval                  2020-06-03 11:00:00  Research             Kudos         19           False  
  • From this list, download the dataset of the competition you want to join.
  • This basic course practices data wrangling and visualization with the house-prices-advanced-regression-techniques data, so run the code below to download it.
!kaggle competitions download -c house-prices-advanced-regression-techniques
Warning: Looks like you're using an outdated API Version, please consider updating (server 1.5.6 / client 1.5.4)
Downloading train.csv to /content
  0% 0.00/450k [00:00<?, ?B/s]
100% 450k/450k [00:00<00:00, 67.0MB/s]
Downloading data_description.txt to /content
  0% 0.00/13.1k [00:00<?, ?B/s]
100% 13.1k/13.1k [00:00<00:00, 13.8MB/s]
Downloading test.csv to /content
  0% 0.00/441k [00:00<?, ?B/s]
100% 441k/441k [00:00<00:00, 61.5MB/s]
Downloading sample_submission.csv to /content
  0% 0.00/31.2k [00:00<?, ?B/s]
100% 31.2k/31.2k [00:00<00:00, 32.3MB/s]
!ls
data_description.txt  sample_data  sample_submission.csv  test.csv  train.csv
  • Four data files have now been downloaded (sample_data is a folder that Colab provides by default).
    • data_description.txt
    • sample_submission.csv
    • test.csv
    • train.csv
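
Before moving on, it can help to confirm that all four files really arrived and to look at the size of train.csv. The sketch below uses only the standard library. It assumes the files were saved to /content as shown in the download log, and the helper names `missing_files` and `csv_shape` are made up for this example.

```python
import csv
import os

EXPECTED_FILES = [
    "data_description.txt",
    "sample_submission.csv",
    "test.csv",
    "train.csv",
]


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `directory`."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file; the header row is not counted."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = sum(1 for _ in reader)
    return rows, len(header)


data_dir = "/content"  # assumption: Colab's working directory from the log above
print("missing:", missing_files(data_dir))
train_path = os.path.join(data_dir, "train.csv")
if os.path.isfile(train_path):
    print("train.csv (rows, columns):", csv_shape(train_path))
```

In practice the CSV files would usually be loaded with pandas for analysis. The csv module is used here only to keep the check free of extra dependencies.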

V. What’s Next

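  • We practiced loading data with the Kaggle API in Google Colab.
  • There is one thing to think about now, though.
    • When the Colab session ends, the downloaded data disappears.
  • There are several ways to solve this.
    • Connecting a database (a database, really?)
    • Connecting Google Drive
    • Downloading to the local environment
  • Connecting a database is a bit difficult for beginners (there are also many kinds of databases).
  • If you download everything to the local environment, there is little reason to use Google Colab in the first place.
  • That leaves the last option, connecting Drive. It is not hard, but you only make it your own by practicing. Let's go.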
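
As a preview of the Drive option, here is a minimal sketch that copies the downloaded files into a mounted Google Drive folder so they survive the session. `drive.mount` is the real Colab call, but the target folder and the helper name `backup_files` are assumptions for illustration. Outside Colab the mount step is skipped.

```python
import os
import shutil


def backup_files(src_dir, dst_dir, names):
    """Copy each existing file in `names` from src_dir to dst_dir; return copied names."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in names:
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(dst_dir, name))
            copied.append(name)
    return copied


try:
    # Only available inside Colab; mounting asks for Google account authorization.
    from google.colab import drive
    drive.mount("/content/drive")
    # Hypothetical target folder; choose any folder in your own Drive.
    target = "/content/drive/MyDrive/kaggle/house-prices"
    print(backup_files("/content", target, ["train.csv", "test.csv"]))
except ImportError:
    print("Not running in Colab; skipping the Drive mount.")
```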

Reference

Source: https://colab.research.google.com/github/corrieann/kaggle/blob/master/kaggle_api_in_colab.ipynb