xgboost and kaggle with R
개요
- R 강의를 진행하면서
xgboost를 R로 구현하고 싶었다. kaggle에 있는 데이터를 불러와서 제출까지 가는 과정을 담았으니 입문자들에게 작은 도움이 되기를 바란다.
XGBoost 개요
- 논문 제목 - XGBoost: A Scalable Tree Boosting System
- 논문 게재일: Wed, 9 Mar 2016 01:11:51 UTC (592 KB)
- 논문 저자: Tianqi Chen, Carlos Guestrin
- 논문 소개
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

