[Python] Collecting Sample Data with the PyDataset Library
One-line summary
- Load sample data as easily as you would in R.
Sample Dataset
- Write code that loads sample data.
- Use the PyDataset library for this.
!pip install pydataset
Collecting pydataset
  Downloading https://files.pythonhosted.org/packages/4f/15/548792a1bb9caf6a3affd61c64d306b08c63c8a5a49e2c2d931b67ec2108/pydataset-0.2.0.tar.gz (15.9MB)
     |████████████████████████████████| 15.9MB 285kB/s
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from pydataset) (1.1.5)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->pydataset) (2.8.1)
Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.7/dist-packages (from pandas->pydataset) (1.19.5)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->pydataset) (2018.9)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas->pydataset) (1.15.0)
Building wheels for collected packages: pydataset
Building wheel for pydataset (setup.py) ... done
Created wheel for pydataset: filename=pydataset-0.2.0-cp37-none-any.whl size=15939431 sha256=ebe470895a3467fe13c7654021e9108227a6dec8ce6da4f9b4e704520bcd6203
Stored in directory: /root/.cache/pip/wheels/fe/3f/dc/5d02ccc767317191b12d042dd920fcf3432fab74bc7978598b
Successfully built pydataset
Installing collected packages: pydataset
Successfully installed pydataset-0.2.0
from pydataset import data
print(data())
dataset_id title
0 AirPassengers Monthly Airline Passenger Numbers 1949-1960
1 BJsales Sales Data with Leading Indicator
2 BOD Biochemical Oxygen Demand
3 Formaldehyde Determination of Formaldehyde
4 HairEyeColor Hair and Eye Color of Statistics Students
.. ... ...
752 VerbAgg Verbal Aggression item responses
753 cake Breakage Angle of Chocolate Cakes
754 cbpp Contagious bovine pleuropneumonia
755 grouseticks Data on red grouse ticks from Elston et al. 2001
756 sleepstudy Reaction times in a sleep deprivation study
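As the output above shows, `data()` returns the catalog as a table with `dataset_id` and `title` columns, so it can be filtered like any other pandas DataFrame. Below is a minimal sketch of a keyword search over that catalog. `find_datasets` is a hypothetical helper, not part of pydataset, and the small stand-in catalog is copied from rows printed above so the snippet runs without pydataset installed.

```python
import pandas as pd


def find_datasets(catalog: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return catalog rows whose title contains keyword (case-insensitive).

    Hypothetical helper for illustration; it is not part of pydataset.
    """
    mask = catalog["title"].str.contains(keyword, case=False, regex=False)
    return catalog[mask]


# Stand-in catalog built from rows shown in the output above.
# With pydataset installed, `catalog = data()` should work the same way.
catalog = pd.DataFrame(
    {
        "dataset_id": ["AirPassengers", "BOD", "cake", "sleepstudy"],
        "title": [
            "Monthly Airline Passenger Numbers 1949-1960",
            "Biochemical Oxygen Demand",
            "Breakage Angle of Chocolate Cakes",
            "Reaction times in a sleep deprivation study",
        ],
    }
)

matches = find_datasets(catalog, "cake")
print(matches)
```

With more than 750 datasets in the catalog, a title search like this is usually quicker than scrolling through the truncated printout.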
- Write code that loads a dataset.
cake = data("cake")
print(cake)
data("cake", show_doc=True)
replicate recipe temperature angle temp
1 1 A 175 42 175
2 1 A 185 46 185
3 1 A 195 47 195
4 1 A 205 39 205
5 1 A 215 53 215
.. ... ... ... ... ...
266 15 C 185 28 185
267 15 C 195 25 195
268 15 C 205 25 205
269 15 C 215 31 215
270 15 C 225 25 225
cake
PyDataset Documentation (adopted from R Documentation. The displayed examples are in R)
## Breakage Angle of Chocolate Cakes
### Description
Data on the breakage angle of chocolate cakes made with three different
recipes and baked at six different temperatures. This is a split-plot design
with the recipes being whole-units and the different temperatures being
applied to sub-units (within replicates). The experimental notes suggest that
the replicate numbering represents temporal ordering.
### Format
A data frame with 270 observations on the following 5 variables.
`replicate`
a factor with levels `1` to `15`
`recipe`
a factor with levels `A`, `B` and `C`
`temperature`
an ordered factor with levels `175` < `185` < `195` < `205` < `215` < `225`
`angle`
a numeric vector giving the angle at which the cake broke.
`temp`
numeric value of the baking temperature (degrees F).
### Details
The `replicate` factor is nested within the `recipe` factor, and `temperature`
is nested within `replicate`.
### Source
Original data were presented in Cook (1938), and reported in Cochran and Cox
(1957, p. 300). Also cited in Lee, Nelder and Pawitan (2006).
### References
Cook, F. E. (1938) _Chocolate cake, I. Optimum baking temperature_. Master's
Thesis, Iowa State College.
Cochran, W. G., and Cox, G. M. (1957) _Experimental designs_, 2nd Ed. New
York, John Wiley \& Sons.
Lee, Y., Nelder, J. A., and Pawitan, Y. (2006) _Generalized linear models with
random effects. Unified analysis via H-likelihood_. Boca Raton, Chapman and
Hall/CRC.
### Examples
str(cake)
## 'temp' is continuous, 'temperature' an ordered factor with 6 levels
(fm1 <- lmer(angle ~ recipe * temperature + (1|recipe:replicate), cake, REML= FALSE))
(fm2 <- lmer(angle ~ recipe + temperature + (1|recipe:replicate), cake, REML= FALSE))
(fm3 <- lmer(angle ~ recipe + temp + (1|recipe:replicate), cake, REML= FALSE))
## and now "choose" :
anova(fm3, fm2, fm1)
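The examples in the documentation are written in R, but in Python the loaded `cake` object is a pandas DataFrame, so the usual pandas tools apply. Below is a minimal sketch that averages the breakage angle per recipe. `mean_angle_by_recipe` is a hypothetical helper, and the stand-in frame contains only the ten rows printed above so the snippet runs without pydataset installed.

```python
import pandas as pd


def mean_angle_by_recipe(df: pd.DataFrame) -> pd.Series:
    """Average breakage angle for each recipe (hypothetical helper)."""
    return df.groupby("recipe")["angle"].mean()


# Stand-in frame: the ten rows of `cake` printed above.
# With pydataset installed, `cake = data("cake")` gives all 270 rows.
cake_sample = pd.DataFrame(
    {
        "replicate": [1, 1, 1, 1, 1, 15, 15, 15, 15, 15],
        "recipe": ["A"] * 5 + ["C"] * 5,
        "temperature": [175, 185, 195, 205, 215, 185, 195, 205, 215, 225],
        "angle": [42, 46, 47, 39, 53, 28, 25, 25, 31, 25],
        "temp": [175, 185, 195, 205, 215, 185, 195, 205, 215, 225],
    },
    index=[1, 2, 3, 4, 5, 266, 267, 268, 269, 270],
)

# Inspect the column types, then summarize by recipe.
print(cake_sample.dtypes)
result = mean_angle_by_recipe(cake_sample)
print(result)
```

The documentation describes `temperature` as an ordered factor and `temp` as numeric, but in the printed output both columns hold the same values. If the ordered levels matter for your analysis, you will probably need to convert the column yourself, for example with `pd.Categorical(..., ordered=True)`.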