--- title: "Introduction to geysertimes" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to geysertimes} %\VignetteEngine{knitr::rmarkdown} --- # Basic Use Load the package other packages used in this vignette. ```r suppressPackageStartupMessages(library("dplyr")) library("geysertimes") ``` ## Get the Data The `gt_get_data` function downloads the compressed eruptions data from `https://geysertimes.org/archive/`, reads the data compressed data into R and saves version of the R object in the location specified in the `dest_folder` argument to the function. The default location for `dest_folder` is `file.path(tempdir(), "geysertimes"))`. This default location is used to meet the CRAN requirement of not writing files by default to any location other than under `tempdir()`. ```r default_path <- gt_get_data() #> Set dest_folder to geysertimes::gt_path() so that data persists between R sessions. ``` ```r default_path #> [1] "/tmp/RtmpUtc3wI/geysertimes/2021-05-06" ``` Users are encouraged to set `dest_folder` to the value given by `gt_path()` which is a permanent location appropriate for the user on the particular platform. ```r gt_path() #> [1] "~/.local/share/geysertimes" ``` If a permanent location is used, the user only needs to get the data once. Using the suggested value for `dest_folder`: ```r suggested_path <- gt_get_data(dest_folder=gt_path()) ``` ```r suggested_path #> [1] "~/.local/share/geysertimes/2021-05-06" ``` ## Load the Data The `gt_load_eruptions` is used to load the eruptions data in the current session. The `gt_load_geysers` loads the geyser location data in the current session. ```r eruptions <- gt_load_eruptions() geysers <- gt_load_geysers() ``` A quick look at the data: ```r dim(eruptions) #> [1] 1292133 25 names(eruptions) #> [1] "eruption_id" "geyser" "time" #> [4] "has_seconds" "exact" "near_start" #> [7] "in_eruption" "electronic" "approximate" #> [10] "webcam" "initial" "major" #> [13] "minor" "questionable" "duration" #> [16] "duration_seconds" "duration_resolution" "duration_modifier" #> [19] "entrant" "observer" "comment" #> [22] "time_updated" "time_entered" "primary_id" #> [25] "other_comments" ``` ### Data Version The data that is downloaded is versioned. The version id is the date when the data was downloaded. The `gt_version()` lists the latest version of the data that has been downloaded. Setting `all=TRUE` will list all versions of the data that have been downloaded. ```r gt_version() #> [1] "2021-05-06" ``` ```r gt_version(all=TRUE) #> [1] "2021-05-06" ``` ## Simple Analysis Geysers with the most recorded eruptions: ```r print(n=20, eruptions %>% group_by(geyser) %>% summarise(N=n()) %>% arrange(desc(N))) #> # A tibble: 450 x 2 #> geyser N #> #> 1 Old Faithful 184096 #> 2 Plume 143792 #> 3 Daisy 99807 #> 4 Little Cub 96889 #> 5 Lion 63418 #> 6 Grand 41475 #> 7 Aurum 35881 #> 8 Oblong 34080 #> 9 Fountain 27463 #> 10 Spouter 26981 #> 11 Riverside 26678 #> 12 Castle 25787 #> 13 Logbridge 23416 #> 14 Plate 21594 #> 15 Echinus 21330 #> 16 Depression 20668 #> 17 Turban 18885 #> 18 Great Fountain 18880 #> 19 Grotto 17533 #> 20 Jet 16368 #> # ... with 430 more rows ```