class: inverse, center, middle, nonum .pull-left[ # Introduction to _R_ ### Maximilian H.K. Hesselbarth #### University of Michigan (EEB) 2022-06-30 / 2022-07-01 ] .pull-right[ <img src="data:image/png;base64,#img/first-then.png" width="85%" style="display: block; margin: auto;" /> .ref[Artwork by @allison_horst] ] <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path></svg> [mhessel@umich.edu](mailto:mhessel@umich.edu) <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg> [www.maxhesselbarth.com](https://www.maxhesselbarth.com) <svg viewBox="0 0 512 512" 
style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> [@MHKHesselbarth](https://twitter.com/MHKHesselbarth) --- class: center, middle, nonum All slides available at: [https://mhesselbarth.github.io/introduction-r-workshop/](https://mhesselbarth.github.io/introduction-r-workshop/) --- # About myself .pull-left[ <img src="data:image/png;base64,#img/hex-logos.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Member of the [Coastal Ecology and Conservation Lab](https://www.jacoballgeier.com) .small[("Allgeier Lab")] - Working on **individual-based simulation modelling**, **landscape ecology**, **point pattern analysis** and spatial data in general - Author/Contributor of several _R_ packages: _landscapemetrics_ .ref[(Hesselbarth et al. 2019)], _shar_ .ref[(Hesselbarth 2021)], _arrR_ .ref[(Esquivel et al. 
2022)] and others ] --- class: inverse, left, bottom, clear, nonum ## Section 1: Basic introduction --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # _R_ programming language -- - Widely used programming language for **statistical analysis**, **data science** and much more -- - [_R_](https://www.r-project.org) is **free**, **open-source**, and **multi-platform** (_Windows_, _macOS_, _Linux_) -- - Allows open, reproducible, and transparent research -- - Very active and generally **friendly community** .small[(<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> [#rstats](https://twitter.com/hashtag/RStats), [#rspatial](https://twitter.com/hashtag/RSpatial))] -- - Very popular in ecology .ref[(Atkins et al. 2022, Hesselbarth et al. 2021, Joo et al. 2020, Lai et al. 
2019)] -- - Integration of other programming languages (e.g., `Python`, `C`, `C++`) --- background-image: url("data:image/png;base64,#img/rstudio-logo.png") background-position: 95% 5% background-size: 15% # _RStudio_ -- - [_RStudio_](https://www.rstudio.com) as **integrated development environment** (IDE) -- - **Write**, save and **run** _R_ code scripts -- - Includes syntax highlighting, auto-completion, figure and help panels, ... -- - Use _RStudio_ projects (_.Rproj_) to organize related scripts, figures, data, ... <img src="data:image/png;base64,#img/rstudio.png" width="55%" style="display: block; margin: auto;" /> --- class: middle background-image: url("data:image/png;base64,#img/rstudio-logo.png") background-position: 95% 5% background-size: 15% # ...some advice... .pull-left[ <img src="data:image/png;base64,#img/set-fire-rm.png" width=" 100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="data:image/png;base64,#img/workspace.png" width=" 100%" style="display: block; margin: auto;" /> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # _R_ as a calculator -- .pull-left[ `+` Addition `-` Subtraction `*` Multiplication `/` Division `^` Exponentiation ] -- .pull-right[ ```r # add two numbers 14 + 5 ## [1] 19 # combination of different operators (^ takes precedence over /) (23 - 5) / (4 * 5) ^ 2 ## [1] 0.045 ``` `#` for comments in code ] -- <br> `==`, `!=`, `>`, `<`, `>=`, `<=` Comparison operators `&` "and" statement `|` "or" statement --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # _R_ Objects -- .pull-left[ - **Store** information in objects (sometimes also called variables) - Once objects are assigned using the `<-` operator, they can be **reused** - Use `snake_case` or `camelCase` naming ] -- .pull-right[ ```r # that's my age my_age <- 31 # that's the global life expectancy global_expectancy <- 90 # ...well...
*(progress <- (my_age / global_expectancy) * 100) ## [1] 34.44444 # check if I'm still below 1/2 progress <= 50 ## [1] TRUE ``` ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% -- .pull-left[ ## Data types - numeric : `5.34` - integer : `5L` - character : `"fish"` - logical : `TRUE`/`FALSE` - .grey[(complex : `1+4i`)] ] -- .pull-right[ ## Data structures - `c()`/`vector()` Collection of elements/values of the same type - `matrix()` Two-dimensional _vector_ (rows, columns) with elements of the same type - `data.frame()` Tabular data; columns can have different types, but all have the same number of rows - `list()` Collection of different types and/or structures (can hold all of the above) ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Vectorization -- - Vectorization is one major **strength** of _R_ -- - Operations are applied to **all** elements of a vector at once (or element-wise between vectors) -- ```r # create named vector with extinction numbers extinct <- c("amphibians" = 35, "birds" = 159, "fish" = 80, "mammals" = 85) # log of all values log(extinct) ## amphibians birds fish mammals ## 3.555348 5.068904 4.382027 4.442651 # calculate numbers relative to the maximum (max_number <- max(extinct)) ## [1] 159 *extinct / max_number * 100 ## amphibians birds fish mammals ## 22.01258 100.00000 50.31447 53.45912 # multiply each value with its own log value *extinct * log(extinct) ## amphibians birds fish mammals ## 124.4372 805.9558 350.5621 377.6254 ``` .ref[Source: https://ourworldindata.org/extinctions] --- class: inverse, center, middle # Palmer penguins dataset <img src="data:image/png;base64,#img/penguins.png" width="65%" style="display: block; margin: auto;" /> .ref[Horst et al.
2020] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Indexing .tiny[(1/3)] .pull-left[ ```r # create species vector (characters) species <- c("Adelie", "Chinstrap", "Gentoo") class(species) ## [1] "character" # subset second element *species[2] ## [1] "Chinstrap" # subset first and third element *species[c(1, 3)] ## [1] "Adelie" "Gentoo" # subset everything BUT second element *species[-2] ## [1] "Adelie" "Gentoo" # replace elements 1-3 (outer brackets print the result) (species[1:3] <- c("spec_1", "spec_2", "spec_3")) ## [1] "spec_1" "spec_2" "spec_3" # combine with a new element (species itself is unchanged) c(species, "unknown") ## [1] "spec_1" "spec_2" "spec_3" "unknown" ``` ] .pull-right[ - Generally, indexing uses square brackets `object[element]` - **Positive** (select) or **negative** (drop) indexing possible ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Indexing .tiny[(2/3)] .pull-left[ - For _matrices_ and _data.frames_, **two** indices are required `object[rows, cols]` - _data.frame_ columns and **named elements** can be accessed using `object$name` ] .pull-right[ ```r class(penguins) ## [1] "tbl_df" "tbl" "data.frame" # subset rows 1-5 and columns 1, 3, 8 *penguins[1:5, c(1, 3, 8)] ## # A tibble: 5 × 3 ## species bill_length_mm year ## <fct> <dbl> <int> ## 1 Adelie 39.1 2007 ## 2 Adelie 39.5 2007 ## 3 Adelie 40.3 2007 ## 4 Adelie NA 2007 ## 5 Adelie 36.7 2007 # subset body mass column as vector body_mass <- penguins$body_mass_g ``` ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Indexing .tiny[(3/3)] - Logical **tests** can be used to subset using `object[TRUE/FALSE-vector]` - If either row or column index is empty, **all** are returned - `NA` values in the test return rows filled with `NA` ```r # subset all individuals with body mass larger than 4000 g *penguins[penguins$body_mass_g > 4000, ] ## # A tibble: 174 × 8 ## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g ## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 <NA> <NA> NA NA NA NA ## 2 Adelie Torgersen 39.2 19.6 195 4675 ## 3 Adelie Torgersen 42 20.2 190 4250 ## 4 Adelie Torgersen 34.6 21.1 198 4400 ## 5 Adelie Torgersen 42.5 20.7 197 4500 ## 6 Adelie Torgersen 46 21.5 194 4200 ## 7 Adelie Dream 39.2 21.1 196 4150 ## 8 Adelie Dream 39.8 19.1 184 4650 ## 9 Adelie Dream 44.1 19.7 196 4400 ## 10 Adelie Dream 39.6 18.8 190 4600 ## # … with 164 more rows, and 2 more variables: sex <fct>, year <int> ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Functions .tiny[(1/3)] -- .pull-left[ - Block of code that takes **input** and creates **output** - **Packages** are **collections** of functions provided by the community ```r # load packages (install them once with install.packages()) library(palmerpenguins) library(scales) ``` ] -- .pull-right[ - Most functions take an input value and often additional **arguments** - Nested functions are evaluated from **inside** to **outside** - Arguments can be passed by name or by position ```r # standardize body mass to 0-1 scale, # calculate mean, and calculate log log(mean(rescale(body_mass, to = c(0, 1)), na.rm = TRUE)) ## [1] -0.8742998 # argument passed by name head(rescale(body_mass, to = c(1, 2))) ## [1] 1.291667 1.305556 1.152778 NA 1.208333 1.263889 # same argument passed by position head(rescale(body_mass, c(1, 2))) ## [1] 1.291667 1.305556 1.152778 NA 1.208333 1.263889 ``` ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Functions .tiny[(2/3)] -- .pull-left[ - The built-in help is a very **powerful resource** to understand and apply functions - **Error messages** often give hints about what's wrong ```r min(penguins$sex) ## Error in Summary.factor(structure(c(2L, 1L, 1L, NA, 1L, 2L, 1L, 2L, NA, : 'min' not meaningful for factors ``` - Read **warning messages** and decide what to do about them ```r mean(penguins$sex) ## Warning in mean.default(penguins$sex): argument is not numeric or logical: ## returning NA ## [1] NA ``` - Use e.g. `?function_name` to open the **documentation** (or highlight the function name and press F1 in _RStudio_) ] --
.pull-right[ <img src="data:image/png;base64,#img/help-page.png" width="100%" style="display: block; margin: auto;" /> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Functions .tiny[(3/3)] -- - Easy to create your **own** functions using `fun_name <- function(arg1, arg2, ...) {body}` -- - A function returns the value of the **last evaluated expression** or of an explicit `return()` statement -- ```r *calc_area <- function(x, y, total = FALSE, na.rm = TRUE){ # multiply length * width area <- x * y # calculate sum of all values if (total) { * area <- sum(area, na.rm = na.rm) } * return(area) } # one value per individual head(calc_area(penguins$bill_length_mm, penguins$bill_depth_mm)) ## [1] 731.17 687.30 725.40 NA 708.31 809.58 # total over all individuals calc_area(penguins$bill_length_mm, penguins$bill_depth_mm, total = TRUE) ## [1] 256768.7 ``` --- class: inverse, left, bottom, clear, nonum background-image: url("data:image/png;base64,#img/meme-google.png") background-position: 50% 25% background-size: 35% ## Exercise 1: Basic introduction ... work on [_exercise-section-1.Rmd_](https://github.com/mhesselbarth/introduction-r-workshop/blob/main/exercises/exercise-section-1.Rmd) ...
--- class: inverse, left, bottom, clear, nonum ## Section 2: Data wrangling and some stats --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Importing data -- - `read.table()` to import tabular **text files** (e.g., _file.csv_) -- - _readxl_ package to import **Excel** files -- - `getwd()`/`setwd()` for current working directory (or better `here()` from _here_ package) ```r # import tabular text data file (most robust) penguins_csv <- read.table(file = "data/penguins.csv", * header = TRUE, sep = ",") # read MS Excel file penguins_excel <- readxl::read_xlsx(path = "data/penguins.xlsx", sheet = "clean", col_types = c("text", "text", "numeric", "numeric", "numeric", "numeric", "guess", "guess")) head(penguins_excel, n = 3) ## # A tibble: 3 × 8 ## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 Adelie Torge… 39.1 18.7 181 3750 male ## 2 Adelie Torge… 39.5 17.4 186 3800 fema… ## 3 Adelie Torge… 40.3 18 195 3250 fema… ## # … with 1 more variable: year <dbl> ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Modify _data.frames_ .pull-left[ - **Modify** or **add** _data.frame_ columns by assigning values ] .pull-right[ ```r # keep only rows without any NA value head(complete.cases(penguins)) ## [1] TRUE TRUE TRUE FALSE TRUE TRUE penguins <- penguins[complete.cases(penguins), ] # modify existing columns *penguins$year <- factor(penguins$year) # create new columns based on existing ones (1 kg = 1000 g) penguins$body_mass_kg <- penguins$body_mass_g / 1000 penguins$bill_area <- penguins$bill_length_mm * penguins$bill_depth_mm # completely new column (random uniform values) penguins$rand <- runif(n = nrow(penguins)) ``` ] --- background-image: url("data:image/png;base64,#img/dplyr.png") background-position: 95% 5% background-size: 10% # _dplyr_ package .tiny[(1/3)] -- .pull-left[ - Part of the larger
[_tidyverse_](https://www.tidyverse.org) _R_ package collection - Many useful functions, mostly to work with _data.frames_ - Also provides the **pipe** operator `%>%`, which lets you write `x %>% f() %>% g()` instead of `g(f(x))` ] -- .pull-right[ ```r # filter by sex and body mass *dplyr::filter(penguins, sex == "female", * body_mass_kg > 5) %>% # return only selected columns dplyr::select(species, sex, body_mass_kg) %>% # show last 5 rows tail(n = 5) ## # A tibble: 5 × 3 ## species sex body_mass_kg ## <fct> <fct> <dbl> ## 1 Gentoo female 5.05 ## 2 Gentoo female 5.15 ## 3 Gentoo female 5.1 ## 4 Gentoo female 5.2 ## 5 Gentoo female 5.2 ``` ] --- background-image: url("data:image/png;base64,#img/dplyr.png") background-position: 95% 5% background-size: 10% # _dplyr_ package .tiny[(2/3)] -- - `mutate()` to modify or add _data.frame_ columns -- - Can be combined with `case_when()` for vectorized if-else statements -- ```r dplyr::filter(penguins, sex == "female", body_mass_kg > 5) %>% dplyr::select(species, sex, body_mass_kg) %>% * dplyr::mutate(body_mass_pounds = body_mass_kg * 2.20462, * body_class = dplyr::case_when(body_mass_pounds < 11.25 ~ "light", * body_mass_pounds >= 11.25 ~ "heavy")) ## # A tibble: 5 × 5 ## species sex body_mass_kg body_mass_pounds body_class ## <fct> <fct> <dbl> <dbl> <chr> ## 1 Gentoo female 5.05 11.1 light ## 2 Gentoo female 5.15 11.4 heavy ## 3 Gentoo female 5.1 11.2 light ## 4 Gentoo female 5.2 11.5 heavy ## 5 Gentoo female 5.2 11.5 heavy ``` --- background-image: url("data:image/png;base64,#img/dplyr.png") background-position: 95% 5% background-size: 10% # _dplyr_ package .tiny[(3/3)] .pull-left[ - `group_by()` and `summarize()` to calculate **group values** ] .pull-right[ ```r dplyr::group_by(penguins, species) %>% dplyr::summarise(n = dplyr::n(), max_g = max(body_mass_g)) ## # A tibble: 3 × 3 ## species n max_g ## <fct> <int> <int> ## 1 Adelie 146 4775 ## 2 Chinstrap 68 4800 ## 3 Gentoo 119 6300 ``` ] --- background-image:
url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Descriptive stats -- - Many functions for **descriptive** stats, e.g., `min()`, `max()`, `mean()`, `sd()`, `range()`, `quantile()`, `density()`, etc. -- - `summary()` is a very powerful function that returns **different outputs** depending on the object class -- ```r # summarize all columns except the added columns 9-11 summary(penguins[, -c(9, 10, 11)]) ## species island bill_length_mm bill_depth_mm ## Adelie :146 Biscoe :163 Min. :32.10 Min. :13.10 ## Chinstrap: 68 Dream :123 1st Qu.:39.50 1st Qu.:15.60 ## Gentoo :119 Torgersen: 47 Median :44.50 Median :17.30 ## Mean :43.99 Mean :17.16 ## 3rd Qu.:48.60 3rd Qu.:18.70 ## Max. :59.60 Max. :21.50 ## flipper_length_mm body_mass_g sex year ## Min. :172 Min. :2700 female:165 2007:103 ## 1st Qu.:190 1st Qu.:3550 male :168 2008:113 ## Median :197 Median :4050 2009:117 ## Mean :201 Mean :4207 ## 3rd Qu.:213 3rd Qu.:4775 ## Max. :231 Max. :6300 ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Correlation - Calculate and test for **correlation** between two _vectors_ using `cor()`/`cor.test()` ```r # calculate correlation between body mass and bill length *cor(x = penguins$body_mass_g, y = penguins$bill_length_mm) ## [1] 0.5894511 # test if the correlation differs from 0 (Pearson by default) *cor.test(penguins$body_mass_g, penguins$bill_length_mm) ## ## Pearson's product-moment correlation ## ## data: penguins$body_mass_g and penguins$bill_length_mm ## t = 13.276, df = 331, p-value < 2.2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## 0.5145745 0.6554058 ## sample estimates: ## cor ## 0.5894511 ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Student's t-Test -- - t-Test to test if the **means** of two groups are **different** (by default, `t.test()` runs Welch's version, which does not assume equal variances) -- - Check documentation for **additional arguments** and test specifications ```r # extract male and
female body mass values male <- penguins[penguins$sex == "male" & penguins$species == "Adelie", "body_mass_g", drop = TRUE] female <- penguins[penguins$sex == "female" & penguins$species == "Adelie", "body_mass_g", drop = TRUE] # compute t.test between male and female *t.test(male, female, conf.level = 0.95) ## ## Welch Two Sample t-test ## ## data: male and female ## t = 13.126, df = 135.69, p-value < 2.2e-16 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## 573.0139 776.3012 ## sample estimates: ## mean of x mean of y ## 4043.493 3368.836 # same test using formula syntax (group order follows the factor levels: female, male) *# t.test(body_mass_g ~ sex, data = penguins, subset = species == "Adelie") ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% ### Linear regression .tiny[(1/2)] -- - `lm()` to fit a **linear regression** model using formula syntax (`y ~ x`) -- - `summary()` behaves differently for an _lm_ object than for a _data.frame_ -- ```r # fit linear model with bill length (dependent) and body mass (independent) *mass_length_lm <- lm(formula = bill_length_mm ~ body_mass_g, data = penguins) # get coefficients and model statistics summary(mass_length_lm) ## ## Call: ## lm(formula = bill_length_mm ~ body_mass_g, data = penguins) ## ## Residuals: ## Min 1Q Median 3Q Max ## -10.1652 -3.0664 -0.7672 2.2356 16.0371 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 2.715e+01 1.292e+00 21.02 <2e-16 *** ## body_mass_g 4.003e-03 3.016e-04 13.28 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1 ## ## Residual standard error: 4.424 on 331 degrees of freedom ## Multiple R-squared: 0.3475, Adjusted R-squared: 0.3455 ## F-statistic: 176.2 on 1 and 331 DF, p-value: < 2.2e-16 ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% ### Linear regression .tiny[(2/2)] - `predict()` predicts the dependent variable for new values of the independent variable(s) <img src="data:image/png;base64,#index_files/figure-html/predict_plot-1.png" width="30%" style="display: block; margin: auto;" /> ```r # newdata needs a column with the same name as the predictor bill_predictions <- predict(mass_length_lm, * newdata = data.frame(body_mass_g = seq(from = 3000, to = 6000, by = 250))) bill_predictions ## 1 2 3 4 5 6 7 8 ## 39.16059 40.16142 41.16224 42.16306 43.16388 44.16471 45.16553 46.16635 ## 9 10 11 12 13 ## 47.16717 48.16800 49.16882 50.16964 51.17046 ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # ANOVA -- - ANOVA to test if one or more categorical independent variables affect a numerical dependent variable (also for **more than 2 groups**) -- - Use `*` instead of `+` to include an **interaction** term (e.g., `sex * species`) -- ```r # test if flipper length depends on sex (2 levels) and species (3 levels) flipper_anova <- aov(flipper_length_mm ~ sex + species, data = penguins) # check anova results *summary(flipper_anova) ## Df Sum Sq Mean Sq F value Pr(>F) ## sex 1 4246 4246 129.5 <2e-16 *** ## species 2 50185 25093 765.3 <2e-16 *** ## Residuals 329 10787 33 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1 ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Tukey's post-hoc test - **Post-hoc test** to see which levels of the independent variables differ from each other ```r TukeyHSD(flipper_anova) ## Tukey multiple comparisons of means ## 95% family-wise confidence level ## ## Fit: aov(formula = flipper_length_mm ~ sex + species, data = penguins) ## ## $sex ## diff lwr upr p adj ## male-female 7.142316 5.907707 8.376925 0 ## ## $species ## diff lwr upr p adj ## Chinstrap-Adelie 5.72079 3.741515 7.700065 0 ## Gentoo-Adelie 27.04253 25.377568 28.707483 0 ## Gentoo-Chinstrap 21.32174 19.272353 23.371118 0 ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Assumptions .tiny[(1/2)] .pull-left[ #### Student's t-Test - Independence - Normality (`shapiro.test()`) - Homogeneity of variances (`var.test()`) - Random sampling ] .pull-right[ #### Linear regression model - Independence - Normality of residuals (`shapiro.test(residuals(lm_object))`) - Homoscedasticity (`plot(lm_object)`) - Linearity (`plot(y ~ x)`) ] ```r # p > 0.05: no evidence against normality *shapiro.test(male) ## ## Shapiro-Wilk normality test ## ## data: male ## W = 0.98269, p-value = 0.416 *shapiro.test(female) ## ## Shapiro-Wilk normality test ## ## data: female ## W = 0.97684, p-value = 0.1985 ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Assumptions .tiny[(2/2)] - `plot(mass_length_lm)` returns four diagnostic plots to check model assumptions <img src="data:image/png;base64,#index_files/figure-html/plot_lm-1.png" width="45%" style="display: block; margin: auto;" /> --- class: inverse, left, bottom, clear, nonum background-image: url("data:image/png;base64,#img/lm.png") background-position: 50% 35% background-size: 50% ## Exercise 2: Basic stats ...
work on [_exercise-section-2.Rmd_](https://github.com/mhesselbarth/introduction-r-workshop/blob/main/exercises/exercise-section-2.Rmd) ... --- class: inverse, left, bottom, clear, nonum ## Section 3: Data visualization --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # _base_ vs. _ggplot2_ - _R_ has two major visualization systems: _base_ graphics and the _ggplot2_ package - Today, we are going to focus on _base_ .pull-left[ <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#index_files/figure-html/base_plot-1.png" alt="base plot" width="80%" /> <p class="caption">base plot</p> </div> ] .pull-right[ <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#index_files/figure-html/ggplot-1.png" alt="ggplot2" width="80%" /> <p class="caption">ggplot2</p> </div> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Scatter plots .tiny[(1/3)] - Create a simple scatter plot of body mass vs.
bill length .pull-left[ ```r plot(x = penguins$body_mass_g, y = penguins$bill_length_mm, * type = "p", cex = 1.0, pch = 19, xlab = "Body mass [g]", ylab = "Bill length [mm]") ``` ] .pull-right[ <img src="data:image/png;base64,#index_files/figure-html/scatter-1.png" width="80%" style="display: block; margin: auto;" /> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Scatter plots .tiny[(2/3)] - Points can be colored based on a grouping variable (here: species) .pull-left[ ```r # specify color palette color_palette <- c(Chinstrap = "#c15bcb", Gentoo = "#077476", Adelie = "#ff8200") # indexing with a factor uses its integer codes, # so the factor levels must have the same order as the palette penguins$species <- factor(penguins$species, levels = c("Chinstrap", "Gentoo", "Adelie")) plot(x = penguins$body_mass_g, y = penguins$bill_length_mm, * col = color_palette[penguins$species], type = "p", cex = 1.0, pch = 19, xlab = "Body mass [g]", ylab = "Bill length [mm]") ``` ] .pull-right[ <img src="data:image/png;base64,#index_files/figure-html/scatter_color-1.png" width="80%" style="display: block; margin: auto;" /> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Scatter plots .tiny[(3/3)] - Add other graphical elements (e.g., lines, legends) to an existing plot .pull-left[ ```r # fit regression model (w/o species) lm_ttl <- lm(formula = bill_length_mm ~ body_mass_g, data = penguins) plot(x = penguins$body_mass_g, y = penguins$bill_length_mm, col = color_palette[penguins$species], type = "p", cex = 1.0, pch = 19, xlab = "Body mass [g]", ylab = "Bill length [mm]") # add regression line *abline(lm_ttl) # add legend (labels taken from the palette names) *legend("topleft", legend = names(color_palette), col = color_palette, pch = 19) ``` ] .pull-right[ <img src="data:image/png;base64,#index_files/figure-html/scatter_lm-1.png" width="80%" style="display: block; margin: auto;" /> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size:
10% # Barplot vs. Boxplot -- .pull-left[ ```r # mean and standard deviation of flipper length by sex (Adelie only) df_sum <- dplyr::filter(penguins, species == "Adelie") %>% dplyr::group_by(sex) %>% dplyr::summarise(mn = mean(flipper_length_mm), sd = sd(flipper_length_mm)) # barplot() returns the bar midpoints, used to place the error bars *p_bar <- barplot(mn ~ sex, data = df_sum, * ylim = c(0, 220)) # error bars: mean +/- 1 sd arrows(x0 = p_bar, y0 = df_sum$mn - df_sum$sd, x1 = p_bar, y1 = df_sum$mn + df_sum$sd, lwd = 1.5, angle = 90, code = 3) ``` <img src="data:image/png;base64,#index_files/figure-html/barplot-1.png" width="55%" style="display: block; margin: auto;" /> ] -- .pull-right[ ```r boxplot(flipper_length_mm ~ sex, data = penguins, subset = species == "Adelie") ``` <img src="data:image/png;base64,#index_files/figure-html/boxplot-1.png" width="65%" style="display: block; margin: auto;" /> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Histogram and Density .tiny[(1/3)] .pull-left[ ```r hist(penguins$bill_length_mm, breaks = 20) ``` ] .pull-right[ <img src="data:image/png;base64,#index_files/figure-html/hist-1.png" width="75%" style="display: block; margin: auto;" /> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Histogram and Density .tiny[(2/3)] .pull-left[ ```r # bill length per species chinstrap <- penguins$bill_length_mm[penguins$species == "Chinstrap"] gentoo <- penguins$bill_length_mm[penguins$species == "Gentoo"] adelie <- penguins$bill_length_mm[penguins$species == "Adelie"] hist(chinstrap, breaks = 20, col = "#c15bcb", xlim = c(30, 60), ylim = c(0, 20), xlab = "Bill length [mm]", main = "") # add = TRUE draws on top of the existing plot hist(gentoo, breaks = 20, col = "#077476", * add = TRUE) hist(adelie, breaks = 20, col = "#ff8200", * add = TRUE) ``` ] .pull-right[ <img src="data:image/png;base64,#index_files/figure-html/hist_species-1.png" width="75%" style="display: block; margin: auto;" /> ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Histogram and Density
.tiny[(3/3)] .pull-left[ ```r # probability = TRUE shows densities instead of counts, so density() curves fit *hist(chinstrap, breaks = 20, probability = TRUE, border = "#c15bcb", col = NA, xlim = c(30, 60), xlab = "Bill length [mm]", main = "") hist(gentoo, breaks = 20, probability = TRUE, border = "#077476", col = NA, add = TRUE) hist(adelie, breaks = 20, probability = TRUE, border = "#ff8200", col = NA, add = TRUE) *lines(density(chinstrap), col = "#c15bcb") *lines(density(gentoo), col = "#077476") *lines(density(adelie), col = "#ff8200") ``` ] .pull-right[ <img src="data:image/png;base64,#index_files/figure-html/density-1.png" width="75%" style="display: block; margin: auto;" /> ] --- class: inverse, left, bottom, clear, nonum background-image: url("data:image/png;base64,#img/data-viz.png") background-position: 50% 25% background-size: 60% ## Exercise 3: Data visualization ... work on [_exercise-section-3.Rmd_](https://github.com/mhesselbarth/introduction-r-workshop/blob/main/exercises/exercise-section-3.Rmd) ... --- class: inverse, left, bottom, clear, nonum ## Section 4: Some more programming... --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Loops .tiny[(1/2)] -- - Code is **repeated** for every element of a sequence (`for`) or as long as a condition is `TRUE` (`while`) -- - `for` and `while` loops are the most common -- - Useful if vectorization is not possible, but **can be slow** if used incorrectly - (Never grow a vector, always pre-allocate!)
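- A related pitfall when pre-allocating: with `1:n`, a loop over `n = 0` runs over `c(1, 0)`, whereas `seq_len(n)` is empty and the loop body is skipped. A minimal sketch; `fill_multiples()` is a hypothetical helper for illustration only (with `1:n` instead of `seq_len(n)`, `fill_multiples(0)` would return `5` instead of an empty vector):

```r
# hypothetical helper: multiples of 5, using a pre-allocated result vector
fill_multiples <- function(n) {
  out <- numeric(length = n)   # pre-allocate n elements
  for (i in seq_len(n)) {      # empty sequence if n == 0, so no iteration
    out[i] <- i * 5
  }
  out
}

fill_multiples(3)
## [1]  5 10 15
fill_multiples(0)
## numeric(0)
```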
-- .pull-left[ ```r n <- 3 # pre-allocated vector (good) vs. empty vector that grows (bad) result_good <- numeric(length = n) result_bad <- numeric() for (i in 1:n) { x <- i * 5 * result_good[i] <- x result_bad <- c(result_bad, x) print(paste0("i=", i, " // x=", x)) } ## [1] "i=1 // x=5" ## [1] "i=2 // x=10" ## [1] "i=3 // x=15" ``` ] -- .pull-right[ ```r # one plot per species for (i in c("Chinstrap", "Gentoo", "Adelie")) { dplyr::filter(penguins, species == i) %>% plot(body_mass_g ~ flipper_length_mm, data = ., main = paste("Species:", i)) } ``` ] --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # Loops .tiny[(2/2)] -- - `apply()` (_matrices_) and `lapply()` (_vectors_, _data.frames_, _lists_) as alternatives to loops -- ```r penguins_mat <- as.matrix(penguins[, 3:6]) # calculate maximum of each column (MARGIN = 2) *apply(penguins_mat, 2, max) ## bill_length_mm bill_depth_mm flipper_length_mm body_mass_g ## 59.6 21.5 231.0 6300.0 penguins_list <- dplyr::group_by(penguins, island) %>% dplyr::group_split() # use anonymous function to calculate the coefficient of variation (cv) of each island lapply(penguins_list, function(i){ * data.frame(island = unique(i$island), cv = sd(i$body_mass_g) / mean(i$body_mass_g))}) ## [[1]] ## island cv ## 1 Biscoe 0.1675845 ## ## [[2]] ## island cv ## 1 Dream 0.1110369 ## ## [[3]] ## island cv ## 1 Torgersen 0.1218404 ``` --- background-image: url("data:image/png;base64,#img/r-logo.png") background-position: 95% 5% background-size: 10% # `ifelse` statements -- .pull-left[ - Check if a **logical statement** is `TRUE` or `FALSE` and run different parts of code accordingly - General form: `if (condition) {do this} else {do that}` - Often used in _loops_ and _functions_ - Vectorized `ifelse()` function available ] -- .pull-right[ ```r # using vectorized function *ifelse(test = penguins$body_mass_g <= 3500, * yes = "small", no = "large") %>% head() ## [1] "large" "large" "small" "small" "large" "large" # using loop and classic if else statement class <- vector(mode = "character", length = nrow(penguins))
for (i in 1:nrow(penguins)) { # if body mass above 10000 something is probably wrong if (penguins[i, "body_mass_g"] > 10000) stop("Too big?") # classify based on threshold * if (penguins[i, "body_mass_g"] <= 3500) { class[i] <- "small" * } else { class[i] <- "large" } } head(class) ## [1] "large" "large" "small" "small" "large" "large" ``` ] --- class: right, inverse # Resources [_R_ for Data Science](https://r4ds.had.co.nz) [Advanced _R_](https://adv-r.hadley.nz) [Efficient _R_ programming](https://csgillespie.github.io/efficientR/) [What They Forgot to Teach You About _R_](https://rstats.wtf/index.html) [Big Book of _R_](https://www.bigbookofr.com/index.html) [Modern Statistics for Modern Biology](https://www-huber.embl.de/msmb/index.html) [Modern Statistics with R](http://www.modernstatisticswithr.com) [Introduction to Modern Statistics](https://openintro-ims.netlify.app) [ggplot2](https://ggplot2-book.org/index.html) --- class: middle # References .tiny[ Atkins, J.W., Stovall, A.E.L., Alberto Silva, C., 2022. _Open-Source tools in R for forestry and forest ecology._ Forest Ecology and Management 503, 119813. https://doi.org/10.1016/j.foreco.2021.119813 Esquivel, K.E., Hesselbarth, M.H.K., Allgeier, J.E., 2022. _Mechanistic support for increased primary production around artificial reefs._ Ecological Applications e2617. https://doi.org/10.1002/eap.2617 Hesselbarth, M.H.K., Sciaini, M., With, K.A., Wiegand, K., Nowosad, J., 2019. _landscapemetrics: An open‐source R tool to calculate landscape metrics._ Ecography 42, 1648–1657. https://doi.org/10.1111/ecog.04617 Hesselbarth, M.H.K., 2021. _shar: An R package to analyze species-habitat associations using point pattern analysis._ Journal of Open Source Software 6, 3811. https://doi.org/10.21105/joss.03811 Hesselbarth, M.H.K., Nowosad, J., Signer, J., Graham, L.J., 2021. _Open-source tools in R for landscape ecology._ Current Landscape Ecology Reports 6, 97–111. 
https://doi.org/10.1007/s40823-021-00067-y

Horst, A.M., Hill, A.P., Gorman, K.B., 2020. _palmerpenguins: Palmer Archipelago (Antarctica) penguin data._ R package version 0.1.0.

Joo, R., Boone, M.E., Clay, T.A., Patrick, S.C., Clusella‐Trullas, S., Basille, M., 2020. _Navigating through the R packages for movement._ Journal of Animal Ecology 89, 248–267. https://doi.org/10.1111/1365-2656.13116

Lai, J., Lortie, C.J., Muenchen, R.A., Yang, J., Ma, K., 2019. _Evaluating the popularity of R in ecology._ Ecosphere 10. https://doi.org/10.1002/ecs2.2567

]

---
class: inverse, center, middle, nonum

## Thank you for your attention.

### Questions?

<br>

<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path></svg> [mhessel@umich.edu](mailto:mhessel@umich.edu) <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6
64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg> [www.maxhesselbarth.com](https://www.maxhesselbarth.com) <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> [@MHKHesselbarth](https://twitter.com/MHKHesselbarth)
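
---

# Loops and `ifelse()` in base _R_ .tiny[(sketch)]

.small[A minimal base-_R_ sketch of the loop, `apply()`-family and `ifelse()` ideas from the workshop. It assumes a small made-up data frame `toy` (hypothetical values, not the `penguins` data) so it runs without _dplyr_ or _palmerpenguins_. It also swaps `dplyr::group_split()` + `lapply()` for `split()` + `vapply()`. This is one possible variant, not the only way.]

```r
# hypothetical toy data (made-up values, not palmerpenguins)
toy <- data.frame(island = c("A", "A", "B", "B", "B"),
                  body_mass_g = c(3400, 3800, 3300, 4600, 5000))

# vectorized: classify all rows at once
size <- ifelse(toy$body_mass_g <= 3500, "small", "large")
size
## [1] "small" "large" "small" "large" "large"

# loop: pre-allocate, then fill one element per iteration
size_loop <- character(length = nrow(toy))
for (i in seq_len(nrow(toy))) {
  size_loop[i] <- if (toy$body_mass_g[i] <= 3500) "small" else "large"
}

# coefficient of variation per island; vapply() checks each result is one number
cv <- vapply(split(toy$body_mass_g, toy$island),
             function(x) sd(x) / mean(x), FUN.VALUE = numeric(1))
```

.small[`if ()` expects a single `TRUE`/`FALSE`, which is why the loop indexes `[i]`, while `ifelse()` handles the whole vector in one call.]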