library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
library(gapminder, warn.conflicts = FALSE, quietly = TRUE)
library(magrittr, warn.conflicts = FALSE, quietly = TRUE)
library(ggplot2, warn.conflicts = FALSE, quietly = TRUE) #for diamonds dataset
Subsetting and Aggregations with dplyr
1 dplyr: Operations on data frames
- on rows
- on columns
Select rows, select columns. According to which properties?
Rows:
- Position: the first or last n rows, or a range such as the nth to the (n+20)th row
- Condition(s) on values in one or more columns (e.g. the row with the maximum population)
- Sampling: a random sample of a fixed number of rows or of a fraction of the rows
- Deduplication
- Sorting according to the values in one or more columns
- Grouping: divide rows into groups according to the values of one or more discrete variables. Grouping does not visibly change the data frame, but it is required for group-wise aggregations. For instance, given a data frame of physical measurements of individual males and females, grouping by sex lets you compute the mean height of males and the mean height of females separately.
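The row operations listed above can be sketched with dplyr verbs on the gapminder data. This is only an illustration under some assumptions: the object names (`first_rows`, `biggest`, and so on) are made up for this example, `slice_sample()` assumes dplyr 1.0.0 or later (older versions offer `sample_n()` and `sample_frac()`), and grouping by `continent` stands in for the males/females example.

```r
library(dplyr, warn.conflicts = FALSE)
library(gapminder)

# Position: first 5 rows, last 5 rows, and rows 10 to 30 (nth to (n+20)th)
first_rows <- head(gapminder, 5)
last_rows  <- tail(gapminder, 5)
row_range  <- slice(gapminder, 10:30)

# Condition: the row(s) with the maximum population
biggest <- filter(gapminder, pop == max(pop))

# Sampling: 10 random rows, or a random 1 % of all rows
sample_n_rows <- slice_sample(gapminder, n = 10)
sample_frac   <- slice_sample(gapminder, prop = 0.01)

# Deduplication: one row per distinct continent/country combination
countries <- distinct(gapminder, continent, country)

# Sorting: by year ascending, then by life expectancy descending
sorted <- arrange(gapminder, year, desc(lifeExp))

# Grouping + aggregation: mean life expectancy per continent
by_continent <- gapminder %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp))
```

Note that `group_by()` on its own returns the same rows as before; the effect only becomes visible once an aggregation such as `summarise()` follows.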
Columns:
- Column position: you want to address the column by its position rather than by its name
- Column name: you want to address the column by its name
- Column selection:
  - you want to select a column for an operation or for dropping it from the data frame
  - you want to transform a column from one data type to another
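A minimal sketch of the column operations above, again on gapminder. The result names (`by_position`, `dropped`, and so on) are illustrative only, and converting `country` from factor to character is just one example of a type change.

```r
library(dplyr, warn.conflicts = FALSE)
library(gapminder)

# By position: keep the 1st and 3rd columns (country and year)
by_position <- select(gapminder, 1, 3)

# By name: keep the named columns
by_name <- select(gapminder, country, year, pop)

# Drop a column from the data frame
dropped <- select(gapminder, -gdpPercap)

# Select one column as a plain vector for an operation
mean_life <- mean(pull(gapminder, lifeExp))

# Transform a column from one data type to another (factor to character)
converted <- mutate(gapminder, country = as.character(country))
```

`select()` always returns a data frame, while `pull()` extracts a single column as a vector, which is what functions such as `mean()` expect.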
2 Libraries
3 magrittr pipe in dplyr: %>%
René Magritte 1898-1967: Belgian surrealist fascinated by semiotics
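To illustrate what the pipe does: `x %>% f(y)` is evaluated as `f(x, y)`, so a chain of nested calls can be written in reading order. The sketch below compares both styles on gapminder; the names `nested` and `piped` and the choice of the year 2007 are assumptions made for this example.

```r
library(dplyr, warn.conflicts = FALSE)
library(magrittr)
library(gapminder)

# Nested calls: must be read from the inside out
nested <- summarise(
  group_by(filter(gapminder, year == 2007), continent),
  mean_lifeExp = mean(lifeExp)
)

# The same steps with the pipe: read from top to bottom
piped <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp))

# Check that both versions produce the same result
all.equal(nested, piped)
```

Each pipe step passes its left-hand result as the first argument of the next function, which is why dplyr verbs take the data frame as their first argument.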