Rows: 548 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): geo
dbl (2): time, hourly_labour_cost_constant_2017_usd
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Gives you the number of rows and columns, the column names with their data classes, and as many values from each column as fit on your screen.
4 summary
summary(laborcost_df)
     geo                 time      hourly_labour_cost_constant_2017_usd
 Length:548         Min.   :1994   Min.   : 0.000
 Class :character   1st Qu.:2005   1st Qu.: 9.867
 Mode  :character   Median :2011   Median :18.320
                    Mean   :2010   Mean   :19.686
                    3rd Qu.:2017   3rd Qu.:26.915
                    Max.   :2020   Max.   :48.720
Gives you the “five-number summary” of each numeric column (the name has stuck, although R prints six numbers because it adds the mean). For categorical columns, the output depends on whether the column is a character vector or a factor.
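A minimal sketch of the five-versus-six distinction, using a made-up numeric vector `x` that is not part of the tutorial data. Base R's `fivenum()` returns the classic five numbers, while `summary()` returns six values, the mean being the extra one.

```r
# Made-up numbers, for illustration only
x <- c(1, 2, 3, 4, 100)

# Tukey's five numbers: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
five

# summary() reports minimum, quartiles, median and maximum, plus the mean
six <- summary(x)
six
```

Note that `fivenum()` uses hinges, which can differ slightly from the quartiles that `summary()` prints.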
5 summary with categorical columns as factors
      geo           time      hourly_labour_cost_constant_2017_usd
 cze    : 22   Min.   :1994   Min.   : 0.000
 svn    : 22   1st Qu.:2005   1st Qu.: 9.867
 cyp    : 21   Median :2011   Median :18.320
 deu    : 21   Mean   :2010   Mean   :19.686
 pol    : 21   3rd Qu.:2017   3rd Qu.:26.915
 svk    : 21   Max.   :2020   Max.   :48.720
 (Other):420
If you have a data frame with categorical variables converted to factors, the summary gives you a glimpse of their levels (unique values) and their frequencies: the most frequent levels are listed by name and the remaining ones are pooled under (Other).
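The difference between a character column and a factor column can be tried out on a small example. This is only a sketch: `toy_df` and its values are invented for illustration and are not the tutorial's data.

```r
# Hypothetical toy data, not the tutorial's file
toy_df <- data.frame(
  geo  = c("cze", "cze", "deu", "pol"),
  time = c(2019, 2020, 2020, 2020),
  stringsAsFactors = FALSE
)

# As a character vector, summary() reports only length, class and mode
summary(toy_df$geo)

# As a factor, summary() counts the rows per level
toy_df$geo <- factor(toy_df$geo)
geo_counts <- summary(toy_df$geo)
geo_counts

# nlevels() reports how many levels the factor has
nlevels(toy_df$geo)
```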
For now, do not worry about factors. Both dplyr and ggplot2 do this factor conversion on the fly whenever they need it.
6 Rename a column with base R
The name hourly_labour_cost_constant_2017_usd is too long; shorten it to labor_cost.
You already know that you could have named all columns your own way when reading in the file. Here are two ways to rename a column afterwards: one in base R and one provided by dplyr.
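The two approaches might look as follows. This is a sketch under assumptions: the small `laborcost_df` built here is a stand-in with made-up values, and `base_df` and `dplyr_df` are names chosen only for this illustration.

```r
library(dplyr)

# Hypothetical stand-in for laborcost_df; the values are made up
laborcost_df <- data.frame(
  geo = c("cze", "deu"),
  time = c(2019, 2019),
  hourly_labour_cost_constant_2017_usd = c(14.2, 38.9)
)

# Base R: overwrite the matching entry of names()
base_df <- laborcost_df
names(base_df)[names(base_df) == "hourly_labour_cost_constant_2017_usd"] <- "labor_cost"

# dplyr: rename() takes new_name = old_name
dplyr_df <- rename(laborcost_df, labor_cost = hourly_labour_cost_constant_2017_usd)

names(base_df)
names(dplyr_df)
```

Both calls leave the data untouched and change only the column name. The dplyr version returns a new data frame, so the result has to be assigned to a variable to be kept.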
ggsave wants a file name (with path) including the format suffix.
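A minimal sketch of such a call, assuming a throwaway plot built from made-up data and a temporary directory as the target. The names `toy_df`, `toy_plot` and `out_file` are hypothetical; the width and height (in inches) are arbitrary.

```r
library(ggplot2)

# Made-up data, for illustration only
toy_df <- data.frame(time = c(2018, 2019, 2020), labor_cost = c(12, 13, 14))

# A plot is an object, so it can be stored in a variable
toy_plot <- ggplot(data = toy_df, mapping = aes(x = time, y = labor_cost)) +
  geom_point()

# The suffix (.png here) tells ggsave which format to write
out_file <- file.path(tempdir(), "toy_plot.png")
ggsave(filename = out_file, plot = toy_plot, width = 6, height = 4)
```

Changing the suffix to `.pdf` or `.svg` would change the output format accordingly, provided the required graphics device is available on your machine.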
Sometimes, especially when you run your scripts over and over in Quarto, something breaks behind the scenes and you get cryptic error messages about mismatched graphics devices or similar. Often the best solution is to restart R, close the file, clear the Environment, and reopen the file. Alternatively, run the command directly in the Console.
13 First insights about ggplot2
Specific syntax
A plot is an object (goes in a variable)
Can be saved to a file: include the desired format in the file name
Maps variables to x, y, color, transparency…
Feel free to add more insights or impressions.
14 ggplot2 \(\approx\) implemented Grammar of Graphics
Rows: 43 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): geo
dbl (2): time, labor_cost
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(data = cze_deu_df, mapping = aes(x = time, y = labor_cost, color = geo)) +
  geom_point(size = 7)
20 Syntax
ggplot(data, mappings) + geom_…( )
or
ggplot(data) + geom_…(mappings)
The mapping argument always takes the aes() function.
The plus sign must come at the end of a line; it never works at the beginning of a new line.
The geometric object (actual plot) is generated by a geom_something() function.
Different geom_ functions require or accept different aesthetics. Look them up in the help pages or on the cheat sheet.
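The two call forms above can be tried side by side. The sketch below uses a made-up `toy_df` in place of the tutorial's data; `plot_a` and `plot_b` are names chosen only for this illustration. Note where the plus sign sits: at the end of the line, never at the start of the next one.

```r
library(ggplot2)

# Made-up data, for illustration only
toy_df <- data.frame(
  time = c(2019, 2020, 2019, 2020),
  labor_cost = c(14, 15, 38, 39),
  geo = c("cze", "cze", "deu", "deu")
)

# Form 1: mappings given to ggplot() and inherited by every geom
plot_a <- ggplot(data = toy_df, mapping = aes(x = time, y = labor_cost, color = geo)) +
  geom_point()

# Form 2: mappings given to the geom itself
plot_b <- ggplot(data = toy_df) +
  geom_point(mapping = aes(x = time, y = labor_cost, color = geo))
```

With a single geom both forms draw the same picture. The difference starts to matter once several geoms are added, because only mappings set in `ggplot()` are shared by all of them.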
10.1 Comment on the plot
What do you call plots with points and two axes?
How many variables does the plot capture, and how? What are the types of these variables?
Look at the script. Try to dissect it into parts and interpret them.