dplyr and ggplot2. First steps.
2025-07-28
readr, dplyr, and ggplot2
dplyr::glimpse
summary
summary with categorical columns as factors

| geo | time | hourly_labour_cost_constant_2017_usd |
|---|---|---|
| cze : 22 | Min. :1994 | Min. : 0.000 |
| svn : 22 | 1st Qu.:2005 | 1st Qu.: 9.867 |
| cyp : 21 | Median :2011 | Median :18.320 |
| deu : 21 | Mean :2010 | Mean :19.686 |
| pol : 21 | 3rd Qu.:2017 | 3rd Qu.:26.915 |
| svk : 21 | Max. :2020 | Max. :48.720 |
| (Other):420 | | |
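The per-country counts in the summary only appear when geo is a factor. A minimal sketch of that conversion follows, using a made-up toy table (the values are illustrative, not the real data) and assuming a recent R version with the native pipe:

```r
library(dplyr)

# Toy stand-in for the labour cost table; the values are made up.
labour <- tibble(
  geo = c("cze", "cze", "svn", "deu"),
  time = c(2019, 2020, 2020, 2020),
  hourly_labour_cost_constant_2017_usd = c(14.1, 14.8, 21.3, 41.0)
)

# With geo stored as character, summary() only reports length and class.
summary(labour)

# Convert every character column to a factor so summary() counts the levels.
labour_fct <- labour |>
  mutate(across(where(is.character), as.factor))

summary(labour_fct)
```

Comparing the two summary() calls shows why the conversion is worth doing before exploring categorical columns.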
The column name hourly_labour_cost_constant_2017_usd is too long; shorten it to labor_cost.
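One way to shorten the name is dplyr::rename, which takes new_name = old_name pairs. The sketch below runs on a made-up toy table; only the column names come from the slides.

```r
library(dplyr)

# Toy table with the original long column name; the values are made up.
labour <- tibble(
  geo = c("cze", "svn"),
  time = c(2020, 2020),
  hourly_labour_cost_constant_2017_usd = c(14.8, 21.3)
)

# rename() uses the pattern new_name = old_name.
labour <- labour |>
  rename(labor_cost = hourly_labour_cost_constant_2017_usd)

names(labour)
```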
dplyr
dplyr::distinct

| geo |
|---|
| arg |
| arm |
| aus |
| aut |
| aze |
| bel |
| bgr |
| can |
| che |
| chl |
| cri |
| cyp |
| cze |
| deu |
| dnk |
| esp |
| est |
| fin |
| fra |
| gbr |
| geo |
| grc |
| hrv |
| hun |
| irl |
| isl |
| isr |
| ita |
| kaz |
| ltu |
| lux |
| lva |
| mda |
| mkd |
| mlt |
| mus |
| nld |
| nor |
| nzl |
| phl |
| pol |
| prt |
| rou |
| rus |
| svk |
| svn |
| swe |
| ukr |
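A table of country codes like the one above can be produced with dplyr::distinct. This is a sketch on made-up toy rows, with arrange() added here only to get a stable alphabetical order:

```r
library(dplyr)

# Toy table with repeated country codes; the rows are made up.
labour <- tibble(
  geo = c("cze", "cze", "svn", "deu", "deu"),
  time = c(2019, 2020, 2020, 2019, 2020)
)

# distinct() keeps one row per unique value of geo.
countries <- labour |>
  distinct(geo) |>
  arrange(geo)

countries
```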
How does it capture the variables now? Is it telling a different story?
ggsave(filename = file.path(project_path, output_folder, "laborcost_plot.svg"),
       plot = laborcost_plot)
# device = "png" is the usual choice; pass grDevices::png when RStudio hiccups.
ggsave(filename = file.path(project_path, output_folder, "laborcost_plot.png"),
       plot = laborcost_plot,
       device = grDevices::png)
ggsave(filename = file.path(project_path, output_folder, "laborcost_plot.pdf"),
       plot = laborcost_plot)
list.files(path = file.path(project_path, output_folder), pattern = "laborcost_plot")
[1] "laborcost_plot.pdf" "laborcost_plot.png" "laborcost_plot.svg"
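The same save-and-check pattern can be tried without the project folders. In this sketch the toy data, the plot, and the temporary out_dir are all hypothetical stand-ins for laborcost_plot and the project paths; the SVG format is left out because it needs the svglite package.

```r
library(ggplot2)

# Toy data and plot; stand-ins for the real laborcost_plot.
toy <- data.frame(time = c(2018, 2019, 2020),
                  labor_cost = c(13.5, 14.1, 14.8))
toy_plot <- ggplot(toy, aes(x = time, y = labor_cost)) +
  geom_line()

# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "plots")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# The file extension tells ggsave() which format to write.
ggsave(filename = file.path(out_dir, "toy_plot.png"),
       plot = toy_plot, width = 6, height = 4)
ggsave(filename = file.path(out_dir, "toy_plot.pdf"),
       plot = toy_plot, width = 6, height = 4)

# Check which files were written.
saved <- list.files(path = out_dir, pattern = "toy_plot")
saved
```

Setting width and height explicitly keeps the output independent of the size of the current plotting window.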
ggplot2
Specific syntax
A plot is an object (goes in a variable)
Can be saved to a file; the extension in the file name selects the format
Maps variables to X, Y, color, transparency…
ggplot2 \(\approx\) an implementation of the Grammar of Graphics
https://ggplot2.tidyverse.org/
All plots have the same logic and components in a few layers.
Not just drawings: statistical transformations run behind the scenes (e.g. binning for a histogram)
When you see a ggplot2 plot, you have an idea of how the source table is structured
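The hidden statistical transformation can be made visible with layer_data, which returns the table a layer actually draws. In this sketch the labor_cost values are made up and the choice of 5 bins is arbitrary:

```r
library(ggplot2)

# Toy values; not the real labour cost data.
toy <- data.frame(labor_cost = c(2.5, 7.1, 9.9, 12.4, 18.3, 19.7, 26.9, 48.7))

# Only x is mapped; the bar heights are computed by the histogram stat.
hist_plot <- ggplot(toy, aes(x = labor_cost)) +
  geom_histogram(bins = 5)

# layer_data() exposes the computed bins and counts behind the bars.
binned <- layer_data(hist_plot)
binned[, c("xmin", "xmax", "count")]
```

The source table has one row per observation, while the drawn table has one row per bin.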
ggplot2
Data
Aesthetic mappings + Facets (subgraphs)
Geometric objects (aka geoms)
Statistical transformations (aka stats)
Coordinate system
Theme
data frame with tidy data structure
each observation on one row
each variable in one column
categorical variables (character columns) are treated as discrete, as if they were factors
axes X, Y
shape / linetype
color / fill / stroke
size / linewidth
alpha (transparency)
label
plot types, such as:
histogram
scatterplot
barplot
boxplot
heatmap
and many others
ggplot(data, mappings) + geom_…( )
or
ggplot(data) + geom_…(mappings)
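Both forms can be compared on a small example. The sketch below uses a made-up toy table and geom_point as one possible geom; it builds the same scatterplot twice and compares what each layer draws.

```r
library(ggplot2)

# Toy table; the values are made up.
toy <- data.frame(
  time = rep(c(2019, 2020), times = 2),
  labor_cost = c(14.1, 14.8, 20.6, 21.3),
  geo = rep(c("cze", "svn"), each = 2)
)

# Form 1: mappings in ggplot() are inherited by every layer.
plot_a <- ggplot(toy, aes(x = time, y = labor_cost, color = geo)) +
  geom_point()

# Form 2: mappings in the geom apply to that layer only.
plot_b <- ggplot(toy) +
  geom_point(aes(x = time, y = labor_cost, color = geo))

# Compare the positions and colours each layer would draw.
data_a <- layer_data(plot_a)
data_b <- layer_data(plot_b)
all.equal(data_a[, c("x", "y", "colour")], data_b[, c("x", "y", "colour")])
```

With a single layer the two forms give the same plot. The difference matters once several geoms are added, because only mappings placed in ggplot() are shared between them.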
Comment on the plot
What do you call plots with points and two axes?
How many variables does the plot capture, and how? What types of variables are they?
Look at the script. Try to dissect it into parts and interpret them.