So you are applying for that data scientist position? Try and impress us in 120 minutes.
1 What this “test” is for
This test will not affect your grade if you are registered for the regular exam at Charles University. It serves (1) as your self-assessment immediately after the course (without time to internalize what you were exposed to) and (2) as practical feedback for the teacher about the effectiveness of the course setup.
The task is a little project that encompasses virtually the entire curriculum. Check whether you have an idea of how to approach the task and are able to look up the corresponding workflows. If you get stuck with the coding, try to describe in prose what you would do.
Please write your project in a Quarto file and e-mail (or otherwise share) the result with the teacher as an HTML file. Use prose to comment on what you are doing and how you interpret the results.
Please include a section called “Final rant” or “Final sigh” where you describe your learning experience with the datasets, the tools, DataCamp, the study materials, and their presentation. Be as specific as possible: which topic was particularly confusing? Where did you get lost? Did DataCamp help you stay on track? Did you look at the teaching materials after the lectures, and did that help?
Imagine you are sharing your experience with a colleague who considers attending a hypothetical next run of this course.
Also, please share your good practices from your previous studying/teaching experience for inspiration.
2 Task for those who work on their own data
Please showcase your data and your research question using R as much as possible.
3 Task for those who want to recap on familiar data
Read the file datasets_ATRIUM/gapminder_metadata_filenames.tsv.
Pick two files according to your liking.
Read each of them in and analyze it:
how many countries it contains
how many years it covers for each country
Describe the table in words as well: is it a complete dataset, or would you have to do a lot of filtering to get the same years for each country?
Plot the socioeconomic indicator (most likely the third column).
Join the two tables that you have described.
Compute the correlation between the two indicators the two tables show:
cor(x,y, method = "pearson")
Render the file as HTML. In the unlikely case that you need to include any images from your local computer (e.g. screenshots), replace the YAML header of your Quarto file with the contents of the file YAML_header_multiformat from the ATRIUM_resources folder.
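The describing, joining, and correlation steps above can be sketched as follows, using two small made-up tables in place of the real files. The column names (country, year, life_exp, gdp) and all values are assumptions for illustration only; the actual files may be structured differently, and with real files you would start by reading them in (for example with readr::read_tsv()) rather than typing the tables by hand.

```r
library(dplyr)

# Hypothetical stand-ins for two indicator tables (all values are invented).
life <- tibble(
  country  = c("A", "A", "A", "B", "B"),
  year     = c(2000, 2001, 2002, 2000, 2001),
  life_exp = c(70, 71, 72, 60, 62)
)
gdp <- tibble(
  country = c("A", "A", "B", "B", "B"),
  year    = c(2000, 2001, 2000, 2001, 2002),
  gdp     = c(10, 12, 3, 4, 5)
)

# How many countries does the table contain?
n_countries <- n_distinct(life$country)

# How many years does each country have?
years_per_country <- count(life, country, name = "n_years")

# Rough completeness check: TRUE only if every country has the same
# number of years (equal counts do not guarantee identical years).
is_complete <- n_distinct(years_per_country$n_years) == 1

# Join the two tables, keeping only country-year pairs present in both.
joined <- inner_join(life, gdp, by = c("country", "year"))

# Pearson correlation between the two indicators.
r <- cor(joined$life_exp, joined$gdp, method = "pearson")
```

An inner join is only one possible choice here: it silently drops the country-year pairs that are missing from either table, which is convenient for cor() but worth mentioning in your prose.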
3.1 Optional tasks
If the tasks above were easy, try to level up and include in the rendered file:
Filter the metadata dataframe (read from datasets_ATRIUM/gapminder_metadata_filenames.tsv) automatically for (any number of) files whose names suggest a topic that interests you. To do that, look up the appropriate function in the stringr library cheatsheet and use it to find a string that indicates your favorite topic. For instance “employment”.
Process the files in a loop (with or without the joining)
Write the workflow as a function
Run the function in a loop and/or with the purrr::map or other purrr:: function you look up.
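One possible shape for these optional tasks is sketched below on a made-up metadata table. The column names (filename, description), the file names, and the toy tables are all hypothetical, and summarise_table() is an illustrative helper, not a function from any package. stringr::str_detect() is one cheatsheet function that fits the filtering step; others may work equally well.

```r
library(dplyr)
library(stringr)
library(purrr)

# Hypothetical metadata table; the real one would be read from the .tsv file.
metadata <- tibble(
  filename    = c("employment_rate.tsv", "life_expectancy.tsv",
                  "female_employment.tsv"),
  description = c("Employment rate", "Life expectancy", "Female employment")
)

# Keep only the files whose names mention the topic of interest.
topic_files <- metadata %>%
  filter(str_detect(filename, "employment")) %>%
  pull(filename)

# Illustrative helper that summarises one table.
# It takes a data frame here; with real files it could take a path
# and read the file first.
summarise_table <- function(df) {
  tibble(
    n_countries = n_distinct(df$country),
    n_years     = n_distinct(df$year)
  )
}

# Toy tables standing in for the files that would be read from disk.
tables <- list(
  employment_rate.tsv   = tibble(country = c("A", "B"), year = c(2000, 2000)),
  female_employment.tsv = tibble(country = c("A", "A"), year = c(2000, 2001))
)

# Apply the function to every selected table and stack the results,
# keeping the file name as an identifier column.
summaries <- map(tables[topic_files], summarise_table) %>%
  bind_rows(.id = "filename")
```

The same result could be produced with a for loop that appends to a list; map() mainly saves the bookkeeping.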