Variables and functions

Silvie Cinková

2025-07-24

Tabular data

Table 1: Highest life expectancy in Europe, 2007
country LifeExpectancy
Iceland 81.757
Switzerland 81.701
Spain 80.941
Sweden 80.884
France 80.657

Observations vs. aggregations (summaries)

Table 2: Life expectancy data in two countries
country year LifeExpectancy
Albania 2007 76.423
Albania 2002 75.651
Albania 1997 72.950
Denmark 2007 78.332
Denmark 2002 77.180
Denmark 1997 76.110

Table 3: Average life expectancy in Albania and Denmark, 1997-2007
country AverageLifeExpectancy
Albania 75.00800
Denmark 77.20733
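The step from Table 2 to Table 3 can be sketched in R as follows. This is only an illustration: the variable names albania and denmark are made up here, and the values are copied by hand from Table 2.

```r
# Observations: one life expectancy value per country and year (Table 2)
albania <- c(76.423, 75.651, 72.950)
denmark <- c(78.332, 77.180, 76.110)

# Aggregation: a single summary value per country (Table 3)
mean(albania)
mean(denmark)
```

Each mean() call collapses three observations into one summary value, which is why Table 3 has no year column.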

Data structures

Data Structures in R

Data types (aka classes)

R data types: characters, numbers, TRUE/FALSE (aka logical/Boolean)
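A minimal sketch of how to ask R for the data type of a value, using the base function class(). The example values are arbitrary.

```r
class("Iceland")  # "character"
class(81.757)     # "numeric"
class(TRUE)       # "logical"
```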

Programmatic objects

  • object \(\approx\) anything you can enter into the computer without getting an error message

    3 + 2
    [1] 5

    The resulting object vanished as soon as it appeared on the screen.

Programmatic variables

  • variable \(\approx\) a labeled box in computer memory to store an object

    my_calculation <- 3 + 2

    Nothing appeared, but the object exists in the variable.

    my_calculation
    [1] 5

Variable assignment statement

variable_name <- object
  • variable names must not

    • start with a digit

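    • contain characters other than ASCII letters, digits, and _ (underscore); R itself also accepts the dot (.) and, depending on locale, non-ASCII letters, but these are best avoided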

    • coincide with a few reserved words in R (e.g. for or in)

    Each variable appears in the Environment tab
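The naming rules can be tried out with a few assignments. The variable names below are hypothetical, and the invalid cases are commented out because each of them stops R with a syntax error.

```r
life_expectancy_2007 <- 81.757  # valid: letters, digits, underscores
lifeExpectancy2 <- 80.941       # valid: a digit may follow the first character

# Commented out because each line is a syntax error:
# 2007_life <- 81.757           # starts with a digit
# for <- 81.757                 # "for" is a reserved word
```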

Variable overwriting

  • create a different variable for each step in your script

    a <- 1
    b <- a + 2
    b
    [1] 3
  • or overwrite the same variable

    a <- 1
    a <- a + 2
    a
    [1] 3

Functions and their arguments

  • Functions \(\approx\) verbs

  • “argument structure”/“valency”

    • obligatory vs. free vs. unacceptable (no slot)

    • collocability

      • argument does not fit

      • argument causes a meaning shift

Documentation in Help

Help - functions to replace strings

Libraries (aka packages)

  • additional functions and data sets

  • first install (Packages tab)

  • load when you need a function from there

    • library(thelibrary) without quotes
  • when functions from different packages happen to have the same name

    • When you load a package, R reports which functions of the already loaded packages it masks.

    • Call such functions with the package name, like this: package::function()
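A small sketch of the double-colon notation. It uses only the stats and utils packages, which ship with R, so nothing needs to be installed. The input values are arbitrary.

```r
# Call a function through its package name, without library()
stats::median(c(1, 5, 9))   # 5
utils::head(letters, 3)     # "a" "b" "c"
```

Written this way, the call always reaches the function from the named package, no matter which other packages are loaded.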

Function with an argument

toupper(x = "hello")
[1] "HELLO"

Function with several arguments

chartr(old = "a", new = "A", x = "banana")
[1] "bAnAnA"
chartr(new = "A", x = "banana", old = "a")
[1] "bAnAnA"
chartr("a", "A", "banana") 
[1] "bAnAnA"

Coercion to string

toupper(3)
[1] "3"
toupper(TRUE)
[1] "TRUE"

No coercion: an error instead

chartr(old = "mo", new = "fa", x = "mother") # strings 
[1] "father"
chartr(old = "5", new = "0",   x = "5toFour") # digits as strings
[1] "0toFour"
chartr(old = "5", new = 0, x = "5toFour" ) # new won't work
Error in chartr(old = "5", new = 0, x = "5toFour"): invalid 'new' argument

Write and run your own function

my_function <- function(my_1_arg, 
                        my_2_arg) {
  my_argument <- paste(my_1_arg, # paste() is an existing R function
                       my_2_arg, 
                       sep = "--")
  toupper(my_argument)
}
# run it like so:
my_function(my_1_arg = "Hello",
            my_2_arg = "again")
[1] "HELLO--AGAIN"

Vector

  • c() function to concatenate / combine values to a vector
my_vector <- c(200, 0.3, 8)
my_vector
[1] 200.0   0.3   8.0
str(my_vector) # shows object's structure: numeric vector of 3 elements
 num [1:3] 200 0.3 8

Vectorized functions

nice_strings <- c("HOME", "EYE", "SET") # create vector 
result_vector <- tolower(nice_strings) # process with tolower
str(result_vector)
 chr [1:3] "home" "eye" "set"
chartr(new = "e", old = "E", x = result_vector)
[1] "home" "eye"  "set" 
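The chartr call above returns the strings unchanged because, after tolower, they no longer contain an uppercase E. Swapping old and new makes the element-by-element processing visible. This is a sketch that reuses the same vector.

```r
nice_strings <- c("HOME", "EYE", "SET")
result_vector <- tolower(nice_strings)

# chartr processes every element of x in turn
chartr(old = "e", new = "E", x = result_vector)
# [1] "homE" "EyE"  "sEt"
```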

Non-vectorized case

  • chartr does not accept a vector of replacements in old and new:
chartr(old = c("a", "b"), new = c("A", "B"), x = "banana")
Warning in chartr(old = c("a", "b"), new = c("A", "B"), x = "banana"): argument
'old' has length > 1 and only the first element will be used
Warning in chartr(old = c("a", "b"), new = c("A", "B"), x = "banana"): argument
'new' has length > 1 and only the first element will be used
[1] "bAnAnA"

Step by step (no automation)

banana_1 <- chartr(old = "a", new = "A",  x = "banana")
banana_1
[1] "bAnAnA"
banana_2 <- chartr(old = "b", new = "B", x = banana_1)
banana_2
[1] "BAnAnA"
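As the earlier "mother" example suggests, chartr pairs the characters of old and new by position when both are single strings of equal length. Under that assumption, the two steps can be sketched as one call:

```r
# "a" is replaced by "A" and "b" by "B" in a single pass
banana_3 <- chartr(old = "ab", new = "AB", x = "banana")
banana_3
# [1] "BAnAnA"
```

The variable name banana_3 is only illustrative.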

Vector

  • Class
    • numeric, character, logical
  • Length (how many elements)
  • Elements can have names (named vector).
  • Order of elements matters.
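These properties can be inspected with the base functions class(), length(), and names(). The vector below is a made-up example; the names and numbers are not taken from any data set.

```r
ages <- c(John = 31, Mary = 28, Paul = 45)  # a named numeric vector

class(ages)    # "numeric"
length(ages)   # 3
names(ages)    # "John" "Mary" "Paul"

# Order matters: the same values in a different order make a different vector
identical(c(1, 2, 3), c(3, 2, 1))  # FALSE
```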

Create a vector, combine them

(a <- c(23:27)) # colon generates incremental sequence
[1] 23 24 25 26 27
(b <- c(a, FALSE, TRUE))
[1] 23 24 25 26 27  0  1

Programming basics

Put each command on its own line.

When you enclose an assignment statement in parentheses, the value of the variable is printed.

What you see in the second command (the assignment to b) is called class coercion: FALSE and TRUE were turned into 0 and 1.

Vector class coercion logical to numeric

  • logical + numeric = numeric
  • logical values translate to numbers: FALSE to 0 and TRUE to 1
boolean_vec <- c(TRUE, TRUE, FALSE) # logical 
numeric_vec <-  c(3, 2, 6) # numeric
c(boolean_vec, numeric_vec)
[1] 1 1 0 3 2 6
class(c(boolean_vec, numeric_vec))
[1] "numeric"

Vector class coercion to character

  • anything + character = character
  • even digits can be characters
mix_vec <- c(boolean_vec, 
             numeric_vec, 
             "June 8", "3 ") 
class(mix_vec)
[1] "character"
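Printing such a mixed vector shows what the coercion does to each element. A shorter sketch with arbitrary values:

```r
small_mix <- c(TRUE, 3, "June 8")
small_mix
# [1] "TRUE"   "3"      "June 8"
class(small_mix)
# [1] "character"
```

Note that the logical value becomes the string "TRUE" here, not "1", because everything is converted straight to character.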

Programming with vectors

  • select elements of a vector

    • by their position

    • by some condition

Vector subsetting by positions

vec <- c("John", "Mary", "Paul")
vec[1]
[1] "John"
vec[2:3]
[1] "Mary" "Paul"
vec[c(1,3)]
[1] "John" "Paul"

Vector subsetting with logical operators

vec <- c(1, 20, 3, 4)
vec[vec < 20]
[1] 1 3 4
vec[vec < 20 & vec > 1] # and
[1] 3 4
vec[vec > 15 | vec < 3 ] # or 
[1]  1 20
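To see why this works, it helps to look at the condition on its own. The comparison yields a logical vector, and the square brackets keep the elements where it is TRUE. The helper variable keep is introduced only for illustration.

```r
vec <- c(1, 20, 3, 4)

keep <- vec < 20   # one TRUE/FALSE per element
keep
# [1]  TRUE FALSE  TRUE  TRUE

vec[keep]          # same result as vec[vec < 20]
# [1] 1 3 4
```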

Vector recycling

  • many functions proceed element by element

  • nothing gets recycled with equally long vectors

c(2,4,6,8)/c(2,2,2,2)
[1] 1 2 3 4
c(1000, 100, 10) / c(100, 10, 1)
[1] 10 10 10

Vector recycling

  • The second vector contains just one value and that must serve each element of the first vector

  • 2 gets recycled

c(2,4,6,8) / 2  
[1] 1 2 3 4

Vector recycling

  • the second vector gets recycled once

  • each of its elements must serve twice

  • R assumes you want it this way and gives no warning

c(2,4,6,8) / c(2,1) 
[1] 1 4 3 8
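Recycling can be made explicit with the base function rep(), which repeats a vector. The sketch below shows that the short form and the spelled-out form give the same result.

```r
recycled <- c(2, 4, 6, 8) / c(2, 1)                   # R recycles c(2, 1)
explicit <- c(2, 4, 6, 8) / rep(c(2, 1), times = 2)   # same as dividing by c(2, 1, 2, 1)

recycled
# [1] 1 4 3 8
identical(recycled, explicit)
# [1] TRUE
```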

Vector recycling

  • Did you really want this? R recycles anyway, but issues a warning because the longer length is not a multiple of the shorter one.
c(10, 10, 10) * c(1,2)
Warning in c(10, 10, 10) * c(1, 2): longer object length is not a multiple of
shorter object length
[1] 10 20 10

Endnotes

1: Variables = columns (two), observations = rows

2: Qualitative/categorical variables are always discrete: when the values are names of countries, as in the example, you cannot have a value that would lie, e.g., between Denmark and Sri Lanka. Life expectancy is a quantitative variable and it is continuous. Between any two neighboring values in the table, another country’s life expectancy could well still fit. If you disregard rounding, the differences can be extremely small, for instance five seconds or so. On the other hand, year is usually interpreted as a discrete variable, although time is unarguably a continuous concept.