Chapter 3 R basics

How do you manipulate data in R? How do you install and load a package? Let’s see.

3.1 Getting some help
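
A minimal sketch of the help tools that ship with base R. sqrt and paste0 are used here purely as example topics; any function name works the same way.

```r
# Open the help page of a function (both forms are equivalent)
?sqrt
help("sqrt")

# Show the arguments of a function without opening its help page
args(paste0)

# List the objects whose name contains a pattern
matches <- apropos("sqrt")
print(matches)
```

In RStudio, the help page opens in the Help pane; in a terminal session it is shown as plain text.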

3.2 Where am I?

To get the current working directory, use getwd():

getwd()
[1] "D:/Roelandt/PERSONNEL/FOSS4G2019_Geoprocessing_with_R_workshop"

If you need to change the working directory, use setwd():

setwd("path/to/my/directory")

If you use RStudio, I can only recommend working in a project, so that relative paths keep working on another computer.
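
As a small sketch of how to change directory safely and build portable paths, the example below moves to the session's temporary directory and comes back. The names "data" and "cities.csv" are hypothetical and only serve the illustration.

```r
# Remember where we started so the change can be undone
old_wd <- getwd()

# Move to a directory that is guaranteed to exist (the session's temp dir)
setwd(tempdir())

# Build a relative path piece by piece instead of pasting separators by hand
# ("data" and "cities.csv" are hypothetical names)
csv_path <- file.path("data", "cities.csv")
print(csv_path)

# Check whether the file exists before trying to read it
print(file.exists(csv_path))

# Go back to the original directory
setwd(old_wd)
```

Relative paths such as the one built here are resolved from the working directory, which is why a project-based workflow makes scripts easier to share.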

3.3 Make calculations

1+1
[1] 2
3 * 4
[1] 12
7/3
[1] 2.333333
7 %% 3 # remainder of the division (modulo)
[1] 1
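
A few related operators, shown as a short sketch: integer division, exponentiation, and the link between integer division and the remainder.

```r
# Integer division: how many times 3 fits into 7
quotient <- 7 %/% 3
print(quotient)

# Remainder of the same division
remainder <- 7 %% 3
print(remainder)

# The two results rebuild the original number
rebuilt <- 3 * quotient + remainder
print(rebuilt)

# Exponentiation
power <- 2^10
print(power)

# Parentheses change the order of evaluation
print((1 + 2) * 3)
```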

3.4 Arithmetic functions

R provides many arithmetic functions out of the box:

sqrt(4.0)
[1] 2
abs(-625)
[1] 625
log10(12900)
[1] 4.11059
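
These functions also accept whole vectors and are applied element by element. A short sketch, with variable names chosen only for the illustration:

```r
# Arithmetic functions work on every element of a vector at once
values <- c(1, 4, 9)
roots <- sqrt(values)
print(roots)

# round() takes the number of decimals as second argument
rounded <- round(7 / 3, 2)
print(rounded)

# log() uses the natural logarithm unless a base is given
log_base_10 <- log(100, base = 10)
print(log_base_10)
```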

3.5 Assign values to a variable

fruits <- c("apples", "pears", "lemons")
fruits
[1] "apples" "pears"  "lemons"
quantities <- c(3, 2, 1)
print(quantities)
[1] 3 2 1
Indices in R start at 1!
print(fruits[1])
[1] "apples"
print(fruits[0]) # index 0 selects nothing: the result is an empty character vector
character(0)
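
A few more indexing patterns, as a sketch built on the same fruits vector (redefined here so the block runs on its own):

```r
fruits <- c("apples", "pears", "lemons")

# Select several elements with a vector of indices
first_and_last <- fruits[c(1, 3)]
print(first_and_last)

# A negative index drops the element at that position
without_first <- fruits[-1]
print(without_first)

# Asking for an index beyond the end returns NA, not an error
beyond_end <- fruits[4]
print(beyond_end)

# Index 0 returns a vector of length 0
print(length(fruits[0]))
```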

3.6 For loop and print

3.6.1 Simple for loop

for (fruit in fruits) {
  print(fruit)
}
[1] "apples"
[1] "pears"
[1] "lemons"

3.6.2 For loop with indices

for (x in seq_along(fruits)) { # seq_along() is safe even if fruits is empty
  print(paste0("I have ", quantities[x]," ", fruits[x],"."))
}
[1] "I have 3 apples."
[1] "I have 2 pears."
[1] "I have 1 lemons."
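For loops work in R, but they are often slower and more verbose than vectorized code, especially when an object grows at each iteration. So if you need to walk through a large amount of data, please consider vectorized functions or the apply family (sapply(), vapply(), lapply()) instead.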
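
As a sketch of the loop-free alternative, the sentences from 3.6.2 can be built in one vectorized call, and vapply() can apply a function to each element. The variable names sentences and name_lengths are chosen only for this illustration.

```r
fruits <- c("apples", "pears", "lemons")
quantities <- c(3, 2, 1)

# paste0() is vectorized: it builds all three sentences in one call
sentences <- paste0("I have ", quantities, " ", fruits, ".")
print(sentences)

# vapply() applies a function to each element and checks the result type
name_lengths <- vapply(fruits, nchar, integer(1))
print(name_lengths)
```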

3.7 Data types

3.7.1 Vectors

fruits and quantities are character and numeric vectors, respectively.

class(fruits)
[1] "character"
class(quantities)
[1] "numeric"

Vectors are the most basic R data object. There are six types of atomic vectors: logical, integer, double, complex, character and raw. An atomic vector holds a single type: if you mix types, R silently converts every element to the most general one.
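
A short sketch of what happens when types are mixed in c(); the variable names are only illustrative.

```r
# Mixing a number, a string and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))

# Mixing an integer and a double: everything becomes double
int_and_double <- c(1L, 2.5)
print(typeof(int_and_double))

# Mixing a logical and an integer: TRUE becomes 1L
logical_and_int <- c(TRUE, 2L)
print(logical_and_int)
```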

3.7.2 Dataframes

Another frequently encountered data type is the dataframe. It is a collection of data organized in rows and columns, and the columns can be of different types. Rows don’t have to be unique, but keeping your data tidy is considered good practice:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

The good thing is that, in GIS, we tend to have tidy data, right?

How do we create a data frame from our vectors?

3.7.2.1 With cbind.data.frame()

df1 <- cbind.data.frame(fruits, quantities) # column binding
print(df1)
  fruits quantities
1 apples          3
2  pears          2
3 lemons          1
class(df1)
[1] "data.frame"
df2 <- tibble::enframe(fruits, name = NULL) # one-column tibble named "value"; as_data_frame() is deprecated
colnames(df2) <- "fruits" # change column name
print(df2)
# A tibble: 3 x 1
  fruits
  <chr> 
1 apples
2 pears 
3 lemons
class(df2)
[1] "tbl_df"     "tbl"        "data.frame"
Tibbles (classes tbl and tbl_df) are dataframes on steroids from the tidyverse.
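
For comparison, here is a sketch of the same table built directly with base R's data.frame(), which names the columns in one step. The name df_base is only used for this illustration.

```r
fruits <- c("apples", "pears", "lemons")
quantities <- c(3, 2, 1)

# Each argument becomes a column; stringsAsFactors = FALSE keeps text as character
df_base <- data.frame(fruits = fruits,
                      quantities = quantities,
                      stringsAsFactors = FALSE)

# Inspect the structure: column names, types and first values
str(df_base)

# Number of rows and columns
print(dim(df_base))
```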

3.7.3 Add columns to a dataframe

df3 <- cbind(df2, # input dataframe
             quantities, # column with quantities
             price = c(4,7,9) # new column with prices
             )
df3
  fruits quantities price
1 apples          3     4
2  pears          2     7
3 lemons          1     9
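
Columns can also be added by assignment with $ or [[ ]]. The sketch below rebuilds the table so that it runs on its own; the names df_stock and total are hypothetical.

```r
df_stock <- data.frame(fruits = c("apples", "pears", "lemons"),
                       quantities = c(3, 2, 1),
                       stringsAsFactors = FALSE)

# Add a column by assigning to a new name
df_stock$price <- c(4, 7, 9)

# Compute a new column from existing ones (element by element)
df_stock[["total"]] <- df_stock$quantities * df_stock$price

print(df_stock)
```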

3.7.4 Other datatypes

  • Matrices
  • Lists (list()): collections of objects of different kinds
Lists in R are not like lists in Python: their elements can be named and accessed by name.
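
A minimal sketch of both types. The object names m and basket, and the values they hold, are made up for the illustration.

```r
# A matrix holds a single type and is filled column by column by default
m <- matrix(1:6, nrow = 2)
print(m)
print(dim(m))

# Element in row 2, column 3
print(m[2, 3])

# A list can mix a vector, a number and a logical, each with a name
basket <- list(fruits = c("apples", "pears"), total = 5, paid = TRUE)

# Access an element by name with $ or [[ ]]
print(basket$total)
print(basket[["fruits"]][2])
```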

3.8 Filtering / Subsetting

In R, you can subset your data by value or by variable. There are several ways to do it; here are some of them.

3.8.1 Select variables

names(df3)
[1] "fruits"     "quantities" "price"     
df3[, 2:3]
  quantities price
1          3     4
2          2     7
3          1     9
df3[, c("fruits","price")]
  fruits price
1 apples     4
2  pears     7
3 lemons     9
library(dplyr) # provides %>% and select()
df3 %>% # pipe: passes df3 as the first argument of the next call
  select(fruits, quantities) # select from dplyr
  fruits quantities
1 apples          3
2  pears          2
3 lemons          1
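
A few more base R selection patterns, as a sketch. The table is rebuilt here so the block is self-contained; the result names are only illustrative.

```r
df3 <- data.frame(fruits = c("apples", "pears", "lemons"),
                  quantities = c(3, 2, 1),
                  price = c(4, 7, 9),
                  stringsAsFactors = FALSE)

# $ and [[ ]] extract one column as a plain vector
prices <- df3$price
same_prices <- df3[["price"]]

# drop = FALSE keeps a one-column selection as a data frame
one_col <- df3[, "price", drop = FALSE]

# A negative index drops a column (here the second one)
no_quantities <- df3[, -2]

print(prices)
print(one_col)
print(no_quantities)
```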

3.8.2 Filter values

df3[df3$price > 5, ] # don't forget the trailing comma: it means "keep all columns"
  fruits quantities price
2  pears          2     7
3 lemons          1     9
df3 %>%
  filter(quantities >= 2)
  fruits quantities price
1 apples          3     4
2  pears          2     7
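
Conditions can be combined with & (and), | (or) and %in% (membership). A sketch on the same table, rebuilt here to be self-contained; "oranges" is a hypothetical value that is absent from the data.

```r
df3 <- data.frame(fruits = c("apples", "pears", "lemons"),
                  quantities = c(3, 2, 1),
                  price = c(4, 7, 9),
                  stringsAsFactors = FALSE)

# Both conditions must hold
expensive_and_plenty <- df3[df3$price > 5 & df3$quantities >= 2, ]
print(expensive_and_plenty)

# At least one condition must hold
cheap_or_rare <- df3[df3$price < 5 | df3$quantities == 1, ]
print(cheap_or_rare)

# Keep the rows whose value appears in a given set
citrus <- df3[df3$fruits %in% c("lemons", "oranges"), ]
print(citrus)
```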

3.8.3 Mixing selection and filtering

df3[df3$price > 5, 1] # fruits (column 1) with a price > 5
[1] "pears"  "lemons"
df3 %>% 
  filter(price > 5) %>%  # filter first
  select(fruits)         # select second
  fruits
1  pears
2 lemons
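
Base R also offers subset(), which filters rows and selects columns in one call. This is a sketch of an alternative, not the approach used above; the table is rebuilt so the block runs on its own.

```r
df3 <- data.frame(fruits = c("apples", "pears", "lemons"),
                  quantities = c(3, 2, 1),
                  price = c(4, 7, 9),
                  stringsAsFactors = FALSE)

# Second argument: row condition; select: columns to keep
expensive_fruits <- subset(df3, price > 5, select = fruits)
print(expensive_fruits)

# Several columns can be kept at once
expensive_details <- subset(df3, price > 5, select = c(fruits, price))
print(expensive_details)
```

subset() is convenient interactively; inside functions, explicit indexing with [ ] is usually considered more predictable.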

3.9 Joins

Let’s create a new dataframe to join:

df4 <- cbind.data.frame(fruits = fruits, buyer = c("Sophie", "Marc", "Nathan"))
df4
  fruits  buyer
1 apples Sophie
2  pears   Marc
3 lemons Nathan

3.9.1 Merge

merged_df <- merge(x = df3, y = df4, by = "fruits", all = TRUE) # OUTER JOIN
merged_df
  fruits quantities price  buyer
1 apples          3     4 Sophie
2 lemons          1     9 Nathan
3  pears          2     7   Marc

See this answer on Stack Overflow for more details on left, right, inner and outer joins with merge().
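
The four join types only differ when some keys have no match. The sketch below uses two small hypothetical tables (stock and buyers, with a made-up "kiwis" row and buyer "Lea") so the difference is visible.

```r
# Hypothetical tables: "lemons" has no buyer, "kiwis" is not in stock
stock <- data.frame(fruits = c("apples", "pears", "lemons"),
                    price = c(4, 7, 9),
                    stringsAsFactors = FALSE)
buyers <- data.frame(fruits = c("apples", "pears", "kiwis"),
                     buyer = c("Sophie", "Marc", "Lea"),
                     stringsAsFactors = FALSE)

# INNER JOIN: only the keys present in both tables
inner <- merge(stock, buyers, by = "fruits")

# LEFT JOIN: all rows of x, NA where y has no match
left <- merge(stock, buyers, by = "fruits", all.x = TRUE)

# RIGHT JOIN: all rows of y, NA where x has no match
right <- merge(stock, buyers, by = "fruits", all.y = TRUE)

# OUTER JOIN: all rows of both tables
outer <- merge(stock, buyers, by = "fruits", all = TRUE)

print(inner)
print(left)
print(right)
print(outer)
```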

3.9.2 Dplyr

merged_df <- df3 %>%
  full_join(df4, by = "fruits") # by can be omitted; dplyr then joins on the shared column and prints a message

merged_df
  fruits quantities price  buyer
1 apples          3     4 Sophie
2  pears          2     7   Marc
3 lemons          1     9 Nathan

See the documentation of {dplyr} for more information on joins.

3.10 Going further

If you want to go further with the R language and the tidyverse tools, there are a lot of resources online. You might want to start with these: