Chapter 3 R basics
How do you manipulate data in R? How do you install and load a package? Let’s see.
3.1 Getting some help
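As a minimal sketch, these are the built-in ways to look up documentation from the R console. The function sqrt() is only an illustrative topic, and the matches variable is a hypothetical name used for this example:

```r
# Open the help page of a function (both forms are equivalent)
?sqrt
help("sqrt")

# List the objects whose names match a pattern (a regular expression)
matches <- apropos("^sqrt$")
print(matches)

# Full-text search in the installed help pages (can be slow, so left commented)
# help.search("square root")   # same as ??"square root"
```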
3.2 Where am I?
To get the current working directory, use getwd():
getwd()
[1] "D:/Roelandt/PERSONNEL/FOSS4G2019_Geoprocessing_with_R_workshop"
If you need to change the working directory, use setwd():
setwd("path/to/my/directory")
If you use RStudio, I recommend working within a project, so that relative paths keep working on another computer.
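A small sketch of changing the working directory temporarily and then restoring it. It relies on setwd() invisibly returning the previous directory, and uses tempdir() only as a stand-in for a real folder (old_dir and restored are names chosen for this example):

```r
old_dir <- setwd(tempdir())   # move to a temporary folder, remember the old one
getwd()                       # now inside the temporary folder
setwd(old_dir)                # go back to where we started
restored <- getwd()
```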
3.3 Make calculations
1+1
[1] 2
3 * 4
[1] 12
7/3
[1] 2.333333
7 %% 3 # remainder of the division
[1] 1
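As a quick complement, %/% gives the integer quotient that goes with the %% remainder, and ^ raises a number to a power. A minimal sketch (variable names are illustrative):

```r
quotient  <- 7 %/% 3              # integer quotient: 2
remainder <- 7 %% 3               # remainder: 1
rebuilt   <- quotient * 3 + remainder   # gives back 7
power     <- 2^10                 # exponent: 1024
print(c(quotient, remainder, rebuilt, power))
```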
3.4 Arithmetic functions
R provides many arithmetic functions by default:
sqrt(4.0)
[1] 2
abs(-625)
[1] 625
log10(12900)
[1] 4.11059
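These functions are vectorised, so they work on a whole vector at once. A minimal sketch with made-up values:

```r
values <- c(1, 4, 9, 16)
roots <- sqrt(values)               # square root of each element: 1 2 3 4
log_base2 <- log(8, base = 2)       # logarithm in base 2: 3
rounded <- round(7 / 3, digits = 2) # rounded to two decimals: 2.33
print(roots)
```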
3.5 Assign values to a variable
fruits <- c("apples", "pears", "lemons")
fruits
[1] "apples" "pears" "lemons"
quantities <- c(3, 2, 1)
print(quantities)
[1] 3 2 1
print(fruits[1])
[1] "apples"
print(fruits[0]) # index 0 returns an empty vector (R indices start at 1)
character(0)
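A short sketch of other common ways to index a vector. The fruits vector is redefined here so that the snippet runs on its own:

```r
fruits <- c("apples", "pears", "lemons")
second     <- fruits[2]          # second element: "pears"
first_last <- fruits[c(1, 3)]    # several elements: "apples" "lemons"
not_first  <- fruits[-1]         # everything except the first: "pears" "lemons"
n_empty    <- length(fruits[0])  # index 0 gives an empty vector, so length 0
print(not_first)
```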
3.6 For loop and print
3.6.1 Simple for loop
for (fruit in fruits) {
print(fruit)
}
[1] "apples"
[1] "pears"
[1] "lemons"
3.6.2 For loop with indices
for (x in seq_along(fruits)) {
print(paste0("I have ", quantities[x]," ", fruits[x],"."))
}
[1] "I have 3 apples."
[1] "I have 2 pears."
[1] "I have 1 lemons."
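For index loops, seq_along() is generally the safer way to build the indices: on an empty vector it yields no iterations, whereas seq(length(x)) yields 1 and 0. A sketch with a hypothetical empty vector:

```r
empty <- character(0)
bad_indices  <- seq(length(empty))  # 1 0: a loop would run twice by mistake
good_indices <- seq_along(empty)    # integer(0): a loop would not run at all

n_iter <- 0
for (i in seq_along(empty)) {
  n_iter <- n_iter + 1              # never reached for an empty vector
}
print(n_iter)                       # 0
```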
3.7 Data types
3.7.1 Vectors
fruits and quantities are character and numeric vectors, respectively.
class(fruits)
[1] "character"
class(quantities)
[1] "numeric"
Vectors are the most basic R data object. There are six types of atomic vectors: logical, integer, double, complex, character and raw. A vector holds a single type: if you mix types, R coerces all elements to a common type.
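A minimal sketch of what happens when types are mixed inside c() (the values are made up for illustration):

```r
mixed <- c(1, "a", TRUE)
print(mixed)                        # "1" "a" "TRUE"
mixed_class <- class(mixed)         # "character": everything became text
num_type <- typeof(c(1L, 2.5))      # "double": the integer is promoted
lgl_class <- class(c(1, TRUE))      # "numeric": TRUE becomes 1
```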
3.7.2 Dataframes
Another frequently encountered data type is the dataframe. It is a collection of data organized in rows and columns, where columns can be of different types. Rows don’t have to be unique, but keeping your data tidy is considered good practice:
- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.
The good thing is that, in GIS, we tend to have tidy data, right?
How do we create a dataframe from our vectors?
3.7.2.1 With cbind.data.frame()
df1 <- cbind.data.frame(fruits, quantities) # column binding
print(df1)
fruits quantities
1 apples 3
2 pears 2
3 lemons 1
class(df1)
[1] "data.frame"
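The same dataframe can also be built directly with base R's data.frame(), which lets you name the columns explicitly. A sketch, with the vectors redefined so that it runs on its own (df1_bis is a hypothetical name):

```r
fruits <- c("apples", "pears", "lemons")
quantities <- c(3, 2, 1)
df1_bis <- data.frame(fruits = fruits, quantities = quantities,
                      stringsAsFactors = FALSE)  # keep text as character
str(df1_bis)    # shows the structure: 3 observations of 2 variables
```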
df2 <- as_tibble(fruits) # convert the vector to a tibble (as_data_frame() is deprecated in recent versions of tibble)
colnames(df2) <- "fruits" # change column name
print(df2)
# A tibble: 3 x 1
fruits
<chr>
1 apples
2 pears
3 lemons
class(df2)
[1] "tbl_df" "tbl" "data.frame"
Tibbles (tbl / tbl_df) are dataframes on steroids from the tidyverse.
3.7.3 Add columns to a dataframe
df3 <- cbind(df2, # input dataframe
quantities, # column with quantities
price = c(4,7,9) # new column with price
)
df3
fruits quantities price
1 apples 3 4
2 pears 2 7
3 lemons 1 9
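In base R, a column can also be added or computed with the $ operator. A sketch on a small stand-in dataframe (df and the total column are hypothetical additions, not part of the workshop data):

```r
df <- data.frame(fruits = c("apples", "pears", "lemons"),
                 quantities = c(3, 2, 1))
df$price <- c(4, 7, 9)                 # new column from a vector
df$total <- df$quantities * df$price   # new column computed from two others
print(df$total)                        # 12 14 9
```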
3.7.4 Other datatypes
- Matrices
- Lists (list()): collections of objects of different kinds
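A brief sketch of both types, with made-up values (m and basket are illustrative names):

```r
m <- matrix(1:6, nrow = 2)    # 2 rows x 3 columns, filled column by column
m_dim <- dim(m)               # 2 3
m_cell <- m[2, 3]             # row 2, column 3: 6

basket <- list(fruit = "apples", quantity = 3, organic = TRUE)
qty <- basket$quantity        # access an element by name: 3
fruit <- basket[["fruit"]]    # same with double brackets: "apples"
print(m)
```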
3.8 Filtering / Subsetting
In R, you can subset your data by value or by variable. There are several ways to do it; here are some of them.
3.8.1 Select variables
names(df3)
[1] "fruits" "quantities" "price"
df3[, 2:3]
quantities price
1 3 4
2 2 7
3 1 9
df3[, c("fruits","price")]
fruits price
1 apples 4
2 pears 7
3 lemons 9
df3 %>% # pipe symbol
select(fruits, quantities) # select from dplyr
fruits quantities
1 apples 3
2 pears 2
3 lemons 1
3.8.2 Filter values
df3[df3$price > 5, ] # don't forget the comma: rows go before it, columns after
fruits quantities price
2 pears 2 7
3 lemons 1 9
df3 %>%
filter(quantities >= 2)
fruits quantities price
1 apples 3 4
2 pears 2 7
3.8.3 Mixing selection and filtering
df3[df3$price > 5, 1] # fruits (column 1) whose price is > 5
[1] "pears" "lemons"
df3 %>%
filter(price > 5) %>% # filter first
select(fruits) # select second
fruits
1 pears
2 lemons
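Base R also offers subset(), which combines a row condition and a column selection in a single call. A sketch on a stand-in copy of df3, rebuilt here so that the snippet runs on its own (expensive is a hypothetical name):

```r
df3 <- data.frame(fruits = c("apples", "pears", "lemons"),
                  quantities = c(3, 2, 1),
                  price = c(4, 7, 9))
# Keep rows with a price above 5, and only the fruits and price columns
expensive <- subset(df3, price > 5, select = c(fruits, price))
print(expensive)
```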
3.9 Joins
Let’s create a new dataframe to join with df3:
df4 <- cbind.data.frame(fruits = fruits, buyer = c("Sophie", "Marc", "Nathan"))
df4
fruits buyer
1 apples Sophie
2 pears Marc
3 lemons Nathan
3.9.1 Merge
merged_df <- merge(x = df3, y = df4, by = "fruits", all = TRUE) # OUTER JOIN
merged_df
fruits quantities price buyer
1 apples 3 4 Sophie
2 lemons 1 9 Nathan
3 pears 2 7 Marc
See this answer on Stack Overflow for more details on left, right, inner and outer joins with merge().
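The all, all.x and all.y arguments of merge() control the type of join. A sketch with two small hypothetical tables (stock and buyers) that only partly overlap, so that the differences are visible:

```r
stock  <- data.frame(fruits = c("apples", "pears", "lemons"),
                     price = c(4, 7, 9))
buyers <- data.frame(fruits = c("apples", "kiwis"),
                     buyer = c("Sophie", "Marc"))

inner_df <- merge(stock, buyers, by = "fruits")                # matching rows only
left_df  <- merge(stock, buyers, by = "fruits", all.x = TRUE)  # all rows of stock
right_df <- merge(stock, buyers, by = "fruits", all.y = TRUE)  # all rows of buyers
full_df  <- merge(stock, buyers, by = "fruits", all = TRUE)    # all rows of both

# Number of rows returned by each join
print(c(inner = nrow(inner_df), left = nrow(left_df),
        right = nrow(right_df), full = nrow(full_df)))
```

Rows without a match on the other side get NA in the missing columns.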
3.9.2 Dplyr
merged_df <- df3 %>%
full_join(df4) ## or full_join(df4, by = "fruits")
merged_df
fruits quantities price buyer
1 apples 3 4 Sophie
2 pears 2 7 Marc
3 lemons 1 9 Nathan
See the documentation of {dplyr} for more information on joins.
3.10 Going further
If you want to go further in learning the R language and the Tidyverse tools, there are a lot of resources online. You might want to start with these:
- Base R: R manuals
- Tidyverse: R for Data Science (free ebook)