# Chapter 3 R basics

How do you manipulate data in R? How do you install and load a package? Let's see.

## 3.2 Where am I?

To get the current working directory, use `getwd()`:

``getwd()``
``[1] "D:/Roelandt/PERSONNEL/FOSS4G2019_Geoprocessing_with_R_workshop"``

If you need to change the working directory, use `setwd()`:

``setwd("path/to/my/directory")``

If you use RStudio, I can only recommend working with a project workflow to avoid path issues on another computer.
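As a minimal sketch of why relative paths travel better between computers, the snippet below builds a path with `file.path()` instead of hard-coding an absolute one. The `data` folder and the `fruits.csv` file are hypothetical names used only for illustration.

```r
# Hypothetical folder and file names, for illustration only
data_file <- file.path("data", "fruits.csv")
print(data_file)

# The same relative path resolves against whatever the working directory is
full_path <- file.path(getwd(), data_file)
print(full_path)
```

Inside a project, the working directory is the project folder, so the relative part stays valid wherever the project is copied.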

## 3.3 Make calculations

``1 + 1``
``[1] 2``
``3 * 4``
``[1] 12``
``7 / 3``
``[1] 2.333333``
``7 %% 3 # remainder of the division``
``[1] 1``
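Two related operators are worth knowing alongside `%%`: `%/%` for integer division and `^` for exponentiation. A short sketch:

```r
q <- 7 %/% 3   # integer division: how many times 3 fits into 7
r <- 7 %% 3    # remainder of that division
print(q)
print(r)

# Quotient and remainder rebuild the original number
print(q * 3 + r == 7)

p <- 2^10      # exponentiation
print(p)
```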

## 3.4 Arithmetic functions

R provides many arithmetic functions by default:

``sqrt(4.0)``
``[1] 2``
``abs(-625)``
``[1] 625``
``log10(12900)``
``[1] 4.11059``
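These functions are vectorized: given several values at once, they return one result per value. A small sketch, using made-up input values that are not part of the workshop data:

```r
values <- c(1, 4, 9)          # example input, for illustration only
roots <- sqrt(values)         # square root of each element
print(roots)

rounded <- round(7 / 3, 2)    # round to two decimal places
print(rounded)

back <- exp(log(10))          # exp() is the inverse of the natural log()
print(back)
```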

## 3.5 Assign values to a variable

``````fruits <- c("apples", "pears", "lemons")
fruits``````
``[1] "apples" "pears"  "lemons"``
``````quantities <- c(3, 2, 1)
print(quantities)``````
``[1] 3 2 1``

Indices in R start at 1!

``print(fruits[1])``
``[1] "apples"``
``print(fruits[0]) # index 0 returns an empty vector``
``character(0)``
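Square brackets accept more than a single position. The sketch below, which redefines `fruits` so that it runs on its own, shows selecting several positions, dropping one with a negative index, and reaching the last element.

```r
fruits <- c("apples", "pears", "lemons")

first_and_last <- fruits[c(1, 3)]    # several positions at once
without_first <- fruits[-1]          # a negative index drops that position
last_one <- fruits[length(fruits)]   # last element

print(first_and_last)
print(without_first)
print(last_one)
```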

## 3.6 For loop and print

### 3.6.1 Simple for loop

``````for (fruit in fruits) {
  print(fruit)
}``````
``````[1] "apples"
[1] "pears"
[1] "lemons"``````

### 3.6.2 For loop with indices

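``````for (x in seq_along(fruits)) { # seq_along() is also safe when the vector is empty
  print(paste0("I have ", quantities[x], " ", fruits[x], "."))
}``````
``````[1] "I have 3 apples."
[1] "I have 2 pears."
[1] "I have 1 lemons."``````

For loops are possible in R, but they are often slower than vectorized code, especially when an object grows at each iteration. So if you need to walk through a large amount of data, please consider using vectorized functions or the `apply()` family instead.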
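As a sketch of the loop-free alternative, the same sentences can be built in a single vectorized call, and `sapply()` can apply a function to each element. The doubling function is an arbitrary example chosen for illustration.

```r
fruits <- c("apples", "pears", "lemons")
quantities <- c(3, 2, 1)

# paste0() is vectorized: it builds all three sentences in one call
sentences <- paste0("I have ", quantities, " ", fruits, ".")
print(sentences)

# sapply() applies a function to each element and returns a vector
doubled <- sapply(quantities, function(q) q * 2)
print(doubled)
```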

## 3.7 Data types

### 3.7.1 Vectors

`fruits` and `quantities` are character and numeric vectors, respectively.

``class(fruits)``
``[1] "character"``
``class(quantities)``
``[1] "numeric"``

Vectors are the most basic R data object. There are six types of atomic vectors: logical, integer, double, complex, character and raw. You can't mix types within a vector: R coerces all elements to a common type.
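A short sketch of what happens when types are mixed anyway: R silently converts every element to the most general type present.

```r
mixed <- c(1, "a", TRUE)   # number, string and logical in one vector
print(mixed)               # every element became a character string
print(class(mixed))

numbers <- c(1, TRUE)      # a logical mixed with numbers becomes numeric
print(numbers)

print(typeof(1))           # plain numbers are doubles
print(typeof(1L))          # the L suffix creates an integer
```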

### 3.7.2 Dataframes

Another frequently encountered data type is the dataframe. It is a collection of data organized by rows and columns. Columns can be of different types. Rows don't have to be unique, but keeping your data tidy is known as a good practice:

1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

The good thing is that, in GIS, we tend to have tidy data, right?

How do we create a data frame from our vectors?

#### 3.7.2.1 With `cbind.data.frame()`

``````df1 <- cbind.data.frame(fruits, quantities) # column binding
print(df1)``````
``````  fruits quantities
1 apples          3
2  pears          2
3 lemons          1``````
``class(df1)``
``[1] "data.frame"``
``````df2 <- as_data_frame(fruits) # convert to a tibble; as_data_frame() is deprecated in recent tibble versions
colnames(df2) <- "fruits" # change column name
print(df2)``````
``````# A tibble: 3 x 1
  fruits
  <chr>
1 apples
2 pears
3 lemons``````
``class(df2)``
``[1] "tbl_df"     "tbl"        "data.frame"``

Tibbles (`tbl` / `tbl_df`) are dataframes on steroids from the tidyverse.
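For comparison, here is a minimal sketch of a third route, the base R `data.frame()` constructor, which names the columns at creation time. The name `df_direct` is introduced here for illustration only.

```r
fruits <- c("apples", "pears", "lemons")
quantities <- c(3, 2, 1)

# stringsAsFactors = FALSE keeps text as character in R versions before 4.0
df_direct <- data.frame(fruits = fruits, quantities = quantities,
                        stringsAsFactors = FALSE)
print(df_direct)
str(df_direct)   # compact summary of the column types
```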

### 3.7.3 Add columns to a dataframe

``````df3 <- cbind(df2,               # entry dataframe
             quantities,        # column with quantities
             price = c(4, 7, 9) # new column with price
)
df3``````
``````  fruits quantities price
1 apples          3     4
2  pears          2     7
3 lemons          1     9``````
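A column can also be added by assigning to a new name with `$`. The sketch below rebuilds the same table so that it runs on its own; the `total` column is a hypothetical example computed from the existing columns.

```r
df <- data.frame(fruits = c("apples", "pears", "lemons"),
                 quantities = c(3, 2, 1),
                 price = c(4, 7, 9),
                 stringsAsFactors = FALSE)

# Assigning to a new name with $ adds a column computed from existing ones
df$total <- df$quantities * df$price
print(df)
```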

### 3.7.4 Other datatypes

• Matrices
• Lists (`list()`): collections of objects of different kinds

Lists in R are not like lists in Python: their elements can be named, so they can also be accessed by name.
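A minimal sketch of a list, with made-up contents, showing the three common ways to access its elements:

```r
# A list can hold objects of different types and lengths, optionally named
basket <- list(fruits = c("apples", "pears"), count = 2, organic = TRUE)

print(basket$fruits)            # access an element by name
print(basket[["count"]])        # [[ ]] returns the element itself
print(class(basket["count"]))   # [ ] returns a smaller list
```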

## 3.8 Filtering / Subsetting

In R, you can subset your data by value (rows) or by variable (columns). There are several ways to do it; here are some of them.

### 3.8.1 Select variables

``names(df3)``
``[1] "fruits"     "quantities" "price"``
``df3[, 2:3]``
``````  quantities price
1          3     4
2          2     7
3          1     9``````
``df3[, c("fruits","price")]``
``````  fruits price
1 apples     4
2  pears     7
3 lemons     9``````
``````df3 %>%                      # pipe operator: passes df3 to the next function
  select(fruits, quantities) # select() from dplyr``````
``````  fruits quantities
1 apples          3
2  pears          2
3 lemons          1``````
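When a single column is selected, base R returns a plain vector unless told otherwise. The sketch below rebuilds the table as `df` (a name used here for illustration) and contrasts the options.

```r
df <- data.frame(fruits = c("apples", "pears", "lemons"),
                 quantities = c(3, 2, 1),
                 price = c(4, 7, 9),
                 stringsAsFactors = FALSE)

prices <- df$price                        # a numeric vector
same_prices <- df[["price"]]              # same result with [[ ]]
price_df <- df[, "price", drop = FALSE]   # drop = FALSE keeps a data frame

print(prices)
print(class(price_df))
```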

### 3.8.2 Filter values

``df3[df3["price"] > 5, ] # don't forget the comma: an empty column index keeps all columns``
``````  fruits quantities price
2  pears          2     7
3 lemons          1     9``````
``````df3 %>%
  filter(quantities >= 2)``````
``````  fruits quantities price
1 apples          3     4
2  pears          2     7``````
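Conditions can be combined with `&` (and) and `|` (or). A base R sketch on a rebuilt copy of the table, named `df` here for illustration:

```r
df <- data.frame(fruits = c("apples", "pears", "lemons"),
                 quantities = c(3, 2, 1),
                 price = c(4, 7, 9),
                 stringsAsFactors = FALSE)

# Rows where both conditions hold
both <- df[df$price > 5 & df$quantities >= 2, ]
print(both)

# Rows where at least one condition holds
either <- df[df$price > 8 | df$quantities == 3, ]
print(either)
```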

### 3.8.3 Mixing selection and filtering

``df3[df3["price"] > 5, 1] # first column (fruits) for rows with price > 5``
``[1] "pears"  "lemons"``
``````df3 %>%
  filter(price > 5) %>%  # filter first
  select(fruits)         # select second``````
``````  fruits
1  pears
2 lemons``````
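Base R also offers `subset()`, which filters rows and selects columns in one call. A minimal sketch on a rebuilt copy of the table (`df` and `expensive` are names chosen for illustration):

```r
df <- data.frame(fruits = c("apples", "pears", "lemons"),
                 quantities = c(3, 2, 1),
                 price = c(4, 7, 9),
                 stringsAsFactors = FALSE)

# First argument: the data; second: the row condition; select: the columns to keep
expensive <- subset(df, price > 5, select = fruits)
print(expensive)
```

Unlike `df[df$price > 5, 1]`, the result stays a data frame.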

## 3.9 Joins

Let’s create a new dataframe to join:

``````df4 <- cbind.data.frame(fruits = fruits, buyer = c("Sophie", "Marc", "Nathan"))
df4``````
``````  fruits  buyer
1 apples Sophie
2  pears   Marc
3 lemons Nathan``````

### 3.9.1 Merge

``````merged_df <- merge(x = df3, y = df4, by = "fruits", all = TRUE) # OUTER JOIN
merged_df``````
``````  fruits quantities price  buyer
1 apples          3     4 Sophie
2 lemons          1     9 Nathan
3  pears          2     7   Marc``````

See that answer on StackOverflow for more details on left, right, inner and outer joins with `merge()`.
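The join types only differ when some keys have no match. The sketch below uses two small made-up tables (`stock` and `buyers`, with a `kiwis` row that has no counterpart) to show how the `all`, `all.x` and `all.y` arguments of `merge()` change the result.

```r
stock <- data.frame(fruits = c("apples", "pears", "lemons"),
                    quantities = c(3, 2, 1),
                    stringsAsFactors = FALSE)
buyers <- data.frame(fruits = c("apples", "pears", "kiwis"),
                     buyer = c("Sophie", "Marc", "Nathan"),
                     stringsAsFactors = FALSE)

inner <- merge(stock, buyers, by = "fruits")                # matching keys only
left  <- merge(stock, buyers, by = "fruits", all.x = TRUE)  # keep every row of stock
right <- merge(stock, buyers, by = "fruits", all.y = TRUE)  # keep every row of buyers
full  <- merge(stock, buyers, by = "fruits", all = TRUE)    # keep every row of both

print(left)   # lemons has no buyer, so its buyer is NA
print(full)
```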

### 3.9.2 dplyr

``````merged_df <- df3 %>%
  full_join(df4) ## or full_join(df4, by = "fruits")

merged_df``````
``````  fruits quantities price  buyer
1 apples          3     4 Sophie
2  pears          2     7   Marc
3 lemons          1     9 Nathan``````
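dplyr provides one function per join type. The sketch below assumes the dplyr package is installed and uses two small made-up tables (`stock` and `buyers`) whose keys only partly match, so the differences are visible.

```r
library(dplyr)

stock <- data.frame(fruits = c("apples", "pears", "lemons"),
                    quantities = c(3, 2, 1),
                    stringsAsFactors = FALSE)
buyers <- data.frame(fruits = c("apples", "pears", "kiwis"),
                     buyer = c("Sophie", "Marc", "Nathan"),
                     stringsAsFactors = FALSE)

left <- left_join(stock, buyers, by = "fruits")       # keep every row of stock
inner <- inner_join(stock, buyers, by = "fruits")     # matching keys only
unmatched <- anti_join(stock, buyers, by = "fruits")  # rows of stock without a match

print(left)
print(inner)
print(unmatched)
```

Passing `by` explicitly avoids the message dplyr prints when it has to guess the join column.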