Chapter 2 Introduction
R is a programming language dedicated to statistics and data analysis.
2.1 Why R?
- R is tailor-made for data science
- Great for reproducibility (scripts, projects)
- Do everything from the analysis to the reporting in one tool (thanks to R Markdown)
2.1.1 R in GIS
Can we do GIS in R? Yes!
And we can do R in GIS too!
2.1.1.1 R spatial
There are several tools for handling spatial data in R. Historically, {sp}, {rgdal} and {rgeos} were the core packages for geospatial data handling, with dedicated packages for more advanced processing.
The {sf} package, released a couple of years ago, is a modernisation of those packages. It connects directly to the GDAL, GEOS and PROJ libraries and implements the Simple Features Access standard in R. It is compatible with the Tidyverse collection of R packages (more on that later).
See r-spatial.github.io/sf for more information about {sf}.
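As a first taste of {sf}, here is a minimal sketch of reading, inspecting and reprojecting a layer. It assumes {sf} is installed and uses the nc.shp example file (North Carolina counties) bundled with the package; the target CRS (EPSG:32119) is an illustrative choice, not part of the workshop material.

```r
library(sf)

# nc.shp is example data shipped with {sf}; st_read() reads it through GDAL.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# An sf object is a data.frame with a geometry list-column.
class(nc)

# The coordinate reference system (CRS) travels with the data.
st_crs(nc)

# Reproject to a projected CRS (EPSG:32119, chosen here for illustration);
# coordinate transformation relies on PROJ.
nc_proj <- st_transform(nc, 32119)

# Planar measures on projected data are computed with GEOS; results carry units.
nc_proj$area_m2 <- st_area(nc_proj)
head(nc_proj[, c("NAME", "area_m2")])
```

Because the result is still a data frame, the usual data-manipulation tools (including {dplyr} verbs) apply to it, with the geometry column carried along.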
2.1.1.2 R in QGIS
Ever since the first release of SEXTANTE (the predecessor of the QGIS Processing framework), you have been able to use R scripts in the QGIS Processing toolbox!
2.1.1.3 RQGIS
RQGIS is an R package that provides access to QGIS functionality from within the R environment.
2.1.2 RSAGA
There is also the RSAGA package to access SAGA processing tools within R.
2.1.3 GRASS
You can also use R in GRASS, or GRASS in R: see the GRASS wiki.
So there are many connections between R and the GIS world. In an R data science context, this becomes geodata science, adding tools to understand and visualize spatial datasets.
2.2 Base R, Tidyverse and data.table
Base R is the set of functions shipped when you install R. These functions were written by many different people, not always software developers, so although base R is powerful, its syntax is not homogeneous and it is not always efficient (prefer vectorised operations over explicit for loops where possible).
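To illustrate the advice about for loops, here is a minimal base R sketch comparing an explicit loop with the equivalent vectorised expression; the squaring task is only a toy example.

```r
x <- 1:10

# Loop version: preallocate the result, then fill it element by element.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorised version: `^` is applied to every element in a single call.
squares_vec <- x^2

identical(squares_loop, squares_vec)  # TRUE
```

Both give the same result, but the vectorised form is shorter and, on large vectors, typically much faster because the iteration happens in compiled code rather than in R.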
“As a language, R is like French; it has an elegant core, but every rule comes with a set of ad-hoc exceptions that directly contradict it.” (http://r.cs.purdue.edu/pub/ecoop12.pdf)
This criticism led people to create packages that mitigate these issues.
The Tidyverse is a set of homogeneous packages providing a coherent syntax built around verbs (filter, select, etc.) and the ability to chain operations with a pipe (%>%).
Data processing with R tidyverse - Ginolhac et al. 2017.
It favours readability and ease of understanding over performance (although it sometimes delivers both).
The Tidyverse is modular: you can load each package separately. We’ll mostly use the {dplyr} package in this workshop.
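A minimal sketch of the verb-and-pipe style, assuming {dplyr} is installed and using R's built-in mtcars dataset as a stand-in for real data:

```r
library(dplyr)

# Each verb takes a data frame and returns a data frame,
# so the steps can be chained with the %>% pipe.
efficient_cars <- mtcars %>%
  filter(cyl == 4) %>%          # keep 4-cylinder cars
  select(mpg, cyl, hp) %>%      # keep three columns
  arrange(desc(mpg))            # sort by fuel efficiency, best first

head(efficient_cars, 3)
```

Read top to bottom, the pipeline states what happens to the data in plain verbs, which is the readability the Tidyverse aims for.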
data.table, on the other hand, was designed for efficiency and can easily compete with, for example, Python's pandas; it can handle very large datasets. Its syntax is closer to that of pandas but less readable for non-programmers.
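For comparison, here is a sketch of a filter, select and sort query plus a grouped mean in data.table's `DT[i, j, by]` syntax, again on the built-in mtcars dataset and assuming {data.table} is installed:

```r
library(data.table)

# Convert the built-in data frame to a data.table (row names are dropped).
dt <- as.data.table(mtcars)

# DT[i, j, by]: i selects rows, j selects or computes columns, by groups.
# 4-cylinder cars, three columns, then a chained [] call to sort by mpg.
efficient_dt <- dt[cyl == 4, .(mpg, cyl, hp)][order(-mpg)]

# Grouped aggregation: mean mpg per number of cylinders.
mpg_by_cyl <- dt[, .(mean_mpg = mean(mpg)), by = cyl]
mpg_by_cyl
```

The whole query fits in one compact expression, which is efficient to write and run but harder to read for newcomers than a chain of named verbs.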
These are not closed worlds: you can mix all of them in the same script, as Suzan Baert explained at SatRday Paris 2019.
You can even mix R with Python if you need to, using the {reticulate} R package (Python from R) or the rpy2 Python package (R from Python).
This document is based on geodata handling with {sf} and {dplyr} (from the Tidyverse toolset).
It can be followed in RStudio or in a Jupyter notebook with an R kernel.