engine2.qmd

```{r,echo=FALSE,message=FALSE,warning=FALSE}
library(lidR)
library(stars)

r3dDefaults = rgl::r3dDefaults
m = structure(c(0.921, -0.146, 0.362, 0, 0.386, 0.482, -0.787, 0, 
                -0.06, 0.864, 0.5, 0, 0, 0, 0, 1), .Dim = c(4L, 4L))
r3dDefaults$FOV = 50
r3dDefaults$userMatrix = m
r3dDefaults$zoom = 0.75


knitr::opts_chunk$set(
  comment =  "#>", 
  collapse = TRUE,
  fig.align = "center")

rgl::setupKnitr(autoprint = TRUE)
```

# LAScatalog processing engine (2/2) {#sec-engine2}

@sec-engine showed how to use the `LAScatalog` processing engine to apply `lidR` functions on a collection of files. This included how to process acquisitions in user defined chunk sizes, loading buffers on-the-fly, saving the output to disk, and parallelizing tasks. These examples all used existing functions provided with the `lidR` package. The power of the engine goes beyond limited used cases, allowing developers to design their own applications.

Let's imagine we want to normalize a data set **and** colorize it using RGB aerial data. According to the previous sections we could do something like this:

``` r
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_normalized") # specify output normalized las disk save location
ctg_norm <- normalize_height(ctg, tin()) # normalize

# loop through catalog files and apply rgb colour
rgb <- terra::rast("rgb.tif")
for (file in ctg$filename)
{
  newname <-  tools::file_path_sans_ext(basename(file))
  newname <- paste0("folder/", newname, "_colored.laz")
  las <- readLAS(file)
  las <- merge_spatial(las, rgb)
  writeLAS(las, newname)
}
```

This method is however far from optimal. First, **the whole** point cloud is read two times:

1.  To normalize
2.  To colorize

This takes a significant amount of time. It also implies the need to create copies of the acquisition after each processing step, taking up a lot of disk storage. Then, to finish, a custom `for` loop is used because `merge_spatial()` cannot be applied on a `LAScatalog` natively.

**What if we were able to do that in a single run?**

## `catalog_apply()` {#sec-engine-catalog-apply}

In this section we outline the `catalog_apply()` function, which is used subliminally in every function seen in the @sec-engine. Here we show how to leverage its versatility to design tools that do not yet exist, like the normalization + colorization example from above, but also more innovative processes.

Again, the `LAScatalog` class and the `LAScatalog` engine are deeply documented in a two dedicated vignettes available [here](https://cran.r-project.org/web/packages/lidR/vignettes/lidR-LAScatalog-class.html) and [here](https://cran.r-project.org/web/packages/lidR/vignettes/lidR-LAScatalog-engine.html). The purpose of these examples is to provide alternative, hands-on documentation with more applied examples and real use cases.

In this section we will use the same 9 files used in @sec-engine and an RGB raster that encompasses the collection of files.

```{r plot-ctg-engine2, fig.height=4, fig.width=4}
ctg <- readLAScatalog("data/ENGINE/catalog/")
plot(ctg)
```

```{r, echo=FALSE}
opt_progress(ctg) <- FALSE
```

The RGB image is available for download [here](data/ENGINE/catalog/rgb.tif).

```{r, echo=FALSE, eval=FALSE}
url="https://servicesmatriciels.mern.gouv.qc.ca:443/erdas-iws/ogc/wmts/Inventaire_Ecoforestier/Inventaire_Ecoforestier/default/GoogleMapsCompatibleExt2:epsg:3857/{z}/{y}/{x}.jpg"
mffp = list(src="Aerial Photography", q=url, sub=NA, cit='')
bbox <- sf::st_buffer(sf::st_as_sfc(sf::st_bbox(ctg)), 300)
bbox <- sf::st_bbox(sf::st_transform(bbox, "EPSG:4326"))
rgb <- maptiles::get_tiles(bbox, crop = TRUE, provider = mffp, zoom = 16, verbose = T)
rgb <- terra::project(rgb, st_crs(ctg)$wkt)
terra::plotRGB(rgb)
plot(ctg, add = T)
bb = ext(ctg)
bb = bb + 100
rgb = terra::crop(rgb, bb)
terra::plotRGB(rgb)
plot(ctg, add = T)
terra::writeRaster(rgb, "data/ENGINE/catalog/rgb.tif", overwrite=TRUE)
```

```{r plot-rgbmap, fig.height=8, fig.width=8, warning=FALSE}
rgbmap <- terra::rast("data/ENGINE/catalog/rgb.tif")
terra::plotRGB(rgbmap)
```

## Create user-defined function {#sec-engine-user-function-generic}

First we need to create a function that combines the normalization + colorization steps. Let's call it `norm_color()`. It's simple and uses only 2 functions.

```{r}
norm_color <- function(las, rgbmap) # create user-defined function
{
  nlas <- normalize_height(las, tin()) # normalize
  colorized <- merge_spatial(nlas, rgbmap) # colorize
  return(colorized) # output
}
```

Let's try it on a sample plot.

```{r plot-las-rgb-colored, rgl = TRUE}
las <- clip_circle(ctg, 338800, 5238500, 40)
nlasrgb <- norm_color(las, rgbmap) # apply user defined function

plot(nlasrgb, color = "RGB", bg = "white", size = 6) # some plotting
```

It works! We have a workable function that performs on a point cloud. Remember that `norm_color()` is just an example. You could instead choose to make an algorithm for individual tree detection or a method to sample point of interest, *let your imagination run wild!*

So far it works only when using a `LAS` object. Now let's make it working with a `LAScatalog`.

## Create an intermediate function for `catalog_apply()` {#sec-engine-user-function-extended}

The simple way is:

``` r
new_ctg <- catalog_map(ctg, norm_color, rgbmap = rgbmap)
```

However, here we will learn the hard way in order to understand better how it works internally and understand the full potential of `catalog_apply()` for which `catalog_map()` is only a simplified version.

The core engine is the function `catalog_apply()` but this function does not work with any user-defined function. User-defined functions must respect a specific template (take a look at the documentation in R) and is expected to perform some specific tasks. First, the primary variable must be a chunk from the catalog (see `chunk` below), the function must read the chunk, check that it is not empty, then perform the computation.

A valid function is the following:

```{r}
norm_color_chunk <- function(chunk, rgbmap) # user defined function
{
  las <- readLAS(chunk)                  # read the chunk
  if (is.empty(las)) return(NULL)        # check if it actually contain points
  nlasrgb <- norm_color(las, rgbmap) # apply computation of interest
  return(nlasrgb) # output
}
```

This introduces an additional level of complexity that is crucial. First, catalog chunks will be sequentially fed into the user's function. Each chunk is read inside the user-defined function with all processing options being automatically respected (`filter`, `select`, `chunk_size`, `chunk_buffer` etc.). The `las` variable in the code snippet above is a point cloud extracted from the catalog that contains the chunks + a buffer. It's important to check that the loaded portion of the collection is not empty or subsequent code will likely fail. This may happen depending on the size of the chunk and the `filter` options chosen.

This function is now `catalog_apply()` compatible and can be applied over the entire ALS acquisition. The output will be a point cloud, so we need to pay attention mitigate memory issues and save to disk. At this stage, we haven't mitigated that problem yet so we add it below.

```{r, echo=FALSE}
# I'm adding that just to compute faster in the example where we do not look at the point cloud anyway
opt_filter(ctg) <- "-keep_class 2"
```

```{r}
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb") # write to disk
output <- catalog_apply(ctg, norm_color_chunk, rgbmap = rgbmap) # implement user-defined function using catalog_apply
str(output)
```

We can see that the output is a `list` with the name of each file written to disk. This is the default behavior of the engine. It returns a `list` with one element per chunk. This isn't really convenient. We have seen in @sec-engine that in the case of `LAS` files we can return a `LAScatalog`, which is far more convenient.

We can use the option `automerge = TRUE` so `catalog_apply()` will automatically merge the list into something more user-friendly.

```{r plot-ctg-colored-overlaps, fig.height=4, fig.width=4}
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb")
options <- list(automerge = TRUE) # merge all the outputs
output <- catalog_apply(ctg, norm_color_chunk, rgbmap = rgbmap, .options = options)
output
```

We now have a `LAscatalog`. It is much better but:

```{r}
plot(output)
```

We still have a problem here. The files are overlapping because we read each file with a buffer and then the outputs have been written **with their buffer**. This is bad practice. It's important to always remove the buffer after the computation. In this specific case, the output is a `LAS` file. When `readLAS()` reads a chunk from the catalog the points in the buffer are flagged so they can be easily manipulated. Since they are flagged, we can easily remove these points by adding this step to our user-defined function. It is always the role of the user to remove the buffer of the output.

```{r plot-ctg-colored, fig.height=4, fig.width=4}
norm_color_chunk <- function(chunk, rgbmap)
{
  las <- readLAS(chunk)
  if (is.empty(las)) return(NULL)
  nlasrgb <- norm_color(las, rgbmap)
  nlasrgb <- filter_poi(nlasrgb, buffer == 0) # remove buffer
  return(nlasrgb)
}

opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb")
options <- list(automerge = TRUE)
output <- catalog_apply(ctg, norm_color_chunk, rgbmap = rgbmap, .options = options)
plot(output)
```

When the outputs are spatial vectors or spatial rasters they can be cropped using the bounding box of the chunk accessible via `extent(chunk)` or `bbox(chunk)` or `st_bbox(chunk)` or `ext(chunk)` that match respectively with `raster`, `sp`, `sf/stars` and `terra`.

Nice! We created a custom function -- `norm_color()` -- and we upscale its definition to capably and efficiently process an entire ALS acquisition made up of hundreds of files. Using options described in @sec-engine, we can also introduce parallelization, chunk size control, and buffer size control.

``` r
library(future)
plan(multisession)
opt_filter(ctg) <- "-keep_class 2"
opt_chunk_size(ctg) <- 300
opt_chunk_buffer(ctg) <- 40
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb")
options <- list(automerge = TRUE)
output <- catalog_apply(ctg, norm_color_chunk, rgbmap = rgbmap, .options = options)
```

We can check that it worked by loading a sample from somewhere in our new catalog. According to the options put above, we are expecting to get a normalized (we normalized), colored (we merged RGB image), ground points (we used `-keep_class 2`).

```{r plot-cliped-gnd-colored, rgl = TRUE}
opt_output_files(output) <- ""
las <- clip_circle(output, 338800, 5238500, 60)
plot(las, color = "RGB", size = 6, bg = "white")
```

## Make a user-friendly function for third party users {#sec-engine-user-function-lascatalog}

When designing tools that will be used by third parties, we would like to hide `catalog_apply()` and intermediate functions inside a more high level and user-friendly function similarly to `lidR` functions that work the same both with a `LAS` or a `LAScatalog`. Moreover we would like to make the function foolproof for users. To do this we can create wrap our codes if `if-else` statements to apply to correct code to the correct object.

```{r}
# Create a generic function
norm_color <- function(las, rgbmap)
{
  if (is(las, "LAScatalog")) {
    options <- list(automerge = TRUE)
    output <- catalog_apply(ctg, norm_color, rgbmap = rgbmap, .options = options)
    return(output)
  } else if (is(las, "LAScluster")) {
    x <- readLAS(las)
    if (is.empty(x)) return(NULL)
    nlasrgb <- norm_color(x, rgbmap)
    nlasrgb <- filter_poi(nlasrgb, buffer == 0)
    return(nlasrgb)
  } else if (is(las, "LAS")) {
    nlas <- normalize_height(las, tin())
    colorized <- merge_spatial(nlas, rgbmap)
    return(colorized)
  } else {
    stop("Not supported input")
  }
}
```

We now have a single function that can be used seamlessly on a `LAS` or a `LAScatalog` object.

``` r
las_norm_colored <-  norm_color(las, rgbmap)

opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb")
ctg_norm_colored <-  norm_color(ctg, rgbmap)
```

The different steps to apply to a `LAScluster` (i.e. `readLAS` + remove buffer of the output) are, in this case, very common. It is possible to use `catalog_map()` instead, that is capable of doing this tasks for you. But in some cases the output may be not supported, because the cropping step is more complex. In those cases, `catalog_apply()` must be used because it allows for a more accurate control.

```{r}
# Create a generic function
norm_color <- function(las, rgbmap)
{
  if (is(las, "LAScatalog")) {
    output <- catalog_map(ctg, norm_color, rgbmap = rgbmap, .options = options)
    return(output)
  } else if (is(las, "LAS")) {
    nlas <- normalize_height(las, tin())
    colorized <- merge_spatial(nlas, rgbmap)
    return(colorized)
  } else {
    stop("Not supported input")
  }
}
```

## Make a safe function for third party users {#sec-engine-user-function-safe}

At this stage our function is almost finished. However it is not foolproof. What if the user of this new function does not provide any output path template? The point cloud in each chunk will be loaded in memory and will be retained in memory until it eventually becomes full and R crashes. We must prevent the use of the function if no path to the disk is given.

Also, the computation of a DTM requires a buffer. If the user sets a 0 m buffer the output of this function will be incorrect. We must prevent the use of the function if a buffer of 0 is given. These two cases are covered by the engine with the options `need_output_file = TRUE` and `need_buffer = TRUE`. To finish we would like to disable the ability to tune the `select` option to ensure no attribute is lost.

```{r}
norm_color <- function(las, rgbmap)
{
  if (is(las, "LAScatalog")) {
    opt_select(las) <- "*" # disable select tuning
    options <- list(automerge = TRUE, need_output_file = TRUE, need_buffer = TRUE) # require output path & buffer size
    output <- catalog_map(ctg, norm_color, rgbmap = rgbmap, .options = options)
    return(output)
  } else if (is(las, "LAS")) {
    nlas <- normalize_height(las, tin())
    colorized <- merge_spatial(nlas, rgbmap)
    return(colorized)
  } else {
    stop("Not supported input")
  }
}
```

We can test what happens if we use this function naively.

```{r, error = TRUE}
opt_output_files(ctg) <- ""
ctg_norm_colored <- norm_color(ctg, rgbmap)
```

It failed with an informative message!

We are done. This is exactly how the `lidR` package works and we have presented all the tools needed to extend it with new applications.

More examples can be found further along in this book. For example, @sec-outbox-distance-returns presents how to compute a raster of the average distance between first and last returns, and @sec-outbox-rumple-index presents how to compute a rumple index of the canopy from the point cloud and, of course, the documentation of the package itself is more comprehensive than this book.