Introduction to mapmate

The mapmate package is used for map- and globe-based data animation pre-production. Specifically, mapmate functions are used to generate and save to disk a series of map graphics that make up a still image sequence, which can then be used in video editing and rendering software of the user’s choice. This package does not make simple animations directly within R, which can be done with packages like animation. mapmate is more specific to maps, hence the name, and particularly suited to 3D globe plots of the Earth.

This introduction covers the following toy examples:

Generate a data frame containing monthly map data (optionally seasonal or annual aggregate average data) in the form of an n-year moving or rolling average based on an input data frame of raw monthly data.
Generate a sequence of still frames of:
- map data for use in a flat map animation.
- dynamic/temporally changing map data projected onto a static globe (3D Earth)
- static map data projected onto rotating globe
- dynamic map data projected onto rotating globe
Parallel processing examples using parallel::mclapply
Convenient iterator wrapper function
Comparison of map tiles, map lines, and polygons
Non-map data example (time series line growth)

Computing moving averages

With a data frame containing monthly map data in the following form, a data frame of moving averages can be computed using get_ma. The n-year moving average window is controlled by size. The window size pertains to years whether the original monthly data remain as monthly data or are summarized on the fly to seasonal or annual averages.

library(mapmate)
library(dplyr)
library(purrr)
data(monthlytemps)
monthlytemps
#> # A tibble: 1,080 x 5
#>      lon   lat  Year Month     z
#>    <dbl> <dbl> <int> <int> <dbl>
#>  1 -177.  53.7  2010     1  1.68
#>  2 -177.  53.7  2010     2  2.02
#>  3 -177.  53.7  2010     3  0.55
#>  4 -177.  53.7  2010     4  1.49
#>  5 -177.  53.7  2010     5  1.05
#>  6 -177.  53.7  2010     6  0.76
#>  7 -177.  53.7  2010     7  0.92
#>  8 -177.  53.7  2010     8  1.16
#>  9 -177.  53.7  2010     9  0.79
#> 10 -177.  53.7  2010    10  0.47
#> # ... with 1,070 more rows

get_ma(monthlytemps, type = "seasonal", season = "winter")
#> # A tibble: 80 x 4
#>      lon   lat  Year     z
#>    <dbl> <dbl> <dbl> <dbl>
#>  1 -177.  53.7  2015 0.526
#>  2 -177.  53.7  2016 0.410
#>  3 -177.  53.7  2017 0.522
#>  4 -177.  53.7  2018 0.460
#>  5 -177.  53.7  2019 0.547
#>  6 -177.  53.7  2020 0.477
#>  7 -177.  53.7  2021 0.598
#>  8 -177.  53.7  2022 0.702
#>  9 -177.  53.7  2023 0.665
#> 10 -177.  53.7  2024 0.713
#> # ... with 70 more rows

get_ma(monthlytemps, type = "annual", size = 20)
#> # A tibble: 71 x 4
#>      lon   lat  Year     z
#>    <dbl> <dbl> <int> <dbl>
#>  1 -177.  53.7  2019 0.483
#>  2 -177.  53.7  2020 0.486
#>  3 -177.  53.7  2021 0.522
#>  4 -177.  53.7  2022 0.577
#>  5 -177.  53.7  2023 0.590
#>  6 -177.  53.7  2024 0.613
#>  7 -177.  53.7  2025 0.660
#>  8 -177.  53.7  2026 0.685
#>  9 -177.  53.7  2027 0.713
#> 10 -177.  53.7  2028 0.728
#> # ... with 61 more rows

Generate a still image sequence

To keep the data size minimal, the monthlytemps data set only includes data from one map cell, which is sufficient for the moving average example. But for mapping we need more complete map data. The annualtemps data set below is not subsampled, though it is highly spatially aggregated as well as consisting of annual averages of monthly data. This made it small enough to not be a burden to include in the package, but complete enough to use for mapping examples.

data(annualtemps)
annualtemps
#> # A tibble: 55,080 x 4
#>      lon   lat  Year     z
#>    <dbl> <dbl> <int> <dbl>
#>  1 -177.  53.7  2010  1.09
#>  2 -177.  67.0  2010  3.21
#>  3 -177.  73.7  2010  2.76
#>  4 -170   53.7  2010  0.91
#>  5 -170   60.3  2010  2.47
#>  6 -170   67.0  2010  2.73
#>  7 -163.  20.3  2010  0.19
#>  8 -163.  53.7  2010  0.79
#>  9 -163.  60.3  2010  1.43
#> 10 -163.  67.0  2010  1.28
#> # ... with 55,070 more rows

The function below saves png files to disk as a still image sequence. Set your working directory or be explicit about your file destination. Make sure to obtain the range of data across not only space (a single map) but time (all maps). This is to ensure a constant color to value mapping with your chosen color palette across all maps in the set of png files. For simplicity, set the number of still frames to be equal to the number of unique time points in the data frame, in this example one frame for each year.

Since each point in this data frame represents a coarse grid cell, we use type = "maptiles" in the call to save_map. save_map is used for tiles, points (type = "points"), polygon outlines (type = "maplines"), polygon fill (type = = "polygons"), densities (type = "density"), and great circle arcs (type = "network"). One of these six types must be specified and the data must be appropriate for the type.

While we are working with data frames in this tiles example, the data typically would originate from a rasterized object, hence the nice even spacing of the coordinates. Last of all, but important, the data frame must contain a column of still frame sequence IDs or plot number IDs. This variable is simply the integers 1, 2, ..., n for n frames or individual plots and is used for subsetting data for each plot and for sequential numbering of output files. It can have any name. Here it is called frameID and is computed based on the unique years in the example data frame. It is passed to save_map via the id argument.

For these static examples, arguments pertaining specifically to image sequences can be ignored for now (n.period, n.frames, z.range) and since we are not saving files to disk we can also ignore arguments like dir and file, but we will return to all these immediately after.

Initial static 2D and 3D map examples

library(RColorBrewer)
pal <- rev(brewer.pal(11, "RdYlBu"))
temps <- mutate(annualtemps, frameID = Year - min(Year) + 1)
frame1 <- filter(temps, frameID == 1)  # subset to first frame
id <- "frameID"

save_map(frame1, z.name = "z", id = id, ortho = FALSE, col = pal, type = "maptiles", 
    save.plot = FALSE, return.plot = TRUE)
save_map(frame1, z.name = "z", id = id, col = pal, type = "maptiles", save.plot = FALSE, 
    return.plot = TRUE)

2D flat map and 3D globe

Flat map sequence

Dynamic data, static map

save_map is not typically used in the manner above. The above examples are just to show static plot output, but there is no convenience in using mapmate for that when we can just use ggplot2 or other plotting packages directly. Below are some examples of how save_map is usually called as a convenient still image sequence generator.

Notice the importance of passing the known full range of the data. The optional file argument is also used. For a flat map we still do not need to pay attention to the globe-specific arguments (lon, lat, rotation.axis) and those dealing with displaying data while globe rotation occurs across the image sequence can also be ignored (n.period).

rng <- range(annualtemps$z, na.rm = TRUE)
n <- length(unique(annualtemps$Year))
file <- "annual_2D"
temps <- split(temps, temps$frameID)
walk(temps, ~save_map(.x, z.name = "z", id = id, ortho = FALSE, col = pal, type = "maptiles", 
    file = file, z.range = rng))

Globe sequence

Dynamic data, static map

For plotting data on the globe, set ortho = TRUE (default). In this example the globe remains in a fixed position and orientation. The default when passing a single, scalar longitude value to the lon argument is to use it as a starting point for rotation. Therefore, we need to pass our own vector of longitudes defining a custom longitude sequence to override this behavior. Since the globe is to remain in a fixed view, set lon = rep(-70, n). Note that a scalar lat argument does not behave as a starting point for rotation in the perpendicular direction so no explicitly repeating vector is needed.

Also pay attention to n.period, which defines the number of frames or plot iterations required to complete one globe rotation (the degree increment). The default of n.period = 360 means that it will take 360 frames to complete one rotation, the perspective shifting by one degree per frame. More frames will allow for smoother rotation, but also means more data to process. More frames does not automatically mean slower animation, since when producing the video in video editing software there is always the ability to speed up or slow down the frame rate as well as to duplicate or drop frames. That is not relevant here, but worth mentioning since mapmate is primarily intended as a video pre-production package to be used in that broader context and workflow.

n.period can be any amount when providing a scalar lon value. The longitude sequence will propagate to a length of n.period. However, if providing a vector representing a custom path sequence of longitudes to lon, there is no assumption that the sequence is meant to be cyclical (do your own repetition if necessary). It could just be a single pass, custom path to change the globe view. In this case, n.period must be equal to the length of the lon and lat vectors or an error will be thrown. This is more a conceptual restriction than anything. With a custom path defined by lon and lat vectors, which may not be at all cyclical, it only makes sense to define the period as the length of the sequence itself.

file <- "annual_3D_fixed"
walk(temps, ~save_map(.x, z.name = "z", id = id, lon = rep(-70, n), lat = 50, 
    n.period = n, n.frames = n, col = pal, type = "maptiles", file = file, z.range = rng))

Static data, dynamic map

Now let the globe rotate, drawing static data on the surface. In this example the nation borders are drawn repeatedly while allowing the Earth to spin through the still image sequence from frame to frame. walk from the purrr package is used to easily duplicate the borders data frame in a list while adding an plot number ID column that increments over the list.

It’s quite redundant and this is worse if the data set is large, but this is how save_map accepts the data. If you want to avoid duplicating a static data frame in a list, there is always the option to use an explcit for-loop and change the ID value in the data frame at each iteration before passing it to save_map. Fortunately, it would make little difference if the data change from frame to frame, which is the main purpose of save_map. Data sets which do not vary from frame to frame generally serve the role of providing a background layer to other data being plotted, as with the nation borders here and as we will see later with a bathymetric surface.

In the case of repeating data passed to save_map, it only makes sense when plotting on a globe and when using a perspective that has some kind of periodicity to it. This limits how many times the same data must be redrawn. For example, if the Earth completes one rotation in 90 frames, it does not matter that maps of time series data may be plotted over a course of thousands of frames. When layering those maps on top of a constant background layer on the rotating globe, only 90 plots of the background data need to be generated. Those 90 saved images can be looped in a video editor until they reach the end of the time series maps image sequence.

The plot type has changed to maplines. z.range can be ignored because it only applies to map tiles (essentially square grid polygon fill coloring) which have values associated with surface pixels and for polygon fill (type = "polygons") and some types of density maps (type = "density"). It is irrelevant for polygon outlines as well as for type = "points" and for type = "network" plots of line segment data. A single color can be used for the lines so the previously used palette for tiles has been replaced.

data(borders)
borders <- map(1:n, ~mutate(borders, frameID = .x))
file <- "borders_3D_rotating"
walk(borders, ~save_map(.x, id = id, lon = -70, lat = 50, n.period = 30, n.frames = n, 
    col = "orange", type = "maplines", file = file))

Using one more example, return to the list of annual temperature anomalies data frames used previously. Those were used to show changing data on a fixed-perspective globe plot. Here we plot the first layer, repeatedly, as was just done with the borders data set, allowing the Earth to rotate. Remember to update the plot ID values. The key difference with this example of fixed data and a changing perspective is that providing z.range is crucial to maintaining constant color mapping.

temps1 <- map(1:n, ~mutate(temps[[1]], frameID = .x))
rng1 <- range(temps1[[1]]$z, na.rm = TRUE)
file <- "year1_3D_rotating"
walk(temps1, ~save_map(.x, z.name = "z", id = id, lon = -70, lat = 50, n.period = 30, 
    n.frames = n, col = pal, type = "maptiles", file = file, z.range = rng1))

Dynamic data, dynamic map

Finally we have the case of changing data drawn on a map with a changing perspective. This example plots the full time series list of projected annual temperature anomalies data frames with globe rotation.

file <- "annual_3D_rotating"
walk(temps, ~save_map(.x, z.name = "z", id = id, lon = -70, lat = 50, n.period = 30, 
    n.frames = n, col = pal, type = "maptiles", file = file, z.range = rng))

Of course, there is no reason to draw static data on a static flat map or static (non-rotating) globe because that would yield a still image sequence of a constant image. The utility of the function is in generating a series of plots where either the data changes, the perspective changes, or both.

Multiple image sequences

Putting it all together, we can generate three still image sequences to subsequently be layered on top of one another in an animation. Below is an example using a bathymetry surface map as the base layer, the temperature anomalies that will be the middle layer, and the nation borders which will stack on top. This also is why the saved png images have a default transparent background. There is an expectation of eventual layering. In the borader context of video editing it is ideal to have separate layers to work with to not constrain what can be displayed. This is also why save_map does not try to draw every possible map layer, as it would more often than not constrain downstream productivity.

data(bathymetry)
bath <- map(1:n, ~mutate(bathymetry, frameID = .x))
rng_bath <- range(bath[[1]]$z, na.rm = TRUE)
pal_bath <- c("black", "steelblue4")

walk(bath, ~save_map(.x, z.name = "z", id = id, n.frames = n, col = pal_bath, 
    type = "maptiles", file = "background", z.range = rng_bath))
walk(borders, ~save_map(.x, id = id, n.frames = n, col = "black", type = "maplines", 
    file = "foreground"))
walk(temps, ~save_map(.x, z.name = "z", id = id, n.frames = n, col = pal, type = "maptiles", 
    file = "timeseries", z.range = rng))

Parallel processing

For larger data sets and longer sequences, this is much faster the more it can be parallelized. mapmate is designed to add convenience for making relatively heavy duty animations. The emphasis is on images which will look sharp and lend themselves to smooth frame transitions. They may also do so while displaying a large amount of data. High-resolution images (even larger than 4K) can be used to allow zooming during animations without loss of quality. For all these reasons, processing large amounts of data and generating still image sequences can take a long time depending on how large, long, and complex the desired plots sequences are.

The toy examples given earlier are not ideal for serious production. The code is provided for simple reproduction and tutorial purposes. The save_map function can be used more powerfully on a Linux server with a large number of CPU cores, and enough RAM to meet the needs of your data. It uses mclapply from the base R parallel package. It will not work on Windows. The choice was made for convenience.

The example below is like the previous one, but using mclapply. It assumes you have a 32-CPU Linux server node. If you have multiple nodes, you could even go so far as to explore the Rmpi package to link across, say, 10 nodes to yield the power of 320 CPUs. But if you have access to that kind of computing power, you probably already know it and can explore it on your own. It is beyond the scope of this tutorial.

library(parallel)
mclapply(bath, save_map, z.name = "z", id = id, n.frames = n, col = pal_bath, 
    type = "maptiles", file = "background", z.range = rng_bath, mc.cores = 32)
mclapply(borders, save_map, id = id, n.frames = n, col = "orange", type = "maplines", 
    file = "foreground", mc.cores = 32)
mclapply(temps, save_map, z.name = "z", id = id, n.frames = n, col = pal, type = "maptiles", 
    file = "timeseries", z.range = rng, mc.cores = 32)

Iterator wrapper function

The save_seq function is a convenient wrapper around both save_map and save_ts (documented further below). It also provides the option to process in parallel (on Linux systems only, via parallel::mclapply) with the use_mclapply argument. This wrapper function allows for passing the entire data frame, handling the breaking into a list of data frame subsets by plot ID internally. This spares the user from having to explicitly call functions like map and walk directly and moves some of this data preparation work into save_seq.

The serial and parallel equivalents to the previous examples are shown below. It won’t always save much in the way of gross typing because the wrapper is used to pass along the same arguments that would otherwise be passed to save_map or save_ts directly, along with some of its own arguments. But it allows focusing on calling a single plotting iterator function instead of requiring iteration be done explicitly, the iterative aspect being abstracted away to some small degree. Since the function is primarily used for map sequences, style = "map" is the default. Later we will see style = "tsline" for time series line growth sequences.

# Serial
save_seq(bath, z.name = "z", id = id, n.frames = n, col = pal_bath, type = "maptiles", 
    file = "background", z.range = rng_bath)
save_seq(borders, id = id, n.frames = n, col = "black", type = "maplines", file = "foreground")
save_seq(temps, z.name = "z", id = id, n.frames = n, col = pal, type = "maptiles", 
    file = "timeseries", z.range = rng)

# Parallel, Linux, with 32 CPU cores available
save_seq(bath, use_mclapply = TRUE, mc.cores = 32, z.name = "z", id = id, n.frames = n, 
    col = pal_bath, type = "maptiles", file = "background", z.range = rng_bath)
save_seq(borders, use_mclapply = TRUE, mc.cores = 32, id = id, n.frames = n, 
    col = "black", type = "maplines", file = "foreground")
save_seq(temps, use_mclapply = TRUE, mc.cores = 32, z.name = "z", id = id, n.frames = n, 
    col = pal, type = "maptiles", file = "timeseries", z.range = rng)

Polygon fill and limitations

Previous examples made use of type = "maptiles" and type = "maplines". Here is an example using type = "polygons". While polygon borders were drawn earlier with maplines, they didn’t need to be polygon borders and could have been any data suitable to be passed on to ggplot2::geom_path. In the example below, a data frame originating from a SpatialPolygonsDataFrame is expected and the intent is to fill polygons with colors based on data values. A convenient example can be shown making use of the rworldmap package.

library(rworldmap)
library(rworldxtra)  # required for 'high' resolution map
library(maptools)  # required for fortify to work
# also recommend installing rgeos

spdf <- joinCountryData2Map(countryExData, mapResolution = "high")
spdf@data$id <- rownames(spdf@data)
bio <- ggplot2::fortify(spdf, region = "id") %>% left_join(subset(spdf@data, 
    select = c(id, BIODIVERSITY)), by = "id")
n <- 30
bio <- map(1:n, ~mutate(bio, frameID = .x) %>% rename(lon = long))
file <- "bioDivPolygons_3D_rotating"
x1 <- "BIODIVERSITY"
clrs <- c("royalblue", "purple", "orange", "yellow")

# Return a test map
save_map(bio[[1]], z.name = x1, id = id, lon = 0, lat = 20, n.period = n, n.frames = n, 
    col = clrs, type = "polygons", file = file, save.plot = FALSE, return.plot = TRUE)

# Walk over all maps
walk(bio, ~save_map(.x, z.name = x1, id = id, lon = 0, lat = 20, n.period = n, 
    n.frames = n, col = clrs, type = "polygons", file = file))

Note that type = "polygons" will work poorly for the orthographic projection. Nation polygons will be clipped at varying points as the globe rotates, leading to moving artifacts around the perimeter of the visible hemisphere. In the view above, this is clear over North America. There is no work around implemented for this within save_map when using type = "polygons" on a globe. As a result, it is recommended to only use world polygons with flat maps.

The external work around is to rasterize source polygon data, e.g. from a SpatialPolygonsDataFrame to a satisfactorily high resolution and plot the rasterized grid cells data frame using type = "maptiles". See the earlier examples using the annualtemps data set for an introduction to the use of tiles. Of course, this will lead to greater processing time, as geom_tile from ggplot2 is essentially plotting many tiny square polygons rather than the small number of larger polygons in the original data if using geom_polygon. The version using tiles is below. Tiles are one square degree, which will still appear a somewhat coarse resolution. Much higher resolutions are painfully slow to render with geom_tile.

library(raster)
proj4 <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +to wgs84=0,0,0"
r <- raster(extent(-180, 180, -90, 90), nrow = 180, ncol = 360, proj4)
bio2 <- rasterize(spdf, r, field = x1) %>% rasterToPoints %>% tbl_df() %>% setNames(c("lon", 
    "lat", x1))
bio2 <- map(1:n, ~mutate(bio2, frameID = .x))
file <- "bioDivMaptiles_3D_rotating"

# Return a test map
save_map(bio2[[1]], z.name = x1, id = id, lon = 0, lat = 20, n.period = n, n.frames = n, 
    col = clrs, type = "maptiles", file = file, save.plot = FALSE, return.plot = TRUE)

# Walk over all maps
walk(bio2, ~save_map(.x, z.name = x1, id = id, lon = 0, lat = 20, n.period = n, 
    n.frames = n, col = clrs, type = "maptiles", file = file))

Time series line plot

save_ts is similar to save_map in function arguments and behavior. but the current implementation takes a complete data frame, which is subset based on the iteration during the walk process. This was simpler than requiring a list of data frames to be passed, one data frame element for each iteration, as is done with save_map.

The distinction is save_map tends to be used for very large data sets containing many rows of spatially explicit data for each time step. On the other hand, save_ts is used for line charts of time series data. Currently, save_ts draws one line like in the below example, but even if generalized to grouping by factors and drawing multiple lines, the total amount of data will typically remain far less than compared to that passed to save_map. Therefore, it was convenient to leave save_ts as a function which works on a single data frame containing all data.

The example here leaves room for x and y axes but does not draw them. This option is available (and is the default) because this represents fixed content, unlike the time series line that grows through the image sequence. Since mapmate is designed under the assumption that still image sequences are pre-production for data visualization animations and are likely to be layered in external video editing software, there is no need to redraw content in an image sequence which may be represented with a single static image and layered in later.

See the function documentation for other related arguments and options.

means <- group_by(annualtemps, Year) %>% summarise(Mean = mean(z)) %>% ungroup() %>% 
    mutate(frameID = 1:n())
xlm <- range(means$Year)
ylm <- range(means$Mean)
lab <- paste0("ts_", means$frameID[1])
walk(means$frameID, ~save_ts(means, x = "Year", y = "Mean", id = id, cap = .x, 
    col = "blue", xlm = xlm, ylm = ylm))

# Using implicit iteration looks like this.  Note that the cap argument is
# not needed.
save_seq(means, style = "tsline", x = "Year", y = "Mean", id = id, col = "blue", 
    xlm = xlm, ylm = ylm)

Computing moving averages

Generate a still image sequence

Initial static 2D and 3D map examples

Flat map sequence

Dynamic data, static map

Globe sequence

Dynamic data, static map

Static data, dynamic map

Dynamic data, dynamic map

Multiple image sequences

Parallel processing

Iterator wrapper function

Polygon fill and limitations

Time series line plot

Contents