To begin using snapclim effectively, take a look at the climate data collections available in your current version of the package.
library(snapclim)
climate_collections()
#> # A tibble: 1 x 11
#> id description regions points start end daily monthly seasonal
#> <chr> <chr> <lgl> <lgl> <dbl> <dbl> <lgl> <lgl> <lgl>
#> 1 ar5s~ AR5/CMIP5 ~ TRUE TRUE 1901 2100 FALSE TRUE TRUE
#> # ... with 2 more variables: annual <lgl>, decadal <lgl>
This prints a table of available data sets, one per row. The columns provide useful metadata regarding each data set.
For most data sets, a request returns data for a specific location. Given that there are 86 defined climate regions and 3,867 point locations across Alaska and western Canada in the ar5stats collection, it is helpful to consult a table of available locations.
climate_locations()
#> # A tibble: 3,953 x 2
#> Location Group
#> <chr> <chr>
#> 1 AK-CAN AK-CAN
#> 2 Arctic LCC AK LCC regions
#> 3 North Pacific LCC AK LCC regions
#> 4 Northwestern Interior Forest South LCC AK LCC regions
#> 5 Northwestern Interior Forest North LCC AK LCC regions
#> 6 Western Alaska LCC AK LCC regions
#> 7 Boreal Alaska L1 Ecoregions
#> 8 Maritime Alaska L1 Ecoregions
#> 9 Polar Alaska L1 Ecoregions
#> 10 Alaska Range Transition Alaska L2 Ecoregions
#> # ... with 3,943 more rows
The table is truncated here, but contains nearly 4,000 options under the Location column. The corresponding Group column gives the set that each location is grouped under. Every location belongs to a group. The table above contains all regions and point locations. climate_locations can also list only one or the other by setting the type argument.
climate_locations(type = "region")
#> # A tibble: 86 x 2
#> Location Group
#> <chr> <chr>
#> 1 AK-CAN AK-CAN
#> 2 Arctic LCC AK LCC regions
#> 3 North Pacific LCC AK LCC regions
#> 4 Northwestern Interior Forest South LCC AK LCC regions
#> 5 Northwestern Interior Forest North LCC AK LCC regions
#> 6 Western Alaska LCC AK LCC regions
#> 7 Boreal Alaska L1 Ecoregions
#> 8 Maritime Alaska L1 Ecoregions
#> 9 Polar Alaska L1 Ecoregions
#> 10 Alaska Range Transition Alaska L2 Ecoregions
#> # ... with 76 more rows
climate_locations(type = "point")
#> # A tibble: 3,867 x 2
#> Location Group
#> <chr> <chr>
#> 1 Adak Station Alaska
#> 2 Afognak Alaska
#> 3 Akhiok Alaska
#> 4 Akiachak Alaska
#> 5 Akiak Alaska
#> 6 Akutan Alaska
#> 7 Alakanuk Alaska
#> 8 Alatna Alaska
#> 9 Aleknagik Alaska
#> 10 Aleut Village Alaska
#> # ... with 3,857 more rows
Some locations share the same name. For example, there is Galena, Alaska and Galena, British Columbia.
library(dplyr)
climate_locations() %>% filter(Location == "Galena")
#> # A tibble: 2 x 2
#> Location Group
#> <chr> <chr>
#> 1 Galena Alaska
#> 2 Galena British Columbia
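Shared names such as Galena can also be found programmatically rather than by inspection. The following is a minimal sketch in base R. It uses a small hand-made data frame, locs, as a stand-in for the table returned by climate_locations; the object names are illustrative and only a few rows from the listings above are included.

```r
# Hand-made stand-in for climate_locations() output (illustrative subset only).
locs <- data.frame(
  Location = c("Adak Station", "Afognak", "Galena", "Galena"),
  Group = c("Alaska", "Alaska", "Alaska", "British Columbia"),
  stringsAsFactors = FALSE
)

# Names that occur more than once need a group to be identified uniquely.
dup_names <- unique(locs$Location[duplicated(locs$Location)])

# Show every row that shares one of those names.
ambiguous <- locs[locs$Location %in% dup_names, ]
print(ambiguous)
```

Applied to the full table, the same two steps would list every location name that requires a group to be identified uniquely.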
It is good practice to avoid ambiguity when requesting data. Ambiguous requests are nevertheless permitted, because they keep data requests simpler once you are familiar with the locations you are interested in.
snapclim provides access to a large amount of SNAP climate data, far more than would be stored locally within an R package. The data collections are stored on Amazon Web Services (AWS). snapclim interfaces with AWS to bring the specific data you need into your R session, as if it were a native package data set.
SNAP climate data sets are accessed with climdata. If at any time you get stuck using climdata, see the function documentation. It provides detailed descriptions and usage for the available function arguments. The first argument, id, specifies a unique data collection. See the id column in the climate_collections output above. Next, a location is specified. A simple call to climdata for SNAP 2-km downscaled AR5/CMIP5 climate data summary statistics for Anchorage, Alaska looks like the following.
climdata("ar5stats", "Anchorage")
#> # A tibble: 101,400 x 8
#> RCP Model Var Group Location Year Month Mean
#> <fct> <fct> <fct> <chr> <chr> <int> <fct> <dbl>
#> 1 Historical CRU 4.0 pr Alaska Anchorage 1901 Jan 7
#> 2 Historical CRU 4.0 pr Alaska Anchorage 1901 Feb 1
#> 3 Historical CRU 4.0 pr Alaska Anchorage 1901 Mar 14
#> 4 Historical CRU 4.0 pr Alaska Anchorage 1901 Apr 16
#> 5 Historical CRU 4.0 pr Alaska Anchorage 1901 May 13
#> 6 Historical CRU 4.0 pr Alaska Anchorage 1901 Jun 24
#> 7 Historical CRU 4.0 pr Alaska Anchorage 1901 Jul 40
#> 8 Historical CRU 4.0 pr Alaska Anchorage 1901 Aug 70
#> 9 Historical CRU 4.0 pr Alaska Anchorage 1901 Sep 58
#> 10 Historical CRU 4.0 pr Alaska Anchorage 1901 Oct 69
#> # ... with 101,390 more rows
In subsequent examples, arguments are named for additional clarity.
The data include SNAP’s downscaled historical, observation-based Climatic Research Unit (CRU) 4.0 data and both downscaled historical and projected climate model outputs for all five of the General Circulation Models (GCMs) utilized by SNAP. All three CMIP5 emissions scenarios, or Representative Concentration Pathways (RCPs), are included. The data cover the entire available time period at a monthly time step.
By default, all available climate variables are returned: precipitation and mean, minimum and maximum temperature. These refer to monthly precipitation totals and monthly means of mean, minimum and maximum daily temperatures. This can be reduced to a specific variable in the initial call to climdata with variable = "pr", for example, or the table can be filtered subsequently.
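Filtering after the fact can be sketched in base R as follows. The toy table below only mimics the Var, Month and Mean columns shown above, and its values are made up for illustration. Because Var is a factor, droplevels is used so that the unused variable levels do not linger in the filtered result.

```r
# Toy table mimicking a few columns of climdata output; values are made up.
x <- data.frame(
  Var = factor(c("pr", "pr", "tas", "tas")),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  Mean = c(7, 1, -9.5, -7.2)
)

# Keep precipitation rows only, then drop the factor levels no longer in use.
pr <- droplevels(x[x$Var == "pr", ])
levels(pr$Var)
```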
What if the location is not unique, like Galena? climdata will issue a warning and let you know it is assuming the first group found in the list of available locations.
climdata(id = "ar5stats", area = "Galena")
#> Warning in .check_area(area, set): `area` not unique and `set` not
#> provided. Assuming 'Alaska'. Please provide `set`.
#> # A tibble: 101,400 x 8
#> RCP Model Var Group Location Year Month Mean
#> <fct> <fct> <fct> <chr> <chr> <int> <fct> <dbl>
#> 1 Historical CRU 4.0 pr Alaska Galena 1901 Jan 18
#> 2 Historical CRU 4.0 pr Alaska Galena 1901 Feb 18
#> 3 Historical CRU 4.0 pr Alaska Galena 1901 Mar 18
#> 4 Historical CRU 4.0 pr Alaska Galena 1901 Apr 16
#> 5 Historical CRU 4.0 pr Alaska Galena 1901 May 15
#> 6 Historical CRU 4.0 pr Alaska Galena 1901 Jun 33
#> 7 Historical CRU 4.0 pr Alaska Galena 1901 Jul 48
#> 8 Historical CRU 4.0 pr Alaska Galena 1901 Aug 61
#> 9 Historical CRU 4.0 pr Alaska Galena 1901 Sep 42
#> 10 Historical CRU 4.0 pr Alaska Galena 1901 Oct 30
#> # ... with 101,390 more rows
The following example avoids the ambiguity, hence no warning.
climdata(id = "ar5stats", area = "Galena", set = "British Columbia")
#> # A tibble: 101,400 x 8
#> RCP Model Var Group Location Year Month Mean
#> <fct> <fct> <fct> <chr> <chr> <int> <fct> <dbl>
#> 1 Historical CRU 4.0 pr British Columbia Galena 1901 Jan 87
#> 2 Historical CRU 4.0 pr British Columbia Galena 1901 Feb 62
#> 3 Historical CRU 4.0 pr British Columbia Galena 1901 Mar 26
#> 4 Historical CRU 4.0 pr British Columbia Galena 1901 Apr 34
#> 5 Historical CRU 4.0 pr British Columbia Galena 1901 May 27
#> 6 Historical CRU 4.0 pr British Columbia Galena 1901 Jun 87
#> 7 Historical CRU 4.0 pr British Columbia Galena 1901 Jul 36
#> 8 Historical CRU 4.0 pr British Columbia Galena 1901 Aug 6
#> 9 Historical CRU 4.0 pr British Columbia Galena 1901 Sep 69
#> 10 Historical CRU 4.0 pr British Columbia Galena 1901 Oct 21
#> # ... with 101,390 more rows
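The behavior described above, warning and falling back to the first matching group unless set is given, can be sketched with a small helper. This is an illustration only and not the package's internal code: resolve_group and the hand-made locs table are hypothetical names introduced here.

```r
# Illustrative helper, not part of snapclim: choose the group for a location name.
resolve_group <- function(locs, area, set = NULL) {
  groups <- locs$Group[locs$Location == area]
  if (length(groups) == 0) stop("Unknown location: ", area)
  if (is.null(set)) {
    # No group given: warn if the name is ambiguous, then use the first match.
    if (length(groups) > 1) {
      warning("`area` not unique and `set` not provided. Assuming '", groups[1], "'.")
    }
    return(groups[1])
  }
  if (!set %in% groups) stop("'", area, "' is not in group '", set, "'")
  set
}

# Hand-made stand-in for the table of available locations.
locs <- data.frame(
  Location = c("Galena", "Galena", "Anchorage"),
  Group = c("Alaska", "British Columbia", "Alaska"),
  stringsAsFactors = FALSE
)

explicit <- resolve_group(locs, "Galena", set = "British Columbia")  # no warning
assumed <- suppressWarnings(resolve_group(locs, "Galena"))           # first group found
```

Passing the group explicitly, as in the first call, is the pattern that keeps scripts free of warnings and of silent assumptions about which Galena is meant.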
The climate variable values given in the tables obtained so far have all pertained to specific points in space. For regional climate data, a broad set of statistics is available that summarizes the distribution of climate values over a spatial domain defined by a polygon. For example, the table for the Arctic Tundra contains the mean, standard deviation, minimum, maximum and a set of distribution quantiles. Since the columns are truncated when printed in the display below, the first seven columns of ID variables are dropped in this example using select in order to show more of the additional statistics.
x <- climdata(id = "ar5stats", area = "Arctic Tundra")
select(x, -c(1:7))
#> # A tibble: 101,400 x 13
#> Mean SD Min Max Pct_025 Pct_05 Pct_10 Pct_25 Pct_50 Pct_75
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 19.3 6.7 5.8 56.2 9.1 10 11.1 14.8 18.3 23
#> 2 15.4 5.6 4.1 46.1 6.9 8.1 9 11.1 14 18.9
#> 3 14.2 5.6 2.9 38.1 6.9 8.1 9 9.9 13 17.1
#> 4 15.5 6.5 4 49.7 7.9 8.1 9 10.8 13.9 19.1
#> 5 15.1 7.8 0.7 53.1 7 7.8 8 8.9 12.2 19.9
#> 6 29.9 16.6 10.5 126. 12.6 13.5 14.2 17.3 23.9 39.8
#> 7 49.7 22 12.5 144. 17.9 21.3 25.1 31.3 46 64.5
#> 8 63.7 25.6 22.4 167. 28.8 30.6 32.8 41.5 61.5 80.1
#> 9 43.4 24.3 14.6 144 17.9 19 20.5 23.1 36.2 57
#> 10 33.3 11.2 17.8 80.4 20.1 20.9 21.8 23.9 30.7 39.8
#> # ... with 101,390 more rows, and 3 more variables: Pct_90 <dbl>,
#> # Pct_95 <dbl>, Pct_975 <dbl>
Note that these statistics summarize values across space. They do not also summarize values over months or years, or across climate models and scenarios. Distributional information is available for each point in time and under each combination of other available factors.
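To make the idea of a spatial summary concrete, the sketch below computes the same set of statistics from a vector of simulated grid-cell values for a single time step. It is an illustration under assumptions: the values are simulated rather than SNAP data, spatial_summary is a hypothetical helper, and R's default quantile method is used, which may differ from the method behind the published statistics.

```r
# Summarize one time step of grid-cell values falling inside a region.
spatial_summary <- function(values) {
  probs <- c(0.025, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.975)
  q <- quantile(values, probs = probs, names = FALSE)
  names(q) <- c("Pct_025", "Pct_05", "Pct_10", "Pct_25", "Pct_50",
                "Pct_75", "Pct_90", "Pct_95", "Pct_975")
  c(Mean = mean(values), SD = sd(values), Min = min(values), Max = max(values), q)
}

set.seed(42)
cells <- rgamma(500, shape = 4, scale = 5)  # simulated precipitation, one month
stats <- round(spatial_summary(cells), 1)
stats
```

One such row of statistics exists for every month, year, model and scenario, which is why the regional tables have as many rows as the point tables.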
More highly aggregated data sets are available as well. The previous data sets were returned using the default argument time_scale = "monthly". If you simply change this to seasonal or annual, climdata will return the respective data set. The first three columns are dropped in the display below:
x <- climdata(id = "ar5stats", area = "Arctic Tundra", time_scale = "seasonal")
select(x, -c(1:3))
#> # A tibble: 33,800 x 17
#> Group Region Year Season Mean SD Min Max Pct_025 Pct_05 Pct_10
#> <chr> <chr> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Alas~ Arcti~ 1901 Winter 18.9 7.8 1.5 65.3 8.1 9.1 10.1
#> 2 Alas~ Arcti~ 1901 Spring 15 6.7 0.7 51.2 7.3 7.9 8.2
#> 3 Alas~ Arcti~ 1901 Summer 47.4 25.5 10.1 157. 13.6 14.8 18.2
#> 4 Alas~ Arcti~ 1901 Autumn 33.4 18.3 6.3 144. 12.2 13.9 17.6
#> 5 Alas~ Arcti~ 1902 Winter 19.1 8.1 1.6 66.9 8 9 10
#> 6 Alas~ Arcti~ 1902 Spring 15 6.7 0.5 53 7.5 7.9 8.3
#> 7 Alas~ Arcti~ 1902 Summer 47.3 25.4 10.5 168 13.7 14.7 18
#> 8 Alas~ Arcti~ 1902 Autumn 32.9 17.9 6.7 143 12 13.8 17.6
#> 9 Alas~ Arcti~ 1903 Winter 19.1 8.4 1.6 61.2 7.9 8.9 10
#> 10 Alas~ Arcti~ 1903 Spring 14.6 9.9 0.8 74.7 4.5 5.7 7
#> # ... with 33,790 more rows, and 6 more variables: Pct_25 <dbl>,
#> # Pct_50 <dbl>, Pct_75 <dbl>, Pct_90 <dbl>, Pct_95 <dbl>, Pct_975 <dbl>
It is important to note that seasonal and annual aggregate statistics are not simple means of monthly statistics. Each of these collections is independently derived from climate variable spatial probability distributions at their respective temporal resolutions.
This means that, for example, monthly temperature quantiles for the Arctic Tundra are calculated from monthly spatial temperature distributions and winter temperature quantiles are calculated from the applicable 3-month period spatial temperature distributions. While the mean is invariant to this difference, other statistics are not. The 95th percentile winter temperature across space during the three month period does not result from taking the average of three monthly 95th percentile values.
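The difference can be checked numerically with simulated data. The sketch below assumes that the seasonal value in each grid cell is the equally weighted average of its three monthly values, and it treats December through February as winter; both are assumptions made for illustration, and the temperatures are simulated.

```r
set.seed(1)
n_cells <- 1000

# Simulated temperatures: one row per grid cell, one column per winter month.
monthly <- cbind(
  Dec = rnorm(n_cells, mean = -20, sd = 5),
  Jan = rnorm(n_cells, mean = -18, sd = 5),
  Feb = rnorm(n_cells, mean = -15, sd = 5)
)

# Seasonal value per cell (equal month weights assumed), then spatial statistics.
seasonal <- rowMeans(monthly)

mean_of_monthly_means <- mean(colMeans(monthly))
seasonal_mean <- mean(seasonal)

mean_of_monthly_q95 <- mean(apply(monthly, 2, quantile, probs = 0.95))
seasonal_q95 <- unname(quantile(seasonal, probs = 0.95))

all.equal(seasonal_mean, mean_of_monthly_means)  # the two means agree
seasonal_q95 - mean_of_monthly_q95               # the two percentiles do not
```

Averaging is linear, so the order of averaging over space and over months does not matter for the mean. A quantile is not linear, so averaging three monthly quantiles generally gives a different number than the quantile of the seasonal field.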
A final note on seasonal and annual statistics is that, like monthly statistics, these remain period totals for precipitation and period averages for temperature variables.
In contrast to monthly, seasonal and annual resolution statistics, all three of which are computed across space at their respective temporal resolutions, decadal statistics are in fact simple decadal averages of monthly, seasonal and annual data. For example, the decadal mean of the 95th percentile monthly temperature across a region is just that: the mean of the ten annual 95th percentile values for that month in a decade.
For this reason, decadal data is not requested with climdata by specifying it with time_scale, which always pertains to annual and intra-annual (monthly or seasonal) time steps. Instead, use decavg = TRUE. This is FALSE by default, so it did not previously need to be specified. When requesting decadal averages, there is still the choice of whether those averages should be of monthly, seasonal or annual resolution statistics.
x <- climdata(id = "ar5stats", area = "Arctic Tundra", time_scale = "seasonal",
decavg = TRUE)
select(x, -c(1:3))
#> # A tibble: 3,776 x 17
#> Group Region Decade Season Mean SD Min Max Pct_025 Pct_05 Pct_10
#> <chr> <chr> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Alas~ Arcti~ 1900 Winter 18.1 8.8 1.6 71.1 6.6 7.6 8.8
#> 2 Alas~ Arcti~ 1900 Spring 14.2 8.7 0.5 70.6 3.7 4.7 6.5
#> 3 Alas~ Arcti~ 1900 Summer 46.3 27.6 7.8 188. 12.4 13.9 16.6
#> 4 Alas~ Arcti~ 1900 Autumn 30.3 18.2 4.1 144. 8.6 10.1 13.3
#> 5 Alas~ Arcti~ 1910 Winter 20.1 13.8 1 107. 5.2 6.5 8.5
#> 6 Alas~ Arcti~ 1910 Spring 16.5 12.3 0.6 98.4 4.1 5 6.7
#> 7 Alas~ Arcti~ 1910 Summer 47 29 6.9 198. 11.8 13.2 16.3
#> 8 Alas~ Arcti~ 1910 Autumn 32.5 20.9 3.6 162. 8.7 11 14.3
#> 9 Alas~ Arcti~ 1920 Winter 22.7 15.1 0.9 102 5.1 6.2 7.6
#> 10 Alas~ Arcti~ 1920 Spring 16.1 12.2 0.4 84.7 2.3 2.9 4
#> # ... with 3,766 more rows, and 6 more variables: Pct_25 <dbl>,
#> # Pct_50 <dbl>, Pct_75 <dbl>, Pct_90 <dbl>, Pct_95 <dbl>, Pct_975 <dbl>
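Since decadal statistics are plain averages of the annual values within each decade, the computation can be sketched in a few lines of base R. The annual values below are made up, and labeling a decade by its first year (for example, 1990 for 1990 through 1999) is an assumption based on the Decade column shown above.

```r
# Made-up annual 95th percentile values for one month over two decades.
annual <- data.frame(Year = 1990:2009, Pct_95 = as.numeric(1:20))

# Label each year with the first year of its decade.
annual$Decade <- (annual$Year %/% 10) * 10

# The decadal statistic is the mean of the ten annual values in each decade.
decadal <- aggregate(Pct_95 ~ Decade, data = annual, FUN = mean)
decadal
```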
It is possible to obtain climate data sets that include multiple locations using climdata, but this is only available for smaller data sets where it would not lead to a cumbersome data download. Currently, the only data set for which multiple locations can be returned at once is the decadal averages data set in the ar5stats collection. By specifying area = "points" rather than a specific point location, a table is returned containing data for all 3,867 point locations.
There are two requirements that help to ensure this does not lead to an excessive download size or waiting time. As mentioned, this is only available for the highly aggregated decadal data. Without setting decavg = TRUE, attempting to specify area = "points" will throw an error. The second requirement is that only a single climate variable is returned, so you should also specify the variable argument. If you do not, mean temperature (tas) is assumed. You can always call climdata multiple times for additional variables if desired.
x <- climdata(id = "ar5stats", area = "points", time_scale = "annual", decavg = TRUE,
variable = "tas")
To show the result more effectively, filter the table to a specific combination of other factors.
filter(x, RCP == "6.0" & Model == "GFDL-CM3" & Decade == 2050)
#> # A tibble: 3,867 x 8
#> RCP Model Var Group Location Decade Season Mean
#> <fct> <fct> <fct> <chr> <chr> <int> <fct> <dbl>
#> 1 6.0 GFDL-CM3 tas Alaska Adak Station 2050 Annual 7.2
#> 2 6.0 GFDL-CM3 tas Alaska Afognak 2050 Annual 8.6
#> 3 6.0 GFDL-CM3 tas Alaska Akhiok 2050 Annual 8.4
#> 4 6.0 GFDL-CM3 tas Alaska Akiachak 2050 Annual 3.4
#> 5 6.0 GFDL-CM3 tas Alaska Akiak 2050 Annual 3.3
#> 6 6.0 GFDL-CM3 tas Alaska Akutan 2050 Annual 7.6
#> 7 6.0 GFDL-CM3 tas Alaska Alakanuk 2050 Annual 3.8
#> 8 6.0 GFDL-CM3 tas Alaska Alatna 2050 Annual -1.7
#> 9 6.0 GFDL-CM3 tas Alaska Aleknagik 2050 Annual 5.4
#> 10 6.0 GFDL-CM3 tas Alaska Aleut Village 2050 Annual 8.6
#> # ... with 3,857 more rows
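The two requirements for area = "points" can be summarized as a small pre-flight check. The function below is a hypothetical illustration written for this article, not code from snapclim; it only mirrors the rules described above so they can be tested before a request is made.

```r
# Illustrative pre-flight check, not snapclim code.
check_points_request <- function(area, decavg = FALSE, variable = NULL) {
  if (identical(area, "points")) {
    # Requirement 1: all-points requests are limited to decadal averages.
    if (!isTRUE(decavg)) {
      stop("area = \"points\" requires decavg = TRUE")
    }
    # Requirement 2: a single variable, with mean temperature as the default.
    if (is.null(variable)) variable <- "tas"
    if (length(variable) != 1) stop("only one variable can be requested at a time")
  }
  variable
}

v <- check_points_request("points", decavg = TRUE)
err <- tryCatch(check_points_request("points"), error = function(e) conditionMessage(e))
```

Here v holds the variable that would be requested, and err holds the error message produced when decavg is left at its default.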
The snapclim package is essentially a data package. It provides a simplified and convenient interface in R enabling easy access to a large amount of SNAP climate data spread over multiple collections, stemming from different sources and existing for different purposes. It does not provide functionality for performing statistical analysis and graphing, which is provided by R in general. Useful stock functions pertaining specifically to analyzing SNAP data, including climate data accessed with snapclim, are available in the snapstat package (under development).
For more information on the climate probability distributions from which regional climate statistics are calculated, see the snapdist package (under development) or SNAP’s Climate Analytics Shiny app for working examples.