Generate a slurm script for ALFRESCO data extraction from output maps.

alf_extract_slurm(out_dir = alfdef()$alf_slurm_dir,
  file = "alf_extract.R", ntasks, nodes, ntasks_per_node,
  exclusive = TRUE, domain = "akcan1km", rmpi = TRUE,
  modelIndex = NULL, project = NULL, years = NULL, reps = NULL,
  cru = NULL, repSample = NULL, locgroup = NULL,
  email = "mfleonawicz@alaska.edu", partition = "main",
  copy_rscript = TRUE, max_cores = 32)

Arguments

out_dir

character, the directory containing file. The slurm script is saved to, and executed from, this directory.

file

character, the name of the R script that Rscript calls when the generated slurm script is executed. A template of this script can also be placed in out_dir by this function; see copy_rscript below.

ntasks

numeric, SLURM number of tasks. See details.

nodes

numeric, SLURM number of compute nodes. See details.

ntasks_per_node

numeric, SLURM number of tasks per node. See details.

exclusive

logical, whether to request exclusive use of the allocated nodes when the generated slurm script is executed. Defaults to TRUE.

domain

character, the ALFRESCO run spatial domain, either "akcan1km" or "ak1km".

rmpi

logical, whether to use Rmpi. Defaults to TRUE.

modelIndex

integer, an index giving the position of the desired GCM/RCP output directory in the list of a project's ALFRESCO model run output directories.

project

character, a (new) project name for extracted data. It need not match any directory names pertaining to the raw ALFRESCO outputs.

years

numeric vector of years for data extraction.

reps

numeric vector of ALFRESCO simulation replicates for data extraction, e.g. 1:200.

cru

logical, whether data extraction is for historical years (ALFRESCO runs based on CRU data) or projected years (GCM data).

repSample

optional numeric vector of reps for subsampling, e.g., 1:30.

locgroup

optional character string naming a specific Location Group to process extraction for instead of all Location Groups.

email

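character, the email address passed to SLURM for job notifications. Defaults to the address of the package author/maintainer; users will typically want to supply their own.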

partition

character, the SLURM partition to submit the job to. Defaults to "main".

copy_rscript

logical, whether to also copy the template R script from the alfresco package into out_dir along with the generated slurm script. Defaults to TRUE.

max_cores

numeric, the maximum number of processors (cores) to use on a single Atlas node. Defaults to 32.

Details

This function generates a slurm script for extracting data from ALFRESCO GeoTIFF map outputs for subsequent analyses. The generated slurm script uses Rmpi for multi-node cluster processing and calls an R script with Rscript, e.g., one created by alf_extract_rscript. This is the typical usage, but scripts can also be generated with rmpi = FALSE, in which case nodes need only be 1.

Formal SLURM job arguments are passed via ntasks, nodes, ntasks_per_node, exclusive, email and partition. General script setup arguments are out_dir, file, copy_rscript and max_cores. All remaining arguments (domain, rmpi, modelIndex, project, years, reps, cru, repSample and locgroup) are passed to the Rscript call within the slurm script. Any of these that are NULL are left out of the script, and it is assumed the user will pass them explicitly as name-value pairs (in any order) at the command line when the generated slurm script is executed. Any that are not NULL are hardcoded into the string of arguments following Rscript.
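The formal SLURM job arguments would typically end up as #SBATCH directives at the top of the generated script. The sketch below uses a hypothetical helper, slurm_header (not part of the alfresco package), to show one plausible mapping onto standard sbatch options. The exact directives alf_extract_slurm writes may differ; in particular, the --mail-type=ALL line is an assumption.

```r
# Hypothetical sketch: turn the formal SLURM job arguments into #SBATCH lines.
# This is an illustration, not the header alf_extract_slurm actually writes.
slurm_header <- function(ntasks, nodes, ntasks_per_node, exclusive = TRUE,
                         email = NULL, partition = "main") {
  lines <- c(
    "#!/bin/bash",
    sprintf("#SBATCH --ntasks=%d", as.integer(ntasks)),
    sprintf("#SBATCH --nodes=%d", as.integer(nodes)),
    sprintf("#SBATCH --ntasks-per-node=%d", as.integer(ntasks_per_node))
  )
  # request the allocated nodes for this job only
  if (exclusive) lines <- c(lines, "#SBATCH --exclusive")
  # notification settings; the mail-type value is an assumption
  if (!is.null(email)) {
    lines <- c(lines, "#SBATCH --mail-type=ALL",
               paste0("#SBATCH --mail-user=", email))
  }
  c(lines, paste0("#SBATCH -p ", partition))
}
```

For example, writeLines(slurm_header(64, 2, 32, email = "user@example.edu")) prints a header for 64 tasks spread over 2 exclusive nodes on the "main" partition.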

Leaving Rscript arguments NULL keeps the generated ALFRESCO data extraction slurm scripts flexible. Each of these arguments that is not NULL reduces by one the number of general arguments left to supply at the command line when the script is executed. For example, if seven of the nine Rscript arguments are hardcoded into the slurm script by passing them explicitly to alf_extract_slurm, the generated script shows only $1 $2 after the fixed arguments rather than $1 $2 ... $9.
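The bookkeeping between hardcoded and command-line arguments can be sketched as follows. build_rscript_args is a hypothetical helper, and the name=value format for hardcoded values (rendered with deparse()) is an assumption; the sketch only illustrates how each NULL argument leaves one positional slot ($1, $2, ...) open.

```r
# Hypothetical sketch: assemble the argument string following Rscript.
# Non-NULL arguments are hardcoded as name=value pairs (format assumed);
# each NULL argument leaves one positional slot for the command line.
build_rscript_args <- function(file, args) {
  is_fixed <- !vapply(args, is.null, logical(1))
  fixed <- args[is_fixed]
  # deparse() renders 1:32 as "1:32" and "ak1km" as "\"ak1km\"";
  # spaces are stripped so each value stays a single shell word
  fixed_str <- vapply(names(fixed), function(nm) {
    val <- paste(deparse(fixed[[nm]]), collapse = "")
    paste0(nm, "=", gsub(" ", "", val))
  }, character(1), USE.NAMES = FALSE)
  n_open <- sum(!is_fixed)
  open_str <- if (n_open > 0) paste0("$", seq_len(n_open)) else character(0)
  paste(c("Rscript", file, fixed_str, open_str), collapse = " ")
}

# seven of the nine Rscript arguments hardcoded, two left open
args9 <- list(domain = "ak1km", rmpi = TRUE, modelIndex = 1,
              project = "JFSP", years = 1950:2013, reps = 1:32, cru = TRUE,
              repSample = NULL, locgroup = NULL)
cmd <- build_rscript_args("alf_extract.R", args9)
```

Here cmd ends in cru=TRUE $1 $2, matching the seven-of-nine case described above.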

If years is not supplied (NULL), the formal SLURM job arguments ntasks, nodes and ntasks_per_node must be provided, and it is assumed that a year range matching the number of tasks will be supplied when the slurm script is executed. These job arguments are always intended to be hardcoded into a generated script.

Alternatively, if years is provided explicitly and these job arguments are missing, they can be inferred internally from the number of years to be processed and max_cores, the maximum number of cores to use per node. Since these scripts are run on SNAP's Atlas cluster, max_cores defaults to 32. Lower it for memory-intensive extraction jobs that might otherwise reach node RAM limits.
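A minimal sketch of such an inference, assuming one task per year (as the previous paragraph implies) and an even spread of tasks across the fewest nodes that respect max_cores. infer_slurm_tasks is hypothetical, and the real function may use a different rule (for example, reserving an extra task for an Rmpi master process).

```r
# Hypothetical sketch: infer SLURM job arguments from years and max_cores.
# Assumes one task per year; not necessarily the rule alf_extract_slurm uses.
infer_slurm_tasks <- function(years, max_cores = 32) {
  ntasks <- length(years)                     # one task per year
  nodes <- ceiling(ntasks / max_cores)        # fewest nodes with <= max_cores each
  ntasks_per_node <- ceiling(ntasks / nodes)  # spread tasks evenly over nodes
  list(ntasks = ntasks, nodes = nodes, ntasks_per_node = ntasks_per_node)
}
```

Under this rule, years = 1950:2013 (64 years) gives 2 nodes of 32 tasks, while lowering max_cores to 20 for a memory-heavy job gives 4 nodes of 16 tasks.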

Note that this function is intended to be run on the Atlas cluster. If you generate the script on Windows instead, you may need to convert its Windows (CRLF) line endings with a command-line utility such as dos2unix before running it.
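If dos2unix is not available, the line endings can also be converted from R. The sketch below uses a hypothetical helper, to_unix_eol, built only on base R connections: readLines() accepts LF, CRLF or CR as line endings, and writing through a binary-mode connection prevents Windows from translating "\n" back into "\r\n".

```r
# Hypothetical helper: rewrite a text file in place with LF-only line endings.
to_unix_eol <- function(path) {
  # readLines() drops the trailing \r of CRLF line endings
  x <- readLines(path, warn = FALSE)
  # binary mode: "\n" is written as-is, with no CRLF translation on Windows
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeLines(x, con, sep = "\n")
  invisible(path)
}
```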

Examples

# NOT RUN {
alf_extract_slurm(
  domain = "ak1km", project = "JFSP", years = 1950:2013,
  reps = 1:32, cru = TRUE
)
# }