alf_extract_slurm.Rd
Generate slurm script for ALFRESCO data extractions from output maps.
alf_extract_slurm(out_dir = alfdef()$alf_slurm_dir, file = "alf_extract.R",
  ntasks, nodes, ntasks_per_node, exclusive = TRUE, domain = "akcan1km",
  rmpi = TRUE, modelIndex = NULL, project = NULL, years = NULL, reps = NULL,
  cru = NULL, repSample = NULL, locgroup = NULL,
  email = "mfleonawicz@alaska.edu", partition = "main", copy_rscript = TRUE,
  max_cores = 32)
out_dir: the directory containing file. Defaults to alfdef()$alf_slurm_dir.

file: the R script to be called by Rscript. Defaults to "alf_extract.R".

ntasks: numeric, SLURM number of tasks. See details.

nodes: numeric, SLURM number of compute nodes. See details.

ntasks_per_node: numeric, SLURM number of tasks per node. See details.

exclusive: logical, put nodes into exclusive use for the job when the generated slurm script is executed. Defaults to TRUE.

domain: character, the ALFRESCO run spatial domain, either "akcan1km" (default) or "ak1km".

rmpi: logical, use Rmpi for multi-node cluster processing. Defaults to TRUE.

modelIndex: integer, iterator referring to the position in the list of a project's ALFRESCO model run GCM/RCP output directories.

project: character, a (new) project name for extracted data. It need not match any directory names pertaining to the raw ALFRESCO outputs.

years: numeric vector of years for data extraction.

reps: numeric vector of ALFRESCO simulation replicates for data extraction.

cru: logical, whether data extraction is for historical years (ALFRESCO runs based on CRU data) or projected years (GCM data).

repSample: optional numeric vector of reps for subsampling.

locgroup: optional character string naming a specific Location Group to process extraction for instead of all Location Groups.

email: defaults to the author/maintainer/user.

partition: defaults to "main".

copy_rscript: logical, also copy the template R script into out_dir. Defaults to TRUE.

max_cores: maximum number of processors to use on a single Atlas node, defaults to 32.
This function generates a slurm script used for extracting data from ALFRESCO geotiff map outputs for subsequent analyses. The generated slurm script leverages Rmpi for multi-node cluster processing. It calls an R script with Rscript, e.g., one created by alf_extract_rscript. The scripts are typically used in this fashion, but they can also be generated with rmpi = FALSE, in which case nodes need only be 1.
Formal SLURM job arguments are passed by ntasks, nodes, ntasks_per_node, exclusive, email and partition. General script setup arguments include out_dir, file, copy_rscript and max_cores. All other arguments refer to those passed to the Rscript call within the slurm script. Any of these that are NULL are ignored; it is assumed the user will pass them explicitly as name-value pairs (in any order) at the command line when the generated slurm script is executed. Any of these arguments that are not NULL are hardcoded into the string of arguments listed after Rscript. This provides flexibility when generating ALFRESCO data extraction slurm scripts. Note that each non-NULL argument among these reduces the number of general arguments available at the command line when the script is executed. For example, if seven of the nine arguments available to Rscript are hardcoded into the slurm script by passing them to alf_extract_slurm explicitly, then the generated script will show only an additional $1 $2 after the fixed arguments rather than $1 $2 ... $9.
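The hardcoding behavior described above can be sketched as a small standalone helper. This is illustrative only: build_rscript_args is not part of the package, and the "name=value" command-line format is an assumption.

```r
# Illustrative sketch (not part of the package): given a named list of
# Rscript arguments, hardcode the non-NULL values into the argument string
# and leave positional placeholders ($1, $2, ...) for the NULL ones.
build_rscript_args <- function(args) {
  is_null <- vapply(args, is.null, logical(1))
  fixed <- args[!is_null]
  # Assumed name=value format for hardcoded arguments.
  hard <- paste0(names(fixed), "=",
                 vapply(fixed, function(x) paste(x, collapse = ":"), character(1)))
  open <- if (any(is_null)) paste0("$", seq_len(sum(is_null))) else character(0)
  paste(c(hard, open), collapse = " ")
}

args <- list(domain = "ak1km", project = "JFSP", modelIndex = NULL,
             years = NULL, reps = NULL, cru = TRUE)
build_rscript_args(args)
# "domain=ak1km project=JFSP cru=TRUE $1 $2 $3"
```

With three of the six arguments NULL, the generated argument string ends in $1 $2 $3, mirroring how the real slurm script exposes the remaining arguments positionally.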
If years is missing, then the formal SLURM job arguments ntasks, nodes and ntasks_per_node must be provided, and it is assumed that a year range matching the number of tasks will be provided when the slurm script is executed. These arguments are always intended to be hardcoded into a generated script. Alternatively, if years is provided explicitly and these job arguments are missing, they can be inferred internally from the number of years to be processed and the max_cores to be used per node. Since these scripts are run on SNAP's Atlas cluster, the default max_cores is 32. This can be lowered for intensive extraction jobs that may otherwise reach node RAM limits.
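The inference described above presumably amounts to simple arithmetic on the year count. A minimal sketch, assuming one task per year packed up to max_cores tasks per node (the package's internal rule may differ):

```r
# Minimal sketch of inferring SLURM job size from a year range.
# Assumes one task per year, packed up to max_cores tasks per node;
# infer_job_size is illustrative, not a package function.
infer_job_size <- function(years, max_cores = 32) {
  ntasks <- length(years)
  list(ntasks = ntasks,
       nodes = ceiling(ntasks / max_cores),
       ntasks_per_node = min(ntasks, max_cores))
}

infer_job_size(1950:2013)  # 64 years on 32-core nodes -> 2 nodes
```

For the 1950:2013 range used in the example below, this yields 64 tasks spread across two 32-core Atlas nodes.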
Note that this function is intended to be run on the Atlas cluster. If you generate a bash script like this on Windows, you may have to run a command-line utility like dos2unix on the file.
# NOT RUN {
alf_extract_slurm(
  domain = "ak1km",
  project = "JFSP",
  years = 1950:2013,
  reps = 1:32,
  cru = TRUE
)
# }