
Environmental optima and site-species environmental distance (with optional kernel)
Source: R/compute_environment_kernel.R, compute_environment_kernel.Rd
Estimates each species’ environmental optimum as the abundance-weighted mean of site-level environmental covariates, then computes the distance (default: Gower) between every site and every species optimum. Optionally returns an environmental kernel (similarity) by transforming distances (e.g., Gaussian).
Usage
compute_environment_kernel(
  site_env,
  abundance_wide = NULL,
  predictions = NULL,
  site_col = "site_id",
  env_cols = NULL,
  coord_cols = c("x", "y"),
  method = c("gower", "euclidean", "manhattan"),
  gower_stand = TRUE,
  kernel = c("distance", "similarity", "gaussian"),
  sigma_e = NULL,
  sigma_method = c("sd", "median", "iqr"),
  scale_01 = TRUE
)
Arguments
- site_env
data.frame. One row per site; must include site_col and the environmental variables (numeric/factor/ordered). Coordinate columns can be present.
- abundance_wide
matrix/data.frame or NULL. Sites × species numeric abundance/weight table (rows = sites, columns = species). If NULL, provide predictions instead.
- predictions
data.frame or NULL. Long table with columns species, site_id, and pred to construct abundance_wide internally.
- site_col
character. Site identifier column in site_env (and predictions). Default "site_id".
- env_cols
NULL or character. Environmental columns in site_env. If NULL, auto-detect as all non-site_col, non-coordinate columns.
- coord_cols
character. Columns to exclude from env detection (e.g., coords). Default c("x","y").
- method
character. Distance metric for site-optimum comparison: "gower" (default), "euclidean", or "manhattan".
- gower_stand
logical. Standardise numeric vars inside Gower (default TRUE).
- kernel
character. One of "distance" (no transform), "similarity" (1 - scaled distance), or "gaussian" (Gaussian kernel). Default "distance".
- sigma_e
numeric or NULL. Bandwidth for Gaussian kernel. If NULL, estimated from non-zero env_dist using sigma_method.
- sigma_method
character. How to estimate sigma_e when missing: "sd" (default), "median", or "iqr".
- scale_01
logical. Min-max scale distances to [0, 1] before "similarity"; for non-Gower metrics only, this is applied automatically if needed.
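The predictions argument is described as a long table that is reshaped into abundance_wide. A minimal base-R sketch of that kind of reshaping follows; the toy values are hypothetical and the use of xtabs() is an assumption for illustration, not necessarily how the function does it internally.

```r
# Hypothetical long-format table using the documented column names
predictions <- data.frame(
  species = c("sp1", "sp1", "sp2", "sp2", "sp2"),
  site_id = c("s1", "s2", "s1", "s2", "s3"),
  pred    = c(0.2, 0.8, 0.5, 0.1, 0.4)
)

# Sum pred within each site x species cell; pairs that never occur become 0
wide <- xtabs(pred ~ site_id + species, data = predictions)

# Strip the xtabs/table class and call attribute to leave a plain numeric matrix
abundance_wide <- unclass(wide)
attr(abundance_wide, "call") <- NULL

abundance_wide  # rows = sites (s1..s3), columns = species (sp1, sp2)
```

Site-species pairs missing from the long table end up with weight 0 here, so they contribute nothing to a weighted mean.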
Value
A list with:
- env_opt: species × env matrix of abundance-weighted optima.
- env_dist: sites × species distance matrix (site ↔ species-optimum).
- K_env: sites × species kernel (if kernel != "distance").
- sigma_e: bandwidth used (if Gaussian).
- meta: list of settings and detected columns.
Details
Let \(E_s\) be the vector of environmental covariates at site \(s\), and \(w_{s,j}\) the abundance (weight) of species \(j\) at site \(s\). The environmental optimum for species \(j\) is $$ \mu_j \;=\; \frac{\sum_s w_{s,j} \, E_s}{\sum_s w_{s,j}}. $$ The environmental distance between site \(s\) and species \(j\) is then \(d_{s j} = \mathrm{Gower}(E_s, \mu_j)\) (or another supported metric).
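The weighted-mean formula can be checked by hand with a few lines of base R. The sketch below assumes numeric covariates only and uses raw, unscaled Euclidean distance; the toy matrices E and W are hypothetical, and the function itself may standardise or handle factors differently.

```r
# Toy inputs (hypothetical values): 3 sites, 2 numeric covariates, 2 species
E <- matrix(c(10, 20, 30,       # temp
              100, 200, 300),   # precip
            nrow = 3,
            dimnames = list(c("s1", "s2", "s3"), c("temp", "precip")))
W <- matrix(c(1, 1, 0,          # sp1 weights per site
              0, 1, 3),         # sp2 weights per site
            nrow = 3,
            dimnames = list(rownames(E), c("sp1", "sp2")))

# mu_j = sum_s w_sj * E_s / sum_s w_sj, computed for all species at once.
# crossprod(W, E) is t(W) %*% E (species x env); each row is divided by
# that species' total weight.
mu <- crossprod(W, E) / colSums(W)

# Euclidean distance between every site and every species optimum
d <- sapply(rownames(mu), function(j) {
  sqrt(rowSums(sweep(E, 2, mu[j, ])^2))
})

mu  # species x env optima
d   # sites x species distances
```

A species whose weights sum to zero would produce NaN optima in this sketch, so such columns would need to be dropped or handled separately.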
The optional kernel converts distances to similarities, e.g., the Gaussian kernel \(K_{sj} = \exp\{-d_{sj}^2/(2 \sigma_e^2)\}\). The bandwidth \(\sigma_e\) controls how quickly suitability decays with environmental mismatch; when sigma_e is NULL, it is estimated from the non-zero \(d_{sj}\) using sigma_method (default "sd").
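To make the two kernel options concrete, here is a small base-R sketch applied to a hypothetical distance matrix. The helper estimate_sigma() is an illustrative name, not part of the package, and the mapping of "sd", "median", and "iqr" onto stats::sd, stats::median, and stats::IQR is an assumption about what those options mean.

```r
# Hypothetical sites x species distance matrix
d <- matrix(c(0.0, 0.5, 1.0,
              2.0, 1.5, 0.0),
            nrow = 3,
            dimnames = list(c("s1", "s2", "s3"), c("sp1", "sp2")))

# Illustrative helper (not a package function): bandwidth from the
# non-zero distances, using one of three summary statistics
estimate_sigma <- function(d, method = c("sd", "median", "iqr")) {
  method <- match.arg(method)
  nz <- d[is.finite(d) & d > 0]
  switch(method,
    sd     = stats::sd(nz),
    median = stats::median(nz),
    iqr    = stats::IQR(nz)
  )
}

sigma_e <- estimate_sigma(d, "sd")

# Gaussian kernel: K = exp(-d^2 / (2 * sigma_e^2)); equals 1 where d == 0
K_gauss <- exp(-d^2 / (2 * sigma_e^2))

# "similarity": 1 minus the min-max scaled distance
d01   <- (d - min(d)) / (max(d) - min(d))
K_sim <- 1 - d01
```

A smaller sigma_e makes K_gauss fall off faster with distance, while K_sim declines linearly and reaches 0 at the largest observed distance.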
Examples
# --- Simulate sites, environments, and species weights (no extra packages) ---
set.seed(123)
# Sites and species
n_sites <- 6
n_spp <- 4
sites <- paste0("s", seq_len(n_sites))
spp <- paste0("sp", seq_len(n_spp))
# Site-level environment table (include optional coord columns)
site_env <- data.frame(
  site_id = sites,
  x = rnorm(n_sites),
  y = rnorm(n_sites),
  temp = runif(n_sites, 5, 25),      # numeric env var
  precip = runif(n_sites, 400, 900), # numeric env var
  check.names = FALSE
)
# Sites × species abundance/weights (any non-negative numbers)
abundance_wide <- matrix(
  rexp(n_sites * n_spp, rate = 1),
  nrow = n_sites, ncol = n_spp,
  dimnames = list(sites, spp)
)
# Compute environmental optima and site–species distances
ek <- compute_environment_kernel(
  site_env = site_env,
  abundance_wide = abundance_wide, # avoids needing tidyr
  site_col = "site_id",
  method = "euclidean",            # uses stats::dist (no extra deps)
  kernel = "gaussian",             # also returns K_env
  sigma_method = "sd"
)
# Inspect results
str(ek$env_opt) # species × env optima
#> num [1:4, 1:2] 14.7 14.3 14.9 12.5 709.4 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:4] "sp1" "sp2" "sp3" "sp4"
#> ..$ : chr [1:2] "temp" "precip"
dim(ek$env_dist) # sites × species distance matrix
#> [1] 6 4
if (!is.null(ek$K_env)) range(ek$K_env, na.rm = TRUE)
#> [1] 0.004038454 0.997522436
# (Optional) If you prefer Gower distances, set method = "gower".
# This uses cluster::daisy internally and may require the 'cluster' package:
# ek_gower <- compute_environment_kernel(
#   site_env = site_env,
#   abundance_wide = abundance_wide,
#   site_col = "site_id",
#   method = "gower",
#   kernel = "similarity"
# )