Environmental optima and site-species environmental distance (with optional kernel)

Estimates each species’ environmental optimum as the abundance-weighted mean of the site-level environmental covariates, then computes the distance (default: Gower) between every site and every species optimum. Optionally transforms these distances into an environmental kernel (a similarity), for example a Gaussian kernel.

Usage

compute_environment_kernel(
  site_env,
  abundance_wide = NULL,
  predictions = NULL,
  site_col = "site_id",
  env_cols = NULL,
  coord_cols = c("x", "y"),
  method = c("gower", "euclidean", "manhattan"),
  gower_stand = TRUE,
  kernel = c("distance", "similarity", "gaussian"),
  sigma_e = NULL,
  sigma_method = c("sd", "median", "iqr"),
  scale_01 = TRUE
)

Arguments

site_env: data.frame. One row per site; must include site_col and the environmental variables (numeric, factor, or ordered). Coordinate columns may also be present; list them in coord_cols so they are not treated as environmental variables.
abundance_wide: matrix/data.frame or NULL. Sites × species numeric abundance/weight table (rows = sites, columns = species). If NULL, provide predictions instead.
predictions: data.frame or NULL. Long table with columns species, the site identifier (site_col, default "site_id"), and pred; used to build abundance_wide internally when abundance_wide is NULL.
site_col: character. Site identifier column in site_env (and predictions). Default "site_id".
env_cols: NULL or character. Environmental columns in site_env. If NULL, all columns other than site_col and coord_cols are used.
coord_cols: character. Columns (e.g., spatial coordinates) excluded from environmental-variable auto-detection. Default c("x", "y").
method: character. Distance metric for site-optimum comparison: "gower" (default), "euclidean", or "manhattan".
gower_stand: logical. Standardise numeric variables within the Gower computation (default TRUE).
kernel: character. One of "distance" (no transform), "similarity" (1 - scaled distance), or "gaussian" (Gaussian kernel). Default "distance".
sigma_e: numeric or NULL. Bandwidth for the Gaussian kernel. If NULL, it is estimated from the non-zero values of env_dist using sigma_method.
sigma_method: character. How to estimate sigma_e when missing: "sd" (default), "median", or "iqr".
scale_01: logical. Min-max scale distances to [0, 1] before computing "similarity" (default TRUE). Gower distances already lie in [0, 1]; for the non-Gower metrics this scaling is applied automatically if needed.
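To make the predictions and env_cols behaviour concrete, here is a minimal base-R sketch of the two input-preparation steps described above: pivoting a long predictions table into a sites × species matrix, and auto-detecting environmental columns. The helper names predictions_to_wide() and detect_env_cols() are hypothetical and not part of this package, and treating missing site/species combinations as zero weight is an assumption; compute_environment_kernel() may handle these cases differently.

```r
# Hypothetical helper (not part of the package API): pivot a long table with
# columns species, <site_col>, pred into a sites x species weight matrix.
predictions_to_wide <- function(predictions, site_col = "site_id") {
  # tapply() sums duplicate site/species rows; absent combinations become NA
  wide <- tapply(predictions$pred,
                 list(predictions[[site_col]], predictions$species),
                 FUN = sum)
  wide[is.na(wide)] <- 0  # assumption: a missing prediction means zero weight
  wide
}

# Hypothetical helper mirroring the documented env_cols = NULL rule:
# keep every column that is neither site_col nor one of coord_cols.
detect_env_cols <- function(site_env, site_col = "site_id",
                            coord_cols = c("x", "y")) {
  setdiff(names(site_env), c(site_col, coord_cols))
}

# Usage on a toy long table (2 species x 3 sites)
preds <- data.frame(
  species = rep(c("sp1", "sp2"), each = 3),
  site_id = rep(c("s1", "s2", "s3"), times = 2),
  pred    = c(0.2, 0.5, 0.3, 0.9, 0.1, 0.4)
)
w <- predictions_to_wide(preds)
w["s2", "sp1"]  # 0.5
env_df <- data.frame(site_id = "s1", x = 0, y = 0, temp = 10, precip = 500)
detect_env_cols(env_df)  # "temp" "precip"
```

Rows and columns of the pivoted matrix follow the sorted factor levels of the site and species identifiers, so align it with site_env by site name rather than by position.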

Value

A list with:

env_opt: species × env matrix of abundance-weighted optima.
env_dist: sites × species distance matrix (site ↔ species-optimum).
K_env: sites × species kernel (if kernel != "distance").
sigma_e: bandwidth used (if Gaussian).
meta: list of settings and detected columns.

Details

Let $E_s$ be the vector of environmental covariates at site $s$, and let $w_{sj}$ be the abundance (weight) of species $j$ at site $s$. The environmental optimum of species $j$ is the weighted mean

$$ \mu_j = \frac{\sum_s w_{sj} \, E_s}{\sum_s w_{sj}}. $$

The environmental distance between site $s$ and species $j$ is then $d_{sj} = \mathrm{Gower}(E_s, \mu_j)$, or the corresponding Euclidean or Manhattan distance when another method is chosen.
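The optimum formula and the site-optimum distances can be sketched in base R for numeric covariates. This is an illustration under stated assumptions, not the function's implementation: env_optima() and site_opt_dist() are hypothetical helpers, factor and ordered covariates are not handled, and the "gower" branch uses the plain range-normalised Gower distance for numeric variables rather than cluster::daisy.

```r
# Hypothetical helper: species optima mu_j = sum_s w_sj E_s / sum_s w_sj.
# E: sites x env numeric matrix; W: sites x species non-negative weights,
# rows in the same site order as E. Factor covariates are not handled.
env_optima <- function(E, W) {
  stopifnot(nrow(E) == nrow(W))
  # crossprod(W, E) = t(W) %*% E gives species x env weighted sums;
  # dividing by colSums(W) recycles one weight total per species (row).
  crossprod(W, E) / colSums(W)
}

# Hypothetical helper: sites x species distance between E_s and mu_j.
site_opt_dist <- function(E, opt, method = c("euclidean", "manhattan", "gower")) {
  method <- match.arg(method)
  # Per-variable ranges over sites for the numeric Gower distance
  # (a constant covariate would give a zero range; not guarded here)
  rng <- apply(E, 2, function(v) diff(range(v)))
  d <- matrix(NA_real_, nrow(E), nrow(opt),
              dimnames = list(rownames(E), rownames(opt)))
  for (j in seq_len(nrow(opt))) {
    dev <- sweep(E, 2, opt[j, ])  # E_s - mu_j for every site s
    d[, j] <- switch(method,
      euclidean = sqrt(rowSums(dev^2)),
      manhattan = rowSums(abs(dev)),
      gower     = rowMeans(sweep(abs(dev), 2, rng, "/")))
  }
  d
}

E <- cbind(temp = c(10, 20, 30), precip = c(400, 600, 800))
W <- cbind(spA = c(1, 1, 0), spB = c(0, 1, 3))
opt <- env_optima(E, W)  # spA: temp 15, precip 500
d_gow <- site_opt_dist(E, opt, "gower")
d_gow[1, "spA"]          # (5/20 + 100/400) / 2 = 0.25
```

Because each optimum is a convex combination of site values (non-negative weights), it lies within the observed range of every covariate, so Gower ranges computed over the sites alone are the same as ranges computed over sites and optima together.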

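The optional kernel converts distances into similarities. With kernel = "similarity", $K_{sj} = 1 - \tilde d_{sj}$, where $\tilde d_{sj}$ is the distance scaled to $[0, 1]$ (see scale_01); with kernel = "gaussian", $K_{sj} = \exp\{-d_{sj}^2/(2 \sigma_e^2)\}$. The bandwidth $\sigma_e$ controls how quickly suitability decays with environmental mismatch. If sigma_e is NULL, it is estimated from the non-zero values of $d_{sj}$ using sigma_method (standard deviation, median, or interquartile range).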
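A minimal sketch of the three kernel options and the bandwidth rule described above. estimate_sigma() and env_kernel() are hypothetical helpers; the "sd", "median", and "iqr" estimators are taken at face value (no rescaling such as IQR/1.349), and "similarity" always min-max scales, which mirrors scale_01 = TRUE. The package's exact choices may differ.

```r
# Hypothetical helper: bandwidth estimated from the non-zero distances only.
estimate_sigma <- function(d, sigma_method = c("sd", "median", "iqr")) {
  sigma_method <- match.arg(sigma_method)
  dz <- d[is.finite(d) & d > 0]
  switch(sigma_method,
    sd     = stats::sd(dz),
    median = stats::median(dz),
    iqr    = stats::IQR(dz))
}

# Hypothetical helper: turn a sites x species distance matrix into a kernel.
env_kernel <- function(d, kernel = c("distance", "similarity", "gaussian"),
                       sigma_e = NULL, sigma_method = "sd") {
  kernel <- match.arg(kernel)
  switch(kernel,
    distance   = d,  # no transform
    similarity = {
      # Min-max scale to [0, 1], then similarity = 1 - scaled distance
      # (a constant d would divide by zero; not guarded here)
      rng <- range(d, na.rm = TRUE)
      1 - (d - rng[1]) / (rng[2] - rng[1])
    },
    gaussian   = {
      if (is.null(sigma_e)) sigma_e <- estimate_sigma(d, sigma_method)
      exp(-d^2 / (2 * sigma_e^2))
    })
}

d <- matrix(c(0, 1, 2, 3), nrow = 2)  # toy sites x species distances
K_gauss <- env_kernel(d, "gaussian", sigma_e = 1)
K_sim   <- env_kernel(d, "similarity")
estimate_sigma(d, "median")           # median of 1, 2, 3 = 2
```

Passing an explicit sigma_e skips the estimation step, which keeps kernels comparable across runs whose distance distributions differ.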

Examples

# --- Simulate sites, environments, and species weights (no extra packages) ---
set.seed(123)

# Sites and species
n_sites <- 6
n_spp   <- 4
sites   <- paste0("s", seq_len(n_sites))
spp     <- paste0("sp", seq_len(n_spp))

# Site-level environment table (include optional coord columns)
site_env <- data.frame(
  site_id = sites,
  x       = rnorm(n_sites),
  y       = rnorm(n_sites),
  temp    = runif(n_sites, 5, 25),     # numeric env var
  precip  = runif(n_sites, 400, 900),  # numeric env var
  check.names = FALSE
)

# Sites × species abundance/weights (any non-negative numbers)
abundance_wide <- matrix(
  rexp(n_sites * n_spp, rate = 1),
  nrow = n_sites, ncol = n_spp,
  dimnames = list(sites, spp)
)

# Compute environmental optima and site–species distances
ek <- compute_environment_kernel(
  site_env       = site_env,
  abundance_wide = abundance_wide,   # avoids needing tidyr
  site_col       = "site_id",
  method         = "euclidean",      # uses stats::dist (no extra deps)
  kernel         = "gaussian",       # also returns K_env
  sigma_method   = "sd"
)

# Inspect results
str(ek$env_opt)       # species × env optima
#>  num [1:4, 1:2] 14.7 14.3 14.9 12.5 709.4 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:4] "sp1" "sp2" "sp3" "sp4"
#>   ..$ : chr [1:2] "temp" "precip"
dim(ek$env_dist)      # sites × species distance matrix
#> [1] 6 4
if (!is.null(ek$K_env)) range(ek$K_env, na.rm = TRUE)
#> [1] 0.004038454 0.997522436

# (Optional) If you prefer Gower distances, set method = "gower".
# This uses cluster::daisy internally and may require the 'cluster' package:
# ek_gower <- compute_environment_kernel(
#   site_env       = site_env,
#   abundance_wide = abundance_wide,
#   site_col       = "site_id",
#   method         = "gower",
#   kernel         = "similarity"
# )