Estimates each species’ environmental optimum as the abundance-weighted mean of site-level environmental covariates, then computes the distance (default: Gower) between every site and every species optimum. Optionally returns an environmental kernel (similarity) by transforming distances (e.g., Gaussian).

Usage

compute_environment_kernel(
  site_env,
  abundance_wide = NULL,
  predictions = NULL,
  site_col = "site_id",
  env_cols = NULL,
  coord_cols = c("x", "y"),
  method = c("gower", "euclidean", "manhattan"),
  gower_stand = TRUE,
  kernel = c("distance", "similarity", "gaussian"),
  sigma_e = NULL,
  sigma_method = c("sd", "median", "iqr"),
  scale_01 = TRUE
)

Arguments

site_env

data.frame. One row per site; must include site_col and the environmental variables (numeric/factor/ordered). Coordinate columns can be present.

abundance_wide

matrix/data.frame or NULL. Sites × species numeric abundance/weight table (rows = sites, columns = species). If NULL, provide predictions instead.

predictions

data.frame or NULL. Long table with columns species, site_id, and pred to construct abundance_wide internally.

site_col

character. Site identifier column in site_env (and predictions). Default "site_id".

env_cols

NULL or character. Environmental columns in site_env. If NULL, auto-detected as all columns other than site_col and coord_cols.

coord_cols

character. Columns to exclude from automatic detection of environmental columns (e.g., coordinates). Default c("x","y").

method

character. Distance metric for site-optimum comparison: "gower" (default), "euclidean", or "manhattan".

gower_stand

logical. Standardise numeric variables inside the Gower distance (default TRUE).

kernel

character. One of "distance" (no transform), "similarity" (1 - scaled distance), or "gaussian" (Gaussian kernel). Default "distance".

sigma_e

numeric or NULL. Bandwidth for Gaussian kernel. If NULL, estimated from non-zero env_dist using sigma_method.

sigma_method

character. How to estimate sigma_e when missing: "sd" (default), "median", or "iqr".

scale_01

logical. Min-max scale distances to [0, 1] before applying the "similarity" transform; for non-Gower metrics only, this is applied automatically if needed.

Value

A list with:

  • env_opt: species × env matrix of abundance-weighted optima.

  • env_dist: sites × species distance matrix (site ↔ species-optimum).

  • K_env: sites × species kernel (if kernel != "distance").

  • sigma_e: bandwidth used (if Gaussian).

  • meta: list of settings and detected columns.

Details

Let \(E_s\) be the vector of environmental covariates at site \(s\), and \(w_{s,j}\) the abundance (weight) of species \(j\) at site \(s\). The environmental optimum for species \(j\) is

$$ \mu_j \;=\; \frac{\sum_s w_{s,j} \, E_s}{\sum_s w_{s,j}}. $$

The environmental distance between site \(s\) and species \(j\) is then \(d_{sj} = \mathrm{Gower}(E_s, \mu_j)\) (or another supported metric).
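
The optimum formula above can be sketched in base R as a matrix product. This is a minimal illustration under stated assumptions, not the package's internal code: the objects E, W, mu, and d are hypothetical names, the toy values are made up, and Euclidean distance stands in for Gower to avoid extra dependencies.

```r
# Hypothetical toy data: 3 sites, 2 environmental variables, 2 species
E <- matrix(c(10, 20, 30,      # temp at s1..s3
              100, 200, 300),  # precip at s1..s3
            nrow = 3,
            dimnames = list(c("s1", "s2", "s3"), c("temp", "precip")))
W <- matrix(c(1, 1, 0,         # sp1 weights at s1..s3
              0, 1, 3),        # sp2 weights at s1..s3
            nrow = 3,
            dimnames = list(rownames(E), c("sp1", "sp2")))

# mu_j = sum_s w_sj * E_s / sum_s w_sj, giving a species x env matrix
# (row i of the cross-product is divided by the total weight of species i)
mu <- crossprod(W, E) / colSums(W)

# Euclidean site-to-optimum distances, giving a sites x species matrix
d <- sapply(rownames(mu), function(j) {
  sqrt(rowSums(sweep(E, 2, mu[j, ])^2))
})
```

Here sp1 occurs with equal weight at s1 and s2, so its optimum is the plain average of those two sites; sp2 is pulled towards s3 because most of its weight sits there.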

The optional kernel converts distances to similarities, e.g., the Gaussian kernel \(K_{sj} = \exp\{-d_{sj}^2/(2 \sigma_e^2)\}\). The bandwidth \(\sigma_e\) controls how quickly suitability decays with environmental mismatch; when sigma_e is NULL, it is estimated from the non-zero distances \(d_{sj}\) using sigma_method.
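
The two kernel transforms can be sketched in a few lines of base R. This is an illustration only: the distance matrix is made up, the names nz, K_gauss, d01, and K_sim are hypothetical, and scaling over the whole matrix is an assumption about how the min-max step is applied.

```r
# Hypothetical sites x species distance matrix
d <- matrix(c(0, 1, 2,
              3, 1, 0),
            nrow = 3,
            dimnames = list(c("s1", "s2", "s3"), c("sp1", "sp2")))

# Bandwidth from the non-zero distances, mirroring sigma_method = "sd"
nz <- d[d > 0]
sigma_e <- sd(nz)   # assumed analogues for the other options: median(nz), IQR(nz)

# Gaussian kernel: K = exp(-d^2 / (2 * sigma_e^2))
K_gauss <- exp(-d^2 / (2 * sigma_e^2))

# "similarity": 1 minus the min-max scaled distance (assumed global scaling)
d01 <- (d - min(d)) / (max(d) - min(d))
K_sim <- 1 - d01
```

A perfect match (distance 0) maps to 1 under both transforms. The Gaussian kernel only approaches 0 for large distances, whereas the similarity transform is exactly 0 at the largest observed distance.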

Examples

# --- Simulate sites, environments, and species weights (no extra packages) ---
set.seed(123)

# Sites and species
n_sites <- 6
n_spp   <- 4
sites   <- paste0("s", seq_len(n_sites))
spp     <- paste0("sp", seq_len(n_spp))

# Site-level environment table (include optional coord columns)
site_env <- data.frame(
  site_id = sites,
  x       = rnorm(n_sites),
  y       = rnorm(n_sites),
  temp    = runif(n_sites, 5, 25),     # numeric env var
  precip  = runif(n_sites, 400, 900),  # numeric env var
  check.names = FALSE
)

# Sites × species abundance/weights (any non-negative numbers)
abundance_wide <- matrix(
  rexp(n_sites * n_spp, rate = 1),
  nrow = n_sites, ncol = n_spp,
  dimnames = list(sites, spp)
)

# Compute environmental optima and site–species distances
ek <- compute_environment_kernel(
  site_env       = site_env,
  abundance_wide = abundance_wide,   # avoids needing tidyr
  site_col       = "site_id",
  method         = "euclidean",      # uses stats::dist (no extra deps)
  kernel         = "gaussian",       # also returns K_env
  sigma_method   = "sd"
)

# Inspect results
str(ek$env_opt)       # species × env optima
#>  num [1:4, 1:2] 14.7 14.3 14.9 12.5 709.4 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:4] "sp1" "sp2" "sp3" "sp4"
#>   ..$ : chr [1:2] "temp" "precip"
dim(ek$env_dist)      # sites × species distance matrix
#> [1] 6 4
if (!is.null(ek$K_env)) range(ek$K_env, na.rm = TRUE)
#> [1] 0.004038454 0.997522436

# (Optional) If you prefer Gower distances, set method = "gower".
# This uses cluster::daisy internally and may require the 'cluster' package:
# ek_gower <- compute_environment_kernel(
#   site_env       = site_env,
#   abundance_wide = abundance_wide,
#   site_col       = "site_id",
#   method         = "gower",
#   kernel         = "similarity"
# )
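
When only a long predictions table is available, the function is documented to construct abundance_wide internally. The reshaping this implies can be sketched in base R; tapply is used here purely for illustration and is not necessarily what the package does, and the object name abundance_from_pred and the toy values are hypothetical.

```r
# Hypothetical long table with the documented columns species, site_id, pred
predictions <- data.frame(
  species = rep(c("sp1", "sp2"), each = 3),
  site_id = rep(c("s1", "s2", "s3"), times = 2),
  pred    = c(0.2, 0.5, 0.3, 0.9, 0.1, 0.0)
)

# Sites x species matrix: rows = sites, columns = species
# (site/species combinations missing from the long table would become NA)
abundance_from_pred <- with(
  predictions,
  tapply(pred, list(site_id, species), sum)
)
```

The resulting matrix has the same layout as abundance_wide in the example above, so it could be passed as abundance_wide; alternatively, the long table itself can be supplied through the predictions argument.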