Import and harmonise biodiversity-occurrence data

get_occurrence_data() reads occurrence records from a local CSV file path, an in-memory data.frame, or a GBIF download (ZIP) and returns a tidy long-format data frame (one row = one record). Input may be long already or wide (one row = one site, one column = one species); wide input is reshaped to long.

Usage

get_occurrence_data(
  data = NULL,
  source_type = c("local_csv", "data_frame", "gbif"),
  gbif_zip_url = NULL,
  download_dir = tempdir(),
  sep = ",",
  site_id_col = NULL,
  x_col = NULL,
  y_col = NULL,
  sp_name_col = NULL,
  pa_col = NULL,
  abund_col = NULL,
  species_cols = NULL
)

Arguments

data: File path (when source_type = "local_csv"), an in-memory data.frame ("data_frame"), or NULL ("gbif").
source_type: "local_csv", "data_frame", or "gbif".
gbif_zip_url: URL to a GBIF download ZIP (required when source_type = "gbif").
download_dir: Directory where the downloaded ZIP and its extracted file are saved (default: tempdir()).
sep: Field separator for CSVs (default ",").
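site_id_col, x_col, y_col, sp_name_col, pa_col, abund_col: Optional names of the site ID, x and y coordinate, species name, presence-absence, and abundance columns. Needed only when your data do not use the auto-detected default names.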
species_cols: Optional numeric or character vector specifying the species columns in a wide input (e.g. 4:11 or c("Sp1","Sp2")). Overrides the default "sp_*" detection.

Value

A data.frame:

Long input: Columns site_id, x, y, sp_name, plus pa or abund.
Wide input: The same columns, obtained by stacking the specified or auto-detected species columns.

Details

Column names are auto-detected from common patterns ("site_id", "x", "y", "sp_name", "pa" or "abund"). Supply *_col arguments only when your data use different names.
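
The effect of the *_col arguments can be pictured as a renaming step. The following is a minimal base-R sketch under stated assumptions, not the function's actual implementation: standardise_names() and col_map are hypothetical names introduced here, and the toy data are made up.

```r
# Toy long-format input whose column names differ from the defaults.
raw <- data.frame(
  plot  = c("p1", "p2"),
  lon   = c(18.4, 19.1),
  lat   = c(-33.9, -34.2),
  taxon = c("Protea repens", "Erica mammosa"),
  count = c(3, 7)
)

# Hypothetical mapping: standard name = name used in the input.
col_map <- c(site_id = "plot", x = "lon", y = "lat",
             sp_name = "taxon", abund = "count")

# Hypothetical helper (not part of the package): rename the mapped columns
# to the standard names and fail early if any mapped column is absent.
standardise_names <- function(df, col_map) {
  missing_cols <- setdiff(col_map, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  names(df)[match(col_map, names(df))] <- names(col_map)
  df
}

std <- standardise_names(raw, col_map)
names(std)
```

In the real function the same information would be passed as site_id_col = "plot", x_col = "lon", and so on.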

For wide data, the function by default looks for columns whose names start with "sp_". Set species_cols to a numeric range (e.g. 4:11) or a character vector of column names when the species columns do not follow the "sp_*" convention.
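
The wide-to-long step described above could look roughly like this in base R. This is an illustrative sketch only: stack_species() is a hypothetical helper, not the package's internal code, and whether the real function keeps the "sp_" prefix in sp_name is not stated in this page.

```r
# Toy wide input: one row per site, one column per species.
wide <- data.frame(
  site_id = c("A", "B"),
  x       = c(18.4, 19.1),
  y       = c(-33.9, -34.2),
  sp_one  = c(1, 0),
  sp_two  = c(0, 1)
)

# Hypothetical helper: pick the species columns (by "sp_" prefix, by
# position, or by name) and stack them into sp_name / value pairs.
stack_species <- function(df, species_cols = NULL, value_name = "pa") {
  if (is.null(species_cols)) {
    species_cols <- grep("^sp_", names(df), value = TRUE)
  } else if (is.numeric(species_cols)) {
    species_cols <- names(df)[species_cols]
  }
  id_cols <- setdiff(names(df), species_cols)
  pieces <- lapply(species_cols, function(sp) {
    out <- df[, id_cols, drop = FALSE]
    out$sp_name <- sp
    out[[value_name]] <- df[[sp]]
    out
  })
  long <- do.call(rbind, pieces)
  rownames(long) <- NULL
  long
}

long_auto     <- stack_species(wide)                      # "sp_*" detection
long_by_index <- stack_species(wide, species_cols = 4:5)  # explicit positions
```

Both calls select the same two columns here, so they yield the same four-row long table.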

Workflow

Read the data according to source_type.
Detect or insert the compulsory columns (site, coordinates, species, value).
Validate coordinates (-180 ≤ lon ≤ 180 for x, -90 ≤ lat ≤ 90 for y).
Return either
- a long table (site_id, x, y, sp_name, pa|abund) when species name + value columns are present; or
- a long table reshaped from wide species columns.
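
The coordinate check in the workflow above can be sketched as a simple range test. This page does not say whether out-of-range records are dropped, flagged, or cause an error, so the example only computes a logical flag; coords_in_range() is a hypothetical helper, and treating x as longitude and y as latitude is an assumption.

```r
# Toy records: the second has an invalid longitude, the third an invalid latitude.
records <- data.frame(
  site_id = c("s1", "s2", "s3"),
  x       = c(18.4, 190, -70.2),
  y       = c(-33.9, 10, -95)
)

# Hypothetical helper: TRUE where both coordinates are present and in range.
coords_in_range <- function(x, y) {
  !is.na(x) & !is.na(y) &
    x >= -180 & x <= 180 &
    y >= -90 & y <= 90
}

ok    <- coords_in_range(records$x, records$y)
valid <- records[ok, , drop = FALSE]
```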

Examples
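
A minimal sketch of two calls, using only the arguments listed under Usage. The toy records are made up for illustration, and the calls are guarded so the snippet also runs when the package providing get_occurrence_data() is not loaded.

```r
# Toy long-format records using the default column names.
occ <- data.frame(
  site_id = c("s1", "s1", "s2"),
  x       = c(18.42, 18.42, 19.10),
  y       = c(-33.92, -33.92, -34.20),
  sp_name = c("Protea repens", "Erica mammosa", "Protea repens"),
  pa      = c(1, 1, 1)
)

# The same records written to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(occ, csv_path, row.names = FALSE)

if (exists("get_occurrence_data", mode = "function")) {
  # In-memory data.frame
  long_df <- get_occurrence_data(data = occ, source_type = "data_frame")

  # Local CSV file with an explicit separator
  long_csv <- get_occurrence_data(
    data        = csv_path,
    source_type = "local_csv",
    sep         = ","
  )
}
```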