
Import and harmonise biodiversity-occurrence data
Source: R/get_occurrence_data.R
get_occurrence_data.Rd
get_occurrence_data() reads occurrence records from a local CSV file path, an in-memory data.frame, or a GBIF download (ZIP) and returns a tidy data frame in either long (one row = one record) or wide (one row = one site, one column = one species) form.
Arguments
- data
File path (when source_type = "local_csv"), an in-memory data.frame ("data_frame"), or NULL ("gbif").
- source_type
"local_csv", "data_frame", or "gbif".
- gbif_zip_url
URL to a GBIF download ZIP (required when source_type = "gbif").
- download_dir
Folder in which to save the ZIP/extracted file (default: tempdir()).
- sep
Field separator for CSVs (default ",").
- site_id_col, x_col, y_col, sp_name_col, pa_col, abund_col
Optional custom column names.
- species_cols
Optional numeric or character vector specifying the species columns in a wide input (e.g. 4:11 or c("Sp1","Sp2")). Overrides the default "sp_*" detection.
Value
A data.frame:
- Long format
Columns site_id, x, y, sp_name, plus pa or abund.
- Wide → long
Same columns after stacking the specified or auto-detected species columns.
Details
Column names are auto-detected from common patterns ("site_id", "x", "y", "sp_name", "pa" or "abund"). Supply *_col arguments only when your data use different names.
For wide data the helper normally looks for columns that start with "sp_". Set species_cols to a numeric range (e.g. 4:11) or a character vector of column names when the species columns do not follow the "sp_*" convention.
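The species-column selection described above can be sketched in base R. This is only an illustration of the documented behaviour; find_species_cols() and the wide_demo data frame are hypothetical names introduced here, and the package's internal logic may differ.

```r
# Hypothetical helper (not part of the package): choose species columns in a
# wide table from an explicit selection, else fall back to the "sp_" prefix.
find_species_cols <- function(df, species_cols = NULL) {
  if (is.null(species_cols)) {
    # Default: every column whose name starts with "sp_"
    return(grep("^sp_", names(df), value = TRUE))
  }
  if (is.numeric(species_cols)) {
    # Numeric positions, e.g. 4:5
    return(names(df)[species_cols])
  }
  # Character names: fail early if any are absent
  missing_cols <- setdiff(species_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Unknown species columns: ", paste(missing_cols, collapse = ", "))
  }
  species_cols
}

# Illustrative wide input: one row per site, one column per species
wide_demo <- data.frame(
  site_id = 1:3,
  x = c(18.4, 18.5, 18.6),
  y = c(-33.9, -34.0, -34.1),
  sp_a = c(1, 0, 1),
  sp_b = c(0, 1, 1)
)

find_species_cols(wide_demo)                        # prefix-based detection
find_species_cols(wide_demo, species_cols = 4:5)    # numeric positions
find_species_cols(wide_demo, species_cols = "sp_b") # explicit names
```

An explicit species_cols takes priority in this sketch, mirroring the statement that it overrides the default "sp_*" detection.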
Workflow
1. Read the data from source_type.
2. Detect / insert compulsory columns (site, coords, species, value).
3. Validate coordinates (-180 ≤ lon ≤ 180, -90 ≤ lat ≤ 90).
4. Return
   - a long table (site_id, x, y, sp_name, pa|abund) when species name + value columns are present; or
   - a long table reshaped from wide species columns.
See also
tidyr::pivot_longer(), used internally.
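For orientation, a wide-to-long reshape with tidyr::pivot_longer() might look like the sketch below. The wide_abund data frame is invented for illustration, and the exact call made inside get_occurrence_data() is not shown on this page, so treat the arguments as assumptions.

```r
library(tidyr)

# Illustrative wide input: one row per site, abundance per species column
wide_abund <- data.frame(
  site_id = 1:2,
  x = c(18.4, 18.5),
  y = c(-33.9, -34.0),
  sp_a = c(3, 0),
  sp_b = c(0, 7)
)

# Stack the species columns into sp_name / abund pairs
long_abund <- pivot_longer(
  wide_abund,
  cols = starts_with("sp_"),
  names_to = "sp_name",
  values_to = "abund"
)

long_abund
```

Each site contributes one row per species column, so the two-site, two-species input yields four rows with columns site_id, x, y, sp_name and abund.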
Examples
# 1. Local CSV example -----------------------------------------------
tmp <- tempfile(fileext = ".csv")
df_local <- data.frame(
  site_id = 1:10,
  x = runif(10), y = runif(10),
  sp_name = c("plant1", "plant2", "plant3", "plant4", "plant5", "plant1", "plant2", "plant3", "plant4", "plant5"),
  # named "abund" so the default column detection described under Details applies
  abund = sample(0:20, size = 10, replace = TRUE)
)
write.csv(df_local, tmp, row.names = FALSE)
local_test <- get_occurrence_data(data = tmp, source_type = "local_csv", sep = ",")
# 2. Existing wide-format data.frame -----------------------------------------------
library(tidyr) # provides pivot_wider() and re-exports %>%
df_wide <- df_local %>%
  pivot_wider(
    names_from = sp_name, # these become column names
    values_from = abund, # fill these cell values
    values_fn = sum, # sum duplicates
    values_fill = 0 # fill missing with 0
  )
# df_wide holds site_id, x, y followed by five species columns (plant1 to plant5),
# so the species columns are 4:8
wide_test <- get_occurrence_data(data = df_wide, source_type = "data_frame", species_cols = 4:8)
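# 3. Custom names ----------------------------------------------------------
# sim_dat is not defined elsewhere on this page; the small data frame below is
# an illustrative stand-in (an assumption, not a data set shipped with the package).
sim_dat <- data.frame(
  site_id = 1:4,
  x = c(18.4, 18.5, 18.6, 18.7),
  y = c(-33.9, -34.0, -34.1, -34.2),
  sp_name = c("plant1", "plant2", "plant1", "plant3"),
  pa = c(1, 0, 1, 1)
)
names(sim_dat)[1:5] <- c("plot_id", "lon", "lat", "taxon", "presence")
occ_long2 <- get_occurrence_data(
  data = sim_dat,
  source_type = "data_frame",
  site_id_col = "plot_id",
  x_col = "lon",
  y_col = "lat",
  sp_name_col = "taxon",
  pa_col = "presence"
)
head(occ_long2)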
# 4. GBIF download (requires internet) -----------------------------------------------
if (FALSE) { # \dontrun{
gbif_test = get_occurrence_data(
source_type = "gbif",
gbif_zip_url = "https://api.gbif.org/v1/occurrence/download/request/0038969-240906103802322.zip"
)
} # }