
Scrape and Analyze Wikipedia & Trait Data for a Species
Source: R/get_trait_data.R
Given a binomial species name, this function optionally retrieves metadata from Wikipedia (summary, taxonomy, image, color palette) and joins matching trait data from a TRY-style or user-provided trait table. Fuzzy matching is applied to both TRY and local tables to tolerate minor spelling or naming mismatches.
Usage
get_trait_data(
species,
remove_bg = FALSE,
do_palette = TRUE,
do_taxonomy = TRUE,
do_summary = TRUE,
do_image = TRUE,
bg_thresh = 80,
green_delta = 20,
n_palette = 5,
preview = FALSE,
save_folder = NULL,
use_try = FALSE,
try_data = NULL,
trait_species_col = "AccSpeciesName",
local_trait_df = NULL,
local_species_col = "species",
max_dist = 1
)
Arguments
- species
Character. Species name (binomial, e.g. "Acacia karroo").
- remove_bg
Logical. Remove green/white backgrounds from the Wikipedia image? (default: FALSE)
- do_palette, do_taxonomy, do_summary, do_image
Logical. Control which metadata to scrape (default: TRUE for all).
- bg_thresh
Integer. Brightness threshold for white background removal (default: 80).
- green_delta
Integer. How much the green channel must exceed the red and blue channels for a pixel to count as green background (default: 20).
- n_palette
Integer. Number of colors to extract for palette (default: 5).
- preview
Logical. Show the image after processing? (default: FALSE)
- save_folder
Character or NULL. If non-NULL, the processed PNG image is saved in this folder (default: NULL).
- use_try
Logical. If TRUE, join plant traits using a TRY-format database/table (default: FALSE).
- try_data
Character (path) or data.frame. Path to TRY file, or data frame containing trait data.
- trait_species_col
Name of species column in TRY trait table (default: "AccSpeciesName").
- local_trait_df
Optional. Data.frame of local trait data (can be any species-trait table).
- local_species_col
Name of species column in local trait table (default: "species").
- max_dist
Numeric. Maximum string distance allowed for a fuzzy match (Levenshtein/Jaro-Winkler; default: 1).
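To make the effect of max_dist concrete, here is a minimal sketch of distance-thresholded name matching. It is not the package's implementation: fuzzy_match_species is a hypothetical helper, it uses base R's utils::adist() (Levenshtein distance only, not Jaro-Winkler), and the candidate names are illustrative.

```r
# Hypothetical helper (not part of the package): return the entries of
# `candidates` that lie within `max_dist` edits of `query`.
fuzzy_match_species <- function(query, candidates, max_dist = 1) {
  # utils::adist() computes the generalized Levenshtein (edit) distance
  d <- drop(utils::adist(query, candidates, ignore.case = TRUE))
  candidates[d <= max_dist]
}

known <- c("Acacia karroo", "Acacia caffra", "Protea repens")

# One missing letter: within max_dist = 1, so the name is matched
one_edit <- fuzzy_match_species("Acacia karoo", known)

# Two missing letters: outside max_dist = 1, so nothing is returned
two_edits <- fuzzy_match_species("Acacia karo", known)
```

With the default max_dist = 1, a single-character typo still finds its species, while anything further away is treated as no match.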
Details
For TRY tables, TraitName is used to name the wide trait columns. For local tables, all columns except the species column are returned. Fuzzy matching is used to allow for spelling or formatting mismatches.
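As an illustration of what "wide trait columns" means, the sketch below pivots a toy long-format table so that each distinct TraitName becomes a column. This is an assumption about the general shape of the result, not the function's actual code: the StdValue column name follows TRY conventions, the values are invented for illustration, and averaging repeated measurements is a choice made here for simplicity.

```r
# Toy long-format table with TRY-style column names
# (values are made up purely for illustration).
try_long <- data.frame(
  AccSpeciesName = c("Acacia karroo", "Acacia karroo",
                     "Acacia karroo", "Protea repens"),
  TraitName      = c("Plant height", "Plant height",
                     "Seed dry mass", "Plant height"),
  StdValue       = c(4, 6, 50, 2),
  stringsAsFactors = FALSE
)

# One row per species, one column per TraitName; repeated measurements
# of the same trait are averaged, and missing combinations become NA.
wide <- tapply(try_long$StdValue,
               list(try_long$AccSpeciesName, try_long$TraitName),
               mean)

# Turn the matrix into a data frame with an explicit species column,
# keeping the trait names (including spaces) as column names.
wide_df <- data.frame(species = rownames(wide), wide,
                      check.names = FALSE, row.names = NULL)
```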
Image-based color palette extraction uses simple k-means clustering; backgrounds can be removed using a color threshold.
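The following sketch shows the general idea of threshold-based background removal followed by k-means palette extraction, using only base R on synthetic pixels. It is not the package's code. Two assumptions are made here: channels are on a 0-255 scale, and bg_thresh is read as a brightness percentage (the documentation does not state its scale). A real image would first have to be read and reshaped into a pixels-by-channels matrix.

```r
set.seed(42)  # kmeans() starts from randomly chosen centres

bg_thresh   <- 80  # assumption: brightness in percent (0-100)
green_delta <- 20
n_palette   <- 3   # smaller than the default of 5 to keep the toy example clear

# Synthetic "image": 200 pixels as rows of R, G, B values in 0-255.
px <- rbind(
  matrix(rep(c(250, 250, 250), each = 80), ncol = 3),  # near-white background
  matrix(rep(c(40, 200, 50),   each = 60), ncol = 3),  # green background
  matrix(rep(c(200, 30, 40),   each = 60), ncol = 3)   # red subject
)
px <- px + matrix(sample(-5:5, length(px), replace = TRUE), ncol = 3)
colnames(px) <- c("R", "G", "B")

# Green background: G exceeds both R and B by more than green_delta
is_green <- px[, "G"] - pmax(px[, "R"], px[, "B"]) > green_delta
# White background: mean brightness above the threshold
is_white <- rowMeans(px) / 255 * 100 > bg_thresh

subject <- px[!(is_green | is_white), , drop = FALSE]

# Cluster the remaining pixels; each cluster centre is one palette colour
km      <- stats::kmeans(subject, centers = n_palette)
centres <- round(km$centers)
palette <- grDevices::rgb(centres[, "R"], centres[, "G"], centres[, "B"],
                          maxColorValue = 255)
```

Here palette holds n_palette hex colour strings, all shades of the red subject because both backgrounds were masked out before clustering.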
Requires: dplyr, purrr, tibble; optionally fuzzyjoin, rvest, httr, stringr, jsonlite, magick, abind.
To speed things up, switch off any metadata you do not need with do_palette, do_taxonomy, do_summary, and do_image.
Examples
if (FALSE) { # \dontrun{
# Example using TRY table:
get_trait_data("Acacia karroo", use_try = TRUE, try_data = try_traits, trait_species_col = "SpeciesName")
# Example using local trait table:
get_trait_data("Acraea horta", local_trait_df = traits, local_species_col = "species")
# Scrape only metadata (no traits):
get_trait_data("Acacia karroo", use_try = FALSE)
} # }