There has been an exponential increase in the quantity and type of biodiversity data in recent years, including presence-absence, counts, and presence-only citizen science data. Species Distribution Models (SDMs) have typically been used in ecology to estimate current and future ranges of species and are a common tool used when making conservation prioritization decisions. However, the integration of these data in a model-based framework is needed to address many of the current large-scale threats to biodiversity. Current SDM practice typically underutilizes the large amount of publicly available biodiversity data and does not follow a set of standard best practices. Integrating different data types with open-source tools and reproducible workflows saves time, increases collaboration opportunities, and increases the power of data inference in SDMs. We aim to address this issue by (1) proposing methods and (2) generating a reproducible workflow to integrate different available data types to increase the power of SDMs. We provide the R package intSDM, as well as guidance on how to accommodate users' diverse needs and ecological questions with different data types available on the Global Biodiversity Information Facility (GBIF), the largest biodiversity data aggregator in the world. Finally, we provide a case study of the application of our proposed reproducible workflow by creating SDMs for vascular plants in Norway, integrating presence-only and presence-absence species occurrence data as well as climate data.
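The abstract does not reproduce the intSDM interface, so the sketch below only illustrates the kind of data retrieval such a workflow starts from: pulling presence-only records from GBIF in R with the rgbif package (assumed to be installed). The species name, country code, and record limit are example values, not taken from the case study.

```r
# Illustrative first step only: retrieve presence-only GBIF records for one
# vascular plant species in Norway and keep the columns an SDM would need.
# Uses rgbif::occ_search(); intSDM's own wrapper functions are not shown
# because the abstract does not name them.
library(rgbif)

occ <- occ_search(
  scientificName = "Fraxinus excelsior",  # example species, not from the case study
  country        = "NO",                  # Norway, ISO 3166-1 alpha-2 code
  hasCoordinate  = TRUE,                   # drop records without coordinates
  limit          = 500
)

records <- occ$data[, c("scientificName", "decimalLongitude", "decimalLatitude")]
head(records)
```

Records retrieved this way would then be combined with structured presence-absence surveys and environmental covariates inside the package's integrated modelling workflow.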
Nuclear magnetic resonance (NMR) spectroscopy is a useful tool for detection and identification of molecular structural information, with increasing applications in environmental sciences. NMR instrument outputs are, however, heterogeneous and require extensive post-processing, creating barriers to their use and application by non-specialists. Here, we report on a new open-source R package, nmrrr, that processes and visualizes spectral data obtained from one-dimensional solution-state and solid-state NMR experiments; the package also performs relevant calculations commonly applied in the natural organic matter research community, such as computing the relative abundance of various functional groups. We document the package's installation, dependencies, and functions, and provide a standard workflow for processing NMR data. This package is currently available on CRAN and GitHub, and community contributions are welcome.
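The abstract describes the workflow (import, visualize, bin, compute relative abundances) without listing the package's functions, so the sketch below is illustrative only; the function and bin-set names are assumptions and should be checked against the nmrrr documentation on CRAN.

```r
# Assumed workflow sketch -- the function and bin-set names below are
# placeholders inferred from the described capabilities and may not match
# the real nmrrr API; consult help(package = "nmrrr") before use.
library(nmrrr)

# spectra <- nmr_import_spectra("data/nmr_exports/", method = "topspin")  # read 1D spectra (assumed)
# nmr_plot_spectra(spectra)                                               # overlay spectra for inspection (assumed)
# binned  <- nmr_assign_bins(spectra, binset = bins_Clemente2012)         # group peaks into functional-group bins (assumed)
# relab   <- nmr_relabund(binned)                                         # relative abundance per functional group (assumed)
```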
Agent-based models find wide application in all fields of science where large-scale patterns emerge from properties of individuals. Increasing computing capacity has made it possible in recent years to improve the level of detail and structural realism of next-generation models. However, this comes at the expense of increased model complexity, which requires more efficient tools for model exploration, analysis, and documentation that enable reproducibility, repeatability, and parallelization. NetLogo is a widely used environment for agent-based model development, but it does not provide sufficient built-in tools for extensive model exploration, such as sensitivity analyses. One tool for controlling NetLogo externally is the R package RNetLogo. However, this package is not suited for efficient, reproducible research: it has stability and resource allocation issues, is not straightforward to set up and use on high-performance computing clusters, and does not provide easy-to-use utilities such as storing and exchanging metadata. We present the R package nlrx, which overcomes stability and resource allocation issues by running NetLogo simulations via dynamically created XML experiment files. Class objects make setting up experiments more convenient, and helper functions provide many parameter exploration approaches, such as Latin hypercube designs, Sobol sensitivity analyses, or optimization approaches. Output is automatically collected in user-friendly formats and can be post-processed with provided utility functions. nlrx enables reproducibility by storing all relevant information and simulation output of experiments in one R object, which can conveniently be archived and shared. We provide a detailed description of the nlrx package functions and the overall workflow. We also present a use case scenario using a NetLogo model, for which we performed a sensitivity analysis and a genetic algorithm optimization. The nlrx package is the first framework for documentation and application of reproducible NetLogo simulation experiments in R.
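As a concrete illustration of the described workflow, the sketch below sets up a Latin hypercube experiment for a NetLogo model with nlrx. The NetLogo install path, model path, the "density" parameter, and the reporter metric are placeholders; the nl/experiment/simdesign pattern follows the package's documented usage as we understand it and should be checked against the nlrx manual.

```r
# Minimal sketch of an nlrx Latin hypercube experiment; paths, the model,
# its "density" parameter, and the reporter metric are placeholders.
library(nlrx)

nl <- nl(nlversion = "6.2.0",
         nlpath    = "/path/to/NetLogo 6.2.0",   # placeholder install path
         modelpath = "/path/to/model.nlogo",     # placeholder model path
         jvmmem    = 1024)

nl@experiment <- experiment(expname     = "lhs_example",
                            outpath     = "out/",
                            repetition  = 1,
                            tickmetrics = "true",
                            idsetup     = "setup",
                            idgo        = "go",
                            runtime     = 100,
                            metrics     = c("count turtles"),
                            variables   = list("density" = list(min = 10, max = 90,
                                                                qfun = "qunif")))

nl@simdesign <- simdesign_lhs(nl = nl, samples = 100, nseeds = 3, precision = 3)

results <- run_nl_all(nl)   # run all sampled parameter combinations and seeds
```

Because everything (experiment definition, simulation design, and output) lives in the single `nl` object, archiving that object is what makes the run reproducible and shareable.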
Textbook data is essential for teaching statistics and data science methods because it is clean, allowing the instructor to focus on methodology. Ideally, textbook datasets are refreshed regularly, especially when they are subsets taken from an ongoing data collection. It is also important to use contemporary data for teaching, to convey that the methodology is relevant today. This article describes the trials and tribulations of refreshing a textbook dataset on wages, extracted from the National Longitudinal Survey of Youth (NLSY79) in the early 1990s. The data is useful for teaching modeling and exploratory analysis of longitudinal data. Subsets of NLSY79, including the wages data, can be found in the supplementary materials of numerous textbooks and research articles. The NLSY79 database has been continually updated through 2018, so new records are available. Here we describe our journey to refresh the wages data, and document the process so that the data can be regularly updated in the future. Our journey was difficult because the steps and decisions taken to get from the raw data to the wages textbook subset had not been clearly articulated. We have been diligent in providing a reproducible workflow for others to follow, which we hope also inspires more attempts at refreshing data for teaching. Three new datasets, and the code to produce them, are provided in the open-source R package yowie. Supplementary materials for this article are available online.
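The abstract names the package but not its three datasets, so the sketch below sticks to calls that do not assume any dataset name: installing yowie from CRAN and listing the data objects it bundles.

```r
# Minimal sketch: install the package and enumerate its bundled datasets;
# no dataset names are assumed beyond what the listing itself returns.
install.packages("yowie")
library(yowie)

data(package = "yowie")   # list the refreshed wages datasets shipped with the package
```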
Background: Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. To guide forthcoming genome generation efforts and promote efficient prioritization of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data. Findings: Here we present an automated analysis workflow that surveys genome assemblies from the US National Center for Biotechnology Information (NCBI), assesses their completeness using the relevant BUSCO datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, examine how key assembly metrics relate to gene content completeness, and compare results from using different BUSCO lineage datasets. Conclusions: These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritizations for ongoing and future sampling, sequencing, and genome generation initiatives.
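The published workflow itself is not reproduced in the abstract; as an illustration of its central step, the sketch below loops over locally downloaded assemblies and runs BUSCO from R, assuming the BUSCO v5 executable is installed and on the PATH. The directory layout is a placeholder, and arthropoda_odb10 is used only because it matches the phylum-level scope described.

```r
# Illustrative sketch, not the published pipeline: run BUSCO on each downloaded
# assembly and leave one output folder per assembly for later collation.
# Assumes the `busco` executable (v5) is on the PATH; paths are placeholders.
assemblies <- list.files("assemblies", pattern = "\\.fna$", full.names = TRUE)

for (asm in assemblies) {
  out <- paste0("busco_", tools::file_path_sans_ext(basename(asm)))
  system2("busco", args = c("-i", asm,                 # input assembly
                            "-l", "arthropoda_odb10",  # BUSCO lineage dataset
                            "-m", "genome",            # assessment mode
                            "-o", out,                 # output folder name
                            "-c", "4"))                # CPU threads
}
```

The per-assembly short summaries produced this way are what a catalogue like the one described would then parse and present in a browsable form.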
Background: The ever-increasing volume of academic literature necessitates efficient and sophisticated tools for researchers to analyze, interpret, and uncover trends. Traditional search methods, while valuable, often fail to capture the nuance and interconnectedness of vast research domains. Results: TopicTracker, a novel software tool, addresses this gap by providing a comprehensive solution from querying PubMed databases to creating intricate semantic network maps. With it, users can systematically search for relevant literature, analyze trends, and visually represent co-occurrences in a given field. Our case studies, including support for the WHO on ethical considerations in infodemic management and mapping the evolution of ethics pre- and post-pandemic, underscore the tool's applicability and precision. Conclusions: TopicTracker represents a significant advancement in academic research tools for text mining. While it has its limitations, primarily tied to its alignment with PubMed, its benefits far outweigh the constraints. As the landscape of research continues to expand, tools like TopicTracker may be instrumental in guiding scholars in their pursuit of knowledge, ensuring they navigate the vast body of literature with clarity and precision.
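TopicTracker's own interface is not described in the abstract, so the sketch below is not its API; it only illustrates the kind of PubMed query such a pipeline starts from, using the rentrez package in R. The search term and record limit are example values.

```r
# Generic illustration of the querying step (not TopicTracker's API):
# search PubMed from R with rentrez and keep the matching record IDs.
library(rentrez)

res <- entrez_search(db     = "pubmed",
                     term   = "infodemic AND ethics",  # example query, not from the case studies
                     retmax = 100)

res$count       # total number of matching records
res$ids[1:5]    # first few PubMed IDs for downstream co-occurrence mapping
```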