Data Quality Program
NEON is committed to delivering high-quality, research-grade data products to the ecological community.
NEON is currently in the last stages of construction. Data collected during construction have not necessarily been subject to full quality assurance and quality control (QA/QC) procedures. For example, early observational data were collected using paper datasheets instead of mobile data entry applications. During NEON's construction phase, data products will continue to be added, and all are considered construction-grade in the near term. Atmospheric data collected before June 2014 are undergoing reprocessing.
As NEON moves into its early operational phase, data QA/QC procedures will continue to be refined and revised. Input from the user community is both solicited and encouraged. Formal input is encouraged through our advisory groups. Informal feedback can be contributed through our portal feedback page.
Quality Information for Observational Systems Data Products
The data quality strategy for Observational Systems (OS) data is primarily focused on controlling and validating data entry. Every download of an OS data product includes an associated Validation file, which contains the validation and automated creation rules applied to entry and ingest of the data in question. The rules appear in three columns in the Validation file: entryValidationRulesForm, entryValidationRulesParser, and parserToCreate. All three columns’ rules are written in Nicl, the NEON Ingest Conversion Language.
entryValidationRulesForm: These are the rules implemented in the mobile data entry application for the data product. Some data products, and some tables within data products, are ingested by spreadsheet, in which case this column will be blank. Rules in this column are not machine-read but are interpreted by an application developer, so they may deviate slightly from strict Nicl.
NEON’s data entry applications are developed using the platform provided by Fulcrum. The applications are designed to follow the workflow set out in each data collection protocol, with data entry constraints as documented in the entryValidationRulesForm. Typical constraints include numeric thresholds; choice lists of valid values for specific fields, such as genus and species names; conditional validation, such as species lists restricted by location; auto-population of sample identifiers; and dynamic availability of fields and app subsections, depending on data entered.
entryValidationRulesParser: These are the rules applied by the OS Parser, the custom software that ingests OS data into the NEON database. Violation of any of these rules prevents data ingest. These are machine read.
parserToCreate: Instead of validating or rejecting data inputs, these rules are used to generate data for specific fields, based on other fields in the data. Most frequently these are either simple arithmetic or derivation of sample-related data. These are machine read.
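The difference between validation rules and creation rules can be sketched in Python. This is an illustration only and is not written in Nicl: the field names, the choice list, the numeric limits, and the sample identifier format are all hypothetical, chosen to mirror the kinds of constraints described above (a numeric threshold, a choice list restricted by location, and derivation of a sample identifier).

```python
from typing import Dict, List

# Hypothetical choice list of valid taxa, restricted by site (conditional validation).
VALID_TAXA_BY_SITE = {"SITE_A": {"CARSP1", "CARSP2"}, "SITE_B": {"CARSP2"}}


def validate_record(rec: Dict) -> List[str]:
    """Validation-style rules: return a list of violations (empty if none)."""
    errors = []
    # Numeric threshold (the limits here are made up for the example).
    if not (0 <= rec["individualCount"] <= 1000):
        errors.append("individualCount out of range")
    # Choice list that depends on another field.
    if rec["taxonID"] not in VALID_TAXA_BY_SITE.get(rec["siteID"], set()):
        errors.append("taxonID not valid for site")
    return errors


def create_fields(rec: Dict) -> Dict:
    """Creation-style rule: derive a field from other fields in the record."""
    out = dict(rec)
    # Hypothetical identifier format: plot ID and collection date joined by a dot.
    out["sampleID"] = f'{rec["plotID"]}.{rec["collectDate"]}'
    return out


def ingest(rec: Dict) -> Dict:
    """Reject the record on any validation failure; otherwise add derived fields."""
    errors = validate_record(rec)
    if errors:
        raise ValueError("; ".join(errors))  # any violation prevents ingest
    return create_fields(rec)
```

In this sketch, validation either accepts or rejects a record without changing it, while creation only adds fields and never rejects; that separation is the point being illustrated, not the specific rules.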
Fig 1. Screenshot of Fulcrum app for ground beetle sorting
In addition to controlled data entry, most OS data products include product-specific quality flags, and, where relevant, quality and uncertainty data from analytical laboratories. Many physical samples collected by NEON are analyzed by external laboratories, which are reviewed annually by the NEON Calibration and Validation team to ensure they are meeting agreed-upon standards of process and data quality. Laboratories typically provide per-sample data flagging, as well as long-term performance data for methods and instruments.
For products with an expanded data package option, quality information is generally in the expanded package. Details can be found in the Data Quality section of the Data Product User Guide for each product, available for download along with the data.
Quality Information for Instrumented Systems Data Products
The majority of quality information for Instrumented Systems (IS) data can be found in the expanded data package. Both the basic and expanded packages contain the same data variables, such as the mean, minimum, maximum, uncertainty, etc. over the aggregation interval (e.g. 30-minute). In addition to these data variables, the basic package typically contains a final quality flag, which aggregates the results of all quality control tests into a single indicator of whether the data point is considered trustworthy (0) or suspect (1).
The expanded package includes the final quality flag as well as quality metrics summarizing the results of each quality test over the aggregation interval. Three quality metrics per test convey the proportion of raw measurements that passed the test, that failed the test, or for which the test could not be run (indeterminate). The results of all quality tests are aggregated into alpha and beta quality metrics, which respectively summarize the proportion of raw measurements that failed or were indeterminate for any of the applied quality tests (Smith et al. 2014). Using pre-determined thresholds, the alpha and beta quality metrics are used to compute the final quality flag. Details about the automated quality flags and metrics can be found in the Plausibility ATBD, De-spiking ATBD, and Quality Flags and Metrics ATBD.
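As a minimal sketch of the aggregation described above, the Python below computes the three per-test quality metrics from raw test outcomes, then derives alpha and beta quality metrics and a final quality flag. It follows the description in this section literally. The outcome encoding (0 = pass, 1 = fail, -1 = could not be run), the function names, and the default threshold values are assumptions for illustration; the actual thresholds and procedures are specified in the Quality Flags and Metrics ATBD.

```python
from typing import Dict, List, Tuple

PASS, FAIL, NA = 0, 1, -1  # assumed encoding of one raw test outcome


def per_test_metrics(outcomes: List[int]) -> Dict[str, float]:
    """Percent of raw measurements that passed, failed, or were indeterminate."""
    n = len(outcomes)
    return {
        "pass": 100.0 * outcomes.count(PASS) / n,
        "fail": 100.0 * outcomes.count(FAIL) / n,
        "na": 100.0 * outcomes.count(NA) / n,
    }


def alpha_beta(tests: Dict[str, List[int]]) -> Tuple[float, float]:
    """Alpha: percent of measurements that failed any test.
    Beta: percent of measurements that were indeterminate for any test."""
    # One tuple per raw measurement, holding its outcome for every test.
    rows = list(zip(*tests.values()))
    n = len(rows)
    alpha = 100.0 * sum(FAIL in row for row in rows) / n
    beta = 100.0 * sum(NA in row for row in rows) / n
    return alpha, beta


def final_quality_flag(alpha: float, beta: float,
                       alpha_max: float = 10.0, beta_max: float = 20.0) -> int:
    """Return 1 (suspect) if either metric exceeds its threshold, else 0.
    The default thresholds are placeholders, not values taken from this page."""
    return 1 if alpha > alpha_max or beta > beta_max else 0


# Four raw measurements in one aggregation interval, two hypothetical tests.
tests = {"range": [0, 0, 1, 0], "step": [0, -1, 0, 0]}
alpha, beta = alpha_beta(tests)
flag = final_quality_flag(alpha, beta)
```

With these inputs, one of four measurements fails a test and one of four is indeterminate, so both metrics are 25% and the placeholder thresholds mark the interval as suspect.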
Fig 2. IS quality flag computations. Adapted from Smith et al. 2014.
Computation of the final quality flag from the alpha and beta quality metrics can be overridden by the science review flag. If the data are determined to be suspect due to known adverse conditions not captured by automated flagging, the science review flag is raised (1), which in turn raises the final quality flag (1) regardless of its computed value. The science review flag is included in the expanded download package.
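The override logic can be written in a few lines of Python. The function and argument names are hypothetical; only the behavior (a raised science review flag forces the final quality flag to 1) comes from the description above.

```python
def apply_science_review(computed_final_qf: int, science_review_qf: int) -> int:
    """Return the final quality flag after applying the science review flag.

    computed_final_qf: flag derived from the alpha and beta quality metrics (0 or 1).
    science_review_qf: 1 if reviewers judged the data suspect, otherwise 0.
    """
    if science_review_qf == 1:
        return 1  # a raised science review flag always raises the final flag
    return computed_final_qf
```

The override works in one direction only in this sketch: a science review flag of 0 leaves the computed value unchanged and does not clear a flag that the automated tests raised.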
The Algorithm Theoretical Basis Document (ATBD) for each data product specifies which quality tests are applied, along with whether failure of each quality test results in removal of raw data prior to aggregating. The ATBD is available for download with each data product.
Quality Information for Airborne Observation Platform Data Products
The AOP payload consists of the NEON Imaging Spectrometer (NIS), waveform and discrete LiDAR, a high-resolution digital camera, and two Global Positioning System (GPS)/Inertial Measurement Units (IMUs). The GPS/IMU sensors provide high-quality position and orientation information for the airborne trajectory, which allows for rigorous geolocation and high absolute and relative positioning accuracy between the sensors.
The quality of AOP data is highly dependent on proper payload operation, maintenance, and calibration. Each season, the AOP sensors are assessed in the lab at the Sensor Test Facility, undergoing diagnostic tests to ensure the instruments are behaving correctly, as well as calibration to ensure inter-annual data consistency. With each flight campaign, pre- and post-campaign calibration flights are performed in Boulder to ensure that the calibration and geolocation of all instruments are acceptable. Throughout the flight season, data quality is monitored through vicarious calibration targets and sensor self-diagnostic data streams. Results of lab calibrations and of pre- and post-campaign calibration flights can be made available upon request.
For each flight, a series of L0 quality checks is performed to ensure the fidelity of the raw sensor streams. Once verified, the data are backed up and a copy is shipped to NEON’s long-term archival storage. After archiving, NEON scientists process the data to higher-level data products (L1+), which are delivered to the public. Throughout processing, several automated QA checks are performed to ensure the data meet NEON accuracy requirements. Automated L1+ QA/QC checks also generate data quality reports that are reviewed by Science staff and distributed with the data. The QA reports also contain information on the acquisition, processing, and calibration parameters used in data processing. The QA reports are the primary means for communicating AOP data quality to the public. Currently, five reports are produced:
1. QA of the SBET
2. QA of the L1 processing of LiDAR data (point cloud)
3. QA of the L1+ processing of discrete LiDAR data
4. QA of the at-sensor radiance processing of the NIS data
5. QA of the reflectance and L2 NIS products
The following images show examples of the QA information that can be obtained from the QA reports. The first shows the airborne trajectory, colored by the estimated error in the position; the second shows the point spacing of acquired LiDAR points. Users are encouraged to explore the QA documents, which are currently available through the Citrix ShareFile system (sign up here for access) and are planned for addition to data portal downloads.
Fig 3. Sample image of the position error in the airborne trajectory from the SBET (Smoothed Best Estimated Trajectory) QA report
Fig 4. Sample image of LiDAR point spacing from the discrete lidar processing QA report
Smith DE, Metzger S, Taylor JR (2014) A Transparent and Transferable Framework for Tracking Quality Information in Large Datasets. PLOS ONE 9(11): e112249. https://doi.org/10.1371/journal.pone.0112249