Data Availability

NEON is currently in the last stages of construction. Data collection during construction began at different times for every data product and every site, and data pipelines were built in parallel with early data collection. As a consequence, current data availability varies considerably by site and by data product, and many data products still have a backlog of data waiting to be processed.

This page describes the overall availability of data for each of NEON’s subsystems, the factors limiting availability, and the work now underway to complete data delivery. For information about the quality of available data, see the Data Quality page.

Observational Data (OS)

Collection Frequency

Frequency of data collection varies widely across observational data products. Information about collection frequency can be found in the data product detail pages in the Data Product Catalog and the data product readmes; more detailed collection information can be found in protocol documents.

Overall, OS data collection frequency varies from three times per week to once every five years. For products collected every five years, one-fifth of all sites are sampled in each year. Because of this schedule, data may not yet be available for some sites. For the tentative schedule of sampling by site by year, through 2021, see the OS sampling spreadsheet.

Ongoing Data Latency

Observational data are collected by human observers, and are either directly observed or derived from analyses on physical samples collected in the field. The collection and analysis workflows result in four general categories of data latency for OS data:

  1. Data collected directly in the field
  2. Data from samples analyzed in NEON lab facilities
  3. Data from samples shipped to contracted analytical facilities on a rolling basis - i.e., samples are shipped as soon as they are collected or in small batches throughout the year
  4. Data from samples shipped to contracted analytical facilities on an annual basis - i.e., samples are shipped in bulk at the end of the field season

With a few exceptions, latency increases from category 1 to category 4. Field-collected data may be available on the NEON Data Portal within a month of collection, while data from samples analyzed on a rolling basis are typically available between 90 days and nine months after collection. Samples analyzed in bulk annually account for a small proportion of OS data, and their data have a latency of slightly more than one year.
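As a rough illustration of the latency ranges above, the following sketch turns a collection date into an expected publication window. The category keys, the function name, and the conversion of "nine months" to 270 days are assumptions made for this example; the text gives no figure for NEON lab analyses or an upper bound for annual shipments, so those are left unknown.

```python
from datetime import date, timedelta

# Approximate latency windows in days, taken from the paragraph above.
# Keys are hypothetical labels for the four categories.
LATENCY_DAYS = {
    "field": (0, 30),                # "within a month of collection"
    "neon_lab": None,                # not quantified in the text
    "contract_rolling": (90, 270),   # nine months approximated as 270 days
    "contract_annual": (365, None),  # "slightly more than one year"; no upper bound given
}

def expected_window(collected, category):
    """Return (earliest, latest) expected publication dates; None where unknown."""
    window = LATENCY_DAYS[category]
    if window is None:
        return (None, None)
    low, high = window
    earliest = collected + timedelta(days=low)
    latest = collected + timedelta(days=high) if high is not None else None
    return (earliest, latest)

# Example: a sample collected on 1 June 2018 and shipped on a rolling basis.
earliest, latest = expected_window(date(2018, 6, 1), "contract_rolling")
print(earliest, latest)
```

These windows describe typical behavior only; the exceptions mentioned above mean an individual table can arrive earlier or later.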

Many OS data products include multiple data tables, each of which may fall into a different one of the categories above. For example, in the data product Plant foliar physical and chemical properties (DP1.10026.001), the cfc_fieldData table contains records of leaf collection in the field, the cfc_LMA table contains leaf mass per area measurements collected in the NEON lab, and the cfc_chlorophyll, cfc_elements, and cfc_lignin tables contain analyses performed by contracted facilities. Each data table is published to the data portal as it becomes available, so two months after leaf collection, the cfc_fieldData and cfc_LMA data may be available, while the chemical analyses are still pending. If downloaded data do not include all expected tables, check back later!
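A simple way to see which tables are still pending is to compare the files in a download against the tables you expect. The sketch below uses only the five table names mentioned above for DP1.10026.001 (the product may contain others); the category labels, the function name, and the example file names are hypothetical and do not reflect a confirmed NEON naming scheme.

```python
# Tables named in the text for DP1.10026.001, mapped to a hypothetical
# label for where each table's data originate.
EXPECTED_TABLES = {
    "cfc_fieldData": "field",
    "cfc_LMA": "neon_lab",
    "cfc_chlorophyll": "contract",
    "cfc_elements": "contract",
    "cfc_lignin": "contract",
}

def missing_tables(downloaded_names):
    """Return expected tables not matched by any downloaded file name."""
    missing = {}
    for table, category in EXPECTED_TABLES.items():
        # A table counts as present if its name appears in any file name.
        if not any(table in name for name in downloaded_names):
            missing[table] = category
    return missing

# Hypothetical file names for a download made two months after collection.
files = [
    "EXAMPLE_SITE.cfc_fieldData.2018-07.csv",
    "EXAMPLE_SITE.cfc_LMA.2018-07.csv",
]
for table, category in missing_tables(files).items():
    print(f"{table} not yet available (source: {category})")
```

In this example the three contracted-facility tables are reported as missing, which matches the situation described above where chemical analyses are still pending.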

Legacy Data Latency

For data collected prior to 2018, additional delays affect availability of some products. For some products and some sites, data collection began as early as 2012. Because data pipelines were not yet in place at the time, a backlog of data built up that has to be cleaned and quality controlled for ingest into the finished pipeline. In addition, some samples were collected without contracts finalized for labs to carry out analyses. Once these backlogs are resolved, all past data are expected to be available on the portal. To check whether data were collected for a given site in a given year, see the OS sampling spreadsheet.

As of fall 2018, all lab contracts are in place. The most common remaining cause of legacy data delay is sample backlog at the subset of analytical facilities where contracts were finalized recently. The only OS data products with a significant backlog are the microbial products, DNA barcode products, soil initial characterization products, and stream discharge products. Data will be added to the portal as they become available.

 

Instrumented Data (IS)

Collection Frequency

Raw sensor measurement frequency varies by data product and ranges from 40 Hz to 1 measurement every 5 minutes. Averaging is commonly applied during the process of transforming raw measurements into the data products served on the NEON Data Portal. Most IS data products are provided at two aggregation intervals, commonly 1 and 30 minutes. The Data Product Catalog provides information on raw sensor measurement frequency and aggregation intervals, in addition to other details about each data product.
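To make the idea of aggregation intervals concrete, here is a minimal sketch that averages raw (timestamp, value) samples into fixed windows. It illustrates plain time averaging only; it is not NEON's processing pipeline, which also involves steps such as calibration and quality control that are not shown. The function name and the synthetic samples are assumptions for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_mean(samples, interval_minutes):
    """Average (timestamp, value) samples into windows keyed by window start.

    Assumes the interval divides a day evenly (e.g., 1 or 30 minutes).
    """
    width = interval_minutes * 60  # window width in seconds
    bins = defaultdict(list)
    for ts, value in samples:
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        offset = int((ts - midnight).total_seconds())
        # Round the offset down to the start of its window.
        start = midnight + timedelta(seconds=offset - offset % width)
        bins[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(bins.items())}

# Synthetic samples: one reading every 10 seconds for an hour,
# where each reading equals the minute in which it was taken.
t0 = datetime(2018, 9, 1, 0, 0, 0)
samples = [(t0 + timedelta(seconds=10 * i), (10 * i) // 60) for i in range(360)]

one_minute = aggregate_mean(samples, 1)
thirty_minute = aggregate_mean(samples, 30)
print(len(one_minute), len(thirty_minute))
```

The same raw series yields 60 one-minute values and 2 thirty-minute values here, which is the relationship between the two aggregation intervals described above.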

Most often, sensors operate continuously after installation at a site. However, sensors at some locations at some sites are removed seasonally due to adverse or inappropriate measurement conditions. For example, some aquatic sensors are removed during periods in which the lake or stream is dry or frozen, and the lowest 2D wind sensor is removed from towers that experience snow. Currently, data are still processed for sensor locations that are removed seasonally, which results in downloadable files with only quality flags populated (no sensor values). OKSR is currently the only site where the instrumented system is shut down entirely during winter, due to lack of power supply; no data are processed during this period, so no files are available.

Ongoing Data Latency

Instrumented data for the preceding month are published to the NEON Data Portal during the week of the second Monday of the current month. As of September 2018, all sites except TECR and PUUM are online and streaming instrumented data, and most instrumented data products are continuously made available according to this monthly schedule.
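The monthly schedule above can be worked out with simple date arithmetic. The sketch below finds the second Monday of a month and returns the week that starts on it. Treating the publication week as running Monday through Sunday is an assumption, as is the function name.

```python
from datetime import date, timedelta

def publication_week(year, month):
    """Return (monday, sunday) for the week of the second Monday of a month."""
    first = date(year, month, 1)
    # weekday() is 0 for Monday, so this is the number of days to the first Monday.
    days_to_monday = (7 - first.weekday()) % 7
    second_monday = first + timedelta(days=days_to_monday + 7)
    return second_monday, second_monday + timedelta(days=6)

# Data collected in September 2018 would be published during this week.
start, end = publication_week(2018, 10)
print(start, end)
```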

Processing of eddy covariance data (DP4.00200.001: Bundled data products – eddy covariance) is still being initialized. Initialization of all sites is expected to be complete by the end of February 2019, with data becoming available as each site is initialized. When complete, ongoing publication of eddy covariance products will also follow the monthly publishing schedule.

Legacy Data Latency

Data from before September 2018 (February 2019 for eddy covariance) may contain large gaps in coverage or have low availability even though the site was operational. We are continuously improving the data processing pipeline to avoid gaps in published data. Legacy data availability will improve as system upgrades are made throughout 2019.

 

Airborne Observation Platform Data (AOP)

Collection Frequency and Data Latency

The AOP collects airborne remote sensing data at a subset of the NEON sites annually. The schedule for each flight season is published on the NEON website several months before collections begin (see this page for the 2018 season) and gives the time period during which the AOP will be at each domain. Precise collection times for a site within the domain’s scheduled time period are not predetermined, as they are primarily driven by the weather conditions encountered upon arrival. The total collection time required for each site varies with the size of the area covered as well as with weather conditions. Under good weather conditions, the smallest NEON sites can be collected in a single day, while the largest could take up to a week. If weather conditions prove unsuitable for collections during the entire scheduled time at a domain, portions of sites or entire sites may not be collected. AOP data are typically published on the NEON data portal within 60 days of the final collection day at a site. All non-legacy data collected to date are currently available.

Legacy Data Latency

The NEON AOP conducted collections at several sites before NEON reached its operational phase. Collections conducted between 2013 and 2016 were processed with algorithms that had not yet reached their current level of maturity. These ‘legacy’ products are currently provided through the NEON data portal and are concurrently being reprocessed so that they conform in structure and quality with current products. Reprocessing is expected to be complete in early 2019, at which point the legacy products on the data portal will be replaced.

 

Externally Hosted Data

AERONET

Data streams from the spectral sun photometer are sent directly to NASA for processing. The Aerosol Robotic Network, or AERONET, is run by NASA’s Goddard Space Flight Center and is a central repository for sites around the world that use the same sensor. Data are generally available within a few days of collection.

To locate data, either visit the Data Portal’s catalog entry for the Spectral Sun Photometer - Calibrated Sky Radiances data product, or visit AERONET’s global list of sites and look for site names beginning with “NEON_”. AERONET also performs a broad range of data processing and visualization, so it is a useful resource for other data, including aerosol optical depth and water vapor.

AmeriFlux

NEON terrestrial sites are also registered with AmeriFlux, a community network for sharing data related to surface-atmosphere fluxes of carbon, water, and energy. A number of relevant NEON data products are sent to AmeriFlux for co-hosting, where they are made available in the same format as other AmeriFlux site data. AmeriFlux also produces NEON’s gap-filled meteorological data products as well as several additional derived data products.

NEON data are sent to AmeriFlux on a quarterly basis, and are made available directly from AmeriFlux following AmeriFlux’s processing timeline. The first data submission to AmeriFlux is scheduled for January 2019.

Barcode of Life Databases (BOLD)

DNA barcoding is a method to help identify or confirm identifications of sampled species, particularly ones that are difficult to identify by morphology. Barcoding is used at NEON 1) for cases where an expert taxonomist or field taxonomist is not able to classify a cryptic or poorly described species or 2) to perform QA/QC on identifications. After the CO1 gene for each sample is sequenced by an external analytical facility, the sequence data and metadata are sent to the Barcode of Life Databases (BOLD). There is one project on BOLD for each of NEON’s four barcoding data products.

For each of these products, sampling data are provided through the NEON data portal, and links are provided to the corresponding project at BOLD.

Since the barcoding data are generated at the end of a long processing chain, including waiting for all of the sampling and expert taxonomist identifications to be completed prior to sample selection for DNA barcoding, the latency can be a year or more. Currently, the analytical facility is working through a backlog of mosquito and beetle samples that should be completed by mid-2019.

MG-RAST, EBI, and NCBI SRA

Several times per year, soil and freshwater (surface and benthic) samples are analyzed for microbial (bacteria, archaea, and fungi) content. Similarly, zooplankton and macroinvertebrates are sampled and sequenced. Samples are preserved in ethanol in the field and shipped to a contracted lab for analysis. For all of the above data products, sequence data are uploaded to the Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) portal. There is currently a significant backlog in the analysis of microbial samples and the subsequent upload of sequence data to MG-RAST, extending back through 2016 samples. These data are expected to be available by mid-2019. Once the analytical facility is up to date with sample analysis and the data upload processes have matured, the lag between data collection and both publication of metadata on the NEON data portal and availability of sequence data at MG-RAST should decrease to 6-12 months. The same data are propagated by MG-RAST to the European Bioinformatics Institute (EMBL-EBI) and from EBI to the US National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) under BioProject 395925.

Phenocam Gallery

NEON has deployed a Stardot NetCam at the top of every terrestrial tower to study above-canopy phenology. Every 15 minutes, each camera captures back-to-back Red, Green, Blue (RGB) and Infrared (IR) images. Over time, these images can be used to detect seasonal changes in vegetative canopies (e.g., onset of leaf growth and senescence). At all aquatic sites, photos are collected that capture the land-water interface. Photos may also be used for qualitative estimates of snow cover, riparian characteristics, or weather.

Images are sent to and processed by PhenoCam, a cooperative network that archives and distributes imagery and derived data products from digital cameras deployed at research sites across North America and around the world. NEON’s phenocam images are generally available within one day for viewing and downloading from the PhenoCam Gallery, along with images and data from other phenocam sites across the world. PhenoCam also provides a filtered view of just the NEON cameras, along with daily and monthly tallies of the number of images that have been received.