Environmental Geoinformatics Laboratory

Data Science and Analysis

Understanding uncertainty and uncovering unseen connections

It is crucial to analyze scientific results thoroughly in order to draw conclusions with confidence. However, the effects of uncertainty are often not properly considered, partly because accounting for and combining multiple sources of error is non-trivial. For example, uncertainty in sediment accumulation rate is frequently overlooked. There is of course uncertainty in the measured age of the sediment, but there is additional uncertainty due to sediment deformation during the coring process. Neither of these errors is normally distributed, which further hinders assessment of how they affect results and of whether the observed trends and variability are significant and distinguishable from noise.
To better appreciate how multiple sources of uncertainty impact our interpretations, we apply a number of statistical methods to the analysis of results. Non-normally distributed errors can be assessed using Monte Carlo methods, as well as Bayesian analysis. Physical processes can be modeled to simulate how signals can be modified during and after sediment deposition.
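As a minimal sketch of the Monte Carlo idea, the Python example below propagates two non-normal errors into a sediment accumulation rate: a skewed age error on each of two dated horizons and a one-sided shortening of the section during coring. All numbers, the triangular and uniform error shapes, and the function names are assumptions chosen for illustration, not values or methods from the lab.

```python
import random
import statistics


def simulate_accumulation_rates(n_draws=20000, seed=42):
    """Draw accumulation rates (cm/yr) under assumed, non-normal errors."""
    rng = random.Random(seed)
    measured_thickness_cm = 50.0  # hypothetical recovered thickness
    rates = []
    for _ in range(n_draws):
        # Hypothetical horizon ages (years) with skewed errors.
        # random.triangular takes (low, high, mode).
        age_top = rng.triangular(950, 1150, 1000)
        age_bottom = rng.triangular(1900, 2300, 2000)
        # Assumption: coring only shortens the section, by 0 to 15 percent.
        shortening = rng.uniform(0.0, 0.15)
        true_thickness_cm = measured_thickness_cm / (1.0 - shortening)
        rates.append(true_thickness_cm / (age_bottom - age_top))
    return rates


def percentile_interval(values, lower=2.5, upper=97.5):
    """Return the (lower, upper) percentile values by nearest rank."""
    ordered = sorted(values)

    def pick(percent):
        return ordered[round(percent / 100 * (len(ordered) - 1))]

    return pick(lower), pick(upper)


rates = simulate_accumulation_rates()
median_rate = statistics.median(rates)
low_rate, high_rate = percentile_interval(rates)
naive_rate = 50.0 / (2000 - 1000)  # rate that ignores every error source

print(f"naive rate:  {naive_rate:.4f} cm/yr")
print(f"median rate: {median_rate:.4f} cm/yr")
print(f"95% interval: {low_rate:.4f} to {high_rate:.4f} cm/yr")
```

Because the shortening error is one-sided, the simulated median sits above the naive rate, a bias that would not show up if the errors were simply treated as symmetric.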
We also use computer vision, deep learning, and big data techniques to further understand the processes affecting preservation of environmental information in the geologic record.

Data Science Themes

Statistical Analysis

Students in this lab will learn and use basic statistical techniques to evaluate their data in order to make better-informed interpretations. This includes considering measurement error, geochronological error, and, when possible, sediment deformation-induced error. Students will learn to evaluate the likelihood that a particular interpretation is true and will learn how to present results when there is uncertainty.
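One simple way to put a number on the likelihood of an interpretation is to simulate it. The sketch below estimates the probability that one event preceded another when their age uncertainties overlap. The two events, their age ranges, and the triangular error shapes are hypothetical and serve only to illustrate the approach.

```python
import random


def probability_a_older_than_b(n_draws=20000, seed=7):
    """Estimate P(event A is older than event B) by simulation."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_draws):
        # Hypothetical ages in years before present, with skewed errors.
        # random.triangular takes (low, high, mode).
        age_a = rng.triangular(1050, 1300, 1150)
        age_b = rng.triangular(1000, 1200, 1050)
        if age_a > age_b:
            count += 1
    return count / n_draws


p_a_first = probability_a_older_than_b()
print(f"P(A occurred before B) is about {p_a_first:.2f}")
```

Reporting the result as a probability, rather than as a yes or no statement, is one way of presenting an interpretation when uncertainty remains.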

Estimation of Uncertainty and Improving Confidence

Quantifying uncertainty is particularly important in geochronology. While a radiocarbon date may have an analytical uncertainty of, for example, ± 30 years, once it is calibrated to calendar years the uncertainty becomes much larger and non-normally distributed. When the ages of strata between age-depth determinations are modeled, the uncertainty in those assigned ages increases further. Properly understanding the uncertainty involved, and how it affects interpretations regarding the timing of events, is critically important to drawing informed conclusions.
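The growth of uncertainty between dated horizons can be illustrated with a toy simulation. The Python sketch below assumes two dated horizons with skewed age errors and lets the time needed to deposit each equal-thickness slice between them vary randomly (gamma-distributed, with an assumed shape of 2). It is not a real age-depth modeling package and does not reproduce any published model; every number and name in it is hypothetical.

```python
import random
import statistics


def simulate_boundary_ages(n_draws=5000, n_segments=10, seed=1):
    """Simulate ages at the boundaries of equal-thickness slices."""
    rng = random.Random(seed)
    ages_by_boundary = [[] for _ in range(n_segments + 1)]
    for _ in range(n_draws):
        # Hypothetical dated horizons (years) with skewed errors.
        age_top = rng.triangular(950, 1150, 1000)
        age_bottom = rng.triangular(1950, 2150, 2000)
        # Assumption: deposition time per slice is gamma-distributed.
        increments = [rng.gammavariate(2.0, 1.0) for _ in range(n_segments)]
        # Rescale so the slices exactly span the two sampled ages.
        scale = (age_bottom - age_top) / sum(increments)
        age = age_top
        ages_by_boundary[0].append(age)
        for index, increment in enumerate(increments, start=1):
            age += increment * scale
            ages_by_boundary[index].append(age)
    return ages_by_boundary


age_spreads = [statistics.pstdev(ages) for ages in simulate_boundary_ages()]
for index, spread in enumerate(age_spreads):
    print(f"boundary {index:2d}: standard deviation {spread:6.1f} yr")
```

In this sketch the spread of simulated ages is smallest at the two dated horizons and largest near the middle of the undated interval, because variable accumulation adds uncertainty on top of the dating error itself.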

Computer Vision

We are using computer vision for a number of tasks, including image segmentation, finding regions of interest, and matching common features between pairs of images. Matching points between pairs of images is especially useful for creating composite images of split sediment core surfaces with very little distortion: a series of overlapping images is taken and then stitched together, similar to how a panorama of a landscape is created. In this case, however, we need maximum control over the image transformations taking place, so we develop our own methods.
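To illustrate what controlling the transformation can mean in practice, the sketch below fits only a rotation and a translation (no scaling, shear, or perspective) to matched point pairs by least squares. This is a generic closed-form 2D rigid fit, not the lab's own stitching method; the function name and the synthetic points are assumptions made for the example.

```python
import math


def estimate_rigid_transform(src_points, dst_points):
    """Least-squares rotation angle (radians) and translation (tx, ty)."""
    n = len(src_points)
    src_cx = sum(x for x, _ in src_points) / n
    src_cy = sum(y for _, y in src_points) / n
    dst_cx = sum(x for x, _ in dst_points) / n
    dst_cy = sum(y for _, y in dst_points) / n

    dot = 0.0
    cross = 0.0
    for (sx, sy), (dx, dy) in zip(src_points, dst_points):
        ax, ay = sx - src_cx, sy - src_cy  # centered source point
        bx, by = dx - dst_cx, dy - dst_cy  # centered destination point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    angle = math.atan2(cross, dot)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # The translation maps the rotated source centroid onto the other one.
    tx = dst_cx - (cos_a * src_cx - sin_a * src_cy)
    ty = dst_cy - (sin_a * src_cx + cos_a * src_cy)
    return angle, tx, ty


# Synthetic matched points: the second image is rotated by 0.01 rad
# and shifted by (400, 3) pixels relative to the first.
true_angle, true_tx, true_ty = 0.01, 400.0, 3.0
src = [(10.0, 20.0), (300.0, 40.0), (150.0, 500.0), (80.0, 260.0)]
dst = [
    (
        math.cos(true_angle) * x - math.sin(true_angle) * y + true_tx,
        math.sin(true_angle) * x + math.cos(true_angle) * y + true_ty,
    )
    for x, y in src
]

angle, tx, ty = estimate_rigid_transform(src, dst)
print(f"angle = {angle:.4f} rad, shift = ({tx:.1f}, {ty:.1f}) px")
```

Restricting the fit to rotation and translation means the stitched image cannot be stretched or warped to hide mismatches, which is one way to keep distortion low.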

Deep Learning

We are also developing methods for using AI to interpret sediment core structures and sequences. Thus far we have trained a neural network to segment images of sediment cores based on the degree of disturbance, either biological or physical, in order to create color reflectance records that are largely free of contaminating artifacts. We are also working on using AI to automate time-consuming tasks such as precisely cropping the core out of the background, isolating the scale bar, and automatically determining the length of the sediment core.
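Once a segmentation mask is available, using it to build a cleaner reflectance record is straightforward. The sketch below averages the undisturbed pixels in each image row, taking each row to represent one depth step. The tiny image, the mask, and the function name are hypothetical, and the mask here is written by hand rather than produced by a neural network.

```python
def reflectance_profile(gray_rows, disturbed_rows):
    """Mean brightness per row, skipping pixels flagged as disturbed."""
    profile = []
    for pixel_row, mask_row in zip(gray_rows, disturbed_rows):
        kept = [value for value, bad in zip(pixel_row, mask_row) if not bad]
        # A fully disturbed row has no usable value for that depth.
        profile.append(sum(kept) / len(kept) if kept else None)
    return profile


# Hypothetical 3 x 3 grayscale image (rows run down-core).
gray = [
    [100, 102, 104],
    [90, 200, 92],   # the bright middle pixel is an artifact
    [80, 82, 84],
]
# Hypothetical mask: True marks a disturbed pixel.
disturbed = [
    [False, False, False],
    [False, True, False],
    [True, True, True],
]

profile = reflectance_profile(gray, disturbed)
print(profile)  # [102.0, 91.0, None]
```

Leaving a gap where a whole row is disturbed, instead of interpolating across it, keeps the record honest about where no reliable measurement exists.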