Introduction to Stratigraphic Data Analysis (SDAR)

John Ortiz 1,2,3, Carlos Jaramillo 1,2

1 Smithsonian Tropical Research Institute, Balboa, Ancón, Republic of Panama. 2 Corporación Geológica ARES, Bogotá, Colombia. 3 Servicio Geológico Colombiano, Bogotá, Colombia.

SDAR is a fast and consistent tool for plotting and facilitating the analysis of stratigraphic and sedimentological data, designed to plot detailed stratigraphic sections and to perform quantitative stratigraphic analyses.

Introduction

Stratigraphic columns (SCs) are the most useful and common way to represent field descriptions (e.g., grain size, thickness of rock packages, fossil content, and lithological components) of rock sequences and well logs. In these representations, the width of each layer varies according to grain size (i.e., the wider the layer, the coarser the rock; Miall 1990; Tucker 2011), and the thickness of each layer is represented on the vertical axis of the diagram. Typically these representations are drawn 'manually' using vector graphics editors (e.g., Adobe Illustrator®, CorelDRAW®, Inkscape). Several software packages now plot SCs automatically, but versatile open-source tools are lacking, and it remains very difficult to both store and analyse stratigraphic information.

This document presents Stratigraphic Data Analysis in R (SDAR), an analytical package designed for both plotting and facilitating the analysis of stratigraphic data in R (R Core Team 2019). SDAR uses simple stratigraphic data and takes advantage of the flexible plotting tools available in R to produce detailed SCs. The main benefits of SDAR are:

Getting started

To install the SDAR package from CRAN:

install.packages("SDAR")

Workflow

The standard workflow in SDAR consists of


DATA: saltarin_beds

To explore the functionalities of SDAR, we will use the publicly available dataset of the Saltarin well. saltarin_beds is the example dataset included in SDAR; it gives a lithologic description of borehole Saltarin 1A, located in the Llanos Basin in eastern Colombia (4.612 N, 70.495 W). The stratigraphic well Saltarin 1A drilled 671 meters of the Miocene succession of the eastern Llanos Basin, corresponding to the Carbonera (124.1 m; 407.1 ft), Leon (105.1 m; 344.8 ft), and Guayabo (441.8 m; 1449.5 ft) Formations (Bayona et al. 2008). The Saltarin core was described at a scale of 1:50 for identification of grain-size trends, sedimentary structures, clast composition, thickness of lamination, bioturbation patterns, and macrofossils, all of which are used for identifying individual lithofacies and for sedimentological and stratigraphic analyses (Jaramillo et al. 2017).

The command data(saltarin_beds) will load the dataset saltarin_beds into the current R session.

library(SDAR)     # Load SDAR library
data(saltarin_beds)     # load Saltarin demo dataset
class(saltarin_beds)
#> [1] "data.frame"

# check the content and the structure of the saltarin_beds dataset

nrow(saltarin_beds)     # number of rock layers
#> [1] 686
ncol(saltarin_beds)     # number of variables recording composition and texture description of each layer
#> [1] 22
names(saltarin_beds)     # variable names of composition and texture description of each layer
#>  [1] "bed_number"           "base"                 "top"                 
#>  [4] "rock_type"            "prim_litho"           "grain_size"          
#>  [7] "prim_litho_percent"   "sec_litho"            "grain_size_sec_litho"
#> [10] "sec_litho_percent"    "base_contact"         "grading"             
#> [13] "grain_size_base"      "grain_size_top"       "sorting"             
#> [16] "roundness"            "matrix"               "cement"              
#> [19] "fabric"               "munsell_color"        "Rcolor"              
#> [22] "notes"

Note that saltarin_beds is a data frame with 686 layers (rows) and 22 variables (columns) storing the thickness, composition, and texture description of each layer, following the format suggested by SDAR (for more details about the specific types of data required by SDAR, see the SDAR_data_model vignette).
To draw a stratigraphic layer in SDAR, the minimum information required for each layer is bed_number, thickness (i.e., defined by a base and a top), rock_type, prim_litho, and grain_size. In summary, a table with the structure presented in Table 1 must be provided.

Table 1: Example of beds/layers table.

This example is from a borehole core where depths are measured down from the surface;
therefore “base” is greater than “top”.

bed_number  base    top     rock_type    prim_litho    grain_size
1           671     670.2   sedimentary  claystone     clay
2           670.2   669.4   covered
3           669.4   669.18  sedimentary  sandstone     medium sand
4           669.18  667.6   sedimentary  limestone     wackestone
5           667.6   667.2   sedimentary  conglomerate  boulder
6           667.2   666.2   sedimentary  shale         silt
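As a minimal sketch, the structure of Table 1 could be typed directly into R as a data frame using only base R. The object name my_beds is arbitrary, and writing the empty cells of the covered interval as NA is an assumption about how blanks are represented, not something the SDAR data model confirms here.

```r
# Minimal beds table with the six mandatory columns of Table 1.
# Depths are measured down from the surface, so base > top.
my_beds <- data.frame(
  bed_number = 1:6,
  base       = c(671, 670.2, 669.4, 669.18, 667.6, 667.2),
  top        = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2),
  rock_type  = c("sedimentary", "covered", "sedimentary",
                 "sedimentary", "sedimentary", "sedimentary"),
  # NA marks the empty cells of the covered interval (assumption)
  prim_litho = c("claystone", NA, "sandstone",
                 "limestone", "conglomerate", "shale"),
  grain_size = c("clay", NA, "medium sand",
                 "wackestone", "boulder", "silt"),
  stringsAsFactors = FALSE
)

# Bed thickness follows directly from base and top
thickness <- my_beds$base - my_beds$top
str(my_beds)
```

Because each bed's top equals the base of the next bed, the thicknesses add up to the full logged interval (671 minus 666.2).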


# header of the mandatory fields of "saltarin_beds" dataset to draw a graphic log using SDAR
head(saltarin_beds[,1:6])
#>   bed_number   base    top   rock_type prim_litho grain_size
#> 1          1 671.00 670.20 sedimentary  claystone       clay
#> 2          2 670.20 669.40 sedimentary  siltstone       silt
#> 3          3 669.40 669.18 sedimentary  siltstone       silt
#> 4          4 669.18 667.60 sedimentary  claystone       clay
#> 5          5 667.60 667.20 sedimentary  siltstone       silt
#> 6          6 667.20 666.20 sedimentary  siltstone       silt

NOTE: The SDAR project includes the development of a graphical user interface to connect this R package with a database management system; for this reason, the data structure and headers (column names) should be followed exactly so that they match the database structure.
To improve communication between geoscientists, SDAR implements conventions defined by sedimentologists for drawing lithology patterns and for describing grain size, color, and so on. Details on the information required to define a layer and the sources of the conventions implemented are provided in the vignette “SDAR data model”.

vignette("SDAR_data_model")

Getting your own data into R

We have provided on the SDAR repository a template of the data format used by SDAR as a Microsoft Excel spreadsheet, SDAR_v0.95_beds_template.xlsx. This is the format suggested by SDAR to store the thickness, composition, and texture description of rock layers (beds). The data for each bed should be entered as a row, with one column for each parameter recorded for that bed (e.g., thickness, lithology, grain size, and so on).

The simplest way to get your stratigraphic data into R for use with SDAR is to fill out the SDAR beds Excel template and import this file into R. There are several functions to load Excel files into R; the steps below import an Excel file using the readxl package.

To install the readxl package from CRAN:

install.packages("readxl")

In order to import an Excel file, navigate to your working directory (for example, with setwd()), or add the full path where your file is stored to the read_excel function.

library(readxl)     # load the readxl package
my_beds <- read_excel("file_name.xlsx")     # file in your working directory
my_beds <- read_excel("Path where your Excel file is stored/file_name.xlsx")     # setting the full path

# Note that the separator between folders is the forward slash (/), as on Linux and Mac systems.
# On Windows, use either the forward slash or a double backslash (\\).
my_beds <- read_excel("C:\\Users\\john\\Desktop\\File_name.xlsx")     # full path example on Windows systems
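One way to avoid typing path separators by hand is to build the path with base R's file.path(), which joins its arguments with a forward slash on every platform. The sketch below reuses the example path from the code above; the variable name xlsx_path is only illustrative.

```r
# file.path() joins folder names with forward slashes on every platform
xlsx_path <- file.path("C:", "Users", "john", "Desktop", "File_name.xlsx")
print(xlsx_path)

# file.exists() is a cheap check to run before calling read_excel()
if (!file.exists(xlsx_path)) {
  message("File not found: ", xlsx_path)
}
```

The resulting string can then be passed to read_excel() in place of a hand-written path.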

Additional external data examples

The Saltarin well example dataset available within SDAR is also accessible in Excel format. It is stored in the installed files folder inst/extdata as SDAR_v0.95_beds_saltarin.xlsx; to locate it, call system.file(), whose general form is system.file("extdata", "mydata.xlsx", package = "mypackage").

#  Read the SDAR beds external data example (Excel file format)
library(readxl)
fpath <- system.file("extdata", "SDAR_v0.95_beds_saltarin.xlsx", package = "SDAR")
beds_data <- read_excel(fpath)

nrow(beds_data)   # number of rock layers
#> [1] 686
names(beds_data)  # variable names of composition and texture description of each layer
#>  [1] "bed_number"           "base"                 "top"                 
#>  [4] "rock_type"            "prim_litho"           "grain_size"          
#>  [7] "prim_litho_percent"   "sec_litho"            "grain_size_sec_litho"
#> [10] "sec_litho_percent"    "base_contact"         "grading"             
#> [13] "grain_size_base"      "grain_size_top"       "sorting"             
#> [16] "roundness"            "matrix"               "cement"              
#> [19] "fabric"               "munsell_color"        "Rcolor"              
#> [22] "notes"

Data validation - the strata class

Data validation checks whether a dataset meets all the requirements it must fulfill, and the strata function makes it easy to check whether your stratigraphic data satisfy the SDAR data model. The SDAR package introduces a new S4 object class called strata to store stratigraphic data. This S4 class gives a rigorous definition of a strata object. A valid object of this class meets all the requirements specified in the definition (e.g., the mandatory columns must be named bed_number, base, top, rock_type, prim_litho, and grain_size, and base and top must be numeric). Defining this S4 class reduces errors, because the class recognizes the type of information that the object contains and checks its validity (Wickham 2014).

The strata function takes an additional argument called datum, which defines the horizontal reference datum. The options are base or top: base applies when thickness is measured up from the bottom of, e.g., an outcrop section; top applies when depths are measured down from the surface, e.g., in boreholes and cores. The default option is datum = "top".

# the strata function automatically validates the input dataset
# and returns a strata class object.

validated_beds <- strata(saltarin_beds)
#>    'beds data has been validated successfully'

# check the class of the object generated by the strata function
class(validated_beds)
#> [1] "strata"
#> attr(,"package")
#> [1] "SDAR"

The previous chunk of code validates the input dataset saltarin_beds and returns a new strata class object, validated_beds. The absence of warnings or errors, together with the message 'beds data has been validated successfully', means that every row (bed/layer) in the input data satisfies the SDAR data model. By default, all errors and warnings raised when the validation rules are applied to the input data are printed to the R console. For example, if sandstone and mudstone were misspelled in rows 3 and 7, the error would be caught and shown in the R console as: Error: Check row numbers 3, 7. values (sandtone, mudston) are 'prim_litho' not register in 'litho.table'. Stratigraphic overlap between beds/layers is not allowed; if beds overlap, the strata function prints an error and returns a data frame with the overlapping intervals.

To validate data from an outcrop or stratigraphic section, set the parameter datum = "base".

# datum = "base" must be selected when stratigraphic distance above datum 
# increases upwards (toward younger levels, as a stratigraphic section).
outcrop_validated_beds <- strata(my_outcrop_beds, datum = "base")
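To make the kinds of checks described in this section more concrete, here is a rough base R sketch of column, type, and overlap checks on a beds table. It is an illustration only: check_beds is a hypothetical helper, not part of SDAR, and it does not reproduce how strata() is implemented.

```r
# Illustrative checks only; this is not the code used by strata()
check_beds <- function(beds) {
  required <- c("bed_number", "base", "top",
                "rock_type", "prim_litho", "grain_size")
  missing_cols <- setdiff(required, names(beds))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(beds$base) || !is.numeric(beds$top)) {
    stop("'base' and 'top' must be numeric")
  }
  # Express each bed as [lower, upper] so the test works for either datum
  lower <- pmin(beds$base, beds$top)
  upper <- pmax(beds$base, beds$top)
  ord <- order(lower)
  n <- length(ord)
  # A bed overlaps when it starts before the highest limit reached so far
  overlap <- lower[ord][-1] < cummax(upper[ord])[-n]
  beds[ord[-1][overlap], , drop = FALSE]
}

beds <- data.frame(
  bed_number = 1:3,
  base = c(671, 670.2, 669.6),   # bed 3 reaches into bed 2
  top  = c(670.2, 669.4, 669.0),
  rock_type  = "sedimentary",
  prim_litho = c("claystone", "siltstone", "sandstone"),
  grain_size = c("clay", "silt", "medium sand")
)
overlapping <- check_beds(beds)
print(overlapping)
```

The helper returns the rows that start inside an earlier bed, which is similar in spirit to the data frame of overlapping intervals that the text says strata returns.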

Methods within the strata class

In this version of the SDAR package, the methods associated with the strata class are plot and summary. Once the stratigraphic data are loaded into R and successfully validated as a strata class object, the object can be plotted to visualise the information. The plot method provides different outputs depending on the parameter settings. The summary method displays a synopsis of the content of the strata object, including the total number of layers, the thickness of the studied section, and the number of layers by lithology type and grain size.

Plot method for strata class

The minimal information required to plot a stratigraphic column using SDAR is a table with the structure presented in Table 1. Once a dataset has been validated as a strata class object, calling plot on it dispatches automatically to the plot method for the strata class (plot.strata).

# Code to generate example presented in Figure 1.
library(SDAR)     # load SDAR library
data(saltarin_beds)     # load Saltarin beds dataset
validated_beds <- strata(saltarin_beds)     # validate the saltarin_beds dataset
plot(validated_beds)     # plot a stratigraphic log with the SDAR default options
# The default parameters are: `datum = "top"`, `data.units = "feet"`, 
# `scale = 100`, and `barscale = 2`


Figure 1: Output example of the plot method for a strata class. The Saltarin dataset was previously validated as a strata class object; here it is plotted using the default parameters.


Setting up drawing scale, and the unit of measurement

The scale plotting parameter enables users to employ different drawing scales (graphic vertical scaling). It defines the vertical scale used to draw the graphic log, from 1:1 to any desired scale (e.g., 1:50, 1:200, 1:500). The data.units parameter specifies the unit of measurement of the stratigraphic thickness in the input data (thickness measured in the field): either meters or feet. The default unit is "feet".

# Code to generate example presented in Figure 2.
plot(validated_beds, data.units="meters", scale=300, barscale=5)
# Plot the Saltarin dataset at 1:300 scale in meters (the unit used when the Saltarin well
# was described), with thickness marks and labels every 5 meters. By default the bar scale is
# plotted on the left side of the lithology track.


              Figure 2: Saltarin dataset setting the parameter data.units = “meters”, scale 1:300, and barscale = 5.
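As a quick worked example of what a drawing scale implies, the arithmetic below estimates the drawn length of the 671 m Saltarin section at 1:300. It covers only the column itself; any margins or headers that SDAR adds to the page are not accounted for, and the variable names are illustrative.

```r
section_thickness_m <- 671   # thickness of the Saltarin section, in meters
scale <- 300                 # drawing scale 1:300

# Length of the drawn column on paper
column_cm <- section_thickness_m / scale * 100
column_in <- column_cm / 2.54
round(c(cm = column_cm, inches = column_in), 1)
```

At 1:300 the column alone is roughly 2.24 m long, which is why the scale has such a strong effect on the size of the output page.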


Drawing a specific interval for a given outcrop section or borehole log

Given that the stratigraphic information is stored in a numerical format, SDAR provides the option to draw a specific interval of a given outcrop section or borehole log. The parameters of the plot function that provide this functionality are subset.base and subset.top, which set the base and the top of the interval to draw.

# Code to generate the example presented in Figure 3.
plot(validated_beds, data.units="meters", subset.base=614, subset.top=597)


Figure 3: The beds within the stratigraphic interval defined by the subset.base and subset.top parameters [614 - 597 meters] are plotted.
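Because base and top are numeric, the same kind of interval selection can be sketched on a plain data frame with base R. The limits below are hypothetical, and the rule used here (keep only beds lying entirely inside the interval) is an assumption; how SDAR treats beds that straddle the limits is not stated in this document.

```r
beds <- data.frame(
  bed_number = 1:6,
  base = c(671, 670.2, 669.4, 669.18, 667.6, 667.2),
  top  = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2)
)
subset_base <- 670     # hypothetical interval limits, in the data units
subset_top  <- 667.5

# Keep beds lying entirely inside the interval (depths increase downward)
inside <- beds$base <= subset_base & beds$top >= subset_top
beds[inside, ]
```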


Graded Bedding - Modifying grain size of a specific layer

Grain size is often not constant throughout a rock layer; for that reason, detailed field descriptions include the grain size variation. Usually, the grain size is described at the bottom and at the top of the layer. Grading commonly consists of an upward decrease in grain size (normal grading); however, certain sedimentary processes result in an upward increase in grain size (inverse grading). When grading is normal or inverse, the grain size of the base and top must be provided in the format presented in Table 2.

Table 2: Example of beds/layers table including grading information.

To include and represent grading information in SDAR, the columns grading,
grain_size_base, and grain_size_top must be included in the beds/layers table.

bed_number  base    top     rock_type    prim_litho    grain_size   grading  grain_size_base  grain_size_top
1           671     670.2   sedimentary  claystone     clay
2           670.2   669.4   covered
3           669.4   669.18  sedimentary  sandstone     medium sand  normal   coarse sand      fine / medium sand
4           669.18  667.6   sedimentary  limestone     wackestone   normal   packstone        wackestone
5           667.6   667.2   sedimentary  conglomerate  boulder      inverse  cobble           boulder
6           667.2   666.2   sedimentary  shale         silt
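A simple consistency check on grading information might look like the sketch below: normal grading should fine upward and inverse grading should coarsen upward. The grain-size scale used here is a simplified, hypothetical ordering for illustration and is not the scale defined in the SDAR data model.

```r
# Simplified, hypothetical grain-size scale ordered from fine to coarse
size_scale <- c("clay", "silt", "fine sand", "medium sand",
                "coarse sand", "granule", "pebble", "cobble", "boulder")

beds <- data.frame(
  bed_number      = c(3, 5),
  grading         = c("normal", "inverse"),
  grain_size_base = c("coarse sand", "cobble"),
  grain_size_top  = c("medium sand", "boulder"),
  stringsAsFactors = FALSE
)

# Position of each grain size on the scale (larger rank = coarser)
base_rank <- match(beds$grain_size_base, size_scale)
top_rank  <- match(beds$grain_size_top, size_scale)

# Normal grading fines upward; inverse grading coarsens upward
beds$consistent <- ifelse(beds$grading == "normal",
                          base_rank > top_rank,
                          base_rank < top_rank)
beds
```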


Plotting interval features

The previous sections showed how SDAR represents the information associated with beds. This section shows how SDAR integrates interval attributes (e.g., bioturbation, sedimentary structures).

An interval is defined over a stratigraphic range, so it must have a base and a top. The main requirement for an interval is that the recorded geological feature (e.g., sedimentary structures, bioturbation, unit name, fossil content) is present throughout the defined stratigraphic range.

In the data structure that defines intervals, the user must provide the stratigraphic base, the top, and the recorded feature of each interval, as presented in Table 3. Each row in this table describes one stratigraphic interval and the feature recorded in it (for more details about the specific types of data required by SDAR, see the SDAR_data_model vignette). The interval features available to integrate in this SDAR version are:

Table 3: Examples of interval tables.

Bioturbation

base   top    index
669.4  669.2  intense
668.6  668.2  moderate
665.2  665.0  moderate
661.4  659.9  low
637.5  637.0  low

Sedimentary structures

base   top     sed_structure
671    670.2   cross bedding
671.5  671.5   climbing ripples
669.4  669.18  lenticular lamination
668.2  667.6   normal grading
667.2  666.2   wavy lamination
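For illustration, the bioturbation example of Table 3 can be written as a base R data frame and inspected before it is passed to any plotting code. The checks and the per-class totals below are only a sketch and are not part of SDAR.

```r
bioturbation <- data.frame(
  base  = c(669.4, 668.6, 665.2, 661.4, 637.5),
  top   = c(669.2, 668.2, 665.0, 659.9, 637.0),
  index = c("intense", "moderate", "moderate", "low", "low"),
  stringsAsFactors = FALSE
)

# Every interval needs a base and a top; with depths measured
# downward, base should not be smaller than top
stopifnot(all(bioturbation$base >= bioturbation$top))

# Total bioturbated thickness per intensity class
thickness <- bioturbation$base - bioturbation$top
by_index <- tapply(thickness, bioturbation$index, sum)
by_index
```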


We have provided on the SDAR repository a template of the data format used by SDAR as a Microsoft Excel spreadsheet, SDAR_v0.95_intervals_template.xlsx. This is the format suggested by SDAR to store interval information (e.g., bioturbation, sedimentary structures, and so on).

Import your own intervals data into R

To import a sheet from an Excel file, navigate to your working directory (for example, with setwd()) or pass the full path of the file to the read_excel function, and specify the sheet to read either by its name or by its position in the workbook.

# Specify sheet by its name
my_int_data <- read_excel("file_name.xlsx", sheet= "data")     # on your working directory 
my_int_data <- read_excel("Path where your Excel file is stored/file_name.xlsx", sheet= "data")  # full path

# Specify sheet by its index
my_int_data <- read_excel("file_name.xlsx", sheet= 1)

# Note that the separator between folders is the forward slash (/), as on Linux and Mac systems.
# On Windows, use either the forward slash or a double backslash (\\).
# full path example in windows systems
my_int_data <- read_excel("C:\\Users\\john\\Desktop\\File_name.xlsx", sheet= "data")     

The Saltarin intervals dataset is also available in Excel format. It is stored in the installed files folder inst/extdata as SDAR_v0.95_intervals_saltarin.xlsx; to locate it, call system.file(), whose general form is
system.file("extdata", "mydata.xlsx", package = "mypackage").

#  Read the bioturbation external data example (Saltarin intervals Excel file format)
fpath <- system.file("extdata", "SDAR_v0.95_intervals_saltarin.xlsx", package = "SDAR")
bioturbation_data <- read_excel(fpath, sheet = "bioturbation")     # import bioturbation sheet

nrow(bioturbation_data)   # number of bioturbated intervals
#> [1] 151
bioturbation_data  # header of Saltarin bioturbation dataset
#> # A tibble: 151 x 3
#>     base    top index   
#>    <dbl>  <dbl> <chr>   
#> 1 669.4  669.2  intense 
#> 2 668.6  668.2  moderate
#> 3 665.15 664.95 moderate
#> 4 661.4  659.9  low     
#> # … with 147 more rows

Import Saltarin intervals dataset

# import core_number data
core_number_data <- read_excel(fpath, sheet = "core_number")
# import samples data
samples_data <- read_excel(fpath, sheet = "samples")
# import sedimentary structures data
sed_structures_data <- read_excel(fpath, sheet = "sed_structures")
# import fossils data
fossils_data <- read_excel(fpath, sheet = "fossils")

# import other symbols data
other_symbols_data <- read_excel(fpath, sheet = "other_symbols")
# import lithostratigraphy data
litho_data <- read_excel(fpath, sheet = "lithostra")
# import chronostratigraphy data
crono_data <- read_excel(fpath, sheet = "chronostra")

Display interval features

Plot parameters allow users to integrate additional features into the graphic log (e.g., sedimentary structures, fossil content, unit names). Each of these features is displayed as symbols, graphic bars, or points on the right or left side of the lithological column. Figure 4 shows how SDAR represents interval attributes.

# Code to generate example presented in Figure 4.
plot(validated_beds, data.units="meters",  
    subset.base=664, subset.top=649, 
    bioturbation=bioturbation_data,
    fossils=fossils_data, 
    sed.structures=sed_structures_data,
    other.sym=other_symbols_data, 
    samples=samples_data, 
    ncore=core_number_data, 
    lithostrat=litho_data, 
    chronostrat=crono_data, 
    symbols.size=0.8)
# For performance, only a subset of the data is plotted in this example. To plot the
# complete Saltarin well dataset, omit the subset.base=664 and subset.top=649 parameters.


Figure 4: Graphic log of the Saltarin well for the [664 - 649 meters] interval, adding symbol features (e.g., sedimentary structures, fossil content, samples), bioturbation, and the lithostratigraphic and chronostratigraphic framework.


SDAR output

Figures 1-4 present examples of graphic logs generated automatically by the SDAR package after the stratigraphic information has been correctly loaded and validated in R. Graphic logs generated by SDAR are exported as PDF files (completely editable with any vector drawing application). Each log is drawn on a single page, and the paper size is updated automatically when the vertical scale changes or when different sets of attributes are plotted on the right or left side of the lithological column (check the working directory for the PDF output file).

If you see problems with the PDF output, remember that the problem is much more likely to be in your viewer than in R. Try another viewer if possible; browsers such as Mozilla Firefox and Google Chrome provide excellent rendering engines for PDF files.

Summary method for strata class data

This section presents the summary method. When the summary function is executed on a strata class object, the results are printed in the R console. The output is a synopsis of the content of the strata object: the total number of layers, the thickness of the SC, the thickness of covered intervals, and the thickness, percentage, and number of layers by lithology type within the studied SC. With grain.size = TRUE, the same breakdown is also reported by grain size. The results of running the summary function on the example dataset are printed below.

summary(validated_beds)
#>                                  
#>  Number of beds:              610
#>  Number of covered intervals   76
#>                                        
#>  Thickness of the section:        671.0
#>  Thickness of covered intervals:   77.9
#> 
#> Summary by lithology: 
#>  
#>                 Thickness  Percent (%)  Number beds
#> sandstone           233.3        34.77          330
#> claystone           211.6        31.53          130
#> siltstone           143.4        21.37          138
#> coal                  3.1         0.46            8
#> conglomerate          1.8         0.27            4
#> covered              77.9        11.61           76
summary(validated_beds, grain.size=TRUE)
#>                                  
#>  Number of beds:              610
#>  Number of covered intervals   76
#>                                        
#>  Thickness of the section:        671.0
#>  Thickness of covered intervals:   77.9
#> 
#> Summary by lithology: 
#>  
#>                 Thickness  Percent (%)  Number beds
#> sandstone           233.3        34.77          330
#> claystone           211.6        31.53          130
#> siltstone           143.4        21.37          138
#> coal                  3.1         0.46            8
#> conglomerate          1.8         0.27            4
#> covered              77.9        11.61           76
#> 
#> Summary by Grain Size: 
#>  
#>                              Thickness  Percent (%)  Number beds
#> clay                             194.0        28.92          123
#> clay / silt                       43.7         6.51           28
#> silt                              88.6        13.21           89
#> silt / very fine sand             88.3        13.16          101
#> very fine sand                    71.6        10.68          122
#> very fine / fine sand             32.4         4.83           49
#> fine sand                         27.5         4.10           37
#> fine / medium sand                20.3         3.03           18
#> medium sand                        9.2         1.37           11
#> medium / coarse sand               5.6         0.83            8
#> coarse sand                        5.5         0.82           15
#> coarse / very coarse sand          3.7         0.55            3
#> very coarse / granule              1.5         0.22            3
#> granule                            1.1         0.16            3
#> covered                           77.9        11.61           76
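To show where the figures in such a summary come from, the sketch below recomputes thickness, percentage, and bed counts by lithology with base R for the first six Saltarin beds printed earlier in this document. It is an illustration of the arithmetic only and is not the code behind the summary method.

```r
# First six beds of saltarin_beds, as printed by head() earlier
beds <- data.frame(
  base = c(671, 670.2, 669.4, 669.18, 667.6, 667.2),
  top  = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2),
  prim_litho = c("claystone", "siltstone", "siltstone",
                 "claystone", "siltstone", "siltstone"),
  stringsAsFactors = FALSE
)
beds$thickness <- abs(beds$base - beds$top)

# Total thickness per lithology
litho_summary <- aggregate(thickness ~ prim_litho, data = beds, FUN = sum)

# Share of the total thickness, in percent
litho_summary$percent <- round(100 * litho_summary$thickness /
                                 sum(beds$thickness), 2)

# Number of beds per lithology, in the same row order
litho_summary$n_beds <- as.vector(
  table(beds$prim_litho)[litho_summary$prim_litho]
)
litho_summary
```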

Acknowledgments

This project has been sponsored by Carlos Jaramillo (Smithsonian Tropical Research Institute). Financial support for this research was provided by COLCIENCIAS (partly funding the master's studies of the main author), the Fundación para la Investigación de la Ciencia y la Tecnología del Banco de la República (Colombia), Corporación Geológica ARES (Colombia), the Smithsonian Tropical Research Institute, the Anders Foundation, the 1923 Fund, and Gregory D. and Jennifer Walston Johnson.

The Saltarin 1A well dataset used in this analysis was provided by Alejandro Mora of HOCOL S.A.

Bibliography