penguins {datasets}R Documentation

Measurements of Penguins near Palmer Station, Antarctica

Description

Data on adult penguins covering three species found on three islands in the Palmer Archipelago, Antarctica, including their size (flipper length, body mass, bill dimensions), and sex.

The columns of penguins are a subset of the more extensive penguins_raw data frame, which includes nesting observations and blood isotope data. There are differences in the column names and data types. See the ‘Format’ section for details.

Usage

penguins
penguins_raw

Format

penguins is a data frame with 344 rows and 8 variables:

species

factor, with levels Adelie, Chinstrap, and Gentoo

island

factor, with levels Biscoe, Dream, and Torgersen)

bill_len

numeric, bill length (millimeters)

bill_dep

numeric, bill depth (millimeters)

flipper_len

integer, flipper length (millimeters)

body_mass

integer, body mass (grams)

sex

factor, with levels female and male

year

integer, study year: 2007, 2008, or 2009

penguins_raw is a data frame with 344 rows and 17 variables. 8 columns correspond to columns in penguins, though with different variable names and/or classes:

Species

character

Island

character

Culmen Length (mm)

numeric, bill length

Culmen Depth (mm)

numeric, bill depth

Flipper Length (mm)

numeric, flipper length

Body Mass (g)

numeric, body mass

Sex

character

Date Egg

Date, when study nest observed with 1 egg. The year component is the year column in penguins

There are 9 further columns in penguins_raw:

studyName

character, expedition during which the data was collected

Sample Number

numeric, continuous numbering sequence for each sample

Region

character, the region of Palmer LTER sampling grid

Stage

character, denoting reproductive stage at sampling

Individual ID

character, unique ID for each individual in dataset

Clutch Completion

character, if the study nest was observed with a full clutch, i.e., 2 eggs

Delta 15 N (o/oo)

numeric, the ratio of stable isotopes 15N:14N

Delta 13 C (o/oo)

numeric, the ratio of stable isotopes 13C:12C

Comments

character, additional relevant information

Details

Gorman et al. (2014) used the data to study sex dimorphism separately for the three species.

Horst et al. (2022) popularized the data as an illustration for different statistical methods, as an alternative to the iris data.

Kaye et al. (2025) provide the scripts used to create these data sets from the original source data, and a notebook reproducing results from Gorman et al. (2014).

Note

These data sets are also available in the palmerpenguins package. See the package website for further details and resources.

The penguins data has some shorter variable names than the palmerpenguins version, for compact code and data display.

Source

Adélie penguins:

Palmer Station Antarctica LTER and K. Gorman (2020). Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative, doi:10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f.

Gentoo penguins:

Palmer Station Antarctica LTER and K. Gorman (2020). doi:10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689.

Chinstrap penguins:

Palmer Station Antarctica LTER and K. Gorman (2020). doi:10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e.

The title naming convention for the source for the Gentoo and Chinstrap data is that same as for Adélie penguins.

References

Gorman, K. B., Williams, T. D. and Fraser, W. R. (2014) Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis). PLoS ONE 9, 3, e90081; doi:10.1371/journal.pone.0090081.

Horst, A. M., Hill, A. P. and Gorman, K. B. (2022) Palmer Archipelago Penguins Data in the palmerpenguins R Package - An Alternative to Anderson's Irises. R Journal 14, 1; doi:10.32614/RJ-2022-020.

Kaye, E., Turner, H., Gorman, K. B., Horst, A. M. and Hill, A. P. (2025) Preparing the Palmer Penguins Data for the datasets Package in R. doi:10.5281/zenodo.14902740.

Examples

## view summaries
summary(penguins)
summary(penguins_raw) # not useful for character vectors
## convert character vectors to factors first
dFactor <- function(dat) {
  dat[] <- lapply(dat, \(.) if (is.character(.)) as.factor(.) else .)
  dat
}
summary(dFactor(penguins_raw))

## visualise distribution across factors
plot(island ~ species, data = penguins)
plot(sex ~ interaction(island, species, sep = "\n"), data = penguins)

## bill depth vs. length by species (color) and sex (symbol):
## positive correlations for all species, males tend to have bigger bills
sym <- c(1, 16)
pal <- c("darkorange","purple","cyan4")
plot(bill_dep ~ bill_len, data = penguins, pch = sym[sex], col = pal[species])

## simplified sex dimorphism analysis for Adelie species:
## proportion of males increases with several size measurements
adelie <- subset(penguins, species == "Adelie")
plot(sex ~ bill_len, data = adelie)
plot(sex ~ bill_dep, data = adelie)
plot(sex ~ body_mass, data = adelie)
m <- glm(sex ~ bill_len + bill_dep + body_mass, data = adelie, family = binomial)
summary(m)

## Produce the long variable names as from {palmerpenguins} pkg:
long_nms <- sub("len", "length_mm",
                sub("dep","depth_mm",
                    sub("mass", "mass_g", colnames(penguins))))
## compare long and short names:
noquote(rbind(long_nms, nms = colnames(penguins)))

## Not run:  # << keeping shorter 'penguins' names in this example:
    colnames(penguins) <- long_nms

## End(Not run)

[Package datasets version 4.5.0 Index]