penguins {datasets} | R Documentation
Measurements of Penguins near Palmer Station, Antarctica
Description
Data on adult penguins covering three species found on three islands in the Palmer Archipelago, Antarctica, including their size (flipper length, body mass, bill dimensions), and sex.
The columns of penguins are a subset of the more extensive penguins_raw data frame, which includes nesting observations and blood isotope data. There are differences in the column names and data types. See the ‘Format’ section for details.
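As a quick orientation, the two data frames can be compared side by side. This is a minimal sketch, assuming R 4.5.0 or later (where both data sets ship with the datasets package); `col_class` is a hypothetical helper introduced only for this illustration.

```r
## Assumes R >= 4.5.0, where penguins and penguins_raw are in 'datasets'.
dim(penguins)      # rows and columns of the curated data frame
dim(penguins_raw)  # same number of rows, more columns

## First class of each column (col_class is a hypothetical helper).
col_class <- function(dat) vapply(dat, function(x) class(x)[1], character(1))
col_class(penguins)
col_class(penguins_raw)
```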
Usage
penguins
penguins_raw
Format
penguins is a data frame with 344 rows and 8 variables:
- species: factor, with levels Adelie, Chinstrap, and Gentoo
- island: factor, with levels Biscoe, Dream, and Torgersen
- bill_len: numeric, bill length (millimeters)
- bill_dep: numeric, bill depth (millimeters)
- flipper_len: integer, flipper length (millimeters)
- body_mass: integer, body mass (grams)
- sex: factor, with levels female and male
- year: integer, study year: 2007, 2008, or 2009
penguins_raw is a data frame with 344 rows and 17 variables. 8 columns correspond to columns in penguins, though with different variable names and/or classes:
- Species: character
- Island: character
- Culmen Length (mm): numeric, bill length
- Culmen Depth (mm): numeric, bill depth
- Flipper Length (mm): numeric, flipper length
- Body Mass (g): numeric, body mass
- Sex: character
- Date Egg: Date, when study nest observed with 1 egg. The year component is the year column in penguins
There are 9 further columns in penguins_raw:
- studyName: character, expedition during which the data was collected
- Sample Number: numeric, continuous numbering sequence for each sample
- Region: character, the region of Palmer LTER sampling grid
- Stage: character, denoting reproductive stage at sampling
- Individual ID: character, unique ID for each individual in dataset
- Clutch Completion: character, if the study nest was observed with a full clutch, i.e., 2 eggs
- Delta 15 N (o/oo): numeric, the ratio of stable isotopes 15N:14N
- Delta 13 C (o/oo): numeric, the ratio of stable isotopes 13C:12C
- Comments: character, additional relevant information
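The correspondence between the two data frames can be checked directly. The sketch below assumes R 4.5.0 or later and, more importantly, assumes that rows appear in the same order in penguins and penguins_raw, which this page does not state explicitly; `egg_year` is a name introduced for this illustration.

```r
## Assumes R >= 4.5.0 and that both data frames share the same row order.
## Non-syntactic column names such as 'Date Egg' need [[ ]] or backticks.
egg_year <- as.integer(format(penguins_raw[["Date Egg"]], "%Y"))

## Does the year component of 'Date Egg' reproduce penguins$year?
all(egg_year == penguins$year)

## Does bill_len carry the same values as 'Culmen Length (mm)'?
all.equal(penguins$bill_len, penguins_raw[["Culmen Length (mm)"]])
```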
Details
Gorman et al. (2014) used the data to study sexual dimorphism separately for the three species.
Horst et al. (2022) popularized the data as an illustration for different statistical methods, as an alternative to the iris data.
Kaye et al. (2025) provide the scripts used to create these data sets from the original source data, and a notebook reproducing results from Gorman et al. (2014).
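For a first look at size differences between the sexes within each species, a grouped summary can help. This is only an illustrative sketch, not the analysis of Gorman et al. (2014); it assumes R 4.5.0 or later, and `mass_by_group`, `mass_wide`, and `mass_diff` are names introduced here.

```r
## Assumes R >= 4.5.0, where penguins is in the 'datasets' package.
## Mean body mass (grams) for each species-by-sex group; the formula
## interface of aggregate() drops rows with missing values by default.
mass_by_group <- aggregate(body_mass ~ species + sex, data = penguins, FUN = mean)
mass_by_group

## Reshape to a species-by-sex table (one value per cell, so the sum
## computed by xtabs() is just that group's mean), then take
## male minus female mean body mass within each species.
mass_wide <- xtabs(body_mass ~ species + sex, data = mass_by_group)
mass_diff <- mass_wide[, "male"] - mass_wide[, "female"]
mass_diff
```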
Note
These data sets are also available in the palmerpenguins package. See the package website for further details and resources.
The penguins data has some shorter variable names than the palmerpenguins version, for compact code and data display.
Source
- Adélie penguins:
Palmer Station Antarctica LTER and K. Gorman (2020). Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative, doi:10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f.
- Gentoo penguins:
Palmer Station Antarctica LTER and K. Gorman (2020). doi:10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689.
- Chinstrap penguins:
Palmer Station Antarctica LTER and K. Gorman (2020). doi:10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e.
The title naming convention for the source of the Gentoo and Chinstrap data is the same as for the Adélie penguins.
References
Gorman, K. B., Williams, T. D. and Fraser, W. R. (2014) Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis). PLoS ONE 9, 3, e90081; doi:10.1371/journal.pone.0090081.
Horst, A. M., Hill, A. P. and Gorman, K. B. (2022) Palmer Archipelago Penguins Data in the palmerpenguins R Package - An Alternative to Anderson's Irises. R Journal 14, 1; doi:10.32614/RJ-2022-020.
Kaye, E., Turner, H., Gorman, K. B., Horst, A. M. and Hill, A. P. (2025) Preparing the Palmer Penguins Data for the datasets Package in R. doi:10.5281/zenodo.14902740.
Examples
## view summaries
summary(penguins)
summary(penguins_raw) # not useful for character vectors
## convert character vectors to factors first
dFactor <- function(dat) {
  dat[] <- lapply(dat, \(.) if (is.character(.)) as.factor(.) else .)
  dat
}
summary(dFactor(penguins_raw))
## visualise distribution across factors
plot(island ~ species, data = penguins)
plot(sex ~ interaction(island, species, sep = "\n"), data = penguins)
## bill depth vs. length by species (color) and sex (symbol):
## positive correlations for all species, males tend to have bigger bills
sym <- c(1, 16)
pal <- c("darkorange", "purple", "cyan4")
plot(bill_dep ~ bill_len, data = penguins, pch = sym[sex], col = pal[species])
## simplified sexual dimorphism analysis for Adelie species:
## proportion of males increases with several size measurements
adelie <- subset(penguins, species == "Adelie")
plot(sex ~ bill_len, data = adelie)
plot(sex ~ bill_dep, data = adelie)
plot(sex ~ body_mass, data = adelie)
m <- glm(sex ~ bill_len + bill_dep + body_mass, data = adelie, family = binomial)
summary(m)
## Produce the long variable names as from {palmerpenguins} pkg:
long_nms <- sub("len", "length_mm",
                sub("dep", "depth_mm",
                    sub("mass", "mass_g", colnames(penguins))))
## compare long and short names:
noquote(rbind(long_nms, nms = colnames(penguins)))
## Not run: # << keeping shorter 'penguins' names in this example:
colnames(penguins) <- long_nms
## End(Not run)