[Bioc-devel] Encoding issues with citations (Windows only)

Leonardo Collado Torres lcollado at jhu.edu
Tue Oct 3 01:43:46 CEST 2017


I've been using knitcitations for a while to handle citations in HTML
vignettes. I had been using knitcitations::read.bibtex() until I
realized that it no longer reads the entries in the order they were
given in the bib file**. So I made a change and it all works... except
on Windows. I finally updated my R installation on a Windows laptop
and saw that the problem is with encoding.

This short code reproduces the issue:

## Load package
library('knitcitations')

## Tries to cite, prints package name and error when it fails
check_bib <- function() {
    xx <- sapply(bib, function(x) {
        tryCatch(citep(x), error = function(e) {
            message(paste('found an error attempting to cite', names(x)))
            message(conditionMessage(e))
        })
    })
}

## list of citations
bib <- c(knitcitations = citation('knitcitations'),
    IRanges = citation('IRanges'),
    S4Vectors = citation('S4Vectors'))

## Run the check
check_bib()

## Error message:
Error in nchar(aut) : invalid multibyte string, element 1

## Entry that fails
> bib[['S4Vectors']]
Pag<U+653C><U+3E38>s H, Lawrence M and Aboyoun P (2017). _S4Vectors:
S4 implementation of vector-like and list-like objects_. R package
version 0.15.10.

I see that knitcitations::write.bibtex() uses a "?" in authors in
situations like this which is why I didn't notice this issue before.
From https://cran.r-project.org/doc/manuals/R-exts.html#The-DESCRIPTION-file
I see that 'Encoding' in the DESCRIPTION file is used for the citation
and I do see "Encoding: UTF-8" in the S4Vectors DESCRIPTION file.

I get this error with GenomeInfoDb, AnnotationDbi, S4Vectors and
SummarizedExperiment (details and reproducibility info at
across the different vignettes I maintain. But I don't get it with
IRanges, GenomicRanges and other packages where Hervé Pagès is an
author (those packages cite the 2013 PLoS paper). For example, the
IRanges package has an inst/CITATION file that uses citEntry(    ,
textVersion = "Pag\\es"). So, specifying an inst/CITATION file works.

> citep(bib[['IRanges']])
[1] "(Lawrence, Huber, Pagès, et al., 2013)"
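
For packages without a citation file, something along these lines could
go in inst/CITATION. This is a minimal sketch, not the IRanges file: the
package name, title, second author and version are placeholders, and the
\u escapes are just one way to keep the file itself ASCII-only, which may
sidestep differences in how the file is read on each platform.

```r
## Sketch of an inst/CITATION entry; all field values are placeholders
entry <- bibentry(
    bibtype = "Manual",
    title = "examplepkg: a hypothetical package",
    author = c(person("Herv\u00e9", "Pag\u00e8s"),
        person("Jane", "Doe")),
    year = "2017",
    note = "R package version 0.0.1"
)

## Check that the accented family name is a valid UTF-8 string
validUTF8(entry$author[1]$family)
```

Whether citep() then handles the entry on Windows would still need to be
checked with the real package.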

I imagine that there is a way to deal with the encoding problem
properly but I haven't been able to find it. If you have ideas on how
I can fix this please let me know.
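
One way to narrow this down is to check whether an author string is valid
UTF-8 and, if not, re-encode it. The sketch below is only an illustration
under an assumption: that the failing name holds a Latin-1 byte (0xE8)
that is not marked as Latin-1, which would be consistent with the nchar()
error above. The string x is made up for the example and is not taken
from the S4Vectors citation object.

```r
## Hypothetical stand-in for the failing author name: "Pagès" stored as
## Latin-1 bytes with no encoding mark (an assumption, not the real object)
x <- "Pag\xe8s"

## Check whether the bytes form valid UTF-8; invalid input of this kind
## is what makes nchar() fail in a UTF-8 session
validUTF8(x)

## Re-encode from the assumed source encoding to UTF-8
y <- iconv(x, from = "latin1", to = "UTF-8")

## The converted string should now be valid and countable
validUTF8(y)
nchar(y)
```

If the real entries behave the same way, applying a conversion like this
to the author fields before calling citep() may avoid the error, though
that has not been verified here.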


PS I posted this information at
https://github.com/cboettig/knitcitations/issues/103 as well.

** As you can see below read.bibtex() changes the order of the
citations, so I can't cite them later using citep().

> write.bibtex(bib, file = 'test.bib')
Writing 3 Bibtex entries ... OK
Results written to file 'test.bib'

## test.bib contents
  title = {knitcitations: Citations for 'Knitr' Markdown Files},
  author = {Carl Boettiger},
  year = {2017},
  note = {R package version 1.0.8},
  url = {https://CRAN.R-project.org/package=knitcitations},

  title = {Software for Computing and Annotating Genomic Ranges},
  author = {Michael Lawrence and Wolfgang Huber and Herv\'e Pag\`es
and Patrick Aboyoun and Marc Carlson and Robert Gentleman and Martin
Morgan and Vincent Carey},
  year = {2013},
  journal = {{PLoS} Computational Biology},
  volume = {9},
  issue = {8},
  doi = {10.1371/journal.pcbi.1003118},
  url = {http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118},

  title = {S4Vectors: S4 implementation of vector-like and list-like objects},
  author = {?},
  year = {2017},
  note = {R package version 0.15.10},

## read.bibtex() changes the order

> read.bibtex('test.bib')
[1] ? _S4Vectors: S4 implementation of vector-like and list-like
objects_. R package version 0.15.10. 2017.

[2] C. Boettiger. _knitcitations: Citations for 'Knitr' Markdown
Files_. R package version 1.0.8. 2017. <URL:

[3] M. Lawrence, W. Huber, H. Pagès, et al. “Software for Computing
and Annotating Genomic Ranges”. In: _PLoS Computational Biology_ 9 (8
2013). DOI:
10.1371/journal.pcbi.1003118. <URL:
