[Bioc-devel] Encoding issues with citations (Windows only)
Leonardo Collado Torres
lcollado at jhu.edu
Tue Oct 3 01:43:46 CEST 2017
Hi,
I've been using knitcitations for a while to handle citations in HTML
vignettes. I had been using knitcitations::read.bibtex() until I
realized that it no longer reads the entries in the order in which they
were given in the bib file**. So I made a change, and it all works...
except on Windows. I finally updated my R installation on a Windows
laptop and saw that the problem is an encoding issue.
This short script reproduces the issue:
## Load package
library('knitcitations')
## Tries to cite, prints package name and error when it fails
check_bib <- function() {
    xx <- sapply(bib, function(x) {
        tryCatch(citep(x), error = function(e) {
            message(paste('found an error attempting to cite', names(x)))
            print(e)
        })
    })
}
## list of citations
bib <- c(knitcitations = citation('knitcitations'),
IRanges = citation('IRanges'),
S4Vectors = citation('S4Vectors'))
check_bib()
citep(bib[['S4Vectors']])
## Error message:
Error in nchar(aut) : invalid multibyte string, element 1
## Entry that fails
> bib[['S4Vectors']]
Pag<U+653C><U+3E38>s H, Lawrence M and Aboyoun P (2017). _S4Vectors:
S4 implementation of vector-like and list-like objects_. R package
version 0.15.10.
I see that knitcitations::write.bibtex() writes "?" as the author in
situations like this, which is why I hadn't noticed this issue before.
From https://cran.r-project.org/doc/manuals/R-exts.html#The-DESCRIPTION-file
I see that the 'Encoding' field in the DESCRIPTION file is used for the
citation, and I do see "Encoding: UTF-8" in the S4Vectors DESCRIPTION file.
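As a starting point for debugging, here is a minimal base R sketch of how to inspect and declare the encoding of a string. It rests on an unconfirmed assumption: that the failing author strings hold UTF-8 bytes that R has not marked as UTF-8. The bytes below are simply "Pagès" spelled out in UTF-8, built with rawToChar() so that R leaves them unmarked; the variable name `aut` is hypothetical and only mirrors the nchar(aut) call in the error message.

```r
## UTF-8 bytes for "Pagès" (0xc3 0xa8 is "è"). rawToChar() leaves the
## result unmarked, i.e. Encoding() reports "unknown" (native).
aut <- rawToChar(as.raw(c(0x50, 0x61, 0x67, 0xc3, 0xa8, 0x73)))
Encoding(aut)  # "unknown"

## The bytes are valid UTF-8, so declaring that encoding is safe.
validUTF8(aut)  # TRUE
Encoding(aut) <- "UTF-8"

## Once marked, R counts characters as UTF-8 regardless of the locale.
nchar(aut)                  # 5 characters ...
nchar(aut, type = "bytes")  # ... stored in 6 bytes
identical(aut, "Pag\u00e8s")  # TRUE
```

Whether knitcitations actually hands such unmarked strings to nchar() on Windows is not established here; the sketch only shows the base R tools (Encoding(), validUTF8(), nchar()) one could use to check.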
Across the different vignettes I maintain, I get this error with
GenomeInfoDb, AnnotationDbi, S4Vectors and SummarizedExperiment (details
and reproducibility info at
https://gist.github.com/anonymous/a8c6374b381dc9c27f55487756cb4e1b).
But I don't get it with IRanges, GenomicRanges and other packages where
Hervé Pagès is an author (those packages cite the 2013 PLoS paper). For
example, the IRanges package has an inst/CITATION file that uses
citEntry(..., textVersion = "Pag\\es"). So specifying an inst/CITATION
file works.
> citep(bib[['IRanges']])
[1] "(Lawrence, Huber, Pagès, et al., 2013)"
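For illustration only, here is a hypothetical sketch of a CITATION-style entry for S4Vectors written in the same spirit as the IRanges one: the accented name is spelled with LaTeX accent escapes, so every stored string stays plain ASCII. It uses base R's utils::bibentry() and person(); the field values come from the S4Vectors entry quoted above, the key is the one write.bibtex() generated, and whether this is the right fix for S4Vectors is an assumption for its maintainers to judge.

```r
library(utils)

## Hypothetical CITATION-style entry: "Hervé Pagès" is written with LaTeX
## accent escapes (\' and \`), so no non-ASCII bytes are stored.
s4v <- bibentry(
  bibtype = "Manual",
  key     = "pags2017s4vectors",
  title   = "S4Vectors: S4 implementation of vector-like and list-like objects",
  author  = c(person("Herv\\'e", "Pag\\`es"),
              person("Michael", "Lawrence"),
              person("Patrick", "Aboyoun")),
  year    = "2017",
  note    = "R package version 0.15.10"
)

## The BibTeX rendering keeps the escapes verbatim.
bib_lines <- toBibtex(s4v)
print(bib_lines)
```

Passing `s4v` to knitcitations::citep() would show whether the escaped form renders as "Pagès" in the vignette; that has not been checked here.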
I imagine that there is a way to deal with the encoding problem
properly, but I haven't been able to find it. If you have any ideas on
how I can fix this, please let me know.
Thanks!
Leo
PS: I also posted this information at
https://github.com/cboettig/knitcitations/issues/103.
** As you can see below, read.bibtex() changes the order of the
citations, so I can't cite them later using citep().
> write.bibtex(bib, file = 'test.bib')
Writing 3 Bibtex entries ... OK
Results written to file 'test.bib'
## test.bib contents
@Manual{boettiger2017knitcitations,
title = {knitcitations: Citations for 'Knitr' Markdown Files},
author = {Carl Boettiger},
year = {2017},
note = {R package version 1.0.8},
url = {https://CRAN.R-project.org/package=knitcitations},
}
@Article{lawrence2013software,
title = {Software for Computing and Annotating Genomic Ranges},
author = {Michael Lawrence and Wolfgang Huber and Herv\'e Pag\`es
and Patrick Aboyoun and Marc Carlson and Robert Gentleman and Martin
Morgan and Vincent Carey},
year = {2013},
journal = {{PLoS} Computational Biology},
volume = {9},
issue = {8},
doi = {10.1371/journal.pcbi.1003118},
url = {http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118},
}
@Manual{pags2017s4vectors,
title = {S4Vectors: S4 implementation of vector-like and list-like objects},
author = {?},
year = {2017},
note = {R package version 0.15.10},
}
## read.bibtex() changes the order
> read.bibtex('test.bib')
[1] ? _S4Vectors: S4 implementation of vector-like and list-like
objects_. R package version 0.15.10. 2017.
[2] C. Boettiger. _knitcitations: Citations for 'Knitr' Markdown
Files_. R package version 1.0.8. 2017. <URL:
https://CRAN.R-project.org/package=knitcitations>.
[3] M. Lawrence, W. Huber, H. Pagès, et al. “Software for Computing
and Annotating Genomic Ranges”. In: _PLoS Computational Biology_ 9 (8
2013). DOI:
10.1371/journal.pcbi.1003118. <URL:
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118}.>
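One possible workaround for the ordering problem is to remember the keys in the order they were written and reorder after reading. The sketch below does this with base R's utils::bibentry() objects standing in for the test.bib entries (abbreviated, and listed in the order read.bibtex() printed them). Whether the object returned by read.bibtex() supports the same `$key` and `[` indexing is an assumption that has not been checked here.

```r
library(utils)

## Abbreviated stand-ins for the three test.bib entries, in the order
## read.bibtex() returned them.
read_back <- c(
  bibentry("Manual", key = "pags2017s4vectors",
           title = "S4Vectors", year = "2017"),
  bibentry("Manual", key = "boettiger2017knitcitations",
           title = "knitcitations", author = "Carl Boettiger", year = "2017"),
  bibentry("Article", key = "lawrence2013software",
           title = "Software for Computing and Annotating Genomic Ranges",
           author = "Michael Lawrence and Wolfgang Huber",
           journal = "PLoS Computational Biology", year = "2013")
)

## Keys in the order the entries were written to test.bib.
wanted <- c("boettiger2017knitcitations", "lawrence2013software",
            "pags2017s4vectors")

## For several entries, $key returns a list holding one key per entry.
got <- unlist(read_back$key)

## match() gives, for each wanted key, its position in the read order.
restored <- read_back[match(wanted, got)]
print(unlist(restored$key))
```

Keeping the wanted key order in one place makes the result independent of how the reader happens to sort the entries.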