[R] File coding problem: how to read a windows-1252 encoded file
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue May 13 16:03:23 CEST 2014
On 13/05/2014 14:35, Bob O'Hara wrote:
> I'm trying to read a text file (actually the ftp file in command below),
> and I'm getting an error:
>
>> SpCodes=read.fwf("
> ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
> + widths=c(7,6,51,51), skip=6, n=5, header=F,
> stringsAsFactors=F)
> Error in substring(x, first, last) :
> invalid multibyte string at '<e0> vent'
>
> The problem is caused by "Dendrocygne à ventre noir", which has a French
> character which seems to be causing the problems: there are more throughout
> the file (and I want to read the whole file: I'm picking out bits above to
> make it easier), so I can't manually delete this. The file is apparently in
> the ISO-8859 format (or it might be windows-1252), but using that in either
> encoding= or fileEncoding= doesn't work:
Why do you expect them to? read.fwf, not read.table, reads the file, and
it does not have those arguments. You need to give it a file or url
connection with the encoding specified.
con <- url("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
           encoding = "cp1252")
read.fwf(con, widths = c(7, 6, 51, 51), skip = 6, n = 5, header = FALSE)
close(con)
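
The same idea can be tried without the FTP server. What follows is only a sketch under stated assumptions: the two records, the widths c(5, 25) and the names tmp and sp are made up for illustration and are not taken from SpeciesList.txt. It writes the byte 0xE0 (the "à" of cp1252, and the byte named in the error message) into a temporary file, then reads it back through a file() connection that declares the encoding.

```r
# Hypothetical two-record fixed-width file: 5-character code, 25-character name.
# Byte 0xE0 is "à" in windows-1252, so the file is written as raw bytes
# to make sure it really is cp1252 on disk, whatever the current locale.
rec1 <- c(charToRaw("01770Dendrocygne "), as.raw(0xe0),
          charToRaw(" ventre noir\n"))
rec2 <- charToRaw("01780Dendrocygne fauve\n")

tmp <- tempfile(fileext = ".txt")
writeBin(c(rec1, rec2), tmp)

# Declare the encoding on the connection, not on read.fwf().
# The connection is passed unopened; read.fwf() opens it in text mode.
con <- file(tmp, encoding = "cp1252")
sp <- read.fwf(con, widths = c(5, 25), header = FALSE,
               colClasses = "character")  # keep the leading zero in the code

print(sp)
unlink(tmp)
```

In a locale that can represent "à" (UTF-8, for example) the first name should come back intact rather than triggering the "invalid multibyte string" error. Replacing file(tmp, ...) with url(..., encoding = "cp1252") gives the form shown above for the remote file.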
>
> SpCodes=read.fwf("
> ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
> widths=c(7,6,51,51), skip=6, n=5, header=F,
> stringsAsFactors=F, fileEncoding="ISO-8859")
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595