[R] File coding problem: how to read a windows-1252 encoded file

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue May 13 16:03:23 CEST 2014


On 13/05/2014 14:35, Bob O'Hara wrote:
> I'm trying to read a text file (actually the ftp file in command below),
> and I'm getting an error:
>
>> SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
> +                  widths=c(7,6,51,51), skip=6, n=5, header=F, stringsAsFactors=F)
> Error in substring(x, first, last) :
>    invalid multibyte string at '<e0> vent'
>
> The problem is caused by "Dendrocygne à ventre noir", which contains a French
> character that seems to be causing the error: there are more throughout
> the file (and I want to read the whole file: I'm picking out bits above to
> make it easier), so I can't manually delete this. The file is apparently in
> the ISO-8859 format (or it might be windows-1252), but using that in either
> encoding= or fileEncoding= doesn't work:

Why do you expect them to?  It is read.fwf that reads the file (not
read.table), and read.fwf does not have those arguments.  You need to
give it a file/url connection with the encoding specified.

 > con <- url("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
 +            encoding = "cp1252")
 > read.fwf(con, widths=c(7,6,51,51), skip=6, n=5, header=F)
 > close(con)
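
For anyone who wants to try the connection approach without the ftp site,
here is a minimal offline sketch.  The two records, the widths c(6, 25) and
the names tmp, rec1, rec2 and sp are invented for illustration and are not
taken from SpeciesList.txt.  It assumes a UTF-8 (or other non-ASCII)
session locale and an iconv that knows the name "cp1252".  The connection
is opened explicitly so that read.fwf leaves closing it to the caller.

```r
# Build a small fixed-width file whose bytes are Windows-1252, not UTF-8.
# Byte 0xE0 is "a with grave accent" in Windows-1252 (and in ISO-8859-1).
tmp  <- tempfile(fileext = ".txt")
rec1 <- c(charToRaw("01770 Dendrocygne "), as.raw(0xE0),
          charToRaw(" ventre noir"))
rec2 <- charToRaw("01780 Dendrocygne fauve")
nl   <- charToRaw("\n")
writeBin(c(rec1, nl, rec2, nl), tmp)

# Declare the encoding on the connection; text read through it is
# re-encoded to the session's native encoding before read.fwf sees it.
con <- file(tmp, open = "rt", encoding = "cp1252")
sp  <- read.fwf(con, widths = c(6, 25),
                colClasses = "character", strip.white = TRUE)
close(con)      # we opened the connection, so we close it
unlink(tmp)

print(sp)
```

The same pattern should apply to a url() connection, as in the commands
above; only the connection constructor changes.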

>
> SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
>                   widths=c(7,6,51,51), skip=6, n=5, header=F,
>                   stringsAsFactors=F, fileEncoding="ISO-8859")

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


