[R] Reading in a table with ISO-latin1 encoding in MacOS-X (Intel)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Jun 8 16:17:23 CEST 2006
You are using this as intended, although your email message came in latin9
not latin1, which does not affect your examples. Have you actually
checked (e.g. via a hex dump) that the file is in latin1?
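A minimal sketch of such a hex-dump check done from within R, assuming only base R. The file names are temporary and the sample word is illustrative. It writes "miettiä" once as latin1 bytes and once as UTF-8 bytes, then prints each file's bytes. A latin1 "ä" shows up as the single byte E4, while UTF-8 uses the pair C3 A4.

```r
# Write the same word once as latin1 bytes and once as UTF-8 bytes.
latin1_file <- tempfile()
writeBin(c(charToRaw("mietti"), as.raw(0xe4)), latin1_file)
utf8_file <- tempfile()
writeBin(charToRaw(enc2utf8("mietti\u00e4")), utf8_file)

# Hex dump: read every byte of a file and return it as an integer.
dump_bytes <- function(path) {
  as.integer(readBin(path, what = "raw", n = file.info(path)$size))
}

sprintf("%02X", dump_bytes(latin1_file))  # ends in E4: one byte for "ä"
sprintf("%02X", dump_bytes(utf8_file))    # ends in C3 A4: two bytes for "ä"
```

Running `dump_bytes()` on the real data file and looking at the bytes around a Scandinavian letter shows which of the two encodings it actually uses.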
I assume that if you converted the file to UTF-8 you then used
read.table("R_data/hs+sfnet.T.060505.tbl4", header=TRUE)
If so, you need to investigate the locale in use, as which letters are
valid depends on the locale: on Linux UTF-8 locales all letters in all
languages are valid in R names, but that is not necessarily the MacOS
interpretation. (Invalid characters in names will be converted to ., and
if the locale is wrong so may be the interpretation of bytes as
characters.)
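As a rough sketch of how to investigate the locale from the R prompt, using only base R functions: the outputs depend entirely on the system, so none are shown. `make.names()` is the function `read.table()` uses to adjust names when `check.names = TRUE`, so calling it directly shows what would happen to a header under the current locale.

```r
# Character-type locale of this session (output is system dependent).
Sys.getlocale("LC_CTYPE")

# Is the session running in a UTF-8 or a Latin-1 locale?
info <- l10n_info()
info[["UTF-8"]]
info[["Latin-1"]]

# How the header would be sanitised under the current locale:
# characters not valid in names here come back as ".".
make.names("mietti\u00e4")
```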
You might find more informed help on the r-sig-mac list.
On Thu, 8 Jun 2006, Antti Arppe wrote:
> Dear colleagues in R,
>
> I have earlier been working with R in Linux, where reading in a table
> containing Scandinavian letters ("ä", "ö", and "å") in the header as part of
> variable names has not caused any problem whatsoever.
>
> However, doing the same in R running on the new MacOS-X (with an
> Intel processor) with the same original text table does not seem to work
> whichever way I try. Following the recommendations on the R site and using
> the 'file' function to set the encoding breaks down at the first encounter
> with a Scandinavian character:
>
> THINK <- read.table(file("R_data/hs+sfnet.T.060505.tbl4",
> encoding="latin1"),header=TRUE)
> Warning messages:
> 1: invalid input found on input connection 'R_data/hs+sfnet.T.060505.tbl4'
> 2: incomplete final line found by readTableHeader on
> 'R_data/hs+sfnet.T.060505.tbl4'
>
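A self-contained sketch of the same `file(..., encoding = "latin1")` call on a tiny sample. The temporary file and its two data rows are invented for illustration. The connection re-encodes from latin1 to the session's native encoding while reading, so the result depends on the locale: in a UTF-8 locale the header should come through intact, while in a locale that cannot represent "ä" the re-encoding may fail. `check.names = FALSE` is added here (an assumption, not part of the original call) so the header is not passed through `make.names()`.

```r
# Build a small latin1-encoded table like the sample in the question.
# The byte 0xE4 is "ä" in latin1; everything else is plain ASCII.
tmp <- tempfile(fileext = ".tbl")
header <- c(charToRaw("ajatella mietti"), as.raw(0xe4), charToRaw(" pohtia\n"))
rows <- charToRaw("1 FALSE FALSE TRUE\n2 FALSE FALSE FALSE\n")
writeBin(c(header, rows), tmp)

# Re-encode latin1 -> native encoding while reading. read.table() opens
# and closes the unopened connection itself. The header has one field
# fewer than the data rows, so the first column becomes the row names.
con <- file(tmp, encoding = "latin1")
d <- read.table(con, header = TRUE, check.names = FALSE)
names(d)
```

Later versions of R also accept `fileEncoding = "latin1"` directly in `read.table()`, which has the same re-encoding effect without building the connection by hand.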
> A sample exemplifying such characters as variable labels is below (for which
> the behavior of R on the Mac is the same as for the larger file referred to
> above):
>
> ajatella miettiä pohtia
> 1 FALSE FALSE TRUE
> 2 FALSE FALSE FALSE
> 3 FALSE TRUE FALSE
> 4 FALSE TRUE FALSE
> 5 TRUE FALSE FALSE
> 6 TRUE FALSE FALSE
> 7 FALSE FALSE FALSE
> 8 FALSE TRUE FALSE
> 9 FALSE TRUE FALSE
> 10 FALSE FALSE FALSE
>
> Converting the file from ISO-latin-1 to UTF-8 (with Mac's TextEdit
> application) allows the file to be read in its entirety, but the
> Scandinavian character in the heading is still coerced to a period '.', or
> two, in fact (i.e. 'miettiä' -> 'miett..').
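One plausible reason for getting two periods rather than one, offered as a hypothesis rather than a diagnosis: after conversion to UTF-8, "ä" occupies two bytes, so if the locale interprets the bytes one at a time, each is an invalid name character and each becomes ".". A minimal base-R sketch of the byte counts, doing the latin1-to-UTF-8 conversion with `iconv()` instead of TextEdit:

```r
# "miettiä" as latin1 bytes: the final "ä" is the single byte 0xE4.
latin1_bytes <- c(charToRaw("mietti"), as.raw(0xe4))

# Convert the raw bytes from latin1 to UTF-8.
utf8_bytes <- iconv(list(latin1_bytes), from = "latin1", to = "UTF-8",
                    toRaw = TRUE)[[1]]

length(latin1_bytes)  # 7
length(utf8_bytes)    # 8: "ä" is now two bytes
utf8_bytes[7:8]       # c3 a4
```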
>
> Have I possibly misunderstood how the 'file' function should be used in
> conjunction with 'read.table', or might the problem with the latin1-to-UTF-8
> conversion be somewhere else?
>
> Appreciating any help on this matter,
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595