[R] read a xls file
Ian Jenkinson
ian.jenkinson at wanadoo.fr
Tue Jan 20 17:21:34 CET 2009
On 2009-01-20 15:24, Gavin Simpson wrote:
> On Tue, 2009-01-20 at 14:45 +0100, Ian Jenkinson wrote:
>
>>> See the R Import/Export manual. Also
>>> RSiteSearch("import excel")
>>> gives many hits. It seems as if this question is
>>> asked almost daily.
>>>
>>> On Sun, Jan 18, 2009 at 9:15 AM, Michele Santacatterina
>>> <miksanta at gmail.com> wrote:
>>>
>>>
>>>>> Hello,
>>>>>
>>>>> i have a xls file. I will read it in r, what library-command i use for
>>>>> this??
>>>>>
>>>>> any ideas??
>>>>>
>> I feel concerned because I have just spent a frustrating couple of days
>> trying to read an Excel (xls) file, with the aid of the R book (Crawley,
>> 2007), and R help files. I failed, but finally found a workaround. My
>> experience might help others.
>>
>
> You did read ?read.table yes?
>
Unfortunately not. Now I see that it is rich and detailed. Thank you.
> There are three arguments there that can help in such situations:
>
> 'colClasses' allows the finest grained control over how R imports your
> text files. You specify what each column is, noting that if you have
> lots of columns, things like
>
> c("numeric", rep("character", 12))
>
> will deal with runs of columns of the same type, without having to type
> them all by hand.
>
I'm a bit mystified by this.
> 'as.is' is a vector of logicals (TRUE/FALSE) that controls whether a
> column is read in as is or converted.
>
I have just tried your suggestion:
>
TTT<-read.table("/D/.../090117T_P.txt",header=TRUE,as.is="Log.microplank.biomass")
I CONFIRM IT WORKS! Sure enough elements in my "Log.microplank.biomass"
column are now "numeric":
> (TTT[5,7])
[1] 1.612784
> class(TTT[5,7])
[1] "numeric"
Indeed, I see that Crawley (2007) gives this example on p.100:
[>] murder<-read.table("c:\\temp\\murders.txt,header=T,as.is="region")
, but until now I hadn't understood what "region" meant, so I didn't see
how to use "as.is". Now I realise "region" is a header name in that
particular data.frame.
Sorry for being such a newby!
> 'stringsAsFactors' a single logical. Should all character variables be
> converted to factors.
>
I hadn't come across this. This would be useful if you wanted variables
as factors, but my problem was that I got "factors" when I wanted "numeric"
> Some of these would have been useful in your case.
>
> I'm not sure what you tried, but I have found that saving an .xls file
> as a CSV via OpenOffice.org (on Linux) and subsequently reading it in
> with read.csv("foo.csv", ...) to be reasonably fool proof, especially
> when one makes use of the arguments about for fine-grained processing.
>
I can appreciate this would probably have worked for me too, had I known
how to do the "fine-grained processing".
> Someone in this thread posted a response that included the use of RODBC,
> which I haven't tried, but there are a plethora of ways to read data
> from Excel without having to torture yourself and the data formats to do
> so.
>
I saw this about RODBC, but it seemed a complicated way of doing things,
and in any case it seems that to run RODBC you need Excel 2004 or
higher, which I would need to buy.
One beautiful thing about R and OOo is that they are excellent and they
cost nothing, so you don't have to buy, borrow or steal them.
> HTH
>
> G
>
>
Ian
>> My data were in an Excel xls file
>>
>> I have R (version 2.6.2) installed in Kubuntu Linux
>> I also have R (version 2.6.2) installed in Windows XP SP_3 running in
>> VirtualBox (a Virtual Computer) in Kubuntu, and I have (very old) Excel
>> 97 on this system.
>>
>> I wasted a lot of time exporting from Excel in various formats (txt,
>> csv, dif, tab-delimited, ;-delimited ,-delimited, etc.). (I checked they
>> were of correct format by peeking with a text editor.)
>> Then I would try reading using e.g. read.table("[file
>> path]",header=TRUE) or read.csv(...) or read.csv2(...), or
>> read.DIF(...), with or without "header=TRUE" or "header =FALSE".
>> I also copied to the "clipboard" and tried reading using
>> read.DIF("clipboard")
>> In many of these cases I did get a data.frame that looked nice on-screen.
>>
>> My recurrent problem, however, was that many of the numeric variables in
>> the resultant data frame were CLASS "factor". If you do arithmetic or
>> plotting on factors, either it fails or gives wrong results.
>>
>> So I spent hours using (as.numeric(...)) with variants and permutations,
>> etc. Most times (as.numeric(...)) seems to work, but actually the data
>> either remained unchanged (as a "factor") or gave "numeric" but wrong
>> numbers.
>>
>> I read the xls file using gnumeric application and saved as a dif file,
>> then used read.DIF("[file path]"). This gave some correct "numeric"
>> numbers but jumbled and partly duplicated.
>>
>> N.B. My problems were essentially the same whether I used R in XP or in
>> Linux (kubuntu)
>>
>> MY SOLUTION (working in Linux):
>> Read the Excel file (xls) using Open Office.org (version 2.4.1)
>> (downloadable for free for Linux or Windows).
>> Save as dif file.
>> In R, TT<-read.DIF("[file path]",header=TRUE)
>> It worked, and all my numerical data elements were "numeric", correct
>> and in the right order. Omit "header=TRUE" if you don't want the first
>> elements of the spreadsheet columns declared as headers.
>>
>> Hope this may help someone.
>>
>> Here's a subset of my data in a data.frame (environmental data on
>> plankton):
>>
>> >TT
>> Stn Day Mean.salinity Mean.temperature Secchi.disc.
>> Log.microplank.biomass
>> 1 1 12 0 14 0.7
>> 1.954242509
>> 2 1 70 13.5 16.55 0.3
>> 3.083860801
>> 3 1 93 13.45 16.85 0.6
>> 2.651278014
>> 4 1 153 6.78 14.2 0.5
>> 2.075546961
>> 5 1 200 0 9.3 0.7
>> 1.612783857
>> 6 1 231 0 7.1 0.8
>> 1.491361694
>> 7 1 283 0 8.8 0.4
>> 2.123851641
>> 8 1 330 4.95 9.45 0.3
>> 2.276461804
>> 9 1 370 16.6 12.3 0.4
>> 2.728353782
>> 10 3 12 16.25 11.95 0.55
>> 2.025305865
>> 11 3 70 22.35 16.1 0.5
>> 2.096910013
>> 12 3 93 26.05 17.15 1.5
>> 1.707570176
>> 13 3 153 23.4 14.2 1
>> 1.755874856
>> 14 3 200 14.05 8.6 0.4
>> 1.812913357
>> 15 3 231 7.9 6.3 0.3
>> 1.897627091
>> 16 3 283 11.2 7.25 0.7
>> 1.832508913
>> 17 3 330 19.95 8.1 0.5
>> 1.785329835
>> 18 3 370 24.35 11.5 0.4
>> 2.361727836
>> 19 4 12 18.1 12.05 0.6
>> 1.792391689
>> 20 4 70 24.35 15.9 0.7
>> 1.973127854
>> 21 4 93 27 17.35 1.3
>> 1.982271233
>> 22 4 153 25.8 14.2 0.8
>> 1.924279286
>> 23 4 200 16.2 9 0.4
>> 1.653212514
>> 24 4 231 11.5 6.85 0.4
>> 1.819543936
>> 25 4 283 10.95 8.2 0.25
>> 2.096910013
>> 26 4 330 19.7 8.45 0.4
>> 2.025305865
>> 27 4 370 25.6 11.5 0.5
>> 2.274157849
>>
>> Ian Jenkinson
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
--
Dr Ian R. Jenkinson
Agence de Conseil et de Recherche Océanographiques
Lavergne
19320 La Roche Canillac
France
+33 555 29 19 48
+33 555 29 19 82 (fax)
+33 608 89 13 62 (mobile)
ian.jenkinson at wanadoo.fr
http://assoc.orange.fr/acro/
More information about the R-help
mailing list