[R] multicolumn sort on dataframe?
Bambang Suryobroto
suryobroto at ipb.ac.id
Mon Mar 29 08:22:05 CEST 2004
Dear list,
I'm migrating to R and slowly learning it. I want to extend this multicolumn
sorting subject to counting the frequencies of duplicate rows.
The motivation is to count the number of individuals that share the same
haplotype in a population genetics study. A sample of the table (ex.dta) is as
follows:
IDNUM DYS19 DYS388 DYS390 DYS393 DYS394 DYS395
TG002 200 129 203 133 251 119
TG053 200 129 203 133 251 119
TG020 200 129 207 133 251 127
TG066 NA NA NA NA NA NA
TG104 200 129 203 133 251 119
TG018 NA NA 199 133 NA 119
TG060 200 129 203 133 251 119
TG058 NA NA NA 133 NA NA
TG009 200 129 203 133 251 119
TG106 200 129 211 137 251 123
Here is what I did:
> ex <- read.table( "ex.dta" , header=T, row.names=1 )
> one <- rep( 1,10 )
> aggregate( one , by=ex , sum )
DYS19 DYS388 DYS390 DYS393 DYS394 DYS395 x
1 200 129 203 133 251 119 5
2 200 129 211 137 251 123 1
3 200 129 207 133 251 127 1
and got exactly what I wanted. However, as the table grows larger, the
script takes longer to complete. For a 300x6 table, after about 10
minutes Windows complained that it was low on virtual memory and increased
the paging file while denying requests from other applications. Eventually R
crashed, leaving Windows crippled.
Did I miss something? Is there any way to do this other than the two-line
script above?
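One possible alternative to the aggregate() call, offered as a sketch rather than a verified fix: collapse each row into a single string key with paste() and count the keys with table(). The slowdown is plausibly caused by aggregate() tabulating every combination of the six grouping columns' values instead of only the observed ones; that is an assumption and has not been checked on R 1.8.1. The names complete, key, counts and haplo are illustrative only. The sample rows are inlined through the text= argument of read.table(), which needs a newer R than 1.8.1; with the real file, the read.table("ex.dta", ...) call from the post would take its place.

```r
# Sample rows from the post, inlined so the sketch runs without ex.dta.
ex <- read.table(header = TRUE, row.names = 1, text = "
IDNUM DYS19 DYS388 DYS390 DYS393 DYS394 DYS395
TG002 200 129 203 133 251 119
TG053 200 129 203 133 251 119
TG020 200 129 207 133 251 127
TG066 NA NA NA NA NA NA
TG104 200 129 203 133 251 119
TG018 NA NA 199 133 NA 119
TG060 200 129 203 133 251 119
TG058 NA NA NA 133 NA NA
TG009 200 129 203 133 251 119
TG106 200 129 211 137 251 123
")

# Keep only individuals typed at every locus; the aggregate() output in the
# post likewise contains no haplotype with a missing value.
complete <- ex[complete.cases(ex), ]

# One string key per row, built by pasting the six columns together.
key <- do.call(paste, complete)

# Frequency of each distinct key.
counts <- table(key)

# One representative row per haplotype, with its frequency in column x.
first <- !duplicated(key)
haplo <- complete[first, ]
haplo$x <- as.vector(counts[key[first]])
rownames(haplo) <- NULL
print(haplo)
```

The work here grows with the number of rows rather than with the number of possible value combinations, so a 300x6 table should be handled quickly. The rows come out in order of first appearance, not in the order aggregate() uses.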
Context:
R 1.8.1 on WinXP Pro
Rgui.exe --max-mem-size=400M
Celeron 1GHz, 256 MB ram, free harddisk space 3.3 GB
All best,
Bambang Suryobroto, D.Sc
Head, Laboratory of Zoology
Department of Biology
Faculty of Mathematics and Natural Sciences
Bogor Agricultural University
Jalan Pajajaran, Bogor 16143
INDONESIA
Tel: +62-251-328391
Fax: +62-251-345011