[BioC] Selecting Unique rows in multiple column data frames
Jenny Drnevich
drnevich at uiuc.edu
Mon Nov 6 16:48:47 CET 2006
Hi Matjaz,
For option 2, if your data frame is called 'mydata' just do:
mydata.unique <- mydata[ !duplicated(mydata$ID), ]
This will pull out the first instance of each ID, along with its M value.
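As a minimal sketch of how that line behaves, here it is run on a small made-up data frame in the same ID/M layout as the question. The values are a subset of the poster's illustrative rows, and the name 'mydata.mean' is only an assumption for illustration. The last step shows one possible way to do option 1 (averaging) with aggregate(), for comparison.

```r
# Toy data in the same two-column layout as the question (not real data)
mydata <- data.frame(
  ID = c("ID1", "ID2", "ID4", "ID4", "ID4", "ID4", "ID5"),
  M  = c(-4.60138, -3.28832, 6.45286, 6.65235, 6.38745, 6.74514, 4.43995),
  stringsAsFactors = FALSE
)

# duplicated() returns TRUE for the 2nd and later occurrences of each ID,
# so negating it keeps only the first row seen for every ID
mydata.unique <- mydata[!duplicated(mydata$ID), ]

# Option 1 for comparison: mean M value per ID.
# 'mydata.mean' is a hypothetical name chosen for this sketch.
mydata.mean <- aggregate(mydata["M"], by = list(ID = mydata$ID), FUN = mean)

print(mydata.unique)
print(mydata.mean)
```

Note that the duplicated() approach keeps whichever replicate happens to come first in the data frame, so the result depends on row order, whereas the averaged version does not.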
Cheers,
Jenny
At 04:32 AM 11/6/2006, alex lam (RI) wrote:
>Hi Matjaz,
>For option 1, have a look at the help page of the method "aggregate".
>
>I don't understand your option 2. Perhaps I am misreading what you are
>saying.
>If you want to select unique rows according to column 1 and 2, you can
>create a third column by joining col1 and 2
>
>Col3 <- paste(YourData$ID, YourData$M, sep="_")
>Index <- !duplicated(Col3)   # TRUE for the first row of each ID/M pair
>YourData[Index,]
>
>But I can't see that any replicates would have identical M values.
>
>Cheers,
>Alex
>
>------------------------------------
>Alex Lam
>PhD student
>Department of Genetics and Genomics
>Roslin Institute (Edinburgh)
>Roslin
>Midlothian EH25 9PS
>Great Britain
>
>Phone +44 131 5274471
>Web http://www.roslin.ac.uk
>
>
>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch
>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Matjaž Hren
>Sent: 06 November 2006 09:06
>To: Bioconductor
>Subject: [BioC] Selecting Unique rows in multiple column data frames
>
>Dear list!
>
>
>
>I have data frames with 2 columns of normalised microarray data (more than
>10k rows, custom-made array) with the following layout (not real data):
>
>ID     M
>ID1    -4.60138
>ID2    -3.28832
>ID3     4.83560
>ID4     6.45286
>ID4     6.65235
>ID4     6.38745
>ID4     6.74514
>ID5     4.43995
>ID6    -1.78943
>ID7    -4.00257
>ID8    -4.46327
>ID9    -3.13956
>ID10    2.52233
>ID11   -1.81214
>ID11   -1.78625
>ID11   -1.61214
>ID11   -1.52354
>
>ID is the oligo ID (spot-ID), M is the corresponding M-value.
>
>
>
>Only one spot per block is present in replicates (4). Therefore I would
>like to use one of the following 2 options:
>
>
>
>1. Average the M-values in rows that have the same ID and extract the data
>table with both columns.
>
>2. or if the first option does not work: Extract the rows with unique ID
>(both columns) and remove the replicates. I tried using "unique" on ID
>column but I couldn't extend its use to more than one column in the data frame.
>
>
>
>I used R 2.4.0 and limma package for normalisation.
>
>
>
>
>
>Thank you in advance,
>
>
>
>
>
>Matjaz
>
>
>
>----------------------------------------------------------------------------
>
>Matjaz Hren
>
>
>
>National Institute of Biology
>
>Department of Plant Physiology and Biotechnology
>
>SLOVENIA
>
>----------------------------------------------------------------------------
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu