[R] Duplicated genes

Mon Sep 9 21:30:27 CEST 2013

Hi,

Maybe you can try this:
# Rows for genes that occur exactly once
dat1New <- dat1[!(duplicated(dat1$gene) | duplicated(dat1$gene, fromLast=TRUE)), ]
# All rows for genes that occur more than once
dat2 <- dat1[duplicated(dat1$gene) | duplicated(dat1$gene, fromLast=TRUE), ]
lst1 <- split(dat2, dat2$gene)
# Assuming no gene has more than 2 copies (none did in this dataset): keep row 1 if it is >= row 2 in more of columns 6:32 than it is <= row 2, else keep row 2
dat3 <- unsplit(lapply(lst1, function(x) {
  x1 <- sum(apply(x[, 6:32], 2, function(y) y[1] >= y[2]))
  x2 <- sum(apply(x[, 6:32], 2, function(y) y[1] <= y[2]))
  if (x1 > x2) x[1, ] else x[2, ]
}), unique(dat2$gene))
dat4 <- rbind(dat1New, dat3)
# Restore the original row order
dat5 <- dat4[order(as.numeric(row.names(dat4))), ]
dim(dat5)
#[1] 639  32
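
If some gene ever has more than two copies, the pairwise comparison above no longer applies. Here is a minimal sketch of one alternative, on made-up toy data: it scores each row by its mean absolute value across the expression columns and keeps the highest-scoring row per gene. That score is only my assumption about what "most differentially expressed" means, and the names dat, s1, s2, exprCols and score are invented for the example, so please adapt them to your data.

```r
# Toy data; gene names, column names and values are made up for illustration.
dat <- data.frame(
  gene = c("A", "B", "A", "C", "B", "B"),
  s1   = c( 1.0, -0.5, -3.0, 2.0,  0.2, -4.0),
  s2   = c( 0.5,  0.4, -2.5, 1.0, -0.1, -3.5),
  stringsAsFactors = FALSE
)

# TRUE for every row whose gene occurs more than once
# (all copies are flagged, not only the second and later ones)
isDup <- duplicated(dat$gene) | duplicated(dat$gene, fromLast = TRUE)

# Assumed score: mean absolute value across the expression columns
exprCols <- c("s1", "s2")
score <- rowMeans(abs(dat[, exprCols]), na.rm = TRUE)

# For each gene, the index of the row with the highest score
# (works for any number of copies, including genes that occur once)
keep <- tapply(seq_len(nrow(dat)), dat$gene,
               function(i) i[which.max(score[i])])

# One row per gene, in the original row order
res <- dat[sort(as.vector(keep)), ]
res
```

On the toy data this should leave exactly one row per gene. With your data, exprCols would be the names of columns 6:32 of dat1.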

A.K.

________________________________
From: Vivek Das <vd4mmind at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Monday, September 9, 2013 2:30 PM
Subject: Re: Duplicated genes

Actually, these are all differentially expressed genes, so for each duplicated gene the copy that is most differentially expressed should stay in the list and the other copy should be removed. Could you show me how to do that? I think the script will need to change then, right?

----------------------------------------------------------

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek.das at ieo.eu
            vchris_05 at yahoo.co.in
            vd4mmind at gmail.com

On Mon, Sep 9, 2013 at 8:27 PM, arun <smartpink111 at yahoo.com> wrote:

>Hi,
>Try:
>dat1<- read.table("DEGs_all.txt",sep="",header=TRUE,stringsAsFactors=FALSE)
>dim(dat1)
>#[1] 725  32
>length(unique(dat1$gene))
>#[1] 639
> dat2<-dat1[!duplicated(dat1$gene),]
> dim(dat2)
>#[1] 639  32
>
>dim(unique(dat1))
>#[1] 725  32
>
>The duplicated genes have different expression values, and you didn't say which row to keep for each of them.  Here, the first row of every duplicated gene is kept and the others are removed.
>
>But suppose you want the mean values of those rows instead:
>library(plyr)
> res<-ddply(dat1[,c(1,6:32)],.(gene), numcolwise(mean,na.rm=TRUE))
>dim(res)
>#[1] 639  28
>
>A.K.
>
>________________________________
>From: Vivek Das <vd4mmind at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Monday, September 9, 2013 1:35 PM
>Subject: Urgent help
>
>
>
>I have a data list with genes, and I want to reduce it to its unique genes. The genes have expression values, but some of the genes are duplicated. Is there any way to remove the duplicate names from the list so that each gene appears only once with its corresponding values? Please see the attached matrix.
>
>It would be nice if you could let me know. It's a bit urgent.