[R] Duplicated genes
arun
smartpink111 at yahoo.com
Mon Sep 9 21:30:27 CEST 2013
Hi,
Maybe you can try this:
dat1New<- dat1[!(duplicated(dat1$gene)|duplicated(dat1$gene,fromLast=TRUE)),]
dat2<-dat1[duplicated(dat1$gene)|duplicated(dat1$gene,fromLast=TRUE),]
lst1<-split(dat2,dat2$gene)
dat3 <- unsplit(lapply(lst1, function(x) {
  # x1: number of columns where row 1 >= row 2; x2: number where row 1 <= row 2
  x1 <- sum(apply(x[, 6:32], 2, function(y) y[1] >= y[2]))
  x2 <- sum(apply(x[, 6:32], 2, function(y) y[1] <= y[2]))
  # keep the row that is larger in more columns (row 2 is kept on a tie)
  if (x1 > x2) x[1, ] else x[2, ]
}), unique(dat2$gene))
# Assumes that no gene has more than 2 copies (none did in this dataset).
dat4<-rbind(dat1New,dat3)
dat5<-dat4[order(as.numeric(row.names(dat4))),]
dim(dat5)
#[1] 639 32
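
The pairwise comparison above relies on each duplicated gene having exactly two rows. As a rough sketch of one way to handle any number of copies, the example below scores each row by its mean absolute value across the expression columns and keeps the highest-scoring row per gene. The scoring rule, the toy data frame `toy`, and its columns `e1` to `e3` are assumptions made for illustration; they are not part of the original dataset, and "most differentially expressed" may need a different measure for real data.

```r
# Toy data: 'gene' plus three hypothetical expression columns (e1..e3).
toy <- data.frame(
  gene = c("A", "B", "B", "C", "C", "C"),
  e1 = c(1.0, 2.0, -3.0, 0.5, 0.2, 4.0),
  e2 = c(1.5, 1.0, -2.5, 0.4, 0.1, 3.0),
  e3 = c(0.5, 2.5, -4.0, 0.3, 0.6, 5.0),
  stringsAsFactors = FALSE
)
expr_cols <- 2:4  # would be 6:32 for the dataset discussed in this thread

# Assumed scoring rule: mean absolute value across the expression columns.
score <- rowMeans(abs(toy[, expr_cols]), na.rm = TRUE)

# For each gene, pick the index of the row with the highest score.
keep <- vapply(
  split(seq_len(nrow(toy)), toy$gene),
  function(i) i[which.max(score[i])],
  integer(1)
)

# Subset in the original row order, so row names stay sorted.
toy_unique <- toy[sort(unname(keep)), ]
```

Because the selection works on row indices, unique genes pass through unchanged and no separate `rbind` step is needed.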
A.K.
________________________________
From: Vivek Das <vd4mmind at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, September 9, 2013 2:30 PM
Subject: Re: Duplicated genes
Actually, these are all differentially expressed genes. So for each duplicated gene, the copy that is most differentially expressed should stay in the list and its duplicate should be removed. Can you show me how to do that? I think the script will need to change, right?
----------------------------------------------------------
Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy
emails: vivek.das at ieo.eu
vchris_05 at yahoo.co.in
vd4mmind at gmail.com
On Mon, Sep 9, 2013 at 8:27 PM, arun <smartpink111 at yahoo.com> wrote:
>Hi,
>Try:
>dat1<- read.table("DEGs_all.txt",sep="",header=TRUE,stringsAsFactors=FALSE)
>dim(dat1)
>#[1] 725 32
>length(unique(dat1$gene))
>#[1] 639
> dat2<-dat1[!duplicated(dat1$gene),]
> dim(dat2)
>#[1] 639 32
>
>dim(unique(dat1))
>#[1] 725 32
>
>The duplicated genes have different expression values, and you didn't say how to choose among the copies. Here, the first row of each duplicated gene is kept and the others are removed.
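
To make the behavior of `duplicated()` concrete, here is a small sketch on a made-up vector (the name `gene` and its values are illustrative only). It shows why `!duplicated()` keeps the first copy, and how combining it with `fromLast=TRUE` flags every copy of a repeated value.

```r
gene <- c("A", "B", "B", "C")

# TRUE only for the second and later occurrences of a value.
dup_forward <- duplicated(gene)

# Scanning from the end instead marks the earlier occurrences.
dup_backward <- duplicated(gene, fromLast = TRUE)

# Keeping the non-duplicates retains the first copy of each value.
first_copies <- gene[!dup_forward]

# OR-ing the two flags marks every copy of a repeated value.
all_copies <- dup_forward | dup_backward
```

Here `first_copies` holds one entry per gene, while `all_copies` is TRUE for both "B" entries.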
>
>Alternatively, suppose you want the mean of the rows for each gene:
>library(plyr)
> res<-ddply(dat1[,c(1,6:32)],.(gene), numcolwise(mean,na.rm=TRUE))
>dim(res)
>#[1] 639 28
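
If plyr is not available, a similar per-gene mean can probably be obtained with base R's `aggregate()`. The sketch below uses a made-up data frame (`toy`, with hypothetical columns `e1` and `e2`); for the dataset in this thread the value columns would be 6:32.

```r
toy <- data.frame(
  gene = c("A", "B", "B"),
  e1 = c(1, 2, 4),
  e2 = c(3, 5, 7),
  stringsAsFactors = FALSE
)

# Average every expression column within each gene; na.rm is passed on to mean().
res_base <- aggregate(
  toy[, c("e1", "e2")],
  by = list(gene = toy$gene),
  FUN = mean,
  na.rm = TRUE
)
```

The result has one row per gene, sorted by gene name, with the grouping column first.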
>
>A.K.
>
>________________________________
>From: Vivek Das <vd4mmind at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Monday, September 9, 2013 1:35 PM
>Subject: Urgent help
>
>
>
>I have a list of genes with expression values, and I want to reduce it to its unique genes. Some of the genes are duplicated. Is there a way to remove the duplicate names so that each gene appears only once, with its corresponding values? Please see the attached matrix.
>
>It would be nice if you could let me know. It's a bit urgent.
>
>----------------------------------------------------------
>
>Vivek Das
>PhD Student in Computational Biology
>Giuseppe Testa's Lab
>European School of Molecular Medicine
>IFOM-IEO Campus
>Via Adamello, 16
>Milan, Italy
>
>emails: vivek.das at ieo.eu
> vchris_05 at yahoo.co.in
> vd4mmind at gmail.com
>