[Bioc-sig-seq] Reads in 3'utr

Tue Sep 27 04:45:13 CEST 2011

On 09/26/11 09:16, rohan bareja wrote:

Yes, you need to merge the counts for each gene.

see ?split

Split the counts on the gene IDs then sum with lapply. Something like,

     lapply(split(df$counts, df$geneID), sum)

Valerie

> Hi Valerie,
>
> Thanks a lot..It worked finally.. So now I have a data frame for the 
> geneIds ,TranscriptIds and the counts (3'utr) which is given below:
>
>   GENE     TX     countsUTRctl
> [1,] "148398" "1121" "2"
> [2,] "339451" "1118" "0"
> [3,]"84069"  "1116" "0"
> [4,] "84069"  "1119" "11"
> [5,] "9636"   "1126" "11"
> [6,] "375790" "1127" "0"
>
> Now I want to do differential expression of genes using DESeq,so do I 
> have to merge the two same genes and its counts such as geneID 84069 
> (from above ) or i can proceed with the above dataframe?
> If I have to merge them how do I do that?
>
> Thanks,
> Rohan
>
> --- On *Sat, 24/9/11, Valerie Obenchain /<vobencha at fhcrc.org>/* wrote:
>
>
>     From: Valerie Obenchain <vobencha at fhcrc.org>
>     Subject: Re: [Bioc-sig-seq] Reads in 3'utr
>     To: "rohan bareja" <rohan_1925 at yahoo.co.in>
>     Cc: bioc-sig-sequencing at r-project.org
>     Date: Saturday, 24 September, 2011, 4:24 AM
>
>     On 09/23/2011 02:57 PM, rohan bareja wrote:
>>     Hi,
>>
>>     utr=threeUTRsByTranscript(txdb,use.names=FALSE)
>>     So,utr is GRangesList of length 33381
>>     Then as u said,I did the following:
>>
>>     txBygene <- transcriptsBy(txdb, "gene")
>>        geneID <- rep(names(txBygene), elementLengths(txBygene))
>>        df <- data.frame(geneID=geneID,
>>     txID=values(unlist(txBygene))[["tx_id"]])
>>
>>      This gives me a dataframe with 40,780 rows with gene ID and txID
>>     from txBygene object.
>>               geneID  txID
>>     40775   9994 11731
>>     40776   9994 11730
>>     40777   9997 38491
>>     40778   9997 38489
>>     40779   9997 38496
>>     40780   9997 38497
>>
>>     Since my utr object is of length 33,381 ,my counts length is same
>>     i.e 33,381
>>     So I am not able to map the counts to the above data frame which
>>     has transcript and gene IDs.
>>
>
>     Yes, these lengths are different.
>
>     In this example we have utr regions from 58 transcripts.
>
>     > length(utr)
>     [1] 58
>
>
>     Those 58 transcripts can be matched to their gene ID's by looking
>     at the txBygene object. All of the transcripts fall into one (or
>     more) of 51 genes,
>
>     > length(txBygene)
>     [1] 51
>
>     There are multiple transcripts per gene so we expand the gene ID's
>     to map to the transcripts.
>
>     > dim(df)
>     [1] 79  2
>
>     This data.frame has all transcripts from the txdb mapped to the
>     gene ID's. Your utr data may contain only a subset of these
>     transcripts. That is something you need to check.  Match the
>     desired transcript names to the df, pull out the gene IDs. You
>     then have the gene ID's for your utr regions and can split or
>     group your counts by gene.
>
>     Valerie
>>
>>
>>
>>     --- On *Fri, 23/9/11, Valerie Obenchain /<vobencha at fhcrc.org>
>>     </mc/compose?to=vobencha at fhcrc.org>/*wrote:
>>
>>
>>         From: Valerie Obenchain <vobencha at fhcrc.org>
>>         </mc/compose?to=vobencha at fhcrc.org>
>>         Subject: Re: [Bioc-sig-seq] Reads in 3'utr
>>         To: "rohan bareja" <rohan_1925 at yahoo.co.in>
>>         </mc/compose?to=rohan_1925 at yahoo.co.in>
>>         Cc: bioc-sig-sequencing at r-project.org
>>         </mc/compose?to=bioc-sig-sequencing at r-project.org>
>>         Date: Friday, 23 September, 2011, 10:50 PM
>>
>>         Hi Rohan,
>>
>>         You can relate the counts for 3UTR regions to gene IDs
>>         through the transcript IDs.
>>
>>             txdb_file <- system.file("extdata",
>>         "UCSC_knownGene_sample.sqlite", package="GenomicFeatures")
>>             txdb <- loadFeatures(txdb_file)
>>             utr=threeUTRsByTranscript(txdb,use.names=FALSE)
>>
>>
>>         The transcript names can be matched to the gene ID's through,
>>
>>             txBygene <- transcriptsBy(txdb, "gene")
>>             geneID <- rep(names(txBygene), elementLengths(txBygene))
>>             df <- data.frame(geneID=geneID,
>>         txID=values(unlist(txBygene))[["tx_id"]])
>>
>>         Now you know what gene ID each tx count belongs to. You can
>>         split your counts by gene ID ...
>>
>>
>>         Valerie
>>
>>
>>
>>         On 09/20/2011 12:13 PM, rohan bareja wrote:
>>>         Hi everyone,
>>>         I am doing NGS analysis using bam files.I have counted reads in 3'utr region using 
>>>         utr=threeUTRsByTranscript(txdb,use.names=FALSE)
>>>         countsUTR<- countOverlaps(utr,reads)
>>>         I have got the transcript level counts from this.How can I get the gene level counts??It might sound silly but Does anybody have an idea on what type of anaylses we can do from this countsUTR ?
>>>         Thanks,Rohan
>>>         	[[alternative HTML version deleted]]
>>>
>>>
>>>
>>>         _______________________________________________
>>>         Bioc-sig-sequencing mailing list
>>>         Bioc-sig-sequencing at r-project.org
>>>         https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>