[BioC] Getting Introns Expression at a Per Gene Level

Wed Sep 11 19:24:45 CEST 2013

Hi,

Bringing the conversation back to the list.

There should be no need for lapply(). psetdiff(x, y) is vectorized: it 
computes element-wise (parallel) asymmetric differences between 'x' and 
'y'. 'x' should be a GRanges of all gene ranges and 'y' a GRangesList 
(same length as 'x') containing the components of each gene. The result 
is a GRangesList with one element per gene.
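
Here is a minimal sketch of what that looks like on toy data. The first 
three exons are the blocks from your output below; 'geneB' and the 'gene' 
metadata column are made up for illustration. I am assuming each gene 
sits on a single chromosome and strand, so that range() gives exactly one 
range per gene and unlist() turns the result into a GRanges of the same 
length as the list.

```r
library(GenomicRanges)

# Toy exons: geneA uses the three blocks shown in the thread,
# geneB is a hypothetical second gene on the minus strand.
exons <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  ranges   = IRanges(start = c(11874, 12613, 13221, 100, 301),
                     end   = c(12227, 12721, 14408, 200, 400)),
  strand   = c("+", "+", "+", "-", "-"),
  gene     = c("geneA", "geneA", "geneA", "geneB", "geneB")
)

# Group the exons by gene: one GRangesList element per gene.
grl <- split(exons, exons$gene)

# One range per gene (assumes a single chromosome and strand per gene).
geneRanges <- unlist(range(grl))

# Element-wise difference: gene range minus its exons, i.e. the introns.
introns <- psetdiff(geneRanges, grl)

# Total intronic width per gene.
intronWidth <- sum(width(introns))
```

All genes are handled in the one psetdiff() call; there is no explicit 
loop over the list elements.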

Valerie

On 09/10/2013 07:34 PM, Carl Baribault wrote:
>> Valerie,
>>
>> Thanks for your input.  FYI, my bed file has only 1 preferred isoform
>> per gene (subset from refSeq 05/01/2012 if I recall).  I already have
>> the import working, thank you.  The following is just one element of
>> what I want to obtain.  I just need to lapply/vectorize the right way.
>> Your thoughts?
>>
>> Best,
>> Carl
>>  > psetdiff(range(ref1), blocks(ref1))
>> GRangesList of length 1:
>> $1
>> GRanges with 2 ranges and 0 metadata columns:
>>        seqnames         ranges strand
>>           <Rle>      <IRanges>  <Rle>
>>    [1]     chr1 [12228, 12612]      +
>>    [2]     chr1 [12722, 13220]      +
>>
>> ---
>> seqlengths:
>>   chr1  chr2  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrX  chrY chr10
>>     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
>>  chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22
>>     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
>>  > blocks(ref1)
>> GRangesList of length 1:
>> $1
>> GRanges with 3 ranges and 0 metadata columns:
>>        seqnames         ranges strand
>>           <Rle>      <IRanges>  <Rle>
>>    [1]     chr1 [11874, 12227]      +
>>    [2]     chr1 [12613, 12721]      +
>>    [3]     chr1 [13221, 14408]      +
>>
>> ---
>> seqlengths:
>>   chr1  chr2  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrX  chrY chr10
>>     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
>>  chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22
>>     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
>>
>>

> On 09/10/2013 09:31 AM, Valerie Obenchain wrote:
> Hi Carl,
>
> You can use import() from rtracklayer to read a bed in as a GRanges.
>
>      gr <- import('myfile.bed', asRangedData=FALSE)
>
> I'm not sure what you've got in your file but let's say they are gene
> isoforms. Presumably there is an identifier in the file that would let
> you group the ranges by gene (or whatever grouping you are after). This
> will likely end up as one of the metadata columns in the GRanges after
> import. Create a GRangesList by grouping the GRanges by gene.
>
>      grl <- split(gr, bySomeFactor)
>
> The introns are the gaps between the ranges in each list element of the
> GRangesList. To get at these we want the difference between the full
> range of the gene and the multiple elements (exons or transcripts etc.)
> of the gene.
>
> Create the gene ranges:
>
>      geneRanges <- range(grl)
>
> Extract the differences:
>
>      introns <- psetdiff(geneRanges, grl)
>
>
> If this doesn't help, I'll need to know more detail about the data in
> the isoform file.
>
> Valerie
>
>
> On 09/09/2013 07:31 PM, Carl Baribault wrote:
>> Dear Valerie,
>> I have a bed file of specific isoforms of interest.  Can you please
>> suggest the best approach for obtaining the intron extents?  Thanks.
>>
>> Best,
>> Carl Baribault
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>