[Bioc-sig-seq] Minimal short sequences position/orientation container
Ivan Gregoretti
ivangreg at gmail.com
Thu Sep 24 20:40:13 CEST 2009
Hi Patrick,
Great. It works.
Can you clarify if the following observation is a feature or a bug?
When I download
http://dl.getdropbox.com/u/2051155/myTags.bed
and from the unix prompt I take a peek at it, I get:
head myTags.bed
chr1 3002444 3002479 +
chr1 3002989 3003024 -
chr1 3017603 3017638 +
chr1 3017879 3017914 -
chr1 3018173 3018208 +
chr1 3018183 3018218 -
chr1 3018183 3018218 -
chr1 3019065 3019100 +
chr1 3019761 3019796 -
chr1 3020044 3020079 -
Fine. It shows the 36-base-long reads.
Now I follow your suggestion and load it into R:
suppressMessages(library(rtracklayer))
myTags <- import('myTags.bed')
ranges(myTags["chr1"])[[1]]
IRanges instance:
             start       end width
[1]        3002445   3002479    35
[2]        3002990   3003024    35
[3]        3017604   3017638    35
[4]        3017880   3017914    35
[5]        3018174   3018208    35
[6]        3018184   3018218    35
[7]        3018184   3018218    35
[8]        3019066   3019100    35
[9]        3019762   3019796    35
...            ...       ...   ...
[322808] 197166880 197166914    35
[322809] 197167672 197167706    35
[322810] 197167851 197167885    35
[322811] 197185820 197185854    35
[322812] 197185850 197185884    35
[322813] 197188518 197188552    35
[322814] 197189251 197189285    35
[322815] 197189593 197189627    35
[322816] 197191697 197191731    35
So, every start position is reported one higher than in the original
record (3002444 becomes 3002445), and the features are reported as
being 35 bases long instead of 36.
Is it a feature or a bug?
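For reference, the arithmetic can be checked by hand on the first record. This is only a sketch: it assumes the file follows the UCSC BED convention (0-based start, end not included) and that IRanges reports 1-based intervals with both ends included. The variable names are made up for illustration.

```r
# First record from myTags.bed, as printed by `head` above.
bed_start <- 3002444  # assumed BED chromStart: 0-based, inclusive
bed_end   <- 3002479  # assumed BED chromEnd: exclusive

# Length of the feature under the assumed BED convention.
bed_length <- bed_end - bed_start

# Same interval expressed as 1-based with both ends included
# (the convention assumed here for IRanges).
r_start <- bed_start + 1
r_end   <- bed_end
r_width <- r_end - r_start + 1

print(c(start = r_start, end = r_end, width = r_width, bed_length = bed_length))
```

Under those assumptions the shifted start and the width of 35 would both be expected, and a 36-base read starting at 3002444 would need an end of 3002480 in the BED file.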
Thank you
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878
On Thu, Sep 24, 2009 at 2:51 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
> Ivan,
> The RangedData class can store strand information in its values table. The
> values table can store any "vector-like" object from simple R vectors
> (including lists) to an instance of any of the *List classes defined in
> IRanges. If you use rtracklayer's import function on a bed file containing
> the information you have shown, the chromosome information will be used to
> segment the other values into spaces, the start and end values will be
> joined together in the ranges information (as a CompressedIRangesList
> object) and the strand information will be stored as a factor column across
> the values set (which is a CompressedDataFrameList object). The strand
> information can be accessed by the strand accessor function. If your data
> are sorted by strand within chromosome, you could add another level of
> compression by storing the strand information as a 'factor' Rle in the
> values table instead of a plain factor. rtracklayer's export function is
> aware of a possible strand column in the values table and handles it
> appropriately when serializing a RangedData object back into a bed file.
>
>
> Patrick
>
>
> Ivan Gregoretti wrote:
>>
>> Hi everybody,
>>
>> What is the minimal container class for position-and-orientation of
>> Solexa reads?
>>
>>
>> For example, the minimal positional information should be something
>> like a BED record, like this
>>
>> chr1\t3000001\t3000036\t\t\t+\t
>> ...(and many more lines)...
>>
>> sorry for the cumbersome string but I just want to stress that the
>> minimal information is:
>>
>> column 1: chromosome
>> column 2: start
>> column 3: end
>> column 6: orientation, either 'plus', 'minus' or undefined. (in this case
>> a '+')
>>
>> Is there any compact container to load, say, 50 million records? I
>> thought that RangedData could do that but after reading the
>> documentation I see that it does not hold strand information.
>>
>> If there is such container, how do you load it up from a BED file?
>>
>> Thank you,
>>
>> Ivan
>>
>> Ivan Gregoretti, PhD
>> National Institute of Diabetes and Digestive and Kidney Diseases
>> National Institutes of Health
>> 5 Memorial Dr, Building 5, Room 205.
>> Bethesda, MD 20892. USA.
>> Phone: 1-301-496-1592
>> Fax: 1-301-496-9878
>>
>
>