[Bioc-sig-seq] Minimal short sequences position/orientation container

Patrick Aboyoun paboyoun at fhcrc.org
Thu Sep 24 08:51:16 CEST 2009


Ivan,
The RangedData class can store strand information in its values table. 
The values table accepts any "vector-like" object, from simple R vectors 
(including lists) to an instance of any of the *List classes defined in 
IRanges.

If you use rtracklayer's import function on a BED file containing the 
records you have shown, the chromosome column is used to split the other 
values into spaces, the start and end values are combined into the 
ranges (a CompressedIRangesList object), and the strand is stored as a 
factor column in the values (a CompressedDataFrameList object). The 
strand can then be retrieved with the strand accessor function.

If your data are sorted by strand within chromosome, you could add 
another level of compression by storing the strand as a 'factor' Rle in 
the values table instead of a plain factor, since each long run of 
identical strand values then collapses to one value plus a run length. 
rtracklayer's export function is aware of a possible strand column in 
the values table and handles it appropriately when serializing a 
RangedData object back into a BED file.
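
To make the layout described above concrete, here is a rough, 
language-neutral sketch in Python. It is not the IRanges or rtracklayer 
implementation. The helper names (run_length_encode, load_bed_records) 
and the dictionary layout are hypothetical, chosen only to mirror the 
idea of per-chromosome spaces holding ranges plus a run-length encoded 
strand column. Coordinates are kept exactly as they appear in the file, 
and a missing or unrecognized strand is recorded as "*", which is an 
assumption of this sketch.

```python
from collections import defaultdict
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def load_bed_records(lines):
    """Group BED lines by chromosome, keeping start, end, and strand.

    Hypothetical helper: returns
    {chrom: {"start": [...], "end": [...], "strand": [...]}}.
    """
    spaces = defaultdict(lambda: {"start": [], "end": [], "strand": []})
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Column 6 (index 5) is the strand; anything else becomes "*".
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "*"
        space = spaces[chrom]
        space["start"].append(start)
        space["end"].append(end)
        space["strand"].append(strand)
    return dict(spaces)


# Made-up records in the same shape as the BED line quoted below.
bed_lines = [
    "chr1\t3000001\t3000036\t\t\t+\t",
    "chr1\t3000100\t3000135\t\t\t+\t",
    "chr1\t3000200\t3000235\t\t\t-\t",
    "chr2\t5000001\t5000036\t\t\t-\t",
]

spaces = load_bed_records(bed_lines)

# When records are sorted by strand within chromosome, each strand column
# reduces to a handful of runs regardless of how many records there are.
strand_runs = {
    chrom: run_length_encode(cols["strand"]) for chrom, cols in spaces.items()
}
print(strand_runs["chr1"])  # [('+', 2), ('-', 1)]
```

With 50 million reads sorted by strand within chromosome, each 
chromosome's strand column would shrink to at most a few runs, which is 
the saving a 'factor' Rle is meant to give over a plain factor.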


Patrick


Ivan Gregoretti wrote:
> Hi everybody,
>
> What is the minimal container class for position-and-orientation of
> Solexa reads?
>
>
> For example, the minimal positional information should be something
> like a BED record, like this
>
> chr1\t3000001\t3000036\t\t\t+\t
> ...(and many more lines)...
>
> Sorry for the cumbersome string, but I just want to stress that the
> minimal information is:
>
> column 1: chromosome
> column 2: start
> column 3: end
> column 6: orientation, either 'plus', 'minus' or undefined. (in this case a '+')
>
> Is there any compact container to load, say, 50 million records? I
> thought that RangedData could do that but after reading the
> documentation I see that it does not hold strand information.
>
> If there is such a container, how do you load it up from a BED file?
>
> Thank you,
>
> Ivan
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
>


