[Bioc-sig-seq] ExpressionSet alikes for next-gen data

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 1 16:15:57 CEST 2010


On 03/31/2010 04:06 AM, Michael Lawrence wrote:
> On Wed, Mar 31, 2010 at 3:55 AM, David Rossell <
> david.rossell at irbbarcelona.org> wrote:
> 
>> Following a recent thread, I have also found it convenient to store next-gen
>> data as RangedData instead of ShortRead objects. They require far less
>> memory and make it feasible to work with several samples at the same time
>> (on my desktop with 8 GB of RAM I can load at most 2 ShortRead objects; with
>> RangedData I haven't hit the upper limit yet).
>>
>> I am thinking about taking this idea a step further: RangedDataList allows
>> storing info from several samples (e.g. IP and control) in a single object.
>> The only problem is RangedDataList does not store information about the
>> samples, e.g. the phenoData we're used to in ExpressionSet objects. My idea
>> is to define something like a "SequenceSet" class, which would contain a
>> RangedDataList with the ranges, a phenoData with sample information, and
>> possibly also information about the experiment (e.g. MINSEQE, the
>> sequencing analog of MIAME).
>>
>> The thing is, I don't want to reinvent the wheel. I haven't seen this
>> implemented yet; is anyone working on it? Any criticism or ideas?
>>
>>
> RangedDataList already supports this. See the 'elementMetadata' and
> 'metadata' slots in the Sequence class.

Hi David et al.,

I've also found the elementMetadata slot excellent for this purpose.
The ShortRead data objects retain sequence and quality information, but
this information is often not needed after a certain point in the analysis.

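To make the proposal concrete, here is a minimal, language-neutral sketch in Python of the kind of container being discussed: per-sample ranges (playing the role of a RangedDataList), a phenoData-like table of sample covariates, and experiment-level notes, kept aligned by sample name. This is not Bioconductor code; `SequenceSet`, `ReadRange`, `where`, and all field names are hypothetical, and the sample data are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple


class ReadRange(NamedTuple):
    """Hypothetical: one aligned read reduced to coordinates (no sequence/quality)."""
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+", "-" or "*"


@dataclass
class SequenceSet:
    """Hypothetical container: per-sample ranges plus sample and experiment metadata."""
    ranges: Dict[str, List[ReadRange]]     # one entry per sample (RangedDataList role)
    pheno_data: Dict[str, Dict[str, str]]  # sample -> covariates (phenoData role)
    experiment_data: Dict[str, str] = field(default_factory=dict)  # MIAME/MINSEQE-like notes

    def __post_init__(self) -> None:
        # As with ExpressionSet, sample names must agree across components.
        if list(self.ranges) != list(self.pheno_data):
            raise ValueError("sample names in ranges and pheno_data must match, in order")

    def sample_names(self) -> List[str]:
        return list(self.ranges)

    def subset(self, samples: List[str]) -> "SequenceSet":
        # Subsetting by sample keeps ranges and sample annotations in step.
        return SequenceSet(
            ranges={s: self.ranges[s] for s in samples},
            pheno_data={s: self.pheno_data[s] for s in samples},
            experiment_data=dict(self.experiment_data),
        )

    def where(self, covariate: str, value: str) -> "SequenceSet":
        # Select samples by a sample covariate, e.g. condition == "IP".
        keep = [s for s, info in self.pheno_data.items() if info.get(covariate) == value]
        return self.subset(keep)


# Usage with made-up data: one IP and one control sample.
ss = SequenceSet(
    ranges={
        "ip1": [ReadRange("chr1", 100, 135, "+"), ReadRange("chr1", 180, 215, "-")],
        "ctl1": [ReadRange("chr1", 400, 435, "+")],
    },
    pheno_data={"ip1": {"condition": "IP"}, "ctl1": {"condition": "control"}},
    experiment_data={"protocol": "ChIP-seq"},
)
ip = ss.where("condition", "IP")
print(ip.sample_names())      # ['ip1']
print(len(ip.ranges["ip1"]))  # 2
```

The design point is the validity check plus sample-wise subsetting: once ranges and sample annotations live in one object, selecting "the IP samples" cannot leave the two out of sync. In Bioconductor terms, the per-sample annotations would sit in the list's elementMetadata and the experiment-level notes in its metadata.
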
I also wanted to point to the GenomicRanges package in Bioc-devel, which has a
GRanges class that is more fastidious about strand information (maybe a
plus?) and conforms more closely to an 'I am a rectangular data structure'
world view. There is also the GappedAlignments class for efficiently
representing large numbers of reads.

Martin

> 
> Michael
> 
> 
> 
>> Best,
>>
>> David
>>
>> --
>> David Rossell, PhD
>> Manager, Bioinformatics and Biostatistics unit
>> IRB Barcelona
>> Tel (+34) 93 402 0217
>> Fax (+34) 93 402 0257
>> http://www.irbbarcelona.org/bioinformatics
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


