[Bioc-sig-seq] A myriad of classes

Martin Morgan mtmorgan at fhcrc.org
Mon Apr 20 23:44:26 CEST 2009


ig2ar-saf2 at yahoo.co.uk writes:

> Hello Deepayan,
>
> hmmm... I get
>
>> str(head(as.list(alignedLocs)))
> List of 6
>  $ 0:0:1  :List of 2
>   ..$ -: int(0) 
>   ..$ +: int(0) 

these are reads for which no unique alignment was found. You'll want
to filter them, e.g.,

filt <- compose(chromosomeFilter("chr[0-9XY]+.fa"),
                alignDataFilter(expression(filtering == "Y")))
aln <- readAligned(<...>, filter=filt)

or just

aln[filt(aln)]

Martin


>  $ 0:0:10 :List of 2
>   ..$ -: int(0) 
>   ..$ +: int(0) 
>  $ 0:0:100:List of 2
>   ..$ -: int(0) 
>   ..$ +: int(0) 
>  $ 0:0:101:List of 2
>   ..$ -: int(0) 
>   ..$ +: int(0) 
>  $ 0:0:102:List of 2
>   ..$ -: int(0) 
>   ..$ +: int(0) 
>  $ 0:0:103:List of 2
>   ..$ -: int(0) 
>   ..$ +: int(0)
>
>
>
> -Ivan
>
>
>
> ----- Original Message ----
> From: Deepayan Sarkar <deepayan.sarkar at gmail.com>
> To: ig2ar-saf2 at yahoo.co.uk
> Cc: Michael Lawrence <mflawren at fhcrc.org>; bioc-sig-sequencing at r-project.org
> Sent: Monday, 20 April, 2009 16:50:46
> Subject: Re: [Bioc-sig-seq] A myriad of classes
>
> On Mon, Apr 20, 2009 at 1:27 PM,  <ig2ar-saf2 at yahoo.co.uk> wrote:
>> Hello Michael,
>>
>> A high level vignette with the infrastructure of the BioC would be great.
>>
>> Also, I can be more specific about a class problem I am facing. It concerns a developmental package that I am privileged to be allowed to test. It's chipseq.
>>
>> I am trying to follow a typical workflow guide as shown here:
>>
>> http://www.bioconductor.org/workshops/2009/SeattleJan09/ChIP-seq/ChipSeqWorkflow.pdf
>>
>> As you can see, the data that the package uses is not raw data but data that has been read in and labelled somehow beforehand. The document shows
>>
>> load("../data/alignedLocs.rda")
>>
>> That is not the scenario a user will find. A user will have one or several s_X_export.txt files.
>
> Right. The document refers to data that was provided to those who
> attended the course (which unfortunately we cannot yet make public).
>
>> So, my attempts to get my data read in in the simplest case is this
>>
>>> library(chipseq)
>>> library(lattice)
>>> setwd('/scratch1/igregore/ChIPseq/runs/09-04-10/GERALD_14-04-2009_niddk/')
>>> pattern <- "s_1_export.txt"
>>> alignedLocs <- as(readAligned(".",
>> +                               pattern,
>> +                               "SolexaExport",
>> +                               filter = alignDataFilter(expression(filtering == "Y"))),
>> +                   "GenomeData")
>>> class(alignedLocs)
>> [1] "GenomeData"
>> attr(,"package")
>> [1] "BSgenome"
>
> Perfect.
>
>> The guide says that alignedLocs should be a GenomeDataList class object but it shows up as class GenomeData. The guide also shows
>
> That's because 'alignedLocs' contained several such objects,
> representing the data obtained from multiple lanes (possibly across
> multiple runs). To create such an object, you can do
>
> alignedLocs <- GenomeDataList(list(a = alignedLocs1, b = alignedLocs2))
>
> etc. where alignedLocs1, alignedLocs2, etc. are "GenomeData" objects
> (from individual calls to readAligned).
>
>>> alignedLocs
>>  A GenomeDataList instance of length 3
>>
>> but when I try it as is I get:
>>
>>>  alignedLocs
>>   A GenomeData instance of length 51154
>
> That is a bit odd though. Individual lanes should consist of
> chromosome-level data, and this suggests that you have 51154
> chromosomes. Perhaps you can give us the output of
>
> str(head(as.list(alignedLocs)))
>
> -Deepayan
>
>
>
>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list