[Bioc-sig-seq] readAligned error: negative length vectors are not allowed

Martin Morgan mtmorgan at fhcrc.org
Tue Mar 10 03:25:39 CET 2009


Hi Joseph

joseph <jdsandjd at yahoo.com> writes:

> Hi Martin
>  
> "/../data" says "start at the root directory of the file system and go
> one level up, then down to the 'data' directory". Is this what you
> mean, or perhaps "./../data"?
> The actual whole path is:
> "/Users/jdhahbi/JosephDhahbi/SOLEXA/ShortRead/mono/data"
> "/../data" is my way of shortening it for the e-mail

ah

> the 4 files are in the subdirectory "data"
> readAligned as invoked above has no 'pattern' argument, and so will
> match all files in the directory '/../data'. Likely what you want is
> to add an argument pattern=".*_export.*" or similar; using
> list.files("/../data") is a good way to see what you're trying to read
> in, for instance list.files("/../data", ".*_export.*")
> the subdirectory 'data' contains only the 4 ".*_export.*" files; I meant to
> read in all 4.

ok

> The error itself likely comes from trying to allocate a very large
> object. In general at least for the initial stages of an analysis a
> good work flow might read a 'large' file, perform some processing to
> result in a 'small' object, read the next, etc., and finally combine
> the small objects. This is what qa() does (visit an _export file,
> summarize, visit the next, combine results into the object that you
> refer to as qaSummary); a simple (and too naive) start to a ChIP-seq
> analysis might generate a list of files containing aligned reads and
> then summarize where in the genome the reads align to using
> ShortRead::coverage (I did not test the following code),
> files <- list.files(dirPath, ".*_export.*", full=TRUE)
> cvg <- lapply(files, function(file) {
>   aln <- readAligned(dirname(file), basename(file), type="SolexaExport")
>   coverage(aln)
> })
> the results of coverage() are quite small and manageable, and the cvg
> object contains data from all 'files'; using srapply rather than
> lapply would allow this to be distributed across processors.
> I got a different error when I tried the code you suggested:
>> files <-
> +   list.files("/Users/jdhahbi/JosephDhahbi/SOLEXA/ShortRead/mono/data",
> +              ".*_export.*", full=TRUE)
>> files
> [1] "/Users/jdhahbi/JosephDhahbi/SOLEXA/ShortRead/mono/data/s_2_1_export.txt"
> [2] "/Users/jdhahbi/JosephDhahbi/SOLEXA/ShortRead/mono/data/s_2_2_export.txt"
> [3] "/Users/jdhahbi/JosephDhahbi/SOLEXA/ShortRead/mono/data/s_3_1_export.txt"
> [4] "/Users/jdhahbi/JosephDhahbi/SOLEXA/ShortRead/mono/data/s_3_2_export.txt"
>>
>> cvg <- lapply(files, function(file) {
> +   aln <- readAligned(dirname(file), basename(file), type="SolexaExport")
> +   coverage(aln)
> + })

Not meant to be code to be used verbatim, so sorry if it misled...

> Error in stop("'", argname, , "' cannot contain NAs") :
>   argument is missing, with no default
> Error in coverage(IRanges(rstart, rend), start, end, ...) :
>   error in evaluating the argument 'x' in selecting a method for function
> 'coverage'

This occurs when the IRanges object being constructed contains NA
values. Why the NAs are there is harder to diagnose from afar, but the
key components to look at are the results of chromosome(), position()
and width() applied to the object read in by readAligned.
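
As a minimal sketch of that check, one could count the NAs in each
component. summarizeNAs is a hypothetical helper (not part of
ShortRead) and the toy vectors stand in for real data. A plausible but
unconfirmed cause is reads that did not align and so have no position;
the commented lines assume 'aln' is the object returned by readAligned
and that it supports logical subsetting.

```r
# Hypothetical helper: count NA values in the three components
# that are used to build the ranges.
summarizeNAs <- function(chrom, pos, wd) {
  c(chromosome = sum(is.na(chrom)),
    position   = sum(is.na(pos)),
    width      = sum(is.na(wd)))
}

# With a real object 'aln' one might call (assumption, not run here):
#   summarizeNAs(chromosome(aln), position(aln), width(aln))
# and, if positions are missing, drop those reads before coverage():
#   aln <- aln[!is.na(position(aln))]

# Toy vectors standing in for real data; the second read has no alignment.
na_counts <- summarizeNAs(c("chr1", NA, "chr2"),
                          c(100L, NA, 250L),
                          c(36L, 36L, 36L))
print(na_counts)
```

Non-zero counts for chromosome or position would point at the reads
that need to be filtered out or investigated.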

You'll likely want to figure out how to use R's debugging tools, if
you haven't already. The key tools are traceback(),
options(error=recover) / options(error=NULL), debug(), and browser().
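
A small sketch of how these tools fit together, using a hypothetical
function f that fails on NA input in place of the coverage() call. The
interactive steps are shown as comments because they open a prompt.

```r
# Hypothetical stand-in for a call that fails on NA input.
f <- function(x) {
  if (any(is.na(x))) stop("'x' cannot contain NAs")
  sum(x)
}

# Interactive session (run at the R prompt):
#   f(c(1, NA))              # triggers the error
#   traceback()              # print the call stack of the last error
#   options(error=recover)   # on the next error, choose a frame to inspect
#   f(c(1, NA))
#   options(error=NULL)      # restore the default error handling
#   debug(f); f(c(1, NA))    # step through f line by line
#   undebug(f)
#   # or put browser() inside f to pause at a chosen line

# Non-interactive check that f fails as expected.
msg <- tryCatch(f(c(1, NA)), error = function(e) conditionMessage(e))
print(msg)
```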

Hope that helps,

Martin



> Thank you for your help
>>
>>> sessionInfo()
>> R version 2.8.1 Patched (2009-03-03 r48046)
>> i386-apple-darwin9.6.0
>> locale:
>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>> attached base packages:
>> [1] tools    stats    graphics  grDevices utils    datasets  methods 
>> [8] base   
>> other attached packages:
>> [1] ShortRead_1.0.6    lattice_0.17-20    Biobase_2.2.2      Biostrings_2.10.16
>> [5] IRanges_1.0.13

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793