[BioC] BioStrings - BStringViews Object

Herve Pages hpages at fhcrc.org
Wed Mar 5 05:10:07 CET 2008


Hi Mayra,

Mayra Eduardoff wrote:
> Hi ,
> 
> I am now using R version 2.6.0 (2007-10-03)
> x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=en_US.ISO8859-1;LC_NUMERIC=C;LC_TIME=en_US.ISO8859-1;LC_COLLATE=en_US.ISO8859-1;LC_MONETARY=en_US.ISO8859-1;LC_MESSAGES=en_US.ISO8859-1;LC_PAPER=en_US.ISO8859-1;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.ISO8859-1;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] Biobase_1.16.1   Biostrings_2.6.4 RMySQL_0.6-0     DBI_0.2-4
> [5] biomaRt_1.12.2   RCurl_0.8-3
> 
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-15 XML_1.93-2
> 
> 
> I was wondering whether the writeFASTA() function is not reading in
> BStringViews object anymore ? It only seems to work with a list.
> 
> class(bsv)
> [1] "BStringViews"
> attr(,"package")
> [1] "Biostrings"
> 
> writeFASTA(bsv, "Doc/tables/pta-seq.fasta")
> Error in writeFASTA(bsv, "Doc/tables/pta-seq.fasta") :
>   invalid type/length (S4/1) in vector allocation

You need to use write.BStringViews() instead.

readFASTA() and writeFASTA() work in a symmetric manner: the former loads
a FASTA file into a list, so the latter writes a list to a FASTA file.

read.BStringViews() and write.BStringViews() also work in a symmetric manner.
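
A minimal sketch of that round trip, assuming the Biostrings 2.6.x
signatures (the 'format' and 'subjectClass' arguments and the BStringViews()
constructor call are written from memory of that release, so check them
against ?write.BStringViews; the sequences and the temporary file are made up
for illustration). The exists() guard is there because other Biostrings
releases may not provide these functions:

```r
library(Biostrings)

# Hypothetical output path; the question used "Doc/tables/pta-seq.fasta".
out_file <- tempfile(fileext = ".fasta")

# Only run where the 2.6.x-era functions are available (assumption: other
# releases may have renamed or dropped them).
if (exists("write.BStringViews") && exists("read.BStringViews")) {
  # Toy stand-in for the 'bsv' object from the question.
  bsv <- BStringViews(c("ACGTTGCA", "GGATCC"), subjectClass = "DNAString")

  # Write the views to a FASTA file ...
  write.BStringViews(bsv, file = out_file, format = "fasta")

  # ... and read them back as a BStringViews object, not as a list.
  bsv2 <- read.BStringViews(out_file, format = "fasta",
                            subjectClass = "DNAString")
  print(class(bsv2))
}
```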

> 
> My other question is whether the BStringViews object is only thought
> to be used for different views on the same BString ?

Yes, on the same string.

> I kept using it
> for different BString-objects to i.e use the function complement(),
> but I guess I should have sapplied the function to a vector of
> DNAStrings or something like that ?

Yes, you are doing something that is not natural (i.e. putting unrelated
DNAStrings into a BStringViews object) to take advantage of the vectorized
feature of complement(). You can't be blamed for this because it's actually
the most efficient way to complement or reverse-complement a big collection
of sequences... but it's not natural ;-)

The reason why the BStringViews container has this constraint that it must
contain views on the same string is that it was originally designed to store
the result of a call to matchPattern(). For this use case it was natural
to return a set of views on the subject of the search.
But it feels wrong to use it for storing a set of unrelated BString objects.
The problem is that the Biostrings package was lacking a container for doing
this. I've added one recently in the devel version of the package: the BStringSet
container (there are also DNAStringSet, RNAStringSet and AAStringSet
containers). It has some limitations too (e.g. you can't append new strings
to it) but it no longer has to represent a set of views on the same string.
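
To make the matchPattern() use case above concrete, here is a small sketch
(the subject and pattern are invented for illustration, and the class name of
the returned views object depends on the Biostrings version). Every match is a
view on the one subject string, which is exactly the situation the views
container was designed for:

```r
library(Biostrings)

# Made-up subject sequence and pattern, for illustration only.
subject <- DNAString("ACGTACGTTACG")
m <- matchPattern("ACG", subject)

# Each match is a view (start/end) on the same subject string.
start(m)         # start positions of the matches
as.character(m)  # the matched substrings
```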

This is still a work in progress but it should be ready for the next BioC
release. As with BStringViews objects, there will be a read.BStringSet()
and a write.BStringSet() function. Also, most of the functions that work for
a single BString object will also work in a parallelized fashion for a
BStringSet object. For example, if 'x' is a DNAStringSet object, complement(x)
will return the DNAStringSet object made of the complements of each sequence
in 'x'.
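
A short sketch of what that looks like, assuming a Biostrings version that
provides the DNAStringSet() constructor (the three sequences are made up for
illustration):

```r
library(Biostrings)

# Three unrelated sequences stored together; they are not views on one string.
x <- DNAStringSet(c("ACGT", "GGATCC", "TTTAAAC"))

# complement() is applied to every sequence in the set in one call.
xc <- complement(x)
as.character(xc)
```

This avoids both the sapply() loop over individual DNAString objects and the
workaround of packing unrelated sequences into a views object.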

Cheers,
H.



> 
> thanks,
> 
> Mayra
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


