[BioC] Problem with BioStrings: patternMatch ALSO BLAST
hpages at fhcrc.org
hpages at fhcrc.org
Thu Jun 7 08:04:32 CEST 2007
Hi Paul,
Quoting Paul Leo <p.leo at uq.edu.au>:
> These commands are straight from the Biostrings; PatternMatch manual.
> Could not find any help online.
> Any advice on the key 86 error??
See below for the explanation.
>
>
> Otherwise has anyone do genomic BLAST from within R. Any advice on the
> best path.... Use Blastcl3 ? ,another standalone blast , ?
> Maybe a 1000 or do short sequences to blast....
There is no BLAST in Biostrings but there is the needwunsQS() function
for _global_ alignment of 2 sequences. I just rewrote this function in C
in Biostrings devel (2.5.9) to make it faster. Now it's possible to align
two 20000-letter sequences (nucleotides or amino acid) in 2 or 3 seconds,
granted that you have enough memory for this.
Look at the Alignment.pdf vignette, the man page for needwunsQS()
(?needwunsQS) and at ?scoring_matrices for more information on this.
[continued below]
>
> Thanks in advance
> Paul
>
> CODE
>
> > cI <- Mmusculus$chr13
> > length(cI)
> [1] 120614378
> > class(cI)
> [1] "DNAString"
> attr(,"package")
> [1] "Biostrings"
>
>
>
> > p <- "UUACAGUUGUUCAACCAGUUACU"
>
> *********** ERROR****************
> > matchPattern(p, cI, mismatch = 0)
> Error in CharBuffer.write(data, 1, length, value = src, enc = lkup) :
> key 85 not in lookup table
>
> *************************************
You are applying matchPattern() on a DNAString subject (cI) so the pattern (p)
is expected to be a DNAString object too or a string that can be converted
into a DNAString object. In your example p contains the letter U (looks like
an RNA sequence) so it can't be converted to a DNAString object.
key 85 is the ascii code of the U letter: this code does not belong to
the lookup table used internally to encode the letters of a DNAString object
(letters stored in a DNAString or RNAString objects are encoded, this is a
trick to allow some fast searching algorithms).
I agree that the current error message is not really helpful, sorry! I plan
to work on this and to improve the overall documentation too.
Your pattern (p) can be converted into a RNAString object first with
rnap <- RNAString(p)
and then into a DNAString object with:
dnap <- DNAString(rnap)
This will not necessarily give you what you want since when converting
from RNAString to DNAString (or inversely) the current translation is
applied:
U <-> A
G <-> C
C <-> G
A <-> T
(which mimics the transcription/reverse transcription processus) so
you might want to apply complement() (or reverseComplement()) to
the result.
Cheers,
H.
>
> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MON
> ETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] "tools" "stats" "graphics" "grDevices" "utils"
> "datasets"
> [7] "methods" "base"
>
> other attached packages:
> BSgenome.Mmusculus.UCSC.mm8 BSgenome
> "1.2.0" "1.4.0"
> Biobase Biostrings
> "1.14.0" "2.4.5"
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list