[R] Looping through multiple sub elements of a list to compare to multiple components of a vector
Adams, Jean
jvadams at usgs.gov
Wed Dec 2 18:53:32 CET 2015
First, a couple posting tips. It's helpful to provide some example data
people can work with. Also, please post in plain text (not html).
If you have a single standard for comparison, you might find an approach
like this helpful.
# example data
mylist <- c("AAEBCC", "AABDCC", "AABBCD")
list.2 <- strsplit(mylist, split=NULL)
# setting a standard for comparison
std.string <- "AABBCC"
standard <- unlist(strsplit(std.string, split=NULL))
sapply(list.2, function(x) x==standard)
This gives you a matrix of logicals with the number of rows the same length
as your original strings (the 99 amino acids) and the number of columns the
same length as the number of strings you're comparing (the 15 sequences).
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] FALSE TRUE TRUE
[4,] TRUE FALSE TRUE
[5,] TRUE TRUE TRUE
[6,] TRUE TRUE FALSE
Jean
On Wed, Dec 2, 2015 at 9:39 AM, debra ragland via R-help <
r-help at r-project.org> wrote:
> I think I am making this problem harder than it has to be and so I keep
> getting stuck on what might be a trivial problem.
> I have used the seqinr package to load a protein sequence alignment
> containing 15 protein sequences;
> > library(seqinr) > x =
> read.alignment("proteins.fasta",format="fasta",forceToLower=FALSE)This
> automatically loads in a list of 4 elements including the sequences and
> other information.
> I store the sequences to a new list;
> > mylist = x$seqwhich returns a character vector of 15 strings.
> I have found that if I split the long character strings into individual
> characters it is easy to use lapply to loop over this list. So I use
> strsplit;
> >list.2 = strsplit(mylist, split = NULL)
> >From this list I can determine which proteins have changes at certain
> positions by using;
> >lapply(list.2, "[", 10) == "L"This returns a logical T/F vector for
> those elements of the list that do/do not the letter L at position 10.
> Because each of the protein sequences contains 99amino acids, I want to
> automate this process so that I do not have to compare/contrast positions 1
> x 1. Most of the changes occur between positions/letters 10-95. I have a
> standard character vector that I wish to use for comparison when looping
> through the list.
> Should I perhaps combine all -- the standard "letter"/aa vector, the list
> of protein sequences -- into one list? Or is it better to leave them
> separate for this comparison? I'm not sure what the output should be as I
> need to use it for another statistical test. Would a list of logical
> vectors be the most sufficient output to return?
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
More information about the R-help
mailing list