[BioC] KEGGSOAP questions: how to handle multiple annotations in R data frames, and why does a "for" loop only use one annotation.
ALAN SMITH
alansmith2 at gmail.com
Tue Nov 28 02:01:05 CET 2006
Hello,
I am attempting to use R to query KEGG in order to find cpd IDs from
neutral masses and eventually link these cpd IDs to the pathways
they are part of. I have several questions, which are listed after the
example R code for the problems. At the end is what I think the ideal
result would look like, which I cannot achieve.
######### session info ##################
> sessionInfo()
R version 2.4.0 (2006-10-03)
i386-pc-mingw32
attached base packages:
[1] "methods" "stats" "graphics" "grDevices" "utils" "datasets" "base"
other attached packages:
   KEGG KEGGSOAP    SSOAP    RCurl      XML
"1.8.1"  "1.9.1"  "0.4-0"  "0.8-0"  "1.2-0"
####################################################################
#Example R code for the problem I am having.#
cpdID<-c(1,2,3,4,5,6)
mass<-c(129.0426, 147.0532, 208.0848, 220.0848, 204.0899, 777.0317)
RT<-c(1,2,3,4,5,6)
ppmerror<-c(4,11,75,7,21,55)
floatmass=NULL
for (i in 1:length(cpdID)) {
floatmass[i]<-if(ppmerror[i]<10) {1e-5*mass[i]} else{(ppmerror[i]/10^6)*mass[i]}
}
testdata<-as.data.frame(cbind(cpdID, mass, RT, ppmerror, floatmass))
library(KEGGSOAP)
library(KEGG)
KEGGID=NULL
for (i in 1:length(testdata$cpdID)) {
KEGGID[i]<-(search.compounds.by.mass(testdata$mass[i], testdata$floatmass[i]))
}
KEGGID
tt<-cbind(KEGGID,testdata) ### this cbind does not work: the vectors are different sizes ###
### the objects below contain the full query results for each query in the loop above ###
a<-t(as.data.frame(search.compounds.by.mass(129.0426,0.00129)))
b<-t(as.data.frame(search.compounds.by.mass(147.0532,0.00161)))
c<-t(as.data.frame(search.compounds.by.mass(208.0848,0.0156)))
d<-t(as.data.frame(search.compounds.by.mass(220.0848,0.0022)))
e<-t(as.data.frame(search.compounds.by.mass(204.0899,0.0042)))
f<-t(as.data.frame(search.compounds.by.mass(777.0317,0.0427)))
Problem #1 (probably has to do with how R works)
Each queried mass except the last, which has none, has more than one
annotation. Why does R keep only the first annotation returned by each
query in the KEGGID loop and truncate the rest? How can I produce
output that lets all the annotations for each mass be linked back to
the table testdata, using a loop that cycles through testdata?
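A minimal sketch of the behaviour described in Problem #1, assuming each query returns a character vector of cpd IDs. The `canned` list is a hypothetical stand-in for the query results (a few IDs copied from the ideal table in this post), so nothing here contacts KEGG. It contrasts assigning into one element of a vector, which keeps only the first value, with assigning into one element of a list, which keeps the whole result.

```r
# Canned results standing in for three KEGG queries (no network access).
canned <- list(
  c("cpd:C01877", "cpd:C01879", "cpd:C02237"),
  c("cpd:C00643", "cpd:C01017", "cpd:C09985"),
  c("cpd:C00078", "cpd:C00525")
)

# Assigning a multi-element result into a single vector element keeps only
# the first value; R also issues a warning, which is suppressed here.
ids_vec <- character(0)
for (i in seq_along(canned)) {
  suppressWarnings(ids_vec[i] <- canned[[i]])
}

# A list can hold one whole result vector per element.
ids_list <- vector("list", length(canned))
for (i in seq_along(canned)) {
  ids_list[[i]] <- canned[[i]]
}

# Number of annotations kept per query in the list version.
n_hits <- sapply(ids_list, length)
```

With the list version, `ids_list[[i]]` still lines up with row `i` of testdata, so the annotations can be joined back by position.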
Problem #2
What does R consider the output value of a KEGG query to be? To solve
problem 1 I need some way to fill in the missing annotation when KEGG
returns nothing. Currently that entry is skipped, which makes the
output too short. How can I write an if/else statement (or something
similar) to fill in NA or a phrase like "no hit" when no annotation is
present?
I was thinking of something like the loop below, but I don't know
what "X" should be in the if statement.
KEGGID2=NULL
for (i in 1:length(testdata$cpdID)) {
KEGGID2[i]<-if((search.compounds.by.mass(testdata$mass[i],
testdata$floatmass[i]))==X)
{search.compounds.by.mass(testdata$mass[i], testdata$floatmass[i])}
else{NA}
}
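One possible way to handle the no-hit case, sketched under an assumption: that an empty query comes back as either NULL or a zero-length vector. What search.compounds.by.mass actually returns for no hits is not confirmed here. Testing `length(hits) == 0` covers both cases without comparing against a placeholder value. The helper name `na_if_empty` and the `results` list are hypothetical.

```r
# Hypothetical helper: map an empty or NULL result to a single NA,
# otherwise return the hits unchanged as a character vector.
na_if_empty <- function(hits) {
  if (length(hits) == 0) NA_character_ else as.character(hits)
}

# Stand-in results: one query with hits, one zero-length, one NULL.
results <- list(c("cpd:C00643", "cpd:C01017"), character(0), NULL)

# Every element of 'filled' now has at least one entry.
filled <- lapply(results, na_if_empty)
```

Replacing `NA_character_` with "no hit" would give the phrase used in the ideal table instead.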
#Ideal result, that I cannot achieve with my knowledge of R#
a11<-c(1, 129.0426,10, 4,
0.001290426,"cpd:C01877","cpd:C01879","cpd:C02237","cpd:C02238","cpd:C04281",
"cpd:C04282","blank","blank","blank","blank","blank")
b11<-c(2, 147.0532,15,11, 0.001617585,"cpd:C00025","cpd:C00217",
"cpd:C00302", "cpd:C00979", "cpd:C03618","cpd:C03790","cpd:C05574",
"cpd:C05938", "cpd:C05941", "cpd:C12269","blank")
c11<-c(3, 208.0848, 20,75, 0.015606360,"cpd:C00328", "cpd:C01484",
"cpd:C01718", "cpd:C02381", "cpd:C05610", "cpd:C05647", "cpd:C06487",
"cpd:C09816", "cpd:C11433", "cpd:C11690", "cpd:C15589")
d11<-c(4,220.0848,4,7,0.002200848,"cpd:C00643","cpd:C01017","cpd:C09985","blank","blank","blank","blank","blank","blank","blank","blank")
e11<-c(5,204.0899,7,21, 0.004285888,"cpd:C00078", "cpd:C00525",
"cpd:C00806", "cpd:C07839", "cpd:C10743", "cpd:C10968",
"cpd:C14916","blank","blank","blank","blank")
f11<-c(6,777.0317, 11, 55, 0.042736744,"no hit",
"blank","blank","blank","blank","blank","blank","blank","blank","blank","blank")
ideal<-rbind(a11,b11,c11,d11,e11,f11)
colnames(ideal)<-c("cpdID","mass","RT","ppmerror","floatmass","cpd1","cpd2","cpd3","cpd4","cpd5","cpd6","cpd7","cpd8","cpd9","cpd10","cpd11")
ideal
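A rough sketch of how a table shaped like `ideal` could be built without typing each row by hand, assuming the query results are already held in a list with one character vector per row of testdata. The `hits` list below is hypothetical sample data using a few IDs from the rows above, and NA is used in place of "blank" and "no hit". It relies on the fact that indexing past the end of a vector returns NA, which pads every row to the same width.

```r
# Small stand-in for testdata and for the per-row query results.
testdata <- data.frame(cpdID = 1:3, mass = c(129.0426, 220.0848, 777.0317))
hits <- list(
  c("cpd:C01877", "cpd:C01879", "cpd:C02237", "cpd:C02238"),
  c("cpd:C00643", "cpd:C01017", "cpd:C09985"),
  character(0)
)

# Width of the widest result (at least one column).
max_hits <- max(sapply(hits, length), 1)

# Indexing beyond a vector's length yields NA, so each row is padded.
padded <- do.call(rbind, lapply(hits, function(h) {
  as.character(h)[seq_len(max_hits)]
}))
colnames(padded) <- paste("cpd", seq_len(max_hits), sep = "")

# One row per compound, original columns first, then cpd1, cpd2, ...
wide <- data.frame(testdata, padded, stringsAsFactors = FALSE)
```

Keeping the original columns in a data frame rather than using rbind on mixed vectors also avoids turning the numeric columns into character strings.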
Thank you,
Alan Smith
University of Wisconsin-Madison