[BioC] PostForm() with KEGG
Ovokeraye Achinike-Oduaran
ovokeraye at gmail.com
Wed Feb 29 11:19:16 CET 2012
Hi Morgan,
Thanks. I think there may be a bug in getHTMLFormDescription(),
but I do understand what you've explained.
Thanks again.
-Avoks
On Tue, Feb 28, 2012 at 6:19 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 02/28/2012 06:14 AM, Ovokeraye Achinike-Oduaran wrote:
>>
>> Hi Duncan,
>>
>> My understanding is that xpathSApply() combines both getNodeSet()
>> and sapply(). I hope that this is a correct assumption. In
>> attempting to retrieve nodes in general from the pathway, I used both
>>
>> xpathSApply(doc, "//li/node()", xmlGetAttr, "href")
>> and
>> xpathSApply(doc, "//li/a/node()", xmlGetAttr, "href")
>>
>> and then I get nothing (NULL) back even though no visible error pops
>> up. Is something wrong with the way I'm using the path, or do I just not
>> yet grasp the whole XPath concept (I did read the online tutorial)?
>
>
> the NULL means that no nodes match your xpath query.
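
A minimal sketch of what an empty match looks like, using a made-up HTML fragment (the markup and the name `toy` are assumptions for illustration, not the actual KEGG result page). A query that matches nothing comes back empty without raising an error, while "//li/a" selects the link elements themselves, which is where the href attribute lives.

```r
library(XML)

# Hypothetical fragment standing in for the KEGG result page.
toy <- htmlParse(
  '<ul>
     <li><a href="/path/one" target="_map">first</a></li>
     <li><a href="/path/two">second</a></li>
   </ul>',
  asText = TRUE)

# No <b> elements sit under <li>, so nothing matches and no error is raised.
no_match <- getNodeSet(toy, "//li/b")
length(no_match)

# The <a> elements carry the href attribute, so query them directly.
hrefs <- xpathSApply(toy, "//li/a", xmlGetAttr, "href")
hrefs
```

Checking length() of the result before processing it is one simple way to notice that a path did not match anything.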
>
>
>>
>> Sorry to drag this on, but please help.
>
>
> I used Duncan's RHTMLForms suggestion
>
> library(RHTMLForms)
> url = "http://www.genome.jp/kegg/tool/map_pathway1.html"
> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
> ff = getHTMLFormDescription(url)
>
> fun = createFunction(ff[[1]])
> txt = fun(unclassified = "ko:K01803 cpd:C00111 cpd:C00118 K00134 C00236",
> target = "alias", .url = u)
>
> to retrieve the text and then
>
> library(XML)
> xml = htmlTreeParse(txt, asText=TRUE, useInternalNodes=TRUE)
>
> to parse to xml (maybe there is a more direct way, using the reader argument
> to createFunction?). If I experiment a little, I see for instance that
>
> getNodeSet(xml, "//li/a")
>
> returns the 'a' elements that are direct children of 'li' elements, and
>
> getNodeSet(xml, "//li/a[@target]")
>
> returns the subset of those elements that have a 'target' attribute. Finally
>
>> head(xpathSApply(xml, "//li/a[@target]", xmlValue))
> [1] "ko00010 Glycolysis / Gluconeogenesis"
> [2] "ko01100 Metabolic pathways"
> [3] "ko01110 Biosynthesis of secondary metabolites"
> [4] "ko01120 Microbial metabolism in diverse environments"
> [5] "ko00710 Carbon fixation in photosynthetic organisms"
> [6] "ko00562 Inositol phosphate metabolism"
>
> seems to be about what you want, or
>
>
> head(xpathSApply(xml, "//li/a/@href"))
> href
> "/kegg-bin/show_pathway?13304448561022/ko00010.args"
> href
> "javascript:display('ko00010')"
> href
> "/kegg-bin/show_pathway?13304448561022/ko01100.args"
> href
> "javascript:display('ko01100')"
> href
> "/kegg-bin/show_pathway?13304448561022/ko01110.args"
> href
> "javascript:display('ko01110')"
>
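
The attribute predicate and the attribute query above can be tried offline. This is a sketch on a made-up fragment that imitates the two kinds of links per list item seen in the output (the markup, the `toy` name, and the `javascript:display(1)` placeholder are assumptions, not KEGG's real page).

```r
library(XML)

# Hypothetical fragment: each <li> holds a pathway link and a javascript link.
toy <- htmlParse(
  '<ul>
     <li><a href="/kegg-bin/show_pathway?ko00010" target="_map">ko00010 Glycolysis</a>
         <a href="javascript:display(1)">show</a></li>
     <li><a href="/kegg-bin/show_pathway?ko01100" target="_map">ko01100 Metabolic pathways</a>
         <a href="javascript:display(2)">show</a></li>
   </ul>',
  asText = TRUE)

# Link text of only those <a> elements that have a target attribute.
labels <- xpathSApply(toy, "//li/a[@target]", xmlValue)

# All href values, then drop the javascript: pseudo-links.
hrefs <- as.character(unlist(xpathSApply(toy, "//li/a/@href")))
page_links <- hrefs[!grepl("^javascript:", hrefs)]
```

Filtering on the href prefix is only one option; restricting the path itself to "//li/a[@target]/@href" should select the same links in this toy fragment.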
> Maybe the KEGGSOAP package already does what you're interested in? The web
> scraping you're doing is going to break as soon as the web site tweaks its
> presentation.
>
> Or maybe
>
>> library(org.Hs.eg.db)
>> head(toTable(revmap(org.Hs.egPATH)[c("00232", "04142")]))
>   gene_id path_id
> 1       9   00232
> 2      10   00232
> 3      20   04142
> 4      53   04142
> 5      54   04142
> 6     162   04142
>
> (The KEGG information in the org.* and KEGG packages dates to the last free
> public release, and so is starting to be dated.)
>
> Martin
>
>
>>
>> Thanks.
>>
>> Avoks
>>
>> On Mon, Feb 27, 2012 at 4:09 PM, Ovokeraye Achinike-Oduaran
>> <ovokeraye at gmail.com> wrote:
>>>
>>> Thank you so very much, Duncan. I will go get myself enlightened:).
>>> Thanks again.
>>>
>>> Avoks
>>>
>>> On Mon, Feb 27, 2012 at 3:50 PM, Duncan Temple Lang
>>> <duncan at wald.ucdavis.edu> wrote:
>>>>
>>>>
>>>> Use
>>>>
>>>> target = "alias"
>>>>
>>>> in the call.
>>>>
>>>> If you don't know how to map form elements to parameters in the request,
>>>> you can either read a tutorial on HTML forms or use the RHTMLForms
>>>> package, which you have loaded according to your search path, e.g.
>>>>
>>>> # read the form and then turn the information into an R function.
>>>> ff = getHTMLFormDescription("http://www.genome.jp/kegg/tool/map_pathway1.html")
>>>> fun = createFunction(ff[[1]])
>>>>
>>>> # Since the action in the form is javascript, we'll provide the
>>>> # URL manually.
>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>> out = fun(unclassified = "ko:K01803 cpd:C00111 cpd:C00118 K00134 C00236",
>>>>           target = "alias", .url = u)
>>>>
>>>> The benefits of RHTMLForms include using the same defaults
>>>> as the form on the Web page, adding hidden parameters, and identifying
>>>> the names of the parameters.
>>>>
>>>> D
>>>>
>>>>
>>>> On 2/27/12 3:08 AM, Ovokeraye Achinike-Oduaran wrote:
>>>>>
>>>>> Hi Duncan,
>>>>>
>>>>> I noticed that with the script as is, it doesn't take into
>>>>> consideration the "include alias" checkbox. I tried modifying the
>>>>> script to force include that option but it still did not work. Any
>>>>> ideas?
>>>>>
>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>> data = postForm(u,
>>>>>     .params = list(org_name = "hsadd",
>>>>>         unclassified = paste(readLines(file.choose()), collapse = "\n"),
>>>>>         file = "", checkbox = "alias", submit = "Exec"))
>>>>>
>>>>>
>>>>> Thanks again.
>>>>>
>>>>> Avoks
>>>>>
>>>>>
>>>>> On Mon, Feb 27, 2012 at 10:24 AM, Ovokeraye Achinike-Oduaran
>>>>> <ovokeraye at gmail.com> wrote:
>>>>>>
>>>>>> Hi Duncan,
>>>>>>
>>>>>> Thanks a bunch.
>>>>>>
>>>>>> -Avoks
>>>>>>
>>>>>> On Fri, Feb 24, 2012 at 11:09 PM, Duncan Temple Lang
>>>>>> <duncan at wald.ucdavis.edu> wrote:
>>>>>>>
>>>>>>> Hi Avoks
>>>>>>>
>>>>>>> While the form is provided by KEGG and so bio-related,
>>>>>>> you might have been better off posting this to the more general
>>>>>>> r-help mailing list.
>>>>>>>
>>>>>>>
>>>>>>> You are posting the HTTP request to the wrong URL. That is the URL
>>>>>>> of the Web page that displays the form, not the URL that processes
>>>>>>> the input from the form.
>>>>>>> You have to look at the JavaScript that is referenced in the action
>>>>>>> attribute of the HTML form element.
>>>>>>>
>>>>>>> The second issue is that you are submitting the name of a local file.
>>>>>>> This won't work as is. You either need to indicate that this is the
>>>>>>> name of a file rather than the contents to send, or else send the
>>>>>>> contents. In this form, you can send the contents via the
>>>>>>> unclassified parameter.
>>>>>>>
>>>>>>>
>>>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>>>> data = postForm(u,
>>>>>>> .params = list(org_name = "hsadd",
>>>>>>> unclassified = "hsa:7167 hsa:GPI cpd:C00118\nALDOA 1.2.1.12 C00236",
>>>>>>> file = "", submit = "Exec"))
>>>>>>>
>>>>>>>
>>>>>>> If your input is in a file, you can use
>>>>>>>
>>>>>>> unclassified = paste(readLines(file.choose()), collapse = "\n")
>>>>>>>
>>>>>>> as the value for the unclassified parameter.
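
The file-to-parameter step can be checked on its own before sending anything. A minimal sketch, assuming a temporary file with made-up identifiers in place of the interactive file.choose() call (`input_file` and `params` are illustrative names); nothing is posted to KEGG here.

```r
# Hypothetical input file: one identifier per line (contents are made up).
input_file <- tempfile(fileext = ".txt")
writeLines(c("hsa:7167", "hsa:GPI", "cpd:C00118"), input_file)

# Collapse the lines into a single newline-separated string.
unclassified <- paste(readLines(input_file), collapse = "\n")

# Named list in the shape passed as .params to postForm(); not sent here.
params <- list(org_name = "hsadd", unclassified = unclassified,
               file = "", submit = "Exec")
str(params)
```

Inspecting the list with str() before the request makes it easy to confirm that the file contents, not the file name, are what will be submitted.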
>>>>>>>
>>>>>>>
>>>>>>> There are additional parameters that the form accepts that may be
>>>>>>> relevant for your search.
>>>>>>>
>>>>>>>
>>>>>>> As for processing the results, you will want to use
>>>>>>>
>>>>>>> doc = htmlParse(data, asText = TRUE)
>>>>>>>
>>>>>>> and then use getNodeSet()/xpathSApply() or direct tree extraction to
>>>>>>> access the nodes you want, e.g.
>>>>>>>
>>>>>>> xpathSApply(doc, "//li/a", xmlGetAttr, "href")
>>>>>>>
>>>>>>>
>>>>>>> D.
>>>>>>>
>>>>>>>
>>>>>>> On 2/24/12 6:09 AM, Ovokeraye Achinike-Oduaran wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am trying to use postForm() with the KEGG website but I am stuck on
>>>>>>>> how to get my results. Is it possible (code below) or am I using
>>>>>>>> postForm() wrongly? The code appears to run but I'm not quite sure how
>>>>>>>> to read the results assuming there are any. Please help.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Avoks
>>>>>>>> ____
>>>>>>>>
>>>>>>>> data = postForm("http://www.genome.jp/kegg/tool/map_pathway1.html",
>>>>>>>> org_name = "hsadd",
>>>>>>>> file = file.choose(),
>>>>>>>> submit = "Exec")
>>>>>>>>
>>>>>>>>> sessionInfo()
>>>>>>>>
>>>>>>>> R version 2.14.1 (2011-12-22)
>>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>>>
>>>>>>>> locale:
>>>>>>>> [1] LC_COLLATE=English_xxx.1252 LC_CTYPE=English_xxx.1252
>>>>>>>> [3] LC_MONETARY=English_xxx.1252 LC_NUMERIC=C
>>>>>>>> [5] LC_TIME=English_xxx.1252
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>>>
>>>>>>>> other attached packages:
>>>>>>>> [1] RHTMLForms_0.5-1 XML_3.9-4.1 RCurl_1.91-1.1 bitops_1.0-4.1
>>>>>>>>
>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>> [1] tools_2.14.1
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>>
>>
>>
>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793