[BioC] PostForm and readHTML Table

Martin Morgan mtmorgan at fhcrc.org
Thu Oct 27 07:20:26 CEST 2011


On 10/17/2011 08:47 AM, Ovokeraye Achinike-Oduaran wrote:
> Hi all,
>
> I ran a query using postForm(), got my results that are supposed to
> have headers. But when I use header=TRUE in the readHTMLTable function
> I get an error. Without specifying any parameter for the header, I get
> the result but it's very hard to read. Is it possible to get this in a
> readable table form with headers? Any help will be greatly
> appreciated.
>
> Thanks.
>
> -Avoks
>
>> data = postForm("http://www.genome.gov/GWAStudies/",
> + multidisease = c("Fasting glucose-related traits"),
> + submit = "Search")
>
>> tbl = readHTMLTable(htmlParse(data, asText = TRUE), which = 5, header = TRUE)
> Error in seq.default(length = max(numEls)) :
>    length must be non-negative number
> In addition: Warning message:
> In max(numEls) : no non-missing arguments to max; returning -Inf

I think you are after table 6, but you will still have problems with
screen scraping. It may be more straightforward to download the
tab-delimited file of the entire database, offered on that page.
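
As a rough sketch of the tab-delimited route, something like the following
could work. Everything here is illustrative: the temporary file stands in for
the downloaded catalog (its real name and column layout are assumptions), and
the three rows are values picked out, as best they can be separated, from the
scraped output quoted below.

```r
## Sketch only: a three-row stand-in for the downloaded catalog file.
## Column names and values are copied from the scraped output in this
## thread; the real file's name and layout are assumptions.
catalog_file <- tempfile(fileext = ".txt")   # hypothetical download location
writeLines(c(
  "Disease/Trait\tRegion\tReported Gene(s)\tStrongest SNP-Risk Allele",
  "Fasting glucose-related traits\t11q14.3\tMTNR1B\trs10830963-G",
  "Fasting glucose-related traits\t2q31.1\tG6PC2\trs560887-C",
  "Fasting glucose-related traits\t7p13\tGCK\trs4607517-A"
), catalog_file)

## read.delim() splits on tabs and takes the first line as the header;
## check.names = FALSE keeps names such as "Disease/Trait" unchanged.
gwas <- read.delim(catalog_file, check.names = FALSE,
                   stringsAsFactors = FALSE, quote = "")

## Select the trait of interest, then the columns to look at.
hits <- gwas[gwas[["Disease/Trait"]] == "Fasting glucose-related traits", ]
hits[, c("Region", "Reported Gene(s)", "Strongest SNP-Risk Allele")]
```

Each record then arrives as its own row with a proper header, instead of one
cell holding every value run together.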

Martin

>
>> tbl = readHTMLTable(htmlParse(data, asText = TRUE), which = 5)
>> tbl
>   V1
> 1 Date Added to Catalog (since 11/25/08)\r\n\r\n        First
> Author/Date/ Journal/Study\r\n\t\t\r\n
> Disease/Trait\r\n\t\t\r\n        InitialSample Size\r\n\t\t\r\n
> Replication Sample Size\r\n\r\n        Region\r\n\t\t\r\n
> Reported Gene(s)\r\n        Mapped Gene(s)\r\n        Strongest
> SNP-Risk Allele\r\n        Context\r\n\t\t\r\n        Risk Allele
> Frequency in Controls\r\n        P-value\r\n\t\t\r\n        OR or
> beta-coefficient and [95% CI]\r\n\r\n        Platform[SNPs passing
> QC]\r\n        CNV\r\n\t\t\r\n\t\t\r\n\t  \r\n\t\t\r\n\t\t02/28/10\r\n
>         \r\n\t\t\tDupuis JJanuary 17, 2010Nat GenetNew genetic loci
> implicated in fasting glucose homeostasis and their impact on type 2
> diabetes risk.\r\n\t\t\t\r\n        Fasting glucose-related traits\r\n
>         up to 46,186 European descent individuals\r\n        up to
> 76,558 European ancestry individuals\r\n\r\n
> 11q14.32q31.17p132q31.17p21.211q14.32p23.32p23.33q21.12p23.311p11.27p21.27p1310q25.211q12.211p11.29p24.211q12.23q26.29p24.23q21.11q32.38q24.1112q23.210q25.212q23.215q22.210q25.21q32.33q26.2\r\n
>         MTNR1BG6PC2GCKG6PC2DGKB,
> TMEM195MTNR1BGCKRGCKRADCY5GCKRMADDDGKB,
> TMEM195GCKADRA2AFADS1CRY2GLIS3FADS1SLC2A2GLIS3ADCY5PROX1SLC30A8IGF1TCF7L2IGF1C2CD4BADRA2APROX1SLC2A2\r\n
>         \r\n\t\tMTNR1BG6PC2GCK - YKT6G6PC2EEF1A1P26 -
> TMEM195MTNR1BGCKRGCKRADCY5GCKRMADDEEF1A1P26 - TMEM195GCK - YKT6ADRA2A
> - RPS6P15FADS1CRY2GLIS3FADS1SLC2A2GLIS3ADCY5RPL31P13 -
> PROX1SLC30A8IGF1TCF7L2IGF1C2CD4A - C2CD4BADRA2A - RPS6P15RPL31P13 -
> PROX1SLC2A2\r\n
> rs10830963-Grs560887-Crs4607517-Ars560887-Crs2191349-Trs10830963-Grs780094-Crs780094-Crs11708067-Ars780094-Crs7944584-Ars2191349-Trs4607517-Ars10885122-Grs174550-Trs11605924-Ars7034200-Ars174550-Trs11920090-Trs7034200-Ars11708067-Ars340874-Crs11558471-Ars35767-Grs4506565-Trs35767-Grs11071657-Ars10885122-Grs340874-Crs11920090-T\r\n\t\tintronintronintergenicintronintergenicintronintronintronintronintronintronintergenicintergenicintergenicintronintronintronintronintronintronintronintergenicUTR-3nearGene-5intronnearGene-5intergenicintergenicintergenicintron\r\n
>         \r\n
> \r\n\t\t0.300.700.160.700.520.300.620.620.780.620.750.520.160.870.640.490.490.640.870.490.780.520.310.850.310.850.630.870.520.87\r\n
>         6 x 10-175 (FPG)9 x 10-218 (FPG)7 x 10-92 (FPG)2 x 10-66
> (HOMA-B)3 x 10-44 (FPG)3 x 10-43 (HOMA-B)6 x 10-38 (FPG)3 x 10-24
> (HOMA-IR)7 x 10-22 (FPG)4 x 10-20 (FI)2 x 10-18 (FPG)3 x 10-17
> (HOMA-B)2 x 10-16 (HOMA-B)3 x 10-16 (FPG)2 x 10-15 (FPG)1 x 10-14
> (FPG)1 x 10-13 (HOMA-B)5 x 10-13 (HOMA-B)8 x 10-13 (FPG)1 x 10-12
> (FPG)3 x 10-12 (HOMA-B)7 x 10-12 (FPG)3 x 10-11 (FPG)2 x 10-9
> (HOMA-IR)1 x 10-8 (FPG)3 x 10-8 (FI)4 x 10-8 (FPG)2 x 10-6 (HOMA-B)5 x
> 10-6 (HOMA-B)5 x 10-6 (HOMA-B)\r\n        \r\n
> NRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNR\r\n
>   \r\n\r\n        Affymetrix & Illumina [~2.5 million]
> (imputed)\r\n\t\tN
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_xxx  LC_CTYPE=English_xxx
> [3] LC_MONETARY=English_xxx LC_NUMERIC=C
> [5] LC_TIME=English_xxx
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] RHTMLForms_0.5-1 XML_3.4-2.2      RCurl_1.6-10.1   bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.2
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


