[BioC] How do I parse HTML table using RCurl?

James F. Reid james.reid at ifom-ieo-campus.it
Mon Mar 14 23:15:45 CET 2011


Hi Ruppert,

The TargetScan database for human and mouse is already available in 
Bioconductor as AnnotationDbi annotation packages 
(targetscan.Hs.eg.db and targetscan.Mm.eg.db). miRBase is available as 
well, but without any target predictions. As others have pointed out on 
the mailing list, I would not recommend parsing the HTML of a query 
result, because the page format is likely to change over time. It is 
better to download the database and re-format it.
If you are interested in providing other miRNA target prediction 
resources to the community, I would be willing to help.
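
As a rough sketch of the download-and-re-format route, assuming the 
downloaded file is tab-delimited with one row per predicted miRNA 
family/gene pair: the column names, the toy rows and the helper 
targets_for() below are made up for illustration (only the COL4A3 / 
Entrez Gene 1285 pairing comes from your example), so check them against 
the real file before relying on this.

```r
# Toy stand-in for a downloaded, tab-delimited target table.
# Column names and rows are illustrative assumptions; only the
# COL4A3 / Entrez Gene 1285 pairing is taken from this thread.
toy <- paste(
  "miR.Family\tGene.ID\tGene.Symbol",
  "miR-1\t1285\tCOL4A3",
  "miR-1\t0\tGENE_B",
  "miR-X\t0\tGENE_C",
  "miR-1\t1285\tCOL4A3",
  sep = "\n"
)

# For a real download, pass the file path instead of `text = toy`.
tab <- read.delim(text = toy, stringsAsFactors = FALSE)

# Hypothetical helper: unique gene symbols predicted for one miRNA family.
targets_for <- function(tab, family) {
  unique(tab$Gene.Symbol[tab$miR.Family == family])
}

genes <- targets_for(tab, "miR-1")
print(genes)
```

Working from a local copy keeps the script independent of the web page 
layout, and the result is already a plain character vector of gene 
symbols that you can pass on to other functions.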

Best,
J.


On 03/14/2011 09:18 PM, Ruppert Valentino wrote:
>
>
> Hello,
>
> I am trying to write a script that takes a miRNA as input and returns the predicted target genes for that miRNA. I am trying to use various tools to do this, one of which is TargetScan. The problem is that I don't know how to parse the HTML output table so that I get only the target genes.
>
> For example, I am searching for target genes of the miRNA mmu-miR-1 as follows:
>
> http://www.targetscan.org/cgi-bin/targetscan/vert_50/targetscan.cgi?species=Human&gid=&mir_sc=&mir_c=&mir_nc=&mirg=mmu-miR-1
>
> This generates a table
>
>
>
> The script is:
>
> URL <- "http://www.targetscan.org/cgi-bin/targetscan/vert_50/targetscan.cgi?species=Human&gid=&mir_sc=&mir_c=&mir_nc=&mirg=mmu-miR-1"
> dat <- readLines(URL)
>
>
> But I don't know how to parse the table and separate it into columns, so that I can take the column entitled "Human ortholog of target gene", which contains the target genes.
>
>
> In the example above the first gene COL4A3 starts at HTML code:
>
> <td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=1285" target=new>COL4A3
>
>
>
> Is there any way to format such a table into columns then transpose the column entitled "Human ortholog of target gene" and pass that to a variable?
>
>
> Many thanks,
>
>


