[R] scan html: sep = "<td>"

Mon Apr 4 17:30:01 CEST 2005

Christoph Lehmann wrote:

> entry from html:
> 
>   <tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 
> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>
>   <tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 
> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>
> 
> 
> 
>  using
> left.data<- scan(paste(path, left.file, sep = ""), what = 'character',
>                sep=c("<td>", "</td>"))
> 
> 
> yields
> 
>  > left.data
>  [1] "  "                  "tr bgcolor=#9090f0>" "td align=right>"
>  [4] "b>BM"                "/b>"                 "/td>"
>  [7] "td> 0.952"           "/td>"                "td> 0.136"
> [10] "/td>"                "td> 6.984"           "/td>"
> [13] "td>0.000000"         "/td>"                "/tr>"
> [16] "  "                  "tr bgcolor=#9090f0>" "td align=right>"
> [19] "b>BH"                "/b>"                 "/td>"
> [22] "td> 1.338"           "/td>"                "td> 0.136"
> [25] "/td>"                "td> 9.821"           "/td>"
> [28] "td>0.000000"         "/td>"                "/tr>"
> 
> why doesn't it detect the whole '<tr>' as sep?
> 
> 
> Uwe Ligges wrote:
> 
>> Christoph Lehmann wrote:
>>
>>> Hi
>>> I try to import html text and I need to split the fields at each <td> 
>>> or </td> entry
>>>
>>> How can I succeed? sep = '<td>' doesn't yield the right result
>>
>>
>> If it fits pairwise together, use
>>   sep=c("<td>", "</td>")

Apologies, one should not send untested code.
"sep" must be a single character, not a string containing more than one 
character.

So you may want to try out my second suggestion.
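
A minimal, untested-against-your-file sketch of that approach (read the
lines, then strsplit on both patterns). It assumes each <tr> row sits on
one line of the file; the two sample rows are copied from the post above,
and the names html, pieces, clean, tab and values are made up for
illustration. In practice html would come from
readLines(paste(path, left.file, sep = "")).

```r
# Two sample rows taken from the original post (stand-in for readLines()).
html <- c(
  '  <tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>',
  '  <tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>'
)

# Split every line at each opening or closing td tag; the regular
# expression also covers td tags that carry attributes.
pieces <- strsplit(html, "</?td[^>]*>")

# Strip the remaining tags (<tr ...>, <b>, </tr>), trim whitespace,
# and drop the empty strings left between adjacent tags.
clean <- lapply(pieces, function(x) {
  x <- gsub("<[^>]+>", "", x)
  x <- trimws(x)
  x[nzchar(x)]
})

# One row per <tr>: first column is the label, the rest are numbers.
tab <- do.call(rbind, clean)
values <- matrix(as.numeric(tab[, -1]), nrow = nrow(tab),
                 dimnames = list(tab[, 1], NULL))
```

Unlike "sep" in scan, the split argument of strsplit is a regular
expression, so a multi-character pattern is fine there. If the rows in the
real file are wrapped over several lines, they would need to be pasted
together first.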

Uwe Ligges

>> if not, you can read the whole lot with readLines and then strsplit 
>> on both patterns, for example.
>>
>> Uwe Ligges
>>
>>
>>
>>> thanks for hints