[R] scan html: sep = "<td>"
Uwe Ligges
ligges at statistik.uni-dortmund.de
Mon Apr 4 17:30:01 CEST 2005
Christoph Lehmann wrote:
> entry from html:
>
> <tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td>
> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>
> <tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td>
> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>
>
>
>
> using
> left.data<- scan(paste(path, left.file, sep = ""), what = 'character',
> sep=c("<td>", "</td>"))
>
>
> yields
>
> > left.data
> [1] " " "tr bgcolor=#9090f0>" "td align=right>"
> [4] "b>BM" "/b>" "/td>"
> [7] "td> 0.952" "/td>" "td> 0.136"
> [10] "/td>" "td> 6.984" "/td>"
> [13] "td>0.000000" "/td>" "/tr>"
> [16] " " "tr bgcolor=#9090f0>" "td align=right>"
> [19] "b>BH" "/b>" "/td>"
> [22] "td> 1.338" "/td>" "td> 0.136"
> [25] "/td>" "td> 9.821" "/td>"
> [28] "td>0.000000" "/td>" "/tr>"
>
> Why doesn't it detect the whole '<tr>' as sep?
>
>
> Uwe Ligges wrote:
>
>> Christoph Lehmann wrote:
>>
>>> Hi
>>> I am trying to import html text and I need to split the fields at
>>> each <td> or </td> entry
>>>
>>> How can I succeed? sep = '<td>' doesn't yield the right result
>>
>>
>> If it fits pairwise together, use
>> sep=c("<td>", "</td>")
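Apologies, one should not send untested code.
"sep" must be a single character rather than a string containing more than
one character. scan() apparently used only the first character, "<", which
is why your output is split at every "<" and the rest of each tag is left
in the fields.
So you may want to try my second suggestion (readLines followed by
strsplit).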
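
A minimal sketch of that second suggestion, not tested against your actual
file: it assumes each table row sits on one line, and it hard-codes the two
rows from your post in place of readLines(paste(path, left.file, sep = "")).
The names rows, clean_row, cells, labels and values are only illustrative.

```r
# Two rows copied from the post; with a real file one would use, e.g.,
# rows <- readLines(paste(path, left.file, sep = ""))   # assumption
rows <- c(
  '<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>',
  '<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>'
)

# strsplit() takes a regular expression, so one pattern covers both
# <td ...> (with or without attributes) and </td>.
parts <- strsplit(rows, "</?td[^>]*>")

# Hypothetical helper: drop leftover tags (<tr ...>, <b>, </b>, </tr>),
# trim whitespace and discard the empty pieces between </td><td>.
clean_row <- function(x) {
  x <- gsub("<[^>]*>", "", x)
  x <- trimws(x)
  x[nzchar(x)]
}
cells <- lapply(parts, clean_row)

# First cell of each row is the label, the rest are numbers.
labels <- sapply(cells, `[`, 1)
values <- t(sapply(cells, function(x) as.numeric(x[-1])))
rownames(values) <- labels
print(values)
```

Splitting with a regular expression avoids the single-character limit of
"sep" in scan(); for anything beyond simple tables like this one, a real
HTML parser would be the safer choice.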
Uwe Ligges
>> If not, you can read the whole lot with readLines and then use strsplit
>> for both patterns, for example.
>>
>> Uwe Ligges
>>
>>
>>
>>> thanks for hints
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>
>>
>>