[R] import file formatted RFC-822
Barry Rowlingson
b.rowlingson at lancaster.ac.uk
Tue Apr 13 19:54:26 CEST 2010
On Tue, Apr 13, 2010 at 6:26 PM, Sebastian Kruk <residuo.solow at gmail.com> wrote:
> Dear R-list users:
>
> I would like to import a database of web robots,
> http://www.robotstxt.org/db/all.txt. It is formatted as RFC-822. How can
> I do it?
RFC-822 looks very much like the format of R's package DESCRIPTION files,
which are read in using read.dcf because they conform to the 'Debian
Control File' format. So I tried read.dcf on it:
> robots = read.dcf("all.txt")
> dim(robots)
[1] 298 38
So that's a matrix with 298 rows (one per record) and 38 columns (one per field):
> dimnames(robots)
[[1]]
NULL

[[2]]
 [1] "robot-id"                  "robot-name"
 [3] "robot-cover-url"           "robot-details-url"
 [5] "robot-owner-name"          "robot-owner-url"
 [7] "robot-owner-email"         "robot-status"
 [9] "robot-purpose"             "robot-type"
[11] "robot-platform"            "robot-availability"
[13] "robot-exclusion"           "robot-exclusion-useragent"
[15] "robot-noindex"             "robot-host"
[17] "robot-from"                "robot-useragent"
[19] "robot-language"            "robot-description"
[21] "robot-history"             "robot-environment"
[23] "modified-date"             "modified-by"
[25] "robot-nofollow"            "robot-owner-name2"
[27] "robot-owner-url2"          "robot-owner-email2"
[29] "robot-owner-name3"         "robot-owner-name4"
[31] "robot-environment1"        "robot-environment2"
[33] "robot-purpose1"            "robot-purpose2"
[35] "robot-purpose3"            "robot-platform1"
[37] "robot-description1"        "robot-description2"
read.dcf pads out the columns so that every row has every possible
field, with NA wherever a field doesn't exist in the record for that robot.
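
If you want to check the padding without downloading the real file, here is
a minimal sketch. The two records, their values, and the object names
(sample_text, tmp, robots_df, subset_fields) are made up for illustration and
are not taken from all.txt; only read.dcf and its fields argument are real.
It writes a tiny file in the same "field: value" layout, reads it back, and
shows the NA for the field that the second record lacks.

```r
# Two hypothetical records, separated by a blank line.
# The second record deliberately has no robot-status field.
sample_text <- c(
  "robot-id: alpha",
  "robot-name: Alpha Crawler",
  "robot-status: active",
  "",
  "robot-id: beta",
  "robot-name: Beta Spider"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_text, tmp)

# One row per record, one column per field seen anywhere in the file.
robots <- read.dcf(tmp)
dim(robots)                        # 2 rows, 3 columns

# The missing field in the second record comes back as NA.
is.na(robots[2, "robot-status"])   # TRUE

# Only some fields are needed? Ask for them by name.
subset_fields <- read.dcf(tmp, fields = c("robot-id", "robot-status"))

# A data frame is often easier to work with than a character matrix.
robots_df <- as.data.frame(robots, stringsAsFactors = FALSE)

unlink(tmp)
```

The same calls should apply to all.txt once it is saved locally, although I
have only tried the sketch on the toy records above.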
Sorted?
Barry