[R] import file formatted RFC-822

Sebastian Kruk residuo.solow at gmail.com
Wed Apr 14 19:20:39 CEST 2010


I have a problem: in a few cases "robot-exclusion-useragent" has two or
more values. Is there a way to handle that? For example, the robot
askjeeves has three names.
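
One possible approach, sketched under assumptions: if the extra names appear as repeated "robot-exclusion-useragent:" lines within a single record, read.dcf's documented all = TRUE argument gathers every occurrence instead of keeping only one. The sample records below are hypothetical stand-ins in the same "Field: value" layout; the real all.txt may lay out multiple values differently (for instance inside one string, in which case strsplit would be the tool instead).

```r
# Hypothetical two-record sample; not copied from the real all.txt.
sample_dcf <- c(
  "robot-id: askjeeves",
  "robot-exclusion-useragent: Teoma",
  "robot-exclusion-useragent: Ask Jeeves",
  "robot-exclusion-useragent: Jeeves",
  "",
  "robot-id: examplebot",
  "robot-exclusion-useragent: examplebot"
)

con <- textConnection(sample_dcf)
# all = TRUE returns a data frame; fields repeated within a record
# become list columns holding character vectors.
robots_all <- read.dcf(con, all = TRUE)
close(con)

agents <- robots_all[["robot-exclusion-useragent"]]

# Optionally collapse each record's values into one string.
agents_flat <- vapply(agents, paste, character(1), collapse = "; ")
```

Here agents[[1]] holds all three names for the first record, while agents_flat gives one string per robot if a plain character column is easier to work with.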

2010/4/13 Barry Rowlingson <b.rowlingson at lancaster.ac.uk>:
> On Tue, Apr 13, 2010 at 6:26 PM, Sebastian Kruk <residuo.solow at gmail.com> wrote:
>> Dear R-list users:
>>
>> I would like to import a database of web robots,
>> http://www.robotstxt.org/db/all.txt. It's formatted as RFC-822. How can
>> I do it?
>
>  RFC822 looks very much like R's package DESCRIPTION files, which
> are read in using read.dcf because they conform to the 'Debian
> Control File' format. So I tried read.dcf on it:
>
>  > robots = read.dcf("all.txt")
>  > dim(robots)
>  [1] 298  38
>
>  so that's a matrix:
>
>  > dimnames(robots)
> [[1]]
> NULL
>
> [[2]]
>  [1] "robot-id"                  "robot-name"
>  [3] "robot-cover-url"           "robot-details-url"
>  [5] "robot-owner-name"          "robot-owner-url"
>  [7] "robot-owner-email"         "robot-status"
>  [9] "robot-purpose"             "robot-type"
> [11] "robot-platform"            "robot-availability"
> [13] "robot-exclusion"           "robot-exclusion-useragent"
> [15] "robot-noindex"             "robot-host"
> [17] "robot-from"                "robot-useragent"
> [19] "robot-language"            "robot-description"
> [21] "robot-history"             "robot-environment"
> [23] "modified-date"             "modified-by"
> [25] "robot-nofollow"            "robot-owner-name2"
> [27] "robot-owner-url2"          "robot-owner-email2"
> [29] "robot-owner-name3"         "robot-owner-name4"
> [31] "robot-environment1"        "robot-environment2"
> [33] "robot-purpose1"            "robot-purpose2"
> [35] "robot-purpose3"            "robot-platform1"
> [37] "robot-description1"        "robot-description2"
>
>  and I guess it pads out the columns so every row has every possible
> variable value even if it doesn't exist in the record for that robot.
>
>  Sorted?
>
> Barry
>
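
The padding Barry guesses at can be checked on a small sample. This is a minimal sketch with hypothetical records (not taken from all.txt), relying on read.dcf's documented behaviour that a field absent from a record comes back as NA.

```r
# Hypothetical sample: the second record has no "robot-status" field.
sample_dcf <- c(
  "robot-id: alpha",
  "robot-status: active",
  "",
  "robot-id: beta"
)

con <- textConnection(sample_dcf)
# Default all = FALSE returns a character matrix,
# one row per record and one column per field seen anywhere.
robots_small <- read.dcf(con)
close(con)

dim(robots_small)

# TRUE marks records in which the field was missing.
missing_status <- is.na(robots_small[, "robot-status"])
```

On the full database, the same is.na() check on a column such as "robot-nofollow" would show which robots lack that field.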


