[R] IP-Address
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Thu Jun 4 14:46:29 CEST 2009
edwin7 at web.de wrote:
> Hi,
>
>
> Unfortunately, they can't handle NA. Any suggestions? Some rows don't
> have an IP address, which causes an error or a wrong result.
A quick fix could be to substitute "..." or "0.0.0.0" for the "NA"
entries. (Use something like
ipch <- as.character(df$ip)
ipch[is.na(df$ip)] <- "..."
connection <- textConnection(ipch)
)
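To make the idea concrete, here is a minimal sketch on a made-up three-row data frame (the column names and addresses are only for illustration). It assumes the read.table approach from the quoted benchmark; the "..." placeholder splits into four empty fields, which are read as NA when the columns are integer, and order() then puts those rows last by default.

```r
# Hypothetical toy data: one missing address among three.
df <- data.frame(id = 1:3,
                 ip = c("10.0.0.2", NA, "9.255.1.1"),
                 stringsAsFactors = FALSE)

ipch <- as.character(df$ip)
ipch[is.na(ipch)] <- "..."           # four empty fields on that line

connection <- textConnection(ipch)
octets <- read.table(connection, sep = ".",
                     colClasses = rep("integer", 4))
close(connection)

o <- do.call(order, octets)          # empty fields become NA and sort last
df[o, ]
```

Substituting "0.0.0.0" instead should sort the rows without an address first rather than last.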
>
> Eddie
>
>
>
>> library(gsubfn)
>> library(gtools)
>> library(rbenchmark)
>>
>> n <- 10000
>> df <- data.frame(
>> a = rnorm(n),
>> b = rnorm(n),
>> c = rnorm(n),
>> ip = replicate(n, paste(sample(255, 4), collapse='.'), simplify=TRUE)
>> )
>>
>> res <- benchmark(columns=c('test', 'elapsed'), replications=10, order=NULL,
>> peda = {
>> connection <- textConnection(as.character(df$ip))
>> o <- do.call(order, read.table(connection, sep='.'))
>> close(connection)
>> df[o, ]
>> },
>>
>> peda2 = {
>> connection <- textConnection(as.character(df$ip))
>> dfT <- read.table(connection, sep='.', colClasses=rep("integer",
>> 4), quote="", na.strings=NULL, blank.lines.skip=FALSE)
>> close(connection)
>> o <- do.call(order, dfT)
>> df[o, ]
>> },
>>
>> hb = {
>> ip <- strsplit(as.character(df$ip), split=".", fixed=TRUE)
>> ip <- unlist(ip, use.names=FALSE)
>> ip <- as.integer(ip)
>> dim(ip) <- c(4, nrow(df))
>> ip <- 256^3*ip[1,] + 256^2*ip[2,] + 256*ip[3,] + ip[4,]
>> o <- order(ip)
>> df[o, ]
>> },
>>
>> hb2 = {
>> ip <- strsplit(as.character(df$ip), split=".", fixed=TRUE)
>> ip <- unlist(ip, use.names=FALSE)
>> ip <- as.integer(ip)
>> dim(ip) <- c(4, nrow(df))
>> o <- sort.list(ip[4,], method="radix", na.last=TRUE)
>> for (kk in 3:1) {
>> o <- o[sort.list(ip[kk,o], method="radix", na.last=TRUE)]
>> }
>> df[o, ]
>> }
>> )
>>
>> print(res)
>>
>>    test elapsed
>> 1  peda    4.12
>> 2 peda2    4.08
>> 3    hb    0.28
>> 4   hb2    0.25
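The faster strsplit-based variants need a slightly different treatment of missing addresses. The following is only a sketch on made-up data, assuming the "hb" approach quoted above: it uses "0.0.0.0" as a temporary placeholder (as far as I can tell, "..." would not do here, since strsplit drops the trailing empty field and leaves three pieces instead of four) and then sets the key back to NA so that order() puts those rows last.

```r
# Hypothetical toy data: the second address is missing.
ip_raw <- c("10.0.0.2", NA, "9.255.1.1", "10.0.0.1")

ipch <- ip_raw
ipch[is.na(ipch)] <- "0.0.0.0"        # temporary placeholder with four fields

parts <- strsplit(ipch, split = ".", fixed = TRUE)
ip <- as.integer(unlist(parts, use.names = FALSE))
dim(ip) <- c(4, length(ipch))          # one column per address

# Collapse the four octets into a single numeric key, as in "hb".
key <- 256^3 * ip[1, ] + 256^2 * ip[2, ] + 256 * ip[3, ] + ip[4, ]
key[is.na(ip_raw)] <- NA               # restore missingness; NA sorts last

o <- order(key)
ip_raw[o]
```

Leaving out the line that resets the key to NA would instead sort the rows without an address first.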
>>
>>
>> On Sun, May 31, 2009 at 12:42 AM, Wacek Kusnierczyk
>> <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>> > edwin Sendjaja wrote:
>> >> Hi VQ,
>> >>
>> >> Thank you. It works like charm. But I think Peter's code is faster.
>> >> What is the difference?
>> >
>> > i think peter's code is more r-elegant, though less generic. here's a
>> > quick test, with not so surprising results. gsubfn is implemented in r,
>> > not c, and it is painfully slow in this test. i also added gabor's
>> > suggestion.
>> >
>> > library(gsubfn)
>> > library(gtools)
>> > library(rbenchmark)
>> >
>> > n = 1000
>> > df = data.frame(
>> > a=rnorm(n),
>> > b = rnorm(n),
>> > c = rnorm(n),
>> > ip = replicate(n, paste(sample(255, 4), collapse='.'),
>> > simplify=TRUE))
>> > benchmark(columns=c('test', 'elapsed'), replications=10, order=NULL,
>> > peda={
>> > connection = textConnection(as.character(df$ip))
>> > o = do.call(order, read.table(connection, sep='.'))
>> > close(connection)
>> > df[o, ] },
>> > waku=df[order(gsubfn(perl=TRUE,
>> > '[0-9]+',
>> > ~ sprintf('%03d', as.integer(x)),
>> > as.character(df$ip))), ],
>> > gagr=df[mixedorder(df$ip), ] )
>> >
>> > # peda 0.070
>> > # waku 7.070
>> > # gagr 4.710
>> >
>> >
>> > vQ
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html and provide commented,
>> > minimal, self-contained, reproducible code.
>>
>
>
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907