[R] Convert COLON separated format
Rui Barradas
ruipbarradas at sapo.pt
Tue Oct 9 07:28:07 CEST 2012
Hello,
Here's a function that doesn't do it all but might help.
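fun <- function(x){
    # split on single spaces and drop the empty strings left by repeated spaces
    x1 <- unlist(strsplit(x, " "))
    x2 <- x1[nchar(x1) > 0]
    # the first field is the label
    i <- as.integer(x2[1])
    # the remaining fields are "column:value"; x3 alternates column, value
    x3 <- unlist(strsplit(x2[-1], ":"))
    j <- as.integer(x3[rep(c(TRUE, FALSE), length(x3)/2)])
    # dense vector, zero everywhere except at the listed columns
    y <- numeric(max(j))
    y[j] <- as.numeric(x3[rep(c(FALSE, TRUE), length(x3)/2)])
    list(row = i, line = y)
}
x <- "1 5:1 27:3 345:10"
fun(x)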
If you know that your labels, i.e., the row numbers, are consecutive,
have the function return just 'y', not a list.
Then use readLines to read the file in and lapply to apply fun to each
line. Something like
ln <- readLines(filename)
lst <- lapply(ln, fun)
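Then you'll have another problem: the lengths of the lines. The vectors
returned by fun will generally not all have the same length, since each
one is only as long as its largest column index, so in order to make a
data.frame or matrix you'll need extra work. Try the code above and say
whether it's on the right track.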
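
One possible way to do that extra work is to pad every vector with zeros
up to the length of the longest one and then bind the rows. This is only
a sketch, not part of the function above: parse_line is a hypothetical
helper name, the two strings in ln stand in for readLines(filename), and
it assumes every line has a label followed by at least one column:value
pair.

```r
# Hypothetical helper: parse one line into its label and a dense vector.
parse_line <- function(x) {
  tokens <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  pairs <- strsplit(tokens[-1], ":", fixed = TRUE)
  j <- as.integer(vapply(pairs, `[`, character(1), 1))  # column indices
  v <- as.numeric(vapply(pairs, `[`, character(1), 2))  # values
  y <- numeric(max(j))
  y[j] <- v
  list(label = as.numeric(tokens[1]), line = y)
}

# Assumed sample input; in practice this would be readLines(filename).
ln <- c("1 5:1 27:3 345:10", "2 3:7 12:0.5")
lst <- lapply(ln, parse_line)

# Pad each vector with trailing zeros up to the longest one.
ncols <- max(vapply(lst, function(p) length(p$line), integer(1)))
mat <- t(vapply(lst, function(p) {
  c(p$line, numeric(ncols - length(p$line)))
}, numeric(ncols)))

# First column is the label, the rest are the (mostly zero) features.
dat <- data.frame(label = vapply(lst, function(p) p$label, numeric(1)), mat)
dim(dat)
```

Note that the result is dense, so with many columns most of the memory
goes to storing zeros.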
Also, take a look at package Matrix. It's a recommended package and it
implements sparse matrices.
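
As a rough sketch of how Matrix could be used here, the column indices
and values can be collected for all lines at once and passed to
sparseMatrix(). The variable names and the sample lines in ln are
assumptions for illustration (ln stands in for readLines(filename)), and
each line is assumed to have a label followed by at least one
column:value pair.

```r
library(Matrix)

# Assumed sample input; in practice this would be readLines(filename).
ln <- c("1 5:1 27:3 345:10", "2 3:7 12:0.5")

tokens <- strsplit(trimws(ln), "[[:space:]]+")
labels <- as.numeric(vapply(tokens, `[`, character(1), 1))

# All "column:value" fields, plus the line number each one came from.
fields <- lapply(tokens, `[`, -1)
i <- rep(seq_along(fields), lengths(fields))
kv <- strsplit(unlist(fields), ":", fixed = TRUE)
j <- as.integer(vapply(kv, `[`, character(1), 1))
v <- as.numeric(vapply(kv, `[`, character(1), 2))

# One row per input line; only the non-zero entries are stored.
m <- sparseMatrix(i = i, j = j, x = v, dims = c(length(ln), max(j)))
```

The labels are kept in a separate vector here, since they are not part
of the sparse columns.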
Hope this helps,
Rui Barradas
On 09-10-2012 05:56, Noah Silverman wrote:
> I have a bunch of data sets that were created for the libsvm tool. They are in "colon separated sparse format".
>
> i.e.
>
> 1 5:1 27:3 345:10
>
> Is a row with the label of "1" and only has values in columns 5, 27, and 345.
>
> I want to read these into a data.frame in R.
>
> Is there a simple way to do this?
>
> --
> Noah Silverman, M.S.
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.