[Rd] paste strings in C

Adrian Dușa dusa.adrian at unibuc.ro
Tue Jun 27 14:29:49 CEST 2017


Dear R-devs,

Below is a small example of what I am trying to achieve. It is trivial in
R, and I would like to learn how to do it in C for very large matrices:

> (mymat <- matrix(c(1,0,0,2,2,1), nrow = 2))
     [,1] [,2] [,3]
[1,]    1    0    2
[2,]    0    2    1

And I would like to produce:
[1] "a*C" "B*c"


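This can be done trivially in R with something like the function below,
where 0 drops the column, 1 gives the lower-case (or tilde-prefixed) column
name, and any other positive value gives the column name unchanged: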

foo <- function(mymat, colnms, tilde = FALSE) {
    apply(mymat, 1, function(x) {
        if (tilde) {
            colnms[x == 1] <- paste0("~", colnms[x == 1])
        } else {
            colnms[x == 1] <- tolower(colnms[x == 1])
        }
        paste(colnms[x > 0], collapse = "*")
    })
}

> foo(mymat, LETTERS[1:3])
[1] "a*C" "B*c"

> foo(mymat, LETTERS[1:3], tilde = TRUE)
[1] "~A*C" "B*~C"


I know that strings in C are far from trivial (encodings being one
important issue), and that this sort of thing is much easier to do in R. On
the other hand, I found that for a large matrix of, say, 1 million rows and
25 columns, setting the rownames or colnames in R copies the matrix and
costs a lot of memory and time in the process.

Having all necessary headers in C, the solution I came up with involves
calling the function foo() from within C:

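#include <R.h>
#include <Rinternals.h>

SEXP test(SEXP mymat, SEXP colnms, SEXP tilde) {

    /* Build the call foo(mymat, colnms, tilde) */
    SEXP call = PROTECT(lang4(install("foo"), mymat, colnms, tilde));

    /* Evaluate it in the global environment, where foo() is defined */
    SEXP out = PROTECT(eval(call, R_GlobalEnv));

    UNPROTECT(2);
    return out;
}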


After compiling this from a file called, say, test.c, back in R I get:

> dyn.load("test.so")

> .Call("test", mymat, LETTERS[1:3], FALSE)
[1] "a*C" "B*c"

> .Call("test", mymat, LETTERS[1:3], TRUE)
[1] "~A*C" "B*~C"


In my real situation, the matrix I am working on is produced in the C code
(and it is much larger).
I do not know for sure whether the matrix is copied when the R function
foo() is called from C: if it is not, this might be the best solution for me.

Otherwise, I know that R's C sources contain a function do_paste(), and I
wondered whether I could use it directly instead of calling R from C.
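
For comparison with both options above, here is a minimal sketch of the
same pasting logic written in plain C, with no R API calls at all. The
function name paste_row() and its parameters are hypothetical, chosen only
for illustration. It assumes an integer matrix stored column by column (as
R does) and plain ASCII column names: tolower() from <ctype.h> works byte by
byte, so it does not address the encoding issue mentioned earlier.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build the pasted label for one row of a
 * column-major integer matrix.
 *   value <= 0 : the column is skipped
 *   value == 1 : the name is lower-cased, or prefixed with "~" if tilde != 0
 *   otherwise  : the name is used unchanged
 * Returns a malloc'ed string that the caller must free, or NULL if the
 * allocation fails.
 */
char *paste_row(const int *mat, int nrow, int ncol, int row,
                const char **colnms, int tilde)
{
    /* Upper bound on the length: every name plus "~" and "*", plus the NUL */
    size_t cap = 1;
    for (int j = 0; j < ncol; j++)
        cap += strlen(colnms[j]) + 2;

    char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    size_t len = 0;
    for (int j = 0; j < ncol; j++) {
        int v = mat[row + (size_t) nrow * j];   /* column-major indexing */
        if (v <= 0)
            continue;
        if (len > 0)
            out[len++] = '*';                   /* separator, as collapse = "*" */
        if (v == 1 && tilde)
            out[len++] = '~';
        for (const char *p = colnms[j]; *p != '\0'; p++)
            out[len++] = (v == 1 && !tilde)
                       ? (char) tolower((unsigned char) *p)
                       : *p;
    }
    out[len] = '\0';
    return out;
}
```

Called on the two rows of mymat with the names "A", "B", "C", this gives
"a*C" and "B*c", or "~A*C" and "B*~C" when tilde is non-zero. Inside a .Call
routine, each string could presumably be stored in a character vector with
mkChar() and SET_STRING_ELT() and then freed; that part is not tested here.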

I hope this explains what I would like to do, many thanks in advance for
any hint,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
