[Rd] paste strings in C

Michael Lawrence lawrence.michael at gene.com
Tue Jun 27 16:31:41 CEST 2017


To do this in C, it would probably be easier and faster to just do the
string manipulation directly. Luckily, there are already packages that
have done this for you. See an example below using the S4Vectors
package.

foo2 <- function(mymat, colnms, tilde=FALSE) {
    chars <- colnms[col(mymat)]
    lowerChars <- if (tilde) paste0("~", chars) else tolower(chars)
    chars <- ifelse(mymat==1L, lowerChars, chars)
    keep <- mymat > 0L
    charList <- split(chars[keep], row(chars)[keep])
    S4Vectors::unstrsplit(charList, "*")
}

Or perhaps more efficient using the compressed lists of IRanges:

foo3 <- function(mymat, colnms, tilde=FALSE) {
    mymat <- t(mymat)
    chars <- colnms[row(mymat)]
    lowerChars <- if (tilde) paste0("~", chars) else tolower(chars)
    chars <- ifelse(mymat==1L, lowerChars, chars)
    keep <- mymat > 0L
    charList <- IRanges::splitAsList(chars[keep], col(chars)[keep])
    S4Vectors::unstrsplit(charList, "*")
}

Michael

On Tue, Jun 27, 2017 at 5:29 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
> Dear R-devs,
>
> Below is a small example of what I am trying to achieve, that is trivial in
> R and I would like to learn how to do in C, for very large matrices:
>
>> (mymat <- matrix(c(1,0,0,2,2,1), nrow = 2))
>      [,1] [,2] [,3]
> [1,]    1    0    2
> [2,]    0    2    1
>
> And I would like to produce:
> [1] "a*C" "B*c"
>
>
> Which can be trivially done in R via something like:
>
> foo <- function(mymat, colnms, tilde = FALSE) {
>     apply(mymat, 1, function(x) {
>         if (tilde) {
>             colnms[x == 1] <- paste0("~", colnms[x == 1])
>         } else {
>             colnms[x == 1] <- tolower(colnms[x == 1])
>         }
>         paste(colnms[x > 0], collapse = "*")
>     })
> }
>
>> foo(mymat, LETTERS[1:3])
> [1] "a*C" "B*c"
>
>> foo(mymat, LETTERS[1:3], tilde = TRUE)
> [1] "~A*C" "B*~C"
>
>
> I know that strings in C are far from trivial (encodings being one
> important issue), and this is the sort of thing much easier to do in R. On
> the other hand I found that, for a large matrix of say 1 million rows and
> 25 columns, setting the rownames of colnames in R copies the matrix and
> costs a lot of memory and time in the process.
>
> Having all necessary headers in C, the solution I came up with involves
> calling the function foo() from within C:
>
> SEXP test(SEXP mymat, SEXP colnms, SEXP tilde) {
>
>     SEXP call = PROTECT(LCONS(install("foo"),
>                         LCONS(mymat,
>                         LCONS(colnms,
>                         LCONS(tilde, R_NilValue)))));
>
>     SEXP out = PROTECT(eval(call, R_GlobalEnv));
>
>     UNPROTECT(2);
>     return(out);
> }
>
>
> After compilation, say in a file called test.c, back in R I get:
>
>> dyn.load("test.so")
>
>> .Call("test", mymat, LETTERS[1:3], FALSE)
> [1] "a*C" "B*c"
>
>> .Call("test", mymat, LETTERS[1:3], TRUE)
> [1] "~A*C" "B*~C"
>
>
> In my real situation, the matrix I am working on is produced in the C code
> (and it's much larger).
> I don't know for sure, when calling the R function foo(), if the matrix is
> copied: if not, this might be the best solution for me.
>
> Otherwise I know there is a function do_paste() in C, and wondered whether
> I could use that directly instead of calling R from C.
>
> I hope this explains what I would like to do, many thanks in advance for
> any hint,
> Adrian
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr. 90-92
> 050663 Bucharest sector 5
> Romania
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list