[Bioc-devel] showAsCell, character feature request [Re: Possible bug in showAsCell, character]

Laurent Gatto laurent.gatto at uclouvain.be
Tue May 7 00:03:02 CEST 2019


OK, so it appears that the bug I reported was in my own code, and that I posted both the bug and its fix in the same email. To turn this into something useful, I'd like to change it into a feature request.

The current showAsCell,ANY implementation displays long strings in full, which leads to the following annoyance:

> x <- DataFrame(x = c('A veeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeery loooooooooooooooooooooooooooooong sting', 'A short string'))
> x
DataFrame with 2 rows and 1 column
                                                                                   x
                                                                         <character>
1 A veeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeery loooooooooooooooooooooooooooooong sting
2                                                                     A short string

This is not unusual for tables that contain protein or gene descriptions.

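Would the S4Vectors maintainers consider adding a showAsCell,character method that truncates long strings, along the lines of the following?

library(S4Vectors)  ## provides the showAsCell() generic and DataFrame()
setMethod("showAsCell", "character",
          function (object) {
              n <- 10  ## maximum number of characters shown per cell
              vapply(object, function(x) {
                  ## truncate non-missing strings longer than n characters
                  if (!is.na(x) && nchar(x) > n)
                      paste0(substr(x, 1, n), "...")
                  else x
              }, character(1), USE.NAMES = FALSE)
          })

With such a method, the example above would be displayed as: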

> x
DataFrame with 2 rows and 1 column
              x
    <character>
1 A veeeeeee...
2 A short st...
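The same truncation can also be done on the whole vector at once instead of looping with sapply(). This is only a sketch: truncate_cells() and its width argument are hypothetical names, not part of S4Vectors, and the helper uses base R alone so it can be tried without defining any S4 method. A showAsCell,character method could then simply return truncate_cells(object).

```r
## Hypothetical helper (not part of S4Vectors): shorten long strings for display.
truncate_cells <- function(x, width = 10L) {
    stopifnot(is.character(x), width >= 1L)
    ## nchar() returns NA for missing strings; the !is.na() guard makes
    ## `long` FALSE for them, so NA entries are left untouched
    long <- !is.na(x) & nchar(x) > width
    x[long] <- paste0(substr(x[long], 1L, width), "...")
    x
}

truncate_cells(c("A veeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeery loooooooooooooooooooooooooooooong string",
                 "A short string", NA))
## [1] "A veeeeeee..." "A short st..." NA
```

Working on the whole vector also sidesteps two sapply() quirks: character input is used to name the result, and a zero-length input gives an empty list rather than a character vector.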

Best wishes,

Laurent

> sessionInfo()
R version 3.6.0 RC (2019-04-21 r76417)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] S4Vectors_0.22.0    BiocGenerics_0.30.0

loaded via a namespace (and not attached):
[1] BiocManager_1.30.4 compiler_3.6.0     tools_3.6.0 




________________________________________
From: Laurent Gatto
Sent: 05 May 2019 04:36
To: bioc-devel at r-project.org
Subject: Possible bug in showAsCell,character

Example code to reproduce the bug:

> DataFrame(a = 'foo', b = NA_character_)
DataFrame with 1 row and 2 columns
Error in if (nchar(x) > n) paste0(paste(strsplit(x, "")[[1]][1:n], collapse = ""),  (from reduce.R#6) :
  missing value where TRUE/FALSE needed
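The error comes from if () receiving a missing value: for a missing string, nchar() returns NA (with its default type = "chars"), so the comparison nchar(x) > n is NA as well. A minimal base-R sketch reproducing this outside of DataFrame, with x and n standing in for the loop variable and the width limit:

```r
x <- NA_character_
n <- 10

nchar(x)        # NA (an NA_integer_), not a character count
nchar(x) > n    # NA, so if () cannot decide which branch to take

msg <- tryCatch(if (nchar(x) > n) "truncate" else "keep",
                error = conditionMessage)
msg
## [1] "missing value where TRUE/FALSE needed"

## Testing is.na() first, with the short-circuiting &&, avoids the error
if (!is.na(x) && nchar(x) > n) "truncate" else "keep"
## [1] "keep"
```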

Suggested patch:

setMethod("showAsCell", "character",
          function (object) {
              n <- 10
              sapply(object, function(x) {
                  if (!is.na(x) & nchar(x) & nchar(x) > n)
                      paste0(paste(strsplit(x, "")[[1]][1:n], collapse = ""),
                             "...")
                  else x
              })
          })


With patch:

> DataFrame(a = 'foo', b = NA_character_)
DataFrame with 1 row and 2 columns
               a           b
     <character> <character>
foo        foo          NA
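The row label foo in the output above (instead of 1) is most likely a side effect of sapply(): with its default USE.NAMES = TRUE, a character input is used to name the result, and the DataFrame show method appears to pick those names up as row labels. This is presumably why the later version of the method passes USE.NAMES = FALSE. A small base-R check of that sapply() behaviour, where f is an illustrative copy of the method's inner function:

```r
## Illustrative copy of the per-cell truncation used in the method
f <- function(x) if (!is.na(x) && nchar(x) > 10) paste0(substr(x, 1, 10), "...") else x

named   <- sapply("foo", f)                     # default USE.NAMES = TRUE
unnamed <- sapply("foo", f, USE.NAMES = FALSE)

names(named)    # "foo": the input value became the element name
names(unnamed)  # NULL

## vapply() with a declared return type also guarantees a character vector,
## even for zero-length input (sapply() would return a list there)
vapply(character(0), f, character(1), USE.NAMES = FALSE)
```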


Best wishes,

Laurent


