[R] Creating a data frame from scratch

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Tue May 24 21:38:15 CEST 2016


I would probably write the function something like this:


t_count_na <- function(dataset,
                       variables = "all") {
  if (identical(variables, "all")) {
    variable_list <- names(dataset)
  }  else {
    variable_list <- variables
  }  
  apply(dataset[,variable_list], 1, function(x) sum(is.na(x)))
}


Hope this is helpful,

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services


> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of
> G.Maubach at gmx.de
> Sent: Tuesday, May 24, 2016 11:55 AM
> To: r-help at r-project.org
> Subject: [R] Creating a data frame from scratch
> 
> Hi All,
> 
> I need to create a data frame from scratch and fill variables created on the fly
> with values. What I have so far:
> 
> -- schnipp --
> 
> # Example dataset
> gene <-
> c("ENSG00000208234","ENSG00000199674","ENSG00000221622","ENSG00000
> 207604",
> 
> "ENSG00000207431","ENSG00000221312","ENSG00134940305","ENSG0039403
> 9490",
>   "ENSG09943004048")
> hsap <- c(0,0,0, 0, 0, 0, 1,1, 1)
> mmul <- c(NA,2 ,3, NA, 2, 1 , NA,2, NA)
> mmus <- c(NA,2 ,NA, NA, NA, 2 , NA,3, 1) rnor <- c(NA,2 ,NA, 1 , NA, 3 ,
> NA,NA, 2) cfam <- c(NA,2,NA, 2, 1, 2, 2,NA, NA)
> 
> ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam)
> ds_example$gene <- as.character(ds_example$gene)
> 
> t_count_na <- function(dataset,
>                        variables = "all")
>   # credit: http://stackoverflow.com/questions/4862178/remove-rows-with-
> nas-in-data-frame
>   {
>   ds_na <- data.frame()
>   # if variables = "all" create character vector of variable names
>   if (variables == "all") {
>     variable_list <- dimnames(dataset)[[ 2 ]]
>   }
>   # if a character vector with variable names is given
>   # to run the function on a defined set of selected variables
>   else {
>     variable_list <- variables
>   }
> 
>   for (var in variable_list) {
>     new_name <- paste0("na_", var)
>     ds_na[[ new_name ]] <- as.data.frame(is.na(dataset[[ var ]]))
>   }
> 
>   ds_na[[ "na_count" ]] <- rowSums(ds_na)
>   return(ds_na)
> }
> 
> test <- t_count_na(dataset = ds_example, variables = c("mmul", "mmus"))
> 
> -- schnipp --
> 
> gives:
> 
>  Error in `[[<-.data.frame`(`*tmp*`, new_name, value =
> list(`is.na(dataset[[var]])` = c(TRUE,  :
>   replacement has 9 rows, data has 0 In addition: Warning message:
> In if (variables == "all") { :
>   the condition has length > 1 and only the first element will be used
> 
> My goal is to create a dataset from scratch on the fly which has the same
> amount of variables as the dataset ds_example plus a single variable storing
> the amount of NA's in a row for the given variables. This is the basis for a
> decious which cases to keep and which to drop.
> 
> I do not want to alter the base dataset like ds_example in the first place nor
> do I want to make a copy of the existing dataset due to memory allocation.
> The function shall also work with big data, e. g. datasets with more than 1 GB
> memory consumption.
> 
> I also do not want the newly created variables to be stored in the original
> data frame. They shall be separate.
> 
> A former similar solution worked:
> http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html
> 
> Why doesn't this one?
> 
> How do I create the variables within the data frame if the data frame is
> empty?
> 
> Kind regards
> 
> Georg Maubach
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list