[R] select partial name and full name columns

Wed Jan 9 18:09:13 CET 2013

Hi,
You can use the same code:
set.seed(15)
 dat1<-data.frame(sample(1:10,5,replace=TRUE),sample(20:30,5,replace=TRUE),sample(1:15,5,replace=TRUE),sample(1:8,5,replace=TRUE),datetime=as.POSIXct(paste(rep("6/3/2011",5),c("0:00","0:30","0:35","0:40","0:45")),format="%m/%d/%Y %H:%M"))

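# Note: two of these names contain "000060" (an extra 0); the pattern below still matches them as a substring.
colnames(dat1)[1:4]<-c("01_00060_00003","01_000060_00003_cd","15_000060_00003","15_00060")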

dat1
#  01_00060_00003 01_000060_00003_cd 15_000060_00003 15_00060
#1              7                 30               2        7
#2              2                 28              10        4
#3             10                 22               8        8
#4              7                 27              11        2
#5              4                 29              13        7
#             datetime
#1 2011-06-03 00:00:00
#2 2011-06-03 00:30:00
#3 2011-06-03 00:35:00
#4 2011-06-03 00:40:00
#5 2011-06-03 00:45:00
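
A short sketch of why the selection uses column names rather than positions next to "datetime": grep() returns integer positions by default, and c() coerces them to character when they are combined with a string, so `[` ends up looking for columns literally named "1", "2", and so on. The data frame `d` and the variables `idx`, `mixed` and `nms` below are made up for illustration and are not from the original file.

```r
# Hypothetical two-column data frame, for illustration only
d <- data.frame(1:3, c("a", "b", "c"), check.names = FALSE)
colnames(d) <- c("01_00060_00003", "datetime")

idx   <- grep("00060_00003", colnames(d))   # integer position(s) of the matches
mixed <- c("datetime", idx)                 # positions are coerced to character
nms   <- c("datetime", grep("00060_00003", colnames(d), value = TRUE))

is.character(mixed)   # the position is now a string, not a column name
d[, nms]              # works: every element is an actual column name
```

Using value = TRUE is equivalent to indexing colnames() with the grep() result, as in the line that follows.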

dat1[,c("datetime",colnames(dat1)[grep("00060_00003",colnames(dat1))])]
#             datetime 01_00060_00003 01_000060_00003_cd 15_000060_00003
#1 2011-06-03 00:00:00              7                 30               2
#2 2011-06-03 00:30:00              2                 28              10
#3 2011-06-03 00:35:00             10                 22               8
#4 2011-06-03 00:40:00              7                 27              11
#5 2011-06-03 00:45:00              4                 29              13
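
If the prefix is always exactly two digits, as in the names you listed, the pattern can also be anchored so that only names of that shape are kept. The pattern "^_00060_00003" in your function matches nothing because ^ requires the name to start with the underscore. What follows is only a sketch under the two-digit assumption; `dat2` is a made-up data frame built from your example names, and `keep` is just an illustrative variable name.

```r
# Made-up data using column names from the question (values are arbitrary)
dat2 <- data.frame(1:2, 3:4, 5:6, 7:8, 9:10, c("x", "y"))
colnames(dat2) <- c("01_00060_00003", "01_00060_00003_cd",
                    "15_00060_00003", "15_00060", "A_00060_00003",
                    "datetime")

# ^[0-9]{2}_ : exactly two digits followed by an underscore, at the start
# (_cd)?$    : an optional "_cd" suffix, then the end of the name
keep <- grep("^[0-9]{2}_00060_00003(_cd)?$", colnames(dat2), value = TRUE)

dat2[, c("datetime", keep)]
```

Here 15_00060 and A_00060_00003 are dropped. Applied to dat1 above, this stricter pattern would keep only 01_00060_00003, because the other names contain 000060.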

A.K.
________________________________
From: Irucka Embry <iruckaE at mail2world.com>
To: smartpink111 at yahoo.com 
Cc: r-help at r-project.org 
Sent: Wednesday, January 9, 2013 11:36 AM
Subject: Re: [R] select partial name and full name columns

Hi Arun, thank you for your suggestion.

I made a mistake earlier when I said that there was a "prefix" in front of "00060_00003", which may have suggested a string of letters rather than digits. The "prefix" in front of "00060_00003" is actually a two-digit number; see the examples below:

01_00060_00003 01_00060_00003_cd 15_00060_00003 15_00060_00003_cd 02_00060_00003 02_00060_00003_cd

How can the following code be modified to handle a numeric rather than a character prefix?

dat1[,c("datetime",colnames(dat1)[grep("00060_00003",colnames(dat1))])]

Thank you.

Irucka Embry

<-----Original Message-----> 
>From: arun [smartpink111 at yahoo.com]
>Sent: 1/9/2013 7:13:05 AM
>To: iruckaE at mail2world.com
>Cc: r-help at r-project.org
>Subject: Re: [R] select partial name and full name columns
>
>
>
>Hi,
>
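>Maybe this is what is causing the problem: grep() returns integer positions, which cannot be mixed with the name "datetime" in the same index vector.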
>
>set.seed(15)
>dat1<-data.frame(A_00060_00003=sample(1:10,5,replace=TRUE),B_00060_00003_cd=sample(20:30,5,replace=TRUE),C_00060_00003=sample(1:15,5,replace=TRUE),D_00060=sample(1:8,5,replace=TRUE),datetime=as.POSIXct(paste(rep("6/3/2011",5),c("0:00","0:30","0:35","0:40","0:45")),format="%m/%d/%Y %H:%M"))
> dat1[,c("datetime",grep("00060_00003",colnames(dat1)))]
>#Error in `[.data.frame`(dat1, , c("datetime", grep("00060_00003", colnames(dat1)))) :
>#  undefined columns selected
>dat1[,c("datetime",colnames(dat1)[grep("00060_00003",colnames(dat1))])]
>#             datetime A_00060_00003 B_00060_00003_cd C_00060_00003
>#1 2011-06-03 00:00:00             7               30             2
>#2 2011-06-03 00:30:00             2               28            10
>#3 2011-06-03 00:35:00            10               22             8
>#4 2011-06-03 00:40:00             7               27            11
>#5 2011-06-03 00:45:00             4               29            13
>A.K.
>
>
>
>----- Original Message -----
>From: Irucka Embry <iruckaE at mail2world.com>
>To: r-help at r-project.org
>Cc: 
>Sent: Wednesday, January 9, 2013 5:44 AM
>Subject: [R] select partial name and full name columns
>
>Hi, I have the following function:
>
>getDataFromDVFileCustom <- function (file, hasHeader = TRUE, separator =
>"\t") 
>{
>DVdatatmp <- as.matrix(read.table(file, sep = "\t", fill = TRUE,
>comment.char = "#", as.is = TRUE, stringsAsFactors = FALSE, na.strings =
>"NA"))
>DVdatatmper <- as.matrix(DVdatatmp[ , c("datetime",
>grep("^_00060_00003", colnames(DVdatatmp)))])
>retval <- as.data.frame(DVdatatmper, colClasses = c("character"), fill =
>TRUE, comment.char = "#", stringsAsFactors = FALSE)
>if (ncol(retval) == 2) {
>names(retval) <- c("dateTime", "value")
>}
>else if (ncol(retval) == 3) {
>names(retval) <- c("dateTime", "value", "code")
>}
>if (dateFormatCheck(retval$dateTime)) {
>retval$dateTime <- as.Date(retval$dateTime)
>}
>else {
>retval$dateTime <- as.Date(retval$dateTime, format = "%m/%d/%Y")
>}
>retval$value <- as.numeric(retval$value)
>return(retval)
>}
>
>The function gives me this error:
>getDataFromDVFileCustom(file)
>Error in as.matrix(DVdatatmp[, c("datetime", grep("^_00060_00003",
>colnames(DVdatatmp)))]) : 
>subscript out of bounds
>
>I am trying to select only 3 columns: datetime and two columns whose names
>end in 00060_00003 and 00060_00003_cd. Each file that I
>will be reading into the function has a different number of columns and
>a different prefix in front of 00060_00003 and 00060_00003_cd. I have
>searched online and tried those possible solutions, but they did not
>work for my function and data.
>
>What is the best way to select those 3 columns only?
>
>Thank you.
>
>Irucka Embry 
>
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.