[R] Dropping columns from data frame
David Winsemius
dwinsemius at comcast.net
Fri Jan 6 18:19:18 CET 2012
On Jan 6, 2012, at 11:43 AM, Mike Harwood wrote:
> Thank you, David. I was merely using "head" to limit the code/
> output. My question remains, because a created data frame has the
> same columns as was output from "head":
>
>> head(orig.df,3)
>   num1.10 num11.20 lc1.10 lc11.20 uc1.10 uc11.20
> 1       1       11      a       k      A       K
> 2       2       12      b       l      B       L
> 3       3       13      c       m      C       M
>> # Illustration 1: contiguous columns at beginning of data frame
>> head(orig.df[,-c(1:3)],2)
>   lc11.20 uc1.10 uc11.20
> 1       k      A       K
> 2       l      B       L
>> new.df <- orig.df[,-c(1:3)]
>> head(new.df,2)
>   lc11.20 uc1.10 uc11.20
> 1       k      A       K
> 2       l      B       L
>>
>> # Illustration 2: non-contiguous columns
>> head(orig.df[,-c(1,3,5)],2)
>   num11.20 lc11.20 uc11.20
> 1       11       k       K
> 2       12       l       L
>> new.df <- orig.df[,-c(1,3,5)]
>> head(new.df,2)
>   num11.20 lc11.20 uc11.20
> 1       11       k       K
> 2       12       l       L
I guess my short attention span got the better of me. (But calling
them "unary errors" was somewhat cryptic and not a particularly
helpful description of what you were actually seeing.) Here are more
constructive responses:
Negative indexing is not accepted for character vectors, so you need
to convert to either numeric or logical and then "negativize":
orig.df[ !names(orig.df) %in% c('num1.10', 'lc1.10', 'uc1.10')]
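A minimal sketch of the difference, assuming a rebuilt copy of the example data (the column types differ from the cbind() version in the original post, which should not matter for indexing; `drop.cols`, `res`, `keep` and `kept.df` are just illustrative names):

```r
# Rebuild a data frame with the same column names as the original post
orig.df <- data.frame(num1.10 = 1:10, num11.20 = 11:20,
                      lc1.10 = letters[1:10], lc11.20 = letters[11:20],
                      uc1.10 = LETTERS[1:10], uc11.20 = LETTERS[11:20])
drop.cols <- c("num1.10", "lc1.10", "uc1.10")

# Unary minus is not defined for character vectors, so this raises an error;
# try() captures it instead of stopping the script
res <- try(orig.df[, -drop.cols], silent = TRUE)
inherits(res, "try-error")

# A logical vector can be negated with `!` and then used as the index
keep <- !names(orig.df) %in% drop.cols
kept.df <- orig.df[keep]
names(kept.df)
```

Note that `!` binds less tightly than `%in%`, so the index is evaluated as `!(names(orig.df) %in% drop.cols)`.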
These are equivalent:
orig.df[ , !names(orig.df) %in% c('num1.10', 'lc1.10', 'uc1.10')]
orig.df[,-match( c("num1.10", "lc1.10", "uc1.10"), names(orig.df))]
orig.df[ , -sapply(c('num1.10', 'lc1.10', 'uc1.10'), grep,
x=names(orig.df)) ]
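One way to check the equivalence for yourself, again as a sketch on a rebuilt copy of the example data (the `by.*` and `drop.cols` names are only for illustration):

```r
# Rebuild a data frame with the same column names as the original post
orig.df <- data.frame(num1.10 = 1:10, num11.20 = 11:20,
                      lc1.10 = letters[1:10], lc11.20 = letters[11:20],
                      uc1.10 = LETTERS[1:10], uc11.20 = LETTERS[11:20])
drop.cols <- c("num1.10", "lc1.10", "uc1.10")

# Logical index: TRUE for every column whose name is not in drop.cols
by.logical <- orig.df[, !names(orig.df) %in% drop.cols]

# Numeric index: positions of the unwanted names, negated
by.match <- orig.df[, -match(drop.cols, names(orig.df))]

# Numeric index built by grepping for each name in turn
by.grep <- orig.df[, -sapply(drop.cols, grep, x = names(orig.df))]

# Compare the three results
identical(by.logical, by.match)
all.equal(by.match, by.grep)
```

The forms only agree when every name in `drop.cols` is present exactly once. The logical form simply ignores names that are not there, whereas match() returns NA for them, and the grep form treats each name as a regular expression, so it could match more than one column.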
And when there is a pattern, such as with your not wanting any of the
".10" names, then grep can be quite efficient:
orig.df[ , -grep(".10", names(orig.df), fixed=TRUE)]
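A short sketch of what grep is doing with those names, plus one edge case worth knowing about (the vector `nms` and the pattern "zzz" are made up for the illustration):

```r
nms <- c("num1.10", "num11.20", "lc1.10", "lc11.20", "uc1.10", "uc11.20")

# fixed = TRUE treats ".10" as a literal string, not a regular expression
hits.fixed <- grep(".10", nms, fixed = TRUE)
hits.fixed

# The regex equivalent: escape the dot and anchor at the end of the name
hits.regex <- grep("\\.10$", nms)
hits.regex

# Edge case: when nothing matches, grep() returns integer(0), and
# -integer(0) is still integer(0), so the index selects nothing at all
none.left <- nms[-grep("zzz", nms, fixed = TRUE)]
length(none.left)

# grepl() returns a logical vector, so `!` keeps everything when no name matches
all.left <- nms[!grepl("zzz", nms, fixed = TRUE)]
length(all.left)
```

If there is any chance that the pattern matches no column, the `!grepl(...)` form is the safer index to use on the data frame.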
--
David
>
> On Jan 6, 9:49 am, David Winsemius <dwinsem... at comcast.net> wrote:
>> On Jan 6, 2012, at 10:00 AM, Mike Harwood wrote:
>>
>>> How does R do it, and should I ever be worried? I always remove
>>> columns by index, and it works exactly as I would naively expect -
>>> but HOW? The second illustration, which deletes non-contiguous
>>> columns, represents what I do all the time and have some trepidation
>>> about because I don't know the mechanics (e.g. why doesn't the column
>>> formerly-known-as-4 become 3 after column 1 is dropped: doesn't
>>> vector removal from a df/list invoke a loop in C?).
>>
>> You are NOT "removing columns". You are returning (to `head` and then
>> to `print`) an extract from the dataframe, but that does not change
>> the original dataframe. To effect a change you would need to assign
>> the value back to the same name as the original dataframe.
>>
>> --
>> David
>>
>>> Can I delete a named
>>> list of columns, which are examples 4 and 5 and which generate the
>>> "unary error" messages, without resorting to "orig.df$num1.10 <-
>>> NULL"?
>>
>>> Thanks!
>>
>>> orig.df <- data.frame(cbind(
>>> 1:10
>>> ,11:20
>>> ,letters[1:10]
>>> ,letters[11:20]
>>> ,LETTERS[1:10]
>>> ,LETTERS[11:20]
>>> ))
>>> names(orig.df) <- c(
>>> 'num1.10'
>>> ,'num11.20'
>>> ,'lc1.10'
>>> ,'lc11.20'
>>> ,'uc1.10'
>>> ,'uc11.20'
>>> )
>>> # Illustration 1: contiguous columns at beginning of data frame
>>> head(orig.df[,-c(1:3)])
>>
>>> # Illustration 2: non-contiguous columns
>>> head(orig.df[,-c(1,3,5)])
>>
>>> # Illustration 3: contiguous columns at end of data frame
>>> head(orig.df[,-c(4:6)]) ## as expected
>>
>>> # Illustrations 4-5: unary errors
>>> head(orig.df[,-c(as.list('num1.10', 'lc1.10', 'uc1.10'))])
>>> head(orig.df[,-c('num1.10', 'lc1.10', 'uc1.10')])
>>
>>> Mike
>>
>>> ______________________________________________
>>> R-h... at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
David Winsemius, MD
West Hartford, CT