[R] Writing a function to return column position XXXX
R. Michael Weylandt
michael.weylandt at gmail.com
Wed Jan 25 02:16:38 CET 2012
I think you are getting stuck on the same regexp problem as before
(i.e., once again the dollar sign is being interpreted as the
end-of-line anchor rather than an actual dollar sign).
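To illustrate the anchor issue, here is a small sketch on a made-up vector (the name `v` and its values are only for illustration, not taken from your data). An unescaped "$" matches the end of every string, so it comes back TRUE everywhere, while the escaped and fixed forms only match a literal dollar sign.

```r
# Made-up example vector; only the second element contains a dollar sign
v <- c("AL", "$251,895.80", "36106")

# Unescaped "$" is the end-of-line anchor, so every string matches
grepl("$", v)                  # TRUE TRUE TRUE

# Either of these looks for a literal dollar sign instead
grepl("\\$", v)                # FALSE TRUE FALSE
grepl("$", v, fixed = TRUE)    # FALSE TRUE FALSE
```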
If I understand your question, might I suggest something much easier?
x = data.frame(a = c("$1034.23","1,230"), b = c(4,5))
sapply(x, function(x) as.numeric(gsub("[\\$,]","",x)))
That is, go through each column of the data frame, replace anything
that's either a literal dollar sign or a comma with the empty string (i.e.,
remove it), and then convert the result to numeric. If a column is already
numeric, this will simply return it unaltered, so I think it's safe to
apply to each such column. Columns holding other text (state codes, say)
would come back as NA, though, so leave those out.
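One caveat with the sapply() call is that it hands back a matrix. If you want to keep a data frame and only touch the columns that actually contain a dollar sign, something along these lines might work. This is only a sketch: `has_dollar`, `to_number` and `dollar_cols` are names I made up, and the toy data frame stands in for your real `cand2`.

```r
# Toy stand-in for the real data (stringsAsFactors = FALSE keeps text as character)
x <- data.frame(state = c("AL", "AR"),
                amt   = c("$251,895.80", "$7,100.00"),
                n     = c(4, 5),
                stringsAsFactors = FALSE)

# Hypothetical helpers: detect a literal "$", then strip "$" and "," and convert
has_dollar <- function(col) any(grepl("$", col, fixed = TRUE))
to_number  <- function(col) as.numeric(gsub("[$,]", "", col))

# Which columns contain a dollar sign?
dollar_cols <- which(sapply(x, has_dollar))

# Convert those columns one at a time; lapply() keeps x a data frame
x[dollar_cols] <- lapply(x[dollar_cols], to_number)

str(x)
```

Going column by column matters here: calling gsub() on several data frame columns at once coerces the whole selection with as.character(), which I believe is why the earlier attempt came back as all NA.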
M
On Tue, Jan 24, 2012 at 11:07 AM, Dan Abner <dan.abner99 at gmail.com> wrote:
> Hi everyone,
>
> I am using Michael's approach (grepl()) to identify which columns
> contain $ signs. I was hoping to incorporate this into a line of
> code that would automatically 1) find which columns contain $ signs,
> 2) strip the $ and commas, and 3) convert the result to a numeric
> vector.
>
> I have the following:
>
> col.id<-function(x) any(grepl("\\$",x))
>
> cand2[which(sapply(cand2,col.id))]<-
> as.numeric(gsub("[$,]","",cand2[which(sapply(cand2,col.id))]))
>
> However, I am doing something wrong: while the code correctly
> identifies the columns containing $ signs, it also returns ALL NA for
> those columns.
>
> See my initial message for this thread for example data.
>
> Any assistance is appreciated.
>
> Thanks!
>
> Dan
>
>
> On Tue, Jan 24, 2012 at 9:04 AM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>> Either
>>
>> any(grepl("$",x, fixed = TRUE)) # You probably want grepl not grep
>> any(grepl("\\$",x) )
>> ?regex # $ has a special meaning (it anchors the end of the line)
>>
>> Michael
>>
>> PS -- Stop with HTML postings (seriously, it actually does mess up
>> what the rest of us see and I think it causes trouble for the archives
>> as well)
>>
>> On Tue, Jan 24, 2012 at 8:49 AM, Dan Abner <dan.abner99 at gmail.com> wrote:
>>> Hello everyone,
>>>
>>> I am writing my own function to return the column index of all variables
>>> (these are currently character vectors) in a data frame that contain a
>>> dollar sign($). A small piece of the data look like this:
>>>
>>> can_sta  can_zip   ind_ite_con  ind_uni_con
>>> AL       36106     $251,895.80  $22,874.43
>>> AL       35802     $141,373.60  $7,100.00
>>> AL       35201     $273,208.50  $18,193.66
>>> AR       72404     $186,918.00  $25,391.00
>>> AR       72217     $451,127.00  $27,255.23
>>> AR       7.28E+08  $58,336.22   $5,293.82
>>>
>>>
>>> So far I have:
>>>
>>>
>>> col.id<-function(x) any(grep("$",x))
>>> sapply(cand2,col.id)
>>>
>>> However, this returns TRUE for all columns (even those that do not contain
>>> the $).
>>>
>>> Any assistance is appreciated.
>>>
>>> Thank you,
>>>
>>> Dan
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.