[R] dataframe: string operations on columns

Niels Richard Hansen Niels.R.Hansen+lists at math.ku.dk
Wed Jan 19 01:42:53 CET 2011


> On 2011-01-18 08:14, Ivan Calandra wrote:
>> Hi,
>>
>> I guess it's not the nicest way to do it, but it should work for you:
>>
>> #create some sample data
>> df<- data.frame(a=c("A B", "C D", "A C", "A D", "B D"),
>> stringsAsFactors=FALSE)
>> #split the column by space
>> df_split<- strsplit(df$a, split=" ")
>>
>> #place the first element into column a1 and the second into a2
>> for (i in 1:length(df_split[[1]])){
>>    df[i+1]<- unlist(lapply(df_split, FUN=function(x) x[i]))
>>    names(df)[i+1]<- paste("a",i,sep="")
>> }
>>
>> I hope people will give you more compact solutions.
>> HTH,
>> Ivan
>>
> You can replace the loop with
>
>  df <- transform(df, a1 = sapply(df_split, "[[", 1),
>                      a2 = sapply(df_split, "[[", 2))

df <- cbind(df, do.call(rbind, df_split))

seems to do the same (up to column names) but faster. However,
all the solutions rely on there being exactly two strings when
you split. The different solutions behave differently if this
assumption is violated and none of them really checks this. You
can, for instance, check this with all(sapply(df_split, length) == 2)
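
As a rough sketch of such a check, under some assumptions of my own: the
sixth row "E" is a made-up malformed entry added to the sample data, and
the names n_pieces, bad_rows and parts are only for illustration. Indexing
with x[1:2] pads short rows with NA, whereas "[[" with index 2 would stop
with a subscript error and rbind would recycle the shorter vector.

```r
# Sample data from the thread, plus one hypothetical row with a single piece
df <- data.frame(a = c("A B", "C D", "A C", "A D", "B D", "E"),
                 stringsAsFactors = FALSE)
df_split <- strsplit(df$a, split = " ")

# Count the pieces per row and record the rows that do not have exactly two
n_pieces <- sapply(df_split, length)
ok <- n_pieces == 2
bad_rows <- which(!ok)

# Take exactly two pieces per row; x[1:2] yields NA where a piece is missing
parts <- t(vapply(df_split, function(x) x[1:2], character(2)))
colnames(parts) <- c("a1", "a2")
df <- cbind(df, parts, stringsAsFactors = FALSE)
```

Whether to pad with NA, drop the rows in bad_rows, or stop with an error
depends on what the data are supposed to look like; the point is only to
make the choice explicit.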

Best, Niels R. Hansen

>
> Peter Ehlers
>
>>
>>
>> On 1/18/2011 16:30, boris pezzatti wrote:
>>>
>>> Dear all,
>>> how can I perform a string operation like strsplit(x," ")  on a column
>>> of a dataframe, and put the first or the second item of the split into
>>> a new dataframe column?
>>> (so that on each row it is consistent)
>>>
>>> Thanks
>>> Boris
>


