[R] Splitting one column value into multiple rows

David Winsemius dwinsemius at comcast.net
Mon Jul 18 06:09:41 CEST 2011


On Jul 17, 2011, at 11:47 PM, David Winsemius wrote:

>
> On Jul 17, 2011, at 11:27 PM, Madana_Babu wrote:
>
>> Hi David,
>>
>> PFB

Whatever that TLA means ....

>> the details of my query. Request your help in getting this resolved.
>>
>> # TESTING is my dataset with almost 40K rows.

A small dataset.

>> I am importing this dataset
>> from my local desktop
>>
>> TESTING <- read.table("/Users/madana/Desktop/testing.txt",  
>> header=FALSE,
>> sep="\t", na.strings="", dec=".", strip.white=TRUE)
>
> This is the start of problems. Any text column will come in as a  
> factor.


You should also get in the habit of looking at your data with str()
and summary() as soon as it comes in.
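
As a rough illustration of that habit (the two-line input below is made
up, not the poster's file, and stringsAsFactors is set explicitly so the
result does not depend on the session's default), something along these
lines shows the difference str() would reveal:

```r
# Hypothetical stand-in for the imported file: one text column, no header.
txt <- c("a,1:b,2", "c,3:d,4")

# With stringsAsFactors = TRUE the text column arrives as a factor.
as_factor <- read.table(text = txt, header = FALSE, sep = "\t",
                        stringsAsFactors = TRUE)
str(as_factor)      # V1 is reported as a Factor
summary(as_factor)

# With stringsAsFactors = FALSE (or as.is = TRUE) it stays character.
as_char <- read.table(text = txt, header = FALSE, sep = "\t",
                      stringsAsFactors = FALSE)
str(as_char)        # V1 is reported as chr
```

Checking the column type at this point is what tells you whether later
calls such as strsplit() will accept the column as it stands.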

>>
>> TESTING
>>
>> # I tried the following two ways. Let me know if i am using right  
>> syntax.
>>
>> Lines <- readLines(textConnection(data.frame(TESTING$V1)))
>
> You would need to instead use:
>
> Lines <- readLines(textConnection(as.character(TESTING$V1)))

I was lying in bed about to go to sleep and realized that this
untested strategy was unnecessary (even if it does work, which I
suspect it may not).


  Lines <- as.character(TESTING$V1)  # should be enough.

The goal here is to get a character vector with which to work.
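
A minimal sketch of where that leads, assuming (as the strsplit() and
read.table() calls quoted below suggest) that ":" separates records and
"," separates fields. The values in TESTING here are invented for
illustration and are not the poster's data:

```r
# Made-up values standing in for TESTING$V1, stored as a factor on purpose.
TESTING <- data.frame(V1 = c("a,1:b,2", "c,3:d,4"), stringsAsFactors = TRUE)

Lines <- as.character(TESTING$V1)                        # factor -> character vector
newlines2 <- unlist(strsplit(Lines, ":", fixed = TRUE))  # one element per record

# Parse each "field,field" piece as one row of a new data frame.
cleaned_data <- read.table(text = newlines2, sep = ",",
                           stringsAsFactors = FALSE)
cleaned_data
```

Each original value contributes one row per ":"-separated piece, which
is the "one column value into multiple rows" result asked for in the
subject line.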


Good night.

-- 
David,

>
> (Or you could have just read in the entire dataset with readLines
> instead of read.table.)
>
> (Or you could have used read.table with as.is=TRUE or  
> stringsAsFactors = FALSE)
>
> Seekers of advice take heed. Madana_Babu violated the advice in the
> Posting Guide to include his code in his two earlier postings.
> Those of us who make efforts at offering advice are unable to read
> minds.
>
>>
>> # Error message is:
>> Error in textConnection(data.frame(TESTING$V1)) : invalid 'text'  
>> argument
>>
>> Lines <- readLines(textConnection(data.frame("TESTING", header=FALSE,
>> sep="\t", na.strings="", dec=".", strip.white=TRUE)))
>>
>> # Error message is:
>> Error in textConnection(data.frame("TESTING", header = FALSE, sep =  
>> "\t",  :
>> argument 'object' must deparse to a single character string
>>
>> closeAllConnections()
>> newlines <- strsplit(Lines, ":")
>>
>> # Error message is:
>> Error in strsplit(Lines, ":") : non-character argument
>>
>> newlines2 <- unlist(newlines)
>>
>>
>> cleaned_data <- read.table(textConnection(newlines2), sep=",")
>>
>> # Error message is:
>> Error in textConnection(newlines2) : invalid 'text' argument
>>
>> My machine Config is: Dual Core.
>
> I doubt that makes any difference, and furthermore it does not tell
> me your OS or your version of R, which in some cases does make a
> difference, but again I think it was the default stringsAsFactors
> setting, which is a universal pitfall.
>
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


