[R] Splitting one column value into multiple rows
David Winsemius
dwinsemius at comcast.net
Mon Jul 18 05:47:00 CEST 2011
On Jul 17, 2011, at 11:27 PM, Madana_Babu wrote:
> Hi David,
>
> PFB the details of my query. Request your help in getting this
> resolved.
>
> # TESTING is my dataset with almost 40K rows. I am importing this
> dataset from my local desktop
>
> TESTING <- read.table("/Users/madana/Desktop/testing.txt",
> header=FALSE,
> sep="\t", na.strings="", dec=".", strip.white=TRUE)
This is where the problems start. Unless you change the default
stringsAsFactors setting, any text column will come in as a factor.
>
> TESTING
>
> # I tried the following two ways. Let me know if i am using right
> syntax.
>
> Lines <- readLines(textConnection(data.frame(TESTING$V1)))
You would need to instead use:
Lines <- readLines(textConnection(as.character(TESTING$V1)))
(Or you could have just read in the entire dataset with readLines
instead of read.table.)
(Or you could have used read.table with as.is=TRUE or
stringsAsFactors=FALSE.)
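A minimal sketch of the factor problem and the three ways around it. The
file contents here are a hypothetical stand-in for testing.txt (the real
data were never posted), and stringsAsFactors=TRUE is set explicitly so
the factor behaviour is reproduced regardless of the R version's default.

```r
# Hypothetical stand-in for testing.txt: one text column, no header.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a,1:b,2", "c,3:d,4"), tmp)

# Reading with stringsAsFactors = TRUE turns the text column into a factor.
as_factor <- read.table(tmp, header = FALSE, sep = "\t",
                        stringsAsFactors = TRUE)
is.factor(as_factor$V1)

# strsplit() refuses a factor ("non-character argument"); capture the error.
split_error <- tryCatch(strsplit(as_factor$V1, ":"),
                        error = function(e) conditionMessage(e))

# Option 1: convert the factor back to character after the fact.
fixed1 <- as.character(as_factor$V1)

# Option 2: keep text as character at read time.
as_text <- read.table(tmp, header = FALSE, sep = "\t",
                      stringsAsFactors = FALSE)

# Option 3: skip read.table and read the raw lines directly.
raw_lines <- readLines(tmp)
```

All three options should leave you with a plain character vector that
strsplit will accept.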
Seekers of advice take heed. Madana_Babu did not follow the advice in
the Posting Guide to include his code in his two earlier postings.
Those of us who make efforts at offering advice are unable to read
minds.
>
> # Error message is:
> Error in textConnection(data.frame(TESTING$V1)) : invalid 'text'
> argument
>
> Lines <- readLines(textConnection(data.frame("TESTING", header=FALSE,
> sep="\t", na.strings="", dec=".", strip.white=TRUE)))
>
> # Error message is:
> Error in textConnection(data.frame("TESTING", header = FALSE, sep =
> "\t", :
> argument 'object' must deparse to a single character string
>
> closeAllConnections()
> newlines <- strsplit(Lines, ":")
>
> # Error message is:
> Error in strsplit(Lines, ":") : non-character argument
>
> newlines2 <- unlist(newlines)
>
>
> cleaned_data <- read.table(textConnection(newlines2), sep=",")
>
> # Error message is:
> Error in textConnection(newlines2) : invalid 'text' argument
>
> My machine Config is: Dual Core.
I doubt that makes any difference, and furthermore it does not tell me
your OS or your version of R, which in some cases does make a
difference. But again, I think the problem was the default
stringsAsFactors setting, which is a universal pitfall.
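For what it is worth, here is a sketch of the whole split-into-rows step
once the column is character. The sample values are assumptions: I am
guessing from the posted code that records are separated by ":" and
fields by ",", since the actual data were not shown.

```r
# Hypothetical data: each value holds several records separated by ":",
# and each record holds fields separated by ",".
TESTING <- data.frame(V1 = c("a,1:b,2", "c,3:d,4:e,5"),
                      stringsAsFactors = FALSE)

# Split every value on ":" and flatten into one record per element.
pieces <- unlist(strsplit(TESTING$V1, ":", fixed = TRUE))

# Parse each record into columns; one input value becomes several rows.
cleaned_data <- read.table(text = pieces, sep = ",", header = FALSE,
                           stringsAsFactors = FALSE)
cleaned_data
```

The text= argument of read.table saves opening and closing a
textConnection by hand.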
David Winsemius, MD
West Hartford, CT