[R] Importing data from text file with mixed format

delnatan delnatan at gmail.com
Mon Oct 26 18:01:05 CET 2009


All of these replies have been really helpful. Once again I see that
anything's possible in R!

Thank you for the suggestion, Bill. I think arranging the data in one data
frame is a good idea.

-Daniel


William Dunlap wrote:
> 
> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org 
>> [mailto:r-help-bounces at r-project.org] On Behalf Of delnatan
>> Sent: Saturday, October 24, 2009 8:32 PM
>> To: r-help at r-project.org
>> Subject: [R] Importing data from text file with mixed format
>> 
>> 
>> Hi,
>> I'm having difficulty importing a text file that looks 
>> something like this:
>> 
>> #begin text file
>> Timepoint 1
>> ObjectNumber     Volume     SurfaceArea
>> 1                      5.3          9.7
>> 2                      4.9          8.3
>> 3                      5.0          9.1
>> 4                      3.5          7.8
>> 
>> Timepoint 2
>> ObjectNumber     Volume     SurfaceArea
>> 1                      5.1          9.0
>> 2                      4.7          8.9
>> 3                      4.3          8.3
>> 4                      4.2          7.9
>> 
>> ... #goes on to Timepoint 80
>> 
>> How would I import this data into a list containing one
>> data.frame for each timepoint?
>> I'd like my data to be organized like this:
>> 
>> >myList
>> [[1]]
>>    ObjectNumber     Volume     SurfaceArea
>> 1  1                      5.3          9.7
>> 2  2                      4.9          8.3
>> 3  3                      5.0          9.1
>> 4  4                      3.5          7.8
>> 
>> [[2]]
>>   ObjectNumber     Volume     SurfaceArea
>> 1 1                      5.1          9.0
>> 2 2                      4.7          8.9
>> 3 3                      4.3          8.3
>> 4 4                      4.2          7.9
> 
> The following function reads that text file into one data.frame,
> which has a Timepoint column, which is a format I usually find
> more convenient.  You can use split(data, data$Timepoint)
> to get to the format you asked for.  If you use the one-data-frame
> format you can use the cast and melt functions from the reshape
> package to rearrange it.
> 
> readMyData <- function (file) {
>     # read every line in the file
>     lines <- readLines(file)
>     # drop empty lines
>     lines <- grep("^[[:space:]]*$", lines, value=TRUE, invert=TRUE)
>     # find and check header lines
>     isHeaderLine <- regexpr("^ObjectNumber", lines) > 0
>     if (sum(isHeaderLine)==0)
>         stop("No header lines of form 'ObjectNumber ...'")
>     if (length(u <- unique(lines[isHeaderLine]))>1)
>         stop("Header lines vary: ", paste(sQuote(head(u)), collapse=", "))
>     col.names <- strsplit(lines[which(isHeaderLine)[1]], "[[:space:]]+")[[1]]
>     # after making column names from header lines, drop header lines
>     lines <- lines[!isHeaderLine]
>     # process Timepoint lines
>     isTimepointLine <- regexpr("^Timepoint", lines) > 0    
>     if (sum(isTimepointLine)==0)
>         stop("No lines of form 'Timepoint <number>'")
>     timepoints <- sub("^Timepoint[[:space:]]*", "", lines[isTimepointLine])
>     timepoints <- as.integer(timepoints)
>     if (any(is.na(timepoints)))
>         stop("Non-integer found in a Timepoint line: ",
>             sQuote(lines[isTimepointLine][which(is.na(timepoints))[1]]))
>     nRowsPerTimepoint <- diff(c(which(isTimepointLine), length(isTimepointLine)+1)) - 1
>     # drop Timepoint lines.  Remaining lines should be data lines
>     lines <- lines[!isTimepointLine]
>     # An error in read.table means there were lines we should have dropped
>     result <- read.table(header=FALSE,
>         row.names=NULL,
>         col.names=col.names,
>         textConnection(lines))
>     # Add Timepoint column
>     result$Timepoint <- rep(timepoints, nRowsPerTimepoint)
>     result 
> }
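
A note on the row-counting step in readMyData above: the sketch below
replays it on a toy logical vector so the arithmetic is easier to follow.
The vector values are made up for illustration and are not taken from the
data file; only the diff/which/rep idiom comes from the function.

```r
# Toy example (hypothetical values): TRUE marks a "Timepoint" line and
# FALSE marks a data line, after blank and header lines have been dropped.
isTimepointLine <- c(TRUE, FALSE, FALSE, FALSE, FALSE,
                     TRUE, FALSE, FALSE, FALSE)

# Positions of the Timepoint lines, plus a sentinel one past the last line
starts <- c(which(isTimepointLine), length(isTimepointLine) + 1)

# Distance between consecutive starts, minus the Timepoint line itself,
# gives the number of data rows in each block
nRowsPerTimepoint <- diff(starts) - 1
nRowsPerTimepoint
# [1] 4 3

# rep() with a vector of counts repeats each label that many times,
# which is how the Timepoint column gets filled in
rep(c(1L, 2L), nRowsPerTimepoint)
# [1] 1 1 1 1 2 2 2
```

The sentinel is what makes the last block count correctly: without it,
diff() would return one value fewer than there are Timepoint lines.
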
> 
> E.g.,
> > data <- readMyData("c:/temp/t.txt")
> > data
>   ObjectNumber Volume SurfaceArea Timepoint
> 1            1    5.3         9.7         1
> 2            2    4.9         8.3         1
> 3            3    5.0         9.1         1
> 4            4    3.5         7.8         1
> 5            1    5.1         9.0         2
> 6            2    4.7         8.9         2
> 7            3    4.3         8.3         2
> 8            4    4.2         7.9         2
> > split(data, data$Timepoint)
> $`1`
>   ObjectNumber Volume SurfaceArea Timepoint
> 1            1    5.3         9.7         1
> 2            2    4.9         8.3         1
> 3            3    5.0         9.1         1
> 4            4    3.5         7.8         1
> 
> $`2`
>   ObjectNumber Volume SurfaceArea Timepoint
> 5            1    5.1         9.0         2
> 6            2    4.7         8.9         2
> 7            3    4.3         8.3         2
> 8            4    4.2         7.9         2
> > library(reshape)  # provides melt() and cast()
> > mdata <- melt(data, id=c("ObjectNumber","Timepoint"))
> > cast(mdata, Timepoint~variable, fun.aggregate=c, subset=variable=="SurfaceArea")
>   Timepoint SurfaceArea_X1 SurfaceArea_X2 SurfaceArea_X3 SurfaceArea_X4
> 1         1            9.7            8.3            9.1            7.8
> 2         2            9.0            8.9            8.3            7.9
> > cast(mdata, ObjectNumber~variable, fun.aggregate=c, subset=variable=="SurfaceArea")
>   ObjectNumber SurfaceArea_X1 SurfaceArea_X2
> 1            1            9.7            9.0
> 2            2            8.3            8.9
> 3            3            9.1            8.3
> 4            4            7.8            7.9
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
> 
>> 
>> -Daniel
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Importing-data-from-text-file-with-mixed-format-tp26045031p26045031.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> 
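
For anyone who wants the list-of-data-frames layout from the original
question directly, here is a minimal sketch of one way it might be done.
It is not Bill's function: readTimepointList is a hypothetical helper name,
the sample lines are copied from the question, and the sketch assumes the
input starts with a "Timepoint" line and that every block repeats its own
header line. It also relies on the text= argument of read.table, which is
available in current versions of R.

```r
# Sample input in the format shown in the question (first two timepoints)
txt <- c("Timepoint 1",
         "ObjectNumber     Volume     SurfaceArea",
         "1                      5.3          9.7",
         "2                      4.9          8.3",
         "3                      5.0          9.1",
         "4                      3.5          7.8",
         "",
         "Timepoint 2",
         "ObjectNumber     Volume     SurfaceArea",
         "1                      5.1          9.0",
         "2                      4.7          8.9",
         "3                      4.3          8.3",
         "4                      4.2          7.9")

# Hypothetical helper: takes a character vector of lines, e.g. readLines(file)
readTimepointList <- function(lines) {
    # drop empty lines
    lines <- lines[!grepl("^[[:space:]]*$", lines)]
    isTimepointLine <- grepl("^Timepoint", lines)
    # running count of Timepoint lines assigns every line to a block
    block <- cumsum(isTimepointLine)
    labels <- sub("^Timepoint[[:space:]]*", "", lines[isTimepointLine])
    # split the non-Timepoint lines by block; each chunk is header + data
    chunks <- split(lines[!isTimepointLine], block[!isTimepointLine])
    result <- lapply(chunks, function(chunk) read.table(text=chunk, header=TRUE))
    # name the list elements by the timepoint labels, in order of appearance
    names(result) <- labels
    result
}

myList <- readTimepointList(txt)
names(myList)
myList[["2"]]
```

With a real file, readTimepointList(readLines("c:/temp/t.txt")) would be
the equivalent call. Unlike readMyData, this sketch does no validation of
the header or Timepoint lines, so malformed input will surface as a
read.table error.
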




