[R] queer data set

Tony Plate tplate at acm.org
Tue Aug 16 00:40:05 CEST 2005


Here's one way of working with the data you gave:

 > x <- read.table(file("clipboard"), fill=TRUE, header=TRUE)
 > x
   HEADER1 HEADER2 HEADER3           HEADER3.1
1      A1      B1      C1         X11;X12;X13
2      A2      B2      C2 X21;X22;X23;X24;X25
3      A3      B3      C3
4      A4      B4      C4         X41;X42;X43
5      A5      B5      C5                 X51
 > apply(x, 1, function(x) strsplit(x[4], ";")[[1]])
$"1"
[1] "X11" "X12" "X13"

$"2"
[1] "X21" "X22" "X23" "X24" "X25"

$"3"
character(0)

$"4"
[1] "X41" "X42" "X43"

$"5"
[1] "X51"

 > do.call("rbind", apply(x, 1, function(x) {
+    y <- strsplit(x[4], ";")[[1]]
+    x3 <- matrix(x[1:3], ncol=3, nrow=max(1,length(y)), byrow=TRUE)
+    return(cbind(x3, if (length(y)) y else "NA"))
+ }))
       [,1] [,2] [,3] [,4]
  [1,] "A1" "B1" "C1" "X11"
  [2,] "A1" "B1" "C1" "X12"
  [3,] "A1" "B1" "C1" "X13"
  [4,] "A2" "B2" "C2" "X21"
  [5,] "A2" "B2" "C2" "X22"
  [6,] "A2" "B2" "C2" "X23"
  [7,] "A2" "B2" "C2" "X24"
  [8,] "A2" "B2" "C2" "X25"
  [9,] "A3" "B3" "C3" "NA"
[10,] "A4" "B4" "C4" "X41"
[11,] "A4" "B4" "C4" "X42"
[12,] "A4" "B4" "C4" "X43"
[13,] "A5" "B5" "C5" "X51"
 >

This is of course a matrix; you can convert it back to a data frame with 
as.data.frame() if you wish.  Use "NA" (with quotes) to get the literal 
string "NA" in column 4, or NA (without quotes) to get an actual missing 
character value.  If you're processing a huge amount of data, you can 
probably do better by rewriting the above code to avoid implicit 
coercions of data types (apply() converts the data frame to a character 
matrix before calling the function on each row).
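
As a rough sketch of what such a rewrite might look like (not tested on 
large data, and only one of several possible approaches), the splitting 
can be done once on the whole column and the rows repeated with rep(), 
so apply() is never called.  The data frame below is typed in by hand to 
mirror the example; the names parts, n and long are just illustrative.

```r
# Example data mirroring the table above; the blank field is stored as "".
x <- data.frame(
  HEADER1   = c("A1", "A2", "A3", "A4", "A5"),
  HEADER2   = c("B1", "B2", "B3", "B4", "B5"),
  HEADER3   = c("C1", "C2", "C3", "C4", "C5"),
  HEADER3.1 = c("X11;X12;X13", "X21;X22;X23;X24;X25", "",
                "X41;X42;X43", "X51"),
  stringsAsFactors = FALSE
)

# Split column 4 in one vectorised call.  as.character() guards against
# the column having been read in as a factor.
parts <- strsplit(as.character(x[[4]]), ";", fixed = TRUE)

# An empty field splits to character(0); replace it with a real NA so the
# row is kept rather than dropped.
parts[lengths(parts) == 0] <- list(NA_character_)

# Repeat each original row once per piece, then overwrite column 4 with
# the individual pieces.
n <- lengths(parts)
long <- x[rep(seq_len(nrow(x)), times = n), ]
long[[4]] <- unlist(parts)
rownames(long) <- NULL

long
```

The result is already a data frame, so no as.data.frame() step is needed, 
and column 4 holds a genuine NA for the A3 row rather than the string "NA".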

Hope this helps,

Tony Plate

S.O. Nyangoma wrote:
> I have a dataset that is basically structureless. Its dimension varies 
> from row to row, and the separators are a mixture of tabs and 
> semicolons (;). An example is
> 
> HEADER1 HEADER2 HEADER3   HEADER3
> A1       B1      C1       X11;X12;X13
> A2       B2      C2       X21;X22;X23;X24;X25
> A3       B3      C3       
> A4       B4      C4       X41;X42;X43
> A5       B5      C5       X51
> 
> etc., say. Note that a blank under HEADER3 corresponds to a 
> non-occurrence, and all semicolon (;) delimited variables are under 
> HEADER3. These values run into tens of thousands. I want to give some 
> order to this queer matrix, turning it into something like:
> 
> HEADER1 HEADER2 HEADER3   HEADER3
> A1       B1      C1       X11
> A1       B1      C1       X12
> A1       B1      C1       X13
> A1       B1      C1       X14
> A2       B2      C2       X21
> A2       B2      C2       X22
> A2       B2      C2       X23
> A2       B2      C2       X24
> A2       B2      C2       X25
> A2       B2      C2       X26
> A3       B3      C3       NA
> A4       B4      C4       X41
> A4       B4      C4       X42
> A4       B4      C4       X43
> 
> Is there a brilliant R-way of doing such a task?
> 
> Goodday. Stephen.
> 
> ----- Original Message -----
> From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
> Date: Monday, August 15, 2005 11:13 pm
> Subject: Re: [R] How to get a list work in RData file
> 
> 
>>On Mon, 15 Aug 2005, Xiyan Lon wrote:
>>
>>
>>>Dear R-Helper,
>>
>>(There are quite a few of us.)
>>
>>
>>>I want to know how I get a list  work which I saved in RData file. For
>>>example,
>>
>>I don't understand that at all, but it looks as if you want to save an 
>>unevaluated call, in which case see ?quote and use something like
>>
>>xyadd <- quote(test.xy(x=2, y=3))
>>
>>Loading and saving have nothing to do with this: they don't change the 
>>meaning of objects in the workspace.
>>
>>
>>>>test.xy <- function(x,y) {
>>>
>>>+    xy <- x+y
>>>+    xy
>>>+ }
>>>
>>>>xyadd <- test.xy(x=2, y=3)
>>>>xyadd
>>>
>>>[1] 5
>>>
>>>>x1 <- c(2,43,60,8)
>>>>y1 <- c(91,7,5,30)
>>>>
>>>>xyadd1 <- test.xy(x=x1, y=y1)
>>>>xyadd1
>>>
>>>[1] 93 50 65 38
>>>
>>>>save(list = ls(all=TRUE), file = "testxy.RData")
>>>>rm(list=ls(all=TRUE))
>>>>load("C:/R/useR/testxy.RData")
>>>>ls()
>>>
>>>[1] "test.xy" "x1"      "xyadd"   "xyadd1"  "y1"
>>>
>>>>ls.str(pat="xyadd")
>>>
>>>xyadd :  num 5
>>>xyadd1 :  num [1:4] 93 50 65 38
>>>
>>>When I run, I know the result like above
>>>
>>>>xyadd
>>>
>>>[1] 5
>>>
>>>>xyadd1
>>>
>>>[1] 93 50 65 38
>>>
>>>what I want to know, is there any function to make the result like:
>>>
>>>
>>>>xyadd
>>>
>>>        test.xy(x=2, y=3)
>>>
>>>and
>>>
>>>
>>>>xyadd1
>>>
>>>       test.xy(x=x1, y=y1)
>>
>>-- 
>>Brian D. Ripley,                  ripley at stats.ox.ac.uk
>>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>University of Oxford,             Tel:  +44 1865 272861 (self)
>>1 South Parks Road,                     +44 1865 272866 (PA)
>>Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 
>



