[R] Reorganize(stack data) a dataframe inducing names

jim holtman jholtman at gmail.com
Mon Aug 1 18:22:38 CEST 2011


Try this:  had to add extra names to your data since it was not clear
how it was organized.  Next time use 'dput' to enclose data.

> x <- read.table(textConnection(" index time  key date       values
+ 13732  27965 DATA.Q211.SUM.Index    04/08/11         1.42
+ 13733  27974 DATA.Q211.SUM.Index    05/10/11         1.45
+ 13734  27984 DATA.Q211.SUM.Index    06/01/11         1.22
+ 13746  28615 DATA.Q211.TDS.Index    04/07/11         1.35
+ 13747  28624 DATA.Q211.TDS.Index    05/20/11         1.40
+ 13754  29262 DATA.Q211.UBS.Index    05/02/11         1.30
+ 13755  29272 DATA.Q211.UBS.Index    05/03/11         1.48
+ 13761  29915 DATA.Q211.UCM.Index    04/28/11         1.43
+ 13768  30565 DATA.Q211.VDE.Index    05/02/11         1.48
+ 13775  31215 DATA.Q211.WF.Index     04/14/11         1.44
+ 13776  31225 DATA.Q211.WF.Index     05/12/11         1.42
+ 13789  31865 DATA.Q211.WPC.Index    04/01/11         1.40
+ 13790  31875 DATA.Q211.WPC.Index    04/08/11         1.42
+ 13791  31883 DATA.Q211.WPC.Index    05/10/11         1.43
+ 13804  32515 DATA.Q211.XTB.Index    04/29/11         1.50
+ 13805  32525 DATA.Q211.XTB.Index    05/30/11         1.40
+ 13806  32532 DATA.Q211.XTB.Index    06/28/11         1.43")
+     , header = TRUE
+     , as.is = TRUE
+ )
> closeAllConnections()
> x
   index  time                 key     date values
1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
2  13733 27974 DATA.Q211.SUM.Index 05/10/11   1.45
3  13734 27984 DATA.Q211.SUM.Index 06/01/11   1.22
4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
5  13747 28624 DATA.Q211.TDS.Index 05/20/11   1.40
6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
7  13755 29272 DATA.Q211.UBS.Index 05/03/11   1.48
8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
11 13776 31225  DATA.Q211.WF.Index 05/12/11   1.42
12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
13 13790 31875 DATA.Q211.WPC.Index 04/08/11   1.42
14 13791 31883 DATA.Q211.WPC.Index 05/10/11   1.43
15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50
16 13805 32525 DATA.Q211.XTB.Index 05/30/11   1.40
17 13806 32532 DATA.Q211.XTB.Index 06/28/11   1.43
> # get index of first occurance of 'key' column
> indx <- !duplicated(x$key)
> x[indx,]
   index  time                 key     date values
1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50
>
>



On Mon, Aug 1, 2011 at 11:13 AM, Francesca <francesca.pancotto at gmail.com> wrote:
> Dear Contributors
> thanks for any help you can provide. I searched the threads
> but I could not find any query that satisfied my needs.
> This is my database:
>  index time         values
> 13732  27965 DATA.Q211.SUM.Index    04/08/11         1.42
> 13733  27974 DATA.Q211.SUM.Index    05/10/11         1.45
> 13734  27984 DATA.Q211.SUM.Index    06/01/11         1.22
> 13746  28615 DATA.Q211.TDS.Index    04/07/11         1.35
> 13747  28624 DATA.Q211.TDS.Index    05/20/11         1.40
> 13754  29262 DATA.Q211.UBS.Index    05/02/11         1.30
> 13755  29272 DATA.Q211.UBS.Index    05/03/11         1.48
> 13761  29915 DATA.Q211.UCM.Index    04/28/11         1.43
> 13768  30565 DATA.Q211.VDE.Index    05/02/11         1.48
> 13775  31215 DATA.Q211.WF.Index     04/14/11         1.44
> 13776  31225 DATA.Q211.WF.Index     05/12/11         1.42
> 13789  31865 DATA.Q211.WPC.Index    04/01/11         1.40
> 13790  31875 DATA.Q211.WPC.Index    04/08/11         1.42
> 13791  31883 DATA.Q211.WPC.Index    05/10/11         1.43
> 13804  32515 DATA.Q211.XTB.Index    04/29/11         1.50
> 13805  32525 DATA.Q211.XTB.Index    05/30/11         1.40
> 13806  32532 DATA.Q211.XTB.Index    06/28/11         1.43
>
> I need to select only the rows of this database that correspond to each
> of the first occurrences of the string represented in column
> index. In the example shown I would like to obtain a new
> data.frame which is
>
> index time         values
> 13732  27965 DATA.Q211.SUM.Index    04/08/11         1.42
> 13746  28615 DATA.Q211.TDS.Index    04/07/11         1.35
> 13754  29262 DATA.Q211.UBS.Index    05/02/11         1.30
> 13761  29915 DATA.Q211.UCM.Index    04/28/11         1.43
> 13768  30565 DATA.Q211.VDE.Index    05/02/11         1.48
> 13775  31215 DATA.Q211.WF.Index    04/14/11         1.44
> 13789  31865 DATA.Q211.WPC.Index    04/01/11         1.40
> 13804  32515 DATA.Q211.XTB.Index    04/29/11         1.50
>
> As you can see, it is not the whole string to change,
> rather a substring that is part of it. I want to select
> only the first values related to the row that presents for the first time
> the different part of the string(substring).
> I know how to select rows according to a substring condition on the
> index column, but I cannot use it here because the substring changes
> and moreover the number of occurrences per substring is variable.
>
> Thank you for any help you can provide.
> Francesca
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



More information about the R-help mailing list