[R] Reorganize(stack data) a dataframe inducing names
jim holtman
jholtman at gmail.com
Mon Aug 1 18:22:38 CEST 2011
Try this: had to add extra names to your data since it was not clear
how it was organized. Next time use 'dput' to enclose data.
> x <- read.table(textConnection(" index time key date values
+ 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
+ 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45
+ 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22
+ 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
+ 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40
+ 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
+ 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48
+ 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
+ 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
+ 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
+ 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42
+ 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
+ 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42
+ 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43
+ 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
+ 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40
+ 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43")
+ , header = TRUE
+ , as.is = TRUE
+ )
> closeAllConnections()
> x
index time key date values
1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
2 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45
3 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22
4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
5 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40
6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
7 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48
8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
11 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42
12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
13 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42
14 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43
15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
16 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40
17 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43
> # get index of first occurance of 'key' column
> indx <- !duplicated(x$key)
> x[indx,]
index time key date values
1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
>
>
On Mon, Aug 1, 2011 at 11:13 AM, Francesca <francesca.pancotto at gmail.com> wrote:
> Dear Contributors
> thanks for any help you can provide. I searched the threads
> but I could not find any query that satisfied my needs.
> This is my database:
> index time values
> 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
> 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45
> 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22
> 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
> 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40
> 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
> 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48
> 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
> 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
> 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
> 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42
> 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
> 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42
> 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43
> 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
> 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40
> 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43
>
> I need to select only the rows of this database that correspond to each
> of the first occurrences of the string represented in column
> index. In the example shown I would like to obtain a new
> data.frame which is
>
> index time values
> 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
> 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
> 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
> 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
> 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
> 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
> 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
> 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
>
> As you can see, it is not the whole string to change,
> rather a substring that is part of it. I want to select
> only the first values related to the row that presents for the first time
> the different part of the string(substring).
> I know how to select rows according to a substring condition on the
> index column, but I cannot use it here because the substring changes
> and moreover the number of occurrences per substring is variable.
>
> Thank you for any help you can provide.
> Francesca
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
More information about the R-help
mailing list