[R] Excluding fixed number of rows from calculation while summarizing using ddply() function.

arun smartpink111 at yahoo.com
Mon Nov 5 14:58:45 CET 2012


Hi Siddu,

Sorry, I looked only at your first statement:
"in this example case I need to exclude the first row and last row(N=1)"

You can try this (here with N=2, i.e. dropping the first 2 and last 2 rows of each group):
dat1<-read.table(text="
Unique StepNo Data1 Data2
1      A      1    4    5     
2      A      1    5    6     
3      A      1    7    8   
4      A      1    3    4     
5      A      1    1    1     
6      B      1    2    4      
7      B      1    3    5     
8      B      1    4    5
9      B      1    5    6    
10    B      1    6    7   
",sep="",header=TRUE,stringsAsFactors=FALSE)
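library(plyr)
# first 2 and last 2 rows of each Unique/StepNo group
dat2<-ddply(dat1,.(Unique,StepNo),head,2)
dat3<-ddply(dat1,.(Unique,StepNo),tail,2)
# flag each data frame so the merged result shows where a row came from
dat1$newcoldat1<-TRUE
dat2$newcoldat2<-TRUE
dat3$newcoldat3<-TRUE
dat4<-merge(merge(dat1,dat2,all=TRUE),dat3,all=TRUE)
# keep only the rows that are in neither the head set nor the tail set
dat5<-dat4[is.na(dat4$newcoldat2) & is.na(dat4$newcoldat3),1:4]
# the mean is not really needed here: only one row is left for each of A and B
ddply(dat5,.(Unique,StepNo),numcolwise(mean))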
#  Unique StepNo Data1 Data2
#1      A      1     7     8
#2      B      1     4     5
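
If you need different numbers at the top and bottom (N1 and N2), here is a
rough sketch of a more generic variant. trim_rows() is a made-up helper name,
not a plyr function, and the sketch assumes the rows of each group are already
in the order you want. It selects rows by position instead of merging, so
duplicated rows within a group should not be dropped by accident.

```r
library(plyr)

# Same example data as above, without the row-number column.
dat1 <- read.table(text = "
Unique StepNo Data1 Data2
A 1 4 5
A 1 5 6
A 1 7 8
A 1 3 4
A 1 1 1
B 1 2 4
B 1 3 5
B 1 4 5
B 1 5 6
B 1 6 7
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: drop the first n1 and the last n2 rows of a data frame.
# Gives a zero-row data frame when n1 + n2 >= nrow(df).
trim_rows <- function(df, n1, n2 = n1) {
  idx <- seq_len(nrow(df))
  keep <- idx[idx > n1 & idx <= nrow(df) - n2]
  df[keep, , drop = FALSE]
}

# Trim every Unique/StepNo group, then summarise what is left.
trimmed <- ddply(dat1, .(Unique, StepNo), trim_rows, n1 = 2, n2 = 2)
res <- ddply(trimmed, .(Unique, StepNo), numcolwise(mean))
res
# with n1 = n2 = 2 this gives the same means as above: A -> 7, 8 and B -> 4, 5
```

I used positions rather than head(tail(df, -n1), -n2) because tail(df, -0)
returns no rows at all, which would break the n1 = 0 case.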

 
A.K.



----- Original Message -----
From: siddu479 <onlyfordigitalstuff at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Sunday, November 4, 2012 10:36 PM
Subject: Re: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function.

Hi Arun,

   Thanks for your reply, but your script removes only one row from each end
(the first row and the last row) of each Unique and StepNo combination and
calculates the mean of the remaining rows.
For the data below, your script removes the rows marked with # perfectly.
But in reality I may need to ignore, say, the first 10 rows and the last 20
rows of each Unique and StepNo combination for the statistics calculation.
Unique StepNo Data1 Data2
1      A      1     4     5      #Your script removing this row successfully.
2      A      1     5     6
3      A      1     7     8
4      A      1     3     4
5      A      1     1     1      #Your script removing this row successfully.
6      B      1     2     4      #Your script removing this row successfully.
7      B      1     3     5
8      B      1     4     5
9      B      1     5     6
10     B      1     6     7      #Your script removing this row successfully.

Can you modify your script to meet my requirement as shown below? It should
be generic: here N=2, removing the first 2 rows and the last 2 rows. Sometimes
I may have two numbers, N1 and N2 (the number of rows to be removed from the
top and the bottom respectively).

Unique StepNo Data1 Data2
1      A      1     4     5      #Ignore this
2      A      1     5     6      #Ignore this
3      A      1     7     8    
4      A      1     3     4      #Ignore this
5      A      1     1     1      #Ignore this
6      B      1     2     4      #Ignore this
7      B      1     3     5      #Ignore this
8      B      1     4     5
9      B      1     5     6     #Ignore this
10    B      1     6     7     #Ignore this

and then calculate the statistics using ddply.

I hope my problem statement is much clearer now.

-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648447.html
Sent from the R help mailing list archive at Nabble.com.
