[R] Thank you your help and one more question.
arun
smartpink111 at yahoo.com
Tue Jan 29 04:20:10 CET 2013
Hi,
I don't have the Amelia package installed.
If you want the mean value, you could use either ?aggregate() or ?ddply() from library(plyr).
library(plyr)
imputNew<-do.call(rbind,imput1_2_3)
res1<-ddply(imputNew,.(ID,CTIME),function(x) mean(x$WEIGHT))
names(res1)[3]<-"WEIGHT"
head(res1)
# ID CTIME WEIGHT
#1 HM001 1223 24.90000
#2 HM001 1224 25.20000
#3 HM001 1225 25.50000
#4 HM001 1226 25.41933
#5 HM001 1227 25.70000
#6 HM001 1228 27.10000
#or
res2<-aggregate(.~ID+CTIME,data=imputNew,mean)
#or
res3<- do.call(rbind,lapply(split(imputNew,imputNew$CTIME),function(x) {x$WEIGHT<-mean(x[,3]);head(x,1)}))
row.names(res3)<-1:nrow(res3)
identical(res1,res2)
#[1] TRUE
identical(res1,res3)
#[1] TRUE
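The same averaging can also be sketched in base R without stacking the data first. This assumes every imputed data frame has the same rows in the same order, which holds for the Amelia output shown in this thread. The helper name average_imputations and the toy data are hypothetical, for illustration only.

```r
# Hypothetical helper: average one numeric column across a list of
# imputed data frames that share the same rows in the same order.
average_imputations <- function(imps, value = "WEIGHT") {
  # One matrix column per imputation (cbind keeps a matrix even for 1 row)
  vals <- do.call(cbind, lapply(imps, function(d) d[[value]]))
  out <- imps[[1]]
  out[[value]] <- rowMeans(vals)
  out
}

# Toy data standing in for the list of imputations
toy <- list(
  imp1 = data.frame(ID = "HM001", CTIME = 1223:1225, WEIGHT = c(24.9, 25.0, 26.0)),
  imp2 = data.frame(ID = "HM001", CTIME = 1223:1225, WEIGHT = c(24.9, 26.0, 27.0)),
  imp3 = data.frame(ID = "HM001", CTIME = 1223:1225, WEIGHT = c(24.9, 27.0, 28.0))
)
avg <- average_imputations(toy)
avg
```

Because it works on any list length, the same call should also cover all five imputations rather than only the first three.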
A.K.
________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 9:47 PM
Subject: Re: Thank you your help and one more question.
Thank you for replying to my question.
What I want is a matrix like the one below.
I have 3 data sets named weightimp1, 2, 3.
To get the matrix below, I have to combine these 3 data sets.
I don't know how to combine the 3 data sets. It could be the mean of the 3 data sets. Or there might be a value (temp2$imputations$...) in the Amelia package.
I would prefer to use an Amelia package method, but if one doesn't exist, can you recommend how to set it as a mean value?
# ID CTIME WEIGHT (It represents the 3 data sets (weightimp1, 2, 3))
#1 HM001 1223 24.90000
#2 HM001 1224 25.20000
#3 HM001 1225 25.50000
#4 HM001 1226 25.24132
#5 HM001 1227 25.70000
#6 HM001 1228 27.10000
#7 HM001 1229 27.30000
#8 HM001 1230 27.40000
#9 HM001 1231 28.40000
#10 HM001 1232 29.20000
#11 HM001 1233 30.13770
#12 HM001 1234 31.17251
#13 HM001 1235 32.40000
#14 HM001 1236 33.70000
#15 HM001 1237 34.30000
-----Original Message-----
From: "arun"<smartpink111 at yahoo.com>
To: "남윤주"<jamansymptom at naver.com>;
Cc: "R help"<r-help at r-project.org>;
Sent: 2013-01-29 (Tue) 11:25:38
Subject: Re: Thank you your help and one more question.
Hi,
How do you want to combine the results?
It looks like the 5 datasets are list elements.
If I take the first three list elements,
imput1_2_3<-list(imp1=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9,
25.2, 25.5, 25.24132, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.1377,
31.17251, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")),
imp2=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9,
25.2, 25.5, 25.54828, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 29.8977,
31.35045, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")),
imp3=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9,
25.2, 25.5, 25.46838, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.88185,
31.57952, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")))
#It could be combined by:
do.call(rbind, imput1_2_3) # But if you do this, the total number of rows will be the sum of the number of rows of each dataset.
I guess you want something like this:
res<-Reduce(function(...) merge(...,by=c("ID","CTIME")),imput1_2_3)
names(res)[3:5]<- paste("WEIGHT","IMP",1:3,sep="")
res
# ID CTIME WEIGHTIMP1 WEIGHTIMP2 WEIGHTIMP3
#1 HM001 1223 24.90000 24.90000 24.90000
#2 HM001 1224 25.20000 25.20000 25.20000
#3 HM001 1225 25.50000 25.50000 25.50000
#4 HM001 1226 25.24132 25.54828 25.46838
#5 HM001 1227 25.70000 25.70000 25.70000
#6 HM001 1228 27.10000 27.10000 27.10000
#7 HM001 1229 27.30000 27.30000 27.30000
#8 HM001 1230 27.40000 27.40000 27.40000
#9 HM001 1231 28.40000 28.40000 28.40000
#10 HM001 1232 29.20000 29.20000 29.20000
#11 HM001 1233 30.13770 29.89770 30.88185
#12 HM001 1234 31.17251 31.35045 31.57952
#13 HM001 1235 32.40000 32.40000 32.40000
#14 HM001 1236 33.70000 33.70000 33.70000
#15 HM001 1237 34.30000 34.30000 34.30000
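From a wide table like this, a single combined column can be computed with rowMeans(). The sketch below uses a two-row stand-in for res (values copied from rows 3 and 4 above); the names res_toy and combined are only illustrative.

```r
# Two-row stand-in for the merged table `res` shown above
res_toy <- data.frame(
  ID = c("HM001", "HM001"),
  CTIME = c(1225L, 1226L),
  WEIGHTIMP1 = c(25.5, 25.24132),
  WEIGHTIMP2 = c(25.5, 25.54828),
  WEIGHTIMP3 = c(25.5, 25.46838)
)

# Pick every imputation column by name, then average across them row by row
imp_cols <- grep("^WEIGHTIMP", names(res_toy))
combined <- data.frame(
  res_toy[, c("ID", "CTIME")],
  WEIGHT = rowMeans(res_toy[, imp_cols])
)
combined
```

Keeping the wide table around has the side benefit that the spread between imputations stays visible next to the averaged value.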
A.K.
________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 7:35 PM
Subject: Thank you your help and one more question.
I deeply appreciate your help. To answer your question, I am a software engineer, and I am developing a system that accumulates data to draw charts and tables.
For higher performance, I have to handle missing values, so I use the Amelia package. Below is the result of following your answer.
----------------------------------------------------------------
> temp2 # original data
ID CTIME WEIGHT
1 HM001 1223 24.9
2 HM001 1224 25.2
3 HM001 1225 25.5
4 HM001 1226 NA
5 HM001 1227 25.7
6 HM001 1228 27.1
7 HM001 1229 27.3
8 HM001 1230 27.4
9 HM001 1231 28.4
10 HM001 1232 29.2
11 HM001 1233 1221.0
12 HM001 1234 NA
13 HM001 1235 32.4
14 HM001 1236 33.7
15 HM001 1237 34.3
> temp2$WEIGHT<- ifelse(temp2$WEIGHT>50,NA,temp2$WEIGHT)
> temp2 # after eliminating the strange value
ID CTIME WEIGHT
1 HM001 1223 24.9
2 HM001 1224 25.2
3 HM001 1225 25.5
4 HM001 1226 NA
5 HM001 1227 25.7
6 HM001 1228 27.1
7 HM001 1229 27.3
8 HM001 1230 27.4
9 HM001 1231 28.4
10 HM001 1232 29.2
11 HM001 1233 NA
12 HM001 1234 NA
13 HM001 1235 32.4
14 HM001 1236 33.7
15 HM001 1237 34.3
--------------------------------------------------------------
I have one more question. Below are the code and results.
--------------------------------------------------------------
> a.out2<-amelia(temp2, m=5, ts="CTIME", cs="ID", polytime=1)
-- Imputation 1 --
1 2 3 4
-- Imputation 2 --
1 2 3
-- Imputation 3 --
1 2 3 4
-- Imputation 4 --
1 2 3
-- Imputation 5 --
1 2 3
> a.out2$imputations
$imp1
ID CTIME WEIGHT
1 HM001 1223 24.90000
2 HM001 1224 25.20000
3 HM001 1225 25.50000
4 HM001 1226 25.24132
5 HM001 1227 25.70000
6 HM001 1228 27.10000
7 HM001 1229 27.30000
8 HM001 1230 27.40000
9 HM001 1231 28.40000
10 HM001 1232 29.20000
11 HM001 1233 30.13770
12 HM001 1234 31.17251
13 HM001 1235 32.40000
14 HM001 1236 33.70000
15 HM001 1237 34.30000
$imp2
ID CTIME WEIGHT
1 HM001 1223 24.90000
2 HM001 1224 25.20000
3 HM001 1225 25.50000
4 HM001 1226 25.54828
5 HM001 1227 25.70000
6 HM001 1228 27.10000
7 HM001 1229 27.30000
8 HM001 1230 27.40000
9 HM001 1231 28.40000
10 HM001 1232 29.20000
11 HM001 1233 29.89770
12 HM001 1234 31.35045
13 HM001 1235 32.40000
14 HM001 1236 33.70000
15 HM001 1237 34.30000
$imp3
ID CTIME WEIGHT
1 HM001 1223 24.90000
2 HM001 1224 25.20000
3 HM001 1225 25.50000
4 HM001 1226 25.46838
5 HM001 1227 25.70000
6 HM001 1228 27.10000
7 HM001 1229 27.30000
8 HM001 1230 27.40000
9 HM001 1231 28.40000
10 HM001 1232 29.20000
11 HM001 1233 30.88185
12 HM001 1234 31.57952
13 HM001 1235 32.40000
14 HM001 1236 33.70000
15 HM001 1237 34.30000
$imp4
ID CTIME WEIGHT
1 HM001 1223 24.90000
2 HM001 1224 25.20000
3 HM001 1225 25.50000
4 HM001 1226 25.86703
5 HM001 1227 25.70000
6 HM001 1228 27.10000
7 HM001 1229 27.30000
8 HM001 1230 27.40000
9 HM001 1231 28.40000
10 HM001 1232 29.20000
11 HM001 1233 30.61241
12 HM001 1234 30.17042
13 HM001 1235 32.40000
14 HM001 1236 33.70000
15 HM001 1237 34.30000
$imp5
ID CTIME WEIGHT
1 HM001 1223 24.90000
2 HM001 1224 25.20000
3 HM001 1225 25.50000
4 HM001 1226 26.05747
5 HM001 1227 25.70000
6 HM001 1228 27.10000
7 HM001 1229 27.30000
8 HM001 1230 27.40000
9 HM001 1231 28.40000
10 HM001 1232 29.20000
11 HM001 1233 31.03894
12 HM001 1234 30.90960
13 HM001 1235 32.40000
14 HM001 1236 33.70000
15 HM001 1237 34.30000
----------------------------------------
I got 5 datasets including imputed values. But what I want is not five datasets, only one dataset which combines those 5 imputed datasets.
I want to combine $imp1, $imp2, ..., $imp5 to get a final result set. This result set is also a matrix with 3 columns and 15 rows.
Would you help me once more, please?
-----Original Message-----
From: "arun"<smartpink111 at yahoo.com>
To: "남윤주"<jamansymptom at naver.com>;
Cc: "R help"<r-help at r-project.org>;
Sent: 2013-01-28 (Mon) 23:48:51
Subject: Re: Thank you your help.
Hi,
temp3<- read.table(text="
ID CTIME WEIGHT
HM001 1223 24.0
HM001 1224 25.2
HM001 1225 23.1
HM001 1226 NA
HM001 1227 32.1
HM001 1228 32.4
HM001 1229 1323.2
HM001 1230 27.4
HM001 1231 22.4236 #changed here to test the previous solution
",sep="",header=TRUE,stringsAsFactors=FALSE)
tempnew<- na.omit(temp3)
grep("\\d{4}",temp3$WEIGHT)
#[1] 7 9 #not correct
temp3[,3][grep("^\\d{4}",temp3$WEIGHT)]<-NA #anchor at the start: match values with 4 digits before the decimal point (also catches whole numbers such as 1323)
tail(temp3)
# ID CTIME WEIGHT
#4 HM001 1226 NA
#5 HM001 1227 32.1000
#6 HM001 1228 32.4000
#7 HM001 1229 NA
#8 HM001 1230 27.4000
#9 HM001 1231 22.4236
#Based on the variance, you could set up some limit, for example 50, and use:
tempnew$WEIGHT<- ifelse(tempnew$WEIGHT>50,NA,tempnew$WEIGHT)
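Since an accepted range of 10 to 50 is mentioned elsewhere in this thread, the same idea can be sketched with both a lower and an upper limit. The data frame temp_demo and the names lower, upper, and out_of_range are made up for the example.

```r
# Made-up readings: one too high, one too low, one already missing
temp_demo <- data.frame(
  CTIME = 1223:1228,
  WEIGHT = c(24.9, 1221.0, NA, 3.2, 25.7, 50)
)

lower <- 10  # assumed lower limit of the accepted range
upper <- 50  # assumed upper limit of the accepted range

# TRUE only for non-missing values outside [lower, upper]
out_of_range <- !is.na(temp_demo$WEIGHT) &
  (temp_demo$WEIGHT < lower | temp_demo$WEIGHT > upper)

temp_demo$WEIGHT[out_of_range] <- NA
temp_demo
```

A numeric comparison like this does not depend on how many digits a bad reading has, so it may be more robust than matching the printed value with a regular expression.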
A.K.
________________________________
From: 남윤주 <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 2:20 AM
Subject: Re: Thank you your help.
Thank you for your reply again. Your understanding is exactly right.
I attached a picture that shows the dataset.
'weight' is a dependent variable, and CTIME means hour/minute. This data will be accumulated for years.
As for the accepted variance range, it would be from 10 to 50.
Actually, I am a Java programmer, so I am unfamiliar with the R language.
Can you give me an example of how to use the grep function?
-----Original Message-----
From: "arun"<smartpink111 at yahoo.com>
To: "jamansymptom at naver.com"<jamansymptom at naver.com>;
Cc:
Sent: 2013-01-28 (Mon) 15:27:12
Subject: Re: Thank you your help.
Hi,
Your original post was that
"...it was evaluated from 20kg -40kg. But By some errors, it is evaluated 2000 kg".
So, my understanding was that you occasionally get readings of 2000, or 2000-4000, in place of 20-40 due to some misreading.
If your dataset contains observed values, strange values, and NA, and you want to replace the strange values with NA, could you mention the range of the strange values? If the strange values range anywhere between 1000 and 9999, they should get replaced with the ?grep() solution. But if it depends upon something else, you need to specify. Also, regarding the variance, what is your accepted range of variance?
A.K.
----- Original Message -----
From: "jamansymptom at naver.com" <jamansymptom at naver.com>
To: smartpink111 at yahoo.com
Cc:
Sent: Monday, January 28, 2013 1:15 AM
Subject: Thank you your help.
Thank you for answering my question.
It is not exactly what I want. I should have described the situation in detail.
There is a sensor that gets data every minute. That data is accumulated and becomes part of the dataset.
The dataset contains observed values, strange values, and NA.
Namely, I am not sure where a strange value will occur.
And I can't predict when a strange value will occur.
I need a procedure that works like the one below.
1. Using some method, set the range of variance.
2. Using a for(i) statement, check whether variance(weight) is in the range.
3. When the variance is out of range, set weight[i] to NA.
Thank you.
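
The three steps above can be sketched as follows. The thread does not say how the range should be chosen, so this example assumes one possible rule: the median plus or minus k times the median absolute deviation (mad() in base R), with k = 3 picked arbitrarily. The variable names are illustrative.

```r
# Sensor readings from this thread, including the strange value and NAs
weight <- c(24.9, 25.2, 25.5, NA, 25.7, 27.1, 27.3, 27.4,
            28.4, 29.2, 1221.0, NA, 32.4, 33.7, 34.3)

# Step 1: derive an accepted range from the data (assumed rule, k is a guess)
k <- 3
centre <- median(weight, na.rm = TRUE)
spread <- mad(weight, na.rm = TRUE)
limits <- centre + c(-1, 1) * k * spread

# Steps 2 and 3: loop over the readings and blank out any outside the range
cleaned <- weight
for (i in seq_along(cleaned)) {
  if (!is.na(cleaned[i]) && (cleaned[i] < limits[1] || cleaned[i] > limits[2])) {
    cleaned[i] <- NA
  }
}
cleaned
```

The median and MAD are used here instead of the mean and variance because a single reading such as 1221 would inflate the variance enough to hide itself. The loop mirrors the steps as written; a vectorized comparison would do the same job.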