[R] HELP!! how to remove 10% of data randomly in R
arun
smartpink111 at yahoo.com
Wed Oct 31 19:48:39 CET 2012
Hi,
Maybe this helps.
dat1<-read.table(text="
TDate TTime O3 No2 Temp Sun Wspeed Wdirect Hum Indicator
1 19980101 2400 0.065 0.036 31.4 765 9.9 351 NA 1
2 19980102 2400 0.053 0.025 31.8 624 7.7 351 NA 1
3 19980103 2400 0.027 0.033 31.5 852 8.8 331 NA 2
4 19980104 2400 0.034 0.023 30.7 679 7.0 338 NA 2
5 19980105 2400 0.019 0.016 28.1 376 9.6 354 NA 1
6 19980106 2400 0.021 0.018 29.9 603 9.3 356 NA 1
7 19980107 2400 0.026 0.047 31.2 857 10.7 336 NA 1
8 19980108 2400 0.024 0.014 31.1 635 7.8 330 NA 1
9 19980109 2400 0.058 0.033 32.5 742 10.7 334 NA 1
10 19980110 2400 0.026 0.032 33.9 923 10.6 347 NA 2
11 19980111 2400 0.064 0.034 32.5 751 6.3 355 NA 2
12 19980112 2400 0.066 0.034 33.3 697 8.5 319 NA 1
13 19980113 2400 0.026 0.030 33.4 992 12.5 341 NA 1
14 19980114 2400 0.101 0.028 33.8 705 8.7 349 NA 1
15 19980115 2400 0.069 0.030 33.3 718 11.4 348 NA 1
16 19980116 2400 0.054 0.026 33.4 639 10.9 354 NA 1
17 19980117 2400 0.090 0.039 33.1 653 13.2 342 NA 2
18 19980118 2400 0.048 0.017 33.2 825 10.8 323 NA 2
19 19980119 2400 0.038 0.027 33.7 984 10.3 353 NA 1
20 19980120 2400 0.026 0.032 34.2 994 15.0 357 NA 1
21 19980121 2400 0.065 0.044 33.8 999 17.5 343 NA 1
22 19980122 2400 0.046 0.024 33.5 931 10.1 332 NA 1
23 19980123 2400 0.050 0.041 33.9 881 11.3 353 NA 1
24 19980124 2400 0.036 0.027 33.8 877 9.1 328 NA 2
25 19980125 2400 0.043 0.021 33.2 777 10.5 340 NA 2
26 19980126 2400 0.029 0.016 33.1 999 14.1 341 NA 1
27 19980127 2400 0.033 0.030 33.9 943 12.9 344 NA 1
28 19980128 2400 0.040 0.022 33.7 805 12.6 354 NA 1
29 19980129 2400 0.029 0.015 30.2 512 7.4 356 NA 1
30 19980130 2400 0.027 0.013 31.7 656 13.9 349 NA 1
",sep="",header=TRUE,stringsAsFactors=FALSE)
#Setting columns 3:7 to NA for a random 10% of the rows (a variant of David's method).
is.na(dat1[sample(1:nrow(dat1),0.1*nrow(dat1)),3:7])<-TRUE
tail(dat1)
# TDate TTime O3 No2 Temp Sun Wspeed Wdirect Hum Indicator
#25 19980125 2400 NA NA NA NA NA 340 NA 2
#26 19980126 2400 0.029 0.016 33.1 999 14.1 341 NA 1
#27 19980127 2400 0.033 0.030 33.9 943 12.9 344 NA 1
#28 19980128 2400 0.040 0.022 33.7 805 12.6 354 NA 1
#29 19980129 2400 0.029 0.015 30.2 512 7.4 356 NA 1
#30 19980130 2400 0.027 0.013 31.7 656 13.9 349 NA 1
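The call above blanks whole rows in columns 3:7. If instead 10% of the individual cells in those columns should go missing, one option is to sample cell positions rather than row numbers. This is a minimal sketch and not part of the original reply: the data frame `toy`, the helper name `blank_cells` and the seed are assumptions made up for illustration.

```r
# Hypothetical helper (not from the thread): set a random fraction of the
# cells in the chosen columns to NA, treating every cell as equally likely.
blank_cells <- function(df, cols, frac = 0.1) {
  m <- as.matrix(df[cols])               # numeric block of the target columns
  n_blank <- round(frac * length(m))     # number of cells to blank
  m[sample(length(m), n_blank)] <- NA    # linear indexing into the matrix
  df[cols] <- as.data.frame(m)           # write the columns back
  df
}

set.seed(1)  # arbitrary seed so the same cells are chosen on every run

# Toy data standing in for dat1 (values are made up for illustration)
toy <- data.frame(id = 1:20, O3 = runif(20), No2 = runif(20),
                  Temp = rnorm(20, mean = 30))

toy_na <- blank_cells(toy, c("O3", "No2", "Temp"))
sum(is.na(toy_na[c("O3", "No2", "Temp")]))  # 6 cells, i.e. 10% of 60
```

Calling `set.seed()` before `sample()` is what makes the result repeatable; without it a different set of cells is blanked on every run.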
#If you need to create NA in each column separately (a different random 10% of rows per column),
#sample row indices rather than matching on values, so that repeated values
#(e.g. 0.026 in O3) are not all set to NA together.
for(j in 3:7) dat1[sample(nrow(dat1),0.1*nrow(dat1)),j]<-NA
head(dat1)
# TDate TTime O3 No2 Temp Sun Wspeed Wdirect Hum Indicator
#1 19980101 2400 0.065 0.036 31.4 765 9.9 351 NA 1
#2 19980102 2400 0.053 0.025 31.8 624 7.7 351 NA 1
#3 19980103 2400 0.027 0.033 31.5 852 8.8 331 NA 2
#4 19980104 2400 NA NA 30.7 679 7.0 338 NA 2
#5 19980105 2400 0.019 0.016 28.1 376 9.6 354 NA 1
#6 19980106 2400 0.021 0.018 29.9 603 NA 356 NA 1
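To confirm how many values actually ended up missing in each column, `colSums(is.na(...))` gives a per-column count. The following is a minimal sketch on made-up data (the data frame `toy` and the seed are assumptions, not from the thread). It draws a separate set of row indices for each column, so a value that occurs several times in a column is not blanked everywhere at once.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Made-up stand-in for dat1; 0.026 is deliberately repeated in O3
toy <- data.frame(
  O3   = c(0.026, 0.026, 0.050, 0.030, 0.026, 0.040, 0.060, 0.070, 0.020, 0.010),
  Temp = c(31.4, 31.8, 31.5, 30.7, 28.1, 29.9, 31.2, 31.1, 32.5, 33.9)
)
n_blank <- round(0.1 * nrow(toy))  # 1 value per column for 10 rows

# Draw row indices independently for each column and blank those cells
for (nm in names(toy)) {
  toy[sample(nrow(toy), n_blank), nm] <- NA
}

colSums(is.na(toy))  # each column has exactly n_blank missing values
```

Running the same check on `dat1[,3:7]` shows whether each column lost the intended number of values.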
A.K.
----- Original Message -----
From: Eugenie <leemeanwei at hotmail.com>
To: r-help at r-project.org
Cc:
Sent: Wednesday, October 31, 2012 8:42 AM
Subject: Re: [R] HELP!! how to remove 10% of data randomly in R
tDate tTime O3 No2 Temp Sun Wspeed Wdirect Hum Indicator
1 19980101 2400 0.065 0.036 31.4 765 9.9 351 NA 1
2 19980102 2400 0.053 0.025 31.8 624 7.7 351 NA 1
3 19980103 2400 0.027 0.033 31.5 852 8.8 331 NA 2
4 19980104 2400 0.034 0.023 30.7 679 7.0 338 NA 2
5 19980105 2400 0.019 0.016 28.1 376 9.6 354 NA 1
6 19980106 2400 0.021 0.018 29.9 603 9.3 356 NA 1
7 19980107 2400 0.026 0.047 31.2 857 10.7 336 NA 1
8 19980108 2400 0.024 0.014 31.1 635 7.8 330 NA 1
9 19980109 2400 0.058 0.033 32.5 742 10.7 334 NA 1
10 19980110 2400 0.026 0.032 33.9 923 10.6 347 NA 2
11 19980111 2400 0.064 0.034 32.5 751 6.3 355 NA 2
12 19980112 2400 0.066 0.034 33.3 697 8.5 319 NA 1
13 19980113 2400 0.026 0.030 33.4 992 12.5 341 NA 1
14 19980114 2400 0.101 0.028 33.8 705 8.7 349 NA 1
15 19980115 2400 0.069 0.030 33.3 718 11.4 348 NA 1
16 19980116 2400 0.054 0.026 33.4 639 10.9 354 NA 1
17 19980117 2400 0.090 0.039 33.1 653 13.2 342 NA 2
18 19980118 2400 0.048 0.017 33.2 825 10.8 323 NA 2
19 19980119 2400 0.038 0.027 33.7 984 10.3 353 NA 1
20 19980120 2400 0.026 0.032 34.2 994 15.0 357 NA 1
21 19980121 2400 0.065 0.044 33.8 999 17.5 343 NA 1
22 19980122 2400 0.046 0.024 33.5 931 10.1 332 NA 1
23 19980123 2400 0.050 0.041 33.9 881 11.3 353 NA 1
24 19980124 2400 0.036 0.027 33.8 877 9.1 328 NA 2
25 19980125 2400 0.043 0.021 33.2 777 10.5 340 NA 2
26 19980126 2400 0.029 0.016 33.1 999 14.1 341 NA 1
27 19980127 2400 0.033 0.030 33.9 943 12.9 344 NA 1
28 19980128 2400 0.040 0.022 33.7 805 12.6 354 NA 1
29 19980129 2400 0.029 0.015 30.2 512 7.4 356 NA 1
30 19980130 2400 0.027 0.013 31.7 656 13.9 349 NA 1
Given data like this, how can I randomly remove data in the O3, No2, Sun, Temp and Wspeed
columns (i.e. create missing values in those rows and columns)?