[R] Filling missing data in a Panel
arun
smartpink111 at yahoo.com
Mon Feb 17 09:53:53 CET 2014
Hi,
Looks like one column name is missing. I am not sure about the output you wanted. Maybe this helps.
dat <- read.table(text="row.names bank_name date px_last Q_Y p_made q_made p_for
1 2 1 11/30/06 1.31 p406-q406 406 406 1
2 47 1 02/26/09 1.27 p109-q109 109 109 10
3 55 1 06/08/2009 1.40 p209-q209 209 209 11
4 68 1 12/01/2009 1.51 p409-q409 409 409 13
5 87 1 05/26/10 1.22 p210-q210 210 210 15
6 96 1 7/22/2010 1.25 p310-q310 310 310 16
7 221 2 11/14/06 1.30 p406-q406 406 406 1
8 16 2 02/13/07 1.27 p107-q107 107 107 2
9 31 2 5/15/2007 1.36 p207-q207 207 207 3
10 222 3 11/29/2007 1.50 p407-q407 407 407 5
11 1110 3 02/25/08 1.48 p108-q108 108 108 6
12 6 4 02/15/07 1.35 p107-q107 107 107 2
13 18 4 5/24/2007 1.39 p207-q207 207 207 3
14 292 4 08/21/07 1.39 p307-q307 307 307 4
15 38 4 11/29/2007 1.49 p407-q407 407 407 5
16 49 4 01/28/08 1.43 p108-q108 108 108 6
17 61 4 05/15/08 1.52 p208-q208 208 208 7
18 71 4 08/18/08 1.45 p308-q308 308 308 8
19 78 4 11/20/08 1.30 p408-q408 408 408 9
20 88 4 02/19/09 1.35 p109-q109 109 109 10
21 941 4 05/28/09 1.35 p209-q209 209 209 11",sep="",header=TRUE,stringsAsFactors=FALSE)
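A side note on why the call above reads cleanly even though the header has eight names and each data row has nine fields: when the header line has one fewer field than the data rows, read.table uses the first field of each row as the row names. A minimal sketch on made-up data (the object name `demo` is only for illustration):

```r
# Made-up data: the header has 2 names, each data row has 3 fields.
demo <- read.table(text="bank_name p_for
1 1 1
2 1 3
3 2 2", header=TRUE)
rownames(demo)   # the first field of each row became the row names
colnames(demo)   # only the two header names remain as columns
dim(demo)        # 3 rows, 2 columns
```

So in `dat` the leading 1..21 become row names, and the column called row.names actually holds the second field (2, 47, 55, ...).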
##Possible solution 1
tbl <- table(dat$bank_name)
dat2 <- data.frame(bank_name=as.numeric(rep(names(tbl),max(tbl)-tbl)),p_for=NA)
res1 <- merge(dat,dat2,all=TRUE)[colnames(dat)]
table(res1$bank_name)
#
# 1 2 3 4
#10 10 10 10
##Possible solution 2
vec1 <- with(dat,tapply(p_for,list(bank_name),FUN=max))
vec2 <- as.numeric(rep(names(vec1),each=max(vec1)))
dat2New <- data.frame(bank_name=vec2,p_for=rep(seq(max(vec1)),length(vec1)))
res2 <- merge(dat,dat2New,all=TRUE)[colnames(dat)]
table(res2$bank_name)
#
# 1 2 3 4
#16 16 16 16
#or
##Possible solution 3
#using 18 periods, as mentioned in the description
vec3 <- rep(unique(dat$bank_name),each=18)
dat3 <- data.frame(bank_name=vec3,p_for=rep(seq(18),length(unique(dat$bank_name))))
res3 <- merge(dat,dat3,all=TRUE)[colnames(dat)]
table(res3$bank_name)
# 1 2 3 4
#18 18 18 18
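The same idea as the 18-period version can probably also be written with expand.grid, which builds every bank_name/p_for combination in one step. This is only a sketch on a small made-up data frame (`toy`, `full` and `res` are illustrative names, and 18 is the panel length taken from the description):

```r
# Made-up panel with gaps: bank 1 has periods 1 and 3, bank 2 has period 2 only.
toy <- data.frame(bank_name=c(1,1,2), p_for=c(1,3,2), px_last=c(1.31,1.27,1.30))
# Every bank/period combination for periods 1..18.
full <- expand.grid(bank_name=unique(toy$bank_name), p_for=seq_len(18))
# Keep all rows of the full grid; periods with no observation get NA.
res <- merge(full, toy, by=c("bank_name","p_for"), all.x=TRUE)
res <- res[order(res$bank_name, res$p_for), ]
table(res$bank_name)       # one row per period for each bank
sum(!is.na(res$px_last))   # the observed values are kept
```

With the real data, `toy` would be replaced by `dat`; the result is then sorted by bank and period, which is usually what panel estimators expect.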
A.K.
On Monday, February 17, 2014 2:40 AM, Francesca Pancotto <francesca.pancotto at gmail.com> wrote:
Dear R contributors,
I have a problem with a dataset that I am finding hard to solve.
I have a panel of n subjects, identified in the table below by bank_name,
with observations for each subject numbered from 1 to 18 in the column p_for.
As you can see from p_for, some periods are absent from the panel altogether, and this creates problems for my estimation.
Do you know an efficient way to add rows of missing values to the panel so that each cross section (bank_name) has the same number of observations
in p_for, even though some of them are NA?
Thanks for any help you can provide,
Best,
Francesca
row.names bank_name date px_last Q_Y p_made p_for
1 2 1 11/30/06 1.31 p406-q406 406 406 1
2 47 1 02/26/09 1.27 p109-q109 109 109 10
3 55 1 06/08/2009 1.40 p209-q209 209 209 11
4 68 1 12/01/2009 1.51 p409-q409 409 409 13
5 87 1 05/26/10 1.22 p210-q210 210 210 15
6 96 1 7/22/2010 1.25 p310-q310 310 310 16
7 221 2 11/14/06 1.30 p406-q406 406 406 1
8 16 2 02/13/07 1.27 p107-q107 107 107 2
9 31 2 5/15/2007 1.36 p207-q207 207 207 3
10 222 3 11/29/2007 1.50 p407-q407 407 407 5
11 1110 3 02/25/08 1.48 p108-q108 108 108 6
12 6 4 02/15/07 1.35 p107-q107 107 107 2
13 18 4 5/24/2007 1.39 p207-q207 207 207 3
14 292 4 08/21/07 1.39 p307-q307 307 307 4
15 38 4 11/29/2007 1.49 p407-q407 407 407 5
16 49 4 01/28/08 1.43 p108-q108 108 108 6
17 61 4 05/15/08 1.52 p208-q208 208 208 7
18 71 4 08/18/08 1.45 p308-q308 308 308 8
19 78 4 11/20/08 1.30 p408-q408 408 408 9
20 88 4 02/19/09 1.35 p109-q109 109 109 10
21 941 4 05/28/09 1.35 p209-q209 209 209 11
----------------------------------
Francesca Pancotto
Università degli Studi di Modena e Reggio Emilia
Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
----------------------------------
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help