[R] reshaping column items into rows per unique ID
Eric Berger
ericjberger at gmail.com
Sun Feb 25 18:56:37 CET 2018
Hi Allaisone,
I took a slightly different approach but you might find this either as or
more useful than your approach, or at least a start on the path to a
solution you need.
df1 <-
data.frame(CustId=c(1,1,1,2,3,3,4,4,4),DietType=c("a","c","b","f","a","j","c","c","f"),
stringsAsFactors=FALSE)
custs <- unique(df1$CustId)
dtype <- unique(df1$DietType)
nc <- length(custs)
nd <- length(dtype)
df2 <- as.data.frame( matrix(rep(0,nc*(nd+1)),nrow=nc),
stringsAsFactors=FALSE)
colnames(df2) <- c("CustId",dtype[order(dtype)])
df2$CustId <- custs[ order(custs) ]
for ( i in 1:nrow(df1) ) {
iRow <- match(df1$CustId[i],df2$CustId)
iCol <- match(df1$DietType[i],colnames(df2))
df2[ iRow, iCol ] <- df2[ iRow, iCol] + 1
}
> df2
# CustId a b c f j
# 1 1 1 1 1 0 0
# 2 2 0 0 0 0 0
# 3 3 1 0 0 0 1
# 4 4 0 0 2 1 0
The dataframe df2 will have a column for the CustId and one column for each
unique diet type.
Each row is a unique customerId, and each entry contains the number of
times the given diet type occurred for that customer.
I hope that helps,
Eric
On Sun, Feb 25, 2018 at 7:08 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> I believe you need to spend time with an R tutorial or two: a data frame
> (presumably the "table" data structure you describe) can *not* contain
> "blanks" -- all columns must be the same length, which means NA's are
> filled in as needed.
>
> Also, 8e^5 * 7e^4 = 5.6e^10, which almost certainly will not fit into any
> local version of R (maybe it would in some server version -- others more
> knowledgeable should comment on this).
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <allaisone1 at hotmail.com>
> wrote:
>
> > Hi All
> >
> > I have a datafram which looks like this :
> >
> > CustomerID DietType
> > 1 a
> > 1 c
> > 1 b
> > 2 f
> > 2 a
> > 3 j
> > 4 c
> > 4 c
> > 4 f
> >
> > And I would like to reshape this so I can see the list of DietTypes per
> > customer in rows instead of columns like this :
> >
> > > MyDf
> > CustomerID DietType DietType DietType
> > 1 a c b
> > 2 f a
> > 3 j
> > 4 c c f
> >
> > I tried many times using melt(),spread (),and dcast () functions but was
> > not able to produce the desired table. The best attempt was by typing :
> >
> > # 1) Adding new column with unique values:
> > MyDf $newcol <- c (1:9)
> > #2) then :
> > NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
> >
> > This produces the desired table but with many NA values like this :
> >
> > CustomerID 1 2 3 4 5 6 7 8 9
> > 1 a c b NA NA NA NA NA NA
> > 2 NA NA NA f a NA NA NA NA
> > 3 NA NA NA NA NA j NA NA NA
> > 4 NA NA NA NA NA NA c c f
> >
> > As you see, the lette/s indicating DietType move to the right side each
> > time we move down leaving many NA values and as my original files is very
> > large, I expect that the final output would contain around 800,000
> columns
> > and 70,000 rows. This is why my code works with small data but does not
> > work with my large file because of memory issue even though I'm using
> large
> > PC.
> >
> > What changes I need to do with my code to produce the desired table where
> > the list of DietTypes are grouped in rows exactly like the second table
> > shown abover?
> >
> > Regards
> > Allaisnoe
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> > posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list