[R] Help deciding on data format for sales data (newbie)

jonathanbriggs jonathanbriggs at mac.com
Tue Jun 2 17:46:38 CEST 2009


Dear All

I am just beginning data mining and need some help working out the best way to
represent my data. I have searched here and online and not found any real help.
Imagine that I have a file of order (sales) data:

OrderNo  CustomerNo  ItemsInOrder
1        1           a,b,c
2        1           d
3        2           a,d

I can represent this as a data.frame, but then I need to parse ItemsInOrder,
which seems quite clumsy. Alternatively, I can try this sort of representation:

OrderNo  CustomerNo  a   b   c   d
1        1           1   1   1   NA
2        1           NA  NA  NA  1
3        2           1   NA  NA  1
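
For concreteness, here is a minimal base R sketch of how the first layout could be parsed and turned into the second. The object names (orders, items, all_items, wide, wide_df) are made up for illustration, and the sketch stores 0 rather than NA for "item not in order", which is an assumption and not part of the table above.

```r
# Toy data copied from the first table; ItemsInOrder is a comma-separated string.
orders <- data.frame(
  OrderNo      = c(1, 2, 3),
  CustomerNo   = c(1, 1, 2),
  ItemsInOrder = c("a,b,c", "d", "a,d"),
  stringsAsFactors = FALSE
)

# Parse each string into a character vector of items, one list element per order.
items <- strsplit(orders$ItemsInOrder, ",", fixed = TRUE)
names(items) <- orders$OrderNo

# Every distinct item seen in any order (these become the indicator columns).
all_items <- sort(unique(unlist(items)))

# One row per order, one 0/1 column per item.
wide <- t(sapply(items, function(x) as.integer(all_items %in% x)))
colnames(wide) <- all_items

# Put the order and customer identifiers back in front of the indicators.
wide_df <- cbind(orders[c("OrderNo", "CustomerNo")], wide)
print(wide_df)
```

The parsing step is a single strsplit() call, so the first layout may be less clumsy to work with than it looks.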

Are these really the only two choices, and how well does the second
representation scale? (I have 50,000 SKUs.)
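
To put the scaling question in numbers, here is a rough back-of-the-envelope sketch in R. Only the 50,000 SKUs figure comes from this post; the number of orders and the average number of items per order are hypothetical values chosen purely for illustration.

```r
n_skus              <- 50000    # from the question
n_orders            <- 100000   # hypothetical, not from the post
avg_items_per_order <- 3        # hypothetical, not from the post

# Dense layout: one cell per (order, SKU) pair.
# R stores integer and logical values in 4 bytes each.
dense_cells <- n_orders * n_skus
dense_gb    <- dense_cells * 4 / 1024^3

# Cells that would actually mark an item as present in an order.
filled_cells  <- n_orders * avg_items_per_order
fill_fraction <- filled_cells / dense_cells

cat("dense cells:  ", dense_cells, "\n")
cat("dense size GB:", round(dense_gb, 1), "\n")
cat("filled cells: ", filled_cells, "\n")
cat("fill fraction:", fill_fraction, "\n")
```

Under these assumed figures, almost every cell of the wide table would be empty, so the cost of the second representation is driven by the number of SKUs rather than by the amount of actual sales data.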

Can anyone point me to some worked examples that take such data and
manipulate it to look for association rules and clusters?

Thanks

Jonathan