[R] Loop to check for large dataset
PIKAL Petr
petr.pikal at precheza.cz
Mon Oct 10 12:44:25 CEST 2016
Hi
Given this example data, you can get the same answer with less typing and without loops.
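# Cross-tabulate week x product x store; combinations absent from the data get a count of 0
res <- xtabs(~ W + P + S, data = mydata)
# Array indices (W, P, S) of the zero cells, i.e. the missing combinations
res1 <- which(res == 0, arr.ind = TRUE)
head(res1)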
      W P S
10   10 1 1
11   11 1 1
82   82 1 1
100 100 1 1
117 117 1 1
148 148 1 1
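One caveat when reading this output: which(..., arr.ind = TRUE) returns positions along each table dimension, not the original values. Here they coincide only because W, P and S are coded 1, 2, 3, ... in the example. A minimal sketch that recovers the actual labels instead, assuming the same mydata as in Adrian's example quoted below (the name gaps is only illustrative):

```r
# Rebuild the example data from Adrian's message so this block runs on its own
mydata <- expand.grid(seq(19), seq(22), seq(157))
names(mydata) <- c("S", "P", "W")
set.seed(12345)
mydata <- mydata[-sample(seq(nrow(mydata)), nrow(mydata) - 63127), ]

# Count rows for every W x P x S combination; absent combinations get 0
res <- xtabs(~ W + P + S, data = mydata)

# as.data.frame() on a table gives one row per cell, with its labels and Freq
cells <- as.data.frame(res, stringsAsFactors = FALSE)
gaps <- cells[cells$Freq == 0, c("S", "P", "W")]

# The labels come back as character; convert them to integer for this example
gaps[] <- lapply(gaps, as.integer)
nrow(gaps)   # number of missing store/product/week combinations
```

With real data where store or product identifiers are not consecutive integers, this form should be safer than using the array indices directly.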
Cheers
Petr
From: dusa.adrian at gmail.com [mailto:dusa.adrian at gmail.com] On Behalf Of Adrian Dușa
Sent: Monday, October 10, 2016 12:26 PM
To: Christoph Puschmann <c.puschmann at student.unsw.edu.au>
Cc: r-help at r-project.org; PIKAL Petr <petr.pikal at precheza.cz>
Subject: Re: [R] Loop to check for large dataset
This is an example of what reproducible code looks like, assuming you have three columns in your dataset named S (store), P (product) and W (week), and also assuming they take integer values from 1 to 19, 1 to 22 and 1 to 157 respectively:
#########
mydata <- expand.grid(seq(19), seq(22), seq(157))
names(mydata) <- c("S", "P", "W")
# randomly delete 65626 - 63127 = 2499 rows
set.seed(12345) # make it replicable
mydata <- mydata[-sample(seq(nrow(mydata)), nrow(mydata) - 63127), ]
#########
Now the dataframe mydata contains exactly 63127 rows, just as in your case. The task is to find which weeks are missing, from which store and for which product.
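The row-count arithmetic only tells you how many combinations are missing if no store/product/week combination appears twice. A small sanity check along those lines, as a sketch on the example data (the names expected, n_dup and n_missing are just illustrative):

```r
# Same example data as above, rebuilt so this block runs on its own
mydata <- expand.grid(S = seq(19), P = seq(22), W = seq(157))
set.seed(12345)
mydata <- mydata[-sample(seq(nrow(mydata)), nrow(mydata) - 63127), ]

# Number of combinations on the complete grid: 19 * 22 * 157 = 65626
expected <- 19 * 22 * 157

# Rows that repeat an earlier store/product/week combination
n_dup <- sum(duplicated(mydata[, c("S", "P", "W")]))

# Missing combinations = full grid minus the distinct combinations present
n_missing <- expected - (nrow(mydata) - n_dup)
n_missing
```

In the example there are no duplicates by construction, but in a real dataset a nonzero n_dup would mean the number of rows alone understates how many weeks are missing.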
Below is one possible way to do that. Given that you have a small number of stores and products, I'll keep it simple and stupid by using for loops:
#########
result <- matrix(nrow = 0, ncol = 3)
for (i in seq(19)) {        # stores
    for (j in seq(22)) {    # products
        # weeks from 1 to 157 that are absent for store i and product j
        miss <- setdiff(seq(157), mydata$W[mydata$S == i & mydata$P == j])
        if (length(miss) > 0) {
            result <- rbind(result, cbind(S = i, P = j, W = miss))
        }
    }
}
# The result matrix contains 2499 rows, one for each missing store/product/week combination.
> head(result)
     S P   W
[1,] 1 1  10
[2,] 1 1  11
[3,] 1 1  82
[4,] 1 1 100
[5,] 1 1 117
[6,] 1 1 148
#########
In this example, for S(tore) number 1 and P(roduct) number 1, you are missing W(eek) 10, 11, 82 and so on.
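The loop above grows result with rbind(), which is fine for 19 x 22 groups. If the number of stores or products were much larger, a loop-free variant might be preferable. The following is only a sketch of one such alternative, not the method used above: it builds the complete grid and keeps the combinations whose key does not occur in the data (key, full and result2 are names I made up for illustration).

```r
# Same example data as above, rebuilt so this block runs on its own
mydata <- expand.grid(S = seq(19), P = seq(22), W = seq(157))
set.seed(12345)
mydata <- mydata[-sample(seq(nrow(mydata)), nrow(mydata) - 63127), ]

# Every store/product/week combination that should exist
full <- expand.grid(S = seq(19), P = seq(22), W = seq(157))

# One text key per row, e.g. "1-1-10" for store 1, product 1, week 10
key <- function(d) paste(d$S, d$P, d$W, sep = "-")

# Keep the grid rows whose key is absent from the data
result2 <- full[!key(full) %in% key(mydata), ]

# Order by store, then product, then week, like the loop output
result2 <- result2[order(result2$S, result2$P, result2$W), ]
rownames(result2) <- NULL
nrow(result2)   # number of missing combinations
```

Note that the particular weeks reported as missing depend on the random sample, so with a different R version the rows may differ from the output shown above even with the same seed; the number of missing rows stays the same.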
Hoping you can adapt this code to your particular example,
Adrian
On Sun, Oct 9, 2016 at 2:26 AM, Christoph Puschmann <c.puschmann at student.unsw.edu.au> wrote:
>
> Dear Adrian,
>
> Yes, it is a cyclical data set and theoretically it should repeat this interval up to row 61327. The data set itself is divided into 2 parts:
> 1. Product category (column 10)
> 2. Number of Stores Participating (column 01)
> Overall there are 22 different products, and for each of them 19 different stores participate. Theoretically, each store should have a week interval of 1 to 157 for each product category.
>
> The part I am struggling with is how to run a loop over the whole data set while checking whether all stores participated for 157 weeks across the different products.
>
> So far I came up with this:
>
> n = 61327 # Generate a matrix to check for values
> Control = matrix(
>     0,
>     nrow = n,
>     ncol = 1)
>
> s <- seq(from = 1, to = 157, by = 1)
> CW = matrix(
>     s,
>     nrow = 157,
>     ncol = 1
> )
>
> colnames(CW)[1] <- 's'
>
> CW = as.data.frame(CW)
>
> for (i in 1:nrow(FD)) { # Run through all the rows
>     for (j in 1:157) {
>         if (FD$WEEk[j] == CW$s[j]) {
>             Control[i] = 1 # corresponding control row = 1
>         } else {
>             Control[i] = 0 # corresponding control row = 0
>         }
>     }
> }
>
> I coded an MRE and attached a sample of my data set.
>
> MRE:
>
> #MRE
>
> dat <- data.frame(
>     Store = c(rep(8, times = 157), rep(12, times = 157)), # Store IDs (two stores, 157 weeks each)
>     WEEK = rep(seq(from = 1, to = 157, by = 1), times = 2)
> )
>
>
>
>
--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania