[R] create new matrix from user-defined function

arun smartpink111 at yahoo.com
Thu Jul 11 22:45:21 CEST 2013


Hi BNC,
No problem.
You could also use ?with() 

data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[D_MW_EEsDueTotal!=rowSums(cbind(B_MW_EEsDue1,C_MW_EEsDue2))]))
#  MW_EEsDue_ERRORS
#1             1882
#2             1884
#3             1885
A.K.



----- Original Message -----
From: "Crombie, Burnette N" <bcrombie at utk.edu>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Thursday, July 11, 2013 4:40 PM
Subject: RE: [R] create new matrix from user-defined function

You understood me perfectly, and I agree it is easier to index using numbers than names.  I'm just afraid that if my dataset gets too big I'll mess up which index numbers I'm supposed to be using.  "data.table()" looks very useful and a good way to approach the issue.  Thanks.  I really appreciate your (everyone's) help.  BNC
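
One way to sidestep the index bookkeeping is to wrap the check in a small function that takes column names as arguments. The sketch below is only an illustration: the function name find_sum_errors and its arguments are made up here and do not come from the thread or from any package. It assumes the dat3 data frame from the original post.

```r
# Hypothetical helper (name and arguments are illustrative only).
# Returns a 1-column matrix of ids for the rows where the `total`
# column differs from the row-wise sum of the `parts` columns.
find_sum_errors <- function(df, id, parts, total,
                            label = "MW_EEsDue_ERRORS") {
  cols <- c(id, parts, total)
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  bad <- rowSums(df[parts]) != df[[total]]
  matrix(df[[id]][bad], ncol = 1, dimnames = list(NULL, label))
}

# Example data copied from the original post
dat3 <- data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))

matErr <- find_sum_errors(dat3,
                          id = "A_CaseID",
                          parts = c("B_MW_EEsDue1", "C_MW_EEsDue2"),
                          total = "D_MW_EEsDueTotal")
matErr
```

Because the columns are passed by name, adding or reordering columns in a larger data frame should not change the result, and a misspelled name stops with an error instead of silently picking the wrong column.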

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com] 
Sent: Thursday, July 11, 2013 4:29 PM
To: Crombie, Burnette N
Cc: R help
Subject: Re: [R] create new matrix from user-defined function

Hi,
Not sure I understand you correctly.
I found it easier to index by number than to replace the numbers with lengthy column names.
You could do it similar to the one below.

matNew<-matrix(dat3[rowSums(dat3[c("B_MW_EEsDue1","C_MW_EEsDue2")])!=dat3["D_MW_EEsDueTotal"],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS"))

 matNew
#     MW_EEsDue_ERRORS
#[1,]             1882
#[2,]             1884
#[3,]             1885

If you have a very large dataset, you could also check ?data.table().


library(data.table)
dt3<- data.table(dat3)
dtNew<-subset(dt3[D_MW_EEsDueTotal!=B_MW_EEsDue1+C_MW_EEsDue2],select=1)
 dtNew
#   A_CaseID
#1:     1882
#2:     1884
#3:     1885


#Some speed comparisons:
set.seed(1254)
datTest<- data.frame(A=sample(1000:15000,1e7,replace=TRUE),B= sample(1:10,1e7,replace=TRUE),C=sample(5:15,1e7,replace=TRUE),D=sample(5:25,1e7,replace=TRUE))

system.time(res1<- data.frame(MW_EEsDue_ERRORS=datTest[datTest[[4]] != datTest[[2]]+datTest[[3]],][[1]]))
# user  system elapsed
#  2.256   0.000   2.145 

system.time(mat1<-matrix(datTest[rowSums(datTest[,2:3])!=datTest[,4],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS")))
 #  user  system elapsed
 # 0.756   0.088   0.849 

system.time(res2<- data.frame(MW_EEsDue_ERRORS=datTest[addmargins(as.matrix(datTest[,2:3]),2)[,3]!=datTest[,4],1]))
#   user  system elapsed
#115.740   0.000 105.778 

dtTest<- data.table(datTest)
system.time(res3<- subset(dtTest[D!=B+C],select=1))
 # user  system elapsed
 # 0.508   0.000   0.477 

identical(res1,res2)
#[1] TRUE
setnames(res3,"A","MW_EEsDue_ERRORS")
 identical(res1,as.data.frame(res3))
#[1] TRUE
A.K.




----- Original Message -----
From: bcrombie <bcrombie at utk.edu>
To: r-help at r-project.org
Cc: 
Sent: Thursday, July 11, 2013 3:54 PM
Subject: Re: [R] create new matrix from user-defined function

Dan and Arun, thank you very much for your replies.  They are both very helpful and I love to get different versions of an answer so I can learn more R code.  You both used indexing to refer to the columns needed in the function, but since my real data frame will be much larger I'm assuming I can replace the index numbers with the names of the columns in quotes instead?   I'll try this on my own if you're busy with other forum questions.  Thanks, again.
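
A minimal sketch of that substitution, assuming the dat3 data frame from the original post: the positions 4, 2, 3 and 1 used with [[ ]] are swapped for the quoted column names, and nothing else changes.

```r
# Example data copied from the original post
dat3 <- data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))

# Same logic as the index-based version, with [[ ]] taking quoted names
bad <- dat3[["D_MW_EEsDueTotal"]] !=
  dat3[["B_MW_EEsDue1"]] + dat3[["C_MW_EEsDue2"]]
errs <- data.frame(MW_EEsDue_ERRORS = dat3[["A_CaseID"]][bad])
errs
#   MW_EEsDue_ERRORS
# 1             1882
# 2             1884
# 3             1885
```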

From: Nordlund, Dan (DSHS/RDA) [via R] [mailto:ml-node+s789695n4671267h89 at n4.nabble.com]
Sent: Wednesday, July 10, 2013 5:46 PM
To: Crombie, Burnette N
Subject: Re: create new matrix from user-defined function

> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces at r-project.org] On Behalf Of bcrombie
> Sent: Wednesday, July 10, 2013 12:19 PM
> To: [hidden email]
> Subject: [R] create new matrix from user-defined function
>
> #Let's say I have the following data set:
>
> dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
>                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
>                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
>                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))
> dat3
> #   A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal
> # 1     1881            2            5                7
> # 2     1882            2            5                9
> # 3     1883            1            4                5
> # 4     1884            4            1                6
> # 5     1885            6            6              112
>
> # I want to:
> #CREATE A NEW 1-COLUMN MATRIX (of unknown #rows) LISTING ONLY "A"'s WHERE "D != B + C"
> #THIS COLUMN CAN BE LABELED "MW_EEsDue_ERRORS", and output for this example should be:
>
> # MW_EEsDue_ERRORS
> # 1 1882
> # 2 1884
> # 3 1885
>
> #What is the best way to do this?  Thanks for your time.  BNC
>
>

Here is one option, there are many others.  Only you can decide what is "best".

data.frame(MW_EEsDue_ERRORS=dat3[dat3[[4]] != dat3[[2]]+dat3[[3]],][[1]])
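
One caveat, assuming the real data may contain non-integer amounts (the example uses whole numbers only): an exact != comparison on doubles can flag rows that differ only by rounding error. A tolerance-based variant might look like the sketch below. The toy data frame datF and the tolerance 1e-8 are arbitrary choices made for illustration.

```r
# Toy data with fractional values (made up for illustration)
datF <- data.frame(A = c(1, 2, 3),
                   B = c(0.1, 0.5, 1.0),
                   C = c(0.2, 0.5, 1.0),
                   D = c(0.3, 1.0, 2.5))

# Exact comparison: row 1 is flagged because 0.1 + 0.2 is not exactly 0.3
flag_exact <- datF$A[datF$D != datF$B + datF$C]

# Tolerance-based comparison: only the genuine mismatch (row 3) remains
tol <- 1e-8
flag_tol <- datF$A[abs(datF$D - (datF$B + datF$C)) > tol]

flag_exact
flag_tol
```

For whole-number counts such as those in dat3, the exact comparison is fine as written.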


Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

