[R] create new matrix from user-defined function
arun
smartpink111 at yahoo.com
Thu Jul 11 22:28:32 CEST 2013
Hi,
Not sure I understand you correctly.
I found it easier to index using number than replace it by lengthy column names.
You could do it similar to the one below.
matNew<-matrix(dat3[rowSums(dat3[c("B_MW_EEsDue1","C_MW_EEsDue2")])!=dat3["D_MW_EEsDueTotal"],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS"))
matNew
# MW_EEsDue_ERRORS
#[1,] 1882
#[2,] 1884
#[3,] 1885
If you have very large dataset, you could also check ?data.table().
library(data.table)
dt3<- data.table(dat3)
dtNew<-subset(dt3[D_MW_EEsDueTotal!=B_MW_EEsDue1+C_MW_EEsDue2],select=1)
dtNew
# A_CaseID
#1: 1882
#2: 1884
#3: 1885
#Some speed comparisons:
set.seed(1254)
datTest<- data.frame(A=sample(1000:15000,1e7,replace=TRUE),B= sample(1:10,1e7,replace=TRUE),C=sample(5:15,1e7,replace=TRUE),D=sample(5:25,1e7,replace=TRUE))
system.time(res1<- data.frame(MW_EEsDue_ERRORS=datTest[datTest[[4]] != datTest[[2]]+datTest[[3]],][[1]]))
# user system elapsed
# 2.256 0.000 2.145
system.time(mat1<-matrix(datTest[rowSums(datTest[,2:3])!=datTest[,4],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS")))
# user system elapsed
# 0.756 0.088 0.849
system.time(res2<- data.frame(MW_EEsDue_ERRORS=datTest[addmargins(as.matrix(datTest[,2:3]),2)[,3]!=datTest[,4],1]))
# user system elapsed
#115.740 0.000 105.778
dtTest<- data.table(datTest)
system.time(res3<- subset(dtTest[D!=B+C],select=1))
# user system elapsed
# 0.508 0.000 0.477
identical(res1,res2)
#[1] TRUE
setnames(res3,"A","MW_EEsDue_ERRORS")
identical(res1,as.data.frame(res3))
#[1] TRUE
A.K.
----- Original Message -----
From: bcrombie <bcrombie at utk.edu>
To: r-help at r-project.org
Cc:
Sent: Thursday, July 11, 2013 3:54 PM
Subject: Re: [R] create new matrix from user-defined function
Dan and Arun, thank you very much for your replies. They are both very helpful and I love to get different versions of an answer so I can learn more R code. You both used indexing to refer to the columns needed in the function, but since my real data frame will be much larger I'm assuming I can replace the index numbers with the names of the columns in quotes instead? I'll try this on my own if you're busy with other forum questions. Thanks, again.
From: Nordlund, Dan (DSHS/RDA) [via R] [mailto:ml-node+s789695n4671267h89 at n4.nabble.com]
Sent: Wednesday, July 10, 2013 5:46 PM
To: Crombie, Burnette N
Subject: Re: create new matrix from user-defined function
> -----Original Message-----
> From: [hidden email]</user/SendEmail.jtp?type=node&node=4671267&i=0> [mailto:r-help-bounces at r-
> project.org<mailto:r-help-bounces at r-%20%0b%3e%20project.org>] On Behalf Of bcrombie
> Sent: Wednesday, July 10, 2013 12:19 PM
> To: [hidden email]</user/SendEmail.jtp?type=node&node=4671267&i=1>
> Subject: [R] create new matrix from user-defined function
>
> #Let's say I have the following data set:
>
> dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
> B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
> C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
> D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))
> dat3
> # A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal
> # 1 1881 2 5 7
> # 2 1882 2 5 9
> # 3 1883 1 4 5
> # 4 1884 4 1 6
> # 5 1885 6 6 112
>
> # I want to:
> #CREATE A NEW 1-COLUMN MATRIX (of unknown #rows) LISTING ONLY "A"'s
> WHERE "D
> != B + C"
> #THIS COLUMN CAN BE LABELED "MW_EEsDue_ERRORS", and output for this
> example
> should be:
>
> # MW_EEsDue_ERRORS
> # 1 1882
> # 2 1884
> # 3 1885
>
> #What is the best way to do this? Thanks for your time. BNC
>
>
Here is one option, there are many others. Only you can decide what is "best".
data.frame(MW_EEsDue_ERRORS=dat3[dat3[[4]] != dat3[[2]]+dat3[[3]],][[1]])
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
______________________________________________
[hidden email]</user/SendEmail.jtp?type=node&node=4671267&i=2> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
________________________________
If you reply to this email, your message will be added to the discussion below:
http://r.789695.n4.nabble.com/create-new-matrix-from-user-defined-function-tp4671250p4671267.html
To unsubscribe from create new matrix from user-defined function, click here<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4671250&code=YmNyb21iaWVAdXRrLmVkdXw0NjcxMjUwfC0xMzI5MzM0NzI3>.
NAML<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
--
View this message in context: http://r.789695.n4.nabble.com/create-new-matrix-from-user-defined-function-tp4671250p4671361.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list