[R] very long processing time

Jim Lemon drjimlemon at gmail.com
Wed May 11 03:07:21 CEST 2016


Hi Shashi,
First off, keep the thread on the list. Compare the two statements below:

Jim:  If this method is revealed to us, we may be able to help you.

Shashi: "if this method reveal to me i can help"

Regardless, I will attempt to help. This looks like number 2: inefficient
code.

You appear to be forming a very large vector bit by bit. This is _very_
inefficient. If you want to get the data frame "matrixdata" as a vector:

# this may work
fitness_1_data<-unlist(matrixdata)
# if not, try this
fitness_1_data<-as.vector(as.matrix(matrixdata))
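
To show what I mean by "bit by bit", here is a rough sketch. The size n and
the names grown, filled and vectorized are made up for illustration and are
not from your script. All three give the same values, but append() builds a
new vector on every call, so the first version slows down badly as the
vector gets long:

```r
n <- 10000  # illustrative size only, not taken from the original data

# grow the vector one element at a time, as the original script does
grown <- c()
for (i in 1:n) grown <- append(grown, i^2)

# allocate the full length once, then fill it in place
filled <- numeric(n)
for (i in 1:n) filled[i] <- i^2

# or drop the loop and let R work on the whole vector at once
vectorized <- (1:n)^2
```

Wrapping each version in system.time() should make the difference visible
once n is large.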

This is written to a file and the file is read and again reformatted into
vectors for processing. If you are able, try to create a _small_ data set
that will be processed in the same way as "matrixdata" (e.g. a 10x10 data
frame):

smalldata<-as.data.frame(matrix(sample(1:100,100),nrow=10))
names(smalldata)<-paste("Col",1:10,sep="")

This will allow you to try out your code without spending a day on each
run. For instance, you can probably substitute:

matrixdata2<-matrixdata[,-1]

for a lot of the code in the second half of your script.
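
As a sketch of where that could lead, and only that: I am assuming the
calculation part is the cosine between the first row and every other row,
and that only the first column (the document number) has to be dropped. Your
loop starts at column 3, so adjust the index if a second column must go as
well. The names m, dots, norms, cosine, result and outfile are my own:

```r
# small stand-in for "matrixdata"; column 1 plays the document number
set.seed(1)  # only so that the sketch is repeatable
smalldata <- as.data.frame(matrix(sample(1:100, 100), nrow = 10))
names(smalldata) <- paste("Col", 1:10, sep = "")

# drop the first column and work on a plain numeric matrix
m <- as.matrix(smalldata[, -1])

# dot product of row 1 with every row, in one step
dots <- as.vector(m %*% m[1, ])
# Euclidean length of every row
norms <- sqrt(rowSums(m^2))

# cosine of row 1 against each row; 0 where a row is all zeros
cosine <- ifelse(norms > 0 & norms[1] > 0, dots / (norms * norms[1]), 0)

# attach the result and write the whole table once, not row by row
result <- data.frame(smalldata, percentage = cosine)
outfile <- tempfile(fileext = ".csv")  # throwaway file for the sketch
write.csv(result, outfile, row.names = FALSE)
```

The first element of cosine is row 1 compared with itself, so drop it if you
only want the comparisons with the rows below. If this gives the same
numbers as your loop on the small data set, it may be worth trying on the
full file.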

Jim

On Wed, May 11, 2016 at 10:16 AM, SHASHI SETH <sethshashi at rediffmail.com>
wrote:

>
> Hi Jim,
>
> Thanks a lot.. I could not understand what do u mean by "if this method
> reveal to me i can help" I am
> giving full program again and putting comment at calculation part. When I
> execute it, I can see after
> every one minute 29 kb is written in the file. Pls see.
>
>
> fitness_1_data <- c();
> src="dtm_mydata.csv"
> matrixdata <- read.csv(src)
> #get no vector/column from file/matrix
> noofvec <- length(matrixdata)
>
> #set no of records/rows/document
> noofrecords <- length(matrixdata[,1])
> #set row index
> rindex<-1;
> #preapare header
> colindex<-1;
> colList <- colnames(matrixdata)
>
> combine<-"";
>
> vec_fitness_data<- c();
>
> while(colindex <= length(colList))
> {
> fitness_1_data <- append(fitness_1_data,colList[colindex])
>
> colindex<- colindex+1
> }
> #add two additional vector for percentage and cluster
> fitness_1_data <- append(fitness_1_data,"percentage")
> fitness_1_data <- append(fitness_1_data,"Cluster")
> #write.csv(matrix(fitness_1_data, nrow=1), file ="myfile.csv", row.names=FALSE)
> write.table(as.list(fitness_1_data), file ="Res_mydata_cycle1.csv",append = TRUE,
> row.names=FALSE, col.names=FALSE, sep=",")
>
> #end header record
>
> #while (rindex < 2) #fitness will apply for first record everytime (first
> #record will be compare with all below records)
>
> nestedloopindex <- 2
>
>
> while( nestedloopindex <= noofrecords )
> {
>
> #init of temperory variables
> sums1 <- 0;
> sums2 <- 0;
> sum <- 0;
>
> #set initial index of column 2 , coloumn one hold document no not
> #actual data
> colindex <- 3;
>
> # combine <-"";
>
> vec1 <- c();
> vec2 <- c();
>
> #add document number in vector
> vec1 <- append(vec1,matrixdata[rindex,1]);
> vec2 <- append(vec2,matrixdata[nestedloopindex,1]);
> vec1 <- append(vec1,matrixdata$ID[rindex]);
> vec2 <- append(vec2,matrixdata$ID[nestedloopindex]);
>
>
> baseSum <- 0;
>
> ############################ Calculation Part ############################
> while(colindex <= noofvec )
> {
>
> baseSum <- baseSum + matrixdata[rindex,colindex]
>
> vec1 <- append(vec1,matrixdata[rindex,colindex]);
> vec2 <- append(vec2,matrixdata[nestedloopindex,colindex]);
>
> sum = sum +
> matrixdata[rindex,colindex]*matrixdata[nestedloopindex,colindex]
>
> sums1 <- sums1 + matrixdata[rindex,colindex]^2;
>
> sums2 <- sums2 + matrixdata[nestedloopindex,colindex]^2;
>
> colindex <- colindex+1
> }
>
> if(sum > 0 && sums1 > 0 && sums2 > 0)
> {
> out <- sum / ((sqrt(sums1) * sqrt(sums2)))
> }else
> {
> out <-0
> }
> ############################ End Calculation ############################
> vec1 <- append(vec1,out);
> vec1 <-append(vec1, "1")
> vec2 <- append(vec2, out);
>
> if(nestedloopindex==2)
> {
> write.table(as.list(vec1), file ="Res_mydata_cycle1.csv",append =
> TRUE, row.names=FALSE, col.names=FALSE, sep=",")
> write.table(as.list(vec2), file ="Res_mydata_cycle1.csv",append =
> TRUE, row.names=FALSE, col.names=FALSE, sep=",")
> nestedloopindex<- nestedloopindex+1
> } else
> {
> write.table(as.list(vec2), file ="Res_mydata_cycle1.csv",append =
> TRUE, row.names=FALSE, col.names=FALSE, sep=",")
> nestedloopindex<- nestedloopindex+1
> }
> }
>
> Thanks,
> Shashi
>
>
>
>
> On Wed, 11 May 2016 03:49:19 +0530 Jim Lemon wrote
> >Hi Shashi,
> The assumption that anyone on the list apart from yourself knows what
> "some calculation" involves is incorrect. I suspect that "what is
> wrong" may be one of two things:
>
> 1) "some calculation" includes a very large number of operations,
> perhaps leading to "disk-thrashing" when your 16GB of memory is full
> of intermediate values. There is no software problem, buy more
> hardware.
>
> 2) "some calculation" is a very inefficient method of getting the
> result you want. If this method is revealed to us, we may be able to
> help you.
>
> Jim
>
> On Wed, May 11, 2016 at 2:24 AM, SHASHI SETH wrote:
> > Hi,
> >
> > I have implemented following program in R, that reads data from the
> > "dtm_mydata.csv". file size is 114,029 kB, saved document Term matrix.
> > Prog. performing some calculation and writing in a file. my computer
> > RAM is 16 GB. To execute this program its taking around 25 hours. can
> > any body help me what is wrong, why this much time is taken. Although
> > it is doing the job what is required
> >
> > fitness_1_data
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.