[R] trouble with looping for effect of sampling interval increase

Tue Aug 7 16:41:06 CEST 2012

My apologies, here is a sample dataset generator:

#Running sum Test Data
Coin <- c(-1,1)
flips <- sample(Coin, 1000, replace=TRUE)
Runningsum <- cumsum(flips)
#A deactivated plot
#plot (Runningsum)
Test <- cbind (Runningsum)
datasetORIGINAL  <- cbind (Runningsum)
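
If it helps with comparing results, here is a reproducible variant of the generator above. It is only a sketch: the seed value (42) is arbitrary and not part of my real workflow, so treat it as an assumption. With a fixed seed everyone gets the same random walk.

```r
# Reproducible variant of the test data (the seed value 42 is an arbitrary assumption)
set.seed(42)
Coin <- c(-1, 1)
flips <- sample(Coin, 1000, replace = TRUE)   # 1000 fair coin flips coded as -1 / +1
Runningsum <- cumsum(flips)                   # random walk built from the flips
datasetORIGINAL <- cbind(Runningsum)          # one-column matrix, as the code below expects
dim(datasetORIGINAL)                          # rows, then columns
```

The result is a one-column matrix with 1000 rows and no missing values, so na.omit() in the code below leaves it unchanged.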

________________________________________
From: Jean V Adams [jvadams at usgs.gov]
Sent: Monday, August 06, 2012 1:33 PM
To: White, William Patrick
Cc: r-help at r-project.org
Subject: Re: [R] trouble with looping for effect of sampling interval increase

You would make it much easier for R-help readers to solve your problem if you provided a small example data set with your code, so that we could reproduce your results and troubleshoot the issues.

Jean

Naidraug <white.232 at wright.edu> wrote on 08/05/2012 09:08:25 AM:
>
> I've looked everywhere and tinkered for three days now, so I figure asking
> might be good.
>
> Here's a general rundown of what I am trying to get my code to do. I am
> giving you the whole rundown because I need a solution that retains certain
> ways of doing things, since they give me the information I need.
>
> I want to examine the effect of increasing my sampling interval on my data.
> Example: what if, instead of sampling every hour, I sampled every two hours,
> or every three, etc., ad nauseam? How I want to do this is to take the data
> I have now and add an index to it that contains counters. Those counters
> will look something like 1,2,1,2,... for the first one and 1,2,3,1,2,3,...
> for the next one. I have a lot of them, say a thousand.
>
> Then, for each column in the index, my loops should start in the first
> column, run only the ones, store that, then run the twos, and store that in
> the same column of output in a different row. Then move to the next column:
> run the ones, store in the next column of output, run the twos, store in the
> next row of that column, run the threes, and so on until there are no more.
>
> I want to use this index for a number of reasons. The first is that after
> this I will be going back through and using a different method for
> sub-sampling but keeping all else the same, so all I have to do there is
> change the way I generate the index. The second is that it allows me to run
> many subsamples and see their range.
>
> The code I have made generates my index and does the heavy lifting
> correctly, as well as my averages and quartiles, but a look at the head() of
> my key output (IntervalBetas) shows that something has gone amiss. You have
> to look closely to catch it. The values generated for each row of output are
> identical. This should not be the case: row one of the first output column
> should be generated from all values indexed by a one in the first index
> column, whereas in column two there are different values indexed by the
> number one.
>
> I've checked about everything I can think of, done print() on my loop
> counters (those little i and j), and wiggled about everything. I am
> flummoxed. I think the bit that is messing up is in here:
> #Here is the loop for betas from sampling interval increase
>  c <- WHOLESIZE[2]-1
>  for (i in 1:c)
>  {
>  x <- length(unique(index[,i]))
>
>  for (j in 1:x)
>  {
>
>  data <- WHOLE [WHOLE[,x]==j,1]
>
> Here is the whole code as well, in case I am wrong about that being the
> problem area:
>
> #loop for making index
>
>
>  #clean dataset of empty cells
>  dataset <- na.omit (datasetORIGINAL)
>  #how messed up was the data?
>  holeyDATA <- datasetORIGINAL - dataset
>
>  D <- dim(dataset)
>
> #what is the smallest sample?
> tinysample <- 100
>
>
>
>
> #how long is the dataset?
>  datalength <- length (dataset)
>
>
>  #MD <- how many divisions
>
> MD <- datalength/tinysample
>
>  #clear things up for the index loop
>  WHOLE <- NULL
> index <- NULL
>  #do the index loop
>
>  for (a in 1:MD)
>  {
>  index <- cbind (index, rep (1:a, length = D[1]))
>  }
> index <- subset(index, select = -c(1) )
>
>  #merge dataset and index loop
>  WHOLE <- cbind (dataset, index)
>
>  WHOLESIZE <- dim (WHOLE)
>
> #Housekeeping before loops
> IntervalBetas <- NULL
>
>
> IntervalBetas <- c(NA,NA)
> IntervalBetas <- as.data.frame (IntervalBetas)
> IntervalLowerQ <- NULL
> IntervalUpperQ <- NULL
> IntervalMean <- NULL
> IntervalMedian <- NULL
>
> #Here is the loop for betas from sampling interval increase
>  c <- WHOLESIZE[2]-1
>  for (i in 1:c)
>  {
>  x <- length(unique(index[,i]))
>
>  for (j in 1:x)
>  {
>
>  data <- WHOLE [WHOLE[,x]==j,1]
>
>
>
>
>  #get power spectral density
>
>  PSDPLOT <- spectrum (data, detrend = TRUE, plot = FALSE)
>  frequency <- PSDPLOT$freq
>  PSD <- PSDPLOT$spec
>  #log transform the power spectral density
>  Logfrequency <- log(frequency)
>  LogPSD<- log(PSD)
>  #fit my line to the data
>  Line <- lm (LogPSD ~ Logfrequency)
>  #store the slope of the line
>  Betas <- rbind (Betas, -coef(Line)[2])
>
> #Get values on the curve shape
> BSkew <- skew (Betas)
> BMean <- mean (Betas)
> BMedian <- median (Betas)
> Q <- quantile (Betas)
>
>
> #store curve shape values
> IntervalLowerQ <- rbind (IntervalLowerQ , Q[2])
> IntervalUpperQ <- rbind (IntervalUpperQ , Q[4])
> IntervalSkew <- rbind (IntervalSkew , BSkew)
> IntervalMean <- rbind (IntervalMean , BMean)
> IntervalMedian <- rbind (IntervalMedian , BMedian)
>
> #Store the Betas
> #This is a pain
>
>
> BetaSave <- Betas
> no.r <- nrow(IntervalBetas)
> l.v <- length(BetaSave)
> difer <- no.r - l.v
> difers <- abs(difer)
> if (no.r < l.v){
> IntervalBetas <- rbind(IntervalBetas,rep(NA,difers))
> }
> else {
> (BetaSave <- rbind(BetaSave,rep(NA,difers)))
> }
>
> IntervalBetas <- cbind (IntervalBetas, BetaSave)
>
>
>  }
>
>  }
>
> #That ends the loop within a loop for how sampling interval
> #changes beta
> head (IntervalBetas)
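
For comparison, here is a minimal sketch of the same nested loop written so that the slopes for each index column are collected separately. It rests on an unconfirmed reading of the quoted code: Betas is grown with rbind() and never emptied, so every later output column would start with the values from the earlier ones, which would make the rows look identical. The helper beta_of, the object maxint, the column names, and the use of a list with lapply()/sapply() instead of the rbind()/cbind() padding are all assumptions made for illustration. The seed is arbitrary, and the skew, mean, and quartile summaries are left out.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
dataset <- cbind(Runningsum = cumsum(sample(c(-1, 1), 1000, replace = TRUE)))
n <- nrow(dataset)
tinysample <- 100                             # smallest subsample length
maxint <- floor(n / tinysample)               # largest sampling interval tried

# One index column per interval k = 2..maxint, with counters 1..k repeating
index <- sapply(2:maxint, function(k) rep(1:k, length.out = n))

# Hypothetical helper: negative log-log slope of the power spectral density
beta_of <- function(x) {
  psd <- spectrum(x, detrend = TRUE, plot = FALSE)
  unname(-coef(lm(log(psd$spec) ~ log(psd$freq)))[2])
}

# Each index column gets its own fresh vector of slopes, one per counter value
betas <- lapply(seq_len(ncol(index)), function(i) {
  sapply(sort(unique(index[, i])), function(j) beta_of(dataset[index[, i] == j, 1]))
})

# Pad with NA to a common length: one column per interval, one row per counter
IntervalBetas <- sapply(betas, function(b) c(b, rep(NA, maxint - length(b))))
colnames(IntervalBetas) <- paste0("every_", 2:maxint)
head(IntervalBetas)
```

With 1000 points and tinysample set to 100 this gives a 10 by 9 matrix. The first column holds two slopes followed by NA, and the last column holds ten. Switching to a different sub-sampling method should then only require changing how index is built.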