[R] [FORGED] Newbie Question on R versus Matlab/Octave versus C

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Mon Feb 4 03:09:57 CET 2019


Your code seems to be attempting to modify global variables from within 
functions... R purposely makes this hard to do. Don't fight it. Instead, 
use function arguments and produce function outputs with your functions.

Also, the ifelse function does not control flow of execution of code... it 
selects values between two vectors according to the state of the logical 
input vector. Note that all values in both possible input values must be 
computed when using ifelse before it can do its magic, so ifelse can be 
significantly slower than assigning into an indexed vector if a small 
fraction of the vector will be changing.

Below is some proof-of-concept code. It mostly modifies values in-place 
within the data frame rather than using ifelse.

You might want to read the Intro to R document available through the R 
console via:

RShowDoc("R-intro")

to look up numeric indexing and logical indexing syntax while reading 
through this.

#####################
makeNewWomen <- function( nWomen ) {
   data.frame( isAlive = rep_len( TRUE, nWomen )
             , isPregnant = rep_len( FALSE, nWomen )
             , nChildren = rep_len( 0L, nWomen )
             , age = rep_len( 0, nWomen )
             , dateOfPregnancy = rep_len( 0, nWomen )
             , endDateLastPregnancy = rep_len( 0.0, nWomen )
             )
}

updateWomen <- function( DF
                        , jd
                        , maxAge
                        , timeStep
                        , pregProb
                        , gestation
                        , minBirthAge
                        , maxBirthAge
                        ) {
   DF$isAlive[ maxAge <= DF$age ] <- FALSE
   fertileIdx <- with( DF, isAlive & !isPregnant & minBirthAge <= age & age <= maxBirthAge )
   conceiveIdx <- fertileIdx
   conceiveIdx[ conceiveIdx ] <- sample( c( FALSE, TRUE )
                                       , size = sum( fertileIdx )
                                       , replace = TRUE
                                       , prob = c( 1-pregProb, pregProb )
                                       )
   DF$isPregnant[ conceiveIdx ] <- TRUE
   DF$dateOfPregnancy[ conceiveIdx ] <- jd
   birthIdx <- with( DF, isAlive & isPregnant & ( dateOfPregnancy + gestation ) <= jd )
   femalechild <- sample( c( FALSE, TRUE )
                        , size = sum( birthIdx )  # random within birthing group
                        , replace = TRUE
                        , prob = c( 0.5, 0.5 )
                        )
   DF$isPregnant[ birthIdx ] <- FALSE # pregnancy over
   birthIdx[ birthIdx ] <- femalechild # track births further only where female
   # DF$age <- ifelse( DF$isAlive
   #                 , DF$age + timeStep
   #                 , DF$age
   #                 )
   DF$age[ DF$isAlive ] <- DF$age[ DF$isAlive ] + timeStep
   numNotAlive <- sum( !DF$isAlive )
   numBirths <- sum( birthIdx )
   if ( 0 < numBirths ) { # if needed, start female babies in existing or new rows
     if ( 0 < numNotAlive ) {
       reuseidx <- which( !DF$isAlive )
       if ( numBirths <= numNotAlive ) {
         # can fit all new births into existing DF
         reuseidx <- reuseidx[ seq.int( numBirths ) ]
         DF[ reuseidx, ] <- makeNewWomen( numBirths )
       } else {
         DF[ reuseidx, ] <- makeNewWomen( length( reuseidx ) )
         DF <- rbind( DF
                    , makeNewWomen( numBirths - length( reuseidx ) )
                    )
       }
     } else { # no empty rows in DF
       DF <- rbind( DF
                  , makeNewWomen( numBirths )
                  )
     }
   }
   DF  # return the updated data frame to the caller
}

calculatePopulation <- function( nWomen
                                , maxDate
                                , dpy
                                , pregProb
                                , maxAge
                                , timeStep
                                , gestation
                                , minBirthAge
                                , maxBirthAge
                                , prealloc
                                ) {
   jd <- 0
   nextSampleJd <- jd + dpy
   numSamples <- maxDate %/% dpy
   result <- data.frame( jd = rep( NA, numSamples )
                       , NAlive = rep( NA, numSamples )
                       , NPreg = rep( NA, numSamples )
                       , NNotAlive = rep( NA, numSamples )
                       )
   i <- 1L
   DF <- makeNewWomen( prealloc )
   DF$isAlive <- seq.int( prealloc ) <= nWomen # leave most entries "dead"
   while( jd < maxDate ) {
     DF <- updateWomen( DF
                      , jd
                      , maxAge
                      , timeStep
                      , pregProb
                      , gestation
                      , minBirthAge
                      , maxBirthAge
                      )
     if ( nextSampleJd <= jd ) {
       result$jd[ i ] <- jd
       result$NAlive[ i ] <- sum( DF$isAlive )
       result$NPreg[ i ] <- sum( DF$isPregnant )
       result$NNotAlive <- sum( !DF$isAlive )
       nextSampleJd <- nextSampleJd + dpy
       i <- i + 1L
     }
     # Do various other things
     jd <- jd + timeStep
   }
   result
}

nWomen <- 5
numberOfYears <- 30
maxDate <- 300 * 365
dpy <- 365
pregProb <- 0.01
maxAge <- 50 * 365
minBirthAge <- 18 * 365
maxBirthAge <- 45 * 365
timeStep <- 30
gestation <- 30 * 9
prealloc <- 10000
set.seed(42)
simresult <- calculatePopulation( nWomen
                                 , maxDate
                                 , dpy
                                 , pregProb
                                 , maxAge
                                 , timeStep
                                 , gestation
                                 , minBirthAge
                                 , maxBirthAge
                                 , prealloc
                                 )

plot( simresult$jd/365, simresult$NAlive )
plot( simresult$jd/365, simresult$NNotAlive )
plot( simresult$jd/365, simresult$NPreg )
#####################

On Sat, 2 Feb 2019, Alan Feuerbacher wrote:

> On 1/29/2019 11:50 PM, Jeff Newmiller wrote:
>> On Tue, 29 Jan 2019, Alan Feuerbacher wrote:
>> 
>>> After my failed attempt at using Octave, I realized that most likely the 
>>> main contributing factor was that I was not able to figure out an 
>>> efficient data structure to model one person. But C lent itself perfectly 
>>> to my idea of how to go about programming my simulation. So here's a 
>>> simplified pseudocode sort of example of what I did:
>> 
>> Don't model one person... model an array of people.
>> 
>>> To model a single reproducing woman I used this C construct:
>>> 
>>> typedef struct woman {
>>>  int isAlive;
>>>  int isPregnant;
>>>  double age;
>>>  . . .
>>> } WOMAN;
>> 
>> # e.g.
>> Nwomen <- 100
>> women <- data.frame( isAlive = rep( TRUE, Nwomen )
>>                     , isPregnant = rep( FALSE, Nwomen )
>>                     , age = rep( 20, Nwomen )
>>                     )
>> 
>>> Then I allocated memory for a big array of these things, using the C 
>>> malloc() function, which gave me the equivalent of this statement:
>>> 
>>> WOMAN women[NWOMEN];  /* An array of NWOMEN woman-structs */
>>> 
>>> After some initialization I set up two loops:
>>> 
>>> for( j=0; j<numberOfYears; j++) {
>>>  for(i=1; i< numberOfWomen; i++) {
>>>    updateWomen();
>>>  }
>>> }
>> 
>> for ( j in seq.int( numberOfYears ) {
>>    # let vectorized data storage automatically handle the other for loop
>>    women <- updateWomen( women )
>> }
>> 
>>> The function updateWomen() figures out things like whether the woman 
>>> becomes pregnant or gives birth on a given day, dies, etc.
>> 
>> You can use your "fixed size" allocation strategy with flags indicating 
>> whether specific rows are in use, or you can only work with valid rows and 
>> add rows as needed for children... best to compute a logical vector that 
>> identifies all of the birthing mothers as a subset of the data frame, and 
>> build a set of children rows using the birthing mothers data frame as 
>> input, and then rbind the new rows to the updated women dataframe as 
>> appropriate. The most clear approach for individual decision calculations 
>> is the use of the vectorized "ifelse" function, though under certain 
>> circumstances putting an indexed subset on the left side of an assignment 
>> can modify memory "in place" (the functional-programming restriction 
>> against this is probably a foreign idea to a dyed-in-the-wool C programmer, 
>> but R usually prevents you from modifying the variable that was input to a 
>> function, automatically making a local copy of the input as needed in order 
>> to prevent such backwash into the caller's context).
>
> Hi Jeff,
>
> I'm well along in implementing your suggestions, but I don't understand the 
> last paragraph. Here is part of the experimenting I've done so far:
>
> *=======*=======*=======*=======*=======*=======*
> updatePerson <- function() {
>  ifelse( women$isAlive,
>    {
> # Check whether to kill off this person, if she's pregnant whether
> # to give birth, whether to make her pregnant again.
>      women$age = women$age + timeStep
> # Check if the person has reached maxAge
>    }
>  )
> }
>
> calculatePopulation <- function() {
>  lastDate = 0
>  jd = 0
>  while( jd < maxDate ) {
>    for( i in seq_len( nWomen ) ) {
>      updatePerson();
>    }
>    todaysDateInt = floor(jd/dpy)
>    NAlive[todaysDateInt] = nWomen - nDead
> # Do various other things
>    todaysDate = todaysDate + timeStep
>    jd = jd + timeStep
>  }
> }
>
> nWomen <- 5
> numberOfYears <- 30
> women <- data.frame( isAlive = rep_len( TRUE, nWomen )
>                   , isPregnant = rep_len( FALSE, nWomen )
>                   , nChildren = rep_len( 0L, nWomen )
>                   , ageInt = rep_len( 0L, nWomen )
>                   , age = rep_len( 0, nWomen )
>                   , dateOfPregnancy = rep_len( 0, nWomen )
>                   , endDateLastPregnancy = rep_len( 0.0, nWomen )
>                   , minBirthAge = rep_len( 0, nWomen )
>                   , maxBirthAge = rep_len( 0, nWomen )
>                   )
>
> # . . .
>
>  calculatePopulation()
>
> *=======*=======*=======*=======*=======*=======*
>
> The above code (in its complete form) executes without errors. I don't 
> understand at least two things:
>
> In the updatePerson function, in the ifelse statement, how do I change the 
> appropriate values in the women dataframe?
>
> I don't understand most of your last paragraph at all.
>
> Thanks so much for your help in learning R!
>
> Alan
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------


More information about the R-help mailing list