[R] If Loop I Think

William Michels wjm1 @end|ng |rom c@@@co|umb|@@edu
Thu Oct 24 21:44:23 CEST 2019


Hi Phillip,

Jim and David and Petr all wrote you good code, but you have major
problems in data formatting. Your data uses spaces both as a column
separator and also to denote "blank fields". Because of problems with
your input data structure, it's doubtful whether the good code you've
received will result in the correct baseball answer.

The Arizona Diamondbacks data you posted shows runner positions for
about seven outs of a game (about 1-and-1/6 innings)--I say "about"
because there may be subsequent rows with the same number of outs
as listed in row 14. However, rows 10/11 have two blank spaces between
the number-of-outs and a runner_ID (suggesting one "blank field" to
the left of the first runner_ID), while row 12 has three blank spaces
between the number-of-outs and the first runner_ID (suggesting two
"blank fields" to the left of the first runner_ID).

Since bases are loaded in row 9 and no outs are recorded between rows
9 and 10, the game situation suggests that two runners score between
rows 9 and 10 (polla001 and perad001), with the remaining baserunners
ending up on second and third base, not first and second base (best
guess: batter lambj001 hits a double, winds up on second base, and
gets two RBIs). Similarly, between rows 11 and 12, goldp001 is removed
as a baserunner and an out is recorded, but no new baserunners
appear. This game situation suggests both runners advancing (e.g. by a
sacrifice fly) with goldp001 scoring and the remaining baserunner
(lambj001) ending up on third base, not second base or first base.

Now if you run the code posted earlier using read.table(), in all
cases you will find blank fields removed between the "outs" column and
the first baserunner listed, so every row of your data with
runners-on-base will have a runner on first-base. Intuitively, you
know this must be wrong (think doubles and triples). The mechanics of
read.table() are such that the field separator character ("sep"
parameter) defaults to 'white space', that is to say, "ONE OR MORE
spaces, tabs, newlines or carriage returns" (capitalization mine). So
multiple white space characters in your file are read as a single
"field separator" separating two adjacent columns.
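
Here is a small sketch of that behaviour, using three of the rows you posted. The header names are copied from Jim's example; the object names txt and df are placeholders made up for illustration.

```r
# Three rows from the posted data; the last two contain extra spaces
# that were meant to mark empty bases.
txt <- c("Outs RunnerFirst RunnerSecond RunnerThird",
         "0 goldp001 polla001 perad001",
         "0  lambj001 goldp001",
         "2   lambj001")

# The default sep = "" treats any run of white space as ONE separator,
# so the leading blank fields vanish and every runner shifts left.
df <- read.table(text = txt, header = TRUE, fill = TRUE,
                 stringsAsFactors = FALSE)

# lambj001 lands in the RunnerFirst column for the last two rows.
df$RunnerFirst
```

So the information carried by the extra spaces is lost before any of the if/ifelse logic even runs.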

What you really need to do is export your data in a format that R can
easily understand. There's a possibility that posting your code in
HTML to the R-Help mailing list may have corrupted your data (e.g.
removing tabs and inserting spaces instead), but no matter. You need
to set up a workflow so this **cannot** happen, i.e. start exporting
from a spreadsheet program in ".csv" format and start importing into R
using R's read.csv() function instead. Colleagues have recommended the
book "Beyond Spreadsheets with R" by Dr. Jonathan Carroll to me as a
good introductory text for tackling these issues.
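
As a rough illustration of why comma-separated input is safer, here are a few of your rows retyped by hand as CSV, with the runners placed on the bases I guessed at above. That placement is my assumption, not something in your file, and csv_lines, games and occupied are made-up names.

```r
# Hand-typed CSV version of four rows; an empty base is simply an
# empty field between two commas, so no position is ambiguous.
csv_lines <- c("Outs,RunnerFirst,RunnerSecond,RunnerThird",
               "0,goldp001,polla001,perad001",
               "0,,lambj001,goldp001",
               "2,,,lambj001",
               "0,,,")

games <- read.csv(text = csv_lines, stringsAsFactors = FALSE)

# Empty fields are read as "" here because every runner column also
# contains at least one ID, so the columns are character columns.
# Comparing against "" and adding 0 gives a 1/0 matrix of occupied bases.
occupied <- (games[, -1] != "") + 0
occupied
```

With this layout the second row keeps first base empty instead of sliding lambj001 over to it.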

Finally (if you've read this far), the truth is if you work at it a
little bit, you can get the data you posted into R in a reasonable
format using lists (although starting from a ".csv" file may be
conceptually easier for you). Lists are very useful when you have
multiple vectors of different lengths. See the code below (note--I
dropped your first "Row#" column):

> zz <- textConnection("ari18.test3.raw", "w")
> writeLines(con=zz, c("0
+ 1
+ 1
+ 1 arenn001
+ 2 arenn001
+ 0
+ 0 perad001
+ 0 polla001 perad001
+ 0 goldp001 polla001 perad001
+ 0  lambj001 goldp001
+ 1  lambj001 goldp001
+ 2   lambj001
+ 0
+ 1       "))
> close(zz)
> ari18.test3.raw
 [1] "0       "                         "1       "
 [3] "1       "                         "1 arenn001      "
 [5] "2 arenn001      "                 "0       "
 [7] "0 perad001      "                 "0 polla001 perad001     "
 [9] "0 goldp001 polla001 perad001    " "0  lambj001 goldp001    "
[11] "1  lambj001 goldp001    "         "2   lambj001    "
[13] "0       "                         "1       "
> aa <- strsplit(trimws(ari18.test3.raw), split=" ")
> bb <- t(sapply(aa, FUN=function(x) {c(x, rep(NA, length.out=4-length(x)))} ))
> cc <- t(apply(bb[,-1], 1, FUN=function(x) {ifelse(test=nchar(x), yes=1, no=0)} ))
> bb
      [,1] [,2]       [,3]       [,4]
 [1,] "0"  NA         NA         NA
 [2,] "1"  NA         NA         NA
 [3,] "1"  NA         NA         NA
 [4,] "1"  "arenn001" NA         NA
 [5,] "2"  "arenn001" NA         NA
 [6,] "0"  NA         NA         NA
 [7,] "0"  "perad001" NA         NA
 [8,] "0"  "polla001" "perad001" NA
 [9,] "0"  "goldp001" "polla001" "perad001"
[10,] "0"  ""         "lambj001" "goldp001"
[11,] "1"  ""         "lambj001" "goldp001"
[12,] "2"  ""         ""         "lambj001"
[13,] "0"  NA         NA         NA
[14,] "1"  NA         NA         NA
> cc
      [,1] [,2] [,3]
 [1,]   NA   NA   NA
 [2,]   NA   NA   NA
 [3,]   NA   NA   NA
 [4,]    1   NA   NA
 [5,]    1   NA   NA
 [6,]   NA   NA   NA
 [7,]    1   NA   NA
 [8,]    1    1   NA
 [9,]    1    1    1
[10,]    0    1    1
[11,]    0    1    1
[12,]    0    0    1
[13,]   NA   NA   NA
[14,]   NA   NA   NA
>
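
If you would rather have zeros than NA for the rows with nobody on base, a variation along these lines should do it. This is only a sketch on five of the rows; raw, fields, padded and flags are placeholder names, and it assumes each string is an outs count followed by base fields separated by single spaces.

```r
# Five of the posted rows; doubled or tripled spaces mark empty bases.
raw <- c("0",
         "1 arenn001",
         "0 goldp001 polla001 perad001",
         "0  lambj001 goldp001",
         "2   lambj001")

# Split on single spaces so that consecutive spaces yield "" fields.
fields <- strsplit(trimws(raw), split = " ")

# Pad every row to four fields (outs plus three bases) with NA.
padded <- t(sapply(fields, function(x) c(x, rep(NA, 4 - length(x)))))

# A base is occupied only if its field is neither NA nor "".
runners <- padded[, -1]
flags <- (!is.na(runners) & runners != "") + 0
colnames(flags) <- c("R1", "R2", "R3")
flags
```

The only change from the approach above is that missing and empty fields are both mapped to 0, which is what the R1/R2/R3 columns in your original question asked for.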

HTH, Bill.

W. Michels, Ph.D.





On Wed, Oct 23, 2019 at 12:40 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:
>
> Hi
>
> ***do not think in if or if loops in R***.
>
> to elaborate Jim's solution further
>
> With simple function based on logical expression
> fff <- function(x) (x!="")+0
>
> you could use apply
>
> t(apply(phdf[,3:5], 1, fff))
>
> and add results to your data frame columns
> phdf[, 6:8] <- t(apply(phdf[,3:5], 1, fff))
>
> Regarding some tutorial
>
> Basic stuff is in R-intro, there is excellent documentation to each function.
>
> And as R users pool is huge, you could simply ask Google
> e.g.
> r change values based on condition
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Jim Lemon
> > Sent: Wednesday, October 23, 2019 12:26 AM
> > To: Phillip Heinrich <herd_dog using cox.net>
> > Cc: r-help <R-help using r-project.org>
> > Subject: Re: [R] If Loop I Think
> >
> > Hi Philip,
> > Try this:
> >
> > phdf<-read.table(
> > text="Row Outs RunnerFirst RunnerSecond RunnerThird R1 R2 R3
> > 1 0
> > 2 1
> > 3 1
> > 4 1 arenn001
> > 5 2 arenn001
> > 6 0
> > 7 0 perad001
> > 8 0 polla001 perad001
> > 9 0 goldp001 polla001 perad001
> > 10 0  lambj001 goldp001
> > 11 1  lambj001 goldp001
> > 12 2   lambj001
> > 13 0
> > 14 1       ",
> > header=TRUE,stringsAsFactors=FALSE,fill=TRUE)
> > phdf$R1<-ifelse(nchar(phdf$RunnerFirst) > 0,1,0)
> > phdf$R2<-ifelse(nchar(phdf$RunnerSecond) > 0,1,0)
> > phdf$R3<-ifelse(nchar(phdf$RunnerThird) > 0,1,0)
> >
> > Jim
> >
> > On Wed, Oct 23, 2019 at 7:54 AM Phillip Heinrich <herd_dog using cox.net>
> > wrote:
> > >
> > >       Row Outs RunnerFirst RunnerSecond RunnerThird R1 R2 R3
> > >       1 0
> > >       2 1
> > >       3 1
> > >       4 1 arenn001
> > >       5 2 arenn001
> > >       6 0
> > >       7 0 perad001
> > >       8 0 polla001 perad001
> > >       9 0 goldp001 polla001 perad001
> > >       10 0  lambj001 goldp001
> > >       11 1  lambj001 goldp001
> > >       12 2   lambj001
> > >       13 0
> > >       14 1
> > >
> > >
> > >
> > > With the above data, Arizona Diamondbacks baseball, I’m trying to put
> > zeros into the R1 column if the RunnerFirst column is blank and a one if the
> > column has a coded entry such as rows 4,5,7,8,& 9.  Similarly I want zeros in
> > R2 and R3 if RunnerSecond and RunnerThird respectively are blank and ones
> > if there is an entry.
> > >
> > > I’ve tried everything I know how to do such as “If Loops”, “If-Then loops”,
> > “apply”, “sapply”, etc.  I wrote the function below and it ran without errors but I
> > have no idea what to do with it to accomplish my goal:
> > >
> > > R1 <- function(x) {
> > >   if (ari18.test3$RunnerFirst == " "){
> > >        ari18.test3$R1 <- 0
> > >        return(R1)
> > >          }else{
> > >            R1 <- ari18.test3$R1 <- 1
> > >            return(R1)
> > >          }
> > >    }
> > >
> > > The name of the data frame is ari18.test3
> > >
> > > On a more philosophical note, data handling in R seems to be made up of
> > thousands of details with no over-riding principles.  I’ve read two books on R
> > and a number of tutorials and watched several videos but I don’t seem to be
> > making any progress.  Can anyone suggest videos, or tutorials, or books that
> > might help?  Database stuff has never been my strong point but I’m
> > determined to learn.
> > >
> > > Thanks,
> > > Philip Heinrich
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >


