[R] subset data using a vector

DIGHE, NILESH [AG/2362] nilesh.dighe at monsanto.com
Mon Nov 23 17:05:32 CET 2015


Dear R users,
I would like to subset my data using a vector derived from the variable "ranges".  For a given plotid, this vector holds the current range (ranges), the preceding range (ranges - 1), and the following range (ranges + 1).  If the preceding or following range falls outside the levels of ranges present in the data set, I would like to drop it and keep only the ranges that are available.  The variable "rangestouse" lists the desired ranges for each plotid.  After subsetting the data set to these ranges, I would like to extract the yield of the checks in those ranges and adjust the yield of a given plotid by dividing it by the check average for those ranges.
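To make the window described above concrete, here is a minimal sketch of how such a vector could be built from "ranges" alone. The toy values and the names `available` and `window` are assumptions for illustration only; they are not part of the data set below.

```r
# Toy input: the range each plot sits in (illustrative values only).
ranges <- c(1L, 1L, 2L, 3L, 4L, 4L)

# Ranges that actually occur in the data.
available <- sort(unique(ranges))

# For each plot, take the preceding, current and following range,
# then keep only those that exist in the data.
window <- lapply(ranges, function(r) intersect((r - 1L):(r + 1L), available))

# Collapse each window to a comma-separated label, as in "rangestouse".
rangestouse <- vapply(window, paste, character(1), collapse = ",")
rangestouse
```

With four ranges, the first and last windows lose their out-of-bounds neighbour, which gives the same four labels that appear as levels of rangestouse in the data set ("1,2", "1,2,3", "2,3,4", "3,4").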

I have created this function (fun1) but when I run it, I get the following error:

Error in m1[[i]] : subscript out of bounds

Any help will be highly appreciated!
Thanks, Nilesh

Dataset:
dput(mydata)
structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3",
"4"), class = "factor"), cols = structure(c(1L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"),
    plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L,
    290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L,
    374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L,
    379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L,
    399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
    480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L,
    477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.9,
    4.6, 5, 4.4, 5.1, 4.3, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5,
    5.6, 4.7, 5.4, 4.8, 4.6, 3.9, 4.2, 4.4, 5.3, 5.5, 5.8, 4.6,
    5.8, 4.8, 5.3, 5.5, 5.6, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5,
    5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.5, 4.5, 5.1, 4.9,
    5.2, 4.6, 4.8, 5.4, 5.9, 4.9, 5.8, 5.3, 4.8, 4.7, 5.2, 5.8
    ), linecode = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("check",
    "variety"), class = "factor"), ranges = c(1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
    ), rangestouse = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1,2",
    "1,2,3", "2,3,4", "3,4"), class = "factor")), .Names = c("rows",
"cols", "plotid", "yield", "linecode", "ranges", "rangestouse"
), class = "data.frame", row.names = c(NA, -64L))

Function:

fun1 <- function (dataset, plot.id, ranges2use, control)
{
    m1 <- strsplit(as.character(dataset$ranges2use), ",")
    dat1 <- data.frame()
    m2 <- c()
    row_check_mean <- c()
    row_check_adj_yield <- c()
    x <- length(plot.id)
    for (i in (1:x)) {
        m2[i] <- m1[[i]]
        dat1 <- dataset[dataset$ranges %in% m2[i], ]
        row_check_mean[i] <- tapply(dat1$trait, dat1$control,
            mean, na.rm = TRUE)[1]
        row_check_adj_yield[i] <- ifelse(control[i] == "variety",
            trait[i]/dataset$row_check_mean[i], trait[i]/trait[i])
    }
    data.frame(dataset, row_check_adj_yield)
}
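
For comparison, a possible rewrite is sketched below. It is only one reading of the intent described at the top of this post, not a tested solution for the full data set. The name `fun2`, its string arguments and the toy data frame are hypothetical. It assumes that checks should get an adjusted value of 1 (as `trait[i]/trait[i]` implies) and that the check mean is taken over all check plots inside a row's window. Columns are looked up by name with `[[`, since `trait` and `dat1$control` are not defined anywhere in fun1.

```r
# Hypothetical rewrite of fun1; column names are passed as strings so the
# function depends only on its arguments.
fun2 <- function(dataset, trait = "yield", ranges = "ranges",
                 ranges2use = "rangestouse", control = "linecode",
                 check = "check") {
    # One character vector of range labels per row, e.g. c("1", "2").
    windows <- strsplit(as.character(dataset[[ranges2use]]), ",")
    is_check <- dataset[[control]] == check

    # Mean of the check plots that fall inside each row's window.
    check_mean <- vapply(windows, function(w) {
        in_window <- as.character(dataset[[ranges]]) %in% w
        mean(dataset[[trait]][in_window & is_check], na.rm = TRUE)
    }, numeric(1))

    # Varieties are divided by the check mean; checks are set to 1.
    adj <- ifelse(is_check, 1, dataset[[trait]] / check_mean)
    data.frame(dataset, row_check_mean = check_mean,
               row_check_adj_yield = adj)
}

# Toy data (illustrative values only, not the data set from this post).
toy <- data.frame(
    plotid      = 1:6,
    yield       = c(5, 4, 6, 3, 8, 2),
    linecode    = c("check", "variety", "check", "variety", "check", "variety"),
    ranges      = c(1L, 1L, 2L, 2L, 3L, 3L),
    rangestouse = c("1,2", "1,2", "1,2,3", "1,2,3", "2,3", "2,3"),
    stringsAsFactors = FALSE
)
out <- fun2(toy)
out
```

With the column names used in this post, the call would presumably be `fun2(mydata)`, relying on the default arguments.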

Apply function:
fun1(mydata, plot.id=mydata$plotid, ranges2use = mydata$rangestouse,control=mydata$linecode)

Error:

Error in m1[[i]] : subscript out of bounds
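
A likely cause, shown on a toy data frame: `ranges2use` is the name of a function argument, not a column of the data, so `dataset$ranges2use` is NULL and `strsplit()` returns an empty list. The very first `m1[[1]]` then has nothing to index. The object `d` below is an assumption for illustration; only the column name is taken from the post.

```r
# Toy data frame with the same column name as in the post.
d <- data.frame(rangestouse = c("1,2", "1,2,3"), stringsAsFactors = FALSE)

# "ranges2use" is the argument name, not a column, so `$` finds nothing.
is.null(d$ranges2use)                       # TRUE

# strsplit() on an empty character vector yields an empty list ...
m1 <- strsplit(as.character(d$ranges2use), ",")
length(m1)                                  # 0

# ... so indexing its first element fails.
res <- tryCatch(m1[[1]], error = function(e) conditionMessage(e))
res                                         # "subscript out of bounds"

# Splitting the real column (or the argument itself) gives one element per row.
m1_ok <- strsplit(as.character(d$rangestouse), ",")
length(m1_ok)                               # 2
```

Inside fun1 the equivalent change would be to split `ranges2use` directly rather than `dataset$ranges2use`. Note that `m2[i] <- m1[[i]]` would then still try to store a multi-element vector in a single slot.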

Session info:

R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] magrittr_1.5    plyr_1.8.3      tools_3.2.1     reshape2_1.4.1  Rcpp_0.12.1     stringi_1.0-1
 [7] grid_3.2.1      agridat_1.12    stringr_1.0.0   lattice_0.20-31



