[R] Extract student ID that match certain criteria

roslinazairimah zakaria roslinaump at gmail.com
Wed Mar 15 23:34:18 CET 2017


Hi Rui,

Both functions work beautifully.

I really appreciate your help, and everyone else's, very much.

Thank you

On Wed, Mar 15, 2017 at 10:46 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

> Hello,
>
> I believe your request is a bit confusing, since you say you want to filter
> by student ID but dt_all3 contains many years and only one program
> ("IJAZAH SARJANA MUDA"). So I've written two simple functions, one to
> filter by year and the other by program.
>
>
> fun1 <- function(x, year){
>         # keep rows whose ID has the given year in characters 3-4
>         inx <- substr(x[["STUDENT_ID"]], 3, 4) == as.character(year)
>         x[inx, ]
> }
>
> fun2 <- function(x, program){
>         # keep rows belonging to the given program
>         inx <- x[["PROGRAM"]] == program
>         x[inx, ]
> }
>
> fun1(dt_all2, 14)  # filter by year = 14
> fun2(dt_all2, "IJAZAH SARJANA MUDA")  # filter by program
>
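A minimal sketch (not from the thread) that applies both of Rui's conditions at once and keeps only the ID and CGPA columns. The five-row data frame dt is a hypothetical stand-in copied from rows of dt_all2. As in the dput() output, it stores STUDENT_ID and PROGRAM as factors, and substr() converts those to character by itself.

```r
# Hypothetical stand-in: five rows copied from dt_all2, with STUDENT_ID and
# PROGRAM stored as factors, as in the dput() output.
dt <- data.frame(
  STUDENT_ID = factor(c("EC14021", "EB14059", "MA14071", "PA13048", "AA14068")),
  PROGRAM    = factor(c("IJAZAH SARJANA MUDA", "DIPLOMA", "IJAZAH SARJANA MUDA",
                        "IJAZAH SARJANA MUDA", "IJAZAH SARJANA MUDA")),
  CGPA       = c(2.42, 3.27, 2.85, 2.88, 2.74)
)

# substr() converts the factor to character internally; characters 3-4 hold
# the registration year. The two conditions are combined with &.
keep <- substr(dt$STUDENT_ID, 3, 4) == "14" &
  dt$PROGRAM == "IJAZAH SARJANA MUDA"

# Logical row index plus a column selection: each CGPA stays with its ID.
res <- dt[keep, c("STUDENT_ID", "CGPA")]
print(res)
```

With these rows, res keeps EC14021, MA14071 and AA14068, each with its CGPA.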
> Hope this helps,
>
> Rui Barradas
>
>
>
> Em 15-03-2017 13:49, roslinazairimah zakaria escreveu:
>
>> Hi Caitlin,
>>
>> I tried many of the suggested approaches without success, and I realise that
>> I need to filter the student IDs together with their CGPA, but if I convert
>> the ID to character I lose the CGPA values. It is easy to do in Excel, but a
>> bit time-consuming, so I am trying to do it in R.
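A minimal sketch of how to keep the CGPA values: convert STUDENT_ID inside the data frame instead of pulling it out into a separate vector, then select whole rows. The three-row dt is a hypothetical stand-in built from rows of dt_all2 in this thread.

```r
# Hypothetical stand-in: three rows of dt_all2, with the ID stored as a factor.
dt <- data.frame(
  STUDENT_ID = factor(c("AA14068", "AB15103", "CC14107")),
  CGPA       = c(2.74, 2.24, 2.59)
)

# Convert the column in place; the CGPA column is not affected.
dt$STUDENT_ID <- as.character(dt$STUDENT_ID)

# Keep whole rows whose characters 3-4 are "14", so each CGPA stays with its ID.
res <- dt[substr(dt$STUDENT_ID, 3, 4) == "14", ]
print(res)
```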
>>
>> I have these data:
>>
>> dput(dt_all2)
>> structure(list(FAC_CODE = structure(c(2L, 2L, 2L, 4L, 1L, 1L,
>> 4L, 7L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 5L, 6L), .Label = c("FKASA",
>> "FKEE", "FKKSA", "FKM", "FKP", "FSKKP", "FTK"), class = "factor"),
>>      STUDENT_ID = structure(c(9L, 6L, 7L, 17L, 2L, 3L, 18L, 19L,
>>      13L, 12L, 14L, 15L, 16L, 10L, 8L, 1L, 5L, 11L, 4L), .Label =
>> c("AA14068",
>>      "AB15103", "AB15124", "CC14107", "EA13043", "EB14059", "EB14073",
>>      "EB14101", "EC14021", "EC15063", "FB14085", "KA13142", "KA13143",
>>      "KA13156", "KE13034", "KE13046", "MA14071", "MA14115", "PA13048"
>>      ), class = "factor"), PROGRAM = structure(c(2L, 1L, 1L, 2L,
>>      1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L
>>      ), .Label = c("DIPLOMA", "IJAZAH SARJANA MUDA"), class = "factor"),
>>      CGPA = c(2.42, 3.27, 1.98, 2.85, 2.24, 3.01, 3.31, 2.88,
>>      3.61, 3.69, 3.2, 3.85, 3.63, 2.67, 2.35, 2.74, 1.96, 2.89,
>>      2.59)), .Names = c("FAC_CODE", "STUDENT_ID", "PROGRAM", "CGPA"
>> ), class = "data.frame", row.names = c(NA, -19L))
>>
>> and I want to filter my data as follows:
>>
>> dput(dt_all3)
>>
>> structure(list(FAC_CODE = structure(c(2L, 2L, 4L, 4L, 5L, 1L,
>> 6L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("FKASA", "FKEE", "FKKSA",
>> "FKM", "FKP", "FTK"), class = "factor"), STUDENT_ID = structure(c(4L,
>> 3L, 11L, 12L, 5L, 1L, 13L, 7L, 6L, 8L, 9L, 10L, 2L), .Label = c("AA14068",
>> "EA13043", "EC14021", "EC15063", "FB14085", "KA13142", "KA13143",
>> "KA13156", "KE13034", "KE13046", "MA14071", "MA14115", "PA13048"
>> ), class = "factor"), PROGRAM = structure(c(1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "IJAZAH SARJANA MUDA", class =
>> "factor"),
>>      CGPA = c(2.67, 2.42, 2.85, 3.31, 2.89, 2.74, 2.88, 3.61,
>>      3.69, 3.2, 3.85, 3.63, 1.96)), .Names = c("FAC_CODE", "STUDENT_ID",
>> "PROGRAM", "CGPA"), class = "data.frame", row.names = c(NA, -13L
>> ))
>>
>> I would like to select the student IDs whose third and fourth characters
>> give the year of registration, e.g. AA15..., AE14..., and I would also like
>> to select their CGPA values.
>>
>> Thank you.
>>
>> On Mon, Mar 13, 2017 at 2:26 PM, roslinazairimah zakaria <
>> roslinaump at gmail.com> wrote:
>>
>>> Thank you so much for your help.
>>>
>>> On Mon, Mar 13, 2017 at 1:52 PM, bioprogrammer <bioprogrammer at gmail.com>
>>> wrote:
>>>
>>>> Hi.
>>>>
>>>> I would use the "substr" function:
>>>>
>>>> https://stat.ethz.ch/R-manual/R-devel/library/base/html/substr.html
>>>>
>>>> ...assuming you're working with character data.
>>>>
>>>> Another useful skill involves working with regular expressions.
>>>>
>>>> http://www.endmemo.com/program/R/grep.php
>>>>
>>>> http://regular-expressions.mobi/tutorial.html
>>>>
>>>> Hope these help :)
>>>>
>>>> ~Caitlin
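A minimal sketch of both suggestions on a plain character vector of IDs taken from the question in this thread. substr() compares characters 3 and 4 directly. In the regular expression ^..14, the ^ anchors the match at the start of the string and each . matches any single character.

```r
# IDs from the example in this thread, as a character vector.
ids <- c("AA14004", "AB15035", "CB14024", "PA14009")

# substr(): characters 3 to 4 must equal "14".
by_substr <- ids[substr(ids, 3, 4) == "14"]

# Regular expression: "^" anchors at the start, ".." matches any two
# characters, then the literal digits "14" follow.
by_regex <- grep("^..14", ids, value = TRUE)

print(by_substr)
print(by_regex)
```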
>>>>
>>>> -------- Original message --------
>>>> From: roslinazairimah zakaria <roslinaump at gmail.com>
>>>> Date:03/12/2017 10:18 PM (GMT-07:00)
>>>> To: Bert Gunter <bgunter.4567 at gmail.com>
>>>> Cc: r-help mailing list <r-help at r-project.org>
>>>> Subject: Re: [R] Extract student ID that match certain criteria
>>>>
>>>> Another question:
>>>>
>>>> How do I extract IDs based on their third and fourth characters?
>>>>
>>>> For example, I have AA14004, AB15035, CB14024, PA14009, PA14009, etc.
>>>>
>>>> I would like to extract the IDs AB14..., CB14..., PA14...
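Suppose only particular prefixes are wanted, such as AB14, CB14 and PA14 in the example. One option is to look up the first four characters in a set with %in%. A regular expression with alternation gives the same result. A minimal sketch using the example IDs above:

```r
# Example IDs from the question, including the repeated PA14009.
ids <- c("AA14004", "AB15035", "CB14024", "PA14009", "PA14009")
wanted <- c("AB14", "CB14", "PA14")

# First four characters looked up in the set of wanted prefixes.
by_prefix <- ids[substr(ids, 1, 4) %in% wanted]

# Equivalent regular expression: AB, CB or PA at the start, followed by 14.
by_regex <- grep("^(AB|CB|PA)14", ids, value = TRUE)

print(by_prefix)
```

AB15035 is not selected because its year digits are 15, and AA14004 is not selected because AA14 is not in the set.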
>>>>
>>>> On Mon, Mar 13, 2017 at 12:37 PM, roslinazairimah zakaria <
>>>> roslinaump at gmail.com> wrote:
>>>>
>>>>> Hi Bert,
>>>>>
>>>>> Thank you so much for your help. However, I am not really sure what the
>>>>> y values are for. Can we do without them?
>>>>>
>>>>> x <- as.character(FKASA$STUDENT_ID)
>>>>> y <- c(1,786)
>>>>> My.Data <- data.frame (x,y)
>>>>>
>>>>> My.Data[grep("^AA14", My.Data$x), ]
>>>>>
>>>>> I got the following data:
>>>>>
>>>>>            x   y
>>>>> 1   AA14068   1
>>>>> 7   AA14090   1
>>>>> 11  AA14099   1
>>>>> 14  AA14012 786
>>>>> 15  AA14039   1
>>>>> 22  AA14251 786
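The helper column y does not appear to be needed. grep() returns the positions of the matching IDs, and those positions can index the rows of FKASA directly, so every other column, CGPA included, is kept. A minimal sketch: FKASA here is a hypothetical three-row stand-in built from the FKASA rows of dt_all2, not the real data set.

```r
# Hypothetical stand-in for FKASA: the three FKASA rows of dt_all2.
FKASA <- data.frame(
  STUDENT_ID = factor(c("AB15103", "AB15124", "AA14068")),
  CGPA       = c(2.24, 3.01, 2.74)
)

# grep() coerces the factor to character and returns matching row numbers.
rows <- grep("^AA14", FKASA$STUDENT_ID)

# Index the data frame directly; no extra vector or data frame is needed.
res <- FKASA[rows, ]
print(res)
```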
>>>>>
>>>>> On Mon, Mar 13, 2017 at 11:51 AM, Bert Gunter <bgunter.4567 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> 1. Your code is incorrect. All entries are character strings and must
>>>>>> be quoted.
>>>>>>
>>>>>> 2. See ?grep  and note in particular (in the "Value" section):
>>>>>>
>>>>>> "grep(value = TRUE) returns a character vector containing the selected
>>>>>> elements of x (after coercion, preserving names but no other
>>>>>> attributes)."
>>>>>>
>>>>>>
>>>>>> 3. While the fixed = TRUE option will work here, you may wish to learn
>>>>>> about "regular expressions", which can come in very handy for
>>>>>> character string manipulation tasks. ?regex in R has a terse, but I
>>>>>> have found comprehensible, discussion. There are many good gentler
>>>>>> tutorials on the web, also.
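A short sketch of points 2 and 3 on four IDs from the original question. With value = TRUE, grep() returns the matching strings instead of their positions. With fixed = TRUE, "AA14" is matched as a literal substring anywhere in the string, while the regular expression "^AA14" matches only at the start. For these IDs both approaches give the same result, because "AA14" only occurs at the start of each ID.

```r
# A few IDs from the original question, quoted as character strings.
dt <- c("AA14068", "AA13194", "AE11054", "AA14090")

# Literal substring match anywhere in the string.
hits_fixed <- grep("AA14", dt, fixed = TRUE, value = TRUE)

# Regular expression anchored at the start of the string.
hits_regex <- grep("^AA14", dt, value = TRUE)

# Without value = TRUE, grep() returns positions instead of strings.
pos <- grep("^AA14", dt)
```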
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Bert
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "The trouble with having an open mind is that people keep coming along
>>>>>> and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 12, 2017 at 8:32 PM, roslinazairimah zakaria
>>>>>> <roslinaump at gmail.com> wrote:
>>>>>>
>>>>>>> Dear r-users,
>>>>>>>
>>>>>>> I have this list of student IDs:
>>>>>>>
>>>>>>> dt <- c(AA14068, AA13194, AE11054, AA12251, AA13228, AA13286, AA14090,
>>>>>>> AA13256, AA13260, AA13291, AA14099, AA15071, AA13143, AA14012, AA14039,
>>>>>>> AA15018, AA13234, AA13149, AA13282, AA13218)
>>>>>>>
>>>>>>> and I would like to extract only the students whose ID starts with AA14.
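A minimal sketch of the request with the IDs quoted as character strings, which Bert notes is required. It uses base R's startsWith() (available since R 3.3.0). That function tests for a literal prefix and does not use a regular expression.

```r
# The IDs from the question, quoted so R treats them as character strings.
dt <- c("AA14068", "AA13194", "AE11054", "AA12251", "AA13228", "AA13286",
        "AA14090", "AA13256", "AA13260", "AA13291", "AA14099", "AA15071",
        "AA13143", "AA14012", "AA14039", "AA15018", "AA13234", "AA13149",
        "AA13282", "AA13218")

# Keep the IDs that begin with the prefix "AA14".
aa14 <- dt[startsWith(dt, "AA14")]
print(aa14)
```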
>>>>>>>
>>>>>>> I searched and tried substr, subset and select, but they failed.
>>>>>>>
>>>>>>>   substr(FKASA$STUDENT_ID, 2, nchar(string1))
>>>>>>> Error in nchar(string1) : 'nchar()' requires a character vector
>>>>>>>
>>>>>>> > subset(FKASA, STUDENT_ID=="AA14" )
>>>>>>>  [1] FAC_CODE    FACULTY     STUDENT_ID  NAME        PROGRAM     KURSUS      CGPA        ACT_SS      ACT_VAL     ACT_CS      ACT_LED     ACT_PS      ACT_IM
>>>>>>> [14] ACT_ENT     ACT_CRE     ACT_UNI     ACT_VOL...
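A minimal sketch of working versions of the two attempts above. A fixed-length prefix needs only substr(x, 1, 4), with no nchar() call. The error suggests that string1 was not a character vector; a factor, for instance, triggers that error. STUDENT_ID == "AA14" compares whole IDs, so no seven-character ID can match. subset() therefore returns zero rows, and R prints only the column names. FKASA below is a hypothetical stand-in that contains just the two columns used here.

```r
# Hypothetical stand-in for FKASA with only the columns used here.
FKASA <- data.frame(
  STUDENT_ID = factor(c("AB15103", "AB15124", "AA14068")),
  CGPA       = c(2.24, 3.01, 2.74)
)

# First four characters of each ID; substr() converts the factor itself,
# and no nchar() call is needed for a prefix of fixed length.
prefix <- substr(FKASA$STUDENT_ID, 1, 4)

# Compare the prefix, not the whole ID, inside subset().
res <- subset(FKASA, substr(STUDENT_ID, 1, 4) == "AA14")
print(res)
```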
>>>>>>>
>>>>>>> Thank you so much for your help.
>>>>>>>
>>>>>>> How do I do it?
>>>>>>> --
>>>>>>> *Roslinazairimah Zakaria*
>>>>>>> *Tel: +609-5492370; Fax No. +609-5492766*
>>>>>>>
>>>>>>> *Email: roslinazairimah at ump.edu.my; roslinaump at gmail.com*
>>>>>>> Faculty of Industrial Sciences & Technology
>>>>>>> University Malaysia Pahang
>>>>>>> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>

