[R] Basic question for subset of dataframe

Ivan Calandra ivan.calandra at univ-fcomte.fr
Thu Feb 27 16:46:01 CET 2014


Hi,

Thanks for the example!

I cannot really tell you why you get what you get when you type 
leadership[leadership$country == "US"]

But what I know (or think I know) is that when you leave out the 
comma, R treats the index as a selection of columns.
This means that leadership[1:2] is identical to leadership[,1:2]:
identical(leadership[1:2],leadership[,1:2])
[1] TRUE
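
A minimal sketch of that equivalence on a small made-up data frame (the name df and its values are illustrative assumptions, not taken from the thread). One caveat: the two forms agree when several columns are selected, but with a single column the comma form drops to a plain vector unless drop = FALSE is given.

```r
# Hypothetical toy data frame, for illustration only
df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

# Without a comma, the index selects columns (list-style indexing)
same_multi <- identical(df[1:2], df[, 1:2])

# With a single column the two forms differ:
one_list_style   <- df[1]                  # stays a data frame
one_matrix_style <- df[, 1]                # dropped to a plain vector
one_no_drop      <- df[, 1, drop = FALSE]  # stays a data frame

print(same_multi)
print(class(one_list_style))
print(class(one_matrix_style))
print(class(one_no_drop))
```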

If you want all rows where "country" is "US", then you did it 
correctly with leadership[leadership$country == "US", ]
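
As for the form without the comma, one likely explanation (offered as an assumption, not something confirmed in this thread) is that the logical vector is used as a column selector and recycled across the columns. leadership$country == "US" is TRUE, TRUE, FALSE, FALSE, FALSE; recycled over the 11 columns shown in the quoted output, it would pick columns 1, 2, 6, 7 and 11, which matches managerID, JoinDate, q1, q2 and agecat. A small sketch using the 10-column data frame from the quoted code (the names is_us, cols_picked and us_rows are only illustrative):

```r
# Data frame rebuilt from the code quoted in the question (10 columns)
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2), q2 = c(4, 5, 5, 3, 2), q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2), q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE
)

# One logical value per row: TRUE for the two US rows
is_us <- leadership$country == "US"

# No comma: the logical vector is applied to columns and recycled
cols_picked <- names(leadership[is_us])

# Comma after the condition: the logical vector is applied to rows
us_rows <- leadership[is_us, ]

print(cols_picked)
print(nrow(us_rows))
```

With the comma, every column is kept and only the matching rows are returned, which is the result the question was after.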

HTH,
Ivan

--
Ivan Calandra, ATER
Université de Franche-Comté
UFR STGI - UMR 6249 Chrono-Environnement
4 Place Tharradin - BP 71427
25211 Montbéliard Cedex, FRANCE
ivan.calandra at univ-fcomte.fr
http://biogeosciences.u-bourgogne.fr/calandra

On 27/02/14 16:00, Kapil Shukla wrote:
> All - first, apologies if this is a very basic question, but I tried myself
> and could not find a satisfactory answer.
>
> I know that I can subset a data frame using dataframe[row,column]; if I
> give dataframe[row,] that specific row is returned, and similarly
> dataframe[,column] returns the entire column.
>
> What I don't understand is what is returned if I write
> dataframe[<conditional expression>] without the comma.
>
> For example, I have the code below:
>
> manager <- c(1, 2, 3, 4, 5)
> date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
> country <- c("US", "US", "UK", "UK", "UK")
> gender <- c("M", "F", "F", "M", "F")
> age <- c(32, 45, 25, 39, 99)
> q1 <- c(5, 3, 3, 3, 2)
> q2 <- c(4, 5, 5, 3, 2)
> q3 <- c(5, 2, 5, 4, 1)
> q4 <- c(5, 5, 5, NA, 2)
> q5 <- c(5, 5, 2, NA, 1)
> leadership <- data.frame(manager, date, country, gender, age, q1, q2, q3,
> q4, q5, stringsAsFactors=FALSE)
>
> Now if I do
>
>
> leadership[leadership$country == "US",]
>
> two rows are returned:
>
>
>
>    managerID JoinDate country gender age q1 q2 q3 q4 q5 agecat
> 1         1 10/24/08      US      M  32  5  4  5  5  5  Young
> 2         2 10/28/08      US      F  45  3  5  2  5  5  Young
>
>
> But if I do
>
> leadership[leadership$country == "US"]
>
> to get the entire data frame where country is US, I get the following:
>
>
>    managerID JoinDate q1 q2 agecat
> 1         1 10/24/08  5  4  Young
> 2         2 10/28/08  3  5  Young
> 3         3  10/1/08  3  5  Young
> 4         4 10/12/08  3  3  Young
> 5         5   5/1/09  2  2   <NA>
>
>
>
> Please tell me what I am doing wrong.
>
>
> Thanks



