[R] data frames, na.omit, and sums

Petr Pikal petr.pikal at precheza.cz
Mon Dec 5 14:05:39 CET 2005


Hi

Please try to follow this:
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

I guess you probably need the aggregate function, something like

aggregate(your.df[, -(1:2)],
          list(semester = your.df$sem, year = your.df$year),
          sum, na.rm = TRUE)
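
A minimal sketch of that call on toy data shaped like the table in the question. The frame name your.df and all values are placeholders, and grouping on year alone is an assumption about how the per-year totals are wanted:

```r
# Toy data copied from the small table in the question (illustrative only).
your.df <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991, 1992, 1992),
  C1e  = c(10, 3, NA),
  C1s  = c(2, 1, NA),
  C2e  = c(NA, 8, 100),
  C2s  = c(NA, 1, 10)
)

# Sum every course column within each semester/year combination.
# na.rm = TRUE is passed on to sum(), so NA entries are skipped.
by.sem <- aggregate(your.df[, -(1:2)],
                    list(semester = your.df$sem, year = your.df$year),
                    sum, na.rm = TRUE)

# Same idea per year: group on year alone.
by.year <- aggregate(your.df[, -(1:2)],
                     list(year = your.df$year),
                     sum, na.rm = TRUE)

print(by.sem)
print(by.year)
```

Note that a group whose entries are all NA comes out as 0 rather than NA when na.rm = TRUE is used.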

A simple working example showing what you have done, what the response 
was, and how it failed your expectations would be helpful.

HTH
Petr



On 4 Dec 2005 at 18:55, Jason Miller wrote:

To:             	r-help at stat.math.ethz.ch
From:           	Jason Miller <millerj at truman.edu>
Date sent:      	Sun, 4 Dec 2005 18:55:06 -0600
Subject:        	[R] data frames, na.omit, and sums

> Dear R-helpers,
> 
> New to R, I'm in the middle of a project that I'm using to force me to 
> learn R.  I'm running into some behavior that I don't understand, and 
> I need some advice.  In the last week I've gotten some great advice 
> from the list on visualizing my data, and I was hoping people could 
> help me get over another barrier I've encountered to my progress.
> 
> Before I describe what I'm trying to do and where I'm stuck with R, 
> let me quickly outline what I need help with: (1) summing over the
> non-NA entries in each row of a data frame, and (2) using na.omit()
> and na.action() with rows of data from a frame.
> 
> I have a data frame that contains information about when my academic 
> department offered courses and their enrollments.  The data frame 
> looks something like
> 
> sem     year    C1e C1s C2e C2s
> Fall    1991    10  2   NA  NA
> Spring  1992    3   1   8   1
> Summer  1992    NA  NA  100 10
> 
> where C?e represents a specific course's enrollment that semester and 
> C?s represents the number of sections of that course offered.  The 
> frame is filled with integers and NAs.  The data frame is of medium 
> size, with about 180 columns and 45 rows.
> 
> I need to cull some basic information from this dataset such as:
> (1) total number of sections offered each semester (and each year),
> (2) total number of credit hours generated each semester (and each 
> year), and (3) the student-to-faculty ratio of the department each
> semester (and  each year).
> 
> From a mathematical standpoint, how to do each of these is obvious 
> to me.  But having to negotiate working within data frames and with 
> matrices that have NA entries has really gotten me confused
> and frustrated.  (I have no programming background.)
> 
> To calculate (1) above for semester (rows), I know how to select the 
> "sections" columns using grep().  What I'd like to do is sum the 
> selected frame's non-NA entries row-by-row.  For some reason, I was 
> able to do this earlier today using the rowsum() function with 
> na.rm=TRUE, but now it's not working. It complains of non-numeric 
> entries.  (In fact, I was able to use the rowsum() function to 
> calculate (1) for each year.)  When I try to convert the data frame 
> (or a sub-frame) to a matrix, my integers turn into strings/
> characters, and I have no idea what to do with that!
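
A possible reading of the two problems above, offered as a sketch rather than a diagnosis: rowSums() (capital S, plural) adds across each row and accepts na.rm = TRUE, whereas rowsum() adds rows together within groups. as.matrix() on a frame that still contains the text column sem gives a character matrix, so select the numeric columns first. The toy frame and the pattern "^C[0-9]+s$" (section columns named C<number>s) are assumptions based on the small table in the question.

```r
# Toy data copied from the small table in the question (illustrative only).
your.df <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991, 1992, 1992),
  C1e  = c(10, 3, NA),
  C1s  = c(2, 1, NA),
  C2e  = c(NA, 8, 100),
  C2s  = c(NA, 1, 10)
)

# Positions of the "sections" columns (assumed naming: C<number>s).
sec.cols <- grep("^C[0-9]+s$", names(your.df))

# Total sections per semester: sum across each row, skipping NAs.
sections <- rowSums(your.df[, sec.cols], na.rm = TRUE)
names(sections) <- paste(your.df$sem, your.df$year)

# Total sections per year: add the semester totals within each year.
sections.by.year <- tapply(sections, your.df$year, sum)

# Why the integers turned into strings: one non-numeric column is enough
# to make the whole matrix character.
m.all <- as.matrix(your.df)              # character matrix
m.num <- as.matrix(your.df[, sec.cols])  # numeric matrix

print(sections)
print(sections.by.year)
```
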
> 
> To calculate (2) above for a semester, I know how to select the 
> enrollment columns using grep().  What I'd like to do is calculate 
> the total credits generated by taking the dot product of each row 
> with a vector whose components are the credit hour values of each 
> course in my data frame.  Of course, I'd have to account for the NA 
> values in my data frame, but in the past I've had decent luck with 
> using na.omit() and na.action() to select the non-NA components of a 
> vector. Unfortunately, na.omit is absolutely not working with my 
> dataframe; it just returns the names of all the columns!
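
A likely explanation, assuming every semester in the real frame has at least one course that was not offered: na.omit() on a data frame drops each row that contains any NA, so no rows are left and R prints only the column names of the empty frame. For the credit-hour totals, one option is to treat an NA enrollment as zero before taking the matrix product. The sketch below uses the toy table from the question; the credit values in credits are invented for the example.

```r
# Toy data copied from the small table in the question (illustrative only).
your.df <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991, 1992, 1992),
  C1e  = c(10, 3, NA),
  C1s  = c(2, 1, NA),
  C2e  = c(NA, 8, 100),
  C2s  = c(NA, 1, 10)
)

# na.omit() keeps only rows with no NA at all. Here one row survives;
# in a 180-column frame it is quite possible that none does.
complete.rows <- na.omit(your.df)

# Enrollment columns (assumed naming: C<number>e) as a numeric matrix.
enr.cols <- grep("^C[0-9]+e$", names(your.df))
enr <- as.matrix(your.df[, enr.cols])

# A course that was not offered generates no credit hours.
enr[is.na(enr)] <- 0

# Hypothetical credit-hour value of each course, in column order.
credits <- c(C1e = 3, C2e = 4)

# Dot product of each row with the credit vector.
credit.hours <- drop(enr %*% credits)
names(credit.hours) <- paste(your.df$sem, your.df$year)

print(complete.rows)
print(credit.hours)
```
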
> 
> Until I get (1) and (2) figured out, I have no hope of figuring out
> (3).
> 
> Thank you for reading this far into this post.  If you have any 
> suggestions for how I can get na.omit() and summing to work for me, 
> I'd appreciate hearing from you.
> 
> Jason Miller
> 
> 
> ================================================================
> Jason E. Miller, Ph.D.
> Associate Professor of Mathematics
> Truman State University
> Kirksville, MO
> http://pyrite.truman.edu/~millerj/
> 660.785.7430
> 




