[R] Working with data frames

Sun Shine phaedrusv at gmail.com
Thu Dec 11 18:05:32 CET 2014


Hello William, Ivan and Jim

I appreciate your replies.

I did suppress the factors using stringsAsFactors=FALSE and in that way 
was able to progress some more on getting a sense of the data set, so 
thanks for that suggestion. I had previously overlooked it.
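
As a rough illustration of what that option changes, here is a minimal sketch using the toy data from William's reply (not the real data set). The names d_chr and d_fac are made up for this example, and stringsAsFactors is set explicitly in both calls because its default depends on the R version.

```r
# Toy data taken from William's example, not the census data set.
csv <- "Name,Age\nBob,2\nXavier,25\nAdam,1"

# d_chr and d_fac are hypothetical names used only in this sketch.
d_chr <- read.csv(text = csv, stringsAsFactors = FALSE)
d_fac <- read.csv(text = csv, stringsAsFactors = TRUE)

class(d_chr$Name)    # "character"
class(d_fac$Name)    # "factor"
levels(d_chr$Name)   # NULL: a character vector has no levels
levels(d_fac$Name)   # "Adam" "Bob" "Xavier"

# A character column can still be turned into a factor later, when needed.
d_chr$Name <- factor(d_chr$Name)
as.integer(d_chr$Name)   # 2 3 1
```

So reading the names as plain text makes them easy to print, but anything that relies on levels(), such as the axis labelling in the with() example, needs the column to be a factor again.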

Also thanks William, I never understood what those thick line segments 
were - now I do. That was about the best I had managed by that point, 
and still without the names on the x axis.

Unfortunately using William's suggestion of 'with' gave me errors:

 > with(MHP.def, {plot(as.integer(MHP.def$Names),cH.E, axes=FALSE, 
xlab='Area') axis(side=2) axis(side=1, 
at=seq_along(levels(MHP.def$Names)), lab=levels(MHP.def$Names))})

Error: unexpected symbol in "with(MHP.def, 
{plot(as.integer(MHP.def$Names), MHP.def$cH.E, axes=FALSE, xlab='Area') 
axis"

This may have something to do with the period between cH and E or 
perhaps from the $ to access data from a column?
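
For what it is worth, neither the period in cH.E nor the $ is likely to be the culprit, since both are legal in R. When several calls inside { } end up on one line, R needs a semicolon or a line break between them, and without one the parser stops with "unexpected symbol" at the second call. The sketch below assumes a small stand-in for MHP.def (the real data are not shown in this thread, and "AreaC" is an invented area name) with Names stored as a factor.

```r
# Hypothetical stand-in for MHP.def; only the column names come from the post.
MHP.def <- data.frame(Names = factor(c("MHad", "LBNW", "AreaC")),
                      cH.E  = c(12, 30, 7))

# Two calls on one line with nothing between them do not parse.
bad <- "{plot(1) axis(side=2)}"
parses <- tryCatch({ parse(text = bad); TRUE }, error = function(e) FALSE)
print(parses)   # FALSE

# Putting each call on its own line (or separating them with ';') works.
# Inside with(), the columns can be used by name, without MHP.def$.
with(MHP.def, {
  plot(as.integer(Names), cH.E, axes = FALSE, xlab = "Area")
  axis(side = 2)                                   # the usual y axis
  axis(side = 1, at = seq_along(levels(Names)),    # one tick per area
       labels = levels(Names))
})
```

If Names was read with stringsAsFactors=FALSE it is a character vector, so it would need to be wrapped in factor() first for as.integer() and levels() to behave as above.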

I have now installed ggplot2 and with the help of the graphics cookbook 
will see if I can make some headway like this, at least for now. I think 
William's suggestion about learning to work with factors is 
fundamentally sound and something I will need to get my head around. For 
now though, I think I'll stick to exploring ggplot2 so that I can 
visualise this data set more easily.
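
As a small starting point for the factor side of things, here is a sketch of the level reordering William describes, again on his toy data rather than the real data set. It shows only one simple way to control the order in which names appear on an axis.

```r
# Toy data from William's example.
d <- data.frame(Name = c("Bob", "Xavier", "Adam"), Age = c(2, 25, 1))

# By default the levels are sorted alphabetically.
d$Name <- factor(d$Name)
levels(d$Name)       # "Adam" "Bob" "Xavier"

# Rebuilding the factor with explicit levels changes the order (and the
# integer codes), but not the values themselves.
d$Name <- factor(d$Name, levels = c("Xavier", "Bob", "Adam"))
levels(d$Name)       # "Xavier" "Bob" "Adam"
as.integer(d$Name)   # 2 1 3
as.character(d$Name) # "Bob" "Xavier" "Adam"
```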

Thanks again.

Best

Sun

On 11/12/14 16:06, William Dunlap wrote:
> Here is a reproducible example
>   > d <- read.csv(text="Name,Age\nBob,2\nXavier,25\nAdam,1")
>   > str(d)
>   'data.frame':   3 obs. of  2 variables:
>    $ Name: Factor w/ 3 levels "Adam","Bob","Xavier": 2 3 1
>    $ Age : int  2 25 1
>
> Do you get something similar?  If not, show us what you have (you
> could trim it down to a few columns).
>
> Let's try some plots.
>     > plot(d$Age)
> This shows a plot of d$Age (on y axis) vs "Index", where Index is
> 1:length(d$Age).  The points are at (1,2), (2,25), and (3,1). You gave
> plot() no information about what should be on the x axis so it gave
> you the index numbers.
>
> Now asking for d$Name on the x axis and d$Age on the y.
>     > plot(d$Name, d$Age)
> This put the names, in alphabetical order, on the x axis. The y axis
> ranges from about 0 to 25 and neither axis is labelled. There are
> thick horizontal line segments where you expect the points to
> be.  These are degenerate boxplots - when you ask to plot a
> 'factor' variable on the x axis and numbers on the y you get such
> a plot.
>
> Some folks suggested you avoid factors by adding stringsAsFactors=FALSE
> (or as.is=TRUE) to your call to read.csv.  Let's try that
>   > d2 <- read.csv(stringsAsFactors=FALSE,
>         text="Name,Age\nBob,2\nXavier,25\nAdam,1")
>   > plot(d2$Name, d2$Age)
>   Error in plot.window(...) : need finite 'xlim' values
>   In addition: Warning messages:
>   1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
>   2: In min(x) : no non-missing arguments to min; returning Inf
>   3: In max(x) : no non-missing arguments to max; returning -Inf
> You get no plot at all.
>
> You can get closer to what I think you want with
>   with(d, {
>     plot(as.integer(Name), Age, axes=FALSE, xlab="Name")
>     axis(side=2) # draw the usual y axis
>     axis(side=1, at=seq_along(levels(Name)), lab=levels(Name))
>   })
> If you want the names in a different order on the x axis, then reconstruct
> the factor object d$Name with a different order of levels.  E.g.,
>   d$Name <- factor(d$Name, levels=c("Xavier", "Bob", "Adam"))
> and replot.
>
> There are various plotting packages, e.g., ggplot2, that can make this
> sort of thing easier, but I think the recommendation not to use factors
> is wrong.  You do need to learn how to use them to your advantage.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Dec 11, 2014 at 5:00 AM, Sun Shine <phaedrusv at gmail.com> wrote:
>
>     Hello
>
>     I am struggling with data frames and would appreciate some help
>     please.
>
>     I have a data set of 13 observations and 80 variables. The first
>     column is the names of different political area boundaries (e.g.
>     MHad, LBNW, etc), the first row is a vector of variable names
>     concerning various census data (e.g. age.T, hse.Unk, etc.). The
>     first cell [1,1] is blank.
>
>     I have loaded this via read.csv('path.to/data.set.csv'),
>     and now want to run some
>     analyses on this data frame. If I want to get a list of the names
>     of the political areas (i.e. the first column), the result is a
>     vector of numbers which appear to correlate with the factors, but
>     I don't get the text names, just the corresponding number. So, if
>     I want to plot something basic, like the area that uses the most
>     gas for central heating, for example:
>
>     > plot(data.set$ch.Gas)
>
>     The result is the y-axis gives the gas usage for the areas, but
>     the x-axis gives only the numbers of the areas, not the names of
>     the areas (which is preferred).
>
>     So, two questions:
>
>     (1) Have I set up my csv file correctly to be read as a data
>     frame, with the variable names in the first row and the values
>     for each political area in the corresponding row under each
>     variable name? So far, looking through tutorials and books seems
>     to suggest yes, but at this point I'm no longer sure.
>
>     (2) How can I access the names of the political areas when
>     plotting so that these are given on the x-axis instead of the numbers?
>
>     Thanks for any help.
>
>     Cheers
>     Sun
>

