[R] Arranging column data to create plots
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Sun Jul 16 23:48:51 CEST 2017
Correction at the end.
On Sun, 16 Jul 2017, Jeff Newmiller wrote:
> On Sat, 15 Jul 2017, Michael Reed via R-help wrote:
>
>> Dear All,
>>
>> I need some help arranging data that was imported.
>
> It would be helpful if you were to use dput to give us the sample data since
> you say you have already imported it.
>
>> The imported data frame looks something like this (the actual file is huge,
>> so this is example data)
>>
>> DF:
>> IDKey X1 Y1 X2 Y2 X3 Y3 X4 Y4
>> Name1 21 15 25 10
>> Name2 15 18 35 24 27 45
>> Name3 17 21 30 22 15 40 32 55
>
> Those values are simply absent in X3 etc. above, but would be NA in an
> actual data frame, so I don't know whether my workaround matches yours.
> Output from dput() would have clarified the starting point.
>
>> I would like to create a new data frame with the following
>>
>> NewDF:
>> IDKey X Y
>> Name1 21 15
>> Name1 25 10
>> Name2 15 18
>> Name2 35 24
>> Name2 27 45
>> Name3 17 21
>> Name3 30 22
>> Name3 15 40
>> Name3 32 55
>>
>> With the data like this I think I can do the following
>>
>> ggplot(NewDF, aes(x=X, y=Y, color=IDKey) + geom_line
>
> You are missing parentheses (a closing parenthesis before the +, and the ()
> after geom_line). If you use the reprex package to test your examples before
> posting them, you can be sure your simple errors don't send us off on wild
> goose chases.
>
>> and get 3 lines with the various number of points.
>>
>> The point is that each of the XY pairs is a data point tied to NameX. I
>> would like to rearrange the data so I can plot the points/lines by the
>> IDKey. There will be at least 2 points, but the number of points for each
>> IDKey can be as many as 4.
>>
>> I have tried using the gather() function from the tidyverse package, but
>
> The tidyverse package is a virtual package that pulls in many packages;
> gather() itself comes from tidyr.
>
>> I can't make it work. The issue is that I believe I need two separate
>> gather statements (one for X, another for Y) to consolidate the data. This
>> causes the pairs to not stay together and the data becomes jumbled.
>
> No, what you need is a gather-spread.
>
> ######
> library(dplyr)
> library(tidyr)
>
> DF <- read.table( text=
> "IDKey X1 Y1 X2 Y2 X3 Y3 X4 Y4
> Name1 21 15 25 10 NA NA NA NA
> Name2 15 18 35 24 27 45 NA NA
> Name3 17 21 30 22 15 40 32 55
> ", header=TRUE, as.is=TRUE )
>
> NewDF <- ( dta
> %>% gather( XY, value, -IDKey )
> %>% separate( XY, c( "Coord", "Num" ), 1 )
> %>% spread( Coord, value )
> %>% filter( !is.na( X ) & !is.na( Y ) )
> )
> ######
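Sorry, should have practiced what I preached... the pipeline quoted above
starts from dta, which is never defined; it should start from DF. Here is
the version as actually run through reprex: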
##########
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)
DF <- structure(list(
  IDKey = c("Name1", "Name2", "Name3"),
  X1 = c(21L, 15L, 17L), Y1 = c(15L, 18L, 21L),
  X2 = c(25L, 35L, 30L), Y2 = c(10L, 24L, 22L),
  X3 = c(NA, 27L, 15L), Y3 = c(NA, 45L, 40L),
  X4 = c(NA, NA, 32L), Y4 = c(NA, NA, 55L)),
  .Names = c("IDKey", "X1", "Y1", "X2", "Y2", "X3", "Y3", "X4", "Y4"),
  class = "data.frame", row.names = c(NA, -3L))
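NewDF <- ( DF
  %>% gather( XY, value, -IDKey )             # long form: one row per IDKey/column
  %>% separate( XY, c( "Coord", "Num" ), 1 )  # split "X1" into Coord "X", Num "1"
  %>% spread( Coord, value )                  # X and Y back into their own columns
  %>% filter( !is.na( X ) & !is.na( Y ) )     # drop the unused (NA) point slots
)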
NewDF
#>   IDKey Num  X  Y
#> 1 Name1   1 21 15
#> 2 Name1   2 25 10
#> 3 Name2   1 15 18
#> 4 Name2   2 35 24
#> 5 Name2   3 27 45
#> 6 Name3   1 17 21
#> 7 Name3   2 30 22
#> 8 Name3   3 15 40
#> 9 Name3   4 32 55
##########
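
For completeness, a minimal sketch of the plot the original question was
after, assuming ggplot2 is installed. NewDF is retyped here from the printed
result so the snippet runs on its own; the added geom_point() layer is only
for illustration and was not part of the original request.

```r
library(ggplot2)

# NewDF rebuilt from the printed output so this example is self-contained
# (Num is character because separate() does not convert by default)
NewDF <- data.frame(
  IDKey = rep(c("Name1", "Name2", "Name3"), times = c(2, 3, 4)),
  Num   = as.character(c(1:2, 1:3, 1:4)),
  X     = c(21L, 25L, 15L, 35L, 27L, 17L, 30L, 15L, 32L),
  Y     = c(15L, 10L, 18L, 24L, 45L, 21L, 22L, 40L, 55L),
  stringsAsFactors = FALSE
)

# Note the closed aes() call and the () after geom_line.
# geom_line() connects the points of each IDKey in order of X;
# swap in geom_path() if they should be joined in row (Num) order instead.
p <- ggplot(NewDF, aes(x = X, y = Y, color = IDKey)) +
  geom_line() +
  geom_point()
print(p)
```

This should draw three lines, one per IDKey, with 2, 3 and 4 points
respectively.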
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k