[R] melt function chooses wrong id variable with large datasets

Thu Apr 16 14:53:14 CEST 2015

Maybe what you really want is the ?stack function.
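
A minimal sketch of that suggestion, using base R only. The three-row data frame is a made-up stand-in for the poster's data, not the real dataset, and the renamed columns are only there to mirror melt()'s output. It assumes that norm arrived as a character column, as the dput() output quoted below suggests. As far as I know, the default melt() call on a data frame treats character and factor columns as id variables, so checking the column classes first is probably worthwhile.

```r
# Small stand-in for the poster's data: numeric month columns plus a
# 'norm' column stored as character (as in the quoted dput() output).
dataset <- data.frame(
  januari  = c(38.1, 32.4, 34.5),
  februari = c(31.5, 36.2, 38.2),
  norm     = c("45.87", "24.05", "3.75"),
  stringsAsFactors = FALSE
)

# Inspect the column classes; a character column here would explain
# why 'norm' is handled differently from the months.
print(sapply(dataset, class))

# Convert 'norm' to numeric so that every column is a measurement.
dataset$norm <- as.numeric(dataset$norm)

# stack() puts all values in one column ('values') and the name of the
# originating column in a factor ('ind').
long <- stack(dataset)
names(long) <- c("value", "variable")

# Drop rows with missing values, comparable to na.rm = TRUE in melt().
long <- na.omit(long)

print(head(long))
```

The resulting long data frame has one value column and one grouping factor, which is the shape a call such as leveneTest(value ~ variable, long) expects.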
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On April 16, 2015 4:59:47 AM PDT, Joachim Audenaert <Joachim.Audenaert at pcsierteelt.be> wrote:
>Thanks,
>
>Indeed, norm should be in the same group as the months. Everything works
>fine when the dataset is quite small, but with big datasets (15 000
>values) things seem to go wrong and I can't explain why. It puts norm in
>a separate column instead of in the group of months, as it does when the
>dataset is small.
>
>Met vriendelijke groeten - With kind regards,
>
>Joachim Audenaert 
>onderzoeker gewasbescherming - crop protection researcher
>
>PCS | proefcentrum voor sierteelt - ornamental plant research
>
>Schaessestraat 18, 9070 Destelbergen, België
>T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95
>E: joachim.audenaert at pcsierteelt.be | W: www.pcsierteelt.be 
>
>
>
>From:   PIKAL Petr <petr.pikal at precheza.cz>
>To:     Joachim Audenaert <Joachim.Audenaert at pcsierteelt.be>
>Cc:     "r-help at r-project.org" <r-help at r-project.org>
>Date:   16/04/2015 13:41
>Subject:        RE: [R]  melt function chooses wrong id variable with 
>large datasets
>
>
>
>Hi
> 
>With this dataset I get
> 
>> dd.m0<-melt(dataset, na.rm=T)
>Using norm as id variables
>> head(dd.m0)
>                norm variable value
>1   45.8713463281901  januari  38.1
>2 24.047250681782984  januari  32.4
>3 3.7533684144746324  januari  34.5
>4 38.594241119279324  januari  20.7
>5 26.391897460120358  januari  21.5
>6 61.746470001194638  januari  23.1
>> 
>or
> 
>dd.m<-melt(dataset, id.vars=NULL, na.rm=T)
> 
>> head(dd.m)
>  variable value
>1  januari  38.1
>2  januari  32.4
>3  januari  34.5
>4  januari  20.7
>5  januari  21.5
>6  januari  23.1
>> tail(dd.m)
>    variable              value
>255     norm  4.856812959269508
>256     norm 5.3982910143166514
>257     norm 46.553976273304215
>258     norm 17.566272518985429
>259     norm 20.552451905814117
>260     norm 61.894775704479279
> 
>The latter puts norm in the same column as the months. Is that intended?
> 
>Maybe you want
> 
>> dd.m1<-melt(dataset[,-13], na.rm=T)
>No id variables; using all as measure variables
>> head(dd.m1)
>  variable value
>1  januari  38.1
>2  januari  32.4
>3  januari  34.5
>4  januari  20.7
>5  januari  21.5
>6  januari  23.1
>> tail(dd.m1)
>    variable value
>235 december  20.7
>236 december  30.9
>237 december  36.2
>238 december  21.0
>239 december  20.2
>240 december  21.3
> 
>Cheers
>Petr
> 
>From: Joachim Audenaert [mailto:Joachim.Audenaert at pcsierteelt.be] 
>Sent: Thursday, April 16, 2015 1:13 PM
>To: PIKAL Petr
>Cc: r-help at r-project.org
>Subject: RE: [R] melt function chooses wrong id variable with large 
>datasets
> 
>Hello, 
>
>This is a part of my dataset: 
>
>structure(list(januari = c(38.1, 32.4, 34.5, 20.7, 21.5, 23.1, 
>29.7, 36.6, 36.1, 20.6, 20.4, 30.1, 38.7, 41.4, 37, 36, 37, 38, 
>23, 26.7), februari = c(31.5, 36.2, 38.2, 26.4, 20.9, 21.5, 30.2, 
>33.4, 32.6, 22.2, 21.7, 30, 35.7, 32.8, 39.3, 25.5, 23, 19.9, 
>21.3, 20.8), maart = c(34.2, 27, 24.2, 19.9, 19.7, 21.5, 30.6, 
>30, 19, 19.6, 20.6, 23.6, 17.9, 17.3, 21.4, 24.1, 20.9, 30.1, 
>32.6, 21.3), april = c(26.3, 29.6, 30.3, 23.6, 28.4, 20.7, 24.1, 
>27.3, 23.2, 18.3, 24.6, 27.4, 20.4, 18.1, 25.2, 19.8, 21, 23.7, 
>19.6, 18.1), mei = c(23.7, 24, 17.2, 23.2, 25.2, 17.2, 16, 15.6, 
>13.4, 16, 16.8, 14.6, 19.4, 21, 19.5, 18.5, 13.3, 13.7, 14.3, 
>14.1), juni = c(17.7, 14.2, 16.6, 15.7, 13.7, 14.7, 13.1, 12.9, 
>15.4, 11.9, 15.2, 15.3, 16.5, 16.1, 11.7, 11.2, 11.5, 10.8, 16.1, 
>14.8), juli = c(15.7, 14.5, 10.8, 10.5, 13.4, 12.2, 13.2, 13, 
>12.4, 13.1, 9.8, 10.5, 13.4, 11, 13.1, 15, 16.7, 16.1, 18.2, 
>15.7), augustus = c(12.9, 12.8, 15.2, 14.5, 17.2, 14.5, 14.4, 
>11, 13.1, 13.6, 14.6, 12.7, 13.6, 12.7, 15.5, 17.4, 15.2, 14.2, 
>17.7, 19.2), september = c(15.6, 15.5, 15.9, 15.1, 16, 19.4, 
>21.5, 23.7, 18.7, 23.8, 18, 16.2, 18.5, 20.6, 18.3, 22.5, 26.9, 
>19.4, 15.9, 20.5), oktober = c(21.4, 20.8, 14, 17, 23, 26.4, 
>19.6, 22.7, 26.9, 14.7, 15.2, 19.8, 26.9, 20.2, 14.3, 14.8, 18.5, 
>21.7, 21.4, 21.8), november = c(24.7, 26.2, 29, 21.6, 17.1, 16.9, 
>19.1, 24.7, 25.4, 19.8, 18.2, 16.3, 17, 17.7, 15.5, 14.7, 15.8, 
>19.9, 20.4, 23.3), december = c(19.8, 27, 21, 33, 22.6, 28.3, 
>21.1, 19, 17.3, 27, 30.2, 24.8, 17.9, 17.9, 20.7, 30.9, 36.2, 
>21, 20.2, 21.3), norm = c("45.8713463281901", "24.047250681782984", 
>"3.7533684144746324", "38.594241119279324", "26.391897460120358", 
>"61.746470001194638", "6.8321020448487992", "11.933109250115226", 
>"51.951891096493924", "37.424611852237945", "5.1587836676942374", 
>"36.552835044409434", "31.781209673851027", "29.09146215582853", 
>"4.856812959269508", "5.3982910143166514", "46.553976273304215", 
>"17.566272518985429", "20.552451905814117", "61.894775704479279"
>)), .Names = c("januari", "februari", "maart", "april", "mei", 
>"juni", "juli", "augustus", "september", "oktober", "november", 
>"december", "norm"), row.names = c(NA, 20L), class = "data.frame") 
>
>I transform my dataset with the following script: 
>
>y <- melt(dataset,na.rm=TRUE) 
>variable <- y[,1] 
>value <- y[,2] 
>
>and can then perform a levene test as follows: 
>
>LEVENE <- leveneTest(value~variable,y) 
>
>When the dataset is small, let's say fewer than 100 values per column,
>everything works great. I get the message:
>
>No id variables; using all as measure variables
>
>When the dataset is much bigger I get the following message:
>
>Using norm as id variables
>
>Why does this function pick norm as the id variable? And how can I tell R
>that each column title is my variable?
>
>  
>Met vriendelijke groeten - With kind regards, 
>
>Joachim Audenaert 
>onderzoeker gewasbescherming - crop protection researcher
>
>PCS | proefcentrum voor sierteelt - ornamental plant research 
>
>
>Schaessestraat 18, 9070 Destelbergen, België
>T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95
>E: joachim.audenaert at pcsierteelt.be | W: www.pcsierteelt.be 
>
>
>
>From:        PIKAL Petr <petr.pikal at precheza.cz> 
>To:        Joachim Audenaert <Joachim.Audenaert at pcsierteelt.be>,
>           "r-help at r-project.org" <r-help at r-project.org>
>Date:        16/04/2015 12:13 
>Subject:        RE: [R]  melt function chooses wrong id variable with 
>large datasets 
>
>
>
>
>Hi
>
>There is something odd about your data and the melt function.
>
>AFAIK melt does not use the first row as id variables.
>
>What is the result of
>
>str(dataset)
>
>Instead of
>
>melt(dataset,id.vars=dataset[1,], na.rm=TRUE)
>
>melt expects something like
>
>melt(dataset, id.vars=c("norm", "jaar"), na.rm=TRUE)
>
>If you want a more specific answer, you should show us part of your data,
>preferably by copying the output of
>
>dput(dataset[1:20,])
>
>into your mail.
>
>Cheers
>Petr
>
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of
>> Joachim Audenaert
>> Sent: Thursday, April 16, 2015 11:37 AM
>> To: r-help at r-project.org
>> Subject: [R] melt function chooses wrong id variable with large
>> datasets
>>
>> Hello all,
>>
>> I'm using a large dataset consisting of 2 groups of data: 2 columns in
>> Excel, each with a header (group name) and 15 000 rows of data. I would
>> like to compare these data, so I transform my dataset with the melt
>> function to get 1 column of data and 1 column of ID variables; then I
>> can apply different statistical tests. With small datasets this works
>> great: the melt function automatically chooses the name in row 1 as the
>> ID variable and melts the data, giving me a matrix with all ID
>> variables in column 1 and the corresponding data in column 2.
>> With this big dataset, however, it chooses the whole first column as ID
>> variables instead of the first row. Is there a reason why this happens,
>> and how can I make sure the first row is chosen as the ID variable and
>> the lower rows as data?
>>
>> If I specify that I want the first row to be the id variable, I also
>> get an error.
>>
>> melt(dataset,id.vars=dataset[1,], na.rm=TRUE)
>>
>> Error: id variables not found in data: norm, jaar
>>
>> Are there alternative ways to create a good reshaped dataset?
>>
>> Met vriendelijke groeten - With kind regards,
>>
>> Joachim Audenaert
>> onderzoeker gewasbescherming - crop protection researcher
>>
>> PCS | proefcentrum voor sierteelt - ornamental plant research
>>
>> Schaessestraat 18, 9070 Destelbergen, België
>> T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95
>> E: joachim.audenaert at pcsierteelt.be | W: www.pcsierteelt.be
>>
>> Have you already applied for your individual fertilization guidance
>> (CVBB)? | PCS on LinkedIn | Disclaimer | Please consider the
>> environment before printing. Think green, keep it on the screen!
>>       [[alternative HTML version deleted]]
>
>
>This e-mail and any documents attached to it may be confidential and
>are 
>intended only for its intended recipients.
>If you received this e-mail by mistake, please immediately inform its 
>sender. Delete the contents of this e-mail with all attachments and its
>
>copies from your system.
>If you are not the intended recipient of this e-mail, you are not 
>authorized to use, disseminate, copy or disclose this e-mail in any 
>manner.
>The sender of this e-mail shall not be liable for any possible damage 
>caused by modifications of the e-mail or by delay with transfer of the 
>email.
>
>In case that this e-mail forms part of business dealings:
>- the sender reserves the right to end negotiations about entering into
>a 
>contract in any time, for any reason, and without stating any
>reasoning.
>- if the e-mail contains an offer, the recipient is entitled to 
>immediately accept such offer; The sender of this e-mail (offer)
>excludes 
>any acceptance of the offer on the part of the recipient containing any
>
>amendment or variation.
>- the sender insists on that the respective contract is concluded only 
>upon an express mutual agreement on all its aspects.
>- the sender of this e-mail informs that he/she is not authorized to
>enter
>into any contracts on behalf of the company except for cases in which 
>he/she is expressly authorized to do so in writing, and such
>authorization
>or power of attorney is submitted to the recipient or the person 
>represented by the recipient, or the existence of such authorization is
>
>known to the recipient of the person represented by the recipient.
>
>
>
>
>
>
>------------------------------------------------------------------------
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.