[R] Joining two datasets - recursive procedure?

Luca Meyer lucam1968 at gmail.com
Wed Mar 18 17:05:37 CET 2015


Thanks for your input, Michael,

The continuous variable I have measures quantities (down to the third
decimal place), so unfortunately the values are not frequencies.

Any more specific suggestions on how that could be tackled?

Thanks & kind regards,

Luca


===

Michael Friendly wrote:
I'm not sure I understand completely what you want to do, but
if the data were frequencies, it sounds like a task for fitting a
loglinear model with the model formula

~ V1*V2 + V3
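
A minimal sketch of what that model implies, assuming a complete three-way table were available. The array `toy` and its values are made up for illustration; `loglin()` is the base R (stats) fitter. For this particular model the fitted values have a closed form, the V1 x V2 margin times the V3 margin divided by the grand total, so only the two margins are needed to compute them.

```r
# Made-up 2 x 3 x 3 table of non-negative quantities (illustration only)
set.seed(1)
toy <- array(round(runif(18, min = 0.1, max = 5), 3), dim = c(2, 3, 3),
             dimnames = list(V1 = c("A", "B"),
                             V2 = c("A", "B", "C"),
                             V3 = c("A", "B", "C")))

# Fit ~ V1*V2 + V3: keep the V1 x V2 margin and the V3 margin
fit <- loglin(toy, margin = list(c(1, 2), 3), fit = TRUE,
              eps = 1e-8, iter = 50, print = FALSE)

# Closed form for this model: (V1 x V2 margin) * (V3 margin) / grand total
n12 <- apply(toy, c(1, 2), sum)
n3  <- apply(toy, 3, sum)
closed <- outer(n12, n3) / sum(toy)

# The two should agree up to numerical tolerance
all.equal(as.vector(fit$fit), as.vector(closed))
```

`loglin()` is designed for counts, so with zero or negative cells the closed form is probably the safer thing to compute directly.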

On 3/18/2015 2:17 AM, Luca Meyer wrote:
> Hello,
>
> I am facing a quite challenging task (at least to me) and I was wondering
> if someone could advise how R could assist me to speed the task up.
>
> I am dealing with a dataset with 3 discrete variables and one continuous
> variable. The discrete variables are:
>
> V1: 8 modalities
> V2: 13 modalities
> V3: 13 modalities
>
> The continuous variable V4 is a decimal number always greater than zero in
> the marginals of each of the 3 variables but it is sometimes equal to zero
> (and sometimes negative) in the joint tables.
>
> I have got 2 files:
>
> => one with distribution of all possible combinations of V1xV2 (some of
> which are zero or negative) and
> => one with the marginal distribution of V3.
>
> I am trying to build the long and narrow dataset V1xV2xV3 in such a way
> that each V1xV2 cell does not get modified and V3 fits as closely as
> possible to its marginal distribution. Does it make sense?
>
> To be even more specific, my 2 input files look like the following.
>
> FILE 1
> V1,V2,V4
> A, A, 24.251
> A, B, 1.065
> (...)
> B, C, 0.294
> B, D, 2.731
> (...)
> H, L, 0.345
> H, M, 0.000
>
> FILE 2
> V3, V4
> A, 1.575
> B, 4.294
> C, 10.044
> (...)
> L, 5.123
> M, 3.334
>
> What I need to achieve is a file such as the following
>
> FILE 3
> V1, V2, V3, V4
> A, A, A, ???
> A, A, B, ???
> (...)
> D, D, E, ???
> D, D, F, ???
> (...)
> H, M, L, ???
> H, M, M, ???
>
> Please notice that FILE 3 needs to be such that if I aggregate on V1+V2 I
> recover exactly FILE 1 and that if I aggregate on V3 I can recover a file
> as close as possible to FILE 2 (ideally the same file).
>
> Can anyone suggest how I could do that with R?
>
> Thank you very much indeed for any assistance you are able to provide.
>
> Kind regards,
>
> Luca
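
One possible way to build a file with the FILE 3 layout, sketched under assumptions: each V1 x V2 value is split across the V3 levels in proportion to the V3 marginal. The data frames `file1` and `file2` below are small made-up stand-ins for the real files, and the names `share`, `back1`, `back2` and `chk` are illustrative. This is only one of many tables that satisfy the constraints; it reproduces FILE 1 exactly, and reproduces FILE 2 exactly only when the two files have the same grand total (otherwise the V3 margin comes out rescaled by the ratio of the totals).

```r
# Made-up stand-ins for FILE 1 and FILE 2 (both sum to 25)
file1 <- data.frame(
  V1 = c("A", "A", "B", "B"),
  V2 = c("A", "B", "A", "B"),
  V4 = c(24.251, 1.065, 0.000, -0.316)
)
file2 <- data.frame(V3 = c("A", "B", "C"), V4 = c(5.000, 7.500, 12.500))

# Share of the total held by each V3 level
file2$share <- file2$V4 / sum(file2$V4)

# Cross join: every V1 x V2 cell paired with every V3 level
file3 <- merge(file1, file2[, c("V3", "share")], by = NULL)

# Split each V1 x V2 value proportionally across V3
file3$V4 <- file3$V4 * file3$share
file3 <- file3[order(file3$V1, file3$V2, file3$V3), c("V1", "V2", "V3", "V4")]

# Check 1: aggregating over V3 gives back FILE 1
back1 <- aggregate(V4 ~ V1 + V2, data = file3, FUN = sum)
chk <- merge(file1, back1, by = c("V1", "V2"), suffixes = c(".in", ".out"))
all.equal(chk$V4.in, chk$V4.out)

# Check 2: aggregating over V1 and V2 gives the V3 margin
back2 <- aggregate(V4 ~ V3, data = file3, FUN = sum)
back2
```

Zero cells stay zero for every V3 level, and negative cells are split into negative pieces, so the sums still match even though the individual entries may not be meaningful as quantities.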
