[R] Possible causes of unexpected behavior

Fri Mar 4 10:21:05 CET 2022

Dear Eric,

Thank you for the response. Yes, I can confirm that; please see the behavior
below.
For #1, the results are identical. For #2, they are not identical but very
close. For #3, they are completely different.

Best regards,
Arthur

--

For #1,
- qsub execution:
[1] "ll: 565.7251"
[1] "norm gr @ minimum: 2.96967368608131e-08"

- manual check:
f(v*): 565.7251
gradient norm at v*: 2.969674e-08

#
For #2,

- qsub execution:
[1] "ll: 14380.8308"
[1] "norm gr @ minimum: 0.0140857561408041"

- manual check:
f(v*): 14380.84
gradient norm at v*: 0.01404779

#
For #3,

- qsub execution:
[1] "ll: 14310.6812"
[1] "norm gr @ minimum: 6232158.38877002"

- manual check:
f(v*): 97604.69
gradient norm at v*: 6266696595
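
To put rough numbers on "identical", "very close" and "completely different", the values above can be compared as relative differences. This is only a minimal sketch and was not part of the original runs: `rel_diff` is a hypothetical helper, and the inputs are copied from the output above at the precision it was printed with, so tiny differences may just be rounding.

```r
# Values copied from the output above (qsub run vs. manual check).
checks <- data.frame(
  case      = c("#1", "#2", "#3"),
  ll_qsub   = c(565.7251, 14380.8308, 14310.6812),
  ll_manual = c(565.7251, 14380.84, 97604.69),
  gr_qsub   = c(2.96967368608131e-08, 0.0140857561408041, 6232158.38877002),
  gr_manual = c(2.969674e-08, 0.01404779, 6266696595)
)

# Hypothetical helper: difference relative to the qsub value.
rel_diff <- function(a, b) abs(a - b) / abs(a)

checks$ll_rel <- rel_diff(checks$ll_qsub, checks$ll_manual)
checks$gr_rel <- rel_diff(checks$gr_qsub, checks$gr_manual)

# One row per case; small values mean the manual check agrees with the job.
print(checks[, c("case", "ll_rel", "gr_rel")], digits = 3)
```

With these inputs the relative difference in the log-likelihood is zero for #1, below 1e-6 for #2, and larger than 5 for #3, which is consistent with the summary at the top of this message.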

On Fri, Mar 4, 2022 at 09:48, Eric Berger <ericjberger using gmail.com>
wrote:

> Please confirm that when you do the manual load and check whether f(v*)
> matches the result from the qsub job, the check succeeds for cases #1 and
> #2 and fails only for #3.
>
>
> On Fri, Mar 4, 2022 at 10:06 AM Arthur Fendrich <arthfen using gmail.com> wrote:
>
>> Dear all,
>>
>> I am currently having a weird problem with a large-scale optimization
>> routine. It would be nice to know if any of you have already gone through
>> something similar, and how you solved it.
>>
>> I apologize in advance for not providing an example, but I think the
>> non-reproducibility of the error is maybe a key point of this problem.
>>
>> Simplest possible description of the problem: I have two functions: g(X)
>> and f(v).
>> g(X) does:
>>  i) takes a large matrix X as input;
>>  ii) derives four other matrices from X (I'll call them A, B, C and D),
>> then saves them to disk for debugging purposes.
>>
>> Then, f(v) does:
>>  iii) loads A, B, C and D from disk;
>>  iv) calculates the log-likelihood, which varies according to a vector of
>> parameters, v.
>>
>> My goal application is quite big (X is a 40000x40000 matrix), so I created
>> the following versions to test and run the codes/math/parallelization:
>> #1) A simulated example with X being 100x100
>> #2) A degraded version of the goal application, with X being 4000x4000
>> #3) The goal application, with X being 40000x40000
>>
>> When I use qsub to submit the job, using the exact same code and processing
>> cluster, #1 and #2 run flawlessly, so no problem. These results tell me
>> that the codes/math/parallelization are fine.
>>
>> For application #3, it converges to a vector v*. However, when I manually
>> load A, B, C and D from disk and calculate f(v*), then the value I get is
>> completely different.
>> For example:
>> - qsub job says v* = c(0, 1, 2, 3) is a minimum with f(v*) = 1.
>> - when I manually load A, B, C, D from disk and calculate f(v*) on the
>> exact same machine with the same libraries and environment variables, I
>> get f(v*) = 1000.
>>
>> This is a very confusing behavior. In theory the size of X should not
>> affect my problem, but it seems that things get unstable as the dimension
>> grows. The main issue for debugging is that g(X) for simulation #3 takes
>> two hours to run, and I am completely lost on how I could find the causes
>> of the problem. Do you have any general advice?
>>
>> Thank you very much in advance for literally any suggestions you might
>> have!
>>
>> Best regards,
>> Arthur
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
