Memory allocation failed in burnup restart

Report all good and bad behavior here
bilodid
Posts: 36
Joined: Wed May 22, 2013 11:07 am

Memory allocation failed in burnup restart

Post by bilodid » Mon Jun 28, 2021 1:57 pm

Hi!

I ran a burnup calculation with DD using 4 MPI tasks and got *.wkr_dd[1-4] files.
I am trying to continue that calculation with

Code: Select all

set rfr continue
Exactly the same MPI/OpenMP job partitioning, same hardware.
Serpent reads the composition files and runs the transport calculation, but crashes with an error immediately after:

Code: Select all

Serpent 2.1.32 -- Static criticality source simulation

Title: "X2 VVER-1000 full core 3D"

Transport calculation: step = 8 / 31 (predictor)
                       BU   = 1.07 MWd/kgU
                       time = 45.3 days

Active cycle  500 / 500  Source neutrons : 26107

Running time :                  0:18:39
Estimated running time :        0:18:39   0:21:31
Estimated running time left :   0:00:00   0:02:51

Estimated relative CPU usage :  1561.3%

k-eff (analog)    = 1.02857 +/- 0.00026  [1.02806  1.02909]
k-eff (collision) = 1.02852 +/- 0.00025  [1.02804  1.02900]

(O1) (MPI=4) (OMP=16) (DD) (CE/LI)
------------------------------------------------------------

Transport cycle completed in 17.3 minutes.

Waiting for results from other MPI tasks...
OK.

Calculating activities...
OK.

Writing depletion output...

***** Mon Jun 28 12:44:22 2021:

 - MPI task         = 0
 - OpenMP thread    = 0
 - RNG parent seed  = 1624875928221
 - RNG history seed = 2961732423565118109
 - Host name        = reac007.cluster

Fatal error in function Mem:

Memory allocation failed (calloc, 4607514446148020575, 72, 4355.25)

Simulation aborted.


Ville Valtavirta
Posts: 541
Joined: Fri Sep 07, 2012 1:43 pm

Re: Memory allocation failed in burnup restart

Post by Ville Valtavirta » Mon Jun 28, 2021 2:40 pm

Hi Yuri,

based on a quick source code check, I imagine this is related to writing the <input>_dep.m file. Because of the way the dep.m file is structured (and the way Serpent writes it), Serpent tries to load all of the nuclide data for a specific material (all nuclides from all burnup steps) into memory at the same time so that it can write the composition matrix. This probably makes your node run out of memory and crashes the calculation.

If you can live without <input>_dep.m files (at least temporarily), you could simply add a "return;" statement at the beginning of printdepoutput.c to skip writing the _dep.m file. You would still get the binary .dep and .wrk files, which can be used for restarting later on (and from which material-wise nuclide compositions could be parsed with some work).
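For reference, the patch is just an early return at the top of the function. A sketch only: the function name is taken from the debug print-out later in this thread, and the abbreviated body is mine:

Code: Select all

      void PrintDepOutput()
      {
        /* Temporary workaround: skip writing the <input>_dep.m file. */
        /* The binary .dep and .wrk restart files are produced by     */
        /* separate routines, so restarts remain possible.            */
        return;

        /* ... original body of the function, now unreachable ... */
      }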

This would probably allow you to run your burnup calculation to its end and then, once you have the binary restart for the whole history, either find a way to parse the binary data to another format or load it in Serpent for printing the _dep.m file.

-Ville

bilodid
Posts: 36
Joined: Wed May 22, 2013 11:07 am

Re: Memory allocation failed in burnup restart

Post by bilodid » Mon Jun 28, 2021 2:46 pm

Thanks, I'll try it now.
By the way, in

Code: Select all

Memory allocation failed (calloc, 4607514446148020575, 72, 4355.25)
is the first number the amount of memory it tried to allocate, in bytes?
What are the other numbers?

Antti Rintala
Posts: 85
Joined: Tue Jul 03, 2012 1:25 pm

Re: Memory allocation failed in burnup restart

Post by Antti Rintala » Mon Jun 28, 2021 3:03 pm

I'd say setting STEP to 2 in set depout should also work.
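If I read that right, it would be a single line in the input. The parameter order here is my assumption; please check the set depout entry in the input manual:

Code: Select all

      set depout 2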

Ville Valtavirta
Posts: 541
Joined: Fri Sep 07, 2012 1:43 pm

Re: Memory allocation failed in burnup restart

Post by Ville Valtavirta » Mon Jun 28, 2021 3:20 pm

For the values,

let's see. The first is the number of elements, which in your case is nonsensically large. The second is the size of each element in bytes; for you this is 72 bytes = 9 doubles or longs per element, which suggests that the actual failure happens at line 132 of printdepoutput.c:

Code: Select all

      /* Allocate memory for nuclide data */

      nuc = (struct depnuc *)Mem(MEM_ALLOC, nnuc + 1, sizeof(struct depnuc));
Size of the depnuc struct seems to be 2 longs and 7 doubles, which sums to 72 bytes.

The third value is the current size of allocated memory in megabytes.

So it seems that for you, nnuc+1 is 4607514446148020575, which probably points to some kind of error when Serpent reads the binary depletion file.

-Ville

Ana Jambrina
Posts: 683
Joined: Tue May 26, 2020 5:32 pm

Re: Memory allocation failed in burnup restart

Post by Ana Jambrina » Mon Jun 28, 2021 3:24 pm

Not writing the Matlab-formatted '_dep.m' file will decrease the memory demand (as Antti pointed out). In addition, and the reason the option was added in the first place, it reduces computational time: IO in burnup calculations can easily take as much time as the simulation itself.
Check the CPU usage at node/process level to get a better idea of what is going on. If the crash arises from running out of memory, there is not much to do. It seems that up to a certain point Serpent was able to run the case and write/read the files without problems. Are you running in shared or exclusive node mode?
- Ana

Ville Valtavirta
Posts: 541
Joined: Fri Sep 07, 2012 1:43 pm

Re: Memory allocation failed in burnup restart

Post by Ville Valtavirta » Mon Jun 28, 2021 3:25 pm

If you compile with -DDEBUG, the insane value of nnuc should get caught in the CheckValue call (and maybe some other debug check could give you a better idea on what's going wrong).
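For reference, enabling the debug mode means uncommenting (or adding) the flag in the Makefile and then rebuilding from scratch with make clean; make. The exact CFLAGS line below is my recollection of the Makefile layout, so verify against your copy:

Code: Select all

      # in the Makefile: uncomment (or add) the debug flag
      CFLAGS += -DDEBUG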

-Ville

bilodid
Posts: 36
Joined: Wed May 22, 2013 11:07 am

Re: Memory allocation failed in burnup restart

Post by bilodid » Mon Jun 28, 2021 6:23 pm

Ville Valtavirta wrote:
Mon Jun 28, 2021 2:40 pm
a "return;" statement in the beginning of printdepoutput.c
That helped. I haven't tried any other options yet. I can send you my input so you can debug it yourself; the only problem is the ~3 GB of *.wkr_dd* files.

Ana Jambrina
Posts: 683
Joined: Tue May 26, 2020 5:32 pm

Re: Memory allocation failed in burnup restart

Post by Ana Jambrina » Mon Jun 28, 2021 6:26 pm

I have run some tests, including the input case you sent ("X2 VVER-1000 full core 3D"), writing/reading the restart file with different options (e.g., 'continue'), and I could not reproduce the error you were facing: no crash and no problems in writing or reading. However, it might be that something is off when depleting a single material.

Note: to run domain decomposition in mode 3, the number of MPI tasks has to be equal to or greater than 5. Otherwise, domain decomposition is executed in mode 2 (without the central zone).
- Ana

Ana Jambrina
Posts: 683
Joined: Tue May 26, 2020 5:32 pm

Re: Memory allocation failed in burnup restart

Post by Ana Jambrina » Sun Jul 04, 2021 7:58 pm

I tested the case from the beginning in DEBUG mode, writing and reading the restart file (with 2 MPI tasks; I did not have more processors available to check the case with the provided 4-domain division). It ran to completion with no errors. The depletion output is consistent between simulations.
It seems unlikely that the number of MPI tasks would make a difference. In any case, I will come back to this when I have more resources to test it.

Note: the only way I found to trigger a Serpent crash was using a different number of MPI tasks between the first and the second simulation: 4 MPI tasks in the former (using the provided files) and 2 MPI tasks in the latter.
  • DEBUG mode:

    Code: Select all

    Fatal error in function PrintDepOutput:
    
    Value 4.607515E+18 of parameter "nmat" above upper limit 1.000000E+07 
    
    NOTE: This value check was performed because the code was compiled in the
          debug mode and it may or may not be an indication of an actual problem.
          The debugger mode can be switched off by recompiling the source code
          without the -DDEBUG option (see Makefile).
    
    Simulation aborted.
    
    --------------------------------------------------------------------------
    Primary job  terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:
    
      Process name: [[52748,1],0]
      Exit code:    255
    --------------------------------------------------------------------------
    
    The error print-out corresponds to the check performed at line 131 of the 'printdepoutput.c' routine.
  • RELEASE mode:

    Code: Select all

    Fatal error in function Mem:
    
    Memory allocation failed (calloc, 4603034189072598458, 72, 4331.63)
    
    Simulation aborted.
    
    --------------------------------------------------------------------------
    Primary job  terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:
    
      Process name: [[65044,1],0]
      Exit code:    255
    --------------------------------------------------------------------------
    
- Ana
