
memory errors when increasing number of mpi tasks

Posted: Fri Jul 21, 2017 10:27 pm
by keckler
I am currently running a transport/depletion calculation in the coupled neutron/gamma mode with version 2.1.29, using 600,000 particles per cycle and 520 cycles. I need this many particles due to very poor fission source convergence.

When I run this calculation without MPI, using the same code compilation, it runs just fine, but it takes a very long time because there are so many particles.

When I begin using MPI and run with 2 nodes, the calculation also runs fine, but it is again rather slow.

Once I begin running with more MPI nodes, the code starts crashing after just a couple of minutes. Each time, the code exits with the error:

Code: Select all

slurmstepd: error: Exceeded step memory limit at some point.
slurmstepd: error: Exceeded job memory limit at some point.
The point in the calculation at which it fails varies, but it is always before the transport cycles begin. I have tried turning off the burnup calculation, and the same error still occurs.

Perhaps I have a fundamental misunderstanding of what is going on, but I cannot see why this should happen. Any advice would be very much appreciated.

Edit: I should also note that I have successfully run a similar job with 6 MPI nodes and 525,000 source particles without burnup. All other parameters should be the same.

Re: memory errors when increasing number of mpi tasks

Posted: Fri Jul 21, 2017 11:12 pm
by keckler
Further update:

I just noticed that one of my most recent jobs has some very strange content in its output file. Out of nowhere, the typical output block that I see during active transport cycles appears smack in the middle of the calculation initialization...
...

Warning message from function CheckNuclideData:

Stable nuclide 300700 has 1 decay modes

OK.

Adding all nuclides in inventory list...
OK.

Sharing input data to MPI tasks...
------------------------------------------------------------

Serpent 2.1.29 -- Static criticality source simulation

Input file: "BnB"

Transport calculation: step = 1 / 4
BU = 0.00 MWd/kgU
time = 0.00 days

Active cycle 105 / 520 Source neutrons : 300157

Running time : 17:49:40
Estimated running time : 85:07:54
Estimated running time left : 67:18:13

Estimated relative CPU usage : 100.0%

k-eff (analog) = 0.99413 +/- 0.00026 [0.99361 0.99464]
k-eff (implicit) = 0.99395 +/- 0.00013 [0.99371 0.99420]

(O4) (N/P) (MPI=2) (OMP=20) (CE)
------------------------------------------------------------
OK.

Generating unionize energy grids...

Adding points:

1001.06c -- Points added in neutron grid: 252
1001.12c -- Points added in neutron grid: 0
...
This seems very strange... The listed number of active cycles and the k-eff values look reasonable, but the job I am running has MPI=10, not MPI=2, and the number of source neutrons listed is about half of what I specified... Could this be output from a job that I tried (unsuccessfully) to run earlier in the day?

Re: memory errors when increasing number of mpi tasks

Posted: Sat Jul 22, 2017 11:17 am
by Jaakko Leppänen
Are you sure the unsuccessful job didn't start? You may have two jobs writing to the same files.

Re: memory errors when increasing number of mpi tasks

Posted: Wed Jul 26, 2017 4:57 pm
by keckler
Okay, for now please disregard my second post above. It was probably an anomaly, and you are probably correct that an older job was somehow still running and writing into the same output file. That was weird, but it is probably not the root of my issue.

After asking around and trying a few more things, I have found something of a work-around, although the underlying problem persists. Working with the IT staff who manage my computing cluster, we were able to identify the cause. When using the hybrid MPI/OpenMP mode (i.e., specifying both '-omp N' and '-mpi N' in the submission command), Serpent 2 spawns the correct number of processes and the job scheduler allocates the correct number of processors, but for whatever reason all of the processes end up on a single CPU. For instance, if I specify 'sss2 -omp 20 -mpi 2 inp', I end up with 40 processes all running on a single CPU while the other 39 processors sit idle.

So with just a couple of MPI tasks the job was slow, but it did not fail. When I increased the number of MPI tasks, too many processes were pushed onto that single CPU, it ran out of memory, and the job failed.

To get around this for the moment, I have found that the MPI tasks are distributed properly across multiple nodes if I instead run the job as 'mpirun sss2 inp' and allocate the proper number of nodes in my job scheduler script. I am not sure why this behaves differently, but for whatever reason it puts a single task on each processor.
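
In case it helps anyone, the kind of invocation I have in mind is sketched below. This is only a guess at a sensible hybrid setup assuming Open MPI's mpirun (flag names differ between MPI implementations), 'inp' is just a placeholder for the input file name, and the mapping/binding options are my own assumption about how to keep the OpenMP threads from piling onto one core, not something taken from the Serpent documentation:

Code: Select all

# Sketch only -- assumes Open MPI's mpirun; check 'mpirun --help' on your system.
# 'inp' is a placeholder input file name.

export OMP_NUM_THREADS=20            # OpenMP threads per MPI task

# One MPI task per node, no core binding, so each task's 20 OpenMP
# threads can spread over that node's cores instead of sharing one:
mpirun -np 2 --map-by node --bind-to none sss2 -omp 20 inp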

I suppose this issue may be similar to the one at: http://ttuki.vtt.fi/serpent/viewtopic.p ... 8&start=10

Re: memory errors when increasing number of mpi tasks

Posted: Thu Jul 27, 2017 11:07 am
by Jaakko Leppänen
I'm really not an MPI expert, but based on practical experience it seems that different systems behave differently. The "-mpi" command line option in Serpent actually calls mpirun (the executable path is defined with MPIRUN_PATH in header.h), and on some systems this default method simply fails.
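
For reference, the definition in header.h looks roughly like the sketch below; the path shown is only an example, and the actual default in your copy of the source may differ, so check it before recompiling:

Code: Select all

/* In header.h (illustrative sketch -- the default path in your copy of
   the source may differ): the executable invoked by the -mpi option   */

#define MPIRUN_PATH "/usr/bin/mpirun"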

Could you post your scheduler script in case other users encounter the same problem?

Re: memory errors when increasing number of mpi tasks

Posted: Thu Jul 27, 2017 4:44 pm
by keckler
Sure thing. The script that I am using for job submission with SLURM is below, with all the specifics replaced by variable names:

Code: Select all

#!/bin/sh
#SBATCH --job-name=$name
#SBATCH --output=$name.o
#SBATCH --error=$name.error
#SBATCH --partition=$partition
#SBATCH --time=$time
#SBATCH --nodes=$nnodes
#SBATCH --ntasks-per-node=$ntasks
#SBATCH --cpus-per-task=$ncpus
#SBATCH --qos=$qualityofservice
#SBATCH -A $account
#SBATCH --mail-user=$email
#SBATCH --mail-type=all

# run command
mpirun sss2 $name
I am not sure whether you need to specify all three of nnodes, ntasks, and ncpus, but this way you can control exactly how the work is split up among the processors.
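
As a concrete illustration (these values are my own guess at a sensible pure-MPI layout, not something I have benchmarked, and 'inp' is a placeholder input name), a run on 2 nodes with 20 MPI tasks per node would fill in the relevant lines like this:

Code: Select all

#SBATCH --nodes=2              # two physical nodes
#SBATCH --ntasks-per-node=20   # one MPI task per core (pure MPI, no -omp)
#SBATCH --cpus-per-task=1      # one CPU per MPI task

# run command
mpirun sss2 inp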