MPI run does not stop using Torque

Parallelization with OpenMP and MPI, scalability, reproducibility, errors, problems, suggestions
drvcj
Posts: 16
Joined: Fri Jan 24, 2014 8:40 am
Location: North Carolina, US

Post by drvcj » Wed Oct 18, 2017 3:23 am

Our cluster uses Torque/Maui for workload management, but I'm not sure whether this is relevant. The issue is that a Serpent MPI calculation does not stop automatically on completion. It just keeps running without producing any new results or updating the output files.

I'm running Serpent 2.1.29 and the MPI executable was compiled using MPICH.

Has anyone else seen this before?

Jaakko Leppänen
Site Admin
Posts: 2377
Joined: Thu Mar 18, 2010 10:43 pm
Location: Espoo, Finland

Re: MPI run does not stop using Torque

Post by Jaakko Leppänen » Wed Oct 18, 2017 9:16 am

What is the last thing that the code prints out?
- Jaakko

Post by drvcj » Sat Oct 21, 2017 7:00 am

In the .out file, it is:

Isotopic composition (non-zero densities):

-------------------------------------------------------------------
Nuclide     a. weight   temp    a. dens      a. frac      m. frac
-------------------------------------------------------------------
6012.12c    11.99999   1200.0  3.13749E-05  3.54948E-04  8.00000E-05
......

In the _res.m file:
% Delayed neutron parameters (Meulekamp method):

BETA_EFF (idx, [1: 14]) = [ 6.98240E-03 0.00799 2.03562E-04 0.04540 1.07096E-03 0.02025 1.04737E-03 0.02181 3.18216E-03 0.01214 1.09692E-03 0.02026 3.81435E-04 0.03572 ];
LAMBDA (idx, [1: 14]) = [ 8.69875E-01 0.01924 1.24908E-02 3.0E-06 3.15473E-02 0.00039 1.10531E-01 0.00046 3.21989E-01 0.00035 1.34274E+00 0.00025 8.97243E+00 0.00224 ];

In the _sens.m file:
ADJ_PERT_KEFF_SENS_E_INT = reshape(ADJ_PERT_KEFF_SENS_E_INT, [2, SENS_N_PERT, SENS_N_ZAI, SENS_N_MAT]);
ADJ_PERT_KEFF_SENS_E_INT = permute(ADJ_PERT_KEFF_SENS_E_INT, [4, 3, 2, 1]);

BTW, I'm running a sensitivity calculation. It has been hanging there for a day. Thank you very much for your help.

Post by Jaakko Leppänen » Sat Oct 21, 2017 4:53 pm

What about the run-time log?
- Jaakko

Post by drvcj » Sat Oct 21, 2017 6:36 pm

Hi,

Here is the run-time log.

------------------------------------------------------------

Serpent 2.1.29 -- Static criticality source simulation

Title: "DLFR-Core"

Active cycle  500 / 500  Source neutrons :  7981

Running time :                  6:30:02
Estimated running time :        6:30:02
Estimated running time left :   0:00:00

Estimated relative CPU usage :    99.5%

k-eff (analog)    = 1.03990 +/- 0.00066  [1.03861  1.04119]
k-eff (implicit)  = 1.04034 +/- 0.00027  [1.03981  1.04088]

(O4) (SENS) (MPI=1) (OMP=1)
------------------------------------------------------------

Transport cycle completed in 4.44 hours.

Note that there is another "completion" section 206 lines above the bottom of the log:

------------------------------------------------------------

Serpent 2.1.29 -- Static criticality source simulation

Title: "DLFR-Core"

Active cycle  500 / 500  Source neutrons :  7981

Running time :                  6:05:09
Estimated running time :        6:05:09
Estimated running time left :   0:00:00

Estimated relative CPU usage :    98.6%

k-eff (analog)    = 1.03990 +/- 0.00066  [1.03861  1.04119]
k-eff (implicit)  = 1.04034 +/- 0.00027  [1.03981  1.04088]

(O4) (SENS) (MPI=1) (OMP=1)
------------------------------------------------------------

Transport cycle completed in 4.35 hours.

Post by Jaakko Leppänen » Tue Oct 24, 2017 12:56 pm

The "MPI=1" suggests that you may be running multiple independent calculations instead of a single MPI-parallelized calculation. See: http://serpent.vtt.fi/mediawiki/index.p ... t_MPI_mode
- Jaakko
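
The check described in the reply above can be done on the run-time log itself. The following is a minimal sketch, assuming the log contains banner lines of the form `(O4) (SENS) (MPI=1) (OMP=1)` and one `Transport cycle completed` line per finished run, as in the excerpts posted earlier. The function name `check_mpi_banner` is made up for this illustration and is not part of Serpent.

```shell
#!/bin/sh
# Hypothetical helper: report the parallel-mode flags found in a Serpent
# run-time log, plus the number of completed transport cycles.
# Usage: check_mpi_banner LOGFILE

check_mpi_banner() {
    log="$1"
    # Banner lines look like "(O4) (SENS) (MPI=1) (OMP=1)"; keep the last one.
    banner=$(grep -E '\(MPI=[0-9]+\)' "$log" | tail -n 1)
    # Extract the numbers that follow MPI= and OMP= in that banner.
    mpi=$(printf '%s\n' "$banner" | sed -n 's/.*(MPI=\([0-9][0-9]*\)).*/\1/p')
    omp=$(printf '%s\n' "$banner" | sed -n 's/.*(OMP=\([0-9][0-9]*\)).*/\1/p')
    # Count the "Transport cycle completed" lines written to the log.
    done_count=$(grep -c 'Transport cycle completed' "$log")
    # Print "?" for any value that could not be found.
    echo "MPI=${mpi:-?} OMP=${omp:-?} completed=${done_count}"
}
```

For the two excerpts quoted in this thread it would print `MPI=1 OMP=1 completed=2`. An MPI value of 1 together with more than one completion line is consistent with several independent copies writing to the same log, rather than one calculation split across N tasks.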

Post by drvcj » Wed Oct 25, 2017 3:23 am

Thanks for the explanation.

I compiled Serpent 2 with the following options:

# GNU Compiler:

CC       = gcc
CFLAGS   = -Wall -ansi -ffast-math -O3
LDFLAGS  = -lm

# Parallel calculation using Open MP:

CFLAGS  += -DOPEN_MP
CFLAGS  += -fopenmp
LDFLAGS += -fopenmp

# This is needed in newer gcc versions to suppress some unnecessary warnings

CFLAGS += -Wno-unused-but-set-variable

# Remove this if compilation with mpicc produces unnecessary warnings

CFLAGS += -pedantic

At the same time, I also enabled the MPI section:

# Parallel calculation using MPI:

# NOTE: The use of hybrid MPI/OpenMP mode requires thread-safe MPI
#       implementation. Some MPI implementations, such as some versions (?) of
#       Open MPI are not thread safe, which will cause problems in memory
#       management routines (calloc, realloc and free). These problems may
#       result in failure in memory allocation or unexpected behaviour due to
#       corrupted registers (?).

CC       = mpicc
CFLAGS  += -DMPI

The calculation is then run with "sss2 -mpi N". To be honest, I'm not sure whether MPI is what messes things up. I'm going to run a case with "sss2 -omp N" instead.
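
One thing that may be worth ruling out before blaming MPI itself is a mismatch between the launcher found on the compute nodes and the MPICH library the executable was built with. Launching a binary with a launcher from a different MPI implementation typically starts N separate single-task processes. That this is the cause here is only an assumption and is not confirmed anywhere in this thread. The sketch below just prints which tools are on the PATH and which MPI libraries the binary links against. The function name `report_mpi_toolchain` and the path passed to it are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: show which MPI launcher and compiler wrapper are on
# the PATH, and which MPI shared libraries an executable is linked against.
# Usage: report_mpi_toolchain /path/to/sss2   (the path is a placeholder)

report_mpi_toolchain() {
    bin="$1"
    for tool in mpirun mpiexec mpicc; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: $path"
        else
            echo "$tool: not found in PATH"
        fi
    done
    # ldd lists shared-library dependencies; filter for anything MPI-related.
    if [ -x "$bin" ] && command -v ldd >/dev/null 2>&1; then
        ldd "$bin" | grep -i 'mpi' || echo "$bin: no MPI library in ldd output"
    else
        echo "$bin: not an executable file, skipping ldd"
    fi
}
```

Running it inside a Torque job rather than on the login node would show the environment the calculation actually sees. If the reported `mpirun` and the linked library come from different MPI installations, that would be a plausible lead to follow up.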