MPI vs OMP

Posted: Thu Mar 01, 2012 9:31 pm
by orca.blu
Some running-time results attached.
I am trying quite long burnup calculations with Serpent 2.
With only 2 burnable zones, opt 3 seems the best choice.
I have 32 cores and 60 GB of memory (i.e., no RAM limitation).
I tried different combinations of MPI tasks and OMP threads, looking for the best one.
I thought that [32 MPI_TASKS * 1 OMP_THREADS] would be the best solution by far.
This is not the case. Maybe the reason is that with many MPI_TASKS a lot of time is wasted
waiting for all the tasks to finish after each transport cycle?
It seems that in my case the best option is to have a similar number of MPI_TASKS and OMP_THREADS
(e.g., 8 MPI_TASKS * 4 OMP_THREADS).
Up to 8 OMP_THREADS, transport_cpu_usage is close to 1.
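For reference, runs like these can be launched as follows (a sketch; the `sss2` executable name and the `-omp` flag follow common Serpent 2 usage, but check the options of your installation and MPI launcher):

```shell
# Pure MPI: 32 tasks, 1 thread each
mpirun -np 32 sss2 input

# Hybrid: 8 MPI tasks x 4 OpenMP threads each
mpirun -np 8 sss2 -omp 4 input

# Pure OpenMP: 1 task, 32 threads
sss2 -omp 32 input
```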
Attachments: RUNNING_TIME.png, CPU_USAGE.png, MEMSIZE.png
45 burnup steps
45x2 (predictor-corrector) transport cycles
5.5e6 total neutrons per cycle
dots: 10000 x (500 + 50 skip)
solid lines: 25000 x (200 + 20 skip)
"LELI" [10 10]
xscalc 2
no ures
case: molten salt fast reactor

The [32 MPI_TASKS * 1 OMP_THREADS] calculations were performed after recompiling the code
without the OpenMP parallelization options.

Re: MPI vs OMP

Posted: Fri Mar 02, 2012 1:02 am
by Jaakko Leppänen
Thank you for the results!

Like you said, with only 2 burnable material regions optimization mode 3 is probably the best choice. Since the parallelization of burnup and processing routines is (mostly) handled by dividing the materials between MPI tasks and OpenMP threads, the speed-up factor for those routines cannot be much higher than 2. What kind of values did you get for processing time (PROCESS_TIME)?
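A back-of-the-envelope sketch of that limit (my own illustration, not Serpent code): if the materials are divided among P workers, the slowest worker processes ceil(N_mat / P) of them, so the speed-up of the material-wise routines is capped at N_mat:

```python
import math

def material_speedup(n_mat: int, n_workers: int) -> float:
    """Speed-up of routines parallelized by dividing n_mat materials
    among n_workers; limited by the worker holding the most materials."""
    return n_mat / math.ceil(n_mat / n_workers)

# With only 2 burnable materials, workers beyond 2 gain nothing:
for p in (1, 2, 4, 32):
    print(p, material_speedup(2, p))  # -> 1.0, then 2.0 for every p >= 2
```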

A lot of data is passed between the MPI tasks, especially during the processing and burnup routines, and the overhead should be proportional to the number of tasks. In this case it doesn't seem to have a major impact. Since MPI reproducibility is switched off, the transport routine doesn't involve any communication between the tasks until the end.

The moderate performance of MPI parallelization might result from the fact that some tasks are slower than others, and the calculation has to wait for the slowest to complete. Another factor is that since the population size is divided by the number of tasks, some output routines become relatively more time-consuming. This is especially the case for the mesh plotter. Do you have any large mesh plots in the calculation?
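The population-size effect can be made concrete with a toy cost model (hypothetical numbers, purely illustrative): if the transport work divides among the tasks but some per-task output work, such as mesh plotting, does not, parallel efficiency drops as the number of tasks grows:

```python
def efficiency(n_tasks: int, transport: float = 1000.0, fixed: float = 10.0) -> float:
    """Toy model: transport time divides among tasks, output time does not."""
    serial = transport + fixed
    per_task = transport / n_tasks + fixed
    return serial / (n_tasks * per_task)

for p in (1, 4, 16, 32):
    print(p, round(efficiency(p), 3))  # efficiency falls as p grows
```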

Did you run the calculations with or without the debug option, and have you compared the running times to a single-CPU calculation?

Re: MPI vs OMP

Posted: Fri Mar 02, 2012 2:51 am
by orca.blu
I can't access my data now.
As far as I remember, I always got PROCESS_TIME on the order of a few minutes (X.XXXE+00).
(Not sure.)

The code was compiled without the DEBUG or GD options.

In the attached graph of average CPU utilization, for the 32 MPI * 1 OMP calculation,
it seems as if all the CPUs are continuously working at 100%...
so... what are they doing? :)
Attachment: cpu.jpg (CPU utilization)
Anyway, I think that Serpent has very good parallelization capabilities,
at least for my needs (I know that 2 burnable materials are not representative).
I'll try opt 4 with as many MPI_TASKS as I can next week.

Re: MPI vs OMP

Posted: Fri Mar 02, 2012 11:04 am
by Jaakko Leppänen
The CPU utilization could be explained by the fact that all CPUs are constantly working in MPI mode, except when they are waiting for the others. In OpenMP parallelization only parts of the calculation are parallelized, and the rest is handled by a single CPU.
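That single-CPU portion is just the serial fraction in Amdahl's law; a quick generic sketch (not specific to Serpent, and the 90% figure below is an arbitrary assumption):

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Amdahl's law: overall speed-up when only part of the work is parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)

# If 90% of the run is inside OpenMP-parallel regions, 32 threads give
# less than an 8x speed-up; the idle time shows up as low CPU utilization.
print(round(amdahl_speedup(0.90, 32), 2))  # -> 7.8
```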

Re: MPI vs OMP

Posted: Fri Mar 02, 2012 2:25 pm
by orca.blu
OK, thank you.

PROCESS_TIME(last idx,1) is always less than 3 min in every case except
[32 MPI * 1 OMP with the executable compiled with OpenMP parallelization ON]
(not presented in the previous post).

In that case, the total PROCESS_TIME increases considerably (from 2.5 min to 36 min).

Re: MPI vs OMP

Posted: Fri Mar 02, 2012 2:48 pm
by Jaakko Leppänen
As a last-minute addition to update 2.1.3 I added a timer for MPI routines (MPI_OVERHEAD_TIME in the _res.m output). I'm not 100% sure this measures the true overhead from MPI parallelization, but it should give some estimate of the time spent on communication between the tasks and on waiting for the slowest task to complete. To be precise, the waiting time is actually calculated relative to task 0, so if the other tasks finish earlier, the waiting time is also zero.

Re: MPI vs OMP

Posted: Fri Mar 02, 2012 8:35 pm
by orca.blu
I tried a shorter burnup case (5x2 steps) with the latest sss2 version.

MPI_OVERHEAD_TIME always seems quite low.
Attachment: TIMES.png