Test case 2: PWR full-core calculation

Set of standardized calculation cases for testing the parallel scalability of Serpent 2

Test case 2: PWR full-core calculation

Post by Jaakko Leppänen » Thu Dec 13, 2012 12:59 pm

The input file for the second case is found at:

http://virtual.vtt.fi/virtual/montecarl ... /PWR_CORE/

The case is taken from the Hoogenboom-Martin Monte Carlo performance benchmark, described at:

http://www.oecd-nea.org/dbprog/MonteCar ... chmark.htm

The challenge here is the large number of tallies -- six mesh plots, a full-core power distribution with over 6 million zones ("set cpd ...") and three detectors. What I expect to see here is good scalability if all tally calculation is switched off, but a saturation in performance after 5 or 6 CPUs when all tallies are calculated.
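
For a rough idea of where that saturation sets in, Amdahl's law gives an estimate (the numbers below are purely illustrative, not measured from this case): if a fraction p of the per-cycle work parallelizes and the result processing between cycles stays serial, the speedup on N CPUs is roughly

Code: Select all

S(N) = 1 / ((1 - p) + p/N)

Example: p = 0.90  ->  S(8) ~ 4.7,  S(16) ~ 6.4,  S(inf) = 10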

Things that will most likely affect the scalability include:

1) Batching interval (the 5th parameter in "set pop") -- increasing this value means that the code runs more criticality cycles without processing statistics between them.
2) Score buffering -- a private buffer means that each OpenMP thread writes tally scores into its own memory space. With this option there is no need to set barriers when the data is written, which should improve scalability. On the other hand, collecting the data from the private buffers when the statistics are processed (done after each criticality cycle or batch) requires extra CPU time.
3) OpenMP reproducibility -- the population size is relatively large in this case, and if the reproducibility option is on, the CPU time required for sorting the banked fission source before each cycle may affect scalability.

The optimization mode most likely does not affect the scalability, because no burnup calculation is run. The uniform fission source method, invoked by "set ufs ...", changes the total number of neutron histories run, so changing or removing this parameter changes your entire calculation. I don't expect this parameter to have a major impact on scalability.
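
For reference, the options discussed above correspond to input cards along these lines (the values shown are illustrative only, not changes to the benchmark input; check the Serpent manual for the exact syntax and defaults):

Code: Select all

set pop 500000 1000 200 1.0 10   % 5th entry: batching interval (cycles per batch)
set shbuf 0                      % score buffering (private vs. shared OpenMP buffers)
set repro 1                      % OpenMP reproducibility on/off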
- Jaakko


Re: Test case 2: PWR full-core calculation

Post by Ville Rintala » Sun Jan 06, 2013 2:50 pm

Some results from Lappeenranta University of Technology:

See this post: Re: Serpent 2 scalability benchmark

Calculation nodes: 2 x E5-2660 Xeon (8+8 cores, Hyper-threading off), 128 GiB

No changes were made to the input file. I ran the following calculations (multiple serial jobs couldn't be started in one SLURM batch file with MPI support compiled in, so a different executable was used for the first four cases; a sketch of such a batch file is shown after the list below):

One calculation node with only OpenMP support:
1. -omp 1 (16 cases running simultaneously)
2. -omp 2 (8 cases running simultaneously)
3. -omp 4 (4 cases running simultaneously)
4. -omp 8 (2 cases running simultaneously)
With MPI support:
5. -mpi 1 -omp 16
6. -mpi 2 -omp 8
7. -mpi 4 -omp 4
8. -mpi 8 -omp 2
Two-node calculations:
9. -mpi 4 -omp 8
10. -mpi 8 -omp 4
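
A minimal sketch of the kind of SLURM batch file used for one of the hybrid runs (the executable name sss2, the srun-based launch and the resource values are assumptions about the cluster setup, not copied from the actual jobs):

Code: Select all

#!/bin/bash
#SBATCH --nodes=2              # two calculation nodes
#SBATCH --ntasks=8             # MPI tasks, corresponds to -mpi 8
#SBATCH --cpus-per-task=4      # OpenMP threads per task, corresponds to -omp 4
#SBATCH --time=02:00:00

export OMP_NUM_THREADS=4
# case 10 above (-mpi 8 -omp 4); "core" is the benchmark input file,
# and the MPI ranks are provided by srun instead of Serpent's own -mpi flag
srun sss2 -omp 4 core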

Results:

Code: Select all

----------------------------------------------------
MPI OMP PARA :  Total        :  Transport          :
----------------------------------------------------
  1   1    1 :   714.8   1.0 :   714.7   1.0  1.00 : 
  1   2    2 :   375.5   1.9 :   375.5   1.9  1.00 : 
  1   4    4 :   200.6   3.6 :   200.6   3.6  1.00 : 
  1   8    8 :   114.9   6.2 :   114.9   6.2  1.00 : 
  1  16   16 :    78.3   9.1 :    78.3   9.1  1.00 : 
  2   8   16 :    68.9  10.4 :    68.9  10.4  1.00 : 
  4   4   16 :    64.4  11.1 :    64.4  11.1  1.00 : 
  8   2   16 :    64.1  11.1 :    64.1  11.1  1.00 : 
  4   8   32 :    47.2  15.1 :    47.2  15.1  1.00 : 
  8   4   32 :    40.7  17.6 :    40.7  17.6  1.00 : 
----------------------------------------------------


Re: Test case 2: PWR full-core calculation

Post by hartanto » Mon Jan 14, 2013 12:25 pm

These are the results from KAIST for test case 2. No changes were made to the input file.

The CPU type is Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz.

Code: Select all

---------------------------------------------------------
NODE MPI OMP PARA :  Total        :  Transport          :
---------------------------------------------------------
  1    1   1    1 :   646.3   1.0 :   646.3   1.0  1.00 : 
  1    1   2    2 :   336.5   1.9 :   336.5   1.9  1.00 : 
  1    2   1    2 :   325.4   2.0 :   325.4   2.0  1.00 : 
  1    1   4    4 :   185.6   3.5 :   185.6   3.5  1.00 : 
  1    2   2    4 :   181.2   3.6 :   181.2   3.6  1.00 : 
  1    4   1    4 :   177.3   3.6 :   177.3   3.6  1.00 : 
  1    1   8    8 :   110.4   5.9 :   110.4   5.9  1.00 : 
  1    2   4    8 :   105.3   6.1 :   105.3   6.1  1.00 : 
  1    4   2    8 :   104.8   6.2 :   104.8   6.2  1.00 : 
  1    8   1    8 :   101.4   6.4 :   101.4   6.4  1.00 : 
  1    1  16   16 :    88.4   7.3 :    88.4   7.3  1.00 : 
  1    2   8   16 :    69.7   9.3 :    69.7   9.3  1.00 : 
  1    4   4   16 :    65.8   9.8 :    65.8   9.8  1.00 : 
  1    8   2   16 :    64.1  10.1 :    64.1  10.1  1.00 : 
  1   16   1   16 :    65.7   9.8 :    65.7   9.8  1.00 : 
  2    2   8   16 :    72.6   8.9 :    72.6   8.9  1.00 : 
  4    4   4   16 :    61.5  10.5 :    61.5  10.5  1.00 : 
  2    2  16   32 :    71.3   9.1 :    71.3   9.1  1.00 : 
  2   32   1   32 :    38.9  16.6 :    38.9  16.6  1.00 : 
  4    4   8   32 :    55.5  11.7 :    55.5  11.7  1.00 : 
  8    8   4   32 :    41.3  15.6 :    41.3  15.6  1.00 : 
---------------------------------------------------------
[Attachment: plot.jpg (caption: PWR scalability plot)]


Re: Test case 2: PWR full-core calculation

Post by Jaakko Leppänen » Mon Jan 14, 2013 2:48 pm

This is pretty much what I expected -- the large number of tallies has a negative impact on scalability. As I mentioned in the introduction of this benchmark, this results from the excessive CPU time spent on processing the results between criticality cycles. I will try to see if some of that processing could be done in parallel. Another way to improve the scalability is to increase the batching interval, which means that the results are collected over multiple cycles before being processed. I ran a quick test on 12 CPUs with OpenMP parallelization, using a batching interval of 1:

Code: Select all

set pop 500000 1000 200 1.0 1
and 10:

Code: Select all

set pop 500000 1000 200 1.0 10
The running times that I got were 86 and 59 minutes, respectively, so increasing the batching interval will probably improve the scalability as well.

While doing these tests I noticed a bug in the calculation routines, which results in some biases and over-estimated statistical errors for some output variables. This will be fixed in the next update. The problem shouldn't affect the running times, so the results of these scalability tests should still be valid.
- Jaakko


Re: Test case 2: PWR full-core calculation

Post by gavin.ridley.utk » Tue Nov 06, 2018 6:31 pm

Hi,

Is the input for this still available? I'd like to test it on our cluster.

Thanks,
Gavin Ridley

Yellowstone Energy


Re: Test case 2: PWR full-core calculation

Post by Jaakko Leppänen » Wed Nov 07, 2018 1:29 pm

The server settings no longer allow directory listing, but the input file can be accessed at:

http://virtual.vtt.fi/virtual/montecarl ... _CORE/core
- Jaakko
