Execution performance of NAS tests

The tables below give the code sizes and execution times of the MPI-programs and DVM-programs for the NAS tests.

Compared with the sequential program, the size of the DVM-program increases by 5% on average, whereas the size of the MPI-program increases by 40% on average. Note that the DVM-program grows only because of the inserted special comments, which do not depend on array sizes or on the number of processors. The additional code of the MPI-program, in contrast, is a complicated system for managing message passing that depends on both the array sizes and the number of processors.
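The averages quoted above can be checked against the code-size table below. The following is a minimal sketch, not part of the original measurements: the line counts are copied from the table, while the names `SIZES`, `mean_growth`, and `total_growth` are chosen here for illustration. It assumes that "on average" means the unweighted mean of the per-test ratios.

```python
# Code sizes in lines, copied from the table: test -> (SEQ, MPI, DVM).
SIZES = {
    "BT": (4059, 5744, 4146),
    "CG": (1108, 1793, 1118),
    "EP": (641, 670, 649),
    "FT": (1500, 2352, 1605),
    "IS": (925, 1218, 1085),
    "LU": (4189, 5497, 4269),
    "MG": (1898, 2857, 1992),
    "SP": (3361, 5020, 3580),
}

MPI, DVM = 1, 2  # column indices inside each tuple


def mean_growth(column):
    """Unweighted mean over all tests of (parallel size / sequential size)."""
    ratios = [row[column] / row[0] for row in SIZES.values()]
    return sum(ratios) / len(ratios)


def total_growth(column):
    """Ratio of the summed parallel line counts to the summed sequential ones."""
    total_seq = sum(row[0] for row in SIZES.values())
    total_par = sum(row[column] for row in SIZES.values())
    return total_par / total_seq


print("mean  MPI/SEQ: %.2f  DVM/SEQ: %.2f" % (mean_growth(MPI), mean_growth(DVM)))
print("total MPI/SEQ: %.2f  DVM/SEQ: %.2f" % (total_growth(MPI), total_growth(DVM)))
```

Under that assumption the mean ratios come out at roughly 1.41 for MPI and 1.05 for DVM, in line with the 40% and 5% quoted in the text; the ratios of the totals (1.42 and 1.04) match the last row of the table.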

The performance of DVM-programs and MPI-programs is comparable. In some cases, however, the DVM-program is slower by 50-60%. There are two reasons for this. First, the DVM system does not use MPI collective operations, which on some parallel systems are performed more efficiently than their implementation via point-to-point communications. Second, the MPI versions of some tests are parallelized along two dimensions of the processor grid, whereas the DVM versions of all tests currently run only on a line of processors. Work to eliminate both causes is in progress.
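The DVM/MPI columns in the timing tables below are the DVM execution time divided by the MPI execution time, so a value above 1 means the DVM-program is slower. As a small illustration (a sketch only; the timings are copied from the tables, and the names `SAMPLES` and `dvm_to_mpi_ratio` are made up here), the ratio and the corresponding slowdown in percent can be computed as follows:

```python
def dvm_to_mpi_ratio(mpi_seconds, dvm_seconds):
    """Return DVM time divided by MPI time; above 1 means DVM is slower."""
    return dvm_seconds / mpi_seconds


# (test, platform, number of processors) -> (MPI seconds, DVM seconds),
# a few rows copied from the timing tables.
SAMPLES = {
    ("LU", "NCI-cluster", 16): (111.3, 172.2),
    ("LU", "RCC-cluster", 8): (165.7, 265.9),
    ("MG", "NCI-cluster", 16): (5.9, 9.7),
    ("EP", "NCI-cluster", 16): (28.0, 26.9),
}

RATIOS = {key: round(dvm_to_mpi_ratio(*times), 2) for key, times in SAMPLES.items()}

for (test, platform, procs), ratio in RATIOS.items():
    slowdown = (ratio - 1.0) * 100.0  # negative means DVM is faster
    print("%s on %s, %d processors: DVM/MPI = %.2f (%+.0f%%)"
          % (test, platform, procs, ratio, slowdown))
```

The LU and MG rows fall in the 50-60% range mentioned above, while the EP row shows a case where the DVM-program is slightly faster.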

Code size (in lines)

Test    SEQ    MPI    DVM  MPI/SEQ  DVM/SEQ
BT     4059   5744   4146     1.41     1.02
CG     1108   1793   1118     1.62     1.01
EP      641    670    649     1.04     1.01
FT     1500   2352   1605     1.57     1.07
IS      925   1218   1085     1.32     1.17
LU     4189   5497   4269     1.31     1.02
MG     1898   2857   1992     1.50     1.05
SP     3361   5020   3580     1.49     1.06
Σ     17681  25151  18444     1.42     1.04

SEQ - serial code
MPI - parallel code in Fortran 77 (C for the IS test) with MPI
DVM - parallel code in Fortran-DVM (C-DVM for the IS test)

Test platforms

NCI-cluster    Pentium III/500 + Myrinet,
               Windows NT,
               MPI-FM,
               Visual C++ 6.0,
               Digital Fortran 5.0
RCC-cluster    Pentium III/500 + SCI,
               Red Hat Linux release 6.1 (Cartman),
               ScaMPI,
               Portland Group C compiler,
               Portland Group F77 compiler
MVS-1000/16    Pentium III/800 + Fast Ethernet,
               Red Hat Linux release 7.0 (Guinness),
               Router, LAM-MPI,
               GNU C compiler version 2.96,
               GNU Fortran compiler version 2.96

BT test execution times in seconds (class A)

    NCI-cluster (Peking)       RCC-cluster (MSU)          MVS-1000/16 (KIAM)
NP      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI
1         -        -        -        -   2548.5        -        -        -        -
2         -        -        -        -        -        -        -        -        -
4     656.9    716.7     1.09    606.1    712.3     1.17    568.2    571.1     1.00
9     446.3    390.4     0.87    284.7    380.6     1.34    314.8    303.5     0.96
16    271.4    270.8     1.00    220.8    231.2     1.04        -    208.9        -

CG test execution times in seconds (class A)

    NCI-cluster (Peking)       RCC-cluster (MSU)          MVS-1000/16 (KIAM)
NP      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI
1      43.7     45.4     1.04     41.4     42.9     1.04     30.6     30.9     1.01
2      22.0     24.9     1.13     28.3     22.8     0.81     16.7     19.4     1.16
4      12.0     14.0     1.17     11.7     13.6     1.16     12.0     13.1     1.09
8       6.4      9.0     1.41      6.3      9.1     1.44      7.3      9.9     1.36
16      5.0      8.9     1.78      5.0      7.0     1.40        -      8.6        -

EP test execution times in seconds (class A)

    NCI-cluster (Peking)       RCC-cluster (MSU)          MVS-1000/16 (KIAM)
NP      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI
1     434.3    414.4     0.95    389.3    393.1     1.01    306.7    305.7     0.99
2     217.1    207.3     0.95    179.7    196.7     1.09    153.2    153.0     1.00
4     108.6    103.7     0.95     97.7     98.4     1.01     77.4     77.3     1.00
8      54.3     51.9     0.95     48.9     49.3     1.01     38.7     38.9     1.01
16     28.0     26.9     0.96     24.5     25.0     1.02        -     21.1        -

FT test execution times in seconds (class A)

    NCI-cluster (Peking)       RCC-cluster (MSU)          MVS-1000/16 (KIAM)
NP      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI
1         -        -        -    130.2    136.1     1.04        -        -        -
2         -        -        -     88.2     75.8     0.86     58.1        -        -
4      47.5     45.9     0.97     42.5     42.6     1.00     33.7     32.9     0.98
8      27.1     24.7     0.91     21.2     26.0     1.23     19.8     19.8     1.00
16     21.2     14.8     0.70     13.3     14.5     1.09        -     13.5        -

IS test execution times in seconds (class A)

    NCI-cluster (Peking)       RCC-cluster (MSU)          MVS-1000/16 (KIAM)
NP      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI
1      18.3     19.6     1.07     15.7     19.5     1.24     10.1     13.2     1.31
2      11.7     14.9     1.27     10.7     13.5     1.26     11.9     14.8     1.24
4       7.7      8.6     1.12      5.2      7.2     1.38      8.3      9.0     1.08
8       5.0      4.6     0.92      2.9      3.9     1.34      5.4      5.0     0.92
16      3.8      3.2     0.84      2.3      3.4     1.48        -      3.3        -

LU test execution times in seconds (class A)

    NCI-cluster (Peking)       RCC-cluster (MSU)          MVS-1000/16 (KIAM)
NP      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI
1    1796.6   1739.7     0.97   1581.5   1886.0     1.19   1186.2        -        -
2     911.9    820.5     0.90    989.5    974.4     0.98    617.5    624.9     1.01
4     452.8    448.9     0.99    361.5    512.3     1.41    323.4    349.6     1.08
8     202.4    248.5     1.23    165.7    265.9     1.60    172.9    198.6     1.15
16    111.3    172.2     1.55     84.5    143.2     1.69        -    141.4        -

MG test execution times in seconds (class A)

    NCI-cluster (Peking)       RCC-cluster (MSU)          MVS-1000/16 (KIAM)
NP      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI
1         -        -        -     77.7     71.5     0.92        -        -        -
2         -        -        -     47.9     36.5     0.76     33.0     30.5     0.92
4      20.7     22.2     1.07     22.2     18.8     0.85     18.2     16.1     0.88
8       9.3     13.5     1.45      9.7     10.5     1.08      9.5      9.1     0.96
16      5.9      9.7     1.64      7.0      6.7     0.96        -      6.5        -

SP test execution times in seconds (class A)

    NCI-cluster (Peking)       RCC-cluster (MSU)          MVS-1000/16 (KIAM)
NP      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI      MPI      DVM  DVM/MPI
1    1681.0   2040.0     1.21   1670.7   2132.2     1.28   1616.5   1534.4     0.95
2         -        -        -        -        -        -        -        -        -
4     435.4    562.4     1.29    435.2    616.6     1.42    472.4    450.3     0.95
9     271.9    309.5     1.14    207.7    311.7     1.50    274.9    258.2     0.94
16    150.2    222.7     1.48    146.5    201.6     1.38        -    196.8        -