Execution performance of NAS tests
The tables below give the code sizes and the performance of the MPI and DVM versions of the NAS tests.
Compared with the sequential program, the DVM program is on average 5% larger, whereas the MPI program is on average 40% larger. Note that the DVM program grows only because of inserted special comments, which do not depend on array sizes or on the number of processors. The additional code in the MPI program is a complicated system of routines that manage message passing, and it does depend on array sizes and on the number of processors.
The performance of the DVM and MPI programs is comparable. In some cases, however, the DVM program is slower by 50-60%. There are two reasons for this. First, the DVM system does not use MPI collective operations, which on some parallel systems run more efficiently than an equivalent implementation built from point-to-point communications. Second, the MPI versions of some tests are parallelized along two dimensions of the processor grid, whereas the DVM versions of all tests currently run only on a line of processors. Work to eliminate both causes is in progress.
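The second reason can be illustrated with a simplified model. The sketch below is an assumption, not something measured in these tests: it takes a square n x n grid with a one-cell boundary exchange and counts how many boundary points an interior process exchanges when the grid is cut into strips (a line of processors) versus square blocks (a two-dimensional processor grid). The function names `halo_1d` and `halo_2d` are hypothetical, and the real NAS tests are three-dimensional and use more elaborate schemes.

```python
import math


def halo_1d(n: int, p: int) -> int:
    """Boundary points exchanged per step by an interior process
    when an n x n grid is split into p strips (line of processors)."""
    if p == 1:
        return 0
    # Two neighbours, each sharing a full edge of n points.
    return 2 * n


def halo_2d(n: int, p: int) -> int:
    """Same quantity for a q x q block decomposition, where p = q * q.
    Assumes p is a perfect square and q divides n."""
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square")
    if p == 1:
        return 0
    # Four neighbours, each sharing an edge of n / q points.
    return 4 * (n // q)


if __name__ == "__main__":
    n = 64  # illustrative grid size, not a NAS class size
    for p in (4, 16, 64):
        print(p, halo_1d(n, p), halo_2d(n, p))
```

In this model the strip decomposition keeps a constant exchange volume per process as processors are added, while the block decomposition shrinks it, which is one plausible reason a line of processors scales worse at 16 processors.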
Size of codes (in lines)
Test | SEQ | MPI | DVM | MPI/SEQ | DVM/SEQ |
BT | 4059 | 5744 | 4146 | 1.41 | 1.02 |
CG | 1108 | 1793 | 1118 | 1.62 | 1.01 |
EP | 641 | 670 | 649 | 1.04 | 1.01 |
FT | 1500 | 2352 | 1605 | 1.57 | 1.07 |
IS | 925 | 1218 | 1085 | 1.32 | 1.17 |
LU | 4189 | 5497 | 4269 | 1.31 | 1.02 |
MG | 1898 | 2857 | 1992 | 1.50 | 1.05 |
SP | 3361 | 5020 | 3580 | 1.49 | 1.06 |
Total | 17681 | 25151 | 18444 | 1.42 | 1.04 |
SEQ – serial code
MPI – parallel code in Fortran77 or C (IS) + MPI
DVM – parallel code in FORTRAN-DVM or C-DVM (IS)
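The ratios in the table can be rechecked with a few lines of Python. This is a minimal sketch: the line counts are copied from the table above, while the helper names (`overhead`, `mean_ratios`, `total_ratios`) are introduced here only for illustration.

```python
# Line counts from the "Size of codes" table: (SEQ, MPI, DVM).
SIZES = {
    "BT": (4059, 5744, 4146),
    "CG": (1108, 1793, 1118),
    "EP": (641, 670, 649),
    "FT": (1500, 2352, 1605),
    "IS": (925, 1218, 1085),
    "LU": (4189, 5497, 4269),
    "MG": (1898, 2857, 1992),
    "SP": (3361, 5020, 3580),
}


def overhead(sizes):
    """Return {test: (MPI/SEQ, DVM/SEQ)} for every test."""
    return {name: (mpi / seq, dvm / seq)
            for name, (seq, mpi, dvm) in sizes.items()}


def mean_ratios(sizes):
    """Average the per-test ratios (each test weighted equally)."""
    ratios = list(overhead(sizes).values())
    n = len(ratios)
    return (sum(r[0] for r in ratios) / n,
            sum(r[1] for r in ratios) / n)


def total_ratios(sizes):
    """Ratios of the summed line counts (the bottom row of the table)."""
    seq = sum(s[0] for s in sizes.values())
    mpi = sum(s[1] for s in sizes.values())
    dvm = sum(s[2] for s in sizes.values())
    return mpi / seq, dvm / seq


if __name__ == "__main__":
    for name, (mpi_r, dvm_r) in overhead(SIZES).items():
        print(f"{name}: MPI/SEQ={mpi_r:.2f} DVM/SEQ={dvm_r:.2f}")
    print("mean :", mean_ratios(SIZES))
    print("total:", total_ratios(SIZES))
```

The per-test averages come out at roughly 1.41 for MPI and 1.05 for DVM, which matches the 40% and 5% figures quoted in the text. The ratios of the summed line counts are 1.42 and 1.04.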
NCI-cluster – Pentium III/500 + Myrinet, Windows NT, MPI-FM, Visual C++ 6.0, Digital Fortran 5.0
RCC-cluster – Pentium III/500 + SCI, Red Hat Linux release 6.1 (Cartman), ScaMPI, Portland Group C compiler, Portland Group F77 compiler
MVS-1000/16 – Pentium III/800 + Fast Ethernet, Red Hat Linux release 7.0 (Guinness), Router, LAM-MPI, GNU C compiler version 2.96, GNU Fortran compiler version 2.96
BT test execution times in seconds (class A)
NP | NCI-cluster (Peking) | | | RCC-cluster (MSU) | | | MVS-1000/16 (KIAM) | | |
 | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI |
1 | 2548.5 | | | | | | | | |
2 | | | | | | | | | |
4 | 656.9 | 716.7 | 1.09 | 606.1 | 712.3 | 1.17 | 568.2 | 571.1 | 1.00 |
9 | 446.3 | 390.4 | 0.87 | 284.7 | 380.6 | 1.34 | 314.8 | 303.5 | 0.96 |
16 | 271.4 | 270.8 | 1.00 | 220.8 | 231.2 | 1.04 | 208.9 |
CG test execution times in seconds (class A)
NP | NCI-cluster (Peking) | | | RCC-cluster (MSU) | | | MVS-1000/16 (KIAM) | | |
 | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI |
1 | 43.7 | 45.4 | 1.04 | 41.4 | 42.9 | 1.04 | 30.6 | 30.9 | 1.01 |
2 | 22.0 | 24.9 | 1.13 | 28.3 | 22.8 | 0.81 | 16.7 | 19.4 | 1.16 |
4 | 12.0 | 14.0 | 1.17 | 11.7 | 13.6 | 1.16 | 12.0 | 13.1 | 1.09 |
8 | 6.4 | 9.0 | 1.41 | 6.3 | 9.1 | 1.44 | 7.3 | 9.9 | 1.36 |
16 | 5.0 | 8.9 | 1.78 | 5.0 | 7.0 | 1.40 | 8.6 |
EP test execution times in seconds (class A)
NP | NCI-cluster (Peking) | | | RCC-cluster (MSU) | | | MVS-1000/16 (KIAM) | | |
 | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI |
1 | 434.3 | 414.4 | 0.95 | 389.3 | 393.1 | 1.01 | 306.7 | 305.7 | 0.99 |
2 | 217.1 | 207.3 | 0.95 | 179.7 | 196.7 | 1.09 | 153.2 | 153.0 | 1.00 |
4 | 108.6 | 103.7 | 0.95 | 97.7 | 98.4 | 1.01 | 77.4 | 77.3 | 1.00 |
8 | 54.3 | 51.9 | 0.95 | 48.9 | 49.3 | 1.01 | 38.7 | 38.9 | 1.01 |
16 | 28.0 | 26.9 | 0.96 | 24.5 | 25.0 | 1.02 | 21.1 |
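The timing tables list raw execution times only. As an illustration of how to read them, the following sketch derives speedup and parallel efficiency for the EP test on the NCI-cluster. Speedup and efficiency are standard derived metrics that the original tables do not report, and the function names are hypothetical. The times are copied from the EP table above.

```python
# EP test, NCI-cluster: execution time in seconds for each processor count.
NP = [1, 2, 4, 8, 16]
EP_NCI_MPI = [434.3, 217.1, 108.6, 54.3, 28.0]
EP_NCI_DVM = [414.4, 207.3, 103.7, 51.9, 26.9]


def speedup(t1: float, tp: float) -> float:
    """Speedup relative to the one-processor run of the same program."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup(t1, tp) / p


def scaling_table(nps, times):
    """Return a list of (np, speedup, efficiency) tuples."""
    t1 = times[0]
    return [(p, speedup(t1, t), efficiency(t1, t, p))
            for p, t in zip(nps, times)]


if __name__ == "__main__":
    for label, times in (("MPI", EP_NCI_MPI), ("DVM", EP_NCI_DVM)):
        for p, s, e in scaling_table(NP, times):
            print(f"{label} NP={p:2d} speedup={s:5.2f} efficiency={e:.2f}")
```

For EP both versions stay above 95% efficiency up to 16 processors on this cluster, which is consistent with the DVM/MPI ratios in the table staying close to 1.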
FT test execution times in seconds (class A)
NP | NCI-cluster (Peking) | | | RCC-cluster (MSU) | | | MVS-1000/16 (KIAM) | | |
 | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI |
1 | 130.2 | 136.1 | 1.04 | | | | | | |
2 | 88.2 | 75.8 | 0.86 | 58.1 | | | | | |
4 | 47.5 | 45.9 | 0.97 | 42.5 | 42.6 | 1.00 | 33.7 | 32.9 | 0.98 |
8 | 27.1 | 24.7 | 0.91 | 21.2 | 26.0 | 1.23 | 19.8 | 19.8 | 1.00 |
16 | 21.2 | 14.8 | 0.70 | 13.3 | 14.5 | 1.09 | 13.5 |
IS test execution times in seconds (class A)
NP | NCI-cluster (Peking) | | | RCC-cluster (MSU) | | | MVS-1000/16 (KIAM) | | |
 | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI |
1 | 18.3 | 19.6 | 1.07 | 15.7 | 19.5 | 1.24 | 10.1 | 13.2 | 1.31 |
2 | 11.7 | 14.9 | 1.27 | 10.7 | 13.5 | 1.26 | 11.9 | 14.8 | 1.24 |
4 | 7.7 | 8.6 | 1.12 | 5.2 | 7.2 | 1.38 | 8.3 | 9.0 | 1.08 |
8 | 5.0 | 4.6 | 0.92 | 2.9 | 3.9 | 1.34 | 5.4 | 5.0 | 0.92 |
16 | 3.8 | 3.2 | 0.84 | 2.3 | 3.4 | 1.48 | 3.3 |
LU test execution times in seconds (class A)
NP | NCI-cluster (Peking) | | | RCC-cluster (MSU) | | | MVS-1000/16 (KIAM) | | |
 | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI |
1 | 1796.6 | 1739.7 | 0.97 | 1581.5 | 1886.0 | 1.19 | 1186.2 | | |
2 | 911.9 | 820.5 | 0.90 | 989.5 | 974.4 | 0.98 | 617.5 | 624.9 | 1.01 |
4 | 452.8 | 448.9 | 0.99 | 361.5 | 512.3 | 1.41 | 323.4 | 349.6 | 1.08 |
8 | 202.4 | 248.5 | 1.23 | 165.7 | 265.9 | 1.60 | 172.9 | 198.6 | 1.15 |
16 | 111.3 | 172.2 | 1.55 | 84.5 | 143.2 | 1.69 | 141.4 |
MG test execution times in seconds (class A)
NP | NCI-cluster (Peking) | | | RCC-cluster (MSU) | | | MVS-1000/16 (KIAM) | | |
 | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI |
1 | 77.7 | 71.5 | 0.92 | | | | | | |
2 | 47.9 | 36.5 | 0.76 | 33.0 | 30.5 | 0.92 | | | |
4 | 20.7 | 22.2 | 1.07 | 22.2 | 18.8 | 0.85 | 18.2 | 16.1 | 0.88 |
8 | 9.3 | 13.5 | 1.45 | 9.7 | 10.5 | 1.08 | 9.5 | 9.1 | 0.96 |
16 | 5.9 | 9.7 | 1.64 | 7.0 | 6.7 | 0.96 | 6.5 |
SP test execution times in seconds (class A)
NP | NCI-cluster (Peking) | | | RCC-cluster (MSU) | | | MVS-1000/16 (KIAM) | | |
 | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI | MPI | DVM | DVM/MPI |
1 | 1681.0 | 2040.0 | 1.21 | 1670.7 | 2132.2 | 1.28 | 1616.5 | 1534.4 | 0.95 |
2 | | | | | | | | | |
4 | 435.4 | 562.4 | 1.29 | 435.2 | 616.6 | 1.42 | 472.4 | 450.3 | 0.95 |
9 | 271.9 | 309.5 | 1.14 | 207.7 | 311.7 | 1.50 | 274.9 | 258.2 | 0.94 |
16 | 150.2 | 222.7 | 1.48 | 146.5 | 201.6 | 1.38 | 196.8 |