Predictor of DVM-program performance |
- last edited 20.10.00 -
Contents
1 Functions of
Predictor
2 The Content of
Predictor
3
Approach and Principle of Predictor Implementation
3.1 Representation of the program as hierarchy of intervals
3.2 Processing trace information
3.2.1 Accumulation of the information about intervals
3.2.2 Simulation of parallel execution of DVM-program
3.2.3 Calculation of the main performance characteristics
The predictor is intended for performance analysis and debugging of DVM-programs without usage of a real parallel computer (access to which is usually limited or complicated). With the predictor user can get the predicted temporal characteristics of execution of his program on MPP or workstation cluster in more or less detail.
The performance of parallel programs on multiprocessor computers with distributed memory is determined by the following major factors:
The DVM-system has an information whether sequential or parallel part of the program is executed on any processor at any moment. This approach is an essential advantage in comparison with the approach based on explicit use of communication libraries (MPI, PVM). Besides, all synchronization operations of the program are known. Therefore there is an opportunity to quantify the influence of four above factors on the program execution efficiency. It is impossible to distinguish sequential calculations from parallel ones and to determine a degree of parallelism of the program if the parallel program is submitted as a system of cooperating processes using communication libraries explicitly.
The opportunity to distinguish sequential and parallel parts of the program during its execution on the multiprocessor computer allows the predictor to give a user the following basic parameters of the parallel program execution:
Execution time is the maximum from the times of the program execution on each processor.
To calculate the main characteristic of parallel execution (efficiency coefficient) it is necessary to calculate two amount of time. First, a productive time required for the program execution on serial computer. Second, a total processor time, calculated as a product of execution time by number of processors. Efficiency coefficient is a ratio of the productive time to the total processor time.
The lost time is the total processor time of parallel execution subtracted by the productive time. If the programmer is not satisfied with efficiency coefficient he should analyze components of the lost time and their origin.
There are following components of the lost time:
To estimate the total potential losses, which can arise because of non-simultaneous start of collective operations on different processors, the special characteristic - synchronization should be calculated and given the user. A main reason of these losses is imbalance of processors loading during execution of parallel loops. For estimating the potential losses because of imbalance, the generalized characteristic imbalance should be given the user. Dissynchronization can occur not only due to imbalance, but also due to differences in completion times of collective operations on different processors. To evaluate the potential dissynchronization the programmer is provided with a special characteristic time variation of collective operation completion.
An important characteristic, showing the degree of overlapping of interprocessor communications with calculations, is the time of overlapping.
For more detail analysis of the program efficiency user can apply special language features to split program into intervals and to get performance characteristics for each interval.
The predictor is the system for processing the trace information gathered by LIB-DVM system during DVM-program execution on a workstation (in case of several workstations there is special utility to unite several traces in one). This system using trace information and parameters given by the user calculates and produces for him the predicted temporal characteristics of the program execution on MPP or on workstation cluster, using the library, which simulates a parallel execution of DVM-programs.
3 Approach and Principle of Predictor Implementation
3.1 Representation of the program as hierarchy of intervals
The execution of the program can be represented as a sequence of intervals. By default, the program is considered as one interval. Also user can define intervals by means of C-DVM and Fortran DVM languages (Constraint: when predictor is used, the integer expression with an interval must not appear inside parallel loop).
There is also opportunity to set a mode of compilation, when all parallel loops, or all sequential loops, containing parallel loops or all sequential loops are declared as intervals.
The user can also split any interval into smaller intervals or unite neighbor intervals (in order of execution) in a new one, i.e. to present the program as hierarchy of intervals of several levels (the whole program is an interval of highest level).
The mechanism of splitting the program into intervals serves for more detail analysis of behavior of the program during its execution. Looking through results with the help of the predictor, user can set a depth of details to leave out of consideration intervals of prescribed levels.
3.2 Processing trace information
For simplification of the further description we shall enter the following notions. An interval will be named simple, if it does not contain other intervals (nested intervals). We refer to intervals including nested intervals as composite. While processing trace information some intervals are active intervals (entered but not exited). The active interval of the lowest level, we shall name a current interval.
The trace information saved in file during the DVM-program execution is processed then on a workstation as follows.
3.2.1 Accumulation of the information about intervals
During processing the trace, the information on each interval is gathered (type of the interval, the number of the interval, the number of the level, number of the interval entries, source file name and a line number corresponding to the beginning of the interval). Also for current intervals the following times describing its execution on a serial computer are calculated (the processor performance is supposed to be equal to processors performance of a target system):
3.2.2 Simulation of parallel execution of DVM-program
The simulation of basic LIB-DVM functions is performed, using the trace information accumulated in the file and parameters of the multiprocessor system given by the user. It allows to determine times necessary for execution of interprocessor communication, and also information about distribution of calculations between processors. Thus, the following times describing parallel execution of intervals are calculated (these characteristics are generalized for all processors):
At the exit from composite intervals the correction of all characteristics described in sections 3.2.1. and 3.2.2. (except the characteristic Idle) is performed. Also each characteristic of any interval of the i-th level is calculated by means of adding to it the same characteristics of all nested intervals of the (i+1)-th level.
After correction and also at the exit from simple intervals the characteristic imbalance (Load_Imbalance) is calculated.
3.2.3 Calculation of the main performance characteristics
These characteristics concern all parallel program and its intervals. According to the degree of details, given by the user (level of intervals), the following information on predicted performance execution of the program can be given: