Parallel Program Performance Analyzer (PPPA)
Detailed design
* 21 June 2000 *


Contents

1 Introduction
2 Subsystem for statistics accumulation

2.1 Module statevnt.c
2.2 Module interval.c
2.3 Module statist.c
2.4 Definition of functions used for representation of interval information, general information of support system and times of collective operations

3 Processing subsystem

3.1 Class Cinter, module inter.cpp

3.1.1 Constructor
3.1.2 Destructor
3.1.3 Functions to save characteristics
3.1.4 Functions to read characteristics
3.1.5 Interval identification information reading
3.1.6 Comparison of identification information
3.1.7 Summing time characteristics

3.2 Class CtreeInter, module treeinter.cpp

3.2.1 Constructor
3.2.2 Destructor
3.2.3 Getting constructor result
3.2.4 Error message
3.2.5 Interval tree walking
3.2.6 Summing characteristics of all interval tree

3.3 Class Csynchro, module synchro.cpp

3.3.1 Constructor
3.3.2 Destructor
3.3.3 Getting constructor result
3.3.4 Error message
3.3.5 Calculation of dissynchronization and variation times in the interval
3.3.6 Getting the number of dissynchronization or variation times
3.3.7 Reading the value of dissynchronization or variation time
3.3.8 Reading the value of the nearest dissynchronization and variation time

3.4 Class CstatRead, module statread.cpp

3.4.1 Constructor
3.4.2 Destructor
3.4.3 Number of processors
3.4.4 Input/output processor number
3.4.5 Type of message passing
3.4.6 Tree walking
3.4.7 Getting constructor result
3.4.8 Error message
3.4.9 Read identifier information of interval
3.4.10 Reading time characteristics according to defined processor numbers
3.4.11 Minimal, maximal and sum time characteristics
3.4.12 Matrix rank
3.4.13 Number of collective operations

3.5 Module statfile.cpp, function main


1 Introduction

The performance analyzer consists of two subsystems: the accumulation subsystem and the information processing subsystem.

The first subsystem accumulates the execution characteristics of a parallel program on each processor. It is called from Lib-DVM during parallel program execution. In addition, the Fortran DVM language has features for describing intervals of program execution for which the user would like to get performance characteristics. The compiler inserts calls to the accumulation subsystem at the beginning and the end of each interval. The information from every processor is written to a file when the program terminates.

The second subsystem, running on a workstation, processes the information gathered on the parallel computer and outputs the performance characteristics requested by the user.

2 Subsystem for statistics accumulation

The statistics accumulation subsystem consists of the following modules:

statevnt.c, statist.c, interval.c, strall.h, statist.h, interval.h

These modules accumulate the following information: DVM-function execution times, DVM-system interval execution times, dissynchronization times of collective operations and time variations. The statistics are written to a file whose name and length are defined in the parameter file. The performance analyzer uses this file to calculate performance characteristics.

2.1 Module statevnt.c

This module marks the DVM functions whose start and completion times are to be written to the file. The performance analyzer uses these times to calculate dissynchronization characteristics and time variation for each interval.

2.2 Module interval.c

This module contains the functions called by the converter to create and delete intervals. These functions also write a trace of the corresponding event to files.

void __callstd binter_(long *nfrag, long *val)
void __callstd bsloop_(long *nfrag)
void __callstd bploop_(long *nfrag)

nfrag - interval number,
nline - line number in the source file,
val - expression value in user interval.

These functions create intervals of three types: user intervals, parallel intervals and sequential intervals.

void __callstd einter_(long *nfrag, long *nline)
void __callstd eloop_(long *nfrag, long *nline)

These functions close the current interval: the first function closes a user interval, the second one closes sequential and parallel intervals. The function prototypes are in file interval.h.

2.3 Module statist.c

The functions of this module create intervals directly, search for them in the buffer, delete intervals, and write both interval information and synchronization times to the file. External function definitions are in file statist.h.

The support system calls the following functions during task execution:

void stat_init(void) - the function fills the buffer with zeros, initializes variables, writes support system information (for example, the number of processors) at the beginning of the buffer and creates an interval for the whole program.
void stat_done(void) - the function closes the whole-program interval. If there was not enough space in the buffer for intervals or for times of collective operations, the function outputs an error message.
void stat_event(int numbevent) - the function writes the start and completion times of collective operations at the end of the buffer. Parameter ‘numbevent’ is the situation number.

An interval is identified by the line number, by the name of the source file and, for a user interval, by the expression value. The line number and file name are stored in external variables DVM_LINE[0] and DVM_FILE[0]; the expression value is passed as a parameter. Each interval stores references to the higher-level interval, to the lower-level interval and to the next interval of the same level. An interval becomes current in the following cases: it has just been created; it has been found in the list of intervals of the corresponding level and its identifier values are equal to the given ones; or the current interval has been closed, in which case the higher-level interval becomes current.

The following functions serve for creating, searching for and deleting intervals.

void CreateInter(int typef,long val) - the function allocates a record (its fields are described below) in the buffer. Parameter ‘typef’ is the interval type; ‘val’ is the value of the expression specified for an interval defined in the program by means of the language. The newly created interval becomes current, and DVM-function execution times are written to the current interval. If there is not enough space in the buffer, the interval is created in memory.
int FindInter(long val) - the function searches for an interval. It is called before the interval creation function. It returns 1 if the interval is found and 0 if it is not.
void EndInter(long nline) - the function closes the current interval; parameter ‘nline’, the line number of the beginning of the interval, is passed as a check.
void FreeInter(void) - the function frees memory. It is called at the end of execution, when messages about lack of buffer space are written.
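
The find-or-create bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions, not the code of statist.c: the Interval record, the IntervalTree class and its enter/leave methods are hypothetical names, the records are kept in a std::deque instead of the statistics buffer, and the expression value is compared for every interval (it is simply 0 for non-user intervals).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical in-memory interval record (the real records live in a buffer).
struct Interval {
    long line = 0;             // line number in the source file
    std::string file;          // source file name
    long val = 0;              // expression value (user intervals only)
    long entries = 0;          // how many times the interval was entered
    Interval *up = nullptr;    // higher-level interval
    Interval *next = nullptr;  // next interval of the same level
    Interval *down = nullptr;  // first lower-level interval
};

class IntervalTree {
public:
    IntervalTree() {
        pool_.emplace_back();  // interval for the whole program
        current_ = &pool_.back();
        current_->entries = 1;
    }

    // Enter an interval: reuse a matching child of the current interval,
    // or create a new one. The entered interval becomes current.
    Interval *enter(long line, const std::string &file, long val) {
        Interval *hit = find(line, file, val);
        if (hit == nullptr) hit = create(line, file, val);
        ++hit->entries;
        current_ = hit;
        return hit;
    }

    // Close the current interval: the higher-level interval becomes current.
    void leave() {
        assert(current_ != nullptr);
        if (current_->up != nullptr) current_ = current_->up;
    }

    Interval *current() const { return current_; }
    std::size_t size() const { return pool_.size(); }

private:
    // Search the list of intervals one level below the current one.
    Interval *find(long line, const std::string &file, long val) const {
        for (Interval *p = current_->down; p != nullptr; p = p->next)
            if (p->line == line && p->val == val && p->file == file) return p;
        return nullptr;
    }

    Interval *create(long line, const std::string &file, long val) {
        pool_.emplace_back();
        Interval *n = &pool_.back();
        n->line = line;
        n->file = file;
        n->val = val;
        n->up = current_;
        n->next = current_->down;  // link into the list of this level
        current_->down = n;
        return n;
    }

    std::deque<Interval> pool_;  // a deque keeps element addresses stable as it grows
    Interval *current_;
};
```

Entering the same source position twice from the same parent reuses one record and only increments its entry counter, which is why the search is performed before creation.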

2.4 Definition of functions used for representation of interval information, general information of support system and times of collective operations

These definitions are in file strall.h. The file is also used at the second stage by the processing subsystem.

#define SZSH sizeof(short)
#define SZL sizeof(long)
#define SZINT sizeof(int)
#define SZD sizeof(double)
#define SZV sizeof(void*)

General information structure written at the beginning of each buffer.

typedef struct tvms_ch {
    unsigned char reverse[SZSH],
        rank[SZSH],
        maxnlev[SZSH],
        szsh[SZSH],
        szl[SZSH],
        szv[SZSH],
        szd[SZSH],
        smallbuff[SZSH],
        proccount[SZL],
        mpstype[SZL],
        ioproc[SZL],
        qfrag[SZL],
        pbuffer[SZV],
        lbuf[SZL],
        linter[SZL],
        lsynchro[SZL];
} *pvms_ch;
   
reverse - the information was collected on a machine other than the workstation,
rank - matrix rank,
maxnlev - maximal nesting level,
szsh - size of short,
szl - size of long,
szv - size of void*,
szd - size of double,
smallbuff - no space in file,
proccount - number of processors,
mpstype - type of message passing,
ioproc - input/output processor number,
qfrag - number of intervals,
pbuffer - pointer to the buffer,
lbuf - buffer size,
linter - size in bytes of all interval records,
lsynchro - size in bytes of synchronization time records.
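
Because every field is stored as an array of unsigned char, the processing subsystem has to copy the bytes into a typed value before using them. A minimal sketch of such a conversion is given below. It assumes that a non-zero ‘reverse’ field means the byte order of the collecting machine differs from that of the workstation, and that the type sizes recorded in szsh, szl, szv and szd match the sizes on the workstation; decodeField is a hypothetical helper, not a function of the analyzer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Copy a raw field into a typed value, reversing the byte order on request.
// Hypothetical helper: assumes sizeof(T) equals the size used when writing.
template <typename T>
T decodeField(const unsigned char *raw, bool reverse) {
    assert(raw != nullptr);
    unsigned char tmp[sizeof(T)];
    std::memcpy(tmp, raw, sizeof(T));
    if (reverse) std::reverse(tmp, tmp + sizeof(T));
    T value;
    std::memcpy(&value, tmp, sizeof(T));  // memcpy avoids unaligned access
    return value;
}
```

With a header pointer ‘h’ of type pvms_ch, the number of processors would then be read as decodeField<long>(h->proccount, rev), where ‘rev’ is the decoded ‘reverse’ flag.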

Interval structure.

The source file name is written at the end of each interval record.

typedef struct tinter_ch {
    unsigned char type[SZSH],
        nlev[SZSH],
        nenter[SZD],
        nline[SZL],
        valvar[SZL],
        qproc[SZL],
        ninter[SZL],
        up[SZV],
        next[SZV],
        down[SZV],
        ptimes[SZV],
        times[3*StatGrpCount*StatGrpCount][SZD];
} *pinter_ch;
   
type - interval type,
nlev - nesting level number,
nenter - number of interval entries,
nline - user program line number,
valvar - expression value,
qproc - number of processors, on which interval was executed,
ninter - interval number,
up - pointer to an interval of higher level,
next - pointer to next interval of the same level,
down - pointer to an interval of lower level,
ptimes - pointer to time array, saved by system,
times - time array.

Structure of time of collective operation.

typedef struct tsyn_ch {
    unsigned char nitem[SZSH],
        ninter[SZL],
        pgrp[SZV],
        time[SZD];
} *psyn_ch;
   
nitem - collective operation number,
ninter - current interval number,
pgrp - reference to operation group,
time - time when the situation happened.

3 Processing subsystem

This tool consists of the following modules: inter.cpp, treeinter.cpp, synchro.cpp, statread.cpp, statfile.cpp. Header files: inter.h, treeinter.h, synchro.h, statread.h, strall.h, bool.h, sysstat.h. Function main is in module statfile.cpp and takes up to seven parameters, of which the first two are required. The parameters are: the name of the file where information was saved during program execution; the name of the file to which the processed information is output; three y/n symbols indicating whether general, comparative and basic characteristics are to be output; the maximal nesting level of intervals; and the list of processor numbers for which basic characteristics are to be output.

3.1 Class Cinter, module inter.cpp

This class contains member functions and member data for processing interval time characteristics.

enum typegrp {COM,RCOM,SYN,VAR,OVERLAP,CALL};
enum typecom {IO,RD,SH,RA,RED};
enum typetime {LOST, INSUFUSR, INSUF, IDLE, SUMCOM, SUMRCOM, SUMSYN, SUMVAR, SUMOVERLAP, IMB, EXEC, CPUUSR, CPU, IOTIME, START, PROC, ITER};
   
typedef struct tident {
    char          *pname;
    unsigned long  nline;
    unsigned long  proc;
    short          nlev;
    long           expr;
    double         nenter;
    typefrag       t;
} ident;
   
pname - name of source file where interval is defined,
nline - line number in the source file,
proc - the number of processors where interval was executed,
nlev - nesting level number,
expr - expression value,
nenter - number of interval entries,
t - interval type.
   
double mgen[ITER+1] - time array, to output characteristics for processors,
double mcom[RED+1] - array of passing message times in collective operations,
double mrcom[RED+1] - array of times of real dissynchronization,
double msyn[RED+1] - array of dissynchronization times,
double mvar[RED+1] - time variation array,
double moverlap[RED+1] - array of operation overlapping,
double mcall[RED+1] - number of collective operation calls,
ident idint - interval identifier information.

3.1.1 Constructor

CInter(S_GRPTIMES (*pt)[StatGrpCount], ident p, unsigned long nint);

(*pt)[StatGrpCount] - pointer to the time array read from the file,
p - interval identifier,
nint - interval number.

The constructor stores the interval identifier values and those characteristics whose values are in the file.

3.1.2 Destructor

~CInter(void); - frees the memory holding the source file name.

3.1.3 Functions to save characteristics

void AddTime(typetime t2,double val);
void WriteTime(typetime t2,double val);
void AddTime(typegrp t1,typecom t2,double val);

These member functions write or accumulate time characteristic values. The first AddTime function and the WriteTime function operate on the ‘mgen’ array: the first parameter is the index and the second parameter is the value. The second AddTime function works with the remaining arrays: parameter ‘t1’ selects the array, parameter ‘t2’ is the array index and parameter ‘val’ is the value.

3.1.4 Functions to read characteristics

void ReadTime(typegrp t1,typecom t2,double &val);
void ReadTime(typetime t2,double &val);

These member functions read time characteristic values. The parameters are the same as for the write functions, but the last parameter is passed by reference.
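
The save and read functions of 3.1.3 and 3.1.4 can be sketched as follows. The enumerations and array names are those listed in 3.1; the class name IntervalTimes and the private helper group() are assumptions made for this illustration, and the real class also holds the interval identifier, which is omitted here.

```cpp
#include <cassert>

enum typegrp {COM, RCOM, SYN, VAR, OVERLAP, CALL};
enum typecom {IO, RD, SH, RA, RED};
enum typetime {LOST, INSUFUSR, INSUF, IDLE, SUMCOM, SUMRCOM, SUMSYN, SUMVAR,
               SUMOVERLAP, IMB, EXEC, CPUUSR, CPU, IOTIME, START, PROC, ITER};

// Hypothetical stand-in for the time storage part of the interval class.
class IntervalTimes {
public:
    // 'mgen' array: accumulate or overwrite one characteristic.
    void AddTime(typetime t2, double val)   { mgen[t2] += val; }
    void WriteTime(typetime t2, double val) { mgen[t2] = val; }
    // Remaining arrays: t1 selects the array, t2 the element.
    void AddTime(typegrp t1, typecom t2, double val) { group(t1)[t2] += val; }

    // Readers return the value through the last (reference) parameter.
    void ReadTime(typetime t2, double &val) { val = mgen[t2]; }
    void ReadTime(typegrp t1, typecom t2, double &val) { val = group(t1)[t2]; }

private:
    // Map the group selector to the corresponding array (helper is an assumption).
    double *group(typegrp t1) {
        switch (t1) {
            case COM:     return mcom;
            case RCOM:    return mrcom;
            case SYN:     return msyn;
            case VAR:     return mvar;
            case OVERLAP: return moverlap;
            case CALL:    return mcall;
        }
        assert(false && "unknown group");
        return mcom;
    }

    double mgen[ITER + 1] = {};
    double mcom[RED + 1] = {};
    double mrcom[RED + 1] = {};
    double msyn[RED + 1] = {};
    double mvar[RED + 1] = {};
    double moverlap[RED + 1] = {};
    double mcall[RED + 1] = {};
};
```

Sizing every array by the last enumerator plus one keeps the arrays in step with the enumerations when a new characteristic is added.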

3.1.5 Interval identification information reading

void ReadIdent(ident **p);

Sets the pointer to the address of the interval identifier.

3.1.6 Comparison of identification information

int CompIdent(ident *p);

Compares the current interval identifier with an interval identifier from another processor; parameter ‘p’ is a pointer to that identifier. If the identifiers match fully (all structure elements are equal), the function returns 1; otherwise it returns 0.
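
A field-by-field comparison of this kind might look like the sketch below. It is an illustration only: compIdent is written as a free function taking both identifiers, typefrag is replaced by a plain int because its definition is not given here, and the file names are assumed to be compared by content rather than by pointer value.

```cpp
#include <cassert>
#include <cstring>

typedef int typefrag;  // assumption: stands in for the real interval type

struct ident {
    char          *pname;   // name of the source file
    unsigned long  nline;   // line number in the source file
    unsigned long  proc;    // number of processors
    short          nlev;    // nesting level
    long           expr;    // expression value
    double         nenter;  // number of interval entries
    typefrag       t;       // interval type
};

// Return 1 if all elements of the two identifiers are equal, otherwise 0.
int compIdent(const ident *a, const ident *b) {
    assert(a != nullptr && b != nullptr);
    const bool same = std::strcmp(a->pname, b->pname) == 0 &&
                      a->nline == b->nline &&
                      a->proc == b->proc &&
                      a->nlev == b->nlev &&
                      a->expr == b->expr &&
                      a->nenter == b->nenter &&
                      a->t == b->t;
    return same ? 1 : 0;
}
```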

3.1.7 Summing time characteristics

void SumInter(CInter *p);

This member function adds the characteristic values of the interval to the characteristic values of the higher-level interval. Parameter ‘p’ is a pointer to the higher-level interval.

3.2 Class CtreeInter, module treeinter.cpp

This class is intended for working with intervals collected on one processor. It contains the following member data: ‘nproc’ – processor number; ‘qinter’ – number of intervals; ‘curninter’ – current interval number; ‘maxnlev’ – maximal nesting level number; and a pointer to a dynamic array of structures containing the numbers of the lower-level, higher-level and same-level intervals.

typedef struct ttree {
    unsigned long  up,
                   next,
                   down;
    int            sign;
    CInter        *pint;
} ptree;
   
up, next, down - interval numbers,
sign - a flag indicating that the interval has already been passed during the walk,
*pint - pointer to class Cinter,
ptree (*pt) - pointer to structure array.

3.2.1 Constructor

CTreeInter(FILE *stream, int lint, char* pbuffer, unsigned int n, unsigned long qfrag, short maxn);
   
stream - file descriptor pointer,
lint - information length in bytes,
pbuffer - beginning of the buffer at the accumulation stage,
n - processor number,
qfrag - number of intervals,
maxn - maximal nesting level.

The constructor allocates the array of structures for the given interval tree, fills it, and calls the constructor of class Cinter with the interval characteristics read from the file.

3.2.2 Destructor

~CTreeInter(); - frees the structure array memory.

3.2.3 Getting constructor result

bool Valid();

The member function reports the result of the constructor execution. It returns TRUE if the constructor completed successfully; otherwise it returns FALSE.

3.2.4 Error message

void TextErr(char *t);

The member function retrieves the error message if function Valid() returned FALSE.
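
Since a constructor cannot return a status, the classes of the processing subsystem pair a Valid() function with a TextErr() function. The sketch below illustrates this idiom on a hypothetical class FileProbe that only tries to open a file. The message text, the 256-byte message buffer and the requirement that the caller's buffer be at least as large are assumptions of this example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical class showing the Valid()/TextErr() idiom.
class FileProbe {
public:
    explicit FileProbe(const char *name) : valid_(true) {
        text_[0] = '\0';
        std::FILE *f = std::fopen(name, "rb");
        if (f == nullptr) {
            // Remember the failure instead of reporting it from the constructor.
            valid_ = false;
            std::snprintf(text_, sizeof text_, "cannot open file %s", name);
            return;
        }
        std::fclose(f);
    }

    // TRUE if the constructor completed successfully.
    bool Valid() const { return valid_; }

    // Copy the stored message to the caller's buffer (at least 256 bytes).
    void TextErr(char *t) const {
        assert(t != nullptr);
        std::strcpy(t, text_);
    }

private:
    bool valid_;
    char text_[256];
};
```

A caller checks Valid() immediately after construction and requests the message text only when the check fails.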

3.2.5 Interval tree walking

void BeginInter(void);
void NextInter(ident **id);
CInter *FindInter(ident *id);

These member functions are intended for walking the intervals. Parameter ‘id’ is a pointer to an interval identifier. The intervals are walked in the same order in which they were filled at the accumulation stage. Function FindInter finds the interval with the identification values given in its parameter; if the interval is not found, the function returns NULL.
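
A walk over an array of up/next/down interval numbers can be sketched as follows. Several points are assumptions of this illustration rather than facts about treeinter.cpp: interval numbers start at 1 with 0 meaning "no interval", the whole-program interval has number 1, the walk is a pre-order walk, and the class TreeWalker with its begin/next functions returns interval numbers instead of identifiers.

```cpp
#include <cassert>
#include <vector>

// Simplified tree element: interval numbers only, 0 means "none".
struct Node {
    unsigned long up = 0, next = 0, down = 0;
    int sign = 0;  // set once the interval has been passed
};

class TreeWalker {
public:
    explicit TreeWalker(std::vector<Node> &tree)
        : tree_(tree), cur_(0), started_(false) {}

    // Reset the walk and clear the "passed" flags.
    void begin() {
        for (Node &n : tree_) n.sign = 0;
        cur_ = 0;
        started_ = false;
    }

    // Return the number of the next interval in pre-order, or 0 at the end.
    unsigned long next() {
        if (!started_) {
            started_ = true;
            cur_ = tree_.size() > 1 ? 1 : 0;  // start at the root
        } else if (cur_ != 0) {
            if (tree_[cur_].down != 0) {
                cur_ = tree_[cur_].down;      // descend first
            } else {
                // Climb until some interval on the path has a sibling.
                while (cur_ != 0 && tree_[cur_].next == 0) cur_ = tree_[cur_].up;
                if (cur_ != 0) cur_ = tree_[cur_].next;
            }
        }
        if (cur_ != 0) {
            assert(cur_ < tree_.size());
            tree_[cur_].sign = 1;
        }
        return cur_;
    }

private:
    std::vector<Node> &tree_;
    unsigned long cur_;
    bool started_;
};
```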

3.2.6 Summing characteristics of all interval tree

void SumLevel(void);

The member function adds the time characteristics of lower-level intervals to those of higher-level intervals across the whole interval tree.
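
The summation over the whole tree can be pictured as a post-order pass in which every interval first collects the totals of its lower-level intervals and then contributes its own total to the higher-level interval. The sketch below assumes a single time value per interval and the numbering convention "0 means no interval"; the Node structure and the function sumLevel are hypothetical simplifications.

```cpp
#include <cassert>
#include <vector>

// Simplified tree element with a single time value; 0 means "no interval".
struct Node {
    unsigned long up = 0, next = 0, down = 0;
    double time = 0.0;
};

// Add the accumulated time of every lower-level interval to interval n.
void sumLevel(std::vector<Node> &tree, unsigned long n) {
    assert(n < tree.size());
    for (unsigned long c = tree[n].down; c != 0; c = tree[c].next) {
        sumLevel(tree, c);             // finish the subtree first
        tree[n].time += tree[c].time;  // then fold it into the parent
    }
}
```

Calling sumLevel(tree, 1) on a tree whose root has number 1 leaves in each element the total of its own time and the times of all intervals nested in it.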

3.3 Class Csynchro, module synchro.cpp

The class is intended for saving dissynchronization time values and time variation values.

typedef struct tsyn {
    short          nitem;
    unsigned long  ninter;
    void          *pgrp;
    double         time;
} psyn;
psyn *ps;

nitem - collective operation number, set in module statevnt.c,
ninter - number of the interval that was current when the situation arose,
pgrp - reference to operation group,
time - time when the situation arose,
ps - pointer to the array of dissynchronization structures (type psyn).

Member functions:

3.3.1 Constructor

CSynchro(FILE *stream, int l);
   
stream - pointer to the file written during DVM-program execution,
l - length of written information.

The constructor reads the times of collective operations from the file.

3.3.2 Destructor

~CSynchro(); - frees the allocated memory.

3.3.3 Getting constructor result

bool Valid();

The member function returns TRUE if constructor finished successfully, otherwise it returns FALSE.

3.3.4 Error message

void TextErr(char *t);

The member function outputs the error message.

3.3.5 Calculation of dissynchronization and variation times in the interval

void Count(unsigned long nint, short waserr);
   
nint - interval number,
waserr - flag indicating that there was no space in the file to save dissynchronization times.

The member function calculates the number of dissynchronizations and time variations for the interval with the given number, for each type of collective operation. The interval with the given number becomes current. The member functions described below return values for the current interval.

3.3.6 Getting the number of dissynchronization or variation times

int GetCount(typecollect nitem);

typecollect nitem – type of collective operation.

The member function returns the number of dissynchronizations for the given type of collective operation.

3.3.7 Reading the value of dissynchronization or variation time

double Find(typecollect nitem);
double GetCurr(void);

These functions return the dissynchronization time and time variation for a collective operation. Function Find() searches for the next time value in the current interval, and function GetCurr() reads this current value.
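
The interplay of Count(), GetCount(), Find() and GetCurr() resembles a cursor over the time records of the current interval. The sketch below shows only this cursor behaviour. It is not the algorithm of synchro.cpp: it returns the stored record time instead of computing a dissynchronization value, it uses a plain short in place of typecollect, it omits the ‘waserr’ parameter, and the return value -1.0 for "no more records" is an assumption, as are the names SynRecord and SynCursor.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified time record (the operation group reference is omitted).
struct SynRecord {
    short nitem;           // collective operation number
    unsigned long ninter;  // interval that was current at that moment
    double time;           // recorded time
};

class SynCursor {
public:
    explicit SynCursor(std::vector<SynRecord> recs)
        : recs_(std::move(recs)), nint_(0), pos_(0), curr_(-1.0) {}

    // Make interval 'nint' current and rewind the cursor.
    void Count(unsigned long nint) { nint_ = nint; pos_ = 0; curr_ = -1.0; }

    // Number of records of the given type in the current interval.
    int GetCount(short nitem) const {
        int n = 0;
        for (const SynRecord &r : recs_)
            if (r.ninter == nint_ && r.nitem == nitem) ++n;
        return n;
    }

    // Advance to the next record of the given type; -1.0 if there is none.
    double Find(short nitem) {
        assert(pos_ <= recs_.size());
        for (std::size_t i = pos_; i < recs_.size(); ++i) {
            if (recs_[i].ninter == nint_ && recs_[i].nitem == nitem) {
                pos_ = i + 1;
                curr_ = recs_[i].time;
                return curr_;
            }
        }
        pos_ = recs_.size();
        curr_ = -1.0;
        return curr_;
    }

    // Value found by the last call to Find().
    double GetCurr() const { return curr_; }

private:
    std::vector<SynRecord> recs_;
    unsigned long nint_;
    std::size_t pos_;
    double curr_;
};
```

Note that this sketch keeps a single cursor, so alternating between operation types in successive Find() calls skips records; the real class may track positions differently.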

3.3.8 Reading the value of the nearest dissynchronization and variation time

double FindNearest(typecollect nitem);

The function searches for the dissynchronization time of the start or completion of the waiting operation that is nearest to the current time set by the Find() function. It is used to calculate the overlapping of operation times.

3.4 Class CstatRead, module statread.cpp

The class reads support system information from the file, calls the constructors of all the classes listed above, provides interval walking over all intervals collected on all processors, calculates some time characteristics and writes them into the corresponding intervals, and finds the maximal, minimal and total values of those characteristics.

Member functions:

3.4.1 Constructor

CStatRead(const char *name);

const char *name – file name.

The constructor reads support system information from the file and calls the constructors of the other classes, which write information from the file into their own structures.

3.4.2 Destructor

~CStatRead(void);

The member function calls the destructors of the other classes and frees the memory allocated for the dynamic arrays that store pointers to these classes.

3.4.3 Number of processors

unsigned long QProc(void);

The member function returns the number of processors on which the program was executed.

3.4.4 Input/output processor number

unsigned long IOProc(void);

The member function returns the input/output processor number.

3.4.5 Type of message passing

tmps MPSType(void);

The function returns the type of message passing.

3.4.6 Tree walking

int BeginTreeWalk(void);
int TreeWalk(void);

These member functions serve for interval tree walking and return the number of intervals which have the same identification values.

3.4.7 Getting constructor result

bool Valid(void);

The member function returns TRUE if constructor completed successfully, otherwise it returns FALSE.

3.4.8 Error message

void TextErr(char* t);

The function writes the error message to the buffer pointed to by ‘t’.

3.4.9 Read identifier information of interval

void ReadTitle(char* p);

char *p – pointer to a symbol string where interval identifying header is written.

3.4.10 Reading time characteristics according to defined processor numbers

bool ReadProc(typeprint t, unsigned long *pnumb, int qnumb, double sum, char* str);
   
t - type of information of characteristics for each processor,
pnumb - pointer to array of processor numbers, for which characteristics are to be output,
qnumb - number of elements of processor number array,
sum - total characteristic value for each processor,
str - string where characteristic name and time values are written.

The member function outputs the program execution characteristics for each processor. It returns TRUE when there are no more characteristics of the given type.

3.4.11 Minimal, maximal and sum time characteristics

void MinMaxSum(typeprint t, double* min, unsigned long* nprocmin, double* max, unsigned long* nprocmax, double* sum);
   
t - characteristic type,
min - pointer to array of minimal characteristic values,
nprocmin - pointer to processor number array, corresponding to minimal values,
max - pointer to array of maximal characteristic values,
nprocmax - pointer to processor number array, corresponding to maximal values,
sum - pointer to array of total characteristic values.

The member function is intended for calculation of minimal, maximal and total characteristics, it is used for output of comparative characteristics.
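
For a single characteristic, the calculation amounts to one pass over the per-processor values. The sketch below is a simplified illustration: it handles one characteristic instead of arrays of them, numbers processors from 1, reports the first processor when several share the extreme value, and uses the hypothetical names MinMaxSumResult and minMaxSum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result for one characteristic (hypothetical structure).
struct MinMaxSumResult {
    double min;
    unsigned long nprocmin;  // processor on which the minimum was observed
    double max;
    unsigned long nprocmax;  // processor on which the maximum was observed
    double sum;
};

// perProc[i] holds the value measured on processor i + 1.
MinMaxSumResult minMaxSum(const std::vector<double> &perProc) {
    assert(!perProc.empty());
    MinMaxSumResult r = {perProc[0], 1, perProc[0], 1, perProc[0]};
    for (std::size_t i = 1; i < perProc.size(); ++i) {
        const double v = perProc[i];
        if (v < r.min) { r.min = v; r.nprocmin = static_cast<unsigned long>(i + 1); }
        if (v > r.max) { r.max = v; r.nprocmax = static_cast<unsigned long>(i + 1); }
        r.sum += v;
    }
    return r;
}
```

Reporting the processor numbers together with the extreme values is what makes the comparative output useful for locating load imbalance.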

3.4.12 Matrix rank

void VMSSize(char *p);

char *p – pointer to string where matrix rank is written for the matrix on which the program was executed.

3.4.13 Number of collective operations

long ReadCall(typecom t);

typecom t – collective operation type.

It returns number of collective operation calls of the given operation type.

3.5 Module statfile.cpp, function main

The function reads the parameters, calls constructor CstatRead(…) to read the saved times and intervals from the file, and outputs the requested characteristics to the file. Characteristics with zero value are not output.