
Predictor of DVM-program performance.
Detailed design (continuation)
October, 2000

- last edited 22.05.01 -


Appendix 1. Link from field name in output HTML-file to field name in structure _IntervalResult

Field name                    Anchor    inter variable

Efficiency                    Effic     Efficiency
Execution time                Exec      Execution_time
Total time                    Total     Total_time
Productive time               Ptime     Productive_time
  CPU                         Ptimec    Productive_CPU_time
  SYS                         Ptimes    Productive_SYS_time
  I/O                         Ptimei    IO_time
Lost time                     Lost      Lost_time
  Insufficient parallelism    Insuf     Insuff_parallelism
    USR                       iuser     Insuff_parallelism_sys
    SYS                       isyst     Insuff_parallelism_usr
  Communications              comm      Communication
    SYN                       csyn      Communication_SYNCH
  Idle time                   idle      Idle
Load imbalance                imbal     Load_imbalance
Synchronization               synch     Synchronization
Time variation                vary      Time_variation
Overlap                       over      Overlap

IO
  # op                        nopi      Num_op_io
  Communications              comi      IO_comm
  Real synch                  synchi    IO_synch
  Overlap                     overi     IO_overlap
Reduction
  # op                        nopr      Num_op_reduct
  Communications              comr      Wait_reduction
  Real synch                  synchr    Reduction_synch
  Overlap                     overr     Reduction_overlap
Shadow
  # op                        nops      Num_op_shadow
  Communications              coms      Wait_shadow
  Real synch                  synchs    Shadow_synch
  Overlap                     overs     Shadow_overlap
Remote access
  # op                        nopa      Num_op_remote
  Communications              coma      Remote_access
  Real synch                  syncha    Remote_synch
  Overlap                     overa     Remote_overlap
Redistribution
  # op                        nopd      Num_op_redist
  Communications              comd      Redistribution
  Real synch                  synchd    Redistribution_synch
  Overlap                     overd     Redistribution_overlap

Appendix 2. Definition of auxiliary functions and classes

Below is a description of the functions and classes used to implement the algorithms described in the previous chapter.

// Base class for most of the classes
class Space {
protected:
	long Rank; 	      	  // Number of space dimensions
	vector<long> SizeArray;	  // Size of each dimension
	vector<long> MultArray;	  // Multiplier for each dimension
public:
	Space();
	Space(long ARank, vector<long> ASizeArray, vector<long> MultArray);
	Space(long ARank, long *ASizeArray);
	Space(const Space &);
	~Space();

	Space & operator= (const Space &x);
	long GetRank();
	long GetSize(long AAxis); 
	void GetSI(long LI, vector<long> & SI); 
	long GetLI(const vector<long> & SI); 
	long GetCenterLI();
	long GetSpecLI(long LI, long dim, int shift);
	long GetLSize();
	long GetNumInDim(long LI, long dimNum);
	long GetDistance(long LI1, long LI2); 
};
GetRank returns the rank of the space.
GetSize returns the size of the space in dimension AAxis.
GetSI calculates the coordinates SI from the linear index LI.
GetLI calculates the linear index from the coordinates SI in the given space.
GetCenterLI returns the linear index of the element at the geometric center of the space.
GetSpecLI returns the linear index of the element shifted by shift along dimension dim from the element with linear index LI.
GetLSize returns the linear size (number of elements) of the space.
GetNumInDim returns the coordinate of the element with linear index LI in dimension dimNum.
GetDistance returns the distance between the two elements of the space with linear indexes LI1 and LI2.
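
The relationship between a linear index (LI) and a coordinate vector (SI) can be illustrated by a minimal sketch. It assumes a row-major layout, in which the multiplier of a dimension is the product of the sizes of all following dimensions; the text above does not state the layout actually used. The names SpaceSketch, ToLI and ToSI are hypothetical and only mirror the roles of Space, GetLI and GetSI.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Space class; not the Predictor's code.
struct SpaceSketch {
    std::vector<long> size;  // size of each dimension (cf. SizeArray)
    std::vector<long> mult;  // multiplier of each dimension (cf. MultArray)

    explicit SpaceSketch(const std::vector<long>& sizes)
        : size(sizes), mult(sizes.size(), 1) {
        // Assumed row-major layout: the last dimension varies fastest.
        for (std::size_t i = sizes.size(); i-- > 1;)
            mult[i - 1] = mult[i] * size[i];
    }

    // Coordinates -> linear index (the role of GetLI).
    long ToLI(const std::vector<long>& si) const {
        assert(si.size() == size.size());
        long li = 0;
        for (std::size_t i = 0; i < si.size(); ++i)
            li += si[i] * mult[i];
        return li;
    }

    // Linear index -> coordinates (the role of GetSI).
    std::vector<long> ToSI(long li) const {
        std::vector<long> si(size.size());
        for (std::size_t i = 0; i < size.size(); ++i) {
            si[i] = li / mult[i];
            li %= mult[i];
        }
        return si;
    }
};
```

Under this assumption, a 2 x 3 x 4 space has multipliers 12, 4 and 1, and the two conversions are inverses of each other for every valid index.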


“Virtual machine” (“Processor system”) class.

class VM : public Space {
	int MType;	// distributed processor system type
			// 0 – net with bus organization, 1 – transputer system
	double TStart;	// Start time of exchange operation
	double TByte;	// Time to send one byte
public:
			// constructor
	VM(vector<long>& ASizeArray, int AMType, double ATStart, double ATByte, 
		double AProcPower);
	~VM();
	double	 getTByte();
	double	getTStart();
	int 	getMType();
 };


“Abstract machine representation” class.

class AMView : public Space  {
public:	 
	VM *VM_Dis;			// Processor system on which the template is mapped
	list<DArray *> AlignArrays;	// List of arrays aligned by the given template
	vector<DistAxis> DistRule;	// Rule by which the template is mapped
					// on the processor system
	vector<long> FillArr;		// Array containing the information about how the processor
					// system is filled with the template elements

	AMView(long ARank, long *ASizeArray);
	AMView(const AMView &); 
	~AMView();

	void DelDA(DArray *RAln_da); 
	void AddDA(DArray *Aln_da); 
	void DisAM(VM *AVM_Dis, long AParamCount, long *AAxisArray,
		long *ADistrParamArray);

	double RDisAM(long AParamCount,  long *AAxisArray, long *ADistrParamArray,
		long ANewSign);
	bool IsDistribute();
}; 
DelDA - removes a DArray from the list of aligned arrays.
AddDA - adds a DArray to the list of aligned arrays.
DisAM - maps the template onto the processor system. The pointer to the processor system, the mapping rule, and the array describing how the processor system is filled with template elements are initialized from the function parameters.
RDisAM - determines the time spent in exchanges when the template mapping is changed (template redistribution). The algorithm is described in 3.2.
IsDistribute - checks whether the template is already distributed on the processor system.


“Distributed array” class.

class DArray : public Space {
private:
	void PrepareAlign(long& TempRank, long *AAxisArray, long *ACoeffArray,
		long *AConstArray, vector<AlignAxis>& IniRule);
	long CheckIndex(long *InitIndexArray, long *LastIndexArray, long *StepArray); 
public:
	long TypeSize;			// Size of one array element in bytes.
	AMView *AM_Dis;			// Template the array is aligned by.
	vector<AlignAxis> AlignRule;	// Align rule.
	int Repl;			// Criterion of fully replicated array. 

	DArray();
	DArray(long ARank, long *ASizeArray, long ATypeSize);
	DArray(const DArray &);
	~DArray();
	DArray & operator= (DArray &x);

	void AlnDA(AMView *APattern, long *AAxisArray, long *ACoeffArray,
		long *AConstArray);
	void AlnDA(DArray *APattern, long *AAxisArray, long *ACoeffArray,
		long *AConstArray);
	double RAlnDA(AMView *APattern, long *AAxisArray, long *ACoeffArray,
		long *AConstArray, long ANewSign);
	double RAlnDA(DArray *APattern, long *AAxisArray, long *ACoeffArray,
		long *AConstArray, long ANewSign);
	friend double ArrayCopy(DArray *AFromArray, long *AFromInitIndexArray, 
		long *AFromLastIndexArray, long *AFromStepArray, DArray *AToArray, 
		long *AToInitIndexArray, long *AToLastIndexArray, long *AToStepArray, 
		long ACopyRegim);
	long GetMapDim(long arrDim, int &dir); 
	bool IsAlign();
};
PrepareAlign initializes the rule by which the array is aligned by the template.
CheckIndex returns the number of elements in the array section given by the function parameters (0 if the section is empty or the array indexes are out of bounds).
AlnDA functions set the position (alignment) of the distributed array and initialize the template pointer. In the first function the template is given directly: the fully replicated array criterion is determined, and the align rule is initialized using the PrepareAlign function. In the second function the template is set indirectly through a distributed array: the criterion is inherited from the array that acts as the mapping template, and the rule is additionally altered according to how that array is aligned (superposition of alignments gives the resulting alignment).
RAlnDA functions determine the time needed for exchanges during array realignment. The algorithm is described in 3.2.
ArrayCopy determines the time needed for exchanges while loading the buffers with remote array elements. The algorithm is described in 3.5.
GetMapDim returns the number of the processor system dimension onto which the array dimension arrDim is ultimately mapped. If the array dimension is replicated along all directions of the processor matrix, 0 is returned. dir is set to 1 or -1 according to the direction in which the array dimension is split.
IsAlign checks whether the array is aligned by a template.
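
The superposition of alignments mentioned for the second AlnDA function can be sketched for a single dimension. This is only an illustration: it assumes each align rule has the linear form A * i + B (as the A and B fields of AlignAxis suggest), and the names LinearRule and Superpose are hypothetical.

```cpp
#include <cassert>

// Hypothetical linear align rule: target = A * index + B.
struct LinearRule {
    long A;
    long B;
    long Apply(long index) const { return A * index + B; }
};

// Superposition of two rules: the inner rule maps the array index onto the
// pattern array, the outer rule maps the pattern array index onto the template.
// outer(inner(i)) = outer.A * (inner.A * i + inner.B) + outer.B
LinearRule Superpose(const LinearRule& outer, const LinearRule& inner) {
    return LinearRule{outer.A * inner.A, outer.A * inner.B + outer.B};
}
```

The composed rule gives the same template index as applying the two rules one after the other, so only one rule per dimension has to be stored for the aligned array.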


“Bound group” class.

class BoundGroup  {
	AMView *amPtr;		// Template by which the arrays with bounds in the group
				// are aligned 
	CommCost boundCost; 	// Processor exchange information.
public:
	BoundGroup();
	virtual ~BoundGroup();
	void AddBound(DArray *ADArray, long *ALeftBSizeArray, 
		long *ARightBSizeArray, long ACornerSign);
	double StartB();
};
AddBound includes the bound of a distributed array in the bound group. The algorithm is described in 3.3.
StartB determines the time spent exchanging the bounds of the distributed arrays included in the group. The algorithm is described in 3.3.


“Reduction variable” class.

class RedVar  {
public:
	long RedElmSize; 	// Size of one element of the reduction variable-array in bytes
	long RedArrLength; 	// Number of elements in the reduction variable-array
	long LocElmSize;   	// Size of one element of the array with auxiliary information 

	RedVar(long ARedElmSize, long ARedArrLength, long ALocElmSize);
	RedVar();
	virtual ~RedVar();
	long GetSize();
};
GetSize returns the size of the reduction variable and of the array with auxiliary information in bytes.
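
A minimal sketch of how such a size could be computed is given below. It assumes that RedElmSize and LocElmSize are per-element sizes and that one auxiliary element accompanies each element of the reduction variable-array; neither the formula nor the name RedVarSketch comes from the Predictor source.

```cpp
#include <cassert>

// Hypothetical mirror of the RedVar fields; not the Predictor's code.
struct RedVarSketch {
    long RedElmSize;    // bytes per element of the reduction variable-array
    long RedArrLength;  // number of elements in the reduction variable-array
    long LocElmSize;    // bytes per element of the auxiliary array

    // Assumption: one auxiliary element accompanies each reduction element.
    long GetSize() const { return RedArrLength * (RedElmSize + LocElmSize); }
};
```

For example, ten 8-byte elements with 4 bytes of auxiliary information each would occupy 120 bytes under this assumption.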


“Reduction group” class.

class RedGroup  {
public:
	VM *vmPtr;		    // Pointer to the processor system
	vector<RedVar *> redVars;   // Array of reduction variables
	long TotalSize;		    // Total size of reduction variables in the group with their
				    // auxiliary information, in bytes
	long CentralProc;	    // Linear index of the geometrical center of the processor
				    //system

	RedGroup(VM *AvmPtr);
	virtual ~RedGroup();	
	void AddRV(RedVar *ARedVar); 
	double StartR(DArray *APattern, long ALoopRank, long *AAxisArray);
};
AddRV includes a reduction variable in the reduction group. The algorithm is described in 3.4.
StartR returns the time spent in exchanges during the reduction operation. The algorithm is described in 3.4.


“Distribution of the array dimension” class.

class DistAxis  {
public:
	long Attr;     // Distribution type
	long Axis;     // Number of the template dimension 
	long PAxis;    // Number of the processor system dimension

	DistAxis(long AAttr, long AAxis, long APAxis);
	DistAxis();
	virtual ~DistAxis();
	DistAxis& operator= (const DistAxis&);
};

“Alignment of the distributed array by the template” class.

class AlignAxis  {
public:
	long Attr; 	// Distribution type
	long Axis;	// Number of the array dimension
	long TAxis;	// Number of the template dimension
	long A;		// Coefficient for the index variable of the array in the linear align rule of
			//the TAxis template dimension
	long B;		// Constant of the linear align rule for the TAxis template dimension
	long Bound;	// Dimension size of the array that acts as a template
			//during the partial replication of the array being aligned

	AlignAxis(long AAttr, long AAxis, long ATAxis,  long AA = 0, long AB = 0, long ABound = 0);
	AlignAxis();
	virtual ~AlignAxis();
	AlignAxis& operator= (const AlignAxis&); 

};

“Shadow edge by one distributed array dimension” class.

class DimBound  {
public:
	long arrDim;		// Array dimension number
	long vmDim;		// Processor system dimension number
	int dir;		// 1 or –1 according to the break-down direction of the array dimension
	long LeftBSize;		// Width of the left bound for the arrDim array dimension
	long RightBSize;	// Width of the right bound for the arrDim array dimension

	DimBound(long AarrDim, long AvmDim, int Adir, long ALeftBSize,  long ARightBSize);
	DimBound();
	virtual ~DimBound();
};

“Array section” class.

class Block {
	vector<LS> LSDim;   // Vector containing the corresponding linear segments for every array
			    // dimension, that describe the section
public: 
	Block(vector<LS> &v); 
	Block(DArray *da, long ProcLI);
	Block();
	virtual ~Block();
	Block & operator =(const Block & x);

	long GetRank();
	long GetBlockSize(); 
	long GetBlockSizeMult(long dim); 
	long GetBlockSizeMult2(long dim1, long dim2);
	bool IsLeft(long arrDim, long elem); 
	bool IsRight(long arrDim, long elem);
	bool IsBoundIn(long *ALeftBSizeArray,long *ARightBSizeArray);
	bool empty();
	friend Block operator^ (Block &x, Block &y); 
};
Block(DArray *da, long ProcLI) creates the section of array da located on the processor with linear index ProcLI.
GetRank returns the rank of the section.
GetBlockSize returns the number of elements in the section.
GetBlockSizeMult and GetBlockSizeMult2 return the product of the section sizes over all dimensions except those given in the function call.
IsLeft and IsRight check whether the element elem lies to the left (right) of the section in dimension arrDim.
IsBoundIn checks whether the distributed array bound lies within the given section.
empty checks whether the section has no elements.
operator^ returns the intersection of the sections given in the function call.


“Linear segment” class.

class LS  {
public:	
	long Lower; 	 // Lower index value
	long Upper;	 // Upper index value

	LS(long ALower, long AUpper);
	LS();
	virtual ~LS();

	long GetLSSize(); 
	void transform(long A, long B, long daDimSize);
	bool IsLeft(long elem);
	bool IsRight(long elem);
	bool IsBoundIn(long ALeftBSize, long ARightBSize); 
	bool empty();
	LS operator^ (LS &x); 
};
GetLSSize returns the size of the linear segment.
transform converts a linear segment of the template into the linear segment of a distributed array aligned by that template.
IsLeft and IsRight check whether the element elem lies to the left (right) of the segment.
IsBoundIn checks whether the given bound lies within the segment.
empty checks whether the segment has no elements.
operator^ returns the intersection of two segments.
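
The intersection of linear segments and of array sections can be sketched as follows. The sketch assumes that a segment is the closed range [Lower, Upper] and that it is empty when Lower exceeds Upper; the names Segment, Intersect, IntersectSections and SectionSize are hypothetical and only mirror the roles of LS, the two operator^ functions and GetBlockSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical linear segment [Lower, Upper]; empty when Lower > Upper.
struct Segment {
    long Lower;
    long Upper;
    bool Empty() const { return Lower > Upper; }
    long Size() const { return Empty() ? 0 : Upper - Lower + 1; }
};

// Intersection of two segments (the role of LS::operator^).
Segment Intersect(const Segment& x, const Segment& y) {
    return Segment{std::max(x.Lower, y.Lower), std::min(x.Upper, y.Upper)};
}

// A section holds one segment per dimension; two sections of the same rank
// are intersected dimension by dimension (the role of Block's operator^).
std::vector<Segment> IntersectSections(const std::vector<Segment>& x,
                                       const std::vector<Segment>& y) {
    assert(x.size() == y.size());
    std::vector<Segment> result;
    for (std::size_t i = 0; i < x.size(); ++i)
        result.push_back(Intersect(x[i], y[i]));
    return result;
}

// Number of elements in a section: the product of its segment sizes
// (0 for a section without dimensions).
long SectionSize(const std::vector<Segment>& section) {
    long n = 1;
    for (const Segment& s : section) n *= s.Size();
    return section.empty() ? 0 : n;
}
```

A section is empty as soon as one of its segments is empty, which is why the product of the segment sizes is then zero.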


“Evaluation of interprocessor exchanges” class.

class CommCost {
public:
	Dim2Array transfer;   	// Array that contains the information about the number of bytes 
				// transferred between two processors
	VM *vm;  		// Pointer to the processor system

	CommCost(VM *Avm);
	CommCost();
	virtual ~CommCost();
	CommCost & operator =(const CommCost &);

	double GetCost();  
	void Update(DArray *oldDA, DArray *newDA);
 	void BoundUpdate(DArray *daPtr, vector<DimBound> & dimInfo, bool IsConer); 
	void CopyUpdate(DArray *FromArray, Block & readBlock);
};
GetCost returns the time spent in interprocessor exchanges inside the system. The algorithm is described in 3.2.
Update alters the transfer array according to the exchanges between processors that occur during redistribution of the array. The algorithm is described in 3.2.
BoundUpdate alters the transfer array according to the transfers that occur during the exchange of the given distributed array bound. The algorithm is described in 3.3.
CopyUpdate alters the transfer array according to the exchanges that occur when the readBlock section of FromArray is replicated on all processors.
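
The idea of accumulating transfers in a two-dimensional array and then converting them into time can be sketched as follows. The cost rule is an assumption made for illustration only: messages are taken to be sent one after another (as on a bus), and each non-empty message is taken to cost TStart + bytes * TByte. The actual algorithm is the one described in 3.2, which is not reproduced here, and the names CommCostSketch and AddTransfer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CommCost; the cost rule below is an assumption.
struct CommCostSketch {
    std::vector<std::vector<long>> transfer;  // transfer[from][to], in bytes
    double TStart;  // start time of an exchange operation
    double TByte;   // time to send one byte

    CommCostSketch(std::size_t procCount, double tStart, double tByte)
        : transfer(procCount, std::vector<long>(procCount, 0)),
          TStart(tStart), TByte(tByte) {}

    // Record that `bytes` bytes travel from processor `from` to processor `to`.
    void AddTransfer(std::size_t from, std::size_t to, long bytes) {
        if (from != to) transfer[from][to] += bytes;  // local copies are ignored
    }

    // Assumed rule for a bus: messages are sent one after another, and
    // each non-empty message costs TStart + bytes * TByte.
    double GetCost() const {
        double cost = 0.0;
        for (const auto& row : transfer)
            for (long bytes : row)
                if (bytes > 0) cost += TStart + bytes * TByte;
        return cost;
    }
};
```

Separating the bookkeeping (which processor sends how many bytes to which) from the cost rule lets several update steps contribute to one array before a single cost evaluation.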

 

Appendix 3. Main functions of time extrapolation

Constructor of the “Virtual machine” object

VM::VM( vector<long> ASizeArray, int AMType, double ATStart, double ATByte, double AProcPower );

ASizeArray - vector; the element in the i-th position is the size of the given processor system in dimension i+1 (0 ≤ i ≤ ARank - 1).
AMType - type of the distributed processor system (0 – net with bus organization, 1 – transputer system).
ATStart - start time of the exchange operation.
ATByte - time to send one byte.
AProcPower - relative processor power.
     

Constructor of the “Abstract machine representation” object

AMView::AMView( vector< long> ASizeArray );

ASizeArray - vector; the element in the i-th position is the size of the template in dimension i+1 (0 ≤ i ≤ ARank - 1).


Template mapping

void AMView::DisAM (VM *AVM_Dis, vector<long> AAxisArray,
                                        vector<long> *ADistrParamArray );

AVM_Dis - pointer to the processor system on which the template is mapped.
AAxisArray - vector; the element in the j-th position is the number of the template dimension used in the mapping rule for the (j+1)-th processor system dimension.
ADistrParamArray - ignored. Only two mapping rules are provided (see the Lib-DVM documentation); in the first rule the block size is calculated rather than taken from ADistrParamArray.
   

Task of redistributing the template on the processor system and evaluating the time of the redistribution.

double AMView::RDisAM( vector<long> AAxisArray,
                                              vector<long> ADistrParamArray, long ANewSign );

AAxisArray - vector; the element in the j-th position is the number of the template dimension used in the mapping rule for the (j+1)-th processor system dimension.
ADistrParamArray - ignored. Only two mapping rules are provided (see the Lib-DVM documentation); in the first rule the block size is calculated rather than taken from ADistrParamArray.
ANewSign - flag of updating the contents of the redistributed arrays; active if the value is 1.
     

Constructor of the “Distributed array” object

DArray::DArray( vector<long> ASizeArray, vector<long> AlowShdWidthArray,
                               vector<long> AhiShdWidthArray, long ATypeSize );

ASizeArray - vector; the element in the i-th position contains the size of the array being created in dimension i+1 (0 ≤ i ≤ ARank - 1).
AlowShdWidthArray - vector; the element in the i-th position contains the width of the left boundary in dimension i+1.
AhiShdWidthArray - vector; the element in the i-th position contains the width of the right boundary in dimension i+1.
ATypeSize - size of one array element in bytes.
     

Distributed array alignment

void DArray::AlnDA(AMView *APattern, vector<long> AAxisArray,
                                    vector<long> ACoeffArray, vector<long> AConstArray );

void DArray::AlnDA(DArray *APattern, vector<long> AAxisArray,
                                    vector<long> ACoeffArray, vector<long> AConstArray );

APattern - pointer to the align pattern.
AAxisArray - vector; the element in the j-th position contains the number of the index variable (number of the dimension) of the distributed array for the linear align rule of the (j+1)-th pattern dimension.
ACoeffArray - vector; the element in the j-th position contains the coefficient for the index variable of the distributed array in the linear align rule of the (j+1)-th pattern dimension.
AConstArray - vector; the element in the j-th position contains the constant for the linear align rule of the (j+1)-th pattern dimension.
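
How the three vectors define a linear align rule per pattern dimension can be illustrated by a minimal sketch. It assumes that AAxisArray holds 1-based dimension numbers and handles only positive entries; any special values that Lib-DVM may define for this array are not modeled. The function name MapToPattern is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each pattern dimension j+1, compute
//   patternIndex[j] = ACoeffArray[j] * arrayIndex[AAxisArray[j] - 1] + AConstArray[j].
// Assumption: AAxisArray holds 1-based array dimension numbers, all positive.
std::vector<long> MapToPattern(const std::vector<long>& arrayIndex,
                               const std::vector<long>& AAxisArray,
                               const std::vector<long>& ACoeffArray,
                               const std::vector<long>& AConstArray) {
    assert(AAxisArray.size() == ACoeffArray.size());
    assert(AAxisArray.size() == AConstArray.size());
    std::vector<long> patternIndex(AAxisArray.size());
    for (std::size_t j = 0; j < AAxisArray.size(); ++j) {
        long axis = AAxisArray[j];
        assert(axis >= 1 && static_cast<std::size_t>(axis) <= arrayIndex.size());
        patternIndex[j] = ACoeffArray[j] * arrayIndex[axis - 1] + AConstArray[j];
    }
    return patternIndex;
}
```

With AAxisArray = {2, 1}, ACoeffArray = {1, 2} and AConstArray = {0, 1}, the array element (3, 5) corresponds to the pattern element (5, 7) under these assumptions.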
     

Realignment of the distributed array. Evaluation of the time needed to perform this operation.

double DArray::RAlnDA( AMView *APattern, vector<long> AAxisArray,
                                            vector<long> ACoeffArray, vector<long> AConstArray,
                                            long ANewSign );

double DArray::RAlnDA( DArray *APattern, vector<long> AAxisArray,
                                            vector<long> ACoeffArray, vector<long> AConstArray,
                                            long ANewSign );

APattern - pointer to the align pattern (array or template).
AAxisArray - vector; the element in the j-th position contains the number of the index variable (number of the dimension) of the distributed array for the linear align rule of the (j+1)-th pattern dimension.
ACoeffArray - vector; the element in the j-th position contains the coefficient for the index variable of the distributed array in the linear align rule of the (j+1)-th pattern dimension.
AConstArray - vector; the element in the j-th position contains the constant for the linear align rule of the (j+1)-th pattern dimension.
ANewSign - flag of updating the contents of the redistributed array; active if the value is 1.

The function returns the time of the array realignment.


Constructor of the “Parallel loop” object.

ParLoop::ParLoop( long ARank );

ARank - rank of the parallel loop.
     

Creation of the parallel loop.

void ParLoop::MapPL( AMView *APattern, vector<long> AAxisArray,
                                         vector<long> ACoeffArray, vector<long> AConstArray,
                                         vector<long> AInitIndexArray,
                                         vector<long> ALastIndexArray, vector<long> AStepArray );

void ParLoop::MapPL( DArray *APattern, vector<long> AAxisArray,
                                         vector<long> ACoeffArray, vector<long>AConstArray, vector<long>AInitIndexArray,
                                         vector<long> ALastIndexArray, vector<long>AStepArray );

APattern - pointer to the parallel loop pattern.
AAxisArray - vector; the element in the j-th position contains the number of the index variable (number of the dimension) of the parallel loop for the linear align rule of the (j+1)-th pattern dimension.
ACoeffArray - vector; the element in the j-th position contains the coefficient for the index variable of the parallel loop in the linear align rule of the (j+1)-th pattern dimension.
AConstArray - vector; the element in the j-th position contains the constant for the linear align rule of the (j+1)-th pattern dimension.
AInitIndexArray - vector; the element in the i-th position contains the start value for the index variable of the (i+1)-th parallel loop dimension.
ALastIndexArray - vector; the element in the i-th position contains the end value for the index variable of the (i+1)-th parallel loop dimension.
AStepArray - vector; the element in the i-th position contains the step value for the index variable of the (i+1)-th parallel loop dimension.
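
The start, end and step vectors together determine how many iterations the loop has. A minimal sketch of that arithmetic follows; it assumes inclusive end values and positive steps, and the function names CountInDim and CountIterations are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of values taken by one index variable running from init to last
// (inclusive) with a positive step; 0 when the range is empty.
long CountInDim(long init, long last, long step) {
    assert(step > 0);  // negative steps are not modeled in this sketch
    return last < init ? 0 : (last - init) / step + 1;
}

// Total number of iterations of a multidimensional loop: the product of
// the per-dimension counts.
long CountIterations(const std::vector<long>& AInitIndexArray,
                     const std::vector<long>& ALastIndexArray,
                     const std::vector<long>& AStepArray) {
    assert(AInitIndexArray.size() == ALastIndexArray.size());
    assert(AInitIndexArray.size() == AStepArray.size());
    long total = 1;
    for (std::size_t i = 0; i < AInitIndexArray.size(); ++i)
        total *= CountInDim(AInitIndexArray[i], ALastIndexArray[i], AStepArray[i]);
    return total;
}
```

For a two-dimensional loop running over 0..7 with step 1 and 1..10 with step 3, the counts are 8 and 4, giving 32 iterations.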
     

Parallel loop mapping.

void ParLoop::ExFirst( ParLoop *AParLoop, BoundGroup *ABoundGroup);

AParLoop - pointer to the parallel loop.
ABoundGroup - pointer to the group of bounds that must be exchanged after calculating the exported elements of the local parts of the distributed arrays.
     

Set flag of changed order of loop iteration execution.

void ParLoop::ImLast( ParLoop *AParLoop, BoundGroup *ABoundGroup);

AParLoop - pointer to the parallel loop.
ABoundGroup - pointer to the group of bounds that must be exchanged after calculating the exported elements of the local parts of the distributed arrays. The function sets the flag of changed order of loop iteration execution.
     

Calculation of the time spent for exchanges while loading buffers with remote array elements

double ArrayCopy( DArray *AFromArray, vector<long> AFromInitIndexArray,
                                 vector<long> AFromLastIndexArray,
                                 vector<long> AFromStepArray, DArray *AToArray,
                                 vector<long> AToInitIndexArray,
                                 vector<long> AToLastIndexArray,
                                 vector<long> AToStepArray, long ACopyRegim );

AFromArray - pointer to the distributed array under reading.
AFromInitIndexArray - vector; the element in the i-th position contains the start value for the index variable of the (i+1)-th dimension of the array under reading.
AFromLastIndexArray - vector; the element in the i-th position contains the end value for the index variable of the (i+1)-th dimension of the array under reading.
AFromStepArray - vector; the element in the i-th position contains the step value for the index variable of the (i+1)-th dimension of the array under reading.
AToArray - header of the written distributed array.
AToInitIndexArray - vector; the element in the j-th position contains the start value for the index variable of the (j+1)-th dimension of the written distributed array.
AToLastIndexArray - vector; the element in the j-th position contains the end value for the index variable of the (j+1)-th dimension of the written distributed array.
AToStepArray - vector; the element in the j-th position contains the step value for the index variable of the (j+1)-th dimension of the written distributed array.
ACopyRegim - copy mode.

The function returns the required time.


Constructor of the “Edge group” object

BoundGroup::BoundGroup( );

Creation of the edge group. An empty edge group is created (does not contain any edge).


Add the edges of an array to the group.

void BoundGroup::AddBound( DArray *ADArray, vector<long> ALeftBSizeArray,
                                                    vector<long> ARightBSizeArray, long ACornerSign);

ADArray - pointer to the distributed array.
ALeftBSizeArray - vector; the element in the i-th position contains the width of the low edge of the (i+1)-th dimension of the array.
ARightBSizeArray - vector; the element in the i-th position contains the width of the high edge of the (i+1)-th dimension of the array.
ACornerSign - flag of including corner elements in the edge.
     

Calculation of the time spent for exchanges of distributed array edges included in the group.

double BoundGroup::StartB( );

The function returns the required time.


Constructor of the “Reduction variable” object

RedVar::RedVar( long ARedElmSize, long ARedArrLength, long ALocElmSize);

ARedElmSize - size of one element of the reduction variable-array in bytes.
ARedArrLength - number of elements in the reduction variable-array.
ALocElmSize - size of one element of the array with auxiliary information in bytes.
     

Constructor of the “Reduction group” object

RedGroup::RedGroup( VM *AvmPtr );

AvmPtr - pointer to the processor system.
     

Creation of the reduction group. An empty reduction group is created (does not contain any reduction variable).

Add a reduction variable to the reduction group.

void RedGroup::AddRV( RedVar *ARedVar );

ARedVar - pointer to the reduction variable.
     

Calculation of the time spent for exchanges during reduction operation execution.

double RedGroup::StartR( ParLoop *AParLoop );

AParLoop - pointer to the parallel loop in which the values of the reduction variables of the given group are calculated.
     

Appendix 4. Trace fragments and parameters of Lib-DVM functions simulated by Predictor

CREATE AN ABSTRACT MACHINE REPRESENTATION

getamr_ 3.3 revision of pointers to element of abstract machine representation

AMRef getamr_ (AMViewRef *AMViewRefPtr, long IndexArray[]);

*AMViewRefPtr - pointer to the abstract machine representation.
IndexArray - array; the i-th element contains the index value of the requested element (abstract machine) in the (i+1)-th dimension.
call_getamr_	TIME=0.000000 	LINE=6 	FILE=tasks.fdv
AMViewRefPtr=4dff90; AMViewRef=9024c0;
IndexArray[0]=0;  

ret_getamr_	TIME=0.000000 	LINE=6	FILE=tasks.fdv
AMRef=903350;
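
Each trace fragment starts with a call_ or ret_ header line that carries TIME, LINE and FILE fields. A minimal sketch of reading such a header is shown below. It is based only on the fragments quoted in this appendix, not on the Predictor's own trace reader, and the names TraceHeader and ParseTraceHeader are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record for one trace header line.
struct TraceHeader {
    bool isCall = false;   // true for call_..., false for ret_...
    std::string function;  // e.g. "getamr_"
    double time = 0.0;
    long line = 0;
    std::string file;
};

// Parses a line such as "call_getamr_ TIME=0.000000 LINE=6 FILE=tasks.fdv".
// Returns false if the line is not a call_/ret_ header.
bool ParseTraceHeader(const std::string& text, TraceHeader& out) {
    std::istringstream in(text);
    std::string token;
    if (!(in >> token)) return false;
    if (token.compare(0, 5, "call_") == 0) {
        out.isCall = true;
        out.function = token.substr(5);
    } else if (token.compare(0, 4, "ret_") == 0) {
        out.isCall = false;
        out.function = token.substr(4);
    } else {
        return false;
    }
    // The remaining tokens are KEY=value pairs separated by spaces or tabs.
    while (in >> token) {
        std::string::size_type eq = token.find('=');
        if (eq == std::string::npos) continue;
        std::string key = token.substr(0, eq);
        std::string value = token.substr(eq + 1);
        if (key == "TIME") out.time = std::stod(value);
        else if (key == "LINE") out.line = std::stol(value);
        else if (key == "FILE") out.file = value;
    }
    return true;
}
```

Lines that do not start with call_ or ret_, such as the parameter lines that follow a header, are rejected and can be handed to a separate parser.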

MULTIPROCESSOR SYSTEMS

genblk_ Weights of multiprocessor system elements

long genblk_(PSRef *PSRefPtr, AMViewRef *AMViewRefPtr,
                      AddrType AxisWeightAddr[], long *AxisCountPtr,
                      long *DoubleSignPtr );

*PSRefPtr - pointer to the multiprocessor system; weights are set for the elements of this system.
*AMViewRefPtr - pointer to the representation of multiprocessor system; the weights of coordinates will be used while mapping the multiprocessor system on the given processor system.
AxisWeightAddr[] - weights of processor coordinates, defined for each dimension of the processor system.
*AxisCountPtr - number of elements in the AxisWeightAddr array (nonnegative).
*DoubleSignPtr - flag; a non-zero value means that the processor coordinate weights are represented as positive real numbers (double).
call_genblk_     TIME=0.000000 LINE=7      FILE=gausgb.fdv
PSRefPtr=4d4c48; PSRef=8417d0; AMViewRefPtr=4d4c60; AMViewRef=842860; AxisCount=1; DoubleSign=0
AxisWeightAddr[0][0] = 3

ret_genblk_      TIME=0.000000 LINE=7      FILE=gausgb.fdv

crtps_ 4.2 Create subsystem of the given multiprocessor system

PSRef crtps_ (PSRef *PSRefPtr, long InitIndexArray[], long LastIndexArray[],
                         long *StaticSignPtr);

*PSRefPtr - pointer to the source processor system whose subsystem is to be created.
InitIndexArray - array; the i-th element contains the start value of the source processor system index in the (i+1)-th dimension.
LastIndexArray - array; the i-th element contains the end value of the source processor system index in the (i+1)-th dimension.
*StaticSignPtr - flag of static subsystem creation.
call_crtps_	TIME=0.000000	LINE=15	FILE=tasks.fdv
PSRefPtr=4ded68; PSRef=902450; StaticSign=0;
InitIndexArray[0]=0;  
LastIndexArray[0]=0;  

     SizeArray[0]=1;
     CoordWeight[0]= 1.00(1.00)  
ret_crtps_	TIME=0.000000 	LINE=15	FILE=tasks.fdv
PSRef=903950;

psview_ 4.3 Reconfiguration of multiprocessor system

PSRef psview_ (PSRef *PSRefPtr, long *RankPtr, long SizeArray[],
                            long *StaticSignPtr);

*PSRefPtr - pointer to the source processor system to be reconfigured.
*RankPtr - rank of the resulting processor system.
SizeArray - array; the i-th element contains the size of the resulting processor system in the (i+1)-th dimension.
*StaticSignPtr - flag of static resulting processor system.
call_psview_	TIME=0.000000 	LINE=6	FILE=tasks.fdv
PSRefPtr=4dff84; PSRef=901330; Rank=1; StaticSign=0;
SizeArray[0]=1;  

     SizeArray[0]=1;
     CoordWeight[0]= 1.00(1.00)  
ret_psview_	TIME=0.000000 LINE=6      FILE=tasks.fdv
PSRef=902450;
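
The parameter lines of a trace fragment are lists of Name=value items separated by semicolons. The following minimal sketch splits such a line into a name-to-value map. The format is inferred only from the fragments quoted in this appendix, and the names Trim and ParseParams are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Removes leading and trailing spaces and tabs.
std::string Trim(const std::string& s) {
    const char* ws = " \t";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Splits "Name=value; Name=value;" into a name -> value map.
// Values are kept as text: some (e.g. 4dff84) look like hexadecimal
// addresses without a prefix, so numeric conversion is left to the caller.
std::map<std::string, std::string> ParseParams(const std::string& text) {
    std::map<std::string, std::string> params;
    std::istringstream in(text);
    std::string item;
    while (std::getline(in, item, ';')) {
        std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) continue;  // items without '=', e.g. "rf_MAX"
        params[Trim(item.substr(0, eq))] = Trim(item.substr(eq + 1));
    }
    return params;
}
```

Array items such as SizeArray[0]=1 are stored under the full name including the index, so a caller can group them afterwards if needed.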

MAPPING DISTRIBUTED ARRAY

getamv_ 7.8 revision of pointer to abstract machine representation, which the given distributed array is mapped on

AMViewRef getamv_ (long * ArrayHeader);

ArrayHeader - header of the distributed array.
call_getamv_	TIME=0.000000	LINE=16	FILE=tasks.fdv
ArrayHeader=4dfee8; ArrayHandlePtr=903530;
ret_getamv_	TIME=0.000000	LINE=16	FILE=tasks.fdv
AMViewRef=0;

PROGRAM AS AN AGGREGATE OF SUBTASKS EXECUTED IN PARALLEL

mapam_ 10.1 Mapping an abstract machine (create subtask)

long mapam_ (AMRef *AMRefPtr, PSRef *PSRefPtr );

*AMRefPtr - pointer to the abstract machine to be mapped.
*PSRefPtr - pointer to the processor subsystem that determines the processors allocated for the abstract machine (the domain in which the created subtask executes).
call_mapam_	TIME=0.000000 	LINE=51     	FILE=tsk_ra.cdv
AMRefPtr=4b3cc0; AMRef=823210; PSRefPtr=4b3ec4; PSRef=8231a0;
ret_mapam_	TIME=0.000000 	LINE=51     	FILE=tsk_ra.cdv

runam_ 10.2 Start of the subtask execution (activation, start)

long runam_ (AMRef *AMRefPtr);

*AMRefPtr - pointer to the abstract machine of the started subtask.
call_runam_ 	 TIME=0.000000 	LINE=102   	 FILE=tsk_ra.cdv
AMRefPtr=4b3cc0; AMRef=823210;
ret_runam_	 TIME=0.000000 	LINE=102    	FILE=tsk_ra.cdv

stopam_ 10.3 End of execution of the current subtask (stop)

long stopam_ (void);

call_stopam_      TIME=0.000000 LINE=104    FILE=tsk_ra.cdv
ret_stopam_       TIME=0.000000 LINE=104    FILE=tsk_ra.cdv

REDUCTION

strtrd_ 11.5 Start of reduction group

long strtrd_ (RedGroupRef *RedGroupRefPtr);

*RedGroupRefPtr - pointer to the reduction group.
call_strtrd_      TIME=0.000000 LINE=129    FILE=tsk_ra.cdv
RedGroupRefPtr=6ffcdc; RedGroupRef=8291f0;
rf_MAX;    rt_DOUBLE; RVAddr = 6ffd24; RVVal = 7.000000
ret_strtrd_       TIME=0.000000 LINE=129    FILE=tsk_ra.cdv

waitrd_ 11.6 Waiting for the reduction completion

long waitrd_ (RedGroupRef *RedGroupRefPtr);

*RedGroupRefPtr - pointer to the reduction group.
call_waitrd_      TIME=0.000000 LINE=129    FILE=tsk_ra.cdv
RedGroupRefPtr=6ffcdc; RedGroupRef=8291f0;
rf_MAX;    rt_DOUBLE; RVAddr = 6ffd24; RVVal = 7.000000
rf_MAX;    rt_DOUBLE; RVAddr = 6ffd24; RVVal = 7.000000

ret_waitrd_       TIME=0.000000 LINE=129    FILE=tsk_ra.cdv

DISTRIBUTED ARRAY EDGE EXCHANGE

recvsh_ 12.4 Initialization of receiving imported elements of the given edge group

long recvsh_(ShadowGroupRef *ShadowGroupRefPtr);

*ShadowGroupRefPtr - pointer to the edge group.
call_recvsh_      TIME=0.000000 LINE=20     FILE=sor.fdv
ShadowGroupRefPtr=4cf6b8; ShadowGroupRef=8433c0;

ret_recvsh_       TIME=0.000000 LINE=20     FILE=sor.fdv

sendsh_ 12.5 Initialization of sending exported elements of the given edge group

long sendsh_(ShadowGroupRef *ShadowGroupRefPtr);

*ShadowGroupRefPtr - pointer to the edge group.
call_sendsh_      TIME=0.000000 LINE=29     FILE=sor.fdv
ShadowGroupRefPtr=4cf6b8; ShadowGroupRef=8433c0;
 
ret_sendsh_       TIME=0.000000 LINE=29     FILE=sor.fdv

REGULAR ACCESS TO REMOTE DATA

crtrbl_ 14.1 Create buffer of distributed array remote elements

long crtrbl_(long RemArrayHeader[], long BufferHeader[], void *BasePtr,
                     long *StaticSignPtr, LoopRef *LoopRefPtr, long AxisArray[],
                     long CoeffArray[], long ConstArray[]);

RemArrayHeader header of remote distributed array.
BufferHeader header of the buffer for remote elements.
BasePtr base pointer for access to the buffer of remote elements.
*StaticSignPtr flag of static buffer creation.
*LoopRefPtr pointer to the parallel loop, where remote array elements from the buffer are required.
AxisArray array whose i-th element contains the number of the parallel loop dimension (k(i+1)) that corresponds to the (i+1)-th dimension of the remote array.
CoeffArray array whose i-th element contains the coefficient A(i+1) of the index variable in the linear retrieval rule for the (i+1)-th dimension of the remote array.
ConstArray array whose i-th element contains the constant B(i+1) of the linear retrieval rule for the (i+1)-th dimension of the remote array.
call_crtrbl_	TIME=0.000000 	LINE=45	FILE=tasks.fdv
RemArrayHeader=4dfd2c; RemArrayHandlePtr=9057c0; BufferHeader=4dfd48;
BasePtr=4e1200; StaticSign=1; LoopRefPtr=4dffd0; LoopRef=906b70;
 AxisArray[0]=1;  AxisArray[1]=0;  
CoeffArray[0]=1; CoeffArray[1]=0;  
ConstArray[0]=-1; ConstArray[1]=1;  

           SizeArray[0]=8;  
      LowShdWidthArray[0]=0;  
     HiShdWidthArray[0]=0;  

     Local[0]: Lower=0 Upper=7 Size=8 Step=1
      
ret_crtrbl_	TIME=0.000000 	LINE=45	FILE=tasks.fdv
BufferHandlePtr=906e70; IsLocal=1
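
The three rule arrays of crtrbl_ describe, for each dimension of the remote array, a linear retrieval rule of the form A(i+1) * I + B(i+1), where I is the index variable of the loop dimension named by AxisArray[i]. The sketch below evaluates this rule for one loop iteration. The function name remoteIndex is hypothetical, and treating AxisArray[i] = 0 as "no loop dimension, the index is the constant ConstArray[i]" is an assumption suggested by the trace above, not a statement of the Lib-DVM convention.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Remote array index in every dimension for one loop iteration:
//   index(i+1) = CoeffArray[i] * I(AxisArray[i]) + ConstArray[i]
// loopIndex[k-1] holds the current index of loop dimension k (1-based).
// Assumption: AxisArray[i] == 0 selects the constant ConstArray[i].
std::vector<long> remoteIndex(const std::vector<long>& loopIndex,
                              const std::vector<long>& axisArray,
                              const std::vector<long>& coeffArray,
                              const std::vector<long>& constArray) {
    if (axisArray.size() != coeffArray.size() ||
        axisArray.size() != constArray.size())
        throw std::invalid_argument("rule arrays must have equal length");

    std::vector<long> result(axisArray.size());
    for (std::size_t i = 0; i < axisArray.size(); ++i) {
        const long axis = axisArray[i];
        if (axis == 0) {
            result[i] = constArray[i];
        } else if (axis >= 1 &&
                   static_cast<std::size_t>(axis) <= loopIndex.size()) {
            result[i] = coeffArray[i] * loopIndex[axis - 1] + constArray[i];
        } else {
            throw std::out_of_range("loop dimension number not supported");
        }
    }
    return result;
}
```

With the values of the trace above (AxisArray = {1, 0}, CoeffArray = {1, 0}, ConstArray = {-1, 1}), a loop index of 1 maps to element (0, 1) and a loop index of 8 maps to element (7, 1). If the loop ran from 1 to 8, which is an assumption, this would agree with the FromInitIndexArray and FromLastIndexArray values in the loadrb_ trace.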

loadrb_ 14.2 Start of loading the buffer of distributed array remote elements

long loadrb_ (long BufferHeader[], long *RenewSignPtr);

BufferHeader header of remote element buffer.
*RenewSignPtr flag of repeated reloading of the buffer, which has already been loaded.
call_loadrb_	TIME=0.000000	LINE=45	FILE=tasks.fdv
BufferHeader=4dfd48; BufferHandlePtr=906e70; RenewSign=0;
     FromInitIndexArray[0]=0; FromInitIndexArray[1]=1;  
     FromLastIndexArray[0]=7; FromLastIndexArray[1]=1;  
          FromStepArray[0]=1;      FromStepArray[1]=1;  
       ToInitIndexArray[0]=0;  
       ToLastIndexArray[0]=7;  
            ToStepArray[0]=1;  

     ResInitIndexArray[0]=0; ResInitIndexArray[1]=1;  
     ResLastIndexArray[0]=7; ResLastIndexArray[1]=1;  
          ResStepArray[0]=1;      ResStepArray[1]=1;  

     ResInitIndexArray[0]=0;  
     ResLastIndexArray[0]=7;  
          ResStepArray[0]=1;  

ret_loadrb_	TIME=0.000000 	LINE=45	FILE=tasks.fdv
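
The loadrb_ trace describes each transferred section by three arrays per side: initial index, last index and step for every dimension. A quantity that can be derived from such a triple is the number of elements in the section. The sketch below computes it; the function name sectionSize is hypothetical, and reading the From arrays as the section of the remote array and the To arrays as the section of the buffer is an assumption based on their names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Number of elements in a regular section given, for every dimension,
// the initial index, the last index and the step.
// A section of rank 0 is counted as a single element.
long sectionSize(const std::vector<long>& init,
                 const std::vector<long>& last,
                 const std::vector<long>& step) {
    if (init.size() != last.size() || init.size() != step.size())
        throw std::invalid_argument("index arrays must have equal length");

    long count = 1;
    for (std::size_t i = 0; i < init.size(); ++i) {
        if (step[i] <= 0)
            throw std::invalid_argument("step must be positive");
        if (last[i] < init[i])
            return 0;  // empty section
        count *= (last[i] - init[i]) / step[i] + 1;
    }
    return count;
}
```

For the trace above, the From section (0..7 by 1 in the first dimension, 1..1 in the second) and the To section (0..7 by 1) both contain 8 elements.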

waitrb_ 14.3 Waiting for completion of loading buffer of distributed array remote elements

long waitrb_ (long BufferHeader[]);

BufferHeader header of remote element buffer.
call_waitrb_	TIME=0.000000 	LINE=45	FILE=tasks.fdv
BufferHeader=4dfd48; BufferHandlePtr=906e70;

ret_waitrb_	TIME=0.000000 	LINE=45	FILE=tasks.fdv

crtbg_ 14.6 Create group of remote element buffers

RegularAccessGroupRef crtbg_(long *StaticSignPtr, long *DelBufSignPtr);

*StaticSignPtr flag of static buffer group creation.
*DelBufSignPtr flag of deleting all buffers of the group when the group itself is deleted.
call_crtbg_	TIME=0.000000 	LINE=43	FILE=tasks.fdv
StaticSign=0; DelBufSign=1;
ret_crtbg_	TIME=0.000000 	LINE=43	FILE=tasks.fdv
RegularAccessGroupRef=906310;

insrb_    Insert remote element buffer in the group

long insrb_(RegularAccessGroupRef *RegularAccessGroupRefPtr, long BufferHeader[]);

*RegularAccessGroupRefPtr pointer to the buffer group.
BufferHeader header of the buffer to be inserted.
call_insrb_	TIME=0.000000 	LINE=45	FILE=tasks.fdv
RegularAccessGroupRefPtr=4e1210; RegularAccessGroupRef=906310; BufferHeader=4dfd48; BufferHeader[0]=906e70
ret_insrb_	TIME=0.000000 	LINE=45	FILE=tasks.fdv

loadbg_    Start of loading remote element buffers of the given group

long loadbg_(RegularAccessGroupRef *RegularAccessGroupRefPtr, long *RenewSignPtr);

*RegularAccessGroupRefPtr pointer to the buffer group.
*RenewSignPtr flag of repeated reloading of the buffer group, which has already been loaded.
call_loadbg_	TIME=0.000000 	LINE=43	FILE=tasks.fdv
RegularAccessGroupRefPtr=4e1210; RegularAccessGroupRef=906310; RenewSign=1
          FromInitIndexArray[0]=0; FromInitIndexArray[1]=1;  
          FromLastIndexArray[0]=7; FromLastIndexArray[1]=1;  
               FromStepArray[0]=1;      FromStepArray[1]=1;  
            ToInitIndexArray[0]=0;  
            ToLastIndexArray[0]=7;  
                 ToStepArray[0]=1;  
           
          ResInitIndexArray[0]=0; ResInitIndexArray[1]=1;  
          ResLastIndexArray[0]=7; ResLastIndexArray[1]=1;  
               ResStepArray[0]=1;      ResStepArray[1]=1;  
           
          ResInitIndexArray[0]=0;  
          ResLastIndexArray[0]=7;  
               ResStepArray[0]=1;  
           
          FromInitIndexArray[0]=0; FromInitIndexArray[1]=3;  
          FromLastIndexArray[0]=7; FromLastIndexArray[1]=3;  
               FromStepArray[0]=1;      FromStepArray[1]=1;  
            ToInitIndexArray[0]=0;  
            ToLastIndexArray[0]=7;  
                 ToStepArray[0]=1;  
           
          ResInitIndexArray[0]=0; ResInitIndexArray[1]=3;  
          ResLastIndexArray[0]=7; ResLastIndexArray[1]=3;  
               ResStepArray[0]=1;      ResStepArray[1]=1;  
           
          ResInitIndexArray[0]=0;  
          ResLastIndexArray[0]=7;  
               ResStepArray[0]=1;  
           
ret_loadbg_	TIME=0.010000 	LINE=43	FILE=tasks.fdv

waitbg_    Waiting for the loading completion of the remote element buffers of the given group

long waitbg_ (RegularAccessGroupRef *RegularAccessGroupRefPtr);

*RegularAccessGroupRefPtr pointer to the buffer group.
call_waitbg_	TIME=0.000000 	LINE=45	FILE=tasks.fdv
RegularAccessGroupRefPtr=4e1210; RegularAccessGroupRef=906310;
ret_waitbg_ 	TIME=0.000000 	LINE=45	FILE=tasks.fdv


References

1. V.E. Denisov, V.N. Iliakov, N.V. Kovaleva, V.A. Krukov. Debugging DVM-program efficiency. Keldysh Institute of Applied Mathematics, Russian Academy of Sciences. Preprint N74, 1998.