Fortran DVM language description. Part 1 (1-4)

CDVM$ PROCESSORS R1( 2 )

Processors

REAL A(100,100)

CDVM$ DISTRIBUTE A (BLOCK, *) ONTO R1

A( 1: 50,1:100)

A(51:100,1:100)

Fortran DVM - contents

Part 1 (1-4)

Part 2 (5-6)

Part 3 (7-12)

Part 4 (Annexes)

created: april 2001

- last edited 06.10.02 -

1 Introduction

1.1 Parallel programming models

Three parallel programming models are now prevalent in large, scalable systems (see Fig. 1.1): message-passing model (MPM), shared-memory model (SMM) and data parallel model (DPM).

Message-passing model. In message passing model each process has its own local address space. Common data processing and synchronization are performed by message passing. Summarizing and standardization of different message-passing libraries resulted in MPI [1] standard development.

Shared-memory model. In shared memory model processes share common address space. Because there are no limitations to common data usage a programmer must explicitly specify common data and regulate an access to the data using synchronization tools. In high level languages logically independent threads are defined at the level of functional tasks or at level of loop iterations. Summarizing and standardization of shared memory models resulted in OpenMP [2] standard development.

Data parallel model. In data parallel model a process notion is absent and, as result, explicit message passing or explicit synchronization are absent. In the model data are distributed on nodes (processors) of computing system. Sequential program is translated by a compiler either in message passing model program or shared memory model program (Fig. 1.1). The computations are distributed according to the owner-computes rule: each processor performs only the computations of own data, that is the data, allocated on the processor.

In comparison with two previous models DPM has obvious advantages. A programmer is freed from tedious efforts of distributing the global array onto local arrays, explicit management of sending and receiving message, or explicit synchronization of parallel processes. But application area of the model is the object of research. The research results show that a performance of many algorithms of scientific computing in DPM model is comparable with the performance of the realization in MPM and SMM models.

HPF1 [3] development was the first attempt of DPM standardization. MPM and SMM model standardization is done on the base of large experience of implementations and practical applications summarizing. HPF1 standard was developed on the base of theoretical researches and 2-3 experimental implementations. Moreover, the standard was based on automatic parallelization of calculation and automatic synchronization of shared data access. First HPF1 implementations showed inefficiency of the standard for modern methods of calculations (in particular, for irregular calculations). In the next version of the standard HPF2 [2] a step was done to "manual" control of parallel execution performance. In particular, the distribution of computations and common reduction variable specification were defined.

Fig. 1.1. Three models of parallel programming

1.2 DVM-approach to parallel program development

DVM-system provides unified toolkit to develop parallel programs of scientific-technical calculations in C and Fortran 77.

DVM parallel model. DVM parallel model is based on data parallel model. The DVM name reflects two names of the model - Distributed Virtual Memory and Distributed Virtual Machine. These two names show that DVM model is adopted both for shared memory systems and for distributed memory systems. DVM high level model allows not only to decrease cost of parallel program development but provides unified formalized base for supporting Run-Time System, debugging, performance analyzing and prediction.

Languages and compilers. As distinct from HPF the goal of full automation of computation parallelization and the common data access synchronization is not assumed in DVM system. Using high level specifications, a programmer has full control of parallel program performance. On the other hand, in the process of Fortran DVM design and development the compatibility with the subsets of HPF1 and HPF2 standards was kept.

Unified parallel model is built in C and Fortran 77 language on the base of the constructions, that are "transparent" for standard compilers, that allows to have single version of the program for sequential and parallel execution. C-DVM and Fortran DVM compilers translates DVM-program in C or Fortran 77 program correspondingly, including parallel execution Run-Time Support system calls. So only requirement to a parallel system is availability of C and Fortran 77 compilers.

Execution and debugging technique. Unified parallel model allows to have unified Run-Time Support system for both languages and, as result, unified system of debugging, performance analyzing and prediction. There are following modes of DVM-program execution and debugging:

Sequential execution and debugging using standard C and Fortran 77 compiler tools.
Pseudo-parallel execution on work station (WINDOWS and UNIX environment).
Parallel execution on parallel computer.

Following debugging modes are provided in pseudo-parallel and parallel execution modes:

automatic verification of parallel directive correctness;
tracing and comparing the results of parallel and sequential execution;
accumulation and visualization of data trace;
performance information accumulation and parallel execution performance prediction.

2 Language overview

2.1 Programming model and model of parallelism

Fortran DVM language is the extension of Fortran 77 language [2]. The extension is implemented via special comments, named directives. FDVM directives may be conditionally divided on three subsets:

Data distribution (sections 2, 3, 4, 8, 9)
Computation distribution (sections 5, 7)
Remote data specification (section 6)

FDVM parallel model is based on specific form of data parallelism called SPMD (Single Program, Multiple Data). In this model the same program is executed by all the processors concerned, but each processor performs its own subset of statements in accordance with the data distribution.

First, in FDVM model a user defines multidimensional arrangement of virtual processors, which sections data and computations will be mapped on. The section can be varied from the whole processor arrangement up to a single processor.

Then the arrays to be distributed over processors (distributed data) are determined. These arrays are specified by data mapping directives (section 4). The other variables (distributed by defaults) are mapped by one copy per each processor (replicated data). A value of replicated variable must be the same one on all the processors concerned. Single exception is the variables in parallel constructions (see sections 5.1.3, 5.1.4 and 7.5).

FDVM model defines two parallelism levels:

data parallelism on processor arrangement section;
task parallelism: independent computations on sections of processor arrangement

Data parallelism is implemented by distribution of tightly nested loops over the processors (section 5). Each iteration of such loop is executed on one processor entirely. The statements located outside parallel loop are executed according own computation rules (section 5.2).

Task parallelism is implemented by distribution of data and independent computations over sections of processor arrangement (section 7).

When calculating the value of own variable, the processor may need in values of as own as other (remote) variables. All remote variables must be specified in remote data access directives (section 6).

2.2 Syntax of FDVM directives

The syntax of FDVM directives is described using a Backus-Naur form and the following notations:

is	is by definition
or	an alternative construct
[ ]	encloses optional construct
[ ]…	encloses an optionally repeated construct which may occur zero or more times
x-list	x [ , x ]…

Syntax of the directive.

directive-line	is CDVM$ dvm-directive
	or DVM$ dvm-directive*

dvm-directive	is specification-directive
	or executable-directive

specification-directive	is processors-directive
	or align-directive
	or distribute-directive
	or template-directive
	or pointer-directive
	or shadow-directive
	or dynamic-directive
	or inherit-directive
	or remote-group-directive
	or reduction-group-directive
	or task-directive
	or heap-directive
	or asyncid-directive

executable-directive	is realign-directive
	or redistribute-directive
	or parallel-directive
	or remote-access-directive
	or shadow-group-directive
	or shadow-start-directive
	or shadow-wait-directive
	or reduction-start-directive
	or reduction-wait-directive
	or new-value-directive
	or prefetch-directive
	or reset-directive
	or parallel-task-loop-directive
	or map-directive
	or task-region-directive
	or end-task-region-directive
	or on-directive
	or end-on-directive

	or f90-directive
	or asynchronous-directive
	or end-asynchronous-directive
	or asyncwait-directive

Constraints:

A directive-line follows the rules of fixed form comment lines.
A specification-directive may appear only where a specification statement may appear.
An executable-directive may appear only where executable statement may appear.
Any expression, included in specification directive, must be the specification expression (see Annex 1, s.2.4).

No statements may be interspersed within a continued directive. A directive line must not appear within a continued statement. An example of a directive continuation follows. Note that column 6 must be blank, except when signifying continuation.

CDVM$ ALIGN SPACE1(I,J,K)
CDVM$* WITH SPACE(J,K,I)

3 Virtual processor arrangements. PROCESSORS directive

The PROCESSORS directive declares one or more rectangular virtual processor arrangements.

Syntax.

processors-directive	is PROCESSORS processors-decl-list

processors-decl	is processors-name ( explicit-shape-spec-list )
explicit-shape-spec	is [ lower-bound : ] upper-bound

lower-bound	is int-expr

upper-bound	is int-expr

The intrinsic function NUMBER_OF_PROCESSORS( ) can be used to determine the number of real processors, provided to a program.

It is possible to use several virtual processor arrangements of different shape if the number of processors in every arrangement is equal to the value of function NUMBER_OF_PROCESSORS( ). If two virtual processor arrangements have the same shape, then corresponding elements of the arrangements are referred to the same virtual processor.

Example 3.1. Declaration of virtual processor arrangements.

CDVM$	PROCESSORS  P( N )
CDVM$	PROCESSORS  Q( NUMBER_OF_PROCESSORS( ) ), 
CDVM$*	            R(2, NUMBER_OF_PROCESSORS( )/2)

The value N has to be equal to the value of the function NUMBER_OF_PROCESSORS ( ).

The processor arrangements are local objects of the procedure. Data arrays with COMMON and SAVE attributes can be mapped on the local processor arrangements, if whenever the procedure is called, the local processor arrangement has the same shape.

4 Data mapping

FDVM supports distribution by blocks (equal and non-equal), inherited distribution, dynamic array distribution and distribution via alignment.

4.1 DISTRIBUTE and REDISTRIBUTE directives

Syntax.

distribute-directive	is dist-action distributee dist-directive-stuff
	or dist-action [ dist-directive-stuff ] :: distributee-list

dist-action	is DISTRIBUTE
	or REDISTRIBUTE

dist-directive-stuff	is dist-format-list [ dist-onto-clause ]

distributee	is array-name

dist-format	is BLOCK
	or GEN_BLOCK ( block-size-array )
	or WGT_BLOCK ( block-weight-array , nblock )
	or *

dist-onto-clause	is ONTO dist-target

dist-target	is processors-name [( processors-section-subscript-list )]

processors-section-subscript	is [ subscript ] : [ subscript ]

subscript	is int-expr
nblock	is int-expr

block-size-array	is array-name
block-weight-array	is array-name

Constraints:

A length of dist-format-list must be equal to the rank of each distributee to which it applies. That is, distribution format must be specified for every array dimension.
The number of distributed dimensions of the array (format is not specified as *) has to be equal to the number of dimensions of dist-target.
The array mentioned as a block-array-name in GEN_BLOCK specification must be one-dimensional integer array, with size equal to the size of corresponding dimension of processor arrangement, and a sum of its element values is equal to the size of distributed dimension.
The array mentioned as a block-weight-array in WGT_BLOCK specification must be one-dimensional array of type DOUBLE PRECISION.
Either GEN_BLOCK format or WGT_BLOCK format may appear in dist-format-list but not both of them.
REDISTRIBUTE directive can be applied to arrays with DYNAMIC attribute only.
dist-directive-stuff can be omitted in DISTRIBUTE directive only. In that case distributed array can be used after REDISTRIBUTE directive execution only.

The ONTO clause specifies the virtual processor arrangement or its section. If ONTO clause is omitted, than array distribution is performed by base virtual processor arrangement, that is a parameter of program startup. When REDISTRIBUTE directive without ONTO clause is executed in ON-block, the array is distributed on the section of processor arrangement of this ON-block (see section 7).

Several arrays (A1, A2,…) can be distributed at the same mode by the single directive of the form:

CDVM$	DISTRIBUTE dist-directive-stuff :: A1, A2, …

In that case the arrays must have the same rank, but can have different sizes of dimensions.

Let us consider distribution formats for one dimension of the array (one-dimensional array A(N)) and for one dimension of the processor arrangement (one-dimensional array R(P)). Multi-dimensional distributions are considered in section 4.1.5.

4.1.1 BLOCK format

A block of [(N-1)/P] +1 elements are allocated on each processor. It is possible for some ratio between N and P that several last processors do not contain any the array elements.

Example 4.1. Distribution by BLOCK format.

		A	B	C
	R(1)	1	1	1
		2	2	2
CDVM$ PROCESSORS R( 4 )		3	3	3
			4

REAL A (12), B(13), C(11)	R(2)	4	5	4
		5	6	5
		6	7	6
CDVM$ DISTRIBUTE A (BLOCK) ONTO R			8

	R(3)	7	9	7
CDVM$ DISTRIBUTE (BLOCK) ONTO R :: B		8	10	8
		9	11	9
			12
CDVM$ DISTRIBUTE C (BLOCK)
	R(4)	10	13	10
		11		11
		12

4.1.2 GEN_BLOCK format

Distribution by blocks of different sizes allows affecting on processor loading balance for algorithms performing different volume of computations for different parts of arrays.

Let NB( 1:P ) be an integer array. The following directive

CDVM$	DISTRIBUTE  A( GEN_BLOCK(NB))  ONTO  R

splits array A on P blocks. The block i of size NB( i ) is mapped on processor R( i ).

Here

Example 4.2. Distribution by blocks of different size.

			A
		R(1)	1
CDVM$ PROCESSORS R( 4 )			2

		R(2)	3
	INTEGER BS(4)		5
			6
	REAL A(12)
		R(3)	7
			8
CDVM$ DISTRIBUTE A ( GEN_BLOCK( BS ) ) ONTO R			9
			10

	DATA BS / 2, 4, 4, 2 /	R(4)	11
			12

4.1.3 WGT_BLOCK format

The WGT_BLOCK format specifies distribution by blocks according to their relative "weights".

Let WGT_BLOCK(WB, NBL) format is specified.

WB(i) defines weight of i -th block for 1£ i £ NBL. The blocks are distributed on P processors with balancing of sums of block weights on every processor. The condition

P £ NBL

must be satisfied.

The processor weight is defined as a sum of weights of all the blocks distributed on it . The array dimension is distributed proportionally to processor weights.

BLOCK format is special space of WGT_BLOCK(WB,P) format, where WB(i) = 1 for 1£ i £ P and NBL = P.

GEN_BLOCK format is special case of WGT_BLOCK format with some precision.

The example 4.2 can be rewritten using WGT_BLOCK format in the following way.

Example 4.3. Distribution by blocks according to weights.

CDVM$ PROCESSORS R( 4 )
      DOUBLE PRECISION WB(12)
      REAL A(12)
CDVM$ DISTRIBUTE A ( WGT_BLOCK( WB, 12 ) ) ONTO R
      DATA WB / 2., 2., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2. /

In the example 4.3 P = 4 and distribution is identical to the example 4.2.

As distinct from distribution by non-equal blocks, distribution by WGT_BLOCK format can be performed for any number of processors from range 1 £ P £ NBL. For given example the size of processor array R can be varied from 1 to 12.

4.1.4 Format of *

Format of “*” means, that a dimension is localized on each processor (non-distributed or local dimension).

4.1.5 Multidimensional distributions

For multidimensional distributions mapping format is specified for each dimension. The following correspondence is established between dimensions of the array to be distributed and the processor arrangement.

Let the processor arrangement has n dimensions. Number the dimensions not formatted as * from left to right d₁, ..., d_k. Then dimension d_i will be mapped on i-th dimension of processor arrangement. The condition k£ n must be satisfied.

Example 4.4. One-dimensional distribution.

Example 4.5. Two-dimensional distribution.

4.2 Distribution of dynamic arrays

4.2.1 Dynamic arrays in Fortran 77 program

There are no features to deal with dynamic arrays in Fortran 77. So a user is forced to simulate dynamic memory by so called working arrays. The dynamic memory is declared as a one-dimensional array of large size. The dynamic arrays of different shape are defined as continuous segments in the working array.

Example 4.6. Working array usage.

	REAL  W(100000)
	READ (6 , *)   N,  M
C	the arrays of size N*N and M*M are required in the program
	CALL SUB1(W(1), N, W(1+N*N), M)
	END

	SUBROUTINE SUB1(A, N, B, M)
	DIMENSION  A(N,N),  B(M,M)
	.   .   .
	END

An analysis of available programs showed that there is no certain discipline when dealing with simulated dynamic arrays. In particular, array allocation in memory is not fixed explicitly. An access to dynamic array is implemented by reference to the working array. Therefore the compiler can't to determine the array shape.

4.2.2 Dynamic arrays in FDVM model. POINTER directive

Suggested model is a subset of dynamic array model in Fortran 90. It allows to execute this model without modifications in three program environments:

Sequential program in Fortran 77,
Parallel program in FDVM,
Parallel program in HPF.

For dynamic arrays, distributed by default, FDVM allows to use any methods of dynamic memory simulation. For the dynamic arrays, distributed by DISTRIBUTE and ALIGN directives, FDVM defines the following discipline of allocation and usage of dynamic arrays.

* All explicitly distributed arrays are allocated in dynamic memory pool with the name HEAP

REAL HEAP (MAXM)

where MAXM is a number of dynamic memory words.

* Data type and the rank of the dynamic array are fixed by the following FDVM directive

pointer-directive	is type , POINTER ( dimension-list ) :: pointer-name-list

dimension	is :

pointer-name	is scalar-int-variable-name
	or int-array-name

The specification type defines data type of the dynamic array. The following constraints exist for the variables specified in POINTER directive.

Only value of function ALLOCATE or a value of other variable with POINTER attribute can be assigned to the variable with POINTER attribute.
A number of dimensions in left and right sides of the statement assigning a value to the variable with POINTER attribute must be the same.

* The sizes of each dimension and dynamic array allocation in HEAP are fixed by the following statement

pointer = ALLOCATE ( sdim ,... )

where

pointer	- is a pointer to integer variable (scalar or array element) with POINTER attribute
sdim	- is a integer one-dimensional array of the size ndim. ndim is a rank of multidimensional array, allocated in dynamic memory HEAP. The sdim( i ) value defines the size of i-th dimension. The size of allocated segment is equal to sdim( 1 ) ´ sdim( 2 ) ´ ....´ sdim( ndim ).

Integer function ALLOCATE returns a number of initial element of allocated segment of dynamic memory HEAP. The ALLOCATE function is programmed by a user, therefore it can have additional arguments besides obligatory sdim one.

* Only the following kind of a dynamic array reference is allowed in a procedure, where dynamic array allocation is performed

HEAP(pointer)

Moreover, this reference can be used only as the actual argument of a function or subroutine call.

Let several pools with identifiers ID₁ , …, ID_n. are used in a program. There is no necessity to rewrite the program with single pool HEAP of dynamic memory. It is enough to write following specification

CDVM$ HEAP ID₁ , …, ID_n

But every pool ID_i must satisfy requirements above for HEAP pool. The pool can keep only the arrays, distributed by DISTRIBUTE and ALIGN directives.

4.2.3 DISTRIBUTE and REDISTRIBUTE directives for dynamic arrays

Distribution of dynamic arrays is performed by DISTRIBUTE and REDISTRIBUTE directives. The syntax of directives is extended in the following way:

distributee	is . . .
	or pointer-name

If a scalar or the array with POINTER attribute are specified as distributee, then distribution is postponed up to execution of ALLOCATE function, assigning a value to the pointer. Instead of ALLOCATE function, the array creation and distribution is done according to DISTRIBUTE directive formats.

The REDISTRIBUTE directive for dynamic array can be performed only after execution of the ALLOCATE function, assigning a value to correspondent POINTER variable.

If dynamic array pointer is an element of a pointer array, the dynamic array can be distributed by REDISTRIBUTE directive only. As only a reference to pointer name is allowed in REDISTRIBUTE directive, the element of the pointer array should be assigned previously to scalar variable-pointer. The array with PT(I) pointer can be distributed by the following statements sequence:

	P1 = PT( I )
CDVM$	REDISTRIBUTE P1( BLOCK, BLOCK )

It is necessary to transform the program from example 4.5 in the following FDVM program.

Example 4.7. Mapping of FDVM dynamic arrays.

	REAL  HEAP(100000)
	INTEGER  ALLOCATE
CDVM$	REAL, POINTER (:,:) :: PA, PB
	INTEGER  PA,  PB
C	descriptors of dynamic arrays
	INTEGER  DESCA(2),  DESCB(2)
CDVM$	DISTRIBUTE (BLOCK,BLOCK) :: PA, PB
C	arrays of size N*N and M*M are required in the program
	READ (6 , *)  N,  M
C	shape of the first array
	DESCA(1) = N
	DESCA(2) = N
C	allocation and distribution of the first array
	PA = ALLOCATE(DESCA,1)
C	shape of the second array
	DESCB(1) = M
	DESCB(2) = M
C	allocation and distribution of the second array
	PB = ALLOCATE(DESCB, N*N+1)
	CALL SUB1(HEAP(PA), N, HEAP(PB), M)
	END

	SUBROUTINE SUB1(A, N, B, M)
	DIMENSION  A(N,N),  B(M,M)
CDVM$	DISTRIBUTE (BLOCK,BLOCK) ::  A, B
	.   .   .
	END

	FUNCTION  ALLOCATE(DESC, P)
	INTEGER  DESC(2), P
	ALLOCATE = P
	END

Other examples of dynamic array distribution see in section 7.7.

4.3 Distributing by aligning

Aligning array A with distributed array B brings in accordance to each element of array A an element or a section of array B. When array B is distributed array A will be distributed simultaneously. If element of B is mapped on the processor, the element of A, corresponding to element B via alignment, will be also mapped on the same processor.

Method of mapping via alignment performs the following two functions.

The same distribution of the arrays of the same shape on one processor arrangement does not always guarantee allocation of corresponding elements on the same processor. It forces to specify remote access (see section 6) where it is possible not exist. Only alignment of corresponding elements of the arrays guarantees their allocation on the same processor.
Several arrays can be aligned with the same array. Redistribution of one array by REDISTRIBUTE directive will cause corresponding redistribution of the array group.

4.3.1 ALIGN and REALIGN directives

The following directives describe array aligning:

align-directive	is align-action alignee align-directive-stuff
	or align-action [ align-directive-stuff ] :: alignee-list
align-action	is ALIGN
	or REALIGN

align-directive-stuff	is ( align-source-list ) align-with-clause

alignee	is array-name

align-source	is *
	or align-dummy

align-dummy	is scalar-int-variable

align-with-clause	is WITH align-spec

align-spec	is align-target ( align-subscript-list )

align-target	is array-name
	or template-name

align-subscript	is int-expr
	or align-dummy-use
	or *

align-dummy-use	is [ primary-expr * ] align-dummy [ add-op primary-expr ]

primary-expr	is int-constant
	or int-variable
	or ( int-expr )

add-op	is +
	or -

Constraints:

A length of align-source-list must be equal to the rank of aligned array.
A length of align-subscript-list must be equal to the rank of align-target.
Each align-dummy may appear at most once in an align-subscript-list.
An array-name mentioned as an alignee may not appear as a distributee.
REALIGN directive may be applied only to the arrays specified in DYNAMIC directives.
align-directive-stuff can be omitted only in ALIGN directive. In such a case the distributed array can be used only after execution of REALIGN directive.

Let the alignment of two arrays is specified by the directive

CDVM$	ALIGN  A(d₁,…,d_n)  WITH  B(ard₁,…,ard_m)

where d_iis specification of i-th dimension of aligned array A,
ard_j is specification of j-th dimension of base array B.

If d_i is specified by integer variable I, then there must be at most one dimension of array B, specified by linear function ard_j = a*I + b.

Let i-th dimension of array A has bounds LA_i : HA_i , and j-th dimension of array B, specified by linear function a*I + b , has the bounds LB_j : HB_j. Since the parameter I is defined on the value set LA_i : HA_i, then the following conditions must be satisfied:

a*LA_i + b ³ LB_j, a* HA_i + b £ HB_j

If d_iis * , the i-th dimension of array A will be local on each processor independently from array B distribution (it is analogue of local dimension in DISTRIBUTE directive).

If ard_iis * , then the array A will be replicated along j-th dimension of the array B.

If ard_i is int-expr, then array A is aligned with the section of the array B.

Example 4.8. Aligning arrays

	REAL A(10),B(10,10),C(22,22),D(20),E(20),F(10),G(20),H(10,10)
CDVM$	DISTRIBUTE  B (BLOCK,BLOCK)
CDVM$	DISTRIBUTE  D (BLOCK)
C	aligning with the array section
C	(the vector is aligned with the first line of A)
CDVM$	ALIGN  A( I )  WITH  B(1,I) 
C	replication of the vector aligning it with each line 
CDVM$	ALIGN  F( I )  WITH  B(*,I) 
C	the matrix is collapsed; 
C	each matrix column corresponds to the vector element
CDVM$	ALIGN  C(*,I)  WITH  D( I ) 
C	alignment of vector  with vector using stretching
CDVM$	ALIGN  E( I )  WITH  D(2*I) 
C	alignment vector with vector using reverse ordering
CDVM$	ALIGN  G( I )  WITH  D(-I+21) 
C	alignment matrix with matrix using rotation and stretching 
CDVM$	ALIGN  H(I,J)  WITH  C(2*J,2*I)

Several arrays (A1, A2,…) can be aligned with one B array in the same manner by the directive of the form:

CDVM$	ALIGN  (d₁,…,d_n)  WITH  B(ard₁,…,ard_m) :: A1, A2, …

At that A1, A2… arrays must have the same rank (n), but can have different sizes of dimensions.

Let the sequence of alignments A f1 B f2 C, be specified; f2 is alignment of the array B with the array C, and f1 is alignment of the array A with the array B. The arrays A and B are considered as aligned with array C by definition. The array B is aligned by function f2 directly and array A is aligned by composite function f1(f2) indirectly. Therefore applying REALIGN directive to the array B doesn't cause redistribution of array A.

Generally a set of ALIGN specifications is a set of trees. At that every tree root must be distributed by DISTRIBUTE or REDISTRIBUTE directives. When REDISTRIBUTE directive is executed, the whole alignment tree is redistributed.

4.3.2 TEMPLATE directive

If values of linear function a*I + b are beyond base array dimension, it is necessary to define a dummy array - referred to as an alignment template – using the following directive.

template-directive	is TEMPLATE template-decl-list

template-decl	is template-name [ ( explicit-shape-spec-list ) ]

Then it is necessary to align both arrays with the template. The template is distributed by DISTRIBUTE and REDISTRIBUTE directives. The template elements are mapped among processors without real memory allocation. They specify a processor on which elements of aligned arrays must be mapped.

Consider the following example.

Example 4.9. Aligning with template.

	REAL  A(100), B(100), C(100)
CDVM$	TEMPLATE  TABC(102)
CDVM$	ALIGN  B( I )  WITH  TABC( I ) 
CDVM$	ALIGN  A( I )  WITH  TABC( I+1 ) 
CDVM$	ALIGN  C( I )  WITH  TABC( I+2 ) 
CDVM$	DISTRIBUTE  TABC ( BLOCK )
	.   .   .
	DO 10  I = 2, 98
	   A(I) = C(I-1) + B(I+1)
10	CONTINUE

To avoid exchange between processors, it is necessary to allocate the elements A(I), C(I-1) and B(I+1) on the same processor. It is impossible to align arrays C and B with array A, because alignment functions I-1 and I+1 cause bounds violation of array A. Therefore the template TABS is declared. The elements of arrays A, B and C, which must be allocated on the same processor, are aligned with the same element of the template.

4.3.3 Aligning dynamic arrays

To specify dynamic arrays alignment the syntax of ALIGN and REALIGN directives is extended in the following way.

alignee	is . . .
	or pointer-name

align-target	is . . .
	or pointer-name

If a variable with POINTER attribute is specified as aligned array (alignee) in ALIGN directive, then directive executing is postponed up to execution of ALLOCATE function, defining the variable value. The REALIGN directive may be executed only after execution of the ALLOCATE function.

Example 4.10. Aligning dynamic arrays.

	REAL  HEAP(100000)
CDVM$	REAL, POINTER (:,:) :: PX, PY
	INTEGER  PX, PY, DESC(2)
CDVM$	ALIGN  PY(I,J )  WITH  PX(I,J) 
CDVM$	DISTRIBUTE  PX (BLOCK,BLOCK)
		.   .   .
	PX = ALLOCATE(DESC, ...)
	PY = ALLOCATE(DESC, ...)
		.   .   .
CDVM$	REDISTRIBUTE  PX (BLOCK,*)

Let sequence of alignments by ALIGN directives is specified

P₁ f₁ P₂ f₂ . . . f_N-1 P_N

where:
f_i is aligning function,
P_i is a pointer to dynamic array.

Then the order of dynamic array allocation (ALLOCATE function execution) must be reverse, i.e.:

P_N = ALLOCATE(...)
. . .
P₂ = ALLOCATE(...)
P₁ = ALLOCATE(...)

If dynamic array pointer is an element of a pointer array, the dynamic array can be aligned by REALIGN directive only. As only a reference to pointer name is allowed in REALIGN directive, the element of pointer array should be assigned previously to scalar variable-pointer. The array with pointer PT(I) can be aligned with the array with pointer PT(J) by the following statements sequence:

	P1 = PT(I)
	P2 = PT(J)
CDVM$	REALIGN  P1(I,J)  WITH  P2(I+1,J)

4.4 DYNAMIC and NEW_VALUE directives

The arrays, redistributed by REDISTRIBUTE and REALIGN directives, should be specified in DYNAMIC directive.

dynamic-directive	is DYNAMIC alignee-or-distributee-list

alignee-or-distributee	is alignee
	or distributee

If after REDISTRIBUTE and REALIGN directive execution new values will be assigned to the arrays, additional (optimizing) directive NEW_VALUE must precede these directives.

new-value-directive

is NEW_VALUE

The directive cancels reassigning the redistributed array values.

If the array is specified in DYNAMIC directive and there is no DISTRIBUTE or ALIGN specification for it, its distribution is postponed up to the first REDISTRIBUTE or REALIGN statement. It is neccessary in two cases.

distribution (alignment) of dynamic array with pointer, being a pointer array element;
array distribution on processor arrangement section, which parameters are defined during computation.

4.5 Default distribution

If the data are not specified in DISTRIBUTE or ALIGN directive , they are distributed on each processor (full replication). The same distribution can be defined by DISTRIBUTE directive with format of * for each dimension. But in that case the access to the data will be less effective.

Fortran DVM - contents

Part 1 (1-4)

Part 2 (5-6)

Part 3 (7-12)

Part 4 (Annexes)