One Approach for Parallel Algorithms Representation

This paper presents an approach to the representation of parallel algorithms. The proposed model is practice oriented and is named AMPA (Agenda Model for Parallel Algorithms) because its basic blocks are organized like a schedule. The model uses the classical Master/Slave paradigm. A parallel merge sort algorithm based on quicksort is presented with the AMPA model and with three known representation approaches (natural-language description, pseudo code and PRAM). A survey of professional opinion about AMPA and the other approaches was conducted. The results show that the largest share of the interviewed people (47%) choose AMPA as the best way to understand the algorithm.


I. INTRODUCTION
During the last years, parallel programming has become one of the most popular techniques in application development. The development of processor architectures (SoC and multi-core architectures) leads to significant advancement in software technologies. The possibility of having multiple processors on a small chip leads to the development of parallel applications which can use these hardware resources effectively. Scientific progress also needs computational resources and effective parallel programs. The complexity of software also increases, and this is the reason that new models for program design are wanted. Some new parallel programming models for specific multi-thread architectures have been designed in recent years [1,2,3]. They are useful for designing parallel algorithms for specific architectures like NVidia GPU.
The main idea behind this research is to propose a practice-oriented, high-level model for parallel algorithms representation. The proposed model uses the well-known Master/Slave paradigm.
AMPA defines two types of processes: Master and Slave. There is always exactly one Master, while there may be many Slaves. The model consists of six graphical elements:
1) Process block (Master or Slave). If the algorithm contains only a Master process, it is not a parallel algorithm;
2) Operation block - this block contains operations: calculations or data exchange;
3) Vertical arrow - a line which presents the execution flow within one process;
4) Horizontal arrow - a line which presents communication among processes;
5) Execution type block - a block which groups other blocks to mark a sequential or parallel part of the execution;
6) Parallel steps block - a block which groups other blocks whose parallel execution has to be repeated, and shows how many times the execution is repeated.
The blocks of the Master and Slave processes are arranged in parallel lines. If two blocks of the Master and Slave processes are at the same level, these operations can be executed simultaneously, i.e. the position of every block shows when the block can be executed. Figure 1 shows an example of a parallel algorithm presented with AMPA.
Execution type blocks and Parallel steps blocks are drawn with a dashed line. The AMPA model can also be applied to multi-thread applications. In this case the Master process is a process while the Slave processes are implemented as threads, and horizontal arrows are replaced with "read/write global data" (i.e. the threads work with the data of their own process).
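As an illustration of the multi-thread mapping described above, the following minimal Python sketch runs one Master that starts several Slave threads; the Slaves exchange data with the Master only by reading and writing shared ("global") data. The function name run_master, the partial-sum workload and the results list are hypothetical choices made for this example and are not part of AMPA.

```python
import threading


def run_master(data, num_slaves):
    """Master: start the Slave threads, wait for them, combine their results."""
    # Shared ("global") data of the process; it plays the role of the
    # horizontal arrows in the multi-thread case (assumption of this sketch).
    results = [0] * num_slaves

    def slave(index):
        # Slave operation block: read a share of the global data ...
        chunk = data[index::num_slaves]
        # ... and write the partial result back to the global data.
        # Each Slave writes only its own slot, so no lock is needed here.
        results[index] = sum(chunk)

    # Parallel part: the Slave blocks are at the same level and may run
    # simultaneously.
    threads = [threading.Thread(target=slave, args=(i,)) for i in range(num_slaves)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Sequential part: only the Master works with the processed data.
    return sum(results)
```

For example, run_master(list(range(10)), 3) starts three Slave threads and returns the sum of the numbers 0 to 9.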

A. Bosakova-Ardenska

Fig. 1. Sample algorithm presented with AMPA

III. APPLICATION OF AMPA
Parallel merge sort uses a "divide and conquer" approach, and its data distribution maps onto a binary tree [6]. Data are divided into sub-lists, and the process continues until the lists reach size one. The proposed model is used for the representation of one modification of the parallel merge sort algorithm. This modification uses the quicksort algorithm [17] to sort the sub-lists. The number of sub-lists is equal to the number of parallel processes (processors). The sub-lists are the same size. After being sorted with quicksort, the sub-lists are merged. The next figures (fig. 2, fig. 3, fig. 4 and fig. 5) present this algorithm using, respectively, a natural-language description, pseudo code, the PRAM model [18] and the proposed AMPA model.

[Figure: AMPA diagram. Labels: Process initial data; Send\Receive data; Work with data; Receive processed data; data; initial data; processed data.]

The numbers that need to be sorted are distributed equally to the parallel processes (processors). Each process sorts its part of the numbers using the quicksort algorithm. Finally, the sorted parts are merged.
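The three steps of this description can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: threads from the standard library stand in for the parallel processes, the names quicksort and sort_in_parts are hypothetical, and the last sub-lists may be shorter or empty when the count of numbers is not divisible by the number of processes. The final merge is done in a single step with heapq.merge.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def quicksort(values):
    """Plain recursive quicksort that returns a new sorted list."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def sort_in_parts(numbers, m):
    """Distribute numbers to m workers, quicksort each part, merge the parts."""
    # Step 1: distribute the numbers (ceiling division gives the part size).
    size = max(1, -(-len(numbers) // m))
    parts = [numbers[k * size:(k + 1) * size] for k in range(m)]
    # Step 2: every worker sorts its own part with quicksort.
    with ThreadPoolExecutor(max_workers=m) as pool:
        sorted_parts = list(pool.map(quicksort, parts))
    # Step 3: merge the sorted parts into the final sorted list.
    return list(heapq.merge(*sorted_parts))
```

In standard CPython builds the threads mainly illustrate the structure of the algorithm; a real speed-up for this kind of computation would typically need separate processes or another runtime.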
[Pseudo code fragment from the figure, truncated in the source: for i=1 to M-]

The variables and operations used in fig. 3 and fig. 4 are:
M - number of parallel processes (processors);
N - size of the array for sorting (count of all numbers);
arr - array for sorting;
myarr - local array for a sub-list;
global read() - operation for global memory reading;
global write() - operation for global memory writing;
=> and <= - operations for data reading/writing.
The main assumption for the pseudo code description is that the unsorted array belongs to process (processor) P0. The main assumption for the PRAM description is that the unsorted array is allocated in global memory.
The number of merge operations is equal to $\log_2 P$, where P is the number of parallel processes, i.e. the number of sub-lists. This means that after each merge operation the number of processes which execute the next merge operation is halved. For example, for P = 8 the number of parallel merge operations is 3:
1st parallel merge operation: 4 processes receive the sorted sub-lists of the other 4 processes and execute the merge operation;
2nd parallel merge operation: 2 processes receive the sorted sub-lists of the other 2 processes and execute the merge operation;
3rd parallel merge operation: 1 process receives the sorted sub-list of the other process and executes the merge operation. After this step, the final sorted list is obtained.
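The halving of the active processes can be made concrete with a short Python sketch that lists, for each parallel merge operation, which process receives a sorted sub-list from which other process. The pairing rule (process r receives from process r + step) and the name merge_schedule are assumptions made for illustration; the text above only fixes how many processes take part in each operation. The sketch also assumes that P is a power of two.

```python
def merge_schedule(p):
    """Return one list of (receiver, sender) pairs per parallel merge operation."""
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch assumes that p is a power of two")
    rounds = []
    step = 1
    while step < p:
        # Every receiver r merges its own sub-list with the one of process r + step.
        rounds.append([(r, r + step) for r in range(0, p, 2 * step)])
        step *= 2
    return rounds


for number, pairs in enumerate(merge_schedule(8), start=1):
    print(f"merge operation {number}: {len(pairs)} receiving process(es): {pairs}")
```

For P = 8 the schedule has three merge operations with 4, 2 and 1 receiving processes, which matches the example above.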

IV. RESULTS
The discussed parallel sorting algorithm and its four representations were used for a short survey of opinion among:
- students taking the course Supercomputers, part of the Computer Systems and Technologies speciality at the University of Food Technologies, Plovdiv (Bulgaria);
- participants of the training school "Practical Programming Models and Skills on INTEL Xeon Phi for Scientific Research Engineers". This course was organized by NCSA (National Centre for Supercomputing Applications) in assistance with the Science and Technology Facilities Council (STFC) and Bayncore (U.K.).
More than fourteen people were included in the survey. The questions in the survey are:
1) Which of the four representations of the parallel algorithm helps you best to understand its idea?
(a) Description with natural language (b) pseudo code (c) PRAM (d) AMPA
2) Which of the models for presentation of the parallel algorithm would you use if you need to implement it? Why?
Figures 6 and 7 present the results of the survey. Some of the answers to the "Why?" part of question 2 are presented in Table 1.

Table 1. Some of the answers to the "Why?" part of question 2

pseudo code
- This representation is most understandable for me. This representation is "universal" code and could be used as a basis for a parallel program.
- This representation is shortest and clearly described.

PRAM
- The source code in this representation could be used for the skeleton of a program.
- This representation is short.

AMPA
- This representation is the best for idea understanding. The detailed description of the parallel algorithm is suitable for its precise implementation.
- This model gives a good visual idea and thus it will decrease the count of the logical errors in implementation.

V. CONCLUSIONS AND FUTURE WORK
A novel approach for parallel algorithms representation with graphical elements is presented in this paper. One parallel merge sort algorithm is described using natural language, pseudo code, PRAM and AMPA. These four representations were evaluated by students and by participants of a professional course on parallel programming. The results show that:
- The preferred model is AMPA because it gives a good visual idea of the algorithm (47% of the interviewed people choose AMPA as the best way to understand the algorithm);
- When the algorithm has to be implemented, the AMPA and pseudo code models are the most preferred (44% AMPA and 26% pseudo code).
In the future, the research will continue with the development of a software tool for AMPA modelling. This tool will facilitate the use of the model.

Fig. 6. Results for question 1 of the conducted survey