S. V. Volobuev, I. V. Zotov
This work addresses barrier synchronization, one of the most common communication procedures arising in the operation of multiprocessor computer systems. A scalable hardware-based mechanism is proposed that provides a cost-efficient implementation of synchronization, allowing arbitrary sets of parallel branches residing in the system to communicate with each other at higher speed regardless of the branch-to-unit allocation. Multidimensional mesh-connected systems are considered in the present paper.
The developed mechanism is based on a distributed coordinating environment consisting of a set of one-bit-wide concurrent hardware slices. Each slice is capable of transferring barrier completion tags for a set of consecutive barriers. The set of slices is divided into several groups that are enabled one after another, which makes it possible to decrease the number of connections between cells of the coordinating environment by allocating the same connections to different groups. The resulting small decrease in synchronization speed is partly compensated for by pipelining the synchronization stages of neighboring groups.
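
The connection-sharing idea can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the function names, the round-robin enabling order, and the modulo assignment of slices to groups are all assumptions made for illustration. The sketch assumes that each enabled one-bit slice needs one wire on a cell-to-cell link, so that enabling one group at a time lets all groups reuse the same wires.

```python
from math import ceil


def wires_per_link(num_slices: int, num_groups: int) -> int:
    """Wires needed on one cell-to-cell link if only one group is enabled at a time.

    Assumption: one wire per concurrently enabled one-bit slice.
    """
    if num_slices <= 0 or num_groups <= 0:
        raise ValueError("num_slices and num_groups must be positive")
    return ceil(num_slices / num_groups)


def slices_in_group(group: int, num_slices: int, num_groups: int) -> list[int]:
    """Slices assigned to a group (hypothetical rule: slice s goes to group s % num_groups)."""
    return [s for s in range(num_slices) if s % num_groups == group]


def enabled_group(step: int, num_groups: int) -> int:
    """Group enabled at a given time step, assuming a simple round-robin order."""
    return step % num_groups


if __name__ == "__main__":
    num_slices, num_groups = 16, 4
    print("wires without grouping:", wires_per_link(num_slices, 1))
    print("wires with grouping:   ", wires_per_link(num_slices, num_groups))
    for step in range(num_groups):
        g = enabled_group(step, num_groups)
        print(f"step {step}: group {g} drives slices {slices_in_group(g, num_slices, num_groups)}")
```

Under these assumptions, 16 slices in 4 groups need 4 wires per link instead of 16, at the cost of each slice being served only once every 4 steps. This is the speed penalty that the pipelining of neighboring groups is meant to offset in part.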