Parallel matrix transpose algorithms on distributed memory concurrent computers

Jaeyoung Choi; J.J. Dongarra; D.W. Walker

doi:10.1109/SPLC.1993.365559

Proceedings of Scalable Parallel Libraries Conference

Year: 1993, Pages: 245-252

Authors

Jaeyoung Choi, Mathematical Sciences Section, Oak Ridge National Laboratory, TN, USA
J.J. Dongarra, Mathematical Sciences Section, Oak Ridge National Laboratory, TN, USA
D.W. Walker, Mathematical Sciences Section, Oak Ridge National Laboratory, TN, USA

Abstract

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of nonblocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A·B, the algorithms are used to compute parallel multiplications of transposed matrices, C = A^T·B^T, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
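
To make the data movement concrete, the following is a minimal sketch and not the paper's implementation. It assumes that "block scattered" means the usual block-cyclic mapping, in which block (I, J) of the matrix is held by processor (I mod P, J mod Q). The function names `owner` and `transpose_traffic` are hypothetical, introduced only for illustration. The sketch lists, for each source and destination processor pair, the blocks that would have to move when block (I, J) of A becomes block (J, I) of A^T.

```python
def owner(block_row, block_col, P, Q):
    """Return the processor (p, q) holding a block.

    Assumption: block-cyclic mapping, block (I, J) -> (I mod P, J mod Q).
    """
    return (block_row % P, block_col % Q)


def transpose_traffic(Mb, Nb, P, Q):
    """Group the blocks of an Mb x Nb block matrix by (source, destination).

    Block (I, J) of A is stored as block (J, I) of A^T, so its destination
    is the owner of (J, I) under the same assumed mapping.
    """
    traffic = {}
    for I in range(Mb):
        for J in range(Nb):
            src = owner(I, J, P, Q)
            dst = owner(J, I, P, Q)
            traffic.setdefault((src, dst), []).append((I, J))
    return traffic


if __name__ == "__main__":
    # Illustrative sizes only: a 4 x 4 grid of blocks on a 2 x 3 template.
    for (src, dst), blocks in sorted(transpose_traffic(4, 4, 2, 3).items()):
        print(src, "->", dst, blocks)
```

Under this assumed mapping, a square template (P = Q) gives each processor a single destination, while unequal P and Q spread one processor's blocks over several destinations. That is the situation in which overlapping sends to different processors, as the abstract describes, would matter.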


