Abstract
It has long been assumed that research into collective communication algorithms on distributed-memory parallel computers is exhausted. This project shows that the implementations shipped with widely used libraries remain suboptimal. We demonstrate this by implementing the "reduce-scatter" collective communication operation and comparing it against the MPICH implementation of MPI. Performance results on a large cluster are reported.
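For orientation, a minimal sketch of the MPI reduce-scatter baseline that such a comparison would measure against; the block size and data values below are illustrative assumptions, not the paper's benchmark configuration or algorithm.

```c
/* Minimal sketch (not the paper's algorithm): invoking the standard
 * MPI_Reduce_scatter collective that serves as the MPICH baseline.
 * Block size and data values are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int block = 4;  /* elements each rank receives (assumed) */
    double *sendbuf = malloc(nprocs * block * sizeof *sendbuf);
    double *recvbuf = malloc(block * sizeof *recvbuf);
    int *recvcounts = malloc(nprocs * sizeof *recvcounts);

    for (int i = 0; i < nprocs * block; i++) sendbuf[i] = rank + 1.0;
    for (int i = 0; i < nprocs; i++) recvcounts[i] = block;

    /* Element-wise sum across all ranks, then scatter block i to rank i. */
    MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD);

    /* Each received element should equal 1 + 2 + ... + nprocs. */
    printf("rank %d: recvbuf[0] = %.1f\n", rank, recvbuf[0]);

    free(sendbuf); free(recvbuf); free(recvcounts);
    MPI_Finalize();
    return 0;
}
```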