Abstract
We present a new method and architecture to efficiently merge IEEE 754-2008 decimal rounding with significand BCD addition and subtraction. This is a key component for improving several decimal floating-point operations, such as addition, multiplication, and fused multiply-add. The decimal rounding unit is based on a direct implementation of the IEEE 754-2008 rounding modes. We show that the resulting implementations for the IEEE 754-2008 Decimal64 (16 digits of precision) and Decimal128 (34 digits of precision) formats significantly reduce the area and latency required for significand BCD addition/subtraction and decimal rounding compared with previous high-performance decimal floating-point adders.