Xiyang Sun, Yue Zhao and Sheng Zhang, Tsinghua University, China
Convolutional neural networks have evolved continuously over the last decade, requiring domain-specific architectures to support increasingly diverse floating-point formats. We present VARFMA, a tunable-precision fused multiply-add architecture based on the Least Bit Reuse structure. VARFMA optimizes the core operation of convolutional neural networks and supports a range of precisions covering the floating-point formats widely used in industry and research today. Compared with the latest standard baseline fused multiply-add unit, VARFMA is generally more energy-efficient when supporting multiple formats, achieving up to a 28.93% improvement on LeNet at the cost of only an 8.04% increase in area. Our design meets IoT requirements for high energy efficiency, acceptable area overhead, and data-privacy protection in distributed networks.
Fused multiply-add, tunable precision, distributed networks, energy efficiency, IoT
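As background for the operation the abstract refers to, the sketch below illustrates standard fused multiply-add semantics in C: computing a*b + c with a single rounding, via C99's `fma` from `<math.h>`. This is not the VARFMA hardware itself, only the scalar operation that a tunable-precision FMA unit implements at each supported format; the operand values are chosen purely to make the single-rounding behavior visible.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Operands chosen so the exact product a*b = 1 - 2^-104
       is not representable in double precision. */
    double a = 1.0 + 0x1p-52;
    double b = 1.0 - 0x1p-52;
    double c = -1.0;

    double fused   = fma(a, b, c); /* one rounding: exact product kept internally */
    double unfused = a * b + c;    /* two roundings: a*b first rounds to 1.0 */

    printf("fused   = %g\n", fused);   /* -2^-104, about -4.93e-32 */
    printf("unfused = %g\n", unfused); /* 0 */
    return 0;
}
```

Compiled with `cc example.c -lm`, the fused result preserves the tiny residual that the unfused sequence rounds away, which is why FMA units are the workhorse of convolution kernels.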