INTRODUCTION

Neural networks have become increasingly important, not only as software algorithms but also in hardware implementations. On the software side, convolutional neural networks (CNNs) offer advantages such as weight reduction and high accuracy, and they perform well on image processing and classification thanks to their low error rates. To run CNNs with lower energy cost and faster computation, the emerging device RRAM has been chosen for several neural network tasks. Previous work shows that RRAM crossbar arrays can perform the inner products required by CNNs, and that an RRAM crossbar-based computing system (RCS) can boost both performance and power efficiency [2]. Other works similarly choose RRAM to accelerate neural networks, demonstrating significant performance improvement and energy savings [1].

In our project we propose binary weights, together with weight compression and binary training in the software algorithm. Strictly speaking, the weights take three levels, but we keep the term "binary" for convenience. Each weight parameter is quantized to one of three states: "-1", "0", or "1". In addition, inputs and activations take five possible levels: 0, 0.25, 0.5, 0.75, and 1. Inner products appear in the convolution and fully connected layers, and the pooling layers use mean pooling. Our software team first completed a binarized 2-layer Multi-Layer Perceptron (MLP), i.e., a 2-layer neural network. Fig. 1 shows the structure of the binarized 2-layer MLP, which includes a weight-compression technique that maps floating-point weights to three discrete levels while preserving the accuracy of the original network on the MNIST data set. We apply the same weight-compression technique to design the Binary CNN, which also achieves high accuracy on MNIST. Fig. 2 shows the test results of the 2-layer neural network and the CNNs for both floating-point and binary precision.
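The report does not list the exact compression boundaries, so the following Python sketch only illustrates the idea: floating-point weights are mapped to the three levels {-1, 0, 1} with an assumed symmetric threshold, and activations are rounded to the five levels {0, 0.25, 0.5, 0.75, 1}. The function names and the threshold value are illustrative, not part of the original design.

import numpy as np

def compress_weights(w, threshold=0.33):
    # Map floating-point weights to the three levels {-1, 0, 1}.
    # The threshold is an assumption for illustration; the report does not
    # specify how the compression boundaries are chosen.
    w_ternary = np.zeros_like(w)
    w_ternary[w > threshold] = 1.0
    w_ternary[w < -threshold] = -1.0
    return w_ternary

def quantize_activation(x):
    # Quantize inputs/activations to the five levels {0, 0.25, 0.5, 0.75, 1}.
    x = np.clip(x, 0.0, 1.0)
    return np.round(x * 4.0) / 4.0

# Example: compress a small weight matrix and quantize an activation vector.
w = np.array([[0.8, -0.1, -0.6], [0.2, 0.5, -0.9]])
x = np.array([0.13, 0.62, 0.91])
print(compress_weights(w))     # [[ 1.  0. -1.]  [ 0.  1. -1.]]
print(quantize_activation(x))  # [0.25 0.5  1.  ]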

For the hardware implementation, we focus on the mean-pooling function, binary-weight storage, and the inner product in the convolutional part. Fig. 3 shows the hardware structure that implements the inner product of the convolution layers and the mean-pooling layer of the Binary CNN algorithm. An RRAM array together with a sense amplifier realizes the first and second convolution layers. To handle the software-trained weights "-1, 0, 1", we store each three-level weight in two bits and represent it with binary cell states. Fig. 4 illustrates how the inner product of three-level weights is implemented with two RRAM arrays (each three-level weight stored as two bits), where "1" maps to "10", "-1" maps to "01", and "0" maps to "00". Based on the concept of computing in memory, we then realize the dot-product function that dominates the operations of the convolution layers. The sense-amplifier circuit also maps to the activation function of the convolution layer. In addition, our hardware team designed a circuit called the Mean Pooling Circuit (MPC) to realize the corresponding software operation. Fig. 5 shows the MPC, which outputs one of five levels depending on its inputs and thus fully realizes the mean-pooling layer of the software algorithm. The MPC also works as a digital-to-analog converter, converting the stored feature-map values into analog states. This conversion is needed because computing in memory operates in the analog domain, which reduces the number of memory accesses and speeds up the whole CNN algorithm.
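As a rough behavioral model (not the actual circuit), the following Python sketch mimics the two-array scheme of Fig. 4 and the five-level output of the MPC: the positive array holds a cell for every "+1" weight, the negative array for every "-1" weight, the three-level inner product is the difference of the two binary dot products, and mean pooling averages a window and rounds it to the five activation levels. All function names are illustrative.

import numpy as np

def split_ternary_weights(w):
    # Encode three-level weights as two binary RRAM arrays:
    # "1" -> "10", "-1" -> "01", "0" -> "00".
    pos = (w == 1).astype(float)   # first bit: +1 weights
    neg = (w == -1).astype(float)  # second bit: -1 weights
    return pos, neg

def crossbar_inner_product(x, w):
    # Behavioral model of Fig. 4: the three-level inner product is the
    # difference of the two binary dot products.
    pos, neg = split_ternary_weights(w)
    return x @ pos.T - x @ neg.T

def mean_pool_5level(window):
    # Behavioral model of the MPC output: average a pooling window and
    # quantize the result to the five levels {0, 0.25, 0.5, 0.75, 1}.
    return np.round(np.mean(window) * 4.0) / 4.0

# Example: a quantized input vector against two rows of ternary weights,
# followed by mean pooling of a 2x2 window.
x = np.array([0.25, 1.0, 0.0, 0.75])
w = np.array([[1, -1, 0, 1],
              [0, 1, -1, -1]])
print(crossbar_inner_product(x, w))  # [0.   0.25]
print(mean_pool_5level(np.array([[0.5, 1.0],
                                 [0.25, 0.5]])))  # 0.5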

In conclusion, we have successfully implemented binary MLPs with three weight states ("1", "0", and "-1") for both the 1-layer and 2-layer cases. We extended the binary 2-layer MLP to the Binary CNN by adding convolution and pooling layers. The results show that our binarization of neural networks (i.e., MLPs and CNNs), more precisely the compression of floating-point weights to three-state binary weights, preserves the accuracy of the original floating-point networks. Furthermore, we use RRAM arrays and supporting circuits to implement the inner product of the convolution layers and the mean-pooling layer of our Binary CNN. Our simulation results show that the Binary CNN has great potential for hardware implementation with RRAM arrays.

REFERENCES

[1] P. Chi et al., “PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory,” in ISCA, 2016.

[2] B. Li et al., “Merging the interface: Power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal computing system,” in DAC, 2015, p. 13.


Fig. 1. Structure of the binarized 2-layer MLP with weight compression.


Fig. 2. Test results of the 2-layer neural network and CNNs for floating-point and binary precision.


Fig. 3. Hardware structure implementing the inner product of the convolution layers and the mean-pooling layer of the Binary CNN.


Fig. 4. Inner product of three-level weights implemented with two RRAM arrays ("1" maps to "10", "-1" maps to "01", "0" maps to "00").


Fig. 5. Mean Pooling Circuit (MPC), which outputs one of five levels depending on its inputs.

REFLECTIONS

楊恩雨:

Integrating hardware and software for neural network algorithms is a very difficult project. Thanks to all my teammates, the senior lab members, and our advisor, we were able to come up with several good ideas and algorithms for our project. In our team I was responsible for the software design. Since neural network algorithms designed for hardware such as RRAM are still a new domain of study, there are not many resources on the internet or in papers. I worked very hard to design the Binary CNN algorithm, which has the potential for hardware implementation. Because I needed to understand most of the details of CNN algorithms in order to modify them, I not only learned how to use CNNs but also came to understand the mechanisms inside them. Lastly, I especially thank our advisor, Meng-Fan Chang, for training us to read papers and for listening to our presentations in weekly meetings during the first semester. I learned a lot from these meetings, and I believe the skills I gained will remain useful even after I graduate.

胡佳萱:

To begin with, I really appreciate the opportunity to participate in Prof. Meng-Fan (Marvin) Chang’s (張孟凡教授) Memory Design Lab and to work with all of the amazing lab members. Over about 14 months of learning, I came to understand more about how to do good research. At the very beginning, we carried out several paper studies in order to gain both fundamental and innovative knowledge about neural networks and memory. It was also the first time I learned what a formal conference or journal paper looks like. We then ran simulations based on the papers and the open-source code available online. By mimicking and practicing, I learned more about the structures and parameters of neural network software algorithms. Finally, we teamed up with the hardware team and looked forward to implementing our neural network software algorithms on the memory hardware. Although we sometimes faced challenges, all of our team members made every effort to fulfill our goal. I learned a great deal not only from our professor and the seniors in our lab, but also from my wonderful teammates.

Special thanks to the seniors at Arizona State University, who discussed the software algorithms with us and provided code for our simulations. I would also like to thank all of you: Prof. Meng-Fan (Marvin) Chang (張孟凡教授), Wei-Hao Chen (陳韋豪學長), En-Yu Yang (楊恩雨), Yen-Cheng Chiu (邱硯晟), and Chun-Ying Li (李俊穎). Without your help, I would not have been able to undertake this wonderful journey into neural networks and memory!

李俊穎:

I really appreciated the chance to participate in Professor Marvin Chang’s lab to understand and design memory. I was assigned ReRAM neuromorphic neural networks as my research topic. At the beginning, I knew nothing about this new type of memory or the CNN algorithm. Fortunately, Prof. Marvin Chang and all the members of his lab were enthusiastic and willing to guide me. During this one-year project, I learned how to do good research and how to effectively interpret formal conference and journal papers. I not only deepened my knowledge of the memory field but also tried to design new circuits to meet the requirements of the software team. This was not an easy task, but we still finished it. I want to thank all of my team members, En-Yu Yang, Yen-Cheng Chiu, and Jia-Xuan Hu, as well as Prof. Marvin Chang and our senior Wei-Hao Chen. Without your help and effort, I might not have been able to finish my research this year. I was very glad to work with you. Thank you so much!

邱硯晟:

I really appreciated being able to complete this one-year project. At the beginning, I knew nothing about what “CNN”, “computing in memory”, and “RRAM” were. Thanks to Prof. Meng-Fan (Marvin) Chang (張孟凡教授) and all the members of the Memory Design Lab, I gradually came to understand the concept of computing in memory and how it interacts with RRAM. Over this year, by reading many papers and asking the professor questions, I not only accumulated knowledge in the memory field but also tried to design our own circuits to meet the requirements provided by the software team. I truly want to thank my hardware-team partner, who is also my best friend, 李俊穎. Without his help and effort, we might not have completed our work before the deadline. This cooperation helped me understand how the software team designed their algorithms and mechanisms, and what problems arise when we work together. Thanks also to the software team members 胡佳萱 and 楊恩雨; both of you were amazing! I hope our topic can enter the final competition and win a prize.