Research & Development

Deep Learning Model Compression

Background

To reduce the network traffic caused by the explosive growth in data expected with the arrival of the IoT era, AI edge technology is needed that processes data at high speed at the terminal end (edge) before the data flows into the network. Deep learning is attracting attention as a highly accurate AI technology, but it requires expensive hardware to meet its large memory and computational demands. OKI is therefore focusing on the R&D of a “model compression technology” that can shrink the configuration (model) of the neural network used for deep learning while suppressing performance degradation. This technology minimizes memory and computational cost requirements, thus reducing hardware requirements and making deep learning technology easier to deploy.

Features

A method called channel pruning, which removes redundant neurons from a neural network, is one example of OKI’s deep learning model compression technology. To perform pruning, criteria are needed to assess the redundancy of the neurons that make up the model. Although various neuron evaluation criteria, such as the sum of the absolute values of the weight parameters and reconstruction errors, are being proposed one after another at leading machine learning conferences, OKI is developing its own criteria based on its latest research. For example, OKI has developed a method that identifies and removes low-importance neurons by constructing an attention network between the layers to estimate and learn the importance of each neuron. The method succeeded in reducing the number of parameters and floating-point operations by 90.8% and 79.4%, respectively, while suppressing the accuracy degradation of a deep learning model called ResNet-50 (*1) to about 1% (recognition benchmark result on the CIFAR-10 (*2) image data set). With such technology, memory use and computational cost can be reduced while suppressing accuracy degradation, making high-precision, high-speed AI possible on the edge side.

  • *1 Abbreviation for Residual Network, a neural network model devised by Kaiming He et al. in 2015 that extracts advanced and complex features through deep layering.
  • *2 An image data set commonly used for benchmarking object recognition.
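As a concrete illustration of the simplest criterion mentioned above, the sketch below ranks the output channels of a convolution layer by the sum of their absolute weight values (the L1 norm of each filter) and removes the least important ones. The tensor shapes and pruning ratio are illustrative assumptions, not OKI's actual settings or method.

```python
import numpy as np

# Hypothetical 4-D convolution weight tensor: (out_channels, in_channels, kH, kW).
# Random values stand in for trained weights; shapes are illustrative only.
rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 3, 3, 3))

# A common criterion from the pruning literature: per-output-channel
# sum of absolute weight values (the L1 norm of each filter).
importance = np.abs(weights).sum(axis=(1, 2, 3))

# Remove the fraction of channels with the lowest importance scores.
prune_ratio = 0.5
n_keep = int(weights.shape[0] * (1 - prune_ratio))
keep_idx = np.sort(np.argsort(importance)[-n_keep:])

# The compressed layer keeps only the most important filters.
pruned = weights[keep_idx]
print(pruned.shape)  # (4, 3, 3, 3)
```

In practice the corresponding input channels of the next layer are pruned as well, and the network is fine-tuned afterwards to recover accuracy.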

Deep Learning Model Compression

  • Use an attention network to calculate the importance of the neurons that make up each layer
  • Prune low-importance neurons to greatly reduce the amount of computation while maintaining accuracy
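The two steps above can be sketched as follows. This is a minimal stand-in for an attention-style importance estimator: global average pooling of a layer's feature map, a small transform with a sigmoid producing one score per channel, then pruning of channels below a threshold. The gate weights here are random placeholders for what would actually be learned jointly with the network, and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Feature map from a hypothetical layer: (channels, height, width).
features = rng.standard_normal((8, 4, 4))
C = features.shape[0]

# Minimal attention gate: global average pooling, then a learned
# linear transform and a sigmoid, giving one importance score per
# channel. W is a random stand-in for jointly learned parameters.
pooled = features.mean(axis=(1, 2))              # shape (C,)
W = rng.standard_normal((C, C)) * 0.1
scores = 1.0 / (1.0 + np.exp(-(W @ pooled)))     # sigmoid, in (0, 1)

# Channels whose estimated importance falls below the threshold
# are pruned; the rest form the compressed layer.
threshold = 0.5
keep_idx = np.flatnonzero(scores >= threshold)
compressed = features[keep_idx]
print(compressed.shape[0], "of", C, "channels kept")
```

In the actual method the importance scores are learned during training, so the network itself indicates which neurons can be removed with minimal accuracy loss.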

[Figure: Deep Learning Model Compression]
