Topic Title: Model compression and encrypted deployment
Technical Area: Heterogeneous Model Compression, Data Security, and Application Security
- In the era of machine intelligence, model services will be everywhere: cloud servers, mobile terminals, IoT, and other devices
- Lowering storage and computation costs requires smaller model sizes
- Model and data security: model encryption and application encryption
Model deployment at Alibaba
- Heterogeneous model training platforms: PAI, Spark, TensorFlow, Caffe, etc.
- Model compression is at an exploratory stage: low-bit quantized neural networks are mainly used during training. General model compression technology is lacking for matrix/tensor decomposition, sparse networks, and non-neural-network models.
- Encrypted model deployment: this includes model encryption and application encryption. Model encryption is currently blank; application encryption covers C, C++, and Java engineering encryption, but the techniques are conventional. Universal model encryption and deployment technology is lacking.
1. Model compression
There should be a universal model compression technology. The compression should reduce model file size by more than 70%, the loss in forecast accuracy after compression should be no more than 1%, and inference performance should improve by more than 50%.
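As a rough illustration of the 70% size-reduction target, the sketch below applies a minimal post-training 8-bit quantization to float32 weights (all names and numbers here are illustrative, not part of the proposal): storing one byte per weight instead of four cuts storage by 75%, and the reconstruction error is bounded by one quantization step.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Uniform affine quantization of float32 weights to 8-bit codes."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = max((w_max - w_min) / 255.0, 1e-12)  # avoid div-by-zero for constant tensors
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize(codes: np.ndarray, scale: float, w_min: float) -> np.ndarray:
    """Map 8-bit codes back to approximate float32 weights."""
    return codes.astype(np.float32) * scale + w_min

weights = np.random.randn(256, 256).astype(np.float32)
codes, scale, w_min = quantize_int8(weights)
restored = dequantize(codes, scale, w_min)

size_reduction = 1 - codes.nbytes / weights.nbytes  # 0.75: four bytes become one
max_err = np.abs(weights - restored).max()          # at most one quantization step
```

The accuracy target (under 1% loss) would be verified separately by running the quantized model on a held-out evaluation set.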
2. Model and application encryption deployment:
There should be a universal model encryption and deployment technology. The performance loss from encryption should be less than 30% compared with the unencrypted model.
Related Research Topics
In recent years, with the rapid development of artificial intelligence, machine learning model services are deployed not only on the server side but also on increasingly intelligent mobile phones and IoT devices. Deep neural networks (CNN, DNN, RNN, LSTM, etc.) have also achieved remarkable results in fields such as computer vision, speech recognition, autonomous driving, and time-series analysis.
Model sizes keep growing, which poses great challenges for storage on embedded devices, and inference time and latency grow accordingly.
To solve these problems and make models easy to deploy to mobile terminals and IoT devices, we need to compress and accelerate deep models.
Research on common model compression techniques
1. Compression techniques during the training process: existing techniques mainly include neuron pruning, synapse pruning, quantization, network structure transformation, adaptive Huffman coding, and so on.
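Of the techniques above, synapse pruning is the simplest to sketch: weights with the smallest magnitudes are zeroed out, and the resulting sparse matrix can then be stored in a compressed format. A minimal magnitude-based pruning sketch (the threshold policy here is illustrative):

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (synapse pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

w = np.random.randn(128, 128).astype(np.float32)
pruned = prune_by_magnitude(w, sparsity=0.7)
zero_fraction = float((pruned == 0).mean())  # roughly 0.7 of the synapses removed
```

In practice, pruning is interleaved with fine-tuning so the remaining weights can compensate for the removed ones.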
2. General post-training compression technology: Google's TensorFlow Lite (TFLite) is TensorFlow's lightweight solution for mobile and embedded devices. TensorFlow Lite uses many techniques, such as kernel optimization, to achieve low latency. But it has limitations: it can only process TensorFlow model files, not other model formats, and its performance still leaves room for improvement.
Therefore, other general model compression techniques are necessary.
Encrypted Model Deployment Research
This includes encryption of model data files and encrypted application deployment. It is a hot research direction in the field of information and data security.
1. Model encryption:
The model data is extracted and protected with hybrid symmetric/asymmetric encryption. Alternatively, the deep neural network model is partitioned into two parts, with the key part kept on the server side for inference, and so on.
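The hybrid scheme works as follows: the bulk model file is encrypted with a fast symmetric cipher under a random per-model key, and only that small key is then wrapped with the recipient's asymmetric public key. The sketch below illustrates the symmetric half only, using a SHA-256 counter-mode keystream as a stand-in for a real cipher such as AES (this is NOT production crypto; the key-wrapping step is indicated in a comment):

```python
import hashlib
import os

def _keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from the key (stand-in for AES-CTR)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

model_bytes = os.urandom(1024)   # stand-in for a serialized model file
sym_key = os.urandom(32)         # random per-model symmetric key
ciphertext = xor_cipher(model_bytes, sym_key)
restored = xor_cipher(ciphertext, sym_key)
# In a real deployment, sym_key would additionally be wrapped with the
# server's public key (the asymmetric step) and shipped with the ciphertext.
```

A real implementation would use an authenticated cipher (e.g. AES-GCM) so that tampering with the encrypted model file is also detected.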
2. Application protection:
The application is converted into a binary or assembly file, combined with other reverse-engineering protections: anti-disassembly analysis, file integrity checks, compression/packing, junk instructions, code licensing, and so on.
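Among these protections, a file integrity check is the most straightforward: the expected digest of the binary is recorded at build time, and the application recomputes and compares it at startup, refusing to run if it was tampered with. A minimal sketch (file names and the startup policy are illustrative):

```python
import hashlib
import os
import tempfile

def file_digest(path: str) -> str:
    """SHA-256 digest of a binary, computed in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected_hex: str) -> bool:
    """Compare the current digest against the one recorded at build time."""
    return file_digest(path) == expected_hex

# Demo with a stand-in for the application binary:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"stand-in application binary contents")
    path = f.name
expected = file_digest(path)     # recorded at build time
ok = verify(path, expected)      # True: file is untampered
os.unlink(path)
```

The digest itself must be protected (e.g. signed, or embedded in a hardened loader), since an attacker who can patch the binary can otherwise patch the stored digest too.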
We can research combinations of these techniques, or explore better model and application encryption deployment technology.