Theme Title: Data privacy protection based on processing data directly on mobile devices


Technical Area: Device AI, Federated Learning



With the recent enactment of privacy laws and a series of user data leaks, people have become increasingly reluctant to upload personal information.


Governments and companies are also paying increasing attention to user data. The European Union's General Data Protection Regulation (GDPR), the most far-reaching data privacy law to date, takes effect on May 25, 2018. The Facebook user data leak has caused a serious crisis of trust for the company.


Many user scenarios in Taobao involve private data. For example, with user authorization, large volumes of tracking data describing user behavior are uploaded to servers for personalized recommendation.


To avoid uploading private user data to servers, we need to process the data directly on mobile devices. Some progress has been made. For example, we can use a deep learning algorithm to preprocess private user data on the device and extract non-private features, and then upload only those features to servers. Google has also proposed Federated Learning, in which sensitive user data is used for training without leaving the device.
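The feature-extraction idea above can be illustrated with a minimal sketch. It is not a description of any existing Taobao pipeline: the event fields, the category list, and the `extract_features` helper are all hypothetical, and a simple aggregation stands in for the deep learning model mentioned in the text. Whether a given feature truly counts as non-private depends on the application and would need separate analysis.

```python
from collections import Counter

def extract_features(events, categories):
    """Reduce raw on-device events to a normalized per-category click histogram.

    Hypothetical stand-in for an on-device model: item IDs and timestamps
    are dropped, and only aggregate proportions are returned.
    """
    counts = Counter(e["category"] for e in events if e["category"] in categories)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in categories]

# Assumed category vocabulary and raw events (illustrative values only).
CATEGORIES = ["clothing", "electronics", "food"]
raw_events = [
    {"item_id": "A123", "category": "clothing", "timestamp": 1525000000},
    {"item_id": "B456", "category": "electronics", "timestamp": 1525000060},
    {"item_id": "A789", "category": "clothing", "timestamp": 1525000120},
    {"item_id": "C321", "category": "clothing", "timestamp": 1525000180},
]

# Only this feature vector would be uploaded; raw_events stay on the device.
payload = extract_features(raw_events, CATEGORIES)
print(payload)  # [0.75, 0.25, 0.0]
```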



Process sensitive user data directly on mobile devices and eliminate the potential leakage of user data.


Use Federated Learning to reduce computation cost and improve the quality of personalized recommendation.


Related Research Topics

Standard machine learning approaches require centralizing the training data on one machine or in a datacenter. Data are collected from mobile devices and sent to the cloud for processing. However, this approach makes little use of the computing capability of the mobile devices themselves.


Last year, Google proposed a new machine learning approach called Federated Learning. It enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud.


The way it works can be described as follows: your device downloads the current model, improves it by learning from data on your phone, and then summarizes the changes as a small focused update. Only this update to the model is sent to the cloud, using encrypted communication, where it is immediately averaged with other user updates to improve the shared model. All the training data remains on your device, and no individual updates are stored in the cloud.
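The round described above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not Google's implementation: the model is a one-variable linear regressor, the devices and their data are synthetic, the function names `local_update` and `federated_round` are hypothetical, and encrypted communication is omitted entirely.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """On-device step: train on private data and return only the weight change."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on one private sample
            w -= lr * err * x       # plain SGD on squared error
            b -= lr * err
    return (w - weights[0], b - weights[1])

def federated_round(weights, devices):
    """Server step: average the per-device updates and apply them to the shared model.

    A simple mean is used because every simulated device holds the same
    number of samples; weighting by sample count is the more general choice.
    """
    deltas = [local_update(weights, data) for data in devices]
    n = len(deltas)
    avg_w = sum(d[0] for d in deltas) / n
    avg_b = sum(d[1] for d in deltas) / n
    return (weights[0] + avg_w, weights[1] + avg_b)

def make_device_data(rng, n=20):
    """Synthetic private data for one device: noisy samples of y = 2x + 1."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 2 * x + 1 + rng.gauss(0, 0.01)))
    return data

rng = random.Random(0)
devices = [make_device_data(rng) for _ in range(10)]  # data never leaves each list

model = (0.0, 0.0)  # shared model: (slope, intercept)
for _ in range(50):
    model = federated_round(model, devices)
print(model)  # expected to approach (2.0, 1.0)
```

In this sketch the server only ever sees the two-number update from each device, never the underlying samples, which mirrors the "small focused update" described in the text.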


Federated Learning allows for smarter models, lower latency, and less power consumption, all while ensuring privacy. And this approach has another immediate benefit: in addition to providing an update to the shared model, the improved model on your phone can also be used immediately, powering experiences personalized by the way you use your phone.