Topic Title: Multi-agent Reinforcement Learning for Online Advertising System


Technical Area: Multi-agent System, Game Theory



In the big data era, modern online advertising systems analyze vast amounts of data to recognize the interests of target populations in real time and automatically deliver the best-matched ads. A suitable mechanism design can simultaneously optimize the objectives of all stakeholders (advertisers, users, and the platform) and improve online marketing efficiency at large scale.


On the Taobao advertising platform, different advertisers may pursue different key performance indicators, such as clicks, conversions, page views and return on investment (ROI), by optimizing their marketing strategies, such as bidding strategies and campaign management. Users, advertisers (more precisely, the merchants) and the platform form a tripartite economic game. The platform is responsible for optimizing and balancing users’ experience, advertisers’ interests and its own revenue to achieve a win-win-win situation and build a sustainable e-commerce marketing system. In practice, however, the optimal or equilibrium strategies for advertisers are largely unknown and depend on various factors, including the availability of market information, resource constraints, performance objectives, and the irrationality of competing advertisers. As such, how to strategically optimize marketing strategies becomes a central question for the Taobao advertising platform.


Research on optimal marketing strategies has so far focused largely on statistical solutions, which make the strong assumption that the market data is stationary (i.e. its probability distribution does not change over time in response to the current behavior of competitors). However, advertisers not only interact with the platform but also, most critically, interact with each other. A change in one advertiser’s strategy affects the strategies of the others, and vice versa. In addition, existing computational strategy methods are mainly concerned with micro-level optimization of a single party’s benefit (that of a specific advertiser or merchant). In a competitive environment, however, optimizing one party’s benefit may ignore and hurt the benefits of other parties. From the ad system’s viewpoint, such micro-level optimization may not fully exploit the dynamics of the ad ecosystem to achieve better social optimality. Therefore, the optimization of advertisers’ marketing strategies can be modeled in a multi-agent fashion.
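
The non-stationarity argument above can be made concrete with a standard Markov-game formulation. The following is a sketch only; all notation (advertiser index i, state s, actions a, policies \pi, reward r_i, discount \gamma, joint transition kernel P) is assumed for illustration and does not come from the project description.

```latex
% Assumed notation (not from the source): N advertisers indexed by i,
% state s, joint action (a_i, a_{-i}), policies \pi_i, discount \gamma.
% Expected discounted return of advertiser i under the joint policy:
J_i(\pi_i, \pi_{-i}) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_i\bigl(s_t, a_{i,t}, a_{-i,t}\bigr)\right]

% Transition kernel as seen by advertiser i alone:
P_i(s' \mid s, a_i) = \sum_{a_{-i}} \pi_{-i}(a_{-i} \mid s)\, P(s' \mid s, a_i, a_{-i})
```

Because P_i depends on the other advertisers’ policies \pi_{-i}, it changes whenever any competitor updates its strategy, so the environment that a single advertiser observes is not stationary even when the joint kernel P is fixed.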


Multi-agent learning arises in a variety of domains where intelligent agents interact not only with the environment but also with each other. It has an increasing number of applications, such as autonomous robot control, distributed sensor optimization, and real-time bidding in competitive e-commerce and financial markets. Multi-agent research focuses on problems such as competition, collaboration, and communication among intelligent agents. This interdisciplinary field draws on reinforcement learning, deep learning and game theory.


Recent years have witnessed significant progress in deep learning with its end-to-end learning paradigm, which has demonstrated superior performance on various predictive tasks. Deep reinforcement learning further provides methodologies for making decisions that optimize long-term goals, as demonstrated in Atari games and AlphaGo. In a multi-agent framework, besides agent modeling, learning algorithms motivated by mechanism design, such as GSP (generalized second-price) and VCG (Vickrey–Clarke–Groves) auctions, are also critical.
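
As an illustration of the GSP mechanism mentioned above, the following is a minimal sketch of textbook GSP pricing, assuming bidders are ranked by bid alone: the winner of each ad slot pays the next-highest bid per click. The function name `gsp_allocate` and its parameters are hypothetical, and the sketch ignores quality scores, reserve prices and budgets; it is not a description of Taobao’s actual auction.

```python
from typing import List, Tuple


def gsp_allocate(bids: List[float], num_slots: int) -> List[Tuple[int, float]]:
    """Return (bidder_index, price_per_click) for each filled slot, best slot first.

    Simplifying assumption: ranking uses the bid only (no quality score).
    """
    # Rank bidders by bid, highest first; ties go to the lower index.
    ranked = sorted(range(len(bids)), key=lambda i: (-bids[i], i))
    results = []
    for slot in range(min(num_slots, len(ranked))):
        winner = ranked[slot]
        # The winner of this slot pays the next-highest bid,
        # or 0 if there is no lower-ranked bidder.
        price = bids[ranked[slot + 1]] if slot + 1 < len(ranked) else 0.0
        results.append((winner, price))
    return results


if __name__ == "__main__":
    # Three advertisers bid per click for two ad slots.
    print(gsp_allocate([3.0, 5.0, 1.0], num_slots=2))
```

Under GSP a bidder’s price depends on the bids ranked below it, so each advertiser’s best bid depends on what the others bid; this is one reason the bidding problem is naturally a multi-agent one.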


As the combination of multi-agent reinforcement learning and advertising games opens up promising academic research and practical applications, we invite researchers who are experts in this field to study how to optimize all agents' policies in the e-commerce environment from a global (god's-eye) perspective, so as to maximize the platform's efficiency and create commercial value.



This project would study the marketing process in Taobao’s advertising system in order to model the tripartite game among users, advertisers and the platform itself. We shall further explore collaboration and competition among advertisers to optimize their marketing strategies. The algorithms aim to jointly satisfy users’ experience, advertisers’ marketing objectives and the platform’s traffic indices as an overall optimization result.


Considering the characteristics of Taobao’s ecosystem, this project would develop a novel game-theoretic approach and use multi-agent deep reinforcement learning methods, such as reward shaping and learning from demonstration, to model multiple agents with incentives and economic constraints in a system where collaboration to accomplish a large task and competition for individual incentives co-exist.


The research would be carried out in cooperation with Alibaba to develop a more effective and sustainable advertising platform that consists of heterogeneous agents and supports real-time processing of large-scale data.


Related Research Topics

Related research may arise in several areas; the following lists some examples: