Developing a Trading Platform that Processes 325,000 Transactions per Second
    2018-03-27    Yu Zhenxin(虞振昕)

How Alibaba’s technology supports faster rollout while improving system integrity

Following the trend set by Alibaba in previous years, the 2017 Double 11 Global Shopping Festival shattered records once again. One of the new milestones was the peak number of transactions – 325,000 per second at one point.

Though this figure is a commercial achievement for Alibaba, it also represents an operational and technical challenge. Reaching transaction volumes of this magnitude means meeting the needs of dozens of business units within the Alibaba Group while maintaining the stability and integrity of over 7,000 system applications. The biggest challenge in this kind of distributed microservice architecture is determining business requirements and their impact, then implementing and launching them quickly across the full chain. This involves requirements analysis, technical solution design, coding, testing, and the launch itself, all within a very complicated cross-team process.

The difficulty is mainly reflected in:

  • The lack of a full-scale management mechanism, and low collaboration efficiency;

 

  • High barriers to platform entry and the inability of new businesses to try new modes without risking failure;

 

  • Poor separation between businesses and the platform, which is unable to support the development of the businesses;

 

  • Lack of reusable business assets.

 

Pain point 1: Lack of a full-scale management mechanism, and low collaboration efficiency

The lack of a full-scale perspective on the tracking and management of business requirements is observed as follows:

  • Requirements are often described in one simple sentence. The details are generally spread across requirement documents, emails, and requirements clarification sessions.

 

  • The delivery of requirements is inefficient and requires repeated communication. After the requirements are clarified, there is no effective vehicle for carrying them into the later design process, so they reach developers with inaccuracies, leading to repeated communication, clarification, and rework.

 

  • Platform capabilities are often unclear, and the evaluation of technical solutions takes a long time. When engineers evaluate the platform changes that a requirement calls for, the platform’s capabilities are hard to see. Meanwhile, business and platform code are mixed together, making it difficult to evaluate questions such as whether platform capabilities can be reused, and how many and which businesses or systems a change affects. Without business visualization, these questions can only be answered by going through the code repeatedly.

 

  • Similar requirements are built repeatedly. Personnel changes or turnover make it hard to track or follow changes in requirements. Whenever similar requirements are encountered, the analysis, design, and coding must be done all over again.

 

Pain point 2: High barriers to platform entry, and the inability of new businesses to try new modes without risking failure

New businesses are bound by the laws of growth. In the early stages of business model verification, requirements are relatively simple, and trial and error through repeated releases is a viable strategy. However, the older trading platform had certain restrictions and constraints, and could not meet new businesses’ need for fast onboarding, such as simplified ordering processes and independent launches.

 

Pain point 3: Poor separation between businesses and the platform, which is unable to support the development of the businesses

The older extension mechanism for business logic was the Java SPI (Service Provider Interface) approach, with more than 500 SPIs used for business extension on the trading platform. However, the implementation classes of these SPIs were not organized and isolated by business dimension, which left the businesses’ custom logic scattered across the platform code. Whenever new requirements emerged, business parties could only customize their business logic by modifying code in the platform’s code repository. The lack of reasonable layering and domain abstraction coupled businesses to the platform, so business parties could not design and develop their own businesses easily or efficiently.

 

Pain point 4: Lack of reusable business assets

The domestic Taobao and Tmall sites have a variety of business support tools such as pre-sales, shopping vouchers, and red envelopes. Faced with delivery for international markets, we wondered if these could be reused as they are, or if they required adaptation. This raised the issue of business asset reusability.

In response to the above pain points and problems, the Alibaba tech team designed a new trading system based on a Trade Modularization Framework, TMF 2.0. With a brand new architecture and enhanced planning and monitoring functionalities, TMF 2.0 is an altogether more powerful and more robust trading platform, supporting more rapid new business launches while ensuring system stability and integrity.

 .     .     .

Conceptualizing the TMF 2.0 Platform

A holistic assessment of the technical impact of new business operations must consider full-chain stability, link monitoring, potential interdependencies or conflicts, code corruption, quality control, and failure analysis.

To support more accurate assessments and better ongoing monitoring, TMF 2.0 was designed with six key functionalities in mind:

 

  1. Full-chain business visualization

Business analysts and developers can discuss requirements and analyze impact based on the visualized business diagram. The business rules shown in the diagram are the ones that actually run on the system.

Since there are over 7,000 applications in the Alibaba Group, full-scale business visualization is necessary for efficient analysis.

  2. Demand structuring

After the business requirements have been analyzed, they’re further broken down based on capabilities under the relevant business architecture specification. Alibaba’s specifications are used not only for demand, but also for business processes and interfaces. This standardization reduces communication and time costs.

  3. Business configuration

Once the business definition has been visualized, business rules can be configured easily. After change requests have been approved, they can be applied rapidly.

 

  4. Business test integration

Because Alibaba’s e-commerce business is characterized by long links involving multiple products from multiple BUs, any change in business requirements calls for upstream and downstream regression verification. If done manually, regression verification and test data preparation carry heavy time costs. Instead, TMF 2.0 provides an automated regression testing function based on business data visualization to ensure sufficient test coverage for normal and abnormal scenarios.

 

  5. Business monitoring

In routine business maintenance, we need to constantly monitor the business dashboard. We need not only the overall indicator data, but also the indicator data for each specific business.

 

  6. Business-oriented troubleshooting

When a business fails, a snapshot of the issue makes it possible to quickly reconstruct the business trace and locate the problem.

The framework of the platform was designed with these functionalities in mind.

.      .      .

Building the TMF 2.0 Framework

The architecture of the platform was designed around three principles to ensure autonomy and integrity at each level and for each logical unit:

  • Plug-in architecture for platform segregation and business customization

 

  • Unified business identity program

 

  • Segregation of management domain and operation domain

Plug-in Architecture for Platform Segregation and Business Customization

A plug-in style architecture was adopted to separate businesses from the platform. The platform provides a mechanism for registering business parties’ plug-in packages at runtime. Business code is allowed only in a plug-in package, which is kept strictly separate from the platform code. The business packages’ code repository is also separate from the platform’s code repository, and each business package is provided to a container for loading as a second-party (internal) package.
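The registration mechanism can be pictured with a minimal Java sketch. It is an illustration under stated assumptions, not TMF 2.0's actual API: the names OrderingProcessExtension and BusinessPluginRegistry, the business codes, and the step names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PluginRegistryDemo {
    // Extension point owned by the platform (hypothetical name and shape).
    interface OrderingProcessExtension {
        List<String> steps();
    }

    // Platform-side registry: business packages register plug-ins at runtime,
    // so the platform itself contains no business-specific branches.
    static class BusinessPluginRegistry {
        private static final List<String> DEFAULT_STEPS =
                Arrays.asList("validate", "reserve-inventory", "create-order", "pay");

        private final Map<String, OrderingProcessExtension> plugins = new ConcurrentHashMap<>();

        // Called when a business package is loaded into the container.
        void register(String businessCode, OrderingProcessExtension plugin) {
            if (plugins.putIfAbsent(businessCode, plugin) != null) {
                throw new IllegalStateException("plug-in already registered for " + businessCode);
            }
        }

        // Businesses without a plug-in fall back to the platform default.
        List<String> stepsFor(String businessCode) {
            OrderingProcessExtension plugin = plugins.get(businessCode);
            return plugin == null ? DEFAULT_STEPS : plugin.steps();
        }
    }

    public static void main(String[] args) {
        BusinessPluginRegistry registry = new BusinessPluginRegistry();
        // A new business ships a simplified ordering process in its own package.
        registry.register("new-biz", () -> Arrays.asList("create-order", "pay"));
        System.out.println(registry.stepsFor("new-biz")); // [create-order, pay]
        System.out.println(registry.stepsFor("tmall"));   // platform default steps
    }
}
```

In this sketch the platform only ever calls stepsFor, so adding or changing a business never touches platform code. Rejecting duplicate registrations is a design choice of the sketch, made to keep one owner per business code.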

There are three layers to this architecture – business specification, solution realization, and business customization. These are shown and described in more detail below.

 

 

Business Specification

This is the bottom layer, which deals with trade specifications. These include Alibaba’s trade business entity models, business domain definition, and bootstrap specification for different environments. Based on these specifications, we can easily build market-specific solutions, for example, a Chinese solution for Tmall or an AE solution for AliExpress.

This layer allows definition and specification work, such as business process definitions, business extension interface definitions, and business entity model specifications, to be done once and reused.

Solution Realization

The middle layer is solution realization, which is sub-divided into basic realization and market-specific solutions. Since Alibaba is an international business, it must construct different market-specific solutions with their own business rules and logic. These varied solutions choose from a combination of different basic solution realizations, each with their own processes and rules, many of which overlap. Here, relevant aspects of existing solutions can simply be reused rather than recreated, and more attention can be focused on market-specific requirements.

Business Customization

The top layer is business customization, developed with the many subdivisions of customized rules in mind, each with its own business logic, even within a single market. At this layer, the Alibaba tech team assembles customized business packages according to the underlying needs to realize distinct business logic and rules.

Though this architecture is complex, it allows for clearly demarcated responsibilities between layers while deliberately keeping each layer’s code isolated. During new business deployments, the team first looks to reuse underlying business solutions, then formulates solutions for different markets, and finally differentiates individual businesses within each solution.
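One way to picture how the three layers interact is a lookup that checks the most specific layer first. The sketch below is only an illustration: the override order, the rule names, and the values are assumptions made for the example, and the article does not describe how TMF 2.0 resolves rules internally.

```java
import java.util.HashMap;
import java.util.Map;

class LayeredRulesDemo {
    // One architectural layer holding rule name -> value (illustrative values only).
    static class Layer {
        final String name;
        final Map<String, String> rules = new HashMap<>();

        Layer(String name) {
            this.name = name;
        }

        Layer with(String rule, String value) {
            rules.put(rule, value);
            return this;
        }
    }

    // Look a rule up from the most specific layer to the least specific one.
    static String resolve(String rule, Layer... mostSpecificFirst) {
        for (Layer layer : mostSpecificFirst) {
            String value = layer.rules.get(rule);
            if (value != null) {
                return value + " (from " + layer.name + ")";
            }
        }
        throw new IllegalArgumentException("rule not defined in any layer: " + rule);
    }

    public static void main(String[] args) {
        Layer specification = new Layer("business specification")
                .with("payment-timeout", "1 day")
                .with("ordering-process", "standard");
        Layer solution = new Layer("AE solution")
                .with("ordering-process", "cross-border");
        Layer customization = new Layer("business customization")
                .with("payment-timeout", "3 days");

        // prints "3 days (from business customization)"
        System.out.println(resolve("payment-timeout", customization, solution, specification));
        // prints "cross-border (from AE solution)"
        System.out.println(resolve("ordering-process", customization, solution, specification));
    }
}
```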

.      .      . 

Unified Business Identity Program through the Full-chain

Separating businesses and the platform is not enough – sub-businesses must also be separated through a unified business identity (similar to an ID number) that remains unique throughout the entire transaction link. This is much more effective than simply using filtering, an approach used in traditional Service Provider Interface (SPI) architectures which does not distinguish between business identities at all.

The business identity needs to be abstracted through three dimensions – people, goods, and fields. Fields include market type, verticals, and channel source. Business processes and rules can be related once a unique business identity is generated.

The Alibaba team adopted a UIL-based business identification program to create these unified business identities. The overall design is based on standard abstraction models, with customized syntax and unified management models. It effectively identifies 99% of products through four dimensions – sample model, buyer model, seller model, and category model.

Business configuration and deployment can be managed uniformly according to these dimensions once a business identity has been assigned. For this, core elements such as configuration isolation, hot deployment, configuration rollback, and configuration determinism must be implemented.
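As a rough illustration of assigning one unambiguous identity per order, the Java sketch below matches order attributes against a list of identity definitions. The article does not show the UIL syntax, so plain predicates stand in for it. The identity codes, attribute names, and the IdentityResolver class are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class BusinessIdentityDemo {
    // Pairs an identity code with a predicate over order attributes.
    static class IdentityDefinition {
        final String code;
        final Predicate<Map<String, String>> matches;

        IdentityDefinition(String code, Predicate<Map<String, String>> matches) {
            this.code = code;
            this.matches = matches;
        }
    }

    static class IdentityResolver {
        private final List<IdentityDefinition> definitions = new ArrayList<>();

        void define(String code, Predicate<Map<String, String>> matches) {
            definitions.add(new IdentityDefinition(code, matches));
        }

        // Exactly one definition must match, so the identity stays unambiguous
        // for the rest of the transaction link.
        String resolve(Map<String, String> order) {
            String found = null;
            for (IdentityDefinition definition : definitions) {
                if (definition.matches.test(order)) {
                    if (found != null) {
                        throw new IllegalStateException(
                                "ambiguous identity: " + found + " and " + definition.code);
                    }
                    found = definition.code;
                }
            }
            if (found == null) {
                throw new IllegalStateException("no business identity matches the order");
            }
            return found;
        }
    }

    public static void main(String[] args) {
        IdentityResolver resolver = new IdentityResolver();
        resolver.define("tmall.car",
                o -> "tmall".equals(o.get("market")) && "car".equals(o.get("category")));
        resolver.define("aliexpress.general",
                o -> "aliexpress".equals(o.get("market")));

        Map<String, String> order = new HashMap<>();
        order.put("market", "tmall");     // field: market type
        order.put("category", "car");     // goods: category
        order.put("seller", "dealer-42"); // people: seller
        System.out.println(resolver.resolve(order)); // tmall.car
    }
}
```

Failing on ambiguous or missing matches is a choice made for this sketch. It reflects the requirement that the identity be unique, but the article does not say how TMF 2.0 handles such cases.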

 .      .      .

Segregation of Management Domain and Operation Domain

Once the business identity has been determined, it is necessary to define the business itself. This involves separating the business (management) domain, where business logic is defined, from the operations domain, where it is executed.

This is necessary because business logic cannot rely on dynamic runtime calculations. Instead, it must be defined and visualized during a static period. During this static period, decisions can also be made to resolve any rule conflicts that appear in business definitions. During runtime, the business rules and conflict decision strategies defined in the static period are then strictly followed in the operations domain.

The following figure shows the architecture used for separating the business and operations domains.

 

The business domain defines the business life cycle, business identity, and business objects, which include business processes and business management. Once these operations are completed, configuration files are delivered to operations domain platforms, which automatically resolve them into commands for execution.

Defining business rules in the business domain is a complex process. The three core elements in this process are the business identity, business superposition, and conflict decisions. Business superposition refers to the identification of business rule conflicts in two dimensions, horizontal and vertical, as shown in the following figure.

 

The horizontal dimension is also known as the product dimension. Horizontal considerations include products being used by multiple vertical businesses (or vertical businesses using multiple products) and ascertaining whether a product is valid based on the given business session. For example, the validity of an e-voucher depends on whether the user has put it into use.

The vertical dimension is also known as the industry dimension. Often, the industry can be determined in the static period from a specific business object (such as a commodity). Business rules of one industry are not automatically imposed on other industries. For example, the payment time-out period for different industries can be set to one day, but if Tmall Car changes its time-out period, this change does not affect other industries.
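The time-out example can be sketched in a few lines of Java. The class name, the industry keys, and the three-day override are hypothetical. Only the one-day default comes from the example above.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

class IndustryTimeoutDemo {
    static class PaymentTimeoutRules {
        private final Duration defaultTimeout;
        private final Map<String, Duration> overrides = new HashMap<>();

        PaymentTimeoutRules(Duration defaultTimeout) {
            this.defaultTimeout = defaultTimeout;
        }

        // An override is stored under one industry only.
        void override(String industry, Duration timeout) {
            overrides.put(industry, timeout);
        }

        Duration timeoutFor(String industry) {
            return overrides.getOrDefault(industry, defaultTimeout);
        }
    }

    public static void main(String[] args) {
        PaymentTimeoutRules rules = new PaymentTimeoutRules(Duration.ofDays(1));
        rules.override("tmall-car", Duration.ofDays(3)); // hypothetical new value
        System.out.println(rules.timeoutFor("tmall-car")); // PT72H
        System.out.println(rules.timeoutFor("apparel"));   // PT24H
    }
}
```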

Determining the complexity of a business based on the quantity of rules involved is framed as a simple calculation:

Total business rules for one business session = one vertical business rule set + n horizontal business rule sets

Therefore, defining and managing businesses requires specific operations to determine cross-sections of vertical and horizontal businesses. This helps find the best solutions to conflicts that arise after these cross-sections have been determined.
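The calculation and the static-period conflict check can be sketched as follows. This is an illustration under assumptions: a conflict is taken to mean two rule sets claiming the same extension point, and the rule set names, extension points, and values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RuleSuperpositionDemo {
    // A rule set maps an extension point to the value it wants to apply there.
    static class RuleSet {
        final String name;
        final Map<String, String> rules = new LinkedHashMap<>();

        RuleSet(String name) {
            this.name = name;
        }

        RuleSet rule(String extensionPoint, String value) {
            rules.put(extensionPoint, value);
            return this;
        }
    }

    // Total rules for a session = one vertical rule set + n horizontal rule sets.
    static int totalRules(RuleSet vertical, List<RuleSet> horizontals) {
        int total = vertical.rules.size();
        for (RuleSet horizontal : horizontals) {
            total += horizontal.rules.size();
        }
        return total;
    }

    // Static-period check: extension points claimed by more than one rule set.
    static Map<String, List<String>> conflicts(RuleSet vertical, List<RuleSet> horizontals) {
        List<RuleSet> all = new ArrayList<>();
        all.add(vertical);
        all.addAll(horizontals);

        Map<String, List<String>> owners = new TreeMap<>();
        for (RuleSet ruleSet : all) {
            for (String extensionPoint : ruleSet.rules.keySet()) {
                owners.computeIfAbsent(extensionPoint, k -> new ArrayList<>()).add(ruleSet.name);
            }
        }
        owners.values().removeIf(names -> names.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        RuleSet car = new RuleSet("tmall-car")
                .rule("payment-timeout", "3 days")
                .rule("delivery-mode", "pickup-at-dealer");
        RuleSet preSale = new RuleSet("pre-sale")
                .rule("payment-timeout", "deposit window")
                .rule("deposit", "required");
        RuleSet voucher = new RuleSet("e-voucher")
                .rule("redeem-check", "unused only");

        List<RuleSet> horizontals = Arrays.asList(preSale, voucher);
        System.out.println(totalRules(car, horizontals)); // 5
        System.out.println(conflicts(car, horizontals));  // {payment-timeout=[tmall-car, pre-sale]}
    }
}
```

Each conflict surfaced this way would need an explicit decision recorded before release, so that the runtime only applies choices that were already made.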

.      .      .

The TMF 2.0 Key Concept Models

The TMF 2.0 framework’s extensive functions are leveraged through a two-pipeline usage model for business configuration and operations. This model is shown in the figure below.

 

During the business configuration pipeline, the team considers the domains covered by the relevant business, the functions and products available under those domains, and the business extension points available. This requires the support of the domain functionality model. The model allows targeted settings by exposing structured data on the capabilities and extension points within each domain in the platform. This data is presented through the key view template (shown in the lower part of the figure).

Once business configuration is complete, the configuration data is saved and delivered to the business operation pipeline.

The usage model outlined above greatly streamlines the planning and launching of new businesses.

Business definition is visual, manageable, and configurable across the entire trading platform. Visualization is provided for system capabilities, business processes and rules, and product superposition. Configuring business rules is simple and reliable, following the “What You See Is What You Get” (WYSIWYG) principle. All systems based on TMF 2.0 standards can immediately obtain business configurability without any need for additional development.

Additionally, a comprehensive configuration version management system is provided for quick implementation or rollback of business configurations. Multi-tenancy management enables complete isolation between different business systems through tenants, who are allowed their own data space and configuration push policies.
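A minimal sketch of per-tenant version management with rollback might look like the following. ConfigStore and its methods are hypothetical names, and the in-memory stack merely stands in for whatever storage the real system uses.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class ConfigVersioningDemo {
    static class ConfigStore {
        // tenant -> stack of published configurations; the top is the active one.
        private final Map<String, Deque<Map<String, String>>> history = new HashMap<>();

        // Publishes a new version for one tenant and returns its version number.
        int publish(String tenant, Map<String, String> config) {
            Deque<Map<String, String>> versions =
                    history.computeIfAbsent(tenant, t -> new ArrayDeque<>());
            versions.push(Collections.unmodifiableMap(new HashMap<>(config)));
            return versions.size();
        }

        Map<String, String> active(String tenant) {
            Deque<Map<String, String>> versions = history.get(tenant);
            if (versions == null) {
                throw new IllegalStateException("no configuration published for " + tenant);
            }
            return versions.peek();
        }

        // Discards the active version and reactivates the previous one.
        Map<String, String> rollback(String tenant) {
            Deque<Map<String, String>> versions = history.get(tenant);
            if (versions == null || versions.size() < 2) {
                throw new IllegalStateException("no earlier version for " + tenant);
            }
            versions.pop();
            return versions.peek();
        }
    }

    public static void main(String[] args) {
        ConfigStore store = new ConfigStore();
        store.publish("tenant-a", Collections.singletonMap("payment-timeout", "1 day"));
        store.publish("tenant-a", Collections.singletonMap("payment-timeout", "3 days"));
        store.publish("tenant-b", Collections.singletonMap("payment-timeout", "2 days"));

        store.rollback("tenant-a");
        System.out.println(store.active("tenant-a")); // {payment-timeout=1 day}
        System.out.println(store.active("tenant-b")); // {payment-timeout=2 days}
    }
}
```

Because each tenant has its own history, rolling back one tenant leaves the others untouched, which is the isolation property described above.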

 .      .      .

Business Dimension-oriented Operation and Maintenance Protection

After businesses are separated from the platform and given a unified business identity, we can ensure reliability from the following dimensional perspectives:

1) Fault monitoring by business dimensions

In the absence of a unified business identity, transaction failure monitoring tends to be coarse, since only overall trading volume trends can be observed. The transaction volume of individual businesses, especially new and small ones, is usually very small, making it difficult to monitor and analyze discrepancies in a timely manner. As a result, faults are often detected only when they are flagged in customer complaints.

The TMF 2.0 business system makes cross-dimensional grouping and differential monitoring easy since it utilizes a unified business identity.
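The difference between aggregate and per-business monitoring can be shown with a small Java sketch. The business identities, order counts, and the 5% threshold are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BusinessMonitoringDemo {
    static class Counter {
        long total;
        long failed;

        double failureRate() {
            return total == 0 ? 0.0 : (double) failed / total;
        }
    }

    static class FailureMonitor {
        private final Counter overall = new Counter();
        private final Map<String, Counter> byBusiness = new TreeMap<>();

        // Records a batch of orders under the business identity they carry.
        void record(String businessIdentity, long orders, long failures) {
            Counter counter = byBusiness.computeIfAbsent(businessIdentity, k -> new Counter());
            counter.total += orders;
            counter.failed += failures;
            overall.total += orders;
            overall.failed += failures;
        }

        double overallFailureRate() {
            return overall.failureRate();
        }

        // Business identities whose own failure rate exceeds the threshold.
        List<String> businessesAbove(double threshold) {
            List<String> result = new ArrayList<>();
            for (Map.Entry<String, Counter> entry : byBusiness.entrySet()) {
                if (entry.getValue().failureRate() > threshold) {
                    result.add(entry.getKey());
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        FailureMonitor monitor = new FailureMonitor();
        monitor.record("tmall.general", 10_000, 10); // large, healthy business
        monitor.record("new.small-biz", 20, 10);     // small business, half its orders fail

        System.out.println(monitor.overallFailureRate() > 0.05); // false
        System.out.println(monitor.businessesAbove(0.05));       // [new.small-biz]
    }
}
```

Here the aggregate failure rate is about 0.2% (20 failures in 10,020 orders) and would raise no alarm, while the per-identity view shows the small business failing half the time.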

 

2) Cluster deployment by business dimensions

The unified business identity also allows dimension-based clustered deployment. This not only achieves physical isolation between key businesses and other businesses, but also satisfies the requirements of new small businesses for rapid iterative releases.

 

3) Stability guarantee by business dimensions

The unified business identity facilitates dimension-based implementation of differentiated promotion protections with varied QoS (Quality of Service) strategies. Examples include targeted traffic restriction, dimension-based establishment and monitoring of performance baselines, and the construction of a query link monitoring view along global or business dimensions.
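Targeted traffic restriction can be illustrated with a limiter keyed by business identity. The sketch below uses a simple fixed one-second window, which is a choice made for brevity rather than the algorithm the article's system uses. The class name, identities, and limits are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class BusinessTrafficLimitDemo {
    // Fixed-window limiter with a separate window and limit per business identity.
    static class BusinessRateLimiter {
        private final int defaultLimitPerSecond;
        private final Map<String, Integer> limits = new HashMap<>();
        private final Map<String, long[]> windows = new HashMap<>(); // {windowSecond, count}

        BusinessRateLimiter(int defaultLimitPerSecond) {
            this.defaultLimitPerSecond = defaultLimitPerSecond;
        }

        synchronized void setLimit(String businessIdentity, int perSecond) {
            limits.put(businessIdentity, perSecond);
        }

        // The caller passes the clock reading, which keeps the sketch deterministic.
        synchronized boolean tryAcquire(String businessIdentity, long nowMillis) {
            long second = nowMillis / 1000;
            long[] window = windows.computeIfAbsent(businessIdentity, k -> new long[] {second, 0});
            if (window[0] != second) { // a new one-second window has started
                window[0] = second;
                window[1] = 0;
            }
            int limit = limits.getOrDefault(businessIdentity, defaultLimitPerSecond);
            if (window[1] >= limit) {
                return false;
            }
            window[1]++;
            return true;
        }
    }

    public static void main(String[] args) {
        BusinessRateLimiter limiter = new BusinessRateLimiter(2);
        limiter.setLimit("key-business", 5);

        System.out.println(limiter.tryAcquire("small-business", 0));    // true
        System.out.println(limiter.tryAcquire("small-business", 10));   // true
        System.out.println(limiter.tryAcquire("small-business", 20));   // false, limit of 2 reached
        System.out.println(limiter.tryAcquire("key-business", 20));     // true, separate budget
        System.out.println(limiter.tryAcquire("small-business", 1000)); // true, next window
    }
}
```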

 .      .      .

Results

The TMF 2.0 trading platform has transformed new business rollout at Alibaba in three areas. Firstly, it has drastically reduced business-need assessment time, bringing the average down to just 12 days. Taking the 4S automotive business as an example, the process took a minimum of one month under the previous system, while under the new system it took just seven days. For the Wudaokou business, the process was reduced from two months to twelve working days, and for ele.me the time was cut from two weeks to two days.

Secondly, it has successfully decoupled the platform and the businesses operating on it. This benefits businesses because their customizations are stored only in the business package while the platform remains intact, making releases more flexible. In numerous cases, a release for a single business entity has not required regression testing of any other business entity.

Finally, it has enabled the establishment of business asset libraries. As of now, over 50 business asset libraries have been accumulated, allowing greater efficiency in copying, adjusting, and launching new businesses.

(Original article by Yu Zhenxin虞振昕)
 