Topic Title: Semantic and Instance Segmentation in the Wild


Technical Area: Computer Vision, Machine Learning



The accuracy of object detection has reached a higher level than ever before, and the ability to localize instances has inspired a great number of new applications. However, many challenges remain. Two of them are how to label each pixel and how to separate each detected object from the background, known as semantic segmentation and instance segmentation, respectively. Constrained by limited annotations, typical semantic and instance segmentation systems can handle only a narrow slice of the vast visual world. A principal reason for this limitation is that state-of-the-art segmentation algorithms require strong supervision, and such supervision may be scarce and expensive to collect for new categories.


Because semantic and instance segmentation make it possible to summarize the content of an image or video, they have been studied extensively in recent years. Research in this area mainly focuses on partially supervised segmentation tasks.



Robust semantic and instance segmentation of images and videos in natural scenes can enable many interesting applications. An obvious one is compositing a foreground image, together with its matte, onto a new background, which gives us better options for presenting our commodities. The targets of this research can be summarized as follows:

(1) Given an arbitrary image or video (which may or may not contain people), the method should robustly perform semantic and instance segmentation for each object;


(2) The method should be automatic with no user input;


(3) The method should run on servers in real time (less than 50 ms) when processing high-resolution images;


(4) Real-time processing capability on mobile platforms is preferred.
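
As an illustration of the compositing application mentioned above, the following is a minimal sketch of standard alpha compositing, in which each output pixel is matte * foreground + (1 - matte) * background. The function name composite and the toy pixel values are hypothetical and chosen only for illustration. Plain Python lists are used to keep the sketch dependency-free; a production system would presumably use an array library instead.

```python
def composite(foreground, background, matte):
    """Blend foreground over background with a per-pixel matte in [0, 1].

    Images are lists of rows; each pixel is an (r, g, b) tuple of floats.
    The matte is a list of rows of floats (1.0 = pure foreground).
    """
    out = []
    for fg_row, bg_row, matte_row in zip(foreground, background, matte):
        row = []
        for fg, bg, a in zip(fg_row, bg_row, matte_row):
            # Standard alpha blend applied to each color channel.
            row.append(tuple(a * f + (1.0 - a) * b for f, b in zip(fg, bg)))
        out.append(row)
    return out


# Toy 1x2 image: a red object placed over a blue background.
fg = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
bg = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
matte = [[1.0, 0.25]]  # first pixel fully foreground, second mostly background

result = composite(fg, bg, matte)
```

The quality of the result depends entirely on the matte: any segmentation error at object boundaries shows up directly as a halo or a missing region in the composited image.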


Related Research Topics

Although researchers have made progress on image segmentation algorithms over the last decade, current approaches still have major limitations. We expect that more innovative technological breakthroughs will be needed to achieve our targets. The research includes, but is not limited to, the following topics:

(1) Image segmentation against complicated backgrounds: Existing object segmentation algorithms rely largely on color as the distinguishing feature, which makes segmentation very difficult when the foreground and background color distributions overlap. This is, however, quite common in natural images, so existing segmentation approaches do not generalize well to typical everyday scenes.


(2) Handling different kinds of foreground objects: As Taobao has a huge variety of commodities, it is a great challenge to segment every single type of them robustly. Some commodities may be very small, such as rings, and others may have complicated shapes, such as fishing nets. Whether robust image segmentation can be provided for arbitrary objects in the wild remains an open question.


(3) Establishing new datasets for image segmentation: Generating ground truth for image segmentation is very difficult and time-consuming. Most existing segmentation datasets, such as “PASCAL VOC” and “MSCOCO”, contain only a limited number of categories, and the quality of their segmentation masks is not perfect. If we train cutting-edge deep learning networks on small datasets, at some point they may overfit and no longer generalize to real scenes. Therefore, new image segmentation datasets that offer both quantity and quality are strongly needed.


(4) Real-time image segmentation on mobile devices: Most existing image segmentation applications need smooth interaction with users, which requires real-time processing speed. Traditional image segmentation methods may take seconds to process one image. Although deep learning methods can be very fast at test time (while being time-consuming to train), real-time speed (30~40 ms/image) is still far from being reached. Besides speed, there is also a trend to migrate segmentation algorithms to mobile devices because of the huge number of customers there. Overall, real-time mobile image segmentation applications will be the future.
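
To make the real-time requirement concrete, the following is a minimal sketch of how per-image latency might be checked against a budget. The helper measure_latency_ms and the stand-in dummy_segment are hypothetical and are not part of any existing system; a real evaluation would call the actual segmentation model on the target hardware. The 40 ms budget is assumed here as the upper end of the 30~40 ms/image range quoted above.

```python
import statistics
import time


def measure_latency_ms(segment_fn, image, warmup=3, runs=20):
    """Return (median, worst) wall-clock latency in milliseconds."""
    # Warm-up calls are excluded so one-time setup cost is not measured.
    for _ in range(warmup):
        segment_fn(image)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        segment_fn(image)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


def dummy_segment(image):
    # Hypothetical stand-in for a real model: threshold each pixel.
    return [[1 if p > 0.5 else 0 for p in row] for row in image]


# Toy 64x64 single-channel image with values in [0, 1).
image = [[((x * y) % 7) / 7.0 for x in range(64)] for y in range(64)]

BUDGET_MS = 40.0  # assumed budget, taken from the 30~40 ms/image range
median_ms, worst_ms = measure_latency_ms(dummy_segment, image)
meets_budget = median_ms < BUDGET_MS
```

Reporting the worst case alongside the median is a deliberate choice: interactive applications are sensitive to occasional slow frames, not only to average speed.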