RAIL Students Win Finalist Award in the IROS Lifelong Robotic Vision Challenge
ZHOU Ligang, a Ph.D. student, and TAN Xinyang, an undergraduate (advisors: Prof. XU Yangsheng and Prof. LAM Tin Lun), defeated many international teams in the Lifelong Robotic Vision Challenge at IROS, a top international robotics conference, and won the Finalist Award. The other teams came from MIT, Imperial College London, Tsinghua University, the Hong Kong University of Science and Technology, Peking University, and other renowned universities.
The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) is one of the two top international conferences in the field of intelligent robots and automation. Each edition of IROS and its exhibition has been held with great success, playing an important role in advancing the related fields. IROS 2019 was the 32nd edition, jointly sponsored by IEEE, the IEEE Robotics and Automation Society, the IEEE Industrial Electronics Society, the Robotics Society of Japan, the Society of Instrument and Control Engineers, and the New Technology Foundation. About 4,000 leading figures, top research team representatives, and business people from around the world in robotics, automation systems, artificial intelligence, and other fields gathered in Macau for this event, to explore cutting-edge technology in intelligent robots and systems and to share and discuss the latest progress in related fields. The conference included keynote speeches, technical talks, workshops, competitions, forums, and exhibitions. The Lifelong Robotic Vision Challenge was part of the IROS 2019 competition program.
A top international event in the hottest areas of robotics and artificial intelligence
The challenge focuses on the cutting-edge field of machine vision, aiming to endow AI with lifelong learning ability.
Humans: continuously learn knowledge and skills from the environment and experience
Robots: need lifelong learning ability to adapt to changing environments and tasks
Computer vision: learns once from pre-built datasets
In recent years, large datasets such as ImageNet and COCO have greatly advanced computer vision based on deep learning. Applications built on such large datasets for object detection, segmentation, and recognition have made outstanding contributions in smart homes, security, industrial inspection, and other fields. However, robot vision brings new challenges to the development and deployment of vision algorithms. Computer vision algorithms implicitly assume that the data are independent and identically distributed, with fixed categories and single, simple tasks. In a real environment, however, semantic concepts change dynamically over time. In practical application scenarios, a robot needs to operate in a changing environment for a long time, which requires it to have the ability of lifelong learning in order to adapt to environmental change.
Continual learning to add new knowledge
Previous work typically obtains a pre-trained model from a large dataset and then fine-tunes or retrains it on the dataset of a specific application. However, the final model often forgets previously learned patterns (such as recognition of earlier tasks), a phenomenon known in deep learning as "catastrophic forgetting". For a robot facing dynamic scenes, the deployed model must be retrained, which requires the model to have real memory ability: to effectively overcome catastrophic forgetting and not forget old knowledge while learning new knowledge.
Dataset introduction and comparison with existing datasets
The OpenLORIS-Object dataset aims to accelerate research on and applications of lifelong/continual/incremental learning, and currently focuses on improving continual-learning ability for common objects in home scenes. The data were collected in office and home environments, with a robot actively recording videos of target objects under different illumination, occlusion, camera-object distances/angles, clutter levels, and scene contexts.
- Illumination: in practical applications, illumination can vary greatly over time, such as the difference between day and night. Our dataset is mainly collected under normal daylight, with weak light and strong light each accounting for 10% of the objects in each scene. As the light weakens, the task becomes more challenging.
- Occlusion: occlusion occurs when part of an object is hidden by one or more other objects, or when only part of the object is visible in the field of view. Occlusion makes classification more challenging because it may hide an object's distinctive features.
- Camera-object angle/distance: the camera angle affects which attributes of the object are observed, and the distance affects the object's apparent size.
- Clutter level: refers to the presence of other objects near the object under consideration. Multiple simultaneous objects may interfere with the classification task.
- Scene context: environmental information is another factor in the learning process; for example, a kitchen scene can improve the recognition of objects such as knives and cookware. Most previous studies have ignored the importance of scene context for recognition.
Picture 1: ZHOU Ligang introduces his competition method
ZHOU Ligang: I think lifelong learning is a key problem to be solved in robotics. Existing robots can already use deep-learning-based algorithms to perform object recognition. For example, a robot we built can grasp autonomously in a warehouse without human intervention by combining deep-learning-based object recognition with manipulator planning and sensing. Object recognition requires collecting large datasets for training, and the training process is relatively long and consumes substantial computing resources. If we need to recognize new objects, we have to collect new data, merge it with the old datasets, and train again, which wastes a great deal of computation. Lifelong learning aims to let the robot retain its recognition ability on old tasks while learning only the new data. After a literature survey and algorithm analysis, I discussed the importance of this problem with Professor Xu and Professor Lam; they offered good suggestions and encouraged me to study it. So I signed up with TAN Xinyang, an undergraduate in our laboratory, to enter the competition. We analyzed current state-of-the-art methods and selected two for implementation. The first is a single-model method, elastic weight consolidation (EWC). It uses the Fisher information matrix to measure the importance of each neural-network parameter to previous tasks and restricts changes to highly important parameters, so that the network retains its memory. However, we found that this method was not well suited to this dataset and that good sampling was not easy to implement. We therefore focused on implementing dynamic/expandable neural networks.
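The EWC idea described above can be sketched as follows. This is a minimal PyTorch illustration with assumed function names and an illustrative penalty weight, not the team's competition code: a diagonal Fisher estimate measures each parameter's importance to the old task, and a quadratic penalty anchors the important parameters near their old values during new-task training.

```python
import torch
import torch.nn as nn

def fisher_diagonal(model, batches, loss_fn):
    # Estimate the diagonal of the Fisher information matrix by
    # averaging squared gradients of the loss over old-task batches.
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    for n in fisher:
        fisher[n] /= max(len(batches), 1)
    return fisher

def ewc_penalty(model, fisher, old_params, lam=1000.0):
    # Quadratic penalty added to the new-task loss: parameters that
    # were important to the old task (large Fisher values) are kept
    # close to their old values, limiting forgetting.
    total = 0.0
    for n, p in model.named_parameters():
        total = total + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * total
```

During new-task training one would minimize `new_task_loss + ewc_penalty(...)`; the penalty is zero when the parameters still equal their old values and grows as important parameters drift.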
A typical method of this kind is Learning without Forgetting (LwF), which divides the network's parameters into three parts: shared parameters, parameters for old tasks, and parameters for new tasks. By expanding the neural network and distilling the knowledge learned on the old tasks, the method retains more memory of the old tasks. At the conference we found that the teams with good results used improved methods based on LwF, which validated our judgment.
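The knowledge-distillation step at the heart of LwF can be sketched as follows. This is a hedged PyTorch illustration with assumed names and temperature, using one common KL-divergence formulation of distillation: the expanded network's old-task outputs are trained to match the recorded outputs of the original network, while the new task is learned with ordinary cross-entropy.

```python
import torch
import torch.nn.functional as F

def distillation_loss(new_logits, old_logits, T=2.0):
    # Soften both output distributions with temperature T and penalize
    # their divergence, so the expanded network keeps reproducing the
    # original network's responses on the old-task head.
    p_old = F.softmax(old_logits / T, dim=1)
    log_p_new = F.log_softmax(new_logits / T, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * (T * T)

def lwf_loss(new_task_logits, targets, old_head_logits,
             recorded_old_logits, alpha=1.0):
    # Total loss: cross-entropy on the new task plus distillation on
    # the old task's head, weighted by alpha.
    ce = F.cross_entropy(new_task_logits, targets)
    kd = distillation_loss(old_head_logits, recorded_old_logits)
    return ce + alpha * kd
```

The distillation term is zero when the old-task outputs are unchanged and grows as they drift, which is how the expanded network "does not forget" without access to the old training data.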
Picture 2: certificate issued by IROS
Picture 3: group photo of winners