Hamidreza Kasaei

Subject Interactive Open-Ended Learning for 3D Object Recognition
Advisor Luís Seabra Lopes, Ana Maria Tomé
Group Intelligent Robotics and Systems
Status PhD student
Starts 2012/10/01
Ends 2018/03/31
Country Iran

S. Hamidreza Kasaei joined the IEETA and IRIS Labs as a Ph.D. student in October 2012 to work on the FP7 project RACE (Robustness by Autonomous Competence Enhancement) under the supervision of Luís Seabra Lopes and Ana Maria Tomé. During his Ph.D., Hamidreza has been working on conceptualizing 3D objects, such that a robot's object recognition performance improves with accumulated experiences and conceptualizations. This project builds on 3D environment perception and open-ended learning, as well as human-robot interaction for labelling objects and providing feedback. Before his Ph.D., Hamidreza completed his Master's degree in Computer Engineering (Artificial Intelligence) at the University of Isfahan. His thesis, entitled “Face Recognition Using Single Normal Reference Image and Feature Statistics", was written under the supervision of Amirhassan Monadjemi. He also worked on middle-size soccer robots and humanoid robots, obtaining several rankings in RoboCup competitions. The current version of his CV is available here, and his full list of publications and corresponding BibTeX files can be found on his Google Scholar account.


Latest News


  • December 2017: Bin-Picking Synthetic Dataset is now available here! This dataset contains RGB and depth images captured from multiple views in five different physically feasible bin-picking scenarios.
  • November 2017: Our paper Perceiving, Learning, and Recognizing 3D Objects: An Approach to Cognitive Service Robots was accepted at AAAI2018.
  • June 2017: A free and open-source implementation of the GOOD descriptor is now available on my GitHub account.
  • May 2017: Restaurant Object Dataset v.1.0 (RGB-D) is now available here! It contains 306 views of one instance of each category (Bottle, Bowl, Flask, Fork, Knife, Mug, Plate, Spoon, Teapot, and Vase), and 31 views of unknown objects (e.g. views that belong to furniture).
  • April 2017: New journal paper accepted at the Neurocomputing journal: Towards Lifelong Assistive Robotics: A Tight Coupling between Object Perception and Manipulation.
  • January 2017: I will be a research intern at the ICVL Lab, Imperial College London, UK.

Research

My research interests lie at the intersection of robotics, machine learning, and machine vision. I am interested in developing algorithms for adaptive perception systems based on interactive environment exploration and open-ended learning, which enable robots to learn from past experiences and to interact with human users. This is necessary for assistive robots, not only to perform manipulation tasks in a reasonable amount of time and in an appropriate manner, but also to robustly adapt to new environments by handling new objects. I investigate active perception, where robots use their mobility and manipulation capabilities not only to gain the most useful perceptual information for modelling the world, but also to predict the next best view for improving object detection and manipulation performance. I have evaluated my work on different robotic platforms, including the PR2, robotic arms, and humanoid robots. My research is summarized in the following projects:


‡ Perceiving, Learning, and Recognizing 3D Objects: An Approach to Cognitive Service Robots

There is a growing need for robots that can interact with people in everyday situations. For service robots, it is not reasonable to assume that all object categories can be pre-programmed. Instead, apart from learning from a batch of labelled training data, robots should continuously update and learn new object categories while working in the environment. This paper proposes a cognitive architecture designed to support concurrent 3D object category learning and recognition in an interactive and open-ended manner. In particular, this cognitive architecture provides automatic perception capabilities that allow robots to detect objects in highly crowded scenes and to learn new object categories from the set of accumulated experiences in an incremental and open-ended way. Moreover, it supports constructing the full model of an unknown object on-line and predicting the next best view for improving object detection and manipulation performance. We provide extensive experimental results demonstrating system performance in terms of recognition, scalability, next-best-view prediction, and real-world robotic applications.
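As a rough illustration of the next-best-view component, the sketch below scores candidate viewpoints by the entropy of the predicted category distribution and picks the least ambiguous one. This is a minimal Python sketch, assuming a hypothetical predict_category_posterior callback that stands in for the system's view-simulation and recognition steps; it is not the implementation used in the paper.

    # Minimal entropy-based next-best-view selection (illustrative only).
    import numpy as np

    def entropy(p):
        """Shannon entropy of a discrete probability distribution."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def next_best_view(candidate_views, predict_category_posterior):
        """Pick the viewpoint whose predicted posterior is least ambiguous.

        candidate_views: sequence of viewpoint parameters (e.g. camera poses).
        predict_category_posterior: hypothetical function mapping a viewpoint
            to a predicted category distribution for the observed object.
        """
        scores = [entropy(predict_category_posterior(v)) for v in candidate_views]
        return candidate_views[int(np.argmin(scores))]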

• Publications:
¤ AAAI2018 (to appear) ¤-----¤ ICCV2017-WS (Multi-view 6D Object Pose Estimation and Camera Motion Planning)
• Videos:
¤ Demo1: online object model construction ¤-----¤ Demo2: object recognition and manipulation ¤-----¤ Demo3: object view recognition

‡ Hierarchical Object Representation for Open-Ended Object Category Learning and Recognition (Local Open-Ended LDA)

Service robots are expected to be more autonomous and to work effectively in human-centric environments. This implies that robots should have special capabilities, such as learning from past experiences and real-time object category learning and recognition. This paper proposes an open-ended 3D object recognition system which concurrently learns both the object categories and the statistical features for encoding objects. In particular, we propose an extension of Latent Dirichlet Allocation to learn structural semantic features (i.e. topics) from low-level feature co-occurrences for each category independently. Moreover, topics in each category are discovered in an unsupervised fashion and are updated incrementally using new object views. In this way, the advantages of both local hand-crafted and structural semantic features are combined in an efficient way. A set of extensive experiments was performed to assess the performance of the proposed Local Open-Ended LDA in terms of descriptiveness, scalability, and computational time. Experimental results show that the overall classification performance obtained with Local Open-Ended LDA is clearly better than the best performances obtained with state-of-the-art approaches. Moreover, the best scalability, in terms of the number of learned categories, was obtained with the proposed Local Open-Ended LDA approach, closely followed by a Bag-of-Words (BoW) approach. Concerning computational time, the best result was obtained with BoW, immediately followed by the Local Open-Ended LDA approach.
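To make the incremental, per-category topic learning concrete, here is a minimal Python sketch that keeps one online LDA model per category and updates it with each new object view. It uses gensim's LdaModel as a stand-in for the proposed Local Open-Ended LDA, and assumes a fixed, pre-built feature dictionary (in the paper, the features themselves are also learned); the class and argument names are illustrative.

    # One incrementally updated topic model per object category (sketch).
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    class LocalTopicModels:
        def __init__(self, dictionary, num_topics=10):
            self.dictionary = dictionary     # fixed low-level feature vocabulary
            self.num_topics = num_topics
            self.models = {}                 # category -> LdaModel

        def observe(self, category, feature_words):
            """Update one category's topics with a new object view.

            feature_words: list of discretized local-feature 'words'
            extracted from the view (illustrative representation).
            """
            bow = [self.dictionary.doc2bow(feature_words)]
            if category not in self.models:
                self.models[category] = LdaModel(corpus=bow,
                                                 id2word=self.dictionary,
                                                 num_topics=self.num_topics)
            else:
                self.models[category].update(bow)   # online, incremental update

A dictionary could be built once from an initial pool of feature-word lists, e.g. Dictionary(initial_feature_word_lists), before any category model is created.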

• Publications:
¤ NIPS2016
• Videos:
¤ Demo1: open-ended object recognition ¤-----¤ Demo2: Washington scene dataset

‡ GOOD: A Global Orthographic Object Descriptor for 3D Object Recognition and Manipulation

The Global Orthographic Object Descriptor (GOOD) has been designed to be robust, descriptive, and efficient to compute and use. The GOOD descriptor has two outstanding characteristics: (1) it provides a good trade-off among descriptiveness, robustness, computation time, and memory usage; (2) it allows concurrent object recognition and pose estimation for manipulation. The performance of the proposed object descriptor is compared with the main state-of-the-art descriptors. Experimental results show that the overall classification performance obtained with GOOD is comparable to the best performances obtained with the state-of-the-art descriptors. Concerning memory and computation time, GOOD clearly outperforms the other descriptors. Therefore, GOOD is especially suited for real-time applications. The current implementation of the GOOD descriptor supports several functionalities for 3D object recognition and object manipulation.
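The core idea of GOOD can be sketched in a few lines: build a reference frame from the point cloud, project the points onto the three orthographic planes, and concatenate normalized 2D distribution matrices. The Python sketch below is a simplification for illustration only; it omits the sign-disambiguation and plane-ordering steps of the actual descriptor (see the official implementations linked below).

    # Simplified GOOD-like descriptor: PCA reference frame + three
    # orthographic projection histograms (illustrative, not the real GOOD).
    import numpy as np

    def good_like_descriptor(points, bins=15):
        """points: (N, 3) array of 3D points belonging to a single object."""
        pts = points - points.mean(axis=0)          # move to the centroid
        _, _, vt = np.linalg.svd(pts, full_matrices=False)
        pts = pts @ vt.T                            # align with principal axes
        half = np.abs(pts).max()                    # bounding half-extent
        edges = np.linspace(-half, half, bins + 1)
        planes = [(0, 1), (0, 2), (1, 2)]           # XoY, XoZ, YoZ projections
        hists = []
        for a, b in planes:
            h, _, _ = np.histogram2d(pts[:, a], pts[:, b], bins=[edges, edges])
            hists.append(h.ravel() / max(h.sum(), 1))   # normalized distribution
        return np.concatenate(hists)                # length 3 * bins * bins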

• Publications:
¤ Pattern Recognition Letters (2016) ¤-----¤ IROS2016
• Source Codes:
¤ GitHub ¤-----¤ Part of PCL 1.9
• Videos:
¤ Demo1 ¤-----¤ Demo2

‡ Towards Lifelong Assistive Robotics: A Tight Coupling between Object Perception and Manipulation

In this work, we propose a cognitive architecture designed to create a tight coupling between perception and manipulation for assistive robots. This is necessary for assistive robots, not only to perform manipulation tasks in a reasonable amount of time and in an appropriate manner, but also to robustly adapt to new environments by handling new objects. In particular, this cognitive architecture provides perception capabilities that allow robots to incrementally learn object categories from the set of accumulated experiences and to reason about how to perform complex tasks. To achieve these goals, it is critical to detect, track, and recognize objects in the environment, as well as to conceptualize experiences and learn novel object categories in an open-ended manner, based on human-robot interaction. Interaction capabilities were developed to enable human users to teach new object categories and to instruct the robot to perform complex tasks. A Bayesian learning approach with a Bag-of-Words object representation is used to acquire and refine object category models. Perceptual memory is used to store object experiences, the feature dictionary, and object category models. Working memory is employed to support communication between the different modules of the architecture. A reactive planning approach is used to carry out complex tasks. To examine the performance of the proposed architecture, a quantitative evaluation and a qualitative analysis were carried out. Experimental results show that the proposed system is able to interact with human users, learn new object categories over time, and perform complex tasks.
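As an illustration of the Bayesian Bag-of-Words learning mentioned above, the following Python sketch maintains per-category visual-word counts, updates them with each taught view, and classifies new views with a smoothed multinomial posterior. The exact likelihood and priors in the paper may differ; this is a plain naive-Bayes stand-in with illustrative names.

    # Incremental Bayesian learning over Bag-of-Words object views (sketch).
    import numpy as np

    class BayesianBoWLearner:
        def __init__(self, dictionary_size):
            self.V = dictionary_size
            self.word_counts = {}   # category -> accumulated word-count vector
            self.view_counts = {}   # category -> number of taught views

        def teach(self, category, bow):
            """bow: length-V vector of visual-word counts for one object view."""
            self.word_counts.setdefault(category, np.zeros(self.V))
            self.word_counts[category] += np.asarray(bow, dtype=float)
            self.view_counts[category] = self.view_counts.get(category, 0) + 1

        def classify(self, bow):
            """Return the MAP category for a new object view."""
            bow = np.asarray(bow, dtype=float)
            total = sum(self.view_counts.values())
            best, best_lp = None, -np.inf
            for cat, counts in self.word_counts.items():
                prior = np.log(self.view_counts[cat] / total)
                theta = (counts + 1.0) / (counts.sum() + self.V)   # add-one smoothing
                lp = prior + np.dot(bow, np.log(theta))
                if lp > best_lp:
                    best, best_lp = cat, lp
            return best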

• Publications:
¤ Neurocomputing (to appear) ¤-----¤ RoboCup2016 ¤-----¤ IROS2015 (online dictionary learning)
• Videos:
¤ Demo1: clear table ¤-----¤ Demo2: serve a meal ¤-----¤ Demo3: office exploration

‡ Interactive Open-Ended Learning for 3D Object Recognition: An Approach and Experiments

Nowadays, robots already make broad use of 3D computer vision algorithms to perform complex tasks such as object detection and manipulation. However, most 3D vision algorithms are not efficient enough for use in real-time, open-ended robotic domains. Many problems have already been understood and solved successfully, but many issues remain open; open-ended object recognition is one of them. This work presents an efficient approach capable of learning and recognizing object categories in an interactive and open-ended manner. Here, open-ended implies that the set of object categories to be learned is not known in advance; the training instances are extracted from the robot's on-line experiences, and thus become gradually available over time, rather than being available at the beginning of the learning process. This paper focuses on two key questions: (1) How to automatically detect, conceptualize, and recognize objects in 3D scenes in an open-ended manner? (2) How to acquire and use high-level knowledge obtained from interaction with human users, namely when they provide category labels, in order to improve the system's performance?
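The open-ended setting is often evaluated with a simulated-teacher protocol: categories are introduced one at a time, the learner is questioned on previously taught categories, and mistakes trigger corrective feedback. Below is a minimal Python sketch of such a loop; the learner interface (teach/classify) and the per-category view streams are hypothetical stand-ins, and the stopping rule is simplified relative to the published protocols.

    # Simplified simulated-teacher loop for open-ended evaluation (sketch).
    def run_teaching_protocol(learner, categories, view_streams, threshold=0.67):
        """view_streams: dict mapping category -> iterator of unseen views."""
        known = []
        for cat in categories:
            learner.teach(cat, next(view_streams[cat]))   # introduce new category
            known.append(cat)
            correct = 0
            for k in known:                               # question phase
                view = next(view_streams[k])
                if learner.classify(view) == k:
                    correct += 1
                else:
                    learner.teach(k, view)                # corrective feedback
            if correct / len(known) < threshold:          # published protocols keep
                break                                     # testing until recovery
        return known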

• Publications:
¤ Journal of Intelligent & Robotic Systems (2014) ¤-----¤ Journal of Robotics and Autonomous Systems (2016) ¤-----¤ IROS2014
• Videos:
¤ Demo1: open-ended object recognition ¤-----¤ Demo2: RACE

‡ Learning to Grasp Familiar Objects using Object View Recognition and Template Matching

Robots are still not able to grasp all unforeseen objects. Finding a proper grasp configuration, i.e. the position and orientation of the arm relative to the object, remains challenging. One approach to grasping unforeseen objects is to recognize an appropriate grasp configuration from previous grasp demonstrations. The underlying assumption of this approach is that new objects that are similar to known ones (i.e. familiar objects) can be grasped in a similar way. However, finding a grasp representation and a grasp similarity metric remains the main challenge in developing an approach for grasping familiar objects. In this work, interactive object view learning and recognition capabilities are integrated into the process of learning and recognizing grasps. The object view recognition module uses an interactive, incremental learning approach to recognize object view labels. The grasp pose learning approach uses local and global visual features of a demonstrated grasp to learn a grasp template associated with the recognized object view. A grasp distance measure based on the Mahalanobis distance is used in a grasp template matching approach to recognize an appropriate grasp pose. The experimental results demonstrate the high reliability of the developed template matching approach in recognizing grasp poses. They also show how the robot can incrementally improve its performance in grasping familiar objects.
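As a rough sketch of the template-matching step, the Python snippet below scores a query grasp feature vector against stored templates with the Mahalanobis distance and returns the closest grasp pose. The template contents (a feature mean and inverse covariance per demonstrated grasp) are illustrative assumptions, not the paper's exact grasp representation.

    # Mahalanobis-distance grasp template matching (illustrative sketch).
    import numpy as np

    def mahalanobis(x, mean, cov_inv):
        d = x - mean
        return float(np.sqrt(d @ cov_inv @ d))

    def match_grasp_template(query, templates):
        """templates: list of (grasp_pose, feature_mean, inv_covariance)
        tuples built from demonstrated grasps of the recognized view."""
        query = np.asarray(query, dtype=float)
        best_pose, best_d = None, np.inf
        for pose, mean, cov_inv in templates:
            d = mahalanobis(query, np.asarray(mean), np.asarray(cov_inv))
            if d < best_d:                  # keep the closest template so far
                best_pose, best_d = pose, d
        return best_pose, best_d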

• Publications:
¤ IROS2016
• Videos:
¤ Demo1: learn to grasp familiar object ¤-----¤ Demo2: grasp different objects


Publications

Articles in international journals listed in the ISI

Other articles in journals

Chapters in books

Articles in conference proceedings