- Unsupervised RGB-to-Thermal Domain Adaptation via Multi-Domain Attention Network. Lu Gan, Connor Lee, and Soon-Jo Chung. 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023
This work presents a new method for unsupervised thermal image classification and semantic segmentation by transferring knowledge from the RGB domain using a multi-domain attention network. Our method does not require any thermal annotations or co-registered RGB-thermal pairs, enabling robots to perform visual tasks at night and in adverse weather conditions without incurring additional costs of data labeling and registration. Current unsupervised domain adaptation methods look to align global images or features across domains. However, when the domain shift is significantly larger for cross-modal data, not all features can be transferred. We solve this problem by using a shared backbone network that promotes generalization, and domain-specific attention that reduces negative transfer by attending to domain-invariant and easily-transferable features. Our approach outperforms the state-of-the-art RGB-to-thermal adaptation method in classification benchmarks, and is successfully applied to thermal river scene segmentation using only synthetic RGB images. Our code is made publicly available at https://github.com/ganlumomo/thermal-uda-attention.
- Online Self-Supervised Thermal Water Segmentation for Aerial Vehicles (under review). Connor Lee, Jonathan Gustafsson Frennert, Lu Gan, and 2 more authors. 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023
- Multitask Learning for Scalable and Dense Multilayer Bayesian Map Inference. Lu Gan, Youngji Kim, Jessy W. Grizzle, and 4 more authors. IEEE Transactions on Robotics, 2022
In this article, we present a novel and flexible multitask multilayer Bayesian mapping framework with readily extendable attribute layers. The proposed framework goes beyond modern metric-semantic maps to provide even richer environmental information for robots in a single mapping formalism while exploiting intralayer and interlayer correlations. It removes the need for a robot to access and process information from many separate maps when performing a complex task, advancing the way robots interact with their environments. To this end, we design a multitask deep neural network with attention mechanisms as our front-end to provide heterogeneous observations for multiple map layers simultaneously. Our back-end runs a scalable closed-form Bayesian inference with only logarithmic time complexity. We apply the framework to build a dense robotic map, including metric-semantic occupancy and traversability layers. Traversability ground truth labels are automatically generated from exteroceptive sensory data in a self-supervised manner. We present extensive experimental results on publicly available datasets and data collected by a three-dimensional bipedal robot platform and show reliable mapping performance in different environments. Finally, we also discuss how the current framework can be extended to incorporate more information, such as friction, signal strength, temperature, and physical quantity concentration using Gaussian map layers. The software for reproducing the presented results or running on customized data is made publicly available.
- Energy-Based Legged Robots Terrain Traversability Modeling via Deep Inverse Reinforcement Learning. IEEE Robotics and Automation Letters, Oct 2022
This work reports on developing a deep inverse reinforcement learning method for legged robot terrain traversability modeling that incorporates both exteroceptive and proprioceptive sensory data. Existing works use robot-agnostic exteroceptive environmental features or handcrafted kinematic features; instead, we propose to also learn robot-specific inertial features from proprioceptive sensory data for reward approximation in a single deep neural network. Incorporating the inertial features can improve the model fidelity and provide a reward that depends on the robot’s state during deployment. We train the reward network using the Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) algorithm and propose simultaneously minimizing a trajectory ranking loss to deal with the suboptimality of legged robot demonstrations. The demonstrated trajectories are ranked by locomotion energy consumption in order to learn an energy-aware reward function and a policy that is more energy-efficient than the demonstrations. We evaluate our method using a dataset collected by an MIT Mini-Cheetah robot and a Mini-Cheetah simulator. The code is publicly available.
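The idea of ranking demonstrations by energy can be illustrated with a small sketch. The abstract does not state the exact form of the ranking loss, so the pairwise logistic (Bradley-Terry style) form below is an assumption, and the function names `pairwise_ranking_loss` and `trajectory_ranking_loss` are hypothetical. Each trajectory is represented only by its measured energy and the cumulative reward a learned reward function assigns to it.

```python
import math

def pairwise_ranking_loss(return_preferred, return_other):
    """Logistic loss that is small when the preferred trajectory has the higher return.

    Assumed form (not taken from the paper): -log(sigmoid(return_preferred - return_other)),
    evaluated in a numerically stable way.
    """
    diff = return_preferred - return_other
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

def trajectory_ranking_loss(trajectories):
    """Average pairwise loss over all pairs with different energy consumption.

    trajectories: list of (energy, predicted_return) tuples.
    The lower-energy trajectory of each pair is treated as preferred.
    """
    total, count = 0.0, 0
    for i, (energy_i, return_i) in enumerate(trajectories):
        for energy_j, return_j in trajectories[i + 1:]:
            if energy_i == energy_j:
                continue  # no preference between equally costly trajectories
            if energy_i < energy_j:
                total += pairwise_ranking_loss(return_i, return_j)
            else:
                total += pairwise_ranking_loss(return_j, return_i)
            count += 1
    return total / count if count else 0.0

# Hypothetical data: returns that agree with the energy ranking give a low loss.
consistent = [(10.0, 5.0), (20.0, 2.0), (30.0, -1.0)]
print(trajectory_ranking_loss(consistent))
```

In a training loop, a term of this kind would be added to the MEDIRL objective so that the reward network is pushed to score low-energy demonstrations above high-energy ones.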
- Dynamic Semantic Occupancy Mapping Using 3D Scene Flow and Closed-Form Bayesian Inference. Aishwarya Unnikrishnan, Joey Wilson, Lu Gan, and 4 more authors. IEEE Access, 2022
This paper reports on a dynamic semantic mapping framework that incorporates 3D scene flow measurements into a closed-form Bayesian inference model. Existence of dynamic objects in the environment can cause artifacts and traces in current mapping algorithms, leading to an inconsistent map posterior. We leverage state-of-the-art semantic segmentation and 3D flow estimation using deep learning to provide measurements for map inference. We develop a Bayesian model that propagates the scene with flow and infers a 3D continuous (i.e., can be queried at arbitrary resolution) semantic occupancy map outperforming its static counterpart. Extensive experiments using publicly available data sets show that the proposed framework improves over its predecessors and input measurements from deep neural networks consistently.
- Bayesian Spatial Kernel Smoothing for Scalable Dense Semantic Mapping. Lu Gan, Ray Zhang, Jessy W. Grizzle, and 2 more authors. IEEE Robotics and Automation Letters, Apr 2020
This article develops a Bayesian continuous 3D semantic occupancy map from noisy point clouds by generalizing the Bayesian kernel inference model for building occupancy maps, a binary problem, to semantic maps, a multi-class problem. The proposed method provides a unified probabilistic model for both occupancy and semantic probabilities and reverts to the original occupancy mapping framework when only one occupied class exists in the obtained measurements. The Bayesian spatial kernel inference relaxes the independent grid assumption and brings smoothness and continuity to the map inference, which makes it possible to exploit local correlations present in the environment and improves performance. The accompanying software uses multi-threading and vectorization, and runs at about 2 Hz on a laptop CPU. Evaluations using multiple sequences of stereo camera and LiDAR datasets show that the proposed method consistently outperforms current baselines. We also present a qualitative evaluation using data collected with a bipedal robot platform on the University of Michigan - North Campus.
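As a rough illustration of kernel-based multi-class inference, the sketch below performs a kernel-weighted Dirichlet-categorical update at a single query location. This conjugate form is one natural way to extend a binary model to several classes and is an assumption here, as are the triangular kernel, the prior value, and the names `triangular_kernel` and `semantic_posterior`; none of them are taken from the accompanying software.

```python
import math

def triangular_kernel(dist, length_scale):
    """Compactly supported kernel: 1 at distance 0, 0 at or beyond length_scale."""
    return max(0.0, 1.0 - dist / length_scale)

def semantic_posterior(query, points, labels, num_classes,
                       length_scale=1.0, prior=0.001):
    """Expected class probabilities at `query` from labeled 3D points.

    points: list of (x, y, z) tuples; labels: one class index per point.
    Each point adds a kernel-weighted pseudo-count to its class, so nearby
    measurements influence the query more than distant ones.
    """
    alpha = [prior] * num_classes  # Dirichlet concentration parameters
    for point, label in zip(points, labels):
        weight = triangular_kernel(math.dist(query, point), length_scale)
        alpha[label] += weight
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical measurements: two nearby points of class 1, one distant point of class 2.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.9, 0.0, 0.0)]
labels = [1, 1, 2]
probs = semantic_posterior((0.0, 0.0, 0.0), points, labels, num_classes=3)
print(probs)
```

Because the kernel has compact support, only points within `length_scale` of the query contribute, which is what keeps queries local and makes this kind of update practical on dense maps.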
- Boosting Shape Registration Algorithms via Reproducing Kernel Hilbert Space Regularizers. Steven A. Parkison, Maani Ghaffari, Lu Gan, and 3 more authors. IEEE Robotics and Automation Letters, Oct 2019
The essence of most shape registration algorithms is to find correspondences between two point clouds and then to solve for a rigid body transformation that aligns the geometry. The main drawback is that the point clouds are obtained by placing the sensor at different views; consequently, the two matched points most likely do not correspond to the same physical point in the real environment. In other words, the point cloud is a discrete representation of the shape geometry. Alternatively, a point cloud measurement can be seen as samples from geometry, and a function can be learned for a continuous representation using regression techniques such as kernel methods. To boost registration algorithms, this work develops a novel class of regularizers modeled in the Reproducing Kernel Hilbert Space (RKHS) that ensures correspondences are also consistent in an abstract vector space of functions, such as the intensity surface. Furthermore, the proposed RKHS regularizer is agnostic to the choice of the registration cost function, which is desirable. The evaluations on experimental data confirm the effectiveness of the proposed regularizer using RGB-D and LIDAR sensors.
- Legged Robot State-Estimation Through Combined Forward Kinematic and Preintegrated Contact Factors. Ross Hartley, Josh Mangelson, Lu Gan, and 4 more authors. 2018 IEEE International Conference on Robotics and Automation (ICRA), May 2018
State-of-the-art robotic perception systems have achieved sufficiently good performance using Inertial Measurement Units (IMUs), cameras, and nonlinear optimization techniques, that they are now being deployed as technologies. However, many of these methods rely significantly on vision and often fail when visual tracking is lost due to lighting or scarcity of features. This paper presents a state-estimation technique for legged robots that takes into account the robot’s kinematic model as well as its contact with the environment. We introduce forward kinematic factors and preintegrated contact factors into a factor graph framework that can be incrementally solved in real-time. The forward kinematic factor relates the robot’s base pose to a contact frame through noisy encoder measurements. The preintegrated contact factor provides odometry measurements of this contact frame while accounting for possible foot slippage. Together, the two developed factors constrain the graph optimization problem allowing the robot’s trajectory to be estimated. The paper evaluates the method using simulated and real sensory IMU and kinematic data from experiments with a Cassie-series robot designed by Agility Robotics. These preliminary experiments show that using the proposed method in addition to IMU decreases drift and improves localization accuracy, suggesting that its use can enable successful recovery from a loss of visual tracking.
- Hybrid Contact Preintegration for Visual-Inertial-Contact State Estimation Using Factor Graphs. Ross Hartley, Maani Ghaffari Jadidi, Lu Gan, and 3 more authors. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2018
The factor graph framework is a convenient modeling technique for robotic state estimation where states are represented as nodes, and measurements are modeled as factors. When designing a sensor fusion framework for legged robots, one often has access to visual, inertial, joint encoder, and contact sensors. While visual-inertial odometry has been studied extensively in this framework, the addition of a preintegrated contact factor for legged robots has been only recently proposed. This allowed for integration of encoder and contact measurements into existing factor graphs; however, new nodes had to be added to the graph every time contact was made or broken. In this work, to cope with the problem of switching contact frames, we propose a hybrid contact preintegration theory that allows contact information to be integrated through an arbitrary number of contact switches. The proposed hybrid modeling approach reduces the number of required variables in the nonlinear optimization problem by only requiring new states to be added alongside camera or selected keyframes. This method is evaluated using real experimental data collected from a Cassie-series robot where the trajectory of the robot produced by a motion capture system is used as a proxy for ground truth. The evaluation shows that inclusion of the proposed preintegrated hybrid contact factor alongside visual-inertial navigation systems improves estimation accuracy as well as robustness to vision failure, while its generalization makes it more accessible for legged platforms.
- Semantic Iterative Closest Point through Expectation-Maximization. Steven A Parkison, Lu Gan, Maani Ghaffari Jadidi, and 1 more author. 29th British Machine Vision Conference (BMVC), Oct 2018
In this paper, we develop a novel point cloud registration algorithm that directly incorporates pixelated semantic measurements into the estimation of the relative transformation between two point clouds. The algorithm uses an Iterative Closest Point (ICP)-like scheme and performs joint semantic and geometric inference using the Expectation-Maximization technique, in which semantic labels and point associations between two point clouds are treated as latent random variables. The minimization of the expected cost on the three-dimensional special Euclidean group, i.e., SE(3), yields the rigid body transformation between two point clouds. The evaluation on publicly available RGBD benchmarks shows that, in comparison with both the standard Generalized ICP (GICP) available in the Point Cloud Library and GICP on SE(3), the registration error is reduced.