**Robot Learning Workshop - Overview**

Iacocca Hall, Lehigh University. (Held October 14-15, 2019)

This was a two-day, NSF-funded workshop consisting of a series of presentations on emerging directions at the intersection of robotics, deep and reinforcement learning, control systems, and operations research. The primary objective of this event was to facilitate interactions between researchers from different disciplines interested in designing and implementing the envisioned autonomous robots. The workshop's broader aim was to inspire the research community to pursue new interdisciplinary directions in robotics, controls, and machine learning. We believe that presenting challenging and important problems to these communities in a coherent fashion will open up tremendous intellectual opportunities for research and attract young researchers and students to this timely and important research field.

Organizing Committee: Nader Motee, Lehigh University; Hector Munoz-Avila, Lehigh University; Katya Scheinberg, Cornell University; Jeff Trinkle, Lehigh University.

**Workshop Schedule (pdf)**

**Workshop Presentations, Bios, Abstracts and Videos**

*I-DISC wishes to thank all the speakers who kindly gave us permission to record and share their presentations with you:*

**Adaptive Learning for Multi-Agent Navigation**

Presentation by **Maria Gini, University of Minnesota**

**Abstract**: In crowded multi-agent navigation, the motion of the agents is constrained by the motion of nearby agents. This makes planning paths difficult and leads to inefficient global motion. We formulate the problem as an action-selection problem, and propose an approach that enables agents to compute efficient, collision-free motions in real time. We demonstrate experimentally how the approach works in a variety of scenarios in simulation and with a few real robots.

**Learning Geometry-Aware Representations: 3D Object and Human Pose Inference**

Presentation by **Kostas Daniilidis, University of Pennsylvania**

**Abstract**: Traditional convolutional networks exhibit unprecedented robustness to intraclass nuisances when trained on big data. However, such data have to be augmented to cover geometric transformations. Several approaches have shown recently that data augmentation can be avoided if networks are structured such that feature representations are transformed the same way as the input, a desirable property called equivariance. We show in this talk that global equivariance can be achieved for the case of 2D scaling, rotation, and translation as well as 3D rotations. We show state-of-the-art results using an order of magnitude lower capacity than competing approaches. Moreover, we show how such geometric embeddings can recover the 3D pose of objects without key points or using ground-truth pose on regression. We finish by showing how graph convolutions enable the recovery of human pose and shape without any 2D annotation.
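As a brief illustration of the equivariance property mentioned in this abstract, the standard textbook definition can be written as follows. The symbols $f$, $G$, $T_g$, and $T'_g$ are notation assumed here for illustration and do not come from the talk: $f$ is a feature map, $G$ is a group of transformations, $T_g$ is the action of $g$ on inputs, and $T'_g$ is the corresponding action on features.

$$
f(T_g\, x) = T'_g\, f(x) \qquad \text{for all } g \in G \text{ and all inputs } x .
$$

Invariance is the special case in which $T'_g$ is the identity for every $g$, so that $f(T_g\, x) = f(x)$.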

**Autonomous Systems in the Intersection of Controls, Learning Theory and Formal Methods**

Presentation by **Ufuk Topcu, The University of Texas at Austin**

**Abstract**: Autonomous systems are emerging as a driving technology for countless applications. Numerous disciplines tackle the challenges toward making these systems agile, adaptable, reliable, user-friendly and economical. On the other hand, the existing disciplinary boundaries delay and possibly even obstruct progress. I argue that the non-conventional problems that arise in the design and verification of autonomous systems require hybrid solutions at the intersection of learning, formal methods and controls. I will present examples of such hybrid solutions in several problems in autonomy at varying levels of detail.

**Learning Dynamical Systems with Side Information**

Presentation by **Amir Ali Ahmadi, Princeton University**

**Abstract**: In several safety-critical applications, one has to learn the behavior of an unknown dynamical system from noisy observations of a very limited number of trajectories. For example, to autonomously land an airplane that has just gone through engine failure, limited time is available to learn the modified dynamics of the plane before appropriate control action can be taken. Similarly, when a new infectious disease breaks out, few observations are initially available to understand the dynamics of contagion. In situations of this type where data is limited, it is essential to exploit “side information” (e.g., physical laws or contextual knowledge) to assist the task of learning. We present a mathematical formalism of the problem of learning a dynamical system with side information, where side information can mean a concrete collection of local or global properties of the dynamical system. We show that sum of squares optimization is particularly suited for learning a dynamical system that best agrees with the observations and respects the side information. Based on joint work with Bachir El Khadir (Princeton).

**Deep Learning for Semantic Visual Navigation**

Presentation by **Alexander Toshev, Google AI**

**Abstract**: One of the fundamental problems for autonomous intelligent agents is the ability to move in visually and spatially complex environments for the purpose of finding objects, places, etc. This problem, commonly referred to as Visual Semantic Navigation, has been heavily studied in various settings. However, in its full generality (unexplored and dynamic environments, complex semantics, continuous adaptation to the environment), it still presents many challenges. Deep Learning, by enabling models to learn complex concepts from experience, has huge potential in solving these challenges. In this talk, we present a framework towards a learning-based solution for Visual Semantic Navigation. We focus on two recent results. First, we talk about visual representations suitable for learning navigation algorithms. These representations result in systems for object-driven navigation that generalize to unexplored environments and utilize large synthetic data. Second, we present an approach towards continuous exploration of a novel environment using a model with general external memory.

**Kinodynamic Motion Planning with Q-Learning: An Online, Model-Free, and Safe Navigation Framework**

Presentation by **Kyriakos Vamvoudakis, Georgia Institute of Technology**

**Abstract**: This talk will present an online kinodynamic motion planning algorithmic framework using asymptotically optimal rapidly-exploring random tree (RRT*) and continuous-time Q-learning, which we term RRT-Q*. I will formulate a model-free Q-based advantage function and I will utilize integral reinforcement learning to develop tuning laws for the online approximation of the optimal cost and the optimal policy of continuous-time linear systems. A terminal state evaluation procedure is introduced to facilitate the online implementation. I will then propose a static obstacle augmentation and a local replanning framework, which are based on topological connectedness, to locally recompute the robot's path and ensure collision-free navigation. I will finally show simulations and a qualitative comparison to evaluate the efficacy of the proposed methodology.

**Leveraging Deep Learning Models to Create a Natural Interface for Quadcopter Photography**

Presentation by **Gita Sukthankar**, **University of Central Florida**

**Abstract**: A quadcopter can capture photos from vantage points unattainable for a human photographer, but teleoperating it to a good viewpoint is a non-trivial task. Since humans are good at composing photos, the aim of our research is to leverage deep learning to create a customizable flight controller that can capture photos under the guidance of a human photographer. Our system, the Selfie Drone Stick, allows the user to assign a vantage point to the quadcopter based on the phone’s sensors. The user takes a selfie with the phone once, and the quadcopter autonomously flies to the target viewpoint. The proliferation of open-source deep learning models provided us with a large variety of options for the computer vision and flight control systems. This article describes three key innovations required to deploy the models on a real robot: 1) a new architecture for rapid object detection, DUNet; 2) an abstract state representation for transferring learning from simulation to the hardware platform; 3) reward shaping and staging paradigms for training a deep reinforcement learning controller. Without these improvements, we were unable to learn a flight controller that adequately supported the intuitive user interface.

**From Optimization Algorithms to Dynamical Systems and Back**

Presentation by **Rene Vidal**, **Johns Hopkins University**

**Abstract**: Recently, there has been an increasing interest in using tools from dynamical systems to analyze the behavior of simple optimization algorithms such as gradient descent and accelerated variants. This talk will present differential equations that model the continuous limit of the sequence of iterates generated by the alternating direction method of multipliers, as well as an accelerated variant. We employ the direct method of Lyapunov to analyze the stability of critical points of the dynamical systems and to obtain associated convergence rates.
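To make the idea of a "continuous limit" concrete, the sketch below uses the simplest case mentioned in the abstract, plain gradient descent, rather than the alternating direction method of multipliers treated in the talk. It assumes the toy objective f(x) = 0.5 x², for which the gradient flow dx/dt = -∇f(x) has the closed-form solution x(t) = x₀ e^(-t). All function names and parameter values are illustrative assumptions.

```python
import math


def gradient_descent(grad, x0, step, num_steps):
    """Iterate x_{k+1} = x_k - step * grad(x_k) and return all iterates."""
    xs = [x0]
    for _ in range(num_steps):
        xs.append(xs[-1] - step * grad(xs[-1]))
    return xs


def grad_f(x):
    # Assumed toy objective: f(x) = 0.5 * x**2, so grad f(x) = x.
    return x


def flow_solution(x0, t):
    # Exact solution of the gradient flow dx/dt = -x started at x0.
    return x0 * math.exp(-t)


# Gradient descent with step h is the forward-Euler discretization of the
# flow, so iterate k should track the flow at time t = k * h for small h.
x0, step, num_steps = 1.0, 0.01, 200
iterates = gradient_descent(grad_f, x0, step, num_steps)
t_final = step * num_steps
gap = abs(iterates[-1] - flow_solution(x0, t_final))
print(f"final iterate {iterates[-1]:.5f}, flow value {flow_solution(x0, t_final):.5f}, gap {gap:.2e}")
```

For this objective, f itself serves as a Lyapunov function: its value decreases along both the flow and the discrete iterates whenever the step size is below 2.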

**Robust Guarantees for Perception-Based Control**

Presentation by **Nikolai Matni**, **University of Pennsylvania**

**Abstract**: Motivated by vision-based control of autonomous vehicles, we consider the problem of controlling a known linear dynamical system for which partial state information, such as vehicle position, can only be extracted from high-dimensional data, such as an image. Our approach is to learn a perception map from high-dimensional data to partial-state observation, and its corresponding error profile, and then design a robust controller. We show that under suitable smoothness assumptions on the perception map and generative model relating state to high-dimensional data, an affine error model is sufficiently rich to capture all possible error profiles, and can further be learned via a robust regression problem. We then show how to integrate the learned perception map and error model into a novel robust control synthesis procedure, and prove that the resulting perception and control loop has favorable generalization properties. Finally, we illustrate the usefulness of our approach on a synthetic example and on the self-driving car simulation platform CARLA.

**Perceptual Robot Learning**

Presentation by **David Held, Carnegie Mellon University**

**Abstract**: Robots today are typically confined to operate in relatively simple, controlled environments. One reason for these limitations is that current methods for robotic perception and control tend to break down when faced with occlusions, viewpoint changes, poor lighting, unmodeled dynamics, and other challenging but common situations that occur when robots are placed in the real world. I argue that, in order to handle these variations, robots need to learn to understand how the world changes over time: how the environment can change as a result of the robot’s own actions or from the actions of other agents in the environment. I will show how we can apply this idea of understanding changes to a number of robotics problems, such as object segmentation, tracking, and velocity estimation for autonomous driving as well as perception and control for various object manipulation tasks, including transparent, reflective, and deformable objects. By learning how the environment can change over time, we can enable robots to operate in the complex, cluttered environments of our daily lives.

**Show and Tell: Robots Learning Actions from Vision and Language**

Presentation by **Yiannis Aloimonos**, **University of Maryland**

**Abstract**: Context-free grammars have been in fashion in linguistics because they provide a simple and precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks. Also, the basic recursive structure of natural languages, the way in which clauses nest inside other clauses, and the way in which lists of adjectives and adverbs are followed by nouns and verbs, is described exactly. Similarly, for manipulation actions, every complex activity is built from smaller blocks involving hands and their movements, as well as objects, tools and the monitoring of their state. Thus, interpreting a “seen” action is like understanding language, and executing an action from knowledge in memory is like producing language. Several experiments will be shown interpreting human actions in the arts and crafts or assembly domain, through a parsing of the visual input, on the basis of the manipulation grammar. This parsing, in order to be realized, requires a network of visual processes that attend to objects and tools, segment them and recognize them, track the moving objects and hands, and monitor the state of objects to calculate goal completion. These processes will also be explained and we will conclude with demonstrations of robots learning how to perform tasks by watching videos of relevant human activities.

**Topics in Graph Deep Learning**

Presentation by **Radu Balan**, **University of Maryland**

**Abstract**: In this talk we discuss two seemingly unrelated problems: representations of permutation invariant data sets and quadratic assignment optimization problems. Two matrices of the same size are called permutation equivalent if they are equal to one another up to a row permutation. The first problem asks for a Euclidean embedding of the quotient space induced by the row permutation equivalence relation. As we shall see, the problem admits several equivalent formulations. We shall discuss representations inspired by results from commutative algebra theory, measure theory, and reproducing kernel Hilbert space theory. This problem has direct application to graph classification problems where the underlying network has a natural equivariance property. The quadratic assignment problem is an NP-hard optimization problem. We shall analyze an approach using graph convolution networks (GCN). We prove that a specially designed GCN produces the optimal solution for a broad class of assignment problems.
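The row permutation equivalence defined in this abstract can be checked directly with a minimal sketch like the one below. It is only an illustration of the equivalence relation, not one of the embeddings discussed in the talk: sorting the rows lexicographically yields a canonical representative of each equivalence class, so two matrices are equivalent exactly when their sorted forms coincide. The function names are hypothetical.

```python
def canonical_form(matrix):
    """Return the rows of `matrix` as a lexicographically sorted list of tuples.

    Any row permutation of the same matrix produces the same result, so this
    acts as a canonical representative of the equivalence class.
    """
    return sorted(tuple(row) for row in matrix)


def permutation_equivalent(a, b):
    """True if `a` and `b` contain the same rows, possibly in a different order."""
    return canonical_form(a) == canonical_form(b)


a = [[1, 2], [3, 4], [5, 6]]
b = [[5, 6], [1, 2], [3, 4]]  # rows of `a` in a different order
c = [[1, 2], [3, 4], [5, 7]]  # differs from `a` in one entry

print(permutation_equivalent(a, b))
print(permutation_equivalent(a, c))
```

Because repeated rows are kept in the sorted list, the comparison treats each matrix as a multiset of rows rather than a set.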

**Distributed Image Classification using Deep Reinforcement Learning**

Presentation by **Martin Takac**, **Lehigh University**

**Abstract**: We propose a planning and perception mechanism for robots (agents) that can only partially observe the underlying environment, in order to solve an image classification problem. We study two different settings: a) a single-agent scenario in which the agent chooses a goal location to travel to; and b) multi-agent scenarios in which agents learn how to communicate to achieve the classification goal. Our proposed methodology is tested on the MNIST dataset of handwritten digits, which provides us with a level of explainability while interpreting the agent's understanding of the world and actions.

**Coauthors**: Hossein K. Mousavi, Guangyi Liu, Weihang Yuan, Mohammad Reza Nazari, Héctor Muñoz-Avila, Nader Motee

**Papers**: https://arxiv.org/abs/1909.09705 and https://arxiv.org/abs/1905.04835

**Bio**: Prof. Takáč received his B.S. (2008) and M.S. (2010) degrees in Mathematics from Comenius University, Slovakia, and Ph.D. (2014) degree in Mathematics from The University of Edinburgh, United Kingdom. He received several awards during this period, including the Best Ph.D. Dissertation Award by the OR Society (2014), Leslie Fox Prize (2nd Prize; 2013) by the Institute for Mathematics and its Applications, and INFORMS Computing Society Best Student Paper Award (runner up; 2012). Since 2014, he has been a Tenure Track Assistant Professor in the Department of Industrial and Systems Engineering at Lehigh University, USA. His current research interests include the design, analysis and application of algorithms for machine learning, optimization, high-performance computing, operations research and energy systems.

**The Many Faces of Learning**

Presentation by **Don Perlis, University of Maryland**

**Abstract**: Machine learning (ML) is only one of various forms of learning. I will describe several of these, and how they may fit together in a “complete robotic system”.

**Learning Stabilizable Nonlinear Dynamics with Contraction-Based Regularization**

Presentation by **Sumeet Singh, Stanford University**¹

**Abstract**: When it works, model-based Reinforcement Learning (RL) typically offers major improvements in sample efficiency in comparison to model-free techniques such as policy gradients that do not explicitly estimate the underlying dynamical system. Yet, all too often, when standard supervised learning is applied to model complex dynamics, the resulting controllers do not perform on par with model-free RL methods in the limit of increasing sample size, due to compounding errors across long time horizons. In this talk, I will present novel algorithmic tools leveraging Lyapunov-based analysis and semi-infinite convex programming to derive a control-theoretic regularizer for dynamics fitting, rooted in the notion of trajectory stabilizability. I will demonstrate how to embed these control-theoretic conditions as constraints within a semi-supervised algorithm for learning dynamical systems from user demonstrations. The constraints act as a form of context-driven hypothesis pruning to yield learned models that jointly balance regression performance and stabilizability, ultimately resulting in generated trajectories for the robot that are conditioned for feedback control. Experimental results on a quadrotor testbed will illustrate the efficacy of the proposed algorithms and clear connections between theory and hardware.

¹ *Now at Google Brain Robotics*