Research
Rational Inverse Reasoning
TL;DR: This paper introduces a framework for few-shot learning from demonstration by inferring the embodied reasoning process behind observed actions. On challenging reasoning tasks, our method achieves strong generalization to new tasks from only 1-3 demonstrations, approaching human performance.
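As a rough illustration of the inference step, the sketch below scores candidate task hypotheses by their posterior probability given a handful of demonstrations. The `hypotheses`, `likelihood`, and `prior` interfaces are hypothetical placeholders for illustration, not the paper's actual model.

```python
import numpy as np

def posterior_over_hypotheses(demos, hypotheses, likelihood, prior):
    """Return a normalized posterior over candidate task hypotheses.

    demos:      iterable of observed demonstrations
    hypotheses: iterable of candidate explanations of the task
    likelihood: callable(demo, hypothesis) -> probability of the demo
    prior:      callable(hypothesis) -> prior probability
    """
    log_post = np.array([
        np.log(prior(h)) + sum(np.log(likelihood(d, h)) for d in demos)
        for h in hypotheses
    ])
    log_post -= log_post.max()   # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()
```

The most probable hypothesis under this posterior can then be used to act on the new task; with only 1-3 demonstrations, the prior does much of the work.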
Hierarchical Vision-Language Planning for Multi-Step Humanoid Manipulation
TL;DR: This work introduces a hierarchical and modular vision-language architecture for multi-step humanoid manipulation tasks. Specifically, we leverage vision-language models (VLMs) for high-level planning, and an action-chunking transformer (ACT) and whole-body reinforcement learning (RL) for mid- and low-level control.
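A minimal sketch of how the three levels could compose at run time, assuming hypothetical `vlm_plan`, `act_policy`, `whole_body_controller`, and `subtask_complete` interfaces (none of these names come from the paper):

```python
def run_task(instruction, env, vlm_plan, act_policy,
             whole_body_controller, subtask_complete):
    """One episode of the hierarchical stack on a language instruction."""
    obs = env.reset()
    # High level: the VLM decomposes the instruction into subtasks.
    subtasks = vlm_plan(instruction, obs.image)
    for subtask in subtasks:
        while not subtask_complete(obs, subtask):
            # Mid level: the ACT-style policy emits a chunk of target poses.
            action_chunk = act_policy(obs, subtask)
            for target in action_chunk:
                # Low level: the whole-body RL controller tracks each target.
                obs = env.step(whole_body_controller(obs, target))
    return obs
```

The modularity is the point of this design: each level can be swapped or retrained independently, with the VLM operating at the slowest timescale and the whole-body controller at the fastest.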
Investigating Vision Foundational Models for Tactile Representation Learning
TL;DR: This paper explores the use of vision foundational models and pre-trained representations to enhance tactile representation learning and multi-modal continual learning.
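As a sketch of the basic recipe, the code below reuses a frozen pre-trained vision backbone as a tactile encoder and trains only a small probe on top. The choice of a torchvision ResNet-18 and a 10-class linear probe is illustrative; the paper studies vision foundation models more broadly.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Frozen pre-trained vision backbone, repurposed as a tactile encoder.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()          # expose the 512-dim pooled features
for p in backbone.parameters():
    p.requires_grad = False          # keep the visual representation fixed

# Only this probe is trained, e.g., for 10 tactile material classes.
probe = nn.Linear(512, 10)

def tactile_logits(tactile_images):
    """tactile_images: (B, 3, 224, 224) tactile sensor readings as RGB."""
    with torch.no_grad():
        feats = backbone(tactile_images)
    return probe(feats)
```

Keeping the backbone frozen is also what makes the continual-learning setting tractable: new tactile tasks only add or update lightweight probes.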
Towards Optimal Compression: Joint Pruning and Quantization
TL;DR: This work presents a method for simultaneously pruning and quantizing neural networks by approximately minimizing the parameter-space distance between the compressed and original networks.
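A toy sketch of the underlying idea, assuming a uniform quantizer and a greedy pruning rule (both simplifications of the paper's method): for each weight, compare the L2 error of pruning it to zero against the error of keeping its quantized value, and prune where the penalty is smallest.

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.5, n_levels=16):
    """Jointly prune and quantize a weight vector, greedily minimizing
    the L2 distance to the original weights."""
    # Uniform symmetric quantizer over n_levels levels.
    scale = np.abs(w).max() / (n_levels // 2)
    q = np.clip(np.round(w / scale), -n_levels // 2, n_levels // 2 - 1) * scale

    # Penalty of pruning a weight instead of quantizing it:
    # ||w_i - 0||^2 - ||w_i - q_i||^2. Prune where this penalty is smallest.
    penalty = w**2 - (w - q) ** 2
    k = int(sparsity * w.size)
    prune_idx = np.argsort(penalty)[:k]

    w_hat = q.copy()
    w_hat[prune_idx] = 0.0
    return w_hat

w = np.random.randn(1000)
w_hat = prune_and_quantize(w)
print(np.linalg.norm(w - w_hat))   # parameter-space distance after compression
```

The joint view matters because the two decisions interact: a weight that quantizes poorly is a better pruning candidate than its magnitude alone suggests.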
FIT: A Metric for Model Sensitivity
TL;DR: This paper introduces FIT, an approximation of the Fisher information metric for estimating model sensitivity to low-bit quantization.
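As an illustration of the general idea, the sketch below computes a diagonal empirical-Fisher sensitivity score: each parameter's expected squared gradient, weighted by the squared perturbation a candidate quantizer would introduce. The exact estimator in the paper may differ.

```python
import torch

def fisher_sensitivity(model, loss_fn, data_loader, quant_noise):
    """Score sensitivity to a candidate quantizer.

    quant_noise maps parameter name -> (w - quantize(w)) for a candidate
    bit-width; returns a scalar sensitivity score.
    """
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    num_batches = 0
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2   # empirical Fisher diagonal
        num_batches += 1
    # Sensitivity ~ sum_i F_ii * (delta w_i)^2
    return sum((fisher[n] / num_batches * quant_noise[n] ** 2).sum()
               for n in quant_noise)
```

Comparing this score across layers or bit-widths gives a cheap, gradient-based ranking of where low-bit quantization is likely to hurt most, without retraining.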