Samir Yitzhak Gadre

I am a 4th-year Ph.D. student at Columbia University studying large-scale dataset construction and model training. I am privileged to be advised by Shuran Song and fortunate to work closely with Ludwig Schmidt.

My work is supported by a Columbia Presidential Fellowship and an NSF Graduate Research Fellowship.

In addition to research, I enjoy running, climbing mountains, and singing with my pop/rock choir: Here to Sing.

Email  /  Google Scholar  /  Twitter  /  CV

News
Talks
DataComp: In search of the next generation of multimodal datasets [slides] at NYU.
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation [slides] at the CVPR 2023 Workshop on 3D Scene Understanding for Vision, Graphics, and Robotics.
No Training? Towards Adapting Zero-Shot Models to Robotics Tasks [slides] at the CVPR 2022 Tutorial on Vision-Based Robot Learning.
Publications and Pre-Prints
(* indicates equal contribution)
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Anas Awadalla*, Irena Gao*, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Yitzhak Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt
arXiv, 2023
arXiv | blog | code | demo

An open-source implementation of Flamingo models and training.

Improving multimodal datasets with image captioning
Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt
NeurIPS, 2023
arXiv | More coming soon!

Improving image-text datasets for downstream classification and retrieval using image captioning models (e.g., BLIP2).

Objaverse-XL: A Universe of 10M+ 3D Objects
Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt*, Ali Farhadi*
NeurIPS, 2023
arXiv | website | code

A dataset of over 10 million 3D objects.

DataComp: In search of the next generation of multimodal datasets
Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
NeurIPS, 2023 (oral)
arXiv | website | code

A benchmark where model training is fixed and participants iterate on data curation strategies. We release a dataset of 12.8B image-text pairs, the largest public dataset of its kind to date. On a 1.4B subset, DataComp-1B, we outperform OpenAI CLIP models trained with the same compute budget.

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
Wanrong Zhu*, Jack Hessel*, Anas Awadalla, Samir Yitzhak Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi
NeurIPS, 2023
arXiv | code

A billion-scale dataset of interleaved images and text.

CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, Shuran Song
CVPR, 2023
arXiv | website | code

We study how to turn existing zero-shot vision-and-language models (e.g., CLIP) into zero-shot object navigators.

Patching open-vocabulary models by interpolating weights
Gabriel Ilharco*, Mitchell Wortsman*, Samir Yitzhak Gadre*, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt
NeurIPS, 2022
arXiv | website (with demo!) | code

We introduce PAINT to improve performance on tasks where pre-trained open-vocabulary models struggle, while maintaining performance on tasks where they already do well.

Structure From Action: Learning Interactions for Articulated Object 3D Structure Discovery
Neil Nie, Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song
IROS, 2023
arXiv | More coming soon!

We learn how to interact with 3D articulated objects to reconstruct their parts and discover joint constraints.

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon*, Simon Kornblith*, Ludwig Schmidt*
ICML, 2022
arXiv | code

We average the weights of many models (ingredients) fine-tuned with different hyperparameters. The resulting soup outperforms the individual ingredients.

Continuous Scene Representations for Embodied AI
Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi
CVPR, 2022
arXiv | website | code

We employ a contrastive loss to embed relationships between objects as features. We show how our representation can be used downstream for visual room rearrangement without any additional training.

Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery
Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song
ICCV, 2021
arXiv | website (with demo!)

We learn multi-step interaction with articulated objects to (1) discover the number of parts and (2) find part segmentation masks, all without semantic labels.

End-User Robot Programming Using Mixed Reality
Samir Yitzhak Gadre, Eric Rosen, Gary Chien, Elizabeth Phillips, Stefanie Tellex, George Konidaris
ICRA, 2019
pdf

Mixed reality robot programming interface for pick-and-place tasks.

Teaching Robots Using Mixed Reality
Samir Yitzhak Gadre
Brown University Undergraduate Honors Thesis, 2018
pdf

Mixed reality learning from demonstration system for pick-and-place tasks.

Service
ECCV: 2020; CVPR: 2023; ICLR: 2023; ICML: 2022, 2023; IROS: 2022; ICRA: 2023; NeurIPS: 2023
Pre-submission Application Review (PAR); co-organizer, fall 2021.

WiSC; mentor, fall 2020, spring 2021.

PAR; application reader, fall 2020.

COMS 4733: Computational Aspects of Robotics; Graduate Teaching Assistant, fall 2020.

COMS 6998: Topics in Robot Learning; Graduate Teaching Assistant, spring 2021.

CS16: Algorithms and Data Structures; Teaching Assistant, spring 2018.

CS15: Object Oriented Programming; Teaching Assistant, fall 2016.
More!
Releasing my NSF GRFP application, inspired by others, whose materials were super helpful to me [personal, research]

Template modified from the Jon Barron original.