<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="Content-type" content="text/html; charset=utf-8"/> <!-- FORCE RELOAD CSS <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate" /> <meta http-equiv="Pragma" content="no-cache" /> <meta http-equiv="Expires" content="0" /> --> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no" /> <!-- ICONS --> <script src="https://kit.fontawesome.com/bacac70704.js" crossorigin="anonymous"></script> <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css"> <link rel="shortcut icon" href="resrc/icons/mathis_thumb.png"> <meta name="description" content="Mathis Petrovich"> <meta name="author" content="Mathis Petrovich"> <title>Mathis Petrovich</title> <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.1/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-iYQeCzEYFbKjA/T2uDLTpkwGzCiq6soy8tYaI1GyVh/UjpbCx/TYkiZhlZB6+fzT" crossorigin="anonymous"> <link href="css/style.css" rel="stylesheet"> <link href="css/media.css" rel="stylesheet"> <!-- Global site tag (gtag.js) - Google Analytics --> <script async src="https://www.googletagmanager.com/gtag/js?id=G-VL12GQY0EE"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-VL12GQY0EE'); </script> </head> <body> <div class="container"> <div class="row"> <div class="col-md-8 col-sm-8 col-xs-12"> <br> <h1>Mathis Petrovich</h1> <h3custom> Applied Research Scientist at <a href="https://www.nvidia.com/" target="_blank">NVIDIA</a> </h3custom> <br> <p style="font-size: 120%;text-align: justify;"> </p> <div id="contact_nvidia"> <p style="font-size: 120%;text-align: justify;"> <b> <a href="https://www.nvidia.com/" target="_blank">NVIDIA</a> <a href="https://research.nvidia.com/labs/toronto-ai/" target="_blank">Toronto AI Lab</a> </b><br> NVIDIA Switzerland AG <br> Europaallee 39, 8004 Zürich <br> Switzerland <br> ✉ mpetrovich@nvidia.com </p> </div> <div id="contact_personal"> <p style="font-size: 120%;text-align: justify;"> <b> Personal info </b> <br> ✉ mathis.petrovich@gmail.com </p> </div> </div> <div class="col-md-4 col-sm-4 col-xs-12 text-center"> <br> <br> <br> <img src="resrc/mathis.jpg" title="" style="width: 100%;max-width:250px; border-radius: 20px;"> <br> <h5 style="padding-top: 5px;"> <a href="mailto:mathis.petrovich@enpc.fr" title="e-Mail" target="_blank"><i class="fa fa-envelope-square fa-2x"></i></a> <a href="https://github.com/Mathux" title="GitHub" target="_blank"><i class="fa fa-github-square fa-2x"></i></a> <a href="https://www.linkedin.com/in/mathis-petrovich/" title="LinkedIn" target="_blank"><i class="fa fa-linkedin-square fa-2x"></i></a> <a href="https://scholar.google.com/citations?user=WUKF2ZwAAAAJ&hl=en" title="Google Scholar" target="_blank"> <i class="ai ai-google-scholar-square ai-2x"></i> </a> <a href="https://arxiv.org/search/cs?query=Petrovich%2C+M&searchtype=author&abstracts=show&order=-announced_date_first&size=50" target="_blank"> <i class="ai ai-arxiv-square ai-2x"></i> </a> <a href="cv_mathis_petrovich.pdf" title="CV" target="_blank"><i class="ai ai-cv-square ai-2x"></i></a> </h5> </div> </div> <div class="row"> <div class="col-md-12"> <h3>Introduction</h3> <p style="font-size: 120%;text-align: justify;"> I am an Applied Research Scientist at NVIDIA in the <a 
href="https://research.nvidia.com/labs/toronto-ai/" target="_blank">Toronto AI Lab</a>, and work on text-to-3D human motion synthesis. I received my <a href="https://ellis.eu/" target="_blank">ELLIS</a> PhD from the <a href="https://www.ecoledesponts.fr/en" target="_blank">&Eacute;cole des Ponts ParisTech</a> (ENPC), where I worked in the <a href="https://imagine.enpc.fr" target="_blank">IMAGINE</a> computer vision team. I also worked in close collaboration with the <a href="https://ps.is.mpg.de/" target="_blank">Perceiving Systems</a> Department of <a href="https://is.mpg.de/" target="_blank">Max Planck Institute for Intelligent Systems</a> (MPI-IS). My co-advisors were <a href="https://imagine.enpc.fr/~varolg/" target="_blank">G&uumll Varol</a> (ENPC) and <a href="https://ps.is.mpg.de/~black" target="_blank">Michael J. Black</a> (MPI) and my PhD topic was to generate realistic and diverse human body motion in a controllable way (given labels or text instructions), and to create text-motion joint latent spaces. Throughout my PhD, I interned at <a href="https://www.nvidia.com/" target="_blank">NVIDIA</a>. Before my PhD, I studied at the <a href="https://ens-paris-saclay.fr/" target="_blank">École normale supérieure Paris-Saclay</a>. </p> </div> </div> <div class="row"> <div class="col-md-12"> <h3>News</h3> <ul class="mb-0" style="list-style-type:none;padding-left:0;font-size: 120%;"> <li><span class="badge bg-success" style="width: 85px">07/2024</span> I am joining <b><a href="https://www.nvidia.com/" target="_blank">NVIDIA</a></b> as an Applied Research Scientist! </li> <li><span class="badge bg-success" style="width: 85px">04/2024</span> I have <b><a href="https://www.youtube.com/watch?v=yi5-d-wEtyI&t=1s">defended<a/></b> my <b>PhD thesis</b>! </li> <li><span class="badge bg-success" style="width: 85px">04/2024</span> <b><a href="stmc/index.html">STMC</a></b> have been accepted to <b>CVPRW 2024</b>! </li> <li><span class="badge bg-secondary" style="width: 85px">07/2023</span> <b><a href="https://sinc.is.tue.mpg.de">SINC</a></b> and <b><a href="tmr/index.html">TMR</a></b> have been accepted to <b>ICCV 2023</b>! </li> <li><span class="badge bg-secondary" style="width: 85px">06/2023</span> I am joining <b><a href="https://www.nvidia.com/" target="_blank">NVIDIA</a></b> for a research internship! </li> <li><span class="badge bg-secondary" style="width: 85px">05/2023</span> I am glad to be named one of the <b><a href="https://cvpr2023.thecvf.com/Conferences/2023/OutstandingReviewers">CVPR 2023</a></b> Outstanding Reviewers! </li> <li><span class="badge bg-primary" style="width: 85px">04/2022</span> <b><a href="https://sinc.is.tue.mpg.de">SINC</a></b> and <b><a href="tmr/index.html">TMR</a></b> preprints are available online.</li> <li><span class="badge bg-primary" style="width: 85px">08/2022</span> <b><a href="https://teach.is.tue.mpg.de">TEACH</a></b> paper has been accepted to <b>3DV 2022</b>! </li> <li><span class="badge bg-primary" style="width: 85px">07/2022</span> I attended the <a href="https://cmp.felk.cvut.cz/summerschool2022/" target="_blank"> Vision and Sports Summer School</a>! </li> <li><span class="badge bg-primary" style="width: 85px">07/2022</span> <b><a href="temos/index.html">TEMOS</a></b> paper has been accepted as an <b>Oral</b> to <b>ECCV 2022</b>! </li> <li><span class="badge bg-primary" style="width: 85px">06/2022</span> <b>FROT</b> paper has been accepted to <b>ECML 2022</b>! 
</li> <li><span class="badge bg-info" style="width: 85px">10/2021</span> I have been nominated to become an <b>ELLIS PhD student</b>! </li> <li><span class="badge bg-info" style="width: 85px">07/2021</span> I attended the <a href="https://project.inria.fr/paiss/" target="_blank"> PAISS Summer School</a>! </li> <li><span class="badge bg-info" style="width: 85px">07/2021</span> <b><a href="actor/index.html">ACTOR</a></b> paper has been accepted to <b>ICCV 2021</b>! </li> </ul> <!-- old news --> <button type="button" class="btn btn-default btn-sm" data-bs-toggle="collapse" data-bs-target="#old_news">Show more</button> <div id="old_news" class="collapse"> <ul class="mb-0" style="list-style-type:none;padding-left:0;font-size: 120%;"> <li><span class="badge bg-success" style="width: 85px">10/2020</span> Start of my <b>PhD</b>! </li> <li><span class="badge bg-secondary" style="width: 85px">10/2019</span> I am joining the High-Dimensional Statistical Modeling Team at <b>RIKEN AIP</b>! </li> <li><span class="badge bg-secondary" style="width: 85px">08/2019</span> I obtained the <a href="http://math.ens-paris-saclay.fr/version-francaise/formations/master-mva/" target="_blank">MVA</a> Master's degree at the <a href="https://ens-paris-saclay.fr/" target="_blank">École normale supérieure Paris-Saclay</a>! </li> <li><span class="badge bg-primary" style="width: 85px">09/2016</span> I was accepted to the <a href="https://ens-paris-saclay.fr/" target="_blank">École normale supérieure Paris-Saclay</a> as a <em>Normalien</em>! </li> </ul> </div> <br> </div> </div> <div class="row"> <div class="col-md-12"> <h3>Publications</h3> <!-- TMR++ --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <a href="https://imagine.enpc.fr/~leore.bensabath/TMR++"> <img class="img-thumbnail mb-3" src="resrc/tmrpp.png" alt="TMR++: A Cross-Dataset Study for Text-based 3D Human Motion Retrieval"> </a> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>TMR++: A Cross-Dataset Study for Text-based 3D Human Motion Retrieval</strong><br> Léore Bensabath, <u>Mathis Petrovich</u>, G&uuml;l Varol<br> CVPRW 2024 <br> <a target="_blank" href="https://arxiv.org/abs/2405.16909"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <a target="_blank" href="https://imagine.enpc.fr/~leore.bensabath/TMR++"> <button type="button" class="btn btn-primary btn-sm"> Webpage </button></a> <a target="_blank" href="https://github.com/leorebensabath/TMRPlusPlus"> <button type="button" class="btn btn-primary btn-sm"> Code </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex22">BibTex</button> <div id="bibtex22" class="collapse"> <pre><tt>@inproceedings{lbensabath2024, title = {{TMR++}: A Cross-Dataset Study for Text-based 3D Human Motion Retrieval}, author = {Bensabath, Léore and Petrovich, Mathis and Varol, G{\"u}l}, booktitle = {CVPR Workshop on Human Motion Generation}, year = {2024} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract22">Abstract</button> <div id="abstract22" class="collapse"> <p class="bg-light"> We provide results of our study on text-based 3D human motion retrieval and particularly focus on cross-dataset generalization. Due to practical reasons such as dataset-specific human body representations, existing works typically benchmark by training and testing on partitions from the same dataset.
Here, we employ a unified SMPL body format for all datasets, which allows us to perform training on one dataset, testing on the other, as well as training on a combination of datasets. Our results suggest that there exist dataset biases in standard text-motion benchmarks such as HumanML3D, KIT Motion-Language, and BABEL. We show that text augmentations help close the domain gap to some extent, but the gap remains. We further provide the first zero-shot action recognition results on BABEL, without using categorical action labels during training, opening up a new avenue for future research. </p> </div> <div style="height:30px;"></div> </div> </div> <!-- STMC --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <a href="stmc/index.html"> <img class="img-thumbnail mb-3" src="stmc/images/stmc.png" alt="STMC: Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation"> </a> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>STMC: Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation</strong><br> <u>Mathis Petrovich</u>, Or Litany, Umar Iqbal, Michael J. Black, G&uuml;l Varol, Xue Bin Peng, Davis Rempe<br> CVPRW 2024 <br> <a target="_blank" href="https://arxiv.org/abs/2401.08559"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <a target="_blank" href="stmc/index.html"> <button type="button" class="btn btn-primary btn-sm"> Webpage </button></a> <a target="_blank" href="https://github.com/nv-tlabs/stmc"> <button type="button" class="btn btn-primary btn-sm"> Code </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex21">BibTex</button> <div id="bibtex21" class="collapse"> <pre><tt>@inproceedings{petrovich24stmc, title = {Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation}, author = {Petrovich, Mathis and Litany, Or and Iqbal, Umar and Black, Michael J. and Varol, G{\"u}l and Peng, Xue Bin and Rempe, Davis}, booktitle = {CVPR Workshop on Human Motion Generation}, year = {2024} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract21">Abstract</button> <div id="abstract21" class="collapse"> <p class="bg-light"> Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To address this, we introduce the new problem of timeline control for text-driven motion synthesis, which provides an intuitive, yet fine-grained, input interface for users. Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap. This enables specifying the exact timings of each action and composing multiple actions in sequence or at overlapping intervals. To generate composite animations from a multi-track timeline, we propose a new test-time denoising method. This method can be integrated with any pre-trained motion diffusion model to synthesize realistic motions that accurately reflect the timeline.
At every step of denoising, our method processes each timeline interval (text prompt) individually, subsequently aggregating the predictions with consideration for the specific body parts engaged in each action. Experimental comparisons and ablations validate that our method produces realistic motions that respect the semantics and timing of given text prompts. </p> </div> <div style="height:30px;"></div> </div> </div> <!-- TMR --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <a href="tmr/index.html"> <img class="img-thumbnail mb-3" src="tmr/images/tmr.png" alt="TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis"> </a> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis</strong><br> <u>Mathis Petrovich</u>, Michael J. Black and G&uuml;l Varol<br> ICCV 2023 <!-- International Conference on Computer Vision --> <br> <a target="_blank" href="https://arxiv.org/abs/2305.00976"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <a target="_blank" href="tmr/index.html"> <button type="button" class="btn btn-primary btn-sm"> Webpage </button></a> <a target="_blank" href="https://huggingface.co/spaces/Mathux/TMR"> <button type="button" class="btn btn-primary btn-sm"> Demo </button></a> <a target="_blank" href="https://github.com/Mathux/TMR"> <button type="button" class="btn btn-primary btn-sm"> Code </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex20">BibTex</button> <div id="bibtex20" class="collapse"> <pre><tt>@inproceedings{petrovich23tmr, title = {{TMR}: Text-to-Motion Retrieval Using Contrastive {3D} Human Motion Synthesis}, author = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l}, booktitle = {International Conference on Computer Vision ({ICCV})}, year = {2023} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract20">Abstract</button> <div id="abstract20" class="collapse"> <p class="bg-light"> In this paper, we present TMR, a simple yet effective approach for text to 3D human motion retrieval. While previous work has only treated retrieval as a proxy evaluation metric, we tackle it as a standalone task. Our method extends the state-of-the-art text-to-motion synthesis model TEMOS, and incorporates a contrastive loss to better structure the cross-modal latent space. We show that maintaining the motion generation loss, along with the contrastive training, is crucial to obtain good performance. We introduce a benchmark for evaluation and provide an in-depth analysis by reporting results on several protocols. Our extensive experiments on the KIT-ML and HumanML3D datasets show that TMR outperforms the prior work by a significant margin, for example reducing the median rank from 54 to 19. Finally, we showcase the potential of our approach on moment retrieval. Our code and models are publicly available. 
</p> </div> <div style="height:30px;"></div> </div> </div> <!-- SINC --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <a href="https://sinc.is.tue.mpg.de"> <img class="img-thumbnail mb-3" src="resrc/sinc.png" alt="SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation"> </a> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation</strong><br> Nikos Athanasiou*, <u>Mathis Petrovich</u>*, Michael J. Black, G&uuml;l Varol <br> ICCV 2023 <br> <!-- International Conference on Computer Vision --> <a target="_blank" href="https://arxiv.org/abs/2304.10417"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <a target="_blank" href="https://sinc.is.tue.mpg.de"> <button type="button" class="btn btn-primary btn-sm"> Webpage </button></a> <a target="_blank" href="https://github.com/athn-nik/sinc"> <button type="button" class="btn btn-primary btn-sm"> Code </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex19">BibTex</button> <div id="bibtex19" class="collapse"> <pre><tt>@inproceedings{SINC:ICCV:2023, title = {{SINC}: Spatial Composition of {3D} Human Motions for Simultaneous Action Generation}, author = {Athanasiou, Nikos and Petrovich, Mathis and Black, Michael J. and Varol, G\"{u}l }, booktitle = {International Conference on Computer Vision ({ICCV})}, year = {2023} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract19">Abstract</button> <div id="abstract19" class="collapse"> <p class="bg-light"> Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such simultaneous movements as performing 'spatial compositions'. In contrast to temporal compositions that seek to transition from one action to another, spatial compositing requires understanding which body parts are involved in which action, to be able to move them simultaneously. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action &lt;action name&gt;?", while also providing the parts list and few-shot examples. Given this action-part mapping, we combine body parts from two motions together and establish the first automated method to spatially compose two actions. However, training data with compositional actions is always limited by the combinatorics. Hence, we further create synthetic data with this approach, and use it to train a new state-of-the-art text-to-motion generation model, called SINC ("SImultaneous actioN Compositions for 3D human motions"). In our experiments, we find training on additional synthetic GPT-guided compositional motions improves text-to-motion generation.
</p> </div> <div style="height:30px;"></div> </div> </div> <!-- TEACH --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <a href="https://teach.is.tue.mpg.de"> <img class="img-thumbnail mb-3" src="resrc/teach.png" alt="TEACH: Temporal Action Composition for 3D Humans"> </a> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>TEACH: Temporal Action Composition for 3D Humans</strong><br> Nikos Athanasiou, <u>Mathis Petrovich</u>, Michael J. Black, G&uuml;l Varol <br> 3DV 2022 <br> <!-- International Conference on 3D Vision --> <a target="_blank" href="https://arxiv.org/abs/2209.04066"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <a target="_blank" href="https://teach.is.tue.mpg.de"> <button type="button" class="btn btn-primary btn-sm"> Webpage </button></a> <a target="_blank" href="https://github.com/athn-nik/teach"> <button type="button" class="btn btn-primary btn-sm"> Code </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex18">BibTex</button> <div id="bibtex18" class="collapse"> <pre><tt>@inproceedings{TEACH:3DV:2022, title = {{TEACH}: {T}emporal {A}ction {C}ompositions for {3D} {H}umans}, author = {Athanasiou, Nikos and Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l }, booktitle = {{International Conference on 3D Vision (3DV)}}, year = {2022} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract18">Abstract</button> <div id="abstract18" class="collapse"> <p class="bg-light"> Given a series of natural language descriptions, our task is to generate 3D human motions that correspond semantically to the text, and follow the temporal order of the instructions. In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition. The current state of the art in text-conditioned motion synthesis only takes a single action or a single sentence as input. This is partially due to lack of suitable training data containing action sequences, but also due to the computational complexity of their non-autoregressive model formulation, which does not scale well to long sequences. In this work, we address both issues. First, we exploit the recent BABEL motion-text collection, which has a wide range of labeled actions, many of which occur in a sequence with transitions between them. Next, we design a Transformer-based approach that operates non-autoregressively within an action, but autoregressively within the sequence of actions. This hierarchical formulation proves effective in our experiments when compared with multiple baselines. Our approach, called TEACH for "TEmporal Action Compositions for Human motions", produces realistic human motions for a wide variety of actions and temporal compositions from language descriptions. To encourage work on this new task, we make our code available for research purposes at our website. </p> </div> <div style="height:30px;"></div> </div> </div> <!-- TEMOS --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <a href="temos/index.html"> <img class="img-thumbnail mb-3" src="temos/images/small_white.png" alt="TEMOS: Generating diverse human motions from textual descriptions"> </a> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>TEMOS: Generating diverse human motions from textual descriptions</strong><br> <u>Mathis Petrovich</u>, Michael J.
Black and G&uuml;l Varol<br> ECCV 2022 (Oral) <br> <!-- European Conference on Computer Vision --> <a target="_blank" href="https://arxiv.org/abs/2204.14109"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <a target="_blank" href="temos/index.html"> <button type="button" class="btn btn-primary btn-sm"> Webpage </button></a> <a target="_blank" href="https://github.com/Mathux/TEMOS"> <button type="button" class="btn btn-primary btn-sm"> Code </button></a> <a target="_blank" href="temos/poster.pdf"> <button type="button" class="btn btn-primary btn-sm"> Poster </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex17">BibTex</button> <div id="bibtex17" class="collapse"> <pre><tt>@inproceedings{petrovich22temos, title = {{TEMOS}: Generating diverse human motions from textual descriptions}, author = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l}, booktitle = {European Conference on Computer Vision ({ECCV})}, year = {2022} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract17">Abstract</button> <div id="abstract17" class="collapse"> <p class="bg-light"> We address the problem of generating diverse 3D human motions from textual descriptions. This challenging task requires joint modeling of both modalities: understanding and extracting useful human-centric information from the text, and then generating plausible and realistic sequences of human poses. In contrast to most previous work which focuses on generating a single, deterministic, motion from a textual description, we design a variational approach that can produce multiple diverse human motions. We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data, in combination with a text encoder that produces distribution parameters compatible with the VAE latent space. We show the TEMOS framework can produce both skeleton-based animations as in prior work, as well as more expressive SMPL body motions. We evaluate our approach on the KIT Motion-Language benchmark and, despite being relatively straightforward, demonstrate significant improvements over the state of the art. Code and models are available on our webpage. </p> </div> <div style="height:30px;"></div> </div> </div> <!-- ACTOR --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <a href="actor/index.html"> <img class="img-thumbnail mb-3" src="resrc/actor.png" alt="ACTOR: Action-Conditioned 3D Human Motion Synthesis with Transformer VAE"> </a> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>ACTOR: Action-Conditioned 3D Human Motion Synthesis with Transformer VAE</strong><br> <u>Mathis Petrovich</u>, Michael J.
Black and G&uuml;l Varol<br> ICCV 2021 <br> <!-- International Conference on Computer Vision --> <a target="_blank" href="https://arxiv.org/abs/2104.05670"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <a target="_blank" href="actor/index.html"> <button type="button" class="btn btn-primary btn-sm"> Webpage </button></a> <a target="_blank" href="https://github.com/Mathux/ACTOR"> <button type="button" class="btn btn-primary btn-sm"> Code </button></a> <a target="_blank" href="actor/poster.pdf"> <button type="button" class="btn btn-primary btn-sm"> Poster </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex13"> BibTex </button> <div id="bibtex13" class="collapse"> <pre><tt>@inproceedings{petrovich21actor, title = {Action-Conditioned 3{D} Human Motion Synthesis with Transformer {VAE}}, author = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l}, booktitle = {International Conference on Computer Vision ({ICCV})}, year = {2021} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract13"> Abstract </button> <div id="abstract13" class="collapse"> <p class="bg-light"> We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences. In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence. Here we learn an action-aware latent representation for human motions by training a generative variational autoencoder (VAE). By sampling from this latent space and querying a certain duration through a series of positional encodings, we synthesize variable-length motion sequences conditioned on a categorical action. Specifically, we design a Transformer-based architecture, ACTOR, for encoding and decoding a sequence of parametric SMPL human body models estimated from action recognition datasets. We evaluate our approach on the NTU RGB+D, HumanAct12 and UESTC datasets and show improvements over the state of the art. Furthermore, we present two use cases: improving action recognition through adding our synthesized data to training, and motion denoising. Code and models are available on our project page. 
</p> </div> <div style="height:30px;"></div> </div> </div> <!-- FROT --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <img class="img-thumbnail mb-3" src="resrc/frot.png" alt="Feature Robust Optimal Transport for High-dimensional Data"> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>FROT: Feature Robust Optimal Transport for High-dimensional Data</strong><br> <u>Mathis Petrovich</u>*, Chao Liang*, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, <br> Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada<br> ECML 2022 <br> <!-- European Conference on Machine Learning --> <a target="_blank" href="https://arxiv.org/abs/2005.12123"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex12">BibTex</button> <div id="bibtex12" class="collapse"> <pre><tt>@inproceedings{petrovich2022FROT, title = {Feature Robust Optimal Transport for High-dimensional Data}, author = {Mathis Petrovich and Chao Liang and Ryoma Sato and Yanbin Liu and Yao-Hung Hubert Tsai and Linchao Zhu and Yi Yang and Ruslan Salakhutdinov and Makoto Yamada}, booktitle = {{European Conference on Machine Learning (ECML)}}, year = {2022} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract12">Abstract</button> <div id="abstract12" class="collapse"> <p class="bg-light"> Optimal transport is a machine learning problem with applications including distribution comparison, feature selection, and generative adversarial networks. In this paper, we propose feature-robust optimal transport (FROT) for high-dimensional data, which solves high-dimensional OT problems using feature selection to avoid the curse of dimensionality. Specifically, we find a transport plan with discriminative features. To this end, we formulate the FROT problem as a min-max optimization problem. We then propose a convex formulation of the FROT problem and solve it using a Frank-Wolfe-based optimization algorithm, whereby the subproblem can be efficiently solved using the Sinkhorn algorithm. Since FROT finds the transport plan from selected features, it is robust to noise features. To show the effectiveness of FROT, we propose using the FROT algorithm for the layer selection problem in deep neural networks for semantic correspondence. By conducting synthetic and benchmark experiments, we demonstrate that the proposed method can find a strong correspondence by determining important layers. We show that the FROT algorithm achieves state-of-the-art performance in real-world semantic correspondence datasets.
</p> </div> <div style="height:30px;"></div> </div> </div> <!-- FsNet --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <img class="img-thumbnail mb-3" src="resrc/fsnet.png" alt="FsNet: Feature Selection Network on High-dimensional Biological Data"> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>FsNet: Feature Selection Network on High-dimensional Biological Data</strong><br> Dinesh Singh, H&eacute;ctor Climente-Gonz&aacute;lez, <u>Mathis Petrovich</u>, Eiryo Kawakami, Makoto Yamada<br> IJCNN 2023 <br> <!-- International Joint Conference on Neural Networks --> <a target="_blank" href="https://arxiv.org/abs/2001.08322"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex11">BibTex</button> <div id="bibtex11" class="collapse"> <pre><tt>@inproceedings{dinesh2020fsnet, title = {{FsNet}: Feature Selection Network on High-dimensional Biological Data}, author = {Dinesh Singh and Héctor Climente-González and Mathis Petrovich and Eiryo Kawakami and Makoto Yamada}, booktitle = {{International Joint Conference on Neural Networks (IJCNN)}}, year = {2023} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract11">Abstract</button> <div id="abstract11" class="collapse"> <p class="bg-light"> Biological data including gene expression data are generally high-dimensional and require efficient, generalizable, and scalable machine-learning methods to discover their complex nonlinear patterns. The recent advances in machine learning can be attributed to deep neural networks (DNNs), which excel in various tasks in terms of computer vision and natural language processing. However, standard DNNs are not appropriate for high-dimensional datasets generated in biology because they have many parameters, which in turn require many samples. In this paper, we propose a DNN-based, nonlinear feature selection method, called the feature selection network (FsNet), for high-dimensional and small number of sample data. Specifically, FsNet comprises a selection layer that selects features and a reconstruction layer that stabilizes the training. Because a large number of parameters in the selection and reconstruction layers can easily result in overfitting under a limited number of samples, we use two tiny networks to predict the large, virtual weight matrices of the selection and reconstruction layers. Experimental results on several real-world, high-dimensional biological datasets demonstrate the efficacy of the proposed method.
</p> </div> <div style="height:30px;"></div> </div> </div> <!-- Fall --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <img class="img-thumbnail mb-3" src="resrc/fall.png" alt="Fast local linear regression with anchor regularization"> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>FALL: Fast local linear regression with anchor regularization</strong><br> <u>Mathis Petrovich</u>, Makoto Yamada <br> arXiv 2020 <br> <a target="_blank" href="https://arxiv.org/abs/2003.05747"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex10">BibTex</button> <div id="bibtex10" class="collapse"> <pre><tt>@inproceedings{petrovich2020fall, title = {Fast local linear regression with anchor regularization}, author = {Mathis Petrovich and Makoto Yamada}, booktitle = {arXiv preprint}, year = {2020} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract10">Abstract</button> <div id="abstract10" class="collapse"> <p class="bg-light"> Regression is an important task in machine learning and data mining. It has several applications in various domains, including finance, biomedical, and computer vision. Recently, network Lasso, which estimates local models by making clusters using the network information, was proposed and its superior performance was demonstrated. In this study, we propose a simple yet effective local model training algorithm called the fast anchor regularized local linear method (FALL). More specifically, we train a local model for each sample by regularizing it with precomputed anchor models. The key advantage of the proposed algorithm is that we can obtain a closed-form solution with only matrix multiplication; additionally, the proposed algorithm is easily interpretable, fast to compute and parallelizable. Through experiments on synthetic and real-world datasets, we demonstrate that FALL compares favorably in terms of accuracy with the state-of-the-art network Lasso algorithm with significantly smaller training time (two orders of magnitude). 
</p> </div> <div style="height:30px;"></div> </div> </div> <!-- Tone mapping --> <div class="row"> <div class="col-xs-10 col-sm-4 col-md-4"> <img class="img-thumbnail mb-3" src="resrc/semanticTMO.png" alt="Tone Mapping Operators: Progressing Towards Semantic-Awareness"> </div> <div class="col-xs-12 col-sm-8 col-md-8" style="font-size: 120%;"> <strong>Tone Mapping Operators: Progressing Towards Semantic-Awareness</strong><br> Abhishek Goswami, <u>Mathis Petrovich</u>, Wolf Hauser, Frederic Dufaux<br> ICMEW 2020 <br> <!-- International Conference on Multimedia & Expo Workshops --> <a target="_blank" href="https://hal.inria.fr/hal-02543939"> <button type="button" class="btn btn-primary btn-sm"> Paper </button></a> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#bibtex09">BibTex</button> <div id="bibtex09" class="collapse"> <pre><tt>@inproceedings{abhishek2020tonemapping, title = {Tone Mapping Operators: Progressing Towards Semantic-Awareness}, author = {Abhishek Goswami and Mathis Petrovich and Wolf Hauser and Frederic Dufaux}, booktitle = {{International Conference on Multimedia \& Expo Workshops (ICMEW 2020)}}, year = {2020} }</tt></pre> </div> <button type="button" class="btn btn-primary btn-sm" data-bs-toggle="collapse" data-bs-target="#abstract09">Abstract</button> <div id="abstract09" class="collapse"> <p class="bg-light"> A Tone Mapping Operator (TMO) aims at reproducing the visual perception of a scene with a high dynamic range (HDR) on low dynamic range (LDR) media. TMOs have primarily aimed to preserve global perception by employing a model of human visual system (HVS), analysing perceptual attributes of each pixel and adjusting exposure at the pixel level. Preserving semantic perception, also an essential step for HDR rendering, has never been in explicit focus. We argue that explicitly introducing semantic information to create a 'content and semantic'-aware TMO has the potential to further improve existing approaches. In this paper, we therefore propose a new local tone mapping approach by introducing semantic information using off-the-shelf semantic segmentation tools into a novel tone mapping pipeline. More specifically, we adjust pixel values to a semantic specific target to reproduce the real-world semantic perception.
</p> </div> <div style="height:30px;"></div> </div> </div> </div> <div class="row"> <div class="col-md-12"> <h3> Teaching </h3> <ul style="padding-left:0; list-style-type:none;font-size: 120%;"> <li style="margin:1em 0"><b>Object recognition and computer vision (RecVis MVA)</b> (2021 - 2024) <ul style="list-style-type:circle;"> <li> Supervision and grading of master students' projects </li> </ul> </li> <li style="margin:1em 0"><b>Supervision of ENPC engineering-school students for a research project</b> (2023) </li> <li style="margin:1em 0"><b>C++ teaching at ENPC (French)</b> (2020 - 2021) <ul style="list-style-type:circle;"> <li> <a href="https://imagine.enpc.fr/~monasse/Info/">Course website</a> </li> <li> <a href="https://mathux.github.io/cours-cpp/"> Slides </a></li> </ul> </li> </ul> </div> </div> <div class="row"> <div class="col-md-12"> <h3>Miscellaneous</h3> <ul style="padding-left:0; list-style-type:none;font-size: 120%;"> <li style="margin:1em 0"><b>Reviewer</b> <ul style="list-style-type:circle;"> <li> <a href="https://eg2024.cyens.org.cy/">Eurographics</a> 2024 </li> <li> <a href="https://3dvconf.github.io/2024/">International Conference on 3D Vision (3DV)</a> 2024 </li> <li> <a href="https://iccv2023.thecvf.com/">International Conference on Computer Vision (ICCV)</a> 2023 </li> <li> <a href="https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=34/">Transactions on Pattern Analysis and Machine Intelligence (TPAMI)</a> 2023 </li> <li> <a href="https://www.springer.com/journal/11263/">International Journal of Computer Vision (IJCV)</a> 2023 </li> <li> <a href="https://s2023.siggraph.org">SIGGRAPH</a> 2023 </li> <li> <a href="https://cvpr2023.thecvf.com">Computer Vision and Pattern Recognition (CVPR)</a> 2023 (Outstanding Reviewer), 2024 </li> <li> <a href="https://www.springer.com/journal/11263/">International Journal of Computer Vision (IJCV)</a> 2022 </li> <li> <a href="https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=34/">Transactions on Pattern Analysis and Machine Intelligence (TPAMI)</a> 2022 </li> <li> <a href="https://eccv2022.ecva.net/">European Conference on Computer Vision (ECCV)</a> 2022 </li> <li> <a href="https://www.journals.elsevier.com/computers-and-graphics">Computers & Graphics</a> 2021 </li> </ul> </li> <li style="margin:1em 0"><b>Rubik's cube</b> <ul style="list-style-type:circle;"> <li> One of the first 100 solvers of <a href="http://roice3.org/magictile/mathologer/">the Klein Bottle Rubik's Cube analogue</a> </li> <li> I am/was sub-15 (faster than 15 seconds on average) on the 3x3x3 </li> <!-- <li> Magic <a href="https://staticfiles.xyz/2/retreat.jpg?cache=none">prediction</a> for the retreat</li> --> </ul> </li> <li style="margin:1em 0"><b>Digital magic</b> <ul style="list-style-type:circle;"> <li> One of the co-founders (with <a href="https://mlaurent.ovh"> Mickael Laurent</a>) of <a href="https://mamimagics.com"> MamiMagics</a>, a digital magic brand that releases magic apps powered by computer vision! </li> <li> <a href="https://mamimagics.com/cube.html"> MamiCube</a>, a set of tools to help magicians predict the future with a Rubik's cube! </li> <li> <a href="https://mamimagics.com/print.html"> MamiPrint</a>, the automatic printing of magic!
</li> </ul> </li> </ul> </div> </div> </div> <!-- /container --> <div class="container"> <footer> <p align="right"><small>Copyright &#169; Mathis Petrovich &nbsp;/&nbsp; Last update May 2023 <br> Inspired from the personal page of <a href="https://vobecant.github.io/">Antonín Vobecký</a> and <a href="https://imagine.enpc.fr/~varolg/"> Gül Varol </a> </small></p> </footer> <div style="height:10px;"></div> </div> <!-- Bootstrap core JS--> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script> <!-- Core theme JS--> <script src="js/scripts.js"></script> </body> </html>
