
<!DOCTYPE html> <html lang="en"> <head> <title>ChangeIt3D</title> <meta charset="UTF-8"> <meta http-equiv="x-ua-compatible" content="ie=edge"> <meta name="description" content="ChangeIt3D, ShapeTalk, language-assisted-deformations, ShapeNet, PartNet, ModelNet, neural-listener"> <meta name="viewport" content="width=device-width, initial-scale=1"> <!--Open Graph Related Stuff--> <meta property="og:title" content="ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations"/> <meta property="og:url" content="https://changeit3d.github.io"/> <meta property="og:description" content="ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations"/> <meta property="og:site_name" content="ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations"/> <meta property="og:image" content="https://changeit3d.github.io/img/shapetalk_favicon.png"/> <!--Twitter Card Stuff--> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations"/> <meta name="twitter:image" content="https://changeit3d.github.io/img/shapetalk_favicon.png"> <meta name="twitter:url" content="https://changeit3d.github.io"/> <meta name="twitter:description" content="ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations"/> <link rel="icon" type="image/png" href="img/changeit3d_favicon.png"> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css"> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css"> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.css"> <link rel="stylesheet" href="css/app.css"> <link rel="stylesheet" href="css/bootstrap.min.css"> <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script> <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.5.3/clipboard.min.js"></script> <script src="js/app.js"></script> </head> <body> <div class="container" id="main"> <div class="row"> <h2 class="col-md-12 text-center" style="padding-bottom:20px"> <span style="font-size:35pt"><b><i>ShapeTalk:</i><br>A Language Dataset <br>and Framework for 3D Shape Edits and Deformations</b></span> </h2> </div> <div class="row" id="authors"> <div class="col-md-12 text-center"> <ul class="list-inline" style="font-size:17pt;"> <li> <a href="https://optas.github.io/"> Panos Achlioptas </a> <sup>1,3</sup> </li> <li> <a href="https://ianhuang0630.github.io"> Ian Huang </a> <sup>3</sup> </li> <li> <a href="https://mhsung.github.io/"> Minhyuk Sung </a> <sup>2</sup> </li> <br> <li> <a href="http://www.stulyakov.com/"> Sergey Tulyakov </a> <sup>1</sup> </li> <li> <a href="https://geometry.stanford.edu/member/guibas"> Leonidas Guibas </a> <sup>3</sup> </li> </ul> <div style="font-size: 14pt;"> Snap Inc.<sup>1</sup> &nbsp; &nbsp; KAIST<sup>2</sup> &nbsp; &nbsp; Stanford University<sup>3</sup> </div> </div> </div> <div class="row" style="padding-top:45px"> <div class="col-md-8 col-md-offset-2 text-center"> <ul class="nav nav-pills nav-justified"> <li> <a
href="https://openaccess.thecvf.com/content/CVPR2023/papers/Achlioptas_ShapeTalk_A_Language_Dataset_and_Framework_for_3D_Shape_Edits_CVPR_2023_paper.pdf" class="imageLink"> <img src="img/arxiv_paper.png" width="70" height="80"> <h4><strong>[Paper]</strong></h4> </a> </li> <li> <a href="#dataset"> <img src="img/shapetalk_favicon.png" width="140" height="80"> <h4><strong>[Dataset] (arrived!) </strong></h4> </a> </li> <li> <a href="https://github.com/optas/changeit3d"> <img src="img/github.png"height="80"> <h4><strong>[Code]<br>(arrived!)</strong></h4> </a> </li> <li> <a href="https://changeit3d.github.io/materials/changeIt3D_supplemental_material.pdf" class="imageLink"> <img src="img/supplemental_image.png" height="80"> <h4><strong>[Supplemental Material]</strong></h4> </a> </li> </ul> </div> </div> <div class="row" style="padding-bottom:30px"> <div class="col-md-12 col-md-offset-2"> <h3> <b>News</b> </h3> <ul> <li>[August 22, 2023] The <a href="https://github.com/optas/changeit3d">codebase</a> and the pretrained models are publicly released.</li> <li>[March 5, 2023] A version of this work was accepted in <a href="https://openaccess.thecvf.com/content/CVPR2023/papers/Achlioptas_ShapeTalk_A_Language_Dataset_and_Framework_for_3D_Shape_Edits_CVPR_2023_paper.pdf">CVPR-2023</a>.</li> </ul> </div> </div> <div class="row" style="padding-bottom:30px"> <div class="col-md-8 col-md-offset-2"> <h3> <b>Abstract</b> </h3> <p class="text-justify"> Editing 3D geometry is a challenging task requiring specialized skills. In this work, we aim to facilitate the task of editing the geometry of 3D models through the use of natural language. For example, we may want to modify a 3D chair model to “make its legs thinner” or to “open a hole in its back”. To tackle this problem in a manner that promotes <i>open-ended</i> language use and enables <i>fine-grained</i> shape edits, we introduce the most extensive existing corpus of natural language utterances describing shape differences: <b>ShapeTalk</b>. ShapeTalk contains over half a million discriminative utterances produced by con- trasting the shapes of common 3D objects for a variety of object classes and degrees of similarity. We also introduce a generic framework, <b>ChangeIt3D</b>, which builds on ShapeTalk and can use an <i>arbitrary</i> 3D generative model of shapes to produce edits that align the output better with the edit or deformation description. Finally, we introduce <b>metrics</b> for the <i>quantitative evaluation</i> of language-assisted shape editing methods that reflect key desiderata within this editing setup. We note, that our modules are trained and deployed directly in a latent space of 3D shapes, bypassing the ambiguities of <span>&#8220;</span>lifting<span>&#8221;</span> 2D to 3D when using extant foundation models and thus opening a new avenue for 3D object-centric manipulation through language. </p> </div> </div> <div class="row" id="dataset" style="padding-bottom:30px"> <div class="col-md-8 col-md-offset-2" > <h3 style="margin-bottom:30px"> <b>The ShapeTalk Dataset</b> </h3> <!-- <div class="zoom"> --> <figure style="padding-bottom:80px"> <img src="img/teaser_v3.jpeg" style="padding-bottom:15px" class="img-responsive"> <figcaption style="background-color:aliceblue;padding:10px"> <b>ShapeTalk covers <u>30</u> common object classes, with <u>536K</u> contrastive utterances</b>. Samples of those utterances are shown above. 
In each sub-box, <i>shape differences</i> between a target and a distractor object of the same class are enumerated by an annotator (in decreasing order of importance in the annotator's judgement). Interestingly, <i>both continuous and discrete</i> geometric features that shapes share <b>across</b> categories emerge in the language of ShapeTalk; e.g., humans describe the <font color="green">“thinness”</font> of a chair leg or of a vase lip (top row), or the presence of an <font color="red">“arm”</font> that a lamp or a clock might have (bottom row). </figcaption> </figure> <!-- </div> --> <figure style="padding-bottom:80px;"> <img src="img/shapetalk_analysis.png" style="padding-bottom:15px;" class="img-responsive"> <figcaption style="background-color:aliceblue;padding:10px"> <b>Key characteristics of ShapeTalk.</b> ShapeTalk's corpus explains the shapes of a large variety of common 3D objects in a rich and (by construction) discriminative manner. <i>Shape parts</i>, <i>geometric attributes</i>, and <i>dimensional specifications</i> are some of the typical properties that annotators include in their references. See prototypical words for these properties (right, top). Interestingly, when the compared objects have on average a higher degree of shape similarity ("all hard class"), part-based and local references are more frequent than when contrasting less similar ("all easy class") shapes. </figcaption> </figure> <h4> <b>Browse</b> </h4> <p class="text-justify">You can browse the ShapeTalk annotations <a href="http://5.78.48.181:8502/">here</a>.</p> <a href="http://5.78.48.181:8502/"> <img src="img/shapetalk_streamlit_teaser_old.png" class="img-responsive"> </a> <br> <h4> <b>License &amp; Download</b> </h4> <ul> <li>The ShapeTalk dataset is released under the <a href="materials/shapetalk_terms_of_use.txt">ShapeTalk Terms of Use</a>.</li> <li>To download the ShapeTalk dataset, please first fill out this <a href="https://forms.gle/gqLEuBcgFGcQhikk9">form</a>, accepting the <a href="materials/shapetalk_terms_of_use.txt">Terms of Use</a>. </li> </ul> </div> </div> <div class="row" id="changeIt3DNet" style="padding-bottom:30px"> <div class="col-md-8 col-md-offset-2" > <h3 style="margin-bottom:30px"> <b>ChangeIt3D Architecture</b> </h3> <figure style="padding-bottom:80px"> <img src="img/changeIt3DNet.png" style="padding-bottom:15px" class="img-responsive"> <figcaption style="background-color:aliceblue;padding:10px"> <b>Overview of ChangeIt3DNet, our modular framework for the ChangeIt3D task.</b> In Stage 1, we pretrain a shape autoencoder (using traditional reconstruction losses), freeze the encoder, and use the encoded latents of the target and distractor to pretrain a neural listener (using classification losses). In Stage 2, we use the pretrained autoencoder and neural listener to train a shape editor module that edits shapes within the encoded latent space in a way that is both consistent with the language instruction and minimal. Lock icons denote modules with frozen weights.
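For intuition, a minimal <i>hypothetical</i> Python sketch of this latent-space editing loop is shown below; the module and function names are illustrative placeholders, not the API of the released codebase. <pre class="w1-panel w3-leftbar w3-light-grey">
# Hypothetical sketch (placeholder names, not the released API):
# language-driven shape editing inside a frozen latent space.
import torch

def changeit3d_edit(shape_pc, utterance, encoder, decoder, editor):
    """Encode a shape, apply a language-conditioned edit to its latent, decode the result."""
    with torch.no_grad():
        z = encoder(shape_pc)           # Stage-1 pretrained shape encoder (frozen)
    z_edited = editor(z, utterance)     # Stage-2 editor predicts an edited latent code
    # During training, a frozen neural listener scores (z_edited, utterance) so that the
    # edit matches the instruction, while a penalty on ||z_edited - z|| keeps it minimal.
    return decoder(z_edited)            # frozen decoder reconstructs the edited 3D shape
</pre>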
</figcaption> </figure> </div> </div> <div class="row" id="qualitative_results" style="padding-bottom:30px"> <div class="col-md-8 col-md-offset-2" > <h3 style="margin-bottom:30px"> <b>Qualitative Results</b> </h3> <figure style="padding-bottom:80px"> <img src="img/qualitative_teaser.jpeg" style="padding-bottom:15px" class="img-responsive"> <figcaption style="background-color:aliceblue;padding:10px"> <b>Qualitative edits produced by ChangeIt3DNet.</b> The results are derived using an ImNet-based AE operating with an implicit shape field. The achieved edits are oftentimes local (e.g., thinner legs), fine-grained (e.g., a slatted back), or entail high-level and complex shape understanding (e.g., “it appears more sturdy”). Remarkably, these edits are produced by ChangeIt3DNet, which does not utilize any form of explicit local geometric prior on shapes (part-like or otherwise); instead, it learns solely from the implicit bias of training with referential language. <!-- , decoding pointclouds with 2048 per shape.--> </figcaption> </figure> <!-- <figure style="padding-bottom:80px;"> <img src="img/mesh_output_figure.png" style="padding-bottom:15px;" class="img-responsive"> <figcaption style="background-color:aliceblue;padding:10px"> <b>Mesh outputs.</b> Green is input; red is output. The shown meshes are derived by estimating the magnitude of the predicted gradient from our pretrained SGF AE and applying marching cubes to the resulting unsigned distance field. The corresponding pointcloud predictions from SGF are included in the above Figure. </figcaption> </figure> --> </div> </div> <div class="row" id="citation" style="padding-bottom:30px"> <div class="col-md-8 col-md-offset-2"> <h3> <b>Citations</b> </h3> <p class="text-justify">If you find our work useful in your research, please consider citing:</p> <pre class="w1-panel w3-leftbar w3-light-grey"> @inproceedings{achlioptas2023shapetalk, title={{ShapeTalk}: A Language Dataset and Framework for 3D Shape Edits and Deformations}, author={Achlioptas, Panos and Huang, Ian and Sung, Minhyuk and Tulyakov, Sergey and Guibas, Leonidas}, booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2023}}</pre> <p class="text-justify">If you use the ShapeTalk dataset, please also consider citing our previous paper and dataset, <a href="https://ai.stanford.edu/~optas/shapeglot/">ShapeGlot</a>, which was critical in building and analyzing ShapeTalk:</p> <pre class="w1-panel w3-leftbar w3-light-grey"> @inproceedings{achlioptas2019shapeglot, title={{ShapeGlot}: Learning Language for Shape Differentiation}, author={Achlioptas, Panos and Fan, Judy and Hawkins, Robert and Goodman, Noah and Guibas, Leonidas}, booktitle={International Conference on Computer Vision (ICCV)}, year={2019}}</pre> </div> </div> <div class="row" style="padding-bottom:30px"> <div class="col-md-8 col-md-offset-2"> <h3> <b>Acknowledgements</b> </h3> <p class="text-justify"> This work is funded by a Vannevar Bush Faculty Fellowship, an ARL grant W911NF-21-2-0104, and a gift from Snap Inc. Panos Achlioptas wishes to thank the following researchers for their advice and help: Iro Armeni (data collection), Nikos Gkanatsios (neural-listening), Ahmed Abdelreheem (rendering), Yan Zheng and Ruojin Cai (SGF deployment), Antonia Saravanou and Mingyi Lu (relevant discussions), and Menglei Chai (CLIP-NeRF). Last but not least, the authors want to emphasize their gratitude to all the hard-working Amazon Mechanical Turkers, without whom this work would not have been possible.
</p> </div> </div> <div class="row" style="padding-top:40px"> <div class="col-md-6 col-md-offset-3"> <script type="text/javascript" id="clustrmaps" src="//cdn.clustrmaps.com/map_v2.js?cl=080808&w=a&t=n&d=kRCrtAEoOj6DtfE2RFVCwC0a9JZObL4pGdsfeJQdGHw&co=ffffff&cmo=3acc3a&cmn=ff5353&ct=808080"></script> </div> </div> </div> <!-- Panos Added Style Stuff --> <style> .zoom { padding: 50px; /*background-color: green;*/ transition: transform .2s; /* Animation */ /*width: 200px;*/ /*height: 200px;*/ margin: 0 auto; } .zoom:hover { transform: scale(1.75); /* (175% zoom - Note: if the zoom is too large, it will go outside of the viewport) */ } </style> </body> </html>
