<!DOCTYPE HTML> <html lang="en"> <head> <title>David Picard</title> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> <meta name="author" content="David Picard" /> <meta name="viewport" content="width=device-width, initial-scale=1"> <link rel="stylesheet" type="text/css" href="/style.css" /> <link rel="canonical" href="https://davidpicard.github.io/"> <link href="https://fonts.googleapis.com/css?family=Lato:400,700,400italic,700italic" rel="stylesheet" type="text/css"> </head> <body> <table style="width:100%;max-width:960px;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"> <tr style="padding:0px"> <td style="padding:0px"> <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"> <tr style="padding:0px"> <td style="padding:2.5%;width:60%;vertical-align:middle"> <h1> David Picard </h1> <p style="text-align:center"> <a href="https://imagine-lab.enpc.fr/">IMAGINE</a>/<a href="">A3SI</a>/<a href="http://ligm.u-pem.fr/">LIGM</a>, <a href="http://www.enpc.fr/en">École des Ponts ParisTech</a><br /> <a href="https://www.ins2i.cnrs.fr/en">CNRS</a>, <a href="https://www.univ-gustave-eiffel.fr/">Univ Gustave Eiffel</a><br /> 6-8, Av Blaise Pascal - Cité Descartes <br /> Champs-sur-Marne <br /> 77455 Marne-la-Vallée cedex 2 </p> <p style="text-align:center"> <a target="_blank" href="https://mailhide.io/e/rKZBG"> Email</a> / <a href="https://github.com/davidpicard">GitHub</a> / <a href="https://scholar.google.com/citations?user=YtF6k9AAAAAJ">Google Scholar</a> / <a href="https://davidpicard.github.io/pdf/cv.pdf">Resume</a> / <a href="https://davidpicard.github.io/teaching">Teaching</a> </p> <p style="text-align:center"> <a href="https://imagine-lab.enpc.fr"><img style="width:25%;max-width:25%" src="https://imagine-lab.enpc.fr/wp-content/uploads/logo-ligm-150x150.png" /></a> <a href="http://www.enpc.fr"><img style="width:25%;max-width:25%" src="/images/ecole_ponts_RVB72_petit.jpg" /></a> </p> </td> <td style="padding:2.5%;width:40%;max-width:40%"> <img style="width:100%;max-width:100%" alt="profile photo" src="images/moi.jpg"> </td> </tr> </table> <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"> <tr style="padding:0px"> <td style="padding:2.5%;width:50%;max-width:50%;vertical-align:top"> <h2> Research </h2> <p> My research focuses on machine learning for computer vision. I am interested in both the theoretical side (with prior work on kernel methods and now deep learning) and applications (with prior work on human analysis and visual search). At the moment, I work on image generative models (any architecture or loss), mostly as tools for unsupervised learning. 
</p> </td> <td style="padding:2.5%;width:50%;max-width:50%;vertical-align:top"> <p style="text-align:center;margin:0px;margin-top:-10px"> <a class="twitter-timeline" data-width="400" data-height="300" data-chrome="nofooter noborders noscrollbar" data-tweet-limit="1" href="https://twitter.com/david_picard?ref_src=twsrc%5Etfw">Tweets by david_picard</a> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> </p> </td> </tr> </table> <h2> PhD Students </h2> <br/> <h3> Current </h3> <ul> <li>Simon Lepage, 2022-2025, Flexible representation learning, with Jérémie Mary at Criteo</li> <li>Nicolas Dufour, 2021-2025, Image and video editing using generative models, with Vicky Kalogeiton at École Polytechnique</li> <li>Natacha Luka, 2019-2024, Cross-modal Representation Learning, with Romain Negrel at ESIEE</li> </ul> <h3> Alumni </h3> <ul> <li><a href="">Yue Zhu</a>, 2024, Interactive 3D estimation of human posture in the working environment using deep neural networks</li> <li><a href="https://scholar.google.com/citations?hl=fr&user=oKWj7yQAAAAJ">Grégoire Petit</a>, 2020-2023, Exemplar-Free Class-Incremental Learning, with Adrian Popescu and Bertrand Delezoide at CEA</li> <li><a href="https://thibautissenhuth.github.io/">Thibaut Issenhuth</a>, 2023, Interactive Generative Models, with Jérémie Mary at Criteo</li> <li><a href="https://scholar.google.fr/citations?hl=en&user=n_C2h-QAAAAJ">Victor Besnier</a>, 2022, Safety in Deep Learning based Computer Vision, with Alexandre Briot and Andrei Bursuc at Valeo</li> <li><a href="https://perso-etis.ensea.fr/paumard/">Marie-Morgane Paumard</a>, 2020, Solving jigsaw puzzles with Deep Learning (with H. Tabia)</li> <li><a href="https://scholar.google.fr/citations?user=jsMHC9YAAAAJ&hl=fr">Pierre Jacob</a>, 2020, High-order statistics for representation learning (with A. Histace and E. Klein)</li> <li><a href="https://scholar.google.fr/citations?user=VZ1Q5v4AAAAJ&hl=fr">Diogo Luvizon</a>, 2019, 2D/3D Pose Estimation and Action Recognition (with H. Tabia), now at Samsung</li> <li><a href="https://scholar.google.fr/citations?user=U0y5G7gAAAAJ&hl=fr">Jérôme Fellus</a>, 2017, Machine learning using asynchronous gossip exchange (with P.H. Gosselin), now a postdoc at IRISA</li> <li><a href="https://perso.esiee.fr/~negrelr/index.php?page=home">Romain Negrel</a>, 2014, Representation learning for image retrieval (with P.H. Gosselin), now an associate professor at ESIEE</li> </ul> <h2> Recent publications </h2> <p> Full list: <a href="https://scholar.google.fr/citations?user=YtF6k9AAAAAJ&hl=en">scholar</a> / <a href="https://dblp.uni-trier.de/pers/hd/p/Picard:David.html">dblp</a> / <a href="https://cv.archives-ouvertes.fr/david-picard">hal</a> </p> <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/cad.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Don't drop your samples! 
Coherence-aware training benefits conditional diffusion</h3> <br> Nicolas Dufour, Victor Besnier, Vicky Kalogeiton, David Picard <br> <em>CVPR</em>, 2024 <br> <a href="https://arxiv.org/abs/2405.20324">arxiv</a> / <a href="https://github.com/nicolas-dufour/CAD">code</a> / <p></p> <p>We propose Coherence-Aware Diffusion (CAD), a novel method that integrates the coherence of the conditional information into diffusion models, allowing them to learn from noisy annotations without discarding data. We assume that each data point has an associated coherence score that reflects the quality of the conditional information. We then condition the diffusion model on both the conditional information and the coherence score. In this way, the model learns to ignore or discount the conditioning when the coherence is low.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/efcil.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning</h3> <br> Grégoire Petit, Michaël Soumm, Eva Feillet, Adrian Popescu, Bertrand Delezoide, David Picard, Céline Hudelot <br> <em>WACV</em>, 2024 <br> <a href="https://arxiv.org/abs/2308.11677">arxiv</a> / <p></p> <p>Class-Incremental Learning (CIL) aims to build classification models from data streams. At each step of the CIL process, new classes must be integrated into the model. Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored, the case on which we focus here. To date, most approaches are based exclusively on the target dataset of the CIL process. However, the use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum. The initial model of the CIL process may only use the first batch of the target dataset, or also use pre-trained weights obtained on an auxiliary dataset. The choice between these two initial learning strategies can significantly influence the performance of the incremental learning model, but it has not yet been studied in depth. Performance is also influenced by the choice of the CIL algorithm, the neural architecture, the nature of the target task, the distribution of classes in the stream, and the number of examples available for learning. We conduct a comprehensive experimental study to assess the roles of these factors.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/h3wb.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>H3WB: Human3.6M 3D WholeBody Dataset and Benchmark</h3> <br> Yue Zhu, Nermin Samet, David Picard <br> <em>ICCV</em>, 2023 <br> <a href="https://arxiv.org/abs/2211.15692">arxiv</a> / <a href="https://github.com/wholebody3d/wholebody3d">code</a> / <p></p> <p>3D human whole-body pose estimation aims to localize precise 3D keypoints on the entire human body, including the face, hands, body, and feet. We introduce Human3.6M 3D WholeBody (H3WB), which provides whole-body annotations for the Human3.6M dataset using the COCO Wholebody layout. H3WB is a large-scale dataset with 133 whole-body keypoint annotations on 100K images, made possible by our new multi-view pipeline. 
Along with H3WB, we propose 3 tasks: i) 3D whole-body pose lifting from a complete 2D whole-body pose, ii) 3D whole-body pose lifting from an incomplete 2D whole-body pose, and iii) 3D whole-body pose estimation from a single RGB image. We also report several baselines from popular methods for these tasks.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/simplicial.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Unveiling the Latent Space Geometry of Push-Forward Generative Models</h3> <br> Thibaut Issenhuth, Ugo Tanielian, Jeremie Mary, David Picard <br> <em>ICML</em>, 2023 <br> <a href="https://arxiv.org/abs/2207.10541">arxiv</a> / <p></p> <p>Many deep generative models are defined as a push-forward of a Gaussian measure by a continuous generator, such as Generative Adversarial Networks (GANs) or Variational Auto-Encoders (VAEs). This work explores the latent space of such deep generative models. A key issue with these models is their tendency to output samples outside of the support of the target distribution when learning disconnected distributions. We investigate the relationship between the performance of these models and the geometry of their latent space. Building on recent developments in geometric measure theory, we prove a sufficient condition for optimality in the case where the dimension of the latent space is larger than the number of modes. Through experiments on GANs, we demonstrate the validity of our theoretical results and gain new insights into the latent space geometry of these models. Additionally, we propose a truncation method that enforces a simplicial cluster structure in the latent space and improves the performance of GANs.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/sppnet.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>SSP-Net: Scalable sequential pyramid networks for real-time 3D human pose regression</h3> <br> Diogo C Luvizon, Hedi Tabia, David Picard <br> <em>Pattern Recognition</em>, 2023 <br> <a href="https://arxiv.org/abs/2009.01998">arxiv</a> / <p></p> <p>In this paper, we propose a highly scalable, end-to-end trainable convolutional neural network for real-time 3D human pose regression from still RGB images. We call this approach the Scalable Sequential Pyramid Networks (SSP-Net), as it is trained with refined supervision at multiple scales in a sequential manner. Our network requires a single training procedure and is capable of producing its best predictions at 120 frames per second (FPS), or acceptable predictions at more than 200 FPS when cut at test time. 
We show that the proposed regression approach is invariant to the size of the feature maps, allowing our method to perform multi-resolution intermediate supervision and to reach results comparable to the state of the art with very low-resolution feature maps.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/edibert.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>EdiBERT, a generative model for image editing</h3> <br> Thibaut Issenhuth, Ugo Tanielian, Jérémie Mary, David Picard <br> <em>TMLR</em>, 2023 <br> <a href="https://arxiv.org/abs/2111.15264">arxiv</a> / <a href="https://github.com/EdiBERT4ImageManipulation/EdiBERT">code</a> / <p></p> <p>In this paper, we take a step towards a unified approach for image editing. To do so, we propose EdiBERT, a bi-directional transformer trained in the discrete latent space built by a vector-quantized auto-encoder. We argue that such a bidirectional model is well suited for image manipulation, since any patch can be re-sampled conditioned on the whole image. Using this unique and straightforward training objective, the resulting model matches state-of-the-art performance on a wide variety of image manipulation tasks.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/decanus.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Decanus to Legatus: Synthetic training for 2D-3D human pose lifting</h3> <br> Yue Zhu, David Picard <br> <em>ACCV</em>, 2022 <br> <a href="https://arxiv.org/abs/2210.02231">arxiv</a> / <a href="https://github.com/Zhuyue0324/Decanus-to-Legatus">code</a> / <p></p> <p>We propose an algorithm to generate infinite synthetic 3D human poses (Legatus) from a 3D pose distribution based on 10 initial handcrafted 3D poses (Decanus) during the training of a 2D-to-3D human pose lifter neural network. Our results show that we can achieve 3D pose estimation performance comparable to methods using real data from specialized datasets, but in a zero-shot setup, showing the generalization potential of our framework.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/scam.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>SCAM! Transferring Humans Between Images with Semantic Cross Attention Modulation</h3> <br> Nicolas Dufour, David Picard, Vicky Kalogeiton <br> <em>ECCV</em>, 2022 <br> <a href="https://arxiv.org/abs/2210.04883">arxiv</a> / <a href="https://github.com/nicolas-dufour/SCAM">code</a> / <p></p> <p>We introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with an emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder, which extracts multiple latent vectors for each semantic region, and the corresponding generator, which exploits these multiple latents using semantic cross attention modulation. It is trained using only a reconstruction setup, while subject transfer is performed at test time. 
Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/3dconsensus.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Consensus-based optimization for 3D human pose estimation in camera coordinates</h3> <br> Diogo C Luvizon, David Picard, Hedi Tabia <br> <em>International Journal of Computer Vision</em>, 2022 <br> <a href="https://arxiv.org/abs/1911.09245">arxiv</a> / <a href="https://github.com/dluvizon/3d-pose-consensus">code</a> / <a href="https://doi.org/10.1007/s11263-021-01570-9">doi</a> / <p></p> <p>3D human pose estimation is frequently seen as the task of estimating 3D poses relative to the root body joint. Alternatively, we propose a 3D human pose estimation method in camera coordinates, which allows effective combination of 2D annotated data and 3D poses and a straightforward multi-view generalization. To that end, we cast the problem as a view frustum space pose estimation, where absolute depth prediction and joint relative depth estimations are disentangled. Final 3D predictions are obtained in camera coordinates by the inverse camera projection. Based on this, we also present a consensus-based optimization algorithm for multi-view predictions from uncalibrated images, which requires a single monocular training procedure.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/latentrs.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Latent reweighting, an almost free improvement for GANs</h3> <br> Thibaut Issenhuth, Ugo Tanielian, David Picard, Jérémie Mary <br> <em>IEEE/CVF Winter Conference on Applications of Computer Vision</em>, 2022 <br> <a href="https://arxiv.org/abs/2110.09803">arxiv</a> / <p></p> <p>Standard formulations of GANs, where a continuous function deforms a connected latent space, have been shown to be misspecified when fitting different classes of images. In particular, the generator will necessarily sample some low-quality images in between the classes. Rather than modifying the architecture, a line of works aims at improving the sampling quality from pre-trained generators at the expense of increased computational cost. Building on this, we introduce an additional network to predict latent importance weights and two associated sampling methods to avoid the poorest samples.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/obsnet.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Triggering Failures: Out-of-Distribution Detection by Learning From Local Adversarial Attacks in Semantic Segmentation</h3> <br> Victor Besnier, Andrei Bursuc, David Picard, Alexandre Briot <br> <em>International Conference on Computer Vision</em>, 2021 <br> <a href="http://arxiv.org/abs/2108.01634">arxiv</a> / <a href="https://github.com/valeoai/obsnet">code</a> / <p></p> <p>In this paper, we tackle the detection of out-of-distribution (OOD) objects in semantic segmentation. 
By analyzing the literature, we found that current methods are either accurate or fast, but not both, which limits their usability in real-world applications. To get the best of both aspects, we propose to mitigate the common shortcomings by following four design principles: decoupling the OOD detection from the segmentation task, observing the entire segmentation network instead of just its output, generating training data for the OOD detector by leveraging blind spots in the segmentation network, and focusing the generated data on localized regions in the image to simulate OOD objects.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/icip21.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Learning Uncertainty for Safety-Oriented Semantic Segmentation in Autonomous Driving</h3> <br> Victor Besnier, David Picard, Alexandre Briot <br> <em>International Conference on Image Processing</em>, 2021 <br> <a href="https://arxiv.org/abs/2105.13688">arxiv</a> / <p></p> <p>In this paper, we show how uncertainty estimation can be leveraged to enable safety-critical image segmentation in autonomous driving, by triggering a fallback behavior if a target accuracy cannot be guaranteed. We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function. We propose to estimate this dissimilarity by training a deep neural architecture in parallel to the task-specific network. This allows the observer to be dedicated to uncertainty estimation, while letting the task-specific network make predictions. We propose to use self-supervision to train the observer, which implies that our method does not require additional training data.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/prl_jacob.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>DIABLO: Dictionary-based attention block for deep metric learning</h3> <br> Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein <br> <em>Pattern Recognition Letters</em>, 2020 <br> <a href="http://arxiv.org/abs/2004.14644">arxiv</a> / <a href="https://doi.org/10.1016/j.patrec.2020.03.020">doi</a> / <p></p> <p>In this paper, we propose DIABLO, a dictionary-based attention method for image embedding. DIABLO produces richer representations by aggregating only visually-related features together while being easier to train than other attention-based methods in deep metric learning. 
This is experimentally confirmed on four deep metric learning datasets (Cub-200-2011, Cars-196, Stanford Online Products, and In-Shop Clothes Retrieval), for which DIABLO shows state-of-the-art performance.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/tpami_luvizon.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition</h3> <br> Diogo Luvizon, David Picard, Hedi Tabia <br> <em>IEEE Transactions on Pattern Analysis and Machine Intelligence</em>, 2020 <br> <a href="https://arxiv.org/abs/1912.08077">arxiv</a> / <a href="https://doi.org/10.1109/TPAMI.2020.2976014">doi</a> / <p></p> <p>In this work, we propose a multi-task framework for jointly estimating 2D or 3D human poses from monocular color images and classifying human actions from video sequences. We show that a single architecture can be used to solve both problems efficiently and still achieve state-of-the-art or comparable results at each task, while running with a throughput of more than 100 frames per second. The proposed method benefits from a high degree of parameter sharing between the two tasks by unifying the processing of still images and video clips in a single pipeline, allowing the model to be trained seamlessly with data from different categories at the same time.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/deepzzle.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Deepzzle: Solving Visual Jigsaw Puzzles with Deep Learning and Shortest Path Optimization</h3> <br> Marie-Morgane Paumard, David Picard, Hedi Tabia <br> <em>IEEE Transactions on Image Processing</em>, 2020 <br> <a href="https://doi.org/10.1109/TIP.2019.2963378">doi</a> / <p></p> <p>We tackle the image reassembly problem with wide spaces between the fragments, such that the continuity of patterns and colors is mostly unusable. The spacing emulates the erosion that archaeological fragments suffer. We use a two-step method to obtain the reassemblies: 1) a neural network predicts the positions of the fragments despite the gaps between them; 2) a graph leading to the best reassemblies is built from these predictions.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/cag.png" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Human pose regression by combining indirect part detection and contextual information</h3> <br> Diogo Luvizon, David Picard, Hedi Tabia <br> <em>Computers and Graphics</em>, 2019 <br> <a href="https://arxiv.org/abs/1710.02322">arxiv</a> / <a href="https://github.com/dluvizon/pose-regression">code</a> / <a href="https://doi.org/10.1016/j.cag.2019.09.002">doi</a> / <p></p> <p>In this paper, we tackle the problem of human pose estimation from still images, a very active topic, especially due to its many applications, ranging from image annotation to human-machine interfaces. We use the soft-argmax function to convert feature maps directly to body joint coordinates, resulting in a fully differentiable framework. 
Our method is able to learn heat map representations indirectly, without additional steps of artificial ground-truth generation.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/horde.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings</h3> <br> Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein <br> <em>International Conference on Computer Vision</em>, 2019 <br> <a href="https://arxiv.org/abs/1908.02735">arxiv</a> / <a href="https://github.com/pierre-jacob/ICCV2019-Horde">code</a> / <p></p> <p>In this paper, we tackle the scattering problem of deep features with a distribution-aware regularization named HORDE. This regularizer enforces visually-close images to have deep features with the same distribution, well localized in the feature space. We provide a theoretical analysis supporting this regularization effect. We also show the effectiveness of our approach by obtaining state-of-the-art results on 4 well-known datasets (Cub-200-2011, Cars-196, Stanford Online Products and In-Shop Clothes Retrieval).</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/jcf.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Efficient Codebook and Factorization for Second Order Representation Learning</h3> <br> Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein <br> <em>International Conference on Image Processing</em>, 2019 <br> <a href="https://arxiv.org/abs/1906.01972">arxiv</a> / <p></p> <p>To build richer representations, high-order statistics have been exploited and have shown excellent performance, but they produce higher-dimensional features. While this drawback has been partially addressed with factorization schemes, the original compactness of first-order models has never been recovered, except at the cost of a strong performance decrease. Our method, by jointly integrating a codebook strategy into the factorization scheme, is able to produce compact representations while keeping second-order performance with few additional parameters.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/gosgd.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Distributed optimization for deep learning with gossip exchange</h3> <br> Michael Blot, David Picard, Nicolas Thome, Matthieu Cord <br> <em>Neurocomputing</em>, 2019 <br> <a href="https://arxiv.org/abs/1804.01852">arxiv</a> / <a href="https://doi.org/10.1016/j.neucom.2018.11.002">doi</a> / <p></p> <p>We address the issue of speeding up the training of convolutional neural networks by studying a distributed method adapted to stochastic gradient descent. Our parallel optimization setup uses several threads, each applying individual gradient descents on a local variable. We propose a new way of sharing information between different threads based on gossip algorithms that show good consensus convergence properties. 
Our method, called GoSGD, has the advantage of being fully asynchronous and decentralized.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/ista.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Leveraging Implicit Spatial Information in Global Features for Image Retrieval</h3> <br> Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein <br> <em>International Conference on Image Processing</em>, 2018 <br> <a href="https://arxiv.org/abs/1806.08991">arxiv</a> / <a href="https://doi.org/10.1109/ICIP.2018.8451817">doi</a> / <p></p> <p>Most image retrieval methods use global features that aggregate local distinctive patterns into a single representation. However, the aggregation process destroys the relative spatial information by considering orderless sets of local descriptors. We propose to integrate relative spatial information into the aggregation process by taking into account co-occurrences of local patterns in a tensor framework.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/icip_puz.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Jigsaw Puzzle Solving Using Local Feature Co-Occurrences in Deep Neural Networks</h3> <br> Marie-Morgane Paumard, David Picard, Hedi Tabia <br> <em>International Conference on Image Processing</em>, 2018 <br> <a href="https://arxiv.org/abs/1807.03155">arxiv</a> / <a href="https://doi.org/10.1109/ICIP.2018.8451094">doi</a> / <p></p> <p>Archaeologists are in dire need of automated object reconstruction methods. Fragment reassembly is close to jigsaw puzzle solving, which may be addressed by computer vision algorithms. As classical computer vision algorithms are often beaten on most image-related tasks by deep learning, we study a deep classification method that can solve jigsaw puzzles. In this paper, we focus on classifying the relative position: given a pair of fragments, we compute their local relation (e.g., on top). We propose several enhancements over the state of the art in this domain, which our method outperforms by 25%.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/eccv_puz.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Image Reassembly Combining Deep Learning and Shortest Path Problem</h3> <br> Marie-Morgane Paumard, David Picard, Hedi Tabia <br> <em>European Conference on Computer Vision</em>, 2018 <br> <a href="https://arxiv.org/abs/1809.00898">arxiv</a> / <a href="https://doi.org/10.1007/978-3-030-01231-1_10">doi</a> / <p></p> <p>This paper addresses the problem of reassembling images from disjointed fragments. More specifically, given an unordered set of fragments, we aim at reassembling one or several possibly incomplete images. 
The main contributions of this work are: (1) several deep neural architectures to predict the relative position of image fragments that outperform the previous state of the art; (2) casting the reassembly problem as a shortest-path problem in a graph, for which we provide several construction algorithms depending on the available information; (3) a new dataset of images taken from the Metropolitan Museum of Art (MET) dedicated to image reassembly, for which we provide a clear setup and a strong baseline.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/sigir18.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings</h3> <br> Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, Matthieu Cord <br> <em>ACM SIGIR Conference on Research and Development in Information Retrieval</em>, 2018 <br> <a href="https://arxiv.org/abs/1804.11146">arxiv</a> / <a href="https://doi.org/10.1145/3209978.3210036">doi</a> / <p></p> <p>Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an effective learning scheme, capable of tackling large-scale problems, and validate it on the Recipe1M dataset containing nearly 1 million picture-recipe pairs. We show the effectiveness of our approach against previous state-of-the-art models and present qualitative results on computational cooking use cases.</p> </td> </tr> <tr> <td style="padding:2.5%;width:35%;vertical-align:middle;min-width:120px"> <img src="/images/cvpr18.jpg" alt="project image" style="width:auto; height:auto; max-width:100%;" /> </td> <td style="padding:2.5%;width:65%;vertical-align:middle"> <h3>2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning</h3> <br> Diogo Luvizon, David Picard, Hedi Tabia <br> <em>IEEE/CVF Conference on Computer Vision and Pattern Recognition</em>, 2018 <br> <a href="https://arxiv.org/abs/1802.09232">arxiv</a> / <a href="https://github.com/dluvizon/deephar">code</a> / <a href="https://doi.org/10.1109/CVPR.2018.00539">doi</a> / <p></p> <p>Action recognition and human pose estimation are closely related, but both problems are generally handled as distinct tasks in the literature. In this work, we propose a multitask framework for joint 2D and 3D pose estimation from still images and human action recognition from video sequences. We show that a single architecture can be used to solve the two problems in an efficient way and still achieve state-of-the-art results. Additionally, we demonstrate that end-to-end optimization leads to significantly higher accuracy than separate learning. 
The proposed architecture can be trained simultaneously and seamlessly with data from different categories.</p> </td> </tr> </table> <br> <br> <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"> <tr> <td style="padding:0px"> <br> <p style="text-align:center;font-size:small;"> Design and source code from <a style="font-size:small;" href="https://jonbarron.info">Jon Barron's website</a> </p> </td> </tr> </table> </td> </tr> </table> </body> </html>