Jingliang Hu
View Jingliang Hu's papers and open-source code.

Papers:
- CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention (Mar 20, 2025). Authors: Yaxiong Chen, Minghong Wei, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou
- UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation (Mar 20, 2025). Authors: Yaxiong Chen, Chuang Du, Chunlei Li, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou
- One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks (Mar 19, 2025). Authors: Yaxiong Chen, Junjian Hu, Chunlei Li, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou
- Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models (Mar 19, 2025). Authors: Tingxiu Chen, Yilei Shi, Zixuan Zheng, Bingcong Yan, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou
- Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning (Mar 19, 2025). Authors: Zixuan Zheng, Yilei Shi, Chunlei Li, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou
- Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection (Mar 18, 2025). Authors: Chunlei Li, Yilei Shi, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou
- Rethinking Cell Counting Methods: Decoupling Counting and Localization (Mar 18, 2025). Authors: Zixuan Zheng, Yilei Shi, Chunlei Li, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou
- Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation (Mar 18, 2025). Authors: Yaxiong Chen, Yujie Wang, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou
- DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation (Mar 23, 2022). Authors: Aysim Toker, Lukas Kondmann, Mark Weber, Marvin Eisenberger, Andrés Camero, Jingliang Hu, Ariadna Pregel Hoderlein, Çağlar Şenaras, Timothy Davis, Daniel Cremers, Giovanni Marchisio, Xiao Xiang Zhu, Laura Leal-Taixé
- Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model (May 21, 2021). Authors: Danfeng Hong, Jingliang Hu, Jing Yao, Jocelyn Chanussot, Xiao Xiang Zhu
data-n-g=""/><link rel="preload" href="/_next/static/css/5e125fcc6dc1ef5f.css" as="style"/><link rel="stylesheet" href="/_next/static/css/5e125fcc6dc1ef5f.css" data-n-p=""/><noscript data-n-css=""></noscript><script defer="" nomodule="" src="/_next/static/chunks/polyfills-78c92fac7aa8fdd8.js"></script><script data-partytown="">!(function(w,p,f,c){if(!window.crossOriginIsolated && !navigator.serviceWorker) return;c=w[p]=w[p]||{};c[f]=(c[f]||[])})(window,'partytown','forward');/* Partytown 0.10.2 - MIT builder.io */ const t={preserveBehavior:!1},e=e=>{if("string"==typeof e)return[e,t];const[n,r=t]=e;return[n,{...t,...r}]},n=Object.freeze((t=>{const e=new Set;let n=[];do{Object.getOwnPropertyNames(n).forEach((t=>{"function"==typeof n[t]&&e.add(t)}))}while((n=Object.getPrototypeOf(n))!==Object.prototype);return Array.from(e)})());!function(t,r,o,i,a,s,c,d,l,p,u=t,f){function h(){f||(f=1,"/"==(c=(s.lib||"/~partytown/")+(s.debug?"debug/":""))[0]&&(l=r.querySelectorAll('script[type="text/partytown"]'),i!=t?i.dispatchEvent(new CustomEvent("pt1",{detail:t})):(d=setTimeout(v,1e4),r.addEventListener("pt0",w),a?y(1):o.serviceWorker?o.serviceWorker.register(c+(s.swPath||"partytown-sw.js"),{scope:c}).then((function(t){t.active?y():t.installing&&t.installing.addEventListener("statechange",(function(t){"activated"==t.target.state&&y()}))}),console.error):v())))}function y(e){p=r.createElement(e?"script":"iframe"),t._pttab=Date.now(),e||(p.style.display="block",p.style.width="0",p.style.height="0",p.style.border="0",p.style.visibility="hidden",p.setAttribute("aria-hidden",!0)),p.src=c+"partytown-"+(e?"atomics.js?v=0.10.2":"sandbox-sw.html?"+t._pttab),r.querySelector(s.sandboxParent||"body").appendChild(p)}function v(n,o){for(w(),i==t&&(s.forward||[]).map((function(n){const[r]=e(n);delete t[r.split(".")[0]]})),n=0;n<l.length;n++)(o=r.createElement("script")).innerHTML=l[n].innerHTML,o.nonce=s.nonce,r.head.appendChild(o);p&&p.parentNode.removeChild(p)}function w(){clearTimeout(d)}s=t.partytown||{},i==t&&(s.forward||[]).map((function(r){const[o,{preserveBehavior:i}]=e(r);u=t,o.split(".").map((function(e,r,o){var a;u=u[o[r]]=r+1<o.length?u[o[r]]||(a=o[r+1],n.includes(a)?[]:{}):(()=>{let e=null;if(i){const{methodOrProperty:n,thisObject:r}=((t,e)=>{let n=t;for(let t=0;t<e.length-1;t+=1)n=n[e[t]];return{thisObject:n,methodOrProperty:e.length>0?n[e[e.length-1]]:void 0}})(t,o);"function"==typeof n&&(e=(...t)=>n.apply(r,...t))}return function(){let n;return e&&(n=e(arguments)),(t._ptf=t._ptf||[]).push(o,arguments),n}})()}))})),"complete"==r.readyState?h():(t.addEventListener("DOMContentLoaded",h),t.addEventListener("load",h))}(window,document,navigator,top,window.crossOriginIsolated);</script><script src="https://www.googletagmanager.com/gtag/js?id=G-BD14FTHPNC" type="text/partytown" data-nscript="worker"></script><script defer="" src="/_next/static/chunks/336.311897441b58c7f9.js"></script><script defer="" src="/_next/static/chunks/501.e875b519017f5a6c.js"></script><script src="/_next/static/chunks/webpack-74a7c512fa42fc69.js" defer=""></script><script src="/_next/static/chunks/main-819661c54c38eafc.js" defer=""></script><script src="/_next/static/chunks/pages/_app-7f9dc6693ce04520.js" defer=""></script><script src="/_next/static/chunks/117-cbf0dd2a93fca997.js" defer=""></script><script src="/_next/static/chunks/602-80e933e094e77991.js" defer=""></script><script src="/_next/static/chunks/947-ca6cb45655821eab.js" defer=""></script><script src="/_next/static/chunks/403-8b84e5049c16d49f.js" defer=""></script><script 
src="/_next/static/chunks/460-cfc8c96502458833.js" defer=""></script><script src="/_next/static/chunks/68-8acf76971c46bf47.js" defer=""></script><script src="/_next/static/chunks/pages/author/%5Bname%5D-42289ca93c12b4ed.js" defer=""></script><script src="/_next/static/rcP1HS6ompi8ywYpLW-WW/_buildManifest.js" defer=""></script><script src="/_next/static/rcP1HS6ompi8ywYpLW-WW/_ssgManifest.js" defer=""></script><style data-href="https://fonts.googleapis.com/css2?family=Lato:wght@300;400;700&display=swap">@font-face{font-family:'Lato';font-style:normal;font-weight:300;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6u9w4BMUTPHh7USeww.woff) format('woff')}@font-face{font-family:'Lato';font-style:normal;font-weight:400;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6uyw4BMUTPHvxo.woff) format('woff')}@font-face{font-family:'Lato';font-style:normal;font-weight:700;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6u9w4BMUTPHh6UVeww.woff) format('woff')}@font-face{font-family:'Lato';font-style:normal;font-weight:300;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6u9w4BMUTPHh7USSwaPGQ3q5d0N7w.woff2) format('woff2');unicode-range:U+0100-02AF,U+0304,U+0308,U+0329,U+1E00-1E9F,U+1EF2-1EFF,U+2020,U+20A0-20AB,U+20AD-20C0,U+2113,U+2C60-2C7F,U+A720-A7FF}@font-face{font-family:'Lato';font-style:normal;font-weight:300;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6u9w4BMUTPHh7USSwiPGQ3q5d0.woff2) format('woff2');unicode-range:U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,U+0304,U+0308,U+0329,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD}@font-face{font-family:'Lato';font-style:normal;font-weight:400;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6uyw4BMUTPHjxAwXiWtFCfQ7A.woff2) format('woff2');unicode-range:U+0100-02AF,U+0304,U+0308,U+0329,U+1E00-1E9F,U+1EF2-1EFF,U+2020,U+20A0-20AB,U+20AD-20C0,U+2113,U+2C60-2C7F,U+A720-A7FF}@font-face{font-family:'Lato';font-style:normal;font-weight:400;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6uyw4BMUTPHjx4wXiWtFCc.woff2) format('woff2');unicode-range:U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,U+0304,U+0308,U+0329,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD}@font-face{font-family:'Lato';font-style:normal;font-weight:700;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6u9w4BMUTPHh6UVSwaPGQ3q5d0N7w.woff2) format('woff2');unicode-range:U+0100-02AF,U+0304,U+0308,U+0329,U+1E00-1E9F,U+1EF2-1EFF,U+2020,U+20A0-20AB,U+20AD-20C0,U+2113,U+2C60-2C7F,U+A720-A7FF}@font-face{font-family:'Lato';font-style:normal;font-weight:700;font-display:swap;src:url(https://fonts.gstatic.com/s/lato/v24/S6u9w4BMUTPHh6UVSwiPGQ3q5d0.woff2) format('woff2');unicode-range:U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,U+0304,U+0308,U+0329,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD}</style></head><body><div id="__next"><script id="google-analytics" type="text/partytown"> window.dataLayer = window.dataLayer || []; window.gtag = function gtag(){window.dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-BD14FTHPNC', { page_path: window.location.pathname, }); </script><script type="text/partytown"> const MIXPANEL_CUSTOM_LIB_URL = 'https://www.catalyzex.com/mp-cdn/libs/mixpanel-2-latest.min.js'; (function(f,b){if(!b.__SV){var e,g,i,h;window.mixpanel=b;b._i=[];b.init=function(e,f,c){function g(a,d){var 
b=d.split(".");2==b.length&&(a=a[b[0]],d=b[1]);a[d]=function(){a.push([d].concat(Array.prototype.slice.call(arguments,0)))}}var a=b;"undefined"!==typeof c?a=b[c]=[]:c="mixpanel";a.people=a.people||[];a.toString=function(a){var d="mixpanel";"mixpanel"!==c&&(d+="."+c);a||(d+=" (stub)");return d};a.people.toString=function(){return a.toString(1)+".people (stub)"};i="disable time_event track track_pageview track_links track_forms track_with_groups add_group set_group remove_group register register_once alias unregister identify name_tag set_config reset opt_in_tracking opt_out_tracking has_opted_in_tracking has_opted_out_tracking clear_opt_in_out_tracking start_batch_senders people.set people.set_once people.unset people.increment people.append people.union people.track_charge people.clear_charges people.delete_user people.remove".split(" "); for(h=0;h<i.length;h++)g(a,i[h]);var j="set set_once union unset remove delete".split(" ");a.get_group=function(){function b(c){d[c]=function(){call2_args=arguments;call2=[c].concat(Array.prototype.slice.call(call2_args,0));a.push([e,call2])}}for(var d={},e=["get_group"].concat(Array.prototype.slice.call(arguments,0)),c=0;c<j.length;c++)b(j[c]);return d};b._i.push([e,f,c])};b.__SV=1.2;e=f.createElement("script");e.type="text/javascript";e.async=!0;e.src="undefined"!==typeof MIXPANEL_CUSTOM_LIB_URL? MIXPANEL_CUSTOM_LIB_URL:"file:"===f.location.protocol&&"//catalyzex.com/mp-cdn/libs/mixpanel-2-latest.min.js".match(/^\/\//)?"https://www.catalyzex.com/mp-cdn/libs/mixpanel-2-latest.min.js":"//catalyzex.com/mp-cdn/libs/mixpanel-2-latest.min.js";g=f.getElementsByTagName("script")[0];g.parentNode.insertBefore(e,g)}})(document,window.mixpanel||[]); mixpanel.init("851392464b60e8cc1948a193642f793b", { api_host: "https://www.catalyzex.com/mp", }) manuallySyncMixpanelId = function(currentMixpanelId) { const inMemoryProps = mixpanel?.persistence?.props if (inMemoryProps) { inMemoryProps['distinct_id'] = currentMixpanelId inMemoryProps['$device_id'] = currentMixpanelId delete inMemoryProps['$user_id'] } } </script><div class="Layout_layout-container__GqQwY"><div><div data-testid="banner-main-container" id="Banner_banner-main-container__DgEOW" class="cx-banner"><span class="Banner_content__a4ws8 Banner_default-content___HRmT">Get our free extension to see links to code for papers anywhere online!</span><span class="Banner_content__a4ws8 Banner_small-content__iQlll">Free add-on: code for papers everywhere!</span><span class="Banner_content__a4ws8 Banner_extra-small-content__qkq9E">Free add-on: See code for papers anywhere!</span><div class="Banner_banner-button-section__kX1fj"><a class="Banner_banner-social-button__b3sZ7 Banner_browser-button__6CbLf" href="https://chrome.google.com/webstore/detail/%F0%9F%92%BB-catalyzex-link-all-aim/aikkeehnlfpamidigaffhfmgbkdeheil" rel="noreferrer" target="_blank"><p><img src="/static/images/google-chrome.svg" alt="Chrome logo"/>Add to <!-- -->Chrome</p></a><a class="Banner_firefox-button__nwnR6 Banner_banner-social-button__b3sZ7 Banner_browser-button__6CbLf" href="https://addons.mozilla.org/en-US/firefox/addon/code-finder-catalyzex" rel="noreferrer" target="_blank"><p><img src="/static/images/firefox.svg" alt="Firefox logo"/>Add to <!-- -->Firefox</p></a><a class="Banner_banner-social-button__b3sZ7 Banner_browser-button__6CbLf" href="https://microsoftedge.microsoft.com/addons/detail/get-papers-with-code-ever/mflbgfojghoglejmalekheopgadjmlkm" rel="noreferrer" target="_blank"><p><img src="/static/images/microsoft-edge.svg" alt="Edge 
logo"/>Add to <!-- -->Edge</p></a></div><div id="Banner_banner-close-button__68_52" class="banner-close-button" data-testid="banner-close-icon" role="button" tabindex="0" aria-label="Home"><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" aria-hidden="true" data-slot="icon" height="22" width="22" color="black"><path stroke-linecap="round" stroke-linejoin="round" d="m9.75 9.75 4.5 4.5m0-4.5-4.5 4.5M21 12a9 9 0 1 1-18 0 9 9 0 0 1 18 0Z"></path></svg></div></div></div><section data-hydration-on-demand="true"></section><div data-testid="header-main-container" class="Header_navbar__bVRQt"><nav><div><a class="Header_navbar-brand__9oFe_" href="/"><svg version="1.0" xmlns="http://www.w3.org/2000/svg" width="466.000000pt" height="466.000000pt" viewBox="0 0 466.000000 466.000000" preserveAspectRatio="xMidYMid meet" data-testid="catalyzex-header-icon"><title>CatalyzeX Icon</title><g transform="translate(0.000000,466.000000) scale(0.100000,-0.100000)" fill="#000000" stroke="none"><path d="M405 3686 c-42 -18 -83 -69 -92 -114 -4 -20 -8 -482 -8 -1027 l0 -990 25 -44 c16 -28 39 -52 65 -65 38 -20 57 -21 433 -24 444 -3 487 1 538 52 18 18 37 50 43 71 7 25 11 154 11 343 l0 302 -165 0 -165 0 0 -240 0 -240 -225 0 -225 0 0 855 0 855 225 0 225 0 0 -225 0 -225 166 0 165 0 -3 308 c-3 289 -4 309 -24 342 -11 19 -38 45 -60 57 -39 23 -42 23 -469 22 -335 0 -437 -3 -460 -13z"></path><path d="M1795 3686 c-16 -7 -38 -23 -48 -34 -47 -52 -46 -27 -47 -1262 0 -808 3 -1177 11 -1205 14 -50 63 -102 109 -115 19 -5 142 -10 273 -10 l238 0 -3 148 -3 147 -125 2 c-69 0 -135 1 -147 2 l-23 1 0 1025 0 1025 150 0 150 0 0 145 0 145 -252 0 c-188 -1 -261 -4 -283 -14z"></path><path d="M3690 3555 l0 -145 155 0 155 0 0 -1025 0 -1025 -27 0 c-16 -1 -84 -2 -153 -3 l-125 -2 -3 -148 -3 -148 258 3 c296 3 309 7 351 88 l22 45 -2 1202 c-3 1196 -3 1202 -24 1229 -11 15 -33 37 -48 48 -26 20 -43 21 -292 24 l-264 3 0 -146z"></path><path d="M2520 2883 c0 -5 70 -164 156 -356 l157 -347 -177 -374 c-97 -205 -176 -376 -176 -380 0 -3 77 -5 171 -4 l172 3 90 228 c49 125 93 227 97 227 4 0 47 -103 95 -230 l87 -230 174 0 c96 0 174 2 174 3 0 2 -79 172 -175 377 -96 206 -175 378 -175 382 0 8 303 678 317 701 2 4 -70 7 -161 7 l-164 0 -83 -210 c-45 -115 -85 -210 -89 -210 -4 0 -43 95 -86 210 l-79 210 -162 0 c-90 0 -163 -3 -163 -7z"></path></g></svg></a></div></nav></div><div class="Author_author-container__7mxgD"><div class="Searchbar_search-bar-container__xIN4L rounded-border Author_searchbar-component__RYtrU" id="searchbar-component"><form class="Searchbar_search-bar-container__xIN4L" data-testid="search-bar-form"><div><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" aria-hidden="true" data-slot="icon" height="22"><title>Search Icon</title><path stroke-linecap="round" stroke-linejoin="round" d="m21 21-5.197-5.197m0 0A7.5 7.5 0 1 0 5.196 5.196a7.5 7.5 0 0 0 10.607 10.607Z"></path></svg><input class="form-control Searchbar_search-field__L9Oaa" type="text" id="search-field" name="search" required="" autoComplete="off" placeholder="Type here to search" value=""/><button class="Searchbar_filter-icon-container__qAKJN" type="button" title="search by advanced filters like language/framework, computational requirement, dataset, use case, hardware, etc."><div class="Searchbar_pulse1__6sv_E"></div><img alt="Alert button" fetchpriority="high" width="512" height="512" decoding="async" data-nimg="1" class="Searchbar_filter-icon__0rBbt" style="color:transparent" 
srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ffilter.cf288982.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ffilter.cf288982.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ffilter.cf288982.png&w=1080&q=75"/></button></div></form></div><section data-hydration-on-demand="true"></section><div class="Author_author-info-container__s16bv"><img alt="Picture for Jingliang Hu" fetchpriority="high" width="125" height="125" decoding="async" data-nimg="1" class="Author_author-avatar___oMPi" style="color:transparent" srcSet="/_next/image?url=https%3A%2F%2Fui-avatars.com%2Fapi%2F%3Fname%3DJingliang%20Hu%26background%3DBCBCBC%26color%3D3E3E3F%26size%3D256%26bold%3Dtrue%26font-size%3D0.5%26format%3Dpng&w=128&q=75 1x, /_next/image?url=https%3A%2F%2Fui-avatars.com%2Fapi%2F%3Fname%3DJingliang%20Hu%26background%3DBCBCBC%26color%3D3E3E3F%26size%3D256%26bold%3Dtrue%26font-size%3D0.5%26format%3Dpng&w=256&q=75 2x" src="/_next/image?url=https%3A%2F%2Fui-avatars.com%2Fapi%2F%3Fname%3DJingliang%20Hu%26background%3DBCBCBC%26color%3D3E3E3F%26size%3D256%26bold%3Dtrue%26font-size%3D0.5%26format%3Dpng&w=256&q=75"/><div class="Author_author-text-info__5qK6g"><div class="Author_author-name__2fRuf"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" aria-hidden="true" data-slot="icon" height="20"><path fill-rule="evenodd" d="M7.5 6a4.5 4.5 0 1 1 9 0 4.5 4.5 0 0 1-9 0ZM3.751 20.105a8.25 8.25 0 0 1 16.498 0 .75.75 0 0 1-.437.695A18.683 18.683 0 0 1 12 22.5c-2.786 0-5.433-.608-7.812-1.7a.75.75 0 0 1-.437-.695Z" clip-rule="evenodd"></path></svg><h1>Jingliang Hu</h1><div class="wrapper Author_author-alert-button__u4klv"><button class="AlertButton_alert-btn__pC8cK" title="Get notified when a new paper is added by the author"><img alt="Alert button" id="alert_btn" loading="lazy" width="512" height="512" decoding="async" data-nimg="1" class="alert-btn-image " style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75"/></button><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 106 34" style="margin-left:1px"><g class="sparkles"><path style="animation:sparkle 2s 0s infinite ease-in-out" d="M15.5740361 -10.33344622s1.1875777-6.20179466 2.24320232 0c0 0 5.9378885 1.05562462 0 2.11124925 0 0-1.05562463 6.33374774-2.24320233 0-3.5627331-.6597654-3.29882695-1.31953078 0-2.11124925z"></path><path style="animation:sparkle 1.5s 0.9s infinite ease-in-out" d="M33.5173993 39.97263826s1.03464615-5.40315215 1.95433162 0c0 0 5.17323078.91968547 0 1.83937095 0 0-.91968547 5.51811283-1.95433162 0-3.10393847-.57480342-2.8740171-1.14960684 0-1.83937095z"></path><path style="animation:sparkle 1.7s 0.4s infinite ease-in-out" d="M55.03038108 1.71240809s.73779281-3.852918 1.39360864 0c0 0 3.68896404.65581583 0 1.31163166 0 0-.65581583 3.93489497-1.39360864 0-2.21337842-.4098849-2.04942447-.81976979 0-1.31163166z"></path></g></svg></div></div></div></div><div><section data-testid="paper-details-container" class="Search_paper-details-container__Dou2Q"><h2 class="Search_paper-heading__bq58c"><a data-testid="paper-result-title" href="/paper/causalclipseg-unlocking-clip-s-potential-in"><strong>CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention</strong></a></h2><div 
class="Search_buttons-container__WWw_l"><a href="#" target="_blank" id="request-code-2503.15949" data-testid="view-code-button" class="Search_view-code-link__xOgGF"><button type="button" class="btn Search_view-button__D5D2K Search_buttons-spacing__iB2NS Search_black-button__O7oac Search_view-code-button__8Dk6Z"><svg role="img" height="14" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#fff"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg>View Code</button></a><button type="button" class="btn Search_view-button__D5D2K Search_black-button__O7oac Search_buttons-spacing__iB2NS"><svg fill="#fff" height="20" viewBox="0 0 48 48" width="20" xmlns="http://www.w3.org/2000/svg"><title>Play Icon</title><path d="M0 0h48v48H0z" fill="none"></path><path d="M24 4C12.95 4 4 12.95 4 24s8.95 20 20 20 20-8.95 20-20S35.05 4 24 4zm-4 29V15l12 9-12 9z"></path></svg>Notebook</button><button type="button" class="Search_buttons-spacing__iB2NS Search_related-code-btn__F5B3X" data-testid="related-code-button"><span class="descriptor" style="display:none">Code for Similar Papers:</span><img alt="Code for Similar Papers" title="View code for similar papers" loading="lazy" width="37" height="35" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75"/></button><a class="Search_buttons-spacing__iB2NS Search_add-code-button__GKwQr" target="_blank" href="/add_code?title=CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention&paper_url=http://arxiv.org/abs/2503.15949" rel="nofollow"><img alt="Add code" title="Contribute your code for this paper to the community" loading="lazy" width="36" height="36" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75"/></a><div class="wrapper Search_buttons-spacing__iB2NS BookmarkButton_bookmark-wrapper__xJaOg"><button title="Bookmark this paper"><img alt="Bookmark button" id="bookmark-btn" loading="lazy" width="388" height="512" decoding="async" data-nimg="1" class="BookmarkButton_bookmark-btn-image__gkInJ" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75 2x" 
src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75"/></button></div><div class="wrapper Search_buttons-spacing__iB2NS"><button class="AlertButton_alert-btn__pC8cK" title="Get alerts when new code is available for this paper"><img alt="Alert button" id="alert_btn" loading="lazy" width="512" height="512" decoding="async" data-nimg="1" class="alert-btn-image " style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75"/></button><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 106 34" style="margin-left:9px"><g class="sparkles"><path style="animation:sparkle 2s 0s infinite ease-in-out" d="M15.5740361 -10.33344622s1.1875777-6.20179466 2.24320232 0c0 0 5.9378885 1.05562462 0 2.11124925 0 0-1.05562463 6.33374774-2.24320233 0-3.5627331-.6597654-3.29882695-1.31953078 0-2.11124925z"></path><path style="animation:sparkle 1.5s 0.9s infinite ease-in-out" d="M33.5173993 75.97263826s1.03464615-5.40315215 1.95433162 0c0 0 5.17323078.91968547 0 1.83937095 0 0-.91968547 5.51811283-1.95433162 0-3.10393847-.57480342-2.8740171-1.14960684 0-1.83937095z"></path><path style="animation:sparkle 1.7s 0.4s infinite ease-in-out" d="M69.03038108 1.71240809s.73779281-3.852918 1.39360864 0c0 0 3.68896404.65581583 0 1.31163166 0 0-.65581583 3.93489497-1.39360864 0-2.21337842-.4098849-2.04942447-.81976979 0-1.31163166z"></path></g></svg></div></div><span class="Search_publication-date__mLvO2">Mar 20, 2025<br/></span><div class="AuthorLinks_authors-container__fAwXT"><span class="descriptor" style="display:none">Authors:</span><span><a data-testid="paper-result-author" href="/author/Yaxiong%20Chen">Yaxiong Chen</a>, </span><span><a data-testid="paper-result-author" href="/author/Minghong%20Wei">Minghong Wei</a>, </span><span><a data-testid="paper-result-author" href="/author/Zixuan%20Zheng">Zixuan Zheng</a>, </span><span><a data-testid="paper-result-author" href="/author/Jingliang%20Hu">Jingliang Hu</a>, </span><span><a data-testid="paper-result-author" href="/author/Yilei%20Shi">Yilei Shi</a>, </span><span><a data-testid="paper-result-author" href="/author/Shengwu%20Xiong">Shengwu Xiong</a>, </span><span><a data-testid="paper-result-author" href="/author/Xiao%20Xiang%20Zhu">Xiao Xiang Zhu</a>, </span><span><a data-testid="paper-result-author" href="/author/Lichao%20Mou">Lichao Mou</a></span></div><div class="Search_paper-detail-page-images-container__FPeuN"></div><p class="Search_paper-content__1CSu5 text-with-links"><span class="descriptor" style="display:none">Abstract:</span>Referring medical image segmentation targets delineating lesions indicated by textual descriptions. Aligning visual and textual cues is challenging due to their distinct data properties. Inspired by large-scale pre-trained vision-language models, we propose CausalCLIPSeg, an end-to-end framework for referring medical image segmentation that leverages CLIP. Despite not being trained on medical data, we enforce CLIP's rich semantic space onto the medical domain by a tailored cross-modal decoding method to achieve text-to-pixel alignment. 
Furthermore, to mitigate confounding bias that may cause the model to learn spurious correlations instead of meaningful causal relationships, CausalCLIPSeg introduces a causal intervention module which self-annotates confounders and excavates causal features from inputs for segmentation judgments. We also devise an adversarial min-max game to optimize causal features while penalizing confounding ones. Extensive experiments demonstrate the state-of-the-art performance of our proposed method. Code is available at <a href="https://github.com/WUTCM-Lab/CausalCLIPSeg">https://github.com/WUTCM-Lab/CausalCLIPSeg</a>.<br/></p><div class="text-with-links"><span></span><span><em>* <!-- -->MICCAI 2024<!-- -->聽</em><br/></span></div><div class="Search_search-result-provider__uWcak">Via<img alt="arxiv icon" loading="lazy" width="56" height="25" decoding="async" data-nimg="1" class="Search_arxiv-icon__SXHe4" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=64&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75"/></div><div class="Search_paper-link__nVhf_"><svg role="img" height="20" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" style="margin-right:5px"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" aria-hidden="true" data-slot="icon" width="22" style="margin-right:10px;margin-top:2px"><path stroke-linecap="round" stroke-linejoin="round" d="M12 6.042A8.967 8.967 0 0 0 6 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 0 1 6 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 0 1 6-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0 0 18 18a8.967 8.967 0 0 0-6 2.292m0-14.25v14.25"></path></svg><a data-testid="paper-result-access-link" href="/paper/causalclipseg-unlocking-clip-s-potential-in">Access Paper or Ask Questions</a></div><div data-testid="social-icons-tray" class="SocialIconBar_social-icons-tray__hq8N8"><a href="https://twitter.com/intent/tweet?text=Currently%20reading%20%22CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention%22%20catalyzex.com/paper/causalclipseg-unlocking-clip-s-potential-in%20via%20@CatalyzeX%0A%0AMore%20at:&url=https://www.catalyzex.com&related=CatalyzeX" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Twitter Icon</title><path d="M23.953 4.57a10 10 0 01-2.825.775 4.958 4.958 0 002.163-2.723c-.951.555-2.005.959-3.127 1.184a4.92 4.92 0 00-8.384 4.482C7.69 8.095 4.067 6.13 1.64 3.162a4.822 4.822 0 00-.666 2.475c0 1.71.87 3.213 
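To make the text-to-pixel alignment idea concrete, below is a minimal PyTorch sketch that scores every pixel of a dense visual feature map against a pooled text embedding, which is the general mechanism the abstract describes. It is not the CausalCLIPSeg implementation: the module name, feature dimensions, and plain cosine-similarity decoder are assumptions, and the causal intervention module and adversarial min-max game would operate on top of features like these (see the linked repository for the actual design).

```python
# Minimal sketch of CLIP-style text-to-pixel alignment for referring segmentation.
# Illustrative only, not the CausalCLIPSeg codebase: the module name, feature shapes,
# and the plain cosine-similarity decoder are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextToPixelHead(nn.Module):
    def __init__(self, vis_dim: int = 768, txt_dim: int = 512, emb_dim: int = 256):
        super().__init__()
        self.vis_proj = nn.Conv2d(vis_dim, emb_dim, kernel_size=1)  # per-pixel projection
        self.txt_proj = nn.Linear(txt_dim, emb_dim)                 # sentence-level projection
        self.logit_scale = nn.Parameter(torch.tensor(10.0))         # temperature for the logits

    def forward(self, vis_feats: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # vis_feats: (B, C_v, H, W) dense features from a CLIP image encoder
        # txt_feat:  (B, C_t) pooled embedding of the referring expression
        v = F.normalize(self.vis_proj(vis_feats), dim=1)            # (B, D, H, W)
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)            # (B, D)
        # cosine similarity between the text vector and every pixel -> mask logits
        logits = self.logit_scale * torch.einsum("bdhw,bd->bhw", v, t)
        return logits.unsqueeze(1)                                  # (B, 1, H, W)


if __name__ == "__main__":
    head = TextToPixelHead()
    vis = torch.randn(2, 768, 14, 14)   # stand-in for CLIP ViT patch features
    txt = torch.randn(2, 512)           # stand-in for the CLIP text embedding
    print(head(vis, txt).shape)         # torch.Size([2, 1, 14, 14])
```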

UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation
Mar 20, 2025
Authors: Yaxiong Chen, Chuang Du, Chunlei Li, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou
Abstract: Automated radiology report generation aims to expedite the tedious and error-prone reporting process for radiologists. While recent works have made progress, learning to align medical images and textual findings remains challenging due to the relative scarcity of labeled medical data. For example, datasets for this task are much smaller than those used for image captioning in computer vision. In this work, we propose to transfer representations from CLIP, a large-scale pre-trained vision-language model, to better capture cross-modal semantics between images and texts. However, directly applying CLIP is suboptimal due to the domain gap between natural images and radiology. To enable efficient adaptation, we introduce UniCrossAdapter, lightweight adapter modules that are incorporated into CLIP and fine-tuned on the target task while keeping base parameters fixed. The adapters are distributed across modalities and their interaction to enhance vision-language alignment. Experiments on two public datasets demonstrate the effectiveness of our approach, advancing the state-of-the-art in radiology report generation. The proposed transfer learning framework provides a means of harnessing semantic knowledge from large-scale pre-trained models to tackle data-scarce medical vision-language tasks. Code is available at https://github.com/chauncey-tow/MRG-CLIP.
* MICCAI 2024 Workshop
Via arXiv: http://arxiv.org/abs/2503.15940
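The parameter-efficient adaptation described above can be illustrated with a generic bottleneck adapter: a small residual MLP trained while the host CLIP weights stay frozen. This is a sketch of the general technique under assumed dimensions and placement, not the UniCrossAdapter architecture, which distributes adapters across the vision and text branches and their interaction.

```python
# Generic bottleneck-adapter sketch for parameter-efficient fine-tuning of a frozen CLIP.
# Illustrative assumptions: dimensions, placement, and the freeze helper are hypothetical;
# the actual UniCrossAdapter layout differs (see the linked repository).
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Residual bottleneck whose parameters are the only ones updated during fine-tuning."""

    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual path preserves the frozen features


def freeze_backbone(backbone: nn.Module) -> None:
    # Base CLIP parameters stay fixed; only adapter weights receive gradients.
    for p in backbone.parameters():
        p.requires_grad = False


if __name__ == "__main__":
    adapter = Adapter(dim=768, bottleneck=64)
    tokens = torch.randn(2, 197, 768)                        # stand-in for frozen ViT token features
    out = adapter(tokens)
    trainable = sum(p.numel() for p in adapter.parameters())
    print(out.shape, trainable)                              # torch.Size([2, 197, 768]) 99136
```

Only the adapter parameters are optimized, which is what keeps adaptation feasible on the relatively small labeled medical datasets the abstract mentions.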
href="https://www.facebook.com/dialog/share?app_id=704241106642044&display=popup&href=catalyzex.com/paper/unicrossadapter-multimodal-adaptation-of-clip&redirect_uri=https%3A%2F%2Fcatalyzex.com&quote=Currently%20reading%20%22UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation%22%20via%20CatalyzeX.com" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Facebook Icon</title><path d="M24 12.073c0-6.627-5.373-12-12-12s-12 5.373-12 12c0 5.99 4.388 10.954 10.125 11.854v-8.385H7.078v-3.47h3.047V9.43c0-3.007 1.792-4.669 4.533-4.669 1.312 0 2.686.235 2.686.235v2.953H15.83c-1.491 0-1.956.925-1.956 1.874v2.25h3.328l-.532 3.47h-2.796v8.385C19.612 23.027 24 18.062 24 12.073z"></path></svg></a><a href="https://www.linkedin.com/sharing/share-offsite/?url=catalyzex.com/paper/unicrossadapter-multimodal-adaptation-of-clip&title=UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" aria-labelledby="Linkedin Icon" xmlns="http://www.w3.org/2000/svg" fill="#0e76a8"><title>Linkedin Icon</title><path d="M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z"></path></svg></a><a href="https://api.whatsapp.com/send?text=See this paper I'm reading: UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation - catalyzex.com/paper/unicrossadapter-multimodal-adaptation-of-clip %0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning" target="_blank" rel="noreferrer"><svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" x="0px" y="0px" viewBox="0 0 512 512" height="28" width="28"><title>Whatsapp Icon</title><path fill="#EDEDED" d="M0,512l35.31-128C12.359,344.276,0,300.138,0,254.234C0,114.759,114.759,0,255.117,0 S512,114.759,512,254.234S395.476,512,255.117,512c-44.138,0-86.51-14.124-124.469-35.31L0,512z"></path><path fill="#55CD6C" d="M137.71,430.786l7.945,4.414c32.662,20.303,70.621,32.662,110.345,32.662 c115.641,0,211.862-96.221,211.862-213.628S371.641,44.138,255.117,44.138S44.138,137.71,44.138,254.234 c0,40.607,11.476,80.331,32.662,113.876l5.297,7.945l-20.303,74.152L137.71,430.786z"></path><path fill="#FEFEFE" d="M187.145,135.945l-16.772-0.883c-5.297,0-10.593,1.766-14.124,5.297 c-7.945,7.062-21.186,20.303-24.717,37.959c-6.179,26.483,3.531,58.262,26.483,90.041s67.09,82.979,144.772,105.048 c24.717,7.062,44.138,2.648,60.028-7.062c12.359-7.945,20.303-20.303,22.952-33.545l2.648-12.359 c0.883-3.531-0.883-7.945-4.414-9.71l-55.614-25.6c-3.531-1.766-7.945-0.883-10.593,2.648l-22.069,28.248 c-1.766,1.766-4.414,2.648-7.062,1.766c-15.007-5.297-65.324-26.483-92.69-79.448c-0.883-2.648-0.883-5.297,0.883-7.062 l21.186-23.834c1.766-2.648,2.648-6.179,1.766-8.828l-25.6-57.379C193.324,138.593,190.676,135.945,187.145,135.945"></path></svg></a><a title="Send via Messenger" 
href="https://www.facebook.com/dialog/send?app_id=704241106642044&link=catalyzex.com/paper/unicrossadapter-multimodal-adaptation-of-clip&redirect_uri=https%3A%2F%2Fcatalyzex.com" target="_blank" rel="noreferrer"><svg role="img" height="24" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#0695FF"><title>Messenger Icon</title><path d="M.001 11.639C.001 4.949 5.241 0 12.001 0S24 4.95 24 11.639c0 6.689-5.24 11.638-12 11.638-1.21 0-2.38-.16-3.47-.46a.96.96 0 00-.64.05l-2.39 1.05a.96.96 0 01-1.35-.85l-.07-2.14a.97.97 0 00-.32-.68A11.39 11.389 0 01.002 11.64zm8.32-2.19l-3.52 5.6c-.35.53.32 1.139.82.75l3.79-2.87c.26-.2.6-.2.87 0l2.8 2.1c.84.63 2.04.4 2.6-.48l3.52-5.6c.35-.53-.32-1.13-.82-.75l-3.79 2.87c-.25.2-.6.2-.86 0l-2.8-2.1a1.8 1.8 0 00-2.61.48z"></path></svg></a><a title="Share via Email" href="mailto:?subject=See this paper I'm reading: UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation&body=%22UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation%22 - catalyzex.com/paper/unicrossadapter-multimodal-adaptation-of-clip%0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning%0D%0A%0D%0A" target="_blank" rel="noreferrer"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="#ff8c00" aria-hidden="true" data-slot="icon" height="30" width="30"><title>Email Icon</title><path d="M1.5 8.67v8.58a3 3 0 0 0 3 3h15a3 3 0 0 0 3-3V8.67l-8.928 5.493a3 3 0 0 1-3.144 0L1.5 8.67Z"></path><path d="M22.5 6.908V6.75a3 3 0 0 0-3-3h-15a3 3 0 0 0-3 3v.158l9.714 5.978a1.5 1.5 0 0 0 1.572 0L22.5 6.908Z"></path></svg></a></div></section><div class="Search_seperator-line__4FidS"></div></div><section data-hydration-on-demand="true"><div><section data-testid="paper-details-container" class="Search_paper-details-container__Dou2Q"><h2 class="Search_paper-heading__bq58c"><a data-testid="paper-result-title" href="/paper/one-shot-medical-video-object-segmentation"><strong>One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks</strong></a></h2><div class="Search_buttons-container__WWw_l"><a href="#" target="_blank" id="request-code-2503.14979" data-testid="view-code-button" class="Search_view-code-link__xOgGF"><button type="button" class="btn Search_view-button__D5D2K Search_buttons-spacing__iB2NS Search_black-button__O7oac Search_view-code-button__8Dk6Z"><svg role="img" height="14" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#fff"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg>View Code</button></a><button type="button" class="btn Search_view-button__D5D2K Search_black-button__O7oac Search_buttons-spacing__iB2NS"><svg fill="#fff" height="20" viewBox="0 0 48 48" width="20" xmlns="http://www.w3.org/2000/svg"><title>Play Icon</title><path d="M0 0h48v48H0z" 
fill="none"></path><path d="M24 4C12.95 4 4 12.95 4 24s8.95 20 20 20 20-8.95 20-20S35.05 4 24 4zm-4 29V15l12 9-12 9z"></path></svg>Notebook</button><button type="button" class="Search_buttons-spacing__iB2NS Search_related-code-btn__F5B3X" data-testid="related-code-button"><span class="descriptor" style="display:none">Code for Similar Papers:</span><img alt="Code for Similar Papers" title="View code for similar papers" loading="lazy" width="37" height="35" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75"/></button><a class="Search_buttons-spacing__iB2NS Search_add-code-button__GKwQr" target="_blank" href="/add_code?title=One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks&paper_url=http://arxiv.org/abs/2503.14979" rel="nofollow"><img alt="Add code" title="Contribute your code for this paper to the community" loading="lazy" width="36" height="36" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75"/></a><div class="wrapper Search_buttons-spacing__iB2NS BookmarkButton_bookmark-wrapper__xJaOg"><button title="Bookmark this paper"><img alt="Bookmark button" id="bookmark-btn" loading="lazy" width="388" height="512" decoding="async" data-nimg="1" class="BookmarkButton_bookmark-btn-image__gkInJ" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75"/></button></div><div class="wrapper Search_buttons-spacing__iB2NS"><button class="AlertButton_alert-btn__pC8cK" title="Get alerts when new code is available for this paper"><img alt="Alert button" id="alert_btn" loading="lazy" width="512" height="512" decoding="async" data-nimg="1" class="alert-btn-image " style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75"/></button><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 106 34" style="margin-left:9px"><g class="sparkles"><path style="animation:sparkle 2s 0s infinite ease-in-out" d="M15.5740361 -10.33344622s1.1875777-6.20179466 2.24320232 0c0 0 5.9378885 1.05562462 0 2.11124925 0 0-1.05562463 6.33374774-2.24320233 0-3.5627331-.6597654-3.29882695-1.31953078 0-2.11124925z"></path><path style="animation:sparkle 1.5s 0.9s infinite ease-in-out" d="M33.5173993 75.97263826s1.03464615-5.40315215 1.95433162 0c0 0 5.17323078.91968547 0 1.83937095 0 0-.91968547 5.51811283-1.95433162 0-3.10393847-.57480342-2.8740171-1.14960684 0-1.83937095z"></path><path style="animation:sparkle 1.7s 0.4s infinite ease-in-out" d="M69.03038108 1.71240809s.73779281-3.852918 1.39360864 0c0 0 3.68896404.65581583 0 1.31163166 
Mar 19, 2025
Authors: Yaxiong Chen, Junjian Hu, Chunlei Li, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou
Abstract: Video object segmentation is crucial for the efficient analysis of complex medical video data, yet it faces significant challenges in data availability and annotation. We introduce the task of one-shot medical video object segmentation, which requires separating foreground and background pixels throughout a video given only the mask annotation of the first frame. To address this problem, we propose a temporal contrastive memory network comprising image and mask encoders to learn feature representations, a temporal contrastive memory bank that aligns embeddings from adjacent frames while pushing apart distant ones to explicitly model inter-frame relationships and stores these features, and a decoder that fuses encoded image features and memory readouts for segmentation. We also collect a diverse, multi-source medical video dataset spanning various modalities and anatomies to benchmark this task. Extensive experiments demonstrate state-of-the-art performance in segmenting both seen and unseen structures from a single exemplar, showing ability to generalize from scarce labels. This highlights the potential to alleviate annotation burdens for medical video analysis.
Code is available at https://github.com/MedAITech/TCMN.
* MICCAI 2024 Workshop
Via arXiv: http://arxiv.org/abs/2503.14979
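The abstract's "aligns embeddings from adjacent frames while pushing apart distant ones" maps naturally onto an InfoNCE-style objective. The sketch below is a generic formulation under that reading; `temporal_contrastive_loss`, the `neg_gap` threshold, and the temperature are assumed names and values, not the paper's memory-bank implementation.

```python
# Illustrative temporal contrastive objective over per-frame embeddings:
# adjacent frames are treated as positives and temporally distant frames as
# negatives (a generic InfoNCE-style form, not the released TCMN code).
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(frame_emb: torch.Tensor, temperature: float = 0.1,
                              neg_gap: int = 4) -> torch.Tensor:
    """frame_emb: (T, D) embeddings of consecutive frames of one video."""
    z = F.normalize(frame_emb, dim=1)                    # (T, D)
    sim = z @ z.t() / temperature                        # (T, T) scaled cosine similarities
    T = z.size(0)
    idx = torch.arange(T)
    loss, count = 0.0, 0
    for t in range(T - 1):
        pos = sim[t, t + 1]                              # adjacent frame = positive
        negs = sim[t, (idx - t).abs() >= neg_gap]        # distant frames = negatives
        if negs.numel() == 0:
            continue
        logits = torch.cat([pos.view(1), negs]).view(1, -1)
        loss = loss + F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
        count += 1
    return loss / max(count, 1)

if __name__ == "__main__":
    emb = torch.randn(16, 128, requires_grad=True)       # 16 frames, 128-d features
    print(temporal_contrastive_loss(emb).item())
```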
href="https://www.facebook.com/dialog/share?app_id=704241106642044&display=popup&href=catalyzex.com/paper/one-shot-medical-video-object-segmentation&redirect_uri=https%3A%2F%2Fcatalyzex.com&quote=Currently%20reading%20%22One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks%22%20via%20CatalyzeX.com" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Facebook Icon</title><path d="M24 12.073c0-6.627-5.373-12-12-12s-12 5.373-12 12c0 5.99 4.388 10.954 10.125 11.854v-8.385H7.078v-3.47h3.047V9.43c0-3.007 1.792-4.669 4.533-4.669 1.312 0 2.686.235 2.686.235v2.953H15.83c-1.491 0-1.956.925-1.956 1.874v2.25h3.328l-.532 3.47h-2.796v8.385C19.612 23.027 24 18.062 24 12.073z"></path></svg></a><a href="https://www.linkedin.com/sharing/share-offsite/?url=catalyzex.com/paper/one-shot-medical-video-object-segmentation&title=One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" aria-labelledby="Linkedin Icon" xmlns="http://www.w3.org/2000/svg" fill="#0e76a8"><title>Linkedin Icon</title><path d="M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z"></path></svg></a><a href="https://api.whatsapp.com/send?text=See this paper I'm reading: One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks - catalyzex.com/paper/one-shot-medical-video-object-segmentation %0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning" target="_blank" rel="noreferrer"><svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" x="0px" y="0px" viewBox="0 0 512 512" height="28" width="28"><title>Whatsapp Icon</title><path fill="#EDEDED" d="M0,512l35.31-128C12.359,344.276,0,300.138,0,254.234C0,114.759,114.759,0,255.117,0 S512,114.759,512,254.234S395.476,512,255.117,512c-44.138,0-86.51-14.124-124.469-35.31L0,512z"></path><path fill="#55CD6C" d="M137.71,430.786l7.945,4.414c32.662,20.303,70.621,32.662,110.345,32.662 c115.641,0,211.862-96.221,211.862-213.628S371.641,44.138,255.117,44.138S44.138,137.71,44.138,254.234 c0,40.607,11.476,80.331,32.662,113.876l5.297,7.945l-20.303,74.152L137.71,430.786z"></path><path fill="#FEFEFE" d="M187.145,135.945l-16.772-0.883c-5.297,0-10.593,1.766-14.124,5.297 c-7.945,7.062-21.186,20.303-24.717,37.959c-6.179,26.483,3.531,58.262,26.483,90.041s67.09,82.979,144.772,105.048 c24.717,7.062,44.138,2.648,60.028-7.062c12.359-7.945,20.303-20.303,22.952-33.545l2.648-12.359 c0.883-3.531-0.883-7.945-4.414-9.71l-55.614-25.6c-3.531-1.766-7.945-0.883-10.593,2.648l-22.069,28.248 c-1.766,1.766-4.414,2.648-7.062,1.766c-15.007-5.297-65.324-26.483-92.69-79.448c-0.883-2.648-0.883-5.297,0.883-7.062 l21.186-23.834c1.766-2.648,2.648-6.179,1.766-8.828l-25.6-57.379C193.324,138.593,190.676,135.945,187.145,135.945"></path></svg></a><a title="Send via Messenger" 
href="https://www.facebook.com/dialog/send?app_id=704241106642044&link=catalyzex.com/paper/one-shot-medical-video-object-segmentation&redirect_uri=https%3A%2F%2Fcatalyzex.com" target="_blank" rel="noreferrer"><svg role="img" height="24" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#0695FF"><title>Messenger Icon</title><path d="M.001 11.639C.001 4.949 5.241 0 12.001 0S24 4.95 24 11.639c0 6.689-5.24 11.638-12 11.638-1.21 0-2.38-.16-3.47-.46a.96.96 0 00-.64.05l-2.39 1.05a.96.96 0 01-1.35-.85l-.07-2.14a.97.97 0 00-.32-.68A11.39 11.389 0 01.002 11.64zm8.32-2.19l-3.52 5.6c-.35.53.32 1.139.82.75l3.79-2.87c.26-.2.6-.2.87 0l2.8 2.1c.84.63 2.04.4 2.6-.48l3.52-5.6c.35-.53-.32-1.13-.82-.75l-3.79 2.87c-.25.2-.6.2-.86 0l-2.8-2.1a1.8 1.8 0 00-2.61.48z"></path></svg></a><a title="Share via Email" href="mailto:?subject=See this paper I'm reading: One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks&body=%22One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks%22 - catalyzex.com/paper/one-shot-medical-video-object-segmentation%0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning%0D%0A%0D%0A" target="_blank" rel="noreferrer"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="#ff8c00" aria-hidden="true" data-slot="icon" height="30" width="30"><title>Email Icon</title><path d="M1.5 8.67v8.58a3 3 0 0 0 3 3h15a3 3 0 0 0 3-3V8.67l-8.928 5.493a3 3 0 0 1-3.144 0L1.5 8.67Z"></path><path d="M22.5 6.908V6.75a3 3 0 0 0-3-3h-15a3 3 0 0 0-3 3v.158l9.714 5.978a1.5 1.5 0 0 0 1.572 0L22.5 6.908Z"></path></svg></a></div></section><div class="Search_seperator-line__4FidS"></div></div><div><section data-testid="paper-details-container" class="Search_paper-details-container__Dou2Q"><h2 class="Search_paper-heading__bq58c"><a data-testid="paper-result-title" href="/paper/ultrasound-image-to-video-synthesis-via"><strong>Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models</strong></a></h2><div class="Search_buttons-container__WWw_l"><a href="#" target="_blank" id="request-code-2503.14966" data-testid="view-code-button" class="Search_view-code-link__xOgGF"><button type="button" class="btn Search_view-button__D5D2K Search_buttons-spacing__iB2NS Search_black-button__O7oac Search_view-code-button__8Dk6Z"><svg role="img" height="14" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#fff"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg>View Code</button></a><button type="button" class="btn Search_view-button__D5D2K Search_black-button__O7oac Search_buttons-spacing__iB2NS"><svg fill="#fff" height="20" viewBox="0 0 48 48" width="20" xmlns="http://www.w3.org/2000/svg"><title>Play Icon</title><path d="M0 0h48v48H0z" fill="none"></path><path d="M24 4C12.95 4 4 12.95 4 24s8.95 20 20 20 
Mar 19, 2025
Authors: Tingxiu Chen, Yilei Shi, Zixuan Zheng, Bingcong Yan, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou
Abstract: Ultrasound video classification enables automated diagnosis and has emerged as an important research area. However, publicly available ultrasound video datasets remain scarce, hindering progress in developing effective video classification models. We propose addressing this shortage by synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. To this end, we introduce a latent dynamic diffusion model (LDDM) to efficiently translate static images to dynamic sequences with realistic video characteristics. We demonstrate strong quantitative results and visually appealing synthesized videos on the BUSV benchmark. Notably, training video classification models on combinations of real and LDDM-synthesized videos substantially improves performance over using real data alone, indicating our method successfully emulates dynamics critical for discrimination. Our image-to-video approach provides an effective data augmentation solution to advance ultrasound video analysis.
Code is available at https://github.com/MedAITech/U_I2V.
* MICCAI 2024
Via arXiv: http://arxiv.org/abs/2503.14966
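For readers unfamiliar with how an image can condition video generation in latent space, the toy sketch below walks through a DDPM-style reverse loop over a short sequence of frame latents conditioned on a fixed image latent. The `TinyDenoiser`, step count, and latent sizes are invented for illustration and the network is untrained; this shows only the conditional sampling structure, not the paper's LDDM.

```python
# Toy conditional latent-diffusion sampling loop for image-to-video synthesis.
# A clip of frame latents is denoised step by step while conditioned on the
# latent of the source image; in practice the result would be decoded by a VAE.
import torch
import torch.nn as nn

T_STEPS, FRAMES, LATENT = 50, 8, 32
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

class TinyDenoiser(nn.Module):
    """Predicts the noise in a (FRAMES, LATENT) latent clip given the image latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT * 2 + 1, 128), nn.SiLU(), nn.Linear(128, LATENT))

    def forward(self, x_t, t, cond):
        t_feat = torch.full((x_t.size(0), 1), float(t) / T_STEPS)
        inp = torch.cat([x_t, cond.expand_as(x_t), t_feat], dim=1)
        return self.net(inp)

@torch.no_grad()
def sample_video_latents(denoiser, image_latent):
    x = torch.randn(FRAMES, LATENT)                       # start from pure noise
    for t in reversed(range(T_STEPS)):
        eps = denoiser(x, t, image_latent)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])   # DDPM posterior mean
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

if __name__ == "__main__":
    latents = sample_video_latents(TinyDenoiser(), torch.randn(1, LATENT))
    print(latents.shape)  # torch.Size([8, 32])
```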
Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning
decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75"/></button><a class="Search_buttons-spacing__iB2NS Search_add-code-button__GKwQr" target="_blank" href="/add_code?title=Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning&paper_url=http://arxiv.org/abs/2503.14958" rel="nofollow"><img alt="Add code" title="Contribute your code for this paper to the community" loading="lazy" width="36" height="36" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75"/></a><div class="wrapper Search_buttons-spacing__iB2NS BookmarkButton_bookmark-wrapper__xJaOg"><button title="Bookmark this paper"><img alt="Bookmark button" id="bookmark-btn" loading="lazy" width="388" height="512" decoding="async" data-nimg="1" class="BookmarkButton_bookmark-btn-image__gkInJ" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75"/></button></div><div class="wrapper Search_buttons-spacing__iB2NS"><button class="AlertButton_alert-btn__pC8cK" title="Get alerts when new code is available for this paper"><img alt="Alert button" id="alert_btn" loading="lazy" width="512" height="512" decoding="async" data-nimg="1" class="alert-btn-image " style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75"/></button><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 106 34" style="margin-left:9px"><g class="sparkles"><path style="animation:sparkle 2s 0s infinite ease-in-out" d="M15.5740361 -10.33344622s1.1875777-6.20179466 2.24320232 0c0 0 5.9378885 1.05562462 0 2.11124925 0 0-1.05562463 6.33374774-2.24320233 0-3.5627331-.6597654-3.29882695-1.31953078 0-2.11124925z"></path><path style="animation:sparkle 1.5s 0.9s infinite ease-in-out" d="M33.5173993 75.97263826s1.03464615-5.40315215 1.95433162 0c0 0 5.17323078.91968547 0 1.83937095 0 0-.91968547 5.51811283-1.95433162 0-3.10393847-.57480342-2.8740171-1.14960684 0-1.83937095z"></path><path style="animation:sparkle 1.7s 0.4s infinite ease-in-out" d="M69.03038108 1.71240809s.73779281-3.852918 1.39360864 0c0 0 3.68896404.65581583 0 1.31163166 0 0-.65581583 3.93489497-1.39360864 0-2.21337842-.4098849-2.04942447-.81976979 0-1.31163166z"></path></g></svg></div></div><span class="Search_publication-date__mLvO2">Mar 19, 2025<br/></span><div class="AuthorLinks_authors-container__fAwXT"><span class="descriptor" style="display:none">Authors:</span><span><a data-testid="paper-result-author" href="/author/Zixuan%20Zheng">Zixuan Zheng</a>, 
</span><span><a data-testid="paper-result-author" href="/author/Yilei%20Shi">Yilei Shi</a>, </span><span><a data-testid="paper-result-author" href="/author/Chunlei%20Li">Chunlei Li</a>, </span><span><a data-testid="paper-result-author" href="/author/Jingliang%20Hu">Jingliang Hu</a>, </span><span><a data-testid="paper-result-author" href="/author/Xiao%20Xiang%20Zhu">Xiao Xiang Zhu</a>, </span><span><a data-testid="paper-result-author" href="/author/Lichao%20Mou">Lichao Mou</a></span></div><div class="Search_paper-detail-page-images-container__FPeuN"></div><p class="Search_paper-content__1CSu5 text-with-links"><span class="descriptor" style="display:none">Abstract:</span>Few-shot video object segmentation aims to reduce annotation costs; however, existing methods still require abundant dense frame annotations for training, which are scarce in the medical domain. We investigate an extremely low-data regime that utilizes annotations from only a few video frames and leverages existing labeled images to minimize costly video annotations. Specifically, we propose a two-phase framework. First, we learn a few-shot segmentation model using labeled images. Subsequently, to improve performance without full supervision, we introduce a spatiotemporal consistency relearning approach on medical videos that enforces consistency between consecutive frames. Constraints are also enforced between the image model and relearning model at both feature and prediction levels. Experiments demonstrate the superiority of our approach over state-of-the-art few-shot segmentation methods. Our model bridges the gap between abundant annotated medical images and scarce, sparsely labeled medical videos to achieve strong video segmentation performance in this low data regime. Code is available at <a href="https://github.com/MedAITech/RAB">https://github.com/MedAITech/RAB</a>.<br/></p><div class="text-with-links"><span></span><span><em>* <!-- -->MICCAI 2024<!-- -->聽</em><br/></span></div><div class="Search_search-result-provider__uWcak">Via<img alt="arxiv icon" loading="lazy" width="56" height="25" decoding="async" data-nimg="1" class="Search_arxiv-icon__SXHe4" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=64&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75"/></div><div class="Search_paper-link__nVhf_"><svg role="img" height="20" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" style="margin-right:5px"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" aria-hidden="true" data-slot="icon" width="22" 
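A rough reading of the two-phase description suggests losses of the following shape: temporal consistency between consecutive-frame predictions plus feature- and prediction-level constraints against the frozen image model. The function below is a hypothetical sketch under that reading; the specific loss forms, equal weighting, and tensor layout are assumptions, not the released RAB code.

```python
# Sketch of consistency-relearning objectives: (1) agreement between predictions
# on consecutive frames, and (2) feature/prediction alignment between the video
# "relearning" model (student) and a frozen image-trained model (teacher).
import torch
import torch.nn.functional as F

def relearning_losses(student_feats, student_logits,
                      teacher_feats, teacher_logits):
    """Feature tensors: (T, D, H, W); logit tensors: (T, C, H, W) for T consecutive frames."""
    # Temporal consistency: consecutive-frame predictions should agree.
    probs = student_logits.softmax(dim=1)
    temporal = F.mse_loss(probs[1:], probs[:-1])

    # Feature-level constraint against the frozen image model.
    feat = F.mse_loss(student_feats, teacher_feats.detach())

    # Prediction-level constraint (KL to the image model's soft predictions).
    pred = F.kl_div(student_logits.log_softmax(dim=1),
                    teacher_logits.detach().softmax(dim=1),
                    reduction="batchmean")
    return temporal + feat + pred

if __name__ == "__main__":
    T, C, D, H, W = 4, 2, 64, 16, 16
    s_feat = torch.randn(T, D, H, W, requires_grad=True)
    t_feat = torch.randn(T, D, H, W)
    s_log = torch.randn(T, C, H, W, requires_grad=True)
    t_log = torch.randn(T, C, H, W)
    print(relearning_losses(s_feat, s_log, t_feat, t_log).item())
```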
style="margin-right:10px;margin-top:2px"><path stroke-linecap="round" stroke-linejoin="round" d="M12 6.042A8.967 8.967 0 0 0 6 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 0 1 6 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 0 1 6-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0 0 18 18a8.967 8.967 0 0 0-6 2.292m0-14.25v14.25"></path></svg><a data-testid="paper-result-access-link" href="/paper/reducing-annotation-burden-exploiting-image">Access Paper or Ask Questions</a></div><div data-testid="social-icons-tray" class="SocialIconBar_social-icons-tray__hq8N8"><a href="https://twitter.com/intent/tweet?text=Currently%20reading%20%22Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning%22%20catalyzex.com/paper/reducing-annotation-burden-exploiting-image%20via%20@CatalyzeX%0A%0AMore%20at:&url=https://www.catalyzex.com&related=CatalyzeX" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Twitter Icon</title><path d="M23.953 4.57a10 10 0 01-2.825.775 4.958 4.958 0 002.163-2.723c-.951.555-2.005.959-3.127 1.184a4.92 4.92 0 00-8.384 4.482C7.69 8.095 4.067 6.13 1.64 3.162a4.822 4.822 0 00-.666 2.475c0 1.71.87 3.213 2.188 4.096a4.904 4.904 0 01-2.228-.616v.06a4.923 4.923 0 003.946 4.827 4.996 4.996 0 01-2.212.085 4.936 4.936 0 004.604 3.417 9.867 9.867 0 01-6.102 2.105c-.39 0-.779-.023-1.17-.067a13.995 13.995 0 007.557 2.209c9.053 0 13.998-7.496 13.998-13.985 0-.21 0-.42-.015-.63A9.935 9.935 0 0024 4.59z"></path></svg></a><a href="https://www.facebook.com/dialog/share?app_id=704241106642044&display=popup&href=catalyzex.com/paper/reducing-annotation-burden-exploiting-image&redirect_uri=https%3A%2F%2Fcatalyzex.com&quote=Currently%20reading%20%22Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning%22%20via%20CatalyzeX.com" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Facebook Icon</title><path d="M24 12.073c0-6.627-5.373-12-12-12s-12 5.373-12 12c0 5.99 4.388 10.954 10.125 11.854v-8.385H7.078v-3.47h3.047V9.43c0-3.007 1.792-4.669 4.533-4.669 1.312 0 2.686.235 2.686.235v2.953H15.83c-1.491 0-1.956.925-1.956 1.874v2.25h3.328l-.532 3.47h-2.796v8.385C19.612 23.027 24 18.062 24 12.073z"></path></svg></a><a href="https://www.linkedin.com/sharing/share-offsite/?url=catalyzex.com/paper/reducing-annotation-burden-exploiting-image&title=Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" aria-labelledby="Linkedin Icon" xmlns="http://www.w3.org/2000/svg" fill="#0e76a8"><title>Linkedin Icon</title><path d="M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z"></path></svg></a><a 
href="https://api.whatsapp.com/send?text=See this paper I'm reading: Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning - catalyzex.com/paper/reducing-annotation-burden-exploiting-image %0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning" target="_blank" rel="noreferrer"><svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" x="0px" y="0px" viewBox="0 0 512 512" height="28" width="28"><title>Whatsapp Icon</title><path fill="#EDEDED" d="M0,512l35.31-128C12.359,344.276,0,300.138,0,254.234C0,114.759,114.759,0,255.117,0 S512,114.759,512,254.234S395.476,512,255.117,512c-44.138,0-86.51-14.124-124.469-35.31L0,512z"></path><path fill="#55CD6C" d="M137.71,430.786l7.945,4.414c32.662,20.303,70.621,32.662,110.345,32.662 c115.641,0,211.862-96.221,211.862-213.628S371.641,44.138,255.117,44.138S44.138,137.71,44.138,254.234 c0,40.607,11.476,80.331,32.662,113.876l5.297,7.945l-20.303,74.152L137.71,430.786z"></path><path fill="#FEFEFE" d="M187.145,135.945l-16.772-0.883c-5.297,0-10.593,1.766-14.124,5.297 c-7.945,7.062-21.186,20.303-24.717,37.959c-6.179,26.483,3.531,58.262,26.483,90.041s67.09,82.979,144.772,105.048 c24.717,7.062,44.138,2.648,60.028-7.062c12.359-7.945,20.303-20.303,22.952-33.545l2.648-12.359 c0.883-3.531-0.883-7.945-4.414-9.71l-55.614-25.6c-3.531-1.766-7.945-0.883-10.593,2.648l-22.069,28.248 c-1.766,1.766-4.414,2.648-7.062,1.766c-15.007-5.297-65.324-26.483-92.69-79.448c-0.883-2.648-0.883-5.297,0.883-7.062 l21.186-23.834c1.766-2.648,2.648-6.179,1.766-8.828l-25.6-57.379C193.324,138.593,190.676,135.945,187.145,135.945"></path></svg></a><a title="Send via Messenger" href="https://www.facebook.com/dialog/send?app_id=704241106642044&link=catalyzex.com/paper/reducing-annotation-burden-exploiting-image&redirect_uri=https%3A%2F%2Fcatalyzex.com" target="_blank" rel="noreferrer"><svg role="img" height="24" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#0695FF"><title>Messenger Icon</title><path d="M.001 11.639C.001 4.949 5.241 0 12.001 0S24 4.95 24 11.639c0 6.689-5.24 11.638-12 11.638-1.21 0-2.38-.16-3.47-.46a.96.96 0 00-.64.05l-2.39 1.05a.96.96 0 01-1.35-.85l-.07-2.14a.97.97 0 00-.32-.68A11.39 11.389 0 01.002 11.64zm8.32-2.19l-3.52 5.6c-.35.53.32 1.139.82.75l3.79-2.87c.26-.2.6-.2.87 0l2.8 2.1c.84.63 2.04.4 2.6-.48l3.52-5.6c.35-.53-.32-1.13-.82-.75l-3.79 2.87c-.25.2-.6.2-.86 0l-2.8-2.1a1.8 1.8 0 00-2.61.48z"></path></svg></a><a title="Share via Email" href="mailto:?subject=See this paper I'm reading: Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning&body=%22Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning%22 - catalyzex.com/paper/reducing-annotation-burden-exploiting-image%0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning%0D%0A%0D%0A" target="_blank" rel="noreferrer"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="#ff8c00" aria-hidden="true" data-slot="icon" height="30" width="30"><title>Email Icon</title><path d="M1.5 8.67v8.58a3 3 0 0 0 3 3h15a3 3 0 0 0 3-3V8.67l-8.928 5.493a3 3 0 0 1-3.144 0L1.5 8.67Z"></path><path d="M22.5 6.908V6.75a3 3 0 0 0-3-3h-15a3 3 0 0 0-3 3v.158l9.714 5.978a1.5 1.5 0 0 0 1.572 0L22.5 6.908Z"></path></svg></a></div></section><div class="Search_seperator-line__4FidS"></div></div><div><section 
data-testid="paper-details-container" class="Search_paper-details-container__Dou2Q"><h2 class="Search_paper-heading__bq58c"><a data-testid="paper-result-title" href="/paper/scale-aware-contrastive-reverse-distillation"><strong>Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection</strong></a></h2><div class="Search_buttons-container__WWw_l"><a href="#" target="_blank" id="request-code-2503.13828" data-testid="view-code-button" class="Search_view-code-link__xOgGF"><button type="button" class="btn Search_view-button__D5D2K Search_buttons-spacing__iB2NS Search_black-button__O7oac Search_view-code-button__8Dk6Z"><svg role="img" height="14" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#fff"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg>View Code</button></a><button type="button" class="btn Search_view-button__D5D2K Search_black-button__O7oac Search_buttons-spacing__iB2NS"><svg fill="#fff" height="20" viewBox="0 0 48 48" width="20" xmlns="http://www.w3.org/2000/svg"><title>Play Icon</title><path d="M0 0h48v48H0z" fill="none"></path><path d="M24 4C12.95 4 4 12.95 4 24s8.95 20 20 20 20-8.95 20-20S35.05 4 24 4zm-4 29V15l12 9-12 9z"></path></svg>Notebook</button><button type="button" class="Search_buttons-spacing__iB2NS Search_related-code-btn__F5B3X" data-testid="related-code-button"><span class="descriptor" style="display:none">Code for Similar Papers:</span><img alt="Code for Similar Papers" title="View code for similar papers" loading="lazy" width="37" height="35" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75"/></button><a class="Search_buttons-spacing__iB2NS Search_add-code-button__GKwQr" target="_blank" href="/add_code?title=Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection&paper_url=http://arxiv.org/abs/2503.13828" rel="nofollow"><img alt="Add code" title="Contribute your code for this paper to the community" loading="lazy" width="36" height="36" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75"/></a><div class="wrapper Search_buttons-spacing__iB2NS BookmarkButton_bookmark-wrapper__xJaOg"><button title="Bookmark this paper"><img alt="Bookmark button" id="bookmark-btn" loading="lazy" width="388" height="512" 
decoding="async" data-nimg="1" class="BookmarkButton_bookmark-btn-image__gkInJ" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75"/></button></div><div class="wrapper Search_buttons-spacing__iB2NS"><button class="AlertButton_alert-btn__pC8cK" title="Get alerts when new code is available for this paper"><img alt="Alert button" id="alert_btn" loading="lazy" width="512" height="512" decoding="async" data-nimg="1" class="alert-btn-image " style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75"/></button><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 106 34" style="margin-left:9px"><g class="sparkles"><path style="animation:sparkle 2s 0s infinite ease-in-out" d="M15.5740361 -10.33344622s1.1875777-6.20179466 2.24320232 0c0 0 5.9378885 1.05562462 0 2.11124925 0 0-1.05562463 6.33374774-2.24320233 0-3.5627331-.6597654-3.29882695-1.31953078 0-2.11124925z"></path><path style="animation:sparkle 1.5s 0.9s infinite ease-in-out" d="M33.5173993 75.97263826s1.03464615-5.40315215 1.95433162 0c0 0 5.17323078.91968547 0 1.83937095 0 0-.91968547 5.51811283-1.95433162 0-3.10393847-.57480342-2.8740171-1.14960684 0-1.83937095z"></path><path style="animation:sparkle 1.7s 0.4s infinite ease-in-out" d="M69.03038108 1.71240809s.73779281-3.852918 1.39360864 0c0 0 3.68896404.65581583 0 1.31163166 0 0-.65581583 3.93489497-1.39360864 0-2.21337842-.4098849-2.04942447-.81976979 0-1.31163166z"></path></g></svg></div></div><span class="Search_publication-date__mLvO2">Mar 18, 2025<br/></span><div class="AuthorLinks_authors-container__fAwXT"><span class="descriptor" style="display:none">Authors:</span><span><a data-testid="paper-result-author" href="/author/Chunlei%20Li">Chunlei Li</a>, </span><span><a data-testid="paper-result-author" href="/author/Yilei%20Shi">Yilei Shi</a>, </span><span><a data-testid="paper-result-author" href="/author/Jingliang%20Hu">Jingliang Hu</a>, </span><span><a data-testid="paper-result-author" href="/author/Xiao%20Xiang%20Zhu">Xiao Xiang Zhu</a>, </span><span><a data-testid="paper-result-author" href="/author/Lichao%20Mou">Lichao Mou</a></span></div><div class="Search_paper-detail-page-images-container__FPeuN"></div><p class="Search_paper-content__1CSu5 text-with-links"><span class="descriptor" style="display:none">Abstract:</span>Unsupervised anomaly detection using deep learning has garnered significant research attention due to its broad applicability, particularly in medical imaging where labeled anomalous data are scarce. While earlier approaches leverage generative models like autoencoders and generative adversarial networks (GANs), they often fall short due to overgeneralization. Recent methods explore various strategies, including memory banks, normalizing flows, self-supervised learning, and knowledge distillation, to enhance discrimination. Among these, knowledge distillation, particularly reverse distillation, has shown promise. 
Following this paradigm, we propose a novel scale-aware contrastive reverse distillation model that addresses two key limitations of existing reverse distillation methods: insufficient feature discriminability and inability to handle anomaly scale variations. Specifically, we introduce a contrastive student-teacher learning approach to derive more discriminative representations by generating and exploring out-of-normal distributions. Further, we design a scale adaptation mechanism to softly weight contrastive distillation losses at different scales to account for the scale variation issue. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, validating the efficacy of the proposed method. Code is available at <a href="https://github.com/MedAITech/SCRD4AD">https://github.com/MedAITech/SCRD4AD</a>.<br/></p><div class="text-with-links"><span></span><span><em>* <!-- -->ICLR 2025<!-- -->聽</em><br/></span></div><div class="Search_search-result-provider__uWcak">Via<img alt="arxiv icon" loading="lazy" width="56" height="25" decoding="async" data-nimg="1" class="Search_arxiv-icon__SXHe4" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=64&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75"/></div><div class="Search_paper-link__nVhf_"><svg role="img" height="20" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" style="margin-right:5px"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" aria-hidden="true" data-slot="icon" width="22" style="margin-right:10px;margin-top:2px"><path stroke-linecap="round" stroke-linejoin="round" d="M12 6.042A8.967 8.967 0 0 0 6 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 0 1 6 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 0 1 6-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0 0 18 18a8.967 8.967 0 0 0-6 2.292m0-14.25v14.25"></path></svg><a data-testid="paper-result-access-link" href="/paper/scale-aware-contrastive-reverse-distillation">Access Paper or Ask Questions</a></div><div data-testid="social-icons-tray" class="SocialIconBar_social-icons-tray__hq8N8"><a href="https://twitter.com/intent/tweet?text=Currently%20reading%20%22Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection%22%20catalyzex.com/paper/scale-aware-contrastive-reverse-distillation%20via%20@CatalyzeX%0A%0AMore%20at:&url=https://www.catalyzex.com&related=CatalyzeX" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Twitter Icon</title><path 
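The scale adaptation idea lends itself to a compact illustration. The following is a minimal sketch, not the released SCRD4AD code: it assumes teacher and student feature pyramids are given as lists of tensors and uses one learnable softmax weight per scale to softly combine per-scale cosine-distance distillation losses.

```python
# Minimal sketch of scale-weighted reverse-distillation loss (illustrative only;
# module and variable names are assumptions, not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleWeightedDistillationLoss(nn.Module):
    def __init__(self, num_scales: int = 3):
        super().__init__()
        # One learnable logit per feature scale; softmax turns them into soft weights.
        self.scale_logits = nn.Parameter(torch.zeros(num_scales))

    def forward(self, teacher_feats, student_feats):
        # teacher_feats / student_feats: lists of feature maps [B, C, H, W], one per scale.
        weights = torch.softmax(self.scale_logits, dim=0)
        loss = 0.0
        for w, t, s in zip(weights, teacher_feats, student_feats):
            # 1 - cosine similarity per spatial location, averaged over the map.
            dist = 1.0 - F.cosine_similarity(t, s, dim=1)  # [B, H, W]
            loss = loss + w * dist.mean()
        return loss
```

At inference, the same per-location cosine distances could be aggregated across scales into an anomaly map, which is the usual way reverse-distillation scores are read out.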
d="M23.953 4.57a10 10 0 01-2.825.775 4.958 4.958 0 002.163-2.723c-.951.555-2.005.959-3.127 1.184a4.92 4.92 0 00-8.384 4.482C7.69 8.095 4.067 6.13 1.64 3.162a4.822 4.822 0 00-.666 2.475c0 1.71.87 3.213 2.188 4.096a4.904 4.904 0 01-2.228-.616v.06a4.923 4.923 0 003.946 4.827 4.996 4.996 0 01-2.212.085 4.936 4.936 0 004.604 3.417 9.867 9.867 0 01-6.102 2.105c-.39 0-.779-.023-1.17-.067a13.995 13.995 0 007.557 2.209c9.053 0 13.998-7.496 13.998-13.985 0-.21 0-.42-.015-.63A9.935 9.935 0 0024 4.59z"></path></svg></a><a href="https://www.facebook.com/dialog/share?app_id=704241106642044&display=popup&href=catalyzex.com/paper/scale-aware-contrastive-reverse-distillation&redirect_uri=https%3A%2F%2Fcatalyzex.com&quote=Currently%20reading%20%22Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection%22%20via%20CatalyzeX.com" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Facebook Icon</title><path d="M24 12.073c0-6.627-5.373-12-12-12s-12 5.373-12 12c0 5.99 4.388 10.954 10.125 11.854v-8.385H7.078v-3.47h3.047V9.43c0-3.007 1.792-4.669 4.533-4.669 1.312 0 2.686.235 2.686.235v2.953H15.83c-1.491 0-1.956.925-1.956 1.874v2.25h3.328l-.532 3.47h-2.796v8.385C19.612 23.027 24 18.062 24 12.073z"></path></svg></a><a href="https://www.linkedin.com/sharing/share-offsite/?url=catalyzex.com/paper/scale-aware-contrastive-reverse-distillation&title=Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" aria-labelledby="Linkedin Icon" xmlns="http://www.w3.org/2000/svg" fill="#0e76a8"><title>Linkedin Icon</title><path d="M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z"></path></svg></a><a href="https://api.whatsapp.com/send?text=See this paper I'm reading: Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection - catalyzex.com/paper/scale-aware-contrastive-reverse-distillation %0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning" target="_blank" rel="noreferrer"><svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" x="0px" y="0px" viewBox="0 0 512 512" height="28" width="28"><title>Whatsapp Icon</title><path fill="#EDEDED" d="M0,512l35.31-128C12.359,344.276,0,300.138,0,254.234C0,114.759,114.759,0,255.117,0 S512,114.759,512,254.234S395.476,512,255.117,512c-44.138,0-86.51-14.124-124.469-35.31L0,512z"></path><path fill="#55CD6C" d="M137.71,430.786l7.945,4.414c32.662,20.303,70.621,32.662,110.345,32.662 c115.641,0,211.862-96.221,211.862-213.628S371.641,44.138,255.117,44.138S44.138,137.71,44.138,254.234 c0,40.607,11.476,80.331,32.662,113.876l5.297,7.945l-20.303,74.152L137.71,430.786z"></path><path fill="#FEFEFE" d="M187.145,135.945l-16.772-0.883c-5.297,0-10.593,1.766-14.124,5.297 c-7.945,7.062-21.186,20.303-24.717,37.959c-6.179,26.483,3.531,58.262,26.483,90.041s67.09,82.979,144.772,105.048 
c24.717,7.062,44.138,2.648,60.028-7.062c12.359-7.945,20.303-20.303,22.952-33.545l2.648-12.359 c0.883-3.531-0.883-7.945-4.414-9.71l-55.614-25.6c-3.531-1.766-7.945-0.883-10.593,2.648l-22.069,28.248 c-1.766,1.766-4.414,2.648-7.062,1.766c-15.007-5.297-65.324-26.483-92.69-79.448c-0.883-2.648-0.883-5.297,0.883-7.062 l21.186-23.834c1.766-2.648,2.648-6.179,1.766-8.828l-25.6-57.379C193.324,138.593,190.676,135.945,187.145,135.945"></path></svg></a><a title="Send via Messenger" href="https://www.facebook.com/dialog/send?app_id=704241106642044&link=catalyzex.com/paper/scale-aware-contrastive-reverse-distillation&redirect_uri=https%3A%2F%2Fcatalyzex.com" target="_blank" rel="noreferrer"><svg role="img" height="24" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#0695FF"><title>Messenger Icon</title><path d="M.001 11.639C.001 4.949 5.241 0 12.001 0S24 4.95 24 11.639c0 6.689-5.24 11.638-12 11.638-1.21 0-2.38-.16-3.47-.46a.96.96 0 00-.64.05l-2.39 1.05a.96.96 0 01-1.35-.85l-.07-2.14a.97.97 0 00-.32-.68A11.39 11.389 0 01.002 11.64zm8.32-2.19l-3.52 5.6c-.35.53.32 1.139.82.75l3.79-2.87c.26-.2.6-.2.87 0l2.8 2.1c.84.63 2.04.4 2.6-.48l3.52-5.6c.35-.53-.32-1.13-.82-.75l-3.79 2.87c-.25.2-.6.2-.86 0l-2.8-2.1a1.8 1.8 0 00-2.61.48z"></path></svg></a><a title="Share via Email" href="mailto:?subject=See this paper I'm reading: Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection&body=%22Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection%22 - catalyzex.com/paper/scale-aware-contrastive-reverse-distillation%0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning%0D%0A%0D%0A" target="_blank" rel="noreferrer"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="#ff8c00" aria-hidden="true" data-slot="icon" height="30" width="30"><title>Email Icon</title><path d="M1.5 8.67v8.58a3 3 0 0 0 3 3h15a3 3 0 0 0 3-3V8.67l-8.928 5.493a3 3 0 0 1-3.144 0L1.5 8.67Z"></path><path d="M22.5 6.908V6.75a3 3 0 0 0-3-3h-15a3 3 0 0 0-3 3v.158l9.714 5.978a1.5 1.5 0 0 0 1.572 0L22.5 6.908Z"></path></svg></a></div></section><div class="Search_seperator-line__4FidS"></div></div><div><section data-testid="paper-details-container" class="Search_paper-details-container__Dou2Q"><h2 class="Search_paper-heading__bq58c"><a data-testid="paper-result-title" href="/paper/rethinking-cell-counting-methods-decoupling"><strong>Rethinking Cell Counting Methods: Decoupling Counting and Localization</strong></a></h2><div class="Search_buttons-container__WWw_l"><a href="#" target="_blank" id="request-code-2503.13989" data-testid="view-code-button" class="Search_view-code-link__xOgGF"><button type="button" class="btn Search_view-button__D5D2K Search_buttons-spacing__iB2NS Search_black-button__O7oac Search_view-code-button__8Dk6Z"><svg role="img" height="14" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#fff"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 
1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg>View Code</button></a><button type="button" class="btn Search_view-button__D5D2K Search_black-button__O7oac Search_buttons-spacing__iB2NS"><svg fill="#fff" height="20" viewBox="0 0 48 48" width="20" xmlns="http://www.w3.org/2000/svg"><title>Play Icon</title><path d="M0 0h48v48H0z" fill="none"></path><path d="M24 4C12.95 4 4 12.95 4 24s8.95 20 20 20 20-8.95 20-20S35.05 4 24 4zm-4 29V15l12 9-12 9z"></path></svg>Notebook</button><button type="button" class="Search_buttons-spacing__iB2NS Search_related-code-btn__F5B3X" data-testid="related-code-button"><span class="descriptor" style="display:none">Code for Similar Papers:</span><img alt="Code for Similar Papers" title="View code for similar papers" loading="lazy" width="37" height="35" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75"/></button><a class="Search_buttons-spacing__iB2NS Search_add-code-button__GKwQr" target="_blank" href="/add_code?title=Rethinking Cell Counting Methods: Decoupling Counting and Localization&paper_url=http://arxiv.org/abs/2503.13989" rel="nofollow"><img alt="Add code" title="Contribute your code for this paper to the community" loading="lazy" width="36" height="36" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75"/></a><div class="wrapper Search_buttons-spacing__iB2NS BookmarkButton_bookmark-wrapper__xJaOg"><button title="Bookmark this paper"><img alt="Bookmark button" id="bookmark-btn" loading="lazy" width="388" height="512" decoding="async" data-nimg="1" class="BookmarkButton_bookmark-btn-image__gkInJ" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75"/></button></div><div class="wrapper Search_buttons-spacing__iB2NS"><button class="AlertButton_alert-btn__pC8cK" title="Get alerts when new code is available for this paper"><img alt="Alert button" id="alert_btn" loading="lazy" width="512" height="512" decoding="async" data-nimg="1" class="alert-btn-image " style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75"/></button><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 106 34" style="margin-left:9px"><g class="sparkles"><path style="animation:sparkle 2s 0s infinite ease-in-out" d="M15.5740361 -10.33344622s1.1875777-6.20179466 2.24320232 0c0 0 5.9378885 1.05562462 0 2.11124925 0 0-1.05562463 6.33374774-2.24320233 0-3.5627331-.6597654-3.29882695-1.31953078 0-2.11124925z"></path><path 
style="animation:sparkle 1.5s 0.9s infinite ease-in-out" d="M33.5173993 75.97263826s1.03464615-5.40315215 1.95433162 0c0 0 5.17323078.91968547 0 1.83937095 0 0-.91968547 5.51811283-1.95433162 0-3.10393847-.57480342-2.8740171-1.14960684 0-1.83937095z"></path><path style="animation:sparkle 1.7s 0.4s infinite ease-in-out" d="M69.03038108 1.71240809s.73779281-3.852918 1.39360864 0c0 0 3.68896404.65581583 0 1.31163166 0 0-.65581583 3.93489497-1.39360864 0-2.21337842-.4098849-2.04942447-.81976979 0-1.31163166z"></path></g></svg></div></div><span class="Search_publication-date__mLvO2">Mar 18, 2025<br/></span><div class="AuthorLinks_authors-container__fAwXT"><span class="descriptor" style="display:none">Authors:</span><span><a data-testid="paper-result-author" href="/author/Zixuan%20Zheng">Zixuan Zheng</a>, </span><span><a data-testid="paper-result-author" href="/author/Yilei%20Shi">Yilei Shi</a>, </span><span><a data-testid="paper-result-author" href="/author/Chunlei%20Li">Chunlei Li</a>, </span><span><a data-testid="paper-result-author" href="/author/Jingliang%20Hu">Jingliang Hu</a>, </span><span><a data-testid="paper-result-author" href="/author/Xiao%20Xiang%20Zhu">Xiao Xiang Zhu</a>, </span><span><a data-testid="paper-result-author" href="/author/Lichao%20Mou">Lichao Mou</a></span></div><div class="Search_paper-detail-page-images-container__FPeuN"></div><p class="Search_paper-content__1CSu5 text-with-links"><span class="descriptor" style="display:none">Abstract:</span>Cell counting in microscopy images is vital in medicine and biology but extremely tedious and time-consuming to perform manually. While automated methods have advanced in recent years, state-of-the-art approaches tend to increasingly complex model designs. In this paper, we propose a conceptually simple yet effective decoupled learning scheme for automated cell counting, consisting of separate counter and localizer networks. In contrast to jointly learning counting and density map estimation, we show that decoupling these objectives surprisingly improves results. The counter operates on intermediate feature maps rather than pixel space to leverage global context and produce count estimates, while also generating coarse density maps. The localizer then reconstructs high-resolution density maps that precisely localize individual cells, conditional on the original images and coarse density maps from the counter. Besides, to boost counting accuracy, we further introduce a global message passing module to integrate cross-region patterns. Extensive experiments on four datasets demonstrate that our approach, despite its simplicity, challenges common practice and achieves state-of-the-art performance by significant margins. Our key insight is that decoupled learning alleviates the need to learn counting on high-resolution density maps directly, allowing the model to focus on global features critical for accurate estimates. 
Code is available at <a href="https://github.com/MedAITech/DCL">https://github.com/MedAITech/DCL</a>.<br/></p><div class="text-with-links"><span></span><span><em>* <!-- -->MICCAI 2024<!-- -->聽</em><br/></span></div><div class="Search_search-result-provider__uWcak">Via<img alt="arxiv icon" loading="lazy" width="56" height="25" decoding="async" data-nimg="1" class="Search_arxiv-icon__SXHe4" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=64&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75"/></div><div class="Search_paper-link__nVhf_"><svg role="img" height="20" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" style="margin-right:5px"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" aria-hidden="true" data-slot="icon" width="22" style="margin-right:10px;margin-top:2px"><path stroke-linecap="round" stroke-linejoin="round" d="M12 6.042A8.967 8.967 0 0 0 6 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 0 1 6 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 0 1 6-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0 0 18 18a8.967 8.967 0 0 0-6 2.292m0-14.25v14.25"></path></svg><a data-testid="paper-result-access-link" href="/paper/rethinking-cell-counting-methods-decoupling">Access Paper or Ask Questions</a></div><div data-testid="social-icons-tray" class="SocialIconBar_social-icons-tray__hq8N8"><a href="https://twitter.com/intent/tweet?text=Currently%20reading%20%22Rethinking Cell Counting Methods: Decoupling Counting and Localization%22%20catalyzex.com/paper/rethinking-cell-counting-methods-decoupling%20via%20@CatalyzeX%0A%0AMore%20at:&url=https://www.catalyzex.com&related=CatalyzeX" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Twitter Icon</title><path d="M23.953 4.57a10 10 0 01-2.825.775 4.958 4.958 0 002.163-2.723c-.951.555-2.005.959-3.127 1.184a4.92 4.92 0 00-8.384 4.482C7.69 8.095 4.067 6.13 1.64 3.162a4.822 4.822 0 00-.666 2.475c0 1.71.87 3.213 2.188 4.096a4.904 4.904 0 01-2.228-.616v.06a4.923 4.923 0 003.946 4.827 4.996 4.996 0 01-2.212.085 4.936 4.936 0 004.604 3.417 9.867 9.867 0 01-6.102 2.105c-.39 0-.779-.023-1.17-.067a13.995 13.995 0 007.557 2.209c9.053 0 13.998-7.496 13.998-13.985 0-.21 0-.42-.015-.63A9.935 9.935 0 0024 4.59z"></path></svg></a><a href="https://www.facebook.com/dialog/share?app_id=704241106642044&display=popup&href=catalyzex.com/paper/rethinking-cell-counting-methods-decoupling&redirect_uri=https%3A%2F%2Fcatalyzex.com&quote=Currently%20reading%20%22Rethinking Cell 
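To make the decoupling concrete, here is a rough sketch under stated assumptions (hypothetical module names and layer sizes, not the released DCL implementation): a counter that pools intermediate feature maps into a scalar count while emitting a coarse density map, and a localizer that refines that coarse map conditioned on the original image.

```python
# Illustrative sketch of a decoupled counter/localizer pair as described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Counter(nn.Module):
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.coarse_head = nn.Conv2d(feat_ch, 1, 1)   # coarse density map
        self.count_head = nn.Linear(feat_ch, 1)       # scalar count estimate

    def forward(self, x):
        f = self.backbone(x)                          # intermediate features [B, C, H/4, W/4]
        coarse = self.coarse_head(f)                  # [B, 1, H/4, W/4]
        count = self.count_head(f.mean(dim=(2, 3)))   # global context -> count
        return count, coarse

class Localizer(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        # Conditioned on the image plus the upsampled coarse density map (4 input channels).
        self.refine = nn.Sequential(
            nn.Conv2d(in_ch + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, image, coarse):
        coarse_up = F.interpolate(coarse, size=image.shape[-2:], mode="bilinear",
                                  align_corners=False)
        return self.refine(torch.cat([image, coarse_up], dim=1))  # high-res density map
```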
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Mar 18, 2025
Authors: Yaxiong Chen, Yujie Wang, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou
Abstract: Medical ultrasound imaging is ubiquitous, but manual analysis struggles to keep pace. Automated segmentation can help but requires large labeled datasets, which are scarce. Semi-supervised learning leveraging both unlabeled and limited labeled data is a promising approach. State-of-the-art methods use consistency regularization or pseudo-labeling but grow increasingly complex. Without sufficient labels, these models often latch onto artifacts or allow anatomically implausible segmentations. In this paper, we present a simple yet effective pseudo-labeling method with an adversarially learned shape prior to regularize segmentations. Specifically, we devise an encoder-twin-decoder network where the shape prior acts as an implicit shape model, penalizing anatomically implausible but not ground-truth-deviating predictions. Without bells and whistles, our simple approach achieves state-of-the-art performance on two benchmarks under different partition protocols. We provide a strong baseline for future semi-supervised medical image segmentation. Code is available at https://github.com/WUTCM-Lab/Shape-Prior-Semi-Seg.
* MICCAI 2024
Via arXiv (http://arxiv.org/abs/2503.13987)
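A hedged sketch of the general mechanism, not the authors' implementation: a small discriminator is trained as a shape prior over masks, and its score regularizes a pseudo-labeling loss on unlabeled images so that anatomically implausible predictions are penalized. All names here (ShapeDiscriminator, lambda_shape, segmenter) are illustrative assumptions.

```python
# Illustrative shape-prior-regularized pseudo-labeling loss (sketch, not the released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeDiscriminator(nn.Module):
    """Scores a soft segmentation mask; trained separately to favor realistic shapes."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, mask):
        return self.net(mask)  # [B, 1] plausibility logit

def unlabeled_loss(segmenter, shape_d, unlabeled_images, lambda_shape=0.1):
    """Pseudo-label self-training term plus a shape-plausibility penalty from the prior."""
    probs = torch.sigmoid(segmenter(unlabeled_images))   # soft predictions [B, 1, H, W]
    pseudo = (probs.detach() > 0.5).float()              # hard pseudo-labels
    seg_loss = F.binary_cross_entropy(probs, pseudo)
    # Encourage predictions the shape prior deems plausible (non-saturating GAN-style term).
    shape_loss = F.softplus(-shape_d(probs)).mean()
    return seg_loss + lambda_shape * shape_loss
```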
href="https://www.facebook.com/dialog/share?app_id=704241106642044&display=popup&href=catalyzex.com/paper/striving-for-simplicity-simple-yet-effective&redirect_uri=https%3A%2F%2Fcatalyzex.com&quote=Currently%20reading%20%22Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation%22%20via%20CatalyzeX.com" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Facebook Icon</title><path d="M24 12.073c0-6.627-5.373-12-12-12s-12 5.373-12 12c0 5.99 4.388 10.954 10.125 11.854v-8.385H7.078v-3.47h3.047V9.43c0-3.007 1.792-4.669 4.533-4.669 1.312 0 2.686.235 2.686.235v2.953H15.83c-1.491 0-1.956.925-1.956 1.874v2.25h3.328l-.532 3.47h-2.796v8.385C19.612 23.027 24 18.062 24 12.073z"></path></svg></a><a href="https://www.linkedin.com/sharing/share-offsite/?url=catalyzex.com/paper/striving-for-simplicity-simple-yet-effective&title=Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" aria-labelledby="Linkedin Icon" xmlns="http://www.w3.org/2000/svg" fill="#0e76a8"><title>Linkedin Icon</title><path d="M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z"></path></svg></a><a href="https://api.whatsapp.com/send?text=See this paper I'm reading: Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation - catalyzex.com/paper/striving-for-simplicity-simple-yet-effective %0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning" target="_blank" rel="noreferrer"><svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" x="0px" y="0px" viewBox="0 0 512 512" height="28" width="28"><title>Whatsapp Icon</title><path fill="#EDEDED" d="M0,512l35.31-128C12.359,344.276,0,300.138,0,254.234C0,114.759,114.759,0,255.117,0 S512,114.759,512,254.234S395.476,512,255.117,512c-44.138,0-86.51-14.124-124.469-35.31L0,512z"></path><path fill="#55CD6C" d="M137.71,430.786l7.945,4.414c32.662,20.303,70.621,32.662,110.345,32.662 c115.641,0,211.862-96.221,211.862-213.628S371.641,44.138,255.117,44.138S44.138,137.71,44.138,254.234 c0,40.607,11.476,80.331,32.662,113.876l5.297,7.945l-20.303,74.152L137.71,430.786z"></path><path fill="#FEFEFE" d="M187.145,135.945l-16.772-0.883c-5.297,0-10.593,1.766-14.124,5.297 c-7.945,7.062-21.186,20.303-24.717,37.959c-6.179,26.483,3.531,58.262,26.483,90.041s67.09,82.979,144.772,105.048 c24.717,7.062,44.138,2.648,60.028-7.062c12.359-7.945,20.303-20.303,22.952-33.545l2.648-12.359 c0.883-3.531-0.883-7.945-4.414-9.71l-55.614-25.6c-3.531-1.766-7.945-0.883-10.593,2.648l-22.069,28.248 c-1.766,1.766-4.414,2.648-7.062,1.766c-15.007-5.297-65.324-26.483-92.69-79.448c-0.883-2.648-0.883-5.297,0.883-7.062 l21.186-23.834c1.766-2.648,2.648-6.179,1.766-8.828l-25.6-57.379C193.324,138.593,190.676,135.945,187.145,135.945"></path></svg></a><a title="Send via Messenger" 
href="https://www.facebook.com/dialog/send?app_id=704241106642044&link=catalyzex.com/paper/striving-for-simplicity-simple-yet-effective&redirect_uri=https%3A%2F%2Fcatalyzex.com" target="_blank" rel="noreferrer"><svg role="img" height="24" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#0695FF"><title>Messenger Icon</title><path d="M.001 11.639C.001 4.949 5.241 0 12.001 0S24 4.95 24 11.639c0 6.689-5.24 11.638-12 11.638-1.21 0-2.38-.16-3.47-.46a.96.96 0 00-.64.05l-2.39 1.05a.96.96 0 01-1.35-.85l-.07-2.14a.97.97 0 00-.32-.68A11.39 11.389 0 01.002 11.64zm8.32-2.19l-3.52 5.6c-.35.53.32 1.139.82.75l3.79-2.87c.26-.2.6-.2.87 0l2.8 2.1c.84.63 2.04.4 2.6-.48l3.52-5.6c.35-.53-.32-1.13-.82-.75l-3.79 2.87c-.25.2-.6.2-.86 0l-2.8-2.1a1.8 1.8 0 00-2.61.48z"></path></svg></a><a title="Share via Email" href="mailto:?subject=See this paper I'm reading: Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation&body=%22Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation%22 - catalyzex.com/paper/striving-for-simplicity-simple-yet-effective%0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning%0D%0A%0D%0A" target="_blank" rel="noreferrer"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="#ff8c00" aria-hidden="true" data-slot="icon" height="30" width="30"><title>Email Icon</title><path d="M1.5 8.67v8.58a3 3 0 0 0 3 3h15a3 3 0 0 0 3-3V8.67l-8.928 5.493a3 3 0 0 1-3.144 0L1.5 8.67Z"></path><path d="M22.5 6.908V6.75a3 3 0 0 0-3-3h-15a3 3 0 0 0-3 3v.158l9.714 5.978a1.5 1.5 0 0 0 1.572 0L22.5 6.908Z"></path></svg></a></div></section><div class="Search_seperator-line__4FidS"></div></div><div><section data-testid="paper-details-container" class="Search_paper-details-container__Dou2Q"><h2 class="Search_paper-heading__bq58c"><a data-testid="paper-result-title" href="/paper/dynamicearthnet-daily-multi-spectral"><strong>DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation</strong></a></h2><div class="Search_buttons-container__WWw_l"><a href="#" target="_blank" id="request-code-2203.12560" data-testid="view-code-button" class="Search_view-code-link__xOgGF"><button type="button" class="btn Search_view-button__D5D2K Search_buttons-spacing__iB2NS Search_black-button__O7oac Search_view-code-button__8Dk6Z"><svg role="img" height="14" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#fff"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg>Request Code</button></a><button type="button" class="Search_dataset-button__k2oLH btn Search_view-button__D5D2K Search_view-dataset-button__I0Q7T"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" aria-hidden="true" 
data-slot="icon" height="24" width="24"><path d="M21 6.375c0 2.692-4.03 4.875-9 4.875S3 9.067 3 6.375 7.03 1.5 12 1.5s9 2.183 9 4.875Z"></path><path d="M12 12.75c2.685 0 5.19-.586 7.078-1.609a8.283 8.283 0 0 0 1.897-1.384c.016.121.025.244.025.368C21 12.817 16.97 15 12 15s-9-2.183-9-4.875c0-.124.009-.247.025-.368a8.285 8.285 0 0 0 1.897 1.384C6.809 12.164 9.315 12.75 12 12.75Z"></path><path d="M12 16.5c2.685 0 5.19-.586 7.078-1.609a8.282 8.282 0 0 0 1.897-1.384c.016.121.025.244.025.368 0 2.692-4.03 4.875-9 4.875s-9-2.183-9-4.875c0-.124.009-.247.025-.368a8.284 8.284 0 0 0 1.897 1.384C6.809 15.914 9.315 16.5 12 16.5Z"></path><path d="M12 20.25c2.685 0 5.19-.586 7.078-1.609a8.282 8.282 0 0 0 1.897-1.384c.016.121.025.244.025.368 0 2.692-4.03 4.875-9 4.875s-9-2.183-9-4.875c0-.124.009-.247.025-.368a8.284 8.284 0 0 0 1.897 1.384C6.809 19.664 9.315 20.25 12 20.25Z"></path></svg>Dataset</button><button type="button" class="Search_buttons-spacing__iB2NS Search_related-code-btn__F5B3X" data-testid="related-code-button"><span class="descriptor" style="display:none">Code for Similar Papers:</span><img alt="Code for Similar Papers" title="View code for similar papers" loading="lazy" width="37" height="35" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75"/></button><a class="Search_buttons-spacing__iB2NS Search_add-code-button__GKwQr" target="_blank" href="/add_code?title=DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation&paper_url=http://arxiv.org/abs/2203.12560" rel="nofollow"><img alt="Add code" title="Contribute your code for this paper to the community" loading="lazy" width="36" height="36" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75"/></a><div class="wrapper Search_buttons-spacing__iB2NS BookmarkButton_bookmark-wrapper__xJaOg"><button title="Bookmark this paper"><img alt="Bookmark button" id="bookmark-btn" loading="lazy" width="388" height="512" decoding="async" data-nimg="1" class="BookmarkButton_bookmark-btn-image__gkInJ" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75"/></button></div><div class="wrapper Search_buttons-spacing__iB2NS"><button class="AlertButton_alert-btn__pC8cK" title="Get alerts when new code is available for this paper"><img alt="Alert button" id="alert_btn" loading="lazy" width="512" height="512" decoding="async" data-nimg="1" class="alert-btn-image " style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75"/></button><svg 
xmlns="http://www.w3.org/2000/svg" viewBox="0 0 106 34" style="margin-left:9px"><g class="sparkles"><path style="animation:sparkle 2s 0s infinite ease-in-out" d="M15.5740361 -10.33344622s1.1875777-6.20179466 2.24320232 0c0 0 5.9378885 1.05562462 0 2.11124925 0 0-1.05562463 6.33374774-2.24320233 0-3.5627331-.6597654-3.29882695-1.31953078 0-2.11124925z"></path><path style="animation:sparkle 1.5s 0.9s infinite ease-in-out" d="M33.5173993 75.97263826s1.03464615-5.40315215 1.95433162 0c0 0 5.17323078.91968547 0 1.83937095 0 0-.91968547 5.51811283-1.95433162 0-3.10393847-.57480342-2.8740171-1.14960684 0-1.83937095z"></path><path style="animation:sparkle 1.7s 0.4s infinite ease-in-out" d="M69.03038108 1.71240809s.73779281-3.852918 1.39360864 0c0 0 3.68896404.65581583 0 1.31163166 0 0-.65581583 3.93489497-1.39360864 0-2.21337842-.4098849-2.04942447-.81976979 0-1.31163166z"></path></g></svg></div></div><span class="Search_publication-date__mLvO2">Mar 23, 2022<br/></span><div class="AuthorLinks_authors-container__fAwXT"><span class="descriptor" style="display:none">Authors:</span><span><a data-testid="paper-result-author" href="/author/Aysim%20Toker">Aysim Toker</a>, </span><span><a data-testid="paper-result-author" href="/author/Lukas%20Kondmann">Lukas Kondmann</a>, </span><span><a data-testid="paper-result-author" href="/author/Mark%20Weber">Mark Weber</a>, </span><span><a data-testid="paper-result-author" href="/author/Marvin%20Eisenberger">Marvin Eisenberger</a>, </span><span><a data-testid="paper-result-author" href="/author/Andr%C3%A9s%20Camero">Andr茅s Camero</a>, </span><span><a data-testid="paper-result-author" href="/author/Jingliang%20Hu">Jingliang Hu</a>, </span><span><a data-testid="paper-result-author" href="/author/Ariadna%20Pregel%20Hoderlein">Ariadna Pregel Hoderlein</a>, </span><span><a data-testid="paper-result-author" href="/author/%C3%87a%C4%9Flar%20%C5%9Eenaras">脟a臒lar 艦enaras</a>, </span><span><a data-testid="paper-result-author" href="/author/Timothy%20Davis">Timothy Davis</a>, </span><span><a data-testid="paper-result-author" href="/author/Daniel%20Cremers">Daniel Cremers</a></span><span>(<a href="/author/Jingliang%20Hu#">+<!-- -->3<!-- --> more</a>)</span></div><div class="Search_paper-detail-page-images-container__FPeuN"><div class="Search_paper-images__fnVzM"><span class="descriptor" style="display:none">Figures and Tables:</span><div class="Search_paper-image__Cd6kR" data-testid="paper-result-image"><img alt="Figure 1 for DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation" loading="lazy" width="298" height="192" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F3-Table1-1.png&w=384&q=75 1x, /_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F3-Table1-1.png&w=640&q=75 2x" src="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F3-Table1-1.png&w=640&q=75"/></div><div class="Search_paper-image__Cd6kR" data-testid="paper-result-image"><img alt="Figure 2 for DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation" loading="lazy" width="298" height="192" decoding="async" data-nimg="1" style="color:transparent" 
srcSet="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F3-Table2-1.png&w=384&q=75 1x, /_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F3-Table2-1.png&w=640&q=75 2x" src="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F3-Table2-1.png&w=640&q=75"/></div><div class="Search_paper-image__Cd6kR" data-testid="paper-result-image"><img alt="Figure 3 for DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation" loading="lazy" width="298" height="192" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F4-Figure2-1.png&w=384&q=75 1x, /_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F4-Figure2-1.png&w=640&q=75 2x" src="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F4-Figure2-1.png&w=640&q=75"/></div><div class="Search_paper-image__Cd6kR" data-testid="paper-result-image"><img alt="Figure 4 for DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation" loading="lazy" width="298" height="192" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F7-Table3-1.png&w=384&q=75 1x, /_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F7-Table3-1.png&w=640&q=75 2x" src="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F7e2f67581458d2c17c5806df724a7706ed2c95e9%2F7-Table3-1.png&w=640&q=75"/></div></div></div><p class="Search_paper-content__1CSu5 text-with-links"><span class="descriptor" style="display:none">Abstract:</span>Earth observation is a fundamental tool for monitoring the evolution of land use in specific areas of interest. Observing and precisely defining change, in this context, requires both time-series data and pixel-wise segmentations. To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the globe with imagery from Planet Labs. These observations are paired with pixel-wise monthly semantic segmentation labels of 7 land use and land cover (LULC) classes. DynamicEarthNet is the first dataset that provides this unique combination of daily measurements and high-quality labels. In our experiments, we compare several established baselines that either utilize the daily observations as additional training data (semi-supervised learning) or multiple observations at once (spatio-temporal learning) as a point of reference for future research. Finally, we propose a new evaluation metric SCS that addresses the specific challenges associated with time-series semantic change segmentation. 
The data is available at: <a href="https://mediatum.ub.tum.de/1650201">https://mediatum.ub.tum.de/1650201</a>.<br/></p><div class="text-with-links"><span></span><span><em>* <!-- -->Accepted to CVPR 2022, evaluation webpage:<!-- --> <!-- --> <a href="https://codalab.lisn.upsaclay.fr/competitions/2882">https://codalab.lisn.upsaclay.fr/competitions/2882</a>聽</em><br/></span></div><div class="Search_search-result-provider__uWcak">Via<img alt="arxiv icon" loading="lazy" width="56" height="25" decoding="async" data-nimg="1" class="Search_arxiv-icon__SXHe4" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=64&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75"/></div><div class="Search_paper-link__nVhf_"><svg role="img" height="20" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" style="margin-right:5px"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" aria-hidden="true" data-slot="icon" width="22" style="margin-right:10px;margin-top:2px"><path stroke-linecap="round" stroke-linejoin="round" d="M12 6.042A8.967 8.967 0 0 0 6 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 0 1 6 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 0 1 6-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0 0 18 18a8.967 8.967 0 0 0-6 2.292m0-14.25v14.25"></path></svg><a data-testid="paper-result-access-link" href="/paper/dynamicearthnet-daily-multi-spectral">Access Paper or Ask Questions</a></div><div data-testid="social-icons-tray" class="SocialIconBar_social-icons-tray__hq8N8"><a href="https://twitter.com/intent/tweet?text=Currently%20reading%20%22DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation%22%20catalyzex.com/paper/dynamicearthnet-daily-multi-spectral%20via%20@CatalyzeX%0A%0AMore%20at:&url=https://www.catalyzex.com&related=CatalyzeX" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Twitter Icon</title><path d="M23.953 4.57a10 10 0 01-2.825.775 4.958 4.958 0 002.163-2.723c-.951.555-2.005.959-3.127 1.184a4.92 4.92 0 00-8.384 4.482C7.69 8.095 4.067 6.13 1.64 3.162a4.822 4.822 0 00-.666 2.475c0 1.71.87 3.213 2.188 4.096a4.904 4.904 0 01-2.228-.616v.06a4.923 4.923 0 003.946 4.827 4.996 4.996 0 01-2.212.085 4.936 4.936 0 004.604 3.417 9.867 9.867 0 01-6.102 2.105c-.39 0-.779-.023-1.17-.067a13.995 13.995 0 007.557 2.209c9.053 0 13.998-7.496 13.998-13.985 0-.21 0-.42-.015-.63A9.935 9.935 0 0024 4.59z"></path></svg></a><a 
href="https://www.facebook.com/dialog/share?app_id=704241106642044&display=popup&href=catalyzex.com/paper/dynamicearthnet-daily-multi-spectral&redirect_uri=https%3A%2F%2Fcatalyzex.com&quote=Currently%20reading%20%22DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation%22%20via%20CatalyzeX.com" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Facebook Icon</title><path d="M24 12.073c0-6.627-5.373-12-12-12s-12 5.373-12 12c0 5.99 4.388 10.954 10.125 11.854v-8.385H7.078v-3.47h3.047V9.43c0-3.007 1.792-4.669 4.533-4.669 1.312 0 2.686.235 2.686.235v2.953H15.83c-1.491 0-1.956.925-1.956 1.874v2.25h3.328l-.532 3.47h-2.796v8.385C19.612 23.027 24 18.062 24 12.073z"></path></svg></a><a href="https://www.linkedin.com/sharing/share-offsite/?url=catalyzex.com/paper/dynamicearthnet-daily-multi-spectral&title=DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" aria-labelledby="Linkedin Icon" xmlns="http://www.w3.org/2000/svg" fill="#0e76a8"><title>Linkedin Icon</title><path d="M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z"></path></svg></a><a href="https://api.whatsapp.com/send?text=See this paper I'm reading: DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation - catalyzex.com/paper/dynamicearthnet-daily-multi-spectral %0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning" target="_blank" rel="noreferrer"><svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" x="0px" y="0px" viewBox="0 0 512 512" height="28" width="28"><title>Whatsapp Icon</title><path fill="#EDEDED" d="M0,512l35.31-128C12.359,344.276,0,300.138,0,254.234C0,114.759,114.759,0,255.117,0 S512,114.759,512,254.234S395.476,512,255.117,512c-44.138,0-86.51-14.124-124.469-35.31L0,512z"></path><path fill="#55CD6C" d="M137.71,430.786l7.945,4.414c32.662,20.303,70.621,32.662,110.345,32.662 c115.641,0,211.862-96.221,211.862-213.628S371.641,44.138,255.117,44.138S44.138,137.71,44.138,254.234 c0,40.607,11.476,80.331,32.662,113.876l5.297,7.945l-20.303,74.152L137.71,430.786z"></path><path fill="#FEFEFE" d="M187.145,135.945l-16.772-0.883c-5.297,0-10.593,1.766-14.124,5.297 c-7.945,7.062-21.186,20.303-24.717,37.959c-6.179,26.483,3.531,58.262,26.483,90.041s67.09,82.979,144.772,105.048 c24.717,7.062,44.138,2.648,60.028-7.062c12.359-7.945,20.303-20.303,22.952-33.545l2.648-12.359 c0.883-3.531-0.883-7.945-4.414-9.71l-55.614-25.6c-3.531-1.766-7.945-0.883-10.593,2.648l-22.069,28.248 c-1.766,1.766-4.414,2.648-7.062,1.766c-15.007-5.297-65.324-26.483-92.69-79.448c-0.883-2.648-0.883-5.297,0.883-7.062 l21.186-23.834c1.766-2.648,2.648-6.179,1.766-8.828l-25.6-57.379C193.324,138.593,190.676,135.945,187.145,135.945"></path></svg></a><a title="Send via Messenger" 
href="https://www.facebook.com/dialog/send?app_id=704241106642044&link=catalyzex.com/paper/dynamicearthnet-daily-multi-spectral&redirect_uri=https%3A%2F%2Fcatalyzex.com" target="_blank" rel="noreferrer"><svg role="img" height="24" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#0695FF"><title>Messenger Icon</title><path d="M.001 11.639C.001 4.949 5.241 0 12.001 0S24 4.95 24 11.639c0 6.689-5.24 11.638-12 11.638-1.21 0-2.38-.16-3.47-.46a.96.96 0 00-.64.05l-2.39 1.05a.96.96 0 01-1.35-.85l-.07-2.14a.97.97 0 00-.32-.68A11.39 11.389 0 01.002 11.64zm8.32-2.19l-3.52 5.6c-.35.53.32 1.139.82.75l3.79-2.87c.26-.2.6-.2.87 0l2.8 2.1c.84.63 2.04.4 2.6-.48l3.52-5.6c.35-.53-.32-1.13-.82-.75l-3.79 2.87c-.25.2-.6.2-.86 0l-2.8-2.1a1.8 1.8 0 00-2.61.48z"></path></svg></a><a title="Share via Email" href="mailto:?subject=See this paper I'm reading: DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation&body=%22DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation%22 - catalyzex.com/paper/dynamicearthnet-daily-multi-spectral%0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning%0D%0A%0D%0A" target="_blank" rel="noreferrer"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="#ff8c00" aria-hidden="true" data-slot="icon" height="30" width="30"><title>Email Icon</title><path d="M1.5 8.67v8.58a3 3 0 0 0 3 3h15a3 3 0 0 0 3-3V8.67l-8.928 5.493a3 3 0 0 1-3.144 0L1.5 8.67Z"></path><path d="M22.5 6.908V6.75a3 3 0 0 0-3-3h-15a3 3 0 0 0-3 3v.158l9.714 5.978a1.5 1.5 0 0 0 1.572 0L22.5 6.908Z"></path></svg></a></div></section><div class="Search_seperator-line__4FidS"></div></div><div><section data-testid="paper-details-container" class="Search_paper-details-container__Dou2Q"><h2 class="Search_paper-heading__bq58c"><a data-testid="paper-result-title" href="/paper/multimodal-remote-sensing-benchmark-datasets"><strong>Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model</strong></a></h2><div class="Search_buttons-container__WWw_l"><a href="#" target="_blank" id="request-code-2105.10196" data-testid="view-code-button" class="Search_view-code-link__xOgGF"><button type="button" class="btn Search_view-button__D5D2K Search_buttons-spacing__iB2NS Search_black-button__O7oac Search_view-code-button__8Dk6Z"><svg role="img" height="14" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#fff"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg>View Code</button></a><button type="button" class="btn Search_view-button__D5D2K Search_black-button__O7oac Search_buttons-spacing__iB2NS"><svg fill="#fff" height="20" viewBox="0 0 48 48" width="20" xmlns="http://www.w3.org/2000/svg"><title>Play Icon</title><path d="M0 0h48v48H0z" 
fill="none"></path><path d="M24 4C12.95 4 4 12.95 4 24s8.95 20 20 20 20-8.95 20-20S35.05 4 24 4zm-4 29V15l12 9-12 9z"></path></svg>Notebook</button><button type="button" class="Search_buttons-spacing__iB2NS Search_related-code-btn__F5B3X" data-testid="related-code-button"><span class="descriptor" style="display:none">Code for Similar Papers:</span><img alt="Code for Similar Papers" title="View code for similar papers" loading="lazy" width="37" height="35" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelated_icon_transparent.98f57b13.png&w=96&q=75"/></button><a class="Search_buttons-spacing__iB2NS Search_add-code-button__GKwQr" target="_blank" href="/add_code?title=Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model&paper_url=http://arxiv.org/abs/2105.10196" rel="nofollow"><img alt="Add code" title="Contribute your code for this paper to the community" loading="lazy" width="36" height="36" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=48&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Faddcode_white.6afb879f.png&w=96&q=75"/></a><div class="wrapper Search_buttons-spacing__iB2NS BookmarkButton_bookmark-wrapper__xJaOg"><button title="Bookmark this paper"><img alt="Bookmark button" id="bookmark-btn" loading="lazy" width="388" height="512" decoding="async" data-nimg="1" class="BookmarkButton_bookmark-btn-image__gkInJ" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbookmark_outline.3a3e1c2c.png&w=828&q=75"/></button></div><div class="wrapper Search_buttons-spacing__iB2NS"><button class="AlertButton_alert-btn__pC8cK" title="Get alerts when new code is available for this paper"><img alt="Alert button" id="alert_btn" loading="lazy" width="512" height="512" decoding="async" data-nimg="1" class="alert-btn-image " style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=640&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Falert_light_mode_icon.b8fca154.png&w=1080&q=75"/></button><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 106 34" style="margin-left:9px"><g class="sparkles"><path style="animation:sparkle 2s 0s infinite ease-in-out" d="M15.5740361 -10.33344622s1.1875777-6.20179466 2.24320232 0c0 0 5.9378885 1.05562462 0 2.11124925 0 0-1.05562463 6.33374774-2.24320233 0-3.5627331-.6597654-3.29882695-1.31953078 0-2.11124925z"></path><path style="animation:sparkle 1.5s 0.9s infinite ease-in-out" d="M33.5173993 75.97263826s1.03464615-5.40315215 1.95433162 0c0 0 5.17323078.91968547 0 1.83937095 0 0-.91968547 5.51811283-1.95433162 0-3.10393847-.57480342-2.8740171-1.14960684 0-1.83937095z"></path><path style="animation:sparkle 1.7s 0.4s infinite ease-in-out" d="M69.03038108 1.71240809s.73779281-3.852918 
1.39360864 0c0 0 3.68896404.65581583 0 1.31163166 0 0-.65581583 3.93489497-1.39360864 0-2.21337842-.4098849-2.04942447-.81976979 0-1.31163166z"></path></g></svg></div></div><span class="Search_publication-date__mLvO2">May 21, 2021<br/></span><div class="AuthorLinks_authors-container__fAwXT"><span class="descriptor" style="display:none">Authors:</span><span><a data-testid="paper-result-author" href="/author/Danfeng%20Hong">Danfeng Hong</a>, </span><span><a data-testid="paper-result-author" href="/author/Jingliang%20Hu">Jingliang Hu</a>, </span><span><a data-testid="paper-result-author" href="/author/Jing%20Yao">Jing Yao</a>, </span><span><a data-testid="paper-result-author" href="/author/Jocelyn%20Chanussot">Jocelyn Chanussot</a>, </span><span><a data-testid="paper-result-author" href="/author/Xiao%20Xiang%20Zhu">Xiao Xiang Zhu</a></span></div><div class="Search_paper-detail-page-images-container__FPeuN"><div class="Search_paper-images__fnVzM"><span class="descriptor" style="display:none">Figures and Tables:</span><div class="Search_paper-image__Cd6kR" data-testid="paper-result-image"><img alt="Figure 1 for Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model" loading="lazy" width="298" height="192" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F7-Figure1-1.png&w=384&q=75 1x, /_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F7-Figure1-1.png&w=640&q=75 2x" src="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F7-Figure1-1.png&w=640&q=75"/></div><div class="Search_paper-image__Cd6kR" data-testid="paper-result-image"><img alt="Figure 2 for Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model" loading="lazy" width="298" height="192" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F8-Table1-1.png&w=384&q=75 1x, /_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F8-Table1-1.png&w=640&q=75 2x" src="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F8-Table1-1.png&w=640&q=75"/></div><div class="Search_paper-image__Cd6kR" data-testid="paper-result-image"><img alt="Figure 3 for Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model" loading="lazy" width="298" height="192" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F9-Figure2-1.png&w=384&q=75 1x, /_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F9-Figure2-1.png&w=640&q=75 2x" src="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F9-Figure2-1.png&w=640&q=75"/></div><div class="Search_paper-image__Cd6kR" 
data-testid="paper-result-image"><img alt="Figure 4 for Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model" loading="lazy" width="298" height="192" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F17-Table2-1.png&w=384&q=75 1x, /_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F17-Table2-1.png&w=640&q=75 2x" src="/_next/image?url=https%3A%2F%2Fai2-s2-public.s3.amazonaws.com%2Ffigures%2F2017-08-08%2F07fa32eb364f966fb4b44dcf9435adf165295c96%2F17-Table2-1.png&w=640&q=75"/></div></div></div><p class="Search_paper-content__1CSu5 text-with-links"><span class="descriptor" style="display:none">Abstract:</span>As remote sensing (RS) data obtained from different sensors become available largely and openly, multimodal data processing and analysis techniques have been garnering increasing interest in the RS and geoscience community. However, due to the gap between different modalities in terms of imaging sensors, resolutions, and contents, embedding their complementary information into a consistent, compact, accurate, and discriminative representation, to a great extent, remains challenging. To this end, we propose a shared and specific feature learning (S2FL) model. S2FL is capable of decomposing multimodal RS data into modality-shared and modality-specific components, enabling the information blending of multi-modalities more effectively, particularly for heterogeneous data sources. Moreover, to better assess multimodal baselines and the newly-proposed S2FL model, three multimodal RS benchmark datasets, i.e., Houston2013 -- hyperspectral and multispectral data, Berlin -- hyperspectral and synthetic aperture radar (SAR) data, Augsburg -- hyperspectral, SAR, and digital surface model (DSM) data, are released and used for land cover classification. Extensive experiments conducted on the three datasets demonstrate the superiority and advancement of our S2FL model in the task of land cover classification in comparison with previously-proposed state-of-the-art baselines. 
Furthermore, the baseline codes and datasets used in this paper will be made available freely at <a href="https://github.com/danfenghong/ISPRS_S2FL">https://github.com/danfenghong/ISPRS_S2FL</a>.<br/></p><div class="text-with-links"><span><em>* <!-- -->ISPRS Journal of Photogrammetry and Remote Sensing, 2021<!-- --> 聽</em><br/></span><span></span></div><div class="Search_search-result-provider__uWcak">Via<img alt="arxiv icon" loading="lazy" width="56" height="25" decoding="async" data-nimg="1" class="Search_arxiv-icon__SXHe4" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=64&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Farxiv.41e50dc5.png&w=128&q=75"/></div><div class="Search_paper-link__nVhf_"><svg role="img" height="20" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" style="margin-right:5px"><title>Github Icon</title><path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path></svg><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" aria-hidden="true" data-slot="icon" width="22" style="margin-right:10px;margin-top:2px"><path stroke-linecap="round" stroke-linejoin="round" d="M12 6.042A8.967 8.967 0 0 0 6 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 0 1 6 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 0 1 6-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0 0 18 18a8.967 8.967 0 0 0-6 2.292m0-14.25v14.25"></path></svg><a data-testid="paper-result-access-link" href="/paper/multimodal-remote-sensing-benchmark-datasets">Access Paper or Ask Questions</a></div><div data-testid="social-icons-tray" class="SocialIconBar_social-icons-tray__hq8N8"><a href="https://twitter.com/intent/tweet?text=Currently%20reading%20%22Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model%22%20catalyzex.com/paper/multimodal-remote-sensing-benchmark-datasets%20via%20@CatalyzeX%0A%0AMore%20at:&url=https://www.catalyzex.com&related=CatalyzeX" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Twitter Icon</title><path d="M23.953 4.57a10 10 0 01-2.825.775 4.958 4.958 0 002.163-2.723c-.951.555-2.005.959-3.127 1.184a4.92 4.92 0 00-8.384 4.482C7.69 8.095 4.067 6.13 1.64 3.162a4.822 4.822 0 00-.666 2.475c0 1.71.87 3.213 2.188 4.096a4.904 4.904 0 01-2.228-.616v.06a4.923 4.923 0 003.946 4.827 4.996 4.996 0 01-2.212.085 4.936 4.936 0 004.604 3.417 9.867 9.867 0 01-6.102 2.105c-.39 0-.779-.023-1.17-.067a13.995 13.995 0 007.557 2.209c9.053 0 13.998-7.496 13.998-13.985 0-.21 0-.42-.015-.63A9.935 9.935 0 0024 4.59z"></path></svg></a><a 
href="https://www.facebook.com/dialog/share?app_id=704241106642044&display=popup&href=catalyzex.com/paper/multimodal-remote-sensing-benchmark-datasets&redirect_uri=https%3A%2F%2Fcatalyzex.com&quote=Currently%20reading%20%22Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model%22%20via%20CatalyzeX.com" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" xmlns="http://www.w3.org/2000/svg" fill="#1DA1F2"><title>Facebook Icon</title><path d="M24 12.073c0-6.627-5.373-12-12-12s-12 5.373-12 12c0 5.99 4.388 10.954 10.125 11.854v-8.385H7.078v-3.47h3.047V9.43c0-3.007 1.792-4.669 4.533-4.669 1.312 0 2.686.235 2.686.235v2.953H15.83c-1.491 0-1.956.925-1.956 1.874v2.25h3.328l-.532 3.47h-2.796v8.385C19.612 23.027 24 18.062 24 12.073z"></path></svg></a><a href="https://www.linkedin.com/sharing/share-offsite/?url=catalyzex.com/paper/multimodal-remote-sensing-benchmark-datasets&title=Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model" target="_blank" rel="noreferrer"><svg role="img" viewBox="0 0 24 24" height="28" width="28" aria-labelledby="Linkedin Icon" xmlns="http://www.w3.org/2000/svg" fill="#0e76a8"><title>Linkedin Icon</title><path d="M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z"></path></svg></a><a href="https://api.whatsapp.com/send?text=See this paper I'm reading: Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model - catalyzex.com/paper/multimodal-remote-sensing-benchmark-datasets %0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning" target="_blank" rel="noreferrer"><svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" x="0px" y="0px" viewBox="0 0 512 512" height="28" width="28"><title>Whatsapp Icon</title><path fill="#EDEDED" d="M0,512l35.31-128C12.359,344.276,0,300.138,0,254.234C0,114.759,114.759,0,255.117,0 S512,114.759,512,254.234S395.476,512,255.117,512c-44.138,0-86.51-14.124-124.469-35.31L0,512z"></path><path fill="#55CD6C" d="M137.71,430.786l7.945,4.414c32.662,20.303,70.621,32.662,110.345,32.662 c115.641,0,211.862-96.221,211.862-213.628S371.641,44.138,255.117,44.138S44.138,137.71,44.138,254.234 c0,40.607,11.476,80.331,32.662,113.876l5.297,7.945l-20.303,74.152L137.71,430.786z"></path><path fill="#FEFEFE" d="M187.145,135.945l-16.772-0.883c-5.297,0-10.593,1.766-14.124,5.297 c-7.945,7.062-21.186,20.303-24.717,37.959c-6.179,26.483,3.531,58.262,26.483,90.041s67.09,82.979,144.772,105.048 c24.717,7.062,44.138,2.648,60.028-7.062c12.359-7.945,20.303-20.303,22.952-33.545l2.648-12.359 c0.883-3.531-0.883-7.945-4.414-9.71l-55.614-25.6c-3.531-1.766-7.945-0.883-10.593,2.648l-22.069,28.248 c-1.766,1.766-4.414,2.648-7.062,1.766c-15.007-5.297-65.324-26.483-92.69-79.448c-0.883-2.648-0.883-5.297,0.883-7.062 l21.186-23.834c1.766-2.648,2.648-6.179,1.766-8.828l-25.6-57.379C193.324,138.593,190.676,135.945,187.145,135.945"></path></svg></a><a title="Send via Messenger" 
href="https://www.facebook.com/dialog/send?app_id=704241106642044&link=catalyzex.com/paper/multimodal-remote-sensing-benchmark-datasets&redirect_uri=https%3A%2F%2Fcatalyzex.com" target="_blank" rel="noreferrer"><svg role="img" height="24" width="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="#0695FF"><title>Messenger Icon</title><path d="M.001 11.639C.001 4.949 5.241 0 12.001 0S24 4.95 24 11.639c0 6.689-5.24 11.638-12 11.638-1.21 0-2.38-.16-3.47-.46a.96.96 0 00-.64.05l-2.39 1.05a.96.96 0 01-1.35-.85l-.07-2.14a.97.97 0 00-.32-.68A11.39 11.389 0 01.002 11.64zm8.32-2.19l-3.52 5.6c-.35.53.32 1.139.82.75l3.79-2.87c.26-.2.6-.2.87 0l2.8 2.1c.84.63 2.04.4 2.6-.48l3.52-5.6c.35-.53-.32-1.13-.82-.75l-3.79 2.87c-.25.2-.6.2-.86 0l-2.8-2.1a1.8 1.8 0 00-2.61.48z"></path></svg></a><a title="Share via Email" href="mailto:?subject=See this paper I'm reading: Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model&body=%22Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model%22 - catalyzex.com/paper/multimodal-remote-sensing-benchmark-datasets%0D%0A__%0D%0Avia www.catalyzex.com - latest in machine learning%0D%0A%0D%0A" target="_blank" rel="noreferrer"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="#ff8c00" aria-hidden="true" data-slot="icon" height="30" width="30"><title>Email Icon</title><path d="M1.5 8.67v8.58a3 3 0 0 0 3 3h15a3 3 0 0 0 3-3V8.67l-8.928 5.493a3 3 0 0 1-3.144 0L1.5 8.67Z"></path><path d="M22.5 6.908V6.75a3 3 0 0 0-3-3h-15a3 3 0 0 0-3 3v.158l9.714 5.978a1.5 1.5 0 0 0 1.572 0L22.5 6.908Z"></path></svg></a></div></section><div class="Search_seperator-line__4FidS"></div></div></section></div><section data-hydration-on-demand="true"></section></div></div><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"author":{"name":"Jingliang Hu","image_url":"https://ui-avatars.com/api/?name=Jingliang Hu\u0026background=BCBCBC\u0026color=3E3E3F\u0026size=256\u0026bold=true\u0026font-size=0.5\u0026format=png","personal_website":"https://www.catalyzex.com","bio":null,"email":"","search_results":[{"title":"CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention","content":"Referring medical image segmentation targets delineating lesions indicated by textual descriptions. Aligning visual and textual cues is challenging due to their distinct data properties. Inspired by large-scale pre-trained vision-language models, we propose CausalCLIPSeg, an end-to-end framework for referring medical image segmentation that leverages CLIP. Despite not being trained on medical data, we enforce CLIP's rich semantic space onto the medical domain by a tailored cross-modal decoding method to achieve text-to-pixel alignment. Furthermore, to mitigate confounding bias that may cause the model to learn spurious correlations instead of meaningful causal relationships, CausalCLIPSeg introduces a causal intervention module which self-annotates confounders and excavates causal features from inputs for segmentation judgments. We also devise an adversarial min-max game to optimize causal features while penalizing confounding ones. Extensive experiments demonstrate the state-of-the-art performance of our proposed method. 
Code is available at https://github.com/WUTCM-Lab/CausalCLIPSeg.","authors":["Yaxiong Chen","Minghong Wei","Zixuan Zheng","Jingliang Hu","Yilei Shi","Shengwu Xiong","Xiao Xiang Zhu","Lichao Mou"],"pdf_url":"http://arxiv.org/abs/2503.15949","paper_id":"2503.15949","link":"/paper/causalclipseg-unlocking-clip-s-potential-in","publication_date":"Mar 20, 2025","raw_publication_date":"2025-03-20","submission_date":"Mar 20, 2025","images":[],"arxiv_comment":"MICCAI 2024","journal_ref":null,"code_available":true,"slug":"causalclipseg-unlocking-clip-s-potential-in"},{"title":"UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation","content":"Automated radiology report generation aims to expedite the tedious and error-prone reporting process for radiologists. While recent works have made progress, learning to align medical images and textual findings remains challenging due to the relative scarcity of labeled medical data. For example, datasets for this task are much smaller than those used for image captioning in computer vision. In this work, we propose to transfer representations from CLIP, a large-scale pre-trained vision-language model, to better capture cross-modal semantics between images and texts. However, directly applying CLIP is suboptimal due to the domain gap between natural images and radiology. To enable efficient adaptation, we introduce UniCrossAdapter, lightweight adapter modules that are incorporated into CLIP and fine-tuned on the target task while keeping base parameters fixed. The adapters are distributed across modalities and their interaction to enhance vision-language alignment. Experiments on two public datasets demonstrate the effectiveness of our approach, advancing state-of-the-art in radiology report generation. The proposed transfer learning framework provides a means of harnessing semantic knowledge from large-scale pre-trained models to tackle data-scarce medical vision-language tasks. Code is available at https://github.com/chauncey-tow/MRG-CLIP.","authors":["Yaxiong Chen","Chuang Du","Chunlei Li","Jingliang Hu","Yilei Shi","Shengwu Xiong","Xiao Xiang Zhu","Lichao Mou"],"pdf_url":"http://arxiv.org/abs/2503.15940","paper_id":"2503.15940","link":"/paper/unicrossadapter-multimodal-adaptation-of-clip","publication_date":"Mar 20, 2025","raw_publication_date":"2025-03-20","submission_date":"Mar 20, 2025","images":[],"arxiv_comment":"MICCAI 2024 Workshop","journal_ref":null,"code_available":true,"slug":"unicrossadapter-multimodal-adaptation-of-clip"},{"title":"One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks","content":"Video object segmentation is crucial for the efficient analysis of complex medical video data, yet it faces significant challenges in data availability and annotation. We introduce the task of one-shot medical video object segmentation, which requires separating foreground and background pixels throughout a video given only the mask annotation of the first frame. To address this problem, we propose a temporal contrastive memory network comprising image and mask encoders to learn feature representations, a temporal contrastive memory bank that aligns embeddings from adjacent frames while pushing apart distant ones to explicitly model inter-frame relationships and stores these features, and a decoder that fuses encoded image features and memory readouts for segmentation. We also collect a diverse, multi-source medical video dataset spanning various modalities and anatomies to benchmark this task. 
Extensive experiments demonstrate state-of-the-art performance in segmenting both seen and unseen structures from a single exemplar, showing ability to generalize from scarce labels. This highlights the potential to alleviate annotation burdens for medical video analysis. Code is available at https://github.com/MedAITech/TCMN.","authors":["Yaxiong Chen","Junjian Hu","Chunlei Li","Zixuan Zheng","Jingliang Hu","Yilei Shi","Shengwu Xiong","Xiao Xiang Zhu","Lichao Mou"],"pdf_url":"http://arxiv.org/abs/2503.14979","paper_id":"2503.14979","link":"/paper/one-shot-medical-video-object-segmentation","publication_date":"Mar 19, 2025","raw_publication_date":"2025-03-19","submission_date":"Mar 19, 2025","images":[],"arxiv_comment":"MICCAI 2024 Workshop","journal_ref":null,"code_available":true,"slug":"one-shot-medical-video-object-segmentation"},{"title":"Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models","content":"Ultrasound video classification enables automated diagnosis and has emerged as an important research area. However, publicly available ultrasound video datasets remain scarce, hindering progress in developing effective video classification models. We propose addressing this shortage by synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. To this end, we introduce a latent dynamic diffusion model (LDDM) to efficiently translate static images to dynamic sequences with realistic video characteristics. We demonstrate strong quantitative results and visually appealing synthesized videos on the BUSV benchmark. Notably, training video classification models on combinations of real and LDDM-synthesized videos substantially improves performance over using real data alone, indicating our method successfully emulates dynamics critical for discrimination. Our image-to-video approach provides an effective data augmentation solution to advance ultrasound video analysis. Code is available at https://github.com/MedAITech/U_I2V.","authors":["Tingxiu Chen","Yilei Shi","Zixuan Zheng","Bingcong Yan","Jingliang Hu","Xiao Xiang Zhu","Lichao Mou"],"pdf_url":"http://arxiv.org/abs/2503.14966","paper_id":"2503.14966","link":"/paper/ultrasound-image-to-video-synthesis-via","publication_date":"Mar 19, 2025","raw_publication_date":"2025-03-19","submission_date":"Mar 19, 2025","images":[],"arxiv_comment":"MICCAI 2024","journal_ref":null,"code_available":true,"slug":"ultrasound-image-to-video-synthesis-via"},{"title":"Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning","content":"Few-shot video object segmentation aims to reduce annotation costs; however, existing methods still require abundant dense frame annotations for training, which are scarce in the medical domain. We investigate an extremely low-data regime that utilizes annotations from only a few video frames and leverages existing labeled images to minimize costly video annotations. Specifically, we propose a two-phase framework. First, we learn a few-shot segmentation model using labeled images. Subsequently, to improve performance without full supervision, we introduce a spatiotemporal consistency relearning approach on medical videos that enforces consistency between consecutive frames. Constraints are also enforced between the image model and relearning model at both feature and prediction levels. Experiments demonstrate the superiority of our approach over state-of-the-art few-shot segmentation methods. 
Our model bridges the gap between abundant annotated medical images and scarce, sparsely labeled medical videos to achieve strong video segmentation performance in this low data regime. Code is available at https://github.com/MedAITech/RAB.","authors":["Zixuan Zheng","Yilei Shi","Chunlei Li","Jingliang Hu","Xiao Xiang Zhu","Lichao Mou"],"pdf_url":"http://arxiv.org/abs/2503.14958","paper_id":"2503.14958","link":"/paper/reducing-annotation-burden-exploiting-image","publication_date":"Mar 19, 2025","raw_publication_date":"2025-03-19","submission_date":"Mar 19, 2025","images":[],"arxiv_comment":"MICCAI 2024","journal_ref":null,"code_available":true,"slug":"reducing-annotation-burden-exploiting-image"},{"title":"Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection","content":"Unsupervised anomaly detection using deep learning has garnered significant research attention due to its broad applicability, particularly in medical imaging where labeled anomalous data are scarce. While earlier approaches leverage generative models like autoencoders and generative adversarial networks (GANs), they often fall short due to overgeneralization. Recent methods explore various strategies, including memory banks, normalizing flows, self-supervised learning, and knowledge distillation, to enhance discrimination. Among these, knowledge distillation, particularly reverse distillation, has shown promise. Following this paradigm, we propose a novel scale-aware contrastive reverse distillation model that addresses two key limitations of existing reverse distillation methods: insufficient feature discriminability and inability to handle anomaly scale variations. Specifically, we introduce a contrastive student-teacher learning approach to derive more discriminative representations by generating and exploring out-of-normal distributions. Further, we design a scale adaptation mechanism to softly weight contrastive distillation losses at different scales to account for the scale variation issue. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, validating the efficacy of the proposed method. Code is available at https://github.com/MedAITech/SCRD4AD.","authors":["Chunlei Li","Yilei Shi","Jingliang Hu","Xiao Xiang Zhu","Lichao Mou"],"pdf_url":"http://arxiv.org/abs/2503.13828","paper_id":"2503.13828","link":"/paper/scale-aware-contrastive-reverse-distillation","publication_date":"Mar 18, 2025","raw_publication_date":"2025-03-18","submission_date":"Mar 18, 2025","images":[],"arxiv_comment":"ICLR 2025","journal_ref":null,"code_available":true,"slug":"scale-aware-contrastive-reverse-distillation"},{"title":"Rethinking Cell Counting Methods: Decoupling Counting and Localization","content":"Cell counting in microscopy images is vital in medicine and biology but extremely tedious and time-consuming to perform manually. While automated methods have advanced in recent years, state-of-the-art approaches tend to increasingly complex model designs. In this paper, we propose a conceptually simple yet effective decoupled learning scheme for automated cell counting, consisting of separate counter and localizer networks. In contrast to jointly learning counting and density map estimation, we show that decoupling these objectives surprisingly improves results. The counter operates on intermediate feature maps rather than pixel space to leverage global context and produce count estimates, while also generating coarse density maps. 
The localizer then reconstructs high-resolution density maps that precisely localize individual cells, conditional on the original images and coarse density maps from the counter. Besides, to boost counting accuracy, we further introduce a global message passing module to integrate cross-region patterns. Extensive experiments on four datasets demonstrate that our approach, despite its simplicity, challenges common practice and achieves state-of-the-art performance by significant margins. Our key insight is that decoupled learning alleviates the need to learn counting on high-resolution density maps directly, allowing the model to focus on global features critical for accurate estimates. Code is available at https://github.com/MedAITech/DCL.","authors":["Zixuan Zheng","Yilei Shi","Chunlei Li","Jingliang Hu","Xiao Xiang Zhu","Lichao Mou"],"pdf_url":"http://arxiv.org/abs/2503.13989","paper_id":"2503.13989","link":"/paper/rethinking-cell-counting-methods-decoupling","publication_date":"Mar 18, 2025","raw_publication_date":"2025-03-18","submission_date":"Mar 18, 2025","images":[],"arxiv_comment":"MICCAI 2024","journal_ref":null,"code_available":true,"slug":"rethinking-cell-counting-methods-decoupling"},{"title":"Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation","content":"Medical ultrasound imaging is ubiquitous, but manual analysis struggles to keep pace. Automated segmentation can help but requires large labeled datasets, which are scarce. Semi-supervised learning leveraging both unlabeled and limited labeled data is a promising approach. State-of-the-art methods use consistency regularization or pseudo-labeling but grow increasingly complex. Without sufficient labels, these models often latch onto artifacts or allow anatomically implausible segmentations. In this paper, we present a simple yet effective pseudo-labeling method with an adversarially learned shape prior to regularize segmentations. Specifically, we devise an encoder-twin-decoder network where the shape prior acts as an implicit shape model, penalizing anatomically implausible but not ground-truth-deviating predictions. Without bells and whistles, our simple approach achieves state-of-the-art performance on two benchmarks under different partition protocols. We provide a strong baseline for future semi-supervised medical image segmentation. Code is available at https://github.com/WUTCM-Lab/Shape-Prior-Semi-Seg.","authors":["Yaxiong Chen","Yujie Wang","Zixuan Zheng","Jingliang Hu","Yilei Shi","Shengwu Xiong","Xiao Xiang Zhu","Lichao Mou"],"pdf_url":"http://arxiv.org/abs/2503.13987","paper_id":"2503.13987","link":"/paper/striving-for-simplicity-simple-yet-effective","publication_date":"Mar 18, 2025","raw_publication_date":"2025-03-18","submission_date":"Mar 18, 2025","images":[],"arxiv_comment":"MICCAI 2024","journal_ref":null,"code_available":true,"slug":"striving-for-simplicity-simple-yet-effective"},{"title":"DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation","content":"Earth observation is a fundamental tool for monitoring the evolution of land use in specific areas of interest. Observing and precisely defining change, in this context, requires both time-series data and pixel-wise segmentations. To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the globe with imagery from Planet Labs. 
These observations are paired with pixel-wise monthly semantic segmentation labels of 7 land use and land cover (LULC) classes. DynamicEarthNet is the first dataset that provides this unique combination of daily measurements and high-quality labels. In our experiments, we compare several established baselines that either utilize the daily observations as additional training data (semi-supervised learning) or multiple observations at once (spatio-temporal learning) as a point of reference for future research. Finally, we propose a new evaluation metric SCS that addresses the specific challenges associated with time-series semantic change segmentation. The data is available at: https://mediatum.ub.tum.de/1650201.","authors":["Aysim Toker","Lukas Kondmann","Mark Weber","Marvin Eisenberger","Andr茅s Camero","Jingliang Hu","Ariadna Pregel Hoderlein","脟a臒lar 艦enaras","Timothy Davis","Daniel Cremers","Giovanni Marchisio","Xiao Xiang Zhu","Laura Leal-Taix茅"],"pdf_url":"http://arxiv.org/abs/2203.12560","paper_id":"2203.12560","link":"/paper/dynamicearthnet-daily-multi-spectral","publication_date":"Mar 23, 2022","raw_publication_date":"2022-03-23","submission_date":"Mar 23, 2022","images":["https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/7e2f67581458d2c17c5806df724a7706ed2c95e9/3-Table1-1.png","https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/7e2f67581458d2c17c5806df724a7706ed2c95e9/3-Table2-1.png","https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/7e2f67581458d2c17c5806df724a7706ed2c95e9/4-Figure2-1.png","https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/7e2f67581458d2c17c5806df724a7706ed2c95e9/7-Table3-1.png"],"arxiv_comment":"Accepted to CVPR 2022, evaluation webpage:\n https://codalab.lisn.upsaclay.fr/competitions/2882","journal_ref":null,"code_available":false,"slug":"dynamicearthnet-daily-multi-spectral"},{"title":"Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model","content":"As remote sensing (RS) data obtained from different sensors become available largely and openly, multimodal data processing and analysis techniques have been garnering increasing interest in the RS and geoscience community. However, due to the gap between different modalities in terms of imaging sensors, resolutions, and contents, embedding their complementary information into a consistent, compact, accurate, and discriminative representation, to a great extent, remains challenging. To this end, we propose a shared and specific feature learning (S2FL) model. S2FL is capable of decomposing multimodal RS data into modality-shared and modality-specific components, enabling the information blending of multi-modalities more effectively, particularly for heterogeneous data sources. Moreover, to better assess multimodal baselines and the newly-proposed S2FL model, three multimodal RS benchmark datasets, i.e., Houston2013 -- hyperspectral and multispectral data, Berlin -- hyperspectral and synthetic aperture radar (SAR) data, Augsburg -- hyperspectral, SAR, and digital surface model (DSM) data, are released and used for land cover classification. Extensive experiments conducted on the three datasets demonstrate the superiority and advancement of our S2FL model in the task of land cover classification in comparison with previously-proposed state-of-the-art baselines. 
Furthermore, the baseline codes and datasets used in this paper will be made available freely at https://github.com/danfenghong/ISPRS_S2FL.","authors":["Danfeng Hong","Jingliang Hu","Jing Yao","Jocelyn Chanussot","Xiao Xiang Zhu"],"pdf_url":"http://arxiv.org/abs/2105.10196","paper_id":"2105.10196","link":"/paper/multimodal-remote-sensing-benchmark-datasets","publication_date":"May 21, 2021","raw_publication_date":"2021-05-21","submission_date":"May 21, 2021","images":["https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/07fa32eb364f966fb4b44dcf9435adf165295c96/7-Figure1-1.png","https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/07fa32eb364f966fb4b44dcf9435adf165295c96/8-Table1-1.png","https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/07fa32eb364f966fb4b44dcf9435adf165295c96/9-Figure2-1.png","https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/07fa32eb364f966fb4b44dcf9435adf165295c96/17-Table2-1.png"],"arxiv_comment":null,"journal_ref":"ISPRS Journal of Photogrammetry and Remote Sensing, 2021","code_available":true,"slug":"multimodal-remote-sensing-benchmark-datasets"}]},"total":12,"userHasHiddenBanner":false,"isMobile":false,"currentBrowser":"","canonicalUrl":"https://www.catalyzex.com/author/Jingliang%20Hu"},"__N_SSP":true},"page":"/author/[name]","query":{"name":"Jingliang Hu"},"buildId":"rcP1HS6ompi8ywYpLW-WW","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[7336,5501],"gssp":true,"scriptLoader":[]}</script></body></html>