<!DOCTYPE html> <html> <head> <title>Madelon Hulsebos</title> <!-- Begin Jekyll SEO tag v2.8.0 --> <meta name="generator" content="Jekyll v3.10.0" /> <meta property="og:title" content="Madelon Hulsebos" /> <meta name="author" content="Madelon Hulsebos" /> <meta property="og:locale" content="en_US" /> <link rel="canonical" href="http://madelonhulsebos.github.io/" /> <meta property="og:url" content="http://madelonhulsebos.github.io/" /> <meta property="og:site_name" content="Madelon Hulsebos" /> <meta property="og:type" content="website" /> <link rel="next" href="http://madelonhulsebos.github.io/news/page2/" /> <meta name="twitter:card" content="summary" /> <meta property="twitter:title" content="Madelon Hulsebos" /> <script type="application/ld+json"> {"@context":"https://schema.org","@type":"WebSite","author":{"@type":"Person","name":"Madelon Hulsebos"},"headline":"Madelon Hulsebos","name":"Madelon Hulsebos","url":"http://madelonhulsebos.github.io/"}</script> <!-- End Jekyll SEO tag --> <link rel="stylesheet" type="text/css" href="/assets/style.css" /> <link rel="alternate" type="application/rss+xml" title="Madelon Hulsebos" href="/feed.xml" /> <link rel="canonical" href="http://madelonhulsebos.github.io/" /> <link rel="stylesheet" href="/assets/css/academicons.css" /> <meta name="theme-color" content="#000000"> <script async src="https://www.googletagmanager.com/gtag/js?id=G-3JGXB637F3"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-3JGXB637F3'); </script> </head> <body> <div id="bar"></div> <div class="wrapper-container"> <div class="wrapper-masthead"> <div class="container"> <header class="masthead clearfix"> <div class="site-info"> <h1 class="site-name"><a href="/" style="color: black"><b>Madelon Hulsebos</b></a></h1> <!-- <p class="site-description"></p> --> </div> <nav> <a href="/" style="color: black">Home</a> <!-- <a href="/projects">Projects</a>--> <a 
href="/talks" style="color: black">Talks</a> <a href="/news" style="color: black">News</a> <a href="/upcoming" style="color: black">Upcoming</a> <a href="/assets/CV.pdf" target="_blank" style="color: black">CV</a> <a href="https://trl-lab.github.io" target="blank" style="color: black">TRL Lab</a> <!-- <a href="/trl-index" style="color: rgb(118, 116, 116)">TRL index</a> --> </nav> </header> </div> </div> <div class="wrapper-main"> <div id="main" role="main" class="container"> <article class="page"> <h4> </h4> <div style="text-align: center;"> <img width="18%" src="/images/madelon-hs.jpeg"/> </div> <div style="text-align: justify;"> <div style="text-align: justify;"> <h2>About me</h2> I'm a faculty member at <a href="https://www.cwi.nl/en/" target="blank">CWI</a>, where I lead the <a href="https://trl-lab.github.io">Table Representation Learning (TRL) Lab</a> and am a member of the Database Architectures group. I'm also an <a href="https://ellis.eu/" target="blank">ELLIS</a> faculty member with the Amsterdam unit. Previously, I was a postdoctoral fellow at UC Berkeley and obtained my PhD at the University of Amsterdam with research at Sigma Computing and MIT. My research has been supported by an NWO AiNed grant (>$1M), an Accenture-BIDS Fellowship, and industry sponsors. Before academia, I spent 2+ years in industry working on automating data analysis pipelines with ML. <p> My research is focused on Table Representation Learning (TRL) and generative models for tabular data, as tables are prevalent in the data landscape, contain valuable data, and power important decisions in organizations such as governments, enterprises, and hospitals. The objective: <i>democratizing insights from structured data</i> ✨. </p> <p> To establish tabular data as a key modality for AI, akin to images and text, I've been driving some TRL initiatives since 2020. 
In particular, I founded the <a href="https://table-representation-learning.github.io" target="blank">Table Representation Learning workshop</a> series at NeurIPS and ACL, and the <a href="https://ivi.fnwi.uva.nl/ellis/research/table-representation-learning/" target="blank">TRL research theme</a> at the ELLIS unit Amsterdam. I organize various related efforts at <a href="https://www.dagstuhl.de/seminars/seminar-calendar/seminar-details/25182" target="blank">Dagstuhl</a>, SIGMOD and beyond, and review for tracks/workshops at e.g. VLDB, SIGMOD, NeurIPS, ICLR, WWW. Read more in my <a href="https://madelonhulsebos.github.io/assets/CV.pdf" target="_blank">CV</a>. </p> <div> <div style="text-align: center;"> <a href="mailto:madelon@berkeley.edu" target="_blank"><i class="svg-icon email" target="_blank"></i></a> <a href="https://scholar.google.com/citations?user=6IWQn2EAAAAJ" target="_blank"><i class="svg-icon googlescholar"></i></a> <a href="https://bsky.app/profile/madelonhulsebos.bsky.social" target="_blank"><i class="svg-icon bluesky"></i></a> <a href="https://www.linkedin.com/in/madelonhulsebos" target="_blank"><i class="svg-icon linkedin"></i></a> <a href="https://github.com/madelonhulsebos" target="_blank"><i class="svg-icon github"></i></a> </div> </div> </p> </div> <p> <h2>Research interests</h2> <!-- <b>Research interests</b><br> --> I'm generally interested in Table Representation Learning (TRL), with a particular focus on: <li>(Relational) Table Embeddings</li> <li>Generative Models (e.g. LLMs) for Relational Data, e.g. 
for QA/text2sql, data wrangling, etc.</li> <li>Retrieval over Tabular Data sources (data lakes, relational databases)</li> <li>End-to-end (Agentic) Systems for Data Analysis and Data Science over Tabular Data</li> To stay updated on relevant events/updates from the broader TRL community, consider joining the <a href="https://discord.com/invite/E4AHvPKhxw" target="blank">TRL Discord</a>, <a href="https://bsky.app/profile/trl-research.bsky.social" target="blank">TRL Bluesky</a>, and <a href="https://forms.gle/Lg1twbWvV8w83zH38" target="blank">TRL mailing list</a>. </p> <h2>Selected projects</h2> <div style="text-align: justify;"> <h3></h3> <!-- <center> - </center> --> <p> The projects below reflect my main research interests, but I enjoy working on other topics too. Check <a href="https://scholar.google.com/citations?user=6IWQn2EAAAAJ&hl=nl&oi=ao" target="_blank"> my profile on Google Scholar </a> for my full publication record. <div> <b>TARGET Benchmark</b> [TRL@NeurIPS, 2024]<br> The first benchmark for evaluating table retrieval methods in generative pipelines (e.g. RAG) over structured data.<br> <a href="https://target-benchmark.github.io/static/pdfs/TARGET_V1.pdf" target="_blank">paper</a> | <a href="https://target-benchmark.github.io/" target="_blank">website</a> <br> </div> <br> <div> <b>Dataset Search</b> [HILDA@SIGMOD, 2024]<br> 1) Survey results surfacing why, what, and how people search for data, along with open challenges and system desiderata.<br> 2) System (tbc). <br> <a href="/assets/dataset_search_survey.pdf" target="_blank">1) survey paper</a> <br> </div> <br> <div> <b>SchemaPile</b> [SIGMOD, 2024] <br> A dataset of approximately 221K real-world database schemas extracted from SQL files from GitHub. 
<br> <a href="https://dl.acm.org/doi/pdf/10.1145/3654975" target="_blank">paper</a> | <a href="https://zenodo.org/records/10931803" target="_blank">dataset</a> | <a href="https://github.com/tdoehmen/gitschemas" target="_blank">code</a> </div> <br> <div> <b>Observatory</b> [PVLDB, TRL@NeurIPS, 2023]<br> 1) Framework for analyzing table embeddings based on the relational model, and desiderata for TRL models.<br> 2) Library for extracting table embeddings at the row, column, and cell level. <br> <a href="https://arxiv.org/abs/2310.07736" target="_blank">1) analysis paper</a> | <a href="/assets/observatory_library.pdf" target="_blank">2) library paper</a> | <a href="https://github.com/superctj/observatory" target="_blank">code</a> <br> </div> <br> <div> <b>GitTables</b> [SIGMOD, 2023] <br> Corpus of 1.7M relational tables extracted from GitHub CSVs. Columns annotated with semantic types.<br> <a href="https://dl.acm.org/doi/pdf/10.1145/3588710" target="_blank">paper</a> | <a href="https://gittables.github.io" target="_blank">website</a> | <a href="https://zenodo.org/record/6517052" target="_blank">dataset</a> | <a href="https://github.com/madelonhulsebos/gittables" target="_blank">code</a> | <a href="https://www.youtube.com/watch?v=jEBKcmdIFzw" target="_blank">video presentation</a> | <a href="/assets/GitTables (slides).pdf" target="_blank">slides</a> | <a href="https://disseminatepodcast.podcastpage.io/episode/madelon-hulsebos-gittables-a-large-scale-corpus-of-relational-tables-36" target="blank">podcast</a> </div> <br> <div> <b>AdaTyper</b> [CIDR, 2022]<br> Adaptive semantic column type detection system focusing on productization in industry contexts.<br> <a href="https://arxiv.org/pdf/2311.13806" target="_blank">paper</a> | <a href="https://www.youtube.com/watch?v=-BE5rWNMXnU">video presentation</a> </div> <!-- <br> <div> <b>Sato</b> [VLDB, 2020]<br> Method for semantic data type detection that takes column context into account, extends Sherlock. 
<br> <a href="https://arxiv.org/pdf/1911.06311.pdf" target="_blank">paper</a> | <a href="https://github.com/megagonlabs/sato" target="_blank">code</a> </div> --> <br> <div> <b>Sherlock</b> [KDD, 2019]<br> DL method for semantic data type detection of table columns (<a href="https://github.com/orgs/mitmedialab/repositories?q=&type=all&language=&sort=stargazers" target="_blank">top-5 MIT Media Lab repos</a>, 2 Aug 23).<br> <a href="https://arxiv.org/pdf/1905.10688.pdf" target="_blank">paper</a> | <a href="https://sherlock.media.mit.edu/" target="_blank">website</a> | <a href="https://github.com/mitmedialab/sherlock-project" target="_blank">code</a> </div> <br> <div> <b>VizNet</b> [CHI, 2019]<br> Corpus of over 31 million datasets from open data repositories, for benchmarking visualization studies. <br> <a href="https://arxiv.org/abs/1905.04616.pdf" target="_blank">paper</a> | <a href="https://viznet.media.mit.edu/" target="_blank">website</a> </div> <br> </p> </div> <h2>Recent news</h2> <div style="text-align: justify;"> <li style="list-style-type: '- ';"> <a href="/ellis-trl-theme/">Launched a Table Representation Learning initiative at ELLIS unit Amsterdam</a> <p style="display:inline"> Nov 27, 2024 </p> <p>Very excited to have established the new research theme on Table Representation Learning at <a href="https://ivi.fnwi.uva.nl/ellis/research/table-representation-learning/" target="blank">ELLIS unit Amsterdam</a>! Building on this, I will soon organize a monthly seminar and workshops on TRL. 
The first <a href="https://sites.google.com/view/rl-and-gm-for-sd" target="blank">ELLIS workshop around TRL</a> is on 27 February in Amsterdam!</p> </li> </div> <div style="text-align: justify;"> <li style="list-style-type: '- ';"> <a href="/ellis-membership/">Joined the ELLIS society</a> <p style="display:inline"> Aug 27, 2024 </p> <p>As I am moving back from the Bay Area to Amsterdam – permanently, this time :) – I am very excited to become a member of the European Laboratory for Learning and Intelligent Systems (<a href="https://ellis.eu/">ELLIS</a>) society, and strengthen the roots of my research around TRL in Europe. Expect some “TRL in Europe” initiatives to come soon..!</p> </li> </div> <div style="text-align: justify;"> <li style="list-style-type: '- ';"> <a href="/trl-2024/">Table Representation Learning workshop @ NeurIPS 2024</a> <p style="display:inline"> Aug 26, 2024 </p> <p>Excited to organize the third edition of the <a href="https://table-representation-learning.github.io/" target="blank">Table Representation Learning workshop at NeurIPS 2024</a>! Besides hosting the latest work on models/performance/applications of representation learning and generative models over tables (looking forward!), we’ll have some exciting announcements to share with the TRL community..</p> </li> </div> <div style="text-align: justify;"> <li style="list-style-type: '- ';"> <a href="/sigmod-recap/">SIGMOD recap!</a> <p style="display:inline"> Jun 20, 2024 </p> <p>Attended SIGMOD in Chile! It was fun and busy, Santiago was lovely. Co-organized the DEEM workshop and presented the <a href="https://www.madelonhulsebos.com/assets/dataset_search_survey.pdf" target="blank">survey on dataset search in practice</a> at HILDA. 
Generally, discussions and talks about retrieval, (structured) data semantics, and vector databases had my particular interest this year.<br /></p> </li> </div> <div style="text-align: justify;"> <li style="list-style-type: '- ';"> <a href="/cwi-ained-datalibra/">Awarded AiNed Fellowship Grant funding 5-year research project at CWI</a> <p style="display:inline"> Mar 20, 2024 </p> <p>Thrilled to share that I’ve been awarded the <a href="https://www.nwo.nl/nieuws/vijf-ained-fellowship-beurzen-toegewezen" target="blank">AiNed Fellowship Grant</a> (worth $1M) to lead the 5-year DataLibra research project at CWI in Amsterdam starting fall 2024. DataLibra is focused on democratizing insight retrieval from structured data through representation learning and generative models for relational tables.</p> </li> </div> </article> </div> </div> </div> </body> </html>