<!doctype html> <html class="no-js" lang=""> <head> <meta charset="utf-8"> <title>Katherine A. Keith</title> <meta name="description" content=""> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta property="og:title" content=""> <meta property="og:type" content=""> <meta property="og:url" content=""> <meta property="og:image" content=""> <link rel="manifest" href="site.webmanifest"> <link rel="apple-touch-icon" href="icon.png"> <link rel="shortcut icon" href="favicon.ico" type="image/vnd.microsoft.icon"> <!-- Place favicon.ico in the root directory --> <link rel="stylesheet" href="css/normalize.css"> <link rel="stylesheet" href="css/main.css"> <link rel="stylesheet" type="text/css" href="//fonts.googleapis.com/css?family=Nunito" /> <!-- add icons --> <link rel="stylesheet" href="https://cdn.rawgit.com/jpswalsh/academicons/master/css/academicons.min.css"> <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css"> <script src="https://kit.fontawesome.com/f1a62b90a7.js" crossorigin="anonymous"></script> <meta name="theme-color" content="#fafafa"> <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script> <!-- CREATE SCROLLABLE CONTENT --> <style> /*stuff for the scrollbar*/ .scrollable { height: 400px; /* or any value */ overflow-y: scroll; } .scrollarea-content { min-height:101%; } ::-webkit-scrollbar { -webkit-appearance: none; width: 7px; } ::-webkit-scrollbar-thumb { border-radius: 4px; background-color: rgba(0, 0, 0, .5); box-shadow: 0 0 1px rgba(255, 255, 255, .5); } </style> </head> <body> <!-- Add your site or application content here --> <!--Website header--> <div id="introduction-wrapper"> <img id="main-profile-pic" src="./images/kk.jpg" alt="Katherine Keith"> <div id="bio"> <h1>Katherine A. 
Keith</h1> <p></p> <p style="margin: 0"> <i class="fa-solid fa-envelope fa-lg"></i> kak5 (at) williams (dot) edu</p> <p style="margin: 0"><i class="fa-solid fa-file fa-lg"></i> <a href="vita/cv.pdf">curriculum vitae (CV)</a></p> <p></p> <div class="social"> <span class="schol"><a href="https://www.semanticscholar.org/author/Katherine-A.-Keith/145137850"><i class="ai ai-semantic-scholar-square ai-2x"></i></a> </span> <span class="gh"><a href="https://scholar.google.com/citations?user=8848cTsAAAAJ&hl=en"><i class="ai ai-google-scholar-square ai-2x"></i> </a> </span> <span class="lin"><a href="https://www.linkedin.com/in/katherine-keith-96b66b44/"><i class="fa-brands fa-linkedin fa-2x"></i></a> </span> <span class="gh"><a href="https://github.com/kakeith"><i class="fa-brands fa-github fa-2x"></i> </a> </span> <span class="bsky"><a rel="me" href="https://bsky.app/profile/katakeith.bsky.social"><i class="fa-brands fa-bluesky fa-2x"></i></a> </span> <!-- </a> </span> <span class="twit"><a href="https://twitter.com/katakeith"><i class="fab fa-twitter-square fa-2x"></i></a> </span> <span class="mast"><a rel="me" href="https://sigmoid.social/@katakeith"><i class="fab fa-mastodon fa-2x"></i></a> </span> --> </div> </div> </div> <!--end of website header--> <!--Navigation bar--> <div id="menu"> <div> <a href="index.html">Home</a> </div> <div> <a href="research.html">Research</a> </div> <div> <a href="teaching.html">Teaching</a> </div> <script> $(function () { const current = window.location.pathname.split("/"); $('#menu div a').each(function(){ const $this = $(this); // if the current path is like this link, make it active if (current.indexOf($this.attr("href")) !== -1) { $this.parent().addClass("now"); } }) }); </script> </div> <!--end of Navigation bar--> <!--Actual tab content--> <div id="main"> <!-- Main bio --> <div> <p> I am an Assistant Professor in the Computer Science department at <a href="https://www.williams.edu/">Williams College</a>. 
Previously, I was a Postdoctoral Young Investigator at the <a href="https://allenai.org/">Allen Institute for Artificial Intelligence</a>. I received a PhD from the <a href="https://www.cics.umass.edu/">Manning College of Information and Computer Sciences</a> at the <a href="https://www.umass.edu/">University of Massachusetts Amherst</a> in August 2021, and I am grateful to have been supported by a <a href="https://ds.cs.umass.edu/news/cics-doctoral-student-wins-bloomberg-data-science-fellowship">Bloomberg Data Science PhD Fellowship</a> from 2019 to 2021. </p> <!-- <p> <center> <img src="attach/bridges.png" alt="Research goals" width="360" align="center" style='border:5px solid white'> </center> </p> --> <p> My work lies at the intersection of natural language processing—programming a computer to understand, classify, and generate human language—and data science—finding patterns in data that help us better understand the world. Specifically, I work on methods and applications that help draw quantitative social science conclusions from large-scale text-based datasets, aligning with the subfields of text-as-data and computational social science. I focus on language because it is one of the most salient forms of expressing thoughts, sharing ideas, and recording history. My work often involves collaborations with other scholars in political science, legal studies, education, sociology, and other social science fields. </p> <p> One of my primary research goals is to expand the set of text-specific causal estimation designs available to data scientists. Causal inference is the theoretical framework underlying policy and interventions: what would happen to variable A if we intervened and set variable B to something different? I focus on causal inference (primarily using non-experimental/observational datasets) because causal inference frameworks allow us to make our assumptions explicit before moving to statistical estimation. 
</p> <p> Incorporating variables with text into causal inference is difficult because language is high-dimensional and complex, and often even our best natural language processing methods result in measurement error. Improved text-based causal methods could help many disciplines that want to understand the mechanisms of human behavior that intersect with speech or text communication. </p> <p> Outside of research, I am also interested in projects that build bridges between natural language processing and the social sciences. In line with this goal, <a href="https://ianbstewart.github.io/">Ian Stewart</a> and I organized the <a href="https://nlp-css-201-tutorials.github.io/nlp-css-201-tutorials/">NLP+CSS 201 Online Tutorial Series</a> and I have helped organize the <a href="https://sites.google.com/site/nlpandcss/">NLP+CSS Workshops</a>. <!-- These videos now have over 5000 cummulative views on <a href="https://www.youtube.com/channel/UCcFcF9DkanjaK3HEk7bsd-A">our YouTube channel</a>. </p> <p> <img src="attach/diaries-icon.jpeg" alt="Podcast logo" width="60" height="60" align="left" style='border:5px solid white'> --> I also hosted the podcast <a href="https://anchor.fm/diaries-soc-data-research">Diaries of Social Data Research</a> with <a href="https://lucy3.github.io/">Lucy Li</a> in which interdisciplinary scholars described the "behind the scenes" diaries of their published papers and tips for success. </p> </div> <h2>Recent News</h2><p>(scroll down for more)</p> <div class="scrollable"> <div id='scrollarea-content'> <li> January 23, 2025 — My student, Jeeyon Kang, and I released <a href="https://github.com/jeeyonkang/Supreme-Court-Dataset-Extended">open-source code</a> to scrape and update Supreme Court oral argument transcripts to the present day. 
</li> <li> October 14, 2024 — I gave an invited talk at Stuttgart University in Germany [<a href="attach/stuttgart-talk.pdf">slides</a>] </li> <li> August 5, 2024 — I presented our work on RCT rejection sampling at JSM [<a href="https://docs.google.com/presentation/d/1imxZoTVTUuDxl4P29oGfFN1-g-O4geVDN8jEw7eqcrk/edit?usp=sharing">slides</a>]</li> <li> July 22, 2024 — Excited that collaborators and I have been awarded $744,446 from <a href="https://www.masslifesciences.com/healey-driscoll-administration-announces-more-than-13-million-to-support-life-sciences-workforce-development-and-stem-education-initiatives/">MLSC</a>! </li> <li> June 17-21, 2024 — I am in Mexico City for NAACL where I am co-organizing the <a href="https://sites.google.com/site/nlpandcss/nlp-css-at-naacl-2024">NLP+CSS workshop</a>.</li> <li> May 16, 2024 — I gave a talk at <a href="https://sci-info.org/annual-meeting/program/agenda/">ACIC 2024</a> during the "Statistical Innovations in Causal Inference with Text as Data: Emerging Trends and Future Directions" invited session <a href="https://docs.google.com/presentation/d/1GuzwV8hAuK2o13nohjV_kdEU6IAPJ9B4DqqMKf7npbs/edit?usp=sharing">[slides]</a>.</li> <li> February 8, 2023 — I gave an invited talk for the <a href="https://liu.se/en/organisation/liu/iei/ias">Institute for Analytical Sociology</a>'s Seminar series entitled "Melding NLP and Causal Inference" <a href="https://docs.google.com/presentation/d/1fH7at-azJCKdpQqE9eA7_8Cluwph3HeCgr7EEsz0pzU/edit?usp=drive_link">[slides] </a> <a href="https://www.youtube.com/watch?v=csKVmdTOx4E">[recording]</a>. </li> <li> December, 2023 — I've migrated to <a href="https://bsky.app/">Bluesky</a>.</li> <li> November 10-11, 2023 — I presented our work <a href="attach/tada2023-gatekeepers.pdf">[slides]</a> and was a discussant <a href="attach/tada2023-discussant.pdf">[slides]</a> at <a href="https://tada2023.org/">TADA 2023</a>. 
</li> <li> October 25, 2023 — I gave an online guest lecture for <a href="https://web.stanford.edu/class/cs293/">Dora Demszky's CS 293 / EDUC 473: Empowering Educators via Language Technology</a> at Stanford. </li> <li> September 27, 2023 — Here's an <a href="https://www.e-ir.info/2023/09/22/interview-katherine-keith/">interview</a> with me for E-International Relations discussing computational social science and text-as-data. </li> <li> September 8, 2023 — I gave a <a href="https://cs.uiowa.edu/event/125351/0">talk</a> for University of Iowa's Computer Science Colloquium. </li> <li> July 18, 2023 — I gave a talk to the Summer Science Program at Williams College <a href="https://docs.google.com/presentation/d/1YUaXkLfTdnh5g4i1v2xsvN4rSNVKepSli_hg1QuSBdc/edit?usp=sharing">[slides].</a> </li> <li> June 5, 2023 — Congrats to my NLP students David Goetze and Mark Bissell who won the 2023 Rich Ward Prize for Best Student Project in Computer Science for their NLP project "RAGdoll: A Retrieval-Augmented Generation System for Williams College Academic Advising." This prize is given to only one project group across all of Williams computer science for a given academic year. </li> <li>March 20, 2023 — <a href="https://osf.io/preprints/socarxiv/4dngy/">Our paper</a> was <a href="https://pca.st/episode/32e88847-2662-45af-9acd-99346e3070f7?t=2657">mentioned on Strict Scrutiny</a>, a podcast about the Supreme Court. </li> <li> January 6, 2023 — Congrats to <a href="https://lucy3.github.io/">Lucy Li</a>, my mentee at AI2 during the summer of 2022, for winning <a href="https://allenai.org/outstanding-interns">AI2's Outstanding Intern of the Year Award</a>. Check out the pre-print of Lucy's summer internship project <a href="https://arxiv.org/pdf/2212.09676.pdf">here</a>. </li> <li> December 5-12, 2022 — I am traveling to Abu Dhabi, UAE to attend <a href="https://2022.emnlp.org/">EMNLP</a>! 
I am one of the organizers of the <a href="https://sites.google.com/site/nlpandcss/home/nlp-css-at-emnlp-2022?authuser=0">NLP+CSS Workshop</a> and one of my collaborators is presenting <a href="https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00511/113490/Causal-Inference-in-Natural-Language-Processing">our TACL paper</a>. </li> <li> November 2022 — I am honored to have been awarded a <a href="https://faculty.williams.edu/faculty-support-and-resources/recently-awarded-faculty-grants/">Young Investigator Grant from the Allen Institute for Artificial Intelligence</a>. </li> <li> October 19, 2022 — I was the speaker at the <a href="https://events.williams.edu/event/automated-event-extraction-for-news-based-counterdata-by-prof-katie-keith-cs-at-williams-college/">Williams College Statistics Colloquium</a> <a href="attach/stats-talk.pdf">[slides]</a>. <!-- Keynote slides attach/stats-talk.key--> </li> <li> October 6, 2022 — I am presenting <a href="attach/tada2022-poster.pdf">a poster </a> at <a href="https://tada2022.org/">TADA 2022</a>. </li> <li> August 30, 2022 — Excited to be on the organizing committee for the <a href="https://sites.google.com/site/nlpandcss/home/nlp-css-at-emnlp-2022/call-for-papers-nlp-css-2022">NLP+CSS workshop at EMNLP</a> this year. Submit your papers by September 19! </li> <li> August 9, 2022 — I was featured in <a href="https://t.co/y1uMa89thL">a blog post by AI2</a>. </li> <li> July 15, 2022 — I started my position at <a href="https://www.williams.edu/">Williams College</a> and have moved to beautiful <a href="https://williamstownma.gov/">Williamstown, MA</a>. </li> <li> May 24, 2022 — I presented a poster at the <a href="https://ctml.berkeley.edu/american-causal-inference-conference-2022">2022 American Causal Inference Conference (ACIC)</a>. 
</li> <li> March 30, 2022 — I created and presented a <a href="https://nlp-css-201-tutorials.github.io/nlp-css-201-tutorials/">tutorial</a>, "Aggregated Classification Pipelines: Propagating Probabilistic Assumptions from Start to Finish," for our NLP+CSS 201 tutorial series. </li> <li> March 28, 2022 — I presented a guest lecture for <a href="https://www.cc.gatech.edu/classes/AY2022/cs6471_spring/">Diyi Yang's Computational Social Science seminar</a> at Georgia Tech. Slides <a href="attach/2022-03-28-css-lecture.pdf">here</a>. </li> <li> November 12, 2021 — I moderated the panel discussion at the <a href="https://causaltext.github.io/2021/program/">CI+NLP Workshop at EMNLP</a>. Here are some <a href="https://twitter.com/katakeith/status/1460383339127312388">paraphrased highlights</a> from our discussion. </li> <li> November 9, 2021 — Lucy Li and I were <a href="https://podcasts.apple.com/us/podcast/lucy-li-and-katie-keith/id1533729474?i=1000544487959">interviewed on the radio show "The Graduates"</a> about our podcast, research, and our field more broadly. </li> <li> October 28-29, 2021 — I presented two posters at <a href="https://tada2021.org/">TADA</a>, held at the University of Michigan, Ann Arbor. </li> <li> September 24, 2021 — I am co-organizing (with <a href="https://ianbstewart.github.io/">Ian Stewart</a>) <a href="https://nlp-css-201-tutorials.github.io/nlp-css-201-tutorials/">NLP+CSS 201: Beyond the basics</a>, an online hands-on tutorial series about advanced methods in natural language processing and computational social science. We are grateful to have received assistance for this series from the <a href="https://www.ssrc.org/">Social Science Research Council (SSRC)</a>/<a href="https://sicss.io/">Summer Institutes in Computational Social Science (SICSS)</a> Research Grant. </li> <li>August 17, 2021 — I successfully defended my PhD dissertation (<a href="https://youtu.be/gSclWsDYumg">recording of the defense</a>)! 
</li> <li>June 30, 2021 — Our <a href="https://github.com/slanglab/IndiaPoliceEvents">IndiaPoliceEvents corpus</a> was featured in the <a href="https://www.data-is-plural.com/archive/2021-06-30-edition/">Data Is Plural</a> newsletter. </li> <li>June 14-18, 2021 — I am participating in the <a href="https://sicss.io/2021/princeton/people">Summer Institute for Computational Social Science</a> this week. Here are <a href="attach/sicss-flash-talk.pdf">slides</a> for the flash talk I gave during the event. </li> <li>June 14, 2021 — <a href="https://lucy3.github.io/">Lucy Li</a> and I launched our new podcast, <a href="https://anchor.fm/diaries-soc-data-research">Diaries of Social Data Research</a>, where we probe the “research diaries” of scholars in computational social science and adjacent fields with the hope of normalizing the challenges of and increasing accessibility in academia. </li> <li> January 2021 — I am thankful to have successfully made it through the academic job market! Here are my <a href="attach/kkeith-research-statement.pdf">research statement</a>, <a href="attach/kkeith-teaching-statement.pdf">teaching statement</a>, <a href="attach/kkeith-diversity-statement.pdf">diversity statement</a>, <a href="attach/kkeith_cover_williams.pdf">Williams cover letter</a>, and <a href="attach/kkeith-job-talk-final.pdf">job talk slides</a>. Please reach out if you have questions about the job market process (especially for computer science at liberal arts colleges). </li> <li> December 2020 — I was named an <a href="https://www.aclweb.org/anthology/2020.emnlp-main.0.pdf">Outstanding Reviewer for EMNLP 2020</a>. </li> <li> November 2020 — I am on the organizing committee for the <a href="https://causaltext.github.io/2021/">1st Workshop on Causal Inference & NLP</a> which will be held at EMNLP 2021. 
</li> <li> April 2020 — <a href="https://andrewhalterman.com/">Andy Halterman</a>, <a href="https://people.cs.umass.edu/~smsarwar/">Sheikh Sarwar</a>, and I were awarded a $5,000 <a href="https://www.kaggle.com/open-data-research-grant-2020-awardees#project-title-13">Kaggle Open Data Research Grant</a> for "Semantic Role Annotations For Real-World Political Texts." </li> <li> May 2020 — <a href="https://twitter.com/yudapearl/status/1259296375398731776">Judea Pearl tweeted about our paper!</a> </li> <li> Spring 2020 — I joined and am actively working with the <a href="https://www.umass.edu/diversitysciences/rebls-network">REBLS (Research, Educator, Business Leaders, and Students) Network</a> to increase access and opportunity for underrepresented students in computer science and engineering! </li> <li> January 1, 2020 — It has been my distinct pleasure to serve as Co-Chair of <a href="http://cswomenumass.github.io">CSWomen</a>, a group of graduate women in computer science here at UMass, for two semesters! Here is a <a href="http://cswomenumass.github.io/fall2019_semester_recap/">recap</a> of the events we organized last semester. </li> <li> December 6, 2019 — I passed <a href="https://www.cics.umass.edu/grads/portfolio">portfolio</a> (the equivalent of a PhD candidacy exam) <em>with distinction.</em> </li> <li> November 15, 2019 — I was the invited speaker at <a href="https://events.williams.edu/event/computer-science-colloquium-katherine-keith-umass-amherst/">Williams College's Computer Science Colloquium</a>. </li> <li> August 2019 — Honored to have received a <a href="https://twitter.com/TechAtBloomberg/status/1166839978946695173"> Bloomberg Data Science PhD Fellowship</a>. Here's a <a href="https://ds.cs.umass.edu/news/cics-doctoral-student-wins-bloomberg-data-science-fellowship">profile</a> from UMass's Center for Data Science. </li> <li> July 2019 — Really enjoyed mentoring three undergrad researchers this summer. 
Check out a <a href="https://www.instagram.com/p/B0RB7jchvb6/"> profile </a> on their experience. </li> <li> September 24, 2018 — I was the invited speaker at <a href="https://college.lclark.edu/live/events/291985-natural-language-processing">Lewis & Clark College's Mathematical Sciences Colloquium</a>. </li> <li> September 2018 — <a href="https://sites.google.com/site/sulinblodgett/"> Su Lin Blodgett,</a> <a href="https://www.abehandler.com/"> Abe Handler,</a> and I co-designed and co-taught <a href="https://github.com/sblodgett/ai-ethics"> Ethical Issues Surrounding Artificial Intelligence Systems and Big Data, </a> a semester-long first-year computer science seminar at UMass. </li> <li> May 2018 — I am interning with Amanda Stent at <a href="https://www.techatbloomberg.com/post-topic/data-science/"> Bloomberg L.P.</a> in New York this summer. </li> <li> November 2017 — I helped design and organize our college's first <a href="https://github.com/thelimeburner/cics-male-allyship-workshops"> Male Ally Workshop </a> for graduate computer science students. </li> </div> </div> <h3>Misc.</h3> <p> In the past, I really enjoyed studying Chinese. I lived for twelve months in Kinmen, Taiwan on a <a href="http://taiwan-etaprogram.org">Fulbright English Teaching Assistantship</a>, and I completed a <a href="https://cetacademicprograms.com/programs/china/chinese-language-beijing-china/">language-immersion study abroad program</a> in Beijing, China during a semester in undergrad. </p> <p> In my free time, I enjoy recreating outside! I grew up in Montana, where I learned to love trail-running, sport climbing, triathlons, and all types of skiing, particularly cross-country skate skiing, alpine skiing, and backcountry touring. </p> </div> <script src="js/vendor/modernizr-3.11.2.min.js"></script> <script src="js/plugins.js"></script> <script src="js/main.js"></script> </body> </html>