Benjamin Newman

<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>Benjamin Newman</title> <meta name="description" content=""> <meta name="author" content="">  <link rel="shortcut icon" href="img/favicon.ico" type="image/x-icon"> <link rel="apple-touch-icon" href="img/apple-touch-icon.png"> <link rel="apple-touch-icon" sizes="72x72" href="img/apple-touch-icon-72x72.png"> <link rel="apple-touch-icon" sizes="114x114" href="img/apple-touch-icon-114x114.png">  <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">  <link rel="stylesheet" type="text/css" href="fonts/font-awesome/css/font-awesome.css">  <link rel="stylesheet" type="text/css" href="css/style.css?v=2"> <link rel="stylesheet" type="text/css" href="css/prettyPhoto.css"> <link href='https://fonts.googleapis.com/css?family=Lato:400,700,900,300' rel='stylesheet' type='text/css'> <link href='https://fonts.googleapis.com/css?family=Open+Sans:400,700,800,600,300' rel='stylesheet' type='text/css'>    </head> </header>  <div id="about"> <div class="container"> <div class="section-title text-center center"> <h1>Ben Newman</h1> </div> <div class="row">  <div class="col-md-4 text-center">  <img src="img/bnh.jpg" class="img-responsive"> <p>blnewman at cs dot washington dot edu</p> <p><a href="https://github.com/bnewm0609">Github</a> |  <a href="https://scholar.google.com/citations?user=QehvrDoAAAAJ&hl=en">Google Scholar</a> | <a href="blog/index.html">Blog</a></p> </div>  <div class="col-md-8"> <div class="about-text"> <p>Hello! I'm Ben Newman, currently a PhD student at the <a href="https://www.cs.washington.edu/">Allen School</a> at the <a href="https://www.washington.edu/">University of Washington</a>, advised by <a href="https://homes.cs.washington.edu/~yejin/">Yejin Choi</a>. 
I'm interested in building computational systems that reliably process human language and studying the roles emerging language technologies can play in society more broadly.</p> <p>I've worked on projects analyzing models' abilities to extrapolate, process syntax, and communicate. I've also thought about how language technologies can usefully augment human language abilities, both for scientific discovery (e.g. testing psycholinguistic hypotheses) and in political science (e.g. countering misinformation). I've been a course assistant for two of Stanford's NLP classes, <a href="http://cs124.stanford.edu">CS124</a> and <a href="https://cs224n.stanford.edu">CS224N</a>, and have co-taught courses in Introductory Linguistics and Computing Fundamentals to high schoolers at Stanford Splash. I was previously a Pre-doctoral Young Investigator at <a href="https://www.semanticscholar.org/research/research-team">Semantic Scholar Research</a> and a member of the <a href="https://nlp.stanford.edu">Stanford NLP group</a>.</p> </div> </div> <div class="col-md-2"></div> </div> </div> <hr> </div>  <div id="projects" class="text-center"> <div class="row"> <div class="col"> <div class="section-title center"> <h2>Publications</h2> </div> <div class="categories"> <div class="btn-project"> <p><b>A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents</b></p> <p>Benjamin Newman, <a href="https://soldaini.net/">Luca Soldaini</a>, <a href="https://rayfok.github.io/">Raymond Fok</a>, <a href="https://armancohan.com/">Arman Cohan</a>, <a href="https://kyleclo.github.io/">Kyle Lo</a></p> <p><i>EMNLP 2023</i></p> <p>[<a href="https://arxiv.org/pdf/2305.14772.pdf"/>pdf</a>][<a class="collapsible">tldr</a>]</p> <div id="project-description-5" style="display: None"> <br/> <p>We propose a question-answering framework for decontextualizing snippets from scientific documents and investigate the performance of state-of-the-art LLMs under this framework.</p> 
<p>Conducted during a PYIship at Semantic Scholar</p> </div> </div> <div class="btn-project"> <p><b>Comparing Sentence-Level Suggestions to Message-Level Suggestions in AI-Mediated Communication</b></p> <p><a href="https://www.semanticscholar.org/author/Liye-Fu/3436644">Liye Fu</a>, Benjamin Newman, <a href="https://mauricejakesch.com/">Maurice Jakesch</a>, <a href="https://government.cornell.edu/sarah-kreps">Sarah Kreps</a></p> <p><i>CHI 2023</i></p> <p>[<a href="https://dl.acm.org/doi/pdf/10.1145/3544548.3581351"/>pdf</a>][<a class="collapsible">tldr</a>]</p> <div id="project-description-4" style="display: None"> <br/> <p>LLMs can best help as email writing assistants for legislative staff members when they provide message-level suggestions rather than sentence-level ones.</p> </div> </div> <div class="btn-project"> <p><b>P-Adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts</b></p> <p>Benjamin Newman, <a href="https://sites.google.com/view/prafulla-choubey/"/>Prafulla Kumar Choubey</a>, <a href="http://www.nazneenrajani.com/index.html">Nazneen Rajani</a></p> <p><i>ICLR 2022 (Poster)</i></p> <p>[<a href="https://openreview.net/pdf?id=DhzIU48OcZh"/>pdf</a>][<a class="collapsible">tldr</a>]</p> <div id="project-description-3" style="display: None"> <br/> <p>Querying pretrained NLP models with semantically equivalent prompts (e.g. "[MASK] is Canada's capital" and "Canada, which has capital city [MASK]") does not necessarily lead to the same predictions. Can low-parameter fine-tuning methods increase model consistency? Yes! 
And they don't need as many annotations as other methods.</p> <p>Conducted during an internship at Salesforce Research</p> </div> </div> <div class="btn-project"> <p><b>Refining Targeted Syntactic Evaluation</b></p> <p>Benjamin Newman, <a href="https://www.linkedin.com/in/kai-ang"/>Kai-Siang Ang</a>, <a href="http://juliagong.com">Julia Gong</a>, <a href="https://nlp.stanford.edu//~johnhew//">John Hewitt</a></p> <p><i>NAACL 2021</i></p> <p>[<a href="https://arxiv.org/pdf/2104.09635.pdf">pdf</a>] [<a href="https://github.com/bnewm0609/refining-tse">code</a>] [<a href="research/newman2021refining.bib">cite</a>] [<a href="blog/2021/05/06/refining-tse.html">blog</a>] [<a class="collapsible">tldr</a>]</p> <div id="project-description-2" style="display: None"> <br/> <p>How should we evaluate the syntactic understanding of NLP systems? We argue for evaluating models' <b>likely behavior</b> and <b>systematicity</b>, and build off of minimal pair evaluation to address these goals. We find models are better at conjugating verbs they deem likely.</p> </div> </div> <div class="btn-project"> <p><b>The EOS Decision and Length Extrapolation</b></p> <p>Benjamin Newman, <a href="https://nlp.stanford.edu//~johnhew//">John Hewitt</a>, <a href="https://cs.stanford.edu/~pliang/">Percy Liang</a> and <a href="https://nlp.stanford.edu/~manning/">Chris Manning</a></p> <p><i>Blackbox NLP@EMNLP 2020</i> (<b>Outstanding Paper</b>)</p> <p>[<a href="https://arxiv.org/pdf/2010.07174.pdf">pdf</a>] [<a href="https://github.com/bnewm0609/eos-decision">code</a>] [<a href="research/newman2020extrapolation.bib">cite</a>] [<a class="collapsible">tldr</a>]</p> <div id="project-description-1" style="display: None"> <br/> <p>Why do sequence models struggle to extrapolate? For many reasons, but the decision to train models with End of Sequence tokens at the end of training examples is one of them. 
We investigate and visualize the effect that this decision has on neural models' extrapolative abilities.</p> </div> </div> <div class="btn-project"> <p><b>Communication-based Evaluation for Natural Language Generation</b></p> <p>Benjamin Newman, <a href="https://reubencohngordon.com">Reuben Cohn-Gordon</a>, and <a href="https://web.stanford.edu/~cgpotts/">Christopher Potts</a></p> <p><i>Society for Computation in Linguistics@LSA 2020</i></p> <p>[<a href="https://arxiv.org/pdf/1909.07290.pdf">pdf</a>] [<a href="https://github.com/bnewm0609/comm-eval">code</a>] [<a href="https://web.stanford.edu/~cgpotts/reference-info/potts-bibtex.html#Newman:Cohn-Gordon:Potts:2020:SCiL">cite</a>] [<a class="collapsible">tldr</a>]</p> <div id="project-description-0" style="display: None"> <br/> <p>Do n-gram overlap metrics like BLEU capture whether the models are successful communicators? Not really, so we introduce a new method for evaluating communicative effectiveness based on the Rational Speech Acts framework.</p> <p>Conducted during <a href="http://cs224u.stanford.edu">CS224U</a> and the <a href="https://www-csli.stanford.edu">Center for the Study of Language and Information (CSLI)</a> summer internship.</p> </div> </div> </div> </div> <div class="col"> <div class="section-title center"> <h2>Large-Scale Collaborations</h2> </div> <div class="categories"> <div class="btn-project"> <p><b>Holistic Evaluation of Language Models</b></p> <p>Bommasani et al., 2021</p> <p><i>Disinformation Evaluation</i>: Led the design, implementation, and analysis of the disinformation scenarios and metrics, including leading the human evaluation for disinformation.</p> <p>[<a href="https://arxiv.org/pdf/2211.09110.pdf">pdf</a>]</p> </div> <div class="btn-project"> <p><b>On the Opportunities and Risks of Foundation Models</b></p> <p>Bommasani et al., 2021</p> <p><i>Misuse Section</i>: <a href="https://atcbosselut.github.io">Antoine Bosselut</a>*, <a 
href="https://sites.google.com/site/shelbygrossman/">Shelby Grossman</a>*, and Benjamin Newman [<a href="https://arxiv.org/pdf/2108.07258.pdf">pdf</a>]</p> </div> <div class="btn-project"> <p><b>The Long Fuse: Misinformation and the 2020 Election</b></p> <p>The Election Integrity Partnership was a coalition of research groups that tracked misinformation in the run-up to the 2020 US election. [<a href="https://www.eipartnership.net">site</a>][<a href="https://purl.stanford.edu/tr171zs0069">pdf</a>]</p> </div> </div> <div class="section-title center"> <h2>Course Projects</h2> </div> <div class="categories"> <div class="btn-project"> <p><b>Unsupervised Recovery of Tree Metrics from Learned Representations</b></p> <p>Representations from pretrained language models likely incorporate syntax, but can we access it without training supervised probes? [<a href="projects/CS229/CS229_Final_Paper.pdf">pdf</a>]</p> <p><a href="http://web.stanford.edu/class/cs229/">CS229</a>: Machine Learning. Final Project (2019).</p> </div> <div class="btn-project"> <p><b>English-Chinese Name Machine Transliteration Using Search and Neural Networks</b></p> <p>What's your name in Chinese? Translating a name is different from translating an article because name translations are based on phonetics and lack large corpora. We explore two approaches. [<a href="projects/CS221/CS221_Final_Paper.pdf">pdf</a>] [<a href="https://github.com/bnewm0609/cs221-project">code</a>]</p> <p><a href="http://web.stanford.edu/class/cs221/">CS221</a>: Artificial Intelligence: Principles and Techniques: Final Project with <a href="http://web.stanford.edu/~jxgong/">Julia Gong</a> (<a href="https://en.wikipedia.org/wiki/2018">2018</a>).</p> </div> <div class="btn-project"> <p><b>Using POMDPs to Learn Language in a Spatial Reference Game</b></p> <p>How can you teach computational agents to follow directions without defining what each instruction means? POMDPs! 
[<a href="projects/CS238/CS238_Final_Paper.pdf">pdf</a>] [<a href="https://github.com/bnewm0609/cs238-project">code</a>]</p> <p><a href="https://web.stanford.edu/class/aa228/cgi-bin/wp/">CS238</a>: Decision Making under Uncertainty: Final Project with <a href="http://suvirpmirchandani.com">Suvir Mirchandani</a> and <a href="https://www.levilian.com">Levi Lian</a> (2018)</p> </div> <div class="btn-project"> <p><b>Swear Analysis</b></p> <p>What can we learn about people's use of swears by looking at their word2vec and GloVe embeddings? [<a href="projects/swear_analysis/swear_analysis.pdf">pdf</a>]</p> <p><a href="https://web.stanford.edu/class/linguist130a/">Linguist 130A</a>: Semantics and Pragmatics: Final Project with <a href="http://web.stanford.edu/~jxgong/">Julia Gong</a> (2018)</p> </div> <div class="btn-project"> <p><b>Zero Width Space Encrypter</b></p> <p>Hiding secret messages in HTML zero-width space characters. Demo <a href="projects/whitespace/whitespace.html">here</a>!</p> </div> <div class="btn-project"> <p><b><a href="http://stanfordssi.org">Stanford Student Space Initiative (SSI)</a></b></p> </div> </div> <div class="section-title center"> <h2>Class Notes</h2> </div> <div class="categories"> <div class="btn-project"> <p><b><a href="splash/In_a_Manner_of_Speaking_2018F.pdf">Stanford Splash Slides: In a Manner of Speaking...</a></b></p> </div> <div class="btn-project"> <p><b><a href="class-notes/linguistics_notes.pdf">LINGUIST 21N: Linguistic Diversity and Universals</a></b></p> </div> </div> </div> </div> </div> </div> <div id="footer"> <div class="container text-center"> <div class="fnav"> <p>Designed by <a href="http://www.templatewire.com" rel="nofollow">TemplateWire</a></p> </div> </div> </div> <script type="text/javascript" src="js/jquery.1.11.1.js"></script>  <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js" integrity="sha384-ka7Sk0Gln4gmtz2MlQnikT1wXgYsOg+OMhuP+IlRH9sENBO0LRn5q+8nbTov4+1p" 
crossorigin="anonymous"></script>  <script type="text/javascript" src="js/easypiechart.js"></script> <script type="text/javascript" src="js/jquery.prettyPhoto.js"></script> <script type="text/javascript" src="js/jquery.isotope.js"></script> <script type="text/javascript" src="js/jquery.counterup.js"></script> <script type="text/javascript" src="js/waypoints.js"></script> <script type="text/javascript" src="js/jqBootstrapValidation.js"></script> <script type="text/javascript" src="js/contact_me.js"></script> <script type="text/javascript" src="js/main.js"></script> </body> </html>
