<!DOCTYPE html> <html lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1"> <meta property="og:site_name" content="CSCI 601.771 (Self-supervised Models)"> <meta property="og:type" content="article"> <meta property="og:title" content="CSCI 601.771 (Self-supervised Models)"> <meta property="og:description" content="Discussing latest breakthroughs in self-supervised language models"> <meta name="twitter:card" content="summary_large_image"> <meta name="twitter:title" content="CSCI 601.771: Self-supervised Models"> <meta name="twitter:description" content="Discussing latest breakthroughs in self-supervised language models"> <meta name="twitter:url" content="https://self-supervised.cs.jhu.edu/"> <title>CS 601.471/671: Self-supervised Models</title> <!-- bootstrap --> <link rel="stylesheet" href="files/bootstrap.min.css"> <!-- Google fonts --> <link href="files/fonts.css" rel="stylesheet" type="text/css"> <link rel="stylesheet" type="text/css" href="files/style.css"> <link rel="stylesheet" href="files/font-awesome.min.css"> <!--favicon--> <link rel="shortcut icon" href="files/favicon.ico"/> </head> <body> <!-- <script src="header.js"></script> --> <!-- Navbar --> <nav class="navbar navbar-default navbar-fixed-top"> <div class="container"> <div class="navbar-header"> <a class="navbar-brand brand" href="index.html">CS 471/671</a> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1" aria-expanded="false"> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> </div> <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1"> <ul class="nav navbar-nav navbar-right"> <li><a href="#schedule">Schedule</a></li> <li><a href="#course">Assignments</a></li> <li><a href="#project">Final Project</a></li> <li><a href="#conduct">Conduct</a></li> </ul> </div> </div> </nav> <!-- Header --> <div id="header" style="text-align:center"> <!-- <img src="files/blank.png" class="logo-left">--> <a href="https://www.cs.jhu.edu/"> <img src="files/jhu_shield.png" class="logo-right"> </a> <!-- <a href="https://www.clsp.jhu.edu/">--> <!-- <img src="files/clsp-logo.png" class="logo-right">--> <!-- </a>--> <h1>CS 601.471/671 NLP: Self-supervised Models</h1> <h3>Johns Hopkins University - Spring 2024</h3> <div style="clear:both;"></div> </div> <!-- Intro --> <div class="container sec" id="intro"> <p> Large self-supervised (pre-trained) models (such as Large Language Models, or LLMs) have transformed various data-driven fields, such as natural language processing (NLP). In this course, students will gain a thorough introduction to self-supervised learning techniques for NLP applications. Through lectures, assignments, and a final project, students will learn the necessary skills to design, implement, and understand their own self-supervised neural network models using the PyTorch framework. </p> <p> <b> Note:</b> This course is different from <a href="https://self-supervised.cs.jhu.edu/fa2022/">601.771</a> (offered in the fall semesters), which focuses on advanced topics from recent papers and is geared toward graduate students who want to specialize in the latest developments in self-supervised models. 
</p> <p> <strong>Prerequisites</strong>: (1) Data Structures (601.226). (2) Background in Natural Language Processing & Machine Learning, or having finished one of the relevant courses such as Machine Learning (CS 475/675), Machine Learning: Deep Learning (CS 482/682), Natural Language Processing (CS 465/665), or Machine Translation (CS 468/668). (3) All the class assignments will be in Python/PyTorch. If you don’t know Python or PyTorch but have experience in other programming languages (Java, C++, etc.), you can probably pick up Python/PyTorch pretty quickly. (4) Calculus and linear algebra: you should be comfortable with matrix operations (matrix multiplication, transpose, inverse, dot product, gradients). (5) Probability: basic probability properties (conditionals, marginals, mean, standard deviation) and distributions (normal, categorical, etc.). </p> <p> <strong>Relevant Courses at Hopkins</strong>: This course has some overlap with "Natural Language Processing" (EN.601.465/665) and "Artificial Agents" (EN.601.470/670), though the courses have different focuses. </p> </div> <!-- Staff Info --> <div class="sechighlight"> <div class="container sec" id="people"> <div class="col-md-5" style="width: 100%; text-align: center"> <br> <!-- <h3>Instructors</h3>--> <div class="instructor"> <a target="_blank" rel="noopener noreferrer" href="http://danielkhashabi.com/"> <div class="instructorphoto"><img src="files/daniel.png" alt="missing image"></div> <div>Daniel Khashabi<br>Instructor</div> </a> <div></div> </div> <div class="instructor"> <a target="_blank" rel="noopener noreferrer" href="https://jefferyo.github.io/"> <div class="instructorphoto"><img src="files/joefu3.png" alt="missing image"></div> <div>Jiefu Ou<br>Teaching Assistant</div> </a> </div> <div class="instructor"> <a target="_blank" rel="noopener noreferrer" href="https://kev-kim.com/"> <div class="instructorphoto"><img src="files/kevin2.png" alt="missing image"></div> <div>Kevin Kim<br>Course Assistant</div> </a> </div> <div class="instructor"> <a target="_blank" rel="noopener noreferrer" href="https://www.linkedin.com/in/sungwon-kim-039300152/"> <div class="instructorphoto"><img src="files/sungwon.png" alt="missing image"></div> <div>Sungwon Kim<br>Course Assistant</div> </a> </div> <div class="instructor"> <a target="_blank" rel="noopener noreferrer" href="https://tianjianl.github.io/"> <div class="instructorphoto"><img src="files/tianjian2.png" alt="missing image"></div> <div>Tianjian Li <br>Course Assistant</div> </a> </div> <div class="instructor"> <a target="_blank" rel="noopener noreferrer" href="https://yining610.github.io/"> <div class="instructorphoto"><img src="files/yining2.png" alt="missing image"></div> <div>Yining Lu <br>Course Assistant</div> </a> </div> <div class="instructor"> <a target="_blank" rel="noopener noreferrer" href="https://camdenshultz.com/"> <div class="instructorphoto"><img src="files/camden2.png" alt="missing image"></div> <div>Camden Shultz<br>Course Assistant</div> </a> </div> </div> </div> </div> <div class="container sec" id="logistics"> <h2>Logistics</h2> <ul> <li><b>Classes:</b> Tuesday/Thursday 9 - 10:15 am EST (room: Hodson 210, or <a href="https://wse.zoom.us/my/khashabi">zoom meeting</a>) </li> <li><b>Office hours:</b> <ul> <li>Daniel's office hour: Thursdays 12 - 1 pm EST (Hackerman Hall, 316B),</li> <li>TA office hour: Mondays 3:30 - 4:30 pm (in-person),</li> <li>CAs' office hour: Wednesdays 3 - 4 pm (in-person).</li> </ul> The TA/CA office hours will be held at the Hackerman 3rd Floor Common Area, which 
is the whiteboard area between Hackerman 323 and 324. </li> <li><b>Contact:</b> If you have any questions about the course, you can post them on Piazza. </li> <li><b>Virtual or in-person</b>: The class will be in-person. <!-- There will be recordings of each class--> <!-- made available online after each class on the Youtube playlist (shared on Piazza).--> <!-- <a href="https://www.youtube.com/playlist?list=PLSeS0sl8xpTwjQuc5DMYSCuL8AkTJejW8">this playlist</a>.--> </li> <li><b>Changes:</b> The instructor reserves the right to make changes to the syllabus or project due dates. These changes will be announced as early as possible. </li> <li><b>News and announcements:</b> All news and announcements will be made on Piazza.</li> <!-- <li><b>Attendance and late work:</b>--> <!-- You can miss 3 sessions.--> <!-- Additionally, you get 2 sessions of presentation relief (i.e., you can skip 2 presentation assignments) to accommodate any deadlines you might have.--> <!-- If you decide to use these, make sure to email the instructor in advance (at least two days).--> <!-- Beyond that limit, you'd lose the attendance/presentation credits for any class you miss.--> <!--<!– If you miss a class without completing the corresponding assignment, you'll get a zero for that session.–>--> <!--<!– Unfortunately you can't miss a class in which you're "presenting".–>--> <!--<!– If you have to miss a class where you are in a "presenting" role for that session, you must find someone willing to swap presentations with (e.g., you were expected to present on Tuesday; but you swap presentations with someone who presents on Thursdays).–>--> <!--<!– Alternatively, you must still create the presentation for that role before the class and you must find someone else to present it for you.–>--> <!-- There's really no way to accept late work.--> <!--<!– for the readings since it's vital that we're all reading the same papers at the same time.–>--> <!-- </li>--> <li><b>COVID:</b> Students who report symptoms associated with COVID-19 are expected not to attend class and to isolate themselves for at least five days and until they have been symptom-free for 24 hours. </li> <li> <b>Course grade:</b> Your grade is based on the following activities: (1) weekly assignments (40%) -- done individually unless otherwise specified, (2) two quizzes (30%) -- done individually and in class, (3) a final project (30%) -- done in groups, with the same grade for all members of a team. Attendance is not mandatory (hence, 0%) but highly encouraged: participation in class is our chance to learn more effectively. Up to 3% additional credit may be given for actions taken to improve the course that are brought to the instructors' attention. </li> </ul> <br> </div> <hr> <div class="container sec" id="links"> <h2>Key links</h2> <ul> <li><a href="https://piazza.com/class/lrpgv1zcpkpyk">Piazza</a> for discussion and announcements. Sign up, follow, ask questions, and participate in discussions! </li> <li><a href="https://www.gradescope.com/courses/719120">Gradescope</a> for submitting your assignments.</li> </ul> </div> <hr> <div class="container sec" id="course"> <h2>Assignments</h2> <p> The homework is your opportunity to practice doing the work yourself. The lectures and office hours hopefully provide good intuition, motivation, and justification for the skills we want you to develop, but the best way to develop those skills is by trying to solve the problems yourself. The practice is far more important than the solution. 
</p> <p> The course has 7 roughly weekly assignments, which will improve both your theoretical understanding and your practical skills. All assignments contain both written questions and programming parts (mainly in Python). They will be released on this website, and submissions should be uploaded to Gradescope. </p> <p> Here is a tentative list of topics for the assignments: </p> <table class="table"> <colgroup> <col style="width:1%"> <col style="width:65%"> </colgroup> <thead> <tr class="active"> <th>#</th> <th>Focus</th> <!-- <th>Practice</th>--> </tr> </thead> <tbody> <tr> <td>#1</td> <td>Algebra, calculus, probability, optimization (gradient descent) recap; understanding the softmax function; loss functions (cross-entropy, MSE, etc.); a machine learning problem (classification, evaluation) </td> </tr> <tr> <td>#2</td> <td> PyTorch introduction, automatic differentiation, computation graph, basic feedforward network and backpropagation </td> </tr> <tr> <td>#3</td> <td>Neural language model with feedforward network, evaluating language modeling, count-based models, decoding language models </td> </tr> <tr> <td>#4</td> <td>Recurrent neural language model and evaluation; Transformers</td> </tr> <tr> <td>#5</td> <td>Fine-tuning LMs, prompting language models, distributed tuning</td> </tr> <tr> <td>#6</td> <td> Prompt engineering, in-context learning; Retrieval-augmented language models </td> </tr> <tr> <td>#7</td> <td> Alignment with instruction-tuning, alignment with [human] feedback </td> </tr> </tbody> </table> <ul> <li> <b>Submission:</b> Assignments must be submitted via Gradescope. If you're not familiar with Gradescope, check out <a href="https://help.gradescope.com/category/cyk4ij2dwi-student-workflow">its documentation</a>. </li> <li><b>Collaboration:</b> Study groups are allowed, but students must understand and complete their own assignments, and hand in one assignment per student. If you worked in a group, please put the names of the members of your study group at the top of your assignment. Please ask if you have any questions about the collaboration policy. Again, you must understand and complete your own assignments in your own words, and hand in one assignment per student. </li> <li> <b>Using Other Resources:</b> We strongly encourage you to use any outside source at your disposal, provided you use your sources properly and give them proper credit. If you get an idea from an outside source, citing that source will not lower your grade. Failing to properly cite an outside source—thereby taking credit for ideas that are not your own—is plagiarism. </li> <li> <b>Appropriate Citations:</b> You must write everything in your own words, and properly cite every outside source you use, including other students. Using ideas from other sources or people without citation is plagiarism. Copying other sources verbatim, even with proper citation, is plagiarism. Don't do that. The only sources that you are not required to cite are the official course materials (lectures, slides, and assignments). </li> <!-- <li><b>Late start:</b>If the result gives you a higher grade, we will not use your assignment 1 score, and we will give you an assignment grade based on counting each of assignments 2–5 at 13.5%.</li>--> <li><b>Honor code:</b> We expect students not to look at solutions or implementations online. We take the student Honor Code seriously. We sometimes use automated methods to detect overly similar assignment solutions. 
</li> <li><b>Late days:</b> Each student has 10 late days to use for assignments. A late day extends the deadline by 24 hours. <!-- For final project reports that can be done in groups, teams can share late days between members.--> <!-- For example, a group of three people must have at least six late days between them to extend--> <!-- the deadline by two days. If any late days are being shared, this must be--> <!-- clearly marked at the beginning of the report. --> Once you have used all your late days, the penalty is 5% off the final homework grade for each additional late day. For example, if you're late for 3 days on an assignment (beyond your legitimate "late days" capacity, which is 7 days per homework and 10 days in total), you will lose 15% of the points for that assignment. The deadline cutoff is at 12 pm each day. There are no fractional late days. If you're late for 1 hour, you lose a full day. <br> You can use up to 7 late days per assignment (so, if you're late on HW1, you can submit it until the release of HW2). <!-- (including homework assignments and project final report). --> <!-- Once you run out of late days but submit late anyway, the submission will not award any points.--> <!-- Once you run out of late days but submit late anyway, the submission will not award any points.--> Assignments submitted after 7 late days will not be graded (unless explicit permission is given in advance by the instructor). </li> <li><b>Grading:</b> Homeworks are graded by the entire course staff, directly within Gradescope. To keep grading consistent, each numbered problem is graded by a single grader, under the supervision of one of the TAs, using a detailed rubric developed within Gradescope. Under normal circumstances, all homework should be graded within 10 calendar days of submission. </li> <li><b>Regrading:</b> Regrade requests can be submitted directly within Gradescope and must include a brief written justification for the request. We encourage students who have questions or concerns about their grades to talk with the course staff before submitting a regrade request. However, no grades will be changed in any student's presence. </li> </ul> </div> <hr> <div class="container sec"> <h2>Midterm exams/quizzes</h2> <p> There will be in-class midterms. The midterm exams will be paper-based and held during the usual class time. These midterm exams aim to evaluate students' progress and understanding of the ideas presented in the first two-thirds of the semester, which will serve as a foundation for your project and the material covered in the final weeks of the class. The exams will assess students' mastery of the topics discussed in the lectures and weekly homework assignments. The exams will also provide feedback to both the student and the instructor, and identify areas that need improvement to inform further learning and teaching. <!-- The quizzes will cover all material until the end of "Transformer Language Models", just before "Doing Things--> <!-- with Language Models".--> <!-- The week leading to the midterm, we will not have homework assignments.--> </p> </div> <hr> <div class="container sec"> <h2 id="project">Final project</h2> <p> The objective of the final project is to make use of what you have learned during this course to solve a hard problem. </p> <p> The final project milestones include: (1) a project proposal, (2) a project midway report, (3) a progress update presentation, (4) a final report, and (5) a final project poster summarizing the technical aspects of the project. 
See the course calendar for the due dates. </p> <ul> <li><b>Topic:</b> The topic of this project is open-ended. This project, for example, can focus on demonstrating systemic limitations of prior work or suggesting improvements on methods or benchmarks discussed in the class. <!-- reproducing one or more papers covered in the class (or relevant works)--> <!-- or extending them.--> </li> <li><b>Group work:</b> Students are encouraged to work in groups on the final project (team sizes limited to 2 or 3 people). </li> <li><b>Project proposals:</b> All groups will be required to submit a project proposal (see the class calendar for the due date). The project proposal is a 2-page description of what you intend to do (experiments, datasets, methods, etc.). All documents should follow this <a href="https://www.overleaf.com/latex/templates/neurips-2022/kxymzbjpwsqx">template</a>. The instructor(s) will provide feedback on these ideas to help the teams find a concrete idea. Here are examples of project proposals from previous years: <ul> <li><a href="files/proposal-2_seek-extracted_171762894.pdf">Ensemble Domain-Specific Knowledge Distillation </a></li> </ul> </li> <li><b>Midway progress reports:</b> A report discussing the progress made thus far and elaborating on the remaining work (at most 5 pages; use this <a href="https://www.overleaf.com/latex/templates/neurips-2022/kxymzbjpwsqx">template</a>). Describe the progress made, experiments you have run, preliminary results you have obtained, how you plan to spend the rest of your time, etc. While this is called "midway," in practice it should be considered more than halfway! By this milestone, you’re expected to have implemented some system and to have some experimental results to show. </li> <li><b>Final poster presentations:</b> <!-- The final project presentation will be during the final exam period. All students--> <!-- in each group are required to present some material during the final presentation.--> All students will present their findings at a poster presentation during the final exam period. </li> <li><b>Final report:</b> Students should write code and carry out additional experiments and then write up the results in a standard conference paper format (at most 8 pages; use this <a href="https://www.overleaf.com/latex/templates/neurips-2022/kxymzbjpwsqx">template</a>). References don't count toward the page limit. Note that longer reports are not necessarily better. Students in groups are required to include a “contributions” section that concretely lists each author’s contributions (see Section 8 of this paper, for example). The final report should concisely summarize your findings and answer the following questions: 1. What approach did you take to address this problem, and why? 2. How did you explore the space of solutions? 3. How did you evaluate the performance of the approach(es) you investigated? 4. What worked, what did not work, and why? <br> Here are examples of final reports from the previous year: <ul> <li><a href="files/final-seek.pdf">Ensemble Domain-Specific Knowledge Distillation </a></li> <li><a href="files/final-efficient_distillation_671_May_14.pdf">Efficient Distillation of Transformers via Self-Teaching</a></li> </ul> </li> <li><b>Project grading:</b> The goal of the project is to demonstrate the group’s understanding of the tools and challenges of using self-supervised models. Grading will reflect the quality of the approach, the rigor of evaluation, and reasoning about successes and failures. 
Grading will also depend on the completeness of the project, the clarity of the writeup, the level of complexity/difficulty of the approach, and your ability to justify the choices you made. Here is the grade breakdown for the projects: <!-- new itemized line --> <ul> <li>Project proposal: 15%</li> <li>Midway report: 15%</li> <li>Progress update presentation: 10%</li> <li>Quality of final report write-up, implementation, and results: 40% <ul> <li>Compelling introduction/motivation and clear problem statement and desired outcome: 5%</li> <li>Clear description of methods and evaluation protocol: 5%</li> <li>Clear and complete coverage of related work: 5%</li> <li>The rigor of evaluation and reasoning / discussion: 5%</li> <li>Clear articulation of the results (includes figures, tables): 10%</li> <li>Innovativeness: 5%</li> <li>Discussion and conclusion composed of well-formulated arguments, grounded in the experimental results and the broader scientific literature: 5% </li> </ul> </li> <li>Final poster and its presentation: 20%</li> </ul> </li> </ul> </div> <hr> <div class="container sec" id="schedule" style="margin-top:-20px"> <br> <h2>Content Schedule</h2> <p> Each session will involve an instructor-led presentation on a focused topic in self-supervised models. There will be weekly assignments related to class presentations, midterm exams, and a final project. </p> <p> The current class schedule is below (subject to change): </p> <table class="table"> <colgroup> <col style="width:12%"> <col style="width:25%"> <col style="width:55%"> <col style="width:10%"> <col style="width:10%"> </colgroup> <thead> <tr class="active"> <th>Date</th> <th>Topic</th> <th>Course Materials</th> <th>Events</th> <th>Deadlines</th> </tr> </thead> <tbody> <tr> <td>#1 - Tue Jan 23</td> <td> Course introduction: <ul> <li>Course overview</li> <li>Plan and expectations</li> </ul> [slides: <a href="files/slides/01.intro.pptx">pptx</a>, <a href="files/slides/01.intro.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="http://preview.d2l.ai/d2l-en/master/chapter_preliminaries/linear-algebra.html">Dive into Deep Learning: Linear Algebra in PyTorch</a> <br> Additional Reading: <ol> <li><a href="https://cs231n.github.io/python-numpy-tutorial/">Python / Numpy Tutorial (with Jupyter and Colab)</a></li> <li><a href="https://cs231n.github.io/optimization-1/">Optimization: Stochastic Gradient Descent </a></li> </ol> </td> <td class="sechighlight5"> HW1 is released! 
<!-- [<a href="https://www.overleaf.com/read/qfqstxfkhwkx#654854)">tex</a>]--> </td> </tr> <tr> <td>#2 - Thu Jan 25</td> <td> Language modeling: <ul> <li> Definitions and history, </li> <li> Counting and n-grams, </li> <li> Measuring LM quality, </li> <li> Language modeling as a learning problem </li> </ul> [slides: <a href="files/slides/02.neural-language-modeling.pptx">pptx</a>, <a href="files/slides/02.neural-language-modeling.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://web.stanford.edu/~jurafsky/slp3/3.pdf">Jurafsky & Martin Chapter 3</a> <br> Additional Reading: <ol> <li><a href="https://www.princeton.edu/~wbialek/rome/refs/shannon_51.pdf">Prediction and Entropy of Printed English </a> (the foundational paper by Shannon on language compression and uncertainty) </li> <li><a href="https://books.google.com/ngrams/">Google N-grams </a> (very insightful trends over time) </li> </ol> </td> <td></td> <td></td> </tr> <tr> <td>#3 - Tue Jan 30</td> <td> Feedforward networks: <ul> <li>Definitions</li> <li>Brief history</li> <li>Background (algebra + optimization)</li> <li>Analytical Backprop</li> </ul> [slides: <a href="files/slides/03-4-5.feedforward-nets.pptx">pptx</a>, <a href="files/slides/03-4-5.feedforward-nets.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://web.stanford.edu/~jurafsky/slp3/7.pdf">Jurafsky & Martin Chapter 7</a> <br> Additional Reading: <ol> <li><a href="https://cs231n.github.io/neural-networks-1/">Neural Networks: the Architecture </a></li> <li><a href="http://preview.d2l.ai/d2l-en/master/chapter_multilayer-perceptrons/index.html">Dive into Deep Learning: Multilayer Perceptron </a></li> <li><a href="https://pytorch.org/docs/stable/index.html">PyTorch documentation</a></li> <li><a href="https://pytorch.org/tutorials/">These tutorials</a> do a good job of introducing PyTorch. </li> </ol> </td> <td class="sechighlight5">HW2 is released! 
<!-- [<a href="https://www.overleaf.com/read/vqkhzpnsxrfp#018e6f">tex</a>]--> </td> <td class="sechighlight5">HW1 due</td> </tr> <tr> <td>#4 - Thu Feb 1</td> <td> Feedforward networks: <ul> <li>Algebra recap</li> <li>Analytical backprop</li> <li>Backprop in practice</li> </ul> [slides: <a href="files/slides/03-4-5.feedforward-nets.pptx">pptx</a>, <a href="files/slides/03-4-5.feedforward-nets.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://web.stanford.edu/~jurafsky/slp3/7.pdf">Jurafsky & Martin Chapter 7</a> <br> Additional Reading: <ol> <li><a href="https://cs231n.github.io/optimization-2/">Neural Networks: Backpropagation</a></li> <li><a href="https://cs231n.github.io/neural-networks-3/">Neural Networks: Training and empirical tips </a></li> <li><a href="https://cs231n.github.io/neural-networks-2/">Neural Networks: data and loss </a></li> <li><a href="https://web.stanford.edu/class/cs224n/readings/gradient-notes.pdf">Computing Neural Network Gradients</a></li> <li><a href="http://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf">Learning representations by back-propagating errors</a> (the original backpropagation paper) </li> </ol> </td> <td> </td> <td></td> </tr> <tr> <td>#5 - Tue Feb 6</td> <td> Feedforward networks: <ul> <li>Backprop in practice</li> <li>Practical tips</li> </ul> [slides: <a href="files/slides/03-4-5.feedforward-nets.pptx">pptx</a>, <a href="files/slides/03-4-5.feedforward-nets.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://github.com/google-research/tuning_playbook">Deep Learning Tuning Playbook</a> <br> Additional Reading: <ol> <li><a href="http://preview.d2l.ai/d2l-en/master/chapter_builders-guide/index.html">Dive into Deep Learning: Practitioners Guide to Neural Networks </a></li> <li><a href="http://www.iro.umontreal.ca/~lisa/pointeurs/ieeetrnn94.pdf">Learning long-term dependencies with gradient descent is difficult</a> (one of the original vanishing gradient papers) </li> </ol> </td> <td class="sechighlight5">HW3 is released! <!-- [<a href=" https://www.overleaf.com/read/yzygwvhzygws#d12405">tex</a>]--> </td> <td class="sechighlight5">HW2 due</td> </tr> <tr> <td>#6 - Thu Feb 8</td> <td> Feeding text to neural networks: <ul> <li>Tokenization and subwords</li> <li>Fixed-window MLP LMs</li> </ul> [slides: <a href="files/slides/05.mlp-language-modeling.pptx">pptx</a>, <a href="files/slides/05.mlp-language-modeling.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2104.03474">Revisiting Simple Neural Probabilistic Language Models </a> <br> Additional Reading: <ol> <li><a href="https://huggingface.co/course/chapter2/4?fw=pt">Huggingface tutorials on Tokenization</a></li> </ol> </td> <td> </td> <td></td> </tr> <tr class="sechighlight5"> <td>#7 - Tue Feb 13</td> <td>Quiz 1</td> <td> Topics: everything discussed in class until the beginning of class #6 </td> <td class="sechighlight5">HW4 is released! 
<!-- [<a href="https://www.overleaf.com/read/pfgyjchkwfpv#dde039">tex</a>]--> </td> <td class="sechighlight5">HW3 due</td> </tr> <tr> <td>#8 - Thu Feb 15</td> <td> Recurrent Neural LMs: <ul> <li>Introducing RNNs</li> <li>Training RNNs</li> <li>RNNs for natural language and language modeling</li> <li>RNNs: Pros and Cons</li> <li>Sampling from LMs</li> <li>Pre-training RNNs</li> </ul> [slides: <a href="files/slides/06-8.recurrent-nets.pptx">pptx</a>, <a href="files/slides/06-8.recurrent-nets.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/1904.09751">The Curious Case of Neural Text Degeneration</a> <br> Additional Reading: <ol> <li><a href="http://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes05-LM_RNN.pdf">CS224N course notes on RNNs</a></li> <li><a href="http://preview.d2l.ai/d2l-en/master/chapter_recurrent-neural-networks/index.html">Dive into Deep Learning: Recurrent Neural Networks </a></li> <li><a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness/">The Unreasonable Effectiveness of Recurrent Neural Networks</a> (blog post overview) </li> <li><a href="https://arxiv.org/abs/1802.05365">Deep contextualized word representations</a> (ELMo paper)</li> </ol> <!-- <ol>--> <!-- <li><a href="http://preview.d2l.ai/d2l-en/master/chapter_recurrent-neural-networks/index.html">Dive--> <!-- into Deep Learning: Fancy RNNs</a></li>--> <!-- <!– <li><a href="">tbd</a></li>–>--> <!-- </ol>--> </td> <td></td> <td></td> </tr> <tr> <td>#9 - Tue Feb 20</td> <td> Recurrent Neural LMs: <ul> <!-- <li>Introducing RNNs</li>--> <!-- <li>Training RNNs</li>--> <!-- <li>RNNs for natural language and language modeling</li>--> <!-- <li>RNNs: Pros and Cons</li>--> <li>Sampling from LMs</li> <li>Bonus: Pre-training RNNs</li> </ul> [slides: <a href="files/slides/06-8.recurrent-nets.pptx">pptx</a>, <a href="files/slides/06-8.recurrent-nets.pdf">pdf</a>] <hr> Transformer LMs: <ul> <li>Self-attention</li> <li>Transformer LMs</li> <li>Positional embeddings</li> </ul> [slides: <a href="files/slides/09-10.transformers.pptx">pptx</a>, <a href="files/slides/09-10.transformers.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/1706.03762.pdf">Attention Is All You Need</a> <br> Additional Reading: <ol> <li> <a href="http://preview.d2l.ai/d2l-en/master/chapter_attention-mechanisms-and-transformers/index.html">Dive into Deep Learning: Attention Mechanism</a></li> <li><a href="https://jalammar.github.io/illustrated-transformer/">The Illustrated Transformer</a></li> <li><a href="http://nlp.seas.harvard.edu/annotated-transformer/">The Annotated Transformer </a></li> </ol> </td> <td class="sechighlight5">HW5 is released! 
<!-- [<a href="https://www.overleaf.com/read/jrpkxdgkzcmp#4e9ea4">tex</a>]--> </td> <td class="sechighlight5">HW4 due</td> </tr> <tr> <td>#10 - Thu Feb 22</td> <td> Transformer LMs: <ul> <li>Efficiency considerations</li> <li>Architectural variants</li> <li>Notable models</li> <!-- <li>Training tips</li>--> </ul> [slides: <a href="files/slides/09-10.transformers.pptx">pptx</a>, <a href="files/slides/09-10.transformers.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2307.09288">Llama 2: Open Foundation and Fine-Tuned Chat Models</a> <br> Additional Reading: <ol> <!-- <li><a href="https://arxiv.org/abs/1910.13461">BART: Denoising Sequence-to-Sequence--> <!-- Pre-training </a></li>--> <li><a href="https://jalammar.github.io/illustrated-bert/">The Illustrated BERT, ELMo, and co</a> </li> <li><a href="https://jalammar.github.io/illustrated-gpt2/">The Illustrated GPT-2</a></li> <!-- <li><a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf">Language--> <!-- Models are Unsupervised Multitask Learners</a></li>--> <!-- <li><a href="https://www.semanticscholar.org/paper/BERT%3A-Pre-training-of-Deep-Bidirectional-for-Devlin-Chang/df2b0e26d0599ce3e70df8a9da02e51594e0e992">BERT:--> <!-- Pre-training of Deep Bidirectional Transformers for Language Understanding</a></li>--> </ol> </td> <td></td> <td></td> </tr> <!-- <tr class="sechighlight">--> <!-- <td colspan="5">--> <!-- <b> ⬇️ -- Large Language Models</b>--> <!-- </td>--> <!-- </tr>--> <tr> <td>#11 - Tue Feb 27</td> <td> Transformer LMs: <ul> <li>Notable models</li> <li>Training tips</li> </ul> [slides: <a href="files/slides/09-10.transformers.pptx">pptx</a>, <a href="files/slides/09-10.transformers.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2005.14165">Language Models are Few-Shot Learners</a> (GPT3 paper) <br> </td> <td></td> <td></td> </tr> <tr> <td>#12 - Thu Feb 29</td> <td> Adapting LMs: <ul> <li>Adaptation as fine-tuning</li> <li>Parameter-efficient tuning</li> </ul> [slides: <a href="files/slides/11-12.adaptation.pptx">pptx</a>, <a href="files/slides/11-12.adaptation.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2104.08691">The Power of Scale for Parameter-Efficient Prompt Tuning</a> <br> Additional Reading: <ol> <li><a href="https://arxiv.org/abs/2101.00190">Prefix-Tuning: Optimizing Continuous Prompts for Generation</a></li> <li><a href="https://arxiv.org/abs/2112.08348">Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts</a></li> </ol> </td> <td class="sechighlight5">HW6 is released! 
<!-- [<a href="https://www.overleaf.com/read/smwjckwhsczh#c363af">tex</a>]--> </td> <td class="sechighlight5">HW5 due</td> </tr> <tr> <td>#13 - Tue Mar 5</td> <td> Adapting LMs: <ul> <li>Adaptation as in-context learning</li> <li>ICL: Making sense of it</li> <li>Prompt engineering</li> <li>Multi-step prompting</li> <li>Failures of ICL</li> </ul> [slides: <a href="files/slides/11-12.adaptation.pptx">pptx</a>, <a href="files/slides/11-12.adaptation.pdf">pdf</a>] <br> </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2206.04615">Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models</a> <br> Additional Reading: <ol> </ol> </td> <td></td> <td></td> </tr> <tr> <td>#14 - Thu Mar 7</td> <td> Alignment of LMs: <br> <ul> <li>Alignment: definitions</li> <li>Instruction-tuning</li> </ul> [slides: <a href="files/slides/13.alignment.pptx">pptx</a>, <a href="files/slides/13.alignment.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2210.11416">Scaling Instruction-Finetuned Language Models</a> (FLAN paper) <br> Additional Reading: <ol> <li><a href="https://arxiv.org/abs/2204.07705">Generalization via Declarative Instructions on 1600+ NLP Tasks</a></li> </ol> </td> <td> </td> <td class="sechighlight5">HW6 due</td> </tr> <tr> <td>#15 - Tue Mar 12</td> <td> Introducing final projects: <ul> <li>Defining final projects</li> <li>Tips for a successful project</li> </ul> [slides: <a href="files/slides/15.final_project_ideas.pptx">pptx</a>, <a href="files/slides/15.final_project_ideas.pdf">pdf</a>] <hr> Alignment of LMs: <br> <ul> <li>RLHF and variants</li> </ul> [slides: <a href="files/slides/13.alignment.pptx">pptx</a>, <a href="files/slides/13.alignment.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/pdf/2203.02155.pdf">Training language models to follow instructions with human feedback</a> (GPT3 + RLHF paper) <br> Additional Reading: <ol> <li><a href="https://huggingface.co/blog/rlhf">Illustrating Reinforcement Learning from Human Feedback </a></li> <li><a href="https://arxiv.org/abs/2009.01325">Learning to summarize from human feedback</a></li> <li><a href="https://arxiv.org/abs/1706.03741">Deep reinforcement learning from human preferences</a> (an early RLHF paper) </li> </ol> </td> <td></td> <td></td> </tr> <tr class="sechighlight5"> <td>#16 - Thu Mar 14</td> <td> Quiz 2 </td> <td> Topics: everything discussed in class until the beginning of class #15 </td> <td></td> <td></td> </tr> <!-- <tr class="sechighlight5">--> <!-- <td>#13 - Tue Mar 5</td>--> <!-- <td>--> <!-- </td>--> <!-- <td>--> <!-- </td>--> <!-- <td>--> <!-- HW? 
to be released!--> <!-- [<a href="https://www.overleaf.com/read/dpxxhmszytjr">tex</a>]--> <!-- [<a href="files/CS_601_471_671_spring2023_homework6.pdf">pdf</a>]--> <!-- </td>--> <!-- <td>HW8 due</td>--> <!-- <td>--> <!-- Hallucination issue <br>--> <!-- Calibrating model uncertainties <br>--> <!-- - Consistency <br>--> <!-- - Sensitivity <br>--> <!-- - Mutual information <br>--> <!-- - Flatness <br>--> <!-- </td>--> <!-- <td>--> <!-- </tr>--> <tr class="sechighlight4 centered"> <td>#17 - Tue Mar 19</td> <td>No Class - Spring Break</td> <td></td> <td></td> <td></td> </tr> <tr class="sechighlight4 centered"> <td>#18 - Thu Mar 21</td> <td>No Class - Spring Break</td> <td></td> <td></td> <td></td> </tr> <tr> <td>#19 - Tue Mar 26</td> <td> Alignment of LMs: <br> <ul> <li>Alignment: failures/open questions</li> <li>Simplifying RLHF</li> <li>Alignment with self-generated instructions</li> <li>Value alignment</li> <!-- <li>Bonus: Proximal Policy Optimization</li>--> </ul> [slides: <a href="files/slides/13.alignment.pptx">pptx</a>, <a href="files/slides/13.alignment.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2305.18290">Direct Preference Optimization: Your Language Model is Secretly a Reward Model</a> <br> Additional Reading: <ol> <li><a href="https://arxiv.org/pdf/1606.06565.pdf">Concrete Problems in AI Safety</a></li> <li><a href="https://arxiv.org/pdf/2210.10760.pdf">Scaling Laws for Reward Model Overoptimization</a></li> <li><a href="https://arxiv.org/abs/2212.10560">Self-Instruct: Aligning Language Models with Self-Generated Instructions</a></li> </ol> </td> <td class="sechighlight5"> HW7 released! <!-- [<a href="https://www.overleaf.com/read/rpbbzzpqtcdt#0b696a">tex</a>]--> </td> <td></td> </tr> <tr> <td>#20 - Thu Mar 28</td> <td> Feeding lots of things to LMs <br> <ul> <li>Delving into positional encoding</li> <li>Length generalization</li> </ul> [slides: <a href="files/slides/20.feedings-lots-of-things.pptx">pptx</a>, <a href="files/slides/20.feedings-lots-of-things.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2203.16634">Transformer Language Models without Positional Encodings Still Learn Positional Information</a> <br> Additional Reading: <ol> <li><a href="https://arxiv.org/abs/2108.12409">Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation</a></li> <li><a href="https://arxiv.org/abs/2305.19466">The Impact of Positional Encoding on Length Generalization in Transformers</a></li> </ol> </td> <td> </td> <td></td> </tr> <tr class="sechighlight5"> <td> Apr 1 </td> <td>Project proposals deadline</td> <td></td> <td></td> <td></td> </tr> <tr> <td>#21 - Tue Apr 2</td> <td> Feeding lots of things to LMs <br> <ul> <li>Retrieval-augmentation</li> </ul> [slides: <a href="files/slides/20.feedings-lots-of-things.pptx">pptx</a>, <a href="files/slides/20.feedings-lots-of-things.pdf">pdf</a>] <hr> Connecting language to outside world: <ul> <li>Connecting vision and language </li> </ul> [slides: <a href="files/slides/14-15.seeing-acting-models_1.pptx">pptx</a>, <a href="files/slides/14-15.seeing-acting-models.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2112.04426v3">Improving language models by retrieving from trillions of tokens</a> <br> Additional Reading: <ol> <li><a href="https://arxiv.org/abs/2002.08909">REALM: Retrieval-Augmented Language Model Pre-Training</a></li> <li><a href="https://arxiv.org/abs/2210.16773">An Efficient Memory-Augmented Transformer for 
Knowledge-Intensive NLP Tasks</a></li> <li><a href="https://arxiv.org/abs/2212.10511">When Not to Trust Language Models: Investigating Effectiveness and Limitations of Parametric and Non-Parametric Memories.</a></li> <li><a href="https://arxiv.org/abs/2001.08361">Scaling Laws for Neural Language Models</a></li> </ol> </td> <td></td> <td></td> </tr> <tr> <td>#22 - Thu Apr 4</td> <td> Connecting language to outside world: <ul> <li>Connecting vision and language </li> <li>Generative vision-language</li> </ul> [slides: <a href="files/slides/14-15.seeing-acting-models_1.pptx">pptx</a>, <a href="files/slides/14-15.seeing-acting-models.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/pdf/2205.11487.pdf">Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding</a> <br> Additional Reading: <ul> <li><a href="https://arxiv.org/abs/2204.06125">Hierarchical Text-Conditional Image Generation with CLIP Latents</a></li> <li><a href="https://arxiv.org/abs/2104.14294">Emerging Properties in Self-Supervised Vision Transformers</a></li> </ul> </td> <td></td> <td class="sechighlight5"> HW7 due </td> </tr> <tr> <td>#23 - Tue Apr 9</td> <td> Connecting language to outside world: <ul> <li>Transformers for Audio/speech</li> <li>LMs for coding</li> <li>LMs and grounded actions</li> <li>Open questions</li> </ul> [slides: <a href="files/slides/14-15.seeing-acting-models_2.pptx">pptx</a>, <a href="files/slides/14-15.seeing-acting-models.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/abs/2107.03374"> Evaluating Large Language Models Trained on Code</a> <br> Additional Reading: <ol> <li><a href="https://arxiv.org/abs/2204.01691"> Do As I Can, Not As I Say: Grounding Language in Robotic Affordances </a></li> <li><a href="https://arxiv.org/abs/2303.03378"> PaLM-E: An Embodied Multimodal Language Model</a></li> </ol> </td> <td></td> <td> </td> </tr> <tr> <td>#24 - Thu Apr 11</td> <td> Efficiency considerations: <ul> <li>Quantization</li> <li>Distillation</li> <li>Distributed training</li> </ul> [slides: <a href="https://docs.google.com/presentation/d/1LobbCqyFLJYEeghFB9_kECqX-ZK9Y5po8Dd267XQiz8/edit?usp=sharing">pptx</a>, <a href="files/slides/22.model-efficiency.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/pdf/2208.07339.pdf"> LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale</a> <br> Additional Reading: <ol> <li><a href="https://arxiv.org/pdf/1906.04721.pdf">Data-Free Quantization Through Weight Equalization and Bias Correction</a></li> <li><a href="https://arxiv.org/pdf/1712.05877.pdf"> Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference</a></li> <li><a href="https://arxiv.org/pdf/1910.02054.pdf">ZeRO: Memory Optimizations Toward Training Trillion Parameter Models</a></li> <li><a href="https://arxiv.org/pdf/1811.06965.pdf">GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism</a></li> </ol> </td> <td> </td> <td></td> </tr> <tr> <td>#25 - Tue Apr 16</td> <td> Scaling LMs: <ul> <li>Thinking about computation cost</li> <li>Optimal scaling</li> <!-- <li>Why we didn't scale earlier?</li>--> <!-- <li>When scale does not help</li>--> <!-- <li>Barriers to scaling</li>--> </ul> [slides: <a href="files/slides/21.scaling-lms.pptx">pptx</a>, <a href="files/slides/21.scaling-lms.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://arxiv.org/pdf/2203.15556.pdf"> Training Compute-Optimal Large Language Models</a> <br> Additional Reading: <ol> <li><a 
href="https://medium.com/@dzmitrybahdanau/the-flops-calculus-of-language-model-training-3b19c1f025e4">The FLOPs Calculus of Language Model Training</a></li> <li><a href="https://arxiv.org/abs/2001.08361"> Scaling Laws for Neural Language Models</a></li> </ol> </td> <td></td> <td></td> </tr> <tr> <td>#26 - Thu Apr 18</td> <td> Scaling LMs: <ul> <!-- <li>Thinking about computation cost</li>--> <!-- <li>Optimal scaling</li>--> <li>Why we didn't scale earlier?</li> <li>When scale does not help</li> <li>Is scale all you need?</li> </ul> [slides: <a href="files/slides/21.scaling-lms.pptx">pptx</a>, <a href="files/slides/21.scaling-lms.pdf">pdf</a>] <hr> Social concerns about LMs: <ul> <li>Bias, fairness and toxic language</li> </ul> [slides: <a href="files/slides/23.societal-harms.pptx">pptx</a> <a href="files/slides/23.societal-harms.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://papers.nips.cc/paper/2021/file/1531beb762df4029513ebf9295e0d34f-Paper.pdf"> Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models</a> <br> Additional Reading: <ol> <li><a href="https://arxiv.org/abs/2010.02428">UnQovering Stereotyping Biases via Underspecified Questions</a></li> <li><a href="https://arxiv.org/abs/2206.09860">Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias</a></li> </ol> </td> <td></td> <td></td> </tr> <tr> <td>#27 - Tue Apr 23</td> <td> Social concerns about LMs: <ul> <li>Hallucination</li> <li>Truthfulness and veracity</li> <li>Legal considerations and fair use</li> <li>Reflections about future, dangers and misuses</li> </ul> [slides: <a href="files/slides/23.societal-harms.pptx">pptx</a> <a href="files/slides/23.societal-harms.pdf">pdf</a>] </td> <td> Suggested Reading: <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4523551"> Talkin’ ‘Bout AI Generation: Copyright and the Generative-AI Supply Chain </a> <br> Additional Reading: <ol> <li><a href="https://arxiv.org/abs/2202.03286">Red Teaming Language Models with Language Models</a> </li> <li><a href="https://arxiv.org/abs/2109.07958">TruthfulQA: Measuring How Models Mimic Human Falsehoods</a></li> <li><a href="https://arxiv.org/abs/2104.08758">Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus</a></li> <li><a href="https://arxiv.org/abs/2303.15715">Foundation Models and Fair Use</a></li> <li><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4523551">Copyright and the Generative-AI Supply Chain</a></li> </ol> </td> <td> </td> <td></td> </tr> <tr class="sechighlight5"> <td> Apr 24 </td> <td>Midway reports deadline</td> <td></td> <td></td> <td></td> </tr> <tr> <td>#28 - Thu Apr 25</td> <td> Project progress presentation <!-- Guest speaker: <br> <a href="http://rogeriobonatti.com/">Rogério Bonatti</a> (Microsoft) <br>--> <!-- [<a href="files/rogerio_bonatti_jhu.pdf">slides</a>]--> </td> <td> <!-- Concentration of power <br>--> <!-- Efficiency and environmental concerns <br>--> <!-- Memorization, privacy and security issues <br>--> <!-- Feedback loops <br>--> <!-- Legal issues <br>--> <!-- Availability of data and compute <br>--> <!-- Limitations and open problems--> <!-- <br>--> <!-- [<a href="https://livejohnshopkins-my.sharepoint.com/:p:/g/personal/dkhasha1_jh_edu/EdryvN68Ye5OtbGPevhK-o4BCJr_9CvjuHGR9OahQrNh9Q?e=rAV62k">slides</a>]--> </td> <td></td> <td> </td> </tr> <tr class="sechighlight4 centered"> <td>#29 - Tue Apr 30</td> <td>No Class - Reading Days</td> <td></td> <td></td> <td></td> 
</tr> <tr class="sechighlight4 centered"> <td>#30 - Thu May 2</td> <td>No Class - Reading Days</td> <td></td> <td></td> <td></td> </tr> <tr class="sechighlight5"> <td> May 13 <!-- <br> 9 AM - 12 PM<a href="https://studentaffairs.jhu.edu/registrar/wp-content/uploads/sites/23/2022/08/Fall-2022-Final-Exam-Schedule.pdf"> <sup><b>*</b></sup></a> --> </td> <td>Final project reports</td> <td> <!-- <a href="https://livejohnshopkins-my.sharepoint.com/:p:/g/personal/dkhasha1_jh_edu/Eb7jCHgIa2hOmUiehX1_DT8BGBMDPI9soYPGypvsppYLwQ?e=1ha9Wb">Slides</a><br>--> </td> <td></td> <td></td> </tr> <tr class="sechighlight5"> <td>May 13</td> <td colspan="2">Final project poster session (6-9pm)</td> <td></td> <td></td> </tr> </tbody> </table> </div> <hr> <div class="container"> <h2>Reference text</h2> <p> There is no required text. Though the following can be useful: </p> <ul> <li><a href="https://www.amazon.com/Introduction-Language-Processing-Adaptive-Computation/dp/0262042843">Introduction to Natural Language Processing, Eisenstein</a> </li> <li><a href="https://web.stanford.edu/~jurafsky/slp3/">Speech and Language Processing, Jurafsky and Martin</a> </li> <li><a href="https://catalyst.library.jhu.edu/catalog/bib_9689868">Natural language processing with PyTorch, Rao and McMahan</a></li> <li><a href="https://catalyst.library.jhu.edu/catalog/bib_9697352">Transformers for Natural Language Processing, Rothman </a></li> <li><a href="https://catalyst.library.jhu.edu/catalog/bib_9822241">Neural Network Methods for Natural Language Processing, Goldberg </a></li> <li><a href="https://d2l.ai/">Dive into Deep Learning, Zhang et al. </a></li> </ul> </div> <hr> <div class="container sec" id="resources"> <h2>Relevant Resources</h2> <p>Here are several resources available for free:</p> <ul> <li>Compute resources: <ul> <!-- <li>Grad students should have access to the graduate grid which has GPUs.</li>--> <!-- <li>Undergraduate students should have access to the undergrad grid.</li>--> <li><a href="https://colab.research.google.com/">Google Colab</a> provides free GPU usage for up to 12 hours/day for academic purposes. One can obtain <a href="https://medium.com/@yufengg/how-to-upgrade-colab-with-more-compute-64d53a9b05dc"> more compute on Colab</a> with relatively minimal pay. </li> <li>Google offers <a href="https://sites.research.google/trc/about/">research TPU credits</a>.</li> <!-- <li><a href="https://www.kaggle.com/general/108481">Kaggle</a> offers GPUs for its users.</li>--> <li><a href="https://aws.amazon.com/education/awseducate/">AWS</a> and <a href="https://azure.microsoft.com/en-us/free/students/">Azure</a> both offer welcome credits to students. 
</li> <li>If you need credits to use GPT3/GPT4 or other APIs, discuss it with the instructor.</li> </ul> </li> <li>Demos: <ul> <li><a href="https://6b.eleuther.ai">GPT-J demo</a></li> <li><a href="https://huggingface.co/bigscience/bloom">BLOOM demo</a></li> <li><a href="https://huggingface.co/spaces/dalle-mini/dalle-mini">DALL-E mini demo</a></li> <li><a href="https://c4-search.apps.allenai.org">A queryable interface to C4</a></li> <li><a href="https://vision-explorer.allenai.org">AllenAI vision demo</a></li> <li><a href="https://demo.allennlp.org">AllenNLP demo</a></li> <li><a href="https://beta.dreamstudio.ai/">DreamStudio image generation demo</a></li> <li><a href="https://google-research.github.io/seanet/audiolm/examples/">Examples from AudioLM</a></li> </ul> </li> <li>Tutorials:</li> <ul> <li>A <a href="https://huggingface.co/course/chapter1/1">course</a> on Huggingface's Transformers library. </li> <!-- <li><a href="https://huggingface.co/blog/stable_diffusion">Tutorial on Huggingface's Diffusers library </a></li>--> <!-- <li><a href="https://d2l.ai/">Dive into Deep Learning</a>: Interactive deep learning book with code, math,--> <!-- and discussions.--> <!-- </li>--> <li>Tutorials on <a href="https://www.pytorchlightning.ai/tutorials">Learn with Torch Lightning</a></li> </ul> </ul> <p> Besides these resources, we will try our best to satisfy individual needs through discussion. </p> </div> <hr> <div class="container sec" id="conduct"> <h2>Code of Conduct</h2> <p> The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful, abiding by the Computer Science Academic Integrity Policy: </p> <div class="container sec" id="cs-conduct"> <i> <p> Cheating is wrong. Cheating hurts our community by undermining academic integrity, creating mistrust, and fostering unfair competition. The university will punish cheaters with failure on an assignment, failure in a course, permanent transcript notation, suspension, and/or expulsion. Offenses may be reported to medical, law or other professional or graduate schools when a cheater applies. Violations can include cheating on exams, plagiarism, reuse of assignments without permission, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition. Ignorance of these rules is not an excuse. </p> <p> Academic honesty is required in all work you submit to be graded. Except where the instructor specifies group work, you must solve all homework and programming assignments without the help of others. For example, you must not look at anyone else’s solutions (including program code) to your homework problems. However, you may discuss assignment specifications (not solutions) with others to be sure you understand what is required by the assignment. If your instructor permits using fragments of source code from outside sources, such as your textbook or on-line resources, you must properly cite the source. Not citing it constitutes plagiarism. Similarly, your group projects must list everyone who participated. </p> <p> In the above paragraph "outside sources" also include content that was produced by an AI assistant like ChatGPT. This follows either by treating the AI assistant as a person for the purposes of this policy (controversial) or acknowledging that the AI assistant was trained directly on people's original work. 
Thus, while you are not forbidden from using these tools, you should consider the above policy carefully and quote where appropriate. Assignments that are in large part quoted from an AI assistant are very unlikely to be evaluated positively. In addition, if a student's work is substantially identical to another student's work, that will be grounds for an investigation of plagiarism regardless of whether the prose was produced by an AI assistant. </p> <p> Falsifying program output or results is prohibited. Your instructor is free to override parts of this policy for particular assignments. To protect yourself: (1) Ask the instructor if you are not sure what is permissible. (2) Seek help from the instructor, TAs, or CAs, as you are always encouraged to do, rather than from other students. (3) Cite any questionable sources of help you may have received. </p> </i> </div> <p> <!-- On every exam, you will sign the following pledge: "I agree to complete this exam--> <!-- without unauthorized assistance from any person, materials or device. [Signed and--> <!-- dated]". Your course instructors will let you know where to find copies of old exams,--> <!-- if they are available.--> Report any violations you witness to the instructor. You can find more information about university misconduct policies on the web for <a href="https://studentaffairs.jhu.edu/policies-guidelines/undergradethics/">undergraduate</a> and <a href="http://e-catalog.jhu.edu/grad-students/graduate-specificpolicies/">graduate</a> students. </p> <!-- <p>--> <!-- This course will have a zero-tolerance philosophy regarding <a--> <!-- href="https://www.cs.jhu.edu/academic-programs/academic-integrity-code/">plagiarism or other forms of--> <!-- cheating</a>, and incidents--> <!-- of academic dishonesty will be reported. A student who has doubts about how the Honor Code applies to this--> <!-- course should obtain specific guidance from the course instructor before submitting the respective assignment.--> <!-- </p>--> <p> Johns Hopkins University is committed to equal opportunity for its faculty, staff, and students. To that end, the university does not discriminate on the basis of sex, gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, military status, immigration status or other legally protected characteristic. The University's <a href="https://oie.jhu.edu/policies-and-laws/JHU-Discrimination-and-Harassment-Policy-and-Procedures-7.1.21-Present">Discrimination and Harassment Policy and Procedures</a> provides information on how to report or file a complaint of discrimination or harassment based on any of the protected statuses listed in the earlier sentence, and on the University’s prompt and equitable response to such complaints. </p> </div> <br><br><br><br> <!-- jQuery and Bootstrap --> <script src="files/jquery.min.js"></script> <script src="files/bootstrap.min.js"></script> </body> </html>