Table of Contents
- Course Information
- Course Description
- Course Schedule
- Final Project Instructions
- Final Project Signup Form
- A Simple Guide to Using Rivanna
1. Course Information
- Instructor: Yangfeng Ji
- Semester: Fall 2023
- Location: Olsson Hall 009
- Time: TuTh 11 AM - 12:15 PM
- Office Hours:
  - Yangfeng Ji: Wednesday 11 AM (Rice 510)
  - Dane Williamson: Tuesday 3 PM (Rice 414)
  - Xu Ouyang: Thursday 3 PM (Rice 414)
1.1 Additional Information
- Piazza for online discussion. By the time of our first class, all students registered for this course should have received an invitation from Piazza. Please let the instructor know if you have not received one.
- Homework submission template for homework assignments
Class sessions for this course will be recorded. Recordings will be available only to the instructor and students enrolled in the class. Recordings will be deleted when no longer necessary. Recordings may not be reproduced, shared with those not enrolled in the class, or uploaded to other online environments.
2. Course Description
Natural language processing (NLP) seeks to provide computers with the ability to process and understand human language intelligently. Examples of NLP techniques include (i) automatically translating from one natural language to another, (ii) analyzing documents to answer related questions or make related predictions, and (iii) generating text to assist with story writing or to build conversational agents. This course, consisting of a fundamental part and an advanced part, will give an overview of modern NLP techniques.
This course will mainly focus on applying machine learning (particularly deep learning) techniques to natural language processing. NLP topics covered by this course include:
- Text classification and its applications
- Word embeddings
- Language modeling
- Sequence-to-sequence models and machine translation
- Large pre-trained models and text generation
- Other advanced topics, such as explainable NLP and NLP in social science
For detailed information, please refer to the course schedule.
2.1 Prerequisites
- Proficiency in Python
This course requires some programming in both the homework and the final project. The preferred programming language for this course is Python (with additional packages such as SciPy, Sklearn, and PyTorch).
- Calculus and Linear Algebra
Multivariable derivatives, matrix/vector notation and operations, singular value decomposition, etc.
- Probability and Statistics
Mean and variance, multinomial distribution, conditional dependence, maximum likelihood estimation, Bayes' theorem, etc.
- Foundations of Machine Learning
Logistic regression, cross-validation, optimization with gradient descent, bias-variance decomposition, etc.
2.2 Textbooks
- [JE] Eisenstein, Natural Language Processing, 2018
- Jurafsky and Martin, Speech and Language Processing, 3rd Edition, 2019
- Shalev-Shwartz and Ben-David, Understanding Machine Learning: From Theory to Algorithms, 2014
- Goodfellow, Bengio and Courville, Deep Learning, 2016
3. Assignments and Final Project
- Homework (75%):
  - There will be five homework assignments, one for each of the main topics covered in this course.
  - Each homework assignment is worth 15%.
- Project (25%):
  There is only one course project, and its credit breaks down into four parts. Students should team up for this project; each group should have 2-3 students.
  - Project proposal: 5%
  - Mid-term report: 5%
  - Final project presentation: 8%
  - Final project report: 7%
- In both the homework and the final project, apart from using machine learning libraries such as Sklearn, PyTorch, and TensorFlow, students need to implement the rest of the proposed model by themselves. Copying code from any source (e.g., GitHub, Bitbucket, and GitLab) is prohibited and will be considered plagiarism.
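The grade weights above add up as follows: 5 × 15% = 75% for homework and 5% + 5% + 8% + 7% = 25% for the project, for 100% in total. As a minimal sketch, assuming every component is scored on a 0-100 scale (the syllabus does not state the scale), the hypothetical helper `final_grade` below combines component scores with these weights. It is an illustration of the arithmetic only, not the official grading procedure.

```python
from typing import Dict, Sequence

# Weights from the syllabus: five homework assignments at 15% each (75% total)
# plus four project components totaling 25%.
HOMEWORK_WEIGHT = 0.15
PROJECT_WEIGHTS: Dict[str, float] = {
    "proposal": 0.05,
    "midterm_report": 0.05,
    "presentation": 0.08,
    "final_report": 0.07,
}

# Sanity check: the weights should cover exactly 100% of the grade.
assert abs(5 * HOMEWORK_WEIGHT + sum(PROJECT_WEIGHTS.values()) - 1.0) < 1e-9


def final_grade(homework_scores: Sequence[float],
                project_scores: Dict[str, float]) -> float:
    """Hypothetical helper: weighted course grade from 0-100 component scores."""
    if len(homework_scores) != 5:
        raise ValueError("expected scores for exactly five homework assignments")
    if set(project_scores) != set(PROJECT_WEIGHTS):
        raise ValueError(f"expected project scores for {sorted(PROJECT_WEIGHTS)}")
    # Each homework contributes 15% of its score.
    homework_part = sum(HOMEWORK_WEIGHT * s for s in homework_scores)
    # Each project component contributes its own weight times its score.
    project_part = sum(w * project_scores[k] for k, w in PROJECT_WEIGHTS.items())
    return homework_part + project_part
```

For example, homework scores of 90, 85, 100, 80, and 95 contribute 67.5 points, and project scores of 100, 90, 85, and 88 contribute 22.46 points, for a course grade of about 89.96.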
3.1 Collaboration Policy
For homework assignments:
- Students should be fully responsible for the answers in their own submissions.
- Students are allowed to discuss homework with their classmates. If you discuss with classmates, please disclose their names in your submission. Directly copying answers from others is considered plagiarism.
- Students may use generative AI tools (e.g., ChatGPT and Bard) for reference, but they must acknowledge any such use in their submissions.
- Instructors reserve the right to request further clarification during grading (via email or Canvas discussion), and grades will reflect students' understanding of their own answers.
For the final project, the same policy applies, with "student(s)" replaced by "group(s)".
3.2 Computing Resources
Complementary computing resources, including both CPU and GPU hours, are provided through UVA's Rivanna system. Stay tuned for more details.
3.3 Resources for the Final Project
The general theme for the final project is NLP for Social Good, which covers a wide range of problems of social concern, for example, social fairness, medical applications, and educational applications.
For inspiration for the final project, you can start with the following reading list.
4. Additional Information
Last updated on 08/16/2023