Natural Language Processing

UVA CS 6501-011 (Fall 2024)

Final Project

This post includes some important information about the final project. Please read it carefully and let me know if you have any questions.


1. Proposal

1.1 Background

The common theme of the final project is NLP for Social Goods. The goal of this theme is to study social problems for which natural language processing has the potential to offer meaningful solutions. There is a wide range of topics under this theme, including education, urban planning, assistive technology for people with disabilities, health, agriculture, environmental sustainability, economic, social, and gender inequality, social welfare and justice, ethics, privacy, and security. As you can see, this is a long list, and it should not be difficult to find a specific research project to work on.

As an example, sentiment classification is one of the most popular text classification problems in NLP. In our class lecture, we used it to explain the definition of text classification and the advantage of building a data-driven classifier (instead of merely counting positive/negative words). Beyond its role as a teaching example, sentiment classification has important practical value: many real-world NLP problems can be addressed by it, at least to some extent. For example, prior work has shown that it can be used to understand customer behavior and to analyze mental health issues using social media data.

The challenge of this project is how to connect NLP techniques to real-world problems. The purpose of the project proposal is to encourage students to think about how what they learned in this class can help address some critical social issues.

If you would like to understand more about this project theme and figure out what kinds of topics you can work on, please check out

1.2 Rubric for Project Proposal

The project proposal should contain the following sections, with the listed components in each section.

  1. Introduction:
    • (1 point) A description of a problem that is related to social goods
  2. Data:
    • (1 point) Which dataset you plan to use, or where you plan to get the dataset
    • (1 point) Dataset description
      • If it is an existing dataset, please describe some basic statistics of this dataset, for example, (1) the number of training/validation/test examples; (2) the vocabulary size of the training examples.
      • If you plan to collect a new dataset, please describe (1) the expected number of training/validation/test examples; (2) any evidence that you can collect the dataset as expected (e.g., if you plan to collect a new dataset from Twitter, make sure you have access to the Twitter APIs).
  3. Proposed Methods:
    • (1 point) A simple description of the proposed solutions, which should address the problem described in the introduction section
    • (1 point) The evaluation criteria (e.g., accuracy, F1 score, or any other criteria)
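As an illustration of the evaluation criteria mentioned above, here is a minimal sketch of accuracy and F1 score for a binary classification task. The function names (`accuracy`, `binary_f1`) and the toy labels are made up for this example, and the sketch assumes labels are encoded as 0/1 with 1 as the positive class. Your project may need a different criterion (for example, macro-averaged F1 for multi-class problems), and you are free to use a standard library implementation instead.

```python
from typing import Sequence


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if len(gold) == 0:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def binary_f1(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not from any real dataset).
gold_labels = [1, 0, 1, 1, 0, 0]
pred_labels = [1, 0, 0, 1, 1, 0]

print(f"accuracy = {accuracy(gold_labels, pred_labels):.3f}")
print(f"F1       = {binary_f1(gold_labels, pred_labels):.3f}")
```

Accuracy can be misleading when the classes are imbalanced, which is common in social-good datasets; in that case F1 on the minority class is often the more informative number to report.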

1.3 Project Proposal Submission

2. Final Project Presentation and Report

Deadlines of Project Presentation and Report

2.1 Rubric for Final Presentation

The final presentation will be recorded and submitted via Canvas. Each team will give a six-minute presentation. You can use Zoom or any other tool to record your presentation.

In the presentation, please include the following components:

  1. Introduction (2 points):
    • A brief explanation of the problem definition
    • The motivation for working on this problem
  2. Proposed methods (3 points):
    • A description of the proposed methods
    • A simple justification of why you think the proposed methods could work
  3. Experimental Results (3 points):
    • A highlight of 2-3 interesting results/observations from the experiments (it is not necessary to present all the results here)

Note that, because the presentation is recorded online, every team member needs to present part of the work, unless there is a special situation and the instructor has been notified.

2.2 Rubric for Final Report

Please submit the final report on Collab; each group only needs to submit one copy of the report.

In the final report, please keep all the sections from the project proposal and add only the following three sections at the end. You have up to three pages for the three sections.

  1. Implementation Details for Reproducibility (3 points): please describe the information related to the experiment setup, including
    • How was the data pre-processed?
    • What is the vocab size after data pre-processing?
    • Which machine learning (deep learning) packages were used in the implementation?
    • Besides the standard machine learning packages mentioned before, did you use other people’s code?
    • Please also upload the code or provide a GitHub link to the code
  2. Experimental Results (4 points):
    • Experiment results (2 points): please list the important results from the experiments
    • Results analysis (2 points): together with the proposed methods, explain whether the results meet your expectations
      • If the results are expected and good, please identify the important factors that led to the success;
      • Otherwise, please identify the unexpected issues in the proposed methods.
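To make the pre-processing and vocab-size questions above concrete, here is a minimal sketch of one way to compute the vocab size after pre-processing. Everything in it is an assumption for illustration: the `preprocess` and `build_vocab` helpers, the lowercasing and regular-expression tokenization, the `min_count` frequency threshold, and the toy sentences. Your own pipeline will likely differ, and the point is only that the report should state these choices explicitly so the number can be reproduced.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def preprocess(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens.

    Tokens are runs of letters/digits, optionally followed by an
    apostrophe suffix (so "don't" stays one token). Punctuation is dropped.
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def build_vocab(texts: Iterable[str], min_count: int = 1) -> Set[str]:
    """Collect the set of tokens that occur at least `min_count` times."""
    counts = Counter(token for text in texts for token in preprocess(text))
    return {token for token, count in counts.items() if count >= min_count}


# Toy training texts for illustration only (not from any real dataset).
train_texts = [
    "The movie was great!",
    "The plot was not great.",
    "Great cast, weak plot.",
]

vocab = build_vocab(train_texts)
frequent_vocab = build_vocab(train_texts, min_count=2)

print(f"vocab size (all tokens):    {len(vocab)}")
print(f"vocab size (min_count = 2): {len(frequent_vocab)}")
```

Note that the vocab size depends on every pre-processing decision (casing, tokenization, frequency cutoff), so it is worth reporting those decisions alongside the number itself.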

Note that, if you revised the proposed idea after submitting the proposal (and also got permission from the instructor), please explain the revision briefly in the final report. Otherwise, the final project will be considered not to follow the proposed idea.