Natural Language Processing

UVA CS 6501-011 (Fall 2024)

Final Project

This post includes some important information about the final project. Please read it carefully and let me know if you have any questions.

Highlights

1. Proposal

1.1 Background

The common theme of the final project is NLP for Social Goods. The goal of this theme is to study social problems for which natural language processing has the potential to offer meaningful solutions. There is a wide range of topics under this theme, including education, urban planning, assistive technology for people with disabilities, health, agriculture, environmental sustainability, economic, social, and gender inequality, social welfare and justice, ethics, privacy, and security. As you can see, this is a long list, and it should not be difficult to find a specific research project to work on.

As an example, sentiment classification is one of the most popular text classification problems in NLP. In our class lecture, we used it to explain the definition of text classification and the advantage of building a data-driven classifier (instead of merely counting positive/negative words). Beyond that, an important practical value of sentiment classification is that many real-world NLP problems can be addressed by it, at least to some extent. For example, prior work has shown that it can be used to understand customer behavior and to analyze mental health issues using social media data.
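To make the contrast between word counting and a data-driven classifier concrete, here is a minimal sketch. The word lists, the toy training examples, and all function names are assumptions made up for illustration, not material from the lecture. The data-driven model is a Naive Bayes classifier with add-one smoothing, chosen only because it is short.

```python
import math
from collections import Counter

# Hypothetical hand-written lexicons (for illustration only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "boring"}

def count_based(text):
    """Label text by comparing counts of positive and negative words."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"  # ties default to "pos"

def train_naive_bayes(examples):
    """Collect label counts and per-label word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in text.lower().split():
            counts[tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict_naive_bayes(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set (an assumption, far too small for real use).
TRAIN = [
    ("a great and moving film", "pos"),
    ("i love this movie", "pos"),
    ("not good at all", "neg"),
    ("a boring and predictable plot", "neg"),
]

model = train_naive_bayes(TRAIN)
example = "predictable and not good"
print(count_based(example), predict_naive_bayes(model, example))  # pos neg
```

On this toy input the word counter sees only "good" and answers positive, while the trained model has learned from the examples that "not", "good", and "predictable" co-occur with negative labels. A real project would need far more data and a held-out test set before drawing any conclusion.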

The challenge of this project is to build connections between NLP techniques and real-world problems. The purpose of the project proposal is to encourage students to think about how what they learned in this class can help address critical social issues.

If you would like to understand more about this project theme and figure out what kinds of topics you can work on, please check out

1.2 Rubric for Project Proposal

The project proposal should contain the following sections, each with the components listed below.

  1. Introduction:
    • (1 point) A description of a problem that is related to social goods
    • (1 point) Your motivation for working on this problem
    • (2 points) How can this problem be formulated as an NLP problem?
  2. Data:
    • (1 point) Which dataset you plan to use, or where you plan to get the dataset
    • (1 point) Dataset description
      • If it is an existing dataset, please describe some basic statistics of this dataset, for example, (1) the numbers of training/validation/test examples; (2) the vocabulary size of the training examples.
      • If you plan to collect a new dataset, please describe (1) the expected numbers of training/validation/test examples; (2) any guarantee that you can collect the dataset as expected (e.g., if you plan to collect a new dataset from Twitter, make sure you have access to the Twitter APIs).
  3. Proposed Methods:
    • (2 points) A brief description of the proposed solution, which should build on the NLP formulation given in the last item of the Introduction section
    • (2 points) What the evaluation criteria are (e.g., accuracy, F1 score, or any other criteria) and why they are good or sufficient for this task
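
As a rough illustration of the dataset statistics and evaluation criteria mentioned in the rubric, the sketch below computes split sizes, training vocabulary size, accuracy, and F1 on toy data. The function names, the toy examples, and the use of lowercased whitespace tokenization are all assumptions made for this example; your own dataset may call for a different tokenizer or different metrics.

```python
def dataset_stats(splits):
    """Return the number of examples per split and the training vocabulary size.

    `splits` maps a split name to a list of (text, label) pairs. Tokens are
    lowercased and split on whitespace (a simplifying assumption).
    """
    sizes = {name: len(examples) for name, examples in splits.items()}
    vocab = {tok for text, _ in splits["train"] for tok in text.lower().split()}
    return sizes, len(vocab)

def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1_score(gold, pred, positive):
    """F1 score for the class `positive` (harmonic mean of precision and recall)."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy dataset (made up for illustration).
splits = {
    "train": [("not good at all", "neg"), ("i love this movie", "pos"), ("good movie", "pos")],
    "validation": [("boring plot", "neg")],
    "test": [("great film", "pos")],
}
sizes, vocab_size = dataset_stats(splits)
print(sizes, vocab_size)  # {'train': 3, 'validation': 1, 'test': 1} 8

# An imbalanced test set where a classifier always predicts "neg".
gold = ["neg"] * 8 + ["pos"] * 2
pred = ["neg"] * 10
print(accuracy(gold, pred), f1_score(gold, pred, positive="pos"))  # 0.8 0.0
```

The second example shows why the choice of criterion needs a justification: on imbalanced data, a classifier that never predicts the minority class still reaches 80% accuracy here, while its F1 score on that class is zero.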

1.3 Project Proposal Submission