NLP Homework 2

Classical Model Baselines 📊

🎯 Goal of this Assignment

The goal of this assignment is to build and evaluate traditional, non-neural-network models on your preprocessed data from Homework 1. These "classical" models are computationally efficient and provide a crucial performance baseline. A more complex deep learning model is only worthwhile if it can significantly outperform these simpler baselines.

Important Submission Note

  • Submit a single, well-documented Jupyter Notebook (`.ipynb`) file containing all your code, outputs, visualizations, and written explanations for the tasks corresponding to your chosen track.
  • Please ensure that all code cells have been run and their **outputs** (e.g., print statements, plots, dataframes) **are visible** in the submitted notebook for grading.
  • Please structure your notebook to follow the sections and subsections of this assignment document. Use markdown cells to clearly mark section indices (e.g., "1. Feature Engineering", "2. Model Training").
  • Please make sure your submission for each task includes all items listed under the "Expected Output" for that task.
  • You are encouraged to refer to the demo code provided in class for practical examples of the concepts covered in this assignment.
  • This document covers the coding assignment only; please remember to complete the reading assignment as well.

A Note on Your Dataset

If you were not satisfied with your dataset from Homework 1, you may select a new one. However, you **must** first perform all preprocessing steps from Homework 1 on the new data before proceeding.

A General Requirement on Data Splits

If your dataset already provides official splits (e.g., train/validation/test), please use them. If not, create your own splits by randomly sampling—for example, 80% for the **training set** and 20% for the **evaluation set**—before you start.
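For datasets without official splits, the random 80/20 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a required implementation: the helper name `split_dataset`, the seed value, and the toy `data` list are assumptions made for the example. Fixing the seed is a suggestion so that the same split is reproduced each time the notebook is re-run.

```python
import random


def split_dataset(examples, eval_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and split it into (train, eval) lists."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    shuffled = list(examples)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split on every run
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    # The first n_eval shuffled items form the evaluation set, the rest the training set.
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical toy data: (text, label) pairs standing in for your preprocessed dataset.
data = [(f"document {i}", i % 2) for i in range(100)]
train_set, eval_set = split_dataset(data)
print(len(train_set), len(eval_set))
```

If you already use scikit-learn, its `train_test_split` function (with `test_size=0.2` and a fixed `random_state`) serves the same purpose, and its `stratify` argument can keep label proportions similar across the two sets for classification data.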

Select Your Project Track

Choose one of the two tracks below and complete the tasks specific to it.

🏷️ Track A: Text Classification

Categorize text into predefined labels.

✍️ Track B: Text Generation

Create new, coherent text from input.