Interactive Guide: NLP Homework 2

🎯 Goal of this Assignment

The goal of this assignment is to build and evaluate traditional, non-neural-network models on your preprocessed data from Homework 1. These "classical" models are computationally efficient and provide a crucial performance baseline. A more complex deep learning model is only worthwhile if it can significantly outperform these simpler baselines.

Important Submission Note

Submit a single, well-documented Jupyter Notebook (`.ipynb`) file containing all your code, outputs, visualizations, and written explanations for the tasks corresponding to your chosen track.
Please ensure that all code cells have been run and their **outputs** (e.g., print statements, plots, dataframes) **are visible** in the submitted notebook for grading.
Please structure your notebook to follow the sections and subsections of this assignment document. Use markdown cells to clearly mark section indices (e.g., "1. Feature Engineering", "2. Model Training").
Please make sure your submission for each task includes all items listed under the "Expected Output" for that task.
You are encouraged to refer to the demo code provided in class for practical examples of the concepts covered in this assignment.
This is the coding assignment; please don't forget to complete the reading assignment as well.

A Note on Your Dataset

If you were not satisfied with your dataset from Homework 1, you may select a new one. However, you **must** first perform all preprocessing steps from Homework 1 on the new data before proceeding.

A General Requirement on Data Splits

If your dataset already provides official splits (e.g., train/validation/test), please use them. If not, create your own splits by randomly sampling—for example, 80% for the **training set** and 20% for the **evaluation set**—before you start.
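A random 80/20 split like the one described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `split_dataset`, the `eval_fraction` and `seed` parameters, and the toy `examples` list are all hypothetical and are not part of the assignment or the class demo code. Fixing the seed keeps the split reproducible across notebook runs.

```python
import random


def split_dataset(examples, eval_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and return (train, eval) lists.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_eval = int(round(len(shuffled) * eval_fraction))
    # The first n_eval shuffled items form the evaluation set, the rest the training set.
    return shuffled[n_eval:], shuffled[:n_eval]


# Toy (text, label) pairs standing in for your preprocessed Homework 1 data.
examples = [(f"document {i}", i % 2) for i in range(10)]
train_set, eval_set = split_dataset(examples)
print(len(train_set), len(eval_set))
```

If scikit-learn is available, `sklearn.model_selection.train_test_split` does the same job and can also stratify by label through its `stratify` argument, which may be preferable for imbalanced classification data.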

Select Your Project Track

🏷️

Track A: Text Classification

Categorize text into predefined labels.

✍️

Track B: Text Generation

Create new, coherent text from input.
