π― Goal of this Assignment
The goal of this first assignment is to deeply understand your chosen dataset and prepare it for the modeling stages in the upcoming homeworks. High-quality data is the foundation of any successful NLP model, and this initial step is arguably the most critical in the entire pipeline.
Important Submission Note
- Submit a single, well-documented Jupyter Notebook (`.ipynb`) file containing all your code, outputs, visualizations, and written explanations for the tasks corresponding to your chosen track.
- Please ensure that all code cells have been run and their **outputs** (e.g., print statements, plots, dataframes) **are visible** in the submitted notebook for grading.
- Please structure your notebook to follow the sections and subsections of this assignment document. Use markdown cells to clearly mark section indices (e.g., "1. Exploratory Data Analysis", "1.1 Load Data").
- Please make sure your submission for each task includes all items listed under the "Expected Output" for that task.
- You are encouraged to refer to the demo code provided in class for practical examples of the concepts covered in this assignment.
- This is the coding assignment, and please donβt forget about the reading assignment.
Select Your Project Track
Click a track to view the specific tasks.
π·οΈ
Track A: Text Classification
Categorize text into predefined labels.
βοΈ
Track B: Text Generation
Create new, coherent text from input.