1. Select a Track
Click on a track to see the available options.
To keep grading manageable, students who want to use a different task or dataset should get it approved.
Text Classification
Categorize text into predefined labels. Ideal for tasks like sentiment analysis, topic classification, and intent detection.
Text Generation
Create new, coherent text. Perfect for projects like dialogue systems, poetry generation, and summarization.
2. Choose a Classification Task
The goal is to classify text into predefined categories. Your dataset must contain text data and corresponding labels.
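As a rough illustration of the "text plus label" requirement, the sketch below checks that every row of a dataset has non-empty text and a label, then counts examples per label. The field names `text` and `label`, the helper `validate_classification_rows`, and the toy rows are assumptions made for this example; they are not taken from any of the datasets listed here, whose column names may differ.

```python
from collections import Counter


def validate_classification_rows(rows):
    """Check each row for non-empty text and a label; return per-label counts.

    Assumes each row is a dict with "text" and "label" keys (hypothetical
    field names; adapt them to the dataset you actually use).
    """
    counts = Counter()
    for i, row in enumerate(rows):
        text = row.get("text")
        label = row.get("label")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"row {i}: missing or empty text")
        if label is None:
            raise ValueError(f"row {i}: missing label")
        counts[label] += 1
    return counts


# Hypothetical toy rows, only to show the expected shape of the data.
example_rows = [
    {"text": "play some jazz", "label": "play_music"},
    {"text": "will it rain tomorrow", "label": "get_weather"},
    {"text": "put on my workout playlist", "label": "play_music"},
]
label_counts = validate_classification_rows(example_rows)
```

The per-label counts are also a quick way to spot class imbalance before training.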
Intent Detection
Classify user queries into specific goals like "play music" or "get weather." Core to building virtual assistants.
Dataset: Amazon MASSIVE
Persuasion Detection
Identify whether a piece of text is persuasive. Involves recognizing subtle linguistic cues and rhetorical strategies.
Dataset: Anthropic Persuasion
News Article Categorization
Classify articles into topics like "Sports," "Politics," or "Technology." A classic multi-class classification problem.
Dataset: AG News
Emotion Detection
Classify text by the emotion conveyed (e.g., "joy," "sadness," "anger"). A fine-grained classification task.
Dataset: Emotions NLP
2. Choose a Generation Task
The goal is to generate new text based on a prompt. Your dataset should contain input/prompt text and corresponding target text.
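One way to picture the prompt/target format is shown below, using headline generation as the illustration. This is a minimal sketch: the `GenerationExample` class, the `make_headline_example` helper, and the assumption that paragraphs are separated by blank lines are all made up for this example and may not match how any listed dataset is stored.

```python
from dataclasses import dataclass


@dataclass
class GenerationExample:
    prompt: str  # input text given to the model
    target: str  # text the model should learn to produce


def make_headline_example(article_text, headline):
    """Pair an article's first paragraph (prompt) with its headline (target).

    Assumes paragraphs are separated by a blank line (an assumption of this
    sketch, not a property of any particular dataset).
    """
    first_paragraph = article_text.strip().split("\n\n")[0]
    # Collapse internal line breaks and repeated spaces into single spaces.
    first_paragraph = " ".join(first_paragraph.split())
    return GenerationExample(prompt=first_paragraph, target=headline.strip())


# Hypothetical article text, only to show the resulting pair.
article = (
    "City council approves new park.\n"
    "Construction starts in May.\n\n"
    "Residents had mixed reactions."
)
example = make_headline_example(article, "  Council Approves New Park ")
```

The same two-field shape carries over to the other generation tasks, for example a line of dialogue as the prompt and the reply as the target.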
Dialogue Generation
Generate a response to a line of dialogue. A foundational task for building chatbots and conversational agents.
Dataset: Cornell Movie-Dialogs
Headline Generation
Generate a news headline from the first paragraph of an article. A classic text summarization task.
Dataset: News Summary
Poetry Generation
Generate lines of poetry in a specific style (e.g., Shakespearean sonnets). A creative and challenging generation task.
Dataset: Gutenberg Poetry
Code Documentation Generation (Python ONLY)
Generate a docstring that explains what a function does based on its code. A practical application of sequence-to-sequence models.
Dataset: CodeSearchNet
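To make the code-to-docstring pairing concrete, here is a small sketch that uses Python's standard `ast` module to split a function into its code (with the docstring removed) and the docstring itself. The helper name `extract_docstring_pairs` and the sample source are hypothetical; this only illustrates the shape of such pairs and does not reflect how CodeSearchNet is packaged. It assumes Python 3.9 or later for `ast.unparse`.

```python
import ast
import copy


def extract_docstring_pairs(source):
    """Return (code_without_docstring, docstring) pairs for documented functions."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        docstring = ast.get_docstring(node)
        if not docstring:
            continue  # skip functions without a docstring
        stripped = copy.deepcopy(node)
        # Drop the docstring statement so the target does not leak into the
        # input; fall back to `pass` if nothing else is left in the body.
        stripped.body = stripped.body[1:] or [ast.Pass()]
        pairs.append((ast.unparse(stripped), docstring))
    return pairs


# Hypothetical sample source, only to show the resulting pairs.
sample_source = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def undocumented(x):
    return x
'''
pairs = extract_docstring_pairs(sample_source)
```

Removing the docstring from the input matters here: if it stays in the code, the model can copy the target instead of learning to describe the function.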