11-411: Natural Language Processing

Category   Difficulty
HW         4
Exams      8
Project    7

Natural Language Processing (NLP) is an exciting application area for machine learning, since it shows up almost everywhere: Siri on your phone, automated telephone operators, automatic subtitle generation, and sentiment analysis of documents. Despite how hot NLP is, this class is not known for being instructive; it is more about introducing a lot of concepts than teaching any of them in depth. If you're doing research or already have a solid foundation in machine learning, this class may be okay for you. If you are a beginner at programming or don't have any machine learning experience, it will be hard to keep up and hard to learn much.

Lectures

The lectures in this class are more a way to show you a bunch of concepts than to teach you skills. Algorithms are glossed over, and you are expected to learn them on your own. You are also simply expected to know any prerequisites. As a result, a lot of people find the lectures pretty dry and hard to learn from. My suggestion: if you don't have some sort of machine learning background (e.g., 10-601), this class is probably not going to be very fun.

Homeworks

Homework assignments are usually easy to medium-length programming assignments that involve implementing a concept or algorithm covered in class. Some also have short-answer portions. Overall, the homeworks are not too hard, but they are much harder if you don't have prior programming/ML experience.

Final Project

There is a semester-long project in which you design a question-and-answer generation program that performs the following two functions:

  1. Question(Article) => Questions about the article
  2. Answers(Article, Questions) => Answers to the questions about the article

The entire class competes to see who has the best question-and-answer program, with the class peer-reviewing the results at the end of the semester.

The assignment itself is pretty interesting, but the class motivates it poorly. There is only one check-in, and most people struggle to figure out how to actually approach the problem. Usually, it goes like this:

  • Beginning: I'm going to do a deep-learning, semantic-tagging-based analysis, feed it through a multi-layer network, and then perform reinforcement learning to refine my questions and answers.
  • Checkpoint: OK, to be more realistic, we will use a neural network to learn question and answer characteristics, and automate a training program to iteratively produce better questions and answers.
  • Final: I'm just going to pattern match a bunch of templates to questions and answers.

Surprisingly, you can generate pretty solid questions with just templates and some surrounding context. For example, the following program already does pretty well:

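# A minimal Python sketch; the regex below is one illustrative template
import re

# Template: '<phrase 1> is/was <phrase 2>'
TEMPLATE = re.compile(r"^(?P<phrase1>.+?)\s+(?:is|was)\s+(?P<phrase2>.+?)[.!?]?$")

def Questions(text):
  questions = []
  # Naive sentence split: break after ., !, or ? followed by whitespace
  for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
    match = TEMPLATE.match(sentence)
    if match:
      questions.append(f"Who/what is {match.group('phrase1')}?")

  return questions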

If you find enough templates, the quality of your final questions will be pretty solid. I know it seems cheap, but this is a weakness of the assignment: an easy solution is very solid and can outperform almost all of the complicated ones.
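To make the "find enough templates" idea a bit more concrete, here is a minimal sketch of a template table in Python. The three patterns, the `TEMPLATES` name, and the `generate_questions` helper are illustrative assumptions, not anything provided by the course; a real submission would need many more templates and a better sentence splitter.

```python
import re

# Each entry pairs a sentence pattern with a question format string.
# These three patterns are hypothetical examples, not an exhaustive set.
TEMPLATES = [
    (re.compile(r"^(?P<subj>.+?) was born in (?P<obj>.+?)[.!?]?$"),
     "Where or when was {subj} born?"),
    (re.compile(r"^(?P<subj>.+?) (?:is|was) (?P<obj>.+?)[.!?]?$"),
     "Who or what is {subj}?"),
    (re.compile(r"^(?P<subj>.+?) (?P<verb>invented|discovered|founded) (?P<obj>.+?)[.!?]?$"),
     "Who {verb} {obj}?"),
]


def generate_questions(text):
    """Return every question produced by any template on any sentence."""
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    questions = []
    for sentence in sentences:
        for pattern, question_format in TEMPLATES:
            match = pattern.match(sentence)
            if match:
                question = question_format.format(**match.groupdict())
                if question not in questions:  # skip exact duplicates
                    questions.append(question)
    return questions


if __name__ == "__main__":
    sample = ("Ada Lovelace was born in London. "
              "Alexander Fleming discovered penicillin.")
    for q in generate_questions(sample):
        print(q)
```

Keeping the templates in a plain list means adding a new question type is a one-line change, which fits the "start easy, improve if you have time" approach.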

Because most teams do not do very well on this assignment, grading is fairly lenient. Pattern matching usually gets you around a B if it produces reasonable questions and answers. TL;DR: it's in your best interest to start with something pretty easy and improve on it if you have time.
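For the answering half, a similarly simple baseline is possible. The sketch below is an assumption for illustration only, not something the course prescribes: it returns the article sentence that shares the most non-stopword words with the question. The `STOPWORDS` set and the `tokenize` and `answer` helpers are hypothetical names.

```python
import re

# A tiny, hand-picked stopword list (an assumption; extend as needed).
STOPWORDS = {"a", "an", "the", "is", "was", "of", "in", "or",
             "who", "what", "where", "when", "did", "does"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS}


def answer(article, question):
    """Return the sentence with the largest word overlap with the question."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    question_words = tokenize(question)
    # max() keeps the earliest sentence when several tie.
    return max(sentences, key=lambda s: len(question_words & tokenize(s)))


def Answers(article, questions):
    """Answer each question independently against the same article."""
    return [answer(article, q) for q in questions]


if __name__ == "__main__":
    article = ("Pittsburgh is a city in Pennsylvania. "
               "Alexander Fleming discovered penicillin.")
    print(Answers(article, ["Who discovered penicillin?"]))
```

This returns a whole sentence rather than a short answer span, so it is only a starting point; trimming the sentence down with the same templates used for question generation would be a natural next step.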

Exams

The exams are very hard for this class, mostly because the class is not well taught. Multiple-choice questions feel like class trivia, and the best way to study for them is to expose yourself to a lot of NLP practice exams and trivia. Building your cheat sheet by copying down facts from the slides is a good way to prepare for some of these. The short- and long-answer portions of the exam are pretty straightforward but very hit or miss: if they test an algorithm you don't know well, you're likely to lose a ton of points on that question. Since the class doesn't tend to focus on how algorithms work, you'll be on your own to learn many of them.

Some links I found helpful for algorithms not covered well in class: