Schedule

Schedule: Reading and homework assignments

  • THIS SCHEDULE IS SUBJECT TO CHANGE
  • Check Canvas for specific due dates and times.


Week 1: NLP and Basic Text Processing. Chaps 1 and 2

January 8: The NLP Pipeline

  • General applications of NLP
  • The Holy Grail: Deep understanding of extended discourses such as stories or dialogues.
  • Installing NLTK, examples of how to use it
  • Reading in language data from files, splitting sentences, etc.
  • Counting words and modeling their frequency (Ch. 1)
  • Collocations and Bigrams
  • Reading: Chapters 1 and 2 (reading in natural language data from files)
  • Homework 1 assigned.
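The counting topics above can be sketched in plain Python (the toy sentence is made up; in class you would use NLTK's `FreqDist` and `nltk.bigrams`, but the idea is the same):

```python
from collections import Counter

# A toy corpus stands in for text read from a file.
text = "the quick brown fox jumps over the lazy dog and the quick cat"
tokens = text.split()  # naive whitespace tokenization

# Unigram frequencies (what nltk.FreqDist computes).
unigrams = Counter(tokens)

# Bigram frequencies (what feeding nltk.bigrams into a Counter gives you).
bigrams = Counter(zip(tokens, tokens[1:]))

print(unigrams.most_common(2))    # 'the' appears 3 times, 'quick' twice
print(bigrams[("the", "quick")])  # the bigram 'the quick' appears twice
```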

January 10: Basic Text Processing. Unigrams, Words, POS

  • Review NLTK Lexical resources (Read Chapter 2)
  • Tokenization (READ ch. 3.1.1)
  • POS Word categorization & generalization
  • Stemming
  • Collocations
  • Lexical Meaning: WordNet (READ ch. 2.5)
  • Synonyms and Synsets.
  • Word Senses
  • Read Chapter 3 for next week.
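To illustrate the idea behind stemming, here is a deliberately naive suffix-stripper (the suffix list and length cutoff are invented for illustration; NLTK's `PorterStemmer` applies a much more careful rule cascade):

```python
# A simplified sketch of what a stemmer does: strip common suffixes.
def naive_stem(word):
    for suffix in ("ing", "ed", "es", "s"):  # order matters: longest first
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

print(naive_stem("jumping"))  # jump
print(naive_stem("dogs"))     # dog
print(naive_stem("was"))      # was (too short to strip)
```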

Week 2: Words, sentences, frequencies, POS, ngrams. Chaps 3 and 5

Homework 1 DUE

January 15: Lexical Resources. Moving beyond Words and POS

  • PYTHON: Review chapter 4 if you are learning Python as you go along.
  • Review HW1.
  • Homework 2 assigned.
  • Regular Expressions (Read ch 3.4, 3.5, 3.6, 3.7)
  • POS tagging patterns with Regexp 
  • What is an ontology?
  • Wordnet API, how it works (READ ch. 2.5)
  • Synonyms and Synsets.
  • Semantic Relatedness
  • Brief introduction to Distributional Semantics and Word Embeddings (Word2Vec)
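Regexp-based POS tagging patterns can be sketched with the standard `re` module (the handful of suffix patterns here are illustrative, not a serious tagger; NLTK's `RegexpTagger` works on the same principle):

```python
import re

# Suffix-based tag guesses in the spirit of nltk.RegexpTagger.
patterns = [
    (r".*ing$", "VBG"),  # gerunds
    (r".*ly$",  "RB"),   # adverbs
    (r".*ed$",  "VBD"),  # past-tense verbs
    (r".*s$",   "NNS"),  # plural nouns
    (r".*",     "NN"),   # default: noun
]

def regexp_tag(word):
    for pattern, tag in patterns:
        if re.match(pattern, word):
            return tag
    return "NN"

print([(w, regexp_tag(w)) for w in ["running", "quickly", "walked", "dogs", "table"]])
```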

January 17: Processing Bits of Language above the Word

  • POS tagging & applications (READ ch. 5.1 & 5.2)
  • Review of Probability and Conditional Probability
  • N-Gram language models (READ Chapter 5.4 and 5.5)
  • Introduction to Text Classification
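The conditional-probability core of a bigram language model can be sketched as follows (toy corpus invented; in NLTK one would build this with `ConditionalFreqDist`):

```python
from collections import Counter, defaultdict

# Toy corpus; in class this would come from an NLTK corpus reader.
tokens = "i like cheese i like bread i eat cheese".split()

# Count how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for w1, w2 in zip(tokens, tokens[1:]):
    bigram_counts[w1][w2] += 1

def p(w2, w1):
    """Maximum-likelihood estimate of P(w2 | w1) = count(w1 w2) / count(w1 *)."""
    total = sum(bigram_counts[w1].values())
    return bigram_counts[w1][w2] / total if total else 0.0

print(p("like", "i"))       # 2 of the 3 bigrams starting with 'i'
print(p("cheese", "like"))  # 1 of the 2 bigrams starting with 'like'
```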

Week 3: Natural Language Understanding I 

Homework 2 DUE

January 22: Text Classification

  • READING: Chapter 5, section 3 (Python dictionaries); Chapter 6, sections 6.1 to 6.4
  • Homework 3 assigned.
  • Text Classification Using Sentiment Lexicons, Lexical Resources
  • Classifying Texts or Utterances into Categories.
  • Defining an Experiment.
  • Example: Restaurant Reviews (used in the homework).
  • Constructing Feature Representations of Texts. 
  • Features for POS, features for words (unigrams), Bigram Features.
  • Sentiment Lexicons: LIWC (Linguistic Inquiry and Word Count) and LIWC features
  • How to do error analysis on your classifier's predicted output.
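Feature representations of the kind listed above can be sketched as a function returning the dict shape NLTK classifiers expect (the word lists here are a tiny made-up sentiment lexicon, not LIWC):

```python
# Minimal feature extractor: a dict mapping feature names to values.
POSITIVE = {"good", "great", "tasty"}
NEGATIVE = {"bad", "awful", "bland"}

def review_features(text):
    tokens = text.lower().split()
    features = {}
    for tok in tokens:
        features["contains({})".format(tok)] = True  # unigram features
    # Lexicon-count features, in the spirit of LIWC category counts.
    features["pos_count"] = sum(t in POSITIVE for t in tokens)
    features["neg_count"] = sum(t in NEGATIVE for t in tokens)
    return features

feats = review_features("Great food but bland service")
print(feats["pos_count"], feats["neg_count"])  # 1 1
```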

January 24:  

  • How many NLU problems can be cast as classification problems?
  • How to figure out what features are useful.
  • Example: Movie Reviews: Thumbs up or Thumbs Down?
  • Examining the most important features.
  • The scikit-learn package for classifiers, usable from NLTK via its SklearnClassifier wrapper.
  • Decision Tree learners
  • Differences between types of classifiers
  • How Naive Bayes works.
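The Naive Bayes computation itself can be worked through on a tiny invented corpus (this is the same arithmetic `nltk.NaiveBayesClassifier` performs under the hood, here with add-one smoothing):

```python
import math
from collections import Counter, defaultdict

# Tiny labeled corpus (invented) of restaurant-review snippets.
train = [
    ("great tasty food", "pos"),
    ("great service", "pos"),
    ("awful bland food", "neg"),
    ("bad service", "neg"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def log_posterior(text, label):
    # log P(label) + sum over words of log P(word | label), add-one smoothed.
    logp = math.log(class_counts[label] / len(train))
    total = sum(word_counts[label].values())
    for w in text.split():
        logp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return logp

def classify(text):
    return max(class_counts, key=lambda label: log_posterior(text, label))

print(classify("great food"))  # pos
```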

Week 4: More classification

Homework 3 DUE

January 29: Classifiers, feature analysis, word embeddings. 

  • HW4: Extend HW3 with more features, learners, analysis.
    • Try word embeddings, decision tree learners, and SVMs in scikit-learn
  • Feature Selection
  • Different types of classifiers
  • Distributional Semantics: Background to Word Embeddings
  • Word Embeddings: how they are built, how you can use in classification 
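How embeddings get used comes down to vector similarity; here is a cosine-similarity sketch over toy 3-dimensional vectors (the numbers are invented; real Word2Vec vectors have hundreds of dimensions, but the computation is the same):

```python
import math

# Toy "embeddings" (made-up numbers) for three words.
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(vectors["king"], vectors["queen"]))  # close to 1
print(cosine(vectors["king"], vectors["apple"]))  # much smaller
```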

January 31:  The Lexicon, Verbs and their subcategorization.

  • Homework 5 provides sample problems to help you review for the midterm.
  • More on Lexical Subcategorization (Re-read Chapter 2)
  • Lexical Meaning: Wordnet and Verbnet
  • Review Wordnet
  • Verbs and their dependents

Week 5:

Homework 4 DUE

February 5

  • Midterm review
  • How Parsers Work, Part 1.

February 7: Midterm

Bring a PINK SCANTRON.

  • Midterm topics: Probability, Conditional Probability, N-Gram Language Models, POS Tagging, Stemming, Collocations, Text Classification, Naive Bayes, WordNet, VerbNet
  • Multiple choice.

Week 6:

February 12: Starting on the projects!! Natural Language Understanding II

Homework 6 assigned. 

  • Factoid QA
  • Baseline QA system. String operations, sentence selection/ranking.
  • Types of Questions in baseline QA: Who, What, When, Where
  • Identifying likely phrases and sentences, REGEXP patterns.
  • Sample Code Stubs for baseline system
  • Evaluation metrics: Precision, Recall, F-measure
  • Maximizing Recall at the expense of Precision
  • Setup of QA task and demo of scoring
  • The evolution of parsing algorithms.
  • How Parsers work. Probabilistic Parsing.
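The evaluation metrics can be computed directly from prediction sets (the sets here are made up). Note the recall/precision trade-off: returning every sentence drives recall to 1 while precision collapses.

```python
# Precision, recall, and F-measure for a sentence-selection QA setting.
predicted = {"s1", "s2", "s3", "s4"}  # sentences the system returned
relevant  = {"s2", "s4", "s5"}        # sentences that actually answer the question

true_positives = len(predicted & relevant)
precision = true_positives / len(predicted)  # 2/4 = 0.5
recall    = true_positives / len(relevant)   # 2/3
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean

print(precision, recall, round(f_measure, 3))
```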

February 14:

Week 7: 

February 19: Chunking, Sentence Structure and Parsing I

February 21:  MORE Natural Language Understanding for QA

  • Chunking (Shallow Parsing vs. Parsing). Read Ch. 7.  
  • Sentence Structure. READ ch. 8.1-8.3, 8.5
  • QA pipeline
  • Baseline QA system using string operations and sentence ranking
  • Next Steps: Using Syntax
  • Stanford Dependency parse structure vs. constituent structure
  • Dependency Structures and Relations
  • Constituent Structures and CFG rules
  • Where can we use text classification? How many NLU problems can be cast as classification problems?
  • POS tagging as a classification problem
  • Revisit N-Gram language models (READ Chapter 5.4 and 5.5)
  • Introduction to Parsing
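Shallow parsing can be sketched as a hand-rolled NP chunker over a POS-tagged sentence (the tagged input is supplied by hand; this mimics `nltk.RegexpParser` with a grammar like `NP: {<DT>?<JJ>*<NN.*>+}`, without the tree output):

```python
# Group runs of DT/JJ/NN* tags into noun-phrase chunks.
tagged = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
          ("jumped", "VBD"), ("over", "IN"),
          ("the", "DT"), ("lazy", "JJ"), ("dogs", "NNS")]

def np_chunks(tagged):
    chunks, current = [], []
    for word, tag in tagged:
        if tag in ("DT", "JJ") or tag.startswith("NN"):
            current.append((word, tag))        # extend the candidate chunk
        else:
            if any(t.startswith("NN") for _, t in current):
                chunks.append([w for w, _ in current])  # keep noun-headed chunks
            current = []
    if any(t.startswith("NN") for _, t in current):
        chunks.append([w for w, _ in current])
    return chunks

print(np_chunks(tagged))  # [['the', 'quick', 'fox'], ['the', 'lazy', 'dogs']]
```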

Week 8:  Question Answering  II

HOMEWORK 6 is DUE

February 26: Question Answering II

  • Question Answering using Syntax; working with NLU representations for Question Answering
  • Homework 7 assigned.
  • IMPROVING PRECISION.
  • Grammars and Parsing
  • Constituency and Dependency Tree Readers
  • Chunking and Parsing: How to search trees
  • Pattern Matching on Dependency Relations
  • Ranking possible responses

February 28:

  • Working with NLU representations.
  • Syntactic Structure and Coordination
  • Prepositional Phrase Attachments
  • Dependency vs. Constituent Structures II
  • Answering Questions from NLU/parsing representations

Week 9: Question Answering III: Lexicons & Lexical Semantics 

Homework 7 DUE.

March 5:  Using VerbNet and WordNet API in QA

  • HW8 (final homework) assigned: the final QA competition.
  • Using WordNet and Verbnet. New kinds of questions.
  • Increasing Precision of Answers
  • Verbs and their dependents, VerbNet semantic role types
  • Verbnet and Wordnet API, how it works
  • Word Sense Disambiguation; .DICT files provided with HW8.
  • Constituent and Dependency Trees, finding subjects, etc.
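Word sense disambiguation can be sketched in the simplified-Lesk style: pick the sense whose gloss overlaps most with the context. The two glosses for "bank" below are paraphrased by hand, not pulled from WordNet (where you would use `wn.synsets("bank")` and each synset's definition):

```python
# Hand-written glosses standing in for WordNet sense definitions.
senses = {
    "bank.n.01": "sloping land beside a body of water such as a river",
    "bank.n.02": "a financial institution that accepts deposits and lends money",
}

def lesk(context, senses):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())
    def overlap(gloss):
        return len(context_words & set(gloss.split()))
    return max(senses, key=lambda s: overlap(senses[s]))

print(lesk("she deposited money at the bank", senses))  # bank.n.02
print(lesk("they sat on the river bank", senses))       # bank.n.01
```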

March 7: No class. Work on your projects

  • Special section during class time

Week 10: Question Answering Competition & Poster presentations 

March 12: Poster session 

March 14: Poster session


FINALS WEEK

Homework 8 DUE Monday.

FINAL SLOT: Wednesday, March 20, 8:00–11:00 AM.

Bring a scantron.