Steve's Explanation of CKY & Earley Parsing

CKY and Earley are two styles of parsing for context-free grammars (CFGs). They ensure that, given a set of grammar rules and an input sentence, all possible parses of that sentence are extracted. The results are the possible parse trees for the complete sentence, and all of the subtrees that were found during the extraction of this parse tree.

CKY stands for Cocke-Kasami-Younger, and involves parsing substrings of length 1, then length 2, and so on until the entire string has been parsed. The reason why this is useful is because the shorter substrings from previous iterations can be used when applying grammar rules to parse the longer substrings. In this way, CKY implements bottom-up parsing, without forgetting substring groupings that had been done previously.

The Earley parsing algorithm is similar in that it tries to use grammar rules to build non-terminals from substrings. However, instead of applying grammar rules for substrings of increasing length, it reads in words one at a time, and applies any grammar rules that are allowed by these input words. To do this, the parser stores a list of partially completed grammar rules. As words are taken in from the left-most position, the parser first determines what new grammar rules could start with a word of that type, and those rules are put into the list. The parser then determines if a partially-completed rule that is already in the list needs a word of that type in that position to complete itself further. If so, the rule is taken out and replaced with the more complete version of itself. When a word fully completes a rule, it is taken out, replaced with the non-terminal corresponding to that rule, and the rule completion process is repeated, using the non-terminal to complete rules instead of the word. When complete, any sentence non-terminals that encompass the entire string are treated as valid parses for the sentence.


This explanation is derived from my interpretation of various web pages and books that I have read on the topic.