Why my grammar is not LL(1) ?

If you applied the usual procedures: remove epsilon production and left recursion, and apply left factorization, but your grammar is still not LL, then there are 3 reason why the a grammar is ambiguous.

  1. The grammar is for a language that is inherently ambiguous (extremely rare), nothing can be done.
  2. The associativity of the operators makes the grammar ambiguous. See here for the solution.
  3. The single look-ahead is not enough, it’s necessary to rewrite the grammar. This case is shown below.

Tool for manipulating grammars: http://smlweb.cpsc.ucalgary.ca/

Tool for creating parser table: https://sourceforge.net/projects/jsmachines/

In certain situation the look-ahead of one character isn’t enough to determinate the correct production rule to apply, so we have to modify the grammar. An example in HTML style is:

  • LIST_SECTION → LIST_SECTION SECTION | SECTION
  • SECTION → < section name = string > LIST_BLOCK < / section >
  • LIST_BLOCK → LIST_BLOCK BLOCK | BLOCK
  • BLOCK → < field name = string > VALUE < / field >
  • VALUE → number | string

If we try to transform it in LL(1), we obtain:

  • LIST_SECTION → SECTION LIST_SECTION’
  • LIST_SECTION’ → SECTION LIST_SECTION’ | epsilon
  • SECTION → < section name = string > LIST_BLOCK < / section >
  • LIST_BLOCK → BLOCK LIST_BLOCK’
  • LIST_BLOCK’ → BLOCK LIST_BLOCK’ | epsilon
  • BLOCK → < field name = string > VALUE < / field >
  • VALUE → number | string

The problem here is that the parser when is inside LIST_BLOCK and it sees “<” it’s unable to decide if the symbol is the opening of a new block or the closing of the section. The idea for the rewrite of the grammar is to move the problematic symbols outside the production into the previous production. It’s easier to see in the original grammar, we want to move the “<” of the closing section into LIST_BLOCK and fix BLOCK accordingly.

  • LIST_SECTION → LIST_SECTION SECTION | SECTION
  • SECTION → < section name = string > LIST_BLOCK / section >
  • LIST_BLOCK → LIST_BLOCK < BLOCK < | < BLOCK <
  • BLOCK → field name = string > VALUE < / field >
  • VALUE → number | string

It’s also possible to move the right “<” inside BLOCK.

  • LIST_SECTION → LIST_SECTION SECTION | SECTION
  • SECTION → < section name = string > LIST_BLOCK / section >
  • LIST_BLOCK → LIST_BLOCK < BLOCK| < BLOCK
  • BLOCK → field name = string > VALUE < / field > <
  • VALUE → number | string

Now we just have to eliminate left recursion and the grammar is LL(1).

An other example, always in HTML style is:

  • LIST_DIV → DIV LIST_DIV’
  • LIST_DIV’ → DIV LIST_DIV’ | epsilon
  • DIV → < div > ELEMENT < / div >
  • ELEMENT → LIST_DIV | string | < p > string < / p >

The problem is always the “<” if we switch it in ELEMENT we obtain:

  • DIV → < div > ELEMENT / div >
  • ELEMENT → LIST_DIV < | string < | < p > string < / p > <

We put aside for a moment “< p > string < / p > <” . The grammar is:

  • LIST_DIV → DIV LIST_DIV’
  • LIST_DIV’ → DIV LIST_DIV’ | epsilon
  • DIV → < div > ELEMENT / div >
  • ELEMENT → LIST_DIV < | string <

The fact that ELEMENT goes back to the initial production is a problem, so we introduce LIST_DIV2 that is the same as LIST_DIV.

  • LIST_DIV → DIV LIST_DIV’
  • LIST_DIV’ → DIV LIST_DIV’ | epsilon
  • DIV → < div > ELEMENT / div >
  • ELEMENT → LIST_DIV2 < | string <
  • LIST_DIV2 → DIV LIST_DIV2'
  • LIST_DIV2' → DIV LIST_DIV2' | epsilon

Now this piece of the grammar is LL(1). But if we add the “< p > string < / p > <” we have a problem with “<” in

  • ELEMENT → LIST_DIV2 < | < p > string < / p > <

because of left factorization, the solution is to extract “<” from DIV. Given the complete grammar:

  • LIST_DIV → DIV LIST_DIV’
  • LIST_DIV’ → DIV LIST_DIV’ | epsilon
  • DIV → < div > ELEMENT / div >
  • ELEMENT → LIST_DIV2 < | string < | < p > string < / p > <
  • LIST_DIV2 → DIV LIST_DIV2'
  • LIST_DIV2' → DIV LIST_DIV2' | epsilon

We extract “<” from DIV and obtain:

  • LIST_DIV → < DIV LIST_DIV’
  • LIST_DIV’ → < DIV LIST_DIV’ | epsilon
  • DIV → div > ELEMENT / div >
  • ELEMENT → LIST_DIV2 < | string < | < p > string < / p > <
  • LIST_DIV2 → < DIV LIST_DIV2'
  • LIST_DIV2' → < DIV LIST_DIV2' | epsilon

Successive step for resolving:

  • ELEMENT → < LIST_DIV2 < | string < | < p > string < / p > <
  • LIST_DIV2 → DIV LIST_DIV2'
  • LIST_DIV2' → DIV LIST_DIV2' | epsilon

Resolve left factorization:

  • ELEMENT → < ELEMENT’ | string <
  • ELEMENT’ → LIST_DIV2 < | p > string < / p > <
  • LIST_DIV2 → DIV LIST_DIV2'
  • LIST_DIV2' → DIV LIST_DIV2' | epsilon

The final grammar LL(1) is:

  • LIST_DIV → < DIV LIST_DIV’
  • LIST_DIV’ → < DIV LIST_DIV’ | epsilon
  • DIV → div > ELEMENT / div >
  • ELEMENT → < ELEMENT’ | string <
  • ELEMENT’ → LIST_DIV2 < | p > string < / p > <
  • LIST_DIV2 → DIV LIST_DIV2'
  • LIST_DIV2' → DIV LIST_DIV2' | epsilon