[7687] in www-talk@info.cern.ch


Re: HTML parser in Yacc form???

daemon@ATHENA.MIT.EDU (uid#15033@dxal18.cern.ch)
Wed Mar 22 14:33:33 1995

Date: Wed, 22 Mar 1995 13:19:38 +0500
Errors-To: procmaster@www19.w3.org
Reply-To: uid#15033@dxal18.cern.ch
From: uid#15033@dxal18.cern.ch
To: Multiple recipients of list <www-talk@www10.w3.org>

In article <3k4hss$l06@stratus.CAM.ORG> you write:

|>	Hi all,
|>
|>	I was wondering if there exists a specification of HTML in yacc 
|>(or BNF) form. It has probably been done, as constructing such a parser is 
|>much easier this way than with a traditional C subroutine.

Don't think about it. HTML does not have an LR(1) grammar, so trying to use
yacc is only going to cause pain. The best way of parsing SGML is with a
top-down recursive descent parser. Try to use yacc and you will end up in all
sorts of trouble, especially with error reporting.
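
To illustrate the top-down approach, here is a minimal sketch in Python of a
recursive descent parser for a toy language of nested tags and text. It is not
an HTML or SGML parser: attributes, entities, comments and omitted end tags are
all left out, and the names (tokenize, parse_content, parse_element,
ParseError) are made up for this example. The point is only that each grammar
rule becomes one function, and that a precise error message falls out of
knowing which rule was being parsed when things went wrong.

```python
import re

# One token is either a tag like <ul> or </ul>, or a run of text.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")


class ParseError(Exception):
    pass


def tokenize(text):
    """Split the input into (kind, value, offset) triples."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected character at offset {pos}")
        closing, name, chars = m.groups()
        if name is not None:
            kind = "close" if closing else "open"
            tokens.append((kind, name.lower(), pos))
        else:
            tokens.append(("text", chars, pos))
        pos = m.end()
    return tokens


def parse_content(tokens, i):
    """content := (text | element)*   (stops at a close tag or end of input)"""
    nodes = []
    while i < len(tokens) and tokens[i][0] != "close":
        kind, value, _ = tokens[i]
        if kind == "text":
            nodes.append(value)
            i += 1
        else:
            node, i = parse_element(tokens, i)
            nodes.append(node)
    return nodes, i


def parse_element(tokens, i):
    """element := open-tag content close-tag"""
    _, name, pos = tokens[i]
    children, i = parse_content(tokens, i + 1)
    if i >= len(tokens):
        raise ParseError(f"<{name}> opened at offset {pos} is never closed")
    _, close_name, close_pos = tokens[i]
    if close_name != name:
        raise ParseError(
            f"expected </{name}> at offset {close_pos}, found </{close_name}>"
        )
    return (name, children), i + 1


def parse(text):
    """Parse a whole document and return a list of nodes."""
    tokens = tokenize(text)
    nodes, i = parse_content(tokens, 0)
    if i != len(tokens):
        _, value, pos = tokens[i]
        raise ParseError(f"unexpected </{value}> at offset {pos}")
    return nodes
```

With this sketch, parse("<ul><li>a</li><li>b</li></ul>") returns a nested
structure of (name, children) tuples, and parse("<b>x</i>") raises a ParseError
that names the tag it expected and the offset where the mismatch occurred.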

One of the problems with comp sci courses is that lecturers often make
silly statements, such as claiming that bottom-up parsing is somehow better
than top-down. This is not the case. Bottom-up parsers can be made slightly
faster, but at a disproportionate cost in terms of complexity. My view is that
a language requiring a yacc parser is probably too complex in any case. Nobody
uses an LR(1) parser to parse LISP.
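
The LISP case shows how little machinery a simple syntax needs. The following
is a minimal sketch, in Python, of a recursive descent reader for
S-expressions. It assumes atoms are whitespace-separated words and ignores
strings, quoting and comments; read_sexpr and read_from are names invented for
this example, not part of any LISP implementation.

```python
def read_from(tokens, i):
    """expr := atom | '(' expr* ')'   Returns (value, next_index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok == "(":
        items = []
        i += 1
        # Keep reading sub-expressions until the matching ')'.
        while i < len(tokens) and tokens[i] != ")":
            item, i = read_from(tokens, i)
            items.append(item)
        if i >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, i + 1
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return tok, i + 1


def read_sexpr(text):
    """Read exactly one S-expression from text."""
    # Pad parentheses with spaces so split() yields them as tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    expr, rest = read_from(tokens, 0)
    if rest != len(tokens):
        raise SyntaxError("trailing input after expression")
    return expr
```

For example, read_sexpr("(car (cdr x))") returns the nested list
['car', ['cdr', 'x']]. The whole grammar fits in one function, with no parser
tables and no generator.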

--
Phillip M. Hallam-Baker

Not Speaking for anyone else.
