[875] in java-interest

home help back first fref pref prev next nref lref last post

First cut at a Java expression grammar (Yacc spec enclosed)

daemon@ATHENA.MIT.EDU (Scott Hudson)
Sat Aug 12 03:38:10 1995

From: hudson@cc.gatech.edu (Scott Hudson)
To: java-interest@java.sun.com
Date: Thu, 10 Aug 1995 01:25:57 -0400 (EDT)


/*==============================================================*/ 
/*
  First cut at a Java expression grammar, v0.1 (8/9/95)
  by Scott Hudson GVU Center, Georgia Tech
      hudson@cc.gatech.edu  
      http://www.cc.gatech.edu/gvu/people/Faculty/Scott.E.Hudson.html
 
  This Yacc specification is a first attempt at an LALR grammar for
  Java expressions.  It has not been tested (beyond working out the
  conflicts with Yacc) and is primarily for the purpose of discussing
  the syntax of the language and the problems there in.  Comments and
  corrections would be appreciated.
 
  Because of several ambiguous constructs in the language (most notably
  casts and the overloading of ".") Java expressions are exceptionally
  difficult to parse.  At present this grammar parses a superset of the
  language, accepting several illegal constructs which have to be found
  and reported after syntax analysis.  I think this is *extremely*
  undesirable, but I have thus far found no way around this.   Please
  let me know if you can find a cleaner way to do some of this, and/or
  productions which are "tighter" than the superset of the language that
  I'm currently accepting (without introducing more conflicts).
 
  Despite the parsing of a superset language, this grammar is still ambiguous
  and results in 3 shift/reduce conflicts.  Two of the conflicts involve the 
  cast construct (in particular things of the form "(a)(b)").  The third 
  conflict involves the "new" operator. All these conflicts can be sensibly 
  resolved by always shifting, although some of them require sorting out after 
  the fact.  See comments in the grammar for full details regarding each 
  conflict.
 
  This grammar represents the discussion and "(semi-)official verdicts" that 
  have taken place up to 8/9/95 as best I understand them.  Again, corrections 
  would be appreciated.  In particular, this grammar places the precedence of 
  "new" above that of "." and "[]" (which I believe is a dubious choice), 
  allows any expression on the left hand side of an assignment, and allows 
  any expression to be "called" as a method invocation.
 
  Unfortunately, this grammar does not seem to work out when directly combined 
  with the larger grammar of the full language.  In particular the inherent 
  ambiguity surrounding constructs of the form "a.b.c" raises its head.  
  Basically in a statement context when the parser sees a leading "a.b" under 
  lookahead of "." or "[" it cannot determine if it is seeing the start of an 
  expression or the start of a type in a variable declaration.  This leads to 
  reduce/reduce conflicts which, if resolved arbitrarily, will always be wrong 
  for some inputs. There is probably a factoring of the grammar that will 
  resolve this problem, but I haven't worked it out yet.  Any help on this 
  would be greatly appreciated.
 
  Revisions
    v0.1     First released version                  8/9/95  [SEH]
 
*/
/*==============================================================*/ 
 
%token ABSTRACT AND AND_EQUALS BITAND BITNOT BITOR BOOLEAN BREAK BYTE CASE
%token CATCH CHAR CHARACTER CLASS COLON COMMA CONTINUE DECREMENT DEFAULT 
%token DIVIDE DIVIDE_EQUALS DO DOT DOUBLE ELSE EQUALS EQUAL_EQUAL EXTENDS FALSE %token FILL_SHIFT_RIGHT FILL_SHIFT_RIGHT_EQUALS FINAL FINALLY FLOAT GREATER 
%token GREATER_EQ ID IF IMPLEMENTS IMPORT INCREMENT INSTANCEOF INT INTERFACE 
%token LBRACK LCURLY LESS LESS_EQ LONG LPAREN MINUS MINUS_EQUALS MOD 
%token MOD_EQUALS NATIVE NEW NOT NOT_EQUAL NULL NUMBER OR OR_EQUALS PACKAGE 
%token PLUS PLUS_EQUALS PRIVATE PROTECTED PUBLIC QUESTION RBRACK RCURLY RETURN 
%token RPAREN SEMI SHIFT_LEFT SHIFT_LEFT_EQUALS SHIFT_RIGHT SHIFT_RIGHT_EQUALS 
%token SHORT STAR STATIC STRING SUPER SWITCH SYNCHRONIZED THIS THREADSAFE THROW
%token TIMES_EQUALS TRANSIENT TRUE TRY WHILE XOR XOR_EQUALS 
 
%start expr
 
%%
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
expr : 
        assign_expr
        |
        expr COMMA assign_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
/* Its probably the case that the left hand side of an assignment needs to
   be a more restricted class, but there are a number of ways to do this,
   and this has yet to be commented on officially.  Consequently, here we leave
   it wide open and allow essentially any expression on the left hand side of 
   an assignment.  As it stands, this allows things such as "2+2 = 5" which are    obviously wrong, along with things like "(a) = 5" which javac does not like,
   but are not obviously wrong.
*/
 
assign_expr :
        conditional_expr
        |
        unary_expr EQUALS assign_expr
        |
        unary_expr PLUS_EQUALS assign_expr
        |
        unary_expr MINUS_EQUALS assign_expr
        |
        unary_expr TIMES_EQUALS assign_expr
        |
        unary_expr DIVIDE_EQUALS assign_expr
        |
        unary_expr MOD_EQUALS assign_expr
        |
        unary_expr AND_EQUALS assign_expr
        |
        unary_expr OR_EQUALS assign_expr
        |
        unary_expr XOR_EQUALS assign_expr
        |
        unary_expr SHIFT_LEFT_EQUALS assign_expr
        |
        unary_expr SHIFT_RIGHT_EQUALS assign_expr
        |
        unary_expr FILL_SHIFT_RIGHT_EQUALS assign_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
conditional_expr :
        or_expr
        |
        or_expr QUESTION expr COLON conditional_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
or_expr :
        and_expr
        |
        or_expr OR and_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
and_expr :
        bitor_expr
        |
        and_expr AND bitor_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
bitor_expr :
        xor_expr
        |
        bitor_expr BITOR xor_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
xor_expr :
        bitand_expr
        |
        xor_expr XOR bitand_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
bitand_expr : 
        equality_expr
        |
        bitand_expr BITAND equality_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
equality_expr :
        relational_expr
        |
        equality_expr EQUAL_EQUAL relational_expr
        |
        equality_expr NOT_EQUAL relational_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
relational_expr :
        shift_expr
        |
        relational_expr LESS shift_expr
        |
        relational_expr GREATER shift_expr
        |
        relational_expr LESS_EQ shift_expr
        |
        relational_expr GREATER_EQ shift_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
shift_expr :
        add_expr
        |
        shift_expr SHIFT_LEFT add_expr
        |
        shift_expr SHIFT_RIGHT add_expr
        |
        shift_expr FILL_SHIFT_RIGHT add_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
add_expr :
        mult_expr
        |
        add_expr PLUS mult_expr
        |
        add_expr MINUS mult_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
mult_expr :
        cast_expr
        |
        mult_expr STAR cast_expr
        |
        mult_expr DIVIDE cast_expr
        |
        mult_expr MOD cast_expr 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
/* Note that the third production here recognizes a little bit more than is 
   legal in order to avoid enough ambiguity to be able to actually parse.  
   The cast construct is syntactically ambiguous in Java and is resolved
   by fiat.  In particular, something of the form "(a) <expr>" (where a
   is not a built-in type name) is considered a cast if and only if "<expr>" 
   does not start with an operator.  (Any use of a built-in type within the 
   parenthesis is always treated as a cast.) Hence, "(a) - b" is considered a 
   subtraction even if "a" turns out later to be the name of a class.  Here, in    order to be able to parse, we allow any expression inside the parenthesis, 
   and assume that some later processing finds and reports error for any 
   expressions that cannot be treated as a type.  (Also see the comment on 
   postfix_expr regarding the allowance for "<expr>[]" instead of just 
   "<expr>[<expr>]").
 
   We get a shift/reduce conflict involving this construct.  Basically we can't
   tell if "(a)(<expr>)" is a cast to class "a" or a call of method "a".  The 
   rules for resolving cast ambiguity would seem to require this to be treated
   as a cast even if "a" later turns out to be a method name and not a class.
   This corresponds to resolving the conflict here with a shift, which is 
   is the default resolution for shift/reduce conflicts.
*/
   
cast_expr :
        unary_expr
        |
        LPAREN not_named_type RPAREN cast_expr 
        |
        LPAREN expr RPAREN not_op_cast_expr
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
/* These productions encode a cast expression that does not begin with an
   operator.  This is used only following a potential cast.  As with
   cast_expr, we are allowing more than is legal here, and we have a 
   shift/reduce conflict.  (All the comments from cast_expr above apply here
   as well.) 
*/
 
not_op_cast_expr :
        not_op_unary_expr
        |
        LPAREN not_named_type RPAREN cast_expr 
        |
        LPAREN expr RPAREN not_op_cast_expr
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
unary_expr :
        postfix_expr
        |
        postfix_expr INSTANCEOF multi_part_id 
        |
        INCREMENT unary_expr  
        |
        DECREMENT unary_expr
        |
        MINUS cast_expr
        |
        NOT cast_expr
        |
        BITNOT cast_expr 
        ; 
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
/* These productions represent the subset of unary_expr that does not start
   with an operator.  This is used only following a cast.  See the comments
   in cast_expr above.
*/
 
not_op_unary_expr :
        postfix_expr
        |
        postfix_expr INSTANCEOF multi_part_id 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
/* Note that the third production here recognizes a little bit more than is
   legal in order to avoid enough ambiguity to be able to actually parse.  In 
   this case we allow array operations without an index expression inside the 
   brackets.  This is here because we were forced to allow any expression where    normally the type would go in a cast, and we needed to allow array types.  
   This assumes the illegal occurrences of this construct are detected
   "manually" after parsing.
*/
 
postfix_expr :
        primary_expr
        |
        postfix_expr LBRACK expr RBRACK
        |
        postfix_expr LBRACK RBRACK
        |
        postfix_expr LPAREN opt_arg_list RPAREN 
        |
        postfix_expr DOT ID
        |
        postfix_expr INCREMENT
        |
        postfix_expr DECREMENT 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
primary_expr :
        NUMBER
        |
        STRING
        |
        CHARACTER
        |
        TRUE
        |
        FALSE
        |
        NULL
        |
        THIS
        |
        SUPER
        | 
        ID
        |
        NEW class_name  LPAREN opt_arg_list RPAREN
        |
        NEW type_specifier new_array_spec_list
        |
        NEW LPAREN expr  RPAREN 
        |
        LPAREN expr RPAREN 
        ;
 
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
/* 
   Note that the construct below recognizes a little bit more than is strictly 
   legal in order to avoid enough ambiguity to be able to actually parse.  In 
   particular, we allow mixtures of brackets with expressions (specified array 
   bounds) and brackets without expressions (unspecified array bounds) in any 
   order.  The language however, requires that all the unspecified bounds come 
   last and that there be at least one specified bound.  Here we are assuming 
   that this is checked after parsing (e.g., constructs such as 
   "new aclass[][5][]" are reported as errors) and that specified bounds that 
   occur after unspecified bounds are converted into array subscripting 
   operations (e.g., constructs such as "new aclass[5][][5]" is reformulated 
   as "(new aclass[5][])[5]"). 
 
   A shift reduce/conflict occurs with this construct.  Because of the 
   precedence of "new" we cannot tell if a trailing "[<expr>]" is the 
   declaration of additional dimension of the array being allocated, or a 
   subscripting operation applied to the result of the "new".  We resolve this 
   conflict by always shifting, which corresponds to always assuming that this 
   is another dimension.  After parse processing as described above is required    to sort this out after the fact.
*/
 
new_array_spec_list :
        new_array_spec_list new_array_spec
        |
        new_array_spec 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
new_array_spec :
        LBRACK expr RBRACK 
        |
        LBRACK RBRACK 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
opt_arg_list :
        arg_list
        |
        empty 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
arg_list :
        assign_expr 
        |
        arg_list COMMA assign_expr
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
not_named_type :
        simple_type_specifier array_decl_list 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
type_specifier :
        simple_type_specifier
        |
        multi_part_id 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
simple_type_specifier :
        BOOLEAN
        |
        BYTE
        |
        CHAR
        |
        SHORT
        |
        INT
        |
        FLOAT
        |
        LONG
        |
        DOUBLE 
        ; 
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
array_decl_list :
        array_decl_list LBRACK RBRACK
        |
        empty
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
multi_part_id :
        ID 
        |
        multi_part_id DOT ID 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
class_name :
        multi_part_id 
        ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 
empty : /* nothing */ ;
 
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */
 

-
Note to Sun employees: this is an EXTERNAL mailing list!
Info: send 'help' to java-interest-request@java.sun.com

home help back first fref pref prev next nref lref last post