grammar.c
00001 /*
00002  * Mesa 3-D graphics library
00003  * Version:  6.6
00004  *
00005  * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
00006  *
00007  * Permission is hereby granted, free of charge, to any person obtaining a
00008  * copy of this software and associated documentation files (the "Software"),
00009  * to deal in the Software without restriction, including without limitation
00010  * the rights to use, copy, modify, merge, publish, distribute, sublicense,
00011  * and/or sell copies of the Software, and to permit persons to whom the
00012  * Software is furnished to do so, subject to the following conditions:
00013  *
00014  * The above copyright notice and this permission notice shall be included
00015  * in all copies or substantial portions of the Software.
00016  *
00017  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
00018  * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
00019  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
00020  * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
00021  * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
00022  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
00023  */
00024 
00031 #ifndef GRAMMAR_PORT_BUILD
00032 #error Do not build this file directly, build your grammar_XXX.c instead, which includes this file
00033 #endif
00034 
00035 /*
00036 */
00037 
00038 /*
00039     INTRODUCTION
00040     ------------
00041 
00042     The task is to check the syntax of an input string. The input string is a stream of ASCII
00043     characters terminated with a null character ('\0'). Checking it directly in C is
00044     difficult and hard to implement without bugs. Such code is also hard to maintain and to
00045     change when the syntax changes.
00046 
00047     This is because of a high redundancy of the C code. Large blocks of code are duplicated with
00048     only small changes. Even use of macros does not solve the problem because macros cannot
00049     erase the complexity of the problem.
00050 
00051     The solution is to create a new language tailored to our task. Once we describe a
00052     particular syntax in it, we are done. We can then focus on the code that implements
00053     the language, which is smaller and simpler than code that directly checks
00054     the syntax.
00055 
00056     First, we must implement our new language. Here, the language is implemented in C, but it
00057     could also be implemented in any other language. The code is listed below. We must take
00058     good care that it is bug free. This is simple because the code is simple and clean.
00059 
00060     Next, we must describe the syntax of our new language in itself. Once the description is
00061     created and manually verified to be correct, we can use it to check other scripts.
00062 
00063     Note that our new language loading code does not have to check the syntax. This is because we
00064     assume that the script describing itself is correct, and other scripts can be syntactically
00065     checked by the former script. The loading code must only do semantic checking, which amounts to
00066     simply resolving references.
00067 
00068     THE LANGUAGE
00069     ------------
00070 
00071     Here I will describe the syntax of the new language (hereafter called "Synek"). It is mainly a
00072     sequence of declarations, each terminated by a semicolon. A declaration consists of a symbol,
00073     which is an identifier, and its definition. A definition is in turn a sequence of specifiers
00074     connected with the ".and" or ".or" operator. These operators cannot be mixed within a single
00075     definition. A specifier can be a symbol, string, character, character range or a special
00076     keyword ".true" or ".false".
00077 
00078     At the very beginning of the script there is a declaration of the root symbol, in the form:
00079         .syntax <root_symbol>;
00080     The <root_symbol> must be one of the symbols in the declaration sequence. The syntax is correct if
00081     the root symbol evaluates to true. A symbol evaluates to true if the definition associated with
00082     the symbol evaluates to true. Definition evaluation depends on the operator used to connect
00083     specifiers in the definition. If the ".and" operator is used, the definition evaluates to true if and
00084     only if all the specifiers evaluate to true. If the ".or" operator is used, the definition evaluates to
00085     true if any of the specifiers evaluates to true. If a definition contains only one specifier,
00086     it is evaluated as if it were connected with the ".true" keyword by the ".and" operator.
00087 
00088     If specifier is a ".true" keyword, it always evaluates to true.
00089 
00090     If specifier is a ".false" keyword, it always evaluates to false. Specifier evaluates to false
00091     when it does not evaluate to true.
00092 
00093     Character range specifier is in the form:
00094         '<first_character>' - '<second_character>'
00095     If specifier is a character range, it evaluates to true if the character in the stream is greater
00096     than or equal to <first_character> and less than or equal to <second_character>. In that case
00097     the stream pointer is advanced to point to the next character in the stream. All C-style escape
00098     sequences are supported, although trigraph sequences are not. The comparisons are performed
00099     on 8-bit unsigned integers.
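
    The range rule above can be sketched in C as follows. This is only an illustration:
    match_range and match_char are hypothetical names that do not appear in this file, and
    the sketch assumes the stream is a null-terminated char array indexed by a position.

```c
#include <stddef.h>

// Hypothetical helper, not part of grammar.c: evaluates a character range
// specifier against the byte at *pos. On success the position is advanced
// by one; on failure it is left unchanged.
static int match_range(const char *text, size_t *pos, unsigned char lo, unsigned char hi)
{
    // Compare as 8-bit unsigned so bytes >= 0x80 do not turn negative
    // on platforms where plain char is signed.
    unsigned char c = (unsigned char) text[*pos];

    if (c >= lo && c <= hi) {
        (*pos)++;
        return 1;
    }
    return 0;
}

// A single-character specifier 'x' behaves like the range 'x' - 'x'.
static int match_char(const char *text, size_t *pos, unsigned char ch)
{
    return match_range(text, pos, ch, ch);
}
```

    The cast to unsigned char is the point of the sketch: without it, a range such as
    '\x80' - '\xff' could never match where plain char is signed.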
00100 
00101     Character specifier is in the form:
00102         '<single_character>'
00103     It evaluates to true if the following character range specifier evaluates to true:
00104         '<single_character>' - '<single_character>'
00105 
00106     String specifier is in the form:
00107         "<string>"
00108     Let N be the number of characters in <string>. Let <string>[i] designate i-th character in
00109     <string>. Then the string specifier evaluates to true if and only if for i in the range [0, N)
00110     the following character specifier evaluates to true:
00111         '<string>[i]'
00112     If <string>[i] is a quotation mark, '<string>[i]' is replaced with '\<string>[i]'.
00113 
00114     Symbol specifier can be optionally preceded by a ".loop" keyword in the form:
00115         .loop <symbol>                  (1)
00116     where <symbol> is defined as follows:
00117         <symbol> <definition>;          (2)
00118     Construction (1) is replaced by the following code:
00119         <symbol$1>
00120     and declaration (2) is replaced by the following:
00121         <symbol$1> <symbol$2> .or .true;
00122         <symbol$2> <symbol> .and <symbol$1>;
00123         <symbol> <definition>;
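
    The effect of this rewriting is "zero or more repetitions of <symbol>". A minimal sketch
    that mirrors the two generated rules as C functions is given here. The matcher type and
    the names loop_1, loop_2 and space are hypothetical and exist only for this illustration.
    The sketch assumes the looped symbol consumes at least one character when it succeeds,
    otherwise the recursion would not terminate.

```c
#include <stddef.h>

// Hypothetical matcher type: returns 1 and advances *pos on success,
// returns 0 and leaves *pos unchanged on failure.
typedef int (*matcher) (const char *text, size_t *pos);

static int loop_1(matcher symbol, const char *text, size_t *pos);

// <symbol$2> <symbol> .and <symbol$1>;
static int loop_2(matcher symbol, const char *text, size_t *pos)
{
    size_t saved = *pos;

    if (symbol(text, pos) && loop_1(symbol, text, pos))
        return 1;
    *pos = saved;
    return 0;
}

// <symbol$1> <symbol$2> .or .true;
static int loop_1(matcher symbol, const char *text, size_t *pos)
{
    // Try one more repetition; if it fails, the .true alternative
    // accepts with the position unchanged.
    (void) loop_2(symbol, text, pos);
    return 1;
}

// Sample symbol for demonstration: matches a single space.
static int space(const char *text, size_t *pos)
{
    if (text[*pos] == ' ') {
        (*pos)++;
        return 1;
    }
    return 0;
}
```

    Calling loop_1 (space, text, &pos) therefore always succeeds and leaves pos just past the
    run of spaces, which may be empty.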
00124 
00125     Synek also supports a register mechanism. The user can declare, in the SYN file, a number of
00126     registers that can be accessed in the syn body. Each register has a name and a default value.
00127     A register is one byte wide. The C code can change the default value by calling
00128     grammar_set_reg8() with the grammar id, the register name and a new value. As we know, each rule is
00129     a sequence of specifiers joined with the .and or .or operator. Each specifier can additionally be
00130     prefixed with a condition expression in the form ".if (<reg_name> <operator> <hex_literal>)"
00131     where <operator> can be == or !=. If the condition evaluates to false, the specifier
00132     evaluates to .false. Otherwise it evaluates to the specifier.
00133 
00134     ESCAPE SEQUENCES
00135     ----------------
00136 
00137     Synek supports all escape sequences in character specifiers. The mapping table is listed below.
00138     All occurrences of the sequences in the first column are replaced with the corresponding
00139     character in the second column.
00140 
00141         Escape sequence         Represents
00142     ------------------------------------------------------------------------------------------------
00143         \a                      Bell (alert)
00144         \b                      Backspace
00145         \f                      Formfeed
00146         \n                      New line
00147         \r                      Carriage return
00148         \t                      Horizontal tab
00149         \v                      Vertical tab
00150         \'                      Single quotation mark
00151         \"                      Double quotation mark
00152         \\                      Backslash
00153         \?                      Literal question mark
00154         \ooo                    ASCII character in octal notation
00155         \xhhh                   ASCII character in hexadecimal notation
00156     ------------------------------------------------------------------------------------------------
00157 
00158     RAISING ERRORS
00159     --------------
00160 
00161     Any specifier can be followed by a special construction that is executed when the specifier
00162     evaluates to false. The construction is in the form:
00163         .error <ERROR_TEXT>
00164     <ERROR_TEXT> is an identifier declared earlier by error text declaration. The declaration is
00165     in the form:
00166         .errtext <ERROR_TEXT> "<error_desc>"
00167     When specifier evaluates to false and this construction is present, parsing is stopped
00168     immediately and <error_desc> is returned as a result of parsing. The error position is also
00169     returned; it is the offset from the beginning of the stream up to which the input
00170     was valid. Example:
00171 
00172         (**** syntax script ****)
00173 
00174         .syntax program;
00175         .errtext MISSING_SEMICOLON      "missing ';'"
00176         program         declaration .and .loop space .and ';' .error MISSING_SEMICOLON .and
00177                         .loop space .and '\0';
00178         declaration     "declare" .and .loop space .and identifier;
00179         space           ' ';
00180 
00181         (**** sample code ****)
00182 
00183         declare foo ,
00184 
00185     In the example above checking the sample code will result in error message "missing ';'" and
00186     error position 12. The sample code is not correct. Note the presence of the '\0' specifier to
00187     ensure that there is no code after the semicolon - only spaces.
00188     <error_desc> can optionally contain an identifier surrounded by dollar signs $. In such a case,
00189     the identifier and dollar signs are replaced by a string retrieved by invoking the symbol with
00190     the identifier name. The starting position is the error position. The length of the resulting
00191     string is the position after invoking the symbol.
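
    To see where position 12 in the example comes from, here is a hand-written C sketch of
    the "program" rule. It is not generated by Synek and check_program is a hypothetical
    name. The script does not define <identifier>, so the sketch assumes one or more
    lowercase letters. Failures other than the missing semicolon return a generic message
    and do not set the position.

```c
#include <stddef.h>
#include <string.h>

// Returns NULL when the text is accepted. Otherwise returns an error
// description; for the missing semicolon case the offset of the first
// unaccepted character is stored in *error_pos.
static const char *check_program(const char *text, size_t *error_pos)
{
    size_t pos = 0;

    // declaration: "declare" .and .loop space .and identifier
    if (strncmp(text, "declare", 7) != 0)
        return "syntax error";
    pos = 7;
    while (text[pos] == ' ')
        pos++;
    if (text[pos] < 'a' || text[pos] > 'z')
        return "syntax error";
    while (text[pos] >= 'a' && text[pos] <= 'z')
        pos++;

    // .loop space .and ';' .error MISSING_SEMICOLON
    while (text[pos] == ' ')
        pos++;
    if (text[pos] != ';') {
        *error_pos = pos;
        return "missing ';'";
    }
    pos++;

    // .loop space .and '\0'
    while (text[pos] == ' ')
        pos++;
    return text[pos] == '\0' ? NULL : "syntax error";
}
```

    For the sample code "declare foo ," the keyword occupies offsets 0-6, the identifier
    offsets 8-10, and the comma sits at offset 12, where the ';' specifier fails.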
00192 
00193     PRODUCTION
00194     ----------
00195 
00196     Synek not only checks the syntax but it can also produce (emit) bytes associated with specifiers
00197     that evaluate to true. That is, every specifier and optional error construction can be followed
00198     by a number of emit constructions that are in the form:
00199         .emit <parameter>
00200     <parameter> can be a HEX number, identifier, a star * or a dollar $. HEX number is preceded by
00201     0x or 0X. If <parameter> is an identifier, it must be declared earlier by an emit code declaration
00202     in the form:
00203         .emtcode <identifier> <hex_number>
00204 
00205     When a given specifier evaluates to true, all emits associated with the specifier are output
00206     in the order they were declared. A star means that the last-read character should be output instead
00207     of a constant value. Example:
00208 
00209         (**** syntax script ****)
00210 
00211         .syntax foobar;
00212         .emtcode WORD_FOO       0x01
00213         .emtcode WORD_BAR       0x02
00214         foobar      FOO .emit WORD_FOO .or BAR .emit WORD_BAR .or .true .emit 0x00;
00215         FOO         "foo" .and SPACE;
00216         BAR         "bar" .and SPACE;
00217         SPACE       ' ' .or '\0';
00218 
00219         (**** sample text 1 ****)
00220 
00221         foo
00222 
00223         (**** sample text 2 ****)
00224 
00225         foobar
00226 
00227     For both samples the result will be a one-element array. For the first sample text it will be
00228     value 1, for the second - 0. Note that every text will be accepted because of the presence of
00229     .true as an alternative.
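
    The outcome for both samples can be checked against a hand-written C equivalent of the
    "foobar" rule. This is a sketch only: emit_foobar and word_then_space are hypothetical
    names, and the single emitted byte is returned directly instead of being appended to an
    output array.

```c
#include <string.h>

// Stands in for the FOO and BAR symbols: "<word>" .and SPACE,
// where SPACE is ' ' .or '\0'.
static int word_then_space(const char *text, const char *word)
{
    size_t n = strlen(word);

    return strncmp(text, word, n) == 0 && (text[n] == ' ' || text[n] == '\0');
}

// Returns the byte that the foobar rule would emit for the given text.
static unsigned char emit_foobar(const char *text)
{
    if (word_then_space(text, "foo"))
        return 0x01;    // .emit WORD_FOO
    if (word_then_space(text, "bar"))
        return 0x02;    // .emit WORD_BAR
    return 0x00;        // .true .emit 0x00
}
```

    For "foobar" the FOO alternative fails because 'b' is neither a space nor the
    terminator, BAR fails on the first character, and the .true alternative emits 0.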
00230 
00231     Another example:
00232 
00233         (**** syntax script ****)
00234 
00235         .syntax declaration;
00236         .emtcode VARIABLE       0x01
00237         declaration     "declare" .and .loop space .and
00238                         identifier .emit VARIABLE .and          (1)
00239                         .true .emit 0x00 .and                   (2)
00240                         .loop space .and ';';
00241         space           ' ' .or '\t';
00242         identifier      .loop id_char .emit *;                  (3)
00243         id_char         'a'-'z' .or 'A'-'Z' .or '_';
00244 
00245         (**** sample code ****)
00246 
00247         declare    fubar;
00248 
00249     In specifier (1) symbol <identifier> is followed by .emit VARIABLE. If it evaluates to
00250     true, the VARIABLE constant and then the production of the symbol are output. Specifier (2) is used
00251     to terminate the string with null to signal when the string ends. Specifier (3) outputs
00252     all characters that make up the declared identifier. The result of the sample code will be the
00253     following array:
00254         { 1, 'f', 'u', 'b', 'a', 'r', 0 }
00255 
00256     If .emit is followed by a dollar $, it means that the current position should be output. The current
00257     position is a 32-bit unsigned integer distance from the very beginning of the parsed string to
00258     the first character consumed by the specifier associated with the .emit instruction. The current
00259     position is stored in the output buffer in Little-Endian convention (the lowest byte comes
00260     first).
00261 */
00262 
00263 #include <stdio.h>
00264 
00265 static void mem_free (void **);
00266 
00267 /*
00268     internal error messages
00269 */
00270 static const byte *OUT_OF_MEMORY =          (byte *) "internal error 1001: out of physical memory";
00271 static const byte *UNRESOLVED_REFERENCE =   (byte *) "internal error 1002: unresolved reference '$'";
00272 static const byte *INVALID_GRAMMAR_ID =     (byte *) "internal error 1003: invalid grammar object";
00273 static const byte *INVALID_REGISTER_NAME =  (byte *) "internal error 1004: invalid register name: '$'";
00274 /*static const byte *DUPLICATE_IDENTIFIER =   (byte *) "internal error 1005: identifier '$' already defined";*/
00275 static const byte *UNREFERENCED_IDENTIFIER =(byte *) "internal error 1006: unreferenced identifier '$'";
00276 
00277 static const byte *error_message = NULL;    /* points to one of the error messages above */
00278 static byte *error_param = NULL;        /* this is inserted into error_message in place of $ */
00279 static int error_position = -1;
00280 
00281 static byte *unknown = (byte *) "???";
00282 
00283 static void clear_last_error (void)
00284 {
00285     /* reset error message */
00286     error_message = NULL;
00287 
00288     /* free error parameter - if error_param is a "???" don't free it - it's static */
00289     if (error_param != unknown)
00290         mem_free ((void **) (void *) &error_param);
00291     else
00292         error_param = NULL;
00293 
00294     /* reset error position */
00295     error_position = -1;
00296 }
00297 
00298 static void set_last_error (const byte *msg, byte *param, int pos)
00299 {
00300     /* error message can be set only once */
00301     if (error_message != NULL)
00302     {
00303         mem_free ((void **) (void *) &param);
00304         return;
00305     }
00306 
00307     error_message = msg;
00308 
00309     /* if param is NULL, set error_param to unknown ("???") */
00310     /* note: do not try to strdup the "???" - it may be that we are here because of */
00311     /* out of memory error so strdup can fail */
00312     if (param != NULL)
00313         error_param = param;
00314     else
00315         error_param = unknown;
00316 
00317     error_position = pos;
00318 }
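
/*
    The comments above say that error_param is inserted into error_message in place of $.
    The code that performs the substitution is not part of this listing, so the following is
    only a sketch of how it could be done. format_error is a hypothetical helper and plain
    char is used instead of byte to keep the example self-contained.
*/

```c
#include <stddef.h>
#include <string.h>

/* Copies msg into out, replacing every '$' with param. At most size - 1
   bytes are written and the result is always null-terminated. The caller
   must pass size >= 1. */
static void format_error (char *out, size_t size, const char *msg, const char *param)
{
    size_t n = 0;

    for (; *msg != '\0' && n + 1 < size; msg++)
    {
        if (*msg == '$')
        {
            /* clamp the parameter to the space that is left */
            size_t len = strlen (param);
            if (len > size - 1 - n)
                len = size - 1 - n;
            memcpy (out + n, param, len);
            n += len;
        }
        else
            out[n++] = *msg;
    }
    out[n] = '\0';
}
```

/*
    With the UNRESOLVED_REFERENCE text and the parameter "foo" this produces
    "internal error 1002: unresolved reference 'foo'".
*/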
00319 
00320 /*
00321     memory management routines
00322 */
00323 static void *mem_alloc (size_t size)
00324 {
00325     void *ptr = grammar_alloc_malloc (size);
00326     if (ptr == NULL)
00327         set_last_error (OUT_OF_MEMORY, NULL, -1);
00328     return ptr;
00329 }
00330 
00331 static void *mem_copy (void *dst, const void *src, size_t size)
00332 {
00333     return grammar_memory_copy (dst, src, size);
00334 }
00335 
00336 static void mem_free (void **ptr)
00337 {
00338     grammar_alloc_free (*ptr);
00339     *ptr = NULL;
00340 }
00341 
00342 static void *mem_realloc (void *ptr, size_t old_size, size_t new_size)
00343 {
00344     void *ptr2 = grammar_alloc_realloc (ptr, old_size, new_size);
00345     if (ptr2 == NULL)
00346         set_last_error (OUT_OF_MEMORY, NULL, -1);
00347     return ptr2;
00348 }
00349 
00350 static byte *str_copy_n (byte *dst, const byte *src, size_t max_len)
00351 {
00352     return grammar_string_copy_n (dst, src, max_len);
00353 }
00354 
00355 static byte *str_duplicate (const byte *str)
00356 {
00357     byte *new_str = grammar_string_duplicate (str);
00358     if (new_str == NULL)
00359         set_last_error (OUT_OF_MEMORY, NULL, -1);
00360     return new_str;
00361 }
00362 
00363 static int str_equal (const byte *str1, const byte *str2)
00364 {
00365     return grammar_string_compare (str1, str2) == 0;
00366 }
00367 
00368 static int str_equal_n (const byte *str1, const byte *str2, unsigned int n)
00369 {
00370     return grammar_string_compare_n (str1, str2, n) == 0;
00371 }
00372 
00373 static int
00374 str_length (const byte *str)
00375 {
00376    return (int) (grammar_string_length (str));
00377 }
00378 
00379 /*
00380     useful macros
00381 */
00382 #define GRAMMAR_IMPLEMENT_LIST_APPEND(_Ty)\
00383     static void _Ty##_append (_Ty **x, _Ty *nx) {\
00384         while (*x) x = &(**x).next;\
00385         *x = nx;\
00386     }
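
/*
    Usage sketch for the macro above. The macro is repeated so the example compiles on its
    own. int_node and int_node_count are hypothetical and exist only for this illustration;
    the only requirement the macro places on a type is a member named next.
*/

```c
#include <stddef.h>

#define GRAMMAR_IMPLEMENT_LIST_APPEND(_Ty)\
    static void _Ty##_append (_Ty **x, _Ty *nx) {\
        while (*x) x = &(**x).next;\
        *x = nx;\
    }

typedef struct int_node_
{
    int value;
    struct int_node_ *next;
} int_node;

/* expands to: static void int_node_append (int_node **x, int_node *nx) */
GRAMMAR_IMPLEMENT_LIST_APPEND(int_node)

/* counts the nodes in a list */
static unsigned int int_node_count (const int_node *head)
{
    unsigned int n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

/*
    Because the generated function walks to the end of the list on every call, appending is
    linear in the list length. Passing the address of a NULL head pointer starts a new list.
*/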
00387 
00388 /*
00389     string to byte map typedef
00390 */
00391 typedef struct map_byte_
00392 {
00393     byte *key;
00394     byte data;
00395     struct map_byte_ *next;
00396 } map_byte;
00397 
00398 static void map_byte_create (map_byte **ma)
00399 {
00400     *ma = (map_byte *) mem_alloc (sizeof (map_byte));
00401     if (*ma)
00402     {
00403         (**ma).key = NULL;
00404         (**ma).data = '\0';
00405         (**ma).next = NULL;
00406     }
00407 }
00408 
00409 static void map_byte_destroy (map_byte **ma)
00410 {
00411     if (*ma)
00412     {
00413         map_byte_destroy (&(**ma).next);
00414         mem_free ((void **) &(**ma).key);
00415         mem_free ((void **) ma);
00416     }
00417 }
00418 
00419 GRAMMAR_IMPLEMENT_LIST_APPEND(map_byte)
00420 
00421 /*
00422     searches the map for the specified key,
00423     returns pointer to the element with the specified key if it exists
00424     returns NULL otherwise
00425 */
00426 static map_byte *map_byte_locate (map_byte **ma, const byte *key)
00427 {
00428     while (*ma)
00429     {
00430         if (str_equal ((**ma).key, key))
00431             return *ma;
00432 
00433         ma = &(**ma).next;
00434     }
00435 
00436     set_last_error (UNRESOLVED_REFERENCE, str_duplicate (key), -1);
00437     return NULL;
00438 }
00439 
00440 /*
00441     searches the map for specified key,
00442     if the key is matched, *data is filled with data associated with the key,
00443     returns 0 if the key is matched,
00444     returns 1 otherwise
00445 */
00446 static int map_byte_find (map_byte **ma, const byte *key, byte *data)
00447 {
00448     map_byte *found = map_byte_locate (ma, key);
00449     if (found != NULL)
00450     {
00451         *data = found->data;
00452 
00453         return 0;
00454     }
00455 
00456     return 1;
00457 }
00458 
00459 /*
00460     regbyte context typedef
00461 
00462     Each regbyte consists of its name and a default value. These are static and created at
00463     grammar script compile-time, for example the following line:
00464         .regbyte vertex_blend      0x00
00465     adds a new regbyte named "vertex_blend" to the static list and initializes it to 0.
00466     When the script is executed, this regbyte can be accessed by name for read and write. When a
00467     particular regbyte is written, a new regbyte_ctx entry is added to the top of the regbyte_ctx
00468     stack. The new entry contains information about which regbyte it references and its new value.
00469     When a given regbyte is accessed for read, the stack is searched top-down to find an
00470     entry that references the regbyte. The first matching entry is used to return the current
00471     value it holds. If no entry is found, the default value is returned.
00472 */
00473 typedef struct regbyte_ctx_
00474 {
00475     map_byte *m_regbyte;
00476     byte m_current_value;
00477     struct regbyte_ctx_ *m_prev;
00478 } regbyte_ctx;
00479 
00480 static void regbyte_ctx_create (regbyte_ctx **re)
00481 {
00482     *re = (regbyte_ctx *) mem_alloc (sizeof (regbyte_ctx));
00483     if (*re)
00484     {
00485         (**re).m_regbyte = NULL;
00486         (**re).m_prev = NULL;
00487     }
00488 }
00489 
00490 static void regbyte_ctx_destroy (regbyte_ctx **re)
00491 {
00492     if (*re)
00493     {
00494         mem_free ((void **) re);
00495     }
00496 }
00497 
00498 static byte regbyte_ctx_extract (regbyte_ctx **re, map_byte *reg)
00499 {
00500     /* first lookup in the register stack */
00501     while (*re != NULL)
00502     {
00503         if ((**re).m_regbyte == reg)
00504             return (**re).m_current_value;
00505 
00506         re = &(**re).m_prev;
00507     }
00508 
00509     /* if not found - return the default value */
00510     return reg->data;
00511 }
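
/*
    The read rule described above (newest write wins, otherwise the default) can be tried
    in isolation with the sketch below. The types reg and reg_write are simplified,
    hypothetical stand-ins for map_byte and regbyte_ctx, reduced to the fields the lookup
    needs.
*/

```c
#include <stddef.h>

typedef struct reg_
{
    unsigned char default_value;
} reg;

typedef struct reg_write_
{
    const reg *target;          /* register this entry overrides */
    unsigned char value;        /* value written */
    struct reg_write_ *prev;    /* next older entry on the stack */
} reg_write;

/* Walks the stack from the most recent write to the oldest and returns the
   first value recorded for r, falling back to the register's default. */
static unsigned char reg_read (const reg_write *top, const reg *r)
{
    for (; top != NULL; top = top->prev)
        if (top->target == r)
            return top->value;
    return r->default_value;
}
```

/*
    Reading through an older stack top yields the older value, so a caller that keeps a
    pointer to an earlier top can still see the register state as it was at that point.
*/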
00512 
00513 /*
00514     emit type typedef
00515 */
00516 typedef enum emit_type_
00517 {
00518     et_byte,            /* explicit number */
00519     et_stream,          /* eaten character */
00520     et_position         /* current position */
00521 } emit_type;
00522 
00523 /*
00524     emit destination typedef
00525 */
00526 typedef enum emit_dest_
00527 {
00528     ed_output,          /* write to the output buffer */
00529     ed_regbyte          /* write a particular regbyte */
00530 } emit_dest;
00531 
00532 /*
00533     emit typedef
00534 */
00535 typedef struct emit_
00536 {
00537     emit_dest m_emit_dest;
00538     emit_type m_emit_type;      /* ed_output */
00539     byte m_byte;                /* et_byte */
00540     map_byte *m_regbyte;        /* ed_regbyte */
00541     byte *m_regname;            /* ed_regbyte - temporary */
00542     struct emit_ *m_next;
00543 } emit;
00544 
00545 static void emit_create (emit **em)
00546 {
00547     *em = (emit *) mem_alloc (sizeof (emit));
00548     if (*em)
00549     {
00550         (**em).m_emit_dest = ed_output;
00551         (**em).m_emit_type = et_byte;
00552         (**em).m_byte = '\0';
00553         (**em).m_regbyte = NULL;
00554         (**em).m_regname = NULL;
00555         (**em).m_next = NULL;
00556     }
00557 }
00558 
00559 static void emit_destroy (emit **em)
00560 {
00561     if (*em)
00562     {
00563         emit_destroy (&(**em).m_next);
00564         mem_free ((void **) &(**em).m_regname);
00565         mem_free ((void **) em);
00566     }
00567 }
00568 
00569 static unsigned int emit_size (emit *_E)
00570 {
00571     unsigned int n = 0;
00572 
00573     while (_E != NULL)
00574     {
00575         if (_E->m_emit_dest == ed_output)
00576         {
00577             if (_E->m_emit_type == et_position)
00578                 n += 4;     /* position is a 32-bit unsigned integer */
00579             else
00580                 n++;
00581         }
00582         _E = _E->m_next;
00583     }
00584 
00585     return n;
00586 }
00587 
00588 static int emit_push (emit *_E, byte *_P, byte c, unsigned int _Pos, regbyte_ctx **_Ctx)
00589 {
00590     while (_E != NULL)
00591     {
00592         if (_E->m_emit_dest == ed_output)
00593         {
00594             if (_E->m_emit_type == et_byte)
00595                 *_P++ = _E->m_byte;
00596             else if (_E->m_emit_type == et_stream)
00597                 *_P++ = c;
00598             else /* _E->m_emit_type == et_position */
00599             {
00600                 *_P++ = (byte) (_Pos);
00601                 *_P++ = (byte) (_Pos >> 8);
00602                 *_P++ = (byte) (_Pos >> 16);
00603                 *_P++ = (byte) (_Pos >> 24);
00604             }
00605         }
00606         else
00607         {
00608             regbyte_ctx *new_rbc;
00609             regbyte_ctx_create (&new_rbc);
00610             if (new_rbc == NULL)
00611                 return 1;
00612 
00613             new_rbc->m_prev = *_Ctx;
00614             new_rbc->m_regbyte = _E->m_regbyte;
00615             *_Ctx = new_rbc;
00616 
00617             if (_E->m_emit_type == et_byte)
00618                 new_rbc->m_current_value = _E->m_byte;
00619             else if (_E->m_emit_type == et_stream)
00620                 new_rbc->m_current_value = c;
00621         }
00622 
00623         _E = _E->m_next;
00624     }
00625 
00626     return 0;
00627 }
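
/*
    emit_push stores a position as four bytes, lowest byte first, which is why emit_size
    counts 4 for et_position. A consumer of the output buffer has to reassemble the value in
    the same order. The helpers below are a hypothetical sketch of that round trip and are
    not part of this file.
*/

```c
/* stores the low 32 bits of pos, lowest byte first */
static void put_position (unsigned char *out, unsigned long pos)
{
    out[0] = (unsigned char) (pos & 0xFF);
    out[1] = (unsigned char) ((pos >> 8) & 0xFF);
    out[2] = (unsigned char) ((pos >> 16) & 0xFF);
    out[3] = (unsigned char) ((pos >> 24) & 0xFF);
}

/* rebuilds the 32-bit position from four little-endian bytes */
static unsigned long get_position (const unsigned char *in)
{
    return (unsigned long) in[0]
        | ((unsigned long) in[1] << 8)
        | ((unsigned long) in[2] << 16)
        | ((unsigned long) in[3] << 24);
}
```

/*
    Assembling the value with shifts rather than copying raw memory keeps the result
    independent of the byte order of the host machine.
*/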
00628 
00629 /*
00630     error typedef
00631 */
00632 typedef struct error_
00633 {
00634     byte *m_text;
00635     byte *m_token_name;
00636     struct rule_ *m_token;
00637 } error;
00638 
00639 static void error_create (error **er)
00640 {
00641     *er = (error *) mem_alloc (sizeof (error));
00642     if (*er)
00643     {
00644         (**er).m_text = NULL;
00645         (**er).m_token_name = NULL;
00646         (**er).m_token = NULL;
00647     }
00648 }
00649 
00650 static void error_destroy (error **er)
00651 {
00652     if (*er)
00653     {
00654         mem_free ((void **) &(**er).m_text);
00655         mem_free ((void **) &(**er).m_token_name);
00656         mem_free ((void **) er);
00657     }
00658 }
00659 
00660 struct dict_;
00661 
00662 static byte *
00663 error_get_token (error *, struct dict_ *, const byte *, int);
00664 
00665 /*
00666     condition operand type typedef
00667 */
00668 typedef enum cond_oper_type_
00669 {
00670     cot_byte,               /* constant 8-bit unsigned integer */
00671     cot_regbyte             /* pointer to byte register containing the current value */
00672 } cond_oper_type;
00673 
00674 /*
00675     condition operand typedef
00676 */
00677 typedef struct cond_oper_
00678 {
00679     cond_oper_type m_type;
00680     byte m_byte;            /* cot_byte */
00681     map_byte *m_regbyte;    /* cot_regbyte */
00682     byte *m_regname;        /* cot_regbyte - temporary */
00683 } cond_oper;
00684 
00685 /*
00686     condition type typedef
00687 */
00688 typedef enum cond_type_
00689 {
00690     ct_equal,
00691     ct_not_equal
00692 } cond_type;
00693 
00694 /*
00695     condition typedef
00696 */
00697 typedef struct cond_
00698 {
00699     cond_type m_type;
00700     cond_oper m_operands[2];
00701 } cond;
00702 
00703 static void cond_create (cond **co)
00704 {
00705     *co = (cond *) mem_alloc (sizeof (cond));
00706     if (*co)
00707     {
00708         (**co).m_operands[0].m_regname = NULL;
00709         (**co).m_operands[1].m_regname = NULL;
00710     }
00711 }
00712 
00713 static void cond_destroy (cond **co)
00714 {
00715     if (*co)
00716     {
00717         mem_free ((void **) &(**co).m_operands[0].m_regname);
00718         mem_free ((void **) &(**co).m_operands[1].m_regname);
00719         mem_free ((void **) co);
00720     }
00721 }
00722 
00723 /*
00724     specifier type typedef
00725 */
00726 typedef enum spec_type_
00727 {
00728     st_false,
00729     st_true,
00730     st_byte,
00731     st_byte_range,
00732     st_string,
00733     st_identifier,
00734     st_identifier_loop,
00735     st_debug
00736 } spec_type;
00737 
00738 /*
00739     specifier typedef
00740 */
00741 typedef struct spec_
00742 {
00743     spec_type m_spec_type;
00744     byte m_byte[2];                 /* st_byte, st_byte_range */
00745     byte *m_string;                 /* st_string */
00746     struct rule_ *m_rule;           /* st_identifier, st_identifier_loop */
00747     emit *m_emits;
00748     error *m_errtext;
00749     cond *m_cond;
00750     struct spec_ *next;
00751 } spec;
00752 
00753 static void spec_create (spec **sp)
00754 {
00755     *sp = (spec *) mem_alloc (sizeof (spec));
00756     if (*sp)
00757     {
00758         (**sp).m_spec_type = st_false;
00759         (**sp).m_byte[0] = '\0';
00760         (**sp).m_byte[1] = '\0';
00761         (**sp).m_string = NULL;
00762         (**sp).m_rule = NULL;
00763         (**sp).m_emits = NULL;
00764         (**sp).m_errtext = NULL;
00765         (**sp).m_cond = NULL;
00766         (**sp).next = NULL;
00767     }
00768 }
00769 
00770 static void spec_destroy (spec **sp)
00771 {
00772     if (*sp)
00773     {
00774         spec_destroy (&(**sp).next);
00775         emit_destroy (&(**sp).m_emits);
00776         error_destroy (&(**sp).m_errtext);
00777         mem_free ((void **) &(**sp).m_string);
00778         cond_destroy (&(**sp).m_cond);
00779         mem_free ((void **) sp);
00780     }
00781 }
00782 
00783 GRAMMAR_IMPLEMENT_LIST_APPEND(spec)
00784 
00785 /*
00786     operator typedef
00787 */
00788 typedef enum oper_
00789 {
00790     op_none,
00791     op_and,
00792     op_or
00793 } oper;
00794 
00795 /*
00796     rule typedef
00797 */
00798 typedef struct rule_
00799 {
00800     oper m_oper;
00801     spec *m_specs;
00802     struct rule_ *next;
00803     int m_referenced;
00804 } rule;
00805 
00806 static void rule_create (rule **ru)
00807 {
00808     *ru = (rule *) mem_alloc (sizeof (rule));
00809     if (*ru)
00810     {
00811         (**ru).m_oper = op_none;
00812         (**ru).m_specs = NULL;
00813         (**ru).next = NULL;
00814         (**ru).m_referenced = 0;
00815     }
00816 }
00817 
00818 static void rule_destroy (rule **ru)
00819 {
00820     if (*ru)
00821     {
00822         rule_destroy (&(**ru).next);
00823         spec_destroy (&(**ru).m_specs);
00824         mem_free ((void **) ru);
00825     }
00826 }
00827 
00828 GRAMMAR_IMPLEMENT_LIST_APPEND(rule)
00829 
00830 /*
00831     returns unique grammar id
00832 */
00833 static grammar next_valid_grammar_id (void)
00834 {
00835     static grammar id = 0;
00836 
00837     return ++id;
00838 }
00839 
00840 /*
00841     dictionary typedef
00842 */
00843 typedef struct dict_
00844 {
00845     rule *m_rulez;
00846     rule *m_syntax;
00847     rule *m_string;
00848     map_byte *m_regbytes;
00849     grammar m_id;
00850     struct dict_ *next;
00851 } dict;
00852 
00853 static void dict_create (dict **di)
00854 {
00855     *di = (dict *) mem_alloc (sizeof (dict));
00856     if (*di)
00857     {
00858         (**di).m_rulez = NULL;
00859         (**di).m_syntax = NULL;
00860         (**di).m_string = NULL;
00861         (**di).m_regbytes = NULL;
00862         (**di).m_id = next_valid_grammar_id ();
00863         (**di).next = NULL;
00864     }
00865 }
00866 
00867 static void dict_destroy (dict **di)
00868 {
00869     if (*di)
00870     {
00871         rule_destroy (&(**di).m_rulez);
00872         map_byte_destroy (&(**di).m_regbytes);
00873         mem_free ((void **) di);
00874     }
00875 }
00876 
00877 GRAMMAR_IMPLEMENT_LIST_APPEND(dict)
00878 
00879 static void dict_find (dict **di, grammar key, dict **data)
00880 {
00881     while (*di)
00882     {
00883         if ((**di).m_id == key)
00884         {
00885             *data = *di;
00886             return;
00887         }
00888 
00889         di = &(**di).next;
00890     }
00891 
00892     *data = NULL;
00893 }
00894 
00895 static dict *g_dicts = NULL;
00896 
00897 /*
00898     byte array typedef
00899 */
00900 typedef struct barray_
00901 {
00902     byte *data;
00903     unsigned int len;
00904 } barray;
00905 
00906 static void barray_create (barray **ba)
00907 {
00908     *ba = (barray *) mem_alloc (sizeof (barray));
00909     if (*ba)
00910     {
00911         (**ba).data = NULL;
00912         (**ba).len = 0;
00913     }
00914 }
00915 
00916 static void barray_destroy (barray **ba)
00917 {
00918     if (*ba)
00919     {
00920         mem_free ((void **) &(**ba).data);
00921         mem_free ((void **) ba);
00922     }
00923 }
00924 
00925 /*
00926     reallocates byte array to requested size,
00927     returns 0 on success,
00928     returns 1 otherwise
00929 */
00930 static int barray_resize (barray **ba, unsigned int nlen)
00931 {
00932     byte *new_pointer;
00933 
00934     if (nlen == 0)
00935     {
00936         mem_free ((void **) &(**ba).data);
00937         (**ba).data = NULL;
00938         (**ba).len = 0;
00939 
00940         return 0;
00941     }
00942     else
00943     {
00944         new_pointer = (byte *) mem_realloc ((**ba).data, (**ba).len * sizeof (byte),
00945             nlen * sizeof (byte));
00946         if (new_pointer)
00947         {
00948             (**ba).data = new_pointer;
00949             (**ba).len = nlen;
00950 
00951             return 0;
00952         }
00953     }
00954 
00955     return 1;
00956 }
00957 
00958 /*
00959     adds byte array pointed to by *nb to the end of array pointed to by *ba,
00960     returns 0 on success,
00961     returns 1 otherwise
00962 */
00963 static int barray_append (barray **ba, barray **nb)
00964 {
00965     const unsigned int len = (**ba).len;
00966 
00967     if (barray_resize (ba, (**ba).len + (**nb).len))
00968         return 1;
00969 
00970     mem_copy ((**ba).data + len, (**nb).data, (**nb).len);
00971 
00972     return 0;
00973 }
00974 
00975 /*
00976     adds emit chain pointed to by em to the end of array pointed to by *ba,
00977     returns 0 on success,
00978     returns 1 otherwise
00979 */
00980 static int barray_push (barray **ba, emit *em, byte c, unsigned int pos, regbyte_ctx **rbc)
00981 {
00982     unsigned int count = emit_size (em);
00983 
00984     if (barray_resize (ba, (**ba).len + count))
00985         return 1;
00986 
00987     return emit_push (em, (**ba).data + ((**ba).len - count), c, pos, rbc);
00988 }
00989 
00990 /*
00991     byte pool typedef
00992 */
00993 typedef struct bytepool_
00994 {
00995     byte *_F;
00996     unsigned int _Siz;
00997 } bytepool;
00998 
00999 static void bytepool_destroy (bytepool **by)
01000 {
01001     if (*by != NULL)
01002     {
01003         mem_free ((void **) &(**by)._F);
01004         mem_free ((void **) by);
01005     }
01006 }
01007 
01008 static void bytepool_create (bytepool **by, int len)
01009 {
01010     *by = (bytepool *) (mem_alloc (sizeof (bytepool)));
01011     if (*by != NULL)
01012     {
01013         (**by)._F = (byte *) (mem_alloc (sizeof (byte) * len));
01014         (**by)._Siz = len;
01015 
01016         if ((**by)._F == NULL)
01017             bytepool_destroy (by);
01018     }
01019 }
01020 
01021 static int bytepool_reserve (bytepool *by, unsigned int n)
01022 {
01023     byte *_P;
01024 
01025     if (n <= by->_Siz)
01026         return 0;
01027 
01028     /* byte pool can only grow and at least by doubling its size */
01029     n = n >= by->_Siz * 2 ? n : by->_Siz * 2;
01030 
01031     /* reallocate the memory and adjust pointers to the new memory location */
01032     _P = (byte *) (mem_realloc (by->_F, sizeof (byte) * by->_Siz, sizeof (byte) * n));
01033     if (_P != NULL)
01034     {
01035         by->_F = _P;
01036         by->_Siz = n;
01037         return 0;
01038     }
01039 
01040     return 1;
01041 }
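
/*
    Illustrative sketch, not part of the original file: the growth policy of bytepool_reserve
    (never shrink, grow to at least twice the current size) rewritten against the C standard
    library so that it can be run on its own. demo_pool and demo_pool_reserve are hypothetical
    names, and realloc stands in for mem_realloc, which is defined outside this section.
*/
```c
#include <stdlib.h>

/* hypothetical stand-in for bytepool */
typedef struct demo_pool_
{
    unsigned char *data;
    unsigned int size;
} demo_pool;

/* returns 0 on success, returns 1 if the reallocation fails */
static int demo_pool_reserve (demo_pool *p, unsigned int n)
{
    unsigned char *q;

    /* the pool is already large enough, it never shrinks */
    if (n <= p->size)
        return 0;

    /* grow to the request or to twice the current size, whichever is larger */
    if (n < p->size * 2)
        n = p->size * 2;

    q = (unsigned char *) realloc (p->data, n);
    if (q == NULL)
        return 1;

    p->data = q;
    p->size = n;
    return 0;
}
```
/*
    Doubling keeps the number of reallocations logarithmic in the final size when the pool
    is grown a few bytes at a time.
*/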
01042 
01043 /*
01044     string to string map typedef
01045 */
01046 typedef struct map_str_
01047 {
01048     byte *key;
01049     byte *data;
01050     struct map_str_ *next;
01051 } map_str;
01052 
01053 static void map_str_create (map_str **ma)
01054 {
01055     *ma = (map_str *) mem_alloc (sizeof (map_str));
01056     if (*ma)
01057     {
01058         (**ma).key = NULL;
01059         (**ma).data = NULL;
01060         (**ma).next = NULL;
01061     }
01062 }
01063 
01064 static void map_str_destroy (map_str **ma)
01065 {
01066     if (*ma)
01067     {
01068         map_str_destroy (&(**ma).next);
01069         mem_free ((void **) &(**ma).key);
01070         mem_free ((void **) &(**ma).data);
01071         mem_free ((void **) ma);
01072     }
01073 }
01074 
01075 GRAMMAR_IMPLEMENT_LIST_APPEND(map_str)
01076 
01077 /*
01078     searches the map for specified key,
01079     if the key is matched, *data is filled with data associated with the key,
01080     returns 0 if the key is matched,
01081     returns 1 otherwise
01082 */
01083 static int map_str_find (map_str **ma, const byte *key, byte **data)
01084 {
01085     while (*ma)
01086     {
01087         if (str_equal ((**ma).key, key))
01088         {
01089             *data = str_duplicate ((**ma).data);
01090             if (*data == NULL)
01091                 return 1;
01092 
01093             return 0;
01094         }
01095 
01096         ma = &(**ma).next;
01097     }
01098 
01099     set_last_error (UNRESOLVED_REFERENCE, str_duplicate (key), -1);
01100     return 1;
01101 }
01102 
01103 /*
01104     string to rule map typedef
01105 */
01106 typedef struct map_rule_
01107 {
01108     byte *key;
01109     rule *data;
01110     struct map_rule_ *next;
01111 } map_rule;
01112 
01113 static void map_rule_create (map_rule **ma)
01114 {
01115     *ma = (map_rule *) mem_alloc (sizeof (map_rule));
01116     if (*ma)
01117     {
01118         (**ma).key = NULL;
01119         (**ma).data = NULL;
01120         (**ma).next = NULL;
01121     }
01122 }
01123 
01124 static void map_rule_destroy (map_rule **ma)
01125 {
01126     if (*ma)
01127     {
01128         map_rule_destroy (&(**ma).next);
01129         mem_free ((void **) &(**ma).key);
01130         mem_free ((void **) ma);
01131     }
01132 }
01133 
01134 GRAMMAR_IMPLEMENT_LIST_APPEND(map_rule)
01135 
01136 /*
01137     searches the map for specified key,
01138     if the key is matched, *data is filled with data associated with the key,
01139     returns 0 if the key is matched,
01140     returns 1 otherwise
01141 */
01142 static int map_rule_find (map_rule **ma, const byte *key, rule **data)
01143 {
01144     while (*ma)
01145     {
01146         if (str_equal ((**ma).key, key))
01147         {
01148             *data = (**ma).data;
01149 
01150             return 0;
01151         }
01152 
01153         ma = &(**ma).next;
01154     }
01155 
01156     set_last_error (UNRESOLVED_REFERENCE, str_duplicate (key), -1);
01157     return 1;
01158 }
01159 
01160 /*
01161     returns 1 if given character is a white space,
01162     returns 0 otherwise
01163 */
01164 static int is_space (byte c)
01165 {
01166     return c == ' ' || c == '\t' || c == '\n' || c == '\r';
01167 }
01168 
01169 /*
01170     advances text pointer by 1 if character pointed to by *text is a space,
01171     returns 1 if a space has been eaten,
01172     returns 0 otherwise
01173 */
01174 static int eat_space (const byte **text)
01175 {
01176     if (is_space (**text))
01177     {
01178         (*text)++;
01179 
01180         return 1;
01181     }
01182 
01183     return 0;
01184 }
01185 
01186 /*
01187     returns 1 if text points to C-style comment start string,
01188     returns 0 otherwise
01189 */
01190 static int is_comment_start (const byte *text)
01191 {
01192     return text[0] == '/' && text[1] == '*';
01193 }
01194 
01195 /*
01196     advances text pointer to first character after C-style comment block - if any,
01197     returns 1 if C-style comment block has been encountered and eaten,
01198     returns 0 otherwise
01199 */
01200 static int eat_comment (const byte **text)
01201 {
01202     if (is_comment_start (*text))
01203     {
01204         /* *text points to comment block - skip two characters to enter comment body */
01205         *text += 2;
01206         /* skip any character except consecutive '*' and '/', stopping at end of text */
01207         while ((*text)[0] != '\0' && !((*text)[0] == '*' && (*text)[1] == '/'))
01208             (*text)++;
01209         /* skip those two terminating characters, if the comment was terminated */
01210         if ((*text)[0] != '\0')
01211             *text += 2;
01212         return 1;
01213     }
01214 
01215     return 0;
01216 }
01217 
01218 /*
01219     advances text pointer to first character that is neither space nor C-style comment block
01220 */
01221 static void eat_spaces (const byte **text)
01222 {
01223     while (eat_space (text) || eat_comment (text))
01224         ;
01225 }
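
/*
    Illustrative sketch, not part of the original file: the combined effect of eat_space,
    eat_comment and eat_spaces as a single standalone function over a plain C string.
    demo_skip_blanks is a hypothetical name. The sketch checks for the terminating NUL while
    scanning a comment body, so an unterminated comment leaves the result at the end of text.
*/
```c
/* returns pointer to the first character that is neither white space
   nor part of a C-style comment block */
static const char *demo_skip_blanks (const char *s)
{
    for (;;)
    {
        if (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
        {
            s++;
        }
        else if (s[0] == '/' && s[1] == '*')
        {
            /* enter the comment body */
            s += 2;
            /* scan up to the closing pair or the end of text */
            while (*s != '\0' && !(s[0] == '*' && s[1] == '/'))
                s++;
            /* skip the closing pair if there is one */
            if (*s != '\0')
                s += 2;
        }
        else
        {
            return s;
        }
    }
}
```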
01226 
01227 /*
01228     resizes string pointed to by *ptr so that character c can be added to the end of the string,
01229     returns 0 on success,
01230     returns 1 otherwise
01231 */
01232 static int string_grow (byte **ptr, unsigned int *len, byte c)
01233 {
01234     /* reallocate the string in 16-byte increments */
01235     if ((*len & 0x0F) == 0x0F || *ptr == NULL)
01236     {
01237         byte *tmp = (byte *) mem_realloc (*ptr, ((*len + 1) & ~0x0F) * sizeof (byte),
01238             ((*len + 1 + 0x10) & ~0x0F) * sizeof (byte));
01239         if (tmp == NULL)
01240             return 1;
01241 
01242         *ptr = tmp;
01243     }
01244 
01245     if (c)
01246     {
01247         /* append given character */
01248         (*ptr)[*len] = c;
01249         (*len)++;
01250     }
01251     (*ptr)[*len] = '\0';
01252 
01253     return 0;
01254 }
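
/*
    Illustrative sketch, not part of the original file: the 16-byte growth scheme of
    string_grow, rewritten against the C standard library. demo_alloc_size and demo_append
    are hypothetical names, and realloc stands in for mem_realloc, which is defined outside
    this section. A reallocation happens only when the string is empty (NULL) or when its
    length is one short of a multiple of 16, which is when the next character plus the
    terminating NUL would no longer fit.
*/
```c
#include <stdlib.h>

/* size requested when a string of len characters has to grow */
static unsigned int demo_alloc_size (unsigned int len)
{
    return (len + 1 + 0x10) & ~0x0Fu;
}

/* appends c (if it is not NUL) and keeps the string NUL-terminated,
   returns 0 on success, returns 1 if the reallocation fails */
static int demo_append (char **ptr, unsigned int *len, char c)
{
    if ((*len & 0x0F) == 0x0F || *ptr == NULL)
    {
        char *tmp = (char *) realloc (*ptr, demo_alloc_size (*len));
        if (tmp == NULL)
            return 1;

        *ptr = tmp;
    }

    if (c != '\0')
    {
        (*ptr)[*len] = c;
        (*len)++;
    }
    (*ptr)[*len] = '\0';

    return 0;
}
```
/*
    As with get_identifier and get_string, the caller starts from a NULL pointer and a zero
    length, and is responsible for freeing the string afterwards.
*/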
01255 
01256 /*
01257     returns 1 if given character is a valid identifier character a-z, A-Z, 0-9 or _
01258     returns 0 otherwise
01259 */
01260 static int is_identifier (byte c)
01261 {
01262     return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_';
01263 }
01264 
01265 /*
01266     copies characters from *text to *id until non-identifier character is encountered,
01267     assumes that *id points to NULL object - caller is responsible for later freeing the string,
01268     text pointer is advanced to point past the copied identifier,
01269     returns 0 if identifier was successfully copied,
01270     returns 1 otherwise
01271 */
01272 static int get_identifier (const byte **text, byte **id)
01273 {
01274     const byte *t = *text;
01275     byte *p = NULL;
01276     unsigned int len = 0;
01277 
01278     if (string_grow (&p, &len, '\0'))
01279         return 1;
01280 
01281     /* loop while next character in buffer is valid for identifiers */
01282     while (is_identifier (*t))
01283     {
01284         if (string_grow (&p, &len, *t++))
01285         {
01286             mem_free ((void **) (void *) &p);
01287             return 1;
01288         }
01289     }
01290 
01291     *text = t;
01292     *id = p;
01293 
01294     return 0;
01295 }
01296 
01297 /*
01298     converts sequence of DEC digits pointed to by *text until non-DEC digit is encountered,
01299     advances text pointer past the converted sequence,
01300     returns the converted value
01301 */
01302 static unsigned int dec_convert (const byte **text)
01303 {
01304     unsigned int value = 0;
01305 
01306     while (**text >= '0' && **text <= '9')
01307     {
01308         value = value * 10 + **text - '0';
01309         (*text)++;
01310     }
01311 
01312     return value;
01313 }
01314 
01315 /*
01316     returns 1 if given character is HEX digit 0-9, A-F or a-f,
01317     returns 0 otherwise
01318 */
01319 static int is_hex (byte c)
01320 {
01321     return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
01322 }
01323 
01324 /*
01325     returns value of passed character as if it was HEX digit
01326 */
01327 static unsigned int hex2dec (byte c)
01328 {
01329     if (c >= '0' && c <= '9')
01330         return c - '0';
01331     if (c >= 'A' && c <= 'F')
01332         return c - 'A' + 10;
01333     return c - 'a' + 10;
01334 }
01335 
01336 /*
01337     converts sequence of HEX digits pointed to by *text until non-HEX digit is encountered,
01338     advances text pointer past the converted sequence,
01339     returns the converted value
01340 */
01341 static unsigned int hex_convert (const byte **text)
01342 {
01343     unsigned int value = 0;
01344 
01345     while (is_hex (**text))
01346     {
01347         value = value * 0x10 + hex2dec (**text);
01348         (*text)++;
01349     }
01350 
01351     return value;
01352 }
01353 
01354 /*
01355     returns 1 if given character is OCT digit 0-7,
01356     returns 0 otherwise
01357 */
01358 static int is_oct (byte c)
01359 {
01360     return c >= '0' && c <= '7';
01361 }
01362 
01363 /*
01364     returns value of passed character as if it was OCT digit
01365 */
01366 static int oct2dec (byte c)
01367 {
01368     return c - '0';
01369 }
01370 
01371 static byte get_escape_sequence (const byte **text)
01372 {
01373     int value = 0;
01374 
01375     /* skip '\' character */
01376     (*text)++;
01377 
01378     switch (*(*text)++)
01379     {
01380     case '\'':
01381         return '\'';
01382     case '"':
01383         return '\"';
01384     case '?':
01385         return '\?';
01386     case '\\':
01387         return '\\';
01388     case 'a':
01389         return '\a';
01390     case 'b':
01391         return '\b';
01392     case 'f':
01393         return '\f';
01394     case 'n':
01395         return '\n';
01396     case 'r':
01397         return '\r';
01398     case 't':
01399         return '\t';
01400     case 'v':
01401         return '\v';
01402     case 'x':
01403         return (byte) hex_convert (text);
01404     }
01405 
01406     (*text)--;
01407     if (is_oct (**text))
01408     {
01409         value = oct2dec (*(*text)++);
01410         if (is_oct (**text))
01411         {
01412             value = value * 010 + oct2dec (*(*text)++);
01413             if (is_oct (**text))
01414                 value = value * 010 + oct2dec (*(*text)++);
01415         }
01416     }
01417 
01418     return (byte) value;
01419 }
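
/*
    Illustrative sketch, not part of the original file: a standalone decoder for a subset of
    the escape sequences handled by get_escape_sequence (\n, \t, \r, \\, \', \", \xHH and up
    to three octal digits), working on plain char strings. demo_unescape is a hypothetical
    name. As in the function above, an unrecognized escape yields 0 and leaves the pointer
    on the character that follows the backslash.
*/
```c
/* decodes one escape sequence starting at the backslash pointed to by *s,
   advances *s past the sequence and returns the decoded byte */
static unsigned char demo_unescape (const char **s)
{
    const char *p = *s + 1;     /* skip the backslash */
    unsigned int value = 0;
    int digits = 0;

    switch (*p)
    {
    case 'n':  *s = p + 1; return '\n';
    case 't':  *s = p + 1; return '\t';
    case 'r':  *s = p + 1; return '\r';
    case '\\': *s = p + 1; return '\\';
    case '\'': *s = p + 1; return '\'';
    case '"':  *s = p + 1; return '"';
    case 'x':
        /* any number of HEX digits, truncated to a byte on return */
        p++;
        while ((*p >= '0' && *p <= '9') || (*p >= 'a' && *p <= 'f') || (*p >= 'A' && *p <= 'F'))
        {
            if (*p <= '9')
                value = value * 16 + (unsigned int) (*p - '0');
            else
                value = value * 16 + (unsigned int) ((*p | 0x20) - 'a' + 10);
            p++;
        }
        *s = p;
        return (unsigned char) value;
    }

    /* at most three OCT digits */
    while (digits < 3 && *p >= '0' && *p <= '7')
    {
        value = value * 8 + (unsigned int) (*p - '0');
        p++;
        digits++;
    }

    *s = p;
    return (unsigned char) value;
}
```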
01420 
01421 /*
01422     copies characters from *text to *str until " or ' character is encountered,
01423     assumes that *str points to NULL object - caller is responsible for later freeing the string,
01424     assumes that *text points to " or ' character that starts the string,
01425     text pointer is advanced to point past the " or ' character,
01426     returns 0 if string was successfully copied,
01427     returns 1 otherwise
01428 */
01429 static int get_string (const byte **text, byte **str)
01430 {
01431     const byte *t = *text;
01432     byte *p = NULL;
01433     unsigned int len = 0;
01434     byte term_char;
01435 
01436     if (string_grow (&p, &len, '\0'))
01437         return 1;
01438 
01439     /* read " or ' character that starts the string */
01440     term_char = *t++;
01441     /* while next character is not the terminating character */
01442     while (*t && *t != term_char)
01443     {
01444         byte c;
01445 
01446         if (*t == '\\')
01447             c = get_escape_sequence (&t);
01448         else
01449             c = *t++;
01450 
01451         if (string_grow (&p, &len, c))
01452         {
01453             mem_free ((void **) (void *) &p);
01454             return 1;
01455         }
01456     }
01457     /* skip " or ' character that ends the string, if the string was terminated */
01458     if (*t != '\0')
01459         t++;
01460     *text = t;
01461     *str = p;
01462     return 0;
01463 }
01464 
01465 /*
01466     gets emit code, the syntax is:
01467     ".emtcode" " " <symbol> " " (("0x" | "0X") <hex_value>) | <dec_value> | <character>
01468     assumes that *text already points to <symbol>,
01469     returns 0 if emit code is successfully read,
01470     returns 1 otherwise
01471 */
01472 static int get_emtcode (const byte **text, map_byte **ma)
01473 {
01474     const byte *t = *text;
01475     map_byte *m = NULL;
01476 
01477     map_byte_create (&m);
01478     if (m == NULL)
01479         return 1;
01480 
01481     if (get_identifier (&t, &m->key))
01482     {
01483         map_byte_destroy (&m);
01484         return 1;
01485     }
01486     eat_spaces (&t);
01487 
01488     if (*t == '\'')
01489     {
01490         byte *c;
01491 
01492         if (get_string (&t, &c))
01493         {
01494             map_byte_destroy (&m);
01495             return 1;
01496         }
01497 
01498         m->data = (byte) c[0];
01499         mem_free ((void **) (void *) &c);
01500     }
01501     else if (t[0] == '0' && (t[1] == 'x' || t[1] == 'X'))
01502     {
01503         /* skip HEX "0x" or "0X" prefix */
01504         t += 2;
01505         m->data = (byte) hex_convert (&t);
01506     }
01507     else
01508     {
01509         m->data = (byte) dec_convert (&t);
01510     }
01511 
01512     eat_spaces (&t);
01513 
01514     *text = t;
01515     *ma = m;
01516     return 0;
01517 }
01518 
01519 /*
01520     gets regbyte declaration, the syntax is:
01521     ".regbyte" " " <symbol> " " (("0x" | "0X") <hex_value>) | <dec_value> | <character>
01522     assumes that *text already points to <symbol>,
01523     returns 0 if regbyte is successfully read,
01524     returns 1 otherwise
01525 */
01526 static int get_regbyte (const byte **text, map_byte **ma)
01527 {
01528     /* pass it to the emtcode parser as it has the same syntax starting at <symbol> */
01529     return get_emtcode (text, ma);
01530 }
01531 
01532 /*
01533     gets errtext declaration: <symbol> followed by a quoted string; returns 0 on success,
01534     returns 1 otherwise
01535 */
01536 static int get_errtext (const byte **text, map_str **ma)
01537 {
01538     const byte *t = *text;
01539     map_str *m = NULL;
01540 
01541     map_str_create (&m);
01542     if (m == NULL)
01543         return 1;
01544 
01545     if (get_identifier (&t, &m->key))
01546     {
01547         map_str_destroy (&m);
01548         return 1;
01549     }
01550     eat_spaces (&t);
01551 
01552     if (get_string (&t, &m->data))
01553     {
01554         map_str_destroy (&m);
01555         return 1;
01556     }
01557     eat_spaces (&t);
01558 
01559     *text = t;
01560     *ma = m;
01561     return 0;
01562 }
01563 
01564 /*
01565     gets optional ".error" clause (quoted text or errtext symbol); returns 0 on success,
01566     also when no clause is present, and returns 1 otherwise
01567 */
01568 static int get_error (const byte **text, error **er, map_str *maps)
01569 {
01570     const byte *t = *text;
01571     byte *temp = NULL;
01572 
01573     if (*t != '.')
01574         return 0;
01575 
01576     t++;
01577     if (get_identifier (&t, &temp))
01578         return 1;
01579     eat_spaces (&t);
01580 
01581     if (!str_equal ((byte *) "error", temp))
01582     {
01583         mem_free ((void **) (void *) &temp);
01584         return 0;
01585     }
01586 
01587     mem_free ((void **) (void *) &temp);
01588 
01589     error_create (er);
01590     if (*er == NULL)
01591         return 1;
01592 
01593     if (*t == '\"')
01594     {
01595         if (get_string (&t, &(**er).m_text))
01596         {
01597             error_destroy (er);
01598             return 1;
01599         }
01600         eat_spaces (&t);
01601     }
01602     else
01603     {
01604         if (get_identifier (&t, &temp))
01605         {
01606             error_destroy (er);
01607             return 1;
01608         }
01609         eat_spaces (&t);
01610 
01611         if (map_str_find (&maps, temp, &(**er).m_text))
01612         {
01613             mem_free ((void **) (void *) &temp);
01614             error_destroy (er);
01615             return 1;
01616         }
01617 
01618         mem_free ((void **) (void *) &temp);
01619     }
01620 
01621     /* try to extract "token" from "...$token$..." */
01622     {
01623         byte *processed = NULL;
01624         unsigned int len = 0;
01625         int i = 0;
01626 
01627         if (string_grow (&processed, &len, '\0'))
01628         {
01629             error_destroy (er);
01630             return 1;
01631         }
01632 
01633         while (i < str_length ((**er).m_text))
01634         {
01635             /* check if the dollar sign is repeated - if so skip it */
01636             if ((**er).m_text[i] == '$' && (**er).m_text[i + 1] == '$')
01637             {
01638                 if (string_grow (&processed, &len, '$'))
01639                 {
01640                     mem_free ((void **) (void *) &processed);
01641                     error_destroy (er);
01642                     return 1;
01643                 }
01644 
01645                 i += 2;
01646             }
01647             else if ((**er).m_text[i] != '$')
01648             {
01649                 if (string_grow (&processed, &len, (**er).m_text[i]))
01650                 {
01651                     mem_free ((void **) (void *) &processed);
01652                     error_destroy (er);
01653                     return 1;
01654                 }
01655 
01656                 i++;
01657             }
01658             else
01659             {
01660                 if (string_grow (&processed, &len, '$'))
01661                 {
01662                     mem_free ((void **) (void *) &processed);
01663                     error_destroy (er);
01664                     return 1;
01665                 }
01666 
01667                 {
01668                     /* length of token being extracted */
01669                     unsigned int tlen = 0;
01670 
01671                     if (string_grow (&(**er).m_token_name, &tlen, '\0'))
01672                     {
01673                         mem_free ((void **) (void *) &processed);
01674                         error_destroy (er);
01675                         return 1;
01676                     }
01677 
01678                     /* skip the dollar sign */
01679                     i++;
01680 
01681                     while ((**er).m_text[i] != '$' && (**er).m_text[i] != '\0')
01682                     {
01683                         if (string_grow (&(**er).m_token_name, &tlen, (**er).m_text[i]))
01684                         {
01685                             mem_free ((void **) (void *) &processed);
01686                             error_destroy (er);
01687                             return 1;
01688                         }
01689 
01690                         i++;
01691                     }
01692 
01693                     /* skip the dollar sign */
01694                     i++;
01695                 }
01696             }
01697         }
01698 
01699         mem_free ((void **) &(**er).m_text);
01700         (**er).m_text = processed;
01701     }
01702 
01703     *text = t;
01704     return 0;
01705 }
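
/*
    Illustrative sketch, not part of the original file: the "$token$" post-processing done at
    the end of get_error, isolated into a standalone function with caller-supplied buffers.
    demo_split_token is a hypothetical name. A doubled "$$" becomes a literal '$', while
    "$name$" is collapsed to a single '$' placeholder in the message and name is copied out
    separately. If several tokens are present the last one wins, as in the loop above.
*/
```c
#include <stddef.h>

/* out receives the processed message, token receives the extracted token name;
   both buffers must be at least as large as text including its terminating NUL */
static void demo_split_token (const char *text, char *out, char *token)
{
    size_t i = 0, o = 0, k = 0;

    while (text[i] != '\0')
    {
        if (text[i] == '$' && text[i + 1] == '$')
        {
            /* repeated dollar sign stands for a literal '$' */
            out[o++] = '$';
            i += 2;
        }
        else if (text[i] != '$')
        {
            out[o++] = text[i++];
        }
        else
        {
            /* leave a placeholder in the message and extract the token name */
            out[o++] = '$';
            i++;
            k = 0;
            while (text[i] != '\0' && text[i] != '$')
                token[k++] = text[i++];
            /* skip the closing dollar sign, if any */
            if (text[i] == '$')
                i++;
        }
    }

    out[o] = '\0';
    token[k] = '\0';
}
```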
01706 
01707 /*
01708     gets optional chain of ".emit" and ".load" clauses; returns 0 on success,
01709     also when no clause is present, and returns 1 otherwise
01710 */
01711 static int get_emits (const byte **text, emit **em, map_byte *mapb)
01712 {
01713     const byte *t = *text;
01714     byte *temp = NULL;
01715     emit *e = NULL;
01716     emit_dest dest;
01717 
01718     if (*t != '.')
01719         return 0;
01720 
01721     t++;
01722     if (get_identifier (&t, &temp))
01723         return 1;
01724     eat_spaces (&t);
01725 
01726     /* .emit */
01727     if (str_equal ((byte *) "emit", temp))
01728         dest = ed_output;
01729     /* .load */
01730     else if (str_equal ((byte *) "load", temp))
01731         dest = ed_regbyte;
01732     else
01733     {
01734         mem_free ((void **) (void *) &temp);
01735         return 0;
01736     }
01737 
01738     mem_free ((void **) (void *) &temp);
01739 
01740     emit_create (&e);
01741     if (e == NULL)
01742         return 1;
01743 
01744     e->m_emit_dest = dest;
01745 
01746     if (dest == ed_regbyte)
01747     {
01748         if (get_identifier (&t, &e->m_regname))
01749         {
01750             emit_destroy (&e);
01751             return 1;
01752         }
01753         eat_spaces (&t);
01754     }
01755 
01756     /* 0xNN */
01757     if (*t == '0' && (t[1] == 'x' || t[1] == 'X'))
01758     {
01759         t += 2;
01760         e->m_byte = (byte) hex_convert (&t);
01761 
01762         e->m_emit_type = et_byte;
01763     }
01764     /* NNN */
01765     else if (*t >= '0' && *t <= '9')
01766     {
01767         e->m_byte = (byte) dec_convert (&t);
01768 
01769         e->m_emit_type = et_byte;
01770     }
01771     /* * */
01772     else if (*t == '*')
01773     {
01774         t++;
01775 
01776         e->m_emit_type = et_stream;
01777     }
01778     /* $ */
01779     else if (*t == '$')
01780     {
01781         t++;
01782 
01783         e->m_emit_type = et_position;
01784     }
01785     /* 'c' */
01786     else if (*t == '\'')
01787     {
01788         if (get_string (&t, &temp))
01789         {
01790             emit_destroy (&e);
01791             return 1;
01792         }
01793         e->m_byte = (byte) temp[0];
01794 
01795         mem_free ((void **) (void *) &temp);
01796 
01797         e->m_emit_type = et_byte;
01798     }
01799     else
01800     {
01801         if (get_identifier (&t, &temp))
01802         {
01803             emit_destroy (&e);
01804             return 1;
01805         }
01806 
01807         if (map_byte_find (&mapb, temp, &e->m_byte))
01808         {
01809             mem_free ((void **) (void *) &temp);
01810             emit_destroy (&e);
01811             return 1;
01812         }
01813 
01814         mem_free ((void **) (void *) &temp);
01815 
01816         e->m_emit_type = et_byte;
01817     }
01818 
01819     eat_spaces (&t);
01820 
01821     if (get_emits (&t, &e->m_next, mapb))
01822     {
01823         emit_destroy (&e);
01824         return 1;
01825     }
01826 
01827     *text = t;
01828     *em = e;
01829     return 0;
01830 }
01831 
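01832 /*
01833     gets one specifier: optional ".if" condition, the matcher itself, then optional ".error"
01834     and emit clauses; returns 0 on success, returns 1 otherwise
01835 */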
01836 static int get_spec (const byte **text, spec **sp, map_str *maps, map_byte *mapb)
01837 {
01838     const byte *t = *text;
01839     spec *s = NULL;
01840 
01841     spec_create (&s);
01842     if (s == NULL)
01843         return 1;
01844 
01845     /* first - read optional .if statement */
01846     if (*t == '.')
01847     {
01848         const byte *u = t;
01849         byte *keyword = NULL;
01850 
01851         /* skip the dot */
01852         u++;
01853 
01854         if (get_identifier (&u, &keyword))
01855         {
01856             spec_destroy (&s);
01857             return 1;
01858         }
01859 
01860         /* .if */
01861         if (str_equal ((byte *) "if", keyword))
01862         {
01863             cond_create (&s->m_cond);
01864             if (s->m_cond == NULL)
01865             {
01866                 spec_destroy (&s);
01867                 return 1;
01868             }
01869 
01870             /* skip the left paren */
01871             eat_spaces (&u);
01872             u++;
01873 
01874             /* get the left operand */
01875             eat_spaces (&u);
01876             if (get_identifier (&u, &s->m_cond->m_operands[0].m_regname))
01877             {
01878                 spec_destroy (&s);
01879                 return 1;
01880             }
01881             s->m_cond->m_operands[0].m_type = cot_regbyte;
01882 
01883             /* get the operator (!= or ==) */
01884             eat_spaces (&u);
01885             if (*u == '!')
01886                 s->m_cond->m_type = ct_not_equal;
01887             else
01888                 s->m_cond->m_type = ct_equal;
01889             u += 2;
01890             eat_spaces (&u);
01891 
01892             if (u[0] == '0' && (u[1] == 'x' || u[1] == 'X'))
01893             {
01894                 /* skip the 0x prefix */
01895                 u += 2;
01896 
01897                 /* get the right operand */
01898                 s->m_cond->m_operands[1].m_byte = hex_convert (&u);
01899                 s->m_cond->m_operands[1].m_type = cot_byte;
01900             }
01901             else /*if (*u >= '0' && *u <= '9')*/
01902             {
01903                 /* get the right operand */
01904                 s->m_cond->m_operands[1].m_byte = dec_convert (&u);
01905                 s->m_cond->m_operands[1].m_type = cot_byte;
01906             }
01907 
01908             /* skip the right paren */
01909             eat_spaces (&u);
01910             u++;
01911 
01912             eat_spaces (&u);
01913 
01914             t = u;
01915         }
01916 
01917         mem_free ((void **) (void *) &keyword);
01918     }
01919 
01920     if (*t == '\'')
01921     {
01922         byte *temp = NULL;
01923 
01924         if (get_string (&t, &temp))
01925         {
01926             spec_destroy (&s);
01927             return 1;
01928         }
01929         eat_spaces (&t);
01930 
01931         if (*t == '-')
01932         {
01933             byte *temp2 = NULL;
01934 
01935             /* skip the '-' character */
01936             t++;
01937             eat_spaces (&t);
01938 
01939             if (get_string (&t, &temp2))
01940             {
01941                 mem_free ((void **) (void *) &temp);
01942                 spec_destroy (&s);
01943                 return 1;
01944             }
01945             eat_spaces (&t);
01946 
01947             s->m_spec_type = st_byte_range;
01948             s->m_byte[0] = *temp;
01949             s->m_byte[1] = *temp2;
01950 
01951             mem_free ((void **) (void *) &temp2);
01952         }
01953         else
01954         {
01955             s->m_spec_type = st_byte;
01956             *s->m_byte = *temp;
01957         }
01958 
01959         mem_free ((void **) (void *) &temp);
01960     }
01961     else if (*t == '"')
01962     {
01963         if (get_string (&t, &s->m_string))
01964         {
01965             spec_destroy (&s);
01966             return 1;
01967         }
01968         eat_spaces (&t);
01969 
01970         s->m_spec_type = st_string;
01971     }
01972     else if (*t == '.')
01973     {
01974         byte *keyword = NULL;
01975 
01976         /* skip the dot */
01977         t++;
01978 
01979         if (get_identifier (&t, &keyword))
01980         {
01981             spec_destroy (&s);
01982             return 1;
01983         }
01984         eat_spaces (&t);
01985 
01986         /* .true */
01987         if (str_equal ((byte *) "true", keyword))
01988         {
01989             s->m_spec_type = st_true;
01990         }
01991         /* .false */
01992         else if (str_equal ((byte *) "false", keyword))
01993         {
01994             s->m_spec_type = st_false;
01995         }
01996         /* .debug */
01997         else if (str_equal ((byte *) "debug", keyword))
01998         {
01999             s->m_spec_type = st_debug;
02000         }
02001         /* .loop */
02002         else if (str_equal ((byte *) "loop", keyword))
02003         {
02004             if (get_identifier (&t, &s->m_string))
02005             {
02006                 mem_free ((void **) (void *) &keyword);
02007                 spec_destroy (&s);
02008                 return 1;
02009             }
02010             eat_spaces (&t);
02011 
02012             s->m_spec_type = st_identifier_loop;
02013         }
02014         mem_free ((void **) (void *) &keyword);
02015     }
02016     else
02017     {
02018         if (get_identifier (&t, &s->m_string))
02019         {
02020             spec_destroy (&s);
02021             return 1;
02022         }
02023         eat_spaces (&t);
02024 
02025         s->m_spec_type = st_identifier;
02026     }
02027 
02028     if (get_error (&t, &s->m_errtext, maps))
02029     {
02030         spec_destroy (&s);
02031         return 1;
02032     }
02033 
02034     if (get_emits (&t, &s->m_emits, mapb))
02035     {
02036         spec_destroy (&s);
02037         return 1;
02038     }
02039 
02040     *text = t;
02041     *sp = s;
02042     return 0;
02043 }
02044 
02045 /*
02046     gets rule body: specifiers joined by ".and" or ".or", terminated by ';';
02047     returns 0 on success, returns 1 otherwise
02048 */
02049 static int get_rule (const byte **text, rule **ru, map_str *maps, map_byte *mapb)
02050 {
02051     const byte *t = *text;
02052     rule *r = NULL;
02053 
02054     rule_create (&r);
02055     if (r == NULL)
02056         return 1;
02057 
02058     if (get_spec (&t, &r->m_specs, maps, mapb))
02059     {
02060         rule_destroy (&r);
02061         return 1;
02062     }
02063 
02064     while (*t != ';')
02065     {
02066         byte *op = NULL;
02067         spec *sp = NULL;
02068 
02069         /* skip the dot that precedes "and" or "or" */
02070         t++;
02071 
02072         /* read "and" or "or" keyword */
02073         if (get_identifier (&t, &op))
02074         {
02075             rule_destroy (&r);
02076             return 1;
02077         }
02078         eat_spaces (&t);
02079 
02080         if (r->m_oper == op_none)
02081         {
02082             /* .and */
02083             if (str_equal ((byte *) "and", op))
02084                 r->m_oper = op_and;
02085             /* .or */
02086             else
02087                 r->m_oper = op_or;
02088         }
02089 
02090         mem_free ((void **) (void *) &op);
02091 
02092         if (get_spec (&t, &sp, maps, mapb))
02093         {
02094             rule_destroy (&r);
02095             return 1;
02096         }
02097 
02098         spec_append (&r->m_specs, sp);
02099     }
02100 
02101     /* skip the semicolon */
02102     t++;
02103     eat_spaces (&t);
02104 
02105     *text = t;
02106     *ru = r;
02107     return 0;
02108 }
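/*
    Illustrative sketch, not part of the original file: get_rule() lets the first
    ".and"/".or" keyword fix the operator for the whole rule, and treats any keyword
    other than "and" as "or". The sk_ names are hypothetical, and the sketch assumes
    every specifier is a plain identifier (no ".true", ".loop", strings, and so on).
*/

```c
#include <assert.h>
#include <string.h>

typedef enum { SK_OP_NONE, SK_OP_AND, SK_OP_OR } sk_oper;

/* scan a rule body such as "a .and b .and c;" and report its operator */
static sk_oper sk_rule_operator (const char *text)
{
    sk_oper oper = SK_OP_NONE;

    assert (text != NULL);

    while (*text != '\0' && *text != ';')
    {
        /* only the first dotted keyword decides, later ones are ignored */
        if (*text == '.' && oper == SK_OP_NONE)
        {
            /* the keyword runs up to the next blank, ';' or end of string */
            size_t len = strcspn (text + 1, " \t\r\n;");

            if (len == 3 && strncmp (text + 1, "and", 3) == 0)
                oper = SK_OP_AND;
            else
                oper = SK_OP_OR;
        }
        text++;
    }

    return oper;
}
```

/*
    A rule with a single specifier yields SK_OP_NONE here; grammar_load_from_text()
    later turns op_none into op_and.
*/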
02109 
02110 /*
02111     resolves symbol to its rule via mapr and marks that rule as referenced
02112     returns 0 on success, 1 otherwise
02113 */
02114 static int update_dependency (map_rule *mapr, byte *symbol, rule **ru)
02115 {
02116     if (map_rule_find (&mapr, symbol, ru))
02117         return 1;
02118 
02119     (**ru).m_referenced = 1;
02120 
02121     return 0;
02122 }
02123 
02124 /*
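02125     replaces every symbolic name (rules, error tokens, registers) with a pointer to its target
02126     returns 0 on success, 1 otherwise (unknown name or unreferenced rule)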
02127 */
02128 static int update_dependencies (dict *di, map_rule *mapr, byte **syntax_symbol,
02129     byte **string_symbol, map_byte *regbytes)
02130 {
02131     rule *rulez = di->m_rulez;
02132 
02133     /* update dependencies for the root and lexer symbols */
02134     if (update_dependency (mapr, *syntax_symbol, &di->m_syntax) ||
02135         (*string_symbol != NULL && update_dependency (mapr, *string_symbol, &di->m_string)))
02136         return 1;
02137 
02138     mem_free ((void **) syntax_symbol);
02139     mem_free ((void **) string_symbol);
02140 
02141     /* update dependencies for the rest of the rules */
02142     while (rulez)
02143     {
02144         spec *sp = rulez->m_specs;
02145 
02146         /* iterate through all the specifiers */
02147         while (sp)
02148         {
02149             /* update dependency for identifier */
02150             if (sp->m_spec_type == st_identifier || sp->m_spec_type == st_identifier_loop)
02151             {
02152                 if (update_dependency (mapr, sp->m_string, &sp->m_rule))
02153                     return 1;
02154 
02155                 mem_free ((void **) &sp->m_string);
02156             }
02157 
02158             /* some errtexts refer to a rule */
02159             if (sp->m_errtext && sp->m_errtext->m_token_name)
02160             {
02161                 if (update_dependency (mapr, sp->m_errtext->m_token_name, &sp->m_errtext->m_token))
02162                     return 1;
02163 
02164                 mem_free ((void **) &sp->m_errtext->m_token_name);
02165             }
02166 
02167             /* update dependency for condition */
02168             if (sp->m_cond)
02169             {
02170                 int i;
02171                 for (i = 0; i < 2; i++)
02172                     if (sp->m_cond->m_operands[i].m_type == cot_regbyte)
02173                     {
02174                         sp->m_cond->m_operands[i].m_regbyte = map_byte_locate (&regbytes,
02175                             sp->m_cond->m_operands[i].m_regname);
02176 
02177                         if (sp->m_cond->m_operands[i].m_regbyte == NULL)
02178                             return 1;
02179 
02180                         mem_free ((void **) &sp->m_cond->m_operands[i].m_regname);
02181                     }
02182             }
02183 
02184             /* update dependency for all .load instructions */
02185             if (sp->m_emits)
02186             {
02187                 emit *em = sp->m_emits;
02188                 while (em != NULL)
02189                 {
02190                     if (em->m_emit_dest == ed_regbyte)
02191                     {
02192                         em->m_regbyte = map_byte_locate (&regbytes, em->m_regname);
02193 
02194                         if (em->m_regbyte == NULL)
02195                             return 1;
02196 
02197                         mem_free ((void **) &em->m_regname);
02198                     }
02199 
02200                     em = em->m_next;
02201                 }
02202             }
02203 
02204             sp = sp->next;
02205         }
02206 
02207         rulez = rulez->next;
02208     }
02209 
02210     /* check for unreferenced symbols */
02211     rulez = di->m_rulez;
02212     while (rulez != NULL)
02213     {
02214         if (!rulez->m_referenced)
02215         {
02216             map_rule *ma = mapr;
02217             while (ma)
02218             {
02219                 if (ma->data == rulez)
02220                 {
02221                     set_last_error (UNREFERENCED_IDENTIFIER, str_duplicate (ma->key), -1);
02222                     return 1;
02223                 }
02224                 ma = ma->next;
02225             }
02226         }
02227         rulez = rulez->next;
02228     }
02229 
02230     return 0;
02231 }
02232 /* returns 1 if the condition holds (a NULL condition always holds), 0 otherwise */
02233 static int satisfies_condition (cond *co, regbyte_ctx *ctx)
02234 {
02235     byte values[2];
02236     int i;
02237 
02238     if (co == NULL)
02239         return 1;
02240 
02241     for (i = 0; i < 2; i++)
02242         switch (co->m_operands[i].m_type)
02243         {
02244         case cot_byte:
02245             values[i] = co->m_operands[i].m_byte;
02246             break;
02247         case cot_regbyte:
02248             values[i] = regbyte_ctx_extract (&ctx, co->m_operands[i].m_regbyte);
02249             break;
02250         }
02251 
02252     switch (co->m_type)
02253     {
02254     case ct_equal:
02255         return values[0] == values[1];
02256     case ct_not_equal:
02257         return values[0] != values[1];
02258     }
02259 
02260     return 0;
02261 }
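/*
    Illustrative sketch, not part of the original file: a condition compares two
    operands, each either a literal byte or a register, for equality or inequality.
    The sk_ types are hypothetical stand-ins for cond and its operands, and the
    register is read through a plain pointer instead of regbyte_ctx_extract().
*/

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char sk_byte;

typedef enum { SK_LITERAL, SK_REGISTER } sk_operand_type;
typedef enum { SK_EQUAL, SK_NOT_EQUAL } sk_cond_type;

typedef struct sk_operand_
{
    sk_operand_type type;
    sk_byte literal;        /* used when type == SK_LITERAL */
    const sk_byte *reg;     /* used when type == SK_REGISTER */
} sk_operand;

typedef struct sk_cond_
{
    sk_cond_type type;
    sk_operand operands[2];
} sk_cond;

/* returns 1 if the condition holds, 0 otherwise; a missing condition always holds */
static int sk_satisfies (const sk_cond *co)
{
    sk_byte values[2];
    int i;

    if (co == NULL)
        return 1;

    for (i = 0; i < 2; i++)
    {
        const sk_operand *op = &co->operands[i];

        assert (op->type == SK_LITERAL || op->reg != NULL);
        values[i] = op->type == SK_LITERAL ? op->literal : *op->reg;
    }

    return co->type == SK_EQUAL ? values[0] == values[1] : values[0] != values[1];
}
```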
02262 /* destroys the contexts from top down to, but not including, limit */
02263 static void free_regbyte_ctx_stack (regbyte_ctx *top, regbyte_ctx *limit)
02264 {
02265     while (top != limit)
02266     {
02267         regbyte_ctx *rbc = top->m_prev;
02268         regbyte_ctx_destroy (&top);
02269         top = rbc;
02270     }
02271 }
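/*
    Illustrative sketch, not part of the original file: unwinding a singly linked
    stack down to a caller-supplied limit, as free_regbyte_ctx_stack() does when a
    match fails. sk_ctx is a hypothetical, payload-free stand-in for regbyte_ctx.
*/

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sk_ctx_
{
    struct sk_ctx_ *prev;
} sk_ctx;

/* push a new context on top of the stack; returns the new top or NULL */
static sk_ctx *sk_ctx_push (sk_ctx *top)
{
    sk_ctx *ctx = (sk_ctx *) malloc (sizeof (sk_ctx));

    if (ctx != NULL)
        ctx->prev = top;
    return ctx;
}

/* free everything above limit; returns the number of contexts freed */
static int sk_ctx_unwind (sk_ctx *top, sk_ctx *limit)
{
    int freed = 0;

    while (top != limit)
    {
        sk_ctx *prev;

        assert (top != NULL);   /* limit must be reachable from top */
        prev = top->prev;
        free (top);
        top = prev;
        freed++;
    }

    return freed;
}
```

/*
    Passing the stack top saved on entry as the limit keeps the caller's contexts
    alive, while a NULL limit releases the whole stack.
*/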
02272 
02273 typedef enum match_result_
02274 {
02275     mr_not_matched,     /* the examined string does not match */
02276     mr_matched,         /* the examined string matches */
02277     mr_error_raised,    /* mr_not_matched + error has been raised */
02278     mr_dont_emit,       /* used by identifier loops only */
02279     mr_internal_error   /* an internal error has occurred, such as out of memory */
02280 } match_result;
02281 
02282 /*
02283  * Core matcher: matches text against rule ru from *index and appends the emitted bytes to *ba.
02284  */
02285 static match_result
02286 match (dict *di, const byte *text, int *index, rule *ru, barray **ba, int filtering_string,
02287        regbyte_ctx **rbc)
02288 {
02289     int ind = *index;
02290     match_result status = mr_not_matched;
02291     spec *sp = ru->m_specs;
02292     regbyte_ctx *ctx = *rbc;
02293 
02294     /* for every specifier in the rule */
02295     while (sp)
02296     {
02297         int i, len, save_ind = ind;
02298         barray *array = NULL;
02299 
02300         if (satisfies_condition (sp->m_cond, ctx))
02301         {
02302             switch (sp->m_spec_type)
02303             {
02304             case st_identifier:
02305                 barray_create (&array);
02306                 if (array == NULL)
02307                 {
02308                     free_regbyte_ctx_stack (ctx, *rbc);
02309                     return mr_internal_error;
02310                 }
02311 
02312                 status = match (di, text, &ind, sp->m_rule, &array, filtering_string, &ctx);
02313 
02314                 if (status == mr_internal_error)
02315                 {
02316                     free_regbyte_ctx_stack (ctx, *rbc);
02317                     barray_destroy (&array);
02318                     return mr_internal_error;
02319                 }
02320                 break;
02321             case st_string:
02322                 len = str_length (sp->m_string);
02323 
02324                 /* prefilter the stream */
02325                 if (!filtering_string && di->m_string)
02326                 {
02327                     barray *ba;
02328                     int filter_index = 0;
02329                     match_result result;
02330                     regbyte_ctx *null_ctx = NULL;
02331 
02332                     barray_create (&ba);
02333                     if (ba == NULL)
02334                     {
02335                         free_regbyte_ctx_stack (ctx, *rbc);
02336                         return mr_internal_error;
02337                     }
02338 
02339                     result = match (di, text + ind, &filter_index, di->m_string, &ba, 1, &null_ctx);
02340 
02341                     if (result == mr_internal_error)
02342                     {
02343                         free_regbyte_ctx_stack (ctx, *rbc);
02344                         barray_destroy (&ba);
02345                         return mr_internal_error;
02346                     }
02347 
02348                     if (result != mr_matched)
02349                     {
02350                         barray_destroy (&ba);
02351                         status = mr_not_matched;
02352                         break;
02353                     }
02354 
02355                     barray_destroy (&ba);
02356 
02357                     if (filter_index != len || !str_equal_n (sp->m_string, text + ind, len))
02358                     {
02359                         status = mr_not_matched;
02360                         break;
02361                     }
02362 
02363                     status = mr_matched;
02364                     ind += len;
02365                 }
02366                 else
02367                 {
02368                     status = mr_matched;
02369                     for (i = 0; status == mr_matched && i < len; i++)
02370                         if (text[ind + i] != sp->m_string[i])
02371                             status = mr_not_matched;
02372 
02373                     if (status == mr_matched)
02374                         ind += len;
02375                 }
02376                 break;
02377             case st_byte:
02378                 status = text[ind] == *sp->m_byte ? mr_matched : mr_not_matched;
02379                 if (status == mr_matched)
02380                     ind++;
02381                 break;
02382             case st_byte_range:
02383                 status = (text[ind] >= sp->m_byte[0] && text[ind] <= sp->m_byte[1]) ?
02384                     mr_matched : mr_not_matched;
02385                 if (status == mr_matched)
02386                     ind++;
02387                 break;
02388             case st_true:
02389                 status = mr_matched;
02390                 break;
02391             case st_false:
02392                 status = mr_not_matched;
02393                 break;
02394             case st_debug:
02395                 status = ru->m_oper == op_and ? mr_matched : mr_not_matched;
02396                 break;
02397             case st_identifier_loop:
02398                 barray_create (&array);
02399                 if (array == NULL)
02400                 {
02401                     free_regbyte_ctx_stack (ctx, *rbc);
02402                     return mr_internal_error;
02403                 }
02404 
02405                 status = mr_dont_emit;
02406                 for (;;)
02407                 {
02408                     match_result result;
02409 
02410                     save_ind = ind;
02411                     result = match (di, text, &ind, sp->m_rule, &array, filtering_string, &ctx);
02412 
02413                     if (result == mr_error_raised)
02414                     {
02415                         status = result;
02416                         break;
02417                     }
02418                     else if (result == mr_matched)
02419                     {
02420                         if (barray_push (ba, sp->m_emits, ind > 0 ? text[ind - 1] : 0, save_ind, &ctx) ||
02421                             barray_append (ba, &array))
02422                         {
02423                             free_regbyte_ctx_stack (ctx, *rbc);
02424                             barray_destroy (&array);
02425                             return mr_internal_error;
02426                         }
02427                         barray_destroy (&array);
02428                         barray_create (&array);
02429                         if (array == NULL)
02430                         {
02431                             free_regbyte_ctx_stack (ctx, *rbc);
02432                             return mr_internal_error;
02433                         }
02434                     }
02435                     else if (result == mr_internal_error)
02436                     {
02437                         free_regbyte_ctx_stack (ctx, *rbc);
02438                         barray_destroy (&array);
02439                         return mr_internal_error;
02440                     }
02441                     else
02442                         break;
02443                 }
02444                 break;
02445             }
02446         }
02447         else
02448         {
02449             status = mr_not_matched;
02450         }
02451 
02452         if (status == mr_error_raised)
02453         {
02454             free_regbyte_ctx_stack (ctx, *rbc);
02455             barray_destroy (&array);
02456 
02457             return mr_error_raised;
02458         }
02459 
02460         if (ru->m_oper == op_and && status != mr_matched && status != mr_dont_emit)
02461         {
02462             free_regbyte_ctx_stack (ctx, *rbc);
02463             barray_destroy (&array);
02464 
02465             if (sp->m_errtext)
02466             {
02467                 set_last_error (sp->m_errtext->m_text, error_get_token (sp->m_errtext, di, text,
02468                     ind), ind);
02469 
02470                 return mr_error_raised;
02471             }
02472 
02473             return mr_not_matched;
02474         }
02475 
02476         if (status == mr_matched)
02477         {
02478             if (sp->m_emits)
02479                 if (barray_push (ba, sp->m_emits, ind > 0 ? text[ind - 1] : 0, save_ind, &ctx))
02480                 {
02481                     free_regbyte_ctx_stack (ctx, *rbc);
02482                     barray_destroy (&array);
02483                     return mr_internal_error;
02484                 }
02485 
02486             if (array)
02487                 if (barray_append (ba, &array))
02488                 {
02489                     free_regbyte_ctx_stack (ctx, *rbc);
02490                     barray_destroy (&array);
02491                     return mr_internal_error;
02492                 }
02493         }
02494 
02495         barray_destroy (&array);
02496 
02497         /* if the rule operator is a logical or, we pick up the first matching specifier */
02498         if (ru->m_oper == op_or && (status == mr_matched || status == mr_dont_emit))
02499         {
02500             *index = ind;
02501             *rbc = ctx;
02502             return mr_matched;
02503         }
02504 
02505         sp = sp->next;
02506     }
02507 
02508     /* everything went fine - all specifiers match up */
02509     if (ru->m_oper == op_and && (status == mr_matched || status == mr_dont_emit))
02510     {
02511         *index = ind;
02512         *rbc = ctx;
02513         return mr_matched;
02514     }
02515 
02516     free_regbyte_ctx_stack (ctx, *rbc);
02517     return mr_not_matched;
02518 }
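/*
    Illustrative sketch, not part of the original file: the ".and"/".or" control flow
    of match() reduced to byte ranges and sub-rules. An ".and" rule fails on the first
    specifier that does not match, an ".or" rule succeeds on the first one that does,
    and *index only advances when the whole rule matches. The sk_ types are
    hypothetical; emits, conditions, loops and error texts are left out. The sketch
    assumes that no byte range includes '\0', so it never reads past the terminator.
*/

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_AND, SK_OR } sk_oper;
typedef enum { SK_BYTE_RANGE, SK_RULE } sk_spec_type;

struct sk_rule_;

typedef struct sk_spec_
{
    sk_spec_type type;
    unsigned char lo, hi;           /* used when type == SK_BYTE_RANGE */
    const struct sk_rule_ *rule;    /* used when type == SK_RULE */
    const struct sk_spec_ *next;
} sk_spec;

typedef struct sk_rule_
{
    sk_oper oper;
    const sk_spec *specs;
} sk_rule;

/* returns 1 and advances *index on a match, returns 0 and leaves *index alone otherwise */
static int sk_match (const sk_rule *ru, const char *text, int *index)
{
    int ind = *index;
    int matched = 0;
    const sk_spec *sp;

    assert (ru != NULL && text != NULL && index != NULL);

    for (sp = ru->specs; sp != NULL; sp = sp->next)
    {
        if (sp->type == SK_RULE)
        {
            matched = sk_match (sp->rule, text, &ind);
        }
        else
        {
            unsigned char c = (unsigned char) text[ind];

            matched = c >= sp->lo && c <= sp->hi;
            if (matched)
                ind++;
        }

        /* ".and": one failing specifier fails the rule */
        if (ru->oper == SK_AND && !matched)
            return 0;

        /* ".or": pick up the first matching specifier */
        if (ru->oper == SK_OR && matched)
        {
            *index = ind;
            return 1;
        }
    }

    if (ru->oper == SK_AND && matched)
    {
        *index = ind;
        return 1;
    }

    return 0;
}
```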
02519 /* same job as match(), but emits directly into the byte pool _BP at offset *_PP */
02520 static match_result
02521 fast_match (dict *di, const byte *text, int *index, rule *ru, int *_PP, bytepool *_BP,
02522             int filtering_string, regbyte_ctx **rbc)
02523 {
02524     int ind = *index;
02525     int _P = filtering_string ? 0 : *_PP;
02526     int _P2;
02527     match_result status = mr_not_matched;
02528     spec *sp = ru->m_specs;
02529     regbyte_ctx *ctx = *rbc;
02530 
02531     /* for every specifier in the rule */
02532     while (sp)
02533     {
02534         int i, len, save_ind = ind;
02535 
02536         _P2 = _P + (sp->m_emits ? emit_size (sp->m_emits) : 0);
02537         if (bytepool_reserve (_BP, _P2))
02538         {
02539             free_regbyte_ctx_stack (ctx, *rbc);
02540             return mr_internal_error;
02541         }
02542 
02543         if (satisfies_condition (sp->m_cond, ctx))
02544         {
02545             switch (sp->m_spec_type)
02546             {
02547             case st_identifier:
02548                 status = fast_match (di, text, &ind, sp->m_rule, &_P2, _BP, filtering_string, &ctx);
02549 
02550                 if (status == mr_internal_error)
02551                 {
02552                     free_regbyte_ctx_stack (ctx, *rbc);
02553                     return mr_internal_error;
02554                 }
02555                 break;
02556             case st_string:
02557                 len = str_length (sp->m_string);
02558 
02559                 /* prefilter the stream */
02560                 if (!filtering_string && di->m_string)
02561                 {
02562                     int filter_index = 0;
02563                     match_result result;
02564                     regbyte_ctx *null_ctx = NULL;
02565 
02566                     result = fast_match (di, text + ind, &filter_index, di->m_string, NULL, _BP, 1, &null_ctx);
02567 
02568                     if (result == mr_internal_error)
02569                     {
02570                         free_regbyte_ctx_stack (ctx, *rbc);
02571                         return mr_internal_error;
02572                     }
02573 
02574                     if (result != mr_matched)
02575                     {
02576                         status = mr_not_matched;
02577                         break;
02578                     }
02579 
02580                     if (filter_index != len || !str_equal_n (sp->m_string, text + ind, len))
02581                     {
02582                         status = mr_not_matched;
02583                         break;
02584                     }
02585 
02586                     status = mr_matched;
02587                     ind += len;
02588                 }
02589                 else
02590                 {
02591                     status = mr_matched;
02592                     for (i = 0; status == mr_matched && i < len; i++)
02593                         if (text[ind + i] != sp->m_string[i])
02594                             status = mr_not_matched;
02595 
02596                     if (status == mr_matched)
02597                         ind += len;
02598                 }
02599                 break;
02600             case st_byte:
02601                 status = text[ind] == *sp->m_byte ? mr_matched : mr_not_matched;
02602                 if (status == mr_matched)
02603                     ind++;
02604                 break;
02605             case st_byte_range:
02606                 status = (text[ind] >= sp->m_byte[0] && text[ind] <= sp->m_byte[1]) ?
02607                     mr_matched : mr_not_matched;
02608                 if (status == mr_matched)
02609                     ind++;
02610                 break;
02611             case st_true:
02612                 status = mr_matched;
02613                 break;
02614             case st_false:
02615                 status = mr_not_matched;
02616                 break;
02617             case st_debug:
02618                 status = ru->m_oper == op_and ? mr_matched : mr_not_matched;
02619                 break;
02620             case st_identifier_loop:
02621                 status = mr_dont_emit;
02622                 for (;;)
02623                 {
02624                     match_result result;
02625 
02626                     save_ind = ind;
02627                     result = fast_match (di, text, &ind, sp->m_rule, &_P2, _BP, filtering_string, &ctx);
02628 
02629                     if (result == mr_error_raised)
02630                     {
02631                         status = result;
02632                         break;
02633                     }
02634                     else if (result == mr_matched)
02635                     {
02636                         if (!filtering_string)
02637                         {
02638                             if (sp->m_emits != NULL)
02639                             {
02640                                 if (emit_push (sp->m_emits, _BP->_F + _P, ind > 0 ? text[ind - 1] : 0, save_ind, &ctx))
02641                                 {
02642                                     free_regbyte_ctx_stack (ctx, *rbc);
02643                                     return mr_internal_error;
02644                                 }
02645                             }
02646 
02647                             _P = _P2;
02648                             _P2 += sp->m_emits ? emit_size (sp->m_emits) : 0;
02649                             if (bytepool_reserve (_BP, _P2))
02650                             {
02651                                 free_regbyte_ctx_stack (ctx, *rbc);
02652                                 return mr_internal_error;
02653                             }
02654                         }
02655                     }
02656                     else if (result == mr_internal_error)
02657                     {
02658                         free_regbyte_ctx_stack (ctx, *rbc);
02659                         return mr_internal_error;
02660                     }
02661                     else
02662                         break;
02663                 }
02664                 break;
02665             }
02666         }
02667         else
02668         {
02669             status = mr_not_matched;
02670         }
02671 
02672         if (status == mr_error_raised)
02673         {
02674             free_regbyte_ctx_stack (ctx, *rbc);
02675 
02676             return mr_error_raised;
02677         }
02678 
02679         if (ru->m_oper == op_and && status != mr_matched && status != mr_dont_emit)
02680         {
02681             free_regbyte_ctx_stack (ctx, *rbc);
02682 
02683             if (sp->m_errtext)
02684             {
02685                 set_last_error (sp->m_errtext->m_text, error_get_token (sp->m_errtext, di, text,
02686                     ind), ind);
02687 
02688                 return mr_error_raised;
02689             }
02690 
02691             return mr_not_matched;
02692         }
02693 
02694         if (status == mr_matched)
02695         {
02696             if (sp->m_emits != NULL) {
02697                 const byte ch = (ind <= 0) ? 0 : text[ind - 1];
02698                 if (emit_push (sp->m_emits, _BP->_F + _P, ch, save_ind, &ctx))
02699                 {
02700                     free_regbyte_ctx_stack (ctx, *rbc);
02701                     return mr_internal_error;
02702                 }
02703 
02704             }
02705             _P = _P2;
02706         }
02707 
02708         /* if the rule operator is a logical or, we pick up the first matching specifier */
02709         if (ru->m_oper == op_or && (status == mr_matched || status == mr_dont_emit))
02710         {
02711             *index = ind;
02712             *rbc = ctx;
02713             if (!filtering_string)
02714                 *_PP = _P;
02715             return mr_matched;
02716         }
02717 
02718         sp = sp->next;
02719     }
02720 
02721     /* everything went fine - all specifiers match up */
02722     if (ru->m_oper == op_and && (status == mr_matched || status == mr_dont_emit))
02723     {
02724         *index = ind;
02725         *rbc = ctx;
02726         if (!filtering_string)
02727             *_PP = _P;
02728         return mr_matched;
02729     }
02730 
02731     free_regbyte_ctx_stack (ctx, *rbc);
02732     return mr_not_matched;
02733 }
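/*
    Illustrative sketch, not part of the original file: the reserve-then-commit idea
    behind the _P/_P2 offsets in fast_match(). Output is written at a tentative offset
    and the offset only advances on a match, so bytes from a failed alternative are
    overwritten by the next attempt. sk_pool and its helpers are hypothetical and only
    approximate bytepool; the growth policy (doubling from 16 bytes) is an assumption.
*/

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sk_pool_
{
    unsigned char *data;    /* output buffer */
    size_t size;            /* allocated bytes */
} sk_pool;

/* make at least `need` bytes addressable; returns 0 on success, 1 otherwise */
static int sk_pool_reserve (sk_pool *bp, size_t need)
{
    size_t size;
    unsigned char *data;

    assert (bp != NULL);

    if (need <= bp->size)
        return 0;

    size = bp->size ? bp->size : 16;
    while (size < need)
        size *= 2;

    data = (unsigned char *) realloc (bp->data, size);
    if (data == NULL)
        return 1;

    bp->data = data;
    bp->size = size;
    return 0;
}

/* write `value` at *pos; the offset is committed only when `matched` is non-zero */
static int sk_emit_if_matched (sk_pool *bp, size_t *pos, unsigned char value, int matched)
{
    if (sk_pool_reserve (bp, *pos + 1))
        return 1;

    bp->data[*pos] = value;     /* tentative write */
    if (matched)
        (*pos)++;               /* commit */
    return 0;
}
```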
02734 /* returns a newly allocated copy of the text at ind that matches er->m_token, or NULL */
02735 static byte *
02736 error_get_token (error *er, dict *di, const byte *text, int ind)
02737 {
02738     byte *str = NULL;
02739 
02740     if (er->m_token)
02741     {
02742         barray *ba;
02743         int filter_index = 0;
02744         regbyte_ctx *ctx = NULL;
02745 
02746         barray_create (&ba);
02747         if (ba != NULL)
02748         {
02749             if (match (di, text + ind, &filter_index, er->m_token, &ba, 0, &ctx) == mr_matched &&
02750                 filter_index)
02751             {
02752                 str = (byte *) mem_alloc (filter_index + 1);
02753                 if (str != NULL)
02754                 {
02755                     str_copy_n (str, text + ind, filter_index);
02756                     str[filter_index] = '\0';
02757                 }
02758             }
02759             barray_destroy (&ba);
02760         }
02761     }
02762 
02763     return str;
02764 }
02765 
02766 typedef struct grammar_load_state_
02767 {
02768     dict *di;
02769     byte *syntax_symbol;
02770     byte *string_symbol;
02771     map_str *maps;
02772     map_byte *mapb;
02773     map_rule *mapr;
02774 } grammar_load_state;
02775 
02776 static void grammar_load_state_create (grammar_load_state **gr)
02777 {
02778     *gr = (grammar_load_state *) mem_alloc (sizeof (grammar_load_state));
02779     if (*gr)
02780     {
02781         (**gr).di = NULL;
02782         (**gr).syntax_symbol = NULL;
02783         (**gr).string_symbol = NULL;
02784         (**gr).maps = NULL;
02785         (**gr).mapb = NULL;
02786         (**gr).mapr = NULL;
02787     }
02788 }
02789 
02790 static void grammar_load_state_destroy (grammar_load_state **gr)
02791 {
02792     if (*gr)
02793     {
02794         dict_destroy (&(**gr).di);
02795         mem_free ((void **) &(**gr).syntax_symbol);
02796         mem_free ((void **) &(**gr).string_symbol);
02797         map_str_destroy (&(**gr).maps);
02798         map_byte_destroy (&(**gr).mapb);
02799         map_rule_destroy (&(**gr).mapr);
02800         mem_free ((void **) gr);
02801     }
02802 }
02803 
02804 
02805 static void error_msg(int line, const char *msg)
02806 {
02807     fprintf(stderr, "Error in grammar_load_from_text() at line %d: %s\n", line, msg);
02808 }
02809 
02810 
02811 /*
02812     the API
02813 */
02814 grammar grammar_load_from_text (const byte *text)
02815 {
02816     grammar_load_state *g = NULL;
02817     grammar id = 0;
02818 
02819     clear_last_error ();
02820 
02821     grammar_load_state_create (&g);
02822     if (g == NULL) {
02823         error_msg(__LINE__, "grammar_load_state_create() failed");
02824         return 0;
02825     }
02826 
02827     dict_create (&g->di);
02828     if (g->di == NULL)
02829     {
02830         grammar_load_state_destroy (&g);
02831         error_msg(__LINE__, "dict_create() failed");
02832         return 0;
02833     }
02834 
02835     eat_spaces (&text);
02836 
02837     /* skip ".syntax" keyword */
02838     text += 7;
02839     eat_spaces (&text);
02840 
02841     /* retrieve root symbol */
02842     if (get_identifier (&text, &g->syntax_symbol))
02843     {
02844         grammar_load_state_destroy (&g);
02845         error_msg(__LINE__, "get_identifier() failed for the root symbol");
02846         return 0;
02847     }
02848     eat_spaces (&text);
02849 
02850     /* skip semicolon */
02851     text++;
02852     eat_spaces (&text);
02853 
02854     while (*text)
02855     {
02856         byte *symbol = NULL;
02857         int is_dot = *text == '.';
02858 
02859         if (is_dot)
02860             text++;
02861 
02862         if (get_identifier (&text, &symbol))
02863         {
02864             grammar_load_state_destroy (&g);
02865             error_msg(__LINE__, "get_identifier() failed");
02866             return 0;
02867         }
02868         eat_spaces (&text);
02869 
02870         /* .emtcode */
02871         if (is_dot && str_equal (symbol, (byte *) "emtcode"))
02872         {
02873             map_byte *ma = NULL;
02874 
02875             mem_free ((void **) (void *) &symbol);
02876 
02877             if (get_emtcode (&text, &ma))
02878             {
02879                 grammar_load_state_destroy (&g);
02880                 error_msg(__LINE__, "get_emtcode() failed");
02881                 return 0;
02882             }
02883 
02884             map_byte_append (&g->mapb, ma);
02885         }
02886         /* .regbyte */
02887         else if (is_dot && str_equal (symbol, (byte *) "regbyte"))
02888         {
02889             map_byte *ma = NULL;
02890 
02891             mem_free ((void **) (void *) &symbol);
02892 
02893             if (get_regbyte (&text, &ma))
02894             {
02895                 grammar_load_state_destroy (&g);
02896                 error_msg(__LINE__, "get_regbyte() failed");
02897                 return 0;
02898             }
02899 
02900             map_byte_append (&g->di->m_regbytes, ma);
02901         }
02902         /* .errtext */
02903         else if (is_dot && str_equal (symbol, (byte *) "errtext"))
02904         {
02905             map_str *ma = NULL;
02906 
02907             mem_free ((void **) (void *) &symbol);
02908 
02909             if (get_errtext (&text, &ma))
02910             {
02911                 grammar_load_state_destroy (&g);
02912                 error_msg(__LINE__, "get_errtext() failed");
02913                 return 0;
02914             }
02915 
02916             map_str_append (&g->maps, ma);
02917         }
02918         /* .string */
02919         else if (is_dot && str_equal (symbol, (byte *) "string"))
02920         {
02921             mem_free ((void **) (void *) &symbol);
02922 
02923             if (g->di->m_string != NULL)
02924             {
02925                 grammar_load_state_destroy (&g);
02926                 error_msg(__LINE__, ".string symbol is already set");
02927                 return 0;
02928             }
02929 
02930             if (get_identifier (&text, &g->string_symbol))
02931             {
02932                 grammar_load_state_destroy (&g);
02933                 error_msg(__LINE__, "get_identifier() failed for the .string symbol");
02934                 return 0;
02935             }
02936 
02937             /* skip semicolon */
02938             eat_spaces (&text);
02939             text++;
02940             eat_spaces (&text);
02941         }
02942         else
02943         {
02944             rule *ru = NULL;
02945             map_rule *ma = NULL;
02946 
02947             if (get_rule (&text, &ru, g->maps, g->mapb))
02948             {
02949                 grammar_load_state_destroy (&g);
02950                 error_msg(__LINE__, "get_rule() failed");
02951                 return 0;
02952             }
02953 
02954             rule_append (&g->di->m_rulez, ru);
02955 
02956             /* if a rule consists of only one specifier, give it an ".and" operator */
02957             if (ru->m_oper == op_none)
02958                 ru->m_oper = op_and;
02959 
02960             map_rule_create (&ma);
02961             if (ma == NULL)
02962             {
02963                 grammar_load_state_destroy (&g);
02964                 error_msg(__LINE__, "map_rule_create() failed");
02965                 return 0;
02966             }
02967 
02968             ma->key = symbol;
02969             ma->data = ru;
02970             map_rule_append (&g->mapr, ma);
02971         }
02972     }
02973 
02974     if (update_dependencies (g->di, g->mapr, &g->syntax_symbol, &g->string_symbol,
02975         g->di->m_regbytes))
02976     {
02977         grammar_load_state_destroy (&g);
02978         error_msg(__LINE__, "update_dependencies() failed");
02979         return 0;
02980     }
02981 
02982     dict_append (&g_dicts, g->di);
02983     id = g->di->m_id;
02984     g->di = NULL;
02985 
02986     grammar_load_state_destroy (&g);
02987 
02988     return id;
02989 }

/*
    sets the register <name> of the grammar <id> to <value>;
    returns 1 on success, 0 if the grammar id or the register name is not found
*/
int grammar_set_reg8 (grammar id, const byte *name, byte value)
{
    dict *di = NULL;
    map_byte *reg = NULL;

    clear_last_error ();

    dict_find (&g_dicts, id, &di);
    if (di == NULL)
    {
        set_last_error (INVALID_GRAMMAR_ID, NULL, -1);
        return 0;
    }

    reg = map_byte_locate (&di->m_regbytes, name);
    if (reg == NULL)
    {
        set_last_error (INVALID_REGISTER_NAME, str_duplicate (name), -1);
        return 0;
    }

    reg->data = value;
    return 1;
}

/*
    internal checking function used by both grammar_check and grammar_fast_check functions;
    on success returns 1 and stores the production and its size in *prod and *size,
    otherwise returns 0
*/
static int _grammar_check (grammar id, const byte *text, byte **prod, unsigned int *size,
    unsigned int estimate_prod_size, int use_fast_path)
{
    dict *di = NULL;
    int index = 0;

    clear_last_error ();

    dict_find (&g_dicts, id, &di);
    if (di == NULL)
    {
        set_last_error (INVALID_GRAMMAR_ID, NULL, -1);
        return 0;
    }

    *prod = NULL;
    *size = 0;

    if (use_fast_path)
    {
        regbyte_ctx *rbc = NULL;
        bytepool *bp = NULL;
        int _P = 0;

        bytepool_create (&bp, estimate_prod_size);
        if (bp == NULL)
            return 0;

        if (fast_match (di, text, &index, di->m_syntax, &_P, bp, 0, &rbc) != mr_matched)
        {
            bytepool_destroy (&bp);
            free_regbyte_ctx_stack (rbc, NULL);
            return 0;
        }

        free_regbyte_ctx_stack (rbc, NULL);

        /* pass the pool's buffer to the caller and detach it from the pool before destroying it */
        *prod = bp->_F;
        *size = _P;
        bp->_F = NULL;
        bytepool_destroy (&bp);
    }
    else
    {
        regbyte_ctx *rbc = NULL;
        barray *ba = NULL;

        barray_create (&ba);
        if (ba == NULL)
            return 0;

        if (match (di, text, &index, di->m_syntax, &ba, 0, &rbc) != mr_matched)
        {
            barray_destroy (&ba);
            free_regbyte_ctx_stack (rbc, NULL);
            return 0;
        }

        free_regbyte_ctx_stack (rbc, NULL);

        /* copy the matched production into a newly allocated buffer for the caller */
        *prod = (byte *) mem_alloc (ba->len * sizeof (byte));
        if (*prod == NULL)
        {
            barray_destroy (&ba);
            return 0;
        }

        mem_copy (*prod, ba->data, ba->len * sizeof (byte));
        *size = ba->len;
        barray_destroy (&ba);
    }

    return 1;
}

/* checks <text> against the grammar <id> using match() */
int grammar_check (grammar id, const byte *text, byte **prod, unsigned int *size)
{
    return _grammar_check (id, text, prod, size, 0, 0);
}

/* checks <text> against the grammar <id> using fast_match() and a pool sized by <estimate_prod_size> */
int grammar_fast_check (grammar id, const byte *text, byte **prod, unsigned int *size,
    unsigned int estimate_prod_size)
{
    return _grammar_check (id, text, prod, size, estimate_prod_size, 1);
}

/*
    removes the grammar <id> from the global list and destroys it;
    returns 1 on success, 0 if no grammar with this id exists
*/
int grammar_destroy (grammar id)
{
    /* di points at the link that refers to the current entry, not at the entry itself */
    dict **di = &g_dicts;

    clear_last_error ();

    while (*di != NULL)
    {
        if ((**di).m_id == id)
        {
            dict *tmp = *di;
            *di = (**di).next;
            dict_destroy (&tmp);
            return 1;
        }

        di = &(**di).next;
    }

    set_last_error (INVALID_GRAMMAR_ID, NULL, -1);
    return 0;
}
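
/*
    ILLUSTRATION (not part of grammar.c)
    ------------------------------------
    grammar_destroy() walks the list through a pointer to the current link (dict **)
    rather than a pointer to the current entry, so unlinking the head and unlinking
    an interior entry are the same assignment. Below is a minimal standalone sketch
    of that idiom. The node type and the list_push/list_remove helpers are
    hypothetical names made up for this sketch; they are not the dict functions
    used in this file.
*/
```c
#include <stdlib.h>

/* Hypothetical list node for illustration only. */
typedef struct node_
{
    int id;
    struct node_ *next;
} node;

/* Push a new node at the head of the list; returns 0 if allocation fails. */
static int list_push (node **head, int id)
{
    node *n = (node *) malloc (sizeof (node));

    if (n == NULL)
        return 0;

    n->id = id;
    n->next = *head;
    *head = n;
    return 1;
}

/* Unlink and free the first node with the given id; returns 1 if one was found. */
static int list_remove (node **link, int id)
{
    while (*link != NULL)
    {
        if ((*link)->id == id)
        {
            node *tmp = *link;

            /* the same assignment handles head and interior nodes */
            *link = tmp->next;
            free (tmp);
            return 1;
        }

        link = &(*link)->next;
    }

    return 0;
}
```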

/*
    appends <x> to <text>, which holds <size> bytes including the terminating zero;
    the first character that does not fit overwrites up to three trailing characters with dots,
    after which all further characters are ignored
*/
static void append_character (const char x, byte *text, int *dots_made, int *len, int size)
{
    if (*dots_made == 0)
    {
        if (*len < size - 1)
        {
            text[(*len)++] = x;
            text[*len] = '\0';
        }
        else
        {
            int i;
            for (i = 0; i < 3; i++)
                if (--(*len) >= 0)
                    text[*len] = '.';
            *dots_made = 1;
        }
    }
}

/*
    copies the last error message into <text> (<size> bytes, must be at least 1) and
    stores the error position in <pos>
*/
void grammar_get_last_error (byte *text, unsigned int size, int *pos)
{
    int len = 0, dots_made = 0;
    const byte *p = error_message;

    *text = '\0';

    if (p)
    {
        while (*p)
        {
            /* each '$' in the message is replaced with the error parameter */
            if (*p == '$')
            {
                const byte *r = error_param;

                while (*r)
                {
                    append_character (*r++, text, &dots_made, &len, (int) size);
                }

                p++;
            }
            else
            {
                append_character (*p++, text, &dots_made, &len, (int) size);
            }
        }
    }

    *pos = error_position;
}
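
/*
    ILLUSTRATION (not part of grammar.c)
    ------------------------------------
    grammar_get_last_error() expands '$' in the message and relies on
    append_character() to end an over-long message in dots. The standalone sketch
    below reproduces that behaviour with plain char buffers so it can be tried in
    isolation. sketch_append and sketch_format are hypothetical names made up for
    this sketch; the size and NULL guards are additions of the sketch and are not
    taken from the functions in this file.
*/
```c
#include <stddef.h>

/* Append c to out, or, on the first character that does not fit, overwrite up to
   three trailing characters with dots and ignore everything after that. */
static void sketch_append (char c, char *out, int *dots_made, int *len, int size)
{
    if (*dots_made)
        return;

    if (*len < size - 1)
    {
        out[(*len)++] = c;
        out[*len] = '\0';
    }
    else
    {
        int i;
        for (i = 0; i < 3; i++)
            if (--(*len) >= 0)
                out[*len] = '.';
        *dots_made = 1;
    }
}

/* Copy msg into out (capacity size, including the terminator), replacing each '$'
   with param. A NULL param is treated as an empty string (a choice of this sketch). */
static void sketch_format (const char *msg, const char *param, char *out, int size)
{
    int len = 0, dots_made = 0;

    if (out == NULL || size <= 0)
        return;
    out[0] = '\0';

    if (msg == NULL)
        return;
    if (param == NULL)
        param = "";

    for (; *msg != '\0'; msg++)
    {
        if (*msg == '$')
        {
            const char *r;
            for (r = param; *r != '\0'; r++)
                sketch_append (*r, out, &dots_made, &len, size);
        }
        else
        {
            sketch_append (*msg, out, &dots_made, &len, size);
        }
    }
}
```
/* for example, sketch_format ("abcdefghij", NULL, buf, 8) leaves "abcd..." in buf */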

