grammar.c
/*
 * Mesa 3-D graphics library
 * Version: 6.6
 *
 * Copyright (C) 1999-2006 Brian Paul All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
 * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

#ifndef GRAMMAR_PORT_BUILD
#error Do not build this file directly, build your grammar_XXX.c instead, which includes this file
#endif

/*
*/

/*
    INTRODUCTION
    ------------

    The task is to check the syntax of an input string. The input string is a stream of ASCII
    characters terminated with a null character ('\0'). Checking it directly in C is difficult
    and hard to implement without bugs. The code is also hard to maintain and change when the
    syntax changes.

    This is because of the high redundancy of the C code. Large blocks of code are duplicated
    with only small changes.
    Even the use of macros does not solve the problem because macros cannot
    erase the complexity of the problem.

    The solution is to create a new language tailored to our task. Once we describe a particular
    syntax, we are done. We can then focus on the code that implements the language. Its size
    and complexity are relatively small compared with code that checks the syntax directly.

    First, we must implement our new language. Here, the language is implemented in C, but it
    could also be implemented in any other language. The code is listed below. We must take
    good care that it is bug free. This is simple because the code is simple and clean.

    Next, we must describe the syntax of our new language in itself. Once the description is
    created and manually checked for correctness, we can use it to check other scripts.

    Note that the loading code of our new language does not have to check the syntax. This is
    because we assume that the script describing itself is correct, and other scripts can be
    syntactically checked by that script. The loading code must only do semantic checking,
    which reduces to simply resolving references.

    THE LANGUAGE
    ------------

    Here I will describe the syntax of the new language (further called "Synek"). It is mainly a
    sequence of declarations terminated by a semicolon. A declaration consists of a symbol,
    which is an identifier, and its definition. A definition is in turn a sequence of specifiers
    connected with the ".and" or ".or" operator. These operators cannot be mixed together in one
    definition. A specifier can be a symbol, string, character, character range or a special
    keyword ".true" or ".false".

    At the very beginning of the script there is a declaration of a root symbol, in the form:
        .syntax <root_symbol>;
    The <root_symbol> must be one of the symbols in the declaration sequence. The syntax is correct
    if the root symbol evaluates to true. A symbol evaluates to true if the definition associated
    with the symbol evaluates to true. Definition evaluation depends on the operator used to
    connect specifiers in the definition. If the ".and" operator is used, the definition evaluates
    to true if and only if all the specifiers evaluate to true. If the ".or" operator is used, the
    definition evaluates to true if any of the specifiers evaluates to true. If the definition
    contains only one specifier, it is evaluated as if it was connected with the ".true" keyword
    by the ".and" operator.

    If a specifier is the ".true" keyword, it always evaluates to true.

    If a specifier is the ".false" keyword, it always evaluates to false. A specifier evaluates to
    false when it does not evaluate to true.

    A character range specifier is in the form:
        '<first_character>' - '<second_character>'
    If a specifier is a character range, it evaluates to true if the character in the stream is
    greater than or equal to <first_character> and less than or equal to <second_character>. In
    that situation the stream pointer is advanced to point to the next character in the stream.
    All C-style escape sequences are supported although trigraph sequences are not. The
    comparisons are performed on 8-bit unsigned integers.

    A character specifier is in the form:
        '<single_character>'
    It evaluates to true if the following character range specifier evaluates to true:
        '<single_character>' - '<single_character>'

    A string specifier is in the form:
        "<string>"
    Let N be the number of characters in <string>. Let <string>[i] designate the i-th character in
    <string>.
    Then the string specifier evaluates to true if and only if for i in the range [0, N)
    the following character specifier evaluates to true:
        '<string>[i]'
    If <string>[i] is a quotation mark, '<string>[i]' is replaced with '\<string>[i]'.

    A symbol specifier can optionally be preceded by the ".loop" keyword, in the form:
        .loop <symbol>                          (1)
    where <symbol> is defined as follows:
        <symbol> <definition>;                  (2)
    Construction (1) is replaced by the following code:
        <symbol$1>
    and declaration (2) is replaced by the following:
        <symbol$1> <symbol$2> .or .true;
        <symbol$2> <symbol> .and <symbol$1>;
        <symbol> <definition>;

    Synek also supports a register mechanism. The user can, in the SYN file, declare a number of
    registers that can be accessed in the syn body. Each register has a name and a default value.
    A register is one byte wide. The C code can change the default value by calling
    grammar_set_reg8() with the grammar id, the register name and a new value. As we know, each
    rule is a sequence of specifiers joined with the .and or .or operator. Each specifier can now
    be prefixed with a condition expression of the form ".if (<reg_name> <operator> <hex_literal>)"
    where <operator> can be == or !=. If the condition evaluates to false, the specifier
    evaluates to .false. Otherwise it evaluates to the specifier.

    ESCAPE SEQUENCES
    ----------------

    Synek supports all escape sequences in character specifiers. The mapping table is listed below.
    All occurrences of the characters in the first column are replaced with the corresponding
    character in the second column.

        Escape sequence      Represents
    ------------------------------------------------------------------------------------------------
        \a                   Bell (alert)
        \b                   Backspace
        \f                   Formfeed
        \n                   New line
        \r                   Carriage return
        \t                   Horizontal tab
        \v                   Vertical tab
        \'                   Single quotation mark
        \"                   Double quotation mark
        \\                   Backslash
        \?                   Literal question mark
        \ooo                 ASCII character in octal notation
        \xhhh                ASCII character in hexadecimal notation
    ------------------------------------------------------------------------------------------------

    RAISING ERRORS
    --------------

    Any specifier can be followed by a special construction that is executed when the specifier
    evaluates to false. The construction is in the form:
        .error <ERROR_TEXT>
    <ERROR_TEXT> is an identifier declared earlier by an error text declaration. The declaration is
    in the form:
        .errtext <ERROR_TEXT> "<error_desc>"
    When a specifier evaluates to false and this construction is present, parsing is stopped
    immediately and <error_desc> is returned as the result of parsing. The error position is also
    returned; it is an offset from the beginning of the stream to the last character that
    was valid so far. Example:

        (**** syntax script ****)

        .syntax program;
        .errtext MISSING_SEMICOLON      "missing ';'"
        program         declaration .and .loop space .and ';' .error MISSING_SEMICOLON .and
                        .loop space .and '\0';
        declaration     "declare" .and .loop space .and identifier;
        space           ' ';

        (**** sample code ****)

        declare foo ,

    In the example above, checking the sample code will result in the error message "missing ';'"
    and error position 12. The sample code is not correct. Note the presence of the '\0' specifier
    to ensure that there is no code after the semicolon - only spaces.
    <error_desc> can optionally contain an identifier surrounded by dollar signs $. In such a case,
    the identifier and dollar signs are replaced by a string retrieved by invoking the symbol with
    the identifier name. The starting position is the error position. The length of the resulting
    string is the position after invoking the symbol.

    PRODUCTION
    ----------

    Synek not only checks the syntax but can also produce (emit) bytes associated with specifiers
    that evaluate to true. That is, every specifier and optional error construction can be followed
    by a number of emit constructions that are in the form:
        .emit <parameter>
    <parameter> can be a HEX number, an identifier, a star * or a dollar $. A HEX number is preceded
    by 0x or 0X. If <parameter> is an identifier, it must be declared earlier by an emit code
    declaration in the form:
        .emtcode <identifier> <hex_number>

    When a given specifier evaluates to true, all emits associated with the specifier are output
    in the order they were declared. A star means that the last-read character should be output
    instead of a constant value. Example:

        (**** syntax script ****)

        .syntax foobar;
        .emtcode WORD_FOO       0x01
        .emtcode WORD_BAR       0x02
        foobar      FOO .emit WORD_FOO .or BAR .emit WORD_BAR .or .true .emit 0x00;
        FOO         "foo" .and SPACE;
        BAR         "bar" .and SPACE;
        SPACE       ' ' .or '\0';

        (**** sample text 1 ****)

        foo

        (**** sample text 2 ****)

        foobar

    For both samples the result will be a one-element array. For the first sample text it will be
    value 1, for the second - 0. Note that every text will be accepted because of the presence of
    .true as an alternative.
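
The foobar example above can be hand-translated into C to see where the emitted byte comes
from. This is only a sketch of that one script, not how grammar.c evaluates rules: the names
match_space, match_word and check_foobar are hypothetical and do not exist in this file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// SPACE ' ' .or '\0';
static int match_space (const char *s)
{
    return *s == ' ' || *s == '\0';
}

// Hand-written equivalent of:  <word> .and SPACE
static int match_word (const char *text, const char *word)
{
    size_t n = strlen (word);

    // strncmp stops at the terminating null, so a short text is safe here
    return strncmp (text, word, n) == 0 && match_space (text + n);
}

// foobar FOO .emit WORD_FOO .or BAR .emit WORD_BAR .or .true .emit 0x00;
// Returns the single byte that the script would emit for the given text.
static unsigned char check_foobar (const char *text)
{
    assert (text != NULL);

    if (match_word (text, "foo"))
        return 0x01;    // WORD_FOO
    if (match_word (text, "bar"))
        return 0x02;    // WORD_BAR
    return 0x00;        // the .true alternative always matches
}
```

Calling check_foobar ("foo") yields 1, while check_foobar ("foobar") falls through to the
.true alternative and yields 0, because "foo" is followed by 'b' rather than a space or the
terminating null.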

    Another example:

        (**** syntax script ****)

        .syntax declaration;
        .emtcode VARIABLE       0x01
        declaration     "declare" .and .loop space .and
                        identifier .emit VARIABLE .and          (1)
                        .true .emit 0x00 .and                   (2)
                        .loop space .and ';';
        space           ' ' .or '\t';
        identifier      .loop id_char .emit *;                  (3)
        id_char         'a'-'z' .or 'A'-'Z' .or '_';

        (**** sample code ****)

        declare    fubar;

    In specifier (1) the symbol <identifier> is followed by .emit VARIABLE. If it evaluates to
    true, the VARIABLE constant and then the production of the symbol are output. Specifier (2) is
    used to terminate the string with a null to signal where the string ends. Specifier (3) outputs
    all characters that make up the declared identifier. The result of the sample code will be the
    following array:
        { 1, 'f', 'u', 'b', 'a', 'r', 0 }

    If .emit is followed by a dollar $, it means that the current position should be output. The
    current position is a 32-bit unsigned integer distance from the very beginning of the parsed
    string to the first character consumed by the specifier associated with the .emit instruction.
    The current position is stored in the output buffer in little-endian convention (the lowest
    byte comes first).
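
The little-endian layout used by ".emit $" can be sketched as a pair of helpers. The names
put_position_le and get_position_le are hypothetical and only illustrate the byte order; a
consumer of the output buffer would need something like the second one to read a position back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Stores a 32-bit position as four bytes, lowest byte first.
static void put_position_le (uint8_t *out, uint32_t pos)
{
    assert (out != NULL);

    out[0] = (uint8_t) (pos);
    out[1] = (uint8_t) (pos >> 8);
    out[2] = (uint8_t) (pos >> 16);
    out[3] = (uint8_t) (pos >> 24);
}

// Rebuilds the 32-bit position from four little-endian bytes.
static uint32_t get_position_le (const uint8_t *in)
{
    assert (in != NULL);

    return (uint32_t) in[0]
        | ((uint32_t) in[1] << 8)
        | ((uint32_t) in[2] << 16)
        | ((uint32_t) in[3] << 24);
}
```

Shifting and masking like this gives the same byte order on any host, which is why no
byte-swapping or pointer casting is needed.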
*/

#include <stdio.h>

static void mem_free (void **);

/*
    internal error messages
*/
static const byte *OUT_OF_MEMORY = (byte *) "internal error 1001: out of physical memory";
static const byte *UNRESOLVED_REFERENCE = (byte *) "internal error 1002: unresolved reference '$'";
static const byte *INVALID_GRAMMAR_ID = (byte *) "internal error 1003: invalid grammar object";
static const byte *INVALID_REGISTER_NAME = (byte *) "internal error 1004: invalid register name: '$'";
/*static const byte *DUPLICATE_IDENTIFIER = (byte *) "internal error 1005: identifier '$' already defined";*/
static const byte *UNREFERENCED_IDENTIFIER = (byte *) "internal error 1006: unreferenced identifier '$'";

static const byte *error_message = NULL;    /* points to one of the error messages above */
static byte *error_param = NULL;            /* this is inserted into error_message in place of $ */
static int error_position = -1;

static byte *unknown = (byte *) "???";

static void clear_last_error (void)
{
    /* reset error message */
    error_message = NULL;

    /* free error parameter - if error_param is a "???" don't free it - it's static */
    if (error_param != unknown)
        mem_free ((void **) (void *) &error_param);
    else
        error_param = NULL;

    /* reset error position */
    error_position = -1;
}

static void set_last_error (const byte *msg, byte *param, int pos)
{
    /* error message can be set only once */
    if (error_message != NULL)
    {
        mem_free ((void **) (void *) &param);
        return;
    }

    error_message = msg;

    /* if param is NULL, set error_param to unknown ("???") */
    /* note: do not try to strdup the "???" - it may be that we are here because of */
    /* out of memory error so strdup can fail */
    if (param != NULL)
        error_param = param;
    else
        error_param = unknown;

    error_position = pos;
}

/*
    memory management routines
*/
static void *mem_alloc (size_t size)
{
    void *ptr = grammar_alloc_malloc (size);
    if (ptr == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return ptr;
}

static void *mem_copy (void *dst, const void *src, size_t size)
{
    return grammar_memory_copy (dst, src, size);
}

static void mem_free (void **ptr)
{
    grammar_alloc_free (*ptr);
    *ptr = NULL;
}

static void *mem_realloc (void *ptr, size_t old_size, size_t new_size)
{
    void *ptr2 = grammar_alloc_realloc (ptr, old_size, new_size);
    if (ptr2 == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return ptr2;
}

static byte *str_copy_n (byte *dst, const byte *src, size_t max_len)
{
    return grammar_string_copy_n (dst, src, max_len);
}

static byte *str_duplicate (const byte *str)
{
    byte *new_str = grammar_string_duplicate (str);
    if (new_str == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return new_str;
}

static int str_equal (const byte *str1, const byte *str2)
{
    return grammar_string_compare (str1, str2) == 0;
}

static int str_equal_n (const byte *str1, const byte *str2, unsigned int n)
{
    return grammar_string_compare_n (str1, str2, n) == 0;
}

static int
str_length (const byte *str)
{
    return (int) (grammar_string_length (str));
}

/*
    useful macros
*/
#define GRAMMAR_IMPLEMENT_LIST_APPEND(_Ty)\
    static void _Ty##_append (_Ty **x, _Ty *nx) {\
        while (*x) x = &(**x).next;\
        *x = nx;\
    }

/*
    string to byte map typedef
*/
typedef struct map_byte_
{
    byte *key;
    byte data;
    struct map_byte_ *next;
} map_byte;

static void map_byte_create (map_byte **ma)
{
    *ma = (map_byte *) mem_alloc (sizeof (map_byte));
    if (*ma)
    {
        (**ma).key = NULL;
        (**ma).data = '\0';
        (**ma).next = NULL;
    }
}

static void map_byte_destroy (map_byte **ma)
{
    if (*ma)
    {
        map_byte_destroy (&(**ma).next);
        mem_free ((void **) &(**ma).key);
        mem_free ((void **) ma);
    }
}

GRAMMAR_IMPLEMENT_LIST_APPEND(map_byte)

/*
    searches the map for the specified key,
    returns pointer to the element with the specified key if it exists
    returns NULL otherwise
*/
static map_byte *map_byte_locate (map_byte **ma, const byte *key)
{
    while (*ma)
    {
        if (str_equal ((**ma).key, key))
            return *ma;

        ma = &(**ma).next;
    }

    set_last_error (UNRESOLVED_REFERENCE, str_duplicate (key), -1);
    return NULL;
}

/*
    searches the map for specified key,
    if the key is matched, *data is filled with data associated with the key,
    returns 0 if the key is matched,
    returns 1 otherwise
*/
static int map_byte_find (map_byte **ma, const byte *key, byte *data)
{
    map_byte *found = map_byte_locate (ma, key);
    if (found != NULL)
    {
        *data = found->data;

        return 0;
    }

    return 1;
}

/*
    regbyte context typedef

    Each regbyte consists of its name and a default value. These are static and created at
    grammar script compile-time, for example the following line:
        .regbyte vertex_blend      0x00
    adds a new regbyte named "vertex_blend" to the static list and initializes it to 0.
    When the script is executed, this regbyte can be accessed by name for read and write. When a
    particular regbyte is written, a new regbyte_ctx entry is added to the top of the regbyte_ctx
    stack. The new entry contains information about which regbyte it references and its new value.
    When a given regbyte is accessed for read, the stack is searched top-down to find an
    entry that references the regbyte. The first matching entry is used to return the current
    value it holds. If no entry is found, the default value is returned.
*/
typedef struct regbyte_ctx_
{
    map_byte *m_regbyte;
    byte m_current_value;
    struct regbyte_ctx_ *m_prev;
} regbyte_ctx;

static void regbyte_ctx_create (regbyte_ctx **re)
{
    *re = (regbyte_ctx *) mem_alloc (sizeof (regbyte_ctx));
    if (*re)
    {
        (**re).m_regbyte = NULL;
        (**re).m_prev = NULL;
    }
}

static void regbyte_ctx_destroy (regbyte_ctx **re)
{
    if (*re)
    {
        mem_free ((void **) re);
    }
}

static byte regbyte_ctx_extract (regbyte_ctx **re, map_byte *reg)
{
    /* first lookup in the register stack */
    while (*re != NULL)
    {
        if ((**re).m_regbyte == reg)
            return (**re).m_current_value;

        re = &(**re).m_prev;
    }

    /* if not found - return the default value */
    return reg->data;
}

/*
    emit type typedef
*/
typedef enum emit_type_
{
    et_byte,            /* explicit number */
    et_stream,          /* eaten character */
    et_position         /* current position */
} emit_type;

/*
    emit destination typedef
*/
typedef enum emit_dest_
{
    ed_output,          /* write to the output buffer */
    ed_regbyte          /* write a particular regbyte */
} emit_dest;

/*
    emit typedef
*/
typedef struct emit_
{
    emit_dest m_emit_dest;
    emit_type m_emit_type;      /* ed_output */
    byte m_byte;                /* et_byte */
    map_byte *m_regbyte;        /* ed_regbyte */
    byte *m_regname;            /* ed_regbyte - temporary */
    struct emit_ *m_next;
} emit;

static void emit_create (emit **em)
{
    *em = (emit *) mem_alloc (sizeof (emit));
    if (*em)
    {
        (**em).m_emit_dest = ed_output;
        (**em).m_emit_type = et_byte;
        (**em).m_byte = '\0';
        (**em).m_regbyte = NULL;
        (**em).m_regname = NULL;
        (**em).m_next = NULL;
    }
}

static void emit_destroy (emit **em)
{
    if (*em)
    {
        emit_destroy (&(**em).m_next);
        mem_free ((void **) &(**em).m_regname);
        mem_free ((void **) em);
    }
}

static unsigned int emit_size (emit *_E)
{
    unsigned int n = 0;

    while (_E != NULL)
    {
        if (_E->m_emit_dest == ed_output)
        {
            if (_E->m_emit_type == et_position)
                n += 4;     /* position is a 32-bit unsigned integer */
            else
                n++;
        }
        _E = _E->m_next;
    }

    return n;
}

static int emit_push (emit *_E, byte *_P, byte c, unsigned int _Pos, regbyte_ctx **_Ctx)
{
    while (_E != NULL)
    {
        if (_E->m_emit_dest == ed_output)
        {
            if (_E->m_emit_type == et_byte)
                *_P++ = _E->m_byte;
            else if (_E->m_emit_type == et_stream)
                *_P++ = c;
            else /* _Em->type == et_position */
            {
                *_P++ = (byte) (_Pos);
                *_P++ = (byte) (_Pos >> 8);
                *_P++ = (byte) (_Pos >> 16);
                *_P++ = (byte) (_Pos >> 24);
            }
        }
        else
        {
            regbyte_ctx *new_rbc;
            regbyte_ctx_create (&new_rbc);
            if (new_rbc == NULL)
                return 1;

            new_rbc->m_prev = *_Ctx;
            new_rbc->m_regbyte = _E->m_regbyte;
            *_Ctx = new_rbc;

            if (_E->m_emit_type == et_byte)
                new_rbc->m_current_value = _E->m_byte;
            else if (_E->m_emit_type == et_stream)
                new_rbc->m_current_value = c;
        }

        _E = _E->m_next;
    }

    return 0;
}

/*
    error typedef
*/
typedef struct error_
{
    byte *m_text;
    byte *m_token_name;
    struct rule_ *m_token;
} error;

static void error_create (error **er)
{
    *er = (error *) mem_alloc (sizeof (error));
    if (*er)
    {
        (**er).m_text = NULL;
        (**er).m_token_name = NULL;
        (**er).m_token = NULL;
    }
}

static void error_destroy (error **er)
{
    if (*er)
    {
        mem_free ((void **) &(**er).m_text);
        mem_free ((void **) &(**er).m_token_name);
        mem_free ((void **) er);
    }
}

struct dict_;

static byte *
error_get_token (error *, struct dict_ *, const byte *, int);

/*
    condition operand type typedef
*/
typedef enum cond_oper_type_
{
    cot_byte,           /* constant 8-bit unsigned integer */
    cot_regbyte         /* pointer to byte register containing the current value */
} cond_oper_type;

/*
    condition operand typedef
*/
typedef struct cond_oper_
{
    cond_oper_type m_type;
    byte m_byte;            /* cot_byte */
    map_byte *m_regbyte;    /* cot_regbyte */
    byte *m_regname;        /* cot_regbyte - temporary */
} cond_oper;

/*
    condition type typedef
*/
typedef enum cond_type_
{
    ct_equal,
    ct_not_equal
} cond_type;

/*
    condition typedef
*/
typedef struct cond_
{
    cond_type m_type;
    cond_oper m_operands[2];
} cond;

static void cond_create (cond **co)
{
    *co = (cond *) mem_alloc (sizeof (cond));
    if (*co)
    {
        (**co).m_operands[0].m_regname = NULL;
        (**co).m_operands[1].m_regname = NULL;
    }
}

static void cond_destroy (cond **co)
{
    if (*co)
    {
        mem_free ((void **) &(**co).m_operands[0].m_regname);
        mem_free ((void **) &(**co).m_operands[1].m_regname);
        mem_free ((void **) co);
    }
}

/*
    specifier type typedef
*/
typedef enum spec_type_
{
    st_false,
    st_true,
    st_byte,
    st_byte_range,
    st_string,
    st_identifier,
    st_identifier_loop,
    st_debug
} spec_type;

/*
    specifier typedef
*/
typedef struct spec_
{
    spec_type m_spec_type;
    byte m_byte[2];             /* st_byte, st_byte_range */
    byte *m_string;             /* st_string */
    struct rule_ *m_rule;       /* st_identifier, st_identifier_loop */
    emit *m_emits;
    error *m_errtext;
    cond *m_cond;
    struct spec_ *next;
} spec;

static void spec_create (spec **sp)
{
    *sp = (spec *) mem_alloc (sizeof (spec));
    if (*sp)
    {
        (**sp).m_spec_type = st_false;
        (**sp).m_byte[0] = '\0';
        (**sp).m_byte[1] = '\0';
        (**sp).m_string = NULL;
        (**sp).m_rule = NULL;
        (**sp).m_emits = NULL;
        (**sp).m_errtext = NULL;
        (**sp).m_cond = NULL;
        (**sp).next = NULL;
    }
}

static void spec_destroy (spec **sp)
{
    if (*sp)
    {
        spec_destroy (&(**sp).next);
        emit_destroy (&(**sp).m_emits);
        error_destroy (&(**sp).m_errtext);
        mem_free ((void **) &(**sp).m_string);
        cond_destroy (&(**sp).m_cond);
        mem_free ((void **) sp);
    }
}

GRAMMAR_IMPLEMENT_LIST_APPEND(spec)

/*
    operator typedef
*/
typedef enum oper_
{
    op_none,
    op_and,
    op_or
} oper;

/*
    rule typedef
*/
typedef struct rule_
{
    oper m_oper;
    spec *m_specs;
    struct rule_ *next;
    int m_referenced;
} rule;

static void rule_create (rule **ru)
{
*ru = (rule *) mem_alloc (sizeof (rule)); 00809 if (*ru) 00810 { 00811 (**ru).m_oper = op_none; 00812 (**ru).m_specs = NULL; 00813 (**ru).next = NULL; 00814 (**ru).m_referenced = 0; 00815 } 00816 } 00817 00818 static void rule_destroy (rule **ru) 00819 { 00820 if (*ru) 00821 { 00822 rule_destroy (&(**ru).next); 00823 spec_destroy (&(**ru).m_specs); 00824 mem_free ((void **) ru); 00825 } 00826 } 00827 00828 GRAMMAR_IMPLEMENT_LIST_APPEND(rule) 00829 00830 /* 00831 returns unique grammar id 00832 */ 00833 static grammar next_valid_grammar_id (void) 00834 { 00835 static grammar id = 0; 00836 00837 return ++id; 00838 } 00839 00840 /* 00841 dictionary typedef 00842 */ 00843 typedef struct dict_ 00844 { 00845 rule *m_rulez; 00846 rule *m_syntax; 00847 rule *m_string; 00848 map_byte *m_regbytes; 00849 grammar m_id; 00850 struct dict_ *next; 00851 } dict; 00852 00853 static void dict_create (dict **di) 00854 { 00855 *di = (dict *) mem_alloc (sizeof (dict)); 00856 if (*di) 00857 { 00858 (**di).m_rulez = NULL; 00859 (**di).m_syntax = NULL; 00860 (**di).m_string = NULL; 00861 (**di).m_regbytes = NULL; 00862 (**di).m_id = next_valid_grammar_id (); 00863 (**di).next = NULL; 00864 } 00865 } 00866 00867 static void dict_destroy (dict **di) 00868 { 00869 if (*di) 00870 { 00871 rule_destroy (&(**di).m_rulez); 00872 map_byte_destroy (&(**di).m_regbytes); 00873 mem_free ((void **) di); 00874 } 00875 } 00876 00877 GRAMMAR_IMPLEMENT_LIST_APPEND(dict) 00878 00879 static void dict_find (dict **di, grammar key, dict **data) 00880 { 00881 while (*di) 00882 { 00883 if ((**di).m_id == key) 00884 { 00885 *data = *di; 00886 return; 00887 } 00888 00889 di = &(**di).next; 00890 } 00891 00892 *data = NULL; 00893 } 00894 00895 static dict *g_dicts = NULL; 00896 00897 /* 00898 byte array typedef 00899 */ 00900 typedef struct barray_ 00901 { 00902 byte *data; 00903 unsigned int len; 00904 } barray; 00905 00906 static void barray_create (barray **ba) 00907 { 00908 *ba = (barray *) mem_alloc (sizeof 
(barray)); 00909 if (*ba) 00910 { 00911 (**ba).data = NULL; 00912 (**ba).len = 0; 00913 } 00914 } 00915 00916 static void barray_destroy (barray **ba) 00917 { 00918 if (*ba) 00919 { 00920 mem_free ((void **) &(**ba).data); 00921 mem_free ((void **) ba); 00922 } 00923 } 00924 00925 /* 00926 reallocates byte array to requested size, 00927 returns 0 on success, 00928 returns 1 otherwise 00929 */ 00930 static int barray_resize (barray **ba, unsigned int nlen) 00931 { 00932 byte *new_pointer; 00933 00934 if (nlen == 0) 00935 { 00936 mem_free ((void **) &(**ba).data); 00937 (**ba).data = NULL; 00938 (**ba).len = 0; 00939 00940 return 0; 00941 } 00942 else 00943 { 00944 new_pointer = (byte *) mem_realloc ((**ba).data, (**ba).len * sizeof (byte), 00945 nlen * sizeof (byte)); 00946 if (new_pointer) 00947 { 00948 (**ba).data = new_pointer; 00949 (**ba).len = nlen; 00950 00951 return 0; 00952 } 00953 } 00954 00955 return 1; 00956 } 00957 00958 /* 00959 adds byte array pointed by *nb to the end of array pointed by *ba, 00960 returns 0 on success, 00961 returns 1 otherwise 00962 */ 00963 static int barray_append (barray **ba, barray **nb) 00964 { 00965 const unsigned int len = (**ba).len; 00966 00967 if (barray_resize (ba, (**ba).len + (**nb).len)) 00968 return 1; 00969 00970 mem_copy ((**ba).data + len, (**nb).data, (**nb).len); 00971 00972 return 0; 00973 } 00974 00975 /* 00976 adds emit chain pointed by em to the end of array pointed by *ba, 00977 returns 0 on success, 00978 returns 1 otherwise 00979 */ 00980 static int barray_push (barray **ba, emit *em, byte c, unsigned int pos, regbyte_ctx **rbc) 00981 { 00982 unsigned int count = emit_size (em); 00983 00984 if (barray_resize (ba, (**ba).len + count)) 00985 return 1; 00986 00987 return emit_push (em, (**ba).data + ((**ba).len - count), c, pos, rbc); 00988 } 00989 00990 /* 00991 byte pool typedef 00992 */ 00993 typedef struct bytepool_ 00994 { 00995 byte *_F; 00996 unsigned int _Siz; 00997 } bytepool; 00998 00999 static 
void bytepool_destroy (bytepool **by) 01000 { 01001 if (*by != NULL) 01002 { 01003 mem_free ((void **) &(**by)._F); 01004 mem_free ((void **) by); 01005 } 01006 } 01007 01008 static void bytepool_create (bytepool **by, int len) 01009 { 01010 *by = (bytepool *) (mem_alloc (sizeof (bytepool))); 01011 if (*by != NULL) 01012 { 01013 (**by)._F = (byte *) (mem_alloc (sizeof (byte) * len)); 01014 (**by)._Siz = len; 01015 01016 if ((**by)._F == NULL) 01017 bytepool_destroy (by); 01018 } 01019 } 01020 01021 static int bytepool_reserve (bytepool *by, unsigned int n) 01022 { 01023 byte *_P; 01024 01025 if (n <= by->_Siz) 01026 return 0; 01027 01028 /* byte pool can only grow and at least by doubling its size */ 01029 n = n >= by->_Siz * 2 ? n : by->_Siz * 2; 01030 01031 /* reallocate the memory and adjust pointers to the new memory location */ 01032 _P = (byte *) (mem_realloc (by->_F, sizeof (byte) * by->_Siz, sizeof (byte) * n)); 01033 if (_P != NULL) 01034 { 01035 by->_F = _P; 01036 by->_Siz = n; 01037 return 0; 01038 } 01039 01040 return 1; 01041 } 01042 01043 /* 01044 string to string map typedef 01045 */ 01046 typedef struct map_str_ 01047 { 01048 byte *key; 01049 byte *data; 01050 struct map_str_ *next; 01051 } map_str; 01052 01053 static void map_str_create (map_str **ma) 01054 { 01055 *ma = (map_str *) mem_alloc (sizeof (map_str)); 01056 if (*ma) 01057 { 01058 (**ma).key = NULL; 01059 (**ma).data = NULL; 01060 (**ma).next = NULL; 01061 } 01062 } 01063 01064 static void map_str_destroy (map_str **ma) 01065 { 01066 if (*ma) 01067 { 01068 map_str_destroy (&(**ma).next); 01069 mem_free ((void **) &(**ma).key); 01070 mem_free ((void **) &(**ma).data); 01071 mem_free ((void **) ma); 01072 } 01073 } 01074 01075 GRAMMAR_IMPLEMENT_LIST_APPEND(map_str) 01076 01077 /* 01078 searches the map for specified key, 01079 if the key is matched, *data is filled with data associated with the key, 01080 returns 0 if the key is matched, 01081 returns 1 otherwise 01082 */ 01083 static int 
map_str_find (map_str **ma, const byte *key, byte **data)
{
    while (*ma)
    {
        if (str_equal ((**ma).key, key))
        {
            *data = str_duplicate ((**ma).data);
            if (*data == NULL)
                return 1;

            return 0;
        }

        ma = &(**ma).next;
    }

    set_last_error (UNRESOLVED_REFERENCE, str_duplicate (key), -1);
    return 1;
}

/*
    string to rule map typedef
*/
typedef struct map_rule_
{
    byte *key;
    rule *data;
    struct map_rule_ *next;
} map_rule;

static void map_rule_create (map_rule **ma)
{
    *ma = (map_rule *) mem_alloc (sizeof (map_rule));
    if (*ma)
    {
        (**ma).key = NULL;
        (**ma).data = NULL;
        (**ma).next = NULL;
    }
}

static void map_rule_destroy (map_rule **ma)
{
    if (*ma)
    {
        map_rule_destroy (&(**ma).next);
        mem_free ((void **) &(**ma).key);
        mem_free ((void **) ma);
    }
}

GRAMMAR_IMPLEMENT_LIST_APPEND(map_rule)

/*
    searches the map for specified key,
    if the key is matched, *data is filled with data associated with the key,
    returns 0 if the key is matched,
    returns 1 otherwise
*/
static int map_rule_find (map_rule **ma, const byte *key, rule **data)
{
    while (*ma)
    {
        if (str_equal ((**ma).key, key))
        {
            *data = (**ma).data;

            return 0;
        }

        ma = &(**ma).next;
    }

    set_last_error (UNRESOLVED_REFERENCE, str_duplicate (key), -1);
    return 1;
}

/*
    returns 1 if given character is a white space,
    returns 0 otherwise
*/
static int is_space (byte c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
    advances text pointer by 1 if character pointed by *text is a space,
    returns 1 if a space has been eaten,
    returns 0
otherwise 01173 */ 01174 static int eat_space (const byte **text) 01175 { 01176 if (is_space (**text)) 01177 { 01178 (*text)++; 01179 01180 return 1; 01181 } 01182 01183 return 0; 01184 } 01185 01186 /* 01187 returns 1 if text points to C-style comment start string, 01188 returns 0 otherwise 01189 */ 01190 static int is_comment_start (const byte *text) 01191 { 01192 return text[0] == '/' && text[1] == '*'; 01193 } 01194 01195 /* 01196 advances text pointer to first character after C-style comment block - if any, 01197 returns 1 if C-style comment block has been encountered and eaten, 01198 returns 0 otherwise 01199 */ 01200 static int eat_comment (const byte **text) 01201 { 01202 if (is_comment_start (*text)) 01203 { 01204 /* *text points to comment block - skip two characters to enter comment body */ 01205 *text += 2; 01206 /* skip any character except consecutive '*' and '/' */ 01207 while (!((*text)[0] == '*' && (*text)[1] == '/')) 01208 (*text)++; 01209 /* skip those two terminating characters */ 01210 *text += 2; 01211 01212 return 1; 01213 } 01214 01215 return 0; 01216 } 01217 01218 /* 01219 advances text pointer to first character that is neither space nor C-style comment block 01220 */ 01221 static void eat_spaces (const byte **text) 01222 { 01223 while (eat_space (text) || eat_comment (text)) 01224 ; 01225 } 01226 01227 /* 01228 resizes string pointed by *ptr to successfully add character c to the end of the string, 01229 returns 0 on success, 01230 returns 1 otherwise 01231 */ 01232 static int string_grow (byte **ptr, unsigned int *len, byte c) 01233 { 01234 /* reallocate the string in 16-byte increments */ 01235 if ((*len & 0x0F) == 0x0F || *ptr == NULL) 01236 { 01237 byte *tmp = (byte *) mem_realloc (*ptr, ((*len + 1) & ~0x0F) * sizeof (byte), 01238 ((*len + 1 + 0x10) & ~0x0F) * sizeof (byte)); 01239 if (tmp == NULL) 01240 return 1; 01241 01242 *ptr = tmp; 01243 } 01244 01245 if (c) 01246 { 01247 /* append given character */ 01248 (*ptr)[*len] = c; 
01249 (*len)++; 01250 } 01251 (*ptr)[*len] = '\0'; 01252 01253 return 0; 01254 } 01255 01256 /* 01257 returns 1 if given character is a valid identifier character a-z, A-Z, 0-9 or _ 01258 returns 0 otherwise 01259 */ 01260 static int is_identifier (byte c) 01261 { 01262 return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_'; 01263 } 01264 01265 /* 01266 copies characters from *text to *id until non-identifier character is encountered, 01267 assumes that *id points to NULL object - caller is responsible for later freeing the string, 01268 text pointer is advanced to point past the copied identifier, 01269 returns 0 if identifier was successfully copied, 01270 returns 1 otherwise 01271 */ 01272 static int get_identifier (const byte **text, byte **id) 01273 { 01274 const byte *t = *text; 01275 byte *p = NULL; 01276 unsigned int len = 0; 01277 01278 if (string_grow (&p, &len, '\0')) 01279 return 1; 01280 01281 /* loop while next character in buffer is valid for identifiers */ 01282 while (is_identifier (*t)) 01283 { 01284 if (string_grow (&p, &len, *t++)) 01285 { 01286 mem_free ((void **) (void *) &p); 01287 return 1; 01288 } 01289 } 01290 01291 *text = t; 01292 *id = p; 01293 01294 return 0; 01295 } 01296 01297 /* 01298 converts sequence of DEC digits pointed by *text until non-DEC digit is encountered, 01299 advances text pointer past the converted sequence, 01300 returns the converted value 01301 */ 01302 static unsigned int dec_convert (const byte **text) 01303 { 01304 unsigned int value = 0; 01305 01306 while (**text >= '0' && **text <= '9') 01307 { 01308 value = value * 10 + **text - '0'; 01309 (*text)++; 01310 } 01311 01312 return value; 01313 } 01314 01315 /* 01316 returns 1 if given character is HEX digit 0-9, A-F or a-f, 01317 returns 0 otherwise 01318 */ 01319 static int is_hex (byte c) 01320 { 01321 return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f'); 01322 } 01323 01324 /* 01325 returns 
value of passed character as if it was HEX digit 01326 */ 01327 static unsigned int hex2dec (byte c) 01328 { 01329 if (c >= '0' && c <= '9') 01330 return c - '0'; 01331 if (c >= 'A' && c <= 'F') 01332 return c - 'A' + 10; 01333 return c - 'a' + 10; 01334 } 01335 01336 /* 01337 converts sequence of HEX digits pointed by *text until non-HEX digit is encountered, 01338 advances text pointer past the converted sequence, 01339 returns the converted value 01340 */ 01341 static unsigned int hex_convert (const byte **text) 01342 { 01343 unsigned int value = 0; 01344 01345 while (is_hex (**text)) 01346 { 01347 value = value * 0x10 + hex2dec (**text); 01348 (*text)++; 01349 } 01350 01351 return value; 01352 } 01353 01354 /* 01355 returns 1 if given character is OCT digit 0-7, 01356 returns 0 otherwise 01357 */ 01358 static int is_oct (byte c) 01359 { 01360 return c >= '0' && c <= '7'; 01361 } 01362 01363 /* 01364 returns value of passed character as if it was OCT digit 01365 */ 01366 static int oct2dec (byte c) 01367 { 01368 return c - '0'; 01369 } 01370 01371 static byte get_escape_sequence (const byte **text) 01372 { 01373 int value = 0; 01374 01375 /* skip '\' character */ 01376 (*text)++; 01377 01378 switch (*(*text)++) 01379 { 01380 case '\'': 01381 return '\''; 01382 case '"': 01383 return '\"'; 01384 case '?': 01385 return '\?'; 01386 case '\\': 01387 return '\\'; 01388 case 'a': 01389 return '\a'; 01390 case 'b': 01391 return '\b'; 01392 case 'f': 01393 return '\f'; 01394 case 'n': 01395 return '\n'; 01396 case 'r': 01397 return '\r'; 01398 case 't': 01399 return '\t'; 01400 case 'v': 01401 return '\v'; 01402 case 'x': 01403 return (byte) hex_convert (text); 01404 } 01405 01406 (*text)--; 01407 if (is_oct (**text)) 01408 { 01409 value = oct2dec (*(*text)++); 01410 if (is_oct (**text)) 01411 { 01412 value = value * 010 + oct2dec (*(*text)++); 01413 if (is_oct (**text)) 01414 value = value * 010 + oct2dec (*(*text)++); 01415 } 01416 } 01417 01418 return (byte) value; 
01419 } 01420 01421 /* 01422 copies characters from *text to *str until " or ' character is encountered, 01423 assumes that *str points to NULL object - caller is responsible for later freeing the string, 01424 assumes that *text points to " or ' character that starts the string, 01425 text pointer is advanced to point past the " or ' character, 01426 returns 0 if string was successfully copied, 01427 returns 1 otherwise 01428 */ 01429 static int get_string (const byte **text, byte **str) 01430 { 01431 const byte *t = *text; 01432 byte *p = NULL; 01433 unsigned int len = 0; 01434 byte term_char; 01435 01436 if (string_grow (&p, &len, '\0')) 01437 return 1; 01438 01439 /* read " or ' character that starts the string */ 01440 term_char = *t++; 01441 /* while next character is not the terminating character */ 01442 while (*t && *t != term_char) 01443 { 01444 byte c; 01445 01446 if (*t == '\\') 01447 c = get_escape_sequence (&t); 01448 else 01449 c = *t++; 01450 01451 if (string_grow (&p, &len, c)) 01452 { 01453 mem_free ((void **) (void *) &p); 01454 return 1; 01455 } 01456 } 01457 /* skip " or ' character that ends the string */ 01458 t++; 01459 01460 *text = t; 01461 *str = p; 01462 return 0; 01463 } 01464 01465 /* 01466 gets emit code, the syntax is: 01467 ".emtcode" " " <symbol> " " (("0x" | "0X") <hex_value>) | <dec_value> | <character> 01468 assumes that *text already points to <symbol>, 01469 returns 0 if emit code is successfully read, 01470 returns 1 otherwise 01471 */ 01472 static int get_emtcode (const byte **text, map_byte **ma) 01473 { 01474 const byte *t = *text; 01475 map_byte *m = NULL; 01476 01477 map_byte_create (&m); 01478 if (m == NULL) 01479 return 1; 01480 01481 if (get_identifier (&t, &m->key)) 01482 { 01483 map_byte_destroy (&m); 01484 return 1; 01485 } 01486 eat_spaces (&t); 01487 01488 if (*t == '\'') 01489 { 01490 byte *c; 01491 01492 if (get_string (&t, &c)) 01493 { 01494 map_byte_destroy (&m); 01495 return 1; 01496 } 01497 01498 m->data = 
(byte) c[0]; 01499 mem_free ((void **) (void *) &c); 01500 } 01501 else if (t[0] == '0' && (t[1] == 'x' || t[1] == 'X')) 01502 { 01503 /* skip HEX "0x" or "0X" prefix */ 01504 t += 2; 01505 m->data = (byte) hex_convert (&t); 01506 } 01507 else 01508 { 01509 m->data = (byte) dec_convert (&t); 01510 } 01511 01512 eat_spaces (&t); 01513 01514 *text = t; 01515 *ma = m; 01516 return 0; 01517 } 01518 01519 /* 01520 gets regbyte declaration, the syntax is: 01521 ".regbyte" " " <symbol> " " (("0x" | "0X") <hex_value>) | <dec_value> | <character> 01522 assumes that *text already points to <symbol>, 01523 returns 0 if regbyte is successfully read, 01524 returns 1 otherwise 01525 */ 01526 static int get_regbyte (const byte **text, map_byte **ma) 01527 { 01528 /* pass it to the emtcode parser as it has the same syntax starting at <symbol> */ 01529 return get_emtcode (text, ma); 01530 } 01531 01532 /* 01533 returns 0 on success, 01534 returns 1 otherwise 01535 */ 01536 static int get_errtext (const byte **text, map_str **ma) 01537 { 01538 const byte *t = *text; 01539 map_str *m = NULL; 01540 01541 map_str_create (&m); 01542 if (m == NULL) 01543 return 1; 01544 01545 if (get_identifier (&t, &m->key)) 01546 { 01547 map_str_destroy (&m); 01548 return 1; 01549 } 01550 eat_spaces (&t); 01551 01552 if (get_string (&t, &m->data)) 01553 { 01554 map_str_destroy (&m); 01555 return 1; 01556 } 01557 eat_spaces (&t); 01558 01559 *text = t; 01560 *ma = m; 01561 return 0; 01562 } 01563 01564 /* 01565 returns 0 on success, 01566 returns 1 otherwise, 01567 */ 01568 static int get_error (const byte **text, error **er, map_str *maps) 01569 { 01570 const byte *t = *text; 01571 byte *temp = NULL; 01572 01573 if (*t != '.') 01574 return 0; 01575 01576 t++; 01577 if (get_identifier (&t, &temp)) 01578 return 1; 01579 eat_spaces (&t); 01580 01581 if (!str_equal ((byte *) "error", temp)) 01582 { 01583 mem_free ((void **) (void *) &temp); 01584 return 0; 01585 } 01586 01587 mem_free ((void **) (void *) 
&temp); 01588 01589 error_create (er); 01590 if (*er == NULL) 01591 return 1; 01592 01593 if (*t == '\"') 01594 { 01595 if (get_string (&t, &(**er).m_text)) 01596 { 01597 error_destroy (er); 01598 return 1; 01599 } 01600 eat_spaces (&t); 01601 } 01602 else 01603 { 01604 if (get_identifier (&t, &temp)) 01605 { 01606 error_destroy (er); 01607 return 1; 01608 } 01609 eat_spaces (&t); 01610 01611 if (map_str_find (&maps, temp, &(**er).m_text)) 01612 { 01613 mem_free ((void **) (void *) &temp); 01614 error_destroy (er); 01615 return 1; 01616 } 01617 01618 mem_free ((void **) (void *) &temp); 01619 } 01620 01621 /* try to extract "token" from "...$token$..." */ 01622 { 01623 byte *processed = NULL; 01624 unsigned int len = 0; 01625 int i = 0; 01626 01627 if (string_grow (&processed, &len, '\0')) 01628 { 01629 error_destroy (er); 01630 return 1; 01631 } 01632 01633 while (i < str_length ((**er).m_text)) 01634 { 01635 /* check if the dollar sign is repeated - if so skip it */ 01636 if ((**er).m_text[i] == '$' && (**er).m_text[i + 1] == '$') 01637 { 01638 if (string_grow (&processed, &len, '$')) 01639 { 01640 mem_free ((void **) (void *) &processed); 01641 error_destroy (er); 01642 return 1; 01643 } 01644 01645 i += 2; 01646 } 01647 else if ((**er).m_text[i] != '$') 01648 { 01649 if (string_grow (&processed, &len, (**er).m_text[i])) 01650 { 01651 mem_free ((void **) (void *) &processed); 01652 error_destroy (er); 01653 return 1; 01654 } 01655 01656 i++; 01657 } 01658 else 01659 { 01660 if (string_grow (&processed, &len, '$')) 01661 { 01662 mem_free ((void **) (void *) &processed); 01663 error_destroy (er); 01664 return 1; 01665 } 01666 01667 { 01668 /* length of token being extracted */ 01669 unsigned int tlen = 0; 01670 01671 if (string_grow (&(**er).m_token_name, &tlen, '\0')) 01672 { 01673 mem_free ((void **) (void *) &processed); 01674 error_destroy (er); 01675 return 1; 01676 } 01677 01678 /* skip the dollar sign */ 01679 i++; 01680 01681 while ((**er).m_text[i] != 
'$') 01682 { 01683 if (string_grow (&(**er).m_token_name, &tlen, (**er).m_text[i])) 01684 { 01685 mem_free ((void **) (void *) &processed); 01686 error_destroy (er); 01687 return 1; 01688 } 01689 01690 i++; 01691 } 01692 01693 /* skip the dollar sign */ 01694 i++; 01695 } 01696 } 01697 } 01698 01699 mem_free ((void **) &(**er).m_text); 01700 (**er).m_text = processed; 01701 } 01702 01703 *text = t; 01704 return 0; 01705 } 01706 01707 /* 01708 returns 0 on success, 01709 returns 1 otherwise, 01710 */ 01711 static int get_emits (const byte **text, emit **em, map_byte *mapb) 01712 { 01713 const byte *t = *text; 01714 byte *temp = NULL; 01715 emit *e = NULL; 01716 emit_dest dest; 01717 01718 if (*t != '.') 01719 return 0; 01720 01721 t++; 01722 if (get_identifier (&t, &temp)) 01723 return 1; 01724 eat_spaces (&t); 01725 01726 /* .emit */ 01727 if (str_equal ((byte *) "emit", temp)) 01728 dest = ed_output; 01729 /* .load */ 01730 else if (str_equal ((byte *) "load", temp)) 01731 dest = ed_regbyte; 01732 else 01733 { 01734 mem_free ((void **) (void *) &temp); 01735 return 0; 01736 } 01737 01738 mem_free ((void **) (void *) &temp); 01739 01740 emit_create (&e); 01741 if (e == NULL) 01742 return 1; 01743 01744 e->m_emit_dest = dest; 01745 01746 if (dest == ed_regbyte) 01747 { 01748 if (get_identifier (&t, &e->m_regname)) 01749 { 01750 emit_destroy (&e); 01751 return 1; 01752 } 01753 eat_spaces (&t); 01754 } 01755 01756 /* 0xNN */ 01757 if (*t == '0' && (t[1] == 'x' || t[1] == 'X')) 01758 { 01759 t += 2; 01760 e->m_byte = (byte) hex_convert (&t); 01761 01762 e->m_emit_type = et_byte; 01763 } 01764 /* NNN */ 01765 else if (*t >= '0' && *t <= '9') 01766 { 01767 e->m_byte = (byte) dec_convert (&t); 01768 01769 e->m_emit_type = et_byte; 01770 } 01771 /* * */ 01772 else if (*t == '*') 01773 { 01774 t++; 01775 01776 e->m_emit_type = et_stream; 01777 } 01778 /* $ */ 01779 else if (*t == '$') 01780 { 01781 t++; 01782 01783 e->m_emit_type = et_position; 01784 } 01785 /* 'c' */ 01786 
else if (*t == '\'') 01787 { 01788 if (get_string (&t, &temp)) 01789 { 01790 emit_destroy (&e); 01791 return 1; 01792 } 01793 e->m_byte = (byte) temp[0]; 01794 01795 mem_free ((void **) (void *) &temp); 01796 01797 e->m_emit_type = et_byte; 01798 } 01799 else 01800 { 01801 if (get_identifier (&t, &temp)) 01802 { 01803 emit_destroy (&e); 01804 return 1; 01805 } 01806 01807 if (map_byte_find (&mapb, temp, &e->m_byte)) 01808 { 01809 mem_free ((void **) (void *) &temp); 01810 emit_destroy (&e); 01811 return 1; 01812 } 01813 01814 mem_free ((void **) (void *) &temp); 01815 01816 e->m_emit_type = et_byte; 01817 } 01818 01819 eat_spaces (&t); 01820 01821 if (get_emits (&t, &e->m_next, mapb)) 01822 { 01823 emit_destroy (&e); 01824 return 1; 01825 } 01826 01827 *text = t; 01828 *em = e; 01829 return 0; 01830 } 01831 01832 /* 01833 returns 0 on success, 01834 returns 1 otherwise, 01835 */ 01836 static int get_spec (const byte **text, spec **sp, map_str *maps, map_byte *mapb) 01837 { 01838 const byte *t = *text; 01839 spec *s = NULL; 01840 01841 spec_create (&s); 01842 if (s == NULL) 01843 return 1; 01844 01845 /* first - read optional .if statement */ 01846 if (*t == '.') 01847 { 01848 const byte *u = t; 01849 byte *keyword = NULL; 01850 01851 /* skip the dot */ 01852 u++; 01853 01854 if (get_identifier (&u, &keyword)) 01855 { 01856 spec_destroy (&s); 01857 return 1; 01858 } 01859 01860 /* .if */ 01861 if (str_equal ((byte *) "if", keyword)) 01862 { 01863 cond_create (&s->m_cond); 01864 if (s->m_cond == NULL) 01865 { 01866 spec_destroy (&s); 01867 return 1; 01868 } 01869 01870 /* skip the left paren */ 01871 eat_spaces (&u); 01872 u++; 01873 01874 /* get the left operand */ 01875 eat_spaces (&u); 01876 if (get_identifier (&u, &s->m_cond->m_operands[0].m_regname)) 01877 { 01878 spec_destroy (&s); 01879 return 1; 01880 } 01881 s->m_cond->m_operands[0].m_type = cot_regbyte; 01882 01883 /* get the operator (!= or ==) */ 01884 eat_spaces (&u); 01885 if (*u == '!') 01886 
s->m_cond->m_type = ct_not_equal; 01887 else 01888 s->m_cond->m_type = ct_equal; 01889 u += 2; 01890 eat_spaces (&u); 01891 01892 if (u[0] == '0' && (u[1] == 'x' || u[1] == 'X')) 01893 { 01894 /* skip the 0x prefix */ 01895 u += 2; 01896 01897 /* get the right operand */ 01898 s->m_cond->m_operands[1].m_byte = hex_convert (&u); 01899 s->m_cond->m_operands[1].m_type = cot_byte; 01900 } 01901 else /*if (*u >= '0' && *u <= '9')*/ 01902 { 01903 /* get the right operand */ 01904 s->m_cond->m_operands[1].m_byte = dec_convert (&u); 01905 s->m_cond->m_operands[1].m_type = cot_byte; 01906 } 01907 01908 /* skip the right paren */ 01909 eat_spaces (&u); 01910 u++; 01911 01912 eat_spaces (&u); 01913 01914 t = u; 01915 } 01916 01917 mem_free ((void **) (void *) &keyword); 01918 } 01919 01920 if (*t == '\'') 01921 { 01922 byte *temp = NULL; 01923 01924 if (get_string (&t, &temp)) 01925 { 01926 spec_destroy (&s); 01927 return 1; 01928 } 01929 eat_spaces (&t); 01930 01931 if (*t == '-') 01932 { 01933 byte *temp2 = NULL; 01934 01935 /* skip the '-' character */ 01936 t++; 01937 eat_spaces (&t); 01938 01939 if (get_string (&t, &temp2)) 01940 { 01941 mem_free ((void **) (void *) &temp); 01942 spec_destroy (&s); 01943 return 1; 01944 } 01945 eat_spaces (&t); 01946 01947 s->m_spec_type = st_byte_range; 01948 s->m_byte[0] = *temp; 01949 s->m_byte[1] = *temp2; 01950 01951 mem_free ((void **) (void *) &temp2); 01952 } 01953 else 01954 { 01955 s->m_spec_type = st_byte; 01956 *s->m_byte = *temp; 01957 } 01958 01959 mem_free ((void **) (void *) &temp); 01960 } 01961 else if (*t == '"') 01962 { 01963 if (get_string (&t, &s->m_string)) 01964 { 01965 spec_destroy (&s); 01966 return 1; 01967 } 01968 eat_spaces (&t); 01969 01970 s->m_spec_type = st_string; 01971 } 01972 else if (*t == '.') 01973 { 01974 byte *keyword = NULL; 01975 01976 /* skip the dot */ 01977 t++; 01978 01979 if (get_identifier (&t, &keyword)) 01980 { 01981 spec_destroy (&s); 01982 return 1; 01983 } 01984 eat_spaces (&t); 01985 
01986 /* .true */ 01987 if (str_equal ((byte *) "true", keyword)) 01988 { 01989 s->m_spec_type = st_true; 01990 } 01991 /* .false */ 01992 else if (str_equal ((byte *) "false", keyword)) 01993 { 01994 s->m_spec_type = st_false; 01995 } 01996 /* .debug */ 01997 else if (str_equal ((byte *) "debug", keyword)) 01998 { 01999 s->m_spec_type = st_debug; 02000 } 02001 /* .loop */ 02002 else if (str_equal ((byte *) "loop", keyword)) 02003 { 02004 if (get_identifier (&t, &s->m_string)) 02005 { 02006 mem_free ((void **) (void *) &keyword); 02007 spec_destroy (&s); 02008 return 1; 02009 } 02010 eat_spaces (&t); 02011 02012 s->m_spec_type = st_identifier_loop; 02013 } 02014 mem_free ((void **) (void *) &keyword); 02015 } 02016 else 02017 { 02018 if (get_identifier (&t, &s->m_string)) 02019 { 02020 spec_destroy (&s); 02021 return 1; 02022 } 02023 eat_spaces (&t); 02024 02025 s->m_spec_type = st_identifier; 02026 } 02027 02028 if (get_error (&t, &s->m_errtext, maps)) 02029 { 02030 spec_destroy (&s); 02031 return 1; 02032 } 02033 02034 if (get_emits (&t, &s->m_emits, mapb)) 02035 { 02036 spec_destroy (&s); 02037 return 1; 02038 } 02039 02040 *text = t; 02041 *sp = s; 02042 return 0; 02043 } 02044 02045 /* 02046 returns 0 on success, 02047 returns 1 otherwise, 02048 */ 02049 static int get_rule (const byte **text, rule **ru, map_str *maps, map_byte *mapb) 02050 { 02051 const byte *t = *text; 02052 rule *r = NULL; 02053 02054 rule_create (&r); 02055 if (r == NULL) 02056 return 1; 02057 02058 if (get_spec (&t, &r->m_specs, maps, mapb)) 02059 { 02060 rule_destroy (&r); 02061 return 1; 02062 } 02063 02064 while (*t != ';') 02065 { 02066 byte *op = NULL; 02067 spec *sp = NULL; 02068 02069 /* skip the dot that precedes "and" or "or" */ 02070 t++; 02071 02072 /* read "and" or "or" keyword */ 02073 if (get_identifier (&t, &op)) 02074 { 02075 rule_destroy (&r); 02076 return 1; 02077 } 02078 eat_spaces (&t); 02079 02080 if (r->m_oper == op_none) 02081 { 02082 /* .and */ 02083 if (str_equal 
((byte *) "and", op))
                r->m_oper = op_and;
            /* .or */
            else
                r->m_oper = op_or;
        }

        mem_free ((void **) (void *) &op);

        if (get_spec (&t, &sp, maps, mapb))
        {
            rule_destroy (&r);
            return 1;
        }

        spec_append (&r->m_specs, sp);
    }

    /* skip the semicolon */
    t++;
    eat_spaces (&t);

    *text = t;
    *ru = r;
    return 0;
}

/*
    returns 0 on success,
    returns 1 otherwise
*/
static int update_dependency (map_rule *mapr, byte *symbol, rule **ru)
{
    if (map_rule_find (&mapr, symbol, ru))
        return 1;

    (**ru).m_referenced = 1;

    return 0;
}

/*
    returns 0 on success,
    returns 1 otherwise
*/
static int update_dependencies (dict *di, map_rule *mapr, byte **syntax_symbol,
    byte **string_symbol, map_byte *regbytes)
{
    rule *rulez = di->m_rulez;

    /* update dependencies for the root and lexer symbols */
    if (update_dependency (mapr, *syntax_symbol, &di->m_syntax) ||
        (*string_symbol != NULL && update_dependency (mapr, *string_symbol, &di->m_string)))
        return 1;

    mem_free ((void **) syntax_symbol);
    mem_free ((void **) string_symbol);

    /* update dependencies for the rest of the rules */
    while (rulez)
    {
        spec *sp = rulez->m_specs;

        /* iterate through all the specifiers */
        while (sp)
        {
            /* update dependency for identifier */
            if (sp->m_spec_type == st_identifier || sp->m_spec_type == st_identifier_loop)
            {
                if (update_dependency (mapr, sp->m_string, &sp->m_rule))
                    return 1;

                mem_free ((void **) &sp->m_string);
            }

            /* some errtexts refer to a rule */
            if (sp->m_errtext && sp->m_errtext->m_token_name)
            {
                if (update_dependency (mapr, sp->m_errtext->m_token_name,
                    &sp->m_errtext->m_token))
                    return 1;

                mem_free ((void **) &sp->m_errtext->m_token_name);
            }

            /* update dependency for condition */
            if (sp->m_cond)
            {
                int i;
                for (i = 0; i < 2; i++)
                    if (sp->m_cond->m_operands[i].m_type == cot_regbyte)
                    {
                        sp->m_cond->m_operands[i].m_regbyte = map_byte_locate (&regbytes,
                            sp->m_cond->m_operands[i].m_regname);

                        if (sp->m_cond->m_operands[i].m_regbyte == NULL)
                            return 1;

                        mem_free ((void **) &sp->m_cond->m_operands[i].m_regname);
                    }
            }

            /* update dependency for all .load instructions */
            if (sp->m_emits)
            {
                emit *em = sp->m_emits;
                while (em != NULL)
                {
                    if (em->m_emit_dest == ed_regbyte)
                    {
                        em->m_regbyte = map_byte_locate (&regbytes, em->m_regname);

                        if (em->m_regbyte == NULL)
                            return 1;

                        mem_free ((void **) &em->m_regname);
                    }

                    em = em->m_next;
                }
            }

            sp = sp->next;
        }

        rulez = rulez->next;
    }

    /* check for unreferenced symbols */
    rulez = di->m_rulez;
    while (rulez != NULL)
    {
        if (!rulez->m_referenced)
        {
            map_rule *ma = mapr;
            while (ma)
            {
                if (ma->data == rulez)
                {
                    set_last_error (UNREFERENCED_IDENTIFIER, str_duplicate (ma->key), -1);
                    return 1;
                }
                ma = ma->next;
            }
        }
        rulez = rulez->next;
    }

    return 0;
}

static int satisfies_condition (cond *co, regbyte_ctx *ctx)
{
    byte values[2];
    int i;

    if (co == NULL)
        return 1;

    for (i = 0; i < 2; i++)
        switch (co->m_operands[i].m_type)
        {
        case cot_byte:
            values[i] = co->m_operands[i].m_byte;
            break;
        case cot_regbyte:
            values[i] = regbyte_ctx_extract (&ctx, co->m_operands[i].m_regbyte);
            break;
        }

    switch (co->m_type)
    {
    case ct_equal:
        return values[0] == values[1];
    case ct_not_equal:
        return values[0] != values[1];
    }

    return 0;
}

static void free_regbyte_ctx_stack (regbyte_ctx *top, regbyte_ctx *limit)
{
    while (top != limit)
    {
        regbyte_ctx *rbc = top->m_prev;
        regbyte_ctx_destroy (&top);
        top = rbc;
    }
}

typedef enum match_result_
{
    mr_not_matched,     /* the examined string does not match */
    mr_matched,         /* the examined string matches */
    mr_error_raised,    /* mr_not_matched + error has been raised */
    mr_dont_emit,       /* used by identifier loops only */
    mr_internal_error   /* an internal error has occurred such as out of memory */
} match_result;

/*
 * This function does the main job. It parses the text and generates output data.
 */
static match_result
match (dict *di, const byte *text, int *index, rule *ru, barray **ba, int filtering_string,
       regbyte_ctx **rbc)
{
    int ind = *index;
    match_result status = mr_not_matched;
    spec *sp = ru->m_specs;
    regbyte_ctx *ctx = *rbc;

    /* for every specifier in the rule */
    while (sp)
    {
        int i, len, save_ind = ind;
        barray *array = NULL;

        if (satisfies_condition (sp->m_cond, ctx))
        {
            switch (sp->m_spec_type)
            {
            case st_identifier:
                barray_create (&array);
                if (array == NULL)
                {
                    free_regbyte_ctx_stack (ctx, *rbc);
                    return mr_internal_error;
                }

                status = match (di, text, &ind, sp->m_rule, &array, filtering_string, &ctx);

                if (status == mr_internal_error)
                {
                    free_regbyte_ctx_stack (ctx, *rbc);
                    barray_destroy (&array);
                    return mr_internal_error;
                }
                break;
            case st_string:
                len = str_length (sp->m_string);

                /* prefilter the stream */
                if (!filtering_string &&
di->m_string) 02326 { 02327 barray *ba; 02328 int filter_index = 0; 02329 match_result result; 02330 regbyte_ctx *null_ctx = NULL; 02331 02332 barray_create (&ba); 02333 if (ba == NULL) 02334 { 02335 free_regbyte_ctx_stack (ctx, *rbc); 02336 return mr_internal_error; 02337 } 02338 02339 result = match (di, text + ind, &filter_index, di->m_string, &ba, 1, &null_ctx); 02340 02341 if (result == mr_internal_error) 02342 { 02343 free_regbyte_ctx_stack (ctx, *rbc); 02344 barray_destroy (&ba); 02345 return mr_internal_error; 02346 } 02347 02348 if (result != mr_matched) 02349 { 02350 barray_destroy (&ba); 02351 status = mr_not_matched; 02352 break; 02353 } 02354 02355 barray_destroy (&ba); 02356 02357 if (filter_index != len || !str_equal_n (sp->m_string, text + ind, len)) 02358 { 02359 status = mr_not_matched; 02360 break; 02361 } 02362 02363 status = mr_matched; 02364 ind += len; 02365 } 02366 else 02367 { 02368 status = mr_matched; 02369 for (i = 0; status == mr_matched && i < len; i++) 02370 if (text[ind + i] != sp->m_string[i]) 02371 status = mr_not_matched; 02372 02373 if (status == mr_matched) 02374 ind += len; 02375 } 02376 break; 02377 case st_byte: 02378 status = text[ind] == *sp->m_byte ? mr_matched : mr_not_matched; 02379 if (status == mr_matched) 02380 ind++; 02381 break; 02382 case st_byte_range: 02383 status = (text[ind] >= sp->m_byte[0] && text[ind] <= sp->m_byte[1]) ? 02384 mr_matched : mr_not_matched; 02385 if (status == mr_matched) 02386 ind++; 02387 break; 02388 case st_true: 02389 status = mr_matched; 02390 break; 02391 case st_false: 02392 status = mr_not_matched; 02393 break; 02394 case st_debug: 02395 status = ru->m_oper == op_and ? 
mr_matched : mr_not_matched; 02396 break; 02397 case st_identifier_loop: 02398 barray_create (&array); 02399 if (array == NULL) 02400 { 02401 free_regbyte_ctx_stack (ctx, *rbc); 02402 return mr_internal_error; 02403 } 02404 02405 status = mr_dont_emit; 02406 for (;;) 02407 { 02408 match_result result; 02409 02410 save_ind = ind; 02411 result = match (di, text, &ind, sp->m_rule, &array, filtering_string, &ctx); 02412 02413 if (result == mr_error_raised) 02414 { 02415 status = result; 02416 break; 02417 } 02418 else if (result == mr_matched) 02419 { 02420 if (barray_push (ba, sp->m_emits, text[ind - 1], save_ind, &ctx) || 02421 barray_append (ba, &array)) 02422 { 02423 free_regbyte_ctx_stack (ctx, *rbc); 02424 barray_destroy (&array); 02425 return mr_internal_error; 02426 } 02427 barray_destroy (&array); 02428 barray_create (&array); 02429 if (array == NULL) 02430 { 02431 free_regbyte_ctx_stack (ctx, *rbc); 02432 return mr_internal_error; 02433 } 02434 } 02435 else if (result == mr_internal_error) 02436 { 02437 free_regbyte_ctx_stack (ctx, *rbc); 02438 barray_destroy (&array); 02439 return mr_internal_error; 02440 } 02441 else 02442 break; 02443 } 02444 break; 02445 } 02446 } 02447 else 02448 { 02449 status = mr_not_matched; 02450 } 02451 02452 if (status == mr_error_raised) 02453 { 02454 free_regbyte_ctx_stack (ctx, *rbc); 02455 barray_destroy (&array); 02456 02457 return mr_error_raised; 02458 } 02459 02460 if (ru->m_oper == op_and && status != mr_matched && status != mr_dont_emit) 02461 { 02462 free_regbyte_ctx_stack (ctx, *rbc); 02463 barray_destroy (&array); 02464 02465 if (sp->m_errtext) 02466 { 02467 set_last_error (sp->m_errtext->m_text, error_get_token (sp->m_errtext, di, text, 02468 ind), ind); 02469 02470 return mr_error_raised; 02471 } 02472 02473 return mr_not_matched; 02474 } 02475 02476 if (status == mr_matched) 02477 { 02478 if (sp->m_emits) 02479 if (barray_push (ba, sp->m_emits, text[ind - 1], save_ind, &ctx)) 02480 { 02481 free_regbyte_ctx_stack 
(ctx, *rbc); 02482 barray_destroy (&array); 02483 return mr_internal_error; 02484 } 02485 02486 if (array) 02487 if (barray_append (ba, &array)) 02488 { 02489 free_regbyte_ctx_stack (ctx, *rbc); 02490 barray_destroy (&array); 02491 return mr_internal_error; 02492 } 02493 } 02494 02495 barray_destroy (&array); 02496 02497 /* if the rule operator is a logical or, we pick up the first matching specifier */ 02498 if (ru->m_oper == op_or && (status == mr_matched || status == mr_dont_emit)) 02499 { 02500 *index = ind; 02501 *rbc = ctx; 02502 return mr_matched; 02503 } 02504 02505 sp = sp->next; 02506 } 02507 02508 /* everything went fine - all specifiers match up */ 02509 if (ru->m_oper == op_and && (status == mr_matched || status == mr_dont_emit)) 02510 { 02511 *index = ind; 02512 *rbc = ctx; 02513 return mr_matched; 02514 } 02515 02516 free_regbyte_ctx_stack (ctx, *rbc); 02517 return mr_not_matched; 02518 } 02519 02520 static match_result 02521 fast_match (dict *di, const byte *text, int *index, rule *ru, int *_PP, bytepool *_BP, 02522 int filtering_string, regbyte_ctx **rbc) 02523 { 02524 int ind = *index; 02525 int _P = filtering_string ? 0 : *_PP; 02526 int _P2; 02527 match_result status = mr_not_matched; 02528 spec *sp = ru->m_specs; 02529 regbyte_ctx *ctx = *rbc; 02530 02531 /* for every specifier in the rule */ 02532 while (sp) 02533 { 02534 int i, len, save_ind = ind; 02535 02536 _P2 = _P + (sp->m_emits ? 
emit_size (sp->m_emits) : 0);
        if (bytepool_reserve (_BP, _P2))
        {
            free_regbyte_ctx_stack (ctx, *rbc);
            return mr_internal_error;
        }

        if (satisfies_condition (sp->m_cond, ctx))
        {
            switch (sp->m_spec_type)
            {
            case st_identifier:
                status = fast_match (di, text, &ind, sp->m_rule, &_P2, _BP, filtering_string, &ctx);

                if (status == mr_internal_error)
                {
                    free_regbyte_ctx_stack (ctx, *rbc);
                    return mr_internal_error;
                }
                break;
            case st_string:
                len = str_length (sp->m_string);

                /* prefilter the stream */
                if (!filtering_string && di->m_string)
                {
                    int filter_index = 0;
                    match_result result;
                    regbyte_ctx *null_ctx = NULL;

                    result = fast_match (di, text + ind, &filter_index, di->m_string, NULL, _BP, 1, &null_ctx);

                    if (result == mr_internal_error)
                    {
                        free_regbyte_ctx_stack (ctx, *rbc);
                        return mr_internal_error;
                    }

                    if (result != mr_matched)
                    {
                        status = mr_not_matched;
                        break;
                    }

                    if (filter_index != len || !str_equal_n (sp->m_string, text + ind, len))
                    {
                        status = mr_not_matched;
                        break;
                    }

                    status = mr_matched;
                    ind += len;
                }
                else
                {
                    status = mr_matched;
                    for (i = 0; status == mr_matched && i < len; i++)
                        if (text[ind + i] != sp->m_string[i])
                            status = mr_not_matched;

                    if (status == mr_matched)
                        ind += len;
                }
                break;
            case st_byte:
                status = text[ind] == *sp->m_byte ? mr_matched : mr_not_matched;
                if (status == mr_matched)
                    ind++;
                break;
            case st_byte_range:
                status = (text[ind] >= sp->m_byte[0] && text[ind] <= sp->m_byte[1]) ?
                    mr_matched : mr_not_matched;
                if (status == mr_matched)
                    ind++;
                break;
            case st_true:
                status = mr_matched;
                break;
            case st_false:
                status = mr_not_matched;
                break;
            case st_debug:
                status = ru->m_oper == op_and ? mr_matched : mr_not_matched;
                break;
            case st_identifier_loop:
                status = mr_dont_emit;
                for (;;)
                {
                    match_result result;

                    save_ind = ind;
                    result = fast_match (di, text, &ind, sp->m_rule, &_P2, _BP, filtering_string, &ctx);

                    if (result == mr_error_raised)
                    {
                        status = result;
                        break;
                    }
                    else if (result == mr_matched)
                    {
                        if (!filtering_string)
                        {
                            if (sp->m_emits != NULL)
                            {
                                if (emit_push (sp->m_emits, _BP->_F + _P, text[ind - 1], save_ind, &ctx))
                                {
                                    free_regbyte_ctx_stack (ctx, *rbc);
                                    return mr_internal_error;
                                }
                            }

                            _P = _P2;
                            _P2 += sp->m_emits ? emit_size (sp->m_emits) : 0;
                            if (bytepool_reserve (_BP, _P2))
                            {
                                free_regbyte_ctx_stack (ctx, *rbc);
                                return mr_internal_error;
                            }
                        }
                    }
                    else if (result == mr_internal_error)
                    {
                        free_regbyte_ctx_stack (ctx, *rbc);
                        return mr_internal_error;
                    }
                    else
                        break;
                }
                break;
            }
        }
        else
        {
            status = mr_not_matched;
        }

        if (status == mr_error_raised)
        {
            free_regbyte_ctx_stack (ctx, *rbc);

            return mr_error_raised;
        }

        if (ru->m_oper == op_and && status != mr_matched && status != mr_dont_emit)
        {
            free_regbyte_ctx_stack (ctx, *rbc);

            if (sp->m_errtext)
            {
                set_last_error (sp->m_errtext->m_text, error_get_token (sp->m_errtext, di, text,
                    ind), ind);

                return mr_error_raised;
            }

            return mr_not_matched;
        }

        if (status == mr_matched)
        {
            if (sp->m_emits !=
NULL) {
                const byte ch = (ind <= 0) ? 0 : text[ind - 1];
                if (emit_push (sp->m_emits, _BP->_F + _P, ch, save_ind, &ctx))
                {
                    free_regbyte_ctx_stack (ctx, *rbc);
                    return mr_internal_error;
                }

            }
            _P = _P2;
        }

        /* if the rule operator is a logical or, we pick up the first matching specifier */
        if (ru->m_oper == op_or && (status == mr_matched || status == mr_dont_emit))
        {
            *index = ind;
            *rbc = ctx;
            if (!filtering_string)
                *_PP = _P;
            return mr_matched;
        }

        sp = sp->next;
    }

    /* everything went fine - all specifiers match up */
    if (ru->m_oper == op_and && (status == mr_matched || status == mr_dont_emit))
    {
        *index = ind;
        *rbc = ctx;
        if (!filtering_string)
            *_PP = _P;
        return mr_matched;
    }

    free_regbyte_ctx_stack (ctx, *rbc);
    return mr_not_matched;
}

static byte *
error_get_token (error *er, dict *di, const byte *text, int ind)
{
    byte *str = NULL;

    if (er->m_token)
    {
        barray *ba;
        int filter_index = 0;
        regbyte_ctx *ctx = NULL;

        barray_create (&ba);
        if (ba != NULL)
        {
            if (match (di, text + ind, &filter_index, er->m_token, &ba, 0, &ctx) == mr_matched &&
                filter_index)
            {
                str = (byte *) mem_alloc (filter_index + 1);
                if (str != NULL)
                {
                    str_copy_n (str, text + ind, filter_index);
                    str[filter_index] = '\0';
                }
            }
            barray_destroy (&ba);
        }
    }

    return str;
}

typedef struct grammar_load_state_
{
    dict *di;
    byte *syntax_symbol;
    byte *string_symbol;
    map_str *maps;
    map_byte *mapb;
    map_rule *mapr;
} grammar_load_state;

static void grammar_load_state_create (grammar_load_state **gr)
{
    *gr = (grammar_load_state *) mem_alloc (sizeof
(grammar_load_state));
    if (*gr)
    {
        (**gr).di = NULL;
        (**gr).syntax_symbol = NULL;
        (**gr).string_symbol = NULL;
        (**gr).maps = NULL;
        (**gr).mapb = NULL;
        (**gr).mapr = NULL;
    }
}

static void grammar_load_state_destroy (grammar_load_state **gr)
{
    if (*gr)
    {
        dict_destroy (&(**gr).di);
        mem_free ((void **) &(**gr).syntax_symbol);
        mem_free ((void **) &(**gr).string_symbol);
        map_str_destroy (&(**gr).maps);
        map_byte_destroy (&(**gr).mapb);
        map_rule_destroy (&(**gr).mapr);
        mem_free ((void **) gr);
    }
}


static void error_msg(int line, const char *msg)
{
    fprintf(stderr, "Error in grammar_load_from_text() at line %d: %s\n", line, msg);
}


/*
    the API
*/
grammar grammar_load_from_text (const byte *text)
{
    grammar_load_state *g = NULL;
    grammar id = 0;

    clear_last_error ();

    grammar_load_state_create (&g);
    if (g == NULL) {
        error_msg(__LINE__, "");
        return 0;
    }

    dict_create (&g->di);
    if (g->di == NULL)
    {
        grammar_load_state_destroy (&g);
        error_msg(__LINE__, "");
        return 0;
    }

    eat_spaces (&text);

    /* skip ".syntax" keyword */
    text += 7;
    eat_spaces (&text);

    /* retrieve root symbol */
    if (get_identifier (&text, &g->syntax_symbol))
    {
        grammar_load_state_destroy (&g);
        error_msg(__LINE__, "");
        return 0;
    }
    eat_spaces (&text);

    /* skip semicolon */
    text++;
    eat_spaces (&text);

    while (*text)
    {
        byte *symbol = NULL;
        int is_dot = *text == '.';

        if (is_dot)
            text++;

        if (get_identifier (&text, &symbol))
        {
            grammar_load_state_destroy (&g);
            error_msg(__LINE__, "");
            return 0;
        }
        eat_spaces
(&text);

        /* .emtcode */
        if (is_dot && str_equal (symbol, (byte *) "emtcode"))
        {
            map_byte *ma = NULL;

            mem_free ((void **) (void *) &symbol);

            if (get_emtcode (&text, &ma))
            {
                grammar_load_state_destroy (&g);
                error_msg(__LINE__, "");
                return 0;
            }

            map_byte_append (&g->mapb, ma);
        }
        /* .regbyte */
        else if (is_dot && str_equal (symbol, (byte *) "regbyte"))
        {
            map_byte *ma = NULL;

            mem_free ((void **) (void *) &symbol);

            if (get_regbyte (&text, &ma))
            {
                grammar_load_state_destroy (&g);
                error_msg(__LINE__, "");
                return 0;
            }

            map_byte_append (&g->di->m_regbytes, ma);
        }
        /* .errtext */
        else if (is_dot && str_equal (symbol, (byte *) "errtext"))
        {
            map_str *ma = NULL;

            mem_free ((void **) (void *) &symbol);

            if (get_errtext (&text, &ma))
            {
                grammar_load_state_destroy (&g);
                error_msg(__LINE__, "");
                return 0;
            }

            map_str_append (&g->maps, ma);
        }
        /* .string */
        else if (is_dot && str_equal (symbol, (byte *) "string"))
        {
            mem_free ((void **) (void *) &symbol);

            if (g->di->m_string != NULL)
            {
                grammar_load_state_destroy (&g);
                error_msg(__LINE__, "");
                return 0;
            }

            if (get_identifier (&text, &g->string_symbol))
            {
                grammar_load_state_destroy (&g);
                error_msg(__LINE__, "");
                return 0;
            }

            /* skip semicolon */
            eat_spaces (&text);
            text++;
            eat_spaces (&text);
        }
        else
        {
            rule *ru = NULL;
            map_rule *ma = NULL;

            if (get_rule (&text, &ru, g->maps, g->mapb))
            {
                grammar_load_state_destroy (&g);
                error_msg(__LINE__, "");
                return 0;
            }

            rule_append (&g->di->m_rulez, ru);

            /* if a rule consists of only one
specifier, give it an ".and" operator */
            if (ru->m_oper == op_none)
                ru->m_oper = op_and;

            map_rule_create (&ma);
            if (ma == NULL)
            {
                grammar_load_state_destroy (&g);
                error_msg(__LINE__, "");
                return 0;
            }

            ma->key = symbol;
            ma->data = ru;
            map_rule_append (&g->mapr, ma);
        }
    }

    if (update_dependencies (g->di, g->mapr, &g->syntax_symbol, &g->string_symbol,
        g->di->m_regbytes))
    {
        grammar_load_state_destroy (&g);
        error_msg(__LINE__, "update_dependencies() failed");
        return 0;
    }

    dict_append (&g_dicts, g->di);
    id = g->di->m_id;
    g->di = NULL;

    grammar_load_state_destroy (&g);

    return id;
}

int grammar_set_reg8 (grammar id, const byte *name, byte value)
{
    dict *di = NULL;
    map_byte *reg = NULL;

    clear_last_error ();

    dict_find (&g_dicts, id, &di);
    if (di == NULL)
    {
        set_last_error (INVALID_GRAMMAR_ID, NULL, -1);
        return 0;
    }

    reg = map_byte_locate (&di->m_regbytes, name);
    if (reg == NULL)
    {
        set_last_error (INVALID_REGISTER_NAME, str_duplicate (name), -1);
        return 0;
    }

    reg->data = value;
    return 1;
}

/*
    internal checking function used by both grammar_check and grammar_fast_check functions
*/
static int _grammar_check (grammar id, const byte *text, byte **prod, unsigned int *size,
    unsigned int estimate_prod_size, int use_fast_path)
{
    dict *di = NULL;
    int index = 0;

    clear_last_error ();

    dict_find (&g_dicts, id, &di);
    if (di == NULL)
    {
        set_last_error (INVALID_GRAMMAR_ID, NULL, -1);
        return 0;
    }

    *prod = NULL;
    *size = 0;

    if (use_fast_path)
    {
        regbyte_ctx *rbc = NULL;
        bytepool *bp = NULL;
        int _P = 0;
        bytepool_create (&bp, estimate_prod_size);
        if (bp == NULL)
            return 0;

        if (fast_match (di, text, &index, di->m_syntax, &_P, bp, 0, &rbc) != mr_matched)
        {
            bytepool_destroy (&bp);
            free_regbyte_ctx_stack (rbc, NULL);
            return 0;
        }

        free_regbyte_ctx_stack (rbc, NULL);

        *prod = bp->_F;
        *size = _P;
        bp->_F = NULL;
        bytepool_destroy (&bp);
    }
    else
    {
        regbyte_ctx *rbc = NULL;
        barray *ba = NULL;

        barray_create (&ba);
        if (ba == NULL)
            return 0;

        if (match (di, text, &index, di->m_syntax, &ba, 0, &rbc) != mr_matched)
        {
            barray_destroy (&ba);
            free_regbyte_ctx_stack (rbc, NULL);
            return 0;
        }

        free_regbyte_ctx_stack (rbc, NULL);

        *prod = (byte *) mem_alloc (ba->len * sizeof (byte));
        if (*prod == NULL)
        {
            barray_destroy (&ba);
            return 0;
        }

        mem_copy (*prod, ba->data, ba->len * sizeof (byte));
        *size = ba->len;
        barray_destroy (&ba);
    }

    return 1;
}

int grammar_check (grammar id, const byte *text, byte **prod, unsigned int *size)
{
    return _grammar_check (id, text, prod, size, 0, 0);
}

int grammar_fast_check (grammar id, const byte *text, byte **prod, unsigned int *size,
    unsigned int estimate_prod_size)
{
    return _grammar_check (id, text, prod, size, estimate_prod_size, 1);
}

int grammar_destroy (grammar id)
{
    dict **di = &g_dicts;

    clear_last_error ();

    while (*di != NULL)
    {
        if ((**di).m_id == id)
        {
            dict *tmp = *di;
            *di = (**di).next;
            dict_destroy (&tmp);
            return 1;
        }

        di = &(**di).next;
    }

    set_last_error (INVALID_GRAMMAR_ID, NULL, -1);
    return 0;
}

static void append_character (const char x, byte
*text, int *dots_made, int *len, int size)
{
    if (*dots_made == 0)
    {
        if (*len < size - 1)
        {
            text[(*len)++] = x;
            text[*len] = '\0';
        }
        else
        {
            int i;
            for (i = 0; i < 3; i++)
                if (--(*len) >= 0)
                    text[*len] = '.';
            *dots_made = 1;
        }
    }
}

void grammar_get_last_error (byte *text, unsigned int size, int *pos)
{
    int len = 0, dots_made = 0;
    const byte *p = error_message;

    *text = '\0';

    if (p)
    {
        while (*p)
        {
            if (*p == '$')
            {
                const byte *r = error_param;

                while (*r)
                {
                    append_character (*r++, text, &dots_made, &len, (int) size);
                }

                p++;
            }
            else
            {
                append_character (*p++, text, &dots_made, &len, size);
            }
        }
    }

    *pos = error_position;
}

Generated on Fri May 25 2012 04:18:45 for ReactOS by Doxygen 1.7.6.1
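The way grammar_get_last_error expands `$` and truncates an over-long message can be tried in isolation. The following is a minimal sketch, not part of grammar.c: it assumes `byte` is `unsigned char`, and `format_error` is a hypothetical stand-in that takes the message and parameter as arguments instead of reading the module's static `error_message` and `error_param`. Only `append_character` is copied from the listing.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char byte; /* assumption: an unsigned 8-bit type, as the name suggests */

/* Copied from the listing: append x while room remains; on the first overflow,
   overwrite the last three stored characters with dots and stop appending. */
static void append_character (const char x, byte *text, int *dots_made, int *len, int size)
{
    if (*dots_made == 0)
    {
        if (*len < size - 1)
        {
            text[(*len)++] = x;
            text[*len] = '\0';
        }
        else
        {
            int i;
            for (i = 0; i < 3; i++)
                if (--(*len) >= 0)
                    text[*len] = '.';
            *dots_made = 1;
        }
    }
}

/* Hypothetical helper: same loop as grammar_get_last_error, but with the
   message template and its '$' parameter passed in explicitly. */
static void format_error (const char *message, const char *param, byte *text, unsigned int size)
{
    int len = 0, dots_made = 0;
    const char *p = message;

    *text = '\0';

    while (*p)
    {
        if (*p == '$')
        {
            /* substitute the parameter for the '$' placeholder */
            const char *r = param;

            while (*r)
                append_character (*r++, text, &dots_made, &len, (int) size);

            p++;
        }
        else
        {
            append_character (*p++, text, &dots_made, &len, (int) size);
        }
    }
}
```

With a 64-byte buffer, the template `unexpected token '$'` and the parameter `foo` give `unexpected token 'foo'`. With an 8-byte buffer the same call gives `unex...`: seven characters fit, and the last three of them are replaced by dots once the next character no longer fits.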