Problem

I'm currently working on a toy language that works like this: one can embed blocks written in this language into a C++ source, and before compilation, these blocks are translated into C++ in an extra preprocessing step, producing a valid C++ source.

I want to make sure that these blocks can always be identified in the source unambiguously and, moreover, that whenever such a block is present in the source, the source cannot be valid C++. I also want to achieve this while placing as few constraints on the embedded language as possible (the language itself is still somewhat fluid).

The obvious way would be to introduce a pair of special multi-character parentheses, made of characters that cannot appear together in valid C++ code (or in the embedded language). However, I'm not sure how to make sure that a particular character sequence is good for this purpose (not after GotW #78, anyway (: ).

So what is a good way to escape these blocks?


Solution

If your compiler can be made to accept the C++11 standard, you could use raw string literals, e.g.:

  std::cout << R"*(<!DOCTYPE html>
       <html>
       <head>
       <title>Title with a backslash \ here 
     and double " quote</title>)*";

Hence, inside a raw string literal there is no forbidden character sequence: any sequence of characters can appear except the closing one, )*" in this example, and you choose that closing sequence yourself through the delimiter (here *) placed between the opening quote and the parenthesis.


You could also use #{ and }#, as I do in MELT macro-strings. MELT is a Lisp-like domain-specific language for extending GCC, and you can embed C code in it with, e.g.:

(code_chunk hellocount_chk
            #{ /* $HELLOCOUNT_CHK chunk */ 
                 static int $HELLOCOUNT_CHK#_counter; 
                 $HELLOCOUNT_CHK#_counter++;
               $HELLOCOUNT_CHK#_lab:
                 printf ("Hello World, counted %d\n", 
                         $HELLOCOUNT_CHK#_counter);
                 if (random() % 4 == 0) goto $HELLOCOUNT_CHK#_lab;
            }#)

The #{ and }# delimit macro-strings (these character sequences are unlikely to appear in C or C++ code, except in string literals and comments). Inside a macro-string, $ starts a symbol, which extends up to a non-letter or # character.

Using #{ and }# is not fool-proof (e.g. because of raw string literals), but it is good enough: a cooperative user can manage to avoid them.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow