Exercise 1-24 from K&R - Rudimentary Syntax Checking

https://stackoverflow.com/questions/7018866

23-12-2020
|

Вопрос

The exercise reads "Write a program to check a C program for rudimentary syntax errors like unbalanced parentheses, brackets, and braces. Don't forget about quotes, both single and double, escape sequences, and comments."

I chose to go about solving the problem by putting parentheses, brackets, and braces on a stack and making sure everything was LIFO along with various counters for marking whether we're in a comment, quote, etc.

The issue is that I feel my code, although it works, is poorly structured and not particularly idiomatic. I tried implementing the state variables (the stack, escaped, inString, etc.) within a struct and breaking apart the tests into subroutines. It didn't help much. Is there a way to solve this problem in a cleaner way while still handling escaped characters and the like correctly?

#include <stdio.h>
#include <stdlib.h>
#define INITIALSTACK 8
#define FALSE 0
#define TRUE 1

typedef struct {
  int position;
  int maxLength;
  char* array;
} stack;

int match(char, char);

stack create();
void delete(stack*);
void push(stack*, char);
char pop(stack*);

int main() {
  char c, out;
  stack elemStack = create();

  int escaped, inString, inChar, inComment, startComment, i, lineNum;
  int returnValue;

  escaped = inString = inChar = inComment = startComment = 0;
  lineNum = 1;

  while ((c = getchar()) != EOF) {
    if (c == '\n')
      lineNum++;

    /* Test if in escaped state or for escape character */
    if (escaped) {
      escaped = FALSE;
    }
    else if (c == '\\') {
      escaped = TRUE;
    }

    /* Test if currently in double/single quote or a comment */
    else if (inString) {
      if (c == '"' && !escaped) {
        inString = FALSE;
      }
    }
    else if (inChar) {
      if (escaped)
        escaped = FALSE;
      else if (c == '\'' && !escaped) {
        inChar = FALSE;
      }
    }
    else if (inComment) {
      if (c == '*')
        startComment = TRUE;
      else if (c == '/' && startComment)
        inComment = FALSE;
      else
        startComment = FALSE;
    }

    /* Test if we should be starting a comment, quote, or escaped character */
    else if (c == '*' && startComment)
      inComment = TRUE;
    else if (c == '/')
      startComment = TRUE;
    else if (c == '"') {
      inString = TRUE;
    }
    else if (c == '\'') {
      inChar = TRUE;
    }

    /* Accept the character and check braces on the stack */
    else {
      startComment = FALSE;

      if (c == '(' || c == '[' || c == '{')
        push(&elemStack, c);
      else if (c == ')' || c == ']' || c == '}') {
        out = pop(&elemStack);
        if (out == -1 || !match(out, c)) {
          printf("Syntax error on line %d: %c matched with %c\n.", lineNum, out, c);
          return -1;
        }
      }
    }
  }

  if (inString || inChar) {
    printf("Syntax error: Quote not terminated by end of file.\n");
    returnValue = -1;
  }
  else if (!elemStack.position) {
    printf("Syntax check passed on %d line(s).\n", lineNum);
    returnValue = 0;
  }
  else {
    printf("Syntax error: Reached end of file with %d unmatched elements.\n  ",
           elemStack.position);
    for(i = 0; i < elemStack.position; ++i)
      printf(" %c", elemStack.array[i]);
    printf("\n");
    returnValue = -1;
  }

  delete(&elemStack);
  return returnValue;
}

int match(char left, char right) {
  return ((left == '{' && right == '}') ||
          (left == '(' && right == ')') ||
          (left == '[' && right == ']'));
}

stack create() {
  stack newStack;
  newStack.array = malloc(INITIALSTACK * sizeof(char));
  newStack.maxLength = INITIALSTACK;
  newStack.position = 0;
  return newStack;
}

void delete(stack* stack) {
  free(stack -> array);
  stack -> array = NULL;
}

void push(stack* stack, char elem) {
  if (stack -> position >= stack -> maxLength) {
    char* newArray = malloc(2 * (stack -> maxLength) * sizeof(char));
    int i;

    for (i = 0; i < stack -> maxLength; ++i)
      newArray[i] = stack -> array[i];

    free(stack -> array);
    stack -> array = newArray;
  }

  stack -> array[stack -> position] = elem;
  (stack -> position)++;
}

char pop(stack* stack) {
  if (!(stack -> position)) {
    printf("Pop attempted on empty stack.\n");
    return -1;
  }
  else {
    (stack -> position)--;
    return stack -> array[stack -> position];
  }
}

Решение

Your solution is not that bad. It is very straight forward, which is a good thing. To learn a bit more from this excercise, I would probably implement this with a state machine. E.g. you have a few states like: code, comment, string etc.. then you define transitions between them. It gets much easier because you end up with logic depending on the state (so you don't have a blob of code, like in your main function). After that you can parse your code depending on the state. This means for example: If you're in a comment state, you ignore everything until you encounter an ending comment character. Then you change the state to code for example, and so forth.

In pseudo code it could look like this:

current_state = CODE

while(...) {

   switch(current_state) {
      case CODE:
         if(input == COMMENT_START) {
            current_state = COMMENT
            break
         }

         if(input == STRING_START) {
            current_state = STRING
            break
         }

         // handle your {, [, ( stuff...

         break

      case COMMENT:
         if(input == COMMENT_END) {
            current_state = CODE
            break
         }

         // handle comment.. i.e. ignore everything

         break
      case STRING:
         // ... string stuff like above with state transitions..
         break
   }

}

Of course this can be done with e.g. yacc. But as I stated in a comment, I wouldn't suggest you use that. Maybe you could do that if you have enough time and want to learn as much as possible, but first I would implement it "the hard way".

Другие советы

I would probably approach this quite differently, by making use of a parser generator, like yacc, combined with a lexer generator, like lex.

You could base yourself on existing input files for these tools, for ANSI C. This lex specification and yacc grammar eg. can be a starting point. Alternatively, K&R contains a yacc compatible C grammar too in appendix A, or you could of course work directly with the grammar in the C standard.

For this exercise, you would only use those parts of the grammar that are of interest to you, and ignore the rest. The grammar will ensure that the syntax is correct (all braces matched etc.), and lex/yacc will take care of all the code generation. That leaves you with only having to specify some glue code, which will mostly be error messages in this case.

It will be a complete re-write of your code, but will probably give you a better understanding of the C grammar, and at the very least, you'll have learned to work with the great tools lex/yacc, which never hurts.

Лицензировано под: CC-BY-SA с атрибуция

Не связан с StackOverflow