How can I parse indents and dedents with pyparsing?

https://stackoverflow.com/questions/1547944

20-09-2019

Question

Here is a subset of the Python grammar:

single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE

stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE

small_stmt: pass_stmt
pass_stmt: 'pass'

compound_stmt: if_stmt
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]

suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

(You can read the full grammar in the Python SVN repository: http://svn.python.org/.../Grammar )

I am trying to use this grammar to generate a parser for Python, in Python. What I am having trouble with is how to express the INDENT and DEDENT tokens as pyparsing objects.
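For reference, the NEWLINE, INDENT and DEDENT tokens named in the grammar can be observed with the standard library's tokenize module. The sketch below is only meant to show where those tokens fall in a small input. It is not a pyparsing solution, and the sample source and the helper name layout_tokens are made up for illustration.

```python
import io
import tokenize

# Hypothetical sample input: one `if` statement with an indented suite.
SOURCE = "if x:\n    pass\n"


def layout_tokens(source):
    """Return the names of the layout tokens (NEWLINE, INDENT, DEDENT) in order."""
    wanted = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted
    ]


print(layout_tokens(SOURCE))
```

For this input the tokenizer reports NEWLINE, INDENT, NEWLINE, DEDENT, which matches the shape of the `suite: NEWLINE INDENT stmt+ DEDENT` alternative (the second NEWLINE ends the `pass` statement).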

Here is how I have implemented the other terminals:

import pyparsing as p

string_start = (p.Literal('"""') | "'''" | '"' | "'")
string_token = ('\\' + p.CharsNotIn("",exact=1) | p.CharsNotIn('\\',exact=1))
string_end = p.matchPreviousExpr(string_start)

terminals = {
    'NEWLINE': p.Literal('\n').setWhitespaceChars(' \t')
        .setName('NEWLINE').setParseAction(terminal_action('NEWLINE')),
    'ENDMARKER': p.stringEnd.copy().setWhitespaceChars(' \t')
        .setName('ENDMARKER').setParseAction(terminal_action('ENDMARKER')),
    'NAME': (p.Word(p.alphas + "_", p.alphanums + "_", asKeyword=True))
        .setName('NAME').setParseAction(terminal_action('NAME')),
    'NUMBER': p.Combine(
            p.Word(p.nums) + p.CaselessLiteral("l") |
            (p.Word(p.nums) + p.Optional("." + p.Optional(p.Word(p.nums))) | "." + p.Word(p.nums)) +
                p.Optional(p.CaselessLiteral("e") + p.Optional(p.Literal("+") | "-") + p.Word(p.nums)) +
                p.Optional(p.CaselessLiteral("j"))
        ).setName('NUMBER').setParseAction(terminal_action('NUMBER')),
    'STRING': p.Combine(
            p.Optional(p.CaselessLiteral('u')) +
            p.Optional(p.CaselessLiteral('r')) +
            string_start + p.ZeroOrMore(~string_end + string_token) + string_end
        ).setName('STRING').setParseAction(terminal_action('STRING')),

    # I can't find a good way of parsing indents/dedents.
    # The Grammar just has the tokens NEWLINE, INDENT and DEDENT scattered across the rules.
    # A single NEWLINE would be translated to NEWLINE + PEER (from pyparsing.indentedBlock()), unless followed by INDENT or DEDENT.
    # That NEWLINE and IN/DEDENT could be split across rule boundaries (see the 'suite' rule).
    'INDENT': (p.LineStart() + p.Optional(p.Word(' '))).setName('INDENT'),
    'DEDENT': (p.LineStart() + p.Optional(p.Word(' '))).setName('DEDENT')
}

terminal_action is a function that returns the corresponding parse action, depending on its arguments.

I am aware of the pyparsing.indentedBlock helper function, but I can't figure out how to adapt it to a grammar that has no PEER token.

(Look at the pyparsing source code to see what I am talking about.)
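To make the PEER idea concrete: as far as I can tell from the pyparsing source, indentedBlock compares the column of each statement with the top of an indentation stack, and PEER is the check that a statement starts in the same column as the current block. The following is a rough stdlib-only sketch of that bookkeeping, not pyparsing's actual code. The function name classify_indentation and the sample input are hypothetical.

```python
def classify_indentation(source):
    """Label each non-blank line as INDENT, PEER or DEDENT using a column stack."""
    stack = [1]  # 1-based columns of the enclosing blocks; column 1 is the top level
    labels = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        column = len(line) - len(line.lstrip(" ")) + 1
        if column > stack[-1]:
            # Deeper than the current block: a new block starts here.
            stack.append(column)
            labels.append("INDENT")
        elif column == stack[-1]:
            # Same column as the current block: a sibling statement.
            labels.append("PEER")
        else:
            # Shallower: close blocks until a matching column is found.
            while column < stack[-1]:
                stack.pop()
            if column != stack[-1]:
                raise ValueError("inconsistent dedent: %r" % line)
            labels.append("DEDENT")
    return labels


SAMPLE = "if a:\n    pass\n    pass\npass\n"
print(classify_indentation(SAMPLE))
```

The Python grammar has no PEER token because a sibling statement is simply the next `stmt` after a NEWLINE; only the transitions to a deeper or shallower column get tokens of their own.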

You can see my full source code here: http://pastebin.ca/1609860

Solution

There are a couple of examples on the pyparsing wiki Examples page that might give you some ideas.

To use pyparsing's indentedBlock, I think you would define suite as:

from pyparsing import indentedBlock

indentstack = [1]  # stack of indentation columns; column 1 is the top level
suite = indentedBlock(stmt, indentstack, True)

Note that indentedGrammarExample.py pre-dates the inclusion of indentedBlock in pyparsing, so it does its own implementation of indent parsing.
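As a rough illustration of the indentedBlock suggestion, here is a minimal sketch that wires it into a cut-down version of the question's grammar. It makes several simplifying assumptions: `test` is reduced to a bare name, `elif`/`else` and the single-line form of `suite` are left out, and the helper flatten and the sample input are hypothetical. It uses the camelCase pyparsing names that appear in the question.

```python
import pyparsing as p

# One shared stack of indentation columns; column 1 is the top level.
indent_stack = [1]

stmt = p.Forward()
pass_stmt = p.Keyword("pass")
# Assumption: a bare name stands in for the grammar's full `test` rule.
test = p.Word(p.alphas + "_", p.alphanums + "_")
# indentedBlock checks the indentation itself, so no explicit
# NEWLINE / INDENT / DEDENT terminals appear in this rule.
suite = p.indentedBlock(stmt, indent_stack, True)
if_stmt = p.Group(p.Keyword("if") + test + p.Suppress(":") + suite)
stmt << (if_stmt | pass_stmt)
program = p.OneOrMore(stmt)


def flatten(items):
    """Flatten nested lists so the matched tokens can be read in source order."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


SAMPLE = "if a:\n    pass\n    if b:\n        pass\npass\n"
parsed = program.parseString(SAMPLE, parseAll=True)
print(flatten(parsed.asList()))
```

The nesting of the parse results follows the indentation, with each block returned as its own group. The sketch flattens them only to show that every statement, including the final top-level `pass` after a two-level dedent, was consumed.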

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow