Commit de1ffa74 authored by Igor Dejanovic

Initial import.

syntax: glob
*pyc
*orig
*bak
Arpeggio.egg-info
Arpeggio - Parser interpreter based on PEG grammars
Author: Igor R. Dejanović <igor DOT dejanovic AT gmail DOT com>
Changelog for Arpeggio
Author: Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
Copyright: (c) Igor R. Dejanovic, 2009
Licence: MIT Licence
2009-09-15 - Initial release (v0.1-dev)
Implemented features:
- Basic error reporting.
- Basic support for comments handling (needs refactoring)
- Raw parse tree.
- Support for semantic actions with the ability to transform the parse
tree to a semantic representation - aka Abstract Semantic Graphs (see examples).
Arpeggio is released under the terms of the MIT License
-------------------------------------------------------
Copyright (c) 2009 Igor R. Dejanović <igor DOT dejanovic AT gmail DOT com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Arpeggio - Packrat parser interpreter
=====================================
Arpeggio is a parser interpreter based on PEG grammars, implemented as a recursive descent
parser with memoization (aka packrat parser).
Arpeggio is part of a research project whose main goal is to build an environment for DSL development.
The main domain of application is IDEs for DSL development, but it can be used for all
sorts of general-purpose parsing.
Some essential planned/done features are error reporting and error recovery, as well
as access to the raw parse tree in order to support syntax highlighting and
other nice features of today's IDEs.
For more information on PEG and packrat parsers see:
http://pdos.csail.mit.edu/~baford/packrat/
http://pdos.csail.mit.edu/~baford/packrat/thesis/
http://en.wikipedia.org/wiki/Parsing_expression_grammar
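The memoization mentioned above can be illustrated with a small sketch. This is a toy
written for this README, not Arpeggio's implementation: the class ToyPackrat and the rule
functions digits and number are hypothetical names. Each (rule, position) pair is
evaluated at most once and later requests are served from a cache.

```python
# Toy packrat sketch (hypothetical, not Arpeggio's code): every
# (rule, position) pair is evaluated at most once and then cached.

class ToyPackrat(object):
    def __init__(self, text):
        self.text = text
        self.memo = {}   # (rule name, position) -> end position or None
        self.calls = 0   # number of rule evaluations that were not cached

    def apply(self, rule, pos):
        key = (rule.__name__, pos)
        if key not in self.memo:
            self.calls += 1
            self.memo[key] = rule(self, pos)
        return self.memo[key]


def digits(p, pos):
    # digits <- [0-9]+
    end = pos
    while end < len(p.text) and p.text[end].isdigit():
        end += 1
    return end if end > pos else None


def number(p, pos):
    # number <- digits '.' digits / digits
    end = p.apply(digits, pos)
    if end is None:
        return None
    if end < len(p.text) and p.text[end] == '.':
        frac_end = p.apply(digits, end + 1)
        if frac_end is not None:
            return frac_end
    # The second alternative reuses the cached result of digits at pos.
    return p.apply(digits, pos)


parser = ToyPackrat("42")
end = parser.apply(number, 0)
```

For the input "42" the first alternative fails after the integer part, and the second
alternative then gets digits at position 0 from the cache instead of scanning the text again.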
INSTALLATION
------------
Arpeggio is written in the Python programming language and distributed with setuptools support.
Install it with the following command:
    python setup.py install
After installation you should be able to import the arpeggio Python module with:
    import arpeggio
There is no documentation at the moment. See the examples for some ideas of how it can
be used.
OVERVIEW
--------
Here is a basic explanation of how Arpeggio works and the definition of some terms
used in the Arpeggio project.
The language grammar is specified using PEG's textual notation (similar to EBNF) or
Python language constructs (lists, tuples, functions...). This grammar representation,
whether in textual or Python form, is referred to as "the parser model".
The parser is constructed out of the parser model.
The parser is a tree of Python objects where each object is an instance of a class
that represents a parsing expression from PEG (e.g. Sequence, OrderedChoice, ZeroOrMore).
This tree is referred to as "the parser model tree".
This design choice requires some upfront work during the initialization phase, so Arpeggio
may not be well suited for one-shot parsing where the parser needs to be initialized
every time parsing is performed and speed is of the utmost importance.
Arpeggio is designed to be used in integrated development environments where the parser
is constructed once (usually during IDE start-up) and used many times.
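The idea of a parser built as a tree of expression objects can be sketched as follows.
This is a minimal illustration and not Arpeggio's code: the classes Lit, Seq, Choice and
Star are hypothetical stand-ins for a string match, Sequence, OrderedChoice and
ZeroOrMore. The tree is built once and then reused for every input.

```python
# Toy parsing expressions (hypothetical, not Arpeggio's classes).
# Each parse() takes the text and a position and returns
# (parse tree, new position) on success or None on failure.

class Lit(object):
    """Matches a literal string (a terminal)."""
    def __init__(self, s):
        self.s = s

    def parse(self, text, pos):
        if text.startswith(self.s, pos):
            return self.s, pos + len(self.s)
        return None


class Seq(object):
    """Sequence: all sub-expressions must match in order."""
    def __init__(self, *nodes):
        self.nodes = nodes

    def parse(self, text, pos):
        children = []
        for node in self.nodes:
            result = node.parse(text, pos)
            if result is None:
                return None
            tree, pos = result
            children.append(tree)
        return children, pos


class Choice(object):
    """Ordered choice: the first matching alternative wins."""
    def __init__(self, *nodes):
        self.nodes = nodes

    def parse(self, text, pos):
        for node in self.nodes:
            result = node.parse(text, pos)
            if result is not None:
                return result
        return None


class Star(object):
    """Zero or more repetitions of a sub-expression."""
    def __init__(self, node):
        self.node = node

    def parse(self, text, pos):
        children = []
        while True:
            result = self.node.parse(text, pos)
            # Stop on failure or on an empty match (avoids looping forever).
            if result is None or result[1] == pos:
                return children, pos
            tree, pos = result
            children.append(tree)


# Built once (the upfront cost) ...
digit = Choice(*[Lit(d) for d in "0123456789"])
sum_model = Seq(digit, Star(Seq(Choice(Lit("+"), Lit("-")), digit)))

# ... and reused for many inputs.
tree, end = sum_model.parse("1+2-3", 0)
```

The nested lists returned by parse() play the role of a raw parse tree whose shape follows
the structure of the model.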
Once constructed, the parser can be used to transform input text to a tree
representation whose structure must adhere to the parser model.
This tree representation is called "the parse tree".
After construction of the parse tree it is possible to construct an Abstract Syntax Tree or,
more generally, an Abstract Semantic Graph (ASG) using semantic actions.
The ASG is constructed using a two-pass bottom-up walk of the parse tree.
An ASG generally has a graph structure, but it can be any specialization of it
(a tree or just a single node - see calc.py for an example of an ASG constructed as
a single node/value).
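A bottom-up walk with semantic actions can be sketched roughly like this. It is a
simplified illustration under stated assumptions, not Arpeggio's implementation: the
function walk, the tuple-based tree layout and the actions dictionary are hypothetical,
and only a single pass is shown (the second pass is omitted).

```python
# Toy bottom-up walk (hypothetical, not Arpeggio's implementation).
# A non-terminal is (rule_name, [children]); a terminal is a plain string.

def walk(node, actions):
    if isinstance(node, str):
        return node   # terminals are passed through unchanged
    rule_name, children = node
    # Children are reduced first, so an action only sees finished results.
    results = [walk(child, actions) for child in children]
    action = actions.get(rule_name)
    return action(results) if action else results


def term(results):
    # results look like [factor, "*", factor, "/", factor, ...]
    value = results[0]
    for op, operand in zip(results[1::2], results[2::2]):
        value = value * operand if op == "*" else value / operand
    return value


actions = {
    "number": lambda results: float(results[0]),
    "term": term,
}

parse_tree = ("term", [("number", ["6"]), "*", ("number", ["7"]),
                       "/", ("number", ["2"])])
result = walk(parse_tree, actions)
```

Here the whole tree collapses to a single value, which is the "single node" case
mentioned above.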
The Python module arpeggio.peg is a good demonstration of how semantic actions can be used
to build the PEG parser itself. See also the peg_peg.py example, where the PEG parser is bootstrapped
using a description given in the PEG language itself.
CONTRIBUTION
------------
If you have ideas, suggestions or code that you wish to contribute to the project
please use google code issue tracker at http://arpeggio.googlecode.com/
Arpeggio is built with or influenced by the following free technologies.
Python Programming Language (http://www.python.org)
- Arpeggio is implemented 100% in Python programming language.
PyPEG - a PEG Parser-Interpreter in Python (http://www.fdik.org/pyPEG/)
- PyPEG is a parser interpreter based on PEG grammars like Arpeggio but
with different design and implementation approach and different goals in mind.
The idea of Arpeggio parser definition using Python language constructs is
taken from the PyPEG project. Arpeggio also supports parser definition using PEG
textual notation.
pyparsing (http://pyparsing.wikispaces.com/)
- pyparsing is IMO currently the most advanced parser written 100% in Python.
Currently there is not much similarity between pyparsing and Arpeggio, but
there are some nice features and ideas from pyparsing that I think would be nice to
have implemented in Arpeggio.
Although not directly related to Arpeggio, I also wish to thank the
following free software projects that make the development of Arpeggio (and some other
projects I am working on) easier and more fun:
- Arch Linux (http://www.archlinux.org/) - Linux distro that I'm using on my dev machine.
- Editra (http://www.editra.org/) - Nice programmer's editor written in Python and wxWidgets.
- Mercurial (www.selenic.com/mercurial/) - Distributed version control system written in Python.
... and many more
Arpeggio parser - TODO
----------------------
Author: Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
Copyright: (c) Igor R. Dejanovic, 2009
Licence: MIT Licence
Some stuff that should be done in the near future:
- Documentation.
- Test suite.
- Error recovery.
This is an essential requirement for Arpeggio because IDE usage is the main
motivation that started Arpeggio development.
It would be nice to find all errors, whether syntactic or semantic, in one
parsing session. If an error is found, the parser should report the error and then
try to recover and continue parsing.
# -*- coding: utf-8 -*-
#######################################################################
# Name: export.py
# Purpose: Export support for arpeggio
# Author: Igor R. Dejanović <igor DOT dejanovic AT gmail DOT com>
# Copyright: (c) 2009 Igor R. Dejanović <igor DOT dejanovic AT gmail DOT com>
# License: MIT License
#######################################################################
from StringIO import StringIO
from arpeggio import Terminal

class Export(object):
    '''
    Base class for all Exporters.
    '''

    def __init__(self):
        super(Export, self).__init__()

        # Export initialization
        self._render_set = set()    # Used in rendering to prevent rendering
                                    # of the same node multiple times
        self._adapter_map = {}      # Used as a registry of adapters to
                                    # ensure that the same adapter is
                                    # returned for the same adaptee object

    def export(self, obj):
        '''Export of obj to a string.'''
        self._outf = StringIO()
        self._export(obj)
        return self._outf.getvalue()

    def exportFile(self, obj, file_name):
        '''Export of obj to a file.'''
        self._outf = open(file_name, "w")
        self._export(obj)
        self._outf.close()

    def _export(self, obj):
        self._outf.write(self._start())
        self._render_node(obj)
        self._outf.write(self._end())

    def _start(self):
        '''
        Override this to specify the beginning of the graph representation.
        '''
        return ""

    def _end(self):
        '''
        Override this to specify the end of the graph representation.
        '''
        return ""
class ExportAdapter(object):
    '''
    Base adapter class for the export support.
    Adapter should be defined for every graph type.
    '''

    def __init__(self, node, export):
        '''
        @param node - node to adapt
        @param export - export object used as a context of the export.
        '''
        self.adaptee = node     # adaptee is adapted graph node
        self.export = export
# -------------------------------------------------------------------------
# Support for DOT language

class DOTExportAdapter(ExportAdapter):
    '''
    Base adapter class for the DOT export support.
    '''

    @property
    def id(self):
        '''Graph node unique identification.'''
        raise NotImplementedError()

    @property
    def desc(self):
        '''Graph node textual description.'''
        raise NotImplementedError()

    @property
    def children(self):
        '''Children of the graph node.'''
        raise NotImplementedError()
class PMDOTExportAdapter(DOTExportAdapter):
    '''
    Adapter for ParsingExpression graph types (parser model).
    '''

    @property
    def id(self):
        return id(self.adaptee)

    @property
    def desc(self):
        return self.adaptee.desc

    @property
    def children(self):
        if not hasattr(self, "_children"):
            self._children = []
            adapter_map = self.export._adapter_map  # Registry of adapters used in this export
            for c,n in enumerate(self.adaptee.nodes):
                if isinstance(n, PMDOTExportAdapter):  # if child node is already adapted use that adapter
                    self._children.append((str(c+1), n))
                elif id(n) in adapter_map:  # current node is adaptee -> there is registered adapter
                    self._children.append((str(c+1), adapter_map[id(n)]))
                else:
                    adapter = PMDOTExportAdapter(n, self.export)
                    self._children.append((str(c+1), adapter))
                    adapter_map[adapter.id] = adapter
        return self._children
class PTDOTExportAdapter(PMDOTExportAdapter):
    '''
    Adapter for ParseTreeNode graph types.
    '''

    @property
    def children(self):
        if isinstance(self.adaptee, Terminal):
            return []
        else:
            if not hasattr(self, "_children"):
                self._children = []
                for c,n in enumerate(self.adaptee.nodes):
                    adapter = PTDOTExportAdapter(n, self.export)
                    self._children.append((str(c+1), adapter))
            return self._children
class DOTExport(Export):
    '''
    Export to DOT language (part of GraphViz, see http://www.graphviz.org/)
    '''

    def _render_node(self, node):
        if node not in self._render_set:
            self._render_set.add(node)
            self._outf.write('\n%s [label="%s"];' % (node.id, self._dot_label_esc(node.desc)))
            #TODO Comment handling
            # if hasattr(node, "comments") and root.comments:
            #     retval += self.node(root.comments)
            #     retval += '\n%s->%s [label="comment"]' % (id(root), id(root.comments))
            for name, n in node.children:
                self._outf.write('\n%s->%s [label="%s"]' % (node.id, n.id, name))
                self._outf.write('\n')
                self._render_node(n)

    def _start(self):
        return "digraph arpeggio_graph {"

    def _end(self):
        return "\n}"

    def _dot_label_esc(self, to_esc):
        to_esc = to_esc.replace("\\", "\\\\")
        to_esc = to_esc.replace('\"', '\\"')
        to_esc = to_esc.replace('\n', '\\n')
        return to_esc
class PMDOTExport(DOTExport):
    '''
    Convenience DOTExport extension that uses PMDOTExportAdapter
    '''

    def export(self, obj):
        return super(PMDOTExport, self).\
            export(PMDOTExportAdapter(obj, self))

    def exportFile(self, obj, file_name):
        return super(PMDOTExport, self).\
            exportFile(PMDOTExportAdapter(obj, self), file_name)

class PTDOTExport(DOTExport):
    '''
    Convenience DOTExport extension that uses PTDOTExportAdapter
    '''

    def export(self, obj):
        return super(PTDOTExport, self).\
            export(PTDOTExportAdapter(obj, self))

    def exportFile(self, obj, file_name):
        return super(PTDOTExport, self).\
            exportFile(PTDOTExportAdapter(obj, self), file_name)
# -*- coding: utf-8 -*-
#######################################################################
# Name: peg.py
# Purpose: Implementing PEG language
# Author: Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# Copyright: (c) 2009 Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# License: MIT License
#######################################################################
__all__ = ['ParserPEG']
from arpeggio import *
from arpeggio import _log
from arpeggio import RegExMatch as _
# PEG Grammar
def grammar(): return OneOrMore(rule), EOF
def rule(): return identifier, LEFT_ARROW, ordered_choice, ";"
def ordered_choice(): return sequence, ZeroOrMore(SLASH, sequence)
def sequence(): return OneOrMore(prefix)
def prefix(): return Optional([AND,NOT]), sufix
def sufix(): return expression, Optional([QUESTION, STAR, PLUS])
def expression(): return [regex,(identifier, Not(LEFT_ARROW)),
(OPEN, ordered_choice, CLOSE),
literal]
def regex(): return "r", "'", _(r"(\\\'|[^\'])*"),"'"
def identifier(): return _(r"[a-zA-Z_]([a-zA-Z_]|[0-9])*")
#def literal(): return [_(r"\'(\\\'|[^\'])*\'"),_(r'"[^"]*"')]
def literal(): return _(r'(\'(\\\'|[^\'])*\')|("[^"]*")')
def LEFT_ARROW(): return "<-"
def SLASH(): return "/"
def STAR(): return "*"
def QUESTION(): return "?"
def PLUS(): return "+"
def AND(): return "&"
def NOT(): return "!"
def OPEN(): return "("
def CLOSE(): return ")"
def comment(): return "//", _(".*\n")
# ------------------------------------------------------------------
# PEG Semantic Actions
class PEGSemanticAction(SemanticAction):
    def second_pass(self, parser, node):
        if isinstance(node, Terminal):
            return
        for i,n in enumerate(node.nodes):
            if isinstance(n, Terminal):
                # Linking phase: replace a rule reference with the rule itself
                if n.value in parser.peg_rules:
                    node.nodes[i] = parser.peg_rules[n.value]
                else:
                    raise SemanticError("Rule \"%s\" does not exist." % n)
class SemGrammar(SemanticAction):
    def first_pass(self, parser, node, nodes):
        return parser.peg_rules[parser.root_rule_name]

class SemRule(PEGSemanticAction):
    def first_pass(self, parser, node, nodes):
        rule_name = nodes[0].value
        if len(nodes)>4:
            retval = Sequence(nodes=nodes[2:-1])
        else:
            retval = nodes[2]
        retval.rule = rule_name
        retval.root = True
        if not hasattr(parser, "peg_rules"):
            parser.peg_rules = {} # Used for linking phase
            parser.peg_rules["EndOfFile"] = EndOfFile()
        parser.peg_rules[rule_name] = retval
        return retval
class SemSequence(PEGSemanticAction):
    def first_pass(self, parser, node, nodes):
        if len(nodes)>1:
            return Sequence(nodes=nodes)
        else:
            return nodes[0]

class SemOrderedChoice(PEGSemanticAction):
    def first_pass(self, parser, node, nodes):
        if len(nodes)>1:
            retval = OrderedChoice(nodes=nodes[::2])
        else:
            retval = nodes[0]
        return retval

class SemPrefix(PEGSemanticAction):
    def first_pass(self, parser, node, nodes):
        _log("Prefix: %s " % str(nodes))
        if len(nodes)==2:
            if nodes[0] == NOT():
                retval = Not()
            else:
                retval = And()
            if type(nodes[1]) is list:
                retval.nodes = nodes[1]
            else:
                retval.nodes = [nodes[1]]
        else:
            retval = nodes[0]
        return retval

class SemSufix(PEGSemanticAction):
    def first_pass(self, parser, node, nodes):
        _log("Sufix : %s" % str(nodes))
        if len(nodes) == 2:
            _log("Sufix : %s" % str(nodes[1]))
            if nodes[1] == STAR():
                retval = ZeroOrMore(nodes[0])
            elif nodes[1] == QUESTION():
                retval = Optional(nodes[0])
            else:
                retval = OneOrMore(nodes[0])
            if type(nodes[0]) is list:
                retval.nodes = nodes[0]
            else:
                retval.nodes = [nodes[0]]
        else:
            retval = nodes[0]
        return retval

class SemExpression(PEGSemanticAction):
    def first_pass(self, parser, node, nodes):
        _log("Expression : %s" % str(nodes))
        if len(nodes)==1:
            return nodes[0]
        else:
            return nodes[1]
class SemIdentifier(SemanticAction):
    def first_pass(self, parser, node, nodes):
        _log("Identifier %s." % node.value)
        return node

class SemRegEx(SemanticAction):
    def first_pass(self, parser, node, nodes):
        _log("RegEx %s." % nodes[2].value)
        return RegExMatch(nodes[2].value)

class SemLiteral(SemanticAction):
    def first_pass(self, parser, node, nodes):
        _log("Literal: %s" % node.value)
        match_str = node.value[1:-1]
        match_str = match_str.replace("\\'", "'")
        match_str = match_str.replace("\\\\", "\\")
        return StrMatch(match_str)

class SemTerminal(SemanticAction):
    def first_pass(self, parser, node, nodes):
        return StrMatch(node.value)

grammar.sem = SemGrammar()
rule.sem = SemRule()
ordered_choice.sem = SemOrderedChoice()
sequence.sem = SemSequence()
prefix.sem = SemPrefix()
sufix.sem = SemSufix()
expression.sem = SemExpression()
regex.sem = SemRegEx()
identifier.sem = SemIdentifier()
literal.sem = SemLiteral()
for sem in [LEFT_ARROW, SLASH, STAR, QUESTION, PLUS, AND, NOT, OPEN, CLOSE]:
    sem.sem = SemTerminal()
class ParserPEG(Parser):
    def __init__(self, language_def, root_rule_name, comment_rule_name=None, skipws=True, ws=DEFAULT_WS):
        super(ParserPEG, self).__init__(skipws, ws)
        self.root_rule_name = root_rule_name
        # PEG Abstract Syntax Graph
        self.parser_model = self._from_peg(language_def)
        # Comments should be optional and there can be more of them
        if self.comments_model: # and not isinstance(self.comments_model, ZeroOrMore):
            self.comments_model.root = True
            self.comments_model.rule = comment_rule_name

    def _parse(self):
        return self.parser_model.parse(self)

    def _from_peg(self, language_def):
        parser = ParserPython(grammar, comment)
        parser.root_rule_name = self.root_rule_name
        parse_tree = parser.parse(language_def)
        return parser.getASG()

if __name__ == "__main__":
    try:
        parser = ParserPython(grammar, None)
        f = open("peg_parser_model.dot", "w")
        f.write(str(DOTSerializator(parser.parser_model)))
        f.close()
    except NoMatch, e:
        print "Expected %s at position %s." % (e.value, str(e.parser.pos_to_linecol(e.position)))
#######################################################################
# Name: calc.py
# Purpose: Simple expression evaluator example
# Author: Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# Copyright: (c) 2009 Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# License: MIT License
#
# This example demonstrates grammar definition using python constructs as
# well as using semantic actions to evaluate simple expression in infix
# notation.
#######################################################################
from arpeggio import *
from arpeggio.export import PMDOTExport, PTDOTExport
from arpeggio import RegExMatch as _
from arpeggio import _log
def number(): return _(r'\d*\.\d*|\d+')
def factor(): return [number, ("(", expression, ")")]
def term(): return factor, ZeroOrMore(["*","/"], factor)
def expression(): return Optional(["+","-"]), term, ZeroOrMore(["+", "-"], term)
def calc(): return expression, EndOfFile
# Semantic actions
class ToFloat(SemanticAction):
    '''Converts node value to float.'''
    def first_pass(self, parser, node, nodes):
        _log("Converting %s." % node.value)
        return float(node.value)

class Factor(SemanticAction):
    '''Removes parentheses if they exist and returns what was contained inside.'''
    def first_pass(self, parser, node, nodes):
        _log("Factor %s" % nodes)
        if nodes[0] == "(":
            return nodes[1]
        else:
            return nodes[0]

class Term(SemanticAction):
    '''
    Divides or multiplies factors.
    Factor nodes will be already evaluated.
    '''
    def first_pass(self, parser, node, nodes):
        _log("Term %s" % nodes)
        term = nodes[0]
        for i in range(2, len(nodes), 2):
            if nodes[i-1]=="*":
                term *= nodes[i]
            else:
                term /= nodes[i]
        _log("Term = %f" % term)
        return term

class Expr(SemanticAction):
    '''
    Adds or subtracts terms.
    Term nodes will be already evaluated.
    '''
    def first_pass(self, parser, node, nodes):
        _log("Expression %s" % nodes)
        expr = 0
        start = 0
        # Check for unary + or - operator
        if str(nodes[0]) in "+-":
            start = 1
        for i in range(start, len(nodes), 2):
            if i and nodes[i-1]=="-":
                expr -= nodes[i]
            else:
                expr += nodes[i]
        _log("Expression = %f" % expr)
        return expr

class Calc(SemanticAction):
    def first_pass(self, parser, node, nodes):
        return nodes[0]

# Connecting rules with semantic actions
number.sem = ToFloat()
factor.sem = Factor()
term.sem = Term()
expression.sem = Expr()
calc.sem = Calc()
if __name__ == "__main__":
    try:
        import arpeggio
        # Setting DEBUG to true will show log messages.
        arpeggio.DEBUG = True
        # First we will make a parser - an instance of the calc parser model.
        # Parser model is given in the form of python constructs therefore we
        # are using ParserPython class.
        parser = ParserPython(calc)
        # Then we export it to a dot file in order to visualise it. This is
        # particularly handy for debugging purposes.
        # We can make a jpg out of it using dot (part of graphviz) like this
        # dot -O -Tjpg calc_parse_tree_model.dot
        PMDOTExport().exportFile(parser.parser_model,
                                 "calc_parse_tree_model.dot")
        # An expression we want to evaluate
        input = "-(4-1)*5+(2+4.67)+5.89/(.2+7)"
        # We create a parse tree or abstract syntax tree out of textual input
        parse_tree = parser.parse(input)
        # Then we export it to a dot file in order to visualise it.
        PTDOTExport().exportFile(parse_tree,
                                 "calc_parse_tree.dot")
        # getASG will start semantic analysis.
        # In this case semantic analysis will evaluate expression and
        # returned value will be the result of the input expression.
        print "%s = %f" % (input, parser.getASG())
    except NoMatch, e:
        print "Expected %s at position %s." % (e.value, str(e.parser.pos_to_linecol(e.position)))
#######################################################################
# Name: calc_peg.py
# Purpose: Simple expression evaluator example using PEG language
# Author: Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# Copyright: (c) 2009 Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# License: MIT License
#
# This example is functionally equivalent to calc.py. The difference is that
# in this example grammar is specified using PEG language instead of python constructs.
# Semantic actions are used to calculate expression during semantic
# analysis.
# Parser model as well as parse tree exported to dot files should be
# the same as parser model and parse tree generated in calc.py example.
#######################################################################
from arpeggio import *
from arpeggio.peg import ParserPEG
from arpeggio.export import PMDOTExport, PTDOTExport
# Semantic actions
from calc import ToFloat, Factor, Term, Expr, Calc
# Grammar is defined using textual specification based on PEG language.
calc_grammar = """
number <- r'\d*\.\d*|\d+';
factor <- number / "(" expression ")";
term <- factor (( "*" / "/") factor)*;
expression <- ("+" / "-")? term (("+" / "-") term)*;
calc <- expression EndOfFile;
"""
# Rules are mapped to semantic actions
sem_actions = {
"number" : ToFloat(),
"factor" : Factor(),
"term" : Term(),
"expression" : Expr(),
"calc" : Calc()
}
try:
    # Turning debugging on
    import arpeggio
    arpeggio.DEBUG = True
    # First we will make a parser - an instance of the calc parser model.
    # Parser model is given in the form of PEG notation therefore we
    # are using ParserPEG class. Root rule name (parsing expression) is "calc".
    parser = ParserPEG(calc_grammar, "calc")
    # Then we export it to a dot file.
    PMDOTExport().exportFile(parser.parser_model,
                             "calc_peg_parser_model.dot")
    # An expression we want to evaluate
    input = "-(4-1)*5+(2+4.67)+5.89/(.2+7)"
    # Then parse tree is created out of the input expression.
    parse_tree = parser.parse(input)
    # We save it to dot file in order to visualise it.
    PTDOTExport().exportFile(parse_tree,
                             "calc_peg_parse_tree.dot")
    # getASG will start semantic analysis.
    # In this case semantic analysis will evaluate expression and
    # returned value will be evaluated result of the input expression.
    # Semantic actions are supplied to the getASG function.
    print "%s = %f" % (input, parser.getASG(sem_actions))
except NoMatch, e:
    print "Expected %s at position %s." % (e.value, str(e.parser.pos_to_linecol(e.position)))
##############################################################################
# Name: json.py
# Purpose: Implementation of a simple JSON parser in arpeggio.
# Author: Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# Copyright: (c) 2009 Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# License: MIT License
#
# This example is based on jsonParser.py from pyparsing project
# (see http://pyparsing.wikispaces.com/).
##############################################################################
json_bnf = """
object
    { members }
    {}
members
    string : value
    members , string : value
array
    [ elements ]
    []
elements
    value
    elements , value
value
    string
    number
    object
    array
    true
    false
    null
"""
from arpeggio import *
from arpeggio.export import PMDOTExport, PTDOTExport
from arpeggio import RegExMatch as _
def TRUE(): return "true"
def FALSE(): return "false"
def NULL(): return "null"
def jsonString(): return '"', _('[^"]*'),'"'
def jsonNumber(): return _('-?\d+((\.\d*)?((e|E)(\+|-)?\d+)?)?')
def jsonValue(): return [jsonString, jsonNumber, jsonObject, jsonArray, TRUE, FALSE, NULL]
def jsonArray(): return "[", Optional(jsonElements), "]"
def jsonElements(): return jsonValue, ZeroOrMore(",", jsonValue)
def memberDef(): return jsonString, ":", jsonValue
def jsonMembers(): return memberDef, ZeroOrMore(",", memberDef)
def jsonObject(): return "{", Optional(jsonMembers), "}"
def jsonFile(): return jsonObject, EOF
if __name__ == "__main__":
    testdata = """
    {
        "glossary": {
            "title": "example glossary",
            "GlossDiv": {
                "title": "S",
                "GlossList":
                    {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "TrueValue": true,
                    "FalseValue": false,
                    "Gravity": -9.8,
                    "LargestPrimeLessThan100": 97,
                    "AvogadroNumber": 6.02E23,
                    "EvenPrimesGreaterThan2": null,
                    "PrimesLessThan10" : [2,3,5,7],
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": "A meta-markup language, used to create markup languages such as DocBook.",
                    "GlossSeeAlso": ["GML", "XML", "markup"],
                    "EmptyDict": {},
                    "EmptyList" : []
                    }
            }
        }
    }
    """
    try:
        import arpeggio
        arpeggio.DEBUG = True
        # Creating parser from parser model.
        parser = ParserPython(jsonFile)
        # Exporting parser model to dot file in order to visualise it.
        PMDOTExport().exportFile(parser.parser_model,
                                 "json_parser_model.dot")
        parse_tree = parser.parse(testdata)
        PTDOTExport().exportFile(parser.parse_tree,
                                 "json_parse_tree.dot")
    except NoMatch, e:
        print "Expected %s at position %s." % (e.value, str(e.parser.pos_to_linecol(e.position)))
# -*- coding: utf-8 -*-
##############################################################################
# Name: peg_peg.py
# Purpose: PEG parser definition using PEG itself.
# Author: Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# Copyright: (c) 2009 Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# License: MIT License
#
# PEG can be used to describe PEG.
# This example demonstrates building PEG parser using PEG based grammar of PEG
# grammar definition language.
##############################################################################
from arpeggio import *
from arpeggio.export import PMDOTExport, PTDOTExport
from arpeggio import _log
from arpeggio import RegExMatch as _
from arpeggio.peg import ParserPEG
# Semantic actions
from arpeggio.peg import SemGrammar, SemRule, SemOrderedChoice, SemSequence, SemPrefix, \
    SemSufix, SemExpression, SemRegEx, SemIdentifier, SemLiteral, SemTerminal
sem_actions = {
    "grammar" : SemGrammar(),
    "rule" : SemRule(),
    "ordered_choice" : SemOrderedChoice(),
    "sequence" : SemSequence(),
    "prefix" : SemPrefix(),
    "sufix" : SemSufix(),
    "expression" : SemExpression(),
    "regex" : SemRegEx(),
    "identifier" : SemIdentifier(),
    "literal" : SemLiteral()
}
for sem in ["LEFT_ARROW", "SLASH", "STAR", "QUESTION", "PLUS", "AND", "NOT", "OPEN", "CLOSE"]:
    sem_actions[sem] = SemTerminal()
# PEG defined using PEG itself.
peg_grammar = r"""
grammar <- rule+ EndOfFile;
rule <- identifier LEFT_ARROW ordered_choice ';';
ordered_choice <- sequence (SLASH sequence)*;
sequence <- prefix+;
prefix <- (AND/NOT)? sufix;
sufix <- expression (QUESTION/STAR/PLUS)?;
expression <- regex / (identifier !LEFT_ARROW)
/ ("(" ordered_choice ")") / literal;
identifier <- r'[a-zA-Z_]([a-zA-Z_]|[0-9])*';
regex <- 'r' '\'' r'(\\\'|[^\'])*' '\'';
literal <- r'\'(\\\'|[^\'])*\'|"[^"]*"';
LEFT_ARROW <- '<-';
SLASH <- '/';
AND <- '&';
NOT <- '!';
QUESTION <- '?';
STAR <- '*';
PLUS <- '+';
OPEN <- '(';
CLOSE <- ')';
DOT <- '.';
comment <- '//' r'.*\n';
"""
try:
    import arpeggio
    arpeggio.DEBUG = True
    # ParserPEG will use ParserPython to parse peg_grammar definition and
    # create parser_model for parsing PEG based grammars
    parser = ParserPEG(peg_grammar, 'grammar')
    # Exporting parser model to dot file in order to visualise.
    PMDOTExport().exportFile(parser.parser_model,
                             "peg_peg_parser_model.dot")
    # Now we will use created parser to parse the same peg_grammar used for parser
    # initialization. We can parse peg_grammar because it is specified using
    # PEG itself.
    parser.parse(peg_grammar)
    PTDOTExport().exportFile(parser.parse_tree,
                             "peg_peg_parse_tree.dot")
    # ASG should be the same as parser.parser_model because semantic
    # actions will create PEG parser (tree of ParsingExpressions).
    asg = parser.getASG(sem_actions)
    # This graph should be the same as peg_peg_parser_model.dot because
    # they define the same parser.
    PMDOTExport().exportFile(asg,
                             "peg_peg_asg.dot")
    # If we replace parser_model with ASG constructed parser it will still
    # parse PEG grammars
    parser.parser_model = asg
    parser.parse(peg_grammar)
except NoMatch, e:
    print "Expected %s at position %s." % (e.value, str(e.parser.pos_to_linecol(e.position)))
#######################################################################
# Name: simple.py
# Purpose: Simple language based on example from pyPEG
# Author: Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# Copyright: (c) 2009 Igor R. Dejanovic <igor DOT dejanovic AT gmail DOT com>
# License: MIT License
#
# This example demonstrates grammar definition using python constructs.
# It is taken and adapted from pyPEG project (see http://www.fdik.org/pyPEG/).
#######################################################################
from arpeggio import *
from arpeggio.export import PMDOTExport, PTDOTExport
from arpeggio import RegExMatch as _
def comment(): return [_("//.*"), _("/\*.*\*/")]
def literal(): return _(r'\d*\.\d*|\d+|".*?"')
def symbol(): return _(r"\w+")
def operator(): return _(r"\+|\-|\*|\/|\=\=")
def operation(): return symbol, operator, [literal, functioncall]
def expression(): return [literal, operation, functioncall]
def expressionlist(): return expression, ZeroOrMore(",", expression)
def returnstatement(): return Kwd("return"), expression
def ifstatement(): return Kwd("if"), "(", expression, ")", block, Kwd("else"), block
def statement(): return [ifstatement, returnstatement], ";"
def block(): return "{", OneOrMore(statement), "}"
def parameterlist(): return "(", symbol, ZeroOrMore(",", symbol), ")"
def functioncall(): return symbol, "(", expressionlist, ")"
def function(): return Kwd("function"), symbol, parameterlist, block
def simpleLanguage(): return function
try:
    import arpeggio
    arpeggio.DEBUG = True
    # Parser instantiation. simpleLanguage is root definition and comment is
    # grammar rule for comments.
    parser = ParserPython(simpleLanguage, comment)
    # We save parser model to dot file in order to visualise it.
    # We can make a jpg out of it using dot (part of graphviz) like this
    # dot -Tjpg -O simple_parser_model.dot
    PMDOTExport().exportFile(parser.parser_model,
                             "simple_parser_model.dot")
    # Parser model for comments is handled as separate model
    PMDOTExport().exportFile(parser.comments_model,
                             "simple_parser_comments.dot")
    input = """
    function fak(n) {
        if (n==0) {
            // For 0! result is 1
            return 1;
        } else { /* And for n>0 result is calculated recursively */
            return n * fak(n - 1);
        };
    }
    """
    parse_tree = parser.parse(input)
    PTDOTExport().exportFile(parse_tree,
                             "simple_parse_tree.dot")
except NoMatch, e:
    print "Expected %s at position %s." % (e.value, str(e.parser.pos_to_linecol(e.position)))
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#######################################################################
# Name: arpeggio.py
# Purpose: PEG parser interpreter
# Author: Igor R. Dejanović <igor DOT dejanovic AT gmail DOT com>
# Copyright: (c) 2009 Igor R. Dejanović <igor DOT dejanovic AT gmail DOT com>
# License: MIT License
#
# Arpeggio is an implementation of a packrat parser interpreter based on PEG grammars.
# Parsers are defined using Python language constructs or the PEG language.
#######################################################################
__author__ = "Igor R. Dejanović <igor DOT dejanovic AT gmail DOT com>"
__version__ = "0.1-dev"
from setuptools import setup
NAME = 'Arpeggio'
VERSION = __version__
DESC = 'Packrat parser interpreter'
AUTHOR = 'Igor R. Dejanovic'
AUTHOR_EMAIL = 'igor DOT dejanovic AT gmail DOT com'
LICENCE = 'MIT'
URL = 'http://arpeggio.googlecode.com/'
setup(
    name = NAME,
    version = VERSION,
    description = DESC,
    author = AUTHOR,
    author_email = AUTHOR_EMAIL,
    maintainer = AUTHOR,
    maintainer_email = AUTHOR_EMAIL,
    license = LICENCE,
    url = URL,
    packages = ["arpeggio"],
    keywords = "parser packrat peg",
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Developers',
        'Intended Audience :: Information Technology',
        'Intended Audience :: Science/Research',
        'Topic :: Software Development :: Interpreters',
        'Topic :: Software Development :: Compilers',
        # A comma was missing after the next entry, which silently merged
        # it with the license classifier.
        'Topic :: Software Development :: Libraries :: Python Modules',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
    ]
)