This project is currently a for-fun work in progress.
The main goal of this project is to learn about programming languages by trying to rewrite CPython 3.5.0 in Go.
So far I have a mostly working scanner/tokenizer. The goal was to generate output similar to that of running `python3 -m tokenize --exact <script.py>`.
Currently there are a few small differences in the output format, but the tokens produced are the same.
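For reference, the token stream the scanner is compared against can also be produced from Python with the standard `tokenize` module. This is a minimal sketch, not part of gython; the helper name `exact_tokens` is made up for illustration.

```python
import io
import token
import tokenize


def exact_tokens(source):
    """Return (exact type name, token text) pairs for a source string.

    `exact_tokens` is a hypothetical helper name used only in this sketch.
    """
    readline = io.StringIO(source).readline
    return [
        # exact_type resolves operators to specific names such as LPAR,
        # which is what the --exact flag of `python3 -m tokenize` reports.
        (token.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == '__main__':
    for name, text in exact_tokens("print('hello world')\n"):
        print('{:<10} {!r}'.format(name, text))
```

Because the input is a `str` rather than bytes, no `ENCODING` token is emitted here, unlike the command-line tool, which reads the file as bytes.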
Next up is writing the parser, which will validate the source code against the grammar. Its output will match the form produced by the following script:
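```python
# Note: the parser and symbol modules were deprecated in Python 3.9 and
# removed in 3.10; this script targets the Python 3.5 era.
import parser
import pprint
import symbol
import token


def resolve_symbol_names(part):
    """Recursively replace numeric symbol/token ids with their names."""
    if not isinstance(part, list):
        return part
    if not len(part):
        return part

    symbol_id = part[0]
    if symbol_id in symbol.sym_name:
        # Non-terminal: resolve the name, then recurse into the children.
        symbol_name = symbol.sym_name[symbol_id]
        return [symbol_name] + [resolve_symbol_names(p) for p in part[1:]]
    elif symbol_id in token.tok_name:
        # Terminal: resolve the token name and keep its text as-is.
        token_name = token.tok_name[symbol_id]
        return [token_name] + part[1:]
    return part


def main(filename):
    with open(filename, 'r') as fp:
        contents = fp.read()
    st = parser.suite(contents)
    ast = resolve_symbol_names(st.tolist())
    pprint.pprint(ast)


if __name__ == '__main__':
    import sys
    main(sys.argv[1])
```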
```
$ python3 grammar.py <script.py>
```
```
$ echo "print('hello world')" > test.py
$ python3 grammar.py test.py
['file_input',
 ['stmt',
  ['simple_stmt',
   ['small_stmt',
    ['expr_stmt',
     ['testlist_star_expr',
      ['test',
       ['or_test',
        ['and_test',
         ['not_test',
          ['comparison',
           ['expr',
            ['xor_expr',
             ['and_expr',
              ['shift_expr',
               ['arith_expr',
                ['term',
                 ['factor',
                  ['power',
                   ['atom_expr',
                    ['atom', ['NAME', 'print']],
                    ['trailer',
                     ['LPAR', '('],
                     ['arglist',
                      ['argument',
                       ['test',
                        ['or_test',
                         ['and_test',
                          ['not_test',
                           ['comparison',
                            ['expr',
                             ['xor_expr',
                              ['and_expr',
                               ['shift_expr',
                                ['arith_expr',
                                 ['term',
                                  ['factor',
                                   ['power',
                                    ['atom_expr',
                                     ['atom',
                                      ['STRING',
                                       "'hello world'"]]]]]]]]]]]]]]]]]],
                     ['RPAR', ')']]]]]]]]]]]]]]]]]]],
   ['NEWLINE', '']]],
 ['NEWLINE', ''],
 ['ENDMARKER', '']]
```
The AST stage will take the validated parse tree and convert it into a Python AST.
The goal is to produce AST output similar to that of the following script:
```python
import ast


def main(filename):
    with open(filename, 'r') as fp:
        contents = fp.read()
    module = ast.parse(contents)
    print(ast.dump(module))


if __name__ == '__main__':
    import sys
    main(sys.argv[1])
```
```
$ echo "print('hello world')" > test.py
$ python3 parser.py test.py
Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Str(s='hello world')], keywords=[]))])
```
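To see which node types make up such a tree, the AST can be walked with `ast.walk`. The sketch below is illustrative only, and `node_type_names` is a made-up helper name. Note that the string literal appears as `Str` on Python 3.5 but as `Constant` on Python 3.8 and later, so the exact list depends on the interpreter version.

```python
import ast


def node_type_names(source):
    """Return the class name of every AST node, in ast.walk order.

    `node_type_names` is a hypothetical helper name used only in this sketch.
    """
    tree = ast.parse(source)
    # ast.walk yields the root node first, then its descendants.
    return [type(node).__name__ for node in ast.walk(tree)]


if __name__ == '__main__':
    print(node_type_names("print('hello world')"))
```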
The purpose of the compiler is to convert an AST into the appropriate Python bytecode.
The goal is to produce output similar to that of running:
```
$ echo "num = 5" > test.py
$ python3 -m dis test.py
  1           0 LOAD_CONST               0 (5)
              3 STORE_NAME               0 (num)
              6 LOAD_CONST               1 (None)
              9 RETURN_VALUE
```
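The same instruction stream can be inspected from Python with `compile()` and `dis.get_instructions`, which is handy when comparing against a compiler's output in a test. This is a sketch using the one-line program `num = 5`, which is what the listing above disassembles; `opnames` is a made-up helper name. The exact opcodes vary between CPython versions (newer releases add or rename instructions), so only the store to `num` should be relied on.

```python
import dis


def opnames(source, filename='<example>'):
    """Compile source as a module and return its opcode names in order.

    `opnames` is a hypothetical helper name used only in this sketch.
    """
    code = compile(source, filename, 'exec')
    return [instr.opname for instr in dis.get_instructions(code)]


if __name__ == '__main__':
    # Output depends on the CPython version running this script.
    print(opnames('num = 5'))
```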
The interpreter comes after the compiler and will execute the Python bytecode it produces.
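As a rough picture of what executing bytecode involves, here is a toy stack-machine loop for the four instructions in the listing above. It is only a sketch under heavy assumptions: the `(opname, arg)` tuples, the `run` function, and the `PROGRAM` constant are invented for illustration and are not CPython's real bytecode format or gython's design.

```python
def run(instructions, constants, names):
    """Execute a toy instruction list; return (return value, namespace).

    Hypothetical format: each instruction is an (opname, arg) tuple, where
    arg indexes into `constants` or `names` as in the dis listing.
    """
    stack = []
    namespace = {}
    for opname, arg in instructions:
        if opname == 'LOAD_CONST':
            stack.append(constants[arg])  # push a constant
        elif opname == 'STORE_NAME':
            namespace[names[arg]] = stack.pop()  # pop and bind to a name
        elif opname == 'RETURN_VALUE':
            return stack.pop(), namespace  # pop the result and stop
        else:
            raise NotImplementedError(opname)
    raise RuntimeError('code ended without RETURN_VALUE')


# The program `num = 5`, written out by hand in the toy format.
PROGRAM = [
    ('LOAD_CONST', 0),
    ('STORE_NAME', 0),
    ('LOAD_CONST', 1),
    ('RETURN_VALUE', None),
]

result, namespace = run(PROGRAM, constants=(5, None), names=('num',))
print(result, namespace)
```

A real interpreter also needs frames, jumps, and function calls; the loop above only shows the push/pop discipline of a value stack.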