Python 3 interpreter in Go
 
Brett Langdon 0fd448d5ea add start to interpreter interface 10 years ago
ast start to use gython objects and instructions 10 years ago
bytecode move bytecode/codeobject to gython/codeobject 10 years ago
cmd/gython add in some base objects 10 years ago
compiler start generating CodeObject in assemble() 10 years ago
error reorganize some things again 10 years ago
errorcode add errorcode names 10 years ago
grammar fix up some nodes and stuffs 10 years ago
gython start generating CodeObject in assemble() 10 years ago
interpreter add start to interpreter interface 10 years ago
scanner fix up indentation levels 10 years ago
symbol don't forget to add in symbol package 10 years ago
token add token.IsLiteral() method 10 years ago
LICENSE initial commit 10 years ago
README.md add compiler purpose/goal 10 years ago

README.md

Gython

This project is currently a for-fun work in progress.

The main goal of this project is to learn about programming languages by rewriting CPython 3.5.0 in Go.

Progress

Scanner

So far I have a mostly working scanner/tokenizer. The main goal was to generate output similar to that of running python3 -m tokenize --exact <script.py>. Currently there are a few small differences in the output format, but the tokens produced are the same.
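As a reference point for comparing scanner output, the following is a minimal sketch that uses CPython's own `tokenize` module to list exact token names, roughly what `--exact` reports. The helper name `token_names` is made up for this illustration and is not part of Gython.

```python
import io
import token
import tokenize


def token_names(source):
    """Return (exact token name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [
        # exact_type resolves generic OP tokens to names such as LPAR/RPAR.
        (token.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == '__main__':
    for name, text in token_names("print('hello world')\n"):
        print(name, repr(text))
```

The Go scanner's token stream can be checked against this list name by name, ignoring the formatting differences mentioned above.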

Grammar Parser

Next up is writing the parser, which will validate the source code against the grammar. Its output should match the form produced by the following script:

import parser  # stdlib module in CPython 3.5; removed in Python 3.10
import pprint
import symbol
import token


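def resolve_symbol_names(part):
    # Leaves (token strings) and empty lists are returned unchanged.
    if not isinstance(part, list) or not part:
        return part

    # The first element is a numeric id for a grammar symbol or a token.
    symbol_id = part[0]
    if symbol_id in symbol.sym_name:
        # Non-terminal: replace the id with its name and recurse into children.
        symbol_name = symbol.sym_name[symbol_id]
        return [symbol_name] + [resolve_symbol_names(p) for p in part[1:]]
    elif symbol_id in token.tok_name:
        # Terminal: replace the id with the token name and keep the token text.
        token_name = token.tok_name[symbol_id]
        return [token_name] + part[1:]
    return part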


def main(filename):
    with open(filename, 'r') as fp:
        contents = fp.read()
    st = parser.suite(contents)
    ast = resolve_symbol_names(st.tolist())
    pprint.pprint(ast)

if __name__ == '__main__':
    import sys
    main(sys.argv[1])

$ python3 grammar.py <script.py>

$ echo "print('hello world')" > test.py
$ python3 grammar.py test.py
['file_input',
 ['stmt',
  ['simple_stmt',
   ['small_stmt',
    ['expr_stmt',
     ['testlist_star_expr',
      ['test',
       ['or_test',
        ['and_test',
         ['not_test',
          ['comparison',
           ['expr',
            ['xor_expr',
             ['and_expr',
              ['shift_expr',
               ['arith_expr',
                ['term',
                 ['factor',
                  ['power',
                   ['atom_expr',
                    ['atom', ['NAME', 'print']],
                    ['trailer',
                     ['LPAR', '('],
                     ['arglist',
                      ['argument',
                       ['test',
                        ['or_test',
                         ['and_test',
                          ['not_test',
                           ['comparison',
                            ['expr',
                             ['xor_expr',
                              ['and_expr',
                               ['shift_expr',
                                ['arith_expr',
                                 ['term',
                                  ['factor',
                                   ['power',
                                    ['atom_expr',
                                     ['atom',
                                      ['STRING',
                                       "'hello world'"]]]]]]]]]]]]]]]]]],
                     ['RPAR', ')']]]]]]]]]]]]]]]]]]],
   ['NEWLINE', '']]],
 ['NEWLINE', ''],
 ['ENDMARKER', '']]

AST Parsing

AST parsing will take the validated parse tree and convert it into an abstract syntax tree (AST).

The goal is to produce AST output similar to that of the following script:

import ast


def main(filename):
    with open(filename, 'r') as fp:
        contents = fp.read()
    module = ast.parse(contents)
    print(ast.dump(module))

if __name__ == '__main__':
    import sys
    main(sys.argv[1])
$ echo "print('hello world')" > test.py
$ python3 parser.py test.py
Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Str(s='hello world')], keywords=[]))])

Compiler

The purpose of the compiler is to convert an AST into the corresponding Python bytecode.

The goal is to produce output similar to that of running:

$ echo "num = 5" > test.py
$ python3 -m dis test.py
  1           0 LOAD_CONST               0 (5)
              3 STORE_NAME               0 (num)
              6 LOAD_CONST               1 (None)
              9 RETURN_VALUE
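
To compare the compiler's output programmatically rather than by reading the `dis` listing, a sketch like the following uses CPython's built-in `compile()` and `dis.get_instructions()`. The helper name `opcode_names` is hypothetical. The exact opcode sequence depends on the CPython version, so newer interpreters will not reproduce the 3.5 listing above instruction for instruction.

```python
import dis


def opcode_names(source, filename='<example>'):
    """Compile source with CPython and return the code object and opcode names."""
    code = compile(source, filename, 'exec')
    return code, [instr.opname for instr in dis.get_instructions(code)]


if __name__ == '__main__':
    code, names = opcode_names('num = 5\n')
    # co_names holds the identifiers the bytecode refers to by index.
    print(code.co_names)
    print(names)
```

The name table (`co_names`) and the opcode names are the pieces a Go `CodeObject` would need to match.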

Interpreter

The interpreter comes after the compiler and will execute the Python bytecode it produces.
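
To make "executing bytecode" concrete, here is a minimal sketch of a stack-based evaluation loop that handles only the opcodes shown in the disassembly listing in the Compiler section. The `(opname, arg)` tuple format and the `run` function are assumptions made for illustration; real CPython bytecode is a packed byte string, and this is not how Gython's interpreter is implemented.

```python
def run(instructions, consts, names):
    """Execute (opname, arg) pairs on a value stack.

    Hypothetical instruction format, chosen for readability only.
    Returns the value passed to RETURN_VALUE and the resulting namespace.
    """
    stack = []
    namespace = {}
    for opname, arg in instructions:
        if opname == 'LOAD_CONST':
            # Push the constant at index arg onto the stack.
            stack.append(consts[arg])
        elif opname == 'STORE_NAME':
            # Pop the top of the stack and bind it to the name at index arg.
            namespace[names[arg]] = stack.pop()
        elif opname == 'RETURN_VALUE':
            return stack.pop(), namespace
        else:
            raise NotImplementedError(opname)
    raise RuntimeError('code ended without RETURN_VALUE')


if __name__ == '__main__':
    program = [
        ('LOAD_CONST', 0),
        ('STORE_NAME', 0),
        ('LOAD_CONST', 1),
        ('RETURN_VALUE', None),
    ]
    result, namespace = run(program, consts=(5, None), names=('num',))
    print(result, namespace)
```

Dispatching on the opcode inside a single loop mirrors the general shape of CPython's evaluation loop, though the real one also handles frames, jumps, and exceptions.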