add draft for forge article

11 years ago · d46f997e24
--- a/contents/writing/about/forge-configuration-parser/index.md
+++ b/contents/writing/about/forge-configuration-parser/index.md
@@ -0,0 +1,76 @@
 ---
 title: Forge configuration parser
 author: Brett Langdon
 date: 2015-06-19
 template: article.jade
 ---


 ---

 ## TODO: Fix this starting paragraph
 Ok, job is great, let's talk about code! I have always aspired to write my own
 programming language. It is something I never covered as part of my "higher"
 education, and it has always greatly interested me. I have made attempts here and
 there to learn what I can about how languages work, deconstructing existing ones and
 building what I can of new ones. I have never been very successful, but I am always learning.

 I finally feel that I am making a bit of progress in my learning. Recently I was working
 on a project in [Go](http://golang.org/), and I started by trying to determine which
 configuration language I wanted to use. Whether I tested out
 [YAML](https://en.wikipedia.org/wiki/YAML), [JSON](https://en.wikipedia.org/wiki/JSON),
 or [ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt right. What I really
 wanted was a format similar to [nginx]() but I couldn't find any existing packages for Go
 that supported this syntax. A-ha, I smell an opportunity. The project I started is
 [forge](https://github.com/brettlangdon/forge). Currently it is still nothing too
 impressive, but I feel like it marks a step in my quest to create a programming
 language.

 **Example config file:**
 ```cfg
 # Top comment
 global = "value";
 section {
  a_float = 50.67;
  sub_section {
    a_null = null;
    a_bool = true;
    a_reference = section.a_float;  # Gets replaced with `50.67`
  }
 }
 ```

 Effectively, what I wrote was a programming language. A very, very simple one, but
 nonetheless a programming language. Basically, this library takes a configuration
 file in the format shown above and parses it into an intermediate format. In this case it
 ends up being a `map[string]interface{}`, but if it were a programming language it could
 end up being an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree)
 (AST), which is the intermediate representation that some languages use before compiling
 the source into machine code or [bytecode](https://en.wikipedia.org/wiki/Bytecode).
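
 To give a feel for that intermediate format, here is a rough sketch of what the example
 config might look like once it is a `map[string]interface{}`, plus a small helper that
 walks a dotted path such as `section.a_float`. The map is written by hand and `lookup` is
 a made-up name for illustration; neither is taken from forge itself.

 ```go
 package main

 import (
 	"fmt"
 	"strings"
 )

 // lookup is a hypothetical helper (not part of forge) that resolves a dotted
 // path such as "section.a_float" inside nested maps.
 func lookup(settings map[string]interface{}, path string) (interface{}, bool) {
 	var current interface{} = settings
 	for _, key := range strings.Split(path, ".") {
 		// Every step but the last must land on a nested section.
 		section, ok := current.(map[string]interface{})
 		if !ok {
 			return nil, false
 		}
 		current, ok = section[key]
 		if !ok {
 			return nil, false
 		}
 	}
 	return current, true
 }

 func main() {
 	// Hand-written stand-in for what parsing the example config could produce.
 	settings := map[string]interface{}{
 		"global": "value",
 		"section": map[string]interface{}{
 			"a_float": 50.67,
 			"sub_section": map[string]interface{}{
 				"a_null":      nil,
 				"a_bool":      true,
 				"a_reference": 50.67, // assumed to be resolved at parse time
 			},
 		},
 	}

 	value, found := lookup(settings, "section.sub_section.a_reference")
 	fmt.Println(value, found)
 }
 ```

 Nested sections become nested maps, so a reference is really just a path through those
 maps.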

 The code itself is made up of two main parts: the tokenizer (or scanner) and the
 parser. The tokenizer turns the raw source code (like the example above) into a stream of
 tokens. If you printed the token representation of the config above, it could look like:

 ```
 (COMMENT, "Top comment")
 (IDENTIFIER, "global")
 (EQUAL, "=")
 (STRING, "value")
 (SEMICOLON, ";")
 (IDENTIFIER, "section")
 (LBRACKET, "{")
 (IDENTIFIER, "a_float")
 (EQUAL, "=")
 (FLOAT, "50.67")
 (SEMICOLON, ";")
 ....
 ```
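
 A tokenizer along these lines could be sketched in Go as follows. This is not forge's
 actual scanner: the `Token` type, the `tokenize` function, and the choice to treat a
 dotted name as a `REFERENCE` are assumptions made for illustration, and it skips things
 like escape sequences in strings.

 ```go
 package main

 import (
 	"fmt"
 	"strings"
 	"unicode"
 )

 // Token is a hypothetical (kind, literal) pair; forge's real token type may differ.
 type Token struct {
 	Kind    string
 	Literal string
 }

 // punctuation maps single characters to the token kinds used in the listing above.
 var punctuation = map[rune]string{
 	'=': "EQUAL", ';': "SEMICOLON", '{': "LBRACKET", '}': "RBRACKET",
 }

 func isNameRune(r rune) bool {
 	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '.'
 }

 // tokenize turns source text into a flat slice of tokens.
 func tokenize(src string) ([]Token, error) {
 	var tokens []Token
 	runes := []rune(src)
 	for i := 0; i < len(runes); {
 		r := runes[i]
 		switch {
 		case unicode.IsSpace(r):
 			i++
 		case r == '#': // a comment runs to the end of the line
 			start := i + 1
 			for i < len(runes) && runes[i] != '\n' {
 				i++
 			}
 			text := strings.TrimSpace(string(runes[start:i]))
 			tokens = append(tokens, Token{"COMMENT", text})
 		case r == '"': // string literal, no escape handling
 			start := i + 1
 			i++
 			for i < len(runes) && runes[i] != '"' {
 				i++
 			}
 			if i >= len(runes) {
 				return nil, fmt.Errorf("unterminated string")
 			}
 			tokens = append(tokens, Token{"STRING", string(runes[start:i])})
 			i++ // skip the closing quote
 		case unicode.IsDigit(r):
 			start, kind := i, "INTEGER"
 			for i < len(runes) && (unicode.IsDigit(runes[i]) || runes[i] == '.') {
 				if runes[i] == '.' {
 					kind = "FLOAT"
 				}
 				i++
 			}
 			tokens = append(tokens, Token{kind, string(runes[start:i])})
 		case unicode.IsLetter(r) || r == '_':
 			start := i
 			for i < len(runes) && isNameRune(runes[i]) {
 				i++
 			}
 			literal, kind := string(runes[start:i]), "IDENTIFIER"
 			switch {
 			case literal == "true" || literal == "false":
 				kind = "BOOL"
 			case literal == "null":
 				kind = "NULL"
 			case strings.Contains(literal, "."): // assumption: dotted names are references
 				kind = "REFERENCE"
 			}
 			tokens = append(tokens, Token{kind, literal})
 		default:
 			kind, ok := punctuation[r]
 			if !ok {
 				return nil, fmt.Errorf("unexpected character %q", r)
 			}
 			tokens = append(tokens, Token{kind, string(r)})
 			i++
 		}
 	}
 	return tokens, nil
 }

 func main() {
 	tokens, err := tokenize("# Top comment\nglobal = \"value\";\n")
 	if err != nil {
 		panic(err)
 	}
 	for _, t := range tokens {
 		fmt.Printf("(%s, %q)\n", t.Kind, t.Literal)
 	}
 }
 ```

 The scanner knows nothing about grammar. It only decides where one token ends and the
 next begins, which keeps the parser's job much simpler.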

 Then the parser takes in this stream of tokens and tries to parse them based on a known
 grammar. For example, a directive has the form `<IDENTIFIER> <EQUAL> <VALUE>
 <SEMICOLON>` (where `<VALUE>` can be `<STRING>`, `<BOOL>`, `<INTEGER>`, `<FLOAT>`,
 `<NULL>`, or `<REFERENCE>`). When the parser sees `<IDENTIFIER>`, it looks ahead to the
 next tokens to try to match them against this rule. If they match, it knows to add this
 setting to the internal `map[string]interface{}` under that identifier. If they don't
 match any rule, the input has a syntax error and the parser reports it as an error.
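
 Matching that directive rule could be sketched roughly like this. Again, this is an
 illustration and not forge's parser: `Token`, `parseValue` and `parseDirective` are
 hypothetical names, references are left out to keep it short, and since Go has no
 exceptions the syntax error comes back as an ordinary `error` value.

 ```go
 package main

 import (
 	"fmt"
 	"strconv"
 )

 // Token is a hypothetical (kind, literal) pair, as in the token listing above.
 type Token struct {
 	Kind    string
 	Literal string
 }

 // parseValue converts a value token into a Go value.
 // <REFERENCE> is not handled in this sketch.
 func parseValue(t Token) (interface{}, error) {
 	switch t.Kind {
 	case "STRING":
 		return t.Literal, nil
 	case "BOOL":
 		return strconv.ParseBool(t.Literal)
 	case "INTEGER":
 		return strconv.ParseInt(t.Literal, 10, 64)
 	case "FLOAT":
 		return strconv.ParseFloat(t.Literal, 64)
 	case "NULL":
 		return nil, nil
 	}
 	return nil, fmt.Errorf("syntax error: expected a value, got %s %q", t.Kind, t.Literal)
 }

 // parseDirective matches <IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON> starting at
 // tokens[pos], stores the setting, and returns the position after the directive.
 func parseDirective(tokens []Token, pos int, settings map[string]interface{}) (int, error) {
 	if pos+3 >= len(tokens) {
 		return pos, fmt.Errorf("syntax error: incomplete directive")
 	}
 	name, equal, value, end := tokens[pos], tokens[pos+1], tokens[pos+2], tokens[pos+3]
 	if name.Kind != "IDENTIFIER" {
 		return pos, fmt.Errorf("syntax error: expected identifier, got %q", name.Literal)
 	}
 	if equal.Kind != "EQUAL" {
 		return pos, fmt.Errorf("syntax error: expected '=', got %q", equal.Literal)
 	}
 	parsed, err := parseValue(value)
 	if err != nil {
 		return pos, err
 	}
 	if end.Kind != "SEMICOLON" {
 		return pos, fmt.Errorf("syntax error: expected ';', got %q", end.Literal)
 	}
 	settings[name.Literal] = parsed
 	return pos + 4, nil
 }

 func main() {
 	tokens := []Token{
 		{"IDENTIFIER", "global"}, {"EQUAL", "="}, {"STRING", "value"}, {"SEMICOLON", ";"},
 		{"IDENTIFIER", "a_float"}, {"EQUAL", "="}, {"FLOAT", "50.67"}, {"SEMICOLON", ";"},
 	}
 	settings := map[string]interface{}{}
 	for pos := 0; pos < len(tokens); {
 		next, err := parseDirective(tokens, pos, settings)
 		if err != nil {
 			fmt.Println(err)
 			return
 		}
 		pos = next
 	}
 	fmt.Println(settings)
 }
 ```

 A fuller parser would add a second rule for sections (`<IDENTIFIER> <LBRACKET> ...`) and
 pick between the two rules based on the token that follows the identifier.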