From 905c4b9edb050f340a1d695575a4ad9b75ffbe93 Mon Sep 17 00:00:00 2001
From: brettlangdon
Date: Sat, 27 Jun 2015 13:23:10 -0400
Subject: [PATCH] finish forge blog post

---
 .../about/forge-configuration-parser/index.md | 174 ++++++++++++++----
 1 file changed, 141 insertions(+), 33 deletions(-)

diff --git a/contents/writing/about/forge-configuration-parser/index.md b/contents/writing/about/forge-configuration-parser/index.md
index cff4b28..ae17af7 100644
--- a/contents/writing/about/forge-configuration-parser/index.md
+++ b/contents/writing/about/forge-configuration-parser/index.md
@@ -1,33 +1,122 @@
 ---
 title: Forge configuration parser
 author: Brett Langdon
-date: 2015-06-19
+date: 2015-06-27
 template: article.jade
 ---
+An overview of how I wrote a configuration file format and parser.
 ---
 
-## TODO: Fix this starting paragraph
-Ok, job is great, lets talk about code! I have always had the aspiration of writing my own
-programming language. It is something I never went through as part of my "higher"
-education and it has always greatly interested me. I have always made attempts here and
-there to learn what I can about how they work and try to deconstruct existing languages or
-build what I can of new ones. I have never been very successful, but I am always learning.
-
-I finally feel that I am making a bit of progress in my learning. Recently I was working
-on a project in [go](http://golang.org/) and where I started was trying to determine what
-configuration language I wanted to use and whether I tested out
-[YAML](https://en.wikipedia.org/wiki/YAML) or [JSON](https://en.wikipedia.org/wiki/JSON)
-or [ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt right. What I really
-wanted was a format similar to [nginx]() but I couldn't find any existing packages for go
-which supported this syntax. A-ha, I smell an opportunity. The project I started is
-[forge](https://github.com/brettlangdon/forge).
Currently it is still nothing too
-impressive, but I feel like it marks a step in my quest for creating a programming
-language.
+I recently finished the initial work on
+[forge](https://github.com/brettlangdon/forge), a
+configuration file syntax and parser written in go. I was working
+on a project and trying to decide which configuration
+language to use, but whether I tried
+[YAML](https://en.wikipedia.org/wiki/YAML),
+[JSON](https://en.wikipedia.org/wiki/JSON) or
+[ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt
+right. What I really wanted was a format similar to
+[nginx](http://wiki.nginx.org/FullExample)
+but I couldn't find any existing packages for go which supported this
+syntax. A-ha, I smell an opportunity.
+
+I have always been interested in programming languages, in their
+design and implementation. I have always wanted to write my own
+programming language, but since I have never had any formal education
+on the subject I have always gone about it on my own. I bring it
+up because this project has some similarities. You have a defined
+syntax that gets parsed into some sort of intermediate format. The
+part that is missing is where the intermediate format is then
+translated into machine or byte code and actually executed. Since this
+is just a configuration language, that is not necessary.
+
+
+## Project overview
+
+You can see the repository for
+[forge](https://github.com/brettlangdon/forge) for current usage and
+documentation.
+
+A forge file is made up of _directives_.
There are 3
+kinds of _directives_:
+
+* _settings_: Which are in the form ` = `
+* _sections_: Which are used to group more _directives_ ` { }`
+* _includes_: Used to pull in settings from other forge config files `include `
+
+Forge also supports various types of _setting_ values:
+
+* _string_: `key = "some value";`
+* _bool_: `key = true;`
+* _integer_: `key = 5;`
+* _float_: `key = 5.5;`
+* _null_: `key = null;`
+* _reference_: `key = some_section.key;`
+
+Most of these setting types are probably fairly self-explanatory
+except for _reference_. A _reference_ in forge is a way to have the
+value of one _setting_ be a pointer to another _setting_. For example:
+
+```config
+global = "value";
+some_section {
+  key = "some_section.value";
+  global_ref = global;
+  local_ref = .key;
+  ref_key = ref_section.ref_key;
+}
+ref_section {
+  ref_key = "hello";
+}
+```
+
+In this example we see 3 _references_. A _reference_ value
+is either a single identifier (`global`) or multiple identifiers separated
+by periods (`ref_section.ref_key`), and it can also begin
+with a period (`.key`). Every _reference_ which is not prefixed with a period
+is resolved from the global section (the outermost level). So in this
+example a _reference_ to `global` will point to the value
+`"value"` and `ref_section.ref_key` will point to the value
+`"hello"`. A _local reference_ is one which is prefixed with a period;
+it is resolved starting from the section in which the
+_setting_ is defined. So in this case, `local_ref` will point to
+the value `"some_section.value"`.
+
+That is a rough idea of how forge files are defined, so let's see a
+quick example of how you can use it from go.
+
+```go
+package main
+
+import (
+	"fmt"
+
+	"github.com/brettlangdon/forge"
+)
+
+func main() {
+	// Errors are ignored throughout to keep the example short.
+	settings, _ := forge.ParseFile("example.cfg")
+	if settings.Exists("global") {
+		value, _ := settings.GetString("global")
+		fmt.Println(value)
+	}
+	settings.SetString("new_key", "new_value")
+
+	// Convert the parsed settings into a map and read a key back.
+	settingsMap := settings.ToMap()
+	fmt.Println(settingsMap["new_key"])
+
+	// Serialize the settings to JSON.
+	jsonBytes, _ := settings.ToJSON()
+	fmt.Println(string(jsonBytes))
+}
+```
+
+## How it works
+
+Let's dive in and take a quick look at the parts that make forge
+capable of working.
 
 **Example config file:**
 
-```cfg
+```config
 # Top comment
 global = "value";
 section {
@@ -40,14 +129,8 @@ section {
 }
 ```
 
-Effectively what I wrote was a programming language. A very very simple one, but one none
-the less a programming language. Basically what this library does is take a configuration
-file in the specific format and parses it into an intermediate format. In this case it
-ends up being a `map[string]interface{}` but if it were a programming language it could
-end up being an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree)
-(AST) which is the intermediate representation that some languages use before compiling
-the source into machine code or [bytecode](https://en.wikipedia.org/wiki/Bytecode).
-
+Basically what forge does is take a configuration file in a defined
+format and parse it into what is essentially a `map[string]interface{}`.
 The code itself is comprised of two main parts, the tokenizer (or scanner) and the
 parser. The tokenizer turns the raw source code (like above) into a stream of tokens. If
 you printed the token representation of the code above, it could look like:
@@ -68,9 +151,34 @@ you printed the token representation of the code above, it could look like:
 ```
 
 Then the parser takes in this stream of tokens and tries to parse them based on some known
-grammar. For example, a directive is in the form `
-` (where `` can be ``, ``, ``, ``,
-``, ``).
When the parser sees `` it'll look ahead to the next
-token to try and match it to this rule, if it matches then it knows to add this setting to
-the internal `map[string]interface{}` for that identifier. If it doesn't match anything
-then it has a syntax error and will throw an exception.
+grammar. For example, a directive is in the form
+` ` (where `` can be
+``, ``, ``, ``, ``,
+``). When the parser sees `` it'll look ahead
+to the next token to try and match it to this rule. If it matches then
+it knows to add this setting to the internal `map[string]interface{}`
+for that identifier. If it doesn't match anything then it is a syntax
+error and parsing fails with an error.
+
+The part that I think is interesting is that I opted to just write the
+tokenizer and parser by hand rather than using a tool that generates
+them from a language grammar (like flex/bison). I have done
+this before and was inspired to do so after learning that this is how
+the `go/parser` package for the go programming language is written; you can see it in
+[parser.go](https://github.com/golang/go/blob/258bf65d8b157bfe311ce70c93dd854022a25c9d/src/go/parser/parser.go)
+(not a light read at 2500 lines). The
+[scanner.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/scanner.go)
+and
+[parser.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/parser.go)
+might prove to be slightly easier reads for those who are interested.
+
+## Conclusion
+
+This is just a brief overview of the project and a slight dip
+into its inner workings. I am extremely interested in continuing
+to learn as much as I can about programming languages and
+parsers/compilers. I am going to put together a series of blog posts
+that walk through what I have learned so far and which might help
+guide the reader through creating something similar to forge.
+
+Enjoy.
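
To make the tokenizer and parser steps described under "How it works" a bit more concrete, here is a minimal sketch in go. It is not forge's actual code: the `Token`, `Tokenize`, `Parse` and `isIdent` names and the token kind constants are made up for illustration, and it only handles flat `key = value;` settings with string and integer values (no sections, includes, references, floats or comments).

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// Token kinds for this sketch (hypothetical names, not forge's own).
const (
	Identifier = "IDENTIFIER"
	Equal      = "EQUAL"
	String     = "STRING"
	Integer    = "INTEGER"
	Semicolon  = "SEMICOLON"
	Illegal    = "ILLEGAL"
)

// Token is a single lexeme together with its kind.
type Token struct{ Kind, Literal string }

// isIdent reports whether r may appear in an identifier.
func isIdent(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_'
}

// Tokenize splits the input into tokens, skipping whitespace.
func Tokenize(input string) []Token {
	var tokens []Token
	runes := []rune(input)
	for i := 0; i < len(runes); {
		r, end := runes[i], i+1
		switch {
		case unicode.IsSpace(r):
			// Whitespace only separates tokens.
		case r == '=':
			tokens = append(tokens, Token{Equal, "="})
		case r == ';':
			tokens = append(tokens, Token{Semicolon, ";"})
		case r == '"':
			// Read up to the closing quote (no escape handling here).
			for end < len(runes) && runes[end] != '"' {
				end++
			}
			if end == len(runes) { // unterminated string
				return append(tokens, Token{Illegal, string(runes[i:])})
			}
			tokens = append(tokens, Token{String, string(runes[i+1 : end])})
			end++ // step past the closing quote
		case unicode.IsDigit(r):
			for end < len(runes) && unicode.IsDigit(runes[end]) {
				end++
			}
			tokens = append(tokens, Token{Integer, string(runes[i:end])})
		case isIdent(r):
			for end < len(runes) && isIdent(runes[end]) {
				end++
			}
			tokens = append(tokens, Token{Identifier, string(runes[i:end])})
		default:
			tokens = append(tokens, Token{Illegal, string(r)})
		}
		i = end
	}
	return tokens
}

// Parse matches IDENTIFIER EQUAL value SEMICOLON repeatedly, filling a map.
func Parse(tokens []Token) (map[string]interface{}, error) {
	settings := make(map[string]interface{})
	for i := 0; i < len(tokens); i += 4 {
		if i+3 >= len(tokens) {
			return nil, fmt.Errorf("syntax error: incomplete setting at token %d", i)
		}
		name, eq, value, semi := tokens[i], tokens[i+1], tokens[i+2], tokens[i+3]
		if name.Kind != Identifier || eq.Kind != Equal || semi.Kind != Semicolon {
			return nil, fmt.Errorf("syntax error near %q", name.Literal)
		}
		switch value.Kind {
		case String:
			settings[name.Literal] = value.Literal
		case Integer:
			n, err := strconv.Atoi(value.Literal)
			if err != nil {
				return nil, err
			}
			settings[name.Literal] = n
		default:
			return nil, fmt.Errorf("syntax error: unexpected value %q", value.Literal)
		}
	}
	return settings, nil
}

func main() {
	settings, err := Parse(Tokenize(`global = "value"; port = 8080;`))
	fmt.Println(settings, err)
}
```

Because every setting in this sketch is exactly four tokens, the parser can step through the stream in fixed strides. A real parser for nested sections would more likely consume one token at a time and recurse when it sees an opening brace, which is roughly why hand-written parsers tend to grow into recursive descent.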