| @ -0,0 +1,184 @@ | |||||
| --- | |||||
| title: Forge configuration parser | |||||
| author: Brett Langdon | |||||
| date: 2015-06-27 | |||||
| template: article.jade | |||||
| --- | |||||
| An overview of how I wrote a configuration file format and parser. | |||||
| --- | |||||
| Recently I have finished the initial work on a project, | |||||
| [forge](https://github.com/brettlangdon/forge), which is a | |||||
| configuration file syntax and parser written in go. Recently I was working | |||||
| on a project where I was trying to determine what configuration | |||||
| language I wanted to use and whether I tested out | |||||
| [YAML](https://en.wikipedia.org/wiki/YAML) or | |||||
| [JSON](https://en.wikipedia.org/wiki/JSON) or | |||||
| [ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt | |||||
| right. What I really wanted was a format similar to | |||||
| [nginx](http://wiki.nginx.org/FullExample) | |||||
| but I couldn't find any existing packages for go which supported this | |||||
| syntax. A-ha, I smell an opportunity. | |||||
| I have always been interested by programming languages, by their | |||||
| design and implementation. I have always wanted to write my own | |||||
| programming language, but since I have never had any formal education | |||||
| around the subject I have always gone about it on my own. I bring it | |||||
| up because this project has some similarities. You have a defined | |||||
| syntax that gets parsed into some sort of intermediate format. The | |||||
| part that is missing is where the intermediate format is then | |||||
| translated into machine or byte code and actually executed. Since this | |||||
| is just a configuration language, that is not necessary. | |||||
| ## Project overview | |||||
| You can see the repository for | |||||
| [forge](https://github.com/brettlangdon/forge) for current usage and | |||||
| documentation. | |||||
| Forge syntax is a file which is made up of _directives_. There are 3 | |||||
| kinds of _directives_: | |||||
| * _settings_: Which are in the form `<KEY> = <VALUE>` | |||||
| * _sections_: Which are used to group more _directives_ `<SECTION-NAME> { <DIRECTIVES> }` | |||||
| * _includes_: Used to pull in settings from other forge config files `include <FILENAME/GLOB>` | |||||
| Forge also supports various types of _setting_ values: | |||||
| * _string_: `key = "some value";` | |||||
| * _bool_: `key = true;` | |||||
| * _integer_: `key = 5;` | |||||
| * _float_: `key = 5.5;` | |||||
| * _null_: `key = null;` | |||||
| * _reference_: `key = some_section.key;` | |||||
| Most of these setting types are probably fairly self explanatory | |||||
| except for _reference_. A _reference_ in forge is a way to have the | |||||
| value of one _setting_ be a pointer to another _setting_. For example: | |||||
| ```config | |||||
| global = "value"; | |||||
| some_section { | |||||
| key = "some_section.value"; | |||||
| global_ref = global; | |||||
| local_ref = .key; | |||||
| ref_key = ref_section.ref_key; | |||||
| } | |||||
| ref_section { | |||||
| ref_key = "hello"; | |||||
| } | |||||
| ``` | |||||
| In this example we see 3 examples of _references_. A _reference_ value | |||||
| is one which is an identifier (`global`) possibly multiple identifiers separated | |||||
| with a period (`ref_section.ref_key`) as well _references_ can begin | |||||
| with a perod (`.key`). Every _reference_ which is not prefixed with a period | |||||
| is resolved from the global section (most outer level). So in this | |||||
| example a _reference_ to `global` will point to the value of | |||||
| `"value"` and `ref_section.ref_key` will point to the value of | |||||
| `"hello"`. A _local reference_ is one which is prefixed with a period, | |||||
| those are resolved starting from the current section that the | |||||
| _setting_ is defined in. So in this case, `local_ref` will point to | |||||
| the value of `"some_section.value"`. | |||||
| That is a rough idea of how forge files are defined, so lets see a | |||||
| quick example of how you can use it from go. | |||||
| ```go | |||||
| package main | |||||
| import ( | |||||
| "github.com/brettlangdon/forge" | |||||
| ) | |||||
| func main() { | |||||
| settings, _ := forge.ParseFile("example.cfg") | |||||
| if settings.Exists("global") { | |||||
| value, _ := settings.GetString("global"); | |||||
| fmt.Println(value); | |||||
| } | |||||
| settings.SetString("new_key", "new_value"); | |||||
| settingsMap := settings.ToMap(); | |||||
| fmt.Println(settingsMaps["new_key"]); | |||||
| jsonBytes, _ := settings.ToJSON(); | |||||
| fmt.Println(string(jsonBytes)); | |||||
| } | |||||
| ``` | |||||
| ## How it works | |||||
| Lets dive in and take a quick look at the parts that make forge | |||||
| capable of working. | |||||
| **Example config file:** | |||||
| ```config | |||||
| # Top comment | |||||
| global = "value"; | |||||
| section { | |||||
| a_float = 50.67; | |||||
| sub_section { | |||||
| a_null = null; | |||||
| a_bool = true; | |||||
| a_reference = section.a_float; # Gets replaced with `50.67` | |||||
| } | |||||
| } | |||||
| ``` | |||||
| Basically what forge does is take a configuration file in defined | |||||
| format and parses it into what is essentially a `map[string]interface{}`. | |||||
| The code itself is comprised of two main parts, the tokenizer (or scanner) and the | |||||
| parser. The tokenizer turns the raw source code (like above) into a stream of tokens. If | |||||
| you printed the token representation of the code above, it could look like: | |||||
| ``` | |||||
| (COMMENT, "Top comment") | |||||
| (IDENTIFIER, "global") | |||||
| (EQUAL, "=") | |||||
| (STRING, "value") | |||||
| (SEMICOLON, ";" | |||||
| (IDENTIFIER, "section") | |||||
| (LBRACKET, "{") | |||||
| (IDENTIFIER, "a_float") | |||||
| (EQUAL, "=") | |||||
| (FLOAT, "50.67") | |||||
| (SEMICOLON, ";") | |||||
| .... | |||||
| ``` | |||||
| Then the parser takes in this stream of tokens and tries to parse them based on some known | |||||
| grammar. For example, a directive is in the form | |||||
| `<IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON>` (where `<VALUE>` can be | |||||
| `<STRING>`, `<BOOL>`, `<INTEGER>`, `<FLOAT>`, `<NULL>`, | |||||
| `<REFERENCE>`). When the parser sees `<IDENTIFIER>` it'll look ahead | |||||
| to the next token to try and match it to this rule, if it matches then | |||||
| it knows to add this setting to the internal `map[string]interface{}` | |||||
| for that identifier. If it doesn't match anything then it has a syntax | |||||
| error and will throw an exception. | |||||
| The part that I think is interesting is that I opted to just write the | |||||
| tokenizer and parser by hand rather than using a library that converts | |||||
| a language grammar into a tokenizer (like flex/bison). I have done | |||||
| this before and was inspired to do so after learning that that is how | |||||
| the go programming language is written, you can see here | |||||
| [parser.go](https://github.com/golang/go/blob/258bf65d8b157bfe311ce70c93dd854022a25c9d/src/go/parser/parser.go) | |||||
| (not a light read at 2500 lines). The | |||||
| [scanner.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/scanner.go) | |||||
| and | |||||
| [parser.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/parser.go) | |||||
| might proof to be slightly easier reads for those who are interested. | |||||
| ## Conclusion | |||||
| There is just a brief overview of the project and just a slight dip | |||||
| into the inner workings of it. I am extremely interested in continuing | |||||
| to learn as much as I can about programming languages and | |||||
| parsers/compilers. I am going to put together a series of blog posts | |||||
| that walk through what I have learned so far and which might help | |||||
| guide the reader through creating something similar to forge. | |||||
| Enjoy. | |||||