diff --git a/contents/css/main.css b/contents/css/main.css
index d66f60b..c940d83 100644
--- a/contents/css/main.css
+++ b/contents/css/main.css
@@ -317,7 +317,7 @@ pre code {
     line-height: 1.1;
 }
 
-p code {
+p code, li code {
     padding: 0.1em 0.3em 0.2em;
     border-radius: 0.3em;
     position: relative;
@@ -450,7 +450,7 @@ code.lang-markdown .bullet {
 }
 
 section.social-links {
-    width: 201px;
+    width: 191px;
     margin-left: auto;
     margin-right: auto;
 }
diff --git a/contents/img/geeklist.png b/contents/img/geeklist.png
deleted file mode 100644
index 71c07b8..0000000
Binary files a/contents/img/geeklist.png and /dev/null differ
diff --git a/contents/img/googleplus.png b/contents/img/googleplus.png
deleted file mode 100644
index a36053f..0000000
Binary files a/contents/img/googleplus.png and /dev/null differ
diff --git a/contents/img/pandora.png b/contents/img/pandora.png
deleted file mode 100644
index 7802a39..0000000
Binary files a/contents/img/pandora.png and /dev/null differ
diff --git a/contents/writing/about/forge-configuration-parser/index.md b/contents/writing/about/forge-configuration-parser/index.md
new file mode 100644
index 0000000..ae17af7
--- /dev/null
+++ b/contents/writing/about/forge-configuration-parser/index.md
@@ -0,0 +1,184 @@
+---
+title: Forge configuration parser
+author: Brett Langdon
+date: 2015-06-27
+template: article.jade
+---
+
+An overview of how I wrote a configuration file format and parser.
+
+---
+
+I recently finished the initial work on a project,
+[forge](https://github.com/brettlangdon/forge), which is a
+configuration file syntax and parser written in Go. I had been working
+on a project where I was trying to determine what configuration
+language I wanted to use, and whether I tested out
+[YAML](https://en.wikipedia.org/wiki/YAML) or
+[JSON](https://en.wikipedia.org/wiki/JSON) or
+[ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt
+right.
+What I really wanted was a format similar to [nginx](http://wiki.nginx.org/FullExample)
+but I couldn't find any existing packages for Go which supported this
+syntax. A-ha, I smell an opportunity.
+
+I have always been interested in programming languages, in their
+design and implementation. I have always wanted to write my own
+programming language, but since I have never had any formal education
+around the subject I have always gone about it on my own. I bring it
+up because this project has some similarities. You have a defined
+syntax that gets parsed into some sort of intermediate format. The
+part that is missing is where the intermediate format is then
+translated into machine or byte code and actually executed. Since this
+is just a configuration language, that is not necessary.
+
+
+## Project overview
+
+You can see the repository for
+[forge](https://github.com/brettlangdon/forge) for current usage and
+documentation.
+
+A forge file is made up of _directives_. There are 3
+kinds of _directives_:
+
+* _settings_: Which are in the form ` = `
+* _sections_: Which are used to group more _directives_ ` { }`
+* _includes_: Used to pull in settings from other forge config files `include `
+
+Forge also supports various types of _setting_ values:
+
+* _string_: `key = "some value";`
+* _bool_: `key = true;`
+* _integer_: `key = 5;`
+* _float_: `key = 5.5;`
+* _null_: `key = null;`
+* _reference_: `key = some_section.key;`
+
+Most of these setting types are probably fairly self-explanatory
+except for _reference_. A _reference_ in forge is a way to have the
+value of one _setting_ be a pointer to another _setting_. For example:
+
+```config
+global = "value";
+some_section {
+    key = "some_section.value";
+    global_ref = global;
+    local_ref = .key;
+    ref_key = ref_section.ref_key;
+}
+ref_section {
+    ref_key = "hello";
+}
+```
+
+This example contains 3 _references_.
+A _reference_ value is an identifier (`global`), or multiple identifiers separated
+by periods (`ref_section.ref_key`). A _reference_ can also begin
+with a period (`.key`). Every _reference_ which is not prefixed with a period
+is resolved from the global section (the outermost level). So in this
+example a _reference_ to `global` will point to the value of
+`"value"` and `ref_section.ref_key` will point to the value of
+`"hello"`. A _local reference_ is one which is prefixed with a period;
+those are resolved starting from the current section that the
+_setting_ is defined in. So in this case, `local_ref` will point to
+the value of `"some_section.value"`.
+
+That is a rough idea of how forge files are defined, so let's see a quick example of how you can use it from Go.
+
+```go
+package main
+
+import (
+	"fmt"
+	"github.com/brettlangdon/forge"
+)
+
+func main() {
+	settings, _ := forge.ParseFile("example.cfg")
+	if settings.Exists("global") {
+		value, _ := settings.GetString("global")
+		fmt.Println(value)
+	}
+	settings.SetString("new_key", "new_value")
+
+	settingsMap := settings.ToMap()
+	fmt.Println(settingsMap["new_key"])
+
+	jsonBytes, _ := settings.ToJSON()
+	fmt.Println(string(jsonBytes))
+}
+```
+
+## How it works
+
+Let's dive in and take a quick look at the parts that make forge
+work.
+
+**Example config file:**
+```config
+# Top comment
+global = "value";
+section {
+    a_float = 50.67;
+    sub_section {
+        a_null = null;
+        a_bool = true;
+        a_reference = section.a_float; # Gets replaced with `50.67`
+    }
+}
+```
+
+Basically what forge does is take a configuration file in a defined
+format and parse it into what is essentially a `map[string]interface{}`.
+The code itself is made up of two main parts: the tokenizer (or scanner) and the
+parser. The tokenizer turns the raw source code (like above) into a stream of tokens.
+If you printed the token representation of the code above, it could look like:
+
+```
+(COMMENT, "Top comment")
+(IDENTIFIER, "global")
+(EQUAL, "=")
+(STRING, "value")
+(SEMICOLON, ";")
+(IDENTIFIER, "section")
+(LBRACKET, "{")
+(IDENTIFIER, "a_float")
+(EQUAL, "=")
+(FLOAT, "50.67")
+(SEMICOLON, ";")
+....
+```
+
+Then the parser takes in this stream of tokens and tries to parse them based on some known
+grammar. For example, a directive is in the form
+` ` (where `` can be
+``, ``, ``, ``, ``,
+``). When the parser sees `` it'll look ahead
+to the next token to try and match it to this rule; if it matches then
+it knows to add this setting to the internal `map[string]interface{}`
+for that identifier. If it doesn't match anything then it has hit a syntax
+error, which is reported back to the caller.
+
+The part that I think is interesting is that I opted to just write the
+tokenizer and parser by hand rather than using a tool that generates
+them from a language grammar (like flex/bison). I have done
+this before and was inspired to do so after learning that that is how
+Go's own `go/parser` package is written; you can see it here:
+[parser.go](https://github.com/golang/go/blob/258bf65d8b157bfe311ce70c93dd854022a25c9d/src/go/parser/parser.go)
+(not a light read at 2500 lines). The
+[scanner.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/scanner.go)
+and
+[parser.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/parser.go)
+might prove to be slightly easier reads for those who are interested.
+
+## Conclusion
+
+This is just a brief overview of the project and just a slight dip
+into the inner workings of it. I am extremely interested in continuing
+to learn as much as I can about programming languages and
+parsers/compilers.
+I am going to put together a series of blog posts that walk through what I have learned so far and which might help
+guide the reader through creating something similar to forge.
+
+Enjoy.