|
|
@ -1,33 +1,122 @@ |
|
|
--- |
|
|
--- |
|
|
title: Forge configuration parser |
|
|
title: Forge configuration parser |
|
|
author: Brett Langdon |
|
|
author: Brett Langdon |
|
|
date: 2015-06-19 |
|
|
|
|
|
|
|
|
date: 2015-06-27 |
|
|
template: article.jade |
|
|
template: article.jade |
|
|
--- |
|
|
--- |
|
|
|
|
|
|
|
|
|
|
|
An overview of how I wrote a configuration file format and parser. |
|
|
|
|
|
|
|
|
--- |
|
|
--- |
|
|
|
|
|
|
|
|
## TODO: Fix this starting paragraph |
|
|
|
|
|
Ok, job is great, lets talk about code! I have always had the aspiration of writing my own |
|
|
|
|
|
programming language. It is something I never went through as part of my "higher" |
|
|
|
|
|
education and it has always greatly interested me. I have always made attempts here and |
|
|
|
|
|
there to learn what I can about how they work and try to deconstruct existing languages or |
|
|
|
|
|
build what I can of new ones. I have never been very successful, but I am always learning. |
|
|
|
|
|
|
|
|
|
|
|
I finally feel that I am making a bit of progress in my learning. Recently I was working |
|
|
|
|
|
on a project in [go](http://golang.org/) and where I started was trying to determine what |
|
|
|
|
|
configuration language I wanted to use and whether I tested out |
|
|
|
|
|
[YAML](https://en.wikipedia.org/wiki/YAML) or [JSON](https://en.wikipedia.org/wiki/JSON) |
|
|
|
|
|
or [ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt right. What I really |
|
|
|
|
|
wanted was a format similar to [nginx]() but I couldn't find any existing packages for go |
|
|
|
|
|
which supported this syntax. A-ha, I smell an opportunity. The project I started is |
|
|
|
|
|
[forge](https://github.com/brettlangdon/forge). Currently it is still nothing too |
|
|
|
|
|
impressive, but I feel like it marks a step in my quest for creating a programming |
|
|
|
|
|
language. |
|
|
|
|
|
|
|
|
Recently I have finished the initial work on a project, |
|
|
|
|
|
[forge](https://github.com/brettlangdon/forge), which is a |
|
|
|
|
|
configuration file syntax and parser written in go. Recently I was working |
|
|
|
|
|
on a project where I was trying to determine what configuration |
|
|
|
|
|
language I wanted to use and whether I tested out |
|
|
|
|
|
[YAML](https://en.wikipedia.org/wiki/YAML) or |
|
|
|
|
|
[JSON](https://en.wikipedia.org/wiki/JSON) or |
|
|
|
|
|
[ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt |
|
|
|
|
|
right. What I really wanted was a format similar to |
|
|
|
|
|
[nginx](http://wiki.nginx.org/FullExample) |
|
|
|
|
|
but I couldn't find any existing packages for go which supported this |
|
|
|
|
|
syntax. A-ha, I smell an opportunity. |
|
|
|
|
|
|
|
|
|
|
|
I have always been interested by programming languages, by their |
|
|
|
|
|
design and implementation. I have always wanted to write my own |
|
|
|
|
|
programming language, but since I have never had any formal education |
|
|
|
|
|
around the subject I have always gone about it on my own. I bring it |
|
|
|
|
|
up because this project has some similarities. You have a defined |
|
|
|
|
|
syntax that gets parsed into some sort of intermediate format. The |
|
|
|
|
|
part that is missing is where the intermediate format is then |
|
|
|
|
|
translated into machine or byte code and actually executed. Since this |
|
|
|
|
|
is just a configuration language, that is not necessary. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Project overview |
|
|
|
|
|
|
|
|
|
|
|
You can see the repository for |
|
|
|
|
|
[forge](https://github.com/brettlangdon/forge) for current usage and |
|
|
|
|
|
documentation. |
|
|
|
|
|
|
|
|
|
|
|
Forge syntax is a file which is made up of _directives_. There are 3 |
|
|
|
|
|
kinds of _directives_: |
|
|
|
|
|
|
|
|
|
|
|
* _settings_: Which are in the form `<KEY> = <VALUE>` |
|
|
|
|
|
* _sections_: Which are used to group more _directives_ `<SECTION-NAME> { <DIRECTIVES> }` |
|
|
|
|
|
* _includes_: Used to pull in settings from other forge config files `include <FILENAME/GLOB>` |
|
|
|
|
|
|
|
|
|
|
|
Forge also supports various types of _setting_ values: |
|
|
|
|
|
|
|
|
|
|
|
* _string_: `key = "some value";` |
|
|
|
|
|
* _bool_: `key = true;` |
|
|
|
|
|
* _integer_: `key = 5;` |
|
|
|
|
|
* _float_: `key = 5.5;` |
|
|
|
|
|
* _null_: `key = null;` |
|
|
|
|
|
* _reference_: `key = some_section.key;` |
|
|
|
|
|
|
|
|
|
|
|
Most of these setting types are probably fairly self explanatory |
|
|
|
|
|
except for _reference_. A _reference_ in forge is a way to have the |
|
|
|
|
|
value of one _setting_ be a pointer to another _setting_. For example: |
|
|
|
|
|
|
|
|
|
|
|
```config |
|
|
|
|
|
global = "value"; |
|
|
|
|
|
some_section { |
|
|
|
|
|
key = "some_section.value"; |
|
|
|
|
|
global_ref = global; |
|
|
|
|
|
local_ref = .key; |
|
|
|
|
|
ref_key = ref_section.ref_key; |
|
|
|
|
|
} |
|
|
|
|
|
ref_section { |
|
|
|
|
|
ref_key = "hello"; |
|
|
|
|
|
} |
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
In this example we see 3 examples of _references_. A _reference_ value |
|
|
|
|
|
is one which is an identifier (`global`) possibly multiple identifiers separated |
|
|
|
|
|
with a period (`ref_section.ref_key`) as well _references_ can begin |
|
|
|
|
|
with a perod (`.key`). Every _reference_ which is not prefixed with a period |
|
|
|
|
|
is resolved from the global section (most outer level). So in this |
|
|
|
|
|
example a _reference_ to `global` will point to the value of |
|
|
|
|
|
`"value"` and `ref_section.ref_key` will point to the value of |
|
|
|
|
|
`"hello"`. A _local reference_ is one which is prefixed with a period, |
|
|
|
|
|
those are resolved starting from the current section that the |
|
|
|
|
|
_setting_ is defined in. So in this case, `local_ref` will point to |
|
|
|
|
|
the value of `"some_section.value"`. |
|
|
|
|
|
|
|
|
|
|
|
That is a rough idea of how forge files are defined, so lets see a |
|
|
|
|
|
quick example of how you can use it from go. |
|
|
|
|
|
|
|
|
|
|
|
```go |
|
|
|
|
|
package main |
|
|
|
|
|
|
|
|
|
|
|
import ( |
|
|
|
|
|
"github.com/brettlangdon/forge" |
|
|
|
|
|
) |
|
|
|
|
|
|
|
|
|
|
|
func main() { |
|
|
|
|
|
settings, _ := forge.ParseFile("example.cfg") |
|
|
|
|
|
if settings.Exists("global") { |
|
|
|
|
|
value, _ := settings.GetString("global"); |
|
|
|
|
|
fmt.Println(value); |
|
|
|
|
|
} |
|
|
|
|
|
settings.SetString("new_key", "new_value"); |
|
|
|
|
|
|
|
|
|
|
|
settingsMap := settings.ToMap(); |
|
|
|
|
|
fmt.Println(settingsMaps["new_key"]); |
|
|
|
|
|
|
|
|
|
|
|
jsonBytes, _ := settings.ToJSON(); |
|
|
|
|
|
fmt.Println(string(jsonBytes)); |
|
|
|
|
|
} |
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
## How it works |
|
|
|
|
|
|
|
|
|
|
|
Lets dive in and take a quick look at the parts that make forge |
|
|
|
|
|
capable of working. |
|
|
|
|
|
|
|
|
**Example config file:** |
|
|
**Example config file:** |
|
|
```cfg |
|
|
|
|
|
|
|
|
```config |
|
|
# Top comment |
|
|
# Top comment |
|
|
global = "value"; |
|
|
global = "value"; |
|
|
section { |
|
|
section { |
|
|
@ -40,14 +129,8 @@ section { |
|
|
} |
|
|
} |
|
|
``` |
|
|
``` |
|
|
|
|
|
|
|
|
Effectively what I wrote was a programming language. A very very simple one, but one none |
|
|
|
|
|
the less a programming language. Basically what this library does is take a configuration |
|
|
|
|
|
file in the specific format and parses it into an intermediate format. In this case it |
|
|
|
|
|
ends up being a `map[string]interface{}` but if it were a programming language it could |
|
|
|
|
|
end up being an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) |
|
|
|
|
|
(AST) which is the intermediate representation that some languages use before compiling |
|
|
|
|
|
the source into machine code or [bytecode](https://en.wikipedia.org/wiki/Bytecode). |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Basically what forge does is take a configuration file in defined |
|
|
|
|
|
format and parses it into what is essentially a `map[string]interface{}`. |
|
|
The code itself is comprised of two main parts, the tokenizer (or scanner) and the |
|
|
The code itself is comprised of two main parts, the tokenizer (or scanner) and the |
|
|
parser. The tokenizer turns the raw source code (like above) into a stream of tokens. If |
|
|
parser. The tokenizer turns the raw source code (like above) into a stream of tokens. If |
|
|
you printed the token representation of the code above, it could look like: |
|
|
you printed the token representation of the code above, it could look like: |
|
|
@ -68,9 +151,34 @@ you printed the token representation of the code above, it could look like: |
|
|
``` |
|
|
``` |
|
|
|
|
|
|
|
|
Then the parser takes in this stream of tokens and tries to parse them based on some known |
|
|
Then the parser takes in this stream of tokens and tries to parse them based on some known |
|
|
grammar. For example, a directive is in the form `<IDENTIFIER> <EQUAL> <VALUE> |
|
|
|
|
|
<SEMICOLON>` (where `<VALUE>` can be `<STRING>`, `<BOOL>`, `<INTEGER>`, `<FLOAT>`, |
|
|
|
|
|
`<NULL>`, `<REFERENCE>`). When the parser sees `<IDENTIFIER>` it'll look ahead to the next |
|
|
|
|
|
token to try and match it to this rule, if it matches then it knows to add this setting to |
|
|
|
|
|
the internal `map[string]interface{}` for that identifier. If it doesn't match anything |
|
|
|
|
|
then it has a syntax error and will throw an exception. |
|
|
|
|
|
|
|
|
grammar. For example, a directive is in the form |
|
|
|
|
|
`<IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON>` (where `<VALUE>` can be |
|
|
|
|
|
`<STRING>`, `<BOOL>`, `<INTEGER>`, `<FLOAT>`, `<NULL>`, |
|
|
|
|
|
`<REFERENCE>`). When the parser sees `<IDENTIFIER>` it'll look ahead |
|
|
|
|
|
to the next token to try and match it to this rule, if it matches then |
|
|
|
|
|
it knows to add this setting to the internal `map[string]interface{}` |
|
|
|
|
|
for that identifier. If it doesn't match anything then it has a syntax |
|
|
|
|
|
error and will throw an exception. |
|
|
|
|
|
|
|
|
|
|
|
The part that I think is interesting is that I opted to just write the |
|
|
|
|
|
tokenizer and parser by hand rather than using a library that converts |
|
|
|
|
|
a language grammar into a tokenizer (like flex/bison). I have done |
|
|
|
|
|
this before and was inspired to do so after learning that that is how |
|
|
|
|
|
the go programming language is written, you can see here |
|
|
|
|
|
[parser.go](https://github.com/golang/go/blob/258bf65d8b157bfe311ce70c93dd854022a25c9d/src/go/parser/parser.go) |
|
|
|
|
|
(not a light read at 2500 lines). The |
|
|
|
|
|
[scanner.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/scanner.go) |
|
|
|
|
|
and |
|
|
|
|
|
[parser.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/parser.go) |
|
|
|
|
|
might proof to be slightly easier reads for those who are interested. |
|
|
|
|
|
|
|
|
|
|
|
## Conclusion |
|
|
|
|
|
|
|
|
|
|
|
There is just a brief overview of the project and just a slight dip |
|
|
|
|
|
into the inner workings of it. I am extremely interested in continuing |
|
|
|
|
|
to learn as much as I can about programming languages and |
|
|
|
|
|
parsers/compilers. I am going to put together a series of blog posts |
|
|
|
|
|
that walk through what I have learned so far and which might help |
|
|
|
|
|
guide the reader through creating something similar to forge. |
|
|
|
|
|
|
|
|
|
|
|
Enjoy. |