Browse Source

Merge pull request #1 from brettlangdon/draft/forge

Draft/forge
pull/2/head
Brett Langdon 11 years ago
parent
commit
64ff14eeaf
5 changed files with 186 additions and 2 deletions
  1. +2
    -2
      contents/css/main.css
  2. BIN
      contents/img/geeklist.png
  3. BIN
      contents/img/googleplus.png
  4. BIN
      contents/img/pandora.png
  5. +184
    -0
      contents/writing/about/forge-configuration-parser/index.md

+ 2
- 2
contents/css/main.css View File

@ -317,7 +317,7 @@ pre code {
line-height: 1.1; line-height: 1.1;
} }
p code {
p code, li code {
padding: 0.1em 0.3em 0.2em; padding: 0.1em 0.3em 0.2em;
border-radius: 0.3em; border-radius: 0.3em;
position: relative; position: relative;
@ -450,7 +450,7 @@ code.lang-markdown .bullet {
} }
section.social-links { section.social-links {
width: 201px;
width: 191px;
margin-left: auto; margin-left: auto;
margin-right: auto; margin-right: auto;
} }


BIN
contents/img/geeklist.png View File

Before After
Width: 32  |  Height: 32  |  Size: 1.7 KiB

BIN
contents/img/googleplus.png View File

Before After
Width: 32  |  Height: 32  |  Size: 1.6 KiB

BIN
contents/img/pandora.png View File

Before After
Width: 32  |  Height: 32  |  Size: 1.5 KiB

+ 184
- 0
contents/writing/about/forge-configuration-parser/index.md View File

@ -0,0 +1,184 @@
---
title: Forge configuration parser
author: Brett Langdon
date: 2015-06-27
template: article.jade
---
An overview of how I wrote a configuration file format and parser.
---
Recently I have finished the initial work on a project,
[forge](https://github.com/brettlangdon/forge), which is a
configuration file syntax and parser written in go. Recently I was working
on a project where I was trying to determine what configuration
language I wanted to use and whether I tested out
[YAML](https://en.wikipedia.org/wiki/YAML) or
[JSON](https://en.wikipedia.org/wiki/JSON) or
[ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt
right. What I really wanted was a format similar to
[nginx](http://wiki.nginx.org/FullExample)
but I couldn't find any existing packages for go which supported this
syntax. A-ha, I smell an opportunity.
I have always been interested by programming languages, by their
design and implementation. I have always wanted to write my own
programming language, but since I have never had any formal education
around the subject I have always gone about it on my own. I bring it
up because this project has some similarities. You have a defined
syntax that gets parsed into some sort of intermediate format. The
part that is missing is where the intermediate format is then
translated into machine or byte code and actually executed. Since this
is just a configuration language, that is not necessary.
## Project overview
You can see the repository for
[forge](https://github.com/brettlangdon/forge) for current usage and
documentation.
Forge syntax is a file which is made up of _directives_. There are 3
kinds of _directives_:
* _settings_: Which are in the form `<KEY> = <VALUE>`
* _sections_: Which are used to group more _directives_ `<SECTION-NAME> { <DIRECTIVES> }`
* _includes_: Used to pull in settings from other forge config files `include <FILENAME/GLOB>`
Forge also supports various types of _setting_ values:
* _string_: `key = "some value";`
* _bool_: `key = true;`
* _integer_: `key = 5;`
* _float_: `key = 5.5;`
* _null_: `key = null;`
* _reference_: `key = some_section.key;`
Most of these setting types are probably fairly self explanatory
except for _reference_. A _reference_ in forge is a way to have the
value of one _setting_ be a pointer to another _setting_. For example:
```config
global = "value";
some_section {
key = "some_section.value";
global_ref = global;
local_ref = .key;
ref_key = ref_section.ref_key;
}
ref_section {
ref_key = "hello";
}
```
In this example we see 3 examples of _references_. A _reference_ value
is one which is an identifier (`global`) possibly multiple identifiers separated
with a period (`ref_section.ref_key`) as well _references_ can begin
with a perod (`.key`). Every _reference_ which is not prefixed with a period
is resolved from the global section (most outer level). So in this
example a _reference_ to `global` will point to the value of
`"value"` and `ref_section.ref_key` will point to the value of
`"hello"`. A _local reference_ is one which is prefixed with a period,
those are resolved starting from the current section that the
_setting_ is defined in. So in this case, `local_ref` will point to
the value of `"some_section.value"`.
That is a rough idea of how forge files are defined, so lets see a
quick example of how you can use it from go.
```go
package main
import (
"github.com/brettlangdon/forge"
)
func main() {
settings, _ := forge.ParseFile("example.cfg")
if settings.Exists("global") {
value, _ := settings.GetString("global");
fmt.Println(value);
}
settings.SetString("new_key", "new_value");
settingsMap := settings.ToMap();
fmt.Println(settingsMaps["new_key"]);
jsonBytes, _ := settings.ToJSON();
fmt.Println(string(jsonBytes));
}
```
## How it works
Lets dive in and take a quick look at the parts that make forge
capable of working.
**Example config file:**
```config
# Top comment
global = "value";
section {
a_float = 50.67;
sub_section {
a_null = null;
a_bool = true;
a_reference = section.a_float; # Gets replaced with `50.67`
}
}
```
Basically what forge does is take a configuration file in defined
format and parses it into what is essentially a `map[string]interface{}`.
The code itself is comprised of two main parts, the tokenizer (or scanner) and the
parser. The tokenizer turns the raw source code (like above) into a stream of tokens. If
you printed the token representation of the code above, it could look like:
```
(COMMENT, "Top comment")
(IDENTIFIER, "global")
(EQUAL, "=")
(STRING, "value")
(SEMICOLON, ";"
(IDENTIFIER, "section")
(LBRACKET, "{")
(IDENTIFIER, "a_float")
(EQUAL, "=")
(FLOAT, "50.67")
(SEMICOLON, ";")
....
```
Then the parser takes in this stream of tokens and tries to parse them based on some known
grammar. For example, a directive is in the form
`<IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON>` (where `<VALUE>` can be
`<STRING>`, `<BOOL>`, `<INTEGER>`, `<FLOAT>`, `<NULL>`,
`<REFERENCE>`). When the parser sees `<IDENTIFIER>` it'll look ahead
to the next token to try and match it to this rule, if it matches then
it knows to add this setting to the internal `map[string]interface{}`
for that identifier. If it doesn't match anything then it has a syntax
error and will throw an exception.
The part that I think is interesting is that I opted to just write the
tokenizer and parser by hand rather than using a library that converts
a language grammar into a tokenizer (like flex/bison). I have done
this before and was inspired to do so after learning that that is how
the go programming language is written, you can see here
[parser.go](https://github.com/golang/go/blob/258bf65d8b157bfe311ce70c93dd854022a25c9d/src/go/parser/parser.go)
(not a light read at 2500 lines). The
[scanner.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/scanner.go)
and
[parser.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/parser.go)
might proof to be slightly easier reads for those who are interested.
## Conclusion
There is just a brief overview of the project and just a slight dip
into the inner workings of it. I am extremely interested in continuing
to learn as much as I can about programming languages and
parsers/compilers. I am going to put together a series of blog posts
that walk through what I have learned so far and which might help
guide the reader through creating something similar to forge.
Enjoy.

Loading…
Cancel
Save