
finish forge blog post

pull/1/head
Brett Langdon 11 years ago
parent
commit
905c4b9edb
1 changed file with 141 additions and 33 deletions:
contents/writing/about/forge-configuration-parser/index.md (+141, -33)


@ -1,33 +1,122 @@
---
title: Forge configuration parser
author: Brett Langdon
date: 2015-06-19
date: 2015-06-27
template: article.jade
---
An overview of how I wrote a configuration file format and parser.
---
## TODO: Fix this starting paragraph
Ok, job is great, lets talk about code! I have always had the aspiration of writing my own
programming language. It is something I never went through as part of my "higher"
education and it has always greatly interested me. I have always made attempts here and
there to learn what I can about how they work and try to deconstruct existing languages or
build what I can of new ones. I have never been very successful, but I am always learning.
I finally feel that I am making a bit of progress in my learning. Recently I was working
on a project in [go](http://golang.org/) and where I started was trying to determine what
configuration language I wanted to use and whether I tested out
[YAML](https://en.wikipedia.org/wiki/YAML) or [JSON](https://en.wikipedia.org/wiki/JSON)
or [ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt right. What I really
wanted was a format similar to [nginx]() but I couldn't find any existing packages for go
which supported this syntax. A-ha, I smell an opportunity. The project I started is
[forge](https://github.com/brettlangdon/forge). Currently it is still nothing too
impressive, but I feel like it marks a step in my quest for creating a programming
language.
I recently finished the initial work on
[forge](https://github.com/brettlangdon/forge), a
configuration file syntax and parser written in go. I was working
on a project and trying to decide which configuration
language to use, but whether I tested
[YAML](https://en.wikipedia.org/wiki/YAML),
[JSON](https://en.wikipedia.org/wiki/JSON), or
[ini](https://en.wikipedia.org/wiki/INI_file), nothing really felt
right. What I really wanted was a format similar to
[nginx](http://wiki.nginx.org/FullExample),
but I couldn't find any existing go packages that supported this
syntax. A-ha, I smell an opportunity.
I have always been interested in programming languages, in their
design and implementation. I have always wanted to write my own
programming language, but since I have never had any formal education
in the subject I have always gone about it on my own. I bring it
up because this project has some similarities. You have a defined
syntax that gets parsed into some sort of intermediate format. The
part that is missing is where the intermediate format is then
translated into machine or byte code and actually executed. Since this
is just a configuration language, that is not necessary.
## Project overview
You can see the repository for
[forge](https://github.com/brettlangdon/forge) for current usage and
documentation.
A forge file is made up of _directives_. There are 3
kinds of _directives_:
* _settings_: Which are in the form `<KEY> = <VALUE>`
* _sections_: Which are used to group more _directives_ `<SECTION-NAME> { <DIRECTIVES> }`
* _includes_: Used to pull in settings from other forge config files `include <FILENAME/GLOB>`
Forge also supports various types of _setting_ values:
* _string_: `key = "some value";`
* _bool_: `key = true;`
* _integer_: `key = 5;`
* _float_: `key = 5.5;`
* _null_: `key = null;`
* _reference_: `key = some_section.key;`
Most of these setting types are probably fairly self-explanatory,
except for _reference_. A _reference_ in forge is a way to have the
value of one _setting_ be a pointer to another _setting_. For example:
```config
global = "value";
some_section {
key = "some_section.value";
global_ref = global;
local_ref = .key;
ref_key = ref_section.ref_key;
}
ref_section {
ref_key = "hello";
}
```
In this example we see 3 examples of _references_. A _reference_ value
is an identifier (`global`) or multiple identifiers separated
by periods (`ref_section.ref_key`), and it can optionally begin
with a period (`.key`). Every _reference_ which is not prefixed with a period
is resolved from the global section (the outermost level). So in this
example a _reference_ to `global` will point to the value of
`"value"` and `ref_section.ref_key` will point to the value of
`"hello"`. A _local reference_ is one which is prefixed with a period;
those are resolved starting from the current section that the
_setting_ is defined in. So in this case, `local_ref` will point to
the value of `"some_section.value"`.
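The resolution rules above can be sketched in a few lines of go. This is only an illustration and not forge's actual implementation: the `section` type and the `resolve` function are hypothetical names, and nested sections are assumed to be stored as plain maps.

```go
package main

import (
	"fmt"
	"strings"
)

// section is a hypothetical stand-in for a parsed forge section:
// setting values and nested sections live in the same map.
type section map[string]interface{}

// resolve looks up a reference such as "ref_section.ref_key" or ".key".
// References with a leading period start at current; all others start at root.
func resolve(root, current section, ref string) (interface{}, error) {
	start := root
	if strings.HasPrefix(ref, ".") {
		start = current
		ref = strings.TrimPrefix(ref, ".")
	}
	var value interface{} = start
	for _, name := range strings.Split(ref, ".") {
		sec, ok := value.(section)
		if !ok {
			return nil, fmt.Errorf("cannot look up %q in a non-section value", name)
		}
		value, ok = sec[name]
		if !ok {
			return nil, fmt.Errorf("no setting named %q", name)
		}
	}
	return value, nil
}

func main() {
	// Mirrors the example config above, minus the reference settings themselves.
	some := section{"key": "some_section.value"}
	root := section{
		"global":       "value",
		"some_section": some,
		"ref_section":  section{"ref_key": "hello"},
	}
	for _, ref := range []string{"global", ".key", "ref_section.ref_key"} {
		value, err := resolve(root, some, ref)
		fmt.Println(ref, "=>", value, err)
	}
}
```

The only difference between a global and a local _reference_ in this sketch is which section the lookup starts from.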
That is a rough idea of how forge files are defined, so let's see a
quick example of how you can use it from go.
```go
package main
import (
	"fmt"
	"github.com/brettlangdon/forge"
)
func main() {
	// Parse the config file (errors are ignored here for brevity)
	settings, _ := forge.ParseFile("example.cfg")
	if settings.Exists("global") {
		value, _ := settings.GetString("global")
		fmt.Println(value)
	}
	settings.SetString("new_key", "new_value")
	// Convert the settings into a map
	settingsMap := settings.ToMap()
	fmt.Println(settingsMap["new_key"])
	// Or serialize the settings as JSON
	jsonBytes, _ := settings.ToJSON()
	fmt.Println(string(jsonBytes))
}
```
## How it works
Let's dive in and take a quick look at the parts that make forge
work.
**Example config file:**
```cfg
```config
# Top comment
global = "value";
section {
@ -40,14 +129,8 @@ section {
}
```
Effectively what I wrote was a programming language. A very very simple one, but one none
the less a programming language. Basically what this library does is take a configuration
file in the specific format and parses it into an intermediate format. In this case it
ends up being a `map[string]interface{}` but if it were a programming language it could
end up being an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree)
(AST) which is the intermediate representation that some languages use before compiling
the source into machine code or [bytecode](https://en.wikipedia.org/wiki/Bytecode).
Basically what forge does is take a configuration file in a defined
format and parse it into what is essentially a `map[string]interface{}`.
The code itself is comprised of two main parts, the tokenizer (or scanner) and the
parser. The tokenizer turns the raw source code (like above) into a stream of tokens. If
you printed the token representation of the code above, it could look like:
@ -68,9 +151,34 @@ you printed the token representation of the code above, it could look like:
```
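To make the tokenizing step more concrete, here is a minimal sketch of a hand-written scanner in go. It is not forge's `scanner.go`: the `token` type, the `tokenize` function and the token kind names are assumptions made for illustration, it only handles identifiers, strings and punctuation, and it simply skips comments.

```go
package main

import (
	"fmt"
	"unicode"
)

// token is a hypothetical token type: a kind plus the raw text it covers.
type token struct {
	kind  string
	value string
}

// punctuation maps single characters to (assumed) token kind names.
var punctuation = map[rune]string{
	'=': "EQUAL",
	';': "SEMICOLON",
	'{': "LBRACKET",
	'}': "RBRACKET",
}

// tokenize walks the source one character at a time and emits tokens.
func tokenize(src string) ([]token, error) {
	var tokens []token
	runes := []rune(src)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case r == '#':
			// Skip a comment up to the end of the line.
			for i < len(runes) && runes[i] != '\n' {
				i++
			}
		case r == '"':
			start := i + 1
			i++
			for i < len(runes) && runes[i] != '"' {
				i++
			}
			if i == len(runes) {
				return nil, fmt.Errorf("unterminated string")
			}
			tokens = append(tokens, token{"STRING", string(runes[start:i])})
			i++ // skip the closing quote
		case unicode.IsLetter(r) || r == '_':
			start := i
			for i < len(runes) && (unicode.IsLetter(runes[i]) || unicode.IsDigit(runes[i]) || runes[i] == '_') {
				i++
			}
			tokens = append(tokens, token{"IDENTIFIER", string(runes[start:i])})
		default:
			kind, ok := punctuation[r]
			if !ok {
				return nil, fmt.Errorf("unexpected character %q", r)
			}
			tokens = append(tokens, token{kind, string(r)})
			i++
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := tokenize("# Top comment\nglobal = \"value\";")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, tok := range tokens {
		fmt.Printf("%s(%q)\n", tok.kind, tok.value)
	}
}
```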
Then the parser takes in this stream of tokens and tries to parse them based on some known
grammar. For example, a directive is in the form `<IDENTIFIER> <EQUAL> <VALUE>
<SEMICOLON>` (where `<VALUE>` can be `<STRING>`, `<BOOL>`, `<INTEGER>`, `<FLOAT>`,
`<NULL>`, `<REFERENCE>`). When the parser sees `<IDENTIFIER>` it'll look ahead to the next
token to try and match it to this rule, if it matches then it knows to add this setting to
the internal `map[string]interface{}` for that identifier. If it doesn't match anything
then it has a syntax error and will throw an exception.
grammar. For example, a setting directive is in the form
`<IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON>` (where `<VALUE>` can be
`<STRING>`, `<BOOL>`, `<INTEGER>`, `<FLOAT>`, `<NULL>`,
`<REFERENCE>`). When the parser sees `<IDENTIFIER>` it'll look ahead
to the next token to try and match it to this rule. If it matches then
it knows to add this setting to the internal `map[string]interface{}`
for that identifier. If it doesn't match anything then it has hit a
syntax error and parsing fails.
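As a rough sketch of that rule, a hand-written parser for settings could look like the following. Again this is not forge's `parser.go`: the `token` and `parser` types, the helper names and the token kinds are hypothetical, and only a few value types are handled.

```go
package main

import (
	"fmt"
	"strconv"
)

// token is a hypothetical token type, as a tokenizer might produce it.
type token struct {
	kind  string
	value string
}

// parser walks over a slice of tokens with a simple cursor.
type parser struct {
	tokens []token
	pos    int
}

// next returns the current token and advances; at the end it returns EOF.
func (p *parser) next() token {
	if p.pos >= len(p.tokens) {
		return token{kind: "EOF"}
	}
	tok := p.tokens[p.pos]
	p.pos++
	return tok
}

// expect consumes the next token and fails if it is not of the wanted kind.
func (p *parser) expect(kind string) (token, error) {
	tok := p.next()
	if tok.kind != kind {
		return tok, fmt.Errorf("syntax error: expected %s, got %s", kind, tok.kind)
	}
	return tok, nil
}

// parseSetting matches <IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON>.
func (p *parser) parseSetting(settings map[string]interface{}) error {
	name, err := p.expect("IDENTIFIER")
	if err != nil {
		return err
	}
	if _, err := p.expect("EQUAL"); err != nil {
		return err
	}
	var value interface{}
	switch tok := p.next(); tok.kind {
	case "STRING":
		value = tok.value
	case "BOOL":
		value = tok.value == "true"
	case "INTEGER":
		n, err := strconv.ParseInt(tok.value, 10, 64)
		if err != nil {
			return err
		}
		value = n
	case "NULL":
		value = nil
	default:
		return fmt.Errorf("syntax error: unexpected %s as value", tok.kind)
	}
	if _, err := p.expect("SEMICOLON"); err != nil {
		return err
	}
	settings[name.value] = value
	return nil
}

// parse reads settings until the tokens run out or a rule fails to match.
func parse(tokens []token) (map[string]interface{}, error) {
	p := &parser{tokens: tokens}
	settings := map[string]interface{}{}
	for p.pos < len(p.tokens) {
		if err := p.parseSetting(settings); err != nil {
			return nil, err
		}
	}
	return settings, nil
}

func main() {
	settings, err := parse([]token{
		{"IDENTIFIER", "global"}, {"EQUAL", "="}, {"STRING", "value"}, {"SEMICOLON", ";"},
	})
	fmt.Println(settings, err)
}
```

Sections and includes would be handled the same way, by looking at the token that follows the identifier and picking the matching rule.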
The part that I think is interesting is that I opted to just write the
tokenizer and parser by hand rather than using a tool that generates
them from a language grammar (like flex/bison). I have done
this before and was inspired to do so after learning that this is how
the go programming language is written. You can see it here:
[parser.go](https://github.com/golang/go/blob/258bf65d8b157bfe311ce70c93dd854022a25c9d/src/go/parser/parser.go)
(not a light read at 2500 lines). The
[scanner.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/scanner.go)
and
[parser.go](https://github.com/brettlangdon/forge/blob/1c8c6f315b078622b7264b702b76c6407ec0f264/parser.go)
might prove to be slightly easier reads for those who are interested.
## Conclusion
That is just a brief overview of the project and a slight dip
into the inner workings of it. I am extremely interested in continuing
to learn as much as I can about programming languages and
parsers/compilers. I am going to put together a series of blog posts
that walk through what I have learned so far and which might help
guide the reader through creating something similar to forge.
Enjoy.
