String Token Parser
About
String Token Parser (STP) is a Lua module for parsing tokens in text strings. It was created primarily as a way to extract documentation data from source code files, but it can easily be used outside of that context.
Usage
STP exposes a single function, init. When calling init, you pass a table of configuration variables that control how STP operates. The table must contain, at a minimum, either the file or the text variable to specify the content to be parsed; all other variables are optional. The following list contains all currently supported configuration variables:
file | A string containing a path to a file to be parsed.
text | A string containing text to be parsed.
comment_patterns | A table containing valid comment patterns.
replace_comment_patterns | A boolean that, when set to true, replaces all default comment patterns with the ones specified in the comment_patterns variable.
token_pattern | A string containing a custom token pattern.
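For example, if the content lives in a file rather than a string, only the file variable needs to be set. The sketch below is a minimal illustration; the path is purely hypothetical:
-- load stp module
local stp = require("stp")
-- parse a source file instead of an inline string
-- (replace the path with a real file on your system)
local tokens = stp.init({
    file = "path/to/source.lua"
})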
Below is an example of basic STP usage:
-- load stp module
local stp = require("stp")
-- load kikito's inspect module
local inspect = require("inspect")
-- example string containing basic token layout
local str = [[
-- @token
-- @author Kenny Shields
-- @description This is my test token
-- @end]]
-- send the string to stp to get a table containing all parsed tokens
local tokens = stp.init({
    text = str
})
-- list all tokens
print(inspect(tokens))
The above code will produce the following output:
{
    token = { {
        author = { {
            _default = "Kenny Shields"
        } },
        description = { {
            _default = "This is my test token"
        } }
    } }
}
As shown in the output above, STP stores the tokens it parses in a tree-like format. In the example above, the @token token is a top-level token, under which all subsequent tokens are stored until it is closed with the @end token. Any text that comes after a token is collected and stored in the _default sub-table.
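Because the result is a plain Lua table, individual values can be read with standard table indexing. The following sketch assumes the tokens table produced by the example above:
-- each token name maps to a list of occurrences, so the first
-- @token block is tokens.token[1]
local author = tokens.token[1].author[1]._default
local description = tokens.token[1].description[1]._default
print(author)      --> Kenny Shields
print(description) --> This is my test token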
Named Strings
In addition to the default text collection method, STP also supports named strings, a feature that allows certain strings of text to be associated with a name. Below is a demonstration of how this feature works:
-- @token
-- @text This is a string of text with no name.
-- This is another string of text with no name.
-- [named_string:This is a string of text with a name]
-- @end
When STP parses the content shown above, it will produce the following output:
token = { {
    text = { {
        _default = "This is a string of text with no name. This is another string of text with no name.",
        named_string = "This is a string of text with a name"
    } }
} }
As you can see, STP detected both the standard token text and the named text, and stored them separately. Named strings are stored alongside the _default entry, with the detected name used as the key.
To add a named string to a token, simply use the format shown above. Also note that named strings will not be parsed correctly if they contain newlines.
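As with the _default text, a named string can then be read from the parsed table. The snippet below is a sketch that assumes the block above was parsed with stp.init and the result stored in a variable named tokens:
-- read both the unnamed text and the named string from the
-- first @token block
local entry = tokens.token[1].text[1]
print(entry._default)     -- prints the collected unnamed text
print(entry.named_string) --> This is a string of text with a name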
Comment Patterns
The comment_patterns configuration variable is used to tell STP what patterns indicate valid comments. Since STP only gathers tokens in comments, it is critical that the comment patterns used in the subject file or text string are provided to STP via this variable.
In the comment_patterns variable, two sub-tables can be specified: single, containing valid single-line comment patterns, and multiline, containing valid multi-line/block comment patterns. Below is an example of how to specify comment patterns:
local collected = stp.init({
    file = "path/to/file",
    comment_patterns = {
        multiline = {
            ["/*"] = "*/"
        },
        single = {
            "//",
            "#"
        }
    }
})
By default, STP uses Lua's single-line and multi-line comment patterns when checking for tokens. To discard these defaults and use only the patterns given in comment_patterns, set the replace_comment_patterns configuration variable to true.
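For instance, to parse a C-style source file using only its own comment patterns, the two variables can be combined as sketched below (the file path and patterns are illustrative, not part of STP itself):
local collected = stp.init({
    file = "path/to/file.c",
    -- drop the default Lua comment patterns entirely
    replace_comment_patterns = true,
    comment_patterns = {
        multiline = {
            ["/*"] = "*/"
        },
        single = {
            "//"
        }
    }
})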
License
STP is licensed under the GNU General Public License, Version 3.
Links
Git Repository