lib-laddertypes/README.md

136 lines
3.7 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# lib-laddertypes
Rust Implementation of Ladder-Types (parsing, unification, rewriting, etc)
<hr/>
## Ladder Types
### Motivation
In order to implement complex datastructures and algorithms, usually
many layers of abstraction are built ontop of each other.
Consequently higher-level data types are encoded into lower-level data
types, forming a chain of embeddings from concept to `rock bottom' of
byte streams. While a high-level type makes claims about the
semantics of objects of that type, high-level types are ambiguous in
regard to their concrete syntactical representation or memory
layout. However for compositions to be type-safe, compatibility of
concrete represenations must be ensured.
For example in the unix shell, many different tools & utilities
coexist. Depending on the application domain, each of them will
potentially make use of different representational forms for the same
abstract concepts. E.g. for the concept 'natural number', many
representations do exist, e.g. with variation over radices,
endianness, digit encoding etc.
Intuitively, *ladder types* provide a way to distinguish between
multiple *concrete representations* of the same *abstract / conceptual
type*, by capturing the *represented-as* of layered data formats in
the structure of type-terms. Formally, we introduce a new type
constructor, called the *ladder type*, written `T1 ~ T2`, where `T1`
and `T2` are types. The type-term `T1 ~ T2` then expresses the
abstract type of `T1` being represented in terms of the concrete type
`T2`, which can be read by "`T1` represented as `T2`".
#### Example
The following type describes a colon-separated sequence of timepoints,
each represented as unix-timestamp written as decimal number in
big-endian, encoded as UTF-8 string.
```
<Seq TimePoint
~<TimeSince UnixEpoch>
~<Duration Seconds>
~
~<PosInt 10 BigEndian>
~<Seq <Digit 10>~Char>>
~<SepSeq Char ':'>
~<Seq Char>
~UTF-8
~<Seq Byte>
```
An object that fits the format described by this type could look like
this:
```
1696093021:1696093039:1528324679:1539892301:1638141920:1688010253
```
### Syntax
In their core form, type-terms can be one of the following:
- (**Atomic Type**) | `SomeTypeName`
- (**Literal Integer**) | `0` | `1` | `2` | ...
- (**Literal Character**) | `'a'` | `'b'` | `'c'` | ...
- (**Literal String**) | `"abc"`
- (**Parameter Application**) | `<T1 T2>` given `T1` and `T2` are type-terms
- (**Ladder**) | `T1 ~ T2` given `T1` and `T2` are type-terms
Ontop of that, the following syntax-sugar is defined:
#### Complex Types
- `[ T ]` <===> `<Seq T>`
- `{ a:A b:B }` <===> `<Struct <"a" A> <"b" B>>`
- `a:A | b:B` <===> `<Enum <"a" A> <"b" B>>`
#### Function Types
- `A -> B` <===> `<Fn A B>`
#### Reference Types
- `*A` <===> `<Ptr A>`
- `&A` <===> `<ConstRef A>`
- `&!A` <===> `<MutRef A>`
### Equivalences
#### Currying
`<<A B> C>` <===> `<A B C>`
#### Ladder-Normal-Form
exhaustively apply `<A B~C>` ===> `<A B>~<A C>`
e.g. `[<Digit 10>]~[Char]~[Ascii]` is in **LNF**
#### Parameter-Normal-Form
exhaustively apply `<A B>~<A C>` ===> `<A B~C>`
e.g. `[<Digit 10>~Char~Ascii]` is in **PNF**
## How to use this crate
```rust
use laddertypes::*;
fn main() {
let mut dict = TypeDict::new();
let t1 = dict.parse("<A B~X C>").expect("couldnt parse typeterm");
let t2 = dict.parse("<<A B~X> C>").expect("couldnt parse typeterm");
assert_eq!( t1.clone().curry(), t2 );
assert_eq!( t1, t2.clone().decurry() );
}
```
## Roadmap
- [x] (Un-)Parsing
- [x] (De-)Currying
- [x] Unification
- [x] Ladder-Normal-Form
- [x] Parameter-Normal-Form
- [ ] (De)-Sugaring
- [ ] Seq
- [ ] Enum
- [ ] Struct
- [ ] References
- [ ] Function
## License
[GPLv3](COPYING)