Rust Implementation of Ladder-Types (parsing, unification, rewriting, etc)
Find a file
2024-08-05 00:11:37 +02:00
.github/workflows Create cargo-test.yml 2023-10-01 14:45:28 +00:00
src TypeID: add Copy trait 2024-08-05 00:11:37 +02:00
.gitignore add gitignore 2023-12-02 17:03:27 +01:00
Cargo.toml initial lexer 2023-10-01 13:11:29 +02:00
COPYING add README.md & COPYING 2023-10-01 16:44:17 +02:00
README.md readme: add syntax description and roadmap 2024-05-01 17:44:42 +02:00

lib-laddertypes

Rust Implementation of Ladder-Types (parsing, unification, rewriting, etc)


Ladder Types

Motivation

In order to implement complex datastructures and algorithms, usually many layers of abstraction are built ontop of each other. Consequently higher-level data types are encoded into lower-level data types, forming a chain of embeddings from concept to `rock bottom' of byte streams. While a high-level type makes claims about the semantics of objects of that type, high-level types are ambiguous in regard to their concrete syntactical representation or memory layout. However for compositions to be type-safe, compatibility of concrete represenations must be ensured.

For example in the unix shell, many different tools & utilities coexist. Depending on the application domain, each of them will potentially make use of different representational forms for the same abstract concepts. E.g. for the concept 'natural number', many representations do exist, e.g. with variation over radices, endianness, digit encoding etc.

Intuitively, ladder types provide a way to distinguish between multiple concrete representations of the same abstract / conceptual type, by capturing the represented-as of layered data formats in the structure of type-terms. Formally, we introduce a new type constructor, called the ladder type, written T1 ~ T2, where T1 and T2 are types. The type-term T1 ~ T2 then expresses the abstract type of T1 being represented in terms of the concrete type T2, which can be read by "T1 represented as T2".

Example

The following type describes a colon-separated sequence of timepoints, each represented as unix-timestamp written as decimal number in big-endian, encoded as UTF-8 string.

<Seq TimePoint
     ~<TimeSince UnixEpoch>
     ~<Duration Seconds>
     ~
     ~<PosInt 10 BigEndian>
     ~<Seq <Digit 10>~Char>>
~<SepSeq Char ':'>
~<Seq Char>
~UTF-8
~<Seq Byte>

An object that fits the format described by this type could look like this:

1696093021:1696093039:1528324679:1539892301:1638141920:1688010253

Syntax

In their core form, type-terms can be one of the following:

  • (Atomic Type) | SomeTypeName
  • (Literal Integer) | 0 | 1 | 2 | ...
  • (Literal Character) | 'a' | 'b' | 'c' | ...
  • (Literal String) | "abc"
  • (Parameter Application) | <T1 T2> given T1 and T2 are type-terms
  • (Ladder) | T1 ~ T2 given T1 and T2 are type-terms

Ontop of that, the following syntax-sugar is defined:

Complex Types

  • [ T ] <===> <Seq T>
  • { a:A b:B } <===> <Struct <"a" A> <"b" B>>
  • a:A | b:B <===> <Enum <"a" A> <"b" B>>

Function Types

  • A -> B <===> <Fn A B>

Reference Types

  • *A <===> <Ptr A>
  • &A <===> <ConstRef A>
  • &!A <===> <MutRef A>

Equivalences

Currying

<<A B> C> <===> <A B C>

Ladder-Normal-Form

exhaustively apply <A B~C> ===> <A B>~<A C>

e.g. [<Digit 10>]~[Char]~[Ascii] is in LNF

Parameter-Normal-Form

exhaustively apply <A B>~<A C> ===> <A B~C>

e.g. [<Digit 10>~Char~Ascii] is in PNF

How to use this crate

use laddertypes::*;

fn main() {
    let mut dict = TypeDict::new();

    let t1 = dict.parse("<A B~X C>").expect("couldnt parse typeterm");
    let t2 = dict.parse("<<A B~X> C>").expect("couldnt parse typeterm");

    assert_eq!( t1.clone().curry(), t2 );
    assert_eq!( t1, t2.clone().decurry() );
}

Roadmap

  • (Un-)Parsing
  • (De-)Currying
  • Unification
  • Ladder-Normal-Form
  • Parameter-Normal-Form
  • (De)-Sugaring
    • Seq
    • Enum
    • Struct
    • References
    • Function

License

GPLv3