Q: A Data Language

March 4, 2014

-Q-
Q is a data language. For now, it is limited to a data definition language (DDL). Think "JSON/XML schema", but done the right way. Q comes with a dedicated type system for defining data and a theory, called information contracts, for interoperability with programming and data exchange languages.

Examples

Validating
Suppose we want to capture information about a medical diagnosis for some patient. At left, a typical digital document in JSON. At right, the corresponding Q schema.

Coercing
Data exchange languages such as JSON impose a very low level of data abstraction: booleans, strings and numbers. Q helps raise the level of discourse, using information contracts to move up and down between abstraction levels:

Use cases

Q can be used in many places where data is involved. In particular, it can be used for:
- Enforcing data types towards more robust and secure (RESTful) web services, configuration files, data exchanges, etc.
- Validating data input, e.g. screens and HTML forms.
- Coercing low-level types to high-level ones when ingesting data, to compensate for the limited type systems of exchange languages (e.g. JSON has no builtin time type) and to raise the level of abstraction in a safe and almost transparent way.
- Documenting data types and schemas, in RESTful resource definitions, NoSQL document databases, etc.
- Mapping data types in heterogeneous environments, towards better and simpler interoperability between databases, programming languages, data exchange languages, etc.
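The coercion use case can be illustrated without any Q binding. The sketch below is plain Ruby, not Qrb: the `coerce_diagnosis` helper, the `at` attribute and the sample document are all made up for illustration. It only shows the general idea of turning a JSON string into a richer host type at the boundary.

```ruby
require 'json'
require 'time'

# Hypothetical helper (not part of Q or Qrb): parse a JSON document and
# coerce its 'at' attribute from a String to a Time.
def coerce_diagnosis(json_text)
  doc = JSON.parse(json_text)
  # Time.iso8601 raises ArgumentError on malformed input, so invalid data
  # is rejected at the boundary instead of leaking into the program.
  doc.merge('at' => Time.iso8601(doc.fetch('at')))
end

# Made-up sample document, for illustration only.
DIAGNOSIS = coerce_diagnosis('{"patient": "p-12", "at": "2014-03-04T10:30:00Z"}')
```

After the call, `DIAGNOSIS['at']` is a `Time` object, while the rest of the document is left as parsed.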

Implementations
The scenarios outlined in the previous section require an implementation, or binding, of Q for the situation at hand. The following projects provide those bindings so far (contact us or fork this page on github to add your own binding to this list!):

- Qrb: Q in Ruby (data coercing & validation)
- Qjs: Q in Javascript (under development)

~Type System ~

Q's type system is different from those you can find in a programming language. The aim here is to capture information, not software behavior. Therefore, the definition of type differs. In Q, a type is a set of values, a subtype is a subset, a supertype is a superset. That's it.

However, the aim here is not to define yet another type system with a fixed set of available types such as boolean, string and integer, but rather to provide an abstract way of building information types and to 'connect' them to the types available in a host programming language, or data exchange language.

For this, a Q implementation has to define a representation function that maps each Q type to the host-language type used to represent values of that information type. This representation function is host/implementation-specific; see the documentation of the binding you use.
Rep(QType) -> HostType

Builtin types

A builtin type starts with a dot followed by the name of an abstraction in the host language, a Ruby class for instance. The set of values captured by the Q type is defined as the same set as that of the host abstraction. For instance,

.Integer # The set of values captured by the Integer class

To avoid builtins being spread everywhere, it is usual to define type aliases and build higher-level types with those aliases instead. This also provides better host-language independence and interoperability. For instance, the so-called default system in Qrb includes the following definitions:

Integer = .Integer
Nil = .NilClass

Sub types

Sub types are subsets of values. Q uses so-called 'specialization by constraint' to define sub types. E.g., the set of positive integers (zero included, in this definition) can be defined as follows:

Posint = Integer( i | i >= 0 )

Multiple constraints can be distinguished by name:

Evens = Integer( i | positive: i >= 0, even: i%2 == 0 )

All types can be sub-typed through constraints. In addition, Q uses structural type equivalence, which means that the type captured by the definition of Evens above is actually equivalent to the following one:

Evens = Posint( i | i % 2 == 0 )
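To make 'specialization by constraint' concrete, here is a minimal Ruby sketch. It is not Qrb's API: the `SubTypeSketch` module and its `subtype`, `member?` and `violations` methods are hypothetical. A type is modelled as a hash of named constraints, and a sub type simply adds constraints to those of its parent, so that defining Evens on top of Posint yields the same constraint set as the flat definition.

```ruby
# Hypothetical model, not Qrb's API: a type is a hash of named constraints,
# and a value belongs to the type when every constraint holds.
module SubTypeSketch
  INTEGER = { builtin: ->(v) { v.is_a?(Integer) } }.freeze

  # Specialization by constraint: keep the parent's constraints, add new ones.
  def self.subtype(parent, constraints)
    parent.merge(constraints).freeze
  end

  # Constraints are checked in definition order and stop at the first failure.
  def self.member?(type, value)
    type.values.all? { |check| check.call(value) }
  end

  # Names of the violated constraints, e.g. for error messages.
  def self.violations(type, value)
    return [:builtin] unless type[:builtin].call(value)

    type.reject { |_name, check| check.call(value) }.keys
  end

  POSINT = subtype(INTEGER, positive: ->(i) { i >= 0 })
  EVENS  = subtype(POSINT, even: ->(i) { i % 2 == 0 })
end
```

Naming the constraints pays off when reporting errors: `SubTypeSketch.violations(SubTypeSketch::EVENS, -3)` lists both `:positive` and `:even`.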

Union types

In some respect, union types are the dual of subtypes. They allow defining new types by generalization, through the union of the sets of values defined by other types. For instance, the missing Boolean type of Ruby is simply captured as:

Boolean = .TrueClass|.FalseClass

Union types are also very useful for capturing possibly missing information (aka NULL/nil). For instance, the following type will capture either an integer or nil:

MaybeInt = Integer|Nil
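Seen as sets, a union accepts a value as soon as one of its members does. The following Ruby sketch illustrates this; the `UnionSketch` module and its `union` helper are hypothetical and are not how Qrb implements union types.

```ruby
# Hypothetical sketch, not Qrb's API.
module UnionSketch
  # Members can be Ruby classes or other unions: both respond to ===.
  def self.union(*members)
    ->(value) { members.any? { |member| member === value } }
  end

  BOOLEAN   = union(TrueClass, FalseClass)
  MAYBE_INT = union(Integer, NilClass)
end
```

For instance, `UnionSketch::MAYBE_INT.call(nil)` and `UnionSketch::MAYBE_INT.call(3)` both hold, while the string `"3"` is rejected.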

Seq types

Capturing sequences (aka arrays) of values is straightforward. Sequences are ordered and may contain duplicates:

Measures = [Posint]

Set types

Capturing sets of values is straightforward too. Sets are unordered and may not contain duplicates:

Hobbies = {String}
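The difference between the two collection types can be sketched in Ruby as follows. The `CollectionSketch` module is hypothetical, and rejecting duplicates in a set (rather than silently collapsing them) is an assumption of this sketch; check the binding you use for its actual behavior.

```ruby
require 'set'

# Hypothetical sketch, not Qrb's API.
module CollectionSketch
  POSINT = ->(v) { v.is_a?(Integer) && v >= 0 }
  STRING = ->(v) { v.is_a?(String) }

  # [T]: every element must belong to T; order and duplicates are kept.
  def self.seq(elem_type, values)
    raise TypeError, 'invalid element' unless values.all? { |v| elem_type.call(v) }

    values.to_a
  end

  # {T}: same element check, but the input must not repeat a value
  # (assumption of this sketch) and the result carries no ordering.
  def self.set(elem_type, values)
    checked = seq(elem_type, values)
    raise TypeError, 'duplicate value' unless checked.uniq.size == checked.size

    Set.new(checked)
  end
end
```

With these definitions, `[3, 1, 3]` is a valid value for Measures, whereas a Hobbies value listing the same hobby twice is refused.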

Tuple types

Tuples capture information facts. Their 'structure' is called the heading and is fixed and known in advance. All attributes are mandatory:

ProgrammingLanguage = { name: String, author: String, since: Date }

Relation types

Relations are sets of tuples, all of which have the same heading. The notation for defining relation types naturally follows:

Languages = {{ name: String, author: String, since: Date }}

Relation types and their syntax are first-class in Q, most notably because of the availability of relational algebra for them, unlike pure sets of tuples.

Note that relations do not allow duplicates and have no significant ordering of their tuples. If the ordering is significant, you should consider a sequence of tuples instead:

Preferences = [{ lang: String, reason: String }]
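The rules for tuples and relations can be sketched in plain Ruby. The `TupleSketch` module below is hypothetical, not Qrb's API. It reads "fixed heading" as "exactly these attributes, no more and no fewer", and it rejects duplicate tuples rather than collapsing them; both points are assumptions of the sketch.

```ruby
require 'set'

# Hypothetical sketch, not Qrb's API.
module TupleSketch
  # Heading of the tuple type { lang: String, reason: String }.
  HEADING = { lang: String, reason: String }.freeze

  # A tuple is valid when its keys are exactly the heading's attributes
  # and each value belongs to the declared type.
  def self.tuple?(value, heading = HEADING)
    value.is_a?(Hash) &&
      value.keys.to_set == heading.keys.to_set &&
      heading.all? { |attr, type| type === value[attr] }
  end

  # A relation is a set of valid tuples: no duplicates, no ordering.
  def self.relation(tuples, heading = HEADING)
    raise TypeError, 'invalid tuple' unless tuples.all? { |t| tuple?(t, heading) }
    raise TypeError, 'duplicate tuple' unless tuples.uniq.size == tuples.size

    Set.new(tuples)
  end
end
```

Two relations built from the same tuples in a different order compare equal, which is precisely what a sequence of tuples such as Preferences does not guarantee.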

Abstract Data types

Abstract data types, also called user-defined types, provide a way to define higher-level abstractions easily and to optionally connect them to types of the host language (e.g. a Ruby class). For instance, a Color abstraction can be defined as follows:

Color = <rgb> {r: Byte, g: Byte, b: Byte},
<hex> String( s | s =~ /^#[0-9a-f]{6}$/i )

The Color definition above shows that a color can be represented either by an RGB triple (through a tuple type), or by a hexadecimal string (e.g. #8a2be2). rgb and hex are called the information representations of the Color abstraction.

Binding an ADT to the host language

Defined as above, the type behaves as a union type, i.e. it accepts valid RGB triples and hexadecimal strings. Representations can then be complemented to connect the Color abstraction to a host language type, e.g. a Color Ruby class, and raise the level of discourse on the programming side. This amounts to providing one information contract per representation.

Suppose for example that the following Color class has been defined:

class Color

  def initialize(r, g, b)
    @r, @g, @b = r, g, b
  end
  attr_reader :r, :g, :b

end

Connecting our information ADT to this Color class can be done through a builtin type and two explicit converters, called the dresser and the undresser (we only show the rgb case here; the hex one is defined in a similar way):

Color = .Color <rgb> {r: Byte, g: Byte, b: Byte}
\( tuple | Color.new(tuple.r, tuple.g, tuple.b) )
\( color | {r: color.r, g: color.g, b: color.b} )

The converters provide load/dump code to convert from information types to the code abstraction and vice versa, thereby complementing a representation with a so-called information contract. A binding of Q, e.g. Qrb, guarantees that the dresser will only be executed on valid representations of the corresponding information type. As the dresser tends to be exposed to an unsafe world, however, it should always be kept pure and safe (no side effects, no metaprogramming, no code evaluation, etc.).
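The guarantee described above (validate first, dress second) can be sketched in plain Ruby. The `ContractSketch` module is hypothetical and is not Qrb's API. The source does not define Byte, so the sketch assumes it means an integer between 0 and 255.

```ruby
class Color
  attr_reader :r, :g, :b

  def initialize(r, g, b)
    @r, @g, @b = r, g, b
  end
end

# Hypothetical sketch of an information contract, not Qrb's API.
module ContractSketch
  # Assumption: Byte is an integer in 0..255.
  BYTE = ->(v) { v.is_a?(Integer) && v.between?(0, 255) }

  # Validity check for the rgb information type {r: Byte, g: Byte, b: Byte}.
  def self.rgb?(tuple)
    tuple.is_a?(Hash) && tuple.size == 3 &&
      %i[r g b].all? { |attr| BYTE.call(tuple[attr]) }
  end

  # Both converters are pure: no side effects, no code evaluation.
  DRESSER   = ->(tuple) { Color.new(tuple[:r], tuple[:g], tuple[:b]) }
  UNDRESSER = ->(color) { { r: color.r, g: color.g, b: color.b } }

  # The dresser only ever runs on a representation that passed validation.
  def self.dress(tuple)
    raise TypeError, "invalid rgb representation: #{tuple.inspect}" unless rgb?(tuple)

    DRESSER.call(tuple)
  end

  def self.undress(color)
    UNDRESSER.call(color)
  end
end
```

Dressing `{r: 138, g: 43, b: 226}` (the RGB form of #8a2be2) yields a `Color` instance, and undressing that instance gives the same tuple back.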

Host ADT protocols

In order to keep Q schemas as clean as possible, implementations may provide convention-over-configuration protocols for automating information contracts.

For instance, Qrb provides a more idiomatic way of connecting Ruby classes to information types. The information contracts may be moved to the class itself, as one would probably do anyway, e.g. for testing purposes.

class Color

  def initialize(r, g, b)
    @r, @g, @b = r, g, b
  end
  attr_reader :r, :g, :b

  def self.rgb(tuple)
    Color.new(tuple[:r], tuple[:g], tuple[:b])
  end

  def to_rgb
    {r: @r, g: @g, b: @b}
  end

end

In Qrb, the following definition, which refers to the builtin type but has no dresser/undresser, assumes that the convention is met and will use the Color.rgb(...) and Color#to_rgb methods:

Color = .Color <rgb> {r: Byte, g: Byte, b: Byte}
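How a binding might derive the converters from this naming convention can be sketched as follows. This is an illustration only: the `ProtocolSketch` module and its `converters` method are hypothetical, and the source does not describe how Qrb actually performs the lookup beyond the two method names.

```ruby
class Color
  attr_reader :r, :g, :b

  def initialize(r, g, b)
    @r, @g, @b = r, g, b
  end

  def self.rgb(tuple)
    Color.new(tuple[:r], tuple[:g], tuple[:b])
  end

  def to_rgb
    { r: @r, g: @g, b: @b }
  end
end

# Hypothetical sketch, not Qrb's implementation.
module ProtocolSketch
  # For a representation called rep_name, the convention is:
  # Klass.rep_name(...) dresses, instance.to_rep_name undresses.
  def self.converters(klass, rep_name)
    unless klass.respond_to?(rep_name) && klass.method_defined?("to_#{rep_name}")
      raise ArgumentError, "#{klass} does not follow the #{rep_name} convention"
    end

    dresser   = klass.method(rep_name)
    undresser = ->(object) { object.public_send("to_#{rep_name}") }
    [dresser, undresser]
  end
end
```

Failing early when the convention is not met keeps a misconfigured schema from surfacing later as an obscure error at dressing time.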

The mechanism described here for abstract data types is actually more general and applies to most of what Q does. The next section describes information contracts in more detail.

~Information Contracts ~

Q tries very hard not to be yet another data language. In particular, it aims at integrating as smoothly as possible with existing technologies, notably programming languages and data exchange formats (e.g. JSON or YAML).

This interoperability is handled through so-called information contracts. In some respect, information contracts are the dual of axiomatic contracts, i.e. the dual of public behavioral APIs of software abstractions.

For a given software abstraction, say a Color:
- The axiomatic contract hides the internal representation in favor of a set of public behavioral methods to manipulate the abstraction (e.g. darkening and brightening the color),
- The information contract hides the internal representation in favor of a set of public information representations of the abstraction (e.g. an RGB triple, a hexadecimal string).

The data types involved in the definitions of the information contracts are called information types, e.g. {r: Byte, g: Byte, b: Byte} (a tuple type). Q provides a rich type system dedicated to capturing those data types precisely, mostly because the type systems of mainstream programming languages fail to provide good support for them.

Dressing & Undressing

Data Interoperability

Contracts in Action

Dressing and undressing generally apply recursively, e.g. when collection types and abstract data types are involved. This is what gives Q the ability to dress and undress complex data involving many information contracts and many abstractions.

Consider the following Q system, for instance, for dressing sequences of tuples having a name attribute restricted to simple words:

The concrete dressing result is implementation-dependent, as it involves the definition of the representation function Rep mentioned previously. The aim is not to define new host abstractions, e.g. classes, for every Q type defined in a system but rather to check that values conform to Q types and choose an idiomatic representation in the host language (see the parentheses). However, all those information contracts are actually involved in the dressing process and provide as many places to validate and coerce data in practice.
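Recursive dressing can be sketched in plain Ruby for the case discussed here, a sequence of tuples whose name attribute is a simple word. Everything in the sketch is hypothetical and is not Qrb's API: the `RecursiveSketch` module, the way types are described (a lambda for a leaf, a one-element Array for a seq type, a Hash for a tuple type), and the reading of "simple word" as the regular expression `\A\w+\z`. Representing tuples as symbol-keyed hashes is one possible choice of representation, not the one a given binding necessarily makes.

```ruby
# Hypothetical sketch, not Qrb's API.
module RecursiveSketch
  # Assumption: a "simple word" is a non-empty run of word characters.
  WORD  = ->(v) { v.is_a?(String) && v.match?(/\A\w+\z/) }
  NAMES = [{ name: WORD }].freeze

  # Dress a value against a type description, recursing into collections
  # and tuples; path locates the offending value in error messages.
  def self.dress(type, value, path = 'value')
    case type
    when Proc
      raise TypeError, "#{path}: invalid value #{value.inspect}" unless type.call(value)

      value
    when Array
      raise TypeError, "#{path}: expected a sequence" unless value.is_a?(Array)

      value.each_with_index.map { |v, i| dress(type.first, v, "#{path}[#{i}]") }
    when Hash
      unless value.is_a?(Hash) && value.keys.map(&:to_s).sort == type.keys.map(&:to_s).sort
        raise TypeError, "#{path}: heading mismatch"
      end

      # String keys (as produced by a JSON parser) are accepted and
      # normalized to symbols, the representation chosen by this sketch.
      type.to_h do |attr, attr_type|
        raw = value.fetch(attr) { value[attr.to_s] }
        [attr, dress(attr_type, raw, "#{path}.#{attr}")]
      end
    end
  end
end
```

Each level of the recursion is a place where data is validated and, where needed, coerced; an invalid name deep inside the sequence is reported with its path, such as `value[1].name`.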
