Today I'll illustrate how discriminated unions in F# can contribute to reducing code size and complexity, compared to a typical object-oriented approach. I'll present a problem, model it in typical object-oriented C#, then in typical functional F#, and contrast both approaches.
The problem: evaluate expressions of the form 1 + 2 * 3. The code shall represent the expression tree, evaluate it, and print its value to the console.
The basic idea: these expressions have two types of sub-expressions:
1) literals, which value is simply themselves, i.e. an integer
2) operators, which are either addition or multiplication, and always have a left expression and a right expression. Their value is the addition or multiplication function applied to the value of their left and right expressions.
A typical C# version
C# is mainly an object-oriented language, so we'll use a class hierarchy to represent the different nodes of our tree. All expressions have a value:
Expressions can either be literals, whose value is simply an integer:
We’re now ready to test our little program. We’ll evaluate (2 + 3) * (4 + 5):
As expected, this prints 45. All right, so let’s make some quantitative measurements on this code:
A typical F# version
F# is mainly a functional language, so we’ll use discriminated unions and pattern matching rather than polymorphism. This approach is highly suited for representing and processing any kind of tree data structure as you shall see.
We’ll use a discriminated union to represent the operators; this is very similar to the enum we used in C#:
We’ll use a second discriminated union to represent the different expressions.
If you don’t know F#, this simply reads as “An Expression can either be a “Literal”, which is just an int, or an “Operation”, which is a 3-item Tuple containing an Expression, an Operator and another Expression.”
Finally, we’ll process expressions using pattern matching on our discriminated union types:
Testing the program is not very different from the C# version:
And this evaluates to 45. So let’s take some quantitative measurements here as well:
The big advantage F# has here is of course discriminated unions and pattern matching, but F# also avoids a lot of C#’s inherent syntactic noise in a number of other places. For example, creating a type with private fields and a constructor to assign these fields repeats the same information three times:
In comparison, F# gives us implicit constructors that do what we expect for discriminated unions. A few other things that make the F# code brief:
P.S. sorry about the weird indentation, the automatic beautifer used by Neowin messes it all up every time.
The problem: evaluate expressions of the form 1 + 2 * 3. The code shall represent the expression tree, evaluate it, and print its value to the console.
The basic idea: these expressions have two types of sub-expressions:
1) literals, which value is simply themselves, i.e. an integer
2) operators, which are either addition or multiplication, and always have a left expression and a right expression. Their value is the addition or multiplication function applied to the value of their left and right expressions.
A typical C# version
C# is mainly an object-oriented language, so we'll use a class hierarchy to represent the different nodes of our tree. All expressions have a value:
interface IExpression {
int Evaluate();
}
Expressions can either be literals, whose value is simply an integer:
class Literal : IExpression {
int value;
public Literal(int value) {
this.value = value;
}
public int Evaluate() {
return value;
}
}
Or they can be an operator and its operands, which we’ll call an “operation”. We need to represent the different types of operators: this could be achieved with further subclassing, but for brevity’s sake we’ll just use an enum.
enum Operator {
Add,
Multiply
}
class Operation : IExpression {
IExpression x;
IExpression y;
Operator op;
public Operation(IExpression x, Operator op, IExpression y) {
this.x = x;
this.y = y;
this.op = op;
}
public int Evaluate() {
var valueX = x.Evaluate();
var valueY = y.Evaluate();
switch (op) {
case Operator.Add:
return valueX + valueY;
case Operator.Multiply:
return valueX * valueY;
default:
throw new Exception("Unknown operand");
}
}
}
We’re now ready to test our little program. We’ll evaluate (2 + 3) * (4 + 5):
using System;
class Program {
static void Main() {
var expr = new Operation(
new Operation(new Literal(2), Operator.Add, new Literal(3)),
Operator.Multiply,
new Operation(new Literal(4), Operator.Add, new Literal(5)));
Console.WriteLine(expr.Evaluate());
}
}
As expected, this prints 45. All right, so let’s make some quantitative measurements on this code:
- About 55 lines of code excluding blank lines
- 1 interface
- 1 enum
- 2 classes excluding the “Program” class
- 2 constructors
- 2 public methods
- 4 private fields
A typical F# version
F# is mainly a functional language, so we’ll use discriminated unions and pattern matching rather than polymorphism. This approach is highly suited for representing and processing any kind of tree data structure as you shall see.
We’ll use a discriminated union to represent the operators; this is very similar to the enum we used in C#:
type Operator = | Add | Multiply
We’ll use a second discriminated union to represent the different expressions.
type Expression = | Literal of int | Operation of Expression * Operator * Expression
If you don’t know F#, this simply reads as “An Expression can either be a “Literal”, which is just an int, or an “Operation”, which is a 3-item Tuple containing an Expression, an Operator and another Expression.”
Finally, we’ll process expressions using pattern matching on our discriminated union types:
let rec Evaluate = function | Literal x -> x | Operation(x, op, y) -> let func = function | Add -> (+) | Multiply -> (*) func(op) (Evaluate x) (Evaluate y)If you’re not familiar with F#, the keyword “function” means this is a function that implicitly takes a parameter and matches it against one of the cases on line 2 and 3. If it’s a literal, the function evaluates to that literal’s value (x), which is obtained through pattern-matching. If it’s an Operation, we’ll use a helper function ("func") to map each operator to an actual function, call that helper on the operator we got and pass it the values of both operands. Again, the different components of the Operation (x, op, y) are obtained through pattern-matching.
Testing the program is not very different from the C# version:
let result = Evaluate(Operation( Operation(Literal(2), Add, Literal(3)), Multiply, Operation(Literal(4), Add, Literal(5)))) printf "%d" result
And this evaluates to 45. So let’s take some quantitative measurements here as well:
- 17 lines of code
- 2 discriminated unions
- 2 functions: 1 top level (“Evaluate”), 1 nested (“func”)
The big advantage F# has here is of course discriminated unions and pattern matching, but F# also avoids a lot of C#’s inherent syntactic noise in a number of other places. For example, creating a type with private fields and a constructor to assign these fields repeats the same information three times:
IExpression x;
IExpression y;
Operator op;
public Operation(IExpression x, Operator op, IExpression y) {
this.x = x;
this.y = y;
this.op = op;
}
In comparison, F# gives us implicit constructors that do what we expect for discriminated unions. A few other things that make the F# code brief:
- Being able to treat the operators + and * as regular functions
- Not having to write "new" to instantiate objects
- Not having to prefix union cases by their type as for C#’s enums
- Implicit typing everywhere
- Implicit argument for functions declared with "function"
- Compiler verifies that we check every case of a discriminated union, so no need for a "default" case
P.S. sorry about the weird indentation, the automatic beautifer used by Neowin messes it all up every time.







You got me thinking...I wanted to see what F# can do in a real-world scenario, so I re-wrote my IRC message parser in F#.
The parser takes a string and an exclude function, and returns a series of blocks which can be bold, underlined, have a foreground color and a background color. The spaces and punctuation are separated from the words, unless the full word matches the exclude function (this is used for smileys). The formatting tags are mIRC's, and they're not easily parseable (especially the color-related ones).
Here's the C# version: https://bitbucket.or...etroIrc/Parsing
Here's the F# version: http://pastebin.com/HW9XVUb5
I'm impressed - the F# version is both much shorter and faster. It uses only immutable objects apart from the splitPunctuation method, which looks horrible...
Another thing I noticed is that the first time the C# method is run, it's 500-1000x slower than the second time. F# also exhibits this behavior, but the "first time" factor is much smaller (<100x).
It's sad that Pastebin has a better F# syntax highlighter than Visual Studio, though.