Writing a Type System

Types are cool. Type systems are cool. That said, how they work is weird and scary. Things like object and type and such are weird and a little terrifying. “Metaclass” should send shivers down your spine. So lets have some fun and pretend we’re writing python in python, and now we need to implement the object system.

Let’s start with our “object” class, which in this case is very simple.

obj = dict()  # this is a class, not an instance of a class

This seems strange and impossibly simple at first. However, in python, classes are really just namespaces, since Namespaces are one honking great idea -- let's do more of those!. A namespace, in this case, really just being a collection of names mapped to objects. A dictionary is incredibly good at this. In fact, if your object does not take advantage of __slots__, a python object is really just a dictionary with some additional attachments.

Next, we need to define our “api” for our classes. Python uses methods with leading and trailing underscored (__method_name__), but for brevity’s sake, I’ll use a single leading underscore. So, _init will the the initializer, _str will be our toString/__str__ function, and so on.

def default_str(instance):
    return "<instance of " + instance["_name"] + ">"

def initialize(instance):
    # the default initializer doesn't need to do anything
    pass

obj["_init"] = initialize
obj["_str"] = default_str
obj["_cls"] = "obj"

Finally, a class needs to keep track of its parent for the purposes of inheritance. Its possible to also support multiple inheritance, and I’ll tackle that at the end. For now though, just single inheritance.

obj["_parent"] = obj
obj["_name"] = "obj"

This is a bit strange, but all classes should have a parent, so object is its own parent. Strange family.

Now we can build out a few convenience methods for creating a class (this would be the analog of type in python) and for instantiating a class (the analog of the implicit parts of __new__ that you can’t change in python). But first, we need to do something special and a tad hacky.

Namely, because we are not using python’s built in object system, I can’t override things like __getattr__ to efficiently call methods stored on the class. Instead, we have to bind them to the individual instances of an object to be able to call methods with a familiar syntax (obj.method(args) instead of cls.method(obj, args)). Maybe I’ll eventually address that.

So we need a function to bind a method to an instance of a class.

def bind_method(instance, method):
    def bound_method(*args, **kwargs):
        return method(instance, *args, **kwargs)
    return bound_method

Astute observers will notice that this resembles a simpler version of functools.partial, and in fact you could use that instead. The downside of using this method is that there will be an instance of bound_method attached to every instance of a class, which is memory linear and less performant than what python actually does.

Now we can instantiate an object.

from types import FunctionType
# we need this for special casing methods

def new_instance(clazz, *args, **kwargs):
    new_instance = {}
    for data in clazz:
        if isinstance(clazz[data], FunctionType):
            # special case for functions as they need to be bound
            new_instance[data] = bind_method(new_instance, clazz[data])
        else:
            new_instance[data] = clazz[data]
    new_instance["_init"](*args, **kwargs)
    return new_instance

x = new_instance(obj)
assert x["_str"]() == "<instance of obj>"

Now we need some way to support inheritance. So lets define a method to declare a type. Because defining every class manually is awful and unfun.

def new_class(parent, **methods):
    new_cls = {val: parent[val] for val in parent}
    new_cls["_parent"] = parent
    for method in methods:
        new_cls[method] = methods[method]
    return new_cls

This code defines a function that will define a class. This is analogous to type as a metaclass. Note that this will overwrite functions and values on the parent class, which is exactly the functionality we want. We also want to test this to make sure everything is in good working order.

def init_person(instance, name, age):
    instance["name"] = name
    instance["age"] = age

def person_string(instance):
    return "<Hi my name is " + instance["name"] + ">"

def is_older(instance, other):
    return instance["age"] > other["age"]

Person = new_class(obj, _init=init_person, _str=person_string,
                   is_older=is_older, _name="Person")
john = new_instance(Person, "john", 35)
amy = new_instance(Person, "amy", 26)

assert john["age"] == 35
assert john["name"] == "john"
assert john["is_older"](amy)

This looks a bit different than the way that classes are normally instantiated, but let’s take a look at type. It is defined as class type(name, bases, dict). Which means that on occasion, people will define classes using this function.

MyClass = type("MyClass", (object, ), 
               {"__init__": lambda s, n: s.update(_name=n)})

These look somewhat similar, a name, a base class, and a set of functions are passed in and apparently a class comes out. In fact, when you define a class normally in python, this is actually what is being done in the background (because type is the default metaclass).

We still need to know whether or not the “inheritance” portion works. That’s kinda important.

Child = new_class(Person, 
                  is_teen=lambda inst: 20 > inst["age"] > 12,
                  _name="Child")

billy = new_instance(Child, "billy", 7)
assert not billy["is_teen"]()
assert billy["age"] == 7

It works!

So that’s single inheritance object oriented types in 20 lines of code. Pretty cool. But python supports multiple inheritance, and even java does now. Mixins are awesome. How do we support that? The change is surprisingly small, that said I’m going to do it differently than python and model it after the way java 8 supports mixins.

The change we need to make to our previous function is actually rather simple: instead of a single parent, we can have multiple parents. And unlike in python, the parents are unordered. This way, there’s no need to define a method resolution order. Additionally, we steal from java the rule that any time there is a method conflict, ie. multiple parents implement a single method, the child class must reimplment that method.

def new_class_multi(parents, **methods):
    new_cls = {}
    conflicts = []
    for parent in parents:
        for member in parent:
            if new_cls.get(member, None) is not None and isinstance(
                                        parent[member], FunctionType):
                conflicts.append(member)
            new_cls[member] = parent[member]
    for conflict in conflicts:
        if conflict not in methods:
            raise RuntimeError("Method conflict on " + conflict)

    for method in methods:
        new_cls[method] = methods[method]

    new_cls["_parent_class"] = {parent["_name"]: parent for parent in parents}
    return new_clsdef new_class_multi(parents, **methods):

This does exactly what we want. It supports multiple inheritence, we avoid method conflicts. All great things. Just one tiny issue: I now need to redefine _str on ever class I define that inherits from multiple base classes. That is annoying and antithetical to the goals of object oriented programming. It’s an easy fix though.

def new_class_multi_fixed(parents, **methods):
    new_cls = {}
    conflicts = []
    for parent in parents:
        for member in parent:
            # something is a conflict if:
            #   its name is being assigned over an existing value, because then
            #       it is a conflicting name
            #   the item is a function (ignore things like _name and _parents)
            #   the functions are not identical (memorywise) because then, in
            #       reality there is no conflict, its just that multiple classes
            #       are implementing the same function
            if all([new_cls.get(member, None) is not None,
                    isinstance(parent[member], FunctionType),
                    not new_cls.get(member) is parent[member]]):
                conflicts.append(member)
            new_cls[member] = parent[member]
    for conflict in conflicts:
        if conflict not in methods:
            raise RuntimeError("Method conflict on " + conflict)

    for method in methods:
        new_cls[method] = methods[method]

    new_cls["_parent_class"] = {parent["_name"]: parent for parent in parents}
    return new_cls

And there’s Java’s type system. Easy. Python’s is a bit more complex because of the builtin resolution of method order. Now just to be sure, here are some tests of the two new class multi methods

HTMLMixin = new_class(obj, render_as_html=lambda s: "<p>"+s["_str"]()+"</p>",
                      _name="HTMLMixin")
CallableMixin = new_class(obj, _call=lambda s: s, _name="Callable")
StrMixin = new_class(obj, _str=lambda s: "A different str function",
                     _name="Printable")

testval = False
try:
    FailClass = new_class_multi([HTMLMixin, CallableMixin], _name="Fails")
except:
    testval = True
assert testval

testval = False
try:
    FailClass2 = new_class_multi_fixed([HTMLMixin, 
                                        CallableMixin], _name="Fails")
except:
    testval = True
assert not testval

testval = False
try:
    FailClass3 = new_class_multi_fixed([HTMLMixin, CallableMixin, 
                                        StrMixin], _name="Fails")
except:
    testval = True
assert testval

That’s a wrap. No syntactic sugar, but a working multiple inheritance type system. Like always, Here’s the link to the runnable python version of this post, and here’s the link to the post source.