In-Law Agile Data Storage and Retrieval

Overview

Applications typically store data in one of three ways:

RDBMS
XML
Proprietary format

Each of these methods requires significant up-front analysis in the creation of a schema or data definition. This schema must be adhered to throughout development and, once deployed, cannot be significantly changed without potentially difficult technical and political analysis.

Even for small applications, or applications that do not manipulate large or complex amounts of data, schema changes can be significant. Often, coding is required to support different versions of the schema, or conversion tools are required.

Even if your data is stored as a serialized object graph, you still have to be careful when changing your objects; their interfaces represent a schema.

In-Law eliminates the concept of a formalized static schema. In much the same way that dynamic languages like Perl or Ruby do not require formalized data structures, In-Law allows complete run-time flexibility in data storage. It does this by classifying every piece of data as an "Item", uniquely identifiable by its type and name. These items can have any number of relations to other items. Relations are simply identified by name as well.

Basic Concepts

An item represents something you wish to store information about: a noun, object, or "thing". It has only two pieces of meta-data: a "type", which provides a loose categorization (or a tight one, it doesn't matter), and a "name", which uniquely identifies the item in the context of its type.

Everything else that you may want to describe about an item is done by relating that item to other items.

A relationship is a uni-directional mapping from one item to another (or one item to itself). Every relationship has a source and a destination item, and every relationship has a name.

For example, suppose you are storing digital photos and wish to keep track of the camera you used and the subject of the photo. Each photo would be an item of type "photo". Each subject of a photo would be an item of type "subject", and each camera would be an item of type "camera". You would then have a relation called "taken with" that relates photos to cameras. You would also have a relation called "is of" that relates photos to subjects.
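The photo example can be sketched as two small data structures. This is only an illustration of the concepts, not In-Law's actual implementation: the class names Item and Relation, their field names, and the sample item names ("Beach at sunset", "Beach") are assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    type: str  # loose categorization, e.g. "photo"
    name: str  # unique within the type


@dataclass(frozen=True)
class Relation:
    source: Item       # where the uni-directional mapping starts
    name: str          # every relationship has a name
    destination: Item  # where it ends (may be the same item as source)


# Hypothetical sample data for the photo domain.
photo = Item("photo", "Beach at sunset")
camera = Item("camera", "Canon SD550")
subject = Item("subject", "Beach")

relations = {
    Relation(photo, "taken with", camera),
    Relation(photo, "is of", subject),
}

for rel in sorted(relations, key=lambda r: r.name):
    print(rel.source.name, "--", rel.name, "->", rel.destination.name)
```

Because an item is identified only by its type and name, two Item values with the same type and name compare equal here, which matches the idea that the pair is the item's identity.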

No Schema: Flexibility

The photo domain described above isn't terribly complex. Modelling it in an RDBMS would be straightforward. Even XML would be workable for a domain this simple. Certainly, the dynamic schema saves some time starting up the project, because the analysis phase of how to store data is eliminated, but the true power of the dynamic schema is shown when you need to make changes.

Suppose your photo organizing software needs to store some more meta-data about cameras, such as the brand, the type of film and the resolution. In a traditional application you now have to modify your schema and make decisions about nullability and the cardinality of relations. For an RDBMS solution, you will have to add three new tables to store the brands, film types, and possible resolutions, populate them with some expected values, and then add foreign keys to your "camera" table. Only then can you begin to code the features that will store and retrieve this data.

With In-Law, you simply start coding the new features. You don't need to describe what makes a "camera brand", nor do you need to determine what types of film are available. You simply decide that you'll have three new types of items called "film", "resolution" and "brand". Your relationships can simply be created as you work.

But let's take this a step further. Suppose you have pictures you took with a traditional film camera. The concept of "megapixels" is meaningless in the context of a film camera. Now, your database design is messed up. Sure, there are many ways to deal with this (you can make the relation to resolution nullable, or you can create a new table for film cameras), but the point is, you have to spend some time determining what to do, and not because of some new revelation about the domain of your application but because of artificial constraints of the underlying data storage mechanism.
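A minimal sketch of how the film-camera case might look without a schema. The relate helper, the relation names ("made by", "has resolution", "uses film") and the sample cameras are assumptions invented for illustration; they are not part of In-Law.

```python
from collections import defaultdict

# (type, name) of source -> relation name -> set of (type, name) destinations.
# This in-memory layout is an assumption for the sketch only.
relations = defaultdict(lambda: defaultdict(set))


def relate(source, name, destination):
    """Record a named, uni-directional relation between two items."""
    relations[source][name].add(destination)


digital = ("camera", "Canon SD550")
film_camera = ("camera", "Pentax K1000")

# New item types "brand", "resolution" and "film" appear simply by being used.
relate(digital, "made by", ("brand", "Canon"))
relate(digital, "has resolution", ("resolution", "7.1 megapixels"))
relate(film_camera, "made by", ("brand", "Pentax"))
relate(film_camera, "uses film", ("film", "35mm"))

# The film camera just has no "has resolution" relation; nothing is nullable.
for cam in (digital, film_camera):
    print(cam[1], sorted(relations[cam]))
```

The film camera needs no special handling: a relation that does not apply is simply never declared.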

Storing Data

In-Law currently has a programmatic interface that can be accessed with a simple command-line tool. These commands can be used to demonstrate how to create and use an In-Law database.

Data is stored by simply declaring information:

picture "My Cat Rudy" "was taken with" camera "Canon SD550"

This creates two items: a "picture" named "My Cat Rudy" and a "camera" named "Canon SD550". They are related by the relation "was taken with".

picture "My Cat Rudy" "is of" cat "Rudy"
picture "My Cat Rudy" "is of" window "Living Room"

Here, we create only two new items, a "cat" named "Rudy" and a "window" named "Living Room". The picture "My Cat Rudy" was created previously. These two statements, in addition to creating the two new items, create relationships between the picture "My Cat Rudy" and the new items. Note that we've given the same relationship name to two different types of objects. This is a powerful feature of In-Law; relationships mean whatever you need them to mean. You don't have to find some generalized "superclass" of "cat" and "window" and configure your database to allow relations between that and a picture. You can even do:

painting "Mona Lisa" "is of" person "Mona Lisa"

Again, we don't need to create some superfluous "ArtisticCreation" superclass that is allowed to relate to people. We simply declare what the relation is between two items; just as we do in the real world.
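To make the declaration format concrete, here is a rough sketch of how statements of the form type "name" "relation" type "name" might be parsed and stored. It is not the actual In-Law command-line tool; parse_statement, declare and the two in-memory sets are hypothetical names, and splitting the statement with Python's shlex module is an assumption about the quoting rules.

```python
import shlex


def parse_statement(line):
    """Split one declaration into (source item, relation name, destination item)."""
    tokens = shlex.split(line)  # honours the double-quoted names
    if len(tokens) != 5:
        raise ValueError(f"expected 5 parts, got {len(tokens)}: {line!r}")
    src_type, src_name, relation, dst_type, dst_name = tokens
    return (src_type, src_name), relation, (dst_type, dst_name)


items = set()      # every (type, name) pair seen so far
relations = set()  # every (source, relation name, destination) triple


def declare(line):
    """Create any items that do not exist yet and relate them."""
    source, relation, destination = parse_statement(line)
    items.update((source, destination))
    relations.add((source, relation, destination))


statements = [
    'picture "My Cat Rudy" "was taken with" camera "Canon SD550"',
    'picture "My Cat Rudy" "is of" cat "Rudy"',
    'picture "My Cat Rudy" "is of" window "Living Room"',
    'painting "Mona Lisa" "is of" person "Mona Lisa"',
]
for statement in statements:
    declare(statement)

print(len(items), "items,", len(relations), "relations")
```

Because items live in a set keyed by type and name, declaring the picture "My Cat Rudy" three times still yields a single item, and the painting and the person named "Mona Lisa" stay distinct because their types differ.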

Getting Data Back Out

We can now query any part of the database. We can get all items that relate via "is of". We can get only such items that relate to a cat. We can get only paintings. Anything is possible.
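The three queries just mentioned could be sketched as a single filter over the stored relations. The query function and its parameter names are assumptions for illustration; the document does not show In-Law's real query interface.

```python
# Relations from the earlier declarations, as (source, relation, destination),
# where each item is a (type, name) pair.
triples = [
    (("picture", "My Cat Rudy"), "was taken with", ("camera", "Canon SD550")),
    (("picture", "My Cat Rudy"), "is of", ("cat", "Rudy")),
    (("picture", "My Cat Rudy"), "is of", ("window", "Living Room")),
    (("painting", "Mona Lisa"), "is of", ("person", "Mona Lisa")),
]


def query(relation=None, source_type=None, dest_type=None):
    """Return the triples matching every criterion that is not None."""
    return [
        (src, rel, dst)
        for src, rel, dst in triples
        if (relation is None or rel == relation)
        and (source_type is None or src[0] == source_type)
        and (dest_type is None or dst[0] == dest_type)
    ]


everything_is_of = query(relation="is of")           # all "is of" relations
of_a_cat = query(relation="is of", dest_type="cat")  # only those pointing at a cat
paintings = query(source_type="painting")            # only paintings

for src, rel, dst in of_a_cat:
    print(src[1], rel, dst[1])
```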

In many ways, an In-Law database is actually more explicit than an RDBMS. In In-Law, you can see directly the relationships and meta-data about all objects being stored. In an RDBMS, you must deduce that a column like "camera_id" in the "picture" table means that a picture was taken with the camera identified by that integer.

Significant effort is spent deducing this information to show data in an intuitive format.

Binary Data

This needs to be addressed.

Stuff In-Law Might Make Possible

Rapid Prototyping

In-Law could be used as a rapid prototyping tool, allowing a developer to completely bypass an RDBMS when trying out ideas or refining a user interface. At a certain point in development, an analysis of the In-Law database could be done to produce an actual SQL-based schema, which could be used in a post-prototype phase.
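As a very rough sketch of what such an analysis might produce, the code below proposes one table per item type and one link table per (source type, relation, destination type) combination. The mapping rules, the ident and propose_schema helpers, and the table and column names are all assumptions; a real conversion would need human judgment about cardinality and keys. The statements are run against an in-memory SQLite database only to check that they are valid SQL.

```python
import re
import sqlite3

triples = [
    (("picture", "My Cat Rudy"), "was taken with", ("camera", "Canon SD550")),
    (("picture", "My Cat Rudy"), "is of", ("cat", "Rudy")),
    (("picture", "My Cat Rudy"), "is of", ("window", "Living Room")),
    (("painting", "Mona Lisa"), "is of", ("person", "Mona Lisa")),
]


def ident(text):
    """Turn a free-form type or relation name into a SQL-safe identifier."""
    return re.sub(r"\W+", "_", text.strip().lower())


def propose_schema(triples):
    """Return CREATE TABLE statements: one per item type, one per link."""
    item_types = sorted({item[0] for src, _, dst in triples for item in (src, dst)})
    statements = [
        f'CREATE TABLE "{ident(t)}" (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)'
        for t in item_types
    ]
    links = sorted({(src[0], rel, dst[0]) for src, rel, dst in triples})
    for src_type, rel, dst_type in links:
        table = ident(f"{src_type} {rel} {dst_type}")
        statements.append(
            f'CREATE TABLE "{table}" ('
            f'source_id INTEGER NOT NULL REFERENCES "{ident(src_type)}"(id), '
            f'destination_id INTEGER NOT NULL REFERENCES "{ident(dst_type)}"(id))'
        )
    return statements


schema = propose_schema(triples)
conn = sqlite3.connect(":memory:")
for statement in schema:
    conn.execute(statement)  # raises if a statement is not valid SQL
    print(statement)
```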

What else?

Questions someone might ask frequently

Why not just use hashtables?

In-Law may seem like a fancy wrapper around "magic strings" or "hashtable programming". While it's true that much of the analysis of the database's contents is done by string comparisons, an In-Law database is much more sophisticated and useful than a serialized hashtable.

While In-Law doesn't require you to create and design a schema ahead of time, this doesn't mean that your database cannot be described by a schema. We can analyze a database and determine the schema or "meta model". In our camera domain, we can say that pictures are "of" cats and windows and that paintings are "of" people. Pictures are "taken with" cameras.
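Deriving such a meta model can be sketched as a single pass over the stored relations, collecting which item types each relation connects. The meta_model function and the shape of its result are assumptions for illustration, not In-Law's own analysis code.

```python
from collections import defaultdict

triples = [
    (("picture", "My Cat Rudy"), "was taken with", ("camera", "Canon SD550")),
    (("picture", "My Cat Rudy"), "is of", ("cat", "Rudy")),
    (("picture", "My Cat Rudy"), "is of", ("window", "Living Room")),
    (("painting", "Mona Lisa"), "is of", ("person", "Mona Lisa")),
]


def meta_model(triples):
    """Map (source type, relation name) to the set of destination types seen."""
    model = defaultdict(set)
    for (src_type, _), relation, (dst_type, _) in triples:
        model[(src_type, relation)].add(dst_type)
    return dict(model)


model = meta_model(triples)
for (src_type, relation), dst_types in sorted(model.items()):
    print(f'{src_type} "{relation}" {", ".join(sorted(dst_types))}')
```

The result is a description of the data as it currently stands, so it changes automatically as new kinds of items and relations are declared.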

Won't the performance suck?

For storing a million bits of information, In-Law might not be the best method of handling data. But for a lot of application domains that do not manipulate large amounts of data, In-Law is perfect. Consider a desktop application to manage your finances. The "file format" can be an In-Law database. It will be forward compatible and backward compatible, and, since a "file" for this type of application stores a relatively small amount of data, the advantages in flexibility of data modeling far outweigh potential performance problems.