Colin's Journal: A place for thoughts about politics, software, and daily life.
On Monday I read this brief article regarding Object Prevalence [slashdot.org], but I’ve only just got around to writing down some of my thoughts on the subject. Object Prevalence is essentially just a mechanism for object persistence: the whole object graph is held in memory, with the added bonus of a change log to try to ensure some level of recoverability in the event of an abnormal shutdown.
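To make that concrete, here is a minimal sketch of the prevalence idea in Python: the object model lives entirely in memory, and every change goes through a command that is appended to a journal before it is applied, so the state can be rebuilt by replaying the journal after a crash. The class and file names are invented for illustration; this is a sketch of the pattern, not Prevayler's actual API.

    # Minimal sketch of the prevalence pattern: all state is in memory,
    # and each change is a command logged to a journal before it runs,
    # so the state can be rebuilt by replay after an abnormal shutdown.
    import os
    import pickle

    class AddAccount:
        """A command: one logged, replayable change to the object model."""
        def __init__(self, name):
            self.name = name
        def execute(self, state):
            state.setdefault("accounts", []).append(self.name)

    class PrevalentSystem:
        def __init__(self, journal_path="journal.log"):
            self.state = {}
            self.journal_path = journal_path
            self._replay()

        def _replay(self):
            # On startup, re-run every logged command to rebuild the in-memory state.
            if not os.path.exists(self.journal_path):
                return
            with open(self.journal_path, "rb") as f:
                while True:
                    try:
                        pickle.load(f).execute(self.state)
                    except EOFError:
                        break

        def execute(self, command):
            # Log first, then apply: a crash part-way through is recovered by replay.
            with open(self.journal_path, "ab") as f:
                pickle.dump(command, f)
            command.execute(self.state)

    system = PrevalentSystem()
    system.execute(AddAccount("alice"))
    print(system.state)

Note that the journal stores serialised command objects, which is exactly why the data is only readable by an application that knows those classes.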
The comments on the article include some good points regarding the problems with this approach. It obviously cannot scale to large enterprise applications like those seen in telecommunications and other such industries. It does not provide atomicity and transactions, limits the ability to use third-party reporting products, and restricts the ability to perform ad-hoc queries of your data. All of this is well and good, but it’s also rather obvious. There is a bigger issue with the Object Prevalence approach, even when the size of your dataset, the nature of your transactions, and your reporting requirements would otherwise lead you to think it a suitable solution.
The problem is one of data accessibility. An application is ultimately only a tool; it’s the data that you care about. The data is what differentiates your installation from all others; the data you are dealing with is where the value of what you are doing lies. Object Prevalence, though, locks your data into the implementation of the application: only this particular application can load and make sense of the data you have. This in turn causes two other problems, one of integration and one of application lock-in.
The issue of integration is potentially the most serious. If you want any other application to be able to access that data, you either need to interface at the object level, or the application that hosts the data needs to be modified so it can extract the data in another, non-application-specific, form. Either route will almost always require modification of your Object Prevalence based application, especially if you want the data sharing to be real time. Integration of systems is often a two-way exercise: you want to be able to send data the other way as well, which means your Object Prevalence application must also support an import mechanism or some object-level API specifically to allow this kind of integration. By the time you have made the changes to your application to support these interfaces, you have done much of the work that you were trying to avoid by using Object Prevalence in the first place.
Most integration between applications happens at a very crude level. Often it’s just a matter of reading data out of one database and writing it into a flat file, or conversely loading data from a file and putting it into a table. The reason these integrations can be done cheaply and easily is that the storage mechanism where the data sits is widely understood and completely application independent. Transferring data between spreadsheets and databases is often done using files in CSV (Comma Separated Values) format, one of the crudest but simplest data formats around.
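As an illustration of how little such an integration can involve, here is a short Python sketch that dumps a table from a relational database into a CSV file that a spreadsheet or any other system can load. The database file, table, and column names are invented for the example.

    # Crude but cheap integration: dump a database table to a CSV file
    # that any spreadsheet or other system can pick up.
    import csv
    import sqlite3

    conn = sqlite3.connect("orders.db")  # hypothetical database file
    cursor = conn.execute("SELECT id, customer, total FROM orders")

    with open("orders.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column[0] for column in cursor.description])  # header row
        writer.writerows(cursor)  # one CSV row per database row

    conn.close()

The whole exercise depends on nothing more than the data sitting in a widely understood, application-independent store; with data locked up in a prevalence journal, even this small job needs changes to the hosting application.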
The other drawback, that of application lock-in, has the same root cause, but shows up when it’s time to move to a different application or system. Any team looking to migrate data from an old application to a new one wants the old data in a database or in a simple, application-independent format. If the data only exists as live objects written out using Object Prevalence, your estimated cost of migration has just increased significantly.
Ultimately, if your data matters to you, you have to understand how it is stored. A relational database is preferable, but an easy-to-read file format (XML, or at a push a fixed-field format) is fine as well. Anything stored in its own, application-dependent, format should be avoided unless the application has proven support for exporting and importing in an application-independent format. It should always be possible to get your data out of an application-specific format, but the opportunity cost of giving up simple integration, as well as the actual cost of eventual migration, should be weighed heavily before going in that direction.