Introduction

DyBASE is a very simple object-oriented embedded database for languages with dynamic type checking. In such languages the type of a class instance variable is not known at compile time: the same instance variable can hold an integer, then a string, and later a reference to some other object. Consequently the database cannot keep information about object format in a class descriptor and has to store the type of each instance variable together with its value. DyBASE provides APIs for the most popular scripting languages with OO extensions: PHP, Ruby, Python and Rebol.

DyBASE is easy to use and provides high performance. It is intended for applications which need to deal with persistent data in a more sophisticated way than the load/store of an object tree provided by standard serialization mechanisms. Although DyBASE is very simple, it provides fault tolerance (ACID transactions) and concurrent access to the database.

The main advantage of DyBASE is its tight integration with the programming language. There is no gap between the database and application data models - DyBASE directly stores language objects, so there is no need for the packing/unpacking code which has to be written for traditional relational databases. Also, unlike many other OODBMSs, DyBASE requires no special compiler or preprocessor, and it is still able to provide a high level of transparency.

Features

Let us now describe the key features of the DyBASE architecture.

Persistency by reachability

DyBASE implements the persistency-by-reachability approach. An object of any class derived from the Persistent base class is considered persistent capable. It is automatically made persistent and stored in the storage when it is referenced from some other persistent object and the store method of that object is invoked. So there is no need (although it is possible) to explicitly assign an object to the storage.

The storage has one special root object. The root object is the only persistent object accessed in a special way (using the getRootObject method); all other persistent objects are accessed in the normal way: either by traversing references from other persistent objects or by using indices. Unlike many other OODBMSs, there can be only one root in the storage. If you need several named roots, you should create an Index object and use it as the root object.
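The single-root restriction can be worked around as just described: use an index keyed by name as the root object. The sketch below models this pattern with a plain Python dict standing in for a DyBASE string index (in real code the index would come from Storage.createStrIndex and be installed with setRootObject); the class and method names here are illustrative, not part of the DyBASE API.

```python
# Toy model of "multiple named roots": a single root object holding a
# name -> object mapping. A plain dict stands in for a persistent string
# index; in real DyBASE code this would be storage.createStrIndex().

class NamedRoots:
    def __init__(self):
        self._index = {}          # stand-in for a persistent string index

    def get_root(self, name):
        return self._index.get(name)      # real code: index.get(name)

    def set_root(self, name, obj):
        self._index[name] = obj           # real code: index.set(name, obj)

roots = NamedRoots()
roots.set_root("employees", ["alice", "bob"])
roots.set_root("departments", ["sales"])

assert roots.get_root("employees") == ["alice", "bob"]
```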

Which classes are persistent capable depends on the particular language API. In most of them, a class must be derived from Persistent, which makes it impossible to store "foreign" classes in the storage. This is the cost of DyBASE's ease of use and the lack of any specialized preprocessor or compiler. In Python it is possible to make an arbitrary class persistent, but it is still more convenient to derive it from Persistent.

DyBASE supports the following basic types:

Type          Description                                        Size  Related C++ type
------------  -------------------------------------------------  ----  ----------------
Boolean       boolean type                                       1     bool
Integer       32-bit integer type                                4     int
Long integer  64-bit integer type                                8     long long
Real          64-bit floating point type                         8     double
String        string with counter                                N     char*
Array         one-dimensional array with components of any of    -     -
              the described types

Unfortunately it is not possible to detect whether an object has been changed without saving the old state of the object and performing a field-by-field comparison with its new state, and the overhead of such a solution (both in space and CPU time) is very high. In DyBASE it is therefore the responsibility of the programmer to save an object in the storage, using the Persistent.store or Persistent.modify methods.

The Persistent.store method writes the object to the storage together with all objects referenced from it which are not yet persistent. So if you create a tree of objects and assign a reference to the root of this tree to some persistent object X, it is only necessary to invoke the store() method on X. But if you later update one of the elements of this tree, you should invoke store() individually for each such element (X.store() will NOT automatically save referenced objects which are already persistent).

The Persistent.modify method marks the object as modified but doesn't immediately write it to the storage. All objects marked as modified will be saved during transaction commit (the store method will be invoked for each modified object). Using the modify method is therefore preferable if an object is updated multiple times within a transaction: instead of being stored several times, it will be stored only once, at the end.
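The store()/modify() distinction can be modeled in a few lines. The following is a toy in-memory sketch of the described semantics (the MockStorage class and its fields are invented for illustration and are not the DyBASE implementation): store() performs an immediate write, while modify() only marks the object dirty so that commit() writes it exactly once.

```python
# Toy model: store() writes immediately, modify() defers the write
# until commit(), which stores each modified object exactly once.

class MockStorage:
    def __init__(self):
        self.writes = 0           # number of physical object writes
        self.dirty = set()        # objects marked by modify()

    def store(self, obj):
        self.writes += 1          # immediate write to the storage

    def modify(self, obj):
        self.dirty.add(obj)       # deferred: written at commit time

    def commit(self):
        for obj in self.dirty:    # each modified object stored once
            self.store(obj)
        self.dirty.clear()

db = MockStorage()
obj = "account"
for _ in range(5):
    db.modify(obj)                # five logical updates in one transaction
db.commit()
assert db.writes == 1             # but only one physical write

db2 = MockStorage()
for _ in range(5):
    db2.store(obj)                # store() writes every time
assert db2.writes == 5
```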

Semitransparent object loading

DyBASE does not use any special compiler or preprocessor, and since only very few languages provide runtime behavior reflection (changing the behavior of an object at runtime), it is not possible to implement completely transparent persistence (where there is no difference between access to transient and persistent objects). Instead, DyBASE provides transparent behavior in most cases, with some exceptions.

When an object is loaded from the storage, DyBASE also tries to recursively load all objects referenced from it. As a result, the whole cluster of referenced objects is loaded and the references between them are correctly set, so the programmer can access these objects and traverse from one object to another without explicit checks of whether an object is loaded or not.

What is bad about this approach? If all objects in the storage are reachable from the root by references (no indices are used), then loading the root object will cause loading of all objects from the database into memory. If the number of objects is very large, this can take a significant amount of time and exhaust the application's memory. In DyBASE it is possible to stop recursive loading at particular objects. This is done either by redefining the recursiveLoading method to return false, or (in the Python API) by setting the __nonrecursive__ attribute. Objects referenced from objects with disabled recursive loading must be explicitly loaded using the Persistent.load method; before an object is loaded, you should not access any of its components.

It is also important to notice that indices always load member objects on demand (i.e. DyBASE does not automatically load all objects in a container). Since access to container members is always performed through methods of the container class, the programmer will never notice it - container methods always return loaded objects.

Sometimes not all fields of a persistent object need to be saved in the storage; some of them are transient. DyBASE uses the following convention to distinguish persistent and transient fields: all fields whose name ends with "_" are considered transient and are not saved in the storage. You can initialize such fields when the object is loaded from the storage in the onLoad() method, which is invoked for every persistent object after it is loaded (i.e. after the values of all persistent fields have been restored).
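The naming convention can be sketched in pure Python. The helper function and class below are illustrative stand-ins (not DyBASE code) showing which fields the trailing-underscore rule would persist, and how onLoad() rebuilds transient state after a load.

```python
# Sketch of the transient-field convention: attributes whose names end
# with "_" are skipped when persisting, and onLoad() reinitializes them
# after the persistent fields have been restored. Illustrative only.

def persistent_fields(obj):
    """Return only the fields that would be saved in the storage."""
    return {k: v for k, v in vars(obj).items() if not k.endswith("_")}

class Employee:
    def __init__(self, name):
        self.name = name          # persistent: saved in the storage
        self.cache_ = {}          # transient: trailing "_" -> not saved

    def onLoad(self):
        self.cache_ = {}          # rebuild transient state after loading

e = Employee("alice")
saved = persistent_fields(e)
assert saved == {"name": "alice"}         # cache_ was filtered out

# Simulate loading: restore persistent fields, then invoke onLoad().
restored = Employee.__new__(Employee)
restored.__dict__.update(saved)
restored.onLoad()
assert restored.name == "alice" and restored.cache_ == {}
```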

Summarizing the above: the proposed mechanism is convenient and easy to use because it doesn't require the programmer to explicitly load referenced objects, while remaining flexible by giving the programmer control over object loading. Only those classes (usually containers) which explicitly control the loading of their members (by overloading recursiveLoading to return false) need to be aware of the Persistent.load method.

Delegators

Some programming languages provide a delegation mechanism: the possibility to redirect a method call to another object. Such a mechanism can be used to build a more convenient object database API. Instead of recursively loading the whole object cluster, we can create delegators which load objects on demand. No explicit invocations of the load method, and no overloading of recursiveLoading to stop recursion, are needed.

Unfortunately it is still not possible to detect modification of an object. Although some languages make it possible to redefine the attribute setter method, this does not help detect modification of the object's own components inside an instance method. So explicit invocation of the modify or store methods is still needed.

Although the delegator mechanism provides a more convenient and transparent API, it has its own drawbacks:

  1. It is not possible to check the class of an object: the check will return the class of the delegator and not the class of the target object itself.
  2. As the delegator and the target object are two different objects, comparison of object references should be done with special care.
  3. Delegators add extra space and processor overhead: instead of one object there are two, and each method invocation or field access requires an extra delegator method call.

This is why delegator support is switched off by default in all APIs.

Indices

Usually objects are accessed by traversing from one object to another using references or links, but it is frequently necessary to locate an object by key. Almost all languages provide classes like hashtables or associative arrays; in databases, more sophisticated search is usually required. I did not want to implement a complete SQL language in DyBASE, because that would immediately make the DBMS huge and slow. In most cases an application performs only very simple queries using an exact match or a key range. This is supported in DyBASE by the Index class, which makes it possible to associate a persistent object with a key of any supported scalar or string type.

Indices are created in DyBASE using the Storage.createXXXIndex methods, where XXX specifies the type of the key. You can place in an index only keys of the same type as specified at index creation time (so it is not possible to create an index of strings and place an integer key in it). The index is implemented using the B+Tree algorithm, because the B+Tree is the most efficient structure for disk based databases. Methods of the Index class allow adding, removing and locating objects by key. It is possible to search either by specifying an exact key value or by specifying a range of key values (the high or low boundary, or both, can be omitted or declared as exclusive or inclusive). So the following types of search are possible:

  1. key equals VAL
  2. key belongs to [MIN_VAL, MAX_VAL]
  3. key belongs to [MIN_VAL, MAX_VAL)
  4. key belongs to (MIN_VAL, MAX_VAL]
  5. key belongs to (MIN_VAL, MAX_VAL)
  6. key is greater than MIN_VAL
  7. key is greater than or equal to MIN_VAL
  8. key is less than MAX_VAL
  9. key is less than or equal to MAX_VAL
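The range-search semantics above can be modeled in memory. The real index is a disk-based B+Tree; in this illustrative sketch (MockIndex is an invented name, not the DyBASE API) a sorted list plus the standard bisect module reproduces the same inclusive/exclusive boundary behavior.

```python
import bisect

# In-memory model of Index range search: find objects whose keys fall
# between low and high, with each boundary optionally exclusive.

class MockIndex:
    def __init__(self):
        self._keys = []
        self._vals = []

    def insert(self, key, value):
        i = bisect.bisect_left(self._keys, key)   # keep keys sorted
        self._keys.insert(i, key)
        self._vals.insert(i, value)

    def find(self, low=None, lowInclusive=True, high=None, highInclusive=True):
        lo = 0 if low is None else (
            bisect.bisect_left(self._keys, low) if lowInclusive
            else bisect.bisect_right(self._keys, low))
        hi = len(self._keys) if high is None else (
            bisect.bisect_right(self._keys, high) if highInclusive
            else bisect.bisect_left(self._keys, high))
        return self._vals[lo:hi]

idx = MockIndex()
for k in (10, 20, 30, 40):
    idx.insert(k, "obj%d" % k)

assert idx.find(low=20, high=30) == ["obj20", "obj30"]                      # [20, 30]
assert idx.find(low=20, lowInclusive=False, high=40) == ["obj30", "obj40"]  # (20, 40]
assert idx.find(high=20, highInclusive=False) == ["obj10"]                  # key < 20
```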

Transaction model

DyBASE preserves consistency of the data in case of system or application failure. A transaction mechanism is used to implement all-or-nothing database updates. Transactions in DyBASE are started implicitly when an update operation is first performed, and finished explicitly by the commit, rollback or close methods.

Commit of a transaction causes a synchronous write of the changed pages to the disk. A synchronous write (the application is blocked until the data is actually flushed to the disk) is a very expensive operation. The average positioning time for modern disks is about 10ms, so they are usually unable to perform more than 100 writes to random locations per second. As a transaction usually consists of several updated pages, this leads to an average performance of about 10 transaction commits per second.

Performance can be greatly increased if you minimize the number of commits (use larger transactions). DyBASE uses a shadow mechanism to implement transactions. When an object is changed for the first time during a transaction, a shadow of the object is created and the original object is kept unchanged; if the object is updated multiple times during the transaction, the shadow is created only once. Because of the shadows, DyBASE does not need a transaction log file, so a long transaction cannot cause transaction log overflow as in most other DBMSes. Quite the contrary: if you do not call commit at all, DyBASE works as a DBMS without transaction support, adding almost no transaction overhead.

The only drawback of long transactions is the possibility of losing a lot of changes in case of a fault. DyBASE will preserve consistency of the database, but all changes made since the last commit will be lost.
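The shadow mechanism can be sketched at a toy level. The class below is an invented in-memory illustration of the described idea (not the DyBASE page-level implementation): the first update of an object in a transaction saves its original value as a shadow, commit discards the shadows, and rollback restores them - no log file is involved.

```python
# Toy model of shadow-based transactions over an OID -> value mapping.

class ShadowStore:
    def __init__(self):
        self.data = {}
        self._shadows = {}        # original values of objects changed in txn

    def put(self, oid, value):
        if oid not in self._shadows:          # shadow created once per txn
            self._shadows[oid] = self.data.get(oid)
        self.data[oid] = value

    def commit(self):
        self._shadows.clear()     # changes become permanent

    def rollback(self):
        for oid, old in self._shadows.items():
            if old is None:
                self.data.pop(oid, None)      # object did not exist before
            else:
                self.data[oid] = old          # restore the shadow
        self._shadows.clear()

db = ShadowStore()
db.put(1, "v1")
db.commit()
db.put(1, "v2")
db.put(1, "v3")                   # second update reuses the same shadow
db.rollback()
assert db.data[1] == "v1"         # back to the state of the last commit
```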

Application Program Interface

The DyBASE core is implemented in C++ (actually the GigaBASE implementation is mostly reused). Interaction with the database core is performed using C functions defined in the dybase.h file. The API for a particular programming language consists of two parts:
  1. C interface with database (language extension module)
  2. Database wrapper classes implemented in language itself.
I tried to make part 1 (the extension module implemented in C) as small as possible, mostly a wrapper for the functions defined in dybase.h, and to implement most of the interface in the language itself. So things such as object caching, recursive loading and inspection of object fields are implemented in the target language and not in C. The main advantages of this decision are flexibility (it is easier to implement different strategies at this level) and convenience (using a language extension API is usually inconvenient and error prone). The only disadvantage is lower performance: interpreted languages are usually not as fast as C, and implementing the whole API in C would certainly perform better because no interpretation overhead would be present. But I think that in this case the pros outweigh the cons.

Below is a description of the classes and methods present in the APIs of all languages. The next section describes the specifics of the implementation for each particular language.

Persistent class

The Persistent class is the common root of all persistent capable objects. It provides methods for loading and storing objects.
load()
Explicitly load the object. This method checks whether the object needs to be loaded and, if so, loads it from the storage.

isLoaded()
Check if object is already loaded or explicit invocation of load() method is required.
Returns
True if object is loaded, False otherwise

isPersistent()
Check if object is persistent (was assigned persistent OID).
Returns
True if object is persistent, False otherwise

isModified()
Check if object was modified during current transaction
Returns
True if modify method was invoked for the object within current transaction, False otherwise

store()
Store the object in the storage. If the object is not yet persistent, it is first made persistent by assigning a persistent OID. If the object references other non-persistent objects, they are recursively made persistent and also stored in the storage. But referenced objects which are already persistent are not recursively stored: you should invoke the store method explicitly for each changed persistent object.

modify()
Mark object as modified. This object will be automatically stored to the database during transaction commit.

getStorage()
Get storage in which object is stored.
Returns
Storage in which object is stored or Null if object is not persistent.

deallocate()
Deallocate the object in the storage. This method doesn't affect the instance of the object in application memory - it is deallocated in the normal way by the language runtime. If you are using the garbage collector, there is no need to invoke this method.

sharedLock(nowait=false)
Lock the persistent object in shared mode. Other threads will be able to set their own shared locks on this object, but no exclusive lock can be set on it until this lock is released.
Upgrading the lock is not possible (a thread holding a shared lock cannot upgrade it to an exclusive lock). This is done to prevent possible deadlocks caused by lock upgrades. But locks are reentrant: a thread can request the same lock many times (and a corresponding number of unlocks is needed to release the lock).
Locking the object doesn't prevent other threads from accessing it - it only affects the sharedLock and exclusiveLock methods. So the programmer should set a proper lock before accessing the object in a multi-threaded application.
If the object is accessed concurrently by several threads in read-only mode, explicit locking is not needed, because the language API itself provides consistent retrieval of objects.
Only persistent objects (objects which were assigned to the storage either implicitly, by saving some other persistent object referencing them, or explicitly, by the Storage.makeObjectPersistent method) can be locked.
Parameters
nowait - optional parameter specifying whether request should wait until lock is available or fail if lock can not be granted immediately.
Returns
true if lock is successfully granted
false if lock can not be granted within specified time

exclusiveLock(nowait=false)
Lock the persistent object in exclusive mode. Only one thread can lock an object in exclusive mode at any moment; shared or exclusive lock requests of other threads will be blocked until this lock is released.
This lock is reentrant, so the thread owning the lock can successfully acquire it many times (and a corresponding number of unlocks is needed to release the lock).
Locking the object doesn't prevent other threads from accessing it - it only affects the sharedLock and exclusiveLock methods. So the programmer should set a proper lock before accessing the object in a multi-threaded application.
Only persistent objects (objects which were assigned to the storage either implicitly, by saving some other persistent object referencing them, or explicitly, by the Storage.makeObjectPersistent method) can be locked.
Parameters
nowait - optional parameter specifying whether request should wait until lock is available or fail if lock can not be granted immediately.
Returns
true if lock is successfully granted
false if lock can not be granted within specified time

unlock()
Release a granted lock. If the lock was requested several times by one thread, a corresponding number of unlocks is needed to release it.

Index class

The Index class provides access to objects by key. Keys of any scalar or string type are supported, but a single index can contain keys of only one type. The Index class extends the Persistent class and so is a normal persistent object. An index can be unique (duplicate keys are prohibited) or non-unique.


drop()
Delete the index. This method removes all entries from the index and deallocates the index object.

clear()
Remove all entries from the index. This method doesn't deallocate the indexed objects.

insert(key, value)
Insert an object into the index.
Parameters
key - key with type matching with type of the index
value - persistent capable object to be associated with this key. This object is automatically made persistent (if it isn't persistent yet).
Returns
True if the object was successfully inserted, False if the index is unique and such a key already exists in it.

set(key, value)
Set the object for the specified key. If the key already exists in the index, the previous association of this key is replaced.
Parameters
key - key with type matching with type of the index
value - persistent capable object to be associated with this key. This object is automatically made persistent (if it isn't persistent yet).

remove(key, value=Null)
Remove an object from the index.
Parameters
key - key with type matching with type of the index
value - optional reference to the persistent object to be removed from the index. If the index is unique, this parameter can be omitted.
Returns
True if the object was successfully removed from the index, False if there is no object with such a key in the index.

get(key)
Get the object associated with the specified key.
Parameters
key - key with type matching with type of the index
Returns
Null if there is no object with such a key in the index,
the object itself if there is exactly one object with this key,
an array of objects if there is more than one object with this key

find(low=null, lowInclusive=true, high=null, highInclusive=true)
Get the objects whose keys belong to the specified range.
Parameters
low - low boundary for the key value; if Null then there is no low boundary.
lowInclusive - whether the low boundary is inclusive
high - high boundary for the key value; if Null then there is no high boundary.
highInclusive - whether the high boundary is inclusive
Returns
Array of selected objects, or Null if there are no objects with keys in the specified range

iterator(low=null, lowInclusive=true, high=null, highInclusive=true, ascent=true)
Get an iterator for the objects in the index. Objects are traversed in ascending or descending key order depending on the ascent parameter. Details of the iterator's implementation depend on the particular language.
Parameters
low - low boundary for the key value; if Null then there is no low boundary.
lowInclusive - whether the low boundary is inclusive
high - high boundary for the key value; if Null then there is no high boundary.
highInclusive - whether the high boundary is inclusive
ascent - iteration order: if true, objects are traversed in ascending key order
Returns
Usually this method returns an iterator object, but in some languages it is possible to pass a block of code which will be executed for each selected object.

Storage class

Storage is the main class of the DyBASE API. It provides access to the database storage.


Storage(pagePoolSize = 4*1024*1024, objectCacheSize = 1000)
Storage constructor
Parameters
pagePoolSize - size of the database page pool in bytes (a larger page pool usually leads to better performance)
objectCacheSize - this parameter is used only by some language APIs and specifies the maximal number of objects in the application object cache. Not all language APIs maintain such a cache.
Returns
Storage object. The storage is not yet opened; you should invoke the open method explicitly.

open(path)
Open the storage.
Parameters
path - path to the database file
Returns
True if the storage is successfully opened, False otherwise. Some language APIs do not return False but throw an exception in case of failure.

close()
Close the storage. If there is an open transaction, it is committed first.

commit()
Commit the current transaction. A transaction is started automatically when you first update the database.

rollback()
Rollback the current transaction. All changes done by the current transaction are undone. Attention! Rollback of a transaction does not restore the original values of application objects; it just clears the object cache. You should not use any variable referencing a persistent object after rollback; instead, use the Storage.getRootObject method and traverse the objects from the root.

getRootObject()
Get storage root object.
Returns
The loaded storage root object, or Null if the root has not yet been specified.

setRootObject(root)
Specify a new storage root object. The previous root object (if any) is NOT deallocated.
Parameters
root - new storage root object which is automatically made persistent.

deallocateObject(obj)
Deallocate a persistent object from the storage. If the object is not persistent, this method has no effect.
Parameters
obj - persistent object

makeObjectPersistent(obj)
Explicitly make an object persistent. An object is automatically made persistent when a persistent object containing a reference to it is stored (persistency by reachability), but sometimes you may need to force the assignment of an OID and storage reference to the object. This can be done by the makeObjectPersistent method. This method doesn't actually store the object in the storage, it just assigns an OID to it.
Parameters
obj - object to be made persistent; if it is already persistent, the method has no effect.


storeObject(obj)
Make the object persistent if it is not yet persistent, and store it in the storage. This method is usually invoked by the Persistent.store() method.
Parameters
obj - stored object.


modifyObject(obj)
Mark the object as modified. The object will be automatically stored to the database during transaction commit. This method is usually invoked by the Persistent.modify() method.
Parameters
obj - modified object.

loadObject(obj)
Load object from the storage.
This method is usually invoked by Persistent.load() method.
Parameters
obj - loaded object.

createStrIndex(unique = True)
Create an index for keys of string type.
Parameters
unique - if True (the default), duplicate keys are prohibited
Returns
Index object

createIntIndex(unique = True)
Create an index for keys of integer type.
Parameters
unique - if True (the default), duplicate keys are prohibited
Returns
Index object

createLongIndex(unique = True)
Create an index for keys of long integer type. Not all languages support the long (64-bit integer) type; those which do not also do not provide this method.
Parameters
unique - if True (the default), duplicate keys are prohibited
Returns
Index object

createBoolIndex(unique = True)
Create an index for keys of boolean type. Not all languages support the boolean type; those which do not also do not provide this method.
Parameters
unique - if True (the default), duplicate keys are prohibited
Returns
Index object

createRealIndex(unique = True)
Create an index for keys of floating point type.
Parameters
unique - if True (the default), duplicate keys are prohibited
Returns
Index object

resetHash()
Reset the object hash. Each fetched object is stored in the objByOidMap hash table, which is needed to provide the OID->instance mapping. Since placing an object in the hash increases its reference counter, such an object cannot be deallocated by the garbage collector, so after some time all persistent objects from the storage would be loaded into memory. To solve this problem, almost all languages with implicit memory deallocation (garbage collection) provide weak references; but in some of them weak references are not supported, and sometimes the implementation of weak references is very inefficient. So to prevent memory overflow you should use the resetHash() method, which just clears the hash table. After invocation of this method you should not use any variable referencing persistent objects; instead, invoke the getRootObject method and access all other persistent objects only through the root.

gc()
Explicitly start the garbage collector. The garbage collector will collect only committed objects, so there is no need to invoke it more than once within one transaction. Garbage collection can be used together with explicit deallocation by the Persistent.deallocate method. But when you are using the garbage collector, you should be careful about keeping references to persistent objects in local variables: if there are no references to such an object from other persistent objects (so it is not reachable from the root object), the garbage collector can deallocate it. If you then try to access or update this object using a reference stored in a local variable, you will get an error.

setGcThreashold(maxAllocatedDelta)
Set the garbage collection threshold. By default garbage collection is disabled (the threshold is set to 0). If it is set to a nonzero value, GC will be started each time the delta between the total size of allocated and deallocated objects exceeds the specified threshold, OR after reaching the end of the allocation bitmap in the allocator.
Parameters
maxAllocatedDelta - delta between the total size of allocated and deallocated objects since the last GC or storage opening

sharedLock(obj, nowait=false)
Lock the object in shared mode. Other threads will be able to set their own shared locks on this object, but no exclusive lock can be set until this lock is released. Upgrading the lock is not possible (a thread holding a shared lock cannot upgrade it to an exclusive lock); this is done to prevent possible deadlocks caused by lock upgrades. But locks are reentrant: a thread can request the same lock many times (and a corresponding number of unlocks is needed to release the lock). Locking the object doesn't prevent other threads from accessing it - it only affects the sharedLock and exclusiveLock methods. So the programmer should set a proper lock before accessing the object in a multithreaded application. If the object is accessed concurrently by several threads in read-only mode, explicit locking is not needed, because the language API itself provides consistent retrieval of objects.
Parameters
obj - locked object
nowait - optional parameter specifying whether request should wait until lock is available or fail if lock can not be granted immediately.
Returns
true if lock is successfully granted
false if lock can not be granted within specified time

exclusiveLock(obj, nowait=false)
Lock the object in exclusive mode. Only one thread can lock an object in exclusive mode at any moment; shared or exclusive lock requests of other threads will be blocked until this lock is released. This lock is reentrant, so the thread owning the lock can successfully acquire it many times (and a corresponding number of unlocks is needed to release the lock). Locking the object doesn't prevent other threads from accessing it - it only affects the sharedLock and exclusiveLock methods. So the programmer should set a proper lock before accessing the object in a multithreaded application.
Parameters
obj - locked object
nowait - optional parameter specifying whether request should wait until lock is available or fail if lock can not be granted immediately.
Returns
true if lock is successfully granted
false if lock can not be granted within specified time

unlock(obj)
Release a granted lock. If the lock was requested several times by one thread, a corresponding number of unlocks is needed to release it.
Parameters
obj - unlocked object

Language specific issues

Python

Python seems to be the most serious of the three languages covered here. It provides an efficient weak reference dictionary, which makes it possible to implement the object cache without forcing the programmer to explicitly call the resetHash method.

Python provides a long integer (64-bit) type, but it has no separate boolean type (boolean values are treated as integers).

In Python, instance variables are not declared at all and are attached to the object when first assigned, so objects of the same class can have different sets of instance variables. This is not a problem for DyBASE. Moreover, because of this feature of Python, it is not required to derive persistent capable objects from Persistent (although it is still more convenient to do so, because in this case you can use the methods inherited from the Persistent class).

Unlike in other languages, stopping recursive loading is done not by redefining the recursiveLoading method but by assigning the __nonrecursive__ attribute to the object.

Python makes it possible to redefine getter/setter methods for attributes. Using this facility, delegators for persistent objects can be implemented: instead of recursively loading the object cluster, it is possible to create delegators for persistent objects and load the objects on demand. The Python API provides the PersistentDelegator class, which catches attribute access and loads the object from the database if needed. This class also redefines the comparison method to treat two different delegators referencing the same OID as equal. The following things will not work with delegators:

  1. Checking the class of an object with the isinstance operator (it can return the class of the delegator instead of the actual object class)
  2. Comparison of persistent objects not derived from the Persistent class (the Persistent class redefines the standard equality comparison operator to correctly handle delegators)

To enable the usage of delegators you should pass True as the useDelegators parameter of the Storage constructor (the default value is False).
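For illustration, a load-on-demand delegator can be sketched in pure Python with __getattr__. This is a simplified, invented stand-in for what PersistentDelegator is described to do above (all names below are illustrative, not the DyBASE API): the target object is loaded on the first attribute access and cached afterwards.

```python
# Minimal load-on-demand delegator: attribute access triggers loading
# of the target object the first time, then delegates to it.

class LazyDelegator:
    def __init__(self, loader):
        self._loader = loader     # callable producing the target object
        self._target = None

    def __getattr__(self, name):
        # Invoked only for attributes not found on the delegator itself,
        # i.e. anything belonging to the target object.
        if self._target is None:
            self._target = self._loader()   # load target on first access
        return getattr(self._target, name)

class Employee:
    def __init__(self, name):
        self.name = name

loads = []
def load_from_db():
    loads.append(1)               # stand-in for reading from the storage
    return Employee("alice")

proxy = LazyDelegator(load_from_db)
assert loads == []                # nothing loaded yet
assert proxy.name == "alice"      # first access triggers the load
assert proxy.name == "alice"      # subsequent accesses reuse the target
assert loads == [1]               # loaded exactly once
```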

Ruby

Ruby is an interesting language, but it lacks some important features. For example, reflection support in Ruby is very restricted - it is not possible to access instance variables from outside the object. That is why I had to define C functions to access instance variables and create instances of a class.

Ruby provides weak references, but their implementation is so inefficient that they are switched off by default. To enable weak references, set USE_WEAK_REFERENCES=true in dybase.rb. Ruby also supports delegates: classes which redirect (delegate) methods to some other class. With delegate classes there is no need for the recursiveLoading method - objects are loaded on demand, which is much more convenient than the common DyBASE model of object loading. But there are also some significant drawbacks of delegates:

  1. There are actually two object instances: the delegator object and the target object. DyBASE always returns a reference to the delegator object. But if you return self from some method, it will not be equal to the object from which the method was invoked:
    class MyRoot < Persistent
        def me
            return self
        end
    end
    
    root = db.getRootObject
    itIsMe = (root == root.me) # !!! false
    
  2. You cannot inspect the class of the object, because it will return the class of the delegator (PersistentDelegator).
  3. Delegators add extra space and CPU overhead (instead of one instance of the object there are two, and each method invocation requires an extra redirection).
That is why delegators are also switched off by default. To enable them, set USE_DELEGATOR = true in dybase.rb.

Ruby has a normal (mark-and-sweep) garbage collector, so it does not suffer from cyclic references. It supports big numbers, but not long (64-bit) integers.

PHP

Maybe this language is really very convenient for generating Web pages, but it is almost impossible (from my point of view) to use it as a normal object oriented language. This is because of its very strange copy-by-value model: you have to specify explicitly (in three different places!) whether you want to copy an object by value or by reference, and it is very easy to make a mistake with interesting effects.

In PHP 4.3.x the reference counter field has short type, which means that PHP cannot correctly handle objects referenced from more than 65535 places. Since each persistent object contains a reference to the storage object, this limitation means that no more than 65535 persistent objects can be loaded from the database into memory at any moment. You have to periodically invoke the Storage.resetHash method to remove persistent objects from the cache. Violation of this rule causes unpredictable program behavior (memory corruption and sometimes segmentation faults).

PHP doesn't support weak references at all. It also has no long (64-bit) integers.

PHP 4.* doesn't provide any primitives for working in a multithreaded environment, so locking mechanisms are not supported by the PHP API.

Rebol

Requirements/restrictions of Rebol DyBASE API:

Comparison of performance of different languages

I have three products built on the same core: GigaBASE, PERST and DyBASE. GigaBASE is implemented in C++ and provides a C++ API; PERST is implemented in Java and C-Sharp and provides APIs for those languages; DyBASE mostly reuses the GigaBASE implementation and provides APIs for languages with dynamic type checking: Python, Ruby, PHP and Rebol.

I ran the same simple test implemented in C++, Java, C-Sharp, PHP, Python, Ruby and Rebol. The test consists of three steps:

  1. Create the specified number of records with random long and string keys and include them in long and string indices. After inserting all records, a commit is performed.
  2. Search for each record using the long and string keys.
  3. Search for and remove each record from the indices and deallocate it from the storage.

The execution time of each step is measured. The number of records in each case is the same: 100000. The page pool size in all cases is 32Mb, which is enough to hold all records in memory. All tests were executed on the same computer: P4-1800, 256Mb RAM, Win2k. I used the MS Visual.NET 2003 C++ compiler, Java JDK 1.4, Python 2.3.2, PHP 4.3, Ruby 1.8.0 and Rebol/View 1.2.10.3.1. Dividing the time of each step by the number of iterations produces the following results:

Results (operations per second; higher is better):

Language   Database   Index searches   Stored objects   Removed objects
C++        GigaBASE   147929           15676            12434
Java       PERST      23010            8844             9819
C-Sharp    PERST      15591            8713             5902
Ruby       DyBASE     2083             2777             3571
Python     DyBASE     4632             3558             2272
PHP        DyBASE     5000             2702             2570
Rebol      DyBASE     3389             2173             1818

DyBASE implementation issues

Below is a more detailed description of the DyBASE implementation. This section can be skipped if you are not interested in implementation details.

Memory allocation

Memory allocation in DyBASE is performed by a bitmap. Memory is allocated in chunks called the allocation quantum. In the current version of DyBASE the allocation quantum is 64 bytes, which means that the size of every allocated object is aligned on a 64-byte boundary. Each 64 bytes of database memory is represented by one bit in the bitmap. To locate a hole of the requested size, DyBASE sequentially searches bitmap pages for the corresponding number of successive cleared bits. DyBASE uses three arrays indexed by bitmap byte, which make it possible to quickly calculate the hole offset and size within a byte.

DyBASE performs cyclic scanning of the bitmap pages. It keeps the identifier of the current bitmap page and the current position within the page. Each time an allocation request arrives, scanning of the bitmap starts from the current position. When the last allocated bitmap page has been scanned, scanning continues from the beginning (from the first bitmap page) up to the current position. When no free space is found after a full pass through all bitmap pages, a new chunk of memory is allocated. The size of the extension is the maximum of the size of the allocated object and the extension quantum, a database parameter specified in the constructor. The bitmap is extended to be able to map the additional space. If the virtual space is exhausted and no more bitmap pages can be allocated, an OutOfMemory error is reported.

Allocating memory using a bitmap provides high locality of reference (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimizing the number of modified pages is significant when a commit operation is performed and all dirty pages must be flushed to disk. When all cloned objects are placed sequentially, the number of modified pages is minimal and so transaction commit time is also reduced. Using an extension quantum also helps to preserve sequential allocation: once the bitmap is extended, objects are allocated sequentially until the extension quantum is completely used. Only after reaching the end of the bitmap does scanning restart from the beginning, searching for holes in previously allocated memory.

To reduce the number of bitmap page scans, DyBASE associates a descriptor with each page which remembers the maximal size of a hole on the page. The maximal hole size is calculated in the following way: if an object of size M cannot be allocated from a bitmap page, then the maximal hole size on that page is less than M, so M is stored in the page descriptor if the previous value of the descriptor is larger than M. For subsequent allocations of objects of size greater than or equal to M, this bitmap page is skipped. The page descriptor is reset when some object within the bitmap page is deallocated.
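The scanning and descriptor logic can be sketched in a few lines of Python. This is a toy model with a made-up page size, operating bit by bit instead of byte-wise with lookup tables as the real allocator does:

```python
QUANTA_PER_PAGE = 16  # bits per bitmap "page" in this toy model

class BitmapAllocator:
    def __init__(self, pages):
        # one list of bits per bitmap page; 0 = free quantum, 1 = used
        self.pages = [[0] * QUANTA_PER_PAGE for _ in range(pages)]
        # descriptor: upper bound on the largest hole in each page
        self.max_hole = [QUANTA_PER_PAGE] * pages

    def alloc(self, n):
        """Find n successive free quanta; return (page, offset) or None."""
        for p, bits in enumerate(self.pages):
            if self.max_hole[p] < n:
                continue  # descriptor says this page has no hole of size n
            run = 0
            for i, bit in enumerate(bits):
                run = run + 1 if bit == 0 else 0
                if run == n:
                    start = i - n + 1
                    bits[start:i + 1] = [1] * n  # mark quanta as used
                    return (p, start)
            # allocation failed on this page: remember the tighter bound
            self.max_hole[p] = min(self.max_hole[p], n - 1)
        return None  # real DyBASE would extend the database here

    def free(self, page, offset, n):
        self.pages[page][offset:offset + n] = [0] * n
        self.max_hole[page] = QUANTA_PER_PAGE  # reset the page descriptor

a = BitmapAllocator(pages=2)
print(a.alloc(10))  # (0, 0)
print(a.alloc(10))  # (1, 0): page 0 has only 6 free quanta left
print(a.alloc(6))   # (0, 10): the remaining hole on page 0 is reused
```

The second alloc(10) fails on page 0 and records that no hole of 10 or more quanta exists there, so later large requests skip that page without rescanning it.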

Some database objects (like hash table pages) should be aligned on a page boundary to provide more efficient access. The DyBASE memory allocator checks the requested size, and if it is aligned on a page boundary, the address of the allocated memory segment is also aligned on a page boundary. The search for a free hole is faster in this case, because DyBASE increases the step of the current position increment according to the value of the alignment.

To be able to deallocate the memory used by an object, DyBASE needs to keep information about the object size somewhere. The DyBASE memory allocator deals with two types of objects - normal table records and page objects. All table records are prepended by a record header, which contains the record size and a pointer of the L2-list linking all records in the table, so the size of a table record can be extracted from its header. Page objects always occupy a whole database page and are allocated at positions aligned on a page boundary. Page objects have no headers; DyBASE distinguishes page objects from normal objects by a special marker in the object index.

Shadow transaction mechanism

Each record (object) in DyBASE has a unique identifier (OID). Object identifiers are used to implement references between objects. To locate an object by reference, its OID is used as an index in an array of object offsets within the file. This array is called the object index, and an element of this array is an object handle. There are two copies of the object index in DyBASE, one of which is current and the other shadow. The database header contains pointers to both object indices and an indicator of which index is current at the moment.

When an object is modified for the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created copy, while the shadow index still contains a handle pointing to the original version of the object. All changes are done to the copy, leaving the original object unchanged. DyBASE marks in a special bitmap the page of the object index which contains the modified object handle.

When a transaction is committed, DyBASE first checks whether the size of the object index was increased during the current transaction. If so, it also reallocates the shadow copy of the object index. Then DyBASE frees the memory of all "old objects", i.e. the original versions of objects cloned within the transaction. Memory cannot be deallocated before commit, because we want to preserve the consistent state of the database by keeping the cloned objects unchanged. If we deallocated memory immediately after cloning, a new object could be allocated at the place of the cloned object and we would lose consistency. Since memory deallocation is done in DyBASE by the bitmap, using the same transaction mechanism as for normal database objects, deallocation of object space requires clearing some bits in a bitmap page, which also should be cloned before modification. Cloning the bitmap page requires new space for the page copy, and we could reuse the space of deallocated objects - but that is not acceptable for the reason explained above: we would lose database consistency. That is why deallocation of an object is done in two steps. When an object is cloned, all bitmap pages used for marking its space are also cloned (if not cloned before). So when the transaction is committed, we only clear bits in bitmap pages, and no more memory allocation requests can be generated at that moment.

After deallocation of the old copies, DyBASE flushes all modified pages to disk to synchronize the contents of memory and the disk file. After that DyBASE changes the current object index indicator in the database header to switch the roles of the object indices: the index which was current becomes shadow, and the shadow index becomes current. Then DyBASE again flushes the modified page (the page with the database header) to disk, transferring the database to a new consistent state. After that DyBASE copies all modified handles from the new shadow index to the index which was previously shadow and has now become current. At this moment the contents of both indices are synchronized and DyBASE is ready to start a new transaction.

The bitmap of modified object index pages is used to minimize transaction commit time: not the whole object index, but only its modified pages need to be copied. After the transaction is committed, the bitmap is cleared.

When a transaction is explicitly aborted by the Storage.rollback method, the shadow object index is copied back to the current index, eliminating all changes done by the aborted transaction. After the copying is finished, both indices are identical again and the database state corresponds to the moment before the start of the current transaction.
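The clone-on-first-write, commit and rollback steps described above can be modeled in miniature. This is a hypothetical in-memory sketch, not the real DyBASE code: a dictionary stands in for the database file, and header flushes and the dirty-page bitmap are omitted:

```python
class ShadowStorage:
    """Toy model of shadow paging: two object indices mapping OID to a
    heap location, clone on first write, commit switches the current
    index indicator, rollback restores handles from the shadow index."""

    def __init__(self):
        self.heap = {}         # simulated database file: location -> state
        self.next_loc = 0
        self.index = [{}, {}]  # the two copies of the object index
        self.current = 0       # indicator of the current index
        self.dirty = set()     # OIDs cloned in the running transaction

    def _store(self, state):
        loc, self.next_loc = self.next_loc, self.next_loc + 1
        self.heap[loc] = state
        return loc

    def create(self, oid, state):
        # for simplicity, pretend the object was committed earlier:
        # both indices point at the same location
        loc = self._store(state)
        self.index[0][oid] = loc
        self.index[1][oid] = loc

    def modify(self, oid, state):
        cur = self.index[self.current]
        if oid not in self.dirty:
            cur[oid] = self._store(state)  # first update: clone the object
            self.dirty.add(oid)
        else:
            self.heap[cur[oid]] = state    # already cloned in this txn

    def read(self, oid):
        return self.heap[self.index[self.current][oid]]

    def commit(self):
        cur = self.index[self.current]
        shadow = self.index[1 - self.current]
        for oid in self.dirty:
            del self.heap[shadow[oid]]     # free the old object versions
        self.current = 1 - self.current    # switch roles of the indices
        for oid in self.dirty:             # synchronize modified handles
            self.index[self.current][oid] = cur[oid]
        self.dirty.clear()

    def rollback(self):
        cur = self.index[self.current]
        shadow = self.index[1 - self.current]
        for oid in self.dirty:
            del self.heap[cur[oid]]        # discard uncommitted versions
            cur[oid] = shadow[oid]         # restore handle from shadow
        self.dirty.clear()

db = ShadowStorage()
db.create("root", {"value": 1})
db.modify("root", {"value": 2})
db.commit()
print(db.read("root"))   # {'value': 2}

db.modify("root", {"value": 3})
db.rollback()
print(db.read("root"))   # {'value': 2}: the aborted change is gone
```

Recovery after a crash would look like rollback here: the index marked consistent in the header is copied over the other one, discarding everything the unfinished transaction wrote.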

Allocation of object handles is done via a free handles list. The header of the list is also shadowed, and two instances of the list header are stored in the database header; the switch between them is done in the same way as the switch of object indices. When there are no more free elements in the list, DyBASE allocates handles from the unused part of the index. When there is no more space in the index, it is reallocated. The object index is the only entity in the database which is not cloned on modification; instead, two copies of the object index are always used.

There are some predefined OID values in DyBASE. OID 0 is reserved as the invalid object identifier. OIDs starting from 1 are reserved for bitmap pages. The number of bitmap pages depends on the maximum database virtual space. For one terabyte of virtual space, an 8Kb page size and a 64-byte allocation quantum, 32K bitmap pages are required, so 32K handles are reserved in the object index for the bitmap. Bitmap pages are allocated on demand, as the database size is extended. The OID of the first user object will thus be 0x8002.

The recovery procedure is trivial in DyBASE. There are two instances of the object index, one of which is current and the other of which corresponds to the consistent database state. When the database is opened, DyBASE checks the database header to detect whether the database was closed normally. If not (the dirty flag is set in the database header), DyBASE performs recovery. Recovery is very similar to transaction rollback: the indicator of the current index in the database header is used to determine the index corresponding to the consistent database state, and object handles from this index are copied to the other object index, eliminating all changes done by the uncommitted transaction. Since the only action performed by the recovery procedure is copying the object index (really, only handles having different values in the current and shadow indices are copied, to reduce the number of modified pages) and the size of the object index is small, recovery can be done very fast. A fast recovery procedure reduces the "out-of-service" time of the application.

Where to use

DyBASE is a simple and fast embedded database engine. If your application needs an embedded database engine but does not need to execute complex SQL queries, and the only thing you need is to store/fetch/locate objects in the database using navigation through references or indexed search by key, then DyBASE is what you need. It will provide much better performance than a relational database or other (more sophisticated) object oriented databases.

The list below summarizes the strong points of DyBASE:

  1. Tight and transparent integration with the programming language
  2. No gap between the database and application data models
  3. Easy to use
  4. Minimal requirements (the DyBASE package itself is only 51Kb and can be configured to use minimal memory and disk space)
  5. High performance (no overhead of communication, locking, or query parsing and execution)
  6. Fault tolerance (transaction support)
  7. Fast recovery after a fault
  8. Zero administration effort (the database consists of a single file); there is no need to define or tune any database parameters, and situations like transaction log overflow can never happen

Certainly nothing in this world has only positive features. Now the contra list, of features lacking in DyBASE:

  1. No procedural query language
  2. No fine-grain locking (DyBASE always accesses the database in exclusive mode; multiple threads can access the database, but with no parallelism at all)
  3. No remote access by multiple clients (unless you implement your own server)
  4. No data distribution
  5. No database browsers, administration tools or import/export utilities to other databases (these may be implemented later)
  6. No support for any standard (for example ODMG)

Quick start

The DyBASE distribution includes binaries of the libraries for MS-Windows and the Visual C++ compiler. The following modules are provided:

lib/dybase.lib
static DyBASE library
lib/dybasedll.dll
DyBASE dynamically loaded library (used by all language APIs)
python/pythonapi.dll
Python extension library
python/dybase.py
Python module implementing DyBASE API (using pythonapi.dll)
php/php_dybaseapi.dll
PHP extension library
php/dybase.php
PHP DyBASE API (using php_dybaseapi.dll)
ruby/rubyapi.dll
Ruby extension library
ruby/dybase.rb
Ruby module implementing DyBASE API (using rubyapi.dll)

If you want to rebuild these libraries yourself, run make.bat (which invokes MS nmake with makefile.mvc) in the src directory and compile.bat in each language API directory. You may first have to change the paths to the language home directories in the compile.bat scripts. On Unix you should run make in each directory (the GCC compiler and GNU make are expected). Paths to the language installation directories may also need to be adjusted in the makefiles.

The easiest way to learn the DyBASE API is to look at the examples. The directory of each language contains three examples:

Guess
Game "Guess an Animal" - a very simple game that stores its binary tree in the database.
TestIndex
An example of using indices, and also a performance test.
TestLink
A classical Producer-Order-Detail example illustrating the usage of one-to-many links.

To run these examples and your own applications, do not forget to include dybase\lib in PATH.

Tuning

This section contains several hints on how to adjust DyBASE parameters and increase database performance.

Accessing data on disk is several times slower than accessing data in main memory. That is why caching data is the key to increasing database performance. DyBASE uses a pool of pages to optimize access to the disk. The page pool size can be specified in the Storage.open method (by default it is 4Mb). Usually increasing the page pool leads to better performance, but you should note the following things:

There are some constants defined in the database.h file which influence the initial and maximal database size. If you want to change them, you will have to rebuild DyBASE. Below is a detailed description of these parameters:

dbDatabaseOffsetBits (default value: 32)
Number of bits in an offset within the database file. This parameter limits the maximal database size; the default value of 32 restricts the database to 1Gb. Increasing this parameter also increases the initial database size. DyBASE uses a bitmap to allocate space in the database. Each bitmap page has its own OID, which is reserved in the object index. When a database is created, DyBASE reserves space in the object index for the identifiers of ALL bitmap pages needed to map the database virtual space (defined by dbDatabaseOffsetBits). The number of bitmap pages needed to map the whole database can be calculated as 2 ** (dbDatabaseOffsetBits-32). Actually the index will be two times larger, because it also contains other elements and because, when the index is reallocated, its size is always doubled. Each entry in the object index is 8 bytes long, and there are two indices (active and shadow). So to estimate the initial size of the database, multiply the value of the expression above by 32.

dbDefaultInitIndexSize (default value: 10*1024)
Initial object index size. The object index is increased on demand. Reallocation of the index is an expensive operation, so to minimize the number of such reallocations the object index size is always doubled. Specifying a larger initial index size reduces the number of future reallocations and so slightly increases performance (if, of course, your application allocates that many objects), but it also leads to a larger initial database file. With the default value of this parameter, the initial database size is about 50Kb.

dbDefaultExtensionQuantum (default value: 512*1024)
Database extension quantum. Memory is allocated by scanning the bitmap; if no large enough hole is found, the database is extended by the value of dbDefaultExtensionQuantum. Increasing the value of this parameter leads to less frequent rescanning of the allocation bitmap from the very beginning, which gives faster allocation and better locality of reference for created objects (because there are more chances that they will be allocated sequentially). On the other hand, it leads to less efficient memory usage. Reducing the value of this parameter forces reuse of existing holes in the allocation bitmap.

Now some hints on how to increase DyBASE performance and reduce the amount of main memory used. If your database performs a lot of updates of persistent data, then the main limiting factor is the speed of writing changes to disk, especially the synchronous disk writes performed by commit. If you commit after each update, the average speed will be about 10 updates per second (this estimate is based on the assumption that average disk access time is about 10msec and each transaction commit usually requires writing about 10 pages at random places in the file). But it is possible to dramatically increase update performance by grouping several updates into one transaction. DyBASE creates a shadow of an object when it is updated for the first time inside a transaction. If an object is updated once in each of N transactions, then N shadows are created; if it is updated N times inside one transaction, the shadow is created only once. This explains the advantage of one large transaction.

But if you update a large number of objects in one transaction, a shadow is created for each updated object, which leads to a significant increase of the database file size. If you update every object in the database inside one transaction, the database size will almost double! If instead you perform each update in a separate transaction, the database size will be almost unchanged (because the space of allocated shadow objects is reused). So the best approach is to perform a commit after every 100 or 1000 updates: it reduces the per-commit overhead and keeps the database size down.
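The arithmetic behind this advice can be checked with a small counting model (no real DyBASE involved; count_shadows is an invented helper):

```python
def count_shadows(updates, updates_per_commit):
    """Count how many shadow copies of one object are created when the
    same object is updated `updates` times, committing after every
    `updates_per_commit` updates."""
    shadows = 0
    dirty = False  # has the object been cloned in the current transaction?
    for i in range(1, updates + 1):
        if not dirty:
            shadows += 1   # first update in this transaction clones it
            dirty = True
        if i % updates_per_commit == 0:
            dirty = False  # commit ends the transaction
    return shadows

print(count_shadows(1000, 1))     # 1000 shadows: one commit per update
print(count_shadows(1000, 1000))  # 1 shadow: one big transaction
print(count_shadows(1000, 100))   # 10 shadows: the suggested middle ground
```

The middle ground keeps both numbers small: few synchronous commits, and few simultaneously live shadow copies inflating the file.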

If your persistent objects form a tree or graph where all objects can be reached by reference from the root object, then once you load the root object into main memory and store a reference to it in some variable, the GC will never be able to collect any loaded persistent object (because all of them are accessible from the root). So as your application accesses more and more objects from the storage, at some moment all of them will be loaded in main memory, which can cause memory exhaustion. To prevent this situation, avoid storing in variables references to container objects which reference a large number of members; this is especially true for the storage root object. Then the GC is able to do its work and throw out of memory the objects which are not used at the moment (those not referenced from local variables). It is important to note that objects accessible through an index by key can be normally deallocated by the garbage collector, so in that case no special care is needed.
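The effect described above can be demonstrated with plain Python objects and a weak reference probe (ordinary CPython behavior, nothing DyBASE-specific): while a variable holds the root of an object graph, nothing in the graph can be collected; dropping the root frees everything.

```python
import gc
import weakref

class Node:
    def __init__(self, children=()):
        self.children = list(children)

leaf = Node()
root = Node([Node([leaf]) for _ in range(3)])
probe = weakref.ref(leaf)  # weak reference: does not keep leaf alive
del leaf

gc.collect()
print(probe() is not None)  # True: leaf is still reachable from root

del root
gc.collect()
print(probe() is None)      # True: the whole graph was reclaimed
```

The same applies to a persistent object graph held through the storage root: as long as the root is pinned in a variable, every loaded object stays pinned with it.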

Distribution terms

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OF THIS SOFTWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

I will provide e-mail support and help you with development of DyBASE applications.


Look for new version at my homepage | E-Mail me about bugs and problems
