Introduction

GOODS (Generic Object Oriented Database System) uses an active client language-independent model of server-client interaction. All application logic is implemented in and executed by client applications. The server is responsible for storing and retrieving objects, handling transactions, object locks, garbage collection, database backups and recovery. A metaobject protocol is used in the implementation of the client database application interface to provide transparent and flexible interaction with the database. "Generic" in the acronym GOODS stands for the capability to extend the system to handle almost all possible object access and synchronization strategies. An "aspect-oriented" approach makes it possible to change object access management policies without affecting the application code. The notation and ideas of the "aspect-oriented" approach was taken from the works of Kickzales. Different strategies, such as a standard serialized transaction model or an optimistic model, can be used as well as models fitting the specific needs of a concrete application.

GOODS is a fully distributed database. The database consists of a number of storages, each of which can be located at different nodes of a network. The storage is controlled by a storage server, which is responsible both for handling database requests relevant to this storage and for interaction with other servers to perform database wide operations. The client can work with an arbitrary number of databases at any time. Each persistent object is stored in a concrete storage and its location can't be changed. The location of an object within the database is mostly transparent for clients (the client should not worry about in which storage an object is situated), but it is possible for a client to attach an object to a concrete storage. Global database operations (operations affecting more than one storage in the database) such as global transaction commitment, garbage collection and object locking are handled automatically by the database system using various algorithms to synchronize the activities of each storage server. Online database backup and schema modification is possible without suspending client interaction with the database. A lazy object conversion approach is used for implementing the schema modification. Different clients can simultaneously work with different versions of class definitions.

GOODS was designed to achieve maximum performance for handling object requests. Various algorithms for caching and prefetching objects and storage pages are used for this purpose. By splitting work into separate control threads, GOODS makes it possible to use the benefits of parallel execution - especially on multiprocessor architectures. A specially designed library provides a system independent interface for multitasking (classes and methods for creation and synchronization of tasks, including synchronization primitives such as: mutex, event, semaphore). This library is used to implement the client part of the database interface as well as the GOODS database server itself. Using a metaobject protocol for handling access to objects makes it possible to specify algorithms for object caching and synchronization of concurrent accesses, allowing a concrete application to reach the highest level of performance.

GOODS server

The GOODS storage server is divided into several components. Each of them is responsible for its own task and interacts with other components by means of a strictly defined interface. These components include a storage memory manager, a transaction manager, a page pool manager, an object access manager and a class information manager. This modular design makes it possible to choose different implementations for each component (the most suitable for a concrete application) and also to investigate the efficiency of various database management algorithms and strategies. A short description of each component follows.

Storage memory manager

This manager is responsible for (de)allocating storage memory and object descriptors. In GOODS, an indirect access model is used for interobject references. Each object has a unique identifier (opid). A descriptor table provides the mapping of object identifiers to the physical offset within the storage file. This approach allows an efficient implementation of the "become" operator, which changes the type of a concrete object. In GOODS, the object descriptor contains information about the object class, its size and its location in the storage file. The object descriptor table is implemented using a mapped on memory file. The allocation of storage memory in the current implementation is done by using a bit memory map. Each bit of this map corresponds to a quantum of storage memory. Currently this quantum is 32 bytes long. The storage of the bit memory map is itself stored in a mapped on memory file.

Garbage collection (GC) is done using a variation of the standard mark-and-sweep algorithm with some extensions for synchronization of the GC processes at different storages. To perform GC, one of the storage servers is chosen as a coordinator, which initiates the GC processes at each server and determines the global end of the mark phase at all servers.

GC can be initiated either when the size of the allocated storage exceeds some predefined threshold value or after some period spent by the system in an idle state. To initiate GC, each server sends a request to the coordinator. If the coordinator decides, that the GC can be started, it polls other servers in the database to find out whether they are ready to start GC. When the server receives such request from the coordinator and sends an acknowledgment to start GC, the server changes its state to "PRE-GC". If acknowledgments are received from all servers, the coordinator broadcasts to all servers a request to start the mark stage of GC. Otherwise (if a server is not available or a previous GC iteration still not finished at one of the servers) the coordinator broadcasts the "CANCEL" request to all servers, so that they can exit from the "PRE-GC" state.

Each storage server locally performs GC starting from the storage root objects. The set of roots consists of storage root objects and instances of objects loaded by the client at the moment of starting GC. Objects already scanned are considered to be marked with "black" color. Not yet marked objects which are referenced from "black" objects are considered to be "gray". All other objects are "white".

When the GC process finds a reference to an object from another storage, it sends a message to this server with this reference. To make this exchange of references more effective, GOODS buffers external references. Before a reference will be sent to a client, it is placed in the export buffer (if such reference is not already present in the buffer). When the buffer is full, it is sent to the destination server. The server maintains a separate export buffer for each other server.

When there are no more "gray" objects in the storage, the server sends a message to the coordinator containing a vector of numbers of messages with external references sent and received by this server from other servers. The coordinator maintains a matrix of all such vectors. To determinate the global end of the mark stage, the coordinator checks the following condition:

M[i,j].import = M[j,i].export, for all i, j

where M[i,j].import is the number of external references received by server i from server j, and M[j,i].export is the number of messages sent by server j to server i.

If this condition is true, the mark phase is completed at all servers and the coordinator sends messages to all servers to initiate the sweep phase of GC. During the sweep phase all "white" objects are deallocated. The same approach is used to collect unused versions of object classes.

To prevent the deallocation of object instances which were just allocated and not yet committed by some transaction, all such instances are marked as "black" before the sweep phase. The GC process runs in the background without affecting the normal functioning of the database. If an object is changed during GC, the difference of the sets of the references from the old and the new version of the object is calculated. (To make this operation more effective, not really the set difference is calculated, but only references in near standing positions are compared instead. This makes it possible to handle the most common cases: references are not changed or are shifted due to insert or remove from array operation.) A list of references from all old versions of the object is kept until the end of GC. The PRE-GC state is necessary to synchronize the moment when all servers start to keep information about old versions of objects. So GC at each server will see all references between objects which were available at the moment of GC initiation. If some objects are changed by a transaction, GC will have no access to these objects until the transaction will be committed. Then all these objects are updated.

Since different clients can have different versions of objects, the storage server should remember all versions of objects used by active clients. This information can be removed when all clients update their instances of an object and the object is scanned by GC (if GC was active when the object was modified).

The sweep stage is performed incrementally, making it possible to allocate new objects in storage while the sweep process is active. A system crash can cause some leak of memory due to garbage in the memory allocation bitmap (for example space allocated for extended objects by transactions which were active at the moment of the crash). Usually this leak of memory is very small (if any) and it is more essential to recover the server as fast as possible. It is possible to completely reconstruct the bitmap after recovery: clear the bitmap and mark only the space used by objects present in the object index.

Allocation of object descriptors is performed using a list of free object descriptors. Since a system fault can leave this list in an inconsistent state, special checks are used during all operations on this list. If the list is corrupted, it will be completely rebuilt at the sweep stage of the next GC process (normally the GC process only appends deallocated descriptors to this list).

Usually objects are allocated successively in a file (unless the bitmap is fragmented), so the object's body can cross a page's boundary. Usually continuous allocation provides good performance, because objects which are created together will be with high probability accessed together. So continuous allocation of objects will reduce page thrashing. But for some classes of objects it is preferable to place each object on a separate page. The B-tree page is an example for such kind of object. Since the page size in GOODS is an application dependent parameter which can be changed in any moment, the only thing we can ask the memory manager to do with such objects is to align its position in the file on object size boundary (more precisely, on power of 2 which is greater than or equal to the object size). By such alignment we can guarantee that the object will never cross a page boundary unless the page size is smaller than the object size.

Transaction manager

The transaction manager is responsible for handling all database updates as atomic and recoverable actions. A transaction can be either local (only one server participates in the transaction) or global. In the last case a two-stage transaction commit protocol is used to guarantee the global database state consistency. When a global transaction is committed, one of the servers participated in this transaction (one with a lowermost identifier) is chosen to be the coordinator. The coordinator assigns a global identifier to the transaction, writes the identifier into the global transaction history file and sends the identifier back to the client. The client sends local parts of the transaction to all servers participated in this transaction together with the identifier of the coordinator and the global identifier assigned to this transaction by the coordinator. The coordinator then waits for response from all servers participated in this transaction (stage I). Receiving his part of the transaction, the server writes it to the transaction log and sends a message to the coordinator, reporting that he is ready for a global transaction commit. When the coordinator receives acknowledgements from all servers involved in the transaction (stage II), the coordinator marks the transaction in the global transaction history file as globally committed, sends messages to all servers to finish the transaction, flushes all transaction changes in the database and sends a response to the client about the successfully committed transaction. Receiving such a message from the coordinator, each other server participated in the transaction will copy all objects modified by this transaction into the storage and will release locks set by this transaction. So it is possible that the client, which initiated the transaction, will receive a reply from the coordinator before the transaction is completed at all servers involved in the transaction. But because the transaction is committed by a client agent at the server, the server will not handle any other requests from this client before transaction completion.

This protocol of transaction commitment has uncertainty periods: the protocol might not be able to complete if the coordinator fails. The participants must wait until the coordinator recovers before committing or aborting their local transactions. During this time, the local data (accessed by a transaction) is locked to ensure serializability. When the server is in an uncertain state it periodically sends a request to the coordinator to obtain the transaction status until a response will be received.

If at stage I the coordinator receives an "aborted" message from one of the servers or if the timeout of waiting for a reply is expired, the coordinator decides that the transaction is globally aborted, marks this transaction in the global transaction history log as aborted and sends abort messages to all servers participated in the transaction and to the client which has initiated this transaction.

During recovery from system fault, the server reads the transaction log and checks whether the transaction is local or global. If the transaction is global, the server asks the coordinator of this transaction for the status of this transaction. The coordinator reads the global transaction history log to check the status of the global transaction. If the transaction is marked as globally committed, the coordinator sends a "committed" reply to the server. The server performs recovery of his local part of the transaction. Otherwise, if the transaction is marked as "aborted" or if there is no entrance with a global transaction identifier in the global transaction log file, then the coordinator responds to the server with an "aborted" message and the server skips this transaction in his log. If the coordinator is not available at this moment, the server has to wait until the coordinator will be restarted.

If one of the participants fails after saving his local part of the transaction in the log but before sending a "ready-to-commit" message to the coordinator, the participant will ask the coordinator for the transaction status while recovering. If the coordinator still waits for acknowledgement of transaction commitment from servers participated in the transaction (timeout is not yet expired), it can notice that a request for the transaction refers to the active transaction. In this case the coordinator can treat a "get-transaction-status" request as an acknowledgement from this server to commit the transaction (the server asks for the transaction status only when it has successfully saved his local part of the transaction in the log and is ready now to recover it). So the coordinator doesn't send a reply for this request immediately but instead continues processing the transaction. A reply will be sent to the participants later when the transaction will be either committed or aborted.

When objects are restored from the transaction log after a crash, we should be careful with "zombie" objects - objects, which are already deallocated by the GC process. The space of these objects can be reused for newly created objects. When we restore such "zombie" object from the log it will override a new object allocated in the same memory space. That is not a problem - the new object is also present in the log and will be restored later. But now we have several object descriptors refering to the same memory location. Since there should be no references to an old object (which was once collected by GC), it will be deallocated during the next GC iteration and the space will be freed. But we also deallocate space of new objects which still can be accessible!

To avoid such dangerous behavior, we keep track of address ranges occupied by all restored objects during recovery. If the ranges of two objects overlap, then the older one is removed (space used by this object is deallocated and the descriptor is inserted into the list of free descriptors). We use a special kind of binary tree for effective finding of overlapped regions (log can contain a big number of objects and a naive algorithm with complexity N*N is not acceptable). Each level of this tree represents a range which is power of 2. The depth of this tree is limited by 64 (an offset in the database file has a size of 8 byte). So the complexity of checking the ranges of all restored objects is C*N, where C is some constant.

When storage is recovered after the crash, the file memory allocation bitmap is cleared and all used objects (addressed by object index) are marked in this bit map. If some of the bitmap bits (corresponding to the object space) are already marked and the object is not a one recovered from the transaction log, we have an inconsistency in the object index. This inconsistency can be caused by the garbage collector which freed an unreferenced object, reused its space for some newly allocated object, but did not flush the index page with this descriptor to the disk before the crash. Fortunately if one of two overlapped objects is really accessible from the storage root, then it should be found in the transaction log, which means it was already recovered. In this case we can just remove the descriptor of the second object. If both objects are not recovered from the transaction log, then both of them are inaccessible and we can safely remove one of them to reestablish the consistency of the object index.

When the size of the transaction log exceeds some limit, a checkpoint process is started by the transaction manager. All modified pages and file buffers are flushed to the disks, the log is truncated and the logging process starts from the beginning of the file. A new sequential number is written at the beginning of the log. This sequential number is incremented after each checkpoint and database close operation. This sequential number is included in each transaction record written to the log. Only a record which sequential number is equal to the sequential number of the log will be restored during recovery after a fault. When the checkpoint process has completed, the server sends messages to all possible coordinators of global transactions in which this server can be involved (servers with identifiers less than the identifier of this server) informing them about checkpoint completion. When receiving such message from a server i, the coordinator can remove from the global transaction log all entries relevant to the server with a sequential number less than the current sequential number of server i minus one. Subtracting one is necessary because transactions with these sequential numbers can be saved by backup and retrieved by restore procedures.

The transaction manager is also responsible for managing online backup procedures. Two types of backups are supported by the GOODS server: snapshot backup and permanent backup. Snapshot backup allows to produce snapshots of storage states. If a disk failure happens, then it is possible to restore a locally consistent state of storage for the time of the backup completion (if global transactions are used, then the global state of the database can not be consistent because a backup is local to the storage server: so snapshots at each server are not synchronized). A permanent backup process can be used to guarantee the global database recovery in any moment of time. It is assumed that log files can not be corrupted (they can be placed on some reliable media - for example RAID disk massive). When the backup process is active, checkpoints are delayed till the end of backup. To avoid delays in database functioning due to waiting for backup completion, the limiting log size should be set to a value, which can guarantee that backup can be completed before this limitation is reached. Backup can be started by triggering one of two conditions: a timeout signal is received or the size of the log becomes greater than some specified value. The backup process first saves the current contents of the storage file, the storage memory allocation bit map and the object descriptor table. Then it copies to the backup file all records from the transaction log until it reaches the last one (records can be added to the transaction log while backup is in progress: so synchronization is required). When the last record is reached, the backup process is complete. Now the checkpoint procedure is invoked to truncate the transaction log. After backup completion, the database operator has to switch backup media (tape) and make a schedule for the next backup: specify the timeout and/or log size for the next backup.

The difference between a permanent backup and a snapshot backup is that the snapshot backup has to save only that part of the transaction log file which was written at the moment when the backup completes copying the storage file and the memory manager tables. The backup should save only information about transactions completed before this moment to provide a consistent snapshot of the storage. The checkpoint procedure is not forced by the snapshot backup.

The operation of writing the transaction into the log should be performed synchronously, to guarantee that after the completion of this operation it will be possible to recover a transaction in case of system fault. The total throughput of a database system can be significantly increased if separate synchronous writes are combined into a single write operation. The performance can further be increased if the size and position of the data written are aligned to the operating system's disk block size. In this case the operating system should not previously read the contents of a modified block from the disk. This algorithm is encapsulated into the "file" class. This abstract class provides operating system independent access to files, guaranteeing correctness of concurrent access operations to the file. Implementations of this class for different operating system use system specific advanced calls (such as writev for Unix or overlapped IO for Windows-NT) to achieve maximal performance. To provide merging of concurrent synchronous write requests, the implementation of the class file uses two buffers in a cyclic way. While one buffer is writing to the disk all other write requests are collected in the other buffer. When the write operation for the first buffer is completed, the buffers change the role: all write requests are now collected in the first buffer. The size of the buffer is aligned to the size of the operating system's disk block, so no overhead of reading the content of the modified block is present. In case of using gathered IO functions (such as writev), it is possible to avoid copying the data written to the buffer; instead a vector of segments can be written to disk.

It is possible in GOODS to switch off synchronous writes to the transaction logs (using normal buffered write requests) if the operating system can guarantee that all modified buffers will be flushed to disk before system halt. That is possible if UPS is used to prevent power faults and the operating system itself is considered to be enough reliable. The performance can be dramatically increased in this case.

Page pool manager

The page pool manager provides efficient access to storage files. To improve IO operation performance, the page pool is used for caching the most recently used pages and buffering IO requests. A special synchronization policy allows different clients to perform read/write operations in parallel. The algorithm is similar to the one used by operating systems to control the access to file buffers. Each page has a BUSY and a WAIT flag and an event object on which tasks, accessing this page, can sleep. While reading a page from the file, the BUSY flag is set, so any other task trying to access this page will set the WAIT flag and sleep on the correspondent event object. After closing the read operation, the WAIT flag is checked. If it is set, then the event object is signaled to wake up all waiting tasks. While the page is being used by some tasks (they copy data to or from it), the USED counter in the page header is not zero. This counter prevents the page replacing algorithm from throwing this page from the cache.

The storage page size is an application dependent parameter and can be changed without affecting the storage data file. By default, the operating system page size is used as a value for this parameter.

Object access manager

The object access manager controls the locking of objects, maintains information about instances of objects loaded by clients, sends notifications to clients when an object is modified and synchronizes the access to objects by different components of the storage server.

Two kinds of locks are supported: exclusive and shared. Only one process can lock an object in exclusive mode, all other processes in shared mode. Lock requests can be either blocked or not-blocked. In the first case, the process requesting the lock will be blocked until the requested lock is granted. In case of a non-blocking request and if granting a lock is impossible, a negative answer is returned immediately. In GOODS the "honest policy" of granting locks is used: a lock requested first will be granted first. So even if a requested lock is compatible with current locks set on the object, this request can be blocked because there are already other blocked lock requests for this object. There is only one exception from this rule: a shared lock can be upgraded to an exclusive lock despite of the presence of other blocked requests.

Since different components of the storage server can access the same object simultaneously, some synchronization mechanism is needed. The object access manager provides methods for two types of object accesses: reading and writing, implementing the "multiple readers single writer" discipline.

The object access manager maintains for each object a list of object instances loaded by client processes. When an object is changed, this list is scanned and to all processes having an instance of this object (except the process which has modified this object) a notification message is sent. For correct garbage collection it is necessary to know about all objects and references from these objects present in client processes. Since the object access manager has this information, the garbage collector process requests information about all such references from the object access manager during the mark phase and before the sweep phase.

Since a client can have deteriorated a version of the object and can follow references from this object instance, it is necessary to prevent the garbage collector from deallocating this object, which is referenced only from deteriorated instances (so no more references to this object are present in the database). When an object is modified, the object access manager compares the set of references in the old with the one in the new version of the object. All references in the old version which are not present in the new version are inserted into an "old-reference-list", explored by the garbage collector. The references are kept in this list until all clients will update their instances of this object.

Class information manager

GOODS uses an "active-client" model of client-server interaction. The client applications are responsible for programming the application logic, whereas the servers are responsible only for fetching and storing objects and synchronizing the access to them. The server does not need to know too much about the contents of objects. It views an object as an array of references and raw data: knowledge about object references is necessary to perform garbage collection. The object is stored by the server in a system independent format (big-endian byte order, 6-bytes references containing object storage identifier and number of the object within storage). To simplify the server's work with objects and to increase the server performance, objects are stored by the server formatted: all references are placed before any other data.

The GOODS server stores information about classes of all objects allocated in this storage. The class information includes the class name, the size of fixed and varying parts of class, the number of references in fixed and varying parts and information about each object field: name, type and offset within object. This information is used by the server only to calculate the number of references in the object while doing garbage collection and building the closure of objects claimed by the client. Storing complete information about the class definition is essential for the server to allow work with clients which have different database schemas (class definitions). GOODS allows the online modification of a database schema while the client works with the database. Moreover one client can use the old version of a class definition, while another a new one. Such lazy object conversion strategy is really necessary for a system with a lot of different client applications and a working time 7x24.

Before requesting an object from the server, the client looks at the class identifier. If the client has already loaded an object of such a class, then the mapping between the database class and the local application class is already set. Otherwise the client sends to the server a request to provide information about the class with such identifier. The server responds with the full class description. The client then looks for a local application class with the same name. If no such class is found, the application is abnormally terminated. Otherwise the client compares the signatures of the storage class and the application class. If the signatures are equal, the client establishes a new mapping between the storage class identifier and the application class.

If the signatures are not equal, then it is assumed that the client has a new version of the class. Now a special class descriptor for converting the object from the old representation to the new one is constructed. The conversion procedure looks for the definition of all components in the old class description and tries to find fields with the same name in the new class description. If a field with the same name is present in both class definitions, the conversion procedure checks whether a conversion between the old type of the field and the new type is possible. The only restriction on type conversion compatibility is that a reference type can be only assigned to a references type. All other kinds of conversion (real->integer, integer->real, fixed array of scalars->scalar, varying array->fixed array...) are possible (certainly some of these conversions can cause loss of data). Then the client sends the new signature of the class to the server and receives the new storage class identifier assigned to this class descriptor by the server. When an object of such class is loaded from the server, it is converted to the new class format and a new storage class identifier is assigned to this object. If such object will be modified and sent back to the storage, it will be saved in the new class format. So lazy approach is used to perform object conversion when the class definition has been changed.

When a client is going to store an object in the storage (as a result of making a reference to this object from another persistent object), it checks whether this object has a valid storage class identifier. If none exists, the client sends to the server a description of the object class and waits for a response containing the storage class identifier assigned to this class. When the server receives such a request from the client, it looks through the definitions of all classes in the storage. If a class with the same signature is found, the server replies with its identifier. Otherwise a new class definition is added to the storage and a new assigned storage class identifier is returned to the client.

Since there can be a lot of object class modifications, a garbage collection procedure is also necessary to collect unused classes. During the mark stage of the garbage collector, classes of all "black" objects are also marked. At the sweep stage the class information manager is asked to remove all unmarked classes. If a class can be registered by an application, but the object instances of this class are not yet created at this moment, the deletion of this class definition is delayed to the moment when all client processes working with this class descriptor will be terminated. To implement this strategy of delayed class deallocation, all clients are assigned successive numbers as identifiers at the time of connection to the server. Each class descriptor stores the maximal client identifier which has accessed this class. When a client sends a request to provide a class identifier for a concrete storage class descriptor, the identifier of this client is compared with the client identifier stored in this class descriptor;if the first is bigger, it replaces the client identifier stored in this class descriptor. A class descriptor can be removed when there are no object instances of this class in the storage and there are no more clients with identifiers less or equal to the class identifier stored in this class descriptor.

Class descriptors are stored in the storage like other objects, but a special range of object identifiers is used for class descriptor objects. In GOODS a class storage identifier is stored in one word (two bytes) of the object descriptor, so the maximal number of classes in the storage is limited to 2^16 = 65535. The first 2^16 entries in the object descriptor table are reserved for class descriptors. Creating new class descriptors and updating existing ones is performed using standard transaction mechanisms. To improve the performance of the class descriptor lookup, the class information manager maintains an array of copies of all class descriptors in memory and uses a hash table for fast search of the specified class name.

Optimization of object loading

When a client application accesses a persistent object which is not present in the client cache, the database interface sends a request for loading the object to the server where this object is located. Since transmission of data through a network is a relatively slow operation and moreover only a small portion of the total time is used to transfer the data itself (it is not so much difference in time between sending 1 byte or 100 bytes of data), there are two ways of increasing the application performance. Either one can increase the size of the object cache and improve the cache management algorithms (we will speak about this approach later). Or we predict which objects will will be retrieved by the client and include into the reply not only the requested object, but all objects which will with high probability be requested by the client in near future: when the client accesses such object, the client database interface finds that the object is already present in the object cache and no request to the server is necessary. Certainly it is very difficult a task to predict which objects will be retrieved by the application in the near future. Looks like no single best solution exists for this problem.

There are several main approaches to determinate which set of objects should be sent to the client. One alternative is to ask the programmer to explicitly specify clusters of objects. If one object from the cluster is requested, then all objects from this cluster are transmitted to the client. The disadvantage of this approach is that it violates the transparency of manipulation with persistent objects. Another approach is to maintain statistics of requesting objects by clients. So if the server knows that after requesting object A the client with high probability will request object B, then object B can be sent to the client together with object A. The disadvantage of this approach is that it consumes a lot of server memory and CPU time. While this approach may provide very good results for some applications, it can be inefficient for other applications. The third approach is to try to construct some kind of object closure which includes all (or some of the) objects referenced from the requested object. The size of the closure should be limited to minimize the increase of network traffic due to transfer of useless data.

The GOODS server currently uses the third approach (certainly other approaches can easily be implemented and tested by redefinition of one method). When an object is requested by the client, the server includes in his reply all objects which satisfy the three following conditions:

  1. the object is directly accessible from the requested object,
  2. the client doesn't have this object,
  3. the total size of all objects in the reply is limited to some predefined constant.

Replication support

Starting from version 2.60 GOODS supports database replication. This feature can be used if your application is mission critical and requires fault tolerance. GOODS supports master-slave replication model with active primary node and passive standby node. Replication is integrated in GOODS transaction mechanism. All transactions executed at primary server are replicated to standby server. When standby server detects failure of primary server, it performs recovery and continue functioning as new primary server. Clients connected to the crashed primary server needs to reestablish connection to new new server and then they will see the database in the same state as it was before crash of the primary server.

To enable replication mode, you should specify transmgr.replication_node parameter in GOODS configuration file (goodsrv.cfg). When GOODS server is started and transmgr.replication_node is specified, it tries to connect to the specified node.

Both standby and primary nodes store transaction records in their transaction logs. To improve performance, it is good idea to set transmgr.sync_log_writes to 0 disabling synchronous writes to the log. Since transaction is transferred to standby node and can be recovered from standby node in case of primary node fault, synchronous write to primary node transaction log is not needed. Since synchronous write is very expensive operation operation, disabling them can significantly improve performance. Using transaction logs both at standby and primary node sides, allows to minimize size of data which needs to be transferred from primary node to standby node during recovery. Most of the transaction will be recovered from local transaction log, and only those transaction which were not flushed to the disk before failure or which were committed by new primary node after the crash, have to be sent from primary node to recovered standby node during synchronization of state of standby node.

When you are running GOODS in replication mode, you should start two GOODS servers. The first one started becomes primary node. Second server (which should be started with some delay after start of th first server) will connect to primary node and becomes standby node. After crash of the primary server, you should restart it as soon as possible. If will connect to active server, becomes standby server and synchronize its state with primary node. If primary node has to be terminated and standby node is not available, then primary node doesn't perform checkout. When you will start the database next time, you should first launch GOODS server at this node (node which was primary before termination). It will perform recovery (even through it was normally terminated) and is able to restore state of standby node once it will be started and connected to this node.

When transmgr.replication_node parameter is specified by no standby node is connected, primary node doesn't perform checkpoints and truncate transaction log. Transaction log is needed to be able to perform recovery of standby node once it will be connected to primary node. It means that size of the transaction log can becomes very huge if there are a lot of transactions commits and standby node is not available for a long time. So you need to schedule your disk resources in such way, that avoid transaction log overflow in such situation.

When standby node detects primary node failure (connection with primary node is broken), it invokes virtual method database::on_replication_master_crash(). A programmer can override this method in class derived from database to be able to provide some application specific handling of primary node failure. For example, it can change IP address of the standby node to be the same as was at primary node. So now the clients need not know about primary and standby node addresses - they will just reconnect to the same address and will work with new primary server.

Currently only replication of the single storage database is supported. Replication will not work for distributed transactions in the database consisting of two or more storages.

Application interface to the database

The application interface to the database is divided into two parts: the application dependent and the application independent. The application independent part is represented by the abstract class "dbs_storage". This class contains a reference to the abstract class "dbs_application" which declares methods handling database requests to applications (notification about object modification, disconnect handler). The class "dbs_storage" declares methods for fetching, locking and unlocking objects, for retrieving and storing class descriptors, for manipulations with transactions,... Implementations of this class are responsible for sending requests to the server and receiving replies and notifications from the server. This class is not responsible for synchronizing the access to the storage and for allocation, unpacking and deallocation of object instances in client applications. Instead the implementation of "dbs_storage" just places an object instance or class descriptor loaded from the server in a specially supplied buffer, whereas successive operations on this buffer are performed by the application dependent part of the database interface. The application dependent part of the database interface implements the "dbs_application" protocol; it contains methods accessing "dbs_storage" and providing the synchronization of storage accesses as well as object packing and unpacking. Also handling of storage notifications is performed by methods defined in this part of the database interface.

The database interface is organized in such way that it is possible for different applications written in different languages to work with the same database. In the current version of GOODS only the interface for C++ is provided, because C++ is the most popular object oriented language now, which also provides highly performance efficient executables. An interface to the Java language is planned in near future.

Different applications may have very different requirements to the database system. For example for a banking application a standard serializable transaction model may be the most relevant one. Applications in such a system should have an illusion of monopoly work with a database:a transaction should not see changes made by other transactions of other clients. It is the isolation model of database organization. But for an office document flow controlling system a cooperation model may be more preferable. In this model applications cooperate with each other and changes made by one application should be visible by other applications.

Since it is not possible to incorporate all possible strategies in a single database system, the generic (or universal) object oriented database should have facilities to extend its functionality and behavior depending on application requirements.

Modern programming systems have to deal with a lot of different aspects: user interface, memory management, multitasking, database access, security... A traditional approach of dividing an application into modules will not work if each module will be responsible for all of these aspects. If for example the logic of synchronization access to objects will be scattered about the application, it will be very difficult to understand such code, debug or modify it. To make the application more clear, simple and extensible, it will be a good idea to separate code responsible for different aspects of system behavior. These separated aspects can be combined together at compilation time (or at runtime) using a metaobject protocol approach. According to the GOODS interface with client applications, the following aspects of the system are implemented by metaobjects:

Let us look at these aspects more closer:

Intertask synchronization of object access
Modern applications very often have to deal with a lot of concurrent jobs: handling user interfaces, retrieving data from the database, performing some background recalculations... It is very difficult to implement such a system without using multitasking. Multitasking can be either explicit or implicit. Implicit multitasking is more transparent for the programmer (for example each method invocation can be considered as a separate thread of control), but it is not so flexible as explicit multitasking. And since aspects of its implementation are hidden from the programmer, some kind of unexpected programmer behavior can take place. Thus we decided to use in GOODS the explicit multitasking model where the programmer has to create and synchronize tasks explicitly using synchronization objects like: mutex, event, semaphore. Synchronization of access to the object by different tasks forms a separate aspect of system behavior which is implemented by means of metaobject protocols.

The default policy used in GOODS for intertask object access synchronization is the mutual exclusion model. Only one task can access the object at the same time. Each GOODS object has an associated monitor object which synchronizes the access to the object instance by different tasks. When a method of the object is invoked by some task, the monitor associated with this object is locked to prevent other tasks from using this object until this task returns from the invoked method. Nested calls to objects are also allowed. This is a simple and powerful strategy but it has some serious disadvantages, one of them is the possibility of a deadlock - the mutual blocking of several tasks.

This model of intertasking object access synchronization is similar to one used in Java (with all methods explicitly considered to be "synchronized"). Each GOODS object also has two methods "wait" and "notify". When a task, executing in a monitor needs to wait for another task to do something, it calls wait(). This causes the task to unlock the monitor and go to sleep. Since the monitor is unlocked, another thread can enter the monitor and supply the information the first task is waiting for. The second task signals the waiting task by calling notify(). Once the first task is signaled it wakes up and begins waiting for the monitor to become available. When the second task finishes its processing, it unlocks the monitor, allowing the first task to reacquire the monitor and to finish what it started.

Unfortunately, there is one serious problem with this signaling mechanism, which can cause deadlocks. If a method of object O1 invokes a method of object O2, which in turn calls the wait() method, then only the monitor of object O2 is unlocked, while the monitor of object O1 remains locked. One could, which looks tempting at first glance, modify the behavior of the metaobject so that calling wait() unlocks all monitors a task has acquired. This would be a mistake. It would imply that anytime you call a function that could in turn potentially call wait(), you cannot guarantee that another task won't change the state of the object.

Since monitor objects take some space, it is very space consuming a solution to have separate monitor objects for each application object. Instead of this a "turnstile" of monitor objects is used in GOODS. When an object should be locked and it has no attached monitor object, a new monitor object is taken from the turnstile (if there are no more free monitor objects, then the turnstile is extended). When an object is unlocked, the monitor object continues to be attached to this object, but it can be taken away and used for another object. So the total number of monitor objects doesn't exceed the maximal number of simultaneously accessed objects and there is a high probability that when an object is accessed it already has an attached monitor object.

Synchronization of object access by different database clients
There are two main approaches to object access synchronization in the database: pessimistic and optimistic (certainly some combination of this two approaches can be used). With the pessimistic approach, object locks are usually used to prevent other clients from accessing the object. This approach guarantees that object changes can not be lost. Main disadvantages of this approach are the possibility of deadlocks and reduced concurrency. The optimistic approach can be successively used if the possibility of conflicts (simultaneous modification of an object by different clients) is small and the transaction can be easily restarted. With this approach the check for correctness of object access is made only when the transaction is committed. If some of the objects touched by the transaction have been changed by some other transaction, a conflict has taken place and the current transaction should be restarted. There are a number of different metaobjects in the GOODS database interface, which implement different variations of these approaches. Below the hierarchy of these metaobjects is illustrated. The description of these metaobjects can be found in the "mop.h" header file.

Abstract Metaobject
	Basic Metaobject 
		Optimistic Metaobject             
			Repeatable Read Optimistic Metaobject	
		Pessimistic Metaobject
			Lazy Pessimistic Metaobject 
                        Repeatable Read Pessimistic Metaobject

It is possible to achieve a higher level of concurrency between different clients if the semantic of a concrete object class is explored. For example, the PUT and GET methods of the "queue" object can be executed concurrently without mutual exclusion as long as the queue is not empty. This is possible because these methods, although both are mutators, work on different ends of the queue. Using a specific metaobject protocol considering the object's semantic can be advantageous.

Handling of database transactions
All changes in the database should be made by means of consistent and atomic sequences of operations - the transactions. A transaction transfers the database from one consistent state to another. There are a lot of different transaction models: flat transactions, nested transactions... Simply speaking, all these models have different answers to two questions: when should a transaction be committed or aborted and when should changes made by one transaction become visible for other transactions. Since different applications may require different transaction models, we want to allow methods of the metaclass to answer these questions. So by definition each programmer can introduce her/his own model of transactions by means of own metaobjects.

The default implementation of transactions in GOODS implicitly opens a nested transaction each time a GOODS object is accessed; and commits all nested transactions when control returns from the last invoked method. So the programmer should not worry about the definition of points where to open or where to close a transaction (but it is possible to explicitly open or close a nested transaction and so to create a long-live transaction). When a transactions is aborted, all modifications of persistent objects are discarded. This implicit transaction schema is highly compatible with the idea of a transparent database interface. And it is more error safe.

It is possible to explicitly specify transaction boundaries by metaobject::begin_transaction() and metaobject::end_transaction virtual methods. The method metaobject::begin_transaction() increments by one counter of nested transactions and metaobject::end_transaction method decrements the counter and if it becomes zero, commits the transaction.

Management of the client's object cache
Since the client application has to send requests to the server for every accessed object, the performance of the application mostly depends on the efficiency of the object caching strategy. Usually a standard LRU algorithm is used for replacing entries in the cache, but for a database this discipline is not always a good choice. An application accessing a database frequently scans a large number of objects. With a traditional LRU cache replacement algorithm, these scanned objects (most of them will not be accessed in the near future any more) will completely replace all cache entries by throwing away all other objects.

To avoid such undesirable behavior, a special modification of the LRU algorithm is used in GOODS. The object cache is divided into two parts: a frequently used objects (FUO) part and an objects used only once (OUOO) part. Both parts are controlled by an ordinary LRU discipline using double linked lists. The object is excluded from the first list when it is accessed; and inserted at the head of the second list after the end of the access. If an object is taken from the OUOO list and it is not at the head of this list, the object is considered to be a frequently used one and reattached to the FUO list. (Since the object is present in the OUOO list but not at the head of this list, we can make the conclusion that the object was already accessed by some other place in the program and may be accessed once again). So each object scanned during the database search can not replace objects from the FUO part of the cache. But GOODS allows the programmer to define her/his own cache managing policy because the cache is also controlled by metaclass methods.

Currently the GOODS database interface consists of some kernel database requests (such as set lock, open transaction, load object...) and a number of basic metaobjects, implementing the most common models of object access organization. Deriving from these basic metaobjects, it is possible to create specific metaobjects to handle specific requirements of concrete applications. Usually most of the work can be done by methods of base metaclasses and only few things should be redefined or added.

The GOODS interface for the C++ language

GOODS supports the following scalar types as components of database classes:
 
char    1 byte character  
nat1	unsigned 1 byte integer
int1    signed 1 byte integer
nat2	unsigned 2 bytes integer
int2    signed 2 bytes integer
nat4	unsigned 4 bytes integer
int4    signed 4 bytes integer
nat8	unsigned 8 bytes integer
int8    signed 8 bytes integer
real4   4 bytes ANSII floating type
real8   8 bytes ANSII floating type
It is better to use these type aliases instead of native C types to provide portability of your application (for example type long can be 4 bytes on one system and 8 bytes on another). References to other GOODS objects are supported by means of "smart pointers" implemented in the template class "ref":
class A;
ref<A> ra;
It is possible to construct arrays and structures of the above specified atomic types:
char str[10];
struct coord {
	int x;
	int y;
};
coord points[10];
It is possible to specify classes with objects of varying size. Such class can have (only) one varying component, the size of which is determined at object creation time:
class person : public object {
  public:  
    char name[1];
   
    static tree_node* create(char* name) {
        int name_len = strlen(name);
        return new (name_len) person(name, name_len);
    }
    METACLASS_DECLARATIONS(person, object);

  protected: 
    person(const char* name, size_t name_len)
    : object(self_class, name_len)
    {
        memcpy(this->name, name, name_len+1);
    }	
};

field_descriptor& person::describe_components()
{
    return VARYING(name);
}
Such a class can have only one varying component which must be the last one. Classes with varying components are used in the GOODS C++ interface to implement arrays. They provide an efficient method for representing classes with constant (immutable) string identifiers (like person and name).

When the implementation of the C++ metaclass information does not exist, the programmer has to specify this information manually because the database needs to know the format of objects. To ease this work, a number of macros and functions are supported by the GOODS interface for C++.

GOODS supports persistency only for objects of persistent capable classes. A class is persistent capable if it is derived from the GOODS "object" class and implements some methods and constructors needed by the GOODS client library. Such a class should have:

  1. a specific constructor for initializing the object when it is loaded from the database;
  2. a static component "self_class" containing the class descriptor of this class;
  3. an overloaded function "classof" returning the class descriptor determined by the static type of the function argument;
  4. a virtual method "describe_components", which returns information about all components of the class.
All these are declared by the macro
METACLASS_DECLARATIONS(CLASS, BASE_CLASS)
The programmer merely has to implement a member function "describe_components", which provides information about all class instance variables. To make the work of describing the class variables more easy and error safe, five special macros are provided:
NO_FIELDS   the method should return this value if there are no variables in the class
FIELD(x)    describes an atomic or structural field 
ARRAY(x)    describes a fixed array type
MATRIX(x)   describes a two dimensional fixed array type
VARYING(x)  describes a varying array 
If there are structural components in the class, you should define a function "describe_field" to describe components of this structure:
class B_page : public object {
    friend class B_tree;
    enum { n = 1024 }; // Minimal number of used items at not root B tree page
    int4 m;            // Index of first used item

    struct item { 
        ref<object> p; // for leaf page - pointer to 'set_member' object
	               // for all other pages - pointer to child page
        skey_t      key;

        field_descriptor& describe_components() {
	    return FIELD(p), FIELD(key);
	}				
        inline friend field_descriptor& describe_field(item& s) {
            return s.describe_components();
	}
    } e[n*2];

    ...
  public: 
    METACLASS_DECLARATIONS(B_page, object);
};

field_descriptor& B_page::describe_components()
{
    return FIELD(m), ARRAY(e); 
}

REGISTER(B_page, object, optimistic_scheme);
The macro REGISTER can be used to define default implementations for the methods "classof" and "constructor". The macro can be used to create a class descriptor for the class:
REGISTER(CLASS, // name of the class
	 BASE,  // name of base class
	 MOP    // metaobject for this class
        ); 
It is possible for a template class to be persistent capable, but the programmer has to explicitly register different template instantiations of the database. The common way is to use the typedef operator to create an alias name of the concrete template instantiation and then register the class with this alias name in the database by means of the REGISTER macro.

Each storage has a predefined root object. To change the type of this abstract root object you should use the "become" method. To find out whether storage is already initialized, a special virtual method "is_abstract_root" is defined in the class "object". This method returns true if the type of the root object has not been changed yet and false otherwise.

The multitasking library requires some initialization before it can be used. The first statement in the main() function of the application should be the invocation of the static method task::initialize. The single parameter of this method specifies which stack size should be reserved for the main thread of the program.

Usually the sequence of steps to initialize storage may be:

class my_root : public object {
  public:
    ...
    METACLASS_DECLARATIONS(my_root, object);
  
    my_root() : object(self_class)
    {
	...
    }
    void initialize() const { 
        if (is_abstract_root()) {
            ref<my_root> root = this;
            modify(root)->become(new my_root);
	} 
    }
};
	
int main(int argc, char* argv[]) 
{
    task::initialize(task::huge_stack);
    database db;

    char* cfg_name = new char[strlen(argv[1])+5];
    sprintf(cfg_name, "%s.cfg", argv[1]);

    if (db.open(cfg_name)) {
        ref<my_root> root;
        db.get_root(root);
        root->initialize();
	
	... // do something with database 
	
        db.close();
    }
}
You can find a template for a simple database application in the file "template.cxx". All accesses to persistent objects should be encapsulated inside object methods. It is bad programming practice which may cause a runtime error, to directly reference some object component:
class text {
  public: 
    char str[1];
    ...
};

main () {
    ref<text> t;
    ...
    printf("text; %s\n", t->str); // !!! ERROR
}
The reason for such restriction is that the persistent object at any moment can be thrown away from memory by the cache replacement algorithm, when there is no active method for this object. Then it may happen that a smart pointer points to the wrong object. The correct implementation for the code above is:
class text { 
  protected: 
    char str[1];
  public:
    void print();
    ...
};

void text::print() 
{
    printf("text; %s\n", str);
}

main () {
    ref<text> t;
    ...
    t->print();
}
Avoid direct access to object components whenever it is possible: make all object instance variables protected and encapsulate the access to them within object methods. Encapsulation makes your application simpler and more flexible. With the GOODS C++ interface the performance is increased, because the access to self components within object methods requires no extra runtime overhead.

GOODS C++ API provides implicit memory deallocation (garbage collector) for persistent capable objects based on reference counters. It means that once number of references to the instance of persistent capable object becomes zero, it will be immediately deallocated. Smart pointer are responsible for maintaining reference counter for each object. Unfortunately garbage collector based on reference counters is not able to deallocate data structures with cyclic references (such as L2-list). It is not critical for persistent objects, because object cache replacement algorithm will in any case remove least recently used instances from the memory. But you should take care about transient data structures with cyclic references.

There are two dangerous issues related with usage constructors of persistent capable objects. As far as at the moment of the constructor execution there are no references (smart pointer) to the created instance, assigning this pointer in the constructor to some smart pointer can cause deallocation of the created object one this variable will be changed. For example, the following constructor will cause unexpected deallocation of object:

class X : public object{ 
    X() : object(self_class) { 
        ref x = this;
	x = NULL; <<<-- at this point this object will be removed!
	...  
    }
};
Another problem is specific for the constructor used for creation of root object. Root object is initialized in GOODS C++ API by means of become() operator, which just swaps references to two objects (predefined root object and new root objects). It means that all references to the new root object set in constructor will point to the other object (abstract root) after become. So do not set any references to created object in the constructor of root object. Also be care with assignment of this pointer in constructors of all persistent capable classes, and better avoid such assignments at all by performing all necessary initialization in some other method.

Starting from 2.37 version of GOODS, per-thread transactions are supported. In prior versions of GOODS only all threads share the same transaction. The transaction is committed only if there are no more active methods of persistent capable object in any thread. The counter of nested transaction invocations was static variable of basic_metaobject class, so all persistent capable objects (even if these objects belongs to the different database) share the single counter of nested transaction.

In 2.37 version of GOODS, special class cache_manager was introduced to contain static variables from basic_metaobject class used for transaction and cache management. By default the behavior is consistent with old versions of GOODS - all threads share the same transaction. But it is now possible to assign some particular instance of cache_manager to a thread or group of threads. database class has now two new methods: attach() and detach(). Each database class can have its own cache_manager. But unless attach() method is executed by the thread, default cache manager is used. The method attach associate cache manager of the particular database connection with the current thread. The method detach destroy this association. Invocation of detach method is not ncessary and this method was added only for csymetry and compatibility with Java API.

Per-thread transaction model is especially useful for servers, when one process handle remote connections with multiple clients. The actions performed by each client should be treated as separate transaction. To implement this model in GOODS version 2.37 and higher, it is necessary to create separate database connection for each thread (create instance of database class and open the database). Then thread should be attached to the database by database::attach() method. Each thread will have its own object cache, so there can be several instance of the object with the same OID in the application. Standard GOODS synchronization mechanism based on locks set by metaobjects is used to avoid conflict of accessing the same object by several concurrent threads. There are the following drawbacks of the proposed model:

  1. Inefficient use of client memory due to duplication of information - several copies of one object can be present in the application.
  2. Redundant load operations. Object will be fetched from the server even if instance of the object is available in the cache of other thread.
  3. Instead of having the single connection with server, multiple connections (one per thread) are maintained.
  4. Necessity of explicit attach and detach methods invocations - it is responsibility of programmer to associate database connection with thread.

This model of per-thread transaction is experimental and may be changed in future. Example illustrating per-thred thransactions:

void task_proc run(void*) { 
    database db;
    db.attach();
    if (db.open("myserver.cfg")) {
	...
        db.close();
    }
    done.signal();
}

int main() { 
    int i;
    task::initialize(task::normal_stack);
    for (i = 0; i < nThreads; i++) { 
	task::create(run);
    }
    for (i = 0; i < nThreads; i++) { 
	done.wait();
    }
    return 0;    
}
For real servers, it is better to maintain pool of threads. So the server will not spawn new thread and open database connection each time it receives request from the client (and destroying thread and connection after requests has been processed). Instead of it threads for handling client request are taken from the pool and return to the pool after the end of request processing. Each thread has permanently opened database connection. Such scheme can significantly increase total server throughput because of eliminating overhead of spawning new thread and especially establishing new database connection.

Version 2.37 of GOODS provides two additional optimization methods: fetching object clusters and bulk allocation mode. Both optimizations can be enabled at database level by set_alloc_buffer_size(size_t size) and enable_clustering(boolean enabled) methods.

When clustering is enabled, server will send to the client not only the requested object, but also objects directly referenced from the requested object. The total size of objects in cluster is limited by server parameter for maximal cluster size. By default the value of this parameter is 500 bytes.

The bulk allocation mode allows to reduce number of messages send between client and server and also improve reference locality. If size of allocation buffer is set to non zero value, then the server will reserve requested number of OPIDs for the client. So allocation of object (assigning OPID to the object) can be done immediately by the client without sending requests to the server. At the end of transaction or when the end of the allocation buffer is reached, one request is sent to the server to allocate all objects with reserved OPIDs. Allocation is performed as the single continuous segment, so all objects created within transaction will be placed near each other. When the client is detached from the server (normally or abnormally) cleanup is performed and all reserved OPIDs are reclaimed.

The GOODS C++ interface provides a library of some widely used container classes for efficient access to persistent data. The following subsections briefly describe these classes.

Dynamic arrays

The GOODS interface for C++ provides a template class for dynamic arrays. The template parameter should be either a builtin primitive type (nat1, int4, real8...) or a persistent object reference. It is important to notice that template instances must be explicitly registered in the database by the REGISTER macro. The following array template instantiations are defined and registered in "dbscls.h": ArrayOfByte, ArrayOfInt, ArrayOfDouble and ArrayOfObject. If you want to use an array of some other component type, you should first create a type alias by means of the typedef operator and then register this type in the database:

     typedef array_template ArrayOfShort;
     REGISTER(ArrayOfShort, object, pessimistic_repeatable_read_scheme);
The dynamic array template provides methods for direct access to array components:
        T operator[](nat4 index) const;
	T getat(nat4 index) const;
        void putat(nat4 index, T elem);
Methods for getting the number of components in the array, for copying and appending array components are also available. The dynamic array also implements a stack protocol by providing such methods as Push(T value), Pop(), Top(). The methods insert(int index, int count, T value) and remove(int index, int count) can be used to add or remove dynamic array components.

The class String is implemented as a subclass of the ArrayOfChar class and defines extra methods for string manipulation.

Sets

The GOODS class library provides CODASYL-like sets to represent one-to-many and many-to-many relationships between objects. The set consists of one owner and many member components. The set owner is represented by the set_owner class and provides methods for insertion/removal of members to/from the set and iteration through the set members. The set members are accessed through the set_member class, which contains a member key and a virtual function to calculate the short key representation (used in the B-tree). Both the set_member and the set_owner class contain a pointer to the attached object, so the object can be a member of some sets and an owner of some other set at the same time.

B*-tree

The B-tree is the classical data structure for DBMS. It minimizes the number of disk read operations needed to locate an object by key and preserves the order of elements (range requests are possible). Also the maintenance of a B-tree can be done efficiently (insert/remove operations have log(N) complexity).

In the classical implementation of the B-tree, each B-tree page contains a set of pairs <key, page-pointer>. The nodes at the page are ordered by key, so a binary search can be used to locate an item with greater or equal key. In the B*-tree, pointers to members are stored only in leaf pages of the B-tree. All other pages contain pointers to child pages.

The B*-Tree in GOODS is implemented as a subclass of the set_owner class. Pointers in leaf pages of the B*-tree refer to objects of the set_member class, which contain references to the objects included in the B-Tree. Nodes of the B*-tree pages contain a short form of key , which can be calculated from the object key by a virtual method of the set_member class (using the nat8 type, it usually consists of just the first 8 bytes of the original key). Such structure allows objects to be included in several B_trees and also makes the search operation more effective, because only small set_member objects are accessed during the search (if there are several objects with the same value of the short key). The B_Tree class defines methods for inserting new objects into the tree, removing objects from the tree and searching objects by the key.

Hash Table

The class hash_table provides fast, almost constant time access to the object by the key. The GOODS class library for C++ provides the implementation of a non-extendable hash table which operates with string keys. The hash table can be effectively used if the upper limit of objects in the hash table is known, does not exceed the size of the table more than several times and the hash table fits into the operating memory.

H-Tree

The H-Tree is a combination of hash table and index tree. It can be used when the size of the hash table is too large to represent the table as a single object (as an array of pointers). The H-Tree algorithm first calculates the normal hash key and then divides it into several groups of bits. The first group of bits is used as an index in the root page of the H-Tree, the second group of bits as an index in the page referred from the root page, and so on... If e.g. the size of the hash table is 1000003, then the H-tree with pages, containing 128 pointers, requires access to three pages to locate any object. Since a reference in GOODS is 6 bytes long, the total size of the loaded objects is 2304 bytes (128*6*3) instead of 6Mb when using the hash_table class.

Blob

Most modern database applications have to deal with large objects, used to store multimedia and text data. The GOODS class library has a special class Blob to provide an efficient mechanism for storing/extracting large objects. Since loading large objects can consume significant time and memory, the Blob object allows subdividing large objects into parts (segments), which can be accessed sequentially. Moreover, the Blob object uses the multitasking model of GOODS, which makes it possible to load the next part of the Blob object in parallel while handling (playing, visualizing,...) the current part of the Blob. Such approach minimizes delays caused by loading objects from the storage.

R-Tree

The R-tree provides fast access to spatial data. Basically the idea behind the R-Tree and the B-Tree are the same: use a hierarchical structure with a high branching factor to reduce the number of disk accesses. The R-tree is the extension of the B_tree for a multidimensional object. A geometric object is represented by its minimum bounding rectangle (MBR). Non-leaf nodes contain entries of the form (R,ptr) where ptr is a pointer to a child node in the R-tree; R is the MBR that covers all rectangles in the child node. Leaf nodes contain entries of the form (obj-id, R) where obj-id is a pointer to the object, and R is the MBR of the object. The main innovation in the R-tree is that the father nodes are allowed to overlap. By this means the R-tree guarantees at least 50% space utilization and remains balanced. The first R-tree implementation was proposed by Guttman. The GOODS R_tree class is based on Guttman's implementation with a quadratic split algorithm. The quadratic split algorithm is the one that achieves the best trade-off between splitting time and search performance.

API for development Web applications

New version of GOODS provides API for developing WWW applications. It is very easy to perform Web database publishing with GOODS. GOODS server can either communicate with standard WWW server by means of CGI requests, or it can serve HTTP requests itself.

Interaction with Web server is based on three-tier model:

    Web Server   ->     CGI stub     ->    GOODS application
             CGI call          local socket connection  
Using GOODS built-in HTTP server provides maximum performance, because in this no communication and process creation overhead takes place. In both cases the same API for receiving and unpacking requests and constructing responses is used. So the same application can be used for interaction with external Web server as well as stand-alone HTTP server.

GOODS application is request-driven program, receiving data from HTML forms and dynamically generating result HTML page. Classes WWWapi and WWWconnection provide simple and convenient interface for getting HTTP requests, constructing HTML page and sending reply back to WWW browser. Abstract class WWWapi has two implementations: CGIapi and HTTPapi, first of which implements protocol of interaction with Web server by means of CGI mechanism, and the second - protocol of direct serving HTTP requests.

Built-in HTTP server is able to handle two types of requests - transfer HTML file find in place relative to the current working directory in response to GET HTTP request and perform action specified by GET or POST requests with parameters. Built-in HTTP server provides persistent connections - server will not close connection with client immediately after sending response, instead of this connection will be kept during some specified interval of time. Also this built-in server supports concurrent requests processing by several threads of control. But starting of threads should be performed by client application.

Virtual method WWWapi::connect(WWWconnection& con) accept clients connection (either from CGISTUB program of from WWW browser). This method returns true if connection is established. In this case programmer should call CGIapi::serve(WWWconnection& con) to receive and handle client's requests. This method return false if and only if handler of request returns false. Even if request was not correctly received or could not be handled, true is returned by serve method. The connection is always closed after return from serve method. It is possible to start separate thread for exceution of each server method.

To construct responce to the request special overloaded >> operators are provided in WWWconnection class. First line of response should specify type of response body, for example:

Content-type: text/html\r\n\r\n
Two CR-LF character after this line separate HTTP header from the body. Three encoding schemes can be used for constructing response body:
  1. TAG - used for specifying HTML control elements. No conversion is done for this encoding.
  2. HTML - with this encoding output characters which are special for HTML (&Ltd; > & &qout;) are replaced with special symbolic names (&qout;lt; &qout;gt; &qout;amp; &qout;qout;).
  3. URL - used for specifying call parameters in URL format. Spaces are replaced with '+' character, all other special characters with their hex code.
To make switching between encoding more convenient, WWWconeection class performs automatic switching between encodings. Initially TAG encoding is always used. Then encodings are implicitly changed using the following rules:
              TAG  -> HTML
              HTML -> TAG
              URL  -> TAG
It certainly possible to explicitly specify encoding for the next output operation by means of special << operator, which accepts one of the following constants: TAG, HTML, URL.

Information about HTML form slots values or request parameters can be obtained using WWWconnection::get(char const* name, int n = 0) method. Optional second parameter is used only for getting value of selectors with multiple selection allows option. If parameter with such name is not found, NULL is returned. There are some mandatory parameters which always should be present in all forms handled by GOODS:

Parameter nameParameter Description
socketaddress of the server, used for constructing new links
pagesymbolic name of the page, used for request dispatching
stubname of CGI stub program always defined by API

Running GOODS applications

Database configuration file

To specify the configuration of the database you should create a configuration file using the following format:
<number of storages = N>
<storage identifier 0>: <host name>:<port0>
...
<storage identifier N-1>: <host name>:<portN-1>
The storage identifiers should be successive integer numbers,which are used as indices in the array of storages. You can find examples of this configuration file in: "unidb.cfg" and "guess.cfg". In a distributed environment, a configuration file can be accessed from the server computer using some network file system protocol (for example NFS) or can be replicated to client computers.

Server monitor GOODSRV

To run a database you should first start all storage servers at each node of the net as specified in the configuration file. You can write a server program yourself. GOODS provides a standard server implementation named "goodsrv" supporting some basic monitoring functions. To run this program you should specify the name of the database, which should be equal to the name of the configuration file without extension (the extension is assumed to be ".cfg"). The first line of this configuration file specifies the number of storages in the database. Each successive line specifies locations of database servers. The line consists of three fields separated by colons: storage identifier, host name and port number.

Parameters for GOODSRV can be specified in one of two files: "goodsrv.cfg" and "database,srv". The first one specifies parameters common for all servers, the second specifies parameters of the server of the specific database. If a parameter is defined in both configuration files, then the value of the parameter from "database,srv" is used. If a parameter is not specified in the configuration files, then default values will be used. The following table describes all available parameters:

ParameterTypeUnitDefault valueMeaningSet
memmgr.init_map_file_sizeintegerKb8192 Initial size of memory map file. Increasing size reduces the number of memory map reallocations. -
memmgr.init_index_file_sizeintegerKb4096 Initial size of index file. Increasing size reduces the number of index reallocations. -
memmgr.gc_init_timeoutintegerseconds60 Timeout for initiation of GC process. A GC coordinator waits for replies from other servers for GC initiation request during the specified period of time. +
memmgr.gc_response_timeoutintegerseconds86400 Timeout for acknowledgment from a GC coordinator to finish the mark stage and perform the sweep stage of GC. If no response is received from the GC coordinator within this period, GC will be aborted at the server. +
memmgr.gc_init_allocatedintegerKb1024 Size of allocated memory since last GC, after which the next garbage collection process will be initiated. +
memmgr.gc_init_idle_periodintegerseconds0 If > 0: specifies the idle period interval, after which GC will be initiated, if the memory management server receives no request during this period of time. +
memmgr.gc_init_min_allocatedintegerKb0 Minimal size of allocated memory to start GC in idle state (see previous parameter). GC will be initiated only if the idle period timeout is expired and more than gc_init_min_allocated memory was allocated since last GC. +
memmgr.gc_grey_set_thresholdintegerreferences1024 Maximal extension of GC grey references set: reaching this number of references, the optimization of the order of taking references from the grey set (to improve reference locality) is disabled and a breadth first order of object reference graph traversal is used. +
memmgr.max_data_file_sizeintegerKb0 If > 0: limits the size of the storage data file. After reaching this value, GC is forced and all allocation requests are blocked until enough free space is collected. +
memmgr.max_objectsintegerobjects0 If > 0: limits the number of objects in the storage. After reaching this value, GC is forced and all allocation requests are blocked until some object will be collected by GC. +
memmgr.map_file_namestring-"*.map" Name of file with memory allocation bitmap. -
memmgr.index_file_namestring-"*.idx" Name of file with object index. -
transmgr.sync_log_writesboolean0/11 Enable/disable write-ahead transaction logging. -
transmgr.permanent_backupboolean0/10 If == 0: a snapshot backup type is used. Backup is terminated after saving a consistent state of the database and checkpoints will be enabled.

If == 1: a permanent backup type is used. Backup terminates and forces a checkpoint after saving all records from the transaction log. This type can be used to ensure, that storage can be restored when loosing the storage data file after a fault.

+
transmgr.max_log_sizeintegerKb8192 Size of transaction log: reaching this value starts a checkpoint. After checkpoint completion, writing to the log file continues from the beginning. +
transmgr.preallocated_log_sizeintegerKb0 Forces the transaction manager to preallocate the log file and doesn't truncate it after checkpoint. In this case the file size should not be updated after each write operations and the transaction performance is increased about 2 times. +
transmgr.wait_timeoutintegerseconds600 Timeout for committing a global transaction. A coordinator waits for replies from other servers participated in a global transaction until expiration of this timeout. +
transmgr.retry_timeoutintegerseconds5 Timeout for requesting the status of a global transaction from a coordinator. When a server performs recovery after crash, it needs to know the status of a global transactions, in which it has been participated, So it polls coordinators of global transactions, using this timeout as interval for resending request to coordinators. +
transmgr.checkpoint_periodintegerseconds0 Time interval between two checkpoints. A checkpoint can be forced either by exceeding some limit value of transaction log size or after some specified period of time (if this timeout is non-zero). +
transmgr.dynamic_reclustering_limitintegerbytes0 Enable/disable dynamic reclustering of objects. If dynamic reclustering of objects is enabled, all objects modified in a transaction are sequentially written to a new place in the storage (with the assumption that objects modified in one transaction will be also accessed together in future). This parameter specifies a maximal size of objects for dynamic reclustering. If == 0: disables reclustering. +
transmgr.log_file_namestring-"*.log" Name of local transaction log file. -
transmgr.history_file_namestring-"*.his" Name of global transaction history file. -
objmgr.lock_timeoutintegerseconds600 Deadlock detection timeout. If a lock can't be granted within specified period of time, the server considers that deadlock takes place and aborts one or more client processs to destroy deadlock. +
poolmgr.page_pool_sizeintegerpages4096 Size of page cache. Increasing this value will improve performance of disk IO operations. -
poolmgr.data_file_namestring-"*.odb" Name of storage data file. -
server.admin_telnet_portstringhostname:port"" Specifies the port upon which goodsrv will serve administrative interactive dialog sessions. If you specify this option, you can run goodsrv as a background task. Administrative functions, such as scheduling backup tasks, etc. can be performed by telnetting to the specified port. -
server.cluster_sizeintegerbytes512 Maximal size of object cluster to be sent to client. A server can perform optimization of sending objects to clients. Instead of sending one object for each request, it can send several objects (cluster of objects), including into the cluster objects referenced from the requested object (but only if total size of objects will not exceed cluster_size parameter). +

  1. The last column of this table marks parameters, the values of which can be changed by the set command in interactive dialogue mode.
  2. The symbol * used in the file names in this table stands for the concrete database name. This name is an argument passed to GOODSRV.

Other arguments of "goodsrv" are:

goodsrv <storage name> [<storage-id> [<trace-file-name> | "-"]]
By default "goodsrv" starts in interactive dialogue mode, allowing the user to execute some administrative database operations as well as to see database usage parameters. Also if the GOODS server was compiled with the trace option, trace information will be shown at the terminal. If you specify a file name as the last argument to "goodsrv", messages will be saved in this file. Below is a list of all valid commands for the "goodsrv" monitor:
help [COMMAND]                        print information about command(s)
open				      open database
close				      close database
exit				      terminate server
log [LOG_FILE_NAME|"-"]               set log file name
show [CATEGORIES]                     show current server state
monitor PERIOD [CATEGORIES]           periodical monitoring of server state
backup FILE_NAME [TIME [LOG_SIZE]]    schedule online backup process
stop backup			      stop backup process
restore BACKUP_FILE_NAME              restore database from the backup
trace [TRACE_MESSAGE_CLASS]           select trace messages to be printed
set PARAMETER=INTEGER_VALUE           set server parameter value
rename OLD_CLASS_NAME NEW_CLASS_NAME  rename class in the storage
rename CLASS COMPONENT_PATH NEW_NAME  rename component of the class
CATEGORIES is the set of "servers clients memory transaction classes". By default all categories are shown.

PERIOD interval in seconds of printing statistic information

TIME is the interval of time in seconds and LOG_SIZE is the transaction log size. After reaching one of these parameters, a backup procedure will be started. Default values for these parameters are 0, which means that backup will be started immediately.

TRACE_MESSAGE_CLASS is a space separated list of message classes. Only messages belonging to one of the specified classes will be shown. The complete list of message classes can be obtained by the "help trace" command.

PARAMETER is a valid server parameter (see table above). Execute the "help set" command to obtain a list of all valid parameter names.

If you would like to run goodsrv as a background task, set the "server.admin_telnet_port" parameter in your configuration file. GOODSRV will not output any messages to stdout; instead, it will log all output and provide a telnet session on the specified hostname and port. By enabling this option, a database administrator can monitor and maintain a database remotely.

Here's an example setting for the server administration telnet port:

server.admin_telnet_port="localhost:8920"

Once the server is started with this parameter set, a database administrative session can be accessed by issuing the following command:

telnet localhost 8920

Database browser

The database browser is a program which allows you to dump fields of objects in the database. I want to make this program as simple as possible and also system independent. The primary idea of writing this program is to show how metainformation can be extracted from a database. There are two versions of the browser: a console application and a CGI version using a WWW browser to navigate through objects.

The browser can dump values of fields of all objects (including classes) stored in database storages. The console version of the database browser "browser.cxx" requires a single command line argument, which specifies the name of the database configuration file. It is not necessary to specify all servers in this configuration file. You only have to describe those storages, objects from which you want to inspect. Certainly the servers of these storages should have been started before you run the "browser" application. To refer an object, you should specify the storage and the object identifier. It is possible to see a list of available commands by typing "help".

The CGI version of the browser "cgibrows.cxx" provides a much more user friendly interface. To be able to use this browser you need some WWW server running at the computer where the database storage is located. For example it can be the Microsoft Peer WebServer, included in the NT-4.0 distribution, or the Apache free WWW server. You have to do some preparation before you can use the browser. You should edit a file "browser.htm" (name can be changed) to specify the name of your host (localhost is possible) and the path to the cgibrows program (cgibrows.exe in Windows). This program can be placed either in some default directory for CGI scripts used by the WWW server, or it is possible to register another directory in the WWW server. It is preferable to have the cgibrows program in the same directory as the database. Otherwise you should specify the full absolute path to the directory where the database is located when opening the browser. Be sure that the user, under which name the CGI script will be executed, has enough permissions to access files and ports at the local computer (GUESS by default has no such permissions).

More preparations the WWW browser for GOODS does not need. Now you can open in your favorite WWW browser the page "browser.htm" and specify the database name and storage number. Database name is the name of the configuration file without the extension. If this file and the cgibrows program are located in different directories, you should specify the absolute path to this file. By default, the storage with identifier 0 will be opened. To start the browser press the Open button. The browser will dump fields of the root object. The browser outputs non-NULL-reference fields as the pair: storageID:objectID. To navigate through objects, click on the reference field. Do not forget the Back, Forward and Go buttons, which can help you in navigation. You can browse the database from any computer which can access the WWW server.

Bug tracking database

Example "Bug tracking database" illustrates developing Web application using GigaBASE and WWW API. It can be used either with any WWW server (for example Apache or Microsoft Personal Web Server) or with built-in HTTP server. To compile BUGDB for interaction with external server, define macro USE_EXTERNAL_HTTP_SERVER. Database can be accessed from any computer running some WWW browser. To build bugdb application in Unix you should specify www target to make utility.

To run this BUGDB with external WWW server you should first customize your WWW server. It should be able to access buglogin.htm file and run CGI script cgistub. Also user, under which CGI scripts will be executed, should have enough permissions to establish connection with GigaBASE application (by sockets). It is better to run GigaBASE application and GigaBASE CGI scripts under the same user. For example, I have changed the following variables in Apache configuration file:

httpd.conf:
        User konst
        Group users

access.conf:
        <Directory /usr/konst/gigabase>
        Options All
        AllowOverride All
        allow from all
        </Directory>

        DocumentRoot /usr/konst/gigabase

srm.conf:
        ScriptAlias /cgi-bin/ /usr/konst/gigabase/
It is also possible not to change configuration of WWW server, but place cgistub and bugdb programs in standard CGI script directory and change in the file buglogin.htm path to the cgistub program. After preparing configuration files you should start WWW server.

No configuration is needed when you are using built-in HTTP server. Just make sure that user has enough permission to access port number 80 (default port for HTTP server). If some HTTP server is already started at your computer, you should either stop it or specify another port for built-in HTTP server. In last case you also need to specify the same port in the settings of WWW browser to make it possible to establish connection with right HTTP server.

After starting bugdb application itself you can visit buglogin.htm page in WWW browser and start to work with BUGDB database. When database is initialized, "administrator" user is created in the database. First time you should login as administrator using empty password. Than you can create some other users/engineers and change the password. BUGDB doesn't use secure protocol of passing passwords and doesn't worry much about restricting access of users to the database. So if you are going to use BUGDB in real life, you should first think about protecting database from unauthorized access.

Running GOODS examples

You can try to play with some examples of GOODS application programs:
guess.cxx    - game "Guess an animal",
unidb.cxx    - "university" database
testblob.cxx - work with binary large object
tstbtree.cxx - program for measuring of GOODS performance
The program "guess.cxx" is the simplest database application using an optimistic approach for synchronization. This program creates a binary tree with information about animals (or anything else which you have entered). To run this program you should first start the database server by the following command:

> goodsrv guess
Then you can start any number of client sessions by running the program guess in separate windows. Optimistic approach in this application means: should - while you are answering program questions - another user have performed an update of the same branch of the tree, you have to start from the beginning.

The program "unidb.cxx" is a sample GOODS application working with a distributed database. The structure of this database is very simple. The university database contains a B_tree of students and a B_tree of professors. Each student object has a diploma component and is attached to one professor (his advisor). A professor owns an unordered set of students (group) attached to him. Students can be added and removed; or reattached from one professor to another. A professor can be added and removed, except he is someone's tutor. This application shows a simple menu, allowing the user to select an action and also uses the GOODS change notification mechanism to handle database modifications done by other clients (each time a student or professor is added or removed by some database client, the total number of students and professors in the university is updated and shown at the top of the menu).

Objects in this application are distributed between two storages: the student's storage (0) and the professor's storage (1). All student objects (and objects referenced from them, such as "string" and "set_member" objects) are placed in storage 0. All professor objects are placed in storage 1. To run this application, you need to start two servers:

terminal-1> goodsrv unidb 0
terminal-2> goodsrv unidb 1
Then you can start any number of clients by running the program "unidb" without any arguments at different windows.

Dealing with large multimedia objects is shown in the sample "testblob" application. Two classes from the GOODS class library are used in this application: the blob class and the hash_table class. Class blob provides incrementally loading big binary objects in parallel with its handling (for example you can unpack and transfer data from buffer to audio device while the next part of the object is loaded from the database). Also this application tests multitasking at client's site. The program "testblob" is very simple and has only few commands, allowing you to insert, extract and remove files in/from the database. Type the "help" command to learn more about the command syntax. To run this application you should activate the database server by issuing the command "goodsrv blob". Then start the application itself.

One of the most effective structures for storing spatial objects is the R-tree (proposed by Guttman). The GOODS library contains an "R_tree" class which implements Guttman's quadratic split algorithm. To test this class implementation, a very simple model of a spatial database is developed: the "tstrtree". All objects in this program are placed in R-trees and/or H-trees (a combination of B*-tree and hash table) and can be accessed either by name or by coordinates. The configuration file for this program is "rtree.cfg". So issue the command "goodsrv rtree" to start the server; and then the command "tstrtree" to run the test program.

Measuring GOODS performance

Program "tstbtree" simulates parallel work of several clients with the database. To run this program, you should first start the "goodsrv" server with the following parameters:
> goodsrv btree 0
and then run some amount of client applications using the "spawn" utility, for example:
> spawn 32 8 tstbtree
This command will 32 times invoke the "tstbtree" program, and at most 8 instances of this program will be executed simultaneously. Each instance of the "tstbtree" program inserts 100 records of randomly distributed size in the range 6..1030 bytes in one of four B-trees. Then 10 times it repeats a loop to search each of these records in the B-tree. Finally it removes all created records from the B-tree. After the end of the test there should be no records in the database, so it is possible to run the test once again. Because not the performance of the B-tree implementation itself but the performance of the GOODS server is measured, it was decided to reduce the size of the B-tree page to 4 entries to increase the number of objects participated in the transactions.

It is interesting to investigate the dependence between the number of programs executed in parallel and the total system performance. In theory the best result should be obtained when there are exactly 4 concurrent applications (mostly they all will work with different B-trees and no synchronization between them is necessary). If there are more than 4 applications running in parallel then a lot of notification messages used to synchronize the caches of these applications will reduce the total system performance. The following table contains results (elapsed time in seconds of test execution) for some systems:

GOODS transaction performance test
Parallel processes AlphaServer 2100 2x250 DigitalUnix 4.0 portable AlphaServer 2100 2x250 DigitalUnix 4.0 pthreads PowerPC 120 MkLinux portable PPro-200 Linux portable PPro-200 WinNT 4.0 SPARCstation-20 2x50 Solaris 2.5 pthreads PPro-233 FreeBSD 3.0 portable UltraSparc 2x300 pthreads
1 239 227 121 73 57 349 117 130
2 226 187 124 95 47 339 116 119
4 221 112 130 66 30 178 65 70
8 233 114 156 67 37 188 68 71
16 240 124 247 68 44 209 68 74

Graphic representation of these results

The time of this test execution mostly depends on the time needed for synchronous writes to the transaction log. Improving the system performance in case of executing client requests in parallel is mostly obtained by merging synchronous write requests. The following table shows the average number of merged synchronous writes for different systems and number of clients running in parallel:

Parallel transaction log writes
Parallel processes AlphaServer 2100 2x250 DigitalUnix pthreads PPro-200 WinNT 4.0
2 1.009 1.854
2 1.845 2.948
8 1.805 1.953
16 1.758 1.767

Graphic representation of these results

It is also interesting to measure the database performance when the transaction log synchronous writes option is switched off. In this case mainly the efficiency of the implementation of the GOODS server components and their interactions is measured.

GOODS transaction performance test with asynchronous writes
Parallel processes AlphaServer 2100 2x250 DigitalUnix 4.0 portable AlphaServer 2100 2x250 DigitalUnix 4.0 pthreads PPro-200 Linux portable PPro-200 WinNT 4.0 SPARCstation-20 2x50 Solaris 2.5 pthreads PPro-233 FreeBSD 3.0 portable UltraSparc 2x300 pthreads
1 28 47 19 22 102 29 40
2 23 30 21 13 87 26 27
4 21 30 17 17 90 21 27
8 24 41 21 21 137 21 31
16 39 56 24 33 174 24 37

Graphic representation of these results

There are also two programs (not working with the database) which can be used for testing and measuring the performance of socket and multitasking libraries: "testsock.cxx" and "testtask.cxx".

The socket test is performed only with local clients, i.e. server and client are at the same computer. This test consists of two programs (client and server) which interact with each other in the same way as in normal GOODS applications. The client sends 1000000 requests to the server, waiting for a server response for each request. Each client request consists of two parts: header and body. So sending a request to the server requires two socket write operations. This program mostly measures the efficiency of the socket library implementations at concrete systems (UNIX_DOMAIN sockets in Unix). But in case of Windows-NT/95, the performance of the GOODS local sockets implementation is tested. The table below represents results for different systems (elapsed time in seconds of test execution):
Performance of socket library
AlphaServer 2100 2x250 DigitalUnix 4.0 PPro-200 WinNT 4.0 WinSockets PPro-200 WinNT 4.0 GOODS local sockets PPro-200 Linux SPARCstation-20 2x50 Solaris 2.5 PPro-233 FreeBSD 3.0 portable
374 275 25 109 541 86

The program testtask measures the performance of the multitasking library. This test simply starts a number of tasks (threads). Each of them performs the following loop: wait for signal, enter critical section and wake up (i.e. send signal to) another task. The number of loops, the total number of tasks and the number of concurrently running tasks (number of tasks initially signaled) are parameters which can be specified from the command line. Default values for these parameters (these values were used to obtain the results below) are: loops - 50000, tasks - 20, activations - 4. The following results (elapsed time in seconds of test execution) were measured at different platforms:

Performance of GOODS multitasking library
AlphaServer 2100 2x250 DigitalUnix 4.0 portable AlphaServer 2100 2x250 DigitalUnix 4.0 pthreads PPro-200 WinNT 4.0 PPro-200 Linux portable SPARCstation-20 2x50 Solaris 2.5 pthreads PPro-233 FreeBSD 3.0 portable
3 29 11 2 65 2

Installation of GOODS

Compilation of GOODS sources

GOODS is now running under Windows-NT/95 and various Unix dialects. The system specific code is encapsulated within few files and is accessed by abstract system independent interfaces: "task.h", "sockio.h" and "file.h". There are several system specific implementations for these interfaces. To achieve maximal performance, advanced features of modern operating systems are used, such as memory mapped files, gathered io (writev), threads. That can be a source of problems when porting GOODS to some old Unix dialects.

To build GOODS you should execute the config script in the source directory. You can specify the name of your system or let the configuration script try to guess the target system itself. The configuration script only copies one system specific makefile version to the file "makefile". There is a common makefile for all Unix systems "makefile.uni" containing targets and rules. All system specific makefiles only define some parameters (such as CC, CFLAGS...) and include the "makefile.uni". Apart from the name of the C++ compiler and compiler flags, systems mainly differ in which type of multitasking library is used by GOODS (portable, implemented by setjmp/longjmp; or based on Posix pthreads).

It is not necessary to run configuration scripts at Windows. I am using Microsoft Visual C++ 5.0 for compiling GOODS. The makefile for Windows has the name "windows.mak". Compiler dependent definitions are collected in "makefile.inc" file and rules are placed in "makefile.win" files. There is a special MAKE.BAT file which invokes NMAKE and specifies the name of the makefile.

GOODS itself is not using C++ exception mechanism. If you want to use exceptions in your application you should add -DGOODS_SUPPORT_EXCEPTIONS option to the DEFS macro in makefile and compile GOODS with this option. Only this case GOODS will be able correctly perform cleanup when exception caused by database error or thrown by user function is generated. If you want to abort transaction instead of committing it when thrown exception is not caught within any method of persistent capable object, then you should derive all application exception classes from dbException class. In this case standard metaobjects will abort the current transaction.

Marc Seter has incorporated support of DLLs into GOODS. So, for anyone who wants to link the client and/or server library into an MFC-based application, do the following:

  1. Compile clntmfc.dll, clntmfc.lib, clntmfcd.dll and clntmfcd.lib using the MSVC 5.0 makefile in goods/clntmfc. This makefile should work fine with MSVC 6.0 as well.
        C:\GOODS\CLNTMFC> nmake
        C:\GOODS\CLNTMFC> nmake BUILD=RELEASE
        
  2. Compile srvrmfc.dll, srvrmfc.lib, srvrmfcd.dll and srvrmfcd.lib using the MSVC 5.0 makefile in goods/srvrmfc. This makefile should work fine with MSVC 6.0 as well.
        C:\GOODS\SRVRMFC> nmake
        C:\GOODS\SRVRMFC> nmake BUILD=RELEASE
        
  3. Link your DLL or EXE against clntmfc.lib (the DLL export library) for your Release configuration.
  4. Link your DLL or EXE against clntmfcd.lib (the DLL export library) for your Debug configuration.
  5. Link your server-side DLL or EXE against srvrmfc.lib (the DLL export library) for your server's Release configuration.
  6. Link your server-side DLL or EXE against srvrmfcd.lib (the DLL export library) for your server's Debug configuration.
  7. Define __GOODS_MFCDLL in your DLL or EXE (both Debug and Release configurations).
  8. Be sure to include the .dll files in the same directory as your project's .EXE.
The MSVC 5.0 .dsp and .dsw files are provided for both client and server, but I highly recommend against using them if you wish to run your programs in debug mode. MSVC's IDE builds both debug and release versions of the libraries using the same name (i.e., clntmfc.dll) and it can be very wearing on the nerves keeping them separated. Ever wonder why MFC42.DLL and MFC42D.DLL exist?

After configuration just issue the make command to build GOODS client and server libraries and administative utilities. To build sample GOODS applications and tests execute make test command. To build JavaAPI execute make jar command. You can build all this targets by execution make all command. It is also possible to invoke make localy in src, examples or java directory. The command make install at Unix will copy header files, libraries and utilities to the location specified by $(INC_INSTALL_PATH), $(BIN_INSTALL_PATH), $(BIN_INSTALL_PATH) variables defined in system dependent makefile.

Description of GOODS sources

async.cxx
Asynchronous event manager for portable multitasking library. Task with priority 0 repeatedly calls Unix KBD>select function to find out channels ready to input/output.
async.h
Specification of the asynchronous event manager.
browser.cxx
Simple database browser. This application shows how metainformation can be extracted from database storage.
bugdb.cxx
Implementation of Bug Tracking Database - example of Web publishing of GOODS database.
bugdb.h
Definition of classes of Bug Tracking Database.
wwwapi.cxx
Implementation of HTTP server interface for the GOODS database applications.
wwwapi.h
Interface to the HTTP server for the GOODS database applications.
cgibrows.cxx
CGI version of database browser. Using this CGI program you can browse database objects from WWW browser if WWW server is installed at your computer.
cgistub.cxx
CGI script used as gateway between WWW server and GOODS application.
class.cxx
Classes to support class information for client application at runtime. Methods of this classes are responsible for building class and field descriptors, conversion of object instances when from storage format to application representation and visa versa.
class.h
Definition of classes providing reflection property to client application.
classmgr.cxx
Storage class manager implementation.
classmgr.h
Interface for storage class manager.
client.cxx
Implementation of application independent client interface with storage.
client.h
Definition of the database storage interface for clients.
config.h
Definition of some global types and constants used in GOODS.
console.cxx
Implementation of GOODS console interface. These methods perform input and output of data from/to terminal.
console.h
Definition of the GOODS console interface.
convert.h
Definition of functions for (un)packing atomic types from storage format (big endian, unaligned) to application representation.
ctask.cxx
Implementation of the portable non-preemptive multitasking library, using setjmp/longjmp function for context switching.
ctask.h
Definition of classes used by portable non-preemptive multitasking library.
database.cxx
Application dependent part of the client interface with database. This part of the interface is responsible for synchronizing the access to the database, packing/unpacking objects, handling server messages.
database.h
Definition of the application dependent part of client interface with database.
dbscls.cxx
Implementation of the application database classes: set, B-tree, hash table, blob, dynamic arrays and strings.
dbscls.h
Collection of application database classes.
file.h
Abstract file interface. This interface provides operating system independent methods for working with files.
goods.h
Main include file for GOODS client application.
goodsrv.cxx
Simple database storage server program. This storage server is powerful and flexible enough for been used in various applications.
guess.cxx
Sample GOODS application: game "Guess an animal".
memmgr.cxx
Implementation of the GOODS storage server memory manager with distributed and incremental garbage collector.
memmgr.h
Abstract interface for the server memory manager.
mmapfile.h
Interface for mapped on memory file.
mop.cxx
Implementation of basic metaobjects. Using this metaobject which covers most common database access patterns, you can derive your own metaobject, satisfying requirements of concrete applications.
mop.h
Metaobject protocol definition. This protocol is used in GOODS for controlling object access synchronization aspects, transactions, and object cache management.
multfile.cxx
Implementation of file consisting of several physical segment (operating system files). Such multifile makes it possible to overcome operating system limitation for maximal file size.
multfile.h
Definition of the multifile class.
object.cxx
Implementation of the object class: the base class for all GOODS objects. The object index, which is used for indirect object access, is implemented in this file too.
object.h
Definition of the object class - top class in object hierarchy.
objmgr.cxx
Implementation of the GOODS storage server object access manager. This manager is responsible for handling object locks and notification of clients about object instance deterioration.
objmgr.h
Definition of the storage server object access manager interface.
osfile.h
Definition of the file class corresponding to operating system file. There are several implementation for this class for different platforms.
poolmgr.cxx
Implementation of the GOODS storage server page pool manager,which is responsible for efficient work with database files.
poolmgr.h
Definition of the storage server page pool manager interface.
protocol.cxx
Implementation of the client-server protocol methods.
protocol.h
Definition of the protocol used for client-server and server-server communication.
ptask.cxx
Implementation of the multitasking library using Posix threads.
ptask.h
Definition of classes used in multitasking library for Posix threads.
refs.h
Definition of smart pointers for GOODS C++ interface.
rtree.h
Definition of the R-tree class (effective search structure for spatial objects)
rtree.cxx
Implementation of the R-tree class
server.cxx
Database server methods implementation. This server is responsible for the coordination of work of all storage managers and interaction with clients.
server.h
Definition of the database storage server class.
sockio.h
Abstract socket interface. A socket is a reliable bidirectional connection used for implementation of client-server and server-server communications.
spawn.cxx
Auxiliary utility for spawning sever parallel applications. This utility can be used for testing GOODS performance.
stdinc.h
List of include files common for all GOODS modules.
stdtp.h
Definition of standard types and including standard system headers.
storage.h
Definition of the application independent client interface to the GOODS storage.
support.h
Set of support classes and functions for GOODS modules: dynamic arrays, buffers, hash functions...
task.h
System independent multitasking interface. This header file contains definitions of following classes: task, mutex, semaphore, event, eventex, semaphorex.
template.cxx
Template for the GOODS client application.
testblob.cxx
Test program for binary large objects. This program can be used as an example for creating your own multimedia objects.
testsock.cxx
Test program for testing socket performance.
testtask.cxx
Test program for GOODS multitasking libraries.
transmgr.cxx
Implementation of the GOODS storage server transaction manager using /maintaining a log file for providing transaction recoverability. This manager handles local and global (distributed) transaction.
transmgr.h
Definition of the interface for storage server transaction manager.
tstbtree.cxx
Test program for the B-tree implementation in GOODS. This program can also be used for measuring GOODS performance.
tstrtree.cxx
Test program for the R-tree implementation in GOODS (very simple spatial database).
unidb.cxx
Example of a GOODS application: "university database".
unifile.cxx
Implementation of the os_file class for Unix. This implementation supports concurrent access to files by several tasks, merging of parallel synchronous write requests, alignment of synchronous writes to operating system file block boundary. The two last capabilities greatly increase performance of the server by commiting transactions more efficiently.
unisock.cxx
Implementation of the socket class for Unix.
unisock.h
Definition of the class representing Unix sockets.
winfile.cxx
Implementation the of os_file class for Windows. Optimizations include merging of synchronous write into single request to operating system.
w32sock.cxx
Implementation of the socket class for Windows. This class uses the WinSockets library for remote connections and provides own efficient implementations for local (within one computer) connections (UNIX domain sockets analog). This local sockets implementation uses cyclic buffers in shared memory and is more than 10 times faster than WinSockets.
w32sock.h
Definition of socket classes for Windows,
wtask.cxx
Implementation of multitasking library for Windows.
wtask.h
Definition of classes from multitasking library for Windows.

Distribution of GOODS

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OF THIS SOFTWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

I will provide e-mail support and help you with development of GOODS applications.


Look for new version at my homepage | E-Mail me about bugs and problems