GOODS is a fully distributed database. The database consists of a number of storages, each of which can be located at different nodes of a network. The storage is controlled by a storage server, which is responsible both for handling database requests relevant to this storage and for interaction with other servers to perform database wide operations. The client can work with an arbitrary number of databases at any time. Each persistent object is stored in a concrete storage and its location can't be changed. The location of an object within the database is mostly transparent for clients (the client should not worry about in which storage an object is situated), but it is possible for a client to attach an object to a concrete storage. Global database operations (operations affecting more than one storage in the database) such as global transaction commitment, garbage collection and object locking are handled automatically by the database system using various algorithms to synchronize the activities of each storage server. Online database backup and schema modification is possible without suspending client interaction with the database. A lazy object conversion approach is used for implementing the schema modification. Different clients can simultaneously work with different versions of class definitions.
GOODS was designed to achieve maximum performance for handling object requests. Various algorithms for caching and prefetching objects and storage pages are used for this purpose. By splitting work into separate control threads, GOODS makes it possible to use the benefits of parallel execution - especially on multiprocessor architectures. A specially designed library provides a system independent interface for multitasking (classes and methods for creation and synchronization of tasks, including synchronization primitives such as: mutex, event, semaphore). This library is used to implement the client part of the database interface as well as the GOODS database server itself. Using a metaobject protocol for handling access to objects makes it possible to specify algorithms for object caching and synchronization of concurrent accesses, allowing a concrete application to reach the highest level of performance.
become
" operator, which changes the type of a concrete object.
In GOODS, the object descriptor contains information
about the object class, its size and its location in the storage file.
The object descriptor table is implemented using a mapped on memory file.
The allocation of storage memory in the current implementation is done
by using a bit memory map. Each bit of this map corresponds to a quantum
of storage memory. Currently this quantum is 32 bytes long. The storage
of the bit memory map is itself stored in a mapped on memory file.Garbage collection (GC) is done using a variation of the standard mark-and-sweep algorithm with some extensions for synchronization of the GC processes at different storages. To perform GC, one of the storage servers is chosen as a coordinator, which initiates the GC processes at each server and determines the global end of the mark phase at all servers.
GC can be initiated either when the size of the allocated storage exceeds some predefined threshold value or after some period spent by the system in an idle state. To initiate GC, each server sends a request to the coordinator. If the coordinator decides, that the GC can be started, it polls other servers in the database to find out whether they are ready to start GC. When the server receives such request from the coordinator and sends an acknowledgment to start GC, the server changes its state to "PRE-GC". If acknowledgments are received from all servers, the coordinator broadcasts to all servers a request to start the mark stage of GC. Otherwise (if a server is not available or a previous GC iteration still not finished at one of the servers) the coordinator broadcasts the "CANCEL" request to all servers, so that they can exit from the "PRE-GC" state.
Each storage server locally performs GC starting from the storage root objects. The set of roots consists of storage root objects and instances of objects loaded by the client at the moment of starting GC. Objects already scanned are considered to be marked with "black" color. Not yet marked objects which are referenced from "black" objects are considered to be "gray". All other objects are "white".
When the GC process finds a reference to an object from another storage, it sends a message to this server with this reference. To make this exchange of references more effective, GOODS buffers external references. Before a reference will be sent to a client, it is placed in the export buffer (if such reference is not already present in the buffer). When the buffer is full, it is sent to the destination server. The server maintains a separate export buffer for each other server.
When there are no more "gray" objects in the storage, the server sends a message to the coordinator containing a vector of numbers of messages with external references sent and received by this server from other servers. The coordinator maintains a matrix of all such vectors. To determinate the global end of the mark stage, the coordinator checks the following condition:
M[i,j].import = M[j,i].export, for all i, jwhere M[i,j].import is the number of external references received by server i from server j, and M[j,i].export is the number of messages sent by server j to server i.
If this condition is true, the mark phase is completed at all servers and the coordinator sends messages to all servers to initiate the sweep phase of GC. During the sweep phase all "white" objects are deallocated. The same approach is used to collect unused versions of object classes.
To prevent the deallocation of object instances which were just allocated and not yet committed by some transaction, all such instances are marked as "black" before the sweep phase. The GC process runs in the background without affecting the normal functioning of the database. If an object is changed during GC, the difference of the sets of the references from the old and the new version of the object is calculated. (To make this operation more effective, not really the set difference is calculated, but only references in near standing positions are compared instead. This makes it possible to handle the most common cases: references are not changed or are shifted due to insert or remove from array operation.) A list of references from all old versions of the object is kept until the end of GC. The PRE-GC state is necessary to synchronize the moment when all servers start to keep information about old versions of objects. So GC at each server will see all references between objects which were available at the moment of GC initiation. If some objects are changed by a transaction, GC will have no access to these objects until the transaction will be committed. Then all these objects are updated.
Since different clients can have different versions of objects, the storage server should remember all versions of objects used by active clients. This information can be removed when all clients update their instances of an object and the object is scanned by GC (if GC was active when the object was modified).
The sweep stage is performed incrementally, making it possible to allocate new objects in storage while the sweep process is active. A system crash can cause some leak of memory due to garbage in the memory allocation bitmap (for example space allocated for extended objects by transactions which were active at the moment of the crash). Usually this leak of memory is very small (if any) and it is more essential to recover the server as fast as possible. It is possible to completely reconstruct the bitmap after recovery: clear the bitmap and mark only the space used by objects present in the object index.
Allocation of object descriptors is performed using a list of free object descriptors. Since a system fault can leave this list in an inconsistent state, special checks are used during all operations on this list. If the list is corrupted, it will be completely rebuilt at the sweep stage of the next GC process (normally the GC process only appends deallocated descriptors to this list).
Usually objects are allocated successively in a file (unless the bitmap is fragmented), so the object's body can cross a page's boundary. Usually continuous allocation provides good performance, because objects which are created together will be with high probability accessed together. So continuous allocation of objects will reduce page thrashing. But for some classes of objects it is preferable to place each object on a separate page. The B-tree page is an example for such kind of object. Since the page size in GOODS is an application dependent parameter which can be changed in any moment, the only thing we can ask the memory manager to do with such objects is to align its position in the file on object size boundary (more precisely, on power of 2 which is greater than or equal to the object size). By such alignment we can guarantee that the object will never cross a page boundary unless the page size is smaller than the object size.
This protocol of transaction commitment has uncertainty periods: the protocol might not be able to complete if the coordinator fails. The participants must wait until the coordinator recovers before committing or aborting their local transactions. During this time, the local data (accessed by a transaction) is locked to ensure serializability. When the server is in an uncertain state it periodically sends a request to the coordinator to obtain the transaction status until a response will be received.
If at stage I the coordinator receives an "aborted" message from one of the servers or if the timeout of waiting for a reply is expired, the coordinator decides that the transaction is globally aborted, marks this transaction in the global transaction history log as aborted and sends abort messages to all servers participated in the transaction and to the client which has initiated this transaction.
During recovery from system fault, the server reads the transaction log and checks whether the transaction is local or global. If the transaction is global, the server asks the coordinator of this transaction for the status of this transaction. The coordinator reads the global transaction history log to check the status of the global transaction. If the transaction is marked as globally committed, the coordinator sends a "committed" reply to the server. The server performs recovery of his local part of the transaction. Otherwise, if the transaction is marked as "aborted" or if there is no entrance with a global transaction identifier in the global transaction log file, then the coordinator responds to the server with an "aborted" message and the server skips this transaction in his log. If the coordinator is not available at this moment, the server has to wait until the coordinator will be restarted.
If one of the participants fails after saving his local part of the transaction in the log but before sending a "ready-to-commit" message to the coordinator, the participant will ask the coordinator for the transaction status while recovering. If the coordinator still waits for acknowledgement of transaction commitment from servers participated in the transaction (timeout is not yet expired), it can notice that a request for the transaction refers to the active transaction. In this case the coordinator can treat a "get-transaction-status" request as an acknowledgement from this server to commit the transaction (the server asks for the transaction status only when it has successfully saved his local part of the transaction in the log and is ready now to recover it). So the coordinator doesn't send a reply for this request immediately but instead continues processing the transaction. A reply will be sent to the participants later when the transaction will be either committed or aborted.
When objects are restored from the transaction log after a crash, we should be careful with "zombie" objects - objects, which are already deallocated by the GC process. The space of these objects can be reused for newly created objects. When we restore such "zombie" object from the log it will override a new object allocated in the same memory space. That is not a problem - the new object is also present in the log and will be restored later. But now we have several object descriptors refering to the same memory location. Since there should be no references to an old object (which was once collected by GC), it will be deallocated during the next GC iteration and the space will be freed. But we also deallocate space of new objects which still can be accessible!
To avoid such dangerous behavior, we keep track of address ranges occupied by all restored objects during recovery. If the ranges of two objects overlap, then the older one is removed (space used by this object is deallocated and the descriptor is inserted into the list of free descriptors). We use a special kind of binary tree for effective finding of overlapped regions (log can contain a big number of objects and a naive algorithm with complexity N*N is not acceptable). Each level of this tree represents a range which is power of 2. The depth of this tree is limited by 64 (an offset in the database file has a size of 8 byte). So the complexity of checking the ranges of all restored objects is C*N, where C is some constant.
When storage is recovered after the crash, the file memory allocation bitmap is cleared and all used objects (addressed by object index) are marked in this bit map. If some of the bitmap bits (corresponding to the object space) are already marked and the object is not a one recovered from the transaction log, we have an inconsistency in the object index. This inconsistency can be caused by the garbage collector which freed an unreferenced object, reused its space for some newly allocated object, but did not flush the index page with this descriptor to the disk before the crash. Fortunately if one of two overlapped objects is really accessible from the storage root, then it should be found in the transaction log, which means it was already recovered. In this case we can just remove the descriptor of the second object. If both objects are not recovered from the transaction log, then both of them are inaccessible and we can safely remove one of them to reestablish the consistency of the object index.
When the size of the transaction log exceeds some limit, a checkpoint process is started by the transaction manager. All modified pages and file buffers are flushed to the disks, the log is truncated and the logging process starts from the beginning of the file. A new sequential number is written at the beginning of the log. This sequential number is incremented after each checkpoint and database close operation. This sequential number is included in each transaction record written to the log. Only a record which sequential number is equal to the sequential number of the log will be restored during recovery after a fault. When the checkpoint process has completed, the server sends messages to all possible coordinators of global transactions in which this server can be involved (servers with identifiers less than the identifier of this server) informing them about checkpoint completion. When receiving such message from a server i, the coordinator can remove from the global transaction log all entries relevant to the server with a sequential number less than the current sequential number of server i minus one. Subtracting one is necessary because transactions with these sequential numbers can be saved by backup and retrieved by restore procedures.
The transaction manager is also responsible for managing online backup procedures. Two types of backups are supported by the GOODS server: snapshot backup and permanent backup. Snapshot backup allows to produce snapshots of storage states. If a disk failure happens, then it is possible to restore a locally consistent state of storage for the time of the backup completion (if global transactions are used, then the global state of the database can not be consistent because a backup is local to the storage server: so snapshots at each server are not synchronized). A permanent backup process can be used to guarantee the global database recovery in any moment of time. It is assumed that log files can not be corrupted (they can be placed on some reliable media - for example RAID disk massive). When the backup process is active, checkpoints are delayed till the end of backup. To avoid delays in database functioning due to waiting for backup completion, the limiting log size should be set to a value, which can guarantee that backup can be completed before this limitation is reached. Backup can be started by triggering one of two conditions: a timeout signal is received or the size of the log becomes greater than some specified value. The backup process first saves the current contents of the storage file, the storage memory allocation bit map and the object descriptor table. Then it copies to the backup file all records from the transaction log until it reaches the last one (records can be added to the transaction log while backup is in progress: so synchronization is required). When the last record is reached, the backup process is complete. Now the checkpoint procedure is invoked to truncate the transaction log. After backup completion, the database operator has to switch backup media (tape) and make a schedule for the next backup: specify the timeout and/or log size for the next backup.
The difference between a permanent backup and a snapshot backup is that the snapshot backup has to save only that part of the transaction log file which was written at the moment when the backup completes copying the storage file and the memory manager tables. The backup should save only information about transactions completed before this moment to provide a consistent snapshot of the storage. The checkpoint procedure is not forced by the snapshot backup.
The operation of writing the transaction into the log should be performed
synchronously, to guarantee that after the completion of this operation
it will be possible to recover a transaction in case of system fault.
The total throughput of a database system can be significantly increased
if separate synchronous writes are combined into a single write operation.
The performance can further be increased if the size and position of the data
written are aligned to the operating system's disk block size. In this case
the operating system should not previously read the contents of a modified
block from the disk.
This algorithm is encapsulated into the "file
" class.
This abstract class provides operating system independent access to
files, guaranteeing correctness of concurrent access operations to the file.
Implementations of this class for different operating system use
system specific advanced calls (such as writev
for Unix
or overlapped IO for Windows-NT)
to achieve maximal performance. To provide merging of
concurrent synchronous write requests, the implementation of the class
file
uses two buffers in a cyclic way. While one buffer is writing to the disk
all other write requests are collected in the other buffer. When the write
operation for the first buffer is completed, the buffers change the role:
all write requests are now collected in the first buffer.
The size of the buffer is aligned to the size of the operating system's disk
block, so no overhead of reading the content of the modified block is present.
In case of using gathered IO functions (such as writev
),
it is possible to avoid copying the data written to the buffer; instead
a vector of segments can be written to disk.
It is possible in GOODS to switch off synchronous writes to the transaction logs (using normal buffered write requests) if the operating system can guarantee that all modified buffers will be flushed to disk before system halt. That is possible if UPS is used to prevent power faults and the operating system itself is considered to be enough reliable. The performance can be dramatically increased in this case.
BUSY
and a WAIT
flag and an event object on which tasks, accessing
this page, can sleep.
While reading a page from the file, the BUSY
flag is set, so
any other task trying to access this page will set the WAIT
flag
and sleep on the correspondent event object. After closing the read operation,
the WAIT
flag is checked.
If it is set, then the event object is signaled to wake up all waiting tasks.
While the page is being used by some tasks (they copy data to or from it),
the USED
counter in the page header is not zero. This counter
prevents the page replacing algorithm from throwing this page from the cache.The storage page size is an application dependent parameter and can be changed without affecting the storage data file. By default, the operating system page size is used as a value for this parameter.
Two kinds of locks are supported: exclusive and shared. Only one process can lock an object in exclusive mode, all other processes in shared mode. Lock requests can be either blocked or not-blocked. In the first case, the process requesting the lock will be blocked until the requested lock is granted. In case of a non-blocking request and if granting a lock is impossible, a negative answer is returned immediately. In GOODS the "honest policy" of granting locks is used: a lock requested first will be granted first. So even if a requested lock is compatible with current locks set on the object, this request can be blocked because there are already other blocked lock requests for this object. There is only one exception from this rule: a shared lock can be upgraded to an exclusive lock despite of the presence of other blocked requests.
Since different components of the storage server can access the same object simultaneously, some synchronization mechanism is needed. The object access manager provides methods for two types of object accesses: reading and writing, implementing the "multiple readers single writer" discipline.
The object access manager maintains for each object a list of object instances loaded by client processes. When an object is changed, this list is scanned and to all processes having an instance of this object (except the process which has modified this object) a notification message is sent. For correct garbage collection it is necessary to know about all objects and references from these objects present in client processes. Since the object access manager has this information, the garbage collector process requests information about all such references from the object access manager during the mark phase and before the sweep phase.
Since a client can have deteriorated a version of the object and can follow
references from this object instance, it is necessary to prevent the
garbage collector from deallocating this object, which is referenced only from
deteriorated instances (so no more references to this object are present in
the database). When an object is modified, the object access manager compares
the set of references in the old with the one in the new version of the object.
All references in the old version which are not present in the new version are
inserted into an "old-reference-list
", explored by the garbage
collector. The references are kept in this list until all clients will
update their instances of this object.
The GOODS server stores information about classes of all objects allocated in this storage. The class information includes the class name, the size of fixed and varying parts of class, the number of references in fixed and varying parts and information about each object field: name, type and offset within object. This information is used by the server only to calculate the number of references in the object while doing garbage collection and building the closure of objects claimed by the client. Storing complete information about the class definition is essential for the server to allow work with clients which have different database schemas (class definitions). GOODS allows the online modification of a database schema while the client works with the database. Moreover one client can use the old version of a class definition, while another a new one. Such lazy object conversion strategy is really necessary for a system with a lot of different client applications and a working time 7x24.
Before requesting an object from the server, the client looks at the class identifier. If the client has already loaded an object of such a class, then the mapping between the database class and the local application class is already set. Otherwise the client sends to the server a request to provide information about the class with such identifier. The server responds with the full class description. The client then looks for a local application class with the same name. If no such class is found, the application is abnormally terminated. Otherwise the client compares the signatures of the storage class and the application class. If the signatures are equal, the client establishes a new mapping between the storage class identifier and the application class.
If the signatures are not equal, then it is assumed that the client
has a new version of the class. Now a special class descriptor for converting
the object from the old representation to the new one is constructed. The conversion
procedure looks for the definition of all components in the old class description
and tries to find fields with the same name in the new class description.
If a field with the same name is present in both class definitions, the
conversion procedure checks whether a conversion between the old type of the
field and the new type is possible. The only restriction on type conversion
compatibility is that a reference type can be only assigned to a references type.
All other kinds of conversion (real->integer, integer->real,
fixed array of scalars->scalar, varying array->fixed array...
)
are possible (certainly some of these conversions can cause loss of data).
Then the client sends the new signature of the class to the server and receives the
new storage class identifier assigned to this class descriptor by the server.
When an object of such class is loaded from the server, it is converted to the
new class format and a new storage class identifier is assigned to this object.
If such object will be modified and sent back to the storage, it
will be saved in the new class format. So lazy approach is used to
perform object conversion when the class definition has been changed.
When a client is going to store an object in the storage (as a result of making a reference to this object from another persistent object), it checks whether this object has a valid storage class identifier. If none exists, the client sends to the server a description of the object class and waits for a response containing the storage class identifier assigned to this class. When the server receives such a request from the client, it looks through the definitions of all classes in the storage. If a class with the same signature is found, the server replies with its identifier. Otherwise a new class definition is added to the storage and a new assigned storage class identifier is returned to the client.
Since there can be a lot of object class modifications, a garbage collection procedure is also necessary to collect unused classes. During the mark stage of the garbage collector, classes of all "black" objects are also marked. At the sweep stage the class information manager is asked to remove all unmarked classes. If a class can be registered by an application, but the object instances of this class are not yet created at this moment, the deletion of this class definition is delayed to the moment when all client processes working with this class descriptor will be terminated. To implement this strategy of delayed class deallocation, all clients are assigned successive numbers as identifiers at the time of connection to the server. Each class descriptor stores the maximal client identifier which has accessed this class. When a client sends a request to provide a class identifier for a concrete storage class descriptor, the identifier of this client is compared with the client identifier stored in this class descriptor;if the first is bigger, it replaces the client identifier stored in this class descriptor. A class descriptor can be removed when there are no object instances of this class in the storage and there are no more clients with identifiers less or equal to the class identifier stored in this class descriptor.
Class descriptors are stored in the storage like other objects, but a special range of object identifiers is used for class descriptor objects. In GOODS a class storage identifier is stored in one word (two bytes) of the object descriptor, so the maximal number of classes in the storage is limited to 2^16 = 65535. The first 2^16 entries in the object descriptor table are reserved for class descriptors. Creating new class descriptors and updating existing ones is performed using standard transaction mechanisms. To improve the performance of the class descriptor lookup, the class information manager maintains an array of copies of all class descriptors in memory and uses a hash table for fast search of the specified class name.
There are several main approaches to determinate which set of objects should be sent to the client. One alternative is to ask the programmer to explicitly specify clusters of objects. If one object from the cluster is requested, then all objects from this cluster are transmitted to the client. The disadvantage of this approach is that it violates the transparency of manipulation with persistent objects. Another approach is to maintain statistics of requesting objects by clients. So if the server knows that after requesting object A the client with high probability will request object B, then object B can be sent to the client together with object A. The disadvantage of this approach is that it consumes a lot of server memory and CPU time. While this approach may provide very good results for some applications, it can be inefficient for other applications. The third approach is to try to construct some kind of object closure which includes all (or some of the) objects referenced from the requested object. The size of the closure should be limited to minimize the increase of network traffic due to transfer of useless data.
The GOODS server currently uses the third approach (certainly other approaches can easily be implemented and tested by redefinition of one method). When an object is requested by the client, the server includes in his reply all objects which satisfy the three following conditions:
To enable replication mode, you should specify transmgr.replication_node
parameter
in GOODS configuration file (goodsrv.cfg
). When GOODS server is started and
transmgr.replication_node
is specified, it tries to connect to the specified node.
False
at standby node.
Both standby and primary nodes store transaction records in their transaction logs.
To improve performance, it is good idea to set transmgr.sync_log_writes
to 0
disabling synchronous writes to the log. Since transaction is transferred to standby node and can be recovered
from standby node in case of primary node fault, synchronous write to primary node transaction log is not
needed. Since synchronous write is very expensive operation operation, disabling them can significantly improve
performance. Using transaction logs both at standby and primary node sides, allows to minimize
size of data which needs to be transferred from primary node to standby node during recovery.
Most of the transaction will be recovered from local transaction log, and only those transaction
which were not flushed to the disk before failure or which were committed by new primary node
after the crash, have to be sent from primary node to recovered standby node during synchronization
of state of standby node.
When you are running GOODS in replication mode, you should start two GOODS servers. The first one started becomes primary node. Second server (which should be started with some delay after start of th first server) will connect to primary node and becomes standby node. After crash of the primary server, you should restart it as soon as possible. If will connect to active server, becomes standby server and synchronize its state with primary node. If primary node has to be terminated and standby node is not available, then primary node doesn't perform checkout. When you will start the database next time, you should first launch GOODS server at this node (node which was primary before termination). It will perform recovery (even through it was normally terminated) and is able to restore state of standby node once it will be started and connected to this node.
When transmgr.replication_node
parameter is specified by no standby node is connected,
primary node doesn't perform checkpoints and truncate transaction log. Transaction log is needed to be able
to perform recovery of standby node once it will be connected to primary node. It means that size
of the transaction log can becomes very huge if there are a lot of transactions commits and
standby node is not available for a long time. So you need to schedule your disk resources in such way, that
avoid transaction log overflow in such situation.
When standby node detects primary node failure (connection with primary node is broken), it invokes
virtual method database::on_replication_master_crash()
. A programmer can override this method
in class derived from database
to be able to provide some application specific handling of primary
node failure. For example, it can change IP address of the standby node to be the same as was at primary node.
So now the clients need not know about primary and standby node addresses - they will just reconnect to the
same address and will work with new primary server.
Currently only replication of the single storage database is supported. Replication will not work for distributed transactions in the database consisting of two or more storages.
dbs_storage
". This class
contains a reference to the abstract class "dbs_application
"
which declares
methods handling database requests to applications (notification about object
modification, disconnect handler). The class "dbs_storage
"
declares methods
for fetching, locking and unlocking objects, for retrieving and storing
class descriptors, for manipulations with transactions,...
Implementations of this class are responsible for sending requests to
the server and receiving replies and notifications from the server.
This class is not responsible for synchronizing the access to the storage
and for allocation, unpacking and deallocation of object instances in
client applications. Instead the implementation of
"dbs_storage
" just places an object instance or
class descriptor loaded from the server
in a specially supplied buffer, whereas successive operations on this buffer are
performed by the application dependent part of the database interface.
The application dependent part of the database interface implements the
"dbs_application
" protocol; it contains methods accessing
"dbs_storage
" and providing the
synchronization of storage accesses as well as object packing and unpacking.
Also handling of storage notifications is performed by methods defined in
this part of the database interface.The database interface is organized in such way that it is possible for different applications written in different languages to work with the same database. In the current version of GOODS only the interface for C++ is provided, because C++ is the most popular object oriented language now, which also provides highly performance efficient executables. An interface to the Java language is planned in near future.
Different applications may have very different requirements to the database system. For example for a banking application a standard serializable transaction model may be the most relevant one. Applications in such a system should have an illusion of monopoly work with a database:a transaction should not see changes made by other transactions of other clients. It is the isolation model of database organization. But for an office document flow controlling system a cooperation model may be more preferable. In this model applications cooperate with each other and changes made by one application should be visible by other applications.
Since it is not possible to incorporate all possible strategies in a single database system, the generic (or universal) object oriented database should have facilities to extend its functionality and behavior depending on application requirements.
Modern programming systems have to deal with a lot of different aspects: user interface, memory management, multitasking, database access, security... A traditional approach of dividing an application into modules will not work if each module will be responsible for all of these aspects. If for example the logic of synchronization access to objects will be scattered about the application, it will be very difficult to understand such code, debug or modify it. To make the application more clear, simple and extensible, it will be a good idea to separate code responsible for different aspects of system behavior. These separated aspects can be combined together at compilation time (or at runtime) using a metaobject protocol approach. According to the GOODS interface with client applications, the following aspects of the system are implemented by metaobjects:
Let us look at these aspects more closer:
The default policy used in GOODS for intertask object access synchronization is the mutual exclusion model. Only one task can access the object at the same time. Each GOODS object has an associated monitor object which synchronizes the access to the object instance by different tasks. When a method of the object is invoked by some task, the monitor associated with this object is locked to prevent other tasks from using this object until this task returns from the invoked method. Nested calls to objects are also allowed. This is a simple and powerful strategy but it has some serious disadvantages, one of them is the possibility of a deadlock - the mutual blocking of several tasks.
This model of intertasking object access synchronization is similar to one used in Java (with all methods explicitly considered to be "synchronized"). Each GOODS object also has two methods "wait" and "notify". When a task, executing in a monitor needs to wait for another task to do something, it calls wait(). This causes the task to unlock the monitor and go to sleep. Since the monitor is unlocked, another thread can enter the monitor and supply the information the first task is waiting for. The second task signals the waiting task by calling notify(). Once the first task is signaled it wakes up and begins waiting for the monitor to become available. When the second task finishes its processing, it unlocks the monitor, allowing the first task to reacquire the monitor and to finish what it started.
Unfortunately, there is one serious problem with this signaling mechanism,
which can cause deadlocks. If a method of object O1
invokes
a method of object O2
, which in turn calls the wait()
method, then only the monitor of object O2
is unlocked, while
the monitor of object O1
remains locked. One could, which looks
tempting at first glance, modify the behavior of the metaobject so that calling
wait() unlocks all monitors a task has acquired. This would be a
mistake. It would imply that anytime you call a function that could in turn
potentially call wait(), you cannot guarantee that another task won't
change the state of the object.
Since monitor objects take some space, it is very space consuming a solution to have separate monitor objects for each application object. Instead of this a "turnstile" of monitor objects is used in GOODS. When an object should be locked and it has no attached monitor object, a new monitor object is taken from the turnstile (if there are no more free monitor objects, then the turnstile is extended). When an object is unlocked, the monitor object continues to be attached to this object, but it can be taken away and used for another object. So the total number of monitor objects doesn't exceed the maximal number of simultaneously accessed objects and there is a high probability that when an object is accessed it already has an attached monitor object.
Abstract Metaobject Basic Metaobject Optimistic Metaobject Repeatable Read Optimistic Metaobject Pessimistic Metaobject Lazy Pessimistic Metaobject Repeatable Read Pessimistic Metaobject
It is possible to achieve a higher level of concurrency between different
clients if the semantic of a concrete object class is explored.
For example, the PUT and GET methods of the "queue
" object
can be executed
concurrently without mutual exclusion as long as the queue is not empty.
This is possible because these methods, although both are mutators,
work on different ends of the queue. Using a specific metaobject protocol
considering the object's semantic can be advantageous.
The default implementation of transactions in GOODS implicitly opens a nested transaction each time a GOODS object is accessed; and commits all nested transactions when control returns from the last invoked method. So the programmer should not worry about the definition of points where to open or where to close a transaction (but it is possible to explicitly open or close a nested transaction and so to create a long-live transaction). When a transactions is aborted, all modifications of persistent objects are discarded. This implicit transaction schema is highly compatible with the idea of a transparent database interface. And it is more error safe.
It is possible to explicitly specify transaction boundaries by
metaobject::begin_transaction()
and
metaobject::end_transaction
virtual methods. The method
metaobject::begin_transaction()
increments by one counter of
nested transactions and metaobject::end_transaction
method
decrements the counter and if it becomes zero, commits the transaction.
To avoid such undesirable behavior, a special modification of the LRU algorithm is used in GOODS. The object cache is divided into two parts: a frequently used objects (FUO) part and an objects used only once (OUOO) part. Both parts are controlled by an ordinary LRU discipline using double linked lists. The object is excluded from the first list when it is accessed; and inserted at the head of the second list after the end of the access. If an object is taken from the OUOO list and it is not at the head of this list, the object is considered to be a frequently used one and reattached to the FUO list. (Since the object is present in the OUOO list but not at the head of this list, we can make the conclusion that the object was already accessed by some other place in the program and may be accessed once again). So each object scanned during the database search can not replace objects from the FUO part of the cache. But GOODS allows the programmer to define her/his own cache managing policy because the cache is also controlled by metaclass methods.
Currently the GOODS database interface consists of some kernel database requests (such as set lock, open transaction, load object...) and a number of basic metaobjects, implementing the most common models of object access organization. Deriving from these basic metaobjects, it is possible to create specific metaobjects to handle specific requirements of concrete applications. Usually most of the work can be done by methods of base metaclasses and only few things should be redefined or added.
char 1 byte character nat1 unsigned 1 byte integer int1 signed 1 byte integer nat2 unsigned 2 bytes integer int2 signed 2 bytes integer nat4 unsigned 4 bytes integer int4 signed 4 bytes integer nat8 unsigned 8 bytes integer int8 signed 8 bytes integer real4 4 bytes ANSII floating type real8 8 bytes ANSII floating typeIt is better to use these type aliases instead of native C types to provide portability of your application (for example type long can be 4 bytes on one system and 8 bytes on another). References to other GOODS objects are supported by means of "smart pointers" implemented in the template class "
ref
":
class A; ref<A> ra;It is possible to construct arrays and structures of the above specified atomic types:
char str[10]; struct coord { int x; int y; }; coord points[10];It is possible to specify classes with objects of varying size. Such class can have (only) one varying component, the size of which is determined at object creation time:
class person : public object { public: char name[1]; static tree_node* create(char* name) { int name_len = strlen(name); return new (name_len) person(name, name_len); } METACLASS_DECLARATIONS(person, object); protected: person(const char* name, size_t name_len) : object(self_class, name_len) { memcpy(this->name, name, name_len+1); } }; field_descriptor& person::describe_components() { return VARYING(name); }Such a class can have only one varying component which must be the last one. Classes with varying components are used in the GOODS C++ interface to implement arrays. They provide an efficient method for representing classes with constant (immutable) string identifiers (like person and name).
When the implementation of the C++ metaclass information does not exist, the programmer has to specify this information manually because the database needs to know the format of objects. To ease this work, a number of macros and functions are supported by the GOODS interface for C++.
GOODS supports persistency only for objects of persistent capable
classes. A class is persistent capable if it is derived from the
GOODS "object
" class and implements some methods and constructors
needed by the GOODS client library. Such a class should have:
self_class
"
containing the class descriptor of this class;
classof
" returning the
class descriptor determined by the static type
of the function argument;
describe_components
",
which returns information about all components of the class.
METACLASS_DECLARATIONS(CLASS, BASE_CLASS)The programmer merely has to implement a member function "
describe_components
",
which provides information about all class instance variables.
To make the work of describing the class variables more easy and error safe,
five special macros are provided:
NO_FIELDS the method should return this value if there are no variables in the class FIELD(x) describes an atomic or structural field ARRAY(x) describes a fixed array type MATRIX(x) describes a two dimensional fixed array type VARYING(x) describes a varying arrayIf there are structural components in the class, you should define a function "
describe_field
" to describe components of
this structure:
class B_page : public object { friend class B_tree; enum { n = 1024 }; // Minimal number of used items at not root B tree page int4 m; // Index of first used item struct item { ref<object> p; // for leaf page - pointer to 'set_member' object // for all other pages - pointer to child page skey_t key; field_descriptor& describe_components() { return FIELD(p), FIELD(key); } inline friend field_descriptor& describe_field(item& s) { return s.describe_components(); } } e[n*2]; ... public: METACLASS_DECLARATIONS(B_page, object); }; field_descriptor& B_page::describe_components() { return FIELD(m), ARRAY(e); } REGISTER(B_page, object, optimistic_scheme);The macro
REGISTER
can be used to define default implementations
for the methods "classof
" and "constructor
". The
macro can be used to create a class descriptor for the class:
REGISTER(CLASS, // name of the class BASE, // name of base class MOP // metaobject for this class );It is possible for a template class to be persistent capable, but the programmer has to explicitly register different template instantiations of the database. The common way is to use the typedef operator to create an alias name of the concrete template instantiation and then register the class with this alias name in the database by means of the REGISTER macro.
Each storage has a predefined root object. To change the type of this
abstract root object you should use the "become
" method.
To find out whether storage is already initialized, a special virtual method
"is_abstract_root
" is defined in the class
"object
". This method
returns true if the type of the root object has not been changed yet and
false otherwise.
The multitasking library requires some initialization before it can be used.
The first statement in the main()
function of the application should
be the invocation of the static method task::initialize
.
The single parameter of this method specifies which stack size should
be reserved for the main thread of the program.
Usually the sequence of steps to initialize storage may be:
class my_root : public object { public: ... METACLASS_DECLARATIONS(my_root, object); my_root() : object(self_class) { ... } void initialize() const { if (is_abstract_root()) { ref<my_root> root = this; modify(root)->become(new my_root); } } }; int main(int argc, char* argv[]) { task::initialize(task::huge_stack); database db; char* cfg_name = new char[strlen(argv[1])+5]; sprintf(cfg_name, "%s.cfg", argv[1]); if (db.open(cfg_name)) { ref<my_root> root; db.get_root(root); root->initialize(); ... // do something with database db.close(); } }You can find a template for a simple database application in the file "template.cxx". All accesses to persistent objects should be encapsulated inside object methods. It is bad programming practice which may cause a runtime error, to directly reference some object component:
class text { public: char str[1]; ... }; main () { ref<text> t; ... printf("text; %s\n", t->str); // !!! ERROR }The reason for such restriction is that the persistent object at any moment can be thrown away from memory by the cache replacement algorithm, when there is no active method for this object. Then it may happen that a smart pointer points to the wrong object. The correct implementation for the code above is:
class text { protected: char str[1]; public: void print(); ... }; void text::print() { printf("text; %s\n", str); } main () { ref<text> t; ... t->print(); }Avoid direct access to object components whenever it is possible: make all object instance variables protected and encapsulate the access to them within object methods. Encapsulation makes your application simpler and more flexible. With the GOODS C++ interface the performance is increased, because the access to self components within object methods requires no extra runtime overhead.
GOODS C++ API provides implicit memory deallocation (garbage collector) for persistent capable objects based on reference counters. It means that once number of references to the instance of persistent capable object becomes zero, it will be immediately deallocated. Smart pointer are responsible for maintaining reference counter for each object. Unfortunately garbage collector based on reference counters is not able to deallocate data structures with cyclic references (such as L2-list). It is not critical for persistent objects, because object cache replacement algorithm will in any case remove least recently used instances from the memory. But you should take care about transient data structures with cyclic references.
There are two dangerous issues related with usage constructors of persistent capable
objects. As far as at the moment of the constructor execution there are no references
(smart pointer) to the created instance, assigning this
pointer in the
constructor to some smart pointer can cause deallocation of the created object one this
variable will be changed. For example, the following constructor will
cause unexpected deallocation of object:
class X : public object{ X() : object(self_class) { refAnother problem is specific for the constructor used for creation of root object. Root object is initialized in GOODS C++ API by means ofx = this; x = NULL; <<<-- at this point this object will be removed! ... } };
become()
operator,
which just swaps references to two objects (predefined root object and new root objects).
It means that all references to the new root object set in constructor will point
to the other object (abstract root) after become. So do not set any references to
created object in the constructor of root object. Also be care with assignment of
this
pointer in constructors of all persistent capable classes, and better avoid
such assignments at all by performing all necessary initialization in some other method.
Starting from 2.37 version of GOODS, per-thread transactions are supported.
In prior versions of GOODS only all threads share the same transaction. The transaction is
committed only if there are no more active methods of persistent capable object in any thread.
The counter of nested transaction invocations was static variable of basic_metaobject
class, so all persistent capable objects (even if these objects belongs to the different database) share the single counter of nested transaction.
In 2.37 version of GOODS, special class cache_manager
was introduced to contain
static variables from basic_metaobject
class used for transaction and cache
management. By default the behavior is consistent with old
versions of GOODS - all threads share the same transaction. But it is now possible to assign
some particular instance of cache_manager
to a thread or group of threads.
database
class has now two new methods: attach()
and
detach()
.
Each database
class can have its own cache_manager
. But unless
attach()
method is executed by the thread, default cache manager is used.
The method attach
associate cache manager of the particular database
connection with the current thread. The method detach
destroy this association.
Invocation of detach
method is not ncessary and this method was added only for
csymetry and compatibility with Java API.
Per-thread transaction model is especially useful for servers, when one process handle
remote connections with multiple clients. The actions performed by each client should be
treated as separate transaction. To implement this model in GOODS version 2.37 and higher, it
is necessary to create separate database connection for each thread (create instance of
database
class and open the database). Then thread should be attached
to the database by database::attach()
method. Each thread will
have its own object cache, so there can be several instance of the object with the same OID in
the application. Standard GOODS synchronization mechanism based on locks set by metaobjects
is used to avoid conflict of accessing the same object by several concurrent threads.
There are the following drawbacks of the proposed model:
This model of per-thread transaction is experimental and may be changed in future. Example illustrating per-thred thransactions:
void task_proc run(void*) { database db; db.attach(); if (db.open("myserver.cfg")) { ... db.close(); } done.signal(); } int main() { int i; task::initialize(task::normal_stack); for (i = 0; i < nThreads; i++) { task::create(run); } for (i = 0; i < nThreads; i++) { done.wait(); } return 0; }For real servers, it is better to maintain pool of threads. So the server will not spawn new thread and open database connection each time it receives request from the client (and destroying thread and connection after requests has been processed). Instead of it threads for handling client request are taken from the pool and return to the pool after the end of request processing. Each thread has permanently opened database connection. Such scheme can significantly increase total server throughput because of eliminating overhead of spawning new thread and especially establishing new database connection.
Version 2.37 of GOODS provides two additional optimization methods:
fetching object clusters and bulk allocation mode. Both optimizations can be enabled at
database level by set_alloc_buffer_size(size_t size)
and
enable_clustering(boolean enabled)
methods.
When clustering is enabled, server will send to the client not only the requested object, but also objects directly referenced from the requested object. The total size of objects in cluster is limited by server parameter for maximal cluster size. By default the value of this parameter is 500 bytes.
The bulk allocation mode allows to reduce number of messages send between client and server and also improve reference locality. If size of allocation buffer is set to non zero value, then the server will reserve requested number of OPIDs for the client. So allocation of object (assigning OPID to the object) can be done immediately by the client without sending requests to the server. At the end of transaction or when the end of the allocation buffer is reached, one request is sent to the server to allocate all objects with reserved OPIDs. Allocation is performed as the single continuous segment, so all objects created within transaction will be placed near each other. When the client is detached from the server (normally or abnormally) cleanup is performed and all reserved OPIDs are reclaimed.
The GOODS C++ interface provides a library of some widely used container classes for efficient access to persistent data. The following subsections briefly describe these classes.
typedef array_templateThe dynamic array template provides methods for direct access to array components:ArrayOfShort; REGISTER(ArrayOfShort, object, pessimistic_repeatable_read_scheme);
T operator[](nat4 index) const; T getat(nat4 index) const; void putat(nat4 index, T elem);Methods for getting the number of components in the array, for copying and appending array components are also available. The dynamic array also implements a stack protocol by providing such methods as Push(T value), Pop(), Top(). The methods insert(int index, int count, T value) and remove(int index, int count) can be used to add or remove dynamic array components.
The class String is implemented as a subclass of the ArrayOfChar class and defines extra methods for string manipulation.
In the classical implementation of the B-tree, each B-tree page contains a set of pairs <key, page-pointer>. The nodes at the page are ordered by key, so a binary search can be used to locate an item with greater or equal key. In the B*-tree, pointers to members are stored only in leaf pages of the B-tree. All other pages contain pointers to child pages.
The B*-Tree in GOODS is implemented as a subclass of the set_owner class. Pointers in leaf pages of the B*-tree refer to objects of the set_member class, which contain references to the objects included in the B-Tree. Nodes of the B*-tree pages contain a short form of key , which can be calculated from the object key by a virtual method of the set_member class (using the nat8 type, it usually consists of just the first 8 bytes of the original key). Such structure allows objects to be included in several B_trees and also makes the search operation more effective, because only small set_member objects are accessed during the search (if there are several objects with the same value of the short key). The B_Tree class defines methods for inserting new objects into the tree, removing objects from the tree and searching objects by the key.
Interaction with Web server is based on three-tier model:
Web Server -> CGI stub -> GOODS application CGI call local socket connectionUsing GOODS built-in HTTP server provides maximum performance, because in this no communication and process creation overhead takes place. In both cases the same API for receiving and unpacking requests and constructing responses is used. So the same application can be used for interaction with external Web server as well as stand-alone HTTP server.
GOODS application is request-driven program, receiving data from
HTML forms and dynamically generating result HTML page. Classes
WWWapi
and WWWconnection
provide simple and
convenient interface for getting HTTP requests, constructing HTML page and
sending reply back to WWW browser. Abstract class WWWapi
has two implementations: CGIapi
and HTTPapi
,
first of which implements protocol of interaction with Web server by means of
CGI mechanism, and the second - protocol of direct serving HTTP requests.
Built-in HTTP server is able to handle two types of requests - transfer HTML file find in place relative to the current working directory in response to GET HTTP request and perform action specified by GET or POST requests with parameters. Built-in HTTP server provides persistent connections - server will not close connection with client immediately after sending response, instead of this connection will be kept during some specified interval of time. Also this built-in server supports concurrent requests processing by several threads of control. But starting of threads should be performed by client application.
Virtual method WWWapi::connect(WWWconnection& con)
accept clients connection (either from CGISTUB program of from WWW browser).
This method returns true
if connection is established.
In this case programmer should call
CGIapi::serve(WWWconnection& con)
to receive and handle client's
requests. This method return false
if and only if handler
of request returns false
. Even if request was not correctly
received or could not be handled, true
is returned by
serve
method. The connection is always closed after return from
serve
method. It is possible to start separate thread for
exceution of each server
method.
To construct responce to the request special overloaded >>
operators are provided in WWWconnection
class. First line of
response should specify type of response body, for example:
Content-type: text/html\r\n\r\nTwo CR-LF character after this line separate HTTP header from the body. Three encoding schemes can be used for constructing response body:
WWWconeection
class performs automatic switching between encodings. Initially TAG
encoding is always used. Then encodings are implicitly changed using the
following rules:
TAG -> HTML HTML -> TAG URL -> TAGIt certainly possible to explicitly specify encoding for the next output operation by means of special
<<
operator, which accepts
one of the following constants: TAG, HTML, URL
.
Information about HTML form slots values or request parameters can be obtained
using WWWconnection::get(char const* name, int n = 0)
method.
Optional second parameter is used only for getting value of selectors with
multiple selection allows option. If parameter with such name is not found,
NULL
is returned. There are some mandatory parameters
which always should be present in all forms handled by GOODS:
Parameter name | Parameter Description |
---|---|
socket | address of the server, used for constructing new links |
page | symbolic name of the page, used for request dispatching |
stub | name of CGI stub program always defined by API |
<number of storages = N> <storage identifier 0>: <host name>:<port0> ... <storage identifier N-1>: <host name>:<portN-1>The storage identifiers should be successive integer numbers,which are used as indices in the array of storages. You can find examples of this configuration file in: "unidb.cfg" and "guess.cfg". In a distributed environment, a configuration file can be accessed from the server computer using some network file system protocol (for example NFS) or can be replicated to client computers.
Parameters for GOODSRV can be specified in one of two files: "goodsrv.cfg" and "database,srv". The first one specifies parameters common for all servers, the second specifies parameters of the server of the specific database. If a parameter is defined in both configuration files, then the value of the parameter from "database,srv" is used. If a parameter is not specified in the configuration files, then default values will be used. The following table describes all available parameters:
Parameter | Type | Unit | Default value | Meaning | Set |
---|---|---|---|---|---|
memmgr.init_map_file_size | integer | Kb | 8192 | Initial size of memory map file. Increasing size reduces the number of memory map reallocations. | - |
memmgr.init_index_file_size | integer | Kb | 4096 | Initial size of index file. Increasing size reduces the number of index reallocations. | - |
memmgr.gc_init_timeout | integer | seconds | 60 | Timeout for initiation of GC process. A GC coordinator waits for replies from other servers for GC initiation request during the specified period of time. | + |
memmgr.gc_response_timeout | integer | seconds | 86400 | Timeout for acknowledgment from a GC coordinator to finish the mark stage and perform the sweep stage of GC. If no response is received from the GC coordinator within this period, GC will be aborted at the server. | + |
memmgr.gc_init_allocated | integer | Kb | 1024 | Size of allocated memory since last GC, after which the next garbage collection process will be initiated. | + |
memmgr.gc_init_idle_period | integer | seconds | 0 | If > 0: specifies the idle period interval, after which GC will be initiated, if the memory management server receives no request during this period of time. | + |
memmgr.gc_init_min_allocated | integer | Kb | 0 | Minimal size of allocated memory to start GC in idle state (see previous parameter). GC will be initiated only if the idle period timeout is expired and more than gc_init_min_allocated memory was allocated since last GC. | + |
memmgr.gc_grey_set_threshold | integer | references | 1024 | Maximal extension of GC grey references set: reaching this number of references, the optimization of the order of taking references from the grey set (to improve reference locality) is disabled and a breadth first order of object reference graph traversal is used. | + |
memmgr.max_data_file_size | integer | Kb | 0 | If > 0: limits the size of the storage data file. After reaching this value, GC is forced and all allocation requests are blocked until enough free space is collected. | + |
memmgr.max_objects | integer | objects | 0 | If > 0: limits the number of objects in the storage. After reaching this value, GC is forced and all allocation requests are blocked until some object will be collected by GC. | + |
memmgr.map_file_name | string | - | "*.map" | Name of file with memory allocation bitmap. | - |
memmgr.index_file_name | string | - | "*.idx" | Name of file with object index. | - |
transmgr.sync_log_writes | boolean | 0/1 | 1 | Enable/disable write-ahead transaction logging. | - |
transmgr.permanent_backup | boolean | 0/1 | 0 |
If == 0: a snapshot backup type is used. Backup is terminated after
saving a consistent state of the database and checkpoints will
be enabled. If == 1: a permanent backup type is used. Backup terminates and forces a checkpoint after saving all records from the transaction log. This type can be used to ensure, that storage can be restored when loosing the storage data file after a fault. | + |
transmgr.max_log_size | integer | Kb | 8192 | Size of transaction log: reaching this value starts a checkpoint. After checkpoint completion, writing to the log file continues from the beginning. | + |
transmgr.preallocated_log_size | integer | Kb | 0 | Forces the transaction manager to preallocate the log file and doesn't truncate it after checkpoint. In this case the file size should not be updated after each write operations and the transaction performance is increased about 2 times. | + |
transmgr.wait_timeout | integer | seconds | 600 | Timeout for committing a global transaction. A coordinator waits for replies from other servers participated in a global transaction until expiration of this timeout. | + |
transmgr.retry_timeout | integer | seconds | 5 | Timeout for requesting the status of a global transaction from a coordinator. When a server performs recovery after crash, it needs to know the status of a global transactions, in which it has been participated, So it polls coordinators of global transactions, using this timeout as interval for resending request to coordinators. | + |
transmgr.checkpoint_period | integer | seconds | 0 | Time interval between two checkpoints. A checkpoint can be forced either by exceeding some limit value of transaction log size or after some specified period of time (if this timeout is non-zero). | + |
transmgr.dynamic_reclustering_limit | integer | bytes | 0 | Enable/disable dynamic reclustering of objects. If dynamic reclustering of objects is enabled, all objects modified in a transaction are sequentially written to a new place in the storage (with the assumption that objects modified in one transaction will be also accessed together in future). This parameter specifies a maximal size of objects for dynamic reclustering. If == 0: disables reclustering. | + |
transmgr.log_file_name | string | - | "*.log" | Name of local transaction log file. | - |
transmgr.history_file_name | string | - | "*.his" | Name of global transaction history file. | - |
objmgr.lock_timeout | integer | seconds | 600 | Deadlock detection timeout. If a lock can't be granted within specified period of time, the server considers that deadlock takes place and aborts one or more client processs to destroy deadlock. | + |
poolmgr.page_pool_size | integer | pages | 4096 | Size of page cache. Increasing this value will improve performance of disk IO operations. | - |
poolmgr.data_file_name | string | - | "*.odb" | Name of storage data file. | - |
server.admin_telnet_port | string | hostname:port | "" | Specifies the port upon which goodsrv will serve administrative interactive dialog sessions. If you specify this option, you can run goodsrv as a background task. Administrative functions, such as scheduling backup tasks, etc. can be performed by telnetting to the specified port. | - |
server.cluster_size | integer | bytes | 512 | Maximal size of object cluster to be sent to client. A server can perform optimization of sending objects to clients. Instead of sending one object for each request, it can send several objects (cluster of objects), including into the cluster objects referenced from the requested object (but only if total size of objects will not exceed cluster_size parameter). | + |
- The last column of this table marks parameters, the values of which can be changed by the set command in interactive dialogue mode.
- The symbol * used in the file names in this table stands for the concrete database name. This name is an argument passed to GOODSRV.
Other arguments of "goodsrv" are:
goodsrv <storage name> [<storage-id> [<trace-file-name> | "-"]]By default "goodsrv" starts in interactive dialogue mode, allowing the user to execute some administrative database operations as well as to see database usage parameters. Also if the GOODS server was compiled with the trace option, trace information will be shown at the terminal. If you specify a file name as the last argument to "goodsrv", messages will be saved in this file. Below is a list of all valid commands for the "goodsrv" monitor:
help [COMMAND] print information about command(s) open open database close close database exit terminate server log [LOG_FILE_NAME|"-"] set log file name show [CATEGORIES] show current server state monitor PERIOD [CATEGORIES] periodical monitoring of server state backup FILE_NAME [TIME [LOG_SIZE]] schedule online backup process stop backup stop backup process restore BACKUP_FILE_NAME restore database from the backup trace [TRACE_MESSAGE_CLASS] select trace messages to be printed set PARAMETER=INTEGER_VALUE set server parameter value rename OLD_CLASS_NAME NEW_CLASS_NAME rename class in the storage rename CLASS COMPONENT_PATH NEW_NAME rename component of the class
CATEGORIES is the set of "servers clients memory transaction classes". By default all categories are shown.PERIOD interval in seconds of printing statistic information
TIME is the interval of time in seconds and LOG_SIZE is the transaction log size. After reaching one of these parameters, a backup procedure will be started. Default values for these parameters are 0, which means that backup will be started immediately.
TRACE_MESSAGE_CLASS is a space separated list of message classes. Only messages belonging to one of the specified classes will be shown. The complete list of message classes can be obtained by the "help trace" command.
PARAMETER is a valid server parameter (see table above). Execute the "help set" command to obtain a list of all valid parameter names.
If you would like to run goodsrv as a background task, set the "server.admin_telnet_port" parameter in your configuration file. GOODSRV will not output any messages to stdout; instead, it will log all output and provide a telnet session on the specified hostname and port. By enabling this option, a database administrator can monitor and maintain a database remotely.
Here's an example setting for the server administration telnet port:
server.admin_telnet_port="localhost:8920"
Once the server is started with this parameter set, a database administrative session can be accessed by issuing the following command:
telnet localhost 8920
The browser can dump values of fields of all objects (including classes) stored in database storages. The console version of the database browser "browser.cxx" requires a single command line argument, which specifies the name of the database configuration file. It is not necessary to specify all servers in this configuration file. You only have to describe those storages, objects from which you want to inspect. Certainly the servers of these storages should have been started before you run the "browser" application. To refer an object, you should specify the storage and the object identifier. It is possible to see a list of available commands by typing "help".
The CGI version of the browser "cgibrows.cxx"
provides a much more user friendly interface.
To be able to use this browser you need some WWW server running at the
computer where the database storage is located. For example it can
be the Microsoft Peer WebServer, included in the NT-4.0 distribution, or the
Apache free WWW server. You have to do some preparation before you can
use the browser. You should edit a file "browser.htm"
(name can be changed) to specify the name of your host (localhost
is possible) and the path to the cgibrows
program
(cgibrows.exe
in Windows). This program can be placed either in
some default directory for CGI scripts used by the WWW server, or it is possible
to register another directory in the WWW server. It is preferable to
have the cgibrows
program in the same directory as the database.
Otherwise you should specify the full absolute path to the directory where the
database is located when opening the browser.
Be sure that the user, under which name the CGI script will be executed,
has enough permissions to access files and ports at the local computer
(GUESS
by default has no such permissions).
More preparations the WWW browser for GOODS does not need. Now you can
open in your favorite WWW browser the page
"browser.htm" and specify the database name and
storage number. Database name is the name of the configuration file without
the extension. If this file and the cgibrows
program are
located in different directories, you should specify the absolute path to this
file. By default, the storage with identifier 0 will be opened.
To start the browser
press the Open button. The browser will dump fields of the root object.
The browser outputs non-NULL-reference fields as the pair:
storageID:objectID
. To navigate through objects, click on the
reference field. Do not forget the Back, Forward
and Go buttons, which can help you in navigation. You can browse
the database from any computer which can access the WWW server.
USE_EXTERNAL_HTTP_SERVER
.
Database can be
accessed from any computer running some WWW browser. To build
bugdb
application in Unix you should specify www
target to make utility.
To run this BUGDB with external WWW server you should first customize your
WWW server.
It should be able to access buglogin.htm
file and run
CGI script cgistub
. Also user, under which CGI scripts will
be executed, should have enough permissions to establish connection with
GigaBASE application (by sockets). It is better to run GigaBASE application and
GigaBASE CGI scripts under the same user. For example, I have changed the
following variables in Apache configuration file:
httpd.conf: User konst Group users access.conf: <Directory /usr/konst/gigabase> Options All AllowOverride All allow from all </Directory> DocumentRoot /usr/konst/gigabase srm.conf: ScriptAlias /cgi-bin/ /usr/konst/gigabase/It is also possible not to change configuration of WWW server, but place
cgistub
and bugdb
programs in standard CGI
script directory and change in the file buglogin.htm
path to
the cgistub
program.
After preparing configuration files you should start WWW server.No configuration is needed when you are using built-in HTTP server. Just make sure that user has enough permission to access port number 80 (default port for HTTP server). If some HTTP server is already started at your computer, you should either stop it or specify another port for built-in HTTP server. In last case you also need to specify the same port in the settings of WWW browser to make it possible to establish connection with right HTTP server.
After starting bugdb
application itself you can visit
buglogin.htm
page in WWW browser and start to work with
BUGDB database. When database is initialized, "administrator" user is
created in the database. First time you should login as administrator using
empty password. Than you can create some other users/engineers and
change the password. BUGDB doesn't use secure protocol of passing passwords and
doesn't worry much about restricting access of users to the database.
So if you are going to use BUGDB in real life, you should first
think about protecting database from unauthorized access.
guess.cxx - game "Guess an animal", unidb.cxx - "university" database testblob.cxx - work with binary large object tstbtree.cxx - program for measuring of GOODS performanceThe program "guess.cxx" is the simplest database application using an optimistic approach for synchronization. This program creates a binary tree with information about animals (or anything else which you have entered). To run this program you should first start the database server by the following command:
> goodsrv guessThen you can start any number of client sessions by running the program guess in separate windows. Optimistic approach in this application means: should - while you are answering program questions - another user have performed an update of the same branch of the tree, you have to start from the beginning.
The program "unidb.cxx" is a sample GOODS application working with a distributed database. The structure of this database is very simple. The university database contains a B_tree of students and a B_tree of professors. Each student object has a diploma component and is attached to one professor (his advisor). A professor owns an unordered set of students (group) attached to him. Students can be added and removed; or reattached from one professor to another. A professor can be added and removed, except he is someone's tutor. This application shows a simple menu, allowing the user to select an action and also uses the GOODS change notification mechanism to handle database modifications done by other clients (each time a student or professor is added or removed by some database client, the total number of students and professors in the university is updated and shown at the top of the menu).
Objects in this application are distributed between two storages: the student's storage (0) and the professor's storage (1). All student objects (and objects referenced from them, such as "string" and "set_member" objects) are placed in storage 0. All professor objects are placed in storage 1. To run this application, you need to start two servers:
terminal-1> goodsrv unidb 0 terminal-2> goodsrv unidb 1Then you can start any number of clients by running the program "unidb" without any arguments at different windows.
Dealing with large multimedia objects is shown in the sample
"testblob" application. Two classes from the GOODS class library
are used in this application:
the blob
class and the hash_table
class.
Class blob
provides incrementally loading big binary objects
in parallel with its handling (for example you can unpack and transfer
data from buffer to audio device while the next part of the object is loaded
from the database). Also this application tests multitasking at client's site.
The program "testblob
" is very simple and has only few commands,
allowing you to insert, extract and remove files in/from the database.
Type the "help" command to learn more about the command syntax.
To run this application you should activate the database server by issuing
the command "goodsrv blob". Then start the application itself.
One of the most effective structures for storing spatial objects is the R-tree
(proposed by Guttman). The GOODS library contains an "R_tree" class
which implements Guttman's quadratic split algorithm. To test this class
implementation, a very simple model of a spatial database is developed:
the "tstrtree". All objects in this program are placed in
R-trees and/or H-trees (a combination of B*-tree and hash table)
and can be accessed either by name or by coordinates.
The configuration file for this program is "rtree.cfg".
So issue the command "goodsrv rtree" to start the server; and
then the command "tstrtree
" to run the test program.
tstbtree
" simulates parallel work of several clients
with the database.
To run this program, you should first start the "goodsrv
" server
with the following parameters:
> goodsrv btree 0and then run some amount of client applications using the "spawn" utility, for example:
> spawn 32 8 tstbtreeThis command will 32 times invoke the "
tstbtree
" program, and at
most 8 instances of this program will be executed simultaneously.
Each instance of the "tstbtree
" program inserts 100 records
of randomly distributed size in the range 6..1030 bytes in
one of four B-trees. Then 10 times it repeats
a loop to search each of these records in the B-tree. Finally
it removes all created records from the B-tree. After the end of
the test there should be no records in the database, so it is possible
to run the test once again. Because not the performance of the B-tree
implementation itself but the performance of the GOODS server is measured,
it was decided to reduce the size of the B-tree page to 4 entries
to increase the number of objects participated in the transactions.It is interesting to investigate the dependence between the number of programs executed in parallel and the total system performance. In theory the best result should be obtained when there are exactly 4 concurrent applications (mostly they all will work with different B-trees and no synchronization between them is necessary). If there are more than 4 applications running in parallel then a lot of notification messages used to synchronize the caches of these applications will reduce the total system performance. The following table contains results (elapsed time in seconds of test execution) for some systems:
Parallel processes | AlphaServer 2100 2x250 DigitalUnix 4.0 portable | AlphaServer 2100 2x250 DigitalUnix 4.0 pthreads | PowerPC 120 MkLinux portable | PPro-200 Linux portable | PPro-200 WinNT 4.0 | SPARCstation-20 2x50 Solaris 2.5 pthreads | PPro-233 FreeBSD 3.0 portable | UltraSparc 2x300 pthreads |
---|---|---|---|---|---|---|---|---|
1 | 239 | 227 | 121 | 73 | 57 | 349 | 117 | 130 |
2 | 226 | 187 | 124 | 95 | 47 | 339 | 116 | 119 |
4 | 221 | 112 | 130 | 66 | 30 | 178 | 65 | 70 |
8 | 233 | 114 | 156 | 67 | 37 | 188 | 68 | 71 |
16 | 240 | 124 | 247 | 68 | 44 | 209 | 68 | 74 |
Graphic representation of these results
The time of this test execution mostly depends on the time needed for synchronous writes to the transaction log. Improving the system performance in case of executing client requests in parallel is mostly obtained by merging synchronous write requests. The following table shows the average number of merged synchronous writes for different systems and number of clients running in parallel:
Parallel processes | AlphaServer 2100 2x250 DigitalUnix pthreads | PPro-200 WinNT 4.0 |
---|---|---|
2 | 1.009 | 1.854 | 2 | 1.845 | 2.948 |
8 | 1.805 | 1.953 |
16 | 1.758 | 1.767 |
Graphic representation of these results
It is also interesting to measure the database performance when the transaction log synchronous writes option is switched off. In this case mainly the efficiency of the implementation of the GOODS server components and their interactions is measured.
Parallel processes | AlphaServer 2100 2x250 DigitalUnix 4.0 portable | AlphaServer 2100 2x250 DigitalUnix 4.0 pthreads | PPro-200 Linux portable | PPro-200 WinNT 4.0 | SPARCstation-20 2x50 Solaris 2.5 pthreads | PPro-233 FreeBSD 3.0 portable | UltraSparc 2x300 pthreads |
---|---|---|---|---|---|---|---|
1 | 28 | 47 | 19 | 22 | 102 | 29 | 40 |
2 | 23 | 30 | 21 | 13 | 87 | 26 | 27 |
4 | 21 | 30 | 17 | 17 | 90 | 21 | 27 |
8 | 24 | 41 | 21 | 21 | 137 | 21 | 31 |
16 | 39 | 56 | 24 | 33 | 174 | 24 | 37 |
Graphic representation of these results
There are also two programs (not working with the database) which can be used for testing and measuring the performance of socket and multitasking libraries: "testsock.cxx" and "testtask.cxx".
The socket test is performed only with local clients, i.e. server and client are at the same computer. This test consists of two programs (client and server) which interact with each other in the same way as in normal GOODS applications. The client sends 1000000 requests to the server, waiting for a server response for each request. Each client request consists of two parts: header and body. So sending a request to the server requires two socket write operations. This program mostly measures the efficiency of the socket library implementations at concrete systems (UNIX_DOMAIN sockets in Unix). But in case of Windows-NT/95, the performance of the GOODS local sockets implementation is tested. The table below represents results for different systems (elapsed time in seconds of test execution):
AlphaServer 2100 2x250 DigitalUnix 4.0 | PPro-200 WinNT 4.0 WinSockets | PPro-200 WinNT 4.0 GOODS local sockets | PPro-200 Linux | SPARCstation-20 2x50 Solaris 2.5 | PPro-233 FreeBSD 3.0 portable |
---|---|---|---|---|---|
374 | 275 | 25 | 109 | 541 | 86 |
The program testtask measures the performance of the multitasking library. This test simply starts a number of tasks (threads). Each of them performs the following loop: wait for signal, enter critical section and wake up (i.e. send signal to) another task. The number of loops, the total number of tasks and the number of concurrently running tasks (number of tasks initially signaled) are parameters which can be specified from the command line. Default values for these parameters (these values were used to obtain the results below) are: loops - 50000, tasks - 20, activations - 4. The following results (elapsed time in seconds of test execution) were measured at different platforms:
AlphaServer 2100 2x250 DigitalUnix 4.0 portable | AlphaServer 2100 2x250 DigitalUnix 4.0 pthreads | PPro-200 WinNT 4.0 | PPro-200 Linux portable | SPARCstation-20 2x50 Solaris 2.5 pthreads | PPro-233 FreeBSD 3.0 portable |
---|---|---|---|---|---|
3 | 29 | 11 | 2 | 65 | 2 |
To build GOODS you should execute the config script in the source directory. You can specify the name of your system or let the configuration script try to guess the target system itself. The configuration script only copies one system specific makefile version to the file "makefile". There is a common makefile for all Unix systems "makefile.uni" containing targets and rules. All system specific makefiles only define some parameters (such as CC, CFLAGS...) and include the "makefile.uni". Apart from the name of the C++ compiler and compiler flags, systems mainly differ in which type of multitasking library is used by GOODS (portable, implemented by setjmp/longjmp; or based on Posix pthreads).
It is not necessary to run configuration scripts at Windows. I am using Microsoft Visual C++ 5.0 for compiling GOODS. The makefile for Windows has the name "windows.mak". Compiler dependent definitions are collected in "makefile.inc" file and rules are placed in "makefile.win" files. There is a special MAKE.BAT file which invokes NMAKE and specifies the name of the makefile.
GOODS itself is not using C++ exception mechanism. If you want to use
exceptions in your application you should add
-DGOODS_SUPPORT_EXCEPTIONS
option to the DEFS macro in makefile
and compile GOODS with this option.
Only this case GOODS will be able correctly perform cleanup when exception
caused by database error or thrown by user function is generated.
If you want to abort transaction instead of committing it when thrown exception
is not caught within any method of persistent capable object, then you
should derive all application exception classes from dbException class.
In this case standard metaobjects will abort the current transaction.
Marc Seter has incorporated support of DLLs into GOODS. So, for anyone who wants to link the client and/or server library into an MFC-based application, do the following:
C:\GOODS\CLNTMFC> nmake C:\GOODS\CLNTMFC> nmake BUILD=RELEASE
C:\GOODS\SRVRMFC> nmake C:\GOODS\SRVRMFC> nmake BUILD=RELEASE
__GOODS_MFCDLL
in your DLL or EXE (both Debug
and Release configurations).
After configuration just issue the make
command to build GOODS
client and server libraries and administative utilities. To build sample
GOODS applications and tests execute make test
command.
To build JavaAPI execute make jar
command. You can build
all this targets by execution make all
command.
It is also possible to invoke make localy in src
,
examples
or java
directory.
The command make install
at Unix will copy header files,
libraries and utilities to the location specified by
$(INC_INSTALL_PATH), $(BIN_INSTALL_PATH), $(BIN_INSTALL_PATH)
variables defined in system dependent makefile.
I will provide e-mail support and help you with development of GOODS applications.
Look for new version at my homepage | E-Mail me about bugs and problems