FastDB supports transactions, online backup and automatic recovery after system crash. The transaction commit protocol is based on a shadow root pages algorithm, performing atomic update of the database. Recovery can be done very fast, providing high availability for critical applications. Moreover, the elimination of transaction logs improves the total system performance and leads to a more effective usage of system resources.
FastDB is an application-oriented database. Database tables are constructed using information about application classes. FastDB supports automatic schema evolution, allowing you to make changes in only one place - in your application classes. FastDB provides a flexible and convenient interface for retrieving data from the database. A SQL-like query language is used to specify queries. Such post-relational capabilities as non-atomic fields, nested arrays, user-defined types and methods, and direct interobject references simplify the design of database applications and make them more efficient.
Although FastDB is optimized under the assumption that the database as a whole fits into the physical memory of the computer, it is also possible to use it with databases whose size exceeds the size of the physical memory in the system. In the latter case, standard operating system swapping mechanisms will work. But all FastDB search algorithms and structures are optimized under the assumption that all data resides in memory, so the efficiency for swapped-out data will not be very high.
The start from ... follow by ... construction performs a recursive traversal of records using references.
The following rules in BNF-like notation specify the grammar of the FastDB query language search predicates:
Example | Meaning
---|---
expression | non-terminal
not | terminal
\| | disjoint alternatives
(not) | optional part
{1..9} | repeated zero or more times
```
select-condition ::= ( expression ) ( traverse ) ( order )
expression ::= disjunction
disjunction ::= conjunction | conjunction or disjunction
conjunction ::= comparison | comparison and conjunction
comparison ::= operand = operand
             | operand != operand
             | operand <> operand
             | operand < operand
             | operand <= operand
             | operand > operand
             | operand >= operand
             | operand (not) like operand
             | operand (not) like operand escape string
             | operand (not) match operand
             | operand (not) in operand
             | operand (not) in expressions-list
             | operand (not) between operand and operand
             | operand is (not) null
operand ::= addition
addition ::= multiplication
           | addition + multiplication
           | addition || multiplication
           | addition - multiplication
multiplication ::= power | multiplication * power | multiplication / power
power ::= term | term ^ power
term ::= identifier | number | string
       | true | false | null | current | first | last
       | ( expression ) | not comparison | - term
       | term [ expression ] | identifier . term
       | function term | exists identifier : term
function ::= abs | length | lower | upper | integer | real | string | user-function
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= [length] field (asc | desc)
field ::= identifier { . identifier }
traverse ::= start from field ( follow by fields-list )
fields-list ::= field { , field }
user-function ::= identifier
```
Identifiers are case sensitive, begin with an a-z, A-Z, '_' or '$' character, contain only a-z, A-Z, 0-9, '_' or '$' characters, and do not duplicate a SQL reserved word.
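These rules can be illustrated with a small standalone checker (a sketch: the reserved-word list is abbreviated here for brevity, and the function name is illustrative, not part of FastDB):

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Sketch of the identifier rules: the first character must be a letter,
// '_' or '$'; the remaining characters may also be digits; and the
// identifier must not duplicate a reserved word (only a few shown here).
bool isValidIdentifier(std::string const& s) {
    static const std::set<std::string> reserved = {
        "and", "or", "not", "like", "between", "in", "is", "null"
    };
    if (s.empty()) return false;
    char c = s[0];
    if (!(std::isalpha((unsigned char)c) || c == '_' || c == '$')) return false;
    for (size_t i = 1; i < s.size(); i++) {
        c = s[i];
        if (!(std::isalnum((unsigned char)c) || c == '_' || c == '$')) return false;
    }
    return reserved.count(s) == 0; // case sensitive: "Like" passes, "like" does not
}
```

Note that because identifiers are case sensitive, only the exact lowercase form of a reserved word is rejected.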
abs, and, asc, between, by, current, desc, escape, exists, false, first, follow, from, in, integer, is, length, like, last, lower, match, not, null, or, real, start, string, true, upper
ANSI-standard comments may also be used. All characters after a double-hyphen up to the end of the line are ignored.
FastDB extends the ANSI standard SQL operations by supporting bit manipulation operations. The operators and/or can be applied not only to boolean operands but also to operands of integer type. The result of applying the and/or operator to integer operands is an integer value with bits set by the bit-AND/bit-OR operation. Bit operations can be used for an efficient implementation of small sets. Also, the raising-to-a-power operation (x^y) is supported by FastDB for integer and floating point types.
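The small-set idiom and the integer power operation can be illustrated in plain C++ (a standalone sketch; the function names are illustrative, not part of the FastDB API):

```cpp
#include <cassert>
#include <cstdint>

// Small sets as bit masks: bit i set <=> element i is present.
// and/or on integer operands behaves as bit-AND/bit-OR, exactly
// as described for the query language above.
int64_t setUnion(int64_t a, int64_t b)     { return a | b; }
int64_t setIntersect(int64_t a, int64_t b) { return a & b; }
bool    setContains(int64_t set, int elem) { return (set >> elem) & 1; }

// Integer raising to a power, as in the query operator x^y
// (non-negative exponent assumed), by binary exponentiation.
int64_t ipow(int64_t x, unsigned y) {
    int64_t r = 1;
    while (y != 0) {
        if (y & 1) r *= x;
        x *= x;
        y >>= 1;
    }
    return r;
}
```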
A structure component is accessed using dot notation, for example:

company.address.city

Structure fields can be indexed and used in an order by specification. Structures can contain other structures as their components; there are no limitations on the nesting level.
The programmer can define methods for structures, which can be used in queries with the same syntax as normal structure components. Such a method should have no arguments except a pointer to the object to which it belongs (the this pointer in C++), and should return an atomic value (of boolean, numeric, string or reference type). Also, the method should not change the object instance (it must be an immutable method). If the method returns a string, this string should be allocated by the new char[] operator, because it will be deleted after its value has been copied. So user-defined methods can be used for the creation of virtual components: components which are not stored in the database, but instead are calculated using the values of other components.
For example, the FastDB dbDateTime type contains only an integer timestamp component and such methods as dbDateTime::year(), dbDateTime::month() and so on. So it is possible to specify queries like "delivery.year = 1999" in an application where the delivery record field has the dbDateTime type. Methods are executed in the context of the application where they are defined, and are not available to other applications or to interactive SQL.
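A virtual component of this kind can be mimicked in plain C++ (a standalone sketch, not the actual dbDateTime implementation): only the integer timestamp is stored, while year() and month() are computed from it on demand.

```cpp
#include <cassert>
#include <ctime>

// Sketch of a "virtual component": only the timestamp is stored;
// year() and month() are calculated from it when queried, mirroring
// the way dbDateTime exposes methods over a single int4 stamp.
class DateTime {
    time_t stamp;
public:
    explicit DateTime(time_t t) : stamp(t) {}
    int year()  { return localtime(&stamp)->tm_year + 1900; }
    int month() { return localtime(&stamp)->tm_mon + 1; }
};
```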
The number of elements in an array can be obtained with the length() function. Array elements can be accessed using the [] operator. If an index expression is out of the array range, an exception will be raised.
The operator in can be used to check whether an array contains the value specified by the left operand. This operation can be used only for arrays of atomic type: with boolean, numeric, reference or string components.
Array fields can be modified through the update method, which creates a copy of the array and returns a non-constant reference to the copy.
Iteration through array elements can be performed with the exists operator. A variable specified after the exists keyword can be used as an index into arrays in the expression preceded by the exists quantifier. This index variable iterates through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition
exists i: (contract[i].company.location = 'US')

will select all details which are shipped by companies located in 'US', while the query

not exists i: (contract[i].company.location = 'US')

will select all details which are shipped by companies outside 'US'.
Nested exists clauses are allowed. Using nested exists quantifiers is equivalent to nested loops using the corresponding index variables. For example, the query

exists column: (exists row: (matrix[column][row] = 0))

will select all records containing 0 among the elements of a matrix field, which has the type array of array of integer. This construction is equivalent to the following two nested loops:
```cpp
bool result = false;
for (int column = 0; column < matrix.length(); column++) {
    for (int row = 0; row < matrix[column].length(); row++) {
        if (matrix[column][row] == 0) {
            result = true;
            break;
        }
    }
}
```

The order of using the indices is essential! The result of the following query execution
exists row: (exists column: (matrix[column][row] = 0))
will be completely different from the result of the previous query. Moreover, in the case of an empty matrix the latter query leads to an infinite loop and the program simply hangs.
Strings in FastDB are zero-terminated sequences of characters (char in C), and strings are compared byte-by-byte, ignoring locale settings.
The operator like can be used for matching a string against a pattern containing the special wildcard characters '%' and '_'. The character '_' matches any single character, while the character '%' matches zero or more characters.
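These wildcard rules can be sketched with a small recursive matcher (an illustration, not FastDB's actual implementation; escape handling is omitted here):

```cpp
#include <cassert>

// Sketch of like-pattern matching: '_' matches any single character,
// '%' matches any (possibly empty) run of characters.
bool likeMatch(char const* s, char const* p) {
    if (*p == '\0') return *s == '\0';
    if (*p == '%') {
        // '%' absorbs zero characters, or one more character of s
        return likeMatch(s, p + 1) || (*s != '\0' && likeMatch(s + 1, p));
    }
    if (*s != '\0' && (*p == '_' || *p == *s)) {
        return likeMatch(s + 1, p + 1);
    }
    return false;
}
```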
An extended form of the like operator together with the escape keyword can be used to treat the characters '%' and '_' in the pattern as normal characters if they are preceded by a special escape character, specified after the escape keyword.
If you rebuild FastDB with the USE_REGEX macro, you can use the match operator, which implements standard regular expressions (based on the GNU regex library). The second operand of this operator specifies the regular expression to be matched and should be a string literal.
It is possible to search for substrings within a string with the in operator. The expression ('blue' in color) will be true for all records whose color field contains 'blue'. If the length of the searched string is greater than some threshold value (currently 512), a Boyer-Moore substring search algorithm is used instead of a straightforward search implementation.
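The documentation does not show the exact implementation; the following is a sketch of the Horspool simplification of Boyer-Moore, which skips over most text positions using a per-byte shift table (the function name is illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch of the Boyer-Moore idea (Horspool variant): precompute, for each
// byte value, how far the search window may shift on a mismatch, so that
// long patterns skip most text positions without comparison.
long findSubstring(char const* text, char const* pat) {
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return 0;
    if (m > n) return -1;
    size_t shift[256];
    for (size_t i = 0; i < 256; i++) shift[i] = m;
    for (size_t i = 0; i + 1 < m; i++) shift[(unsigned char)pat[i]] = m - 1 - i;
    size_t pos = 0;
    while (pos + m <= n) {
        if (memcmp(text + pos, pat, m) == 0) return (long)pos;
        pos += shift[(unsigned char)text[pos + m - 1]]; // skip by last window byte
    }
    return -1;
}
```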
Strings can be concatenated with the + or || operators. The latter was added for compatibility with the ANSI SQL standard. Since FastDB doesn't support implicit conversion to string type in expressions, the semantics of the + operator can be redefined for strings.
References are dereferenced using the same dot notation; for example, the search condition

company.address.city = 'Chicago'

will access records referenced by the company component of a Contract record and extract the city component of the address field of the referenced record from the Supplier table.
References can be checked for null with the is null or is not null predicates. References can also be compared for equality with each other as well as with the special null keyword. When a null reference is dereferenced, an exception is raised by FastDB.
There is a special keyword current, which can be used during a table search to refer to the current record. Usually the current keyword is used for comparing the current record identifier with other references or for locating it within an array of references. For example, the following query will search the Contract table for all active contracts (assuming that the field canceledContracts has the dbArray< dbReference<Contract> > type):

current not in supplier.canceledContracts
FastDB provides special operators for the recursive traversal of records by references:

```
start from root-references ( follow by list-of-reference-fields )
```

The first part of this construction is used to specify the root objects. The nonterminal root-references should be a variable of reference or of array-of-reference type. The two special keywords first and last can be used here, referring to the first/last record in the table correspondingly.
If you want to check all records referenced by an array of references or by a single reference field for some condition, this construction can be used without the follow by part. If you specify the follow by part, FastDB will recursively traverse the table of records, starting from the root references and using the list-of-reference-fields for transitions between records. The list-of-reference-fields should consist of fields of reference or of array-of-reference type. The traversal is done in depth-first top-left-right order (first we visit the parent node and then the siblings in left-to-right order). The recursion terminates when a null reference is accessed or an already visited record is referenced. For example, the following query will search a tree of records with weight larger than 1 in TLR order:
"weight > 1 start from first follow by left, right"
For the following tree:
```
              A:1.1
             /     \
         B:2.0    C:1.5
         /   \    /   \
     D:1.3 E:1.8 F:1.2 G:0.8
```

the result of the query execution will be:
('A', 1.1), ('B', 2.0), ('D', 1.3), ('E', 1.8), ('C', 1.5), ('F', 1.2)
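The traversal rules above can be sketched in plain C++ (using ordinary pointers in place of database references; the Node type and function names are illustrative, not part of FastDB):

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy sketch of "start from ... follow by left, right": depth-first,
// parent before children, left before right, terminating at null
// pointers and at already visited records.
struct Node {
    char   name;
    double weight;
    Node*  left;
    Node*  right;
};

void traverse(Node* n, std::set<Node*>& visited, double threshold,
              std::vector<char>& result) {
    if (n == nullptr || !visited.insert(n).second) return; // null or already seen
    if (n->weight > threshold) result.push_back(n->name);  // apply the predicate
    traverse(n->left, visited, threshold, result);
    traverse(n->right, visited, threshold, result);
}
```

For the tree from the example above and a threshold of 1, the nodes are reported in the order A, B, D, E, C, F, matching the query result.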
As was already mentioned, FastDB always manipulates objects and does not support joins. Joins can be implemented using references. Consider the classical Supplier-Shipment-Detail example:
```cpp
struct Detail {
    char const* name;
    double      weight;

    TYPE_DESCRIPTOR((KEY(name, INDEXED), FIELD(weight)));
};

struct Supplier {
    char const* company;
    char const* address;

    TYPE_DESCRIPTOR((KEY(company, INDEXED), FIELD(address)));
};

struct Shipment {
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;
    int4                  price;
    int4                  quantity;
    dbDateTime            delivery;

    TYPE_DESCRIPTOR((KEY(detail, HASHED), KEY(supplier, HASHED),
                     FIELD(price), FIELD(quantity), FIELD(delivery)));
};
```

We want to get information about the delivery of some concrete details from some concrete suppliers. In a relational database this query would be written something like this:
```sql
select from Supplier,Shipment,Detail
    where Supplier.SID = Shipment.SID
      and Shipment.DID = Detail.DID
      and Supplier.company like ?
      and Supplier.address like ?
      and Detail.name like ?
```

In FastDB this request should be written as:
```cpp
dbQuery q;
q = "detail.name like",name,"and supplier.company like",company,
    "and supplier.address like",address,"order by price";
```

FastDB will first perform an index search in the table Detail for details matching the search condition. Then it performs another index search to locate shipment records referencing the selected details. Finally, a sequential search is used to check the rest of the select predicate.
Name | Argument type | Return type | Description |
---|---|---|---|
abs | integer | integer | absolute value of the argument |
abs | real | real | absolute value of the argument |
integer | real | integer | conversion of real to integer |
length | array | integer | number of elements in array |
lower | string | string | lowercase string |
real | integer | real | conversion of integer to real |
string | integer | string | conversion of integer to string |
string | real | string | conversion of real to string |
upper | string | string | uppercase string |
FastDB allows users to define their own functions and operators. A function should have at least one but no more than three parameters of string, integer, boolean, reference or user-defined (raw binary) type. It should return a value of integer, real, string or boolean type.
User functions should be registered with the USER_FUNC(f) macro, which creates a static object of the dbUserFunction class, binding the function pointer and the function name. There are two ways of implementing these functions in an application. The first can be used only for functions with one argument, which should be of int8, real8 or char* type; the function return type should be int8, real8, char* or bool.
If a function has more than one parameter, or if it can accept parameters of different types (polymorphism), then the parameters should be passed as references to the dbUserFunctionArgument structure. This structure contains a type field, whose value can be used in the function implementation to detect the type of the passed argument, and a union with the argument value. The following table contains the mapping between argument types and where the value should be taken from:
Argument type | Argument value | Argument value type |
---|---|---|
dbUserFunctionArgument::atInteger | u.intValue | int8 |
dbUserFunctionArgument::atBoolean | u.boolValue | bool |
dbUserFunctionArgument::atString | u.strValue | char const* |
dbUserFunctionArgument::atReal | u.realValue | real8 |
dbUserFunctionArgument::atReference | u.oidValue | oid_t |
dbUserFunctionArgument::atRawBinary | u.rawValue | void* |
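The dispatch on the type tag can be illustrated with a self-contained mimic of this tagged-union shape (the Argument and myAbs names are illustrative, not the real FastDB declarations; only two of the tags are shown):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Self-contained mimic of the dbUserFunctionArgument shape from the table
// above: a type tag plus a union holding the value. A polymorphic user
// function inspects the tag and reads the matching union member.
struct Argument {
    enum Type { atInteger, atReal } type;
    union {
        int64_t intValue;
        double  realValue;
    } u;
};

double myAbs(Argument const& arg) {
    switch (arg.type) {
      case Argument::atInteger: return std::fabs((double)arg.u.intValue);
      case Argument::atReal:    return std::fabs(arg.u.realValue);
    }
    return 0;
}
```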
For example, the following statements make it possible to use the sin function in SQL statements:

```cpp
#include <math.h>
...
USER_FUNC(sin);
```

Functions can be used only within the application where they are defined. Functions are not accessible from other applications or from interactive SQL. If a function returns a string type, the returned string should be copied by means of the operator new, because FastDB will call the destructor after copying the returned value. In FastDB, the function argument can (but does not have to) be enclosed in parentheses. So both of the following expressions are valid:
```
'$' + string(abs(x))
length string y
```
Functions with two arguments can also be used as operators. Consider the following example, in which a function contains, performing a case-insensitive search for a substring, is defined:

```cpp
bool contains(dbUserFunctionArgument& arg1, dbUserFunctionArgument& arg2) {
    assert(arg1.type == dbUserFunctionArgument::atString
        && arg2.type == dbUserFunctionArgument::atString);
    return stristr(arg1.u.strValue, arg2.u.strValue) != NULL;
}
USER_FUNC(contains);

dbQuery q1, q2;
q1 = "select * from TestTable where name contains 'xyz'";
q2 = "select * from TestTable where contains(name, 'xyz')";
```

In this example, the queries q1 and q2 are equivalent.
```cpp
dbQuery q;
dbCursor<Contract> contracts;
dbCursor<Supplier> suppliers;
int price, quantity;
q = "(price >=",price,"or quantity >=",quantity,") and delivery.year=1999";

// input price and quantity values
if (contracts.select(q) != 0) {
    do {
        printf("%s\n", suppliers.at(contracts->supplier)->company);
    } while (contracts.next());
}
```
Type | Description |
---|---|
bool | boolean type (true,false ) |
int1 | one byte signed integer (-128..127) |
int2 | two bytes signed integer (-32768..32767) |
int4 | four bytes signed integer (-2147483648..2147483647) |
int8 | eight bytes signed integer (-2**63..2**63-1) |
real4 | four bytes ANSI floating point type |
real8 | eight bytes ANSI double precision floating point type |
char const* | zero terminated string |
dbReference<T> | reference to class T |
dbArray<T> | dynamic array of elements of type T |
In addition to types specified in the table above, FastDB records can also contain nested structures of these components. FastDB doesn't support unsigned types to simplify the query language, to eliminate bugs caused by signed/unsigned comparison and to reduce the size of the database engine.
Unfortunately C++ provides no way to get metainformation about a class at runtime (RTTI is not supported by all compilers and also doesn't provide enough information). Therefore the programmer has to explicitly enumerate class fields to be included in the database table (it also makes mapping between classes and tables more flexible). FastDB provides a set of macros and classes to make such mapping as simple as possible.
Each C++ class or structure which will be used in the database should contain a special method describing its fields. The macro TYPE_DESCRIPTOR(field_list) constructs this method. The single argument of this macro is a list of class field descriptors, enclosed in parentheses. If you want to define some methods for the class and make them available for the database, then the macro CLASS_DESCRIPTOR(name, field_list) should be used instead of TYPE_DESCRIPTOR. The class name is needed to get references to member functions.
The following macros can be used for the construction of field descriptors:
The index type of a field is specified as a combination of the HASHED and INDEXED flags. When the HASHED flag is specified, FastDB will create a hash table for the table using this field as a key. When the INDEXED flag is specified, FastDB will create a T-tree (a special kind of index) for the table using this field as a key.
Such fields can also be used in an order by clause. Comparison is performed by means of a comparator function provided by the programmer. The comparator function receives three arguments: two pointers to the compared raw binary objects and the size of the binary object. The semantics of index_type is the same as for the KEY macro.
One shorthand is equivalent to the UDT macro with memcmp used as the comparator; another is a shorthand for the UDT macro for raw binary fields with the predefined comparator memcmp and without indices.
inverse_reference
is a field of the referenced table
containing the inverse reference(s) to the current table. Inverse references
are automatically updated by FastDB and are used for query optimization
(see Inverse references).
Although only atomic fields can be indexed, an index type can be specified for structures. The index will be created for components of the structure only if such type of index is specified in the index type mask of the structure. This allows the programmers to enable or disable indices for structure fields depending on the role of the structure in the record.
The following example illustrates the creation of a type descriptor in the header file:
```cpp
class dbDateTime {
    int4 stamp;
  public:
    int year() {
        return localtime((time_t*)&stamp)->tm_year + 1900;
    }
    ...
    CLASS_DESCRIPTOR(dbDateTime,
                     (KEY(stamp, INDEXED|HASHED),
                      METHOD(year), METHOD(month), METHOD(day),
                      METHOD(dayOfYear), METHOD(dayOfWeek),
                      METHOD(hour), METHOD(minute), METHOD(second)));
};

class Detail {
  public:
    char const* name;
    char const* material;
    char const* color;
    real4       weight;
    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(name, INDEXED|HASHED),
                     KEY(material, HASHED),
                     KEY(color, HASHED),
                     KEY(weight, INDEXED),
                     RELATION(contracts, detail)));
};

class Contract {
  public:
    dbDateTime delivery;
    int4       quantity;
    int8       price;
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;

    TYPE_DESCRIPTOR((KEY(delivery, HASHED|INDEXED),
                     KEY(quantity, INDEXED),
                     KEY(price, INDEXED),
                     RELATION(detail, contracts),
                     RELATION(supplier, contracts)));
};
```

Type descriptors should be defined for all classes used in the database. In addition to defining type descriptors, it is necessary to establish a mapping between C++ classes and database tables. The macro
REGISTER(name) will do this. Unlike the TYPE_DESCRIPTOR macro, the REGISTER macro should be used in the implementation file and not in the header file. It constructs a descriptor of the table associated with the class. If you are going to work with multiple databases from one application, it is possible to register a table in a concrete database by means of the REGISTER_IN(name, database) macro. The parameter database of this macro should be a pointer to the dbDatabase object. You can register tables in the database as follows:
```cpp
REGISTER(Detail);
REGISTER(Supplier);
REGISTER(Contract);
```

A table (and the corresponding class) can be used with only one database at each moment of time. When you open a database, FastDB imports into the database all classes defined in the application. If a class with the same name already exists in the database, its descriptor stored in the database is compared with the descriptor of this class in the application. If the class definitions differ, FastDB tries to convert records from the table to the new format. Any kind of conversion between numeric types (integer to real, real to integer, with extension or truncation) is allowed. Also, the addition of new fields can be handled easily. But the removal of fields is only possible for empty tables (to avoid accidental data destruction).
After loading all class descriptors, FastDB checks whether all indices specified in the application class descriptors are already present in the database, constructs new indices and removes indices that are no longer used. Reformatting a table and adding/removing indices is only possible while no more than one application accesses the database. So when the first application is attached to the database, it can perform the table conversion. All other applications can only add new classes to the database.
There is one special internal table, the Metatable, which contains information about the other tables in the database. C++ programmers need not access this table, because the format of database tables is specified by C++ classes. But in an interactive SQL program it may be necessary to examine this table to get information about record fields.
Starting from version 2.30, FastDB supports autoincrement fields (fields whose unique values are assigned automatically by the database). To be able to use them you should:

1. Rebuild FastDB with the -DAUTOINCREMENT_SUPPORT flag (add this flag to the DEFS variable in the FastDB makefile).
2. Optionally assign an initial value to the autoincrement counter, dbTableDescriptor::initialAutoincrementCount. It will be shared between all tables, so all tables will have the same initial value of the autoincrement counter.
3. Declare the autoincrement field with the AUTOINCREMENT flag:
```cpp
class Record {
    int4 rid;
    char const* name;
    ...
    TYPE_DESCRIPTOR((KEY(rid, AUTOINCREMENT|INDEXED), FIELD(name), ...));
};
```
```cpp
Record rec; // no rec.rid should be specified
rec.name = "John Smith";
insert(rec); // rec.rid is now assigned a unique value
int newRecordId = rec.rid; // and can be used to reference this record
```
FastDB uses the overloaded '=' and ',' C++ operators to construct query statements with parameters. Parameters can be specified directly in the places where they are used, eliminating any mapping between parameter placeholders and C variables. In the following sample query, pointers to the parameters price and quantity are stored in the query, so that the query can be executed several times with different parameter values. C++ overloaded functions make it possible to determine the type of a parameter automatically, requiring no extra information to be supplied by the programmer (thus reducing the possibility of a bug).
```cpp
dbQuery q;
int price, quantity;
q = "price >=",price,"or quantity >=",quantity;
```

Since the char* type can be used both for specifying a fraction of a query (such as "price >=") and for a parameter of string type, FastDB uses a special rule to resolve this ambiguity. This rule is based on the assumption that there is no reason for splitting a query text into two strings like ("price ", ">=") or for specifying more than one parameter sequentially ("color=", color, color). So FastDB assumes the first string to be a fraction of the query text and switches to operand mode after it. In operand mode, FastDB treats a char* argument as a query parameter and then switches back to query text mode, and so on.
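The mode-switching rule can be sketched as follows (QuerySketch is a hypothetical illustration, not the real dbQuery class; it handles only char* arguments and merely records which mode each argument was consumed in):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sketch of the text/operand alternation rule: the first char* argument is
// query text, the next char* is a string parameter, then the builder
// switches back to text mode, and so on.
class QuerySketch {
    bool operandMode = false;
public:
    std::vector<std::pair<std::string, bool>> parts; // (value, isParameter)
    QuerySketch& operator,(char const* s) {
        parts.push_back({s, operandMode});
        operandMode = !operandMode; // flip between text and operand mode
        return *this;
    }
};
```

A chain like `q, "color=", "blue", "and price>="` then records the first and third strings as query text and the second as a parameter.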
It is also possible to avoid this "syntactic sugar" and construct query elements explicitly with the dbQuery::append(dbQueryElement::ElementType type, void const* ptr) method. Before appending elements to the query, it is necessary to reset the query with the dbQuery::reset() method (operator= does this automatically). It is not possible to use C++ numeric constants as query parameters, because parameters are accessed by reference. But it is possible to use string constants, because strings are passed by value. There are two possible ways of specifying string parameters in a query: using a character buffer or a pointer to a pointer to a string:

```cpp
dbQuery q;
char* type;
char name[256];
q = "name=",name,"and type=",&type;

scanf("%s", name);
type = "A";
cursor.select(q);
...
scanf("%s", name);
type = "B";
cursor.select(q);
...
```
Query variables can neither be passed to a function as a parameter nor be assigned to another variable. When FastDB compiles a query, it saves the compiled tree in this object. The next time the query is used, no compilation is needed and the already compiled tree is reused, saving the time needed for query compilation.
FastDB provides two approaches for integrating user-defined types into databases. The first, the definition of class methods, was already mentioned. The other approach deals only with query construction. The programmer defines methods which do no actual calculations, but instead return an expression (in terms of predefined database types) which performs the necessary calculation. This is best described by an example. FastDB has no builtin datetime type. Instead, a normal C++ class dbDateTime can be used by the programmer. This class defines methods that allow datetime fields to be specified in ordered lists and two dates to be compared using the normal relational operators:
```cpp
class dbDateTime {
    int4 stamp;
  public:
    ...
    dbQueryExpression operator == (char const* field) {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp"), "=", stamp;
        return expr;
    }
    dbQueryExpression operator != (char const* field) {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp"), "<>", stamp;
        return expr;
    }
    dbQueryExpression operator < (char const* field) {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp"), ">", stamp;
        return expr;
    }
    dbQueryExpression operator <= (char const* field) {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp"), ">=", stamp;
        return expr;
    }
    dbQueryExpression operator > (char const* field) {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp"), "<", stamp;
        return expr;
    }
    dbQueryExpression operator >= (char const* field) {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp"), "<=", stamp;
        return expr;
    }
    friend dbQueryExpression between(char const* field, dbDateTime& from,
                                     dbDateTime& till)
    {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp"), "between", from.stamp,
               "and", till.stamp;
        return expr;
    }
    friend dbQueryExpression ascent(char const* field) {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp");
        return expr;
    }
    friend dbQueryExpression descent(char const* field) {
        dbQueryExpression expr;
        expr = dbComponent(field, "stamp"), "desc";
        return expr;
    }
};
```

All these methods receive as their parameter the name of a field in the record. This name is used to construct the full name of the record's component. This is done by the class dbComponent, whose constructor takes the name of the structure field and the name of the component within the structure, and returns a compound name separated by a '.' symbol.
The class dbQueryExpression is used to collect expression items. The expression is automatically enclosed in parentheses, eliminating conflicts with operator precedence. So, assuming a record containing a field delivery of dbDateTime type, it is possible to construct queries like these:
```cpp
dbDateTime from, till;
q1 = between("delivery", from, till),"order by",ascent("delivery");
q2 = till >= "delivery";
```

In addition to these methods, class-specific methods can be defined in the same way, for example an overlaps method for a region type. The benefit of this approach is that the database engine works with predefined types and is able to apply indices and other optimizations to process such a query. On the other hand, the encapsulation of the class implementation is preserved, so programmers do not have to rewrite all queries when the class representation is changed.

Variables of the following C++ types can be used as query parameters:
int1 | bool |
int2 | char const* |
int4 | char ** |
int8 | char const** |
real4 | dbReference<T> |
real8 | dbArray< dbReference<T> > |
Queries are executed by means of cursors. FastDB provides the template class dbCursor<T>, where T is the name of the C++ class associated with the database table. The type of the cursor should be specified in the constructor of the cursor. By default, a read-only cursor is created. To create a cursor for update, you should pass the parameter dbCursorForUpdate to the constructor.
A query is executed either by the cursor method select(dbQuery& q) or by the select() method without parameters, which can be used to iterate through all records in the table. Both methods return the number of selected records and set the current position to the first record (if available).
A cursor can be scrolled in forward or backward direction. The methods next(), prev(), first() and last() can be used to change the current position of the cursor. If the operation cannot be performed because no (more) records are available, these methods return NULL and the cursor position is not changed.
A cursor for class T contains an instance of class T, used for fetching the current record. That is why table classes should have a default constructor (constructor without parameters), which has no side effects. FastDB optimizes fetching records from the database, copying only data from fixed parts of the object. String bodies are not copied, instead of this the correspondent field points directly into the database. The same is true for arrays: their components have the same representation in the database as in the application (arrays of scalar types or arrays of nested structures of scalar components).
An application should not change elements of strings and arrays in the database directly. When an array method needs to update an array body, it creates an in-memory copy of the array and updates this copy. If the programmer wants to update a string field, he or she should assign a new value to the pointer, but not change the string directly in the database. It is recommended to use the char const* type instead of the char* type for string components, to enable the compiler to detect illegal usage of strings.
The cursor class provides the
get()
method for obtaining a pointer to
the current record (stored inside the cursor). Also the overloaded
'operator->
'
can be used to access components of the current record.
If a cursor is opened for update,
the current record can be changed and stored in the database
by the update()
method, or it can be removed.
If the current record is removed, the next record becomes the
current one; if there is no next record, then the previous record
(if it exists) becomes the current one. The method removeAll()
removes all records in the table,
whereas the method removeAllSelected()
removes only
the records selected by the cursor.
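A sketch of a typical update loop (assuming a hypothetical Person table with name and salary fields; these names are illustrative, not part of the FastDB API):

```cpp
#include "fastdb.h"

// Give every underpaid person a raise; Person and salary are illustrative.
void raiseSalaries(dbDatabase& db) {
    dbCursor<Person> cursor(dbCursorForUpdate);  // open the cursor for update
    dbQuery q;
    q = "salary < 1000";
    if (cursor.select(q) > 0) {
        do {
            cursor->salary += 100;
            cursor.update();                     // store the changed record
        } while (cursor.next() != NULL);
    }
    db.commit();
}
```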
When records are updated, the size of the database may increase.
Thus an extension of the database section in the virtual memory
is needed. As a result of such remapping, base addresses of the section can be
changed and all pointers to database fields kept by applications will become
invalid. FastDB automatically updates the current records in all opened
cursors when a database section is remapped. So, when a database is updated,
the programmer should access record fields only through the cursor's
operator->
; she/he should not store pointers to record fields in variables.
Memory used for the current selection can be released by the
reset()
method.
This method is automatically called by the select(),
dbDatabase::commit(), dbDatabase::rollback()
methods
and the cursor destructor, so in most cases there is no need to
call the reset()
method explicitly.
Cursors can also be used to access records by reference. The method
at(dbReference<T> const& ref)
sets the cursor to the record
pointed to by the reference. In this case, the selection consists of exactly
one record and the next(), prev()
methods will always return
NULL
. Since cursors and references in FastDB are strictly
typed, all necessary checking can be done statically by the compiler and
no dynamic type checking is needed. The only kind of checking,
which is done at runtime, is checking for null references.
The object identifier of the current record in the cursor can be obtained by
the currentId()
method.
It is possible to restrict the number of records returned by a select statement.
The cursor class has two methods,
setSelectionLimit(size_t lim)
and
unsetSelectionLimit()
,
which can be used to set/unset the limit
on the number of records returned by a query. In some situations,
a programmer wants to receive
only one record or only the first few records; the query execution
time and the amount of consumed memory can then be reduced by limiting the size of
the selection. Note that if you specify an order for the selected records,
a query restricted to
k records will not return the k records
with the smallest values of the key. Instead, arbitrary k
records will be taken and then sorted.
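As a sketch (the Person table and the query are illustrative):

```cpp
#include "fastdb.h"

// Fetch at most one matching record to reduce query time and memory usage.
dbReference<Person> findAnyManager() {
    dbCursor<Person> cursor;
    cursor.setSelectionLimit(1);      // stop the search after the first hit
    dbQuery q;
    q = "rank = 'manager'";
    dbReference<Person> ref;          // null by default
    if (cursor.select(q) > 0) {
        ref = cursor.currentId();     // object identifier of the current record
    }
    cursor.unsetSelectionLimit();     // restore unlimited selections
    return ref;
}
```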
So all operations with database data can be performed by means of cursors. The only exception is the insert operation, for which FastDB provides an overloaded insert function:

template<class T> dbReference<T> insert(T const& record);

This function will insert a record at the end of the table and return a reference to the created object. The order of insertion is strictly specified in FastDB, and applications can rely on this record order in the table. For applications that widely use references for navigation between objects, it is necessary to have some root object from which a traversal by references can start. A good candidate for such a root object is the first record in the table (it is also the oldest record in the table). This record can be accessed by executing the
select()
method without parameters; the current record in the cursor will then
be the first record in the table.
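A minimal sketch of inserting a record and then positioning a cursor at the table's first (root) record; the Person table and its fields are illustrative:

```cpp
#include "fastdb.h"

void addAndVisitRoot(dbDatabase& db) {
    Person p;
    p.name   = "John";
    p.salary = 1200;
    dbReference<Person> ref = insert(p);  // appended at the end of the table

    dbCursor<Person> cursor;
    if (cursor.select() > 0) {
        // the cursor is now positioned at the first (oldest) record of the table
    }
    db.commit();
}
```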
The C++ API of FastDB defines a special null
variable
of reference type.
It is possible to compare the null
variable with references
or assign it to a reference:

void update(dbReference<Contract> c) {
    if (c != null) {
        dbCursor<Contract> contract(dbCursorForUpdate);
        contract.at(c);
        contract->supplier = null;
    }
}

Query parameters are usually bound to C++ variables. In most cases this is a convenient and flexible mechanism. But in a multithreaded application, there is no guarantee that the same query will not be executed at the same moment of time by another thread with different values of the parameters. One solution is to use synchronization primitives (critical sections or mutexes) to prevent concurrent execution of the query; but this leads to performance degradation. FastDB is able to perform read requests in parallel, increasing total system throughput. The other solution is to use delayed parameter binding. This approach is illustrated by the following example:
dbQuery q;

struct QueryParams {
    int salary;
    int age;
    int rank;
};

void open() {
    QueryParams* params = (QueryParams*)NULL;
    q = "salary > ", params->salary, "and age < ", params->age, "and rank =", params->rank;
}

void find(int salary, int age, int rank) {
    QueryParams params;
    params.salary = salary;
    params.age = age;
    params.rank = rank;
    dbCursor<Person> cursor;
    if (cursor.select(q, &params) > 0) {
        do {
            cout << cursor->name << NL;
        } while (cursor.next());
    }
}

In this example, the function open
binds query parameters just to the offsets of
the fields in the structure. Later, in the find
function, the actual pointer to the structure
with the parameters is passed to the select
method. The function find
can be concurrently executed by several threads, and only one compiled version of the query
is used by all these threads. This mechanism is available since version 2.25.
dbDatabase
controls the application's interactions
with the database. It performs synchronization of concurrent accesses to the
database, transaction management, memory allocation, error handling, etc.
The constructor of a dbDatabase
object allows programmers to specify
some database parameters:

dbDatabase(dbAccessType type = dbAllAccess,
           size_t dbInitSize = dbDefaultInitDatabaseSize,
           size_t dbExtensionQuantum = dbDefaultExtensionQuantum,
           size_t dbInitIndexSize = dbDefaultInitIndexSize,
           int nThreads = 1);

The following database access types are supported:
Access type | Description |
---|---|
dbDatabase::dbReadOnly | Read only mode |
dbDatabase::dbAllAccess | Normal mode |
dbDatabase::dbConcurrentRead | Read only mode in which the application can access the database concurrently with an application updating the same database in dbConcurrentUpdate mode |
dbDatabase::dbConcurrentUpdate | Mode to be used in conjunction with dbConcurrentRead to perform updates in the database without blocking read applications for a long time |
When the database is opened in readonly mode, no new class definitions can be added to the database, and definitions of existing classes and indices cannot be altered.
dbConcurrentUpdate
and dbConcurrentRead
modes should be used together when the
database is mostly accessed in read-only mode and updates should not block readers for a long time.
In these modes, an update of the database can be performed concurrently with read accesses (readers will not see
changed data until the transaction is committed). Only at update transaction commit time is an exclusive lock set,
and it is immediately released after the incremental change of the current object index.
So you can start one or more applications using dbConcurrentRead
mode, and all their read-only
transactions will be executed concurrently. You can also start one or more applications using
dbConcurrentUpdate
mode. All transactions of such applications will be synchronized using an additional
global mutex, so all these transactions (even read-only ones) will be executed exclusively. But transactions of applications
running in dbConcurrentUpdate
mode can run concurrently with transactions of applications
running in dbConcurrentRead
mode! Please look at the testconc.cpp
example,
which illustrates the usage of these modes.
Attention! Do not mix dbConcurrentUpdate
and dbConcurrentRead
modes with other modes, and do not use them together in one process (it is
not possible to start two threads, one of which opens the database in
dbConcurrentUpdate mode and the other in dbConcurrentRead mode). Do not
use the dbDatabase::precommit
method in dbConcurrentUpdate
mode.
The parameter dbInitSize
specifies the initial size of the database file.
The database file increases on demand; setting the initial size can only
reduce the number of reallocations (which can take a lot of time).
In the current implementation of the FastDB database
the size is at least doubled at each extension.
The default value of this parameter is 4 megabytes.
The parameter dbExtensionQuantum
specifies the quantum of extension of the
memory allocation bitmap.
Briefly speaking, the value of this parameter specifies how much memory
will be allocated sequentially without attempt to reuse space of
deallocated objects. The default value of this parameter is 4 Mb.
See section Memory allocation for more details.
The parameter dbInitIndexSize
specifies the initial index size.
All objects in FastDB are accessed through an object index.
There are two copies of this object index:
current and committed. Object indices are reallocated on
demand; setting an initial index size can only reduce (or increase)
the number of reallocations. The default value of this parameter is 64K object
identifiers.
And the last parameter nThreads
controls the level of query
parallelization. If it is greater than 1, then FastDB can start the parallel
execution of some queries (including sorting the result).
The specified number of parallel threads will
be spawned by the FastDB engine in this case. Usually it does not make
sense to specify a value of this parameter greater than the
number of online CPUs in the system. It is also possible to pass zero
as the value of this parameter. In this case, FastDB will automatically detect
the number of online CPUs in the system. The number of threads also can be set
by the dbDatabase::setConcurrency
method at any moment of time.
The class dbDatabase
contains a static field
dbParallelScanThreshold
, which specifies a threshold for the
number of records in the table after which query parallelization
is used. The default value of this parameter is 1000.
The database can be opened by the
open(char const* databaseName, char const* fileName = NULL, unsigned waitLockTimeout = INFINITE)
method.
If the file name parameter is omitted, it is constructed from
the database name by appending the ".fdb" suffix. The database name should
be an arbitrary identifier consisting of any symbols except '\'.
The last parameter, waitLockTimeout
, can be set to prevent the locking of all
active processes working with the database when one of them crashes.
If the crashed process held a lock on the database, no other process can continue
execution. To prevent this, you can specify a maximal delay for waiting for the lock,
after whose expiration the system will try to perform recovery and continue execution
of the active processes.
The method open
returns true
if the database was
successfully opened, or false
if the open operation failed.
In the latter case, the database handleError
method is called with a
DatabaseOpenError
error code. A database session can be terminated
by the close
method, which implicitly commits the current transaction.
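A minimal open/close sketch (the database name "testdb" and the parameter values are illustrative):

```cpp
#include "fastdb.h"

int main() {
    // 8 Mb initial size, default quanta, nThreads = 0: auto-detect CPU count
    dbDatabase db(dbDatabase::dbAllAccess,
                  8*1024*1024,
                  dbDefaultExtensionQuantum,
                  dbDefaultInitIndexSize,
                  0);
    if (!db.open("testdb")) {   // file name defaults to "testdb.fdb"
        return 1;               // handleError was already invoked with DatabaseOpenError
    }
    // ... work with the database ...
    db.close();                 // implicitly commits the current transaction
    return 0;
}
```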
In a multithreaded application, each thread that wants to access the database
should first be attached to it. The method dbDatabase::attach()
allocates thread specific data and attaches the thread to the database.
This method is automatically called by the open()
method, so
there is no need to call the attach()
method from the thread
that opens the database. When a thread finishes its work with the database, it should
call the dbDatabase::detach()
method. The method
close
automatically invokes the detach()
method.
The method detach()
implicitly commits current transactions.
An attempt to access a database by a detached thread causes an assertion failure.
FastDB is able to perform compilation and execution of queries in parallel, providing significant increase of performance in multiprocessor systems. But concurrent updates of the database are not possible (this is the price for the efficient log-less transaction mechanism and zero time recovery). When an application wants to modify the database (open a cursor for update or insert a new record in the table), it first locks the database in exclusive mode, prohibiting accesses to the database by other applications, even for read-only queries. So to avoid blocking of database applications for a long time, the modification transaction should be as short as possible. No blocking operations (like waiting for input from the user) should be done within this transaction.
Using only shared and exclusive locks on the database level, allows FastDB to almost eliminate overhead of locking and to optimize the speed of execution of non-conflicting operations. But if many applications simultaneously update different parts of the database, then the approach used in FastDB will be very inefficient. That is why FastDB is most suitable for a single-application database access model or for multiple applications with a read-dominated access pattern model.
Cursor objects should be used by only one thread in a multithreaded application. If there is more than one thread in your application, use local variables for cursors in each thread. It is possible to share query variables between threads, but take care with query parameters: the query should either have no parameters, or the relative form of parameter binding should be used.
The dbDatabase
object is shared between all
threads and uses thread specific data to perform query
compilation and execution in parallel with minimal synchronization overhead.
There are a few global things which require synchronization: the symbol table,
the pool of tree nodes, etc. But scanning, parsing and execution of a query can
be done without any synchronization, providing a high level of concurrency
on multiprocessor systems.
A database transaction is started by the first select or an insert operation.
If a cursor for update is used, then the database is locked in exclusive
mode, prohibiting access to the database by other applications and threads.
If a read-only cursor is used, then the database is locked in shared mode, preventing
other applications and threads from modifying the database,
but allowing the execution of concurrent read requests.
A transaction should be explicitly terminated
either by the dbDatabase::commit()
method, which makes all
changes done by the transaction permanent in the database, or by the
dbDatabase::rollback()
method, which undoes all modifications
done by the transaction. The method dbDatabase::close()
automatically
commits the current transaction.
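The commit/rollback pattern in a sketch (the Person table and the bonus logic are illustrative):

```cpp
#include "fastdb.h"

// Keep update transactions short: no blocking operations inside them.
void transferBonus(dbDatabase& db,
                   dbReference<Person> from, dbReference<Person> to) {
    dbCursor<Person> src(dbCursorForUpdate);  // locks the database exclusively
    dbCursor<Person> dst(dbCursorForUpdate);
    src.at(from);
    dst.at(to);
    if (src->salary >= 100) {
        src->salary -= 100;
        dst->salary += 100;
        src.update();
        dst.update();
        db.commit();                          // make the changes permanent
    } else {
        db.rollback();                        // undo everything in this transaction
    }
}
```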
If you start a transaction by performing selection using a read-only cursor and
then use a cursor for update to perform some modifications of the database,
the database will be first locked in shared mode; then the lock will be upgraded
to exclusive mode. This can cause a deadlock problem if the database is simultaneously
accessed by several applications. Imagine that application A starts
a read transaction and application B also starts a read transaction. Both
of them hold shared locks on the database. If both of them want to
upgrade their locks to exclusive mode, they will forever block each other
(exclusive lock can not be granted until a shared lock of another process exists).
To avoid such situations, try to use a cursor for update at the beginning of the
transaction, or explicitly use the dbDatabase::lock()
method.
More information about the implementation of transactions in FastDB can be found
in section Transactions.
It is possible to explicitly lock the database by the lock()
method.
Locking is usually done automatically - there are only few cases when
you will want to use this method. It will lock the database in exclusive
mode until the end of the current transaction.
A backup of the database can be done by the
dbDatabase::backup(char const* file)
method. A backup locks the database in shared mode and flushes an image of the
database from main memory to the specified file. Because a shadow object index is used,
the database file is always in a consistent state, so recovery from the backup can
be performed just by renaming the backup file (if the backup was performed to tape, it
should first be restored to disk).
The class dbDatabase
is also responsible for handling various
application errors, such as syntax errors during query compilation,
out of range index or null reference access during query execution.
There is a virtual method dbDatabase::handleError
, which handles
these errors:
virtual void handleError(dbErrorClass error, char const* msg = NULL, int arg = 0);

A programmer can derive her/his own subclass from the dbDatabase
class and redefine the default reaction on errors.
Class | Description | Argument | Default reaction |
---|---|---|---|
QueryError | query compilation error | position in query string | abort compilation |
ArithmeticError | arithmetic error during division or power operations | - | terminate application |
IndexOutOfRangeError | index is out of array bounds | value of index | terminate application |
DatabaseOpenError | error while opening the database | - | open method will return false |
FileError | failure of file IO operation | error code | terminate application |
OutOfMemoryError | not enough memory for object allocation | requested allocation size | terminate application |
Deadlock | upgrading lock causes deadlock | - | terminate application |
NullReferenceError | null reference is accessed during query execution | - | terminate application |
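As a sketch, a subclass can turn query compilation errors into C++ exceptions while delegating everything else to the default reaction (the subclass name is illustrative):

```cpp
#include "fastdb.h"
#include <stdexcept>

class MyDatabase : public dbDatabase {
  public:
    MyDatabase() : dbDatabase(dbAllAccess) {}

    virtual void handleError(dbErrorClass error, char const* msg = NULL, int arg = 0) {
        if (error == QueryError) {
            // 'arg' holds the position in the query string
            throw std::invalid_argument(msg ? msg : "query compilation error");
        }
        dbDatabase::handleError(error, msg, arg);  // default reaction otherwise
    }
};
```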
FastDB provides a multithreaded server for handling client CLI sessions. This server can be
started from the SubSQL utility by the
start server 'HOST:PORT' <number-of-threads>
command.
This server will accept local (within one system) and global client connections and
attach one thread from the thread pool to each connection. The size of the thread pool
is controlled by the number-of-threads parameter, but the server can spawn more
than the specified number of threads if there are many active connections. A thread stays attached
to the client until the end of the session. If a session is abnormally terminated, all
changes made by the client are rolled back. The server can be stopped by the corresponding
stop server 'HOST:PORT'
command.
Error code | Description |
---|---|
cli_ok | Successful completion |
cli_bad_address | Invalid format of server URL |
cli_connection_refused | Connection with server could not be established |
cli_bad_statement | Text of SQL statement is not correct |
cli_parameter_not_found | Parameter was not found in statement |
cli_unbound_parameter | Parameter was not specified |
cli_column_not_found | No such column in the table |
cli_incompatible_type | Conversion between application and database type is not possible |
cli_network_error | Connection with server is broken |
cli_runtime_error | Error during query execution |
cli_bad_descriptor | Invalid statement/session description |
cli_unsupported_type | Unsupported type for parameter or column |
cli_not_found | Record was not found |
cli_not_update_mode | Attempt to update records selected by view only cursor |
cli_table_not_found | There is no table with specified name in the database |
cli_not_all_columns_specified | Insert statement doesn't specify values for all table columns |
cli_not_fetched | cli_fetch method was not called |
cli_already_updated | cli_update method was invoked more than once for the same record |
cli_table_already_exists | Attempt to create an already existing table |
cli_not_implemented | Function is not implemented |
Type | Description | Size |
---|---|---|
cli_oid | Object identifier | 4 |
cli_bool | Boolean type | 1 |
cli_int1 | Tiny integer type | 1 |
cli_int2 | Small integer type | 2 |
cli_int4 | Integer type | 4 |
cli_int8 | Big integer type | 8 |
cli_real4 | Single precision floating point type | 4 |
cli_real8 | Double precision floating point type | 8 |
cli_asciiz | Zero terminated string of bytes | 1*N |
cli_pasciiz | Pointer to zero terminated string | 1*N |
cli_array_of_oid | Array of references | 4*N |
cli_array_of_bool | Array of booleans | 1*N |
cli_array_of_int1 | Array of tiny integers | 1*N |
cli_array_of_int2 | Array of small integers | 2*N |
cli_array_of_int4 | Array of integers | 4*N |
cli_array_of_int8 | Array of big integers | 8*N |
cli_array_of_real4 | Array of reals | 4*N |
cli_array_of_real8 | Array of long reals | 8*N |
int cli_open(char const* server_url, int max_connect_attempts, int reconnect_timeout_sec);
server_url
- zero terminated string with server address and port,
for example "localhost:5101", "195.239.208.240:6100",...
max_connect_attempts
- number of attempts to establish connection
reconnect_timeout_sec
- timeout in seconds between connection attempts
>= 0
- connection descriptor to be used in all other cli calls
< 0
- error code as described in cli_result_code
enum
int cli_close(int session);
session
- session descriptor returned by cli_open
cli_result_code
enum
int cli_statement(int session, char const* stmt);
session
- session descriptor returned by cli_open
stmt
- zero terminated string with SubSQL statement
>= 0
- statement descriptor
< 0
- error code as described in cli_result_code
enum
int cli_parameter(int statement, char const* param_name, int var_type, void* var_ptr);
statement
- statement descriptor returned by cli_statement
param_name
- zero terminated string with parameter name.
Parameter name should start with '%'
var_type
- type of variable as described in cli_var_type enum.
Only scalar and zero terminated string types are supported.
var_ptr
- pointer to the variable
cli_result_code
enum
int cli_column(int statement, char const* column_name, int var_type, int* var_len, void* var_ptr);
statement
- statement descriptor returned by cli_statement
column_name
- zero terminated string with column name
var_type
- type of variable as described in cli_var_type enum
var_len
- pointer to the variable holding the length of an array variable.
This variable should be assigned the maximal length of the array/string buffer
pointed to by var_ptr
. After the execution of the statement, it is assigned the
real length of the fetched array/string. If this is larger than the length of the buffer,
then only part of the array will be placed in the buffer, but var_len
will still contain the actual array length.
var_ptr
- pointer to the variable
cli_result_code
enum
typedef void* (*cli_column_set)(int var_type, void* var_ptr, int len); typedef void* (*cli_column_get)(int var_type, void* var_ptr, int* len); int cli_array_column(int statement, char const* column_name, int var_type, void* var_ptr, cli_column_set set, cli_column_get get);
statement
- statement descriptor returned by cli_statement
column_name
- zero terminated string with column name
var_type
- type of variable as described in cli_var_type enum
var_ptr
- pointer to the variable
set
- function which will be called to construct the fetched field.
It receives a pointer to the variable and the length of the fetched array, and returns a pointer
to the array's elements.
get
- function which will be called to update the field in the
database. Given a pointer to the variable, it should return a pointer to the array elements
and store the length of the array into the variable pointed to by the len parameter
cli_result_code
enum
typedef void* (*cli_column_set_ex)(int var_type, void* var_ptr, int len, char const* column_name, int statement, void const* data_ptr); typedef void* (*cli_column_get_ex)(int var_type, void* var_ptr, int* len, char const* column_name, int statemen); int cli_array_column(int statement, char const* column_name, int var_type, void* var_ptr, cli_column_set_ex set, cli_column_get_ex get);
statement
- statement descriptor returned by cli_statement
column_name
- zero terminated string with column name
var_type
- type of variable as described in cli_var_type enum
var_ptr
- pointer to the variable
set
- function which will be called to construct the fetched field.
It receives the type of the variable, a pointer to the variable, the length of the fetched array, the name of the fetched column,
the statement descriptor and a pointer to the array data. If this method returns a non-NULL pointer,
the database will copy the unpacked array to the returned location. Otherwise it is assumed that
the function handles the data itself.
get
- function which will be called to update the field in the
database. Given the type of the variable, a pointer to the variable, the column name and the statement descriptor,
it should return a pointer to the array elements and store the length of the array into the variable pointed to by the len parameter
cli_result_code
enum
enum { cli_view_only, cli_for_update }; int cli_fetch(int statement, int for_update);
statement
- statement descriptor returned by cli_statement
for_update
- non-zero if the fetched rows will be updated
>= 0
- success, for select statements number of fetched rows is returned
< 0
- error code as described in cli_result_code
enum
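Putting these calls together, a sketch of a complete read-only client session (the server address, table and column names are illustrative; cli_get_next is assumed to return cli_ok on success):

```cpp
#include <stdio.h>
#include "cli.h"

int main() {
    int session = cli_open("localhost:5101", 10, 1);  // 10 attempts, 1 sec apart
    if (session < 0) return 1;

    int stmt = cli_statement(session,
                             "select * from Person where salary > %salary");
    cli_int4_t salary = 1000;
    char       name[256];
    int        name_len = sizeof(name);
    cli_parameter(stmt, "%salary", cli_int4, &salary);     // bind the parameter
    cli_column(stmt, "name", cli_asciiz, &name_len, name); // bind the column

    if (cli_fetch(stmt, cli_view_only) > 0) {     // execute the query
        while (cli_get_next(stmt) == cli_ok) {    // first call fetches first record
            printf("%s\n", name);
        }
    }
    cli_free(stmt);
    cli_close(session);
    return 0;
}
```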
int cli_insert(int statement, cli_oid_t* oid);
statement
- statement descriptor returned by cli_statement
oid
- object identifier of created record.
cli_result_code
enum
int cli_get_first(int statement);
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
int cli_get_last(int statement);
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
int cli_get_next(int statement);
If this function is called immediately after a cli_fetch
function call, it will fetch the first record in the selection.
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
int cli_get_prev(int statement);
If this function is called immediately after a cli_fetch
function call, it will fetch the last record in the selection.
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
cli_oid_t cli_get_oid(int statement);
statement
- statement descriptor returned by cli_statement
int cli_update(int statement);
The for_update parameter of cli_fetch
should be set to 1 in order to be able
to perform updates. The updated values of the row fields will be taken
from the bound column variables.
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
int cli_remove(int statement);
The for_update
parameter of
cli_fetch
should be set to 1 in order to be able to remove records.
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
int cli_free(int statement);
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
int cli_commit(int session);
session
- session descriptor as returned by cli_open
cli_result_code
enum
int cli_abort(int session);
session
- session descriptor as returned by cli_open
cli_result_code
enum
int cli_show_tables(int session, cli_table_descriptor** tables);
typedef struct cli_table_descriptor { char const* name; } cli_table_descriptor;
session
- session descriptor as returned by cli_open
tables
- address of the pointer to the array with table descriptors.
cli_show_tables
uses malloc to allocate this array; it should be deallocated by the application using the free() function.
>= 0
- number of tables in the database (Metatable is not returned/counted)
< 0
- error code as described in cli_result_code
enum
int cli_describe(int session, char const* table, cli_field_descriptor** fields);
typedef struct cli_field_descriptor { enum cli_var_type type; char const* name; } cli_field_descriptor;
session
- session descriptor as returned by cli_open
table
- name of the table
fields
- address of the pointer to the array with field descriptors. cli_describe
uses malloc to allocate this array; it should be deallocated by the application using the free() function.
>= 0
- number of fields in the table
< 0
- error code as described in cli_result_code
enum
int cli_create_table(int session, char const* tableName, int nFields, cli_field_descriptor* fields);
session
- session descriptor as returned by cli_open
tableName
- name of the created table
nFields
- number of columns in the table
fields
- array with table column descriptors. A descriptor has the following structure:

enum cli_field_flags {
    cli_hashed  = 1, /* field should be indexed using hash table */
    cli_indexed = 2  /* field should be indexed using B-Tree */
};

typedef struct cli_field_descriptor {
    enum cli_var_type type;
    int               flags;
    char const*       name;
    char const*       refTableName;
    char const*       inverseRefFieldName;
} cli_field_descriptor;
cli_result_code
enum
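A sketch of creating a small table through the CLI (the table and column names are illustrative):

```cpp
#include "cli.h"

int createPersonTable(int session) {
    static cli_field_descriptor fields[] = {
        /* type,      flags,       name,     refTableName, inverseRefFieldName */
        { cli_asciiz, cli_indexed, "name",   0, 0 },  // B-Tree index on name
        { cli_int4,   0,           "salary", 0, 0 }
    };
    return cli_create_table(session, "Person", 2, fields);
}
```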
int cli_alter_table(int session, char_t const* tableName, int nFields, cli_field_descriptor* fields);
session
- session descriptor as returned by cli_open
tableName
- name of the altered table
nFields
- number of columns in the table
fields
- array with table column descriptors. A descriptor has the following structure:

enum cli_field_flags {
    cli_hashed  = 1,          /* field should be indexed using hash table */
    cli_indexed = 2,          /* field should be indexed using B-Tree */
    cli_case_insensitive = 4  /* index is case insensitive */
};

typedef struct cli_field_descriptor {
    enum cli_var_type type;
    int               flags;
    char_t const*     name;
    char_t const*     refTableName;
    char_t const*     inverseRefFieldName;
} cli_field_descriptor;
cli_result_code
enum
int cli_drop_table(int session, char const* tableName);
session
- session descriptor as returned by cli_open
tableName
- name of the dropped table
cli_result_code
enum
int cli_alter_index(int session, char const* tableName, char const* fieldName, int newFlags);
session
- session descriptor as returned by cli_open
tableName
- name of the table
fieldName
- name of the field
newFlags
- new flags of the field. If an index exists for this field but is not specified in the
newFlags
mask, it will be removed; if an index does not exist but is
specified in the newFlags
mask, it will be created.
cli_result_code
enum
int cli_freeze(int statement);
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
int cli_unfreeze(int statement);
statement
- statement descriptor returned by cli_statement
cli_result_code
enum
int cli_seek(int statement, cli_oid_t oid);
statement
- statement descriptor returned by cli_statement
oid
- object identifier of the record to which cursor should be positioned
>= 0
- success, position of the record in the selection
< 0
- error code as described in cli_result_code
enum
int cli_skip(int statement, int n);
statement
- statement descriptor returned by cli_statement
n
- number of objects to be skipped.
If n is positive, this function has the same effect as executing the cli_get_next() function n times.
If n is negative, this function has the same effect as executing the cli_get_prev() function -n times.
If n is zero, this method just reloads the current record.
cli_result_code
enum
cli.lib
, and if you want to access the database locally, link it with fastdb.lib.
To create a local session you should use the cli_create
function instead of cli_open.
Calling cli_create
when your application is linked with cli.lib, or
cli_open
when it is linked with fastdb.lib,
causes a cli_bad_address
error.
int cli_create(char const* databaseName, char const* filePath, unsigned transactionCommitDelay, int openAttr, size_t initDatabaseSize, size_t extensionQuantum, size_t initIndexSize, size_t fileSizeLimit);
databaseName
- database name
filePath
- path to the database file
transactionCommitDelay
- transaction commit delay (specify 0 to disable)
openAttr
- mask of cli_open_attributes. You can specify combination of the following
attributes:cli_open_default
cli_open_readonly
cli_open_truncate
cli_open_concurrent
initDatabaseSize
- initial size of database
extensionQuantum
- database extension quantum
initIndexSize
- initial size of object index
fileSizeLimit
- limit for file size (0 - unlimited)
>= 0
- connection descriptor to be used in all other cli calls
< 0
- error code as described in cli_result_code
enum
int cli_create_replication_node(int nodeId, int nServers, char* nodeNames[], char const* databaseName, char const* filePath, int openAttr, size_t initDatabaseSize, size_t extensionQuantum, size_t initIndexSize, size_t fileSizeLimit);
nodeId
- node identifier: 0 <= nodeId < nServers
nServers
- number of replication nodes (primary + standby)
nodeNames
- array with URLs of the nodes (address:port)
databaseName
- database name
filePath
- path to the database file
openAttr
- mask of cli_open_attributes (to allow concurrent read access to replication node,
cli_open_concurrent
attribute should be set)
initDatabaseSize
- initial size of database
extensionQuantum
- database extension quantum
initIndexSize
- initial size of object index
fileSizeLimit
- limit for file size (0 - unlimited)
>= 0
- connection descriptor to be used in all other cli calls
< 0
- error code as described in cli_result_code
enum
int cli_attach(int session);
session
- session descriptor as returned by cli_open
cli_result_code
enum
int cli_detach(int session, int detach_mode);
session
- session descriptor as returned by cli_open
detach_mode
- bit mask representing detach mode
enum cli_detach_mode { cli_commit_on_detach = 1, cli_destroy_context_on_detach = 2 };
cli_result_code
enum
typedef struct cli_database_monitor { int n_readers; int n_writers; int n_blocked_readers; int n_blocked_writers; int n_users; } cli_database_monitor; int cli_get_database_state(int session, cli_database_monitor* monitor);
session
- session descriptor as returned by cli_open
monitor
- pointer to the monitor structure. The following fields are set:
n_readers | number of granted shared locks |
n_writers | number of granted exclusive locks |
n_blocked_readers | number of threads whose shared lock request was blocked |
n_blocked_writers | number of threads whose exclusive lock request was blocked |
n_users | number of processes that have opened the database |
cli_result_code
enum
int cli_prepare_query(int session, char const* query);
session
- session descriptor as returned by cli_open
query
- query string with optional parameters. Parameters are specified
as '%T', where T is a one- or two-character code of the parameter type, using the same notation as in printf:
%d or %i | cli_int4_t |
%f | cli_real8_t |
%Li, %li, %ld or %Ld | cli_int8_t |
%p | cli_oid_t |
%s | char* |
>= 0
- statement descriptor
< 0
- error code as described in cli_result_code
enum
int cli_execute_query(int statement, int for_update, void* record_struct, ...)
The record structure should be declared using cli_
types for structure and table fields. For array types, you should use the cli_array_t
structure. Strings should be represented as char*
, and the programmer should not try to
deallocate them or copy this pointer and access it outside the context of the current record.
statement
- statement descriptor returned by cli_prepare_query
for_update
- not zero if fetched rows will be updated
record_struct
- structure to receive selected record fields.
...
- varying list of query parameters
cli_result_code
enum
int cli_insert_struct(int session, char const* table_name, void* record_struct, cli_oid_t* oid);
The record structure should be declared using cli_
types for structure and table fields. For array types, you should use the cli_array_t
structure. Strings should be represented as char*
.
session
- session descriptor as returned by cli_open
table_name
- name of the destination table
record_struct
- structure specifying value of record fields
oid
- pointer to the location to receive OID of created record (may be NULL)
cli_result_code
enum
But in many cases it is acceptable to lose the changes of the last few seconds (while still preserving the consistency of the database). Under this assumption, database performance can be significantly increased. FastDB provides a "delayed transaction commit model" for such applications. When the transaction commit delay is non-zero, the database does not perform the commit immediately, but delays it for the specified timeout. After expiration of this timeout, the transaction is committed normally, which ensures that only changes done within the specified timeout can be lost in case of a system crash.
If the thread which initiated a delayed transaction starts a new transaction before the delayed commit is performed, then the delayed commit operation is skipped. So FastDB is able to group several subsequent transactions performed by one client into a single large transaction. This greatly increases performance, because it reduces the number of synchronous writes and the number of created shadow pages (see section Transactions).
If some other client tries to start a transaction before expiration of the delayed commit timeout, then FastDB forces the delayed commit to proceed and releases the resource for the other thread. So concurrency does not suffer from delayed commits.
By default, delayed commits are disabled (the timeout is zero). You can specify the commit delay
parameter as the second optional argument of the dbDatabase::open
method.
In the SubSQL
utility, it is possible to specify the value of the transaction commit
delay by setting the "FASTDB_COMMIT_DELAY"
environment variable (in seconds).
The transaction commit scheme used in FastDB guarantees recovery after software and hardware faults if the image of the database on disk was not corrupted (i.e. all information written to the disk can be correctly read). If for some reason the database file is corrupted, then the only way to recover the database is to use a backup (hoping that it was performed not too long ago).
Backup can be done by simply copying the database file when the database is offline.
The class dbDatabase
provides a backup method which is able to perform an online backup
that doesn't require stopping the database. It can be called by the programmer at any time.
Going further, FastDB provides a backup scheduler which can perform backups automatically.
The only things needed are the name of the backup file and the interval of time between backups.
The method dbDatabase::scheduleBackup(char const* fileName, time_t period)
spawns separate thread which performs backups to the specified location with specified period (in seconds).
If fileName
ends with the "?" character, then the date of backup initiation is appended to the file
name, producing a unique file name. In this case all backup files are kept on disk (it is the
responsibility of the administrator to remove backup files that are too old or transfer them to other media).
Otherwise the backup is performed to a file named fileName + ".new"
, and after completion
of the backup, the old backup file is removed and the new file is renamed to fileName
.
Also, in the latter case, FastDB checks the creation date of the old backup file (if it exists) and adjusts
the wait timeout in such a way that the time between backups equals the specified period
(so if the database server is started only 8 hours per day and the backup period is 24 hours, then
a backup will be performed every day, unlike the scheme with uniquely generated backup file names).
It is possible to schedule backup processing in SubSQL
utility by setting
FASTDB_BACKUP_NAME
environment variable.
The period value is taken from the FASTDB_BACKUP_PERIOD
environment variable if specified; otherwise it
is set to one day. To recover from a backup, it is enough to copy one of the backup files in place of the
corrupted database file.
To be able to use fault tolerant support, you should rebuild FastDB
with REPLICATION_SUPPORT
option. To switch it on,
set FAULT_TOLERANT
variable at the beginning of makefile to 1.
You should use class dbReplicatedDatabase
instead of dbDatabase
.
In the parameters to the open
method, in addition to the database name and file name, you should specify the
identifier of this node (an integer from 0
to N-1
), an array with the addresses of
all nodes (hostname:port
pairs) and the number of nodes (N
).
Then you should start the program at each of the N
nodes. Once all instances are started,
the node with ID=0
becomes active (the primary node). The open method returns true
at this instance. The other nodes are blocked in the open
method.
If the primary node crashes, one of the standby nodes is activated (its open
method
returns true
) and this instance continues execution.
If the crashed instance is restarted, it will try to connect to all other servers, restore
its state and continue functioning as a standby node, waiting for its chance to replace a crashed primary node.
If the primary node is terminated normally, the close
method of all standby nodes returns false
.
In fault tolerant mode FastDB keeps two files: one with the database itself and one with page update counters. The file with page update counters is used for incremental recovery. When a crashed node is restarted, it sends the page counters it has to the primary node and receives back only those pages which were changed at the primary node within this period of time (pages whose timestamps are greater than those sent by the recovered node).
In replication mode, the application (at the primary node) is not blocked during transaction commit until all changes are flushed to the disk. Modified pages are flushed to the disk asynchronously by a separate thread. This leads to a significant improvement in performance. But if all nodes crash, the database can be left in an inconsistent state. It is possible to specify a delay for flushing data to the disk: the larger the delay, the less disk IO overhead. But in case of a crash, a larger amount of data may have to be sent from the primary node to perform recovery.
If all nodes crash, the system administrator should choose the node with the most recent version of the database (sorry, this cannot be done automatically yet) and start the application at this node, but not at the other nodes. After expiration of a small timeout (5 seconds), it will report that it failed to connect to the other nodes. When connections to all specified nodes have failed, the program performs local recovery and starts as the new active node. Then you can start the other nodes, which will perform recovery from the active node.
It is possible to use the fault tolerant model in diskless mode (the DISKLESS_CONFIGURATION
build option).
In this case no data is stored on the disk (neither the database file nor the page update counters).
It is assumed that at least one of the nodes is always alive. As long as there is at least one online node,
data will not be lost. When a crashed node recovers, the primary node sends a complete snapshot of the
database to it (incremental recovery is not possible because the state of the crashed node is lost).
Since in this case there are no disk operations, performance can be very high, limited only by the throughput of
the network.
When a replication node is started, it tries to connect to all other nodes within some specified timeout. If within this time no connection can be established, then the node assumes it was started autonomously and starts working as a normal (non-replicated) database. If the node has established connections with some other nodes, then the one with the smallest ID is chosen as the replication master. All other nodes are switched to standby mode and wait for replication requests from the master. If the contents of the database at the master and a slave differ (this is determined using the page counter arrays), then the master performs recovery of the standby node, sending it the most recent versions of the pages. If the master crashes, then the standby nodes select a new master (the node with the smallest ID). All standby nodes are blocked in the open method until one of the following things happens:
the open
method returns true
(this node is activated as the new master), or
the open
method at all replicated nodes returns
false
.
It is possible to perform read-only access to the replicated database from other applications.
In this case replicated node should be created using dbReplicatedDatabase(dbDatabase::dbConcurrentUpdate)
constructor invocation. Other applications can access the same database using
dbDatabase(dbDatabase::dbConcurrentReadMode)
instance.
Not all applications need fault tolerance. Many applications use replication just to increase scalability, by
distributing load between several nodes. For such applications FastDB provides a simplified replication model.
In this case there are two kinds of nodes: readers and writers. Any of the writer nodes can play the role of replication master.
Reader nodes just receive replicated data from the master and can not update the database themselves.
The main difference from the replication model described above is that a reader node can never become master, and the
open
method at this node returns control immediately after a connection has been established with the master node.
Then the reader node accesses the database as a normal read-only database application. Updates from the master node are received
by a separate thread. A reader node should be created using the dbReplicatedDatabase(dbDatabase::dbConcurrentRead)
constructor. It should use exactly the same database schema (classes) as the master node.
The database connection of reader nodes is not closed automatically when the master closes its connection: it remains open
and the application can still access the database in read-only mode. Once the master node is restarted, it establishes connections
with all standby nodes and continues sending updates to them. If there are no reader nodes, then this replication
model is equivalent to the fault tolerance model described above; and if there is a single writer and one or more
reader nodes, then it is classical master-slave replication.
You can test the fault tolerant mode using the Guess
example. When compiled with
-DREPLICATION_SUPPORT
, this example illustrates a cluster of three nodes (all addresses refer to
localhost, but you can certainly replace them with the names of real hosts in your network).
You should start three instances of the guess
application with parameters 0..2
.
When all instances are started, the application with parameter 0
starts the normal user dialogue
(it is a game: "Guess an animal"). If you emulate a crash of this application by pressing Ctrl-C
,
then one of the standby nodes continues execution.
More sophisticated modes of replication are illustrated by testconc
example.
There are three replication nodes which are started using
testconc update N
command, where N is identifier of the node: 0, 1, 2.
After starting all three nodes, they connect to each other; node 0 becomes master and starts updating the
database, replicating changes to nodes 1 and 2. It is possible to start one or more inspectors: applications
which are connected to the replicated database in read-only mode (using the dbConcurrentRead
access type).
An inspector can be started using testconc inspect N T
, where N is the identifier of the replicated node to which the
inspector should be connected and T is the number of inspection threads.
The same testconc
example can be used to illustrate the simplified replication model.
To test it, please start one master: testconc update 0
, and two read-only replication nodes:
testconc coinspect 1
and testconc coinspect 2
. Please notice the difference from the scenario
described above: in case of the fault tolerance model there is a normal replication node started using the
testconc update N
command and a read-only node (not involved in the replication process) connected to the
same database: testconc inspect N
. In case of the simplified master-slave replication, there
is a read-only replication node which can not become master (so in case of the original master's crash, nobody can play its
role), but the application running at this node accesses the replicated database as a normal read-only application.
FastDB uses simple rules for applying indices, allowing a programmer to predict when an index will be used and which one. The check for index applicability is done during each query execution, so the decision can be made depending on the values of the operands. The following rules describe the algorithm of applying indices by FastDB:
(= < > <= >= between like)
Now we should make clear what the phrase "index is compatible with operation" means and which type of index is used in each case. A hash table can be used when:
=
is used.
between
operation is used and the values of both bound operands
are the same.
like
operation is used and the pattern string contains
no special characters ('%' or '_') and no escape characters (specified in an
escape
part).
A T-tree index can be applied if a hash table is not applicable (or a field is not hashed) and:
one of the operations (= < > <= >= between)
is used.
like
operation is used and the pattern string contains
no empty prefix (i.e. the first character of the pattern is not '%' or '_').
If an index is used to search the prefix of a like
expression, and
the suffix is not just the '%' character, then an index search operation can return
more records than really match the pattern. In this case we should filter the
index search output by applying a pattern match operation.
When the search condition is a disjunction of several subexpressions
(the expression contains several alternatives combined by the or
operator), then several indices can be used for the query execution.
To avoid record duplicates in this case, a bitmap is used in the cursor
to mark records already selected.
If the search condition requires a sequential table scan, the T-tree index
can still be used if the order by
clause contains a single
record field for which a T-tree index is defined. As sorting is a very
expensive operation, using an index instead of sorting significantly
reduces the query execution time.
It is possible to check which indices are used for query execution,
and how many probes are done during an index search, by compiling FastDB
with the option -DDEBUG=DEBUG_TRACE
. In this case, FastDB will
dump trace information about database functionality, including information
about indices.
When a record with declared relations is inserted in the table, the inverse references in all tables, which are in relation with this record, are updated to point to this record. When a record is updated and a field specifying the record's relationship is changed, then the inverse references are also reconstructed automatically by removing references to the updated record from those records which are no longer in relation with the updated record and by setting inverse references to the updated record for new records included in the relation. When a record is deleted from the table, references to it are removed from all inverse reference fields.
Due to efficiency reasons, FastDB is not able to guarantee the consistency of all references. If you remove a record from the table, there still can be references to the removed record in the database. Accessing these references can cause unpredictable behavior of the application and even database corruption. Using inverse references allows to eliminate this problem, because all references will be updated automatically and the consistency of references is preserved.
Let's use the following table definitions as an example:
class Contract; class Detail { public: char const* name; char const* material; char const* color; real4 weight; dbArray< dbReference<Contract> > contracts; TYPE_DESCRIPTOR((KEY(name, INDEXED|HASHED), KEY(material, HASHED), KEY(color, HASHED), KEY(weight, INDEXED), RELATION(contracts, detail))); }; class Supplier { public: char const* company; char const* location; bool foreign; dbArray< dbReference<Contract> > contracts; TYPE_DESCRIPTOR((KEY(company, INDEXED|HASHED), KEY(location, HASHED), FIELD(foreign), RELATION(contracts, supplier))); }; class Contract { public: dbDateTime delivery; int4 quantity; int8 price; dbReference<Detail> detail; dbReference<Supplier> supplier; TYPE_DESCRIPTOR((KEY(delivery, HASHED|INDEXED), KEY(quantity, INDEXED), KEY(price, INDEXED), RELATION(detail, contracts), RELATION(supplier, contracts))); };
In this example there are one-to-many relations between the tables
Detail-Contract and Supplier-Contract. When a Contract
record is inserted in the database, it is necessary only to set the references
detail
and supplier
to the correspondent
records of the Detail
and the Supplier
table.
The inverse references contracts
in these records will be updated
automatically. The same happens when a Contract
record is
removed: references to the removed record will be automatically excluded
from the contracts
field of the referenced Detail
and
Supplier
records.
Moreover, using inverse references makes it possible to choose more effective plans for query execution. Consider the following query, selecting all details shipped by some company:
q = "exists i:(contracts[i].supplier.company=",company,")";
The straightforward approach to execute this query is scanning the
Detail
table and testing each record for this condition.
But using inverse references we can choose another approach: perform an
index search in the Supplier
table for records with the specified
company name and then use the inverse references to locate records from the
Detail
table, which are in transitive relation with the
selected supplier records. Certainly we should eliminate duplicates of
records, which can appear because the company can ship a number of different
details. This is done by a bitmap in the cursor object.
As an index search is significantly faster than a sequential search,
and accessing records by reference is a very fast operation, the total
time of such a query execution is much shorter compared with the
straightforward approach.
Starting from version 1.20, FastDB supports cascade deletes. If a field is declared using the OWNER macro, the record is treated as the owner of a hierarchical relation. When the owner record is removed, all members of this relation (records referenced from the owner) will be automatically removed as well. If a member record of the relation should contain a reference to the owner record, this field should be declared using the RELATION macro.
Algorithms used in FastDB make it possible to calculate quite precisely the average and maximal time of query execution depending on the number of records in the table (assuming that the size of array fields in records is significantly smaller than the table size, so the time of iterating through array elements can be excluded from the estimation). The following table shows the complexity of searching a table with N records depending on the search condition:
Type of search | Average | Maximal |
---|---|---|
Sequential search | O(N) | O(N) |
Sequential search with sorting | O(N*log(N)) | O(N*log(N)) |
Search using hash table | O(1) | O(N) |
Search using T-tree | O(log(N)) | O(log(N)) |
Access by reference | O(1) | O(1) |
FastDB uses the Heapsort algorithm for sorting selected records to provide guaranteed log(N) complexity (quicksort is on the average a little bit faster, but worst time is O(N*N)). A hash table also has different average and maximal complexity. On the average, a hash table search is faster than a T-tree search, but in the worst case it is equivalent to a sequential search while a T-tree search always guarantees log(N) complexity.
The execution of update statements in FastDB is also fast, but this time is less predictable, because the commit operation requires flushing of modified pages to disk which can cause unpredictable operating system delays.
To split a table scan, FastDB starts N threads, each of which tests every N-th record of the table (i.e. thread number 0 tests records 0, N, 2*N, ...; thread number 1 tests records 1, 1+N, 1+2*N, ...; and so on). Each thread builds its own list of selected records. After termination of all threads, these lists are concatenated to construct the single result list.
If the result shall be sorted, then each thread, after finishing the table scan, sorts the records it selected. After termination of all threads, their lists are merged (as it is done with an external sort).
Parallel query execution is controlled by two parameters: the number of spawned
threads and a parallel search threshold. The first is specified in the
dbDatabase
class constructor or set by the
dbDatabase::setConcurrency
method. A zero value of this parameter
asks FastDB to automatically detect the number of online CPUs in the system and
spawn exactly this number of threads. By default, the number of threads is set to 1,
so no parallel query execution takes place.
The parallel search threshold parameter specifies the minimal number of records in the
table for which parallelization of the query can improve query performance
(starting a thread has its own overhead). This parameter is a static
component of the dbDatabase
class and can be changed by an application at
any moment of time.
Parallel query execution is not possible when:
the number of records in the table is less than dbDatabase::dbParallelScanThreshold
;
the query contains a start from
part.
GiSTstore
interface which is
responsible for storing/retrieving GiST pages in database. FastDB includes
release 0.9 beta 1 of the GiST C++ library with additional GiSTdb.cpp
and
GiSTdb.h
files and changed BTree, RTree and RSTree examples, patched to work
with FastDB instead of plain fails.GiST documentation is available here
FastDB performs a cyclic scan of bitmap pages. It saves the identifier
of the current bitmap page and the current position within the page. Each time
an allocation request arrives, scanning the bitmap starts from the (saved)
current position.
When the last allocated bitmap page is scanned, scanning continues from the
beginning (from the first bitmap page) up to the current position.
When no free space is found after a full cycle through all bitmap pages,
a new bulk of memory is allocated. The size of the extension is the maximum of the size
of the allocated object and of the extension quantum. The extension quantum is a parameter
of the database, specified in the constructor. The bitmap is extended in order to map
the additional space. If the virtual space is exhausted and no more
bitmap pages can be allocated, then an OutOfMemory
error
is reported.
Allocating memory using a bitmap provides high locality of references (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimizing the number of modified pages is significant when a commit operation is performed and all dirty pages should be flushed onto the disk. When all cloned objects are placed sequentially, the number of modified pages is minimal and the transaction commit time is reduced. Using the extension quantum helps to preserve sequential allocation. Once the bitmap is extended, objects will be allocated sequentially until the extension quantum is completely consumed. Only after reaching the end of the bitmap, the scan restarts from the beginning, searching for holes in previously allocated memory.
To reduce the number of bitmap page scans, FastDB associates a descriptor with each page, which is used to remember the maximal size of the hole in the page. This calculation of the maximal hole size is performed in the following way: if an object of size M can not be allocated from this bitmap page, the maximal hole size is less than M, and M is stored in the page descriptor if the previous size value of the descriptor is greater than M. For the next allocation of an object of size >= M, we will skip this bitmap page. The page descriptor is reset when some object is deallocated within this bitmap page.
Some database objects (like hash table pages) should be aligned on the page boundary to provide more efficient access. The FastDB memory allocator checks the requested size. If it is aligned on page boundary, the address of the allocated memory segment is also aligned on page boundary. A search for a free hole will be done faster in this case, because FastDB increases the step of the current position increment according to the value of the alignment.
To be able to deallocate memory used by an object, FastDB needs to keep information about the object size somewhere. There are two ways of getting the object size in FastDB. All table records are prepended by a record header, which contains the record size and a (L2-list) pointer linking all records in the table. Thus the size of a table record object can be extracted from the record header. Internal database objects (bitmap pages, T-tree and hash table nodes) have known sizes and are allocated without any header. Instead, handles of such objects contain special markers, which make it possible to determine the class of the object and get its size from the table of builtin object sizes. It is possible to use markers because allocation is always done in quanta of 16 bytes, so the low 4 bits of an object handle are not used.
It is possible to create a database larger than 4Gb or containing more than
4Gb of objects, if you pass values greater than 32 bit in the compiler command line
for the dbDatabaseOffsetBits
or the
dbDatabaseOidBits
parameter.
In this case, FastDB will use an 8-byte integer type to
represent an object handle/object identifier. It will work only on truly
64-bit operating systems, like Digital Unix.
When an object is modified the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created object copy. The shadow index still contains a handle which points to the original version of the object. All changes are done with the object copy, leaving the original object unchanged. FastDB marks in a special bitmap the page of the object index which contains the modified object handle.
When a transaction is committed, FastDB first checks if the size of the object index was increased during the committed transaction. If so, it also reallocates the shadow copy of the object index. Then FastDB frees memory for all "old objects", i.e. objects which have been cloned within the transaction. Memory can not be deallocated before commit, because we want to preserve the consistent state of the database by keeping cloned objects unchanged. If we deallocated memory immediately after cloning, a new object could be allocated at the place of the cloned object, and we would lose consistency. As memory deallocation is done in FastDB by the bitmap, using the same transaction mechanism as for normal database objects, deallocation of object space requires clearing some bits in a bitmap page, which also should be cloned before modification. Cloning a bitmap page requires new space for allocation of the page copy, and we could reuse the space of deallocated objects. But this is not acceptable due to the reasons explained above: we would lose database consistency. That is why deallocation of an object is done in two steps. When an object is cloned, all bitmap pages used for marking the object's space are also cloned (if not cloned before). So when the transaction is committed, we only clear some bits in bitmap pages; no more requests for allocation of memory can be generated at this moment.
After deallocation of old copies, FastDB flushes all modified pages onto the disk to synchronize the contents of the memory and the contents of the disk file. After that, FastDB changes the current object index indicator in the database header to switch the roles of the object indices. The current object index becomes the shadow index and vice versa. Then FastDB again flushes the modified page (i.e. the page with the database header) onto the disk, transferring the database to a new consistent state. After that, FastDB copies all modified handles from the new object index to the object index which was previously shadow and now becomes current. At this moment, the contents of both indices are synchronized and FastDB is ready to start a new transaction.
The bitmap of the modified object index pages is used to minimize the duration of the transaction commit: not the whole object index, but only its modified pages need to be copied. After the transaction commit, the bitmap is cleared.
When a transaction is explicitly aborted by the dbDatabase::rollback
method, the shadow object index is copied back to the current index, eliminating
all changes done by the aborted transaction. After the end of copying,
both indices are identical again and the database state corresponds to the state
before the start of the aborted transaction.
Allocation of object handles is done by a free handle list. The header of the list is also shadowed and the two instances of the list headers are stored in the database header. A switch between them is done in the same way as between the object indices. When there are no more free elements in the list, FastDB allocates handles from the unused part of a new index. When there is no more space in the index, it is reallocated. The object index is the only entity in the database which is not cloned on modification. Instead of this, two copies of the object index are always used.
There are some predefined OID values in FastDB. OID 0 is reserved as an invalid object identifier. OID 1 is used as the identifier of the metatable object: the table containing descriptors of all other tables in the database. This table is automatically constructed during database initialization; descriptors of all registered application classes are stored in this metatable. OIDs starting from 2 are reserved for bitmap pages. The number of bitmap pages depends on the maximal virtual space of the database. For 32-bit handles, the maximal virtual space is 4Gb. The number of bitmap pages can be calculated as this size divided by the page size, divided by the allocation quantum size, divided by the number of bits in a byte. For a 4 Gb virtual space, a 4 Kb page size and a 16 byte allocation quantum, 8K bitmap pages are required. So 8K handles are reserved in the object index for bitmaps. Bitmap pages are allocated on demand, when the database size is extended. So the OID of the first user object will be 8194.
If the dirty flag is set in the database header when the database is opened, FastDB performs
database recovery. Recovery is very similar to the rollback of a transaction.
The indicator of the current index in the database object header is used to
determine the index corresponding to the consistent database state. Object handles
from this index are copied to another object index, eliminating
all changes done by uncommitted transactions. As the only action
performed by the recovery procedure is copying the object index (really only
handles having different values in the current and the shadow index are copied to
reduce the number of modified pages) and the size of the object index is small,
recovery can be done very fast.
The fast recovery procedure reduces the "out-of-service" time for
an application.
There is one hack used in FastDB to increase database performance.
All records in a table are linked in an L2-list (a doubly-linked list), allowing efficient
traversal through the list and insertion/removal of records.
The header of the list is stored in the table object (which is a record of the
Metatable). The L2-list pointers are
stored at the beginning of the object, together with the object size.
New records are always appended in FastDB at the end of the list.
To provide consistent inclusion into a database list, we would have to clone the last record
in the table as well as the table object itself. But if the record size is very big,
cloning the last record can cause significant space
and time overhead.
To eliminate this overhead, FastDB does not clone the last record but allows a temporary inconsistency of the list. What state will the list be in if a system fault happens before the transaction is committed? The consistent version of the table object will point to the record which was the last record in the previous consistent state of the database. But as this record was not cloned, it can contain a pointer to a next record which doesn't exist in this consistent database state. To fix this inconsistency, FastDB checks all tables in the database during the recovery procedure: if the last record in a table contains a non-NULL next reference, next is changed to NULL to restore consistency.
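The fix-up described above can be sketched as follows (a toy model; Record and fixLastRecord are illustrative names, not FastDB internals):

```cpp
#include <cassert>
#include <cstddef>

// A record carries an intrusive "next" link, as in the L2-list above.
struct Record {
    Record* next;
};

// During recovery: if the last record of the consistent state still points
// to a record appended by an uncommitted transaction, truncate the list.
void fixLastRecord(Record* last) {
    if (last != nullptr && last->next != nullptr) {
        last->next = nullptr; // that record does not exist in the consistent state
    }
}
```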
If the database file itself was corrupted on disk, the only way to recover the database
is to use a backup file (provided, of course, that you did not forget to make one).
A backup file can be made by the interactive SQL utility using the backup
command, or by the application using the dbDatabase::backup()
method.
Both create a snapshot of the database in a specified file (it can be the name of a
device, a tape for example). Since a database file is always in a consistent
state, the only action needed to perform recovery by means of the backup file
is to replace the original database file with the backup file.
If some application starts a transaction, locks the database and then crashes, the database is left in a locked state and no other application can access it. To recover from this situation, you should stop all applications working with the database and then restart them. The first application that opens the database will initialize the database monitor and perform recovery after this type of crash.
FastDB uses an extensible hash table with collision chains. The table is implemented as an array of object references with pointers to collision chains. Collision chain elements form an L1-list (a singly-linked list): each element contains a pointer to the next element, the hash function value and the OID of the associated record. Hash tables can be created for boolean, numeric and string fields.
To prevent the growth of collision chains, the size of the hash table is automatically increased when the table becomes full. In the current implementation, the hash table is extended when both of the following two conditions are true (for example, a hash table for a char field can never use more than 256 items, so there is no point in extending it beyond that). Each time the hash table is extended, its size is doubled. More precisely:
the hash table size is 2**n-1.
Using an odd or prime number for the hash table size improves the
quality of hashing, while the space for the hash table is still allocated
efficiently, aligned on a page boundary. If the hash table size were 2**n, we would always lose
the lowest n bits of the hash key. FastDB uses a very simple hash function which, despite its simplicity, provides good results (a uniform distribution of values within the hash table). The hash code is calculated from all bytes of the key value by the following formula:
h = h*31 + *key++;
The hash table index is the remainder of dividing the hash code by the hash table size.
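A sketch of this hashing scheme (hashCode and bucketIndex are illustrative names, not FastDB's actual identifiers):

```cpp
#include <cstddef>

// h = h*31 + next byte of the key, over all bytes of the key value.
unsigned hashCode(const void* key, size_t size) {
    const unsigned char* p = (const unsigned char*)key;
    unsigned h = 0;
    while (size-- != 0) {
        h = h * 31 + *p++;
    }
    return h;
}

// Bucket = hash code modulo table size. Because the table size is odd
// (2^n - 1), the low bits of the hash key are not simply discarded.
size_t bucketIndex(unsigned h, size_t tableSize) {
    return h % tableSize;
}
```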
Like AVL trees, the height of left and right subtrees of a T-tree may differ by at most one. Unlike AVL trees, each node in a T-tree stores multiple key values in a sorted order, rather than a single key value. The left-most and the right-most key value in a node define the range of key values contained in the node. Thus, the left subtree of a node contains only key values less than the left-most key value, while the right subtree contains key values greater than the right-most key value in the node. A key value which falls between the smallest and largest key value in a node is said to be bounded by that node. Note that keys equal to the smallest or largest key in the node may or may not be considered to be bounded based on whether the index is unique and based on the search condition (e.g. "greater-than" versus "greater-than or equal-to").
A node with both a left and a right child is referred to as an internal node, a node with only one child is referred to as a semi-leaf, and a node with no children is referred to as a leaf. In order to keep the occupancy high, every internal node must contain a minimum number of key values (typically k-2, if k is the maximum number of keys that can be stored in a node). However, there is no occupancy condition on the leaves or semi-leaves.
Searching for a key value in a T-tree is relatively straightforward. For every node, a check is made to see if the key value is bounded by the left-most and the right-most key value in the node; if this is the case, then the key value is returned if it is contained in the node (else, the key value is not contained in the tree). Otherwise, if the key value is less than the left-most key value, then the left child node is searched; else the right child node is searched. The process is repeated until either the key is found or the node to be searched is null.
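The search procedure can be sketched with a minimal T-tree node model (names are illustrative; nodes are assumed non-empty, with their key arrays kept sorted as described above):

```cpp
#include <algorithm>
#include <vector>

struct TTreeNode {
    std::vector<int> keys;       // sorted; front()..back() bound the node
    TTreeNode* left  = nullptr;  // subtree with keys < keys.front()
    TTreeNode* right = nullptr;  // subtree with keys > keys.back()
};

bool ttreeContains(const TTreeNode* node, int key) {
    while (node != nullptr) {
        if (key < node->keys.front()) {
            node = node->left;   // below this node's range
        } else if (key > node->keys.back()) {
            node = node->right;  // above this node's range
        } else {
            // key is bounded by this node: it is in this node or nowhere
            return std::binary_search(node->keys.begin(), node->keys.end(), key);
        }
    }
    return false;                // reached a null child: not in the tree
}
```

Note how, unlike a binary tree, each step compares against a whole range of keys, so the tree is much shallower for the same number of keys.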
Insertions and deletions into the T-tree are a bit more complicated. For insertions, first a variant of the search described above is used to find the node that bounds the key value to be inserted. If such a node exists, then if there is room in the node, the key value is inserted into the node. If there is no room in the node, then the key value is inserted into the node and the left-most key value in the node is inserted into the left subtree of the node (if the left subtree is empty, then a new node is allocated and the left-most key value is inserted into it). If no bounding node is found, then let N be the last node encountered by the failed search and proceed as follows: If N has room, the key value is inserted into N; else, it is inserted into a new node that is either the right or left child of N, depending on the key value and the left-most and right-most key values in N.
Deletion of a key value begins by determining the node containing the key value, and the key value is deleted from the node. If deleting the key value results in an empty leaf node, then the node is deleted. If the deletion results in an internal node or semi-leaf containing fewer than the minimum number of key values, then the deficit is made up by moving the largest key in the left subtree into the node, or by merging the node with its right child.
In both insert and delete, allocation/deallocation of a node may cause the tree to become unbalanced, and rotations (RR, RL, LL, LR) may be necessary. The heights of subtrees in the following description include the effects of the insert or delete operation. In the case of an insert, nodes along the path from the newly allocated node to the root are examined until a node is found whose subtrees' heights differ by more than one; a single rotation at that node is then sufficient to restore the balance of the tree.
In the case of delete, nodes along the path from the de-allocated node's parent to the root are examined until a node is found whose subtrees' heights now differ by one. Furthermore, every time a node whose subtrees' heights differ by more than one is encountered, a rotation is performed. Note that de-allocation of a node may result in multiple rotations.
The following rules in BNF-like notation specify the grammar of the SUBSQL directives:
directive ::=
select (*) from table-name select-condition ;
| insert into table-name values values-list ;
| create index on table-name.field-name ;
| create table table-name (field-descriptor {, field-descriptor}) ;
| alter table table-name (field-descriptor {, field-descriptor}) ;
| update table-name set field-name = expression {, field-name = expression} where condition ;
| drop index table-name.field-name ;
| drop table table-name
| open database-name ( database-file-name ) ;
| delete from table-name
| backup file-name
| start server server-URL number-of-threads
| stop server server-URL
| start http server server-URL
| stop http server server-URL
| export xml-file-name
| import xml-file-name
| commit
| rollback
| autocommit (on | off)
| exit
| show
| help
table-name ::= identifier
values-list ::= tuple { , tuple }
tuple ::= ( value { , value } )
value ::= number | string | true | false
| tuple
index ::= index | hash
field-descriptor ::= field-name field-type (inverse field-name)
field-name ::= identifier { . identifier }
database-name ::= string
database-file-name ::= string
xml-file-name ::= string
file-name ::= string
server-URL ::= 'HOST:PORT'
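For illustration, a hypothetical SUBSQL session following this grammar might look as below. The table and field names are invented, and the field-type keywords (string, int8) should be checked against your FastDB version; note that, per the grammar above, the search condition follows the table name directly, with no "where" keyword:

```
open 'testdb';
create table Employee (name string, salary int8);
insert into Employee values ('John', 1000), ('Mary', 2000);
create index on Employee.salary;
select * from Employee salary > 1500;
commit;
exit
```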
SUBSQL automatically commits a read-only transaction after each
select statement in order to release a shared database lock as soon as possible.
But all database modification operations should be explicitly committed
by a commit
statement or undone by a rollback
statement. open
opens a new database, whereas exit
closes
the open database (if one was opened) and so implicitly commits the last transaction.
If a database file name was not
specified in the open
statement, then a file name is constructed from
the database name by appending the ".fdb"
suffix.
The select
statement always prints all record fields. FastDB doesn't support
tuples: the result of the selection is always a set of objects (records).
The format of the select statement output is similar to the one accepted by the insert
statement (with the exception of reference fields). So it is possible to
export/import a database table without references by means of the
select/insert
directives of SUBSQL.
The select
statement prints references in the format
"#hexadecimal-number"
. But it is not possible to use this format
in the insert
statement. As object references are represented
in FastDB by internal object identifiers, a reference field can not be set in an
insert
statement (an object inserted into the database will
be assigned a new OID, so it does not make sense to specify a reference field
in the insert
statement). To ensure database reference consistency,
FastDB just ignores reference fields when new records are inserted into the table
with references. You should specify the value 0 at the place of reference fields.
If you omit the '*' symbol in the select statement, FastDB will output the object
identifier of each selected record.
It is mandatory to provide values for all record fields in an insert
statement; default values are not supported. Components of structures and
arrays should be enclosed in parentheses.
It is not possible to create or drop indices and tables while other
applications are working with the database. Such operations change
the database scheme: after such modifications the state of other applications
will become incorrect. But the delete
operation
doesn't change the database scheme. So it can be performed as a normal transaction,
when the database is concurrently used by several applications.
If SUBSQL hangs trying to execute some statement, then some other application
holds the lock on the database, preventing SUBSQL from accessing it.
It is possible to specify the mode in which SubSQL opens the database.
This can be done with the SUBSQL_ACCESS_TYPE
environment variable
or the -access
command line option. The following values can be specified:
Value | Mode | Description
---|---|---
normal | dbDatabase::dbAllAccess | Read-write access
read-only | dbDatabase::dbReadOnly | Read-only access
concurrent-update | dbDatabase::dbConcurrentUpdate | Used in conjunction with concurrent-read mode
concurrent-read | dbDatabase::dbConcurrentRead | Read-only mode used in conjunction with concurrent-update mode; reads can be executed concurrently with updates.
SubSQL can be used to start CLI and HTTP servers. A CLI server accepts connections from clients using the CLI protocol to access the database. Each client is served by a separate thread. SubSQL maintains a pool of threads, taking a thread from the pool when a client attaches and placing the thread back in the pool when the client disconnects.
The HTTP server is used to provide view-only access to the database from Web browsers.
Unlike CLI servers, only one instance of the HTTP server can be started.
To view the main database page, just type into your Web browser the URL which you
specified in the start http server
command.
A database can be exported as an XML file using the SubSQL export
command.
This command dumps the whole database in XML format to the specified file according
to the following rules:
- Each record is represented by an element with an id attribute which specifies the OID of the object.
- String values are enclosed in "". All characters greater than 'z', less than ' ', or equal to '%' or '"' are printed as two hex digits prefixed by '%' (space character = "%20").
- A reference is represented by a <ref id=OID/> element, where OID is the object identifier of the referenced object.
- Array components are represented by <array-element> elements.
- A rectangle is represented by a <rectangle> element with two <vertex> subelements containing a list of coordinate attributes.
The produced XML file can be used to export data to other programs or for flexible restoring of the database.
To be able to import data from an XML file you should have all tables created. In a C++ application it is enough
to open and close the database; all described tables will then be created. The SubSQL import
command will import data from the XML file, mapping the OIDs of records specified in the XML file to the OIDs of the created records.
Such an export/import sequence can be used to convert a database to a new format.
Let's say you have a database "YourOldDatabase" created by your application YourOldApplication, and you want to convert it to a new format so that it can be used by the new version of your application, YourNewApplication. First of all you need to initialize the new database: run YourNewApplication and open/close the database "YourNewDatabase" in it. Then prepare two SubSQL command files:
export.sql
:
open 'YourOldDatabase'; export '-' exit
import.sql
:
open 'YourNewDatabase'; import '-' exit
and run: subsql export.sql | subsql import.sql
The built-in HTTP server is able to handle two types of requests:
transferring an HTML file, found at a location relative to the current working directory,
in response to a GET HTTP request; and performing an action specified by GET or POST
requests with parameters. The built-in HTTP server provides persistent connections:
the server will not close the connection with a client immediately after sending the
response; instead, the connection is kept open for some specified
interval of time. This built-in server also supports concurrent request
processing by several threads of control, but the starting of threads should
be performed by the client application.
This is an example of a "navigation-only" application:
no queries are used in this application at all. All navigation between
records (objects) is done by means of references. Actually, this application
is more suitable for object-oriented databases, but it is included in FastDB as well.
*) doesn't include commit time
It would be nice if you could run this test on some other platforms and send me
the results. Note that for N = 1000000 you need
at least 128Mb of memory; otherwise you will be testing the performance of your disk.
To run BUGDB with an external WWW server you should first customize your
WWW server.
No configuration is needed when you are using the built-in HTTP server.
Just make sure that the user has enough permissions to access port number 80
(the default port for an HTTP server). If some HTTP server is already started on your
computer, you should either stop it or specify another port for the
built-in HTTP server. In the latter case you also need to specify the same port
in the settings of the WWW browser, to make it possible to establish a connection with the
right HTTP server. Also do not forget to specify the real name of your computer
in the ACTION field of the buglogin.htm file.
Like BUGDB, this Web database example can work either through the CGI interface with
some external HTTP server, or use the built-in HTTP server (the latter is significantly
more efficient than interaction through CGI scripts). See the
previous section for more information about configuring the
HTTP server. Do not forget to specify the real name of your computer
in the ACTION field of the clilogin.htm file.
To be able to print dbDateTime
values in SubSQL in readable form, you should
define the SUBSQL_DATE_FORMAT
environment variable (for example '%c'). To get more information about
date/time formats, see the documentation of the strftime
function. If SUBSQL_DATE_FORMAT
is not defined, the dbDateTime
structure will be printed as a normal structure
with an integer component. SubSQL doesn't currently allow conversion of a date from a string during insert or update operations.

API for development of Web applications
The new version of FastDB provides an API for developing WWW applications.
It is very easy to perform Web database publishing with FastDB.
A FastDB server can either communicate with a standard WWW server by
means of CGI requests, or it can serve HTTP requests itself.
Interaction with the Web server is based on a three-tier model:

Web Server -> CGI stub -> FastDB application
        CGI call       local socket connection
Using the FastDB built-in HTTP server provides maximum performance, because in
this case no communication or process-creation overhead takes place.
In both cases the same API for receiving and unpacking requests
and constructing responses is used, so the same application
can be used for interaction with an external Web server as well as a
stand-alone HTTP server. The FastDB application is a request-driven program, receiving data from
HTML forms and dynamically generating the resulting HTML page. The classes WWWapi
and WWWconnection
provide a simple and
convenient interface for getting HTTP requests, constructing HTML pages and
sending replies back to the WWW browser. The abstract class WWWapi
has two implementations: CGIapi
and HTTPapi
,
the first of which implements the protocol of interaction with a Web server by means of the
CGI mechanism, and the second the protocol of directly serving HTTP requests.
The virtual method WWWapi::connect(WWWconnection& con)
accepts a client connection (either from the CGISTUB program or from a WWW browser).
This method returns true
if a connection is established.
In this case the programmer should call
CGIapi::serve(WWWconnection& con)
to receive and handle the client's
request. This method returns false
if and only if the handler
of the request returns false
. Even if a request was not correctly
received or could not be handled, true
is returned by the
serve
method. The connection is always closed after returning from the
serve
method. It is possible to start a separate thread for the
execution of each serve
method. To construct a response to the request, special overloaded <<
output operators are provided in the WWWconnection
class. The first line of the
response should specify the type of the response body, for example:
Content-type: text/html\r\n\r\n
Two CR-LF characters after this line separate the HTTP header from the body.
Three encoding schemes can be used for constructing the response body:
To make switching between encodings more convenient, the WWWconnection
class performs automatic switching between encodings. Initially the TAG
encoding is always used. Then encodings are implicitly changed using the
following rules:
TAG -> HTML
HTML -> TAG
URL -> TAG
It is certainly possible to explicitly specify the encoding for the next output
operation by means of a special <<
operator, which accepts
one of the following constants: TAG, HTML, URL
. Information about HTML form slot values and request parameters can be obtained
using the WWWconnection::get(char const* name, int n = 0)
method.
The optional second parameter is used only for getting the values of selectors with
the multiple-selection option. If a parameter with the given name is not found,
NULL
is returned. There are some mandatory parameters
which should always be present in all forms handled by FastDB:
Parameter name | Parameter Description
---|---
socket | address of the server, used for constructing new links
page | symbolic name of the page, used for request dispatching
stub | name of the CGI stub program (always defined by the API)

Examples of FastDB applications
FastDB is shipped with some examples of database applications which
can help you create your own applications.

Example: game "Guess an animal"

"Guess an animal" is a very simple program which uses a database to
store the game tree. Despite its very simple algorithm, this program shows some elements of "artificial intelligence": the more information you provide to this
game, the smarter its behavior will be.
Example: various types of queries
This application illustrates the usage of various types of database queries
and the advantages of using inverse references. A classical Detail/Contract/Supplier
database schema is used in this example. Compare the number of lines in this
file with the number of lines needed to implement this example using ODBC or
other RDBMS C API.

Performance test
This test produces some figures on FastDB performance for
all basic operations. The test inserts N records into a table,
then performs searches using a T-tree and a hash table, then a sequential search and
a sequential search with sorting. It is possible to specify a level
of query parallelization in the command line. By default no parallelization
is used. The following table contains results for some systems and
N = 1000000. Values in the table rows specify the number of milliseconds
consumed by the operation which is calculated by dividing the time returned
by the testperf
program by the number of iterations.
System | Number of CPUs | Number of threads | Insertion*) | Hash table search | T-tree search | Sequential search | Sequential search with sorting
---|---|---|---|---|---|---|---
Pentium-II 300, 128 Mb RAM, Windows NT | 1 | 1 | 0.056 | 0.015 | 0.041 | 1 400 | 25 000
Pentium-II 333, 512 Mb RAM, Linux | 1 | 1 | 0.052 | 0.016 | 0.045 | 1 600 | 33 000
Pentium-Pro 200, 128 Mb RAM, Windows NT | 2 | 1 | 0.071 | 0.023 | 0.052 | 1 600 | 35 000
Pentium-Pro 200, 128 Mb RAM, Windows NT | 2 | 2 | 0.071 | 0.023 | 0.052 | 1 800 | 23 000
AlphaServer 2100, 250 Mhz, 512 Mb RAM, Digital Unix | 2 | 1 | 0.250 | 0.031 | 0.084 | 2 600 | 42 000
AlphaServer 2100, 250 Mhz, 512 Mb RAM, Digital Unix | 2 | 2 | 0.250 | 0.031 | 0.084 | 1 600 | 23 000
AlphaStation, 500 Mhz, 256 Mb RAM, Digital Unix | 2 | 1 | 0.128 | 0.010 | 0.039 | 1 300 | 36 000

Bug tracking database
The example "Bug tracking database" illustrates developing a Web application
using FastDB and the WWW API. It can be used either with any WWW server
(for example Apache or Microsoft Personal Web Server) or with the
built-in HTTP server. To compile BUGDB for interaction with an external server,
define the macro USE_EXTERNAL_HTTP_SERVER
.
The database can be accessed from any computer running a WWW browser. To build the
bugdb
application in Unix you should pass the www
target to the make utility. The WWW server should be able to access the buglogin.htm
file and run the
CGI script cgistub
. Also, the user under which CGI scripts are
executed should have enough permissions to establish a connection with the
FastDB application (via sockets). It is better to run the FastDB application and the
FastDB CGI scripts under the same user. For example, I have changed the
following variables in the Apache configuration files:
httpd.conf:
User konst
Group users
access.conf:
<Directory /usr/konst/fastdb>
Options All
AllowOverride All
allow from all
</Directory>
DocumentRoot /usr/konst/fastdb
srm.conf:
ScriptAlias /cgi-bin/ /usr/konst/fastdb/
It is also possible not to change the configuration of the WWW server, but instead to place the
cgistub
and bugdb
programs in the standard CGI
script directory and change, in the file buglogin.htm
, the path to
the cgistub
program.
After preparing the configuration files you should start the WWW server. After starting the bugdb
application itself, you can visit the
buglogin.htm
page in a WWW browser and start to work with the
BUGDB database. When the database is initialized, an "administrator" user is
created in it. The first time, you should log in as administrator using an
empty password. Then you can create some other users/engineers and
change the password. BUGDB doesn't use a secure protocol for passing passwords and
doesn't worry much about restricting users' access to the database.
So if you are going to use BUGDB in real life, you should first
think about protecting the database from unauthorized access.

Clients-Managers database
This is yet another example of Web database publishing. This database
contains information about clients and managers. Each client and each manager belongs to
some segment (department). Managers can make remarks on the status of clients from the same
segment the manager belongs to. Some managers have mini-administrator permissions and
are allowed to edit/remove reports made by other managers. Information about
segments and managers is maintained by the database administrator. After starting the clidb
application itself, you can visit the
clilogin.htm
page in a WWW browser and start to work with the
CLIDB database. When the database is initialized, an "administrator" user is
created in it. This database allows an IP address to be specified for a manager, from
which that manager can log in to the system. So user authentication is based on
the name and the host computer's IP address. The value '*' allows the user to log in from any host.
If you specify wrong IP address for the administrator and so are not able to login to
the database, you can invoke clidb as
clidb
CLIDB can also be considered an example of a multithreaded Web database server.
There is a special class QueueManager which can be used to maintain a pool of threads
and distribute user HTTP requests among these threads. If you recompile the CLIDB
application with the USE_QUEUE_MANAGER option set,
then CLIDB will spawn 8 threads for handling HTTP requests. In this case persistent
connections with clients' WWW browsers are established (so there is no
extra overhead of establishing a connection for each HTTP request).

Quick start
When you are developing an application for FastDB, you should first decide
which data and which classes you want to store in the database. Then you
should describe the format of the database tables. Section
Table describes how to create type and table descriptors.
Do not forget to register table descriptors by the REGISTER
macro
(it should be done in some implementation module). If you are going
to redefine the default FastDB error handler (for example, if you want to use
a message window for reporting instead of stderr
), you should
define your own database class and derive it from dbDatabase
.
You should create an instance of the database class and make it accessible to
all application modules. Before you can do anything with your database, you should
open it. By checking the dbDatabase::open()
return code, you can
find out whether the database was successfully opened. Errors during database
opening do not terminate the application (but they are reported),
even with the default error handler. Once you are certain that the database was
opened normally, you can start to work with it. If your application is multithreaded
and several threads work with the same database, you should attach each thread to the
database by the dbDatabase::attach
method. Before thread termination,
it should detach itself from the database by invoking the
dbDatabase::detach()
method. If your application uses navigation
through database objects by references, you need some kind of root object
which can be located without any references. The best candidate for the root
object is the first record of the table. FastDB guarantees that new
records are always inserted at the end of the table. So the first table record
is also the oldest record in the table.dbQuery
and dbCursor
objects. If several threads are working with the
database, each thread should have its own instances of query and
cursor objects. Usually it is enough to have one cursor for each table
(or two if your application also can update table records). But in case
of nested queries, using several cursors may be needed.
Query objects are usually created for each type of queries. Query objects are
used also for caching compiled queries, so it will be a good idea to
extend the life span of query variables (maybe make them static). There are four main operations with a database: insert, select, update and remove.
The first is done without using cursors, by means of the global overloaded
template function insert
. Selection, updating and deleting of
records is performed using cursors. To be able to modify a table you should
use a cursor for update. A cursor in FastDB is typed and contains an instance
of an object of the table class. The overloaded 'operator->
'
of the cursor can be used to access components of the current record
and also to update these components. The method update
copies data from the cursor's object to the current table record.
The cursor's method remove
will remove the current cursor record,
the method removeAllSelected
will remove all selected records and
the method removeAll
will remove all records in the table.
Each transaction should be either committed by
the dbDatabase::commit()
or aborted by
the dbDatabase::rollback()
method. A transaction is started
automatically when the first select, insert or remove operation is executed. Before exiting your application, do not forget to close the database.
Also remember that the method dbDatabase::close()
will automatically
commit the last transaction, so if this is not what you want, then explicitly perform
a dbDatabase::rollback
before exit. So a template for a FastDB application can look like this:
//
// Header file
//
#include "fastdb.h"
extern dbDatabase db; // the database object (defined in an implementation module)
class MyTable {
char const* someField;
...
public:
TYPE_DESCRIPTOR((FIELD(someField)));
};
//
// Implementation
//
dbDatabase db; // definition of the database object declared in the header
REGISTER(MyTable);
int main()
{
if (db.open("mydatabase")) {
dbCursor<MyTable> cursor;
dbQuery q;
const size_t bufSize = 64;
char value[bufSize];
q = "someField=",value; // the parameter is bound by reference
scanf("%63s", value); // read the search value (bufSize-1 characters max)
if (cursor.select(q) > 0) {
do {
printf("%s\n", cursor->someField);
} while (cursor.next());
}
db.close();
return EXIT_SUCCESS;
} else {
return EXIT_FAILURE;
}
}
To compile a FastDB application you have to include the header file
"fastdb.h"
. This header file includes other FastDB header files,
so make sure that the FastDB directory is in the compiler's include path. To
link a FastDB application, you need the FastDB library ("fastdb.lib"
for Windows or "libfastdb.a"
for Unix). You can either
specify the full path to this library or place it in some default
library catalog (for example /usr/lib
for Unix). To build the FastDB library, just type make
in the FastDB directory.
There is no autoconfiguration utility included
in the FastDB distribution. Most system dependent parts of the code are compiled using
conditional compilation. There are two makefiles in the FastDB distribution.
One for MS Windows with MS Visual C++ (makefile.mvc
)
and another one for generic Unix with the gcc compiler (makefile
).
If you want to use Posix threads or some other compiler, you
should edit this makefile.
There is also a make.bat
file, which just spawns a
nmake -f makefile.mvc
command.
The install
target in the
Unix makefile will copy FastDB header files, the FastDB library and the subsql utility
to the directories specified by the INCSPATH, LIBSPATH
and BINSPATH
variables, whose default values are:
INCSPATH=/usr/include LIBSPATH=/usr/lib BINSPATH=/usr/bin
Once your application starts to work, you will be busy with
support and extension of your application. FastDB is able to perform
automatic schema evolution for such cases as adding a new field to a table or
changing the type of a field. The programmer can also add new indices or remove
rarely used ones. The database trace can be switched on (by (re-)compiling the
FastDB library with the -DDEBUG=DEBUG_TRACE
option) to
analyze database functionality and the efficiency of index usage.
The SUBSQL utility can be used for database browsing and inspection, performing online backups, importing data to and exporting data from the database. FastDB will perform automatic recovery after system or application crash; you should not worry about it. The only thing you perhaps have to do manually is stopping all database applications if one of them crashes, leaving the database blocked.
FastDB allows queries to be declared as static variables.
But in this case the destructors of these queries may be called after the destructors of the static
objects in the FastDB library itself (because the order of invoking destructors in different modules is
not specified). To solve this problem, some global objects in FastDB are replaced with references.
As a result, destructors are never called for these objects, which leads to memory leaks
in the application. These memory leaks cannot actually cause problems (like memory exhaustion),
but they trigger warnings in some memory-checking utilities, and many C++ programmers like to use such utilities
to check that all memory is correctly deallocated. To avoid these warnings for FastDB objects, you can
register the static method dbDatabase::cleanup
with the atexit
system function;
it will delete all objects allocated by FastDB.
The initial size of the database is determined by the dbDefaultInitDatabaseSize
parameter. Reducing the initial database size only causes additional database
reallocations when the database grows. Reallocation is an expensive operation (it requires
unmapping the file and mapping it to a new location; as a result the modified pages are
first saved to disk and then have to be reloaded from disk), so reducing the number
of reallocations is significant. FastDB doubles the virtual memory size used by the
database during each reallocation.
By changing dbDefaultInitDatabaseSize
you cannot make the database
file smaller than 1Mb. This is because of another parameter,
dbDefaultInitIndexSize
, which specifies the size of the object index.
Its default value is 64K entries; each entry is 4 bytes long and there are two instances of the index -
primary and shadow. So the index alone occupies 512Kb of memory. Another 512Kb is reserved for
storing data. The initial size of the database can therefore be reduced by decreasing the
dbDefaultInitIndexSize
parameter. But I do not recommend doing so, because
it is unlikely that there will be fewer than several thousand objects in the database.
It is important to notice that, unlike in normal mode, FastDB is not able to extend the database in diskless
mode. To reallocate a memory-mapped object (to extend its size), FastDB would have
to create a temporary buffer (in memory) large enough to hold all database data, copy all data to this buffer,
destroy the original memory mapping object, create a new memory mapping object, copy the data from the buffer to the
new memory mapping object and then destroy the buffer. So if the database size is N
,
reallocation requires 3*N
of free memory and copies all database data twice,
which is too large a memory and CPU overhead.
So in diskless mode it is necessary to specify the maximal database size in the constructor of
dbDatabase
(the dbInitSize
parameter). Specifying a large size will not cause
any significant system overhead, because physical pages are in any case allocated on demand.
So it is easy to avoid exhaustion of the database size by specifying a very large
value for the dbInitSize
parameter. The only restriction is that the specified size should not
be greater than the size of virtual memory in the system (usually the size of swap plus physical memory).
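Passing the maximal size can be sketched as follows. The assumption here is that the dbDatabase constructor accepts the access type followed by the dbInitSize value; check fastdb.h for the exact parameter list of your FastDB version:

```cpp
#include "fastdb.h"

// Reserve 1Gb of virtual address space up front (diskless mode cannot
// extend the database later). Physical pages are still allocated on
// demand, so a generous value costs little.
const size_t initSize = size_t(1024) * 1024 * 1024;

dbDatabase db(dbDatabase::dbAllAccess, initSize);
```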
In addition to the DISKLESS_CONFIGURATION switch, FastDB also provides additional
switches to control the OS API used: USE_POSIX_MMAP, USE_POSIX_SEMAPHORES and NO_MMAP
.
Use them only if the mechanism used by FastDB by default doesn't work (or works inefficiently)
on your system. Some of these switches are enabled implicitly once you have
chosen a particular configuration. The following table summarizes the usage of these switches:
Set macros | Implies | Data file | Monitor | Description |
---|---|---|---|---|
USE_POSIX_MMAP=0 | NO_MMAP=1 | shmat | shmat | normal access to database without mmap |
USE_POSIX_MMAP=1 | | mmap | mmap | private process database |
NO_MMAP | | malloc | shmat | normal access |
DISKLESS_CONFIGURATION | USE_POSIX_MMAP=0 | shmat | shmat | transient public database |
DISKLESS_CONFIGURATION, USE_POSIX_MMAP=1 | | mmap | mmap | transient private database |
All combinations not defined in this table are considered invalid. It is also recommended not to set these macros explicitly, but to use the configuration switches (like DISKLESS_CONFIGURATION) instead.
To work with several databases in one process, table classes should be registered with the
REGISTER_UNASSIGNED
macro, and the database should be passed to the
dbCursor
constructor.
insert
is now a member of the dbDatabase class (if your
C++ compiler doesn't support member templates, you should define the
NO_MEMBER_TEMPLATES
macro and pass a pointer to the database as the first parameter).
For example:
class MyClass {
    ...
};
REGISTER_UNASSIGNED(MyClass);

dbDatabase* db;
dbCursor<MyClass> cursor(db);
MyClass rec;
if (cursor.select(q) == 0) {
    db->insert(rec);
}
I will provide e-mail support and help you with the development of FastDB applications.
Look for new versions at my homepage | E-mail me about bugs and problems