Introduction

FastDB is a highly efficient main-memory database system with real-time capabilities and a convenient C++ interface. FastDB doesn't support a client-server architecture, and all applications using a FastDB database must run on the same host. FastDB is optimized for applications with a predominantly read access pattern. High query execution speed is achieved by eliminating data transfer overhead and by a very effective locking implementation. The database file is mapped into the virtual memory space of each application working with the database, so a query is executed in the context of the application and requires no context switching or data transfer. Synchronization of concurrent database access is implemented in FastDB by means of atomic instructions, adding almost no overhead to query processing. FastDB assumes that the whole database is present in RAM and optimizes the search algorithms and structures according to this assumption. Moreover, FastDB has no overhead caused by database buffer management and needs no data transfer between a database file and a buffer pool. That is why FastDB will work significantly faster than a traditional database with all data cached in the buffer pool.

FastDB supports transactions, online backup and automatic recovery after a system crash. The transaction commit protocol is based on a shadow root pages algorithm, performing an atomic update of the database. Recovery can be done very fast, providing high availability for critical applications. Moreover, the elimination of transaction logs improves the total system performance and leads to a more effective usage of system resources.

FastDB is an application-oriented database. Database tables are constructed using information about application classes. FastDB supports automatic schema evolution, allowing you to make changes in only one place: your application classes. FastDB provides a flexible and convenient interface for retrieving data from the database. A SQL-like query language is used to specify queries. Such post-relational capabilities as non-atomic fields, nested arrays, user-defined types and methods, and direct interobject references simplify the design of database applications and make them more efficient.

Although FastDB is optimized under the assumption that the database as a whole fits into the physical memory of the computer, it is also possible to use it with databases whose size exceeds the size of the physical memory in the system. In that case, standard operating system swapping mechanisms will work. But all FastDB search algorithms and structures are optimized under the assumption that all data resides in memory, so the efficiency for swapped-out data will not be very high.

Query language

FastDB supports a query language with SQL-like syntax. FastDB uses a notation more popular for object-oriented programming than for a relational database. Table rows are considered as object instances, and the table is the class of these objects. Unlike SQL, FastDB is oriented toward working with objects instead of SQL tuples. So the result of each query execution is a set of objects of one class. The main differences of the FastDB query language from standard SQL are:

  1. There are no joins of several tables and nested subqueries. The query always returns a set of objects from one table.
  2. Standard C types are used for atomic table columns.
  3. There are no NULL values, except null references. I completely agree with C.J. Date's criticism of three-value logic and his proposal to use default values instead.
  4. Structures and arrays can be used as record components. A special exists quantifier is provided for locating elements in arrays.
  5. Parameterless user methods can be defined for table records (objects) as well as for record components.
  6. User functions with (only) one single string or numeric argument can be defined by the application.
  7. References between objects are supported including automatic support for inverse references.
  8. The construction start from ... follow by ... performs a recursive traversal of records using references.
  9. Because the query language is deeply integrated into C++ classes, a case sensitive mode is used for language identifiers as well as for keywords.
  10. No implicit conversion of integer and floating types to string representation is done. If such conversion is needed, it must be done explicitly.

The following rules in BNF-like notation specify the grammar of the FastDB query language search predicates:

Grammar conventions
Example       Meaning
expression    non-terminals
not           terminals
|             disjoint alternatives
(not)         optional part
{1..9}        repeat zero or more times

select-condition ::= ( expression ) ( traverse ) ( order )
expression ::= disjunction
disjunction ::= conjunction 
        | conjunction or disjunction
conjunction ::= comparison 
        | comparison and conjunction
comparison ::= operand = operand 
        | operand != operand 
        | operand <> operand 
        | operand < operand 
        | operand <= operand 
        | operand > operand 
        | operand >= operand 
        | operand (not) like operand 
        | operand (not) like operand escape string
        | operand (not) match operand
        | operand (not) in operand
        | operand (not) in expressions-list
        | operand (not) between operand and operand
	| operand is (not) null
operand ::= addition
addition ::= multiplication 
        | addition +  multiplication
        | addition || multiplication
        | addition -  multiplication
multiplication ::= power 
        | multiplication * power
        | multiplication / power
power ::= term
        | term ^ power
term ::= identifier | number | string 
        | true | false | null 
	| current | first | last
	| ( expression ) 
        | not comparison
	| - term
	| term [ expression ] 
	| identifier . term 
	| function term
        | exists identifier : term
function ::= abs | length | lower | upper
        | integer | real | string | user-function
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= [length] field (asc | desc)
field ::= identifier { . identifier }
traverse ::= start from field ( follow by fields-list )
fields-list ::=  field { , field }
user-function ::= identifier

Identifiers are case sensitive, begin with an a-z, A-Z, '_' or '$' character, contain only a-z, A-Z, 0-9, '_' or '$' characters, and must not duplicate an SQL reserved word.

List of reserved words
abs       and       asc       between   by
current   desc      escape    exists    false
first     follow    from      in        integer
is        length    like      last      lower
match     not       null      or        real
start     string    true      upper

ANSI-standard comments may also be used. All characters after a double-hyphen up to the end of the line are ignored.

FastDB extends ANSI standard SQL operations by supporting bit manipulation operations. The operators and/or can be applied not only to boolean operands but also to operands of integer type. The result of applying the and/or operator to integer operands is an integer value with bits set by the bit-AND/bit-OR operation. Bit operations can be used for efficient implementation of small sets. FastDB also supports the raising to a power operation (x^y) for integer and floating point types.
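
To make the small-set idiom concrete, here is a minimal C++ sketch of the arithmetic that integer and/or would perform. It is not FastDB code: the Color flags and the helper names are hypothetical and serve only to show how a set with a few members can be packed into one integer field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical set of colors, one bit per member.
enum Color : std::int32_t { Red = 1, Green = 2, Blue = 4 };

// Union of two small sets: what integer "or" computes bit by bit.
inline std::int32_t set_union(std::int32_t a, std::int32_t b) { return a | b; }

// Intersection of two small sets: what integer "and" computes bit by bit.
inline std::int32_t set_intersection(std::int32_t a, std::int32_t b) { return a & b; }

// Membership test: the intersection with a one-element set is non-empty.
inline bool contains(std::int32_t set, Color c) { return (set & c) != 0; }
```

With this encoding, a membership check is a single bit-AND followed by a comparison with zero, which is why the approach suits sets with a small, fixed number of possible members.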

Structures

FastDB accepts structures as components of records. Fields of the structure can be accessed using the standard dot notation: company.address.city

Structure fields can be indexed and used in an order by specification. Structures can contain other structures as their components; there are no limitations on the nesting level.

The programmer can define methods for structures, which can be used in queries with the same syntax as normal structure components. Such a method should have no arguments except a pointer to the object to which it belongs (the this pointer in C++), and should return an atomic value (of boolean, numeric, string or reference type). Also the method should not change the object instance (immutable method). If the method returns a string, this string should be allocated using the new char operator, because it will be deleted after copying of its value.

So user-defined methods can be used for the creation of virtual components - components which are not stored in the database, but instead are calculated using values of other components. For example, the FastDB dbDateTime type contains only integer timestamp components and such methods as dbDateTime::year(), dbDateTime::month()... So it is possible to specify queries like: "delivery.year = 1999" in an application, where the delivery record field has dbDateTime type. Methods are executed in the context of the application, where they are defined, and are not available to other applications and interactive SQL.
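
The idea of a virtual component can be sketched in plain C++ as follows. This is a simplified stand-in, not the real dbDateTime class: the name DateTimeSketch is hypothetical, the FastDB descriptor macros are omitted, and gmtime is used (an assumption made here so that the result does not depend on the local time zone).

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical stand-in for a timestamp type: only "stamp" would be stored,
// while year() and month() are computed from it on demand.
struct DateTimeSketch {
    std::int32_t stamp;   // seconds since 1970-01-01 00:00:00 UTC

    // Parameterless, non-modifying method returning an atomic value.
    int year() const {
        std::time_t t = static_cast<std::time_t>(stamp);
        return std::gmtime(&t)->tm_year + 1900;
    }

    // Month in the range 1..12.
    int month() const {
        std::time_t t = static_cast<std::time_t>(stamp);
        return std::gmtime(&t)->tm_mon + 1;
    }
};
```

Both methods take no arguments, do not modify the object and return an integer, which are the properties the text above requires of a method that is to be used in a query.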

Arrays

FastDB accepts arrays with dynamic length as components of records. Multidimensional arrays are not supported, but it is possible to define an array of arrays. It is possible to sort records in the result set by the length of an array field. FastDB provides a set of special constructions for dealing with arrays:

  1. It is possible to get the number of elements in the array by the length() function.
  2. Array elements can be fetched by the [] operator. If an index expression is out of array range, an exception will be raised.
  3. The operator in can be used to check if an array contains a value specified by the left operand. This operation can be used only for arrays of atomic type: with boolean, numeric, reference or string components.
  4. An array can be updated using the update method, which creates a copy of the array and returns a non-constant reference.
  5. Iteration through array elements is performed by the exists operator. A variable specified after the exists keyword can be used as an index in arrays in the expression preceded by the exists quantifier. This index variable will iterate through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition
            exists i: (contract[i].company.location = 'US')
    
    will select all details which are shipped by companies located in 'US', while the query
            not exists i: (contract[i].company.location = 'US')
    
    will select all details which are shipped from companies outside 'US'.

    Nested exists clauses are allowed. Using nested exists quantifiers is equivalent to nested loops using the corresponding index variables. For example the query

            exists column: (exists row: (matrix[column][row] = 0))
    
    will select all records, containing 0 in elements of a matrix field, which has type array of array of integer. This construction is equivalent to the following two nested loops:
           bool result = false;
           for (int column = 0; column < matrix.length(); column++) {
               for (int row = 0; row < matrix[column].length(); row++) {
                   if (matrix[column][row] == 0) {
                       result = true;
                       break;
                   }
               }
           }
    
    The order of using indices is essential! The result of the following query execution
            exists row: (exists column: (matrix[column][row] = 0))
    
    will be completely different from the result of the previous query. In the latter case, the program simply hangs in an infinite loop if the matrix is empty.

Strings

All strings in FastDB have varying length, and the programmer need not worry about specifying a maximal length for character fields. All operations acceptable for arrays are also applicable to strings. In addition, strings have their own set of operations. First of all, strings can be compared with each other using the standard relational operators. At present, FastDB supports only the ASCII character set (corresponding to type char in C) and byte-by-byte comparison of strings, ignoring locale settings.

The operator like can be used for matching a string with a pattern containing special wildcard characters '%' and '_'. The character '_' matches any single character, while the character '%' matches zero or more characters. An extended form of the like operator together with the escape keyword can be used to handle the characters '%' and '_' in the pattern as normal characters if they are preceded by a special escape character, specified after the escape keyword.
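
The wildcard and escape rules described above can be illustrated with a short, self-contained matcher. This is only a sketch of the semantics, not the FastDB implementation; the function name like_match and its signature are hypothetical, and a simple recursive strategy is assumed for handling '%'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if text matches pattern, where '%' matches zero or more
// characters and '_' matches exactly one character. If escape is non-zero,
// the pattern character that follows it is compared literally.
inline bool like_match(const std::string& text, const std::string& pattern,
                       char escape = '\0', std::size_t t = 0, std::size_t p = 0) {
    while (p < pattern.size()) {
        char c = pattern[p];
        if (escape != '\0' && c == escape && p + 1 < pattern.size()) {
            // Escaped character: '%' or '_' loses its special meaning.
            if (t >= text.size() || text[t] != pattern[p + 1]) return false;
            t += 1;
            p += 2;
        } else if (c == '%') {
            // Try every possible length for the run matched by '%'.
            for (std::size_t k = t; k <= text.size(); ++k) {
                if (like_match(text, pattern, escape, k, p + 1)) return true;
            }
            return false;
        } else if (c == '_') {
            if (t >= text.size()) return false;   // '_' needs one character
            t += 1;
            p += 1;
        } else {
            if (t >= text.size() || text[t] != c) return false;
            t += 1;
            p += 1;
        }
    }
    return t == text.size();   // the whole text must be consumed
}
```

For instance, with '\' as the escape character the pattern 100\% matches only the literal text 100%, whereas the unescaped pattern 100% matches any string beginning with 100.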

If you rebuild FastDB with the USE_REGEX macro defined, you can use the match operator, which implements standard regular expressions (based on the GNU regex library). The second operand of this operator specifies the regular expression to be matched and should be a string literal.

It is possible to search for substrings within a string using the in operator. The expression ('blue' in color) will be true for all records whose color field contains 'blue'. If the length of the searched string is greater than some threshold value (currently 512), a Boyer-Moore substring search algorithm is used instead of a straightforward search implementation.

Strings can be concatenated by the + or || operators. The latter was added for compatibility with the ANSI SQL standard. Since FastDB doesn't support implicit conversion to string type in expressions, the semantics of the operator + can be redefined for strings.

References

References can be dereferenced using the same dot notation as used for accessing structure components. For example the following query
        company.address.city = 'Chicago'
will access records referenced by the company component of a Contract record and extract the city component of the address field of the referenced record from the Supplier table.

References can be checked for null by is null or is not null predicates. Also references can be compared for equality with each other as well as with the special null keyword. When a null reference is dereferenced, an exception is raised by FastDB.

There is a special keyword current, which can be used during a table search to refer to the current record. Usually, the current keyword is used for comparing the current record identifier with other references or for locating it within an array of references. For example, the following query will search in the Contract table for all active contracts (assuming that the field canceledContracts has a dbArray< dbReference<Contract> > type):

        current not in supplier.canceledContracts

FastDB provides special operators for recursive traversal of records by references:

     start from root-references
     ( follow by list-of-reference-fields )
The first part of this construction is used to specify root objects. The nonterminal root-references should be a variable of reference or of array of reference type. The two special keywords first and last can be used here, locating the first/last record in the table correspondingly. If you want to check all records referenced by an array of references or a single reference field for some condition, then this construction can be used without the follow by part.

If you specify the follow by part, then FastDB will recursively traverse the table of records, starting from the root references and using the list-of-reference-fields for transition between records. The list-of-reference-fields should consist of fields of reference or of array of reference type. The traversal is done in depth-first top-left-right order (first we visit the parent node and then its children in left-to-right order). The recursion terminates when a null reference is accessed or an already visited record is referenced. For example the following query will search a tree for records with weight larger than 1 in TLR order:

        "weight > 1 start from first follow by left, right"

For the following tree:

                              A:1.1
              B:2.0                             C:1.5
      D:1.3         E:1.8                F:1.2         G:0.8
the result of the query execution will be:
('A', 1.1), ('B', 2.0), ('D', 1.3), ('E', 1.8), ('C', 1.5), ('F', 1.2)
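
The traversal order in this example can be reproduced with a small in-memory sketch. It is not FastDB code: Node, traverse and run_example are hypothetical names, plain pointers stand in for database references, and it is assumed that records failing the predicate are still traversed (the example result is the same either way).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a record with two reference fields.
struct Node {
    std::string name;
    double      weight;
    const Node* left;
    const Node* right;
};

// Visit the node first, then follow "left" and "right" in that order.
// Recursion stops at a null reference or at an already visited record.
inline void traverse(const Node* n, std::set<const Node*>& visited,
                     std::vector<std::string>& out) {
    if (n == nullptr || !visited.insert(n).second) return;
    if (n->weight > 1) out.push_back(n->name);   // the "weight > 1" predicate
    traverse(n->left, visited, out);
    traverse(n->right, visited, out);
}

// Build the example tree shown above and traverse it from the root A.
inline std::vector<std::string> run_example() {
    Node d{"D", 1.3, nullptr, nullptr}, e{"E", 1.8, nullptr, nullptr};
    Node f{"F", 1.2, nullptr, nullptr}, g{"G", 0.8, nullptr, nullptr};
    Node b{"B", 2.0, &d, &e}, c{"C", 1.5, &f, &g};
    Node a{"A", 1.1, &b, &c};
    std::set<const Node*> visited;
    std::vector<std::string> out;
    traverse(&a, visited, out);
    return out;
}
```

Calling run_example() yields the names A, B, D, E, C, F in that order; G is visited but filtered out because its weight is 0.8.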

As was already mentioned, FastDB always manipulates objects and doesn't accept joins. Joins can be implemented using references. Consider the classical Supplier-Shipment-Detail example:

struct Detail { 
    char const* name;
    double      weight;
    
    TYPE_DESCRIPTOR((KEY(name, INDEXED), FIELD(weight)));
};

struct Supplier { 
    char const* company;
    char const* address;

    TYPE_DESCRIPTOR((KEY(company, INDEXED), FIELD(address)));
};

struct Shipment { 
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;
    int4                  price;
    int4                  quantity;
    dbDateTime            delivery;

    TYPE_DESCRIPTOR((KEY(detail, HASHED), KEY(supplier, HASHED), 
		     FIELD(price), FIELD(quantity), FIELD(delivery)));
};
We want to get information about the delivery of some concrete details from some concrete suppliers. In a relational database this query would be written something like this:
     select from Supplier,Shipment,Detail where 
                 Supplier.SID = Shipment.SID and Shipment.DID = Detail.DID 
		 and Supplier.company like ? and Supplier.address like ?
		 and Detail.name like ? 
In FastDB this request should be written as:
     dbQuery q = "detail.name like",name,"and supplier.company like",company,
	         "and supplier.address like",address,"order by price";
FastDB will first perform an index search in the table Detail for details matching the search condition. Then it performs another index search to locate shipment records referencing the selected details. Then a sequential search is used to check the rest of the select predicate.

Functions

Predefined functions
Name      Argument type   Return type   Description
abs       integer         integer       absolute value of the argument
abs       real            real          absolute value of the argument
integer   real            integer       conversion of real to integer
length    array           integer       number of elements in array
lower     string          string        lowercase string
real      integer         real          conversion of integer to real
string    integer         string        conversion of integer to string
string    real            string        conversion of real to string
upper     string          string        uppercase string

FastDB allows users to define their own functions and operators. A function should have at least one but no more than 3 parameters of string, integer, boolean, reference or user-defined (raw binary) type. It should return a value of integer, real, string or boolean type.

User functions should be registered by the USER_FUNC(f) macro, which creates a static object of the dbUserFunction class, binding the function pointer and the function name.

There are two ways of implementing these functions in an application. The first can be used only for functions with one argument. This argument should be of type int8, real8 or char*, and the function return type should be int8, real8, char* or bool. If a function has more than one parameter, or it can accept parameters of different types (polymorphism), then the parameters should be passed as references to the dbUserFunctionArgument structure. This structure contains a type field, whose value can be used in the function implementation to detect the type of the passed argument, and a union holding the argument value. The following table shows the mapping between argument types and where the value should be taken from:

Argument type                         Argument value   Argument value type
dbUserFunctionArgument::atInteger     u.intValue       int8
dbUserFunctionArgument::atBoolean     u.boolValue      bool
dbUserFunctionArgument::atString      u.strValue       char const*
dbUserFunctionArgument::atReal        u.realValue      real8
dbUserFunctionArgument::atReference   u.oidValue       oid_t
dbUserFunctionArgument::atRawBinary   u.rawValue       void*
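
The type-tag dispatch implied by this table can be sketched without the FastDB headers. The ArgSketch structure below is a hypothetical, reduced stand-in that mirrors only four of the rows above (the real dbUserFunctionArgument is declared by FastDB), and magnitude, int_arg, real_arg and str_arg are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in: a type tag plus a union holding the value.
struct ArgSketch {
    enum Type { atInteger, atBoolean, atString, atReal } type;
    union {
        std::int64_t intValue;
        bool         boolValue;
        char const*  strValue;
        double       realValue;
    } u;
};

// Helpers that fill in the tag and the matching union member.
inline ArgSketch int_arg(std::int64_t v)  { ArgSketch a; a.type = ArgSketch::atInteger; a.u.intValue = v;  return a; }
inline ArgSketch real_arg(double v)       { ArgSketch a; a.type = ArgSketch::atReal;    a.u.realValue = v; return a; }
inline ArgSketch str_arg(char const* v)   { ArgSketch a; a.type = ArgSketch::atString;  a.u.strValue = v;  return a; }

// A polymorphic function: inspects the tag, then reads the matching member.
inline std::int64_t magnitude(ArgSketch const& arg) {
    switch (arg.type) {
      case ArgSketch::atInteger:
        return arg.u.intValue < 0 ? -arg.u.intValue : arg.u.intValue;
      case ArgSketch::atBoolean:
        return arg.u.boolValue ? 1 : 0;
      case ArgSketch::atString:
        return static_cast<std::int64_t>(std::strlen(arg.u.strValue));
      case ArgSketch::atReal:
        return static_cast<std::int64_t>(arg.u.realValue < 0 ? -arg.u.realValue
                                                             : arg.u.realValue);
    }
    return 0;   // unknown tag
}
```

Reading only the union member that corresponds to the tag is the essential point; reading any other member would interpret the stored bytes as the wrong type.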

For example the following statements make it possible to use the sin function in SQL statements:

        #include <math.h>
	...
        USER_FUNC(sin);
Functions can be used only within the application where they are defined. Functions are not accessible from other applications and interactive SQL. If a function returns a string type, the returned string should be allocated by means of the operator new, because FastDB will delete it after copying the returned value.

In FastDB, the function argument can (but not necessarily must) be enclosed in parentheses. So both of the following expressions are valid:

        '$' + string(abs(x))
	length string y

Functions with two arguments can also be used as operators. Consider the following example, which defines a function contains that performs a case-insensitive substring search:

     bool contains(dbUserFunctionArgument& arg1, dbUserFunctionArgument& arg2) { 
         assert(arg1.type == dbUserFunctionArgument::atString 
	     && arg2.type == dbUserFunctionArgument::atString);
         return stristr(arg1.u.strValue, arg2.u.strValue) != NULL;
     }

     USER_FUNC(contains);
    
     dbQuery q1, q2;
     q1 = "select * from TestTable where name contains 'xyz'";
     q2 = "select * from TestTable where contains(name, 'xyz')";
In this example, queries q1 and q2 are equivalent.

C++ interface

One of the primary goals of FastDB is to provide a flexible and convenient application language interface. Anyone who has to use ODBC or similar SQL interfaces will understand what I am speaking about. In FastDB, a query can be written in C++ in the following way:

    dbQuery q; 
    dbCursor<Contract> contracts;
    dbCursor<Supplier> suppliers;
    int price, quantity;
    q = "(price >=",price,"or quantity >=",quantity,
        ") and delivery.year=1999";
    // input price and quantity values
    if (contracts.select(q) != 0) { 
        do { 
            printf("%s\n", suppliers.at(contracts->supplier)->company);
        } while (contracts.next());
    } 

Table

Data in FastDB is stored in tables which correspond to C++ classes whereas the table records correspond to class instances. The following C++ types are accepted as atomic components of FastDB records:

Type             Description
bool             boolean type (true, false)
int1             one-byte signed integer (-128..127)
int2             two-byte signed integer (-32768..32767)
int4             four-byte signed integer (-2147483648..2147483647)
int8             eight-byte signed integer (-2**63..2**63-1)
real4            four-byte ANSI floating point type
real8            eight-byte ANSI double precision floating point type
char const*      zero-terminated string
dbReference<T>   reference to class T
dbArray<T>       dynamic array of elements of type T

In addition to types specified in the table above, FastDB records can also contain nested structures of these components. FastDB doesn't support unsigned types to simplify the query language, to eliminate bugs caused by signed/unsigned comparison and to reduce the size of the database engine.

Unfortunately C++ provides no way to get metainformation about a class at runtime (RTTI is not supported by all compilers and also doesn't provide enough information). Therefore the programmer has to explicitly enumerate class fields to be included in the database table (it also makes mapping between classes and tables more flexible). FastDB provides a set of macros and classes to make such mapping as simple as possible.

Each C++ class or structure, which will be used in the database, should contain a special method describing its fields. The macro TYPE_DESCRIPTOR(field_list) will construct this method. The single argument of this macro is - enclosed in parentheses - a list of class field descriptors. If you want to define some methods for the class and make them available for the database, then the macro CLASS_DESCRIPTOR(name, field_list) should be used instead of TYPE_DESCRIPTOR. The class name is needed to get references to member functions.

The following macros can be used for the construction of field descriptors:

FIELD(name)
Non-indexed field with specified name.
KEY(name, index_type)
Indexed field. index_type should be a combination of HASHED and INDEXED flags. When the HASHED flag is specified, FastDB will create a hash table for the table using this field as a key. When the INDEXED flag is specified, FastDB will create a T-tree (a special kind of index) for the table using this field as a key.
UDT(name, index_type, comparator)
User-defined raw binary type. The database treats this type simply as a sequence of bytes of the specified size. Such a field can be used in a query (compared with a query parameter of the same type), may be indexed, and may be used in an order by clause. Comparison is performed by means of a comparator function provided by the programmer. The comparator function receives three arguments: two pointers to the compared raw binary objects and the size of the binary object. The semantics of index_type are the same as for the KEY macro.
RAWKEY(name, index)
Raw binary type with predefined comparator. This macro is just specialized version of UDT macro with memcmp used as comparator.
RAWFIELD(name)
One more specialization of UDT macro for raw binary fields with predefined comparator memcmp and without indices.
SUPERCLASS(name)
Specifies information about the base class (parent) of the current class.
RELATION(reference, inverse_reference)
Specifies one-to-one, one-to-many or many-to-many relationships between classes (tables). Both reference and inverse_reference fields should be of reference or of array of reference type. inverse_reference is a field of the referenced table containing the inverse reference(s) to the current table. Inverse references are automatically updated by FastDB and are used for query optimization (see Inverse references).
OWNER(reference, inverse_reference)
Specifies a one-to-many or many-to-many relationship between classes (tables) of owner-member type. When an owner record is removed, all referenced member records are also removed (cascade delete). If a member record has a reference to the owner class, it should be declared with the RELATION macro.
METHOD(name)
Specifies a method of the class. The method should be a parameterless instance member function returning a boolean, numeric, reference or string type. Methods should be specified after all other attributes of the class.

Although only atomic fields can be indexed, an index type can be specified for structures. The index will be created for components of the structure only if such type of index is specified in the index type mask of the structure. This allows the programmers to enable or disable indices for structure fields depending on the role of the structure in the record.

The following example illustrates the creation of a type descriptor in the header file:

class dbDateTime { 
    int4 stamp;
  public:
 
    int year() { 
	return localtime((time_t*)&stamp)->tm_year + 1900;
    }
    ...

    CLASS_DESCRIPTOR(dbDateTime, 
		     (KEY(stamp,INDEXED|HASHED), 
		      METHOD(year), METHOD(month), METHOD(day),
		      METHOD(dayOfYear), METHOD(dayOfWeek),
		      METHOD(hour), METHOD(minute), METHOD(second)));
};    

class Detail { 
  public:
    char const* name;
    char const* material;
    char const* color;
    real4       weight;

    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(name, INDEXED|HASHED), 
		     KEY(material, HASHED), 
		     KEY(color, HASHED),
		     KEY(weight, INDEXED),
		     RELATION(contracts, detail)));
};

class Contract { 
  public:
    dbDateTime            delivery;
    int4                  quantity;
    int8                  price;
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;

    TYPE_DESCRIPTOR((KEY(delivery, HASHED|INDEXED), 
		     KEY(quantity, INDEXED), 
		     KEY(price, INDEXED),
		     RELATION(detail, contracts),
		     RELATION(supplier, contracts)));
};
Type descriptors should be defined for all classes used in the database. In addition to defining type descriptors, it is necessary to establish a mapping between C++ classes and database tables. The macro REGISTER(name) will do it. Unlike the TYPE_DESCRIPTOR macro, the REGISTER macro should be used in the implementation file and not in the header file. It constructs a descriptor of the table associated with the class. If you are going to work with multiple databases from one application, it is possible to register a table in a concrete database by means of the REGISTER_IN(name,database) macro. The parameter database of this macro should be a pointer to the dbDatabase object. You can register tables in the database as follows:

REGISTER(Detail);
REGISTER(Supplier);
REGISTER(Contract);
The table (and the corresponding class) can be used with only one database at any moment of time. When you open a database, FastDB imports into the database all classes defined in the application. If a class with the same name already exists in the database, its descriptor stored in the database is compared with the descriptor of this class in the application. If the class definitions differ, FastDB tries to convert records from the table to the new format. Any kind of conversion between numeric types (integer to real, real to integer, with extension or truncation) is allowed. Also, addition of new fields can be easily handled. But removal of fields is only possible for empty tables (to avoid accidental data destruction).

After loading all class descriptors, FastDB checks whether all indices specified in the application class descriptors are already present in the database, constructs new indices, and removes indices that are no longer used. Reformatting the table and adding/removing indices is only possible when no more than one application accesses the database. So when the first application is attached to the database, it can perform table conversion. All other applications can only add new classes to the database.

There is one special internal database Metatable, which contains information about other tables in the database. C++ programmers need not access this table, because the format of database tables is specified by C++ classes. But in an interactive SQL program, it may be necessary to examine this table to get information about record fields.

Starting from version 2.30, FastDB supports autoincrement fields (fields to which unique values are assigned automatically by the database). To be able to use them you should:

  1. Recompile FastDB and your application with the -DAUTOINCREMENT_SUPPORT flag (add this flag to the DEFS variable in the FastDB makefile).
    Attention: database files created by FastDB compiled without this option will be incompatible with FastDB compiled with -DAUTOINCREMENT_SUPPORT.
  2. If you want to use an initial counter value other than 0, you should assign a value to dbTableDescriptor::initialAutoincrementCount. It is shared between all tables, so all tables will have the same initial value of the autoincrement counter.
  3. Autoincrement fields should be of int4 type and should be declared with AUTOINCREMENT flag:
            class Record {
               public:
                 int4 rid;
                 char const* name;
                 ...
           
                 TYPE_DESCRIPTOR((KEY(rid, AUTOINCREMENT|INDEXED), FIELD(name), ...));
           };
    
  4. When a record with an autoincrement field is inserted into the database, there is no need to specify the value of the autoincrement field (it will be ignored). After successful insertion of the record, this field will be assigned a unique value (which is guaranteed not to have been used before in this table):
           Record rec;
           // no rec.rid should be specified
           rec.name = "John Smith";
           insert(rec);
           // rec.rid now assigned unique value
           int newRecordId  = rec.rid; // and can be used to reference this record
    
  5. When a record is removed, its value will not be reused. When a transaction is aborted, the table autoincrement counter is also rolled back.

Query

The class dbQuery serves two purposes:
  1. to construct a query and bind query parameters
  2. to cache compiled queries
FastDB provides overloaded '=' and ',' C++ operators to construct query statements with parameters. Parameters can be specified directly in the places where they are used, eliminating any mapping between parameter placeholders and C variables. In the following sample query, pointers to the parameters price and quantity are stored in the query, so that the query can be executed several times with different parameter values. C++ overloaded functions make it possible to automatically determine the type of the parameter, requiring no extra information to be supplied by the programmer (thus reducing the possibility of a bug).
        dbQuery q;
        int price, quantity;
        q = "price >=",price,"or quantity >=",quantity;
Since the char* type can be used both for specifying a fraction of a query (such as "price >=") and for a parameter of string type, FastDB uses a special rule to resolve this ambiguity. This rule is based on the assumption that there is no reason for splitting a query text into two strings like ("price ",">=") or specifying more than one parameter sequentially ("color=",color,color). So FastDB assumes the first string to be a fraction of the query text and switches to operand mode after it. In operand mode, FastDB treats the char* argument as a query parameter and switches back to query text mode, and so on... It is also possible not to use this "syntax sugar" and construct query elements explicitly by the dbQuery::append(dbQueryElement::ElementType type, void const* ptr) method. Before appending elements to the query, it is necessary to reset the query by the dbQuery::reset() method ('operator=' does it automatically).
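
The alternation between query text mode and operand mode can be illustrated with a miniature stand-alone class. MiniQuery, its members and the render() method are hypothetical and exist only for this sketch; the real dbQuery compiles the query instead of rendering it as text. The sketch assumes just two parameter types, int and zero terminated string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of the text/operand alternation; not the real dbQuery.
class MiniQuery {
    enum Kind { Text, IntParam, StringParam };
    struct Element { Kind kind; void const* ptr; };
    std::vector<Element> elements;
    bool operandMode = false;   // false: the next char const* is query text

    MiniQuery& add(Kind kind, void const* ptr, bool nextIsOperand) {
        elements.push_back(Element{kind, ptr});
        operandMode = nextIsOperand;
        return *this;
    }
  public:
    void reset() { elements.clear(); operandMode = false; }

    // operator= resets the query and takes the first fragment of query text.
    MiniQuery& operator=(char const* text) { reset(); return add(Text, text, true); }

    // A string is query text in text mode and a string parameter in operand mode.
    MiniQuery& operator,(char const* s) {
        return operandMode ? add(StringParam, s, false) : add(Text, s, true);
    }
    // An integer is always a parameter; only its address is stored.
    MiniQuery& operator,(int const& value) { return add(IntParam, &value, false); }

    // Substitute the current values of the bound variables.
    std::string render() const {
        std::string out;
        for (Element const& e : elements) {
            if (!out.empty()) out += ' ';
            switch (e.kind) {
              case Text:
                out += static_cast<char const*>(e.ptr);
                break;
              case IntParam:
                out += std::to_string(*static_cast<int const*>(e.ptr));
                break;
              case StringParam:
                out += "'" + std::string(static_cast<char const*>(e.ptr)) + "'";
                break;
            }
        }
        return out;
    }
};
```

With int price = 10, quantity = 5, the statement q = "price >=", price, "or quantity >=", quantity; followed by q.render() yields price >= 10 or quantity >= 5. Changing price afterwards changes the next rendering, because only the address of the variable was stored.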

It is not possible to use C++ numeric constants as query parameters, because parameters are accessed by reference. But it is possible to use string constants, because strings are passed by value. There are two possible ways of specifying string parameters in a query: using a string buffer or a pointer to a pointer to a string:

     dbQuery q;
     char const* type;
     char name[256];
     q = "name=",name,"and type=",&type;

     scanf("%s", name);
     type = "A";     
     cursor.select(q);
     ...
     scanf("%s", name);
     type = "B";     
     cursor.select(q);
     ...

Query variables can neither be passed to a function as a parameter nor be assigned to another variable. When FastDB compiles the query, it saves the compiled tree in this object. The next time the query is used, no compilation is needed and the already compiled tree is reused, which saves the time otherwise spent on query compilation.

FastDB provides two approaches to integrate user-defined types in databases. The first, the definition of class methods, was already mentioned. The other approach deals only with query construction. Programmers define methods that do not perform the actual calculation, but instead return an expression (in terms of predefined database types) that performs it. This is best described by an example. FastDB has no builtin datetime type. Instead, a normal C++ class dbDateTime can be used by the programmer. This class defines methods that allow datetime fields to be specified in ordered (order by) lists and two dates to be compared using the normal relational operators:

class dbDateTime { 
    int4 stamp;
  public:
    ...
    dbQueryExpression operator == (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"=",stamp;
	return expr;
    }
    dbQueryExpression operator != (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<>",stamp;
	return expr;
    }
    dbQueryExpression operator < (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),">",stamp;
	return expr;
    }
    dbQueryExpression operator <= (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),">=",stamp;
	return expr;
    }
    dbQueryExpression operator > (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<",stamp;
	return expr;
    }
    dbQueryExpression operator >= (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<=",stamp;
	return expr;
    }
    friend dbQueryExpression between(char const* field, dbDateTime& from,
				     dbDateTime& till)
    { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp"),"between",from.stamp,"and",till.stamp;
	return expr;
    }

    friend dbQueryExpression ascent(char const* field) { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp");
	return expr;
    }	
    friend dbQueryExpression descent(char const* field) { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp"),"desc";
	return expr;
    }	
};
All these methods receive the name of a field in the record as their parameter. This name is used to construct the full name of the record's component. This is done by the class dbComponent, whose constructor takes the name of the structure field and the name of the component of the structure and returns a compound name separated by a '.' symbol. The class dbQueryExpression is used to collect expression items. The expression is automatically enclosed in parentheses, eliminating conflicts with operator precedence.

So, assuming a record containing a field delivery of dbDateTime type, it is possible to construct queries like these:

        dbDateTime from, till;
        q1 = between("delivery", from, till),"order by",ascent("delivery");
        q2 = till >= "delivery"; 
In addition to these methods, class-specific methods can be defined in the same way, for example the method overlaps for a region type. The benefit of this approach is that the database engine works with predefined types and is able to apply indices and other optimizations when processing such a query. On the other hand, the encapsulation of the class implementation is preserved, so programmers do not have to rewrite all queries when a class representation is changed.
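
As an illustration of such a class-specific method, here is a minimal stand-alone sketch of an overlaps method for a region type. It is string-based and does not use FastDB: the functions component and expression are hypothetical stand-ins for dbComponent and dbQueryExpression, and the Region class with its left and right fields is an assumption made for the example. Unlike the real classes, which bind parameters by reference, the sketch embeds the values in the text.

```cpp
#include <cassert>
#include <string>

// Hypothetical string-based stand-ins for dbComponent and dbQueryExpression.
std::string component(std::string const& field, std::string const& part) {
    return field + "." + part;      // compound name, e.g. "area.left"
}
std::string expression(std::string const& body) {
    return "(" + body + ")";        // parentheses guard against precedence conflicts
}

// A user-defined type that exposes query fragments instead of its representation.
class Region {
    int left, right;
  public:
    Region(int l, int r) : left(l), right(r) {}

    // Expresses "the region stored in field overlaps this region"
    // using only comparisons of predefined integer components.
    std::string overlaps(std::string const& field) const {
        return expression(component(field, "left") + " <= " + std::to_string(right)
                          + " and "
                          + component(field, "right") + " >= " + std::to_string(left));
    }
};
```

If the representation of Region changes, only the overlaps method has to be adapted; callers that write Region(10, 20).overlaps("area") stay the same.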

Variables of the following C++ types can be used as query parameters:

    int1     bool
    int2     char const*
    int4     char **
    int8     char const**
    real4    dbReference<T>
    real8    dbArray< dbReference<T> >

Cursor

Cursors are used to access records returned by a select statement. FastDB provides typed cursors, i.e. cursors associated with concrete tables. There are two kinds of cursors in FastDB: readonly cursors and cursors for update. Cursors in FastDB are represented by the C++ template class dbCursor<T>, where T is the name of a C++ class associated with the database table. The cursor type should be specified in the constructor of the cursor. By default, a read-only cursor is created. To create a cursor for update, you should pass a parameter dbCursorForUpdate to the constructor.

A query is executed either by the cursor select(dbQuery& q) method or by the select() method, which can be used to iterate through all records in the table. Both methods return the number of selected records and set the current position to the first record (if available). A cursor can be scrolled in forward or backward direction. The methods next(), prev(), first(), last() can be used to change the current position of the cursor. If no operation can be performed because there are no (more) records available, these methods return NULL and the cursor position is not changed.

A cursor for class T contains an instance of class T, used for fetching the current record. That is why table classes should have a default constructor (constructor without parameters), which has no side effects. FastDB optimizes fetching records from the database, copying only data from fixed parts of the object. String bodies are not copied; instead, the corresponding field points directly into the database. The same is true for arrays: their components have the same representation in the database as in the application (arrays of scalar types or arrays of nested structures of scalar components).

An application should not change elements of strings and arrays in a database directly. When an array method needs to update an array body, it creates an in-memory copy of the array and updates this copy. If the programmer wants to update a string field, she/he should assign a new value to the pointer and must not change the string directly in the database. It is recommended to use the char const* type instead of the char* type for string components, to enable the compiler to detect the illegal usage of strings.

The cursor class provides the get() method for obtaining a pointer to the current record (stored inside the cursor). The overloaded 'operator->' can also be used to access components of the current record. If a cursor is opened for update, the current record can be changed and stored in the database by the update() method or can be removed. If the current record is removed, the next record becomes the current. If there is no next record, then the previous record (if it exists) becomes the current. The method removeAll() removes all records in the table, whereas the method removeAllSelected() removes only the records selected by the cursor.
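
The rule for the current position after a removal can be modelled in a few lines. The following stand-alone sketch is not the FastDB cursor: MiniCursor is a hypothetical structure in which plain integers stand for the selected records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a selection; the integers stand for selected records.
struct MiniCursor {
    std::vector<int> selection;
    std::size_t pos = 0;        // index of the current record

    bool empty() const { return selection.empty(); }
    int  current() const { return selection[pos]; }

    // Remove the current record: the next record becomes current or,
    // if there is no next record, the previous one (if it exists).
    void remove() {
        if (selection.empty()) return;
        selection.erase(selection.begin() + static_cast<std::ptrdiff_t>(pos));
        if (pos == selection.size() && pos > 0) {
            --pos;
        }
    }
};
```

Erasing an element shifts its successor into the same index, so the position only has to be moved back when the removed record was the last one.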

When records are updated, the size of the database may increase. Thus an extension of the database section in the virtual memory is needed. As a result of such remapping, base addresses of the section can be changed and all pointers to database fields kept by applications will become invalid. FastDB automatically updates current records in all opened cursors when a database section is remapped. So, when a database is updated, the programmer should access record fields only through the cursor's -> operator and should not use pointer variables.

Memory used for the current selection can be released by the reset() method. This method is automatically called by the select(), dbDatabase::commit(), dbDatabase::rollback() methods and the cursor destructor, so in most cases there is no need to call the reset() method explicitly.

Cursors can also be used to access records by reference. The method at(dbReference<T> const& ref) sets the cursor to the record pointed to by the reference. In this case, the selection consists exactly of one record and the next(), prev() methods will always return NULL. Since cursors and references in FastDB are strictly typed, all necessary checking can be done statically by the compiler and no dynamic type checking is needed. The only kind of checking, which is done at runtime, is checking for null references. The object identifier of the current record in the cursor can be obtained by the currentId() method.

It is possible to restrict the number of records returned by a select statement. The cursor class has the two methods setSelectionLimit(size_t lim) and unsetSelectionLimit(), which can be used to set/unset the limit on the number of records returned by the query. In some situations, a programmer may want to receive only one record or only the first few records; the query execution time and the amount of consumed memory can then be reduced by limiting the size of the selection. But if you specify an order for the selected records, a query restricted to k records will not return the k records with the smallest key values. Instead, k arbitrary records are taken and then sorted.
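
The difference between limiting before sorting and sorting before limiting can be shown on a plain vector of keys. The two functions below are hypothetical and do not call FastDB; the sketch also assumes that the "arbitrary" k records are simply the first k in table order, which the text does not promise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Model of the documented behaviour: the limit is applied first, then the sort.
std::vector<int> limitThenSort(std::vector<int> const& table, std::size_t limit) {
    std::size_t n = std::min(limit, table.size());
    std::vector<int> selected(table.begin(),
                              table.begin() + static_cast<std::ptrdiff_t>(n));
    std::sort(selected.begin(), selected.end());
    return selected;
}

// What a caller might expect instead: the k smallest keys.
std::vector<int> sortThenLimit(std::vector<int> table, std::size_t limit) {
    std::sort(table.begin(), table.end());
    table.resize(std::min(limit, table.size()));
    return table;
}
```

For the keys 5, 9, 1, 7, 3 and a limit of 2, the first function returns 5, 9 while the second returns 1, 3. An application that needs the k smallest keys therefore has to select without a limit and stop iterating after k records.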

So all operations with database data can be performed by means of cursors. The only exception is the insert operation, for which FastDB provides an overloaded insert function:

        template<class T>
        dbReference<T> insert(T const& record);
This function will insert a record at the end of the table and return a reference to the created object. The order of insertion is strictly specified in FastDB and applications can rely on this record order in the table. For applications that make wide use of references for navigation between objects, it is necessary to have some root object from which a traversal by references can start. A good candidate for such a root object is the first record in the table (it is also the oldest record in the table). This record can be accessed by executing the select() method without a parameter. The current record in the cursor will then be the first record in the table.

The C++ API of FastDB defines a special null variable of reference type. It is possible to compare the null variable with references or assign it to the reference:

        void update(dbReference<Contract> c) {
            if (c != null) {
                dbCursor<Contract> contract(dbCursorForUpdate);
                contract.at(c);
                contract->supplier = null;
                contract.update(); // store the changed record in the database
            }
        }
Query parameters are usually bound to C++ variables. In most cases this is a convenient and flexible mechanism. But in a multithreaded application there is no guarantee that the same query will not be executed at the same moment by another thread with different parameter values. One solution is to use synchronization primitives (critical sections or mutexes) to prevent concurrent execution of the query. But this leads to performance degradation, because FastDB is able to perform read requests in parallel, increasing total system throughput. The other solution is to use delayed parameter binding. This approach is illustrated by the following example:

dbQuery q;

struct QueryParams { 
    int         salary;
    int         age;
    int         rank;
};

void open()
{
    QueryParams* params = (QueryParams*)NULL;
    q = "salary > ", params->salary, "and age < ", params->age, "and rank =", params->rank;
}

void find(int salary, int age, int rank) 
{ 
    QueryParams params;
    params.salary = salary;
    params.age = age;
    params.rank = rank;
    dbCursor<Person> cursor;
    if (cursor.select(q, &params) > 0) { 
        do { 
	    cout << cursor->name << NL;
        } while (cursor.next());
    }
}
In this example, the function open binds the query parameters just to the offsets of fields in the structure. Later, in the function find, the actual pointer to the structure with parameters is passed to the select method. The function find can be executed concurrently by several threads, and only one compiled version of the query is used by all these threads. This mechanism is available since version 2.25.
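
The idea of binding parameters to offsets can also be shown without FastDB. In the stand-alone sketch below, OffsetBoundQuery is a hypothetical stand-in for a compiled query. It uses the standard offsetof macro instead of the null pointer trick from the example above, and it evaluates the predicate directly instead of running a select.

```cpp
#include <cassert>
#include <cstddef>

struct QueryParams {
    int salary;
    int age;
    int rank;
};

// Hypothetical compiled predicate "salary > ? and age < ? and rank = ?"
// that remembers only the offsets of its parameters, not their addresses.
struct OffsetBoundQuery {
    std::size_t salaryOffset = offsetof(QueryParams, salary);
    std::size_t ageOffset    = offsetof(QueryParams, age);
    std::size_t rankOffset   = offsetof(QueryParams, rank);

    // Read an int parameter located at base + offset.
    static int load(void const* base, std::size_t offset) {
        return *reinterpret_cast<int const*>(static_cast<char const*>(base) + offset);
    }

    // The base address is supplied per call, so callers share no parameter state.
    bool matches(int salary, int age, int rank, QueryParams const* params) const {
        return salary > load(params, salaryOffset)
            && age    < load(params, ageOffset)
            && rank  == load(params, rankOffset);
    }
};
```

Because the query object itself is never modified during evaluation, each thread can pass a pointer to its own QueryParams instance, typically a local variable, and use the same query object.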

Database

The class dbDatabase controls the application's interactions with the database. It performs synchronization of concurrent accesses to the database, transaction management, memory allocation, error handling and so on.

The constructor of dbDatabase objects allows programmers to specify some database parameters:

    dbDatabase(dbAccessType type = dbAllAccess,
	       size_t dbInitSize = dbDefaultInitDatabaseSize,
	       size_t dbExtensionQuantum = dbDefaultExtensionQuantum,
	       size_t dbInitIndexSize = dbDefaultInitIndexSize,
	       int nThreads = 1);
The following database access types are supported:

Access type                      Description
dbDatabase::dbReadOnly           Read-only mode
dbDatabase::dbAllAccess          Normal mode
dbDatabase::dbConcurrentRead     Read-only mode in which an application can access the database concurrently with an application updating the same database in dbConcurrentUpdate mode
dbDatabase::dbConcurrentUpdate   Mode to be used in conjunction with dbConcurrentRead to perform updates in the database without blocking reading applications for a long time

When the database is opened in readonly mode, no new class definitions can be added to the database and definitions of existing classes and indices can not be altered.

The dbConcurrentUpdate and dbConcurrentRead modes should be used together when the database is mostly accessed in read-only mode and updates should not block readers for a long time. In these modes, an update of the database can be performed concurrently with read accesses (readers will not see changed data until the transaction is committed). Only at update transaction commit time is an exclusive lock set, and it is released immediately after the incremental change of the current object index.

So you can start one or more applications using dbConcurrentRead mode and all their read-only transactions will be executed concurrently. You can also start one or more applications using dbConcurrentUpdate mode. All transactions of such applications are synchronized using an additional global mutex, so all these transactions (even read-only ones) are executed exclusively. But transactions of an application running in dbConcurrentUpdate mode can run concurrently with transactions of applications running in dbConcurrentRead mode! See the testconc.cpp example, which illustrates the usage of these modes.

Attention! Do not mix the dbConcurrentUpdate and dbConcurrentRead modes with other modes, and do not use them together in one process (so it is not possible to start two threads, one of which opens the database in dbConcurrentUpdate mode and the other in dbConcurrentRead mode). Do not use the dbDatabase::precommit method in dbConcurrentUpdate mode.

The parameter dbInitSize specifies the initial size of the database file. The database file increases on demand; setting the initial size can only reduce the number of reallocations (which can take a lot of time). In the current implementation of the FastDB database the size is at least doubled at each extension. The default value of this parameter is 4 megabytes.
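
To get a feeling for how the initial size influences the number of reallocations, the following stand-alone sketch counts the extensions needed to reach a given size. The function extensionsNeeded is hypothetical, and it assumes that the file exactly doubles at each extension, whereas the text only says that the size is at least doubled; the result is therefore an upper bound.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many doublings are needed to grow from initSize to at least
// targetSize. Assumption: each extension exactly doubles the file size.
unsigned extensionsNeeded(std::size_t initSize, std::size_t targetSize) {
    assert(initSize > 0);
    unsigned n = 0;
    // Halving the target (rounding up) avoids overflow for very large sizes.
    while (targetSize > initSize) {
        targetSize = targetSize / 2 + targetSize % 2;
        ++n;
    }
    return n;
}
```

With the default initial size of 4 megabytes, growing to 64 megabytes takes at most four extensions (8, 16, 32, 64), while a database opened with an initial size of 64 megabytes needs none.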

The parameter dbExtensionQuantum specifies the quantum of extension of the memory allocation bitmap. Briefly speaking, the value of this parameter specifies how much memory will be allocated sequentially without attempting to reuse the space of deallocated objects. The default value of this parameter is 4 megabytes. See section Memory allocation for more details.

The parameter dbInitIndexSize specifies the initial index size. All objects in FastDB are accessed through an object index. There are two copies of this object index: current and committed. Object indices are reallocated on demand; setting an initial index size can only reduce (or increase) the number of reallocations. The default value of this parameter is 64K object identifiers.

And the last parameter nThreads controls the level of query parallelization. If it is greater than 1, then FastDB can start the parallel execution of some queries (including sorting the result). The specified number of parallel threads will be spawned by the FastDB engine in this case. Usually it does not make sense to specify the value of this parameter to be greater than the number of online CPUs in the system. It is also possible to pass zero as the value of this parameter. In this case, FastDB will automatically detect the number of online CPUs in the system. The number of threads also can be set by the dbDatabase::setConcurrency method at any moment of time.

The class dbDatabase contains a static field dbParallelScanThreshold, which specifies a threshold for the number of records in the table after which query parallelization is used. The default value of this parameter is 1000.

The database can be opened by the open(char const* databaseName, char const* fileName = NULL, unsigned waitLockTimeout = INFINITE) method. If the file name parameter is omitted, it is constructed from the database name by appending the ".fdb" suffix. The database name should be an arbitrary identifier consisting of any symbols except '\'. The last parameter waitLockTimeout can be set to prevent all active processes working with the database from being blocked when one of them crashes. If the crashed process had locked the database, no other process can continue execution. To prevent this, you can specify a maximal delay for waiting for the lock, after the expiration of which the system will try to perform recovery and continue execution of the active processes. The method open returns true if the database was successfully opened, or false if the open operation failed. In the latter case, the database handleError method is called with a DatabaseOpenError error code. A database session can be terminated by the close method, which implicitly commits current transactions.

In a multithreaded application each thread, which wants to access the database, should first be attached to it. The method dbDatabase::attach() allocates thread specific data and attaches the thread to the database. This method is automatically called by the open() method, so there is no reason to call the attach() method for the thread, which opens the database. When the thread finishes work with the database, it should call the dbDatabase::detach() method. The method close automatically invokes the detach() method. The method detach() implicitly commits current transactions. An attempt to access a database by a detached thread causes an assertion failure.

FastDB is able to perform compilation and execution of queries in parallel, providing significant increase of performance in multiprocessor systems. But concurrent updates of the database are not possible (this is the price for the efficient log-less transaction mechanism and zero time recovery). When an application wants to modify the database (open a cursor for update or insert a new record in the table), it first locks the database in exclusive mode, prohibiting accesses to the database by other applications, even for read-only queries. So to avoid blocking of database applications for a long time, the modification transaction should be as short as possible. No blocking operations (like waiting for input from the user) should be done within this transaction.

Using only shared and exclusive locks at the database level allows FastDB to almost eliminate the overhead of locking and to optimize the speed of execution of non-conflicting operations. But if many applications simultaneously update different parts of the database, then the approach used in FastDB will be very inefficient. That is why FastDB is most suitable for a single-application database access model or for multiple applications with a read-dominated access pattern.

Cursor objects should be used only by one thread in a multithreaded application. If there is more than one thread in your application, use local variables for cursors in each thread. It is possible to share query variables between threads, but take care with query parameters. The query should either have no parameters, or the relative form of parameter binding should be used.

The dbDatabase object is shared between all threads and uses thread specific data to perform query compilation and execution in parallel with minimal synchronization overhead. There are a few global things that require synchronization: the symbol table, the pool of tree nodes and so on. But scanning, parsing and execution of the query can be done without any synchronization, providing a high level of concurrency on multiprocessor systems.

A database transaction is started by the first select or an insert operation. If a cursor for update is used, then the database is locked in exclusive mode, prohibiting access to the database by other applications and threads. If a read-only cursor is used, then the database is locked in shared mode, preventing other applications and threads from modifying the database, but allowing the execution of concurrent read requests. A transaction should be explicitly terminated either by the dbDatabase::commit() method, which fixes all changes done by the transaction in the database; or by the dbDatabase::rollback() method to undo all modifications done by transactions. The method dbDatabase::close() automatically commits current transactions.

If you start a transaction by performing a selection using a read-only cursor and then use a cursor for update to perform some modifications of the database, the database will first be locked in shared mode; then the lock will be upgraded to exclusive mode. This can cause a deadlock if the database is simultaneously accessed by several applications. Imagine that application A starts a read transaction and application B also starts a read transaction. Both of them hold shared locks on the database. If both of them want to upgrade their locks to exclusive mode, they will block each other forever (an exclusive lock cannot be granted while a shared lock of another process exists). To avoid such situations, try to use a cursor for update at the beginning of the transaction, or explicitly use the dbDatabase::lock() method. More information about the implementation of transactions in FastDB can be found in section Transactions.

It is possible to explicitly lock the database by the lock() method. Locking is usually done automatically, and there are only a few cases in which you will want to use this method. It will lock the database in exclusive mode until the end of the current transaction.

A backup of the database can be done by the dbDatabase::backup(char const* file) method. A backup locks the database in shared mode and flushes an image of the database from main memory to the specified file. Because of using a shadow object index, the database file is always in a consistent state, so recovery from the backup can be performed just by renaming the backup file (if backup was performed on tape, it should be first restored to the disk).

The class dbDatabase is also responsible for handling various application errors, such as syntax errors during query compilation, out of range index or null reference access during query execution. There is a virtual method dbDatabase::handleError, which handles these errors:

        virtual void handleError(dbErrorClass error, 
                                 char const*  msg = NULL, 
                                 int          arg = 0);
A programmer can derive her/his own subclass from the dbDatabase class and redefine the default reaction to errors.
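
The following stand-alone sketch shows the shape of such a subclass. It does not include the FastDB headers: MiniDatabase and the shortened dbErrorClass enum are hypothetical stand-ins that only mirror the handleError signature quoted above, and ThrowingDatabase is an invented name. Whether it is safe to throw an exception from the real handleError is not stated in the text, so treat that part as an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins mirroring the names used in the text; not FastDB headers.
enum dbErrorClass { QueryError, ArithmeticError, IndexOutOfRangeError, NullReferenceError };

class MiniDatabase {
  public:
    virtual ~MiniDatabase() {}
    virtual void handleError(dbErrorClass error,
                             char const*  msg = nullptr,
                             int          arg = 0) = 0;

    // Simulates the engine detecting an out-of-range array index.
    int at(int const* array, int length, int index) {
        if (index < 0 || index >= length) {
            handleError(IndexOutOfRangeError, "index out of range", index);
            return 0;
        }
        return array[index];
    }
};

// Application subclass: report the error as an exception instead of terminating.
class ThrowingDatabase : public MiniDatabase {
  public:
    void handleError(dbErrorClass error,
                     char const*  msg = nullptr,
                     int          arg = 0) override {
        throw std::runtime_error(std::string(msg ? msg : "database error")
                                 + " (class " + std::to_string(static_cast<int>(error))
                                 + ", arg " + std::to_string(arg) + ")");
    }
};
```

The application can then catch std::runtime_error around its database calls and decide itself whether to retry, log the problem or exit.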

Error classes and default handling
Class                 Description                                           Argument                   Default reaction
QueryError            query compilation error                               position in query string   abort compilation
ArithmeticError       arithmetic error during division or power operations  -                          terminate application
IndexOutOfRangeError  index is out of array bounds                          value of index             terminate application
DatabaseOpenError     error while opening the database                      -                          open method will return false
FileError             failure of file IO operation                          error code                 terminate application
OutOfMemoryError      not enough memory for object allocation               requested allocation size  terminate application
Deadlock              upgrading lock causes deadlock                        -                          terminate application
NullReferenceError    null reference is accessed during query execution     -                          terminate application

Call level interface

The interface described in the previous section provides a convenient and reliable mechanism for accessing data from C++. It has two drawbacks:
  1. It is very C++ specific and can not be used with other programming languages
  2. It is suitable only for local connections to the database (within one system).
The interface described below overcomes these two restrictions. It consists of a set of pure ANSI C functions, and using it, a mapping of any programming language to the FastDB database can easily be implemented. The connection between client and server is performed by sockets (either local or standard TCP/IP sockets). Certainly this interface is less convenient and more error prone than the C++ interface, but this is the cost of its flexibility. All types, constants and functions are declared in the cli.h file.

FastDB provides a multithreaded server for handling client CLI sessions. This server can be started from the SubSQL utility by the start server 'HOST:PORT' <number-of-threads> command. The server accepts local (within one system) and global client connections and attaches one thread from the thread pool to each connection. The size of the thread pool is controlled by the number-of-threads parameter, but the server can spawn more than the specified number of threads if there are many active connections. A thread is attached to the client until the end of the session. If a session is terminated abnormally, all changes made by the client are rolled back. The server can be stopped by the corresponding stop server 'HOST:PORT' command.

CLI functions return codes
Error code                     Description
cli_ok                         Successful completion
cli_bad_address                Invalid format of server URL
cli_connection_refused         Connection with server could not be established
cli_bad_statement              Text of SQL statement is not correct
cli_parameter_not_found        Parameter was not found in statement
cli_unbound_parameter          Parameter was not specified
cli_column_not_found           No such column in the table
cli_incompatible_type          Conversion between application and database type is not possible
cli_network_error              Connection with server is broken
cli_runtime_error              Error during query execution
cli_bad_descriptor             Invalid statement/session descriptor
cli_unsupported_type           Unsupported type for parameter or column
cli_not_found                  Record was not found
cli_not_update_mode            Attempt to update records selected by view only cursor
cli_table_not_found            There is no table with specified name in the database
cli_not_all_columns_specified  Insert statement doesn't specify values for all table columns
cli_not_fetched                cli_fetch method was not called
cli_already_updated            cli_update method was invoked more than once for the same record
cli_table_already_exists       Attempt to create an existing table
cli_not_implemented            Function is not implemented

Supported types
Type                Description                           Size
cli_oid             Object identifier                     4
cli_bool            Boolean type                          1
cli_int1            Tiny integer type                     1
cli_int2            Small integer type                    2
cli_int4            Integer type                          4
cli_int8            Big integer type                      8
cli_real4           Single precision floating point type  4
cli_real8           Double precision floating point type  8
cli_asciiz          Zero terminated string of bytes       1*N
cli_pasciiz         Pointer to zero terminated string     1*N
cli_array_of_oid    Array of references                   4*N
cli_array_of_bool   Array of booleans                     1*N
cli_array_of_int1   Array of tiny integers                1*N
cli_array_of_int2   Array of small integers               2*N
cli_array_of_int4   Array of integers                     4*N
cli_array_of_int8   Array of big integers                 8*N
cli_array_of_real4  Array of reals                        4*N
cli_array_of_real8  Array of long reals                   8*N


int cli_open(char const* server_url, 
	     int         max_connect_attempts,
	     int         reconnect_timeout_sec);
Establish connection with the server
Parameters
server_url - zero terminated string with server address and port, for example "localhost:5101", "195.239.208.240:6100",...
max_connect_attempts - number of attempts to establish connection
reconnect_timeout_sec - timeout in seconds between connection attempts
Returns
>= 0 - connection descriptor to be used in all other cli calls
< 0 - error code as described in cli_result_code enum

int cli_close(int session);
Close session
Parameters
session - session descriptor returned by cli_open
Returns
result code as described in cli_result_code enum

int cli_statement(int session, char const* stmt);
Specify SubSQL statement to be executed at server. Binding to the parameters and columns can be established
Parameters
session - session descriptor returned by cli_open
stmt - zero terminated string with SubSQL statement
Returns
>= 0 - statement descriptor
< 0 - error code as described in cli_result_code enum

int cli_parameter(int         statement,
		  char const* param_name, 
		  int         var_type,
		  void*       var_ptr);
Bind parameter to the statement
Parameters
statement - statement descriptor returned by cli_statement
param_name - zero terminated string with parameter name. The parameter name should start with '%'
var_type - type of variable as described in cli_var_type enum. Only scalar and zero terminated string types are supported.
var_ptr - pointer to the variable
Returns
result code as described in cli_result_code enum

int cli_column(int         statement,
	       char const* column_name, 
	       int         var_type, 
	       int*        var_len, 
	       void*       var_ptr);
Bind extracted column of select or insert statement
Parameters
statement - statement descriptor returned by cli_statement
column_name - zero terminated string with column name
var_type - type of variable as described in cli_var_type enum
var_len - pointer to the variable holding the length of the array variable. Before the statement is executed, this variable should be assigned the maximal length of the array/string buffer pointed to by var_ptr. After the execution of the statement, it is assigned the real length of the fetched array/string. If this is larger than the length of the buffer, only part of the array is placed in the buffer, but var_len still contains the actual array length.
var_ptr - pointer to the variable
Returns
result code as described in cli_result_code enum
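
The var_len contract can be illustrated without a server connection. The helper fetchInto below is hypothetical and not part of cli.h; it only mimics what the text says happens to a bound string or array buffer when a value is fetched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical helper mimicking the documented var_len contract:
// on entry *var_len holds the buffer capacity, on exit the real length.
void fetchInto(char const* fetched, int fetchedLen, char* buffer, int* var_len) {
    int capacity = *var_len;
    // Copy no more than the buffer can hold.
    int toCopy = std::max(0, std::min(capacity, fetchedLen));
    std::memcpy(buffer, fetched, static_cast<std::size_t>(toCopy));
    // Report the real length; a value above the capacity signals truncation.
    *var_len = fetchedLen;
}
```

A caller can detect truncation by comparing the value left in the length variable with the capacity it passed in, and has to restore the capacity before the buffer is reused for the next fetch.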

typedef void* (*cli_column_set)(int var_type, void* var_ptr, int len);
typedef void* (*cli_column_get)(int var_type, void* var_ptr, int* len);

int cli_array_column(int            statement,
		     char const*    column_name, 
		     int            var_type,
		     void*          var_ptr,
		     cli_column_set set,
		     cli_column_get get);
Specify get/set functions for the array column
Parameters
statement - statement descriptor returned by cli_statement
column_name - zero terminated string with column name
var_type - type of variable as described in cli_var_type enum
var_ptr - pointer to the variable
set - function which will be called to construct the fetched field. It receives a pointer to the variable and the length of the fetched array, and returns a pointer to the array's elements.
get - function which will be called to update the field in the database. Given a pointer to the variable, it should return a pointer to the array elements and store the length of the array in the variable pointed to by the len parameter
Returns
result code as described in cli_result_code enum
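
As an illustration of the two callbacks, here is a minimal sketch for a column bound to a std::vector<int>. The typedefs are copied from the declarations above so that the fragment compiles on its own; the function names int_vector_set and int_vector_get are hypothetical, and the sketch assumes that var_ptr is the address of the vector that was passed to cli_array_column.

```cpp
#include <cassert>
#include <vector>

// Typedefs copied from the declarations above.
typedef void* (*cli_column_set)(int var_type, void* var_ptr, int len);
typedef void* (*cli_column_get)(int var_type, void* var_ptr, int* len);

// Called when a row is fetched: make room for len elements and
// return the location where the elements should be stored.
void* int_vector_set(int /*var_type*/, void* var_ptr, int len)
{
    assert(var_ptr != nullptr && len >= 0);
    std::vector<int>* v = static_cast<std::vector<int>*>(var_ptr);
    v->resize(static_cast<std::size_t>(len));
    return v->data();
}

// Called when a row is stored: report the length and return the elements.
void* int_vector_get(int /*var_type*/, void* var_ptr, int* len)
{
    assert(var_ptr != nullptr && len != nullptr);
    std::vector<int>* v = static_cast<std::vector<int>*>(var_ptr);
    *len = static_cast<int>(v->size());
    return v->data();
}
```

Both functions have exactly the signatures required by cli_column_set and cli_column_get, so they could be passed as the set and get arguments.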

typedef void* (*cli_column_set_ex)(int var_type, void* var_ptr, int len, 
				   char const* column_name, int statement, void const* data_ptr);
typedef void* (*cli_column_get_ex)(int var_type, void* var_ptr, int* len, 
				   char const* column_name, int statement);

int cli_array_column_ex(int               statement,
                        char const*       column_name,
                        int               var_type,
                        void*             var_ptr,
                        cli_column_set_ex set,
                        cli_column_get_ex get);
Specify extended get/set functions for the array column
Parameters
statement - statement descriptor returned by cli_statement
column_name - zero terminated string with column name
var_type - type of variable as described in cli_var_type enum
var_ptr - pointer to the variable
set - function which will be called to construct the fetched field. It receives the type of the variable, pointer to the variable, length of the fetched array, name of the fetched column, statement descriptor and pointer to the array data. If this function returns a non-NULL pointer, the database will copy the unpacked array to the returned location. Otherwise it is assumed that the function handles the data itself.
get - function which will be called to update the field in the database. Given the type of the variable, pointer to the variable, column name and statement descriptor, it should return a pointer to the array elements and store the length of the array in the variable pointed to by the len parameter
Returns
result code as described in cli_result_code enum

enum { 
    cli_view_only, 
    cli_for_update
};

int cli_fetch(int statement, int for_update);
Execute select statement.
Parameters
statement - statement descriptor returned by cli_statement
for_update - not zero if fetched rows will be updated
Returns
>= 0 - success, for select statements number of fetched rows is returned
< 0 - error code as described in cli_result_code enum

int cli_insert(int statement, cli_oid_t* oid);
Execute insert statement.
Parameters
statement - statement descriptor returned by cli_statement
oid - object identifier of created record.
Returns
status code as described in cli_result_code enum

int cli_get_first(int statement);
Get first row of the selection.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_get_last(int statement);
Get last row of the selection.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_get_next(int statement);
Get next row of the selection. If cli_get_next is called immediately after the cli_fetch function call, it will fetch the first record in the selection.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_get_prev(int statement);
Get previous row of the selection. If cli_get_prev is called immediately after the cli_fetch function call, it will fetch the last record in the selection.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

cli_oid_t cli_get_oid(int statement);
Get object identifier of the current record
Parameters
statement - statement descriptor returned by cli_statement
Returns
object identifier or 0 if no object is selected

int cli_update(int statement);
Update the current row in the selection. You have to set for_update parameter of cli_fetch to 1 in order to be able to perform updates. Updated value of row fields will be taken from bound column variables.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_remove(int statement);
Remove all selected records. You have to set for_update parameter of cli_fetch to 1 in order to be able to remove records.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_free(int statement);
Deallocate statement and all associated data
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_commit(int session);
Commit current database transaction
Parameters
session - session descriptor as returned by cli_open
Returns
result code as described in cli_result_code enum

int cli_abort(int session);
Abort current database transaction
Parameters
session - session descriptor as returned by cli_open
Returns
result code as described in cli_result_code enum

int cli_show_tables(int session, cli_table_descriptor** tables);
Return the list of tables present in the database.
 
        typedef struct cli_table_descriptor {
            char const*       name;
        } cli_table_descriptor;
Parameters
session - session descriptor as returned by cli_open
tables - address of the pointer to the array with table descriptors. cli_show_tables uses malloc to allocate this array, and it should be deallocated by application using free() function.
Returns
>= 0 - number of tables in the database (Metatable is not returned/counted)
< 0 - error code as described in cli_result_code enum

int cli_describe(int session, char const* table, cli_field_descriptor** fields);
Return definition of fields of specified table. Definition of field descriptor has the following format:
 
        typedef struct cli_field_descriptor {  
            enum cli_var_type type;
            char const*       name;
        } cli_field_descriptor;
Parameters
session - session descriptor as returned by cli_open
table - name of the table
fields - address of the pointer to the array with field descriptors. cli_describe uses malloc to allocate this array, and it should be deallocated by application using free() function.
Returns
>= 0 - number of fields in the table
< 0 - error code as described in cli_result_code enum

int cli_create_table(int                   session, 
                     char const*         tableName, 
                     int                   nFields, 
		     cli_field_descriptor* fields);
Create new table
Parameters
session - session descriptor as returned by cli_open
tableName - name of the created table
nFields - number of columns in the table
fields - array with table column descriptors. The descriptor has the following structure:
    enum cli_field_flags { 
        cli_hashed           = 1, /* field should be indexed using hash table */
        cli_indexed          = 2  /* field should be indexed using B-Tree */
    };

    typedef struct cli_field_descriptor { 
        enum cli_var_type type;
        int               flags;
        char const*     name;
        char const*     refTableName;
        char const*     inverseRefFieldName;
    } cli_field_descriptor;
Returns
result code as described in cli_result_code enum

int cli_alter_table(int                   session, 
                     char const*           tableName, 
                     int                   nFields, 
		     cli_field_descriptor* fields);
Change format of an existing table
Parameters
session - session descriptor as returned by cli_open
tableName - name of the altered table
nFields - number of columns in the table
fields - array with table column descriptors. The descriptor has the following structure:
    enum cli_field_flags { 
        cli_hashed           = 1, /* field should be indexed using hash table */
        cli_indexed          = 2, /* field should be indexed using B-Tree */
        cli_case_insensitive = 4  /* index is case insensitive */
    };

    typedef struct cli_field_descriptor { 
        enum cli_var_type type;
        int               flags;
        char const*       name;
        char const*       refTableName;
        char const*       inverseRefFieldName;
    } cli_field_descriptor;
Returns
result code as described in cli_result_code enum

int cli_drop_table(int                   session, 
	           char const*         tableName); 
Drop table
Parameters
session - session descriptor as returned by cli_open
tableName - name of the table to be dropped
Returns
result code as described in cli_result_code enum

int cli_alter_index(int           session, 
	            char const* tableName,
		    char const* fieldName, 
		    int           newFlags); 
Add or remove column index
Parameters
session - session descriptor as returned by cli_open
tableName - name of the table
fieldName - name of the field
newFlags - new flags of the field. If an index exists for this field but is not specified in the newFlags mask, it will be removed; if an index does not exist but is specified in the newFlags mask, it will be created.
Returns
result code as described in cli_result_code enum
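
The effect of the newFlags mask can be written down as two bit operations. The sketch below is only an illustration of the rule stated above, not library code: the namespace sketch, the struct index_changes and the function diff_index_flags are hypothetical names, and the flag values are those of cli_hashed and cli_indexed.

```cpp
#include <cassert>

namespace sketch {

// Same values as cli_hashed and cli_indexed in cli_field_flags.
enum field_flags { hashed = 1, indexed = 2 };

struct index_changes {
    int to_create;  // indices requested in newFlags that do not exist yet
    int to_remove;  // existing indices that are missing from newFlags
};

// Compare the current index flags of a field with the requested ones.
inline index_changes diff_index_flags(int currentFlags, int newFlags)
{
    assert(currentFlags >= 0 && newFlags >= 0);
    const int index_mask = hashed | indexed;
    index_changes changes;
    changes.to_create = newFlags & ~currentFlags & index_mask;
    changes.to_remove = currentFlags & ~newFlags & index_mask;
    return changes;
}

} // namespace sketch
```

For example, a field that currently has only a hash index and is altered with newFlags equal to the B-Tree flag ends up with the hash index removed and a tree index created.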

int cli_freeze(int statement);
Freeze cursor. Makes it possible to reuse the cursor after commit of the current transaction.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_unfreeze(int statement);
Unfreeze cursor. Reuse previously frozen cursor.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_seek(int statement, cli_oid_t oid);
Position cursor to the record with specified OID
Parameters
statement - statement descriptor returned by cli_statement
oid - object identifier of the record to which cursor should be positioned
Returns
>= 0 - success, position of the record in the selection
< 0 - error code as described in cli_result_code enum

int cli_skip(int statement, int n);
Skip specified number of rows.
Parameters
statement - statement descriptor returned by cli_statement
n - number of objects to be skipped
Returns
result code as described in cli_result_code enum

Local implementation of CLI

Starting from version 2.47, FastDB provides a local implementation of the CLI interface. This makes it possible to access the database directly from a C application using CLI functions, without starting a separate server and without socket communication overhead. The local implementation of the CLI functions is included in the main fastdb library. So if you want to use remote CLI, link your application with cli.lib, and if you want to access the database locally, link it with fastdb.lib. To create a local session you should use the cli_create function instead of cli_open. Calling cli_create when your application is linked with cli.lib, or cli_open when it is linked with fastdb.lib, causes a cli_bad_address error.


int cli_create(char const* databaseName,
               char const* filePath,   
               unsigned    transactionCommitDelay, 
	       int         openAttr, 
	       size_t      initDatabaseSize,
               size_t      extensionQuantum,
               size_t      initIndexSize,
               size_t      fileSizeLimit);
Create connection to the local database
Parameters
databaseName - database name
filePath - path to the database file
transactionCommitDelay - transaction commit delay (specify 0 to disable)
openAttr - mask of cli_open_attributes. You can specify combination of the following attributes:
  • cli_open_default
  • cli_open_readonly
  • cli_open_truncate
  • cli_open_concurrent
initDatabaseSize - initial size of database
extensionQuantum - database extension quantum
initIndexSize - initial size of object index
fileSizeLimit - limit for file size (0 - unlimited)
Returns
>= 0 - connection descriptor to be used in all other cli calls
< 0 - error code as described in cli_result_code enum

int cli_create_replication_node(int         nodeId,
                                int         nServers,
                                char*       nodeNames[],
                                char const* databaseName, 
                                char const* filePath, 
                                int         openAttr, 
                                size_t      initDatabaseSize,
                                size_t      extensionQuantum,
                                size_t      initIndexSize,
                                size_t      fileSizeLimit);
Create connection to the local database with support of replication
Parameters
nodeId - node identifier: 0 <= nodeId < nServers
nServers - number of replication nodes (primary + standby)
nodeNames - array with URLs of the nodes (address:port)
databaseName - database name
filePath - path to the database file
openAttr - mask of cli_open_attributes (to allow concurrent read access to the replication node, the cli_open_concurrent attribute should be set)
initDatabaseSize - initial size of database
extensionQuantum - database extension quantum
initIndexSize - initial size of object index
fileSizeLimit - limit for file size (0 - unlimited)
Returns
>= 0 - connection descriptor to be used in all other cli calls
< 0 - error code as described in cli_result_code enum

int cli_attach(int session);
Attach thread to the database. Each thread, except the one that opened the database, should first attach to the database before any access to it, and detach after finishing work with the database
Parameters
session - session descriptor as returned by cli_open
Returns
result code as described in cli_result_code enum

int cli_detach(int session, int detach_mode);
Detach thread from the database. Each thread, except the one that opened the database, should first attach to the database before any access to it, and detach after finishing work with the database
Parameters
session - session descriptor as returned by cli_open
detach_mode - bit mask representing detach mode
    enum cli_detach_mode {
        cli_commit_on_detach          = 1,
        cli_destroy_context_on_detach = 2
    };
Returns
result code as described in cli_result_code enum

typedef struct cli_database_monitor {
    int n_readers;
    int n_writers;
    int n_blocked_readers;
    int n_blocked_writers;
    int n_users;
} cli_database_monitor;

int cli_get_database_state(int session, cli_database_monitor* monitor);
Obtain information about the current state of the database.
Parameters
session - session descriptor as returned by cli_open
monitor - pointer to the monitor structure. The following fields are set:
n_readers - number of granted shared locks
n_writers - number of granted exclusive locks
n_blocked_readers - number of threads whose shared lock request was blocked
n_blocked_writers - number of threads whose exclusive lock request was blocked
n_users - number of processes that have opened the database
Returns
result code as described in cli_result_code enum

int cli_prepare_query(int session, char const* query);
Prepare SubSQL query statement.
Parameters
session - session descriptor as returned by cli_open
query - query string with optional parameters. Parameters are specified as '%T' where T is a one or two character code of the parameter type, using the same notation as in printf:
%d or %i - cli_int4_t
%f - cli_real8_t
%Li, %li, %ld or %Ld - cli_int8_t
%p - cli_oid_t
%s - char*
Returns
>= 0 - statement descriptor
< 0 - error code as described in cli_result_code enum

int cli_execute_query(int statement, int for_update, void* record_struct, ...);
Execute query previously prepared by cli_prepare_query.
It is assumed that the format of the destination C structure matches the format of the target database table. For scalar and reference types the mapping is obvious: you should use the corresponding cli_ types in declaring structure and table fields. For array types, you should use the cli_array_t structure. Strings should be represented as char*, and the programmer should not try to deallocate them, or copy the pointer and access it outside the context of the current record.
Parameters
statement - statement descriptor returned by cli_prepare_query
for_update - not zero if fetched rows will be updated
record_struct - structure to receive selected record fields.
... - varying list of query parameters
Returns
result code as described in cli_result_code enum

int cli_insert_struct(int session, char const* table_name, void* record_struct, cli_oid_t* oid);
Insert new record represented as C structure.
It is assumed that the format of the destination C structure matches the format of the target database table. For scalar and reference types the mapping is obvious: you should use the corresponding cli_ types in declaring structure and table fields. For array types, you should use the cli_array_t structure. Strings should be represented as char*.
Parameters
session - session descriptor as returned by cli_open
table_name - name of the destination table
record_struct - structure specifying value of record fields
oid - pointer to the location to receive OID of created record (may be NULL)
Returns
result code as described in cli_result_code enum

Delayed transactions and online backup scheduler

FastDB supports ACID transactions. This means that once the database has reported that a transaction is committed, it is guaranteed that the database will be able to recover the transaction in case of a system fault (except corruption of the database image on the hard disk). The only way to provide this feature on standard equipment (without non-volatile RAM, for example) and under general-purpose operating systems (Windows, Unix, ...) is to perform a synchronous write to the disk. "Synchronous" in this context means that the operating system will not return control to the application until the data has really been written to the disk. Unfortunately, a synchronous write is a very time-expensive operation: the average disk access time is about 10ms, so it is hard to achieve performance of more than 100 transactions per second.

But in many cases it is acceptable to lose the changes made in the last few seconds (while preserving consistency of the database). With this assumption, database performance can be significantly increased. FastDB provides a "delayed transaction commit model" for such applications. When the transaction commit delay is non-zero, the database doesn't perform the commit immediately, but delays it for the specified timeout. After expiration of this timeout, the transaction is committed normally, which ensures that only changes made within the specified timeout can be lost in case of a system crash.

If the thread which initiated a delayed transaction starts a new transaction before the delayed commit is performed, then the delayed commit operation is skipped. So FastDB is able to group several subsequent transactions performed by one client into a single large transaction. This greatly increases performance, because it reduces the number of synchronous writes and the number of created shadow pages (see section Transactions).
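
A back-of-the-envelope sketch of why grouping helps, assuming that the synchronous write dominates the cost of a commit and that a grouped commit still needs only one such write. The function name and the cost model are illustrative assumptions, not FastDB code:

```cpp
#include <cassert>

// Upper bound on committed transactions per second when each commit
// costs one synchronous disk write (assumed cost model).
// sync_write_ms   - assumed duration of one synchronous write, in milliseconds
// txns_per_commit - number of consecutive transactions sharing one commit
inline double max_transactions_per_second(double sync_write_ms, int txns_per_commit)
{
    assert(sync_write_ms > 0 && txns_per_commit >= 1);
    double commits_per_second = 1000.0 / sync_write_ms;
    return commits_per_second * txns_per_commit;
}
```

With a 10ms write, max_transactions_per_second(10.0, 1) gives the limit of 100 transactions per second mentioned earlier, while grouping 20 transactions per commit raises the bound of this model to 2000.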

If some other client tries to start a transaction before expiration of the delayed commit timeout, then FastDB forces the delayed commit to proceed and releases the resource for the other thread. So concurrency does not suffer from delayed commit.

By default delayed commits are disabled (the timeout is zero). You can specify the commit delay parameter as the second optional argument of the dbDatabase::open method. In the SubSQL utility, it is possible to specify the value of the transaction commit delay by setting the "FASTDB_COMMIT_DELAY" environment variable (seconds).

The transaction commit scheme used in FastDB guarantees recovery after software and hardware faults if the image of the database on the disk was not corrupted (all information which was written to the disk can be correctly read). If for some reason the database file is corrupted, then the only way to recover the database is to use a backup (hoping that it was made not too long ago).

A backup can be made by just copying the database file when the database is offline. Class dbDatabase provides a backup method which is able to perform an online backup, which doesn't require stopping the database. It can be called at any time by the programmer. Going further, FastDB provides a backup scheduler, which is able to perform backups automatically. The only things needed are the name of the backup file and the interval of time between backups.

The method dbDatabase::scheduleBackup(char const* fileName, time_t period) spawns a separate thread which performs backups to the specified location with the specified period (in seconds). If fileName ends with the "?" character, then the date of backup initiation is appended to the file name, producing a unique file name. In this case all backup files are kept on the disk (it is the responsibility of the administrator to remove backup files that are too old or to transfer them to other media). Otherwise the backup is written to the file named fileName + ".new", and after completion of the backup, the old backup file is removed and the new file is renamed to fileName. Also in the latter case, FastDB will check the creation date of the old backup file (if it exists) and adjust the wait timeout in such a way that the time between backups is equal to the specified period (so if the database server is started for only 8 hours per day and the backup period is 24 hours, then a backup will be performed each day, unlike the scheme with uniquely generated backup file names).
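
The timeout adjustment for the fixed-name case can be sketched as follows. This is an illustration of the rule described above, not the FastDB implementation: the function names are hypothetical, and waiting a full period when no old backup file exists is an assumption of the sketch.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Name of the temporary file written while a backup is in progress
// (the case where fileName does not end with "?").
inline std::string temp_backup_name(const std::string& fileName)
{
    return fileName + ".new";
}

// Seconds to wait before the next backup so that consecutive backups are
// 'period' seconds apart. last_backup is the creation time of the old
// backup file; pass 0 if there is none (assumption: wait a full period).
inline std::time_t next_backup_delay(std::time_t period, std::time_t now,
                                     std::time_t last_backup)
{
    assert(period > 0);
    if (last_backup == 0 || last_backup > now) {
        return period;
    }
    std::time_t elapsed = now - last_backup;
    return elapsed >= period ? 0 : period - elapsed;
}
```

With a 24 hour period, a server restarted 80000 seconds after the last backup waits only the remaining 6400 seconds, and a server restarted after more than a day performs the backup immediately.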

It is possible to schedule backup processing in the SubSQL utility by setting the FASTDB_BACKUP_NAME environment variable. The period value is taken from the FASTDB_BACKUP_PERIOD environment variable if specified, otherwise it is set to one day. To recover from a backup it is enough to copy one of the backup files in place of the corrupted database file.

Fault tolerant support

Starting from version 2.49, FastDB provides optional fault tolerant support. It is possible to start one primary (active) node and several standby nodes so that all changes made at the primary node are replicated to the standby nodes. If the primary node crashes, one of the standby nodes becomes active and starts to play the role of the primary node. Once the crashed node is restarted, it will perform recovery, synchronize its state with the primary node and start functioning as a standby node. Nodes are connected by sockets and are intended to be located on different computers. Communications are assumed to be reliable.

To be able to use fault tolerant support, you should rebuild FastDB with the REPLICATION_SUPPORT option. To switch it on, set the FAULT_TOLERANT variable at the beginning of the makefile to 1. You should use class dbReplicatedDatabase instead of dbDatabase. In the parameters to the open method, in addition to the database name and file name, you should specify the identifier of this node (an integer from 0 to N-1), an array with the addresses of all nodes (hostname:port pairs) and the number of nodes (N). Then you should start the program at each of the N nodes. Once all instances are started, the node with ID=0 becomes active (the primary node). The open method returns true at this instance. Other nodes are blocked in the open method. If the primary node crashes, one of the standby nodes is activated (its open method returns true) and this instance continues execution. If the crashed instance is restarted, it will try to connect to all other servers, restore its state and continue functioning as a standby node, waiting for its chance to replace a crashed primary node. If the primary node terminates normally, the open method of all standby nodes returns false.

In fault tolerant mode FastDB keeps two files: one with the database itself and one with page update counters. The file with page update counters is used for incremental recovery. When a crashed node is restarted, it sends the page counters it has to the primary node and receives back only those pages which were changed at the primary node in the meantime (pages whose timestamps are greater than those sent by the recovering node).
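
The selection of pages for incremental recovery can be sketched as a comparison of two counter arrays. This is a simplified illustration under assumed data types (one unsigned counter per page, held in a std::vector); the function name pages_to_resend is hypothetical and the real on-disk format is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of pages whose update counter at the primary node is
// greater than the counter reported by the recovering node.
inline std::vector<std::size_t> pages_to_resend(
    const std::vector<unsigned>& primary_counters,
    const std::vector<unsigned>& recovered_counters)
{
    // Assumption of the sketch: the recovering node never knows more pages
    // than the primary node does.
    assert(recovered_counters.size() <= primary_counters.size());
    std::vector<std::size_t> pages;
    for (std::size_t i = 0; i < primary_counters.size(); i++) {
        // Pages the recovering node has never seen are treated as counter 0.
        unsigned known = i < recovered_counters.size() ? recovered_counters[i] : 0;
        if (primary_counters[i] > known) {
            pages.push_back(i);
        }
    }
    return pages;
}
```

Only the pages returned by this comparison have to travel over the network, which is what makes recovery of a briefly offline node cheap.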

In replication mode the application (at the primary node) is not blocked during transaction commit until all changes are flushed to the disk. Modified pages are flushed to the disk asynchronously by a separate thread. This leads to a significant improvement in performance. But if all nodes crash, the database can be left in an inconsistent state. It is possible to specify a delay for flushing data to the disk: the larger the delay, the less disk IO overhead there is. But in case of a crash, a larger amount of data may have to be sent from the primary node to perform recovery.

If all nodes crash, the system administrator should choose the node with the most recent version of the database (sorry, this can not be done automatically yet), and start the application at this node, but not at the other nodes. After expiration of a small timeout (5 seconds), it will report that it failed to connect to the other nodes. When connections to all specified nodes have failed, the program will perform local recovery and start as the new active node. Then you can start the other nodes, which will perform recovery from the active node.

It is possible to use the fault tolerant model in diskless mode (DISKLESS_CONFIGURATION build option). In this case no data is stored on the disk (neither the database file nor the page update counters). It is assumed that at least one of the nodes is always alive. As long as there is at least one online node, data will not be lost. When a crashed node recovers, the primary node sends a complete snapshot of the database to it (incremental recovery is not possible because the state of the crashed node is lost). Since there are no disk operations in this case, performance can be very high and is limited only by the throughput of the network.

When a replication node is started, it tries to connect to all other nodes within some specified timeout. If no connection can be established within this time, then the node is assumed to be started autonomously and starts working as a normal (non-replicated) database. If the node has established connections with some other nodes, then the one with the smallest ID is chosen as the replication master. All other nodes are switched to standby mode and wait for replication requests from the master. If the content of the database at the master and at a standby node is different (this is determined using the page counters array), then the master performs recovery of the standby node, sending it the most recent versions of the pages. If the master crashes, then the standby nodes select a new master (the node with the smallest ID). All standby nodes are blocked in the open method until one of the following things happens:

  1. The master crashes and this node is selected as the new master. In this case the open method returns true.
  2. The master closes the database normally. In this case the open method at all replicated nodes returns false.
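
The master selection rule (the connected node with the smallest ID wins) can be sketched in a few lines. The function elect_master and the representation of connectivity as a std::vector<bool> are assumptions made for the illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// connected[i] is true if node i is alive (the local node counts as alive).
// Returns the ID of the node that becomes master, or -1 if no node is alive.
inline int elect_master(const std::vector<bool>& connected)
{
    for (std::size_t id = 0; id < connected.size(); id++) {
        if (connected[id]) {
            return static_cast<int>(id);  // the smallest alive ID wins
        }
    }
    return -1;
}
```

In a three node cluster, node 0 is master while it is alive; if it crashes, the same rule applied to the remaining nodes selects node 1.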

It is possible to perform read-only access to the replicated database from other applications. In this case the replicated node should be created using the dbReplicatedDatabase(dbDatabase::dbConcurrentUpdate) constructor invocation. Other applications can access the same database using a dbDatabase(dbDatabase::dbConcurrentReadMode) instance.

Not all applications need fault tolerance. Many applications use replication just to increase scalability, by distributing load between several nodes. For such applications FastDB provides a simplified replication model. In this case there are two kinds of nodes: readers and writers. Any of the writer nodes can play the role of replication master. Reader nodes just receive replicated data from the master and can not update the database themselves. The main difference from the replication model described above is that a reader node can never become master, and the open method at this node returns control immediately after the connection with the master node has been established. The reader node then accesses the database as a normal read-only database application. Updates from the master node are received by a separate thread. A reader node should be created using the dbReplicatedDatabase(dbDatabase::dbConcurrentRead) constructor. It should use exactly the same database schema (classes) as the master node. The database connection of a reader node is not closed automatically when the master closes its connection: it remains open and the application can still access the database in read-only mode. Once the master node is restarted, it establishes connections with all standby nodes and continues sending updates to them. If there are no reader nodes, then the replication model is equivalent to the fault tolerance model described above, and if there is a single writer and one or more reader nodes, then it is classical master-slave replication.

You can test fault tolerant mode using the Guess example. When compiled with -DREPLICATION_SUPPORT, this example illustrates a cluster of three nodes (all addresses refer to localhost, but you can certainly replace them with the names of real hosts in your network). You should start three instances of the guess application with parameters 0..2. When all instances are started, the application with parameter 0 starts the normal user dialogue (it is a game: "Guess an animal"). If you emulate a crash of this application by pressing Ctrl-C, then one of the standby nodes continues execution.

More sophisticated modes of replication are illustrated by the testconc example. There are three replication nodes, which are started using the testconc update N command, where N is the identifier of the node: 0, 1, 2. After all three nodes are started, they connect to each other, and node 0 becomes master and starts updating the database, replicating changes to nodes 1 and 2. It is possible to start one or more inspectors: applications which are connected to the replicated database in read-only mode (using the dbConcurrentRead access type). An inspector can be started using testconc inspect N T, where N is the identifier of the replicated node to which the inspector should be connected and T is the number of inspection threads.

The same testconc example can be used to illustrate the simplified replication model. To test it, start one master: testconc update 0, and two read-only replication nodes: testconc coinspect 1 and testconc coinspect 2. Notice the difference from the scenario described above: in the case of the fault tolerance model there is a normal replication node started using the testconc update N command and a read-only node (not involved in the replication process) connected to the same database: testconc inspect N. In the case of simplified master-slave replication, there is a read-only replication node which can not become master (so if the original master crashes, nobody can play its role), but the application running at this node accesses the replicated database as a normal read-only application.

Query optimization

The execution of queries, when all data is present in memory, is very fast compared with the time for query execution in a traditional RDBMS. But FastDB increases the speed of query execution even further by applying several optimizations: using indices, inverse references and query parallelization. The following sections supply more information about these optimizations.

Using indices in queries

Indices are a traditional approach for increasing RDBMS performance. FastDB uses two types of indices: extensible hash table and T-tree. The first type provides the fastest way (with constant time on average) to access a record with a specified value of the key. The T-tree, which is a combination of an AVL-tree and an array, plays the same role for a MMDBMS as the B-Tree does for a traditional RDBMS. It provides search, insertion and deletion operations with guaranteed logarithmic complexity (i.e. the time for performing a search/insert/delete operation for a table with N records is C*log2(N), where C is some constant). The T-tree is more suitable for a MMDBMS than a B-Tree, because the latter tries to minimize the number of page loads (which is an expensive operation in disk-based databases), while the T-tree tries to optimize the number of compare/move operations. The T-tree is the best type to use with range operations or when the order of records is significant.

FastDB uses simple rules for applying indices, allowing a programmer to predict whether an index will be used, and which one. The check for index applicability is done during each query execution, so the decision can be made depending on the values of the operands. The following rules describe the algorithm of applying indices by FastDB:

Now we should make clear what the phrase "index is compatible with operation" means and which type of index is used in each case. A hash table can be used when:

A T-tree index can be applied if a hash table is not applicable (or a field is not hashed) and:

If an index is used to search the prefix of a like expression, and the suffix is not just the '%' character, then an index search operation can return more records than really match the pattern. In this case we should filter the index search output by applying a pattern match operation.
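
The prefix search followed by a filtering step can be sketched as follows. This is an illustration under stated assumptions, not FastDB code: std::multimap stands in for the ordered index, the names likeMatch and searchLike are hypothetical, and the pattern syntax is assumed to follow the usual SQL convention ('%' matches any sequence of characters, '_' matches exactly one).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical pattern matcher: '%' matches any run of characters,
// '_' matches exactly one character (an assumption based on SQL usage).
bool likeMatch(const std::string& s, const std::string& p,
               std::size_t i = 0, std::size_t j = 0) {
    if (j == p.size()) return i == s.size();
    if (p[j] == '%') {
        // Try every possible length for the run matched by '%'.
        for (std::size_t k = i; k <= s.size(); ++k)
            if (likeMatch(s, p, k, j + 1)) return true;
        return false;
    }
    if (i < s.size() && (p[j] == '_' || p[j] == s[i]))
        return likeMatch(s, p, i + 1, j + 1);
    return false;
}

// std::multimap stands in for an ordered index: key -> record id.
using OrderedIndex = std::multimap<std::string, int>;

std::vector<int> searchLike(const OrderedIndex& index, const std::string& pattern) {
    // The literal prefix is everything before the first wildcard.
    const std::string prefix = pattern.substr(0, pattern.find_first_of("%_"));
    // If the rest of the pattern is exactly "%", the prefix test is sufficient.
    const bool needFilter = pattern.substr(prefix.size()) != "%";

    std::vector<int> result;
    // Visit only the keys that start with the prefix.
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        // The index may return candidates that do not match the full pattern.
        if (!needFilter || likeMatch(it->first, pattern))
            result.push_back(it->second);
    }
    return result;
}
```

For a pattern such as 'ab%' every key returned by the prefix scan is a match, so the filter is skipped; for 'ab%d' the scan returns all keys starting with 'ab' and the filter discards those that do not end with 'd'.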

When the search condition is a disjunction of several subexpressions (the expression contains several alternatives combined by the or operator), then several indices can be used for the query execution. To avoid record duplicates in this case, a bitmap is used in the cursor to mark records already selected.
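
The duplicate elimination described above can be sketched like this. The Cursor class and the selectUnion function are hypothetical and only model the idea: each alternative of the or expression is assumed to have been resolved by its own index search, and one bit per record remembers whether the record has already been selected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cursor: collects record ids and keeps one bit per record
// to remember which records were already selected.
class Cursor {
public:
    explicit Cursor(std::size_t tableSize) : seen_(tableSize, false) {}

    // Adds the record unless an earlier index search already selected it.
    void add(std::size_t recordId) {
        assert(recordId < seen_.size());
        if (!seen_[recordId]) {
            seen_[recordId] = true;
            selected_.push_back(recordId);
        }
    }

    const std::vector<std::size_t>& selected() const { return selected_; }

private:
    std::vector<bool> seen_;             // the bitmap
    std::vector<std::size_t> selected_;  // ids in order of first selection
};

// Merges the results of several index searches, one per alternative
// of the disjunction, without producing duplicates.
std::vector<std::size_t> selectUnion(
        std::size_t tableSize,
        const std::vector<std::vector<std::size_t>>& indexResults) {
    Cursor cursor(tableSize);
    for (const auto& hits : indexResults)
        for (std::size_t id : hits)
            cursor.add(id);
    return cursor.selected();
}
```

The bitmap costs one bit per record in the table, but testing and setting a bit takes constant time, so merging stays linear in the total number of index hits.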

If the search condition requires a sequential table scan, the T-tree index can still be used if the order by clause contains a single record field for which a T-tree index is defined. Since sorting is a very expensive operation, using an index instead of sorting significantly reduces the time for the query execution.
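
A minimal sketch of this idea, with std::multimap standing in for the T-tree index: the records are visited in index order and the search condition is checked for each one, so the result comes out ordered without a separate sort. The Record layout and the names NameIndex, buildNameIndex and selectOrderedByName are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical record layout used only for this illustration.
struct Record {
    int id;
    std::string name;
    int age;
};

// Ordered index on the name field: key -> position of the record in the table.
using NameIndex = std::multimap<std::string, std::size_t>;

NameIndex buildNameIndex(const std::vector<Record>& table) {
    NameIndex index;
    for (std::size_t pos = 0; pos < table.size(); ++pos)
        index.emplace(table[pos].name, pos);
    return index;
}

// Evaluates a condition that no index can resolve, but returns the matching
// ids ordered by name by walking the index instead of sorting the result.
std::vector<int> selectOrderedByName(
        const std::vector<Record>& table,
        const NameIndex& index,
        const std::function<bool(const Record&)>& predicate) {
    std::vector<int> ids;
    for (const auto& entry : index) {          // entries come in key order
        assert(entry.second < table.size());
        const Record& rec = table[entry.second];
        if (predicate(rec))
            ids.push_back(rec.id);
    }
    return ids;
}
```

Every record is still visited once, as in a sequential scan, but the O(N*log N) sorting step over the selected records is avoided.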

It is possible to check which indices are used for the query execution, and how many probes are performed during index search, by compiling FastDB with the option -DDEBUG=DEBUG_TRACE. In this case, FastDB will dump trace information about database functionality including information about indices.

Inverse references

Inverse references provide an efficient and reliable way to establish relations between tables. FastDB uses information about inverse references when a record is inserted/updated/deleted and also for query optimization. Relations between records can be of one of the following types: one-to-one, one-to-many and many-to-many.