Introduction

GigaBASE is object-relational database system with the same programming interface as FastDB main memory DBMS, but using a page pool instead of mapping database file to the memory. That is why GigaBASE is able to handle databases which size exceeds size of computers virtual memory. As FastDB, GigaBASE provides a very convenient and efficient C++ interface. GigaBASE doesn't support client-server architecture and provides concurrent access to the database only for different threads within one process. GigaBASE is most efficient for applications fetching records using indices or direct object references. High speed of query execution is provided by elimination of data transfer overhead and very effective locking implementation. Synchronization of concurrent database access is implemented in GigaBASE by means of atomic instructions, adding almost no overhead to query processing. GigaBASE uses modified B-tree indices to provide fast access to disk resident data (with minimal disk read operations).

GigaBASE supports transactions, online backups and automatic recovery after system crash. Transaction commit protocol is based on shadow pages algorithm, performing atomic update of database. Recovery can be done very fast, providing high availability for critical applications. Moreover, elimination of transaction logs improves total system performance and leads to more effective usage of system resources.

GigaBASE is application-oriented database. Database tables are constructed using information about application classes. GigaBASE supports automatic scheme evaluation, allowing you to do changes only in one place - in your application classes. GigaBASE provides flexible and convenient interface for retrieving data from database. SQL-like query language is used to specify queries, and such post-relational capabilities as non-atomic fields, nested arrays, user-defined types and methods, direct interobject references simplifies design of database application and makes them more efficient.

GigaBASE is able to efficiently handle databases with several millions objects and up to terabyte size even at computers having not so much physical memory. Page pool using LRU strategy for page replacement and B-tree indices minimize number of disk operations and so provide high system performance.

Query language

GigaBASE supports query language with SQL-like syntax. GigaBASE uses notation more popular for object-oriented programming then for relational database. Table rows are considered as object instances and the table - as class of these objects. Unlike SQL, GigaBASE is oriented on work with objects instead of SQL tuples. So the result of each query execution is a set of objects of one class. The main differences of GigaBASE query language from standard SQL are:

  1. There are no joins of several tables and nested subqueries. Query always returns set of objects from one table.
  2. Standard C types are used for atomic table columns.
  3. There are no NULL values, except null references. I am completely agree with C.J. Date critics of three-value logic and his proposal to use default values instead.
  4. Structures and arrays can be used as record components. Special exists quantor is provided for locating element in arrays.
  5. User methods can be defined for table records (objects) as well as for record components.
  6. User functions with single string or numeric argument can be defined by application.
  7. References between objects are supported including automatic support of inverse references.
  8. Construction start from follow by performs recursive records traversal using references.
  9. As far as query language is deeply integrated with C++ classes, case sensitive mode is used for language identifiers as well as for keywords.
  10. No implicit conversion of integer and floating types is done to string representation. If such conversion is need, it should be done explicitly.

The following rules in BNF-like notation specifies grammar of GigaBASE query language search predicate:

Grammar conventions
ExampleMeaning
expressionnon-terminals
notterminals
|disjoint alternatives
[not]optional part
{1..9}repeat zero or more times

select-condition ::= [ expression ] [ traverse ] [ order ] [ limit )
expression ::= disjunction
disjunction ::= conjunction
        | conjunction or disjunction
conjunction ::= comparison
        | comparison and conjunction
comparison ::= operand = operand
        | operand != operand
        | operand <> operand
        | operand < operand
        | operand <= operand
        | operand > operand
        | operand >= operand
        | operand (not) like operand
        | operand (not) like operand escape string
        | operand (not) in operand
        | operand (not) in expressions-list
        | operand (not) between operand and operand
	| operand is (not) null
operand ::= addition
additions ::= multiplication
        | addition +  multiplication
        | addition || multiplication
        | addition -  multiplication
multiplication ::= power
        | multiplication * power
        | multiplication / power
power ::= term
        | term ^ power
term ::= identifier | number | string
        | true | false | null
	| current | first | last
	| ( expression )
        | not comparison
	| - term
	| term [ expression ]
	| identifier . term
	| function term
        | exists identifier : term
	| parameter
function ::= abs | length | lower | upper
        | integer | real | string | user-function
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= [length] field (asc | desc)
field ::= identifier { . identifier }
traverse ::= start from field [ follow by ( next | previous | fields-list ) ]
limit::= limit [ start-position , ] max-selected
max-selected ::= integer | parameter
start-position ::= integer | parameter
fields-list ::=  field { , field }
user-function ::= identifier

Identifiers are case sensitive, begin with a..z, A..Z, '_' or '$' character, contain only a-z, A..Z, 0..9 '_' or '$' characters, and do not duplicate a SQL reserved words.

>
List of reserved words
absandareaascbetween
bycurrentdescescapeexists
falsefirstfollowfromin
integerislengthlikelast
limitlowernotnullor
overlapsrealrectanglestartstring
uppertrue

ANSI-standard comments may also be used. All character from double-hyphen to the end of the line are ignored.

GigaBASE extends ANSI standard SQL operations by supporting bit manipulation operations. Operators and/or can be applied not only to boolean operands but also to operands of integer type. Result of applying and/or operator to integer operands is integer value with bits set by bit-AND/bit-OR operation. Bits operations can be used for efficient implementation of small sets. Also rasing to a power operation ^ is supported by GigaBASE for integer and floating point types.

Starting from 2.51 version GigaBASE supports LIMIT [S,]N construction in queries. It is more useful, flexible and efficient way to restrict number of selected records than limit specified for the cursor. First parameter S (if present) specifies number of record to be skipped (so S records which match search criteria will not be selected). Parameter N specifies maximal number of records to be selected. So if you want to show first ten results of the query at the screen, you should append limit 0,10 to the query. If you want to show next 10 results, then append limit 10,10...

It is possible to use parameters of int4 type instead of constants for specifying S and N. In this case, the same precompiled statement can be used for fetching parts of the list.

Limit construction correctly works when order by clause is present. If there is index for variable in order part, then records are inspected in requested order, so sort is not needed and after fetching N records we can finish query execution. Otherwise, all records will be selected, sorted and then only records from [S,N] interval are left and other are removed from the selection.

Structures

GigaBASE accepts structures as components of records. Field of the structure can be accessed using standard dot notation: company.address.city

Structure fields can be indexed and used in order by specification. Structures can contain other structures as their components and there are no limitations on nesting level.

Programmer can define methods for structures, which can be used in queries with the same syntax as normal structure components. Such methods should have no arguments except pointer to the object to which they belong (this pointer in C++), and should return atomic value (of boolean, numeric, string or reference type). Also method should not change object instance (immutable method). If method returns string, then this string should be allocated using new char operator, because it will be deleted after copying of its value. So user-defined methods can be used for creation virtual components - components which are not stored in database, but instead if this are calculated using values of other components. For example, GigaBASE dbDateTime type contains only integer timestamp component and such methods as dbDateTime::year(), dbDateTime::month()... So it is possible to specify queries like: "delivery.year = 1999" in application, where delivery record field has dbDateTime type. Methods are executed in the context of application, where they are defined, and are not available to other applications and interactive SQL.

Arrays

GigaBASE accepts arrays with dynamic length as components of records. Multidimensional arrays are not supported, but it is possible to define array of arrays. It is possible to sort records in the result set by length of array field. GigaBASE provides a set of special constructions for dealing with arrays:

  1. It is possible to get the number of elements in the array by length() function.
  2. Array elements can be fetched by [] operator. If index expression is out of array range, then exception will be raised.
  3. Operator in can be used for checking if array contains value specified by left operand. This operation can be used only for arrays of atomic types: with boolean, numeric, reference or string components.
  4. Array can be updated using update method which creates copy of the array and returns non-constant reference.
  5. Iteration through array elements is performed by exists operator. Variable specified after exists keyword can be used as index in arrays in the expression preceded by exists quantor. This index variable will iterate through all possible array index values, until value of expression will become true or index runs out of range. Condition
            exists i: (contract[i].company.location = 'US')
    
    will select all details which are shipped by companies located in US, while query
            not exists i: (contract[i].company.location = 'US')
    
    will select all details which are shipped only from companies outside US.

    Nested exists clauses are allowed. Using of nested exists quantors is equivalent to nested loops using correspondent index variables. For example query

            exists colon: (exists row: (matrix[colon][row] = 0))
    
    will select all records, containing 0 in elements of matrix field, which has type array of array of integer. This construction is equivalent to the following two nested loops:
           bool result = false;
           for (int colon = 0; colon < matrix.length(); colon++) {
                for (int row = 0; row < matrix[colon].length(); row++) {
    	         if (matrix[colon][row] == 0) {
                         result = true;
    		     break;
                     }
                }
           }
    
    Order of using indices is significant! Result of the following query execution
            exists row: (exists colon: (matrix[colon][row] = 0))
    
    will be completely different with result of previous query. The program can simply hang in last case due to infinite loop for empty matrices.

Strings

All strings in GigaBASE have varying length and programmer should not worry about specification of maximal length for character fields. All operations acceptable for arrays are also applicable to strings. In addition to them strings have a set their own operations. First of all string can be compared with each other using standard relation operators. At the current moment GigaBASE supports only ASCII character set (corresponds to type char in C) and byte-by-byte comparison of strings ignoring locality settings.

Construction like can be used for matching string with a pattern containing special wildcard characters '%' and '_'. Character '_' matches any single character, while character '%' matches any number of characters (including 0). Extended form of like operator with escape part can be used to handle characters '%' and '_' in the pattern as normal characters if they are preceded by special escape character, specified after escape keyword.

It is possible to search substring within string by in operator. Expression ('blue' in color) will be true for all records which color fields contains 'blue' word. If length of searched string is greater than some threshold value (currently 512), then Boyer-Moore substring search algorithm is used instead of straightforward search implementation.

Strings can be concatenated by + or || operators. Last one was added only for compatibility with ANSI SQL standard. As far as GigaBASE doesn't support implicit conversion to string type in expressions, semantic of operator + can be redefined for strings.

References

References can be dereferenced using the same dot notation as used for accessing structure components. For example the following query
        company.address.city = 'Chicago'
will access record referenced by company component of Contract record and extract city component of address field of referenced record from Supplier table.

References can be checked for null by is null or is not null predicates. Also references can be compared for equality with each other as well as with special null keyword. When null reference is dereferenced, exception is raised by GigaBASE.

There is special keyword current, which can be used to get reference to current record during table search. Usually current keyword is used for comparison of current record identifier with other references or locating it within array of references. For example, the following query will search in Contract table for all active contracts (assuming that field canceledContracts has dbArray< dbReference<Contract> > type):

        current not in supplier.canceledContracts

GigaBASE provides special construction for recursive traverse of records by references:

     start from root-references
     [ follow by ( next | previous | list-of-reference-fields ) ]
First part of this construction is used to specify root objects. Nonterminal root-references should be variable of reference or array of reference type. Two special keyword first and last can be used here, locating first/last record in the table correspondingly. If you want to check for some condition all records referenced by array of references or single reference field, then this construction can be used without follow by part.

If you specify follow by part, then GigaBASE will recursively traverse table records starting from root references and using list of reference fields list-of-reference-fields for transition between records. list-of-reference-fields should consists of fields of reference or array of reference type. Alternatively you can sepcify next or previous preudofields, which refer to next or previous record in the table (all records in GigaBASE table are linked in L2 list, new records are appended at the end of the list).

Traverse is done in depth first top-left-right order (first we visit parent node and then siblings in left-to-right order). Recursion is terminated when null reference is accessed or already visited record is referenced. For example the following query will search tree records with weight larger than 1 in TLR order:

        "weight > 1 start from first follow by left, right"

For the following tree:

                              A:1.1
              B:2.0                             C:1.5
      D:1.3         E:1.8                F:1.2         G:0.8
result of query execution will be:
('A', 1.1), ('B', 2.0), ('D', 1.3), ('E', 1.8), ('C', 1.5), ('F', 1.2)
As was already mentioned GigaBASE always manipulates with objects and doesn't accept joins. Joins can be implemented using references. Consider the classical Supplier-Shipment-Detail examples:
struct Detail {
    char const* name;
    double      weight;

    TYPE_DESCRIPTOR((KEY(name, INDEXED), FIELD(weight)));
};

struct Supplier {
    char const* company;
    char const* address;

    TYPE_DESCRIPTOR((KEY(company, INDEXED), FIELD(address)));
};

struct Shipment {
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;
    int4                  price;
    int4                  quantity;
    dbDateTime            delivery;

    TYPE_DESCRIPTOR((KEY(detail, HASHED), KEY(supplier, HASHED),
		     FIELD(price), FIELD(quantity), FIELD(delivery)));
};
We want to get information about delivery of some concrete details from some concrete suppliers. In relational database this query will be written something like this:
     select from Supplier,Shipment,Detail where
                 Supplier.SID = Shipment.SID and Shipment.DID = Detail.DID
		 and Supplier.company like ? and Supplier.address like ?
		 and Detail.name like ?
In GigaBASE this request should be written as:
     dbQuery q = "detail.name like",name,"and supplier.company like",company,
	         "and supplier.address like",address,"order by price";
GigaBASE will first perform index search in the table Detail for details matching the search condition. Then it performs another index search to locate shipment records referencing selected details. Then sequential search is used to check the rest of select predicate.

Rectangle

GigaBASE has builtin support of spatial data. It provides rectangle type. By default it has dimension 2 and integer coordinate types. It is possible to easily change dimension or use floating point coordinates, but recompilation of GigaBASE is needed in this case.

It is possible to use this type not only in geographical system for representing spatial objects but also in many other cases when date is organized as hyper-cube and queries specify range of values for each dimension, for example:

      select * from CAR where price between 20000 and 30000 
                          and producedYear between 1997 and 1999 
                          and mileage between 50000 and 100000;
If %lt;price, producedYear, milage> are dimensions of the rectangle, then this query can be executed using only one indexed search in R-Tree.

Rectangle fields can be indexed - R-Tree index is used in this case The R-tree provides fast access to spatial data. Basically the idea behind the R-Tree and the B-Tree are the same: use a hierarchical structure with a high branching factor to reduce the number of disk accesses. The R-tree is the extension of the B_tree for a multidimensional object. A geometric object is represented by its minimum bounding rectangle (MBR). Non-leaf nodes contain entries of the form (R,ptr) where ptr is a pointer to a child node in the R-tree; R is the MBR that covers all rectangles in the child node. Leaf nodes contain entries of the form (obj-id, R) where obj-id is a pointer to the object, and R is the MBR of the object. The main innovation in the R-tree is that the father nodes are allowed to overlap. By this means the R-tree guarantees at least 50% space utilization and remains balanced. The first R-tree implementation was proposed by Guttman. The Gigabase R-Tree class is based on Guttman's implementation with a quadratic split algorithm. The quadratic split algorithm is the one that achieves the best trade-off between splitting time and search performance.

Rectangle class provides method for calculating distance between two rectangle, rectangle area, checking whether two rectangle overlap or one contains another. Rectangle is specified by coordinates of two it's vertices. Rectangle class contains array of coordinates. Coordinates of first vertex are placed at the beginning of array, and then - coordinates of another vertex. Each coordinate of first vertex should not be large than correspondent coordinate of the second vertex. Singular rectangle with one or more coordinates of first vertex equals to the correspondent coordinate of the second vertex are allowed - this is a way of storing points in the database.

SubSQL provides some special operators for dealing with rectangle. First of all - all comparison operators can be used with the following semantic:

a == bRectangle a is the same as rectangle b
a != bRectangle a is not the same as rectangle b
a <= bRectangle b contains rectangle a
a < bRectangle b contains rectangle a and them are not the same
a >= bRectangle a contains rectangle b and are not the same
a > bRectangle b contains rectangle a and them are not the same

Also SubSQL provides overlaps and in operators. First checks if two rectangles overlap, the second is equivalent to <= operator. Rectangles can be added - result is minimal rectangle containing both rectangles-operands. It is possible to access rectangle as array of coordinates - using [index] notation. Coordinates in query are always returned as real numbers.

Optimizer is able to use spatial index for all comparison operators (except !=) and for overlaps operator.

Functions

Predefined functions
NameArgument typeReturn typeDescription
absintegerintegerabsolute value of the argument
absrealrealabsolute value of the argument
arearectanglerealarea of the rectangle
integerrealintegerconversion of real to integer
lengtharrayintegernumber of elements in array
lowerstringstringlowercase string
realintegerrealconversion of integer to real
stringintegerstringconversion of integer to string
stringrealstringconversion of real to string
upperstringstringuppercase string

GigaBASE allows user to define its own functions and operators. Function should have at least one but no more than 3 parameters of string, integer, boolean, reference or user defined (raw binary) type. It should return value of integer, real, string or boolean type.

User functions should be registered by the USER_FUNC(f) macro, which creates a static object of the dbUserFunction class, binding the function pointer and the function name.

There are two ways of implementing these functions in application. First can be used only for functions with one argument. This argument should be of int8, real8, char_t* types. And the function return type should be int8, real8, char_t* or bool. If function has more than one parameters or it can accept parameters of different types (polymorphism) then parameters should be passed as reference to dbUserFunctionArgument structure. This structure contains type field, which value can be used in function implementation to detect type of passed argument and union with argument value. The following table contains mapping between argument types and where the value should be taken from:

Argument typeArgument valueArgument value type
dbUserFunctionArgument::atIntegeru.intValueint8
dbUserFunctionArgument::atBooleanu.boolValuebool
dbUserFunctionArgument::atStringu.strValuechar const*
dbUserFunctionArgument::atRealu.realValuereal8
dbUserFunctionArgument::atReferenceu.oidValueoid_t
dbUserFunctionArgument::atRawBinaryu.rawValuevoid*

For example the following statements make it possible to use the sin function in SQL statements:

        #include <math.h>
	...
        USER_FUNC(sin);
Functions can be used only within the application, where they are defined. Functions are not accessible from other applications and interactive SQL. If a function returns a string type , the returned string should be copied by means of the operator new, because GigaBASE will call the destructor after copying the returned value.

In GigaBASE, the function argument can (but not necessarily must) be enclosed in parentheses. So both of the following expressions are valid:

        '$' + string(abs(x))
	length string y

Functions with two argument can be also used as operators. Consider the following example, in which function contains which performs case insensitive search for substring is defined:

     bool contains(dbUserFunctionArgument& arg1, dbUserFunctionArgument& arg2) { 
         assert(arg1.type == dbUserFunctionArgument::atString 
	     && arg2.type == dbUserFunctionArgument::atString);
         return stristr(arg1.u.strValue, arg2.u.strValue) != NULL;
     }

     USER_FUNC(contains);
    
     dbQuery q1, q2;
     q1 = "select * from TestTable where name contains 'xyz'";
     q2 = "select * from TestTable where contains(name, 'xyz')";
In this example, queries q1 and q2 are equivalent.

C++ interface

One of the primary goals of GigaBASE is to provide flexible and convenient application language interface. Anyone who have to use ODBC or similar SQL interfaces will understand what I am speaking about. In GigaBASE query can be written in C++ in the following way:

    dbQuery q;
    dbCursor<Contract> contracts;
    dbCursor<Supplier> suppliers;
    int price, quantity;
    q = "(price >=",price,"or quantity >=",quantity,
        ") and delivery.year=1999";
    // input price and quantity values
    if (contracts.select(q) != 0) {
        do {
            printf("%s\n", suppliers.at(contracts->supplier)->company);
        } while (contracts.next());
    }

Table

Data in GigaBASE is stored in tables which corresponds to C++ classes and class instances - to table records. The following C++ types are accepted as GigaBASE record atomic components:

TypeDescription
boolboolean type (true,false)
int1one byte signed integer (-128..127)
int2two bytes signed integer (-32768..32767)
int4four bytes signed integer (-2147483648..2147483647)
int8eight bytes signed integer (-2**63..2**63-1)
real4four bytes ANSI floating point type
real8eight bytes ANSI double precision floating point type
char const*zero terminated string
dbReference<T>reference to class T
dbArray<T>dynamic array of elements of type T

In addition to types specified in the table above, GigaBASE records can also contain nested structures of these components. GigaBASE doesn't support unsigned types to simplify query language, eliminate bugs caused by sign/unsigned comparison and reduce size of database engine.

Unfortunately C++ provides no way to get metainformation about a class at runtime (RTTI is not supported by all compilers and also doesn't provide enough information). That is why programmer has to explicitly enumerate class fields to be included in database table (it also makes mapping between classes and tables more flexible). GigaBASE provides a set of macros and classes to make such mapping as simple as possible.

Each C++ class or structure, which will be used in database, should contain special method describing its fields. Macro TYPE_DESCRIPTOR(field_list) will construct this method. The single argument of this macro is enclosed in parentheses list of class fields descriptors. If you want to define some methods for the class and make them available for database, then macro CLASS_DESCRIPTOR(name, field_list) should be used instead of TYPE_DESCRIPTOR. Class name is needed to get references to member functions.

The following macros can be used for construction field descriptors:

FIELD(name)
Non-indexed field with specified name.
KEY(name, index_type)
Indexed field. index_type should be combination of the following flags:
INDEXED
When this flag is specified, GigaBASE will create B-tree for the table using this field as a key. Length of indexed key should not exceed 4Kb.
HASHED
This flag is added only for compatibility with FastDB, but hash tables are not yet supported by GigaBASE. B-tree index is used instead of hash table, so using of HASHED is equivalent to using INDEXED index type.
CASE_INSENSITIVE
This flags makes index character case insensitive (when index search is performed, "aBc", "abc" and "Abc" will be treated as the same values). This flag affect only for index searches and has no effect for sequential search.
UNIQUE
Hint for optimizer that index contains only unique values. GigaBASE doesn't enforce unique constraint and this flag is used only to optimize query execution plan.
OPTIMIZE_DUPLICATES
Hint for optimizer that index contains a lot of duplicates. Without this flag removing of record which field with restricted set of values is indexed is not efficient (it requires sequential search among all items with the same key value). When this flag is set alternative B-Tree page organization is used which slightly decrease find and insert performance, but make remove significantly faster.
AUTOINCREMENT
This flag has actually no relation with indices and is included in this mask only to eliminate need in special macro. This flag is applicable only to fields of int4 type and make GigaBASE to assign unique value to this field when record is inserted in the database.
UDT(name, index_type, comparator)
User defined raw binary type. Database deals with this type just as with sequence of bytes of specified size. This field can be used in query (compared with query parameter of the same type), may be indexed and used in order by clause. Comparison is performed by means of comparator function provided by programmer. Comparator functions receives three arguments: two pointers to the compared raw binary objects and size of binary object. The semantic of index_type is the same as of KEY macro.
RAWKEY(name, index)
Raw binary type with predefined comparator. This macro is just specialized version of UDT macro with memcmp used as comparator.
RAWFIELD(name)
One more specialization of UDT macro for raw binary fields with predefined comparator memcmp and without indices.
SUPERCLASS(name)
Specifies information about base class (parent) of the current class.
RELATION(reference, inverse_reference)
Specifies one-to-one, one-to-many or many-to-many relationship between classes (tables). Both reference or inverse_reference fields should have reference or array of reference types. inverse_reference is field of referenced table containing inverse reference(s) to the current table. Inverse references are automatically updated by GigaBASE and also are used for query optimization (see Inverse references).
OWNER(reference, inverse_reference)
Specifies one-to-many or many-to-many relationship between classes (tables) of owner-member type. When owner record is removed all referenced member records are also removed (cascade delete). If member record has reference to owner class, it should be declared with RELATION macro.
METHOD(name)
Specifies method of the class. Method should be instance member function, without any parameters and returning boolean, numeric, reference or string type. Methods should be specified after all other attributes of the class.

Although only atomic fields can be indexed, index type can be also specified for structures. Index will be created for component of the structure only if such type of index is specified in the index type mask of the structure. It makes possible to programmers to enable or disable indices for structure fields depending on the role of the structure in the record.

The following example illustrates creation of type descriptor:

class dbDateTime {
    int4 stamp;
  public:

    int year() {
	return localtime((time_t*)&stamp)->tm_year + 1900;
    }
    ...

    CLASS_DESCRIPTOR(dbDateTime,
		     (KEY(stamp,INDEXED),
		      METHOD(year), METHOD(month), METHOD(day),
		      METHOD(dayOfYear), METHOD(dayOfWeek),
		      METHOD(hour), METHOD(minute), METHOD(second)));
};

class Detail {
  public:
    char const* name;
    char const* material;
    char const* color;
    real4       weight;

    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(name, INDEXED),
		     KEY(material, INDEXED),
		     KEY(color, INDEXED),
		     KEY(weight, INDEXED),
		     RELATION(contracts, detail)));
};

class Contract {
  public:
    dbDateTime            delivery;
    int4                  quantity;
    int8                  price;
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;

    TYPE_DESCRIPTOR((KEY(delivery, INDEXED),
		     KEY(quantity, INDEXED),
		     KEY(price, INDEXED),
		     RELATION(detail, contracts),
		     RELATION(supplier, contracts)));
};
Type descriptors should be defined for all classes used in database. In addition to defining type descriptors, it is necessary to establish mapping between C++ classes and database tables. Macro REGISTER(name) will do it. Unlike TYPE_DESCRIPTOR, REGISTER macro should be used in implementation file and not in header file. It constructs descriptor of the table associated with the class. If you are going to work with multiple databases from one application, it is possible to register table in concrete database by means of REGISTER_IN(name,database) macro. Parameter database of this macro should be pointer to dbDatabase object. Below is example of registration tables in database:

REGISTER(Detail);
REGISTER(Supplier);
REGISTER(Contract);
Table (and correspondent class) can be used only with one database at each moment of time. When you open database, GigaBASE imports all classes defined in application in database. If class with the same name already exists in database, its descriptor stored in the database is compared with descriptor of this class in application. If there are differences in class definitions, GigaBASE tries to convert records from the table to new format. Any kind of conversions between numeric types (integer to real, real to integer, with extension or truncation, are allowed). Also addition of new fields can be easily handled. But removing of the fields is only possible for empty tables (to avoid accidental data destruction).

After loading all class descriptors, GigaBASE checks if all indices specified in the application class descriptor are already present in database, constructing new indices and removing indices, which are no more used. Reformatting of table and adding/removing indices is only possible when there is no more than one application accessing database. So when first application is attached to database, it can perform table conversion. All other application can only add new classes to database, but not change existed ones.

There is one special preexisted table in database - Metatable, which contains information about other tables in database. C++ programmer need not to access this table, because format of database tables is specified by C++ classes. But in interactive SQL program it is possible to examine this table to get information about record fields.

Starting from version 2.45 GigaBASE supports autoincrement fields (fields unique value to which are assigned automaticaly by database). To be able to use them you should:

  1. Recompile GigaBASE and your application with -DAUTOINCREMENT_SUPPROT flags (add this flag to DEFS variables in GigaBASE makefile).
    Attention: database files created by GigaBASE compiled without this option will be incompatible with GigaBASE compiled with DAUTOINCREMENT_SUPPORT.
  2. If you want to use other than 0 initial counter value, you should asssign value to dbTableDescriptor::initialAutoincrementCount. It will be shared between all tables, so all table will have the same initial value of autoincrement counter.
  3. Autoincrement fields should be of int4 type and should be declared with AUTOINCREMENT flag:
            class Record {
                 int4 rid;
                 char const* name;
                 ...
           
                 TYPE_DESCRIPTOR((KEY(rid, AUTOINCREMENT|INDEXED), FIELD(name), ...));
           }
    
  4. When record with autoincrement field is inserted in the database there is no need to specify value of autoincremented field (it will be ignored). After successful insertion of record this field will be assigned unique value (which is guaranteed to be not used before this table):
           Record rec;
           // no rec.rid should be specified
           rec.name = "John Smith";
           insert(rec);
           // rec.rid now assigned unique value
           int newRecordId  = rec.rid; // and can be used to reference this record
    
  5. When record is removed the value will not be reused. When transaction is aborted, table autoincrement counter is also rolled back.

Query

Class query is used for two purposes:
  1. construct query and bind query parameters
  2. cache compiled queries
GigaBASE provides overloaded = and , C++ operators to construct query statement with parameters. Parameters can be specified directly in places where they are used, eliminating any mapping between parameters placeholders and C variables. In the following example of query pointers to the parameters price and quantity are stored in the query, so that query can be executed several times with different values of parameters. C++ overloaded functions make it possible to automatically determine type of parameter, requiring no extra information to be supplied by programmer (so programmer has no possibility to make a bug).
        dbQuery q;
        int price, quantity;
        q = "price >=",price,"or quantity >=",quantity;
As far as char* type can be used either for specifying part of query (such as "price >=") either for parameter of string type, GigaBASE uses special rule to resolve this ambiguity. This rule is based on the assumption that there is no reason for splitting query text in two strings like ("price ",">=") or specifying more than one parameter sequentially ("color=",color,color). So GigaBASE assumes first string to be part of the query text and switches to operand mode after it. In operand mode GigaBASE treats char* argument as query parameter and switches back to query text mode, and so on... It is also possible not to use this "syntax sugar" and construct query elements explicitly by dbQuery::append(dbQueryElement::ElementType type, void const* ptr) method. Before appending elements to the query, it is necessary to reset query by dbQuery::reset() method (operator = do it automatically).

It is not to possible use C++ numeric constants as query parameters, because parameters are accessed by reference. But it is possible to use string constants, because strings are passed by value. There two possible ways of specifying string parameters in query: using string buffer or pointer to pointer to string:

     dbQuery q;
     char* type;
     char name[256];
     q = "name=",name,"and type=",&type;

     scanf("%s", name);
     type = "A";
     cursor.select(q);
     ...
     scanf("%s", name);
     type = "B";
     cursor.select(q);
     ...

Query variable can not be passed to a function as parameter or be assigned to other variable. When GigaBASE compiles the query, it saves compiled tree in this object. Next time the query will be used, no compilation is need and ready compiled tree can be used. It saves some time needed for query compilation.

GigaBASE provides two approaches of integration user-defined types in database. First - definition of class methods - was already mentioned. Another approach deals only with query construction. Programmer should define methods, which will not do actual calculations, but instead of this returns expression in terms of predefine database types, which performs necessary calculation. It is better to describe it by example. GigaBASE has no builtin datetime type. Instead of this normal C++ class dbDateTime can be used by programmer. This class defines methods allowing to compare two dates using normal relational operators and specify datetime field in order list:

class dbDateTime {
    int4 stamp;
  public:
    ...
    dbQueryExpression operator == (char const* field) {
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"=",stamp;
	return expr;
    }
    dbQueryExpression operator != (char const* field) {
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<>",stamp;
	return expr;
    }
    dbQueryExpression operator < (char const* field) {
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),">",stamp;
	return expr;
    }
    dbQueryExpression operator <= (char const* field) {
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),">=",stamp;
	return expr;
    }
    dbQueryExpression operator > (char const* field) {
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<",stamp;
	return expr;
    }
    dbQueryExpression operator >= (char const* field) {
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<=",stamp;
	return expr;
    }
    friend dbQueryExpression between(char const* field, dbDateTime& from,
				     dbDateTime& till)
    {
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp"),"between",from.stamp,"and",till.stamp;
	return expr;
    }

    static dbQueryExpression ascent(char const* field) {
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp");
	return expr;
    }
    static dbQueryExpression descent(char const* field) {
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp"),"desc";
	return expr;
    }
};
All these method receives as their parameter name of the field in the record. This name is used to contract full name of the records components. It can be done by class dbComponent, which constructor takes name the the structure field and name of the component of the structure and returns compound name separated by '.' symbol. Class dbQueryExpression is used to collect expression items. Expression is automatically enclosed in parentheses, eliminating conflicts with operators precedence.

So, assuming record contains field delivery of dbDateTime type it is possible to construct queries like this:

        dbDateTime from, till;
        q1 = between("delivery", from, till),
	     "order by",dbDateTime::ascent("delivery");
        q2 = till >= "delivery";
Except these methods, some class specific method can be also defined in such way, for example method overlaps for region type. The benefit of this approach is that database engine will work with predefined types and is able to apply indices and other optimizations to proceed such query. And from the other side, encapsulation of class implementation is preserved, so programmer should not rewrite all queries when class representation is changed.

Variables of following C++ types can be used as query parameters:

int1bool
int2char const*
int4char **
int8char const**
real4dbReference<T>
real8dbArray< dbReference<T> >

Cursor

Cursors are used to access records returned by select statement. GigaBASE provides typed cursors, i.e. cursors associated with concrete tables. There are two kinds of cursors in GigaBASE: readonly cursors and cursors for update. Cursors in GigaBASE are represented by C++ template class dbCursor<T>, where T is name of C++ classes associated with database table. Cursor type should be specified in constructor of the cursor. By default read-only cursor is created. To create cursor for update, you should pass parameter dbCursorForUpdate to the constructor.

Query is executed by cursor select(dbQuery& q) or select() methods. Last method can be used to iterate through all records in the table. It is also possible to specify cursor type in select statement: dbCursorForUpdate or dbCursorViewOnly. Select methods return number of selected records and set current position to the first record (if available). Cursors can be scrolled in forward or backward directions. Methods next(), prev(), first(), last() can be used to change current position of the cursor. If operation can not be performed (no more records available), these methods return NULL and cursor position is not changed. Method skip(int n) moves cursor n positions forward if n is greater than zero, or -n positions backward if n is less than zero.

Cursor for class T contains instance of class T, used for fetching current record. That is why table classes should have default constructor (constructor without parameters), which has no side effects. GigaBASE optimizes fetching records from database, copying only data from fixed part of the object. String bodies are not copied, instead of this correspondent field points directly in database. The same is true for arrays, which components has the same representation in database as in application (arrays of scalar types or arrays of nested structures of scalar components).

Application should not change elements of strings and arrays in database directly. When array method need to update array body, it create in-memory copy of the array and updates this copy. If programmer wants to update string field, it should assign to the pointer new value, but don't change string directly in database. It is recommended to use char const* type instead of char* for string components, to make it possible to compiler to detect illegal usage of strings.

Cursor class provides get() method for obtaining pointer to the current record (stored inside cursor). Also overloaded operator-> can be used to access components of current record. If cursor is opened for update, current record can be changed and stored in database by update() method or can be removed. If current record is removed, next record becomes current. If there is no next record, then previous record becomes current (if exists). Method removeAll() removes all records in the table and method removeAllSelected - all records selected by the cursor.

When records are updated, database size can be increased and extension of database section in virtual memory is needed. As a result of such remapping, base address of the section can be changed and all pointers to database fields kept by application will become invalid. GigaBASE automatically updates current records in all opened cursors when database section is remapped. So, when database is updated, programmer should access record fields only through the cursors -> method and do not use pointer variables.

Memory used for the current selection can be released by reset() method. This method is automatically called by select(), dbDatabase::commit(), dbDatabase::rollback() methods and cursor destructor, so in most cases there is no need to call reset() method explicitly.

Cursors can be also used to access records by reference. Method at(dbReference<T> const& ref) set cursor to the record pointed by the reference. In this case selection consists exactly of one record and next(), prev() methods will always return NULL. As far as cursors and references in GigaBASE are strictly typed, all necessary checking can be done statically by compiler and no dynamic type checking is needed. The only kind of checking, which is done at runtime, is checking for null reference. Object identifier of current record in the cursor can be obtained by currentId() method.

It is possible to restrict number of records returned by select statement. Cursor has two methods setSelectionLimit(size_t lim) and unsetSelectionLimit(), which can be used to set/unset limitation on number of records returned by query. In some situations programmer wants to receive only one record or only few first records, so query execution time and size of consumed memory can be reduced by limiting size of selection. But if you specify order for selected records, query with restriction for k records will no return first k records with the smallest value of the key. Instead of this arbitrary k records will be taken and then sorted.

So all operations with database data are performed by means of cursors. The only exception is insert operation. GigaBASE provides overloaded insert function:

        template<class T>
        dbReference<T> insert(T const& record);
This function will insert record at the end of the table and return reference of the created object. Order of insertion is strictly specified in GigaBASE and applications can use this assumption about records order in the table. For applications widely using references for navigation between objects it is necessary to have some root object, from which traversal by references can be made. Good candidate for such root object is first record in the table (it is also the oldest record in the table). This record can be accessed by execution select() method without parameter. The current record in the cursor will be the first record in the table.

Starting from version 2.72 GigaBASE provides patch inserts. Batch insert dramatically increase speed of database initialization, for example without batch inserts, inserting first million of records with 2 keys takes about 460 seconds, inserting second million of records - 18000 seconds (5 hours) !!! With batch inserts, inserting first million of records takes 95 seconds, and second million - 97 seconds! Such effect is achieved by delaying reconstruction of indices for inserted records until the transaction commit or explicit call of dbDatabase::executeBatch() method. When batch is executed, all previously inserted records are sorted by index field and than in this order added to the index. Why it allows to significantly increase performance? When database is large (doesn't fit in page pool) and inserted data contains pseudo-random values, then any insert of record with indexed fields requires at least one read of B-Tree page from the disk. As far as average disk access time is about 10ms, we could not insert more than 100 records per second. But if we first sort inserted data, then most likely the same pages of B-Tree will be used for several subsequent records and number of disk reads is significantly reduced.

Batch inserts are performed by dbDatabase::batchInsert template method which has the same parameter is insert method. Inserted records will be indexed either when transaction is committed, either by explicit execution of dbDatabase::executeBatch() method.

GigaBASE C++ API defines special null variable of reference type. It is possible to compare null variable with references or assign it to the reference:

        void update(dbReference<Contract> c) {
            if (c != null) {
	        dbCursor<Contract> contract(dbCursorForUpdate);
		contract.at(c);
		contract->supplier = null;
            }
        }

Query parameters usually are bound to C++ variables. In most cases in is convenient and flexible mechanism. But in multithreaded application, there is no warranty that the same query will not be executed at the same moment of time by another thread with different values of parameters. One solution is to use synchronization primitives (critical sections or mutexes) to prevent concurrent execution of the query. But this will lead to performance degradation. GigaBASE is able to perform read requests in parallel, increasing total system throughput. The other solution is to use delayed parameter binding. This approach is illustrated by the following example:

dbQuery q;

struct QueryParams { 
    int         salary;
    int         age;
    int         rank;
};

void open()
{
    QueryParams* params = (QueryParams*)NULL;
    q = "salary > ", params->salary, "and age < ", params->age, "and rank =", params->rank;
}

void find(int salary, int age, int rank) 
{ 
    QueryParams params;
    params.salary = salary;
    params.age = age;
    params.rank = rank;
    dbCursor<Person> cusor;
    if (cursor.select(q, ¶ms) > 0) { 
        do { 
	    cout << cursor->name << NL;
        } while (cursor.next());
    }
}
So in this example function open binds query parameters just to offsets of fields in structure. Later in find functions, actual pointer to the structure with parameters is passed to the select structure. Function find can be concurrently executed by several threads and only one compiled version of the query is used by all these threads. This mechanism is available since version 2.37.

Database

Class dbDatabase controls interaction of application with database. It performs synchronization of concurrent accesses to the database, transaction management, memory allocation, error handling,...

Constructor of dbDatabase objects allows programmer to specify some database parameters:

    dbDatabase(dbAccessType type = dbAllAccess,
	       size_t poolSize = 0, // autodetect size of available memory
	       size_t dbExtensionQuantum = dbDefaultExtensionQuantum,
	       size_t dbInitIndexSize = dbDefaultInitIndexSize,
	       int nThreads = 1);
Database can be opened in readonly mode (dbDatabase::dbReadOnly access type) or in normal mode allowing modification of database (dbDatabase::dbAllAccess). When database is opened in readonly mode, no new class definitions can be added to database and also definition of existed class and indices can not be altered.

Parameter poolSize specifies number of pages in page pool used to optimize file IO. GigaBASE is using 8Kb pages. Size of pool should not be larger than amount of physical memory at the computer and moreover some amount of memory should be reserved for operating system and other application data structures. When default value 0 of this parameter is used, GigaBASE will automatically select page pool size using information about available physical memory in the system. Current algorithm of page pool size calculation is the following (it is the subject for change in future): GigaBASE uses maximal number which is power of two and less than amount of physical memory in the system unless difference of total amount of available physical memory and this number is greater than some unused memory threshold (currently 64Mb). If the difference is greater than threshold value, then size of pool is taken as size of available physical memory minus threshold.

Parameter dbExtensionQuantum specifies quantum of extension of memory allocation bitmap. Briefly speaking, value of this parameters specifies how much memory will be allocated sequentially without attempt to reuse space of deallocated objects. Default value of this parameter is 16 Mb. See section Memory allocation for more details.

Parameter dbInitIndexSize specifies initial index size. All objects in GigaBASE are accessed through object index. There are two copies of object index: current and committed. Object indices are reallocated on demand and setting initial index size can only reduce (or increase) number of reallocations. Default value of this parameter is 64K object identifiers.

And the last parameter nThreads controls level of query parallelization. If it is greater than 1, then GigaBASE can start parallel execution of some queries (including sorting of result). Specified number of parallel threads will be spawned by GigaBASE engine in this case. Usually there is no sense to specify the value of this parameter greater than number of online CPUs in the system. It is also possible to pass zero as value of the parameter, in this case GigaBASE will automatically detect number of online CPUs in the system. Number of threads can be also set by dbDatabase::setConcurrency method at any moment of time.

Class dbDatabase contains static field dbParallelScanThreshold, which specifies threshold for number of records records in the table after which query parallelization is used. Default value of this parameter is 1000.

Database can be opened by open(char const* fileName = NULL) method. Unlike FastDB, GigaBASE is not needed in database name and only file name should be specified. No any suffixes are implicitly appended to database file name. This is the only difference in interfaces to FastDB and GigaBASE.

As far as some operating systems have limitations for maximal file size, GigaBASE provides way to split one logical data file into several physical segments (operating systems files). Segments can be located at different partitions and file systems. Such file is called in GigaBASE multifile. To create multifile you should specify @ symbol before database file name. In this case GigaBASE will treat this name as name of the file with multifile segments description. Each line of this file (except last) should contain name of the operating system file (or raw partition) corresponds to the multifile segment and size (in 8Kb pages) of the segment. Only the last segment of the multifile can by dynamically extended when database is grown. That is why it is not necessary to specify size of the last segment, so last line should contain only the name of the file.

It is possible to specify offset from the beginning of the file (it can be useful for raw partitions). Offset should be specified as the suffix of the file name in [] brackets without any spaces between, for example: "/dev/hda0[512]". Unlike size of segment, offset is specified in bytes, not in pages.

Below is the example of multifile description file consisting of two segments represented by physical disk partitions, first of which has size 4Gb:

/dev/hdb1 524288
/dev/hdc1

It is also possible to increase performance by balancing disk load. The idea is the same as with multifile - split the database file into several parts, placed at different disk controllers. But unlike multifile approach, blocks are cyclically distributed among disks - first block at first disk, second block at second disk, ... n-th block at n-disk, n+1 block at 1 disk,... The size of block is currently set to one megabyte (it can be changed with

    .RaidBlockSize BLOCK_SIZE
line in description file. The partitions of raid file are specified in the same way as for multifile - one partition name per line, but no size of segment should be specified (so if no segment size was specified, then GigaBASE will create RAID file, otherwise - multifile). Offset within partition can be specified in the same way as for multifile

Method open returns true if database was successfully opened or false if open operation failed. In last case database handleError method is called with DatabaseOpenError error code. Database session can be terminated by close method, which implicitly commits current transaction.

In multithreaded application each thread, which wants to access database, should first be attached to it. Method dbDatabase::attach() allocates thread specific data and attaches thread to the database. This method is automatically called by open() method, so there is no reason to call attach() method for the thread opening database. When thread finishes work with database, it should call dbDatabase::detach() method. Method close automatically invokes detach() method. Method detach() implicitly commits current transaction. Attempt to access database by detached thread causes assertion failure.

GigaBASE is able to perform compilation and execution of queries in parallel, providing significant increase of performance in multiprocessor systems. But concurrent updates of database are not possible (this is a price for efficient log-less transaction mechanism and zero time recovery). When application wants to modify database (open cursor for update or insert new record in the table), it first locks database in exclusive mode, prohibiting accesses to database by other applications, even for read-only queries. So to avoid blocking of database application for a long time, modification transactions should be done as short as possible. No blocking operations (like waiting input from the user) should be done within transaction.

Using only shared and exclusive locks on database level, allows GigaBASE to almost eliminate overhead of locking and optimize speed of execution of non-conflicting operations. But if many applications simultaneously updates different parts of database, then approach used in GigaBASE will be very inefficient. That is why GigaBASE is most suitable for single-application database access model or for multiple applications with read-dominated access pattern model.

Cursor object should be used only by one thread in a multithreaded application. If there are more than one threads in your applications, use local variables for cursors in each thread. It is possible to share query variables between threads, but take care about query parameters. The query should either has no parameters, or relative form of parameters binding should be used.

The dbDatabase object is shared between all threads and uses thread specific data to perform query compilation and execution in parallel with minimal synchronization overhead. There are few global things, which require synchronization: symbol table, pool of tree node,... But scanning, parsing and execution of query can be done without any synchronization, providing high level of concurrency at multiprocessor systems.

Database transaction is started by first select or insert operation. If cursor for update is used, then database is locked in exclusive mode, prohibiting access to the database by other applications and threads. If read-only cursor is used, then database is locked in shared mode preventing other application and threads from modifying database, but allowing concurrent read requests execution. Transaction should be explicitly terminated either by dbDatabase::commit() method, which fixes all changes done by transaction in database, or by dbDatabase::rollback() method which undo all modifications done by transaction. Method dbDatabase::close() automatically commits current transaction.

If several threads are concurrently updating database, it will be possible to increase total performance by using partial transaction commit. Method dbDatabase::precommit() doesn't flush any changes to the disk and switch object index. Instead of this it only release locks hold by transaction allowing other threads to proceed. All cursors opened by the thread are closed by dbDatabase::precommit() method. When the thread will access the database next time, it will have to obtain database locks once again. Using precommit method instead of commit eliminates disk operations and so dramatically increases performance. But it is necessary to remember that if application or system fault will take place after precommit method execution, all changes made by transaction will be lost.

If you start transaction by performing selection using read-only cursor and then use cursor for update to perform some modifications of database, database will be first locked in shared mode and then lock will be upgraded to exclusive. This can cause deadlock problem if database is simultaneously accessed by several applications. Imagine that application A starts read transaction and application B also starts read transaction. Both of them hold shared locks on the database. If both of them wants to upgrade their locks to exclusive, they will forever block each other (exclusive lock can not be granted until shared lock of other process exists). To avoid such situation try to use cursor for update at the beginning of transaction or explicitly use dbdatabase::lock() method. More information about implementation of transactions in GigaBASE can be found in section Transactions.

It is possible to explicitly lock database by lock() method. Locking is usually done automatically and there are few cases when you will want to use this method. It will lock database in exclusive mode until the end of current transaction.

Backup of database can be done by the following method:

        bool dbDatabase::backup(char const* backupFileName);
Backup locks database in shared mode and flush image of database in main memory to specified file. Because of using of shadow object index, database file is always in consistent state, so recovery from the backup can be performed just by renaming backup file (if backup was performed on tape, it should be first restored to the disk). If multifile was used as database storage, then simple renaming or copying of backup file is not possible. GigaBASE provides restore method:
        bool dbDatabase::restore(char const* backupFileName,
                                 const* databaseFileName);
This method should be called before opening database, restore of online database is not possible. If databaseFileName contains @ in first position, the rest of the name is treated as the name of the file with multifile segments description (the same as used by dbDatabase::open method). Database can be also restored using restore code of subsql utility.

Class dbDatabase is also responsible for handling various application errors, such as syntax errors in query compilation, out of range index or null reference access during query execution. There is virtual method dbDatabase::handleError, which handles these errors:

        virtual void handleError(dbErrorClass error,
                                 char const*  msg = NULL,
                                 int          arg = 0);
Programmer can derive his own subclass from dbDatabase class and redefine default reaction on errors.

Error classes and default handling
ClassDescriptionArgumentDefault reaction
QueryErrorquery compilation errorposition in query stringabort compilation
ArithmeticErrorarithmetic error during division or power operations-terminate application
IndexOutOfRangeErrorindex is out if array boundsvalue of indexterminate application
DatabaseOpenErrorerror while database opening-open method will return false
FileErrorfailure of file IO operationerror codeterminate application
OutOfMemoryErrornot enough memory for object allocationrequested allocation sizeterminate application
Deadlockupgrading lock cause deadlock-terminate application
NullReferenceErrornull reference is accessed during query execution-terminate application

Call level interface

Interface described in previous section provides convenient and reliable mechanism for accessing data from C++. It has two drawbacks:
  1. It is very C++ specific and can not be used with other programming languages
  2. It is suitable only for local connections to the database (within one system).
Interface described below outcomes these two restrictions. It consists of the set of pure ANSI C functions and using it mapping of any programming language to the GigaBASE database can be easily implemented. Connection between client and serves is performed by sockets (either local, either standard TCP/IP sockets). Certainly this interface is less convenient and more error prone than C++ interface, but this is a cost of its flexibility. All types, constants and functions are declared in cli.h file.

GigaBASE provides multithreaded server for handling client CLI sessions. This server can be started from SubSQL utility by start server 'HOST:PORT' <number-of-threads> command. This server will accept local (within one system) and global clients connections and attach one thread from the threads pool to each connection. The size of thread's pool is controlled by number-of-threads parameters. But the server can spawn more than specified number of threads if there are many active connections. A thread is attached to the client until the end of the session. If session is abnormally terminated, all changes made by the client are rollbacked. The server can be stopped by correspondent stop server 'HOST:PORT' command.

Gigabase CLI API also optionally supports authentication. To enable it, you should recompile GigaBASE with SECURE_SERVER macro defined. When this macro is defined, the table UserInfo is created at server. cli_open method takes user_name and password parameters which are passed to the server. When connection is established, server searches in UserInfo table for the user with specified name and password. If the user is not found or password doesn't match, the server returns cli_login_failed code to the client. To enter new user or remove old one, database administrator can use SubSQL insert and remove commands. As far as update command is not currently implemented, you will have to perform remove and insert if you what to change users password. Do not forget to commit your changes, otherwise all clients will be blocked.

CLI functions return codes
Error codeDescription
cli_okSuccessful completion
cli_bad_addressInvalid format of server URL
cli_connection_refusedConnection with server could not be established
cli_bad_statementText of SQL statement is not correct
cli_parameter_not_foundParameter was not found in statement
cli_unbound_parameterParameter was not specified
cli_column_not_foundNo such column in the table
cli_incompatible_typeConversion between application and database type is not possible
cli_network_errorConnection with server is broken
cli_runtime_errorError during query execution
cli_bad_descriptorInvalid statement/session description
cli_unsupported_typeUnsupported type for parameter or column
cli_not_foundRecord was not found
cli_not_update_modeAttempt to update records selected by view only cursor
cli_table_not_foundThere is no table with specified name in the database
cli_not_all_columns_specifiedInsert statement doesn't specify values for all table columns
cli_not_fetchedcli_fetch method was not called
cli_already_updatedcli_update method was invoked more than once for the same record
cli_login_failedLogin to the server is failed
cli_empty_parameterParameter is not assigned a value
cli_closed_connectionAccess to the closed connection
cli_table_already_existsAttempt to create existed table
cli_not_implementedFunction is not implemented

Supported types
TypeDescriptionSize
cli_oidObject identifier4
cli_boolBoolean type1
cli_int1Tiny integer type1
cli_int2Small integer type2
cli_int4Integer type4
cli_int8Big integer type8
cli_real4Single precision floating point type4
cli_real8Double precision floating point type8
cli_decimalDecimal numeric type (number is represented as zero terminated ASCII string)1*N
cli_asciizZero terminated string of bytessizeof(char_t)*N
cli_pasciizPointer to zero terminated string (pointer should be initialized by application)sizeof(char_t)*N
cli_cstringString with counter (see declaration of cli_ctring_t in cli.h). The terminated null character is not stored. When cli_cstring is used as fetched column type, the pointer is set to the internal buffer, so no memory should be allocated and deallocated to hold value.8
cli_array_of_oidArray of references4*N
cli_array_of_boolArray of booleans1*N
cli_array_of_int1Array of tiny integers1*N
cli_array_of_int2Array of small integers2*N
cli_array_of_int4Array of integers4*N
cli_array_of_int8Array of big integers8*N
cli_array_of_real4Array of reals4*N
cli_array_of_real8Array of long reals8*N
cli_anycan be used only for binding columns, server will use the same type for column as stored in the database?
cli_datetimetime in seconds since 00:00:00 UTC, January 1, 1970.4
cli_autoincrementcolumn automatically assigned value during record insert4
cli_rectanglerectangle (by default R2 rectangle with int4 coordinatesdimension*2*sizeof(cli_coord_t)


int cli_open(char const*   server_url,
	     int           max_connect_attempts,
	     int           reconnect_timeout_sec,
	     char_t const* user_name,
	     char_t const* password,
	     int           pooled_connection);
Establish connection with the server
Parameters
server_url - zero terminated string with server address and port, for example "localhost:5101" or "195.239.208.240:6100". It can contain multiple addresses, in this case replication socket will be created. Data written to such socket will be broadcasted to all specified servers. And read from this socket cause receiving response from each server and checking that them are equal. If connection with one of the server is broken or response received from this server is not equal to other responses, then this server is excluded from list of available servers. Client can continue work with database until at least one server is available and it is possible to choose correct response (quorum is achieved).
max_connect_attempts - number of attempts to establish connection
reconnect_timeout_sec - timeout in seconds between connection attempts
reconnect_timeout_sec - timeout in seconds between connection attempts
user_name - user name for login
password - password for login
pooled_connection - if not 0, then connection will be allocated from the connection pool
Returns
>= 0 - connection descriptor to be used in all other cli calls
< 0 - error code as described in cli_result_code enum

int cli_close(int session);
Close session. If session was opened with pooled connections enabled, it is not actually closed, but placed in the pool of connection and can be reused ni future.
Parameters
session - session descriptor returned by cli_open
Returns
result code as described in cli_result_code enum

void cli_clear_connection_pool();
Close all released connection in connection pool.

int cli_statement(int session, char const* stmt);
Specify SubSQL statement to be executed at server. Binding to the parameters and columns can be established.
Parameters
session - session descriptor returned by cli_open
stmt - zero terminated string with SubSQL statement
Returns
>= 0 - statement descriptor
< 0 - error code as described in cli_result_code enum

int cli_parameter(int         statement,
		  char const* param_name,
		  int         var_type,
		  void*       var_ptr);
Bind parameter to the statement
Parameters
statement - statement descriptor returned by cli_statement
param_name - zero terminated string with parameter name. Parameter name should start with '%'
var_type - type of variable as described in cli_var_type enum. Only scalar and zero terminated string types are supported.
var_ptr - pointer to the variable
Returns
result code as described in cli_result_code enum

int cli_column(int         statement,
	       char const* column_name,
	       int         var_type,
	       int*        var_len,
	       void*       var_ptr);
Bind extracted column of select or insert statement
Parameters
statement - statement descriptor returned by cli_statement
column_name - zero terminated string with column name
var_type - type of variable as described in cli_var_type enum
var_len - pointer to the variable to hold length of array variable. This variable should be assigned the maximal length of the array/string buffer, pointed by var_ptr. After the execution of the statement it is assigned the real length of the fetched array/string. If it is large than length of the buffer, then only part of the array will be placed in the buffer, but var_len still will contain the actual array length.
var_ptr - pointer to the variable
Returns
result code as described in cli_result_code enum

typedef void* (*cli_column_set)(int var_type, void* var_ptr, int len);
typedef void* (*cli_column_get)(int var_type, void* var_ptr, int* len);

int cli_array_column(int            statement,
		     char const*    column_name,
		     int            var_type,
		     void*          var_ptr,
		     cli_column_set set,
		     cli_column_get get);
Specify get/set functions for the array column
Parameters
statement - statement descriptor returned by cli_statement
column_name - zero terminated string with column name
var_type - type of variable as described in cli_var_type enum
var_ptr - pointer to the variable
set - function which will be called to construct fetched field. It receives pointer to the variable, length of the fetched array and returns pointer to the array's elements.
get - function which will be called to update the field in the database. Given pointer to the variable, it should return pointer to the array elements and store length of the array to the variable pointer by len parameter
Returns
result code as described in cli_result_code enum

typedef void* (*cli_column_set_ex)(int var_type, void* var_ptr, int len, 
				   char const* column_name, int statement, void const* data_ptr);
typedef void* (*cli_column_get_ex)(int var_type, void* var_ptr, int* len, 
				   char const* column_name, int statemen);

int cli_array_column(int               statement,
		     char const*       column_name, 
		     int               var_type,
		     void*             var_ptr,
		     cli_column_set_ex set,
		     cli_column_get_ex get);
Specify extended get/set functions for the array column
Parameters
statement - statememt descriptor returned by cli_statement
column_name - zero terminated string with column name
var_type - type of variable as described in cli_var_type enum
var_ptr - pointer to the variable
set - function which will be called to construct fetched field. It receives type of the vartiable, pointer to the variable, length of the fetched array, name of the fetched column, statement descriptor and pointer to the array data. If this method returns not NULL pointer, database will copy unpacked array to the returned location. Otherwise it is assumed that function handle data itself.
get - function which will be called to update the field in the database. Given type of the vartiable, pointer to the variable, column name and statment descriptor, it should return pointer to the array elements and store length of the array to the variable pointer by len parameter
Returns
result code as described in cli_result_code enum

enum {
    cli_view_only,
    cli_for_update
};

int cli_fetch(int statement, int for_update);
Execute select statement.
Parameters
statement - statement descriptor returned by cli_statement
for_update - not zero if fetched rows will be updated
Returns
>= 0 - success, for select statements number of fetched rows is returned
< 0 - error code as described in cli_result_code enum

int cli_insert(int statement, cli_oid_t* oid);
Execute insert statement.
Parameters
statement - statement descriptor returned by cli_statement
oid - object identifier of created record.
Returns
status code as described in cli_result_code enum

int cli_get_first(int statement);
Get first row of the selection.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_get_last(int statement);
Get last row of the selection.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_get_next(int statement);
Get next row of the selection. If get_next records is called exactly after cli_fetch function call, is will fetch the first record in selection.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_get_prev(int statement);
Get previous row of the selection. If get_next records is called exactly after cli_fetch function call, is will fetch the last record in selection.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

cli_oid_t cli_get_oid(int statement);
Get object identifier of the current record
Parameters
statement - statement descriptor returned by cli_statement
Returns
object identifier or 0 if no object is selected

int cli_update(int statement);
Update the current row in the selection. You have to set for_update parameter of cli_fetch to 1 in order to be able to perform updates. Updated value of row fields will be taken from bound column variables.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_remove(int statement);
Remove all selected records. You have to set for_update parameter of cli_fetch to 1 in order to be able to remove records.
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_free(int statement);
Deallocate statement and all associated data
Parameters
statement - statement descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_commit(int session);
Commit current database transaction
Parameters
session - session descriptor as returned by cli_open
Returns
result code as described in cli_result_code enum

int cli_abort(int session);
Abort current database transaction
Parameters
session - session descriptor as returned by cli_open
Returns
result code as described in cli_result_code enum

int cli_show_tables(int session, cli_table_descriptor** tables);
Return list of table presetn in the database.
 
        typedef struct cli_table_descriptor {
            char const*       name;
        } cli_table_descriptor;
Parameters
session - session descriptor as returned by cli_open
tables - address of the pointer to the array with table descriptors. cli_show_tables uses malloc to allocate this array, and it should be deallocated by application using free() function.
Returns
result code as described in cli_result_code enum

int cli_describe(int session, char const* table, cli_field_descriptor** fields);
Return definition of fields of specified table. Definition of field descriptor has the following format:
 
        typedef struct cli_field_descriptor {  
            enum cli_var_type type;
            int               flags;
            char_t const*     name;
            char_t const*     refTableName;
            char_t const*     inverseRefFieldName;
        } cli_field_descriptor;
Parameters
session - session descriptor as returned by cli_open
table - name of the table
fields - address of the pointer to the array with field descriptors. cli_describe uses malloc to allocate this array, and it should be deallocated by application using free() function.
Returns
result code as described in cli_result_code enum


int cli_create_table(int                   session, 
                     char_t const*         tableName, 
                     int                   nFields, 
		     cli_field_descriptor* fields);
Create new table
Parameters
session - session descriptor as returned by cli_open
tableName - name of the created table
nFields - number of columns in the table
fields - array with table columns descriptors. Descriptor is has the following structure:
    enum cli_field_flags { 
        cli_hashed           = 1, /* field should be indexed usnig hash table */
        cli_indexed          = 2, /* field should be indexed using B-Tree */
        cli_case_insensitive = 4  /* index is case insensitive */
    };

    typedef struct cli_field_descriptor { 
        enum cli_var_type type;
        int               flags;
        char_t const*     name;
        char_t const*     refTableName;
        char_t const*     inverseRefFieldName;
    } cli_field_descriptor;
Returns
result code as described in cli_result_code enum

int cli_drop_table(int                   session, 
	           char_t const*         tableName); 
Drop table
Parameters
session - session descriptor as returned by cli_open
tableName - name of the created table
Returns
result code as described in cli_result_code enum

int cli_alter_index(int           session, 
	            char_t const* tableName 
		    char_t const* fieldName, 
		    int           newFlags); 
Add or remove column index
Parameters
session - session descriptor as returned by cli_open
tableName - name of the created table
fieldName - name of the field
newFlags - new flags of the field, if index exists for this field, but is not specified in newFlags mask, then it will be removed; if index not exists, but is specified in newFlags mask, then it will be created.
Returns
result code as described in cli_result_code enum

int cli_freeze(int statement);
Freeze cursor. Make it possible to reused cursor after commit of the current transaction.
Parameters
statement - statememt descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_unfreeze(int statement);
Unfreeze cursor. Reuse previously frozen cursor.
Parameters
statement - statememt descriptor returned by cli_statement
Returns
result code as described in cli_result_code enum

int cli_seek(int statement, cli_oid_t oid);
Position cursor to the record with specified OID
Parameters
statement - statememt descriptor returned by cli_statement
oid - object identifier of the record to which cursor should be positioned
Returns
>= 0 - success, position of the record in the selection
< 0 - error code as described in cli_result_code enum

int cli_skip(int statement, int n);
Skip specified number of rows.
Parameters
statement - statememt descriptor returned by cli_statement
n - number of objects to be skipped
Returns
result code as described in cli_result_code enum

Local implementation of CLI

Starting from version 2.65 GigaBASE provides local implementation of CLI interface. It means that now it is possible to access database directly from C application using CLI functions without starting separate server and socket communication overhead. Local implementation of CLI functions are included in main gigabase library. So if you want to use remote CLI, link you application with cli.lib and if you want to access database locally - link it with gigabase.lib. To create local session you should use cli_create function instead of cli_open. Calling cli_create when your application is linked with cli.lib or cli_open when it is linked with gigabase.lib cause cli_bad_address error.


int cli_create(char_t const* databasePath, 
               unsigned      transactionCommitDelay, 
	       int           openAttr, 
	       size_t        poolSize);
Create connection to the local database
Parameters
databasePath - path to the database file
transactionCommitDelay - transaction commit delay (specify 0 to disable)
openAttr - ask of cli_open_attributes. You can specify combination of the following attributes:
  • cli_open_default
  • cli_open_readonly
  • cli_open_truncate
  • cli_open_no_buffering
poolSize - size of page pool (in pages), specify 0 to let GigaBASE automatically detect pool size
Returns
>= 0 - connection descriptor to be used in all other cli calls
< 0 - error code as described in cli_result_code enum

int cli_attach(int session);
Attach thread to the database. Each thread except one opened the database should first attach to the database before any access to the database, and detach after end of the work with database
Parameters
session - session descriptor as returned by cli_open
Returns
result code as described in cli_result_code enum

int cli_detach(int session, int detach_mode);
Attach thread to the database. Each thread except one opened the database should first attach to the database before any access to the database, and detach after end of the work with database
Parameters
session - session descriptor as returned by cli_open
detach_mode - bit mask representing detach mode
    enum cli_detach_mode {
        cli_commit_on_detach          = 1,
        cli_destroy_context_on_detach = 2
    };
Returns
result code as described in cli_result_code enum

int cli_prepare_query(int session, char_t const* query);
Prepare SubSQL query statement.
Parameters
session - session descriptor as returned by cli_open
query - query string with optional parameters. Parameters are specified as '%T' where T is one or two character code of parameter type using the same notation as in printf:
%d or %icli_int4_t
%fcli_int8_t
%Li, %li, %ld or %Ld cli_int8_t
%pcli_oid_t
%schar_t*
Returns
>= 0 - statement descriptor
< 0 - error code as described in cli_result_code enum

int cli_execute_query(int statement, int for_update, void* record_struct, ...)
Execute query previously prepared by cli_prepare_query.
It is assumed that format of the destination C structure matches format of the target database table. For scalar and reference types mapping is obvious: you should use correspondent cli_ types in declaring structure and table fields. For array types, you should use cli_array_t structure. Strings should be represented as char* and programmer should not try to deallocate them or copy this pointer and access it outside context of the current record.
Parameters
statement - statement descriptor returned by cli_prepare_query
for_update - not zero if fetched rows will be updated
record_struct - structure to receive selected record fields.
... - varying list of query parameters
Returns
result code as described in cli_result_code enum

int cli_insert_struct(int session, char_t const* table_name, void* record_struct, cli_oid_t* oid);
Insert new record represented as C structure.
It is assumed that format of the destination C structure matches format of the target database table. For scalar and reference types mapping is obvious: you should use correspondent cli_ types in declaring structure and table fields. For array types, you should use cli_array_t structure. Strings should be represented as char_t*.
Parameters
session - session descriptor as returned by cli_open
table_name - name of the destination table
record_struct - structure specifying value of record fields
oid - pointer to the location to receive OID of created record (may be NULL)
Returns
result code as described in cli_result_code enum

Interface to PHP language

PHP is a server-side, cross-platform, HTML embedded scripting language. Visit php.net site for more information about this language. GigaBASE interface to PHP is implemented in PHP itself and use sockets to connect to the GigaBASE server. It uses the same protocol as C CLI. Interface consists of two main classes: gb_connection and gb_statement. Methods of these classes are very similar with functions of CLI C protocol.

Status codes

CLI functions return codes
Error codeDescription
cli_okSuccessful completion
cli_bad_addressInvalid format of server URL
cli_connection_refusedConnection with server could not be established
cli_bad_statementText of SQL statement is not correct
cli_parameter_not_foundParameter was not found in statement
cli_unbound_parameterParameter was not specified
cli_column_not_foundNo such column in the table
cli_incompatible_typeConversion between application and database type is not possible
cli_network_errorConnection with server is broken
cli_runtime_errorError during query execution
cli_close_statementAccess to the closed statement object
cli_unsupported_typeUnsupported type for parameter or column
cli_not_foundRecord was not found
cli_not_update_modeAttempt to update records selected by view only cursor
cli_table_not_foundThere is no table with specified name in the database
cli_not_all_columns_specifiedInsert statement doesn't specify values for all table columns
cli_not_fetchedcli_fetch method was not called
cli_already_updatedcli_update method was invoked more than once for the same record
cli_empty_parameterAttempt to bind parameters which was not defined (assigned value or set type)
cli_closed_connectionAccess to the closed connection object
cli_table_already_existsAttempt to create table when table with such name already exists
cli_not_implementedThis function is not implemented by CLI API

Supported types

GigaBASE PHP interface supports the following PHP types for query parameters and table columns:
  1. Integer
  2. Real
  3. String
  4. Reference (represented by gb_reference object).

Class gb_connection_pool

This class can be used for connection pooling. Some application (for example Web server scripts) has to open database connections multiple times. To eliminate overhead of establishing connection each time client request is proceeded, connection pool can be used. In this case connection is not actually closed, but instead of it just placed in the connection pool. Next time the connection with this server with the same user name and password is established, connection is just extracted from the pool.


function new_connection($host_address, $host_port, $user_name = "guest", $password = "");
Reuse existed pooled connection or make new one.
Parameters
host_address - string with server host name
host_port - integer number with server port
user_name - user name to login
password - password to login
Returns
gb_connection

function release_connection($conxn);
Place connection in the connection pool to make it available for future reuse. This method implicitly commits the last transaction performed by the specified connection.
Parameters
conxn - connection been placed in the pool.

function close();
Physically close all connections in the connection pool.

Class gb_connection

This class is responsible for establishing connection with the database and session specific operations.
function open($host_address, $host_port, $user_name = "guest", $password = "");
Establish connection with the server
Parameters
host_address - string with server host name
host_port - integer number with server port
user_name - user name to login
password - password to login
Returns
status code

function close();
Close connection.
Returns
status code

function create_statement($stmt);
Create statement to be executed at server. Binding to the parameters and columns can be established.
Parameters
stmt - string with SubSQL statement. Parameter names should start with '%' character.
Returns
gb_statement object or null if specified statement is invalid

function insert($obj, $oid);
Insert object in the table with the same name as the object class.
Parameters
obj - object to be stored in the database. Name of the object class should match with the name of some of database tables.
oid - reference to the variable to receive OID of created object.
Returns
status code

function commit();
Commit current database transaction
Returns
status code

 
function rollback();
Rollback transaction
Returns
status code

function show_tables($tables) 
Returns information about tables present in the database
Parameters
table - reference of the variable to receive array with table names
Returns
status code

function describe_table($name, $table) 
Returns information about table column names and types
Parameters
name - name of the table
table - reference of the variable to receive associative array <field-name,field-type>
Returns
status code

Class gb_statement

This class represents prepared statement. Statement can be used several time with different parameters values. It should be explicitly deallocated by free method in order to release resources at server. As far as PHP provides no finalize mechanism, GigaBASE PHP API is not able to do it automatically. Forgetting to free statements can cause memory exhaustion at server.

bind_parameter function bind_parameter($name, $binding);

Bind parameter to the statement
Parameters
name - string with name of parameter. Parameter name should start with '%'.
binding - reference to the parameter variable.
Returns
status code

function bind_column($name, $binding);
Bind column of select or insert statement
Parameters
name - string with name of the column.
binding - reference to the column variable.
Returns
status code

function bind_object($name, $obj);
Bind variable to receive fetched object or to specify inserted object
Parameters
binding - reference to the object variable.
Returns
status code

function bind_array($name, $arr);
Bind variable for fetching or inserting record as associative array of <field, value>
Parameters
binding - reference to the array variable.
Returns
status code

function fetch($for_update = false);
Execute select statement.
Parameters
for_update - true if fetched rows will be updated
Returns
status code

function fetch_objects($arr);
Return array with all fetched objects.
Parameters
arr - reference to the variable to receive array with all selected objects
Returns
status code

function fetch_tuples($arr);
Return array with all fetched tuples. Tuple is returned as associative array of <field, value> pairs.
Parameters
arr - reference to the variable to receive array with all selected tuples
Returns
status code

function insert($oid);
Execute insert statement.
Parameters
oid - reference to variable to receive OID of created object
Returns
status code

function get_first();
Get first row of the selection.
Returns
status code

function get_last();
Get last row of the selection.
Returns
status code

function get_next();
Get next row of the selection. If get_next records is called exactly after fetch method call, is will fetch the first record in the selection.
Returns
status code

function get_prev();
Get previous row of the selection. If get_next records is called exactly after fetch method call, is will fetch the last record in selection.
Returns
status code

function get_oid();
Get object identifier of the current record.
Returns
gb_reference object with OID or null if no object was selected

function update();
Update the current row in the selection. You have to set for_update parameter of fetch to 1 in order to be able to perform updates. Updated value of row fields will be taken from bound column variables.
Returns
status code

function remove();
Remove all selected records. You have to set for_update parameter of fetch to 1 in order to be able to remove records.
Returns
status code

function free();
Deallocate statement and all associated data
Returns
status code

Class gb_reference

Class gb_reference is used to represent database references. It contains the single fields - integer oid (object identifier).

Delayed transactions and online backup scheduler

GigaBASE supports ACID transactions. It means that after database is reported that transaction is committed, it is guaranteed that database will be able to recover transaction in case of system fault (except corruption of database image at hard disk). The only way to provide this feature on standard equipment (without non-volatile RAM for example) and under general-purpose operating systems (Windows, Unix, ...) is to perform synchronous write to the disk. "Synchronous" in this context means that operating system will not return control to the application until data will be really written to the disk. Unfortunately synchronous write is very time expensive operation - average disk access time is about 10ms, so it is hard to achieve performance more than 100 transactions per second.

But in many cases it is acceptable to loose changes for few last seconds (but preserving consistency of the database). With this assumption, database performance can be significantly increased. GigaBASE provides "delayed transaction commit model" for such applications. When commit transaction delay is non zero, database doesn't perform commit immediately, instead of it delay it for specified timeout. After expiration of this timeout, transaction is normally committed, so it ensures that only changes done within specified timeout can be lost in case of system crash.

If thread, which has initiated delayed transaction, starts new transactions before delayed commit of transaction is performed, then delayed commit operation is skipped. So GigaBASE is able to group several subsequent transactions performed by on client into the large single transaction. And it will greatly increase performance, because it reduces number of synchronous writes and number created shadow pages (see section Transactions).

If some other client tries to start transaction before expiration of delayed commit timeout, then GigaBASE force delayed commit to proceed and release resource for another thread. So concurrency is not suffered from delayed commit.

By default delayed commits are disabled (timeout is zero). You can sepcify commit delay parameter as second optional argument of dbDatabase::open method. In SubSQL utility, it is possible to specify value of transaction commit delay by setting "GIGABASE_COMMIT_DELAY" environment variable (seconds).

Transaction commit scheme used in GigaBASE guaranty recovery after software and hardware fault if image of the database at the disk was not corrupted (all information which was written to the disk can be correctly read). If for some reasons, database file is corrupted, then the only way to recover database is use backup (hoping that it was performed not so long time ago).

Backup can be done by just copying database file when database is offline. Class dbDatabase provides backup method which is able to perform online backup, which doesn't require stopping of the database. It can be called at any time by programmer. But going further, GigaBASE provides backup scheduler, which is able to perform backup automatically. The only things needed - name of the backup file and interval of time between backups.

The method dbDatabase::scheduleBackup(char_t const* fileName, time_t period) spawns separate thread which performs backups to the specified location with specified period (in seconds). If fileName ends with "?" character, then data of backup initiation is appended to the file name, producing the unique file name. In this case all backup files are kept on the disk (it is responsibility of administrator to remove too old backup files or transfer them to another media). Otherwise backup is performed to the file with fileName + ".new" name, and after completion of backup, old backup file is removed and new file is renamed to fileName. Also in last case, GigaBASE will check the creation date of the old backup file (if exists) and adjust wait timeout in such way, that delta of time between backups will be equal to specified period (so if database server is started only for 8 hours per day and backup period is 24 hours, then backup will be performed each day, unlike scheme with uniquely generated backup file names).

It is possible to schedule backup processing in SubSQL utility by setting GIGABASE_BACKUP_NAME environment variable. Period value is taken from GIGABASE_BACKUP_PERIOD environment variable if specified, otherwise it is set to one day. To recover from backup it is enough to copy some of the backup files instead of corrupted database file.

Query optimization

To reduce query execution time GigaBASE uses indices, inverse references and query parallelization. The following sections supplies more information about these optimizations.

Using indices in queries

Indices is traditional approach for increasing RDBMS performance. GigaBASE uses B-tree for implementing index access to the data. GigaBASE uses simple rules for applying indices, allowing programmer to predict when index will be used. Check for index applicability is done during each query execution, so decision can be made depending on values of operands. The following rules describes algorithm of applying indices by GigaBASE:

If index is used to search prefix of like expression, and suffix is not just '%' character, then index search operation can return more records than really match the pattern. In this case we should filter index search output by applying pattern match operation.

When search condition is disjunction of several subexpressions (expression contains several alternatives combined by or operator), then several indices can be used for query execution. To avoid record duplicates in this case, bitmap is used in cursor to mark records already included in the selection.

If search condition requires sequential table scan, B-tree index still can be used if order by clause contains the single record field for which B-tree index is defined. As far as sorting is very expensive operation, using of index instead of sorting significantly reduce time of query execution.

It is possible to check which indices are used for query execution and number of probes done during index search be compiling GigaBASE with option -DDEBUG=DEBUG_TRACE. In this case GigaBASE will dump trace information about database functionality including information about indices.

Inverse references

Inverse references provides efficient and reliable way of establishing relations between tables. GigaBASE uses information about inverse reference when record is inserted/updated/deleted and also for query optimization. Relations between records can be of one of the following types: one-to-one, one-to-many and many-to-many.