The main advantage of PERST is its tight integration with the programming language. There is no gap between the database and application data models: PERST directly stores language objects, so there is no need for the packing/unpacking code that has to be written for traditional relational databases. Also, PERST (unlike many other OODBMSs) requires no special compiler or preprocessor, and it is still able to provide a high level of transparency.
Persistent base class is considered persistence capable. An object is automatically made persistent and stored in the storage when it is referenced from some other persistent object and the store method of that object is invoked. So there is no need (though it is possible) to explicitly assign an object to the storage.
The storage has one special root object. The root object is the only persistent object accessed in a special way (using the Storage.getRoot method); all other persistent objects are accessed in the normal way: either by traversing references from other persistent objects, or through object containers (Index, Link or Relation). Unlike many other OODBMSs, there can be only one root in the storage. If you need several named roots, you should create an Index object and use it as the root object.
PERST requires that each persistence-capable class be derived from the Persistent class. This makes it impossible to store "foreign" classes in the storage; this is the cost of PERST's ease of use and the absence of any specialized preprocessors or compilers. Also, the components of a persistence-capable object are restricted to the following types:

- scalar types: bool, int, short, double, ...
- the System.String type
- the System.DateTime type
- references to persistent objects: classes derived from Persistent or implementing any interface extending the IPersistent interface
- any other classes not implementing the IPersistent or IValue interfaces and marked as Serializable. The standard C# serialization mechanism will be used to pack a cluster of such objects into a byte array which will be stored in the database. The class should be marked with the Serializable attribute and should not contain references to persistent objects.
- Link: a collection of references to persistent objects, created by the Storage.createLink method.
Unfortunately, it is not possible to detect whether an object has changed without saving the old state of the object and performing a field-by-field comparison with its new state, and the overhead of such a solution (both space and CPU) is very high. In PERST it is therefore the programmer's responsibility to save an object in the storage. This can be done with the Persistent.store or Persistent.modify methods.
The Persistent.store method writes the object to the storage, together with all objects referenced from it that are not yet persistent. So if you create a tree of objects and assign a reference to the root of this tree to some persistent object X, it is only necessary to invoke the store() method on X. But if you later update one of the elements of this tree, you must invoke store() individually for each such element (X.store() will NOT automatically save the referenced objects again).
The Persistent.modify method marks an object as modified but does not immediately write it to the storage. All objects marked as modified are stored during transaction commit (the store method is invoked for each modified object). So the modify method is preferable if an object is updated multiple times within a transaction: instead of being stored several times, it is stored only once, at the end.
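The effect of modify on the number of physical writes can be sketched with a tiny self-contained model (plain Python, not the Perst API; all names here are illustrative):

```python
# Toy model (not the Perst API) of store() vs. modify() semantics:
# store() writes immediately; modify() only marks the object dirty,
# and commit() writes every dirty object exactly once.

class ToyStorage:
    def __init__(self):
        self.writes = 0          # number of physical writes performed
        self.dirty = set()       # objects marked by modify()

    def store(self, obj):
        self.writes += 1         # immediate write

    def modify(self, obj):
        self.dirty.add(obj)      # deferred: just remember the object

    def commit(self):
        for obj in self.dirty:   # each dirty object is stored once
            self.store(obj)
        self.dirty.clear()

db = ToyStorage()

# Updating the same object 100 times with store(): 100 writes.
for _ in range(100):
    db.store("account")
assert db.writes == 100

# Updating it 100 times with modify(): a single write at commit.
db.writes = 0
for _ in range(100):
    db.modify("account")
db.commit()
assert db.writes == 1
```

In the toy model, as in PERST, a hundred updates through modify cost a single write at commit.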
The IPersistent interface declares the recursiveLoading method. The default implementation of this method in the Persistent class always returns true. In this case PERST recursively loads any object referenced from the target object when the target object is loaded, which causes implicit loading of the whole cluster of referenced objects from the storage into main memory. This is similar to how the serialization mechanism works. To avoid overflowing main memory by recursively loading all objects from the storage, the programmer has to override the recursiveLoading method in some classes and return false from it. Objects referenced from such an object will not be implicitly loaded, and the programmer has to explicitly invoke the Persistent.load method to load them. So the recursiveLoading method can be used to control the loading of objects from the storage into main memory.
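A minimal sketch of this behavior (hypothetical Python classes, not the Perst API):

```python
# Toy sketch of how recursiveLoading() limits implicit loading: when
# an object whose class returns False is reached, traversal stops and
# its references must be loaded explicitly later (like Persistent.load).

loaded = []                      # records the order objects are loaded

class Node:
    def __init__(self, name, *refs):
        self.name, self.refs = name, refs

    def recursive_loading(self):
        return True              # default: load the whole cluster

    def load(self):
        loaded.append(self.name)
        if self.recursive_loading():
            for ref in self.refs:
                ref.load()

class LazyNode(Node):
    def recursive_loading(self):
        return False             # stop implicit loading here

leaf = Node("leaf")
lazy = LazyNode("lazy", leaf)
root = Node("root", lazy)

root.load()
# Loading root pulls in the lazy node itself, but not the leaf behind it.
assert loaded == ["root", "lazy"]
leaf.load()                      # explicit load, like Persistent.load()
assert loaded == ["root", "lazy", "leaf"]
```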
It is also important to notice that containers (Link, Relation, Index) always load member objects on demand (they do not automatically load all objects in the container). Since access to container members is always performed through methods of the container class, the programmer will never notice this: container methods always return loaded objects.
PERST uses the default constructor (the constructor without parameters) to create objects loaded from the storage, so this constructor should not perform any initialization that depends on the stored state. If such initialization is needed, a class can implement the onLoad method, which is invoked after all values of the non-transient fields of a persistent object have been fetched from the storage.
Summarizing all of the above: the proposed mechanism is convenient and easy to use because it doesn't require the programmer to explicitly load referenced objects, and at the same time it is flexible, giving the programmer control over object loading. Only those classes (usually containers) which explicitly control the loading of their members (by overriding recursiveLoading to return false) need to be aware of calling the Persistent.load method.
int type to long, or int to float. But changing the type from int to short is not possible. More precisely, PERST is able to perform any conversion which is supported by the Java reflection mechanism (a field of type XXX can be changed to YYY if the java.lang.reflect.Field.setXXX method can be applied to a component of type YYY). All other schema modifications (such as renaming fields or incompatible changes of field types) cannot be handled by the PERST automatic schema modification mechanism.
In this case you can use the Perst XML export/import mechanism. The database can be exported to XML format using the Storage.exportXML method, then recreated with the new class definitions, and afterwards the saved data can be imported using the Storage.importXML method.
Link interface to deal with relations of this kind. This interface allows you to add, delete and inspect members of the relation. Members of the relation can be accessed by index or extracted as an array of objects. Relations can be of two kinds: embedded (references to the related objects are stored in the relation owner object itself) and standalone (the relation is a separate object which contains a reference to the relation owner and the relation members). Both kinds of relations implement the Link interface. An embedded relation is created by the Storage.createLink method, while a standalone relation is represented by the Relation persistent class created by the Storage.createRelation method.
So a relation between two classes A and B can be implemented in the following way:
Relation type | Object A | Object B |
---|---|---|
one-to-one | B bref; | A aref; |
one-to-many | Link bref; | A aref; |
many-to-one | B bref; | Link aref; |
many-to-many | Link bref; | Link aref; |
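For example, the one-to-many row of this table can be sketched in plain Python (illustrative names, not Perst's Link class):

```python
# Minimal sketch of a one-to-many relation: object A holds a list of
# B's ("bref", playing the role of a Link), and each B holds a
# back-reference to its owner A ("aref").

class A:
    def __init__(self):
        self.bref = []           # the "many" side, a Link in Perst

    def add(self, b):
        self.bref.append(b)
        b.aref = self            # keep the inverse reference consistent

class B:
    def __init__(self):
        self.aref = None         # the "one" side

a = A()
b1, b2 = B(), B()
a.add(b1)
a.add(b2)
assert a.bref == [b1, b2]
assert b1.aref is a and b2.aref is a
```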
Hashtable class can be used for this. But in databases, more sophisticated search is usually required. I do not want to implement the complete SQL language in PERST, because it would immediately make the DBMS huge and slow; in most cases an application performs only very simple queries using exact match or key range. This is done in PERST with the Index and FieldIndex interfaces. The first interface is used for independent specification of a key and its associated value; the FieldIndex interface allows objects to be indexed by one of their fields (the key field).
Indices are created in PERST using the Storage.createIndex or Storage.createFieldIndex methods. There can be several index implementations, but right now only one implementation, based on a B+Tree, is provided (because the B+Tree is the most efficient structure for disk-based databases). Methods of the Index and FieldIndex interfaces allow you to add, remove and locate objects by key. It is possible to search either by specifying an exact key value or by specifying a range of key values (the high or low boundary, or both, can be omitted, and each can be declared exclusive or inclusive).
So it is possible to perform the following types of search; there are several different ways of selecting objects using an index:

- IPersistent get(Key key): get the object with the specified key value.
- IPersistent[] get(Key from, Key till): get all objects with keys in the specified range; an open boundary can be specified as null. Both boundaries can be inclusive or exclusive.
- IEnumerator GetEnumerator(): makes it possible to use the C# foreach construction to iterate through all indexed objects.
- IEnumerator GetEnumerator(Key from, Key till, IterationOrder order): iterate through the objects with keys in the specified range; an open boundary can be specified as null. Both boundaries can be inclusive or exclusive. Objects can be traversed in key ascending or descending order.
If you need a set of persistent objects, you should use the Storage.createSet method. A set is implemented using a B+Tree where the object OID is used as the key. PERST also supports spatial indices (org.garret.perst.SpatialIndex) and generic indices with a user-defined comparator (org.garret.perst.SortedCollection). The spatial index is implemented using Guttman's R-Tree with the quadratic split algorithm; it allows efficient search of R2 objects. A sorted collection provides almost the same methods as FieldIndex, but it uses a user-defined comparator to compare collection members. Sorted collections are implemented using a T-Tree and are especially efficient for main-memory databases.
Interface | Description | Key type | Implementation | Created by |
---|---|---|---|---|
Index | Index with explicitly specified key used for exact match or range queries | scalar, string or reference | B+Tree | Storage.CreateIndex(Type type, bool unique) |
Index | The same as above, but assuming that there can be a lot of duplicate key values | scalar, string or reference | B+Tree | Storage.CreateThickIndex(Type type) |
FieldIndex | Index constructed for one of the object's fields | scalar, string or reference | B+Tree | Storage.CreateFieldIndex(Type type, string fieldName, bool unique) |
BitIndex | Bit index for searching objects by a bitmap of properties | persistent object | B+Tree | Storage.CreateBitIndex() |
ISet | Set of persistent objects | - | B+Tree | Storage.CreateSet() |
IPersistentSet | Scalable set of persistent objects (can efficiently handle both small and large numbers of members) | persistent object | Link or B+Tree | Storage.CreateScalableSet() |
SpatialIndex | Index for spatial objects | Rectangle | R-Tree | Storage.CreateSpatialIndex() |
SpatialIndexR2 | Index for spatial objects with real coordinates | RectangleR2 | R-Tree | Storage.CreateSpatialIndexR2() |
SortedCollection | Index with user-defined comparator | any | T-Tree | Storage.CreateSortedCollection(PersistentComparator comparator, bool unique) |
    select * from T where x=?;

PERST relations can be used to implement simple joins, like the following SQL query fetching all orders for a particular vendor:

    select * from Order O, Vendor V where O.vendorID = V.id AND V.name='ABC';

In PERST it is possible to select the vendor first using an index search and then traverse the corresponding relation to locate all orders for this vendor.
But sometimes more complex queries need to be implemented. This is also possible in Perst, without writing the query in some special (non-procedural) language. The Projection class is used to combine the results of several simple operations (such as index searches). Let's start the explanation of this class with an example. Suppose we need to find all orders for details whose name starts with 'D1', shipped by vendors whose name starts with 'V1' or 'V3'. The SQL statement performing this query is the following:

    select * from Order O, Vendor V, Detail D
    where D.name like 'D1%' and O.detailID = D.id and O.vendorID = V.id
      and (V.name like 'V1%' or V.name like 'V3%');

And now how it can be done in Perst. Assume we have indices for details and vendors:

    FieldIndex detailIndex = db.CreateFieldIndex(typeof(Detail), "name", true);
    FieldIndex vendorIndex = db.CreateFieldIndex(typeof(Vendor), "name", true);

The set of requested orders can be obtained in the following way:

    // Projection from vendors to orders (using "orders" field of Link type)
    Projection v2o = new Projection(typeof(Vendor), "orders");
    // Projection from details to orders (using "orders" field of Link type)
    Projection d2o = new Projection(typeof(Detail), "orders");
    // Select vendors with name like 'V1%'
    v2o.Project(vendorIndex.StartsWith("V1"));
    // Select vendors with name like 'V3%'
    v2o.Project(vendorIndex.StartsWith("V3"));
    // Select details with name like 'D1%'
    d2o.Project(detailIndex.StartsWith("D1"));
    // Join projections
    v2o.Join(d2o);
    // Get array of requested orders
    Order[] orders = (Order[])v2o.ToArray(typeof(Order));

So, as you can see, the Projection class is used for several purposes:
It is possible to derive your own class from Projection to implement more sophisticated projections than projection by a single field.
If you need to sort a selection in some particular order, the most efficient way is to use a FieldIndex: when you select objects using an index, the selected objects are sorted by the search key. If you need to sort the selection by a field which is not the search key, you can use the Array.Sort method. For example, if in the query described above we need to sort orders by the price field, it can be done with the following statement:

    Array.Sort(orders, Order.PriceComparer);
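The semantics of Project and Join can be sketched with plain Python sets (a toy model, not the Perst Projection class):

```python
# Toy model of the query above: project two selections onto their
# "orders" fields, then join the projections by set intersection.

def project(objects, field):
    """Collect the objects referenced by `field` of every selected object."""
    result = set()
    for obj in objects:
        result.update(obj[field])
    return result

o1, o2, o3 = "order1", "order2", "order3"
vendors = [{"name": "V1a", "orders": [o1, o2]},
           {"name": "V3b", "orders": [o3]},
           {"name": "V2c", "orders": [o2]}]
details = [{"name": "D1x", "orders": [o1, o3]},
           {"name": "D2y", "orders": [o2]}]

# Select vendors with names like 'V1%' or 'V3%', project to orders.
v2o = project([v for v in vendors if v["name"][:2] in ("V1", "V3")],
              "orders")
# Select details with names like 'D1%', project to orders.
d2o = project([d for d in details if d["name"].startswith("D1")], "orders")

# Join = intersection of the two projections.
assert v2o & d2o == {o1, o3}
```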
commit, rollback or close methods. A transaction commit causes a synchronous write of the changed pages to disk. A synchronous write (the application is blocked until the data is physically flushed to the disk) is a very expensive operation: the average positioning time of modern disks is about 10 ms, so they are usually not able to perform more than about 100 writes to random locations per second. Since a transaction usually consists of several updated pages, this leads to an average performance of about 10 transaction commits per second.
Performance can be greatly increased if you minimize the number of commits (use larger transactions). PERST uses a shadow mechanism to implement transactions. When an object is changed for the first time during a transaction, a shadow of the object is created and the original object is kept unchanged. If an object is updated multiple times during a transaction, the shadow is created only once. Because it uses shadows, PERST needs no transaction log file, so in PERST a long transaction cannot cause transaction log overflow as in most other DBMSes. Quite the contrary: if you do not call commit at all, PERST works as a DBMS without transaction support, adding almost no overhead for supporting transactions.
The only drawback of long transactions is the possibility of losing a lot of changes in case of a fault. PERST will preserve the consistency of the database, but all changes made since the last commit will be lost.
In Perst there is the org.garret.perst.Query class allowing you to execute JSQL queries. To execute a JSQL query it is necessary to provide an object iterator (java.util.Iterator), the class of the iterated objects and a string representing the query condition. It is also possible to specify indices, resolvers and query parameters. The result of query execution is another iterator, which can be used to traverse through the selected objects (the objects matching the specified search criteria).
All Perst collection classes contain a select method which allows you to filter collection members using a JSQL query. This method is given the class of the objects stored in the collection (for some collections, such as FieldIndex, it is known and need not be specified) and a JSQL condition. As the result of its work, select returns an iterator over the selected objects.
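The observable behavior of such a select can be sketched in a few lines (plain Python, not the Perst API):

```python
# Rough analogue of a collection select: lazily filter an iterator
# with a condition, returning another iterator over matching objects.

records = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 17}]

def select(iterator, condition):
    # The result is itself an iterator, as in Perst.
    return (obj for obj in iterator if condition(obj))

adults = select(iter(records), lambda r: r["age"] >= 18)
assert [r["name"] for r in adults] == ["Ada"]
```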
Example | Meaning |
---|---|
expression | non-terminals |
not | terminals |
\| | disjoint alternatives |
(not) | optional part |
{1..9} | repeat zero or more times |
    select-condition ::= ( expression ) ( traverse ) ( order )
    expression ::= disjunction
    disjunction ::= conjunction | conjunction or disjunction
    conjunction ::= comparison | comparison and conjunction
    comparison ::= operand = operand | operand != operand | operand <> operand | operand < operand | operand <= operand | operand > operand | operand >= operand | operand (not) like operand | operand (not) like operand escape string | operand (not) in operand | operand (not) in expressions-list | operand (not) between operand and operand | operand is (not) null
    operand ::= addition
    addition ::= multiplication | addition + multiplication | addition || multiplication | addition - multiplication
    multiplication ::= power | multiplication * power | multiplication / power
    power ::= term | term ^ power
    term ::= identifier | number | string | true | false | null | current | first | last | ( expression ) | not comparison | - term | term [ expression ] | identifier . term | function term | count (*) | contains array-field (with expression) (group by identifier having expression) | exists identifier : term
    function ::= abs | length | lower | upper | integer | real | string | sin | cos | tan | asin | acos | atan | log | exp | ceil | floor | sum | avg | min | max
    string ::= ' { { any-character-except-quote } ('') } '
    expressions-list ::= ( expression { , expression } )
    order ::= order by sort-list
    sort-list ::= field-order { , field-order }
    field-order ::= field (asc | desc)
    field ::= identifier { . identifier }
    fields-list ::= field { , field }
    user-function ::= identifier
Identifiers are case sensitive, begin with an a..z, A..Z, '_' or '$' character, contain only a..z, A..Z, 0..9, '_' or '$' characters, and do not duplicate SQL reserved words.
abs | acos | and | asc | asin |
atan | avg | between | by | contains |
cos | ceil | count | current | desc |
escape | exists | exp | false | floor |
group | having | in | integer | is |
length | like | log | lower | max |
min | not | null | or | real |
sin | string | sum | tan | true |
upper | with |
JSQL extends the ANSI standard SQL operations by supporting bit manipulation operations. The operators and/or can be applied not only to boolean operands but also to operands of integer type. The result of applying the and/or operator to integer operands is an integer value whose bits are set by the bit-AND/bit-OR operation. Bit operations can be used for an efficient implementation of small sets. Also, the raising-to-a-power operation ^ is supported by JSQL for integer and floating point types.
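A plain Python illustration of using integer bit operations as small sets (note that JSQL's ^ means raising to a power, while in Python exponentiation is written ** and ^ is XOR):

```python
# Small sets represented as bitmasks, one bit per possible member,
# using the integer and/or semantics described above.

READ, WRITE, EXEC = 1, 2, 4       # one bit per possible set member

perms = READ | WRITE              # the set {READ, WRITE}
assert perms & WRITE != 0         # membership test via bit-AND
assert perms & EXEC == 0          # EXEC is not a member
assert perms | EXEC == 7          # union adds the EXEC bit

# JSQL's ^ is raising to a power; in Python that is ** (Python's ^ is XOR).
assert 2 ** 10 == 1024
```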
The number of elements in an array can be obtained with the length() function. Array elements can be fetched with the [] operator; if the index expression is out of the array range, an exception will be raised. The operator in can be used to check whether an array contains the value specified by the left operand. This operation can be used only for arrays of atomic types: those with boolean, numeric, reference or string components.
Iteration through array elements can be performed with the exists operator. The variable specified after the exists keyword can be used as an index into arrays in the expression preceded by the exists quantifier. This index variable iterates through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition

    exists i: (contract[i].company.location = 'US')

will select all details which are shipped by companies located in the US, while the query

    not exists i: (contract[i].company.location = 'US')

will select all details which are shipped only by companies outside the US.
Nested exists clauses are allowed. Using nested exists quantifiers is equivalent to nested loops over the corresponding index variables. For example, the query

    exists column: (exists row: (matrix[column][row] = 0))

will select all records containing 0 among the elements of the matrix field, which has type array of arrays of integer. This construction is equivalent to the following two nested loops:

    boolean result = false;
    for (int column = 0; column < matrix.length(); column++) {
        for (int row = 0; row < matrix[column].length(); row++) {
            if (matrix[column][row] == 0) {
                result = true;
                break;
            }
        }
    }

The order of the indices is significant! The result of executing the query

    exists row: (exists column: (matrix[column][row] = 0))

will be completely different from the result of the previous query: the program can simply hang in the latter case, due to an infinite loop for empty matrices.
IndexOutOfRangeError exception to signal the end of the iteration, the contains ... with construction simply iterates through all array elements and does not throw any exception. So the performance of contains ... with is better.
char in C) and byte-by-byte comparison of strings, ignoring locale settings.
The like construction can be used for matching a string against a pattern containing the special wildcard characters '%' and '_'. The character '_' matches any single character, while '%' matches any number of characters (including none). The extended form of the like operator with an escape part can be used to treat the characters '%' and '_' in the pattern as normal characters if they are preceded by the special escape character, specified after the escape keyword.
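The like/escape semantics can be sketched by translating the pattern into a regular expression (a plain Python sketch, not the JSQL implementation):

```python
# SQL-style pattern matching: '%' = any run of characters, '_' = one
# character, with an optional escape character that makes the next
# pattern character literal.

import re

def like(text, pattern, escape=None):
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if escape and ch == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped % or _
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(out), text) is not None

assert like("Chicago", "Chi%")
assert like("cat", "c_t")
assert not like("cart", "c_t")
assert like("100%", "100\\%", escape="\\")   # escaped '%' is literal
```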
It is possible to search for a substring within a string with the in operator: the expression ('blue' in color) will be true for all records whose color field contains the word 'blue'.
Strings can be concatenated by the + or || operators; the latter was added only for compatibility with the ANSI SQL standard. Since JSQL doesn't support implicit conversion to the string type in expressions, the semantics of the + operator can be redefined for strings.
    company.address.city = 'Chicago'

will access the record referenced by the company component of the Contract record and extract the city component of the address field of the referenced record from the Supplier table.
References can be checked for null by the is null or is not null predicates. References can also be compared for equality with each other as well as with the special null keyword. When a null reference is dereferenced, an exception is raised by JSQL.
There is a special keyword, current, which can be used to get a reference to the current record during a table search. Usually the current keyword is used to compare the identifier of the current record with other references or to locate it within an array of references. For example, the following query will search the Contract table for all active contracts (assuming that the field canceledContracts has type Contract[]):

    current not in supplier.canceledContracts
A programmer can define methods for structures, and these can be used in queries with the same syntax as normal structure components. Such methods should have no arguments except the pointer to the object to which they belong (the this pointer in C#), and should return an atomic value (of boolean, numeric, string or reference type). Also, a method should not change the object instance (it must be an immutable method). So user-defined methods can be used to create virtual components: components which are not stored in the database but instead are calculated from the values of other components. For example, you can use the standard .NET DateTime class with such properties as Year, Month, ... So it is possible to specify queries like "delivery.Year = 99" in the application, where the delivery record field has DateTime type.
Name | Argument type | Return type | Description |
---|---|---|---|
abs | integer | integer | absolute value of the argument |
abs | real | real | absolute value of the argument |
sin | real | real | sin (rad) |
cos | real | real | cos (rad) |
tan | real | real | tan (rad) |
asin | real | real | arcsin |
acos | real | real | arccos |
atan | real | real | arctan |
exp | real | real | exponent |
log | real | real | natural logarithm |
ceil | real | real | the smallest integer value that is not less than the argument |
floor | real | real | the largest integer value that is not greater than the argument |
integer | real | integer | conversion of real to integer |
length | array | integer | number of elements in array |
lower | string | string | lowercase string |
real | integer | real | conversion of integer to real |
string | integer | string | conversion of integer to string |
string | real | string | conversion of real to string |
upper | string | string | uppercase string |
To be able to use indices, the programmer has to obtain a Query object and register the indices in it. JSQL supports Perst Index and FieldIndex indices. Indices are located by JSQL by field name. It is assumed that the class of the objects returned by the index iterator is the same as the one specified in the select query. If you are going to use the same Query instance for selecting data from different collections (with different collection member classes), please check that there is no name conflict for key fields, i.e. that an index corresponding to one collection will not be applied to another collection. Maintaining indices is the responsibility of the programmer; JSQL only uses indices for searching elements by key.
Database class. This class emulates a relational database on top of Perst. Certainly, even with this class, Perst provides a very object-oriented way of dealing with persistent data. The Database class allows you to create/drop tables, add/drop indices, and create/update/delete records. It automatically updates indices when a new record is added or removed. The Database class also makes it possible to prepare and execute queries. The Database class is used as a wrapper for the underlying Perst storage. The Storage passed to the constructor of the Database class should be opened and either not yet initialized, or initialized previously by the same Database constructor. It is not possible to attach a Database to a Perst Storage with an existing root object other than the one set by the Database constructor.
A table in the database is associated with a C# class. Almost all methods of the Database class require passing a Class parameter or a persistent object instance; in the latter case the class of this object is used. If Perst finds no table associated with the class, it tries to find a table for the superclass, and so on. So it is possible to have a "polymorphic" table: a table containing records (objects) of multiple classes derived from one class. Methods dealing with an object instance have overloaded versions where the class can be specified explicitly, which makes it possible to place an object in a table associated with some particular interface.
The Database class provides two ways of querying tables: simple and prepared queries. In the first case Database.select directly returns the result of the query execution (an iterator). In the second case it is possible to execute a prepared query multiple times, specifying different values of the parameters.
Please find more information about the Database methods in the API documentation, and look at the JsqlSSD example: a Supplier-Shipment-Detail database built using the Database class.
Perst.NET includes two classes, PersistentContext and TransparentPersistenceAttribute, which are used to provide transparent persistence for C# objects using the .NET remoting API. If an application class is derived from PersistentContext and marked with the TransparentPersistence attribute, it will automatically load its content on demand from the database and automatically store the content when the object is modified. The implementation of the recursiveLoading method in the PersistentContext class returns false, so objects are loaded from the database only when they are accessed.
But there are two significant restrictions on using the remoting API.
The TransparentGuess example illustrates the use of this Perst transparent API.
IPersistent.RecursiveLoading method and explicit loading of objects using the IPersistent.Load method, or about explicit marking of updated objects with the IPersistent.Modify method. So a lot of error-prone things disappear. But certainly this approach has its own drawbacks:
IStorage.CreateClass method, which will generate a wrapper class and create an instance of this class. Also, the persistent class should have no constructor except the default one, and this default constructor should not do anything except initialize transient components, because it will be invoked each time an object is loaded from the storage.
Perst.PArray class instead.
There are two possible ways of defining persistent classes using this API. The first is based on using interfaces:

    interface Node : IPersistent {
        int Weight { get; set; }
        Node Parent { get; set; }
    }

Please notice that your interface should extend the IPersistent interface. In this case Perst will derive the generated wrapper class from the Perst.Persistent class (or from the Perst.PersistentResource class if the specified interface extends the Perst.IResource interface). The wrapper class will implement the specified interface and contain the corresponding fields:
    class NodeWrapper : Persistent, Node {
        public int Weight {
            get { return weight; }
            set { weight = value; Modify(); }
        }
        public Node Parent {
            get { return (Node)Storage.getByOid(parent); }
            set { parent = value.MakePersistent(Storage); Modify(); }
        }
    }

In the second case you use an abstract class instead of an interface:

    abstract class Node : Persistent {
        public abstract int Weight { get; set; }
        public abstract Node Parent { get; set; }
    }

This class has to be derived from the Persistent or PersistentResource class. There is only one difference in the generated wrapper class: it extends the specified abstract class:

    class NodeWrapper : Node { ... }
Since there are no direct references between objects when this API is used, the garbage collector will remove from memory all persistent objects which are not referenced from program variables. That is why it is highly recommended to use the LRU object cache to increase performance (it is used by default).
The PropGuess example illustrates the use of this Perst transparent API.
PERST performs cyclic scanning of bitmap pages. It keeps the identifier of the current bitmap page and the current position within the page. Each time an allocation request arrives, scanning of the bitmap starts from the current position. When the last allocated bitmap page has been scanned, scanning continues from the beginning (from the first bitmap page) up to the current position. When no free space is found after a full cycle through all bitmap pages, a new bulk of memory is allocated. The size of the extension is the maximum of the size of the allocated object and the extension quantum. The extension quantum is a parameter of the database, specified in the constructor. The bitmap is extended to be able to map the additional space. If the virtual space is exhausted and no more bitmap pages can be allocated, an OutOfMemory error is reported.
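The cyclic scan with wrap-around and extension can be sketched with a toy allocator (a simplified model: one flat bitmap with one bit per allocation quantum, no pages or hole descriptors):

```python
# Toy model (not the real Perst allocator) of cyclic bitmap scanning:
# scanning resumes from the position after the last allocation, wraps
# around once, and only then is the bitmap extended.

class BitmapAllocator:
    def __init__(self, size, extension_quantum=8):
        self.bits = [False] * size         # one bit per quantum
        self.pos = 0                       # current scan position
        self.extension_quantum = extension_quantum

    def _find(self, n, start, stop):
        """Find a run of n free bits in bits[start:stop], or None."""
        run, first = 0, None
        for i in range(start, stop):
            if self.bits[i]:
                run, first = 0, None
            else:
                if first is None:
                    first = i
                run += 1
                if run == n:
                    return first
        return None

    def allocate(self, n):
        # Scan from the current position to the end, then wrap around.
        hole = self._find(n, self.pos, len(self.bits))
        if hole is None:
            hole = self._find(n, 0, self.pos)
        if hole is None:
            # No free space after a full cycle: extend the bitmap by
            # max(object size, extension quantum).
            hole = len(self.bits)
            self.bits += [False] * max(n, self.extension_quantum)
        for i in range(hole, hole + n):
            self.bits[i] = True
        self.pos = hole + n
        return hole

a = BitmapAllocator(4)
assert a.allocate(2) == 0      # sequential allocation...
assert a.allocate(2) == 2      # ...gives high locality of references
assert a.allocate(3) == 4      # bitmap full: extended by the quantum
```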
Allocating memory using a bitmap provides high locality of references (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimizing the number of modified pages is significant when a commit operation is performed and all dirty pages have to be flushed to the disk. When all cloned objects are placed sequentially, the number of modified pages is minimal and so the transaction commit time is also reduced. Using the extension quantum also helps to preserve sequential allocation: once the bitmap is extended, objects will be allocated sequentially until the extension quantum is completely used. Only after reaching the end of the bitmap does scanning restart from the beginning, searching for holes in previously allocated memory.
To reduce the number of bitmap page scans, PERST associates a descriptor with each page, which is used to remember the maximal size of a hole on the page. The maximal hole size is calculated in the following way: if an object of size M cannot be allocated from this bitmap page, then the maximal hole size is less than M, so M is stored in the page descriptor if the previous value of the descriptor is larger than M. For the next allocation of an object of size greater than or equal to M, this bitmap page is skipped. The page descriptor is reset when some object is deallocated within this bitmap page.
Some database objects (like hash table pages) should be aligned on a page boundary to provide more efficient access. The PERST memory allocator checks the requested size, and if it is aligned on a page boundary, the address of the allocated memory segment is also aligned on a page boundary. The search for a free hole is faster in this case, because PERST increases the step of the current position increment according to the value of the alignment.
To be able to deallocate the memory used by an object, PERST needs to keep information about the object size somewhere. The PERST memory allocator deals with two types of objects: normal table records and page objects. All table records are prepended by a record header, which contains the record size and a pointer of the L2-list linking all records in the table; so the size of a table record object can be extracted from its record header. Page objects always occupy a whole database page and are allocated at positions aligned on a page boundary; page objects have no headers. PERST distinguishes page objects from normal objects by using a special marker in the object index.
When an object is modified for the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created copy, while the shadow index still contains the handle pointing to the original version of the object. All changes are applied to the copy, leaving the original object unchanged. PERST also marks, in a special bitmap, the page of the object index that contains the modified object handle.
When a transaction is committed, PERST first checks whether the size of the object index was increased during the transaction; if so, it also reallocates the shadow copy of the object index. Then PERST frees the memory of all "old objects", i.e. objects that were cloned within the transaction. This memory cannot be deallocated before commit, because the consistent state of the database is preserved by keeping the cloned objects unchanged: if their memory were deallocated immediately after cloning, a new object could be allocated in place of a cloned object and consistency would be lost. Since memory deallocation in PERST is done through the bitmap, using the same transaction mechanism as for normal database objects, deallocating object space requires clearing some bits in a bitmap page, which must itself be cloned before modification. Cloning the bitmap page requires new space for the page copy, and that space could be taken from the deallocated objects, which is not acceptable for the reason explained above: database consistency would be lost. That is why deallocation is done in two steps. When an object is cloned, all bitmap pages used for marking its space are also cloned (if not cloned before). So when the transaction is committed, PERST only clears bits in bitmap pages, and no memory allocation requests can be generated at that moment.
After deallocating the old copies, PERST flushes all modified pages to disk to synchronize the contents of memory and the disk file. It then changes the current object index indicator in the database header to switch the roles of the object indices: the index that was current becomes shadow, and the shadow index becomes current. PERST again flushes the modified page (the page with the database header) to disk, transferring the database to the new consistent state. After that, PERST copies all modified handles from the index containing the committed changes to the index that was previously shadow and now becomes current. At this moment the contents of both indices are synchronized and PERST is ready to start a new transaction.
A bitmap of modified object index pages is used to minimize transaction commit time: only the modified pages of the object index, not the whole index, need to be copied. After the transaction is committed, this bitmap is cleared.
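The clone-on-first-write scheme and the commit-time switch of the two object indices can be modelled as follows. This is an illustrative Python sketch under simplified assumptions (a dict stands in for the disk, a set stands in for the dirty-page bitmap, and deallocation of old copies is omitted); the names are invented for the example and are not PERST's code.

```python
class ShadowIndexStorage:
    def __init__(self):
        self.objects = {}            # "disk": handle -> object data
        self.index = [{}, {}]        # two object indices: oid -> handle
        self.current = 0             # indicator stored in the database header
        self.dirty = set()           # oids whose handles were modified
        self.next_handle = 0

    def create(self, oid, data):
        handle = self._new_handle(data)
        self.index[0][oid] = self.index[1][oid] = handle

    def _new_handle(self, data):
        handle = self.next_handle; self.next_handle += 1
        self.objects[handle] = data
        return handle

    def write(self, oid, data):
        cur = self.index[self.current]
        if oid not in self.dirty:
            # first modification inside the transaction: clone the object;
            # the shadow index still points at the original version
            cur[oid] = self._new_handle(data)
            self.dirty.add(oid)
        else:
            self.objects[cur[oid]] = data

    def read(self, oid):
        return self.objects[self.index[self.current][oid]]

    def commit(self):
        # (modified pages would be flushed to disk here)
        self.current ^= 1            # atomic header switch: roles swap
        committed = self.index[self.current ^ 1]
        cur = self.index[self.current]
        for oid in self.dirty:       # synchronize only the modified handles
            cur[oid] = committed[oid]
        self.dirty.clear()

    def rollback(self):
        shadow = self.index[self.current ^ 1]
        cur = self.index[self.current]
        for oid in self.dirty:       # restore handles from the shadow index
            cur[oid] = shadow[oid]
        self.dirty.clear()
```

After commit or rollback both indices are identical again, which is exactly the invariant the recovery procedure relies on.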
When a transaction is explicitly aborted by the Storage.rollback
method, the shadow object index is copied back to the current index,
eliminating all changes made by the aborted transaction. After the
copying completes, both indices are identical again and the database
state corresponds to the moment before the start of the transaction.
Object handles are allocated from a free handles list. The header of this list is also shadowed, and two instances of the list header are stored in the database header; switching between them is done in the same way as switching the object indices. When there are no more free elements in the list, PERST allocates handles from the unused part of the index. When there is no more space in the index, the index is reallocated. The object index is the only entity in the database that is not cloned on modification; instead, two copies of it are always maintained.
Some OID values are predefined in PERST. OID 0 is reserved as the invalid object identifier. OIDs starting from 1 are reserved for bitmap pages; the number of bitmap pages depends on the maximum virtual space of the database. For one terabyte of virtual space, an 8 Kb page size and a 64-byte allocation quantum, 32K bitmap pages are required, so 32K handles are reserved in the object index for the bitmap. Bitmap pages are allocated on demand, as the database grows. The OID of the first user object is therefore 0x8002.
The recovery procedure in PERST is trivial. There are two instances of
the object index, one of which is current while the other corresponds
to the consistent database state. When a database is opened, PERST
checks the database header to detect whether the database was closed
normally. If not (the dirty
flag is set in the database header), PERST performs
database recovery. Recovery is very similar to transaction rollback:
the indicator of the current index in the database header is used to
determine the index corresponding to the consistent database state,
and object handles from this index are copied to the other index,
eliminating all changes made by the uncommitted transaction. Since the
only action performed by the recovery procedure is copying the object
index (actually, only handles that differ between the current and
shadow indices are copied, to reduce the number of modified pages) and
the object index is small, recovery is very fast.
A fast recovery procedure reduces the application's "out-of-service" time.
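The open-time recovery check can be sketched as follows. This is an illustrative Python fragment with simplified stand-ins for the header and the two indices, not PERST's code.

```python
def recover(header, index):
    """header: {'dirty': bool, 'current': 0 or 1};
    index: two lists of object handles.
    Returns the number of handles that had to be copied."""
    if not header['dirty']:
        return 0                             # database was closed normally
    good = index[header['current']]          # consistent database state
    bad = index[1 - header['current']]
    copied = 0
    for i, h in enumerate(good):
        if bad[i] != h:                      # copy only differing handles
            bad[i] = h
            copied += 1
    header['dirty'] = False
    return copied
```

Because only the small object index is touched, and within it only the differing handles, recovery cost is independent of how much data the interrupted transaction wrote.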
The following example illustrates typical use of PERST:
using Perst;

public class YourPersistentClass : Persistent {
    int x;
    String y;
    Link links;
    ...
    void doUpdate() {
        x = 1;
        y = "Hello World";
        links = getStorage().CreateLink();
        store(); // save changes in the database
    }
}

public class YourRoot : Persistent {
    YourPersistentClass obj; // "ref" is a reserved word in C#, so use another name
    Index myIndex;

    void foo() {
        obj.doUpdate(); // the referenced object does not have to be explicitly loaded
    }
}

public class YourApplication {
    static public void Main(String[] args) {
        String databaseFilePath = args[0];       // path to the database file
        int pagePoolSize = Int32.Parse(args[1]); // size of page pool in bytes
        Storage storage = StorageFactory.Instance.CreateStorage();
        storage.Open(databaseFilePath, pagePoolSize);
        YourRoot root = (YourRoot)storage.Root;
        if (root == null) {
            // storage was not initialized yet
            root = new YourRoot();
            root.myIndex = storage.CreateIndex(typeof(string), true); // unique index
            storage.Root = root;
        }
        root.foo(); // all objects referenced from root are implicitly loaded
        YourPersistentClass obj = (YourPersistentClass)root.myIndex["John"]; // find object with key "John"
        ...               // do something else with the database
        storage.Commit(); // commit transaction
        ...               // do something else with the database
        storage.Close();  // commit the last transaction and close the database
    }
}
You can also look at PERST examples:
Tests are located in the tst
directory and can be built using the compile.bat
script.
Accessing data on disk is several times slower than accessing data in main memory;
that is why caching is the key to database performance. PERST uses a
pool of pages to optimize disk access. The page pool size can be specified in the Storage.open
method (by default it is 4Mb). Increasing the page pool usually leads to better performance, but you should
note the following:
If you expect all your data to fit in main memory, you can pass the
Storage.INFINITE_PAGE_POOL
constant to the Storage.open
method.
In this case the page pool is dynamically extended whenever a new page is requested.
As a result, all pages are cached in memory, so each page is read from disk only once.
Also, a strong object cache is used instead of a weak one, which means that all fetched
objects are pinned in memory and each object is unpacked only once. It is important to note
that the amount of main memory used will be greater than the database size: every object is present in memory
both in packed form (inside the page containing it) and in unpacked form (referenced from the object cache).
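The effect of a bounded page pool can be sketched as a simple LRU cache of disk pages. This is an illustrative Python model only; PERST's actual replacement policy and bookkeeping may differ.

```python
from collections import OrderedDict

class PagePool:
    def __init__(self, capacity, read_page):
        self.capacity = capacity      # number of pages kept in memory
        self.read_page = read_page    # function simulating a disk read
        self.pages = OrderedDict()
        self.disk_reads = 0

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)   # mark as most recently used
            return self.pages[page_no]
        self.disk_reads += 1                  # cache miss: go to "disk"
        data = self.read_page(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used page
        return data
```

With an unbounded capacity (the analogue of INFINITE_PAGE_POOL), no page is ever evicted, so each page is read from disk at most once.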
In some applications (for example, on mobile devices) persistency is not needed, but PERST container classes
such as Link, Index, FieldIndex and SpatialIndex can still be used. In this case you can use the
NullFile
implementation of the IFile
interface together
with Storage.INFINITE_PAGE_POOL
to create a transient in-memory database. Data will then never be
written to disk.
Some constants defined in the StorageImpl
class influence the
initial and maximal database size. If you want to change them, you have to rebuild PERST.
Below is a detailed description of these parameters:
Parameter | Default value | Description
---|---|---
dbDefaultInitIndexSize | 1024 | Initial object index size. The object index grows on demand. Reallocating the index is an expensive operation, so to minimize the number of reallocations the index size is always doubled. Specifying a larger initial index size reduces the number of future reallocations and so slightly increases performance (provided your application allocates that many objects), but it also increases the initial size of the database file. With the default value of this parameter, the initial database size is about 50Kb.
dbDefaultExtensionQuantum | 4Mb | Database extension quantum. Memory is allocated by scanning the bitmap; if no large enough hole is found, the database is extended by the value of dbDefaultExtensionQuantum. Increasing this value makes rescans of the allocation bitmap from the very beginning less frequent, which leads to faster allocation and better locality of reference for created objects (there is a greater chance that they will be allocated sequentially), but also to less efficient memory usage. Reducing this value forces reuse of existing holes in the memory allocation bitmap.
dbObjectCacheInitSize | 1319 | Size of the object cache. PERST needs this cache to check whether an object with a given OID is already present in memory. The cache uses weak references so that the garbage collector can do its work. When a fill threshold is reached, the cache is reallocated by doubling its size; once again, increasing this parameter can save some cache reallocations.
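The weak-reference object cache described above can be sketched with Python's weakref module. This is an illustrative analogue, not PERST's implementation; in CPython an object held only by the weak cache is reclaimed as soon as its last strong reference disappears.

```python
import gc
import weakref

class Obj:
    def __init__(self, oid):
        self.oid = oid

class WeakObjectCache:
    def __init__(self):
        # oid -> object; entries vanish when the object is garbage collected
        self._map = weakref.WeakValueDictionary()

    def lookup(self, oid):
        return self._map.get(oid)   # None if never loaded or already collected

    def put(self, oid, obj):
        self._map[oid] = obj

cache = WeakObjectCache()
pinned = Obj(1)            # a strong reference elsewhere keeps this one alive
cache.put(1, pinned)
cache.put(2, Obj(2))       # no strong reference anywhere else
gc.collect()               # object 2 can now be reclaimed; object 1 cannot
```

This is why the cache never prevents the garbage collector from evicting objects that the application no longer references.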
Now some hints on how to increase PERST performance and reduce the amount of main memory used. If your database performs many updates of persistent data, the main limiting factor is the speed of writing changes to disk, especially the synchronous writes performed by commit. If you commit after each update, the average speed will be about 10 updates per second (this estimate assumes an average disk access time of about 10 msec and that each transaction commit requires writing about 10 pages at random places in the file). But it is possible to dramatically increase update performance by grouping several updates into one transaction. PERST creates a shadow of an object when it is first updated inside a transaction: if an object is updated once in each of N transactions, N shadows are created, whereas if it is updated N times inside one transaction, the shadow is created only once. This explains the advantage of one large transaction.
On the other hand, if you update a large number of objects in one transaction, a shadow is created for each updated object, which significantly increases the database file size. If you update every object in the database inside one transaction, the database size will almost double! If instead you perform each update in a separate transaction, the database size remains almost unchanged (because the space of the allocated shadow objects is reused). So the best approach is to commit after every 100 or 1000 updates: this reduces the per-commit overhead and keeps the database size down.
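The cloning cost described above is easy to quantify: one shadow is created per object per transaction in which that object is modified, regardless of how many times it is modified within the transaction. A minimal illustration (hypothetical helper, not a PERST API):

```python
def shadows_created(transactions):
    """transactions: a list of transactions, each a list of updated object ids.
    One shadow is created per distinct object per transaction."""
    return sum(len(set(txn)) for txn in transactions)

# Updating one object 100 times in a single transaction creates 1 shadow;
# the same 100 updates spread over 100 transactions create 100 shadows.
```

Batching therefore reduces both the number of synchronous commits and the number of clones.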
If your persistent objects form a tree or graph in which all objects can be reached by reference from the root object, then once you load the root object into main memory and store a reference to it in some variable, the GC will never be able to collect any loaded persistent object (because it is reachable from the root object). So as your application accesses more and more objects from the storage, at some point all of them will be loaded in main memory, which can cause memory exhaustion. To prevent this, avoid storing in variables references to container objects that hold references to a large number of members; this is especially true for the storage root object. The GC is then able to do its work and evict objects that are not currently used (objects to which there are no references from local variables). It is important to note, however, that objects accessible through an index by key can be deallocated by the garbage collector normally, so in that case no special care is needed.
Initialization of transient or resettable attributes can be done in the default constructor or in the onLoad() method, which is called when an object is retrieved from the database.
Implicit loading of the whole object closure can be prevented by overriding the recursiveLoading
method in some classes. You should then explicitly load all objects referenced
from an object with recursive loading disabled.
In practice the situation is better than it may seem. It is important to note that all Perst collection classes load their members on demand. Once you have a reference to, for example, a FieldIndex, only the header of this collection object is loaded, and the members are loaded on demand. The same is true for all other Perst collections: Link, Relation, Index, SortedCollection, PersistentSet. Since persistent objects in a database are usually accessed through collections, in most cases explicit loading of persistent objects is not needed. But be careful: if, in addition to placing all your objects in some index, you also link them, for example, into an L2-list, then fetching a single object will cause all other objects in this collection to be loaded!
Serialization of transient objects can be enabled with Storage.setProperty("perst.serialize.transient.objects")
.
Perst will correctly handle references to persistent objects from serialized transient objects.
It is important to note, however, that the identity of saved transient objects is not preserved.
If a transient object O is referenced from persistent objects P1 and P2, then when
P1 and P2 are stored, an instance of O is serialized in both the P1 and P2 records,
and when P1 and P2 are reloaded from the database they will refer to different instances
O1 and O2. Also note that it is impossible to mark a transient object as modified:
if a referenced transient object is updated, you should also update the persistent object
referencing it (directly or indirectly).
A StorageError(StorageError.ErrorCode.INVALID_OID)
exception
is thrown if a reference to a deleted object is found, and a non-zero number of collected objects
is returned if there is garbage in the database.
If you insert keys into an index in random order, the loaded pages are most likely located at random offsets within the database file, so accessing a page requires positioning the disk head. The average positioning time for modern disks is about 10 msec, which means a disk can perform about 100 accesses per second. So if the database is really so large that each B-Tree operation requires loading every page on the path from the root to a leaf (except the root page; for a large number of objects this is 3 pages), the database will be able to perform about 30 inserts (or searches) per second, and inserting 1 million objects in this worst-case scenario will take about 10 hours! Fortunately, real performance is significantly better (thanks to page cache hits, the OS file cache, etc.).
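The worst-case numbers above follow from simple arithmetic (10 msec per random disk access, about 3 non-root B-Tree pages read per operation):

```python
# Back-of-envelope model behind the estimates in the text.
accesses_per_second = 1 / 0.010        # ~100 random disk accesses per second
pages_per_operation = 3                # root-to-leaf path, root page cached
ops_per_second = accesses_per_second / pages_per_operation  # ~33 ops/sec
hours_for_million = 1_000_000 / ops_per_second / 3600       # ~8.3 hours
```

The text rounds these to "about 30 inserts per second" and "about 10 hours"; caching makes the real figures much better.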
There are two obvious ways of improving performance: