The main advantage of PERST is its tight integration with the programming language. There is no gap between the database and application data models: PERST directly stores language objects, so there is no need for the packing/unpacking code that has to be written for traditional relational databases. Also, PERST (unlike many other OODBMSs) requires no special compiler or preprocessor, and it is still able to provide a high level of transparency.
Persistent base class is considered persistence capable. An object is automatically made persistent and stored in the storage when it is referenced from some other persistent object and the store method of that object is invoked. So there is no need (though it is possible) to explicitly assign an object to the storage.
The storage has one special root object. The root object is the only persistent object accessed in a special way (using the Storage.getRoot method); all other persistent objects are accessed in the normal way: either by traversing references from other persistent objects, or through object containers (Index, Link or Relation). Unlike many other OODBMSs, there can be only one root in the storage. If you need several named roots, you should create an Index object and use it as the root object.
PERST requires that each persistence-capable class be derived from the Persistent class. This makes it impossible to store "foreign" classes in the storage; this is the cost of PERST's ease of use and the absence of any specialized preprocessors or compilers. Also, the components of a persistence-capable object are restricted to the following types:

- scalar types: bool, int, short, double, ...
- the System.String type
- the System.DateTime type
- references to persistent objects: classes derived from Persistent or implementing any interface extending the IPersistent interface
- any other classes not implementing the IPersistent or IValue interfaces and marked as Serializable. The standard C# serialization mechanism will be used to pack a cluster of such objects into a byte array which will be stored in the database. The class should be marked with the Serializable attribute and should not contain references to persistent objects.
- Link: a collection of references to persistent objects, created by the Storage.createLink method.
Unfortunately, it is not possible to detect whether an object has changed without saving the old state of the object and performing a field-by-field comparison with its new state, and the overhead of such a solution (both space and CPU) is very high. In PERST it is therefore the programmer's responsibility to save an object in the storage. This can be done with the Persistent.store or Persistent.modify methods.
The Persistent.store method writes the object to the storage, together with all objects referenced from it that are not yet persistent. So if you create a tree of objects and assign a reference to the root of this tree to some persistent object X, it is only necessary to invoke the store() method on X. But if you later update one of the elements of this tree, you must invoke store() individually for each such element (X.store() will NOT automatically save the referenced objects again).
The Persistent.modify method marks an object as modified but does not immediately write it to the storage. All objects marked as modified are stored during transaction commit (the store method is invoked for each modified object). So the modify method is preferable if an object is updated multiple times within a transaction: instead of being stored several times, it is stored only once, at the end.
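The effect of modify on the number of physical writes can be sketched with a tiny self-contained model (plain Python, not the Perst API; all names here are illustrative):

```python
# Toy model (not the Perst API) of store() vs. modify() semantics:
# store() writes immediately; modify() only marks the object dirty,
# and commit() writes every dirty object exactly once.

class ToyStorage:
    def __init__(self):
        self.writes = 0          # number of physical writes performed
        self.dirty = set()       # objects marked by modify()

    def store(self, obj):
        self.writes += 1         # immediate write

    def modify(self, obj):
        self.dirty.add(obj)      # deferred: just remember the object

    def commit(self):
        for obj in self.dirty:   # each dirty object is stored once
            self.store(obj)
        self.dirty.clear()

db = ToyStorage()

# Updating the same object 100 times with store(): 100 writes.
for _ in range(100):
    db.store("account")
assert db.writes == 100

# Updating it 100 times with modify(): a single write at commit.
db.writes = 0
for _ in range(100):
    db.modify("account")
db.commit()
assert db.writes == 1
```

In the toy model, as in PERST, a hundred updates through modify cost a single write at commit.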
The IPersistent interface declares the recursiveLoading method. The default implementation of this method in the Persistent class always returns true. In this case PERST recursively loads any object referenced from the target object when the target object is loaded, which causes implicit loading of the whole cluster of referenced objects from the storage into main memory. This is similar to how the serialization mechanism works. To avoid overflowing main memory by recursively loading all objects from the storage, the programmer has to override the recursiveLoading method in some classes and return false from it. Objects referenced from such an object will not be implicitly loaded, and the programmer has to explicitly invoke the Persistent.load method to load them. So the recursiveLoading method can be used to control the loading of objects from the storage into main memory.
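A minimal sketch of this behavior (hypothetical Python classes, not the Perst API):

```python
# Toy sketch of how recursiveLoading() limits implicit loading: when
# an object whose class returns False is reached, traversal stops and
# its references must be loaded explicitly later (like Persistent.load).

loaded = []                      # records the order objects are loaded

class Node:
    def __init__(self, name, *refs):
        self.name, self.refs = name, refs

    def recursive_loading(self):
        return True              # default: load the whole cluster

    def load(self):
        loaded.append(self.name)
        if self.recursive_loading():
            for ref in self.refs:
                ref.load()

class LazyNode(Node):
    def recursive_loading(self):
        return False             # stop implicit loading here

leaf = Node("leaf")
lazy = LazyNode("lazy", leaf)
root = Node("root", lazy)

root.load()
# Loading root pulls in the lazy node itself, but not the leaf behind it.
assert loaded == ["root", "lazy"]
leaf.load()                      # explicit load, like Persistent.load()
assert loaded == ["root", "lazy", "leaf"]
```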
It is also important to notice that containers (Link, Relation, Index) always load member objects on demand (they do not automatically load all objects in the container). Since access to container members is always performed through methods of the container class, the programmer will never notice this: container methods always return loaded objects.
PERST uses the default constructor (the constructor without parameters) to create objects loaded from the storage, so this constructor should not perform any initialization that depends on the stored state. If such initialization is needed, a class can implement the onLoad method, which is invoked after all values of the non-transient fields of a persistent object have been fetched from the storage.
Summarizing all of the above: the proposed mechanism is convenient and easy to use because it doesn't require the programmer to explicitly load referenced objects, and at the same time it is flexible, giving the programmer control over object loading. Only those classes (usually containers) which explicitly control the loading of their members (by overriding recursiveLoading to return false) need to be aware of calling the Persistent.load method.
int type to long, or int to float. But changing the type from int to short is not possible. More precisely, PERST is able to perform any conversion which is supported by the Java reflection mechanism (a field of type XXX can be changed to YYY if the java.lang.reflect.Field.setXXX method can be applied to a component of type YYY). All other schema modifications (such as renaming fields or incompatible changes of field types) cannot be handled by the PERST automatic schema modification mechanism.
In this case you can use the Perst XML export/import mechanism. The database can be exported to XML format using the Storage.exportXML method, then recreated with the new class definitions, and afterwards the saved data can be imported using the Storage.importXML method.
Link interface to deal with relations of this kind. This interface allows you to add, delete and inspect members of the relation. Members of the relation can be accessed by index or extracted as an array of objects. Relations can be of two kinds: embedded (references to the related objects are stored in the relation owner object itself) and standalone (the relation is a separate object which contains a reference to the relation owner and the relation members). Both kinds of relations implement the Link interface. An embedded relation is created by the Storage.createLink method, while a standalone relation is represented by the Relation persistent class created by the Storage.createRelation method.
So a relation between two classes A and B can be implemented in the following way:
Relation type | Object A | Object B |
---|---|---|
one-to-one | B bref; | A aref; |
one-to-many | Link bref; | A aref; |
many-to-one | B bref; | Link aref; |
many-to-many | Link bref; | Link aref; |
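For example, the one-to-many row of this table can be sketched in plain Python (illustrative names, not Perst's Link class):

```python
# Minimal sketch of a one-to-many relation: object A holds a list of
# B's ("bref", playing the role of a Link), and each B holds a
# back-reference to its owner A ("aref").

class A:
    def __init__(self):
        self.bref = []           # the "many" side, a Link in Perst

    def add(self, b):
        self.bref.append(b)
        b.aref = self            # keep the inverse reference consistent

class B:
    def __init__(self):
        self.aref = None         # the "one" side

a = A()
b1, b2 = B(), B()
a.add(b1)
a.add(b2)
assert a.bref == [b1, b2]
assert b1.aref is a and b2.aref is a
```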
Hashtable class can be used for this. But in databases, more sophisticated search is usually required. I do not want to implement the complete SQL language in PERST, because it would immediately make the DBMS huge and slow; in most cases an application performs only very simple queries using exact match or key range. This is done in PERST with the Index and FieldIndex interfaces. The first interface is used for independent specification of a key and its associated value; the FieldIndex interface allows objects to be indexed by one of their fields (the key field).
Indices are created in PERST using the Storage.createIndex or Storage.createFieldIndex methods. There can be several index implementations, but right now only one implementation, based on a B+Tree, is provided (because the B+Tree is the most efficient structure for disk-based databases). Methods of the Index and FieldIndex interfaces allow you to add, remove and locate objects by key. It is possible to search either by specifying an exact key value or by specifying a range of key values (the high or low boundary, or both, can be omitted, and each can be declared exclusive or inclusive).
So it is possible to perform the following types of search; there are several different ways of selecting objects using an index:

- IPersistent get(Key key): get the object with the specified key value.
- IPersistent[] get(Key from, Key till): get all objects with keys in the specified range; an open boundary can be specified as null. Both boundaries can be inclusive or exclusive.
- IEnumerator GetEnumerator(): makes it possible to use the C# foreach construction to iterate through all indexed objects.
- IEnumerator GetEnumerator(Key from, Key till, IterationOrder order): iterate through the objects with keys in the specified range; an open boundary can be specified as null. Both boundaries can be inclusive or exclusive. Objects can be traversed in key ascending or descending order.
If you need a set of persistent objects, you should use the Storage.createSet method. A set is implemented using a B+Tree where the object OID is used as the key. PERST also supports spatial indices (org.garret.perst.SpatialIndex) and generic indices with a user-defined comparator (org.garret.perst.SortedCollection). The spatial index is implemented using Guttman's R-Tree with the quadratic split algorithm; it allows efficient search of R2 objects. A sorted collection provides almost the same methods as FieldIndex, but it uses a user-defined comparator to compare collection members. Sorted collections are implemented using a T-Tree and are especially efficient for main-memory databases.
Interface | Description | Key type | Implementation | Created by |
---|---|---|---|---|
Index | Index with explicitly specified key used for exact match or range queries | scalar, string or reference | B+Tree | Storage.CreateIndex(Type type, bool unique) |
Index | The same as above, but assuming that there can be a lot of duplicate key values | scalar, string or reference | B+Tree | Storage.CreateThickIndex(Type type) |
FieldIndex | Index constructed for one of the object's fields | scalar, string or reference | B+Tree | Storage.CreateFieldIndex(Type type, string fieldName, bool unique) |
BitIndex | Bit index for searching objects by a bitmap of properties | persistent object | B+Tree | Storage.CreateBitIndex() |
ISet | Set of persistent objects | - | B+Tree | Storage.CreateSet() |
IPersistentSet | Scalable set of persistent objects (can efficiently handle both small and large numbers of members) | persistent object | Link or B+Tree | Storage.CreateScalableSet() |
SpatialIndex | Index for spatial objects | Rectangle | R-Tree | Storage.CreateSpatialIndex() |
SpatialIndexR2 | Index for spatial objects with real coordinates | RectangleR2 | R-Tree | Storage.CreateSpatialIndexR2() |
SortedCollection | Index with user-defined comparator | any | T-Tree | Storage.CreateSortedCollection(PersistentComparator comparator, bool unique) |
    select * from T where x=?;

PERST relations can be used to implement simple joins, like the following SQL query fetching all orders for a particular vendor:

    select * from Order O, Vendor V where O.vendorID = V.id AND V.name='ABC';

In PERST it is possible to select the vendor first using an index search and then traverse the corresponding relation to locate all orders for this vendor.
But sometimes more complex queries need to be implemented. This is also possible in Perst, without writing the query in some special (non-procedural) language. The Projection class is used to combine the results of several simple operations (such as index searches). Let's start the explanation of this class with an example. Suppose we need to find all orders for details whose name starts with 'D1', shipped by vendors whose name starts with 'V1' or 'V3'. The SQL statement performing this query is the following:

    select * from Order O, Vendor V, Detail D
    where D.name like 'D1%' and O.detailID = D.id and O.vendorID = V.id
      and (V.name like 'V1%' or V.name like 'V3%');

And now how it can be done in Perst. Assume we have indices for details and vendors:

    FieldIndex detailIndex = db.CreateFieldIndex(typeof(Detail), "name", true);
    FieldIndex vendorIndex = db.CreateFieldIndex(typeof(Vendor), "name", true);

The set of requested orders can be obtained in the following way:

    // Projection from vendors to orders (using "orders" field of Link type)
    Projection v2o = new Projection(typeof(Vendor), "orders");
    // Projection from details to orders (using "orders" field of Link type)
    Projection d2o = new Projection(typeof(Detail), "orders");
    // Select vendors with name like 'V1%'
    v2o.Project(vendorIndex.StartsWith("V1"));
    // Select vendors with name like 'V3%'
    v2o.Project(vendorIndex.StartsWith("V3"));
    // Select details with name like 'D1%'
    d2o.Project(detailIndex.StartsWith("D1"));
    // Join projections
    v2o.Join(d2o);
    // Get array of requested orders
    Order[] orders = (Order[])v2o.ToArray(typeof(Order));

So, as you can see, the Projection class is used for several purposes:
It is possible to derive your own class from Projection to implement more sophisticated projections than projection by a single field.
If you need to sort a selection in some particular order, the most efficient way is to use a FieldIndex: when you select objects using an index, the selected objects are sorted by the search key. If you need to sort the selection by a field which is not the search key, you can use the Array.Sort method. For example, if in the query described above we need to sort orders by the price field, it can be done with the following statement:

    Array.Sort(orders, Order.PriceComparer);
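The semantics of Project and Join can be sketched with plain Python sets (a toy model, not the Perst Projection class):

```python
# Toy model of the query above: project two selections onto their
# "orders" fields, then join the projections by set intersection.

def project(objects, field):
    """Collect the objects referenced by `field` of every selected object."""
    result = set()
    for obj in objects:
        result.update(obj[field])
    return result

o1, o2, o3 = "order1", "order2", "order3"
vendors = [{"name": "V1a", "orders": [o1, o2]},
           {"name": "V3b", "orders": [o3]},
           {"name": "V2c", "orders": [o2]}]
details = [{"name": "D1x", "orders": [o1, o3]},
           {"name": "D2y", "orders": [o2]}]

# Select vendors with names like 'V1%' or 'V3%', project to orders.
v2o = project([v for v in vendors if v["name"][:2] in ("V1", "V3")],
              "orders")
# Select details with names like 'D1%', project to orders.
d2o = project([d for d in details if d["name"].startswith("D1")], "orders")

# Join = intersection of the two projections.
assert v2o & d2o == {o1, o3}
```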
commit, rollback or close methods. A transaction commit causes a synchronous write of the changed pages to disk. A synchronous write (the application is blocked until the data is physically flushed to the disk) is a very expensive operation: the average positioning time of modern disks is about 10 ms, so they are usually not able to perform more than about 100 writes to random locations per second. Since a transaction usually consists of several updated pages, this leads to an average performance of about 10 transaction commits per second.
Performance can be greatly increased if you minimize the number of commits (use larger transactions). PERST uses a shadow mechanism to implement transactions. When an object is changed for the first time during a transaction, a shadow of the object is created and the original object is kept unchanged. If an object is updated multiple times during a transaction, the shadow is created only once. Because it uses shadows, PERST needs no transaction log file, so in PERST a long transaction cannot cause transaction log overflow as in most other DBMSes. Quite the contrary: if you do not call commit at all, PERST works as a DBMS without transaction support, adding almost no overhead for supporting transactions.
The only drawback of long transactions is the possibility of losing a lot of changes in case of a fault. PERST will preserve the consistency of the database, but all changes made since the last commit will be lost.
In Perst there is the org.garret.perst.Query class allowing you to execute JSQL queries. To execute a JSQL query it is necessary to provide an object iterator (java.util.Iterator), the class of the iterated objects and a string representing the query condition. It is also possible to specify indices, resolvers and query parameters. The result of query execution is another iterator, which can be used to traverse through the selected objects (the objects matching the specified search criteria).
All Perst collection classes contain a select method which allows you to filter collection members using a JSQL query. This method is given the class of the objects stored in the collection (for some collections, such as FieldIndex, it is known and need not be specified) and a JSQL condition. As the result of its work, select returns an iterator over the selected objects.
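The observable behavior of such a select can be sketched in a few lines (plain Python, not the Perst API):

```python
# Rough analogue of a collection select: lazily filter an iterator
# with a condition, returning another iterator over matching objects.

records = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 17}]

def select(iterator, condition):
    # The result is itself an iterator, as in Perst.
    return (obj for obj in iterator if condition(obj))

adults = select(iter(records), lambda r: r["age"] >= 18)
assert [r["name"] for r in adults] == ["Ada"]
```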
Example | Meaning |
---|---|
expression | non-terminals |
not | terminals |
\| | disjoint alternatives |
(not) | optional part |
{1..9} | repeat zero or more times |
    select-condition ::= ( expression ) ( traverse ) ( order )
    expression ::= disjunction
    disjunction ::= conjunction | conjunction or disjunction
    conjunction ::= comparison | comparison and conjunction
    comparison ::= operand = operand | operand != operand | operand <> operand | operand < operand | operand <= operand | operand > operand | operand >= operand | operand (not) like operand | operand (not) like operand escape string | operand (not) in operand | operand (not) in expressions-list | operand (not) between operand and operand | operand is (not) null
    operand ::= addition
    addition ::= multiplication | addition + multiplication | addition || multiplication | addition - multiplication
    multiplication ::= power | multiplication * power | multiplication / power
    power ::= term | term ^ power
    term ::= identifier | number | string | true | false | null | current | first | last | ( expression ) | not comparison | - term | term [ expression ] | identifier . term | function term | count (*) | contains array-field (with expression) (group by identifier having expression) | exists identifier : term
    function ::= abs | length | lower | upper | integer | real | string | sin | cos | tan | asin | acos | atan | log | exp | ceil | floor | sum | avg | min | max
    string ::= ' { { any-character-except-quote } ('') } '
    expressions-list ::= ( expression { , expression } )
    order ::= order by sort-list
    sort-list ::= field-order { , field-order }
    field-order ::= field (asc | desc)
    field ::= identifier { . identifier }
    fields-list ::= field { , field }
    user-function ::= identifier
Identifiers are case sensitive, begin with an a..z, A..Z, '_' or '$' character, contain only a..z, A..Z, 0..9, '_' or '$' characters, and do not duplicate SQL reserved words.
abs | acos | and | asc | asin |
atan | avg | between | by | contains |
cos | ceil | count | current | desc |
escape | exists | exp | false | floor |
group | having | in | integer | is |
length | like | log | lower | max |
min | not | null | or | real |
sin | string | sum | tan | true |
upper | with |
JSQL extends the ANSI standard SQL operations by supporting bit manipulation operations. The operators and/or can be applied not only to boolean operands but also to operands of integer type. The result of applying the and/or operator to integer operands is an integer value whose bits are set by the bit-AND/bit-OR operation. Bit operations can be used for an efficient implementation of small sets. Also, the raising-to-a-power operation ^ is supported by JSQL for integer and floating point types.
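A plain Python illustration of using integer bit operations as small sets (note that JSQL's ^ means raising to a power, while in Python exponentiation is written ** and ^ is XOR):

```python
# Small sets represented as bitmasks, one bit per possible member,
# using the integer and/or semantics described above.

READ, WRITE, EXEC = 1, 2, 4       # one bit per possible set member

perms = READ | WRITE              # the set {READ, WRITE}
assert perms & WRITE != 0         # membership test via bit-AND
assert perms & EXEC == 0          # EXEC is not a member
assert perms | EXEC == 7          # union adds the EXEC bit

# JSQL's ^ is raising to a power; in Python that is ** (Python's ^ is XOR).
assert 2 ** 10 == 1024
```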
The number of elements in an array can be obtained with the length() function. Array elements can be fetched with the [] operator; if the index expression is out of the array range, an exception will be raised. The operator in can be used to check whether an array contains the value specified by the left operand. This operation can be used only for arrays of atomic types: those with boolean, numeric, reference or string components.
Iteration through array elements can be performed with the exists operator. The variable specified after the exists keyword can be used as an index into arrays in the expression preceded by the exists quantifier. This index variable iterates through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition

    exists i: (contract[i].company.location = 'US')

will select all details which are shipped by companies located in the US, while the query

    not exists i: (contract[i].company.location = 'US')

will select all details which are shipped only by companies outside the US.
Nested exists clauses are allowed. Using nested exists quantifiers is equivalent to nested loops over the corresponding index variables. For example, the query

    exists column: (exists row: (matrix[column][row] = 0))

will select all records containing 0 among the elements of the matrix field, which has type array of arrays of integer. This construction is equivalent to the following two nested loops:

    boolean result = false;
    for (int column = 0; column < matrix.length(); column++) {
        for (int row = 0; row < matrix[column].length(); row++) {
            if (matrix[column][row] == 0) {
                result = true;
                break;
            }
        }
    }

The order of the indices is significant! The result of executing the query

    exists row: (exists column: (matrix[column][row] = 0))

will be completely different from the result of the previous query: the program can simply hang in the latter case, due to an infinite loop for empty matrices.
IndexOutOfRangeError exception to signal the end of the iteration, the contains ... with construction simply iterates through all array elements and does not throw any exception. So the performance of contains ... with is better.
char in C) and byte-by-byte comparison of strings, ignoring locale settings.
The like construction can be used for matching a string against a pattern containing the special wildcard characters '%' and '_'. The character '_' matches any single character, while '%' matches any number of characters (including none). The extended form of the like operator with an escape part can be used to treat the characters '%' and '_' in the pattern as normal characters if they are preceded by the special escape character, specified after the escape keyword.
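The like/escape semantics can be sketched by translating the pattern into a regular expression (a plain Python sketch, not the JSQL implementation):

```python
# SQL-style pattern matching: '%' = any run of characters, '_' = one
# character, with an optional escape character that makes the next
# pattern character literal.

import re

def like(text, pattern, escape=None):
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if escape and ch == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped % or _
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(out), text) is not None

assert like("Chicago", "Chi%")
assert like("cat", "c_t")
assert not like("cart", "c_t")
assert like("100%", "100\\%", escape="\\")   # escaped '%' is literal
```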
It is possible to search for a substring within a string with the in operator: the expression ('blue' in color) will be true for all records whose color field contains the word 'blue'.
Strings can be concatenated by the + or || operators; the latter was added only for compatibility with the ANSI SQL standard. Since JSQL doesn't support implicit conversion to the string type in expressions, the semantics of the + operator can be redefined for strings.
    company.address.city = 'Chicago'

will access the record referenced by the company component of the Contract record and extract the city component of the address field of the referenced record from the Supplier table.
References can be checked for null by the is null or is not null predicates. References can also be compared for equality with each other as well as with the special null keyword. When a null reference is dereferenced, an exception is raised by JSQL.
There is a special keyword, current, which can be used to get a reference to the current record during a table search. Usually the current keyword is used to compare the identifier of the current record with other references or to locate it within an array of references. For example, the following query will search the Contract table for all active contracts (assuming that the field canceledContracts has type Contract[]):

    current not in supplier.canceledContracts
A programmer can define methods for structures, and these can be used in queries with the same syntax as normal structure components. Such methods should have no arguments except the pointer to the object to which they belong (the this pointer in C#), and should return an atomic value (of boolean, numeric, string or reference type). Also, a method should not change the object instance (it must be an immutable method). So user-defined methods can be used to create virtual components: components which are not stored in the database but instead are calculated from the values of other components. For example, you can use the standard .NET DateTime class with such properties as Year, Month, ... So it is possible to specify queries like "delivery.Year = 99" in the application, where the delivery record field has DateTime type.
Name | Argument type | Return type | Description |
---|---|---|---|
abs | integer | integer | absolute value of the argument |
abs | real | real | absolute value of the argument |
sin | real | real | sin (rad) |
cos | real | real | cos (rad) |
tan | real | real | tan (rad) |
asin | real | real | arcsin |
acos | real | real | arccos |
atan | real | real | arctan |
exp | real | real | exponent |
log | real | real | natural logarithm |
ceil | real | real | the smallest integer value that is not less than the argument |
floor | real | real | the largest integer value that is not greater than the argument |
integer | real | integer | conversion of real to integer |
length | array | integer | number of elements in array |
lower | string | string | lowercase string |
real | integer | real | conversion of integer to real |
string | integer | string | conversion of integer to string |
string | real | string | conversion of real to string |
upper | string | string | uppercase string |
To be able to use indices, the programmer has to obtain a Query object and register the indices in it. JSQL supports Perst Index and FieldIndex indices. Indices are located by JSQL by field name. It is assumed that the class of the objects returned by the index iterator is the same as the one specified in the select query. If you are going to use the same Query instance for selecting data from different collections (with different collection member classes), please check that there is no name conflict for key fields, i.e. that an index corresponding to one collection will not be applied to another collection. Maintaining indices is the responsibility of the programmer; JSQL only uses indices for searching elements by key.
Database class. This class emulates a relational database on top of Perst. Certainly, even with this class, Perst provides a very object-oriented way of dealing with persistent data. The Database class allows you to create/drop tables, add/drop indices, and create/update/delete records. It automatically updates indices when a new record is added or removed. The Database class also makes it possible to prepare and execute queries. The Database class is used as a wrapper for the underlying Perst storage. The Storage passed to the constructor of the Database class should be opened and either not yet initialized, or initialized previously by the same Database constructor. It is not possible to attach a Database to a Perst Storage with an existing root object other than the one set by the Database constructor.
A table in the database is associated with a C# class. Almost all methods of the Database class require passing a Class parameter or a persistent object instance; in the latter case the class of this object is used. If Perst finds no table associated with the class, it tries to find a table for the superclass, and so on. So it is possible to have a "polymorphic" table: a table containing records (objects) of multiple classes derived from one class. Methods dealing with an object instance have overloaded versions where the class can be specified explicitly, which makes it possible to place an object in a table associated with some particular interface.
The Database class provides two ways of querying tables: simple and prepared queries. In the first case Database.select directly returns the result of the query execution (an iterator). In the second case it is possible to execute a prepared query multiple times, specifying different values of the parameters.
Please find more information about the Database methods in the API documentation, and look at the JsqlSSD example: a Supplier-Shipment-Detail database built using the Database class.
Perst.NET includes two classes, PersistentContext and TransparentPersistenceAttribute, which are used to provide transparent persistence for C# objects using the .NET remoting API. If an application class is derived from PersistentContext and marked with the TransparentPersistence attribute, it will automatically load its content on demand from the database and automatically store the content when the object is modified. The implementation of the recursiveLoading method in the PersistentContext class returns false, so objects are loaded from the database only when they are accessed.
But there are two significant restrictions on using the remoting API.
The TransparentGuess example illustrates the use of this Perst transparent API.
IPersistent.RecursiveLoading method and explicit loading of objects using the IPersistent.Load method, or about explicit marking of updated objects with the IPersistent.Modify method. So a lot of error-prone things disappear. But certainly this approach has its own drawbacks:
IStorage.CreateClass method, which will generate a wrapper class and create an instance of this class. Also, the persistent class should have no constructor except the default one, and this default constructor should not do anything except initialize transient components, because it will be invoked each time an object is loaded from the storage.
Perst.PArray class instead.
There are two possible ways of defining persistent classes using this API. The first is based on using interfaces:

    interface Node : IPersistent {
        int Weight { get; set; }
        Node Parent { get; set; }
    }

Please notice that your interface should extend the IPersistent interface. In this case Perst will derive the generated wrapper class from the Perst.Persistent class (or from the Perst.PersistentResource class if the specified interface extends the Perst.IResource interface). The wrapper class will implement the specified interface and contain the corresponding fields:
    class NodeWrapper : Persistent, Node {
        public int Weight {
            get { return weight; }
            set { weight = value; Modify(); }
        }
        public Node Parent {
            get { return (Node)Storage.getByOid(parent); }
            set { parent = value.MakePersistent(Storage); Modify(); }
        }
    }

In the second case you use an abstract class instead of an interface:

    abstract class Node : Persistent {
        public abstract int Weight { get; set; }
        public abstract Node Parent { get; set; }
    }

This class has to be derived from the Persistent or PersistentResource class. There is only one difference in the generated wrapper class: it extends the specified abstract class:

    class NodeWrapper : Node { ... }
Since there are no direct references between objects when this API is used, the garbage collector will remove from memory all persistent objects which are not referenced from program variables. That is why it is highly recommended to use the LRU object cache to increase performance (it is used by default).
The PropGuess example illustrates the use of this Perst transparent API.
PERST performs cyclic scanning of bitmap pages. It keeps the identifier of the current bitmap page and the current position within the page. Each time an allocation request arrives, scanning of the bitmap starts from the current position. When the last allocated bitmap page has been scanned, scanning continues from the beginning (from the first bitmap page) up to the current position. When no free space is found after a full cycle through all bitmap pages, a new bulk of memory is allocated. The size of the extension is the maximum of the size of the allocated object and the extension quantum. The extension quantum is a parameter of the database, specified in the constructor. The bitmap is extended to be able to map the additional space. If the virtual space is exhausted and no more bitmap pages can be allocated, an OutOfMemory error is reported.
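The cyclic scan with wrap-around and extension can be sketched with a toy allocator (a simplified model: one flat bitmap with one bit per allocation quantum, no pages or hole descriptors):

```python
# Toy model (not the real Perst allocator) of cyclic bitmap scanning:
# scanning resumes from the position after the last allocation, wraps
# around once, and only then is the bitmap extended.

class BitmapAllocator:
    def __init__(self, size, extension_quantum=8):
        self.bits = [False] * size         # one bit per quantum
        self.pos = 0                       # current scan position
        self.extension_quantum = extension_quantum

    def _find(self, n, start, stop):
        """Find a run of n free bits in bits[start:stop], or None."""
        run, first = 0, None
        for i in range(start, stop):
            if self.bits[i]:
                run, first = 0, None
            else:
                if first is None:
                    first = i
                run += 1
                if run == n:
                    return first
        return None

    def allocate(self, n):
        # Scan from the current position to the end, then wrap around.
        hole = self._find(n, self.pos, len(self.bits))
        if hole is None:
            hole = self._find(n, 0, self.pos)
        if hole is None:
            # No free space after a full cycle: extend the bitmap by
            # max(object size, extension quantum).
            hole = len(self.bits)
            self.bits += [False] * max(n, self.extension_quantum)
        for i in range(hole, hole + n):
            self.bits[i] = True
        self.pos = hole + n
        return hole

a = BitmapAllocator(4)
assert a.allocate(2) == 0      # sequential allocation...
assert a.allocate(2) == 2      # ...gives high locality of references
assert a.allocate(3) == 4      # bitmap full: extended by the quantum
```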
Allocating memory using a bitmap provides high locality of references (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimizing the number of modified pages is significant when a commit operation is performed and all dirty pages have to be flushed to the disk. When all cloned objects are placed sequentially, the number of modified pages is minimal and so the transaction commit time is also reduced. Using the extension quantum also helps to preserve sequential allocation: once the bitmap is extended, objects will be allocated sequentially until the extension quantum is completely used. Only after reaching the end of the bitmap does scanning restart from the beginning, searching for holes in previously allocated memory.
To reduce the number of bitmap page scans, PERST associates a descriptor with each page, which is used to remember the maximal size of a hole on the page. The maximal hole size is calculated in the following way: if an object of size M cannot be allocated from this bitmap page, then the maximal hole size is less than M, so M is stored in the page descriptor if the previous value of the descriptor is larger than M. For the next allocation of an object of size greater than or equal to M, this bitmap page is skipped. The page descriptor is reset when some object is deallocated within this bitmap page.
Some database objects (like hash table pages) should be aligned on a page boundary to provide more efficient access. The PERST memory allocator checks the requested size, and if it is aligned on a page boundary, the address of the allocated memory segment is also aligned on a page boundary. The search for a free hole is faster in this case, because PERST increases the step of the current position increment according to the value of the alignment.
To be able to deallocate the memory used by an object, PERST needs to keep information about the object size somewhere. The PERST memory allocator deals with two types of objects: normal table records and page objects. All table records are prepended by a record header, which contains the record size and a pointer of the L2-list linking all records in the table; so the size of a table record object can be extracted from its record header. Page objects always occupy a whole database page and are allocated at positions aligned on a page boundary; page objects have no headers. PERST distinguishes page objects from normal objects by using a special marker in the object index.
When an object is modified for the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created copy, while the shadow index still contains the handle pointing to the original version of the object. All changes are applied to the copy, leaving the original object unchanged. PERST also marks, in a special bitmap, the page of the object index that contains the modified object handle.
When a transaction is committed, PERST first checks whether the size of the object index was increased during the transaction; if so, it also reallocates the shadow copy of the object index. Then PERST frees the memory of all "old objects", i.e. objects that were cloned within the transaction. This memory cannot be deallocated before commit, because the consistent state of the database is preserved by keeping the cloned objects unchanged: if their memory were deallocated immediately after cloning, a new object could be allocated in place of a cloned object and consistency would be lost. Since memory deallocation in PERST is done through the bitmap, using the same transaction mechanism as for normal database objects, deallocating object space requires clearing some bits in a bitmap page, which must itself be cloned before modification. Cloning the bitmap page requires new space for the page copy, and that space could be taken from the deallocated objects, which is not acceptable for the reason explained above: database consistency would be lost. That is why deallocation is done in two steps. When an object is cloned, all bitmap pages used for marking its space are also cloned (if not cloned before). So when the transaction is committed, PERST only clears bits in bitmap pages, and no memory allocation requests can be generated at that moment.
After deallocating the old copies, PERST flushes all modified pages to disk to synchronize the contents of memory and the disk file. It then changes the current object index indicator in the database header to switch the roles of the object indices: the index that was current becomes shadow, and the shadow index becomes current. PERST again flushes the modified page (the page with the database header) to disk, transferring the database to the new consistent state. After that, PERST copies all modified handles from the index containing the committed changes to the index that was previously shadow and now becomes current. At this moment the contents of both indices are synchronized and PERST is ready to start a new transaction.
A bitmap of modified object index pages is used to minimize transaction commit time: only the modified pages of the object index, not the whole index, need to be copied. After the transaction is committed, this bitmap is cleared.
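The clone-on-first-write scheme and the commit-time switch of the two object indices can be modelled as follows. This is an illustrative Python sketch under simplified assumptions (a dict stands in for the disk, a set stands in for the dirty-page bitmap, and deallocation of old copies is omitted); the names are invented for the example and are not PERST's code.

```python
class ShadowIndexStorage:
    def __init__(self):
        self.objects = {}            # "disk": handle -> object data
        self.index = [{}, {}]        # two object indices: oid -> handle
        self.current = 0             # indicator stored in the database header
        self.dirty = set()           # oids whose handles were modified
        self.next_handle = 0

    def create(self, oid, data):
        handle = self._new_handle(data)
        self.index[0][oid] = self.index[1][oid] = handle

    def _new_handle(self, data):
        handle = self.next_handle; self.next_handle += 1
        self.objects[handle] = data
        return handle

    def write(self, oid, data):
        cur = self.index[self.current]
        if oid not in self.dirty:
            # first modification inside the transaction: clone the object;
            # the shadow index still points at the original version
            cur[oid] = self._new_handle(data)
            self.dirty.add(oid)
        else:
            self.objects[cur[oid]] = data

    def read(self, oid):
        return self.objects[self.index[self.current][oid]]

    def commit(self):
        # (modified pages would be flushed to disk here)
        self.current ^= 1            # atomic header switch: roles swap
        committed = self.index[self.current ^ 1]
        cur = self.index[self.current]
        for oid in self.dirty:       # synchronize only the modified handles
            cur[oid] = committed[oid]
        self.dirty.clear()

    def rollback(self):
        shadow = self.index[self.current ^ 1]
        cur = self.index[self.current]
        for oid in self.dirty:       # restore handles from the shadow index
            cur[oid] = shadow[oid]
        self.dirty.clear()
```

After commit or rollback both indices are identical again, which is exactly the invariant the recovery procedure relies on.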
When a transaction is explicitly aborted by the Storage.rollback
method, the shadow object index is copied back to the current index,
eliminating all changes made by the aborted transaction. After the
copying completes, both indices are identical again and the database
state corresponds to the moment before the start of the transaction.
Object handles are allocated from a free handles list. The header of this list is also shadowed, and two instances of the list header are stored in the database header; switching between them is done in the same way as switching the object indices. When there are no more free elements in the list, PERST allocates handles from the unused part of the index. When there is no more space in the index, the index is reallocated. The object index is the only entity in the database that is not cloned on modification; instead, two copies of it are always maintained.
Some OID values are predefined in PERST. OID 0 is reserved as the invalid object identifier. OIDs starting from 1 are reserved for bitmap pages; the number of bitmap pages depends on the maximum virtual space of the database. For one terabyte of virtual space, an 8 Kb page size and a 64-byte allocation quantum, 32K bitmap pages are required, so 32K handles are reserved in the object index for the bitmap. Bitmap pages are allocated on demand, as the database grows. The OID of the first user object is therefore 0x8002.
The recovery procedure in PERST is trivial. There are two instances of
the object index, one of which is current while the other corresponds
to the consistent database state. When a database is opened, PERST
checks the database header to detect whether the database was closed
normally. If not (the dirty
flag is set in the database header), PERST performs
database recovery. Recovery is very similar to transaction rollback:
the indicator of the current index in the database header is used to
determine the index corresponding to the consistent database state,
and object handles from this index are copied to the other index,
eliminating all changes made by the uncommitted transaction. Since the
only action performed by the recovery procedure is copying the object
index (actually, only handles that differ between the current and
shadow indices are copied, to reduce the number of modified pages) and
the object index is small, recovery is very fast.
A fast recovery procedure reduces the application's "out-of-service" time.
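The open-time recovery check can be sketched as follows. This is an illustrative Python fragment with simplified stand-ins for the header and the two indices, not PERST's code.

```python
def recover(header, index):
    """header: {'dirty': bool, 'current': 0 or 1};
    index: two lists of object handles.
    Returns the number of handles that had to be copied."""
    if not header['dirty']:
        return 0                             # database was closed normally
    good = index[header['current']]          # consistent database state
    bad = index[1 - header['current']]
    copied = 0
    for i, h in enumerate(good):
        if bad[i] != h:                      # copy only differing handles
            bad[i] = h
            copied += 1
    header['dirty'] = False
    return copied
```

Because only the small object index is touched, and within it only the differing handles, recovery cost is independent of how much data the interrupted transaction wrote.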
The following example illustrates typical use of PERST:
using Perst;

public class YourPersistentClass : Persistent {
    int x;
    String y;
    Link links;
    ...
    void doUpdate() {
        x = 1;
        y = "Hello World";
        links = getStorage().CreateLink();
        store(); // save changes in the database
    }
}

public class YourRoot : Persistent {
    YourPersistentClass obj; // "ref" is a reserved word in C#, so use another name
    Index myIndex;

    void foo() {
        obj.doUpdate(); // the referenced object does not have to be explicitly loaded
    }
}

public class YourApplication {
    static public void Main(String[] args) {
        String databaseFilePath = args[0];       // path to the database file
        int pagePoolSize = Int32.Parse(args[1]); // size of page pool in bytes
        Storage storage = StorageFactory.Instance.CreateStorage();
        storage.Open(databaseFilePath, pagePoolSize);
        YourRoot root = (YourRoot)storage.Root;
        if (root == null) {
            // storage was not initialized yet
            root = new YourRoot();
            root.myIndex = storage.CreateIndex(typeof(string), true); // unique index
            storage.Root = root;
        }
        root.foo(); // all objects referenced from root are implicitly loaded
        YourPersistentClass obj = (YourPersistentClass)root.myIndex["John"]; // find object with key "John"
        ...               // do something else with the database
        storage.Commit(); // commit transaction
        ...               // do something else with the database
        storage.Close();  // commit the last transaction and close the database
    }
}
You can also look at PERST examples:
Tests are located in the tst
directory and can be built using the compile.bat
script.
Accessing data on disk is several times slower than accessing data in main memory;
that is why caching is the key to database performance. PERST uses a
pool of pages to optimize disk access. The page pool size can be specified in the Storage.open
method (by default it is 4Mb). Increasing the page pool usually leads to better performance, but you should
note the following:
If you expect all your data to fit in main memory, you can pass the
Storage.INFINITE_PAGE_POOL
constant to the Storage.open
method.
In this case the page pool is dynamically extended whenever a new page is requested.
As a result, all pages are cached in memory, so each page is read from disk only once.
Also, a strong object cache is used instead of a weak one, which means that all fetched
objects are pinned in memory and each object is unpacked only once. It is important to note
that the amount of main memory used will be greater than the database size: every object is present in memory
both in packed form (inside the page containing it) and in unpacked form (referenced from the object cache).
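The effect of a bounded page pool can be sketched as a simple LRU cache of disk pages. This is an illustrative Python model only; PERST's actual replacement policy and bookkeeping may differ.

```python
from collections import OrderedDict

class PagePool:
    def __init__(self, capacity, read_page):
        self.capacity = capacity      # number of pages kept in memory
        self.read_page = read_page    # function simulating a disk read
        self.pages = OrderedDict()
        self.disk_reads = 0

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)   # mark as most recently used
            return self.pages[page_no]
        self.disk_reads += 1                  # cache miss: go to "disk"
        data = self.read_page(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used page
        return data
```

With an unbounded capacity (the analogue of INFINITE_PAGE_POOL), no page is ever evicted, so each page is read from disk at most once.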
In some applications (for example, on mobile devices) persistency is not needed, but PERST container classes
such as Link, Index, FieldIndex and SpatialIndex can still be used. In this case you can use the
NullFile
implementation of the IFile
interface together
with Storage.INFINITE_PAGE_POOL
to create a transient in-memory database. Data will then never be
written to disk.
Some constants defined in the StorageImpl
class influence the
initial and maximal database size. If you want to change them, you have to rebuild PERST.
Below is a detailed description of these parameters:
Parameter | Default value | Description
---|---|---
dbDefaultInitIndexSize | 1024 | Initial object index size. The object index grows on demand. Reallocating the index is an expensive operation, so to minimize the number of reallocations the index size is always doubled. Specifying a larger initial index size reduces the number of future reallocations and so slightly increases performance (provided your application allocates that many objects), but it also increases the initial size of the database file. With the default value of this parameter, the initial database size is about 50Kb.
dbDefaultExtensionQuantum | 4Mb | Database extension quantum. Memory is allocated by scanning the bitmap; if no large enough hole is found, the database is extended by the value of dbDefaultExtensionQuantum. Increasing this value makes rescans of the allocation bitmap from the very beginning less frequent, which leads to faster allocation and better locality of reference for created objects (there is a greater chance that they will be allocated sequentially), but also to less efficient memory usage. Reducing this value forces reuse of existing holes in the memory allocation bitmap.
dbObjectCacheInitSize | 1319 | Size of the object cache. PERST needs this cache to check whether an object with a given OID is already present in memory. The cache uses weak references so that the garbage collector can do its work. When a fill threshold is reached, the cache is reallocated by doubling its size; once again, increasing this parameter can save some cache reallocations.
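The weak-reference object cache described above can be sketched with Python's weakref module. This is an illustrative analogue, not PERST's implementation; in CPython an object held only by the weak cache is reclaimed as soon as its last strong reference disappears.

```python
import gc
import weakref

class Obj:
    def __init__(self, oid):
        self.oid = oid

class WeakObjectCache:
    def __init__(self):
        # oid -> object; entries vanish when the object is garbage collected
        self._map = weakref.WeakValueDictionary()

    def lookup(self, oid):
        return self._map.get(oid)   # None if never loaded or already collected

    def put(self, oid, obj):
        self._map[oid] = obj

cache = WeakObjectCache()
pinned = Obj(1)            # a strong reference elsewhere keeps this one alive
cache.put(1, pinned)
cache.put(2, Obj(2))       # no strong reference anywhere else
gc.collect()               # object 2 can now be reclaimed; object 1 cannot
```

This is why the cache never prevents the garbage collector from evicting objects that the application no longer references.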
Now some hints on how to increase PERST performance and reduce the amount of main memory used. If your database performs many updates of persistent data, the main limiting factor is the speed of writing changes to disk, especially the synchronous writes performed by commit. If you commit after each update, the average speed will be about 10 updates per second (this estimate assumes an average disk access time of about 10 msec and that each transaction commit requires writing about 10 pages at random places in the file). But it is possible to dramatically increase update performance by grouping several updates into one transaction. PERST creates a shadow of an object when it is first updated inside a transaction: if an object is updated once in each of N transactions, N shadows are created, whereas if it is updated N times inside one transaction, the shadow is created only once. This explains the advantage of one large transaction.
On the other hand, if you update a large number of objects in one transaction, a shadow is created for each updated object, which significantly increases the database file size. If you update every object in the database inside one transaction, the database size will almost double! If instead you perform each update in a separate transaction, the database size remains almost unchanged (because the space of the allocated shadow objects is reused). So the best approach is to commit after every 100 or 1000 updates: this reduces the per-commit overhead and keeps the database size down.
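The cloning cost described above is easy to quantify: one shadow is created per object per transaction in which that object is modified, regardless of how many times it is modified within the transaction. A minimal illustration (hypothetical helper, not a PERST API):

```python
def shadows_created(transactions):
    """transactions: a list of transactions, each a list of updated object ids.
    One shadow is created per distinct object per transaction."""
    return sum(len(set(txn)) for txn in transactions)

# Updating one object 100 times in a single transaction creates 1 shadow;
# the same 100 updates spread over 100 transactions create 100 shadows.
```

Batching therefore reduces both the number of synchronous commits and the number of clones.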
If your persistent objects form a tree or graph in which all objects can be reached by reference from the root object, then once you load the root object into main memory and store a reference to it in some variable, the GC will never be able to collect any loaded persistent object (because it is reachable from the root object). So as your application accesses more and more objects from the storage, at some point all of them will be loaded in main memory, which can cause memory exhaustion. To prevent this, avoid storing in variables references to container objects that hold references to a large number of members; this is especially true for the storage root object. The GC is then able to do its work and evict objects that are not currently used (objects to which there are no references from local variables). It is important to note, however, that objects accessible through an index by key can be deallocated by the garbage collector normally, so in that case no special care is needed.
Initialization of transient or resettable attributes can be done in the default constructor or in the onLoad() method, which is called when an object is retrieved from the database.
Implicit loading of the whole object closure can be prevented by overriding the recursiveLoading
method in some classes. You should then explicitly load all objects referenced
from an object with recursive loading disabled.
In practice the situation is better than it may seem. It is important to note that all Perst collection classes load their members on demand. Once you have a reference to, for example, a FieldIndex, only the header of this collection object is loaded, and the members are loaded on demand. The same is true for all other Perst collections: Link, Relation, Index, SortedCollection, PersistentSet. Since persistent objects in a database are usually accessed through collections, in most cases explicit loading of persistent objects is not needed. But be careful: if, in addition to placing all your objects in some index, you also link them, for example, into an L2-list, then fetching a single object will cause all other objects in this collection to be loaded!
Serialization of transient objects can be enabled with Storage.setProperty("perst.serialize.transient.objects")
.
Perst will correctly handle references to persistent objects from serialized transient objects.
It is important to note, however, that the identity of saved transient objects is not preserved.
If a transient object O is referenced from persistent objects P1 and P2, then when
P1 and P2 are stored, an instance of O is serialized in both the P1 and P2 records,
and when P1 and P2 are reloaded from the database they will refer to different instances
O1 and O2. Also note that it is impossible to mark a transient object as modified:
if a referenced transient object is updated, you should also update the persistent object
referencing it (directly or indirectly).
A StorageError(StorageError.ErrorCode.INVALID_OID)
exception
is thrown if a reference to a deleted object is found, and a non-zero number of collected objects
is returned if there is garbage in the database.
If you insert keys into an index in random order, the loaded pages are most likely located at random offsets within the database file, so accessing a page requires positioning the disk head. The average positioning time for modern disks is about 10 msec, which means a disk can perform about 100 accesses per second. So if the database is really so large that each B-Tree operation requires loading every page on the path from the root to a leaf (except the root page; for a large number of objects this is 3 pages), the database will be able to perform about 30 inserts (or searches) per second, and inserting 1 million objects in this worst-case scenario will take about 10 hours! Fortunately, real performance is significantly better (thanks to page cache hits, the OS file cache, etc.).
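The worst-case numbers above follow from simple arithmetic (10 msec per random disk access, about 3 non-root B-Tree pages read per operation):

```python
# Back-of-envelope model behind the estimates in the text.
accesses_per_second = 1 / 0.010        # ~100 random disk accesses per second
pages_per_operation = 3                # root-to-leaf path, root page cached
ops_per_second = accesses_per_second / pages_per_operation  # ~33 ops/sec
hours_for_million = 1_000_000 / ops_per_second / 3600       # ~8.3 hours
```

The text rounds these to "about 30 inserts per second" and "about 10 hours"; caching makes the real figures much better.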
There are two obvious ways of improving performance: