The main advantage of PERST is tight integration with programming language. There is no gap between database and application data models - PERST directly stores language objects. So there is no need in packing/unpacking code, which has to be written for traditional relational databases. Also PERST (unlike many other OODBMS) requires no special compiler or preprocessor. And still it is able to provide high level of transparency.
Persistent
base class is considered as persistent capable. It is automatically made
persistent and stored in the storage when it is referenced from some other persistent object and
store
method of that object was invoked. So there is no need (but it is possible) to explicitly
assign object to the storage.
The storage has one special root object. Root object is the only persistent object accessed in the special
way (using Storage.getRoot
method). All other persistent objects are accessed in normal way:
either by traversing by references from another persistent objects, either using object containers (Index
,
Link
or Relation
). Unlike many other OODBMS, there can be only one root in the storage.
If you need to have several named roots, you should create Index
object and use it as root object.
PERST requires that each persistent capable class should be derived from Persistent
class.
It makes not possible to store "foreign" classes in the storage. This is the cost of easy use of PERST and lack
of any specialized preprocessors or compilers. Also components of persistent capable object are restricted to have
the following types:
boolean, int, short, double,...
java.lang.String
type
java.util.Date
type
Persistent
or any interface extending IPersistent
interface.
IValue
interface with the same restrictions for types of components as for persistent capable objects. If perst.implicit.values
property is true
, then any class which is not derived from IPersistent
will be treated as value. Values are stored inside the persistent object containing them. Value class should have default constructor (constructor with empty parameters list) or have not constructors
at all.
Declaration type of the field referencing value should be the same as actual type of referenced object (so it is not
possible to assign to this filed object of the derived class). Also value class should not have recursive references
(direct or indirect) - field of value class V could not have type V.
perst.serialize.transient.objects
property is true, then PERST will serialize
all objects of non-persistent and non-value classes (classes which are not derived from IPersistent
or IValue
interfaces) using standard Java serialization mechanism.
Object closure will be packed into array of bytes which is stored in the database. The class should implement
java.io.Serializable
interface should not contain references to persistent objects.
Storage.createLink
method.
Unfortunately it is not possible to detect if object is changed or not without saving old state of the object
and performing field-by-field comparison with new state of the object. But overhead of such solution
(both space and CPU) is very high. In PERST it is responsibility of programmer to save object in the storage.
It can be done by Persistent.store
or Persistent.modify
methods.
Persistent.store
method writes object in the storage as well
as all objects referenced from this object which are not yet persistent. So if you create tree of objects
and assign reference to the root of this tree to some persistent object X, it is only necessary to
invoke store()
method in this object X. But then if you update one of the elements in this tree,
you should invoke store()
individually for each such element (X.store()
will
NOT now automatically save referenced objects).
Persistent.modify
method mark object is modified but doesn't immediately write it to the storage.
All objects marked as modified will be stored to the storage during transaction commit (store
method
will be invoked for each modified object). So using modify
method is preferable if object
is updated multiple times within transaction. In this case instead of storing it several times, it will
be stored only once at the end.
Perst doesn't support nested non-static classes. The reason is that this class contains final field pointed to outer class. As far as field is final it is not possible to assign value to it during object loading in the same way as for all other fields of persistent object.
IPersistent
interface declares recursiveLoading
method. Default implementation of
this method in Persistent
class always returns true
. In this case PERST
will recursively load any object referenced from target object when target object is loaded.
So it cause implicit loading of all cluster of referenced objects from storage to main memory.
It is similar with how serialization mechanism works.
To avoid overflow of main memory caused by recursive loading of all objects from the storage,
programmer has to overload recursiveLoading
method in some classes and return false
n it.
Objects loaded from such object will not be implicitly loaded and programmer has to explicitly
invoke Persistent.load
method to load them. So recursiveLoading
method can be used
to control loading of objects from storage to main memory.
Also it is important to notice that containers (Link, Relation, Index
) always load member object on demand
(do not perform automatic loading of all objects in the containers). Since access to the container members is always
performed through methods of the container class, programmer will never notice it - container methods will always
return loaded object(s).
PERST is using default constructor (constructor without parameters) to create object loaded from the storage. It means the following things:
sun.reflect.ReflectionFactory
. But it will work only with Sun's JDK.
onLoad
method which is invoked after fetching of all values of non-transient fields of persistent object from the storage.
So summarizing all above, proposed mechanism is convenient and easy to use because it doesn't require programmer
to explicitly load any referenced object and from another point of view is is flexible above by providing programmer control
on object loading. Only those classes (usually containers) which explicitly control loading of their
members (by overloading recursiveLoading
to return false
value) should be aware
calling Persistent.load
method.
int
type to long
or
int
to float
. But changing type from int
to
short
is not possible. More precisely, PERST is able to perform any conversion
which is supported by Java reflection mechanism (field type XXX can be changed to YYY
if java.lang.reflect.Field.setXXX
method can be applied to the component with type YYY).
All other schema modifications (such as renaming fields, incompatible change of field type)
could not be handled by PERST automatic schema modification mechanism.
In this case you can use Perst XML export/import mechanism. Database can be
exported to XML format using Storage.exportXML
method,
then recreated with new class definitions and after it saved data can be imported using
Storage.importXML
method.
Link
interface to deal with relations of such kind.
This interface allows to add/delete/inspect members of the relation.
Members of the relation can be accessed by index or be extracted as array of objects.
Relations can be of two kinds: embedded (when references to the related objects are stored in relation
owner object itself) and standalone (when relation is separate object, which contains
the reference to the relation owner and relation members). Both kinds of relations
implements Link interface. Embedded relation is created by Storage.createLink
method
and standalone relation is represented by Relation persistent class created by
Storage.createRelation
method.
So relation between two classes A and B can be implemented in the following way:
Relation type | Object A | Object B |
---|---|---|
one-to-one | B bref; | A aref; |
one-to-many | Link bref; | A aref; |
many-to-one | B bref; | Link aref; |
many-to-many | Link bref; | Link aref; |
Hashtable
or HashMap
class
can be used for it. In databases usually more sophisticated search is required.
I do not want to implement in PERST complete SQL language, because it immediately makes DBMS huge and slow.
But in most cases application is performing only very simple queries using exact match or key range.
This is done in PERST by Index
and IndexField
interfaces. First interface is used
for independent specification key and associated value. IndexField
interface allows to index objects
using one of the fields of this object (key field).
Indices are created in PERST using Storage.createIndex
or Storage.createFieldIndex
methods.
There can be several index implementations
but right now only one implementation based on B+Tree is provided (because B+Tree is the most efficient structure
for disk based databases). Methods of Index
and FieldIndex
interface allows to add, remove and
locate objects by key. It is possible to perform search either specifying exact key value either specifying range of key values
(high or low boundary or both of them can be skipped or can be declared as been exclusive or inclusive).
So it is possible to perform the following types of search:
There are several different ways of selecting objects using index:
IPersistent get(Key key)
IPersistent[] get(Key from, Key till)
null
. Both boundaries can be inclusive or exclusive.
Iterator iterator()
Iterator iterator(Key from, Key till, int order)
null
. Both boundaries can be inclusive or exclusive. Objects can be traversed in key ascending or descending order .
If you need set of persistent objects you should use Storage.createSet
method.
Set is implemented using B+Tree where object OID is used as a key.
PERST also support spatial indices (org.garret.perst.SpatialIndex
) and generic indices with
user defined comparator (org.garret.perst.SortedCollection
).
Spatial index is implemented using Guttman's R-Tree with quadratic split algorithm.
It allows efficient search of R2 objects. Sorted collection provides almost the same methods
as FieldIndex
but it uses user defined comparator to compare collection members.
Sorted collection is implemented using T-Tree and is especially efficient for main-memory
databases.
Interface | Description | Key type | Implementation | Created by |
---|---|---|---|---|
Index |
Index with explicitly specified key used for exact match or range queries | scalar, string or reference | B+Tree | Storage.createIndex(Class type, boolean unique) |
Index |
The same as above but assuming that there can be a lot of duplicate key values | scalar, string or reference | B+Tree | Storage.createThinkIndex(Class type) |
FieldIndex |
Index constructed for one of the object fields | scalar, string or reference | B+Tree | Storage.createFieldIndex(Class type, String fieldName, boolean unique) |
BitIndex |
Bit index for searching object by bitmap of properties | persistent object | B+Tree | Storage.createBitIndex() |
IPersistentSet |
Set of persistent objects | persistent object | B+Tree | Storage.createSet() |
IPersistentSet |
Scalable set of persistent objects (can efficienty handle both small and large number of members | persistent object | Link or B+Tree | Storage.createScalableSet() |
SpatialIndex |
Index for spatial objects with integer coordinates | Rectangle | R-Tree | Storage.createSpatialIndex() |
SpatialIndexR2 |
Index for spatial objects with real coordinates | RectangleR2 | R-Tree | Storage.createSpatialIndexR2() |
SortedCollection |
Index with user defined comparator | any | T-Tree | Storage.createSortedCollection(PersistentComparator comparator, boolean unique) |
select * from T where x=?;PERST relations can be used to implement simple joins, like the following SQL query fetching all order to the particular vendor:
select * from O Order, V Vendor where O.vendorID = V.id AND V.name='ABC';In PERST it is possible to select first vendor using index search and ten traverse correspondent relation to locate all orders to this vendor.
But sometimes it is needed to implement more complex queries.
It is also possible in Perst without need to write query in some special
(non-procedural) language. Class Projection
is use to combine results
of several simple operations (such index search). Lets start explanation of this class with an
example. Consider that we need to know all order for the detail with name started with 'D1'
shipped by vendors with name started with 'V1' or 'V3'.
SQL statement which perform this query is the following:
select * from O Order, V Vendor, D Detail where D.name like 'D1%' and O.detailID = D.id and O.vendorID = V.id and (V.name like 'V1%' OR V.name like 'V3%');And now how it can be done in Perst. Consider that we have indices for details and vendors:
FieldIndex detailIndex = db.createFieldIndex(Detail.class, "name", true); FieldIndex vendorIndex = db.createFieldIndex(Vendor.class, "name", true);Set of requested orders can be obtained in the following way:
// Projection from vendors to orders (using "orders" field of Link type) Projection v2o = new Projection(Vendor.class, "orders"); // Projection from details to orders (using "orders" field of Link type) Projection d2o = new Projection(Detail.class, "orders"); // Select vendors with name like 'V1%' v2o.project(vendorIndex.getPrefix("V1")); // Select vendors with name like 'V3%' v2o.project(vendorIndex.getPrefix("V3")); // Select details with name like 'D1%' d2o.project(detailIndex.getPrefix("D1")); // Join projections v2o.join(d2o); // Get array of requested orders Order[] orders = (Order[])v2o.toArray(new Order[v2o.size()]);So, as you see Projection class is used for several purposes:
It is possible to derive your own class from Projection
to implement
more sophisticated projections than using single projection field.
If you need to sort selection in some particular order, then most efficient way
is to use FieldIndex
for it. When you select objects using index, selected objects
are sorted by search key. If you need to sort selection by field which is not search key,
than you can use Arrays.sort
method. For example if in the query described above
we need to sort orders by price field, it can be done with the following statement:
Array.sort(orders, new Comparator() { public int compare(Object a, Object b) { return ((Order)a).price - ((Order)b).price; } });
commit, rollback
or close
methods.Commit of transaction cause synchronous write of changed pages to the disk. Synchronous write (application is blocked until data is really flushed to the disk) is very expensive operation. Average positioning time for modern disks is about 10ms, so them are usually not able to perform in one second more than 100 writes in random places. As far as transaction usually consists of several updated pages, it leads to average performance about 10 transaction commits per second.
Performance can be greatly increased if you minimize number of commits (larger transactions). PERST is using shadow mechanism for transaction implementation. When object is changed first time during transaction, shadow of the object is created and original object is kept unchanged. If object is updated multiple times during transaction, shadow is create only once. Because of using shows, PERST do not need transaction log file. So in PERST long transaction can not cause transaction log overflow as in most of other DBMSes. Quite the contrary, if you do not call commit at all, PERST works as DBMS without transaction support, adding almost no overhead of supporting transactions.
The only drawback of long transactions is possibility to loose a lot of changes in case of fault. PERST will preserve consistency of the database, but all changes made since list commit will be lost.
Perst also support per-thread transactions. Three types of per-thread transactions are supported:
exclusive, cooperative and serializable. In case of exclusive transaction, only one
thread can update the database. In cooperative mode, multiple transaction can work
concurrently and commit()
method will be invoked only when transactions of all threads
are terminated. Serializable transactions can also work concurrently. But unlike
cooperative transaction, the threads are isolated from each other. Each thread
has its own associated set of modified objects and committing the transaction will cause
saving only of these objects to the database. To synchronize access to the objects
in case of serializable transaction programmer should use lock methods
of IResource
interface. Shared lock should be set before read access to any object,
and exclusive lock - before write access. Locks will be automatically released when
transaction is committed (so programmer should not explicitly invoke unlock()
method)
In this case it is guaranteed that transactions are serializable.
It is not possible to use IPersistent.store()
method in
serializable transactions (you should use IPersistent.modify()
instead). That is why it is also not possible to use Index and FieldIndex
containers (since them are based on B-Tree and B-Tree directly access database pages
and use store()
method to assign OID to inserted object.
You should use SortedCollection
based on T-Tree instead or alternative
B-Tree implemenataion (see "perst.alternative.btree" property).
In Perst there is org.garret.perst.Query
class allowing to execute JSQL
queries. To execute JSQL query it is necessary to provide object iterator
(java.util.Iterator), class of iterated objects and string representing query condition.
It is also possible to specify indices, resolvers, query parameters.
Result of query execution is another iterator which can be used to traverse though selected objects
(object matching specified search criteria).
All Perst collection classes contains select
method which allows to filter collection members
using JSQL query. This method is give class of objects stored in the collection (for some collections
such as FieldIndex it is known and should not be specified) and JSQL condition. As the result of its work,
select returns iterator of selected objects.
Example | Meaning |
---|---|
expression | non-terminals |
not | terminals |
| | disjoint alternatives |
(not) | optional part |
{1..9} | repeat zero or more times |
select-condition ::= ( expression ) ( traverse ) ( order ) expression ::= disjunction disjunction ::= conjunction | conjunction or disjunction conjunction ::= comparison | comparison and conjunction comparison ::= operand = operand | operand != operand | operand <> operand | operand < operand | operand <= operand | operand > operand | operand >= operand | operand (not) like operand | operand (not) like operand escape string | operand (not) in operand | operand (not) in expressions-list | operand (not) between operand and operand | operand is (not) null operand ::= addition additions ::= multiplication | addition + multiplication | addition || multiplication | addition - multiplication multiplication ::= power | multiplication * power | multiplication / power power ::= term | term ^ power term ::= identifier | number | string | true | false | null | current | first | last | ( expression ) | not comparison | - term | term [ expression ] | identifier . term | function term | count (*) | contains array-field (with expression) (group by identifier having expression) | exists identifier : term function ::= abs | length | lower | upper | integer | real | string | | sin | cos | tan | asin | acos | | atan | log | exp | ceil | floor | sum | avg | min | max string ::= ' { { any-character-except-quote } ('') } ' expressions-list ::= ( expression { , expression } ) order ::= order by sort-list sort-list ::= field-order { , field-order } field-order ::= field (asc | desc) field ::= identifier { . identifier } fields-list ::= field { , field } user-function ::= identifier
Identifiers are case sensitive, begin with a..z, A..Z, '_' or '$' character, contain only a-z, A..Z, 0..9 '_' or '$' characters, and do not duplicate a SQL reserved words.
abs | acos | and | asc | asin |
atan | avg | between | by | contains |
cos | ceil | count | current | desc |
escape | exists | exp | false | floor |
group | having | in | integer | is |
length | like | log | lower | max |
min | not | null | or | real |
sin | string | sum | tan | true |
upper | with |
JSQL extends ANSI standard SQL operations by supporting bit manipulation
operations. Operators and
/or
can be applied not only
to boolean operands but also to operands of integer type. Result of applying
and
/or
operator to integer operands is integer
value with bits set by bit-AND/bit-OR operation. Bits operations can be used
for efficient implementation of small sets. Also rasing to a power
operation ^ is supported by JSQL for integer and floating point
types.
length()
function.
[]
operator.
If index expression is out of array range, then exception will be raised.
in
can be used for checking if array contains
value specified by left operand. This operation can be used only for arrays of
atomic types: with boolean, numeric, reference or string components.
exists
operator. Variable specified after exists
keyword can be used
as index in arrays in the expression preceded by exists
quantor. This index variable will iterate through all possible array
index values, until value of expression will become true
or
index runs out of range. Condition
exists i: (contract[i].company.location = 'US')will select all details which are shipped by companies located in US, while query
not exists i: (contract[i].company.location = 'US')will select all details which are shipped only from companies outside US.
Nested exists
clauses are allowed. Using of nested
exists
quantors is equivalent to nested loops using correspondent
index variables. For example query
exists colon: (exists row: (matrix[colon][row] = 0))will select all records, containing 0 in elements of
matrix
field, which has type array of array of integer.
This construction is equivalent to the following
two nested loops:
boolean result = false; for (int colon = 0; colon < matrix.length(); colon++) { for (int row = 0; row < matrix[colon].length(); row++) { if (matrix[colon][row] == 0) { result = true; break; } } }Order of using indices is significant! Result of the following query execution
exists row: (exists colon: (matrix[colon][row] = 0))
will be completely different with result of previous query. The program can
simply hang in last case due to infinite loop for empty matrices.
IndexOutOfRangeError
exception to notify end of iteration,
contains with will just iterate through all array elements and do not throw any exception. So performance of
contains with is better.
char
in C) and
byte-by-byte comparison of strings ignoring locality settings.
Construction like
can be used for
matching string with a pattern containing special wildcard characters
'%' and '_'. Character '_' matches any single character, while character
'%' matches any number of characters (including 0). Extended form of
like
operator with escape
part can be used
to handle characters '%' and '_' in the pattern as normal characters if
they are preceded by special escape character, specified after
escape
keyword.
It is possible to search substring within string by in
operator. Expression ('blue' in color)
will be true
for all records which color
fields contains 'blue' word.
Strings can be concatenated by +
or ||
operators.
Last one was added only for compatibility with ANSI SQL standard.
As far as JSQL doesn't support implicit conversion to string type in
expressions, semantic of operator +
can be redefined for
strings.
company.address.city = 'Chicago'will access record referenced by
company
component of
Contract
record and extract city component of
address
field of referenced record from Supplier
table.
References can be checked for null
by is null
or is not null
predicates. Also references can be compared for
equality with each other as well as with special null
keyword. When null reference is dereferenced, exception is be raised
by JSQL.
There is special keyword current
, which can be used to get
reference to current record during table search. Usually current
keyword is used for comparison of current record identifier with
other references or locating it within array of references.
For example, the following query will search in Contract
table for all active contracts
(assuming that field canceledContracts
has
Contract[]
type):
current not in supplier.canceledContracts
Programmer can define methods for structures, which can be used
in queries with the same syntax as normal structure components.
Such methods should have no arguments except pointer to the object to which
they belong (this
pointer in Java), and should return
atomic value (of boolean, numeric, string or reference type).
Also method should not change object instance (immutable method).
So user-defined methods can be used for creation virtual components -
components which are not stored in database, but instead if this are calculated
using values of other components. For example, you can use standard Java
java.util.Date
class with such methods
as getYear()
, getMonth()
...
So it is possible to specify queries like: "delivery.getYear = 99
"
in application, where delivery
record field has
java.util.Date
type.
Name | Argument type | Return type | Description |
---|---|---|---|
abs | integer | integer | absolute value of the argument |
abs | real | real | absolute value of the argument |
sin | real | real | sin (rad) |
cos | real | real | cos (rad) |
tan | real | real | tan (rad) |
asin | real | real | arcsin |
acos | real | real | arccos |
atan | real | real | arctan |
exp | real | real | exponent |
log | real | real | natural logarithm |
ceil | real | real | the smallest integer value that is not less than the argument |
floor | real | real | the largest integer value that is not greater than the argument |
integer | real | integer | conversion of real to integer |
length | array | integer | number of elements in array |
lower | string | string | lowercase string |
real | integer | real | conversion of integer to real |
string | integer | string | conversion of integer to string |
string | real | string | conversion of real to string |
upper | string | string | uppercase string |
To be able to use indices, programmer has obtain Query object and register indices in it.
JSQL supports Perst Index
and FieldIndex
indices.
Indices are located by JSQL by field name. It is assumed that class objects objects
returned by index iterator is the same as specified in select
query.
If you are going to use the same Query instance for selecting data from different collections
(with different collection member class), please check that there no name conflict for key fields,
i.e. index corresponding one collection will not be applied for another collection.
Maintaining indices is responsibility of programmer. JSQL is only using indices for searching element by key.
Database
class.
This class emulates relational database on top of Perst. Certainly even with this
class, Perst provides very object-orient way of dealing with persistent data.
Database
class allows you to create/drop table, add/drop indices,
create/update/delete records. It automatically update indices when new record is
added or removed.
Also Database
class makes it possible to prepare and execute queries.
Database
class is used as wrapper for underlying Perst storage. Storage
passed to the constructor of Database
class should be opened and either not initialized,
either initialized before by the same Database
constructor. It is not possible
to attach database to Perst Storage with existed root object other than one set
by Database
constructor.
A table in database are associated with Java class. Almost all methods
of database class requires passing of Class
parameter
or persistent object instance. In last case class of this object is used.
If Perst founds no table associated with the class, it tries to
find table for superclass and so on... So it is possible to have
"polymorphic" table - table containing records (object) of multiple classes derived
from this one. Methods dealing with object instance have overloaded version
where class can be specified explicitly. It makes it possible to place object
in table associated with some particular interface.
Database provides two ways of querying tables: using simple and prepared queries.
In first case Database.select
directly returns result of query execution
(iterator). In second case it is possible to execute prepared query multiple times,
specifying different values of parameters.
Please find more information of Database methods in Javadoc and look at JsqlSSD example - Supplier-Shipment-Detail database built using Database class.
Concerning interface to database system, there is object persistence aspect
which should control transparent object loading and storing to/from
the storage. Without AspectJ, PERST uses programmer controlled
recursive object loading mechanism (when some object is requested,
all object referenced from this object will also be fetched unless them
redefine
AspectJ interface to Perst was developed by
Patrick Morris-Suzuki. It is located in package
In directory
With JAssist you can use normal Java compiler. Perst translator for JAssist automatically
made persistent capable all class matching specifying name patterns. With this
translator it is not required for application classes to be derived from Persistent class
and provide empty default constructor.
To be able to use JAssist interface, you should use special JAssist class loader instead
of default Java class loader to load all classes which you want to make persistent
capable. Classes which fully qualified name matches one of specified pattern will be
transformed during loading. Other classes are loaded unchanged.
I have slightly changed JAssist code to be able to distinguish access to this fields
and fields of foreign object. It makes it possible to eliminate extra overhead
for this fields access. If your are going to use original JAssist library,
then you should comment my code using
JAssist interface has only one restriction: you should not access fields of
persistent object other than this object in constructor.
Below is example of using JAssist interface:
To build JAssist interface, you should use
PERST performs cyclic scanning of bitmap pages. It keeps identifier
of current bitmap page and current position within the page. Each time
when allocation request arrives, scanning of the bitmap starts from the
current position.
When last allocated bitmap page is scanned, scanning continues from the
beginning (from the first bitmap page) and until current position.
When no free space is found after full cycle through all bitmap pages,
new bulk of memory is allocated. Size of extension is maximum of
size of allocated object and extension quantum. Extension quantum is parameter
of database, specified in constructor. Bitmap is extended to be able to map
additional space. If virtual space is exhausted and no more
bitmap pages can be allocated, then
Allocation memory using bitmap provides high locality of references
(objects are mostly allocated sequentially) and also minimizes
number of modified pages. Minimization of number of modified pages is
significant when commit operation is performed and all dirty pages should
be flushed on the disk. When all cloned objects are placed sequentially,
number of modified pages is minimal and so transaction commit time is also
reduced. Using extension quantum also helps to
preserve sequential allocation. Once bitmap is extended, objects will
be allocated sequentially until extension quantum will be completely used.
Only after reaching the end of the bitmap, scanning restarts from the beginning
searching for holes in previously allocated memory.
To reduce number of bitmap pages scans, PERST associates descriptor with
each page, which is used to remember maximal size of the hole on the page.
Calculation of maximal hole size is performed in the following way:
if object of size M can not be allocated from this bitmap pages,
then maximal hole size is less than M, so M
is stored in the page descriptor if previous value of descriptor is large
than M. For next allocation of object of size greater or
equal than M, we will skip this bitmap page. Page descriptor
is reset when some object is deallocated within this bitmap page.
Some database objects
(like hash table pages) should be aligned on page boundary
to provide more efficient access. PERST memory allocator checks requested
size and if it is aligned on page boundary, then address of
allocated memory segment is also aligned on page boundary. Search of free hole
will be done faster in this case, because PERST increases step of current
position increment according to the value of alignment.
To be able to deallocate memory used by object, PERST needs to keep
somewhere
information about object size. PERST memory allocator deals with two types
of objects - normal table records and page objects.
All table records are prepended by record header, which contains
record size and pointer of L2-list linking all records in the table.
So size of the table record object can be extracted from record header.
Page objects always occupies the whole database page are are allocated at
the positions aligned on page boundary. Page objects has no headers.
PERST distinguish page objects
with normal object by using special marker in object index.
When object is modified first time, it is cloned
(copy of the object is created) and object handle in current index is
changed to point to newly created object copy. And shadow index still
contains handle which points to the original version of the object.
All changes are done with the object copy, leaving original object unchanged.
PERST marks in special bitmap page of the object index, which contains
modified object handle.
When transaction is committed, PERST first checks if size of object index
was increased during current transaction. If so, it also reallocates shadow
copy of object index. Then PERST frees memory for all "old objects",
i.e. objects which was cloned within transaction. Memory can not be
deallocated before commit, because we wants to preserve consistent
state of the database by keeping cloned object unchanged.
If we deallocate memory immediately after cloning, new object can be
allocated at the place of cloned object and we loose
consistency. As far as memory deallocation is done in PERST by bitmap
using the same transaction mechanism as for normal database objects,
deallocation of object space will require clearing some bits in bitmap page,
which also should be cloned before modification. Cloning bitmap page will
require new space for allocation the page copy, and we can reuse space of
deallocated objects. And it is not acceptable due to the reason explained
above - we will loose database consistency. That is why deallocation
of object is done in two steps. When object is cloned, all bitmap pages
used for marking objects space, are also cloned (if not
not cloned before). So when transaction is committed, we only clear bits in
bitmap pages and no more requests for allocation memory can be generated at
this moment.
After deallocation of old copies, PERST flushes all modified pages on disk
to synchronize content of the memory and disk file. After that PERST
changes current object index indicator in database
header to switch roles of the object indices. Now object index, which was
current becomes shadow, and shadow index becomes current. Then PERST again
flushes modified page (i.e. page with database header) on disk, transferring
database to new consistent state.
After that PERST copies all modified handles from new object index
to object index which was previously shadow and now becomes current.
At this moment contents of both indices is synchronized and PERST is ready
to start new transaction.
Bitmap of modified object index pages is used to minimize time of committing
transaction. Not the whole object index, but only its modified pages should be
copied. After committing of transaction bitmap is cleared.
When transaction is explicitly aborted by
Allocation of object handles is done by free handles list. Header of the list
is also shadowed and two instances of list headers are stored in database
header. Switch between them is done in the same way as switch of
object indices. When there are no more free elements in the list, PERST
allocates handles from the unused part of new index. When there is no
more space in the index, it is reallocated. Object index is the only
entity in database whose is not cloned on modification. Instead of this
two copies of object index are always used.
There are some predefined OID values in PERST. OID 0 is reserved
as invalid object identifier. OID starting from 1 are reserved for bitmap pages.
Number of bitmap pages depends on database maximum virtual space.
For one terabyte virtual space, 8 Kb page size and 64 byte allocation quantum,
32K bitmap pages are required. So 32K handles are reserved in object index for
bitmap. Bitmap pages are allocated on demand, when database size is extended.
So OID of first users object will be 0x8002.
Recovery procedure is trivial in PERST. There are two instances of
object index, one of which is current and another corresponds to
consistent database state. When database is opened, PERST checks database
header to detect if database was normally closed. If not
(
The table below summarize pro features of PERST:
You can also look at PERST examples:
Tests are located in
Speed of accessing data at the disk is several times slower than speed of access data in main memory.
That is why caching of data is the key to increase database performance. PERST is using
pool of pages to optimize access to the disk. Page pool size can be specified in
If you think that all your data should fit in main memory, you can use
In some applications (for example for mobile devices) persistency is not needed. But such PERST container classes
as Link, Index, FieldIndex, SpatialIndex still can be used by application. In this case you can use
There are some constants defined in
Now some hints how to increase PERST performance and reduce size of used main memory.
If you database performs a lot of updates of persistent data, then the main limiting factor is speed
of writing changes to the disk. Especially synchronous write to the disk performed by commit.
If you will do commit after each update, then average speed will be about 10 updates per second
(this estimation is based on the assumption than average disk access time is about 10msec and
each transaction commit usually requires writing about 10 pages in random places in the file).
But it is possible to dramatically increase update performance if you group several updates in
one transactions. PERST is creating shadow of the object when it is
first time updated inside transaction. If object will be updated once in N transactions,
then N shadows will be created. If object will be updated N times inside one transaction, then
shadow will be created only once. It explains advantage of having one large transaction.
But if you will perform update of large number of objects in one transaction and for each updated
object shadow is created, then it leads to significant increase of database file size.
If you update each object in the database inside one transaction, database size will be
almost doubled! And if you perform each update in separate transaction, then size of database
will be almost unchanged (because space of allocated shadow objects will be reused in this case).
So the best approach is to perform commit after 100 or 1000 updates, it will
reduce overhead of each commit and save database size.
If your persistent object form tree or graph where all objects cab be accessed by reference from the root object,
then once you will load root object in main memory and store reference to it in some variable,
GC will never be able to collect any instance of loaded persistent object (because it will be accessible from
the root object). So when you application access more and more objects from the storage,
at some moment of time all of them will have to be loaded in main memory. It can cause space exhaustion.
To prevent this situation you should avoid to store in variables references to container objects which
contain references to a large number of members. Especially it is true for storage root object.
In this case GC is able to do it work and throw out from the memory objects which are not used
at this moment (to which there are no references from local variable).
But it is important to say that objects accessible though the index by key can be normally deallocated by garbage
collector. So in this case special care is not needed.
Values are always stored in context of persistent object referencing them.
And only those fields of value objects, when are known at runtime are stored.
If you assigned to the value field object derived from the declared class of field, then
derived fields will not be saved.
Initialization of transient attributes or resettable attributes should/
can be done via the default constructor or the onLoad() method, which is
called when an object is retrieved from the db.
But really situation is better. It is important to notice that all Perst collection classes
always load their members on demand. It means that once you have reference for example to FielIndex,
only header of this collection object will be loaded and members will be loaded on demand.
It is true for all other Perst collections: Link, Relation, Index, SortedCollection, IPersistentSet.
As far as persistent objects in database are usually accessible through collections, in
most cases explicit loading of persistent objects is not needed. But be careful: if in addition
to placing all your objects in some index, you also link them for example in L2-List, then
fetching single object will cause loading of all other objects from this collection!
If you insert keys in index in random order, most likely that loaded
pages are located at random offset within database file, so access to
the page requires positioning of disk head. Average disk positioning time
for the modern disks is about 10msec. It means that disk is able to perform
about 100 disk access per second. So if database is really so large
that each operation with B-Tree requires loading of all pages at the
path from the root till leaf page (except root page - for large number
of object it will be 3 pages), then database will be able to perform
about 30 inserts (or searches) per second. So inserting 1 million
objects in this worst scenario will take about 10 hours!
Fortunately real performance is significantly better (because of page
cache hits, OS file cache ...)
There are two obvious ways of improving performance:
Look for new version at my homepage |
E-Mail me about bugs and problemsfalse
.
To store modified object in database, programmer should explicitly call
store()
or modify()
method.
With AspectJ it is not needed: persistence aspect will automatically load
referenced object and mark it as modified when values of some of object fields
is changed. Also there is no need to derive all persistent capable classes
from Persistent
base class, it can be easily done using AspectJ
declaration construction.org.garret.perst.aspectj
. To built it you can use
perst/src/compile_aspectj.bat
batch file (certainly AspectJ
should be properly installed before). It will build
perst/lib/perst_aspectj.jar
file.
Each persistence capable method must implement
org.garret.perst.aspectj.AutoPersist, which is a marker interface with no
methods. However the user can avoid this by writing a simple aspect to
define the persistence capable classes like this:
public aspect PersistentClasses {
declare parents: mypackage.* implements AutoPersist;
}
Then you can build you application like this:
ajc *.java -aspectpath /perst/lib/perst_aspectj.jar
Restrictions of AspectJ API to Perst:
perst/tst/aspectj
there is example of application using
AspectJ interface to Perst: Guess. It is the same example "Guess an animal"
as located in perst/tst
directory (which is not using AspectJ),
but it doesn't contain explicit calls of store()
methods.
Also original Guess example recursively load the while game tree when root
object is accessed. And AspectJ version fetches object instances only on demand.
You can build example using compile.bat
script and run using Guess.bat
.
Using JAssist
JAssist
is yet another popular product for transformation of Java byte code.
It is also possible to build transparent API to Perst using JAssist package.
Despite to the fact that it is much simpler and compact than AspectJ, its API makes it possible
to implement even more convenient and flexible Perst interface than AspectJ and with less
limitations.!isSelfReader()
condition.
In this case Perst will not be able to handle access to foreign (non-this) fields.
You should use getter/setter methods instead.
Alternatively, you can replace this condition with true
,
allowing access to foreign fields but with significant degradation of performance and
increased code size, because in this case before ALL accesses
to fields of persistent capable object call of load() method will be inserted.
Modified versions of JAssist files are included in Perst distribution in perst/src/org/garret/jassist/edit
directory.
java.lang.Object
,
then replace superclass with org.garret.perst.Persistent
Persistent.load()
method.
Persistent.modify()
method
recuriveLoading
method returning false
.
method
package com.mycompany.mypackage;
import org.garret.perst.jassist.PerstTranslator;
import javassist.*;
public class MyApp {
public static void main(String[] args) {
Translatator trans = new PerstTranslator(new String[]{"com.mycompany.*"});
ClassPool pool = ClassPool.getDefault(trans);
Loader cl = new Loader(pool);
cl.run("Main", args);
}
}
In this example all classes from com.mycompany.mypackage
except
MyApp will be loaded by JAssist class loader and automatically made persistent capable.compile-aspectj.bat
file in perst/src directory. javassist.jar
file is included in Perst distributive
and located in perst/lib directory. You can find example of using JAssist interface
in perst/tst/jassist
directory. PERST implementation issues
Below is more detailed description of PERST implementation.
This section can be skipped if you are not interested in the details of implementation: Memory allocation
Memory allocation is performed in PERST by bitmap. Memory is allocated in
chunks called allocation quantum. In current version of PERST size of
allocation quantum is 64 byte. It means that size of all allocated objects is
aligned on 64 byte boundary. Each 64 byte of database memory is represented by
one bit in the bitmap. To locate hole of requested size in bitmap, PERST
sequentially searches bitmap pages for correspondent number of successive
cleared bits. PERST use three arrays indexed by bitmap byte, which
makes possible fast calculation of hole offset and size within the byte.OutOfMemory
error
is reported. Shadow transaction mechanism
Each record (object) in PERST has unique identifier (OID).
Object identifiers
are used to implement references between objects. To locate object by
reference, its OID is used as index in array of object offsets within the file.
This array is called object index and element of this array -
object handle. These are two copies of object
indices in PERST, one of which is current and other - shadow.
Header of database contains pointers to both object indices and indicator
which index is current at this moment.Storage.rollback
method, shadow object index is copied back to the current index, eliminating
all changes done by aborted transaction. After the end of copying,
both indices are identical again and database state corresponds to the moment
before the start of current transaction.dirty
flag is set in database header), then PERST performs
database recovery. Recovery is very similar to rollback of transaction.
Indicator of current index in database object header is used to
determine index corresponding to consistent database state and object handles
from this index are copied to another object index, eliminating
all changes done by uncommitted transaction. As far as the only action
performed by recovery procedure is copying of objects index (really only
handles having different values in current and shadow indices are copied to
reduce number of modified pages) and size of object index is small,
recovery can be done very fast.
Fast recovery procedure reduces "out-of-service" time of application. Where to use
PERST is simple and fast embedded database engine.
If your applications need embedded database engine and do not need to execute complex SQL queries,
and the only thing you need is to be able to store/fetch/locate object in the database using navigation
through references or indexed search by key, then PERST is what you need. It will provide much better performance
than relational database and other (more sophisticated) object oriented database.
Certainly nothing in this world can have only positive features.
No contra list which contains features lacking in PERST:
Quick start
PERST is distribute together with build perst.jar
. You can also build it yourself using
compile.bat
script in src
directory.
The only thing you need to work with the database, is to include this JAR in your classpath.
Template of PERST application is the following:
import org.garret.perst.*;
public class YourPersistentClass extends Persistent {
int x;
String y;
Link links;
...
void doUpdate() {
x = 1;
y = "Hello World";
links = getStorage().createLink();
store(); // save changes in the database
}
}
public class YourRoot extends Persistent {
YourPersistentClass ref;
Index myIndex;
void foo() {
ref.doUpdate(); // referenced object has no to be explictely loaded
}
};
public class YourApplication {
static public void main(String[] args) {
String databaseFilePath = args[0]; // path to the database file
int pagePoolSize = Integer.parseInt(args[1], 10); // size of page pol in bytes
Storage storage = StorageFactory.getInstance().createStorage();
storage.open(databaseFilePath, pagePoolSize);
YourRoot root = (YourRoot)storage.getRoot();
if (root == null) {
// Storage was not initialized yet
root = new YourRoot();
root.myIndex = storage.createIndex(String.class, true); // unique index
storage.setRoot(root);
}
root.foo(); // all objects referenced from root are implicitly loaded
YourPersistentClass obj = root.myIndex.get(new Key("John")); // find object with key "John"
... // do something else with database
storage.commit(); // commit transaction
... // do something else with database
storage.close(); // commit the last transaction and close the database
}
}
tst
directory and can be build using compile.bat
script.JDK 1.5
Perst provides also API for JDK 1.5. It includes generic (parametrized) versions
of Perst container classes. To build it, change directory to perst/src15
and run compile.bat
script. It will build perst15.jar
(precompiled version is included in distributive).
You can also try TestIndex example for JDK 1.5 located in directory
perst/src15/tst
. This example illustrates using if Perst generic
containers and new Java foreach construction.Tuning
This sections contains several hints how to adjust PERST parameters and increase database performance.Storage.open
method (by default it is 4Mb). Usually increasing page pool leads to better performance. But you should
notice the following things:
-Xmx
parameters, but if you do not specify it, maximal size of
of page pool should not be greater than 32Mb (at least for Sun's JVM).
Storage.INFINITE_PAGE_POOL
constant in Storage.open
method.
In this case page pool will be dynamically extended when new page is requested.
As a result, all pages will be cached and present in memory. So each page is read from the disk only once.
Also in this case strong object cache is used instead of weak object cache. It means that all fetched
objects are also pinned in memory and object is unpacked only once. It is important to notice
that amount of used main memory will be greater than database size: all objects will be present in memory in packed
(inside the page containing the object) and in unpacked (referenced from object cached) form.NullFile
implementation of IFile
interface together
with Storage.INFINITE_PAGE_POOL
to create transient in-memory database. Data in this case will never be
written on the disk.StorageImpl
class which has influence on
initial and maximal database size. If you want to change them, you will have to rebuild PERST.
Below is detailed description of this parameters:
Parameter Default value Description dbDefaultInitIndexSize
1024
Initial object index size. Object index is increased on demand. Reallocation of index is
expensive operation and so to minimize number of such reallocations, object index size is always doubled.
Specifying larger initial index size allows to reduce number of future reallocations and so a little bit increase
performance (certainly if your application allocates such number of object). But it also leads to larger initial
size of database file. With default value of this parameter, initial database size is about 50Kb.
dbDefaultExtensionQuantum
4Mb
Database extension quantum.
Memory is allocate by scanning bitmap. If there is no large enough hole, then database is extended by the value of
dbDefaultExtensionQuantum
. Increasing the value of this parameters leads to less frequent
rescanning of allocation bitmap from the very beginning. It leads to faster allocation speed and better locality of
reference for created objects (because there is more chances that them will be allocated sequentially).
From the other side it leads to less efficient memory usage. Reducing the value of this parameter force reusing
of existed holes in memory allocation bitmap.
dbObjectCacheInitSize
1319
Size of object cache. PERST needs this cache to check if object with such OID already
present in memory. This cache uses weak references to allow garbage collector to do it work.
When some threshold of filling cache is reached, cached is reallocated by doubling its size.
Once again increasing this parameter can save some number of cache reallocations.
Tricks and tips
recursiveLoading
method in some classes. You should explicitly load all object referenced
from object with disabled recursive loading.
Storage.setProperty("perst.serialize.transient.objects")
.
Perst will correctly handle reference to persistent object from serialized transient objects.
But it is important to notice, that identity of saved transient objects is not preserved.
If there was transient object O referenced from persistent objects P1 and P2, then when
objects P1 and P2 are stored, instance of O object will be serialized both in P1 and P2 record.
And when P1 and P2 object will be reloaded from the database, them will refer to different instances
O1 and O2. Also please notice, that is impossible to mark transient objects is modified.
Of some of referenced transient objects is updated, you should also update persistent object
referencing (directly or indirectly) this object.
Perst also doesn't remove any object referenced from removed object.
If it is needed - programmer should do it himself.
Explicit deletion of object can cause two problems:
First problem is most critical and can cause database data corruption.
To prevent this problem it is strongly recommended to use Perst garbage collector
instead of explicit memory deallocation. If it is not possible (due to performance
or some other reasons), it still can be used for debugging, since Perst GC is able to detect
both problems: it will throw StorageError(StorageError.INVALID_OID)
exception
if reference to the deleted object is found and return non zero number of collected objects
if there is garbage in the database.
Please notice that in alternative 1-3 only one thread is accessing
database at each moment of time, so it is not possible to say about
concurrent execution. But it doesn't mean that with approach 4) you
will get the best performance - because of locking overhead and
alternative B-Tree implementation which is less efficient than
original implementation for very large databases.
Also approach 4 is most difficult to program - because setting proper
locks is responsibility of programmer and incorrect locking can cause
synchronization problems: race conditions (as you have with
ArrayIndexOutOfBoundsException) or deadlocks (two or more threads
mutually block each other).