Today I have finally come to the conclusion that nntp//rss, as of v0.4, will no longer use hsqldb as its embedded database. It has been a bit of a rocky road over the past few months, with memory consumption issues, and problems of reliability, especially after system failure. I’ve spent some time evaluating alternatives, and have decided that Mckoi will may be (see update below) the replacement.
I’m currently in the process of finishing up a migration process to upgrade v0.3 users’ hsqldb databases to Mckoi, and will test to see how the whole thing behaves on my 100K+ item database.
Note that this is in addition to the MySQL support that has also been implemented. The demo server has now been running on MySQL for about a month, with good results. This will play a key role in the more group-oriented features to come in later releases of nntp//rss. Which, by that time, may deserve a rethink on the name…
Update: Stop the press… somebody referred me to Axion, a “small, fast, open source relational database system (RDBMS) supporting SQL and JDBC written in and for the Java programming language”. Will be checking this out as well.
Update 2: Looks like my 100K record test is proving an interesting challenge. The first the part of the challenge being getting the data out of hsqldb. As hsqldb – to the best of my knowledge – does not implement any disk scratch strategies for queries, it creates the entire result set within memory. I’m now performing iterative queries across the items table in my nntp//rss database to extract the content. This is turning out to be a long process. Anyway, once I’ve got the data out of hsqldb and into Mckoi, Axion et al, I’m focused initially on the following areas:
- Start up speed, given a non-trivial database
- Update/Insert speed
- Querying across a large recordset (on indexed and non-indexed columns)
- Shutdown speed
Watch this space for further updates!
Update 3: Managed to port 100K database to Mckoi, but getting some strange exceptions. Still investigating. Also looking at the potential of other persistence mechanisms, e.g. JDBM and SMYLE. Axion looks interesting, but has a more limited range of features (no multiple aggregate functions within a select, no GROUP BY).
Why not postgresql ?
PostgreSQL is on the list as a future supported external database. MySQL was the first step to establishing the model for supporting external databases, and it should be pretty straightforward to support others as the need/demand arises.
As far as the embedded database goes, I need to use something that is lightweight, has zero non-automatable installation tasks, and is implemented in Java. This database is intended for users running nntp//rss as a standalone desktop app. The most important characteristics are that it is pretty conservative in terms of memory usage, is reliable, and that it does not place any burden on the user when it comes to installation.
Have you looked at Berkley DB (Sleepycat)? I’m sure that you have. Is there a strict SQL requirement, or are there other reasons for not using Sleepycat’s Berkley DB extensions?
The only problem with Berkeley DB is that it is not a pure Java implementation. Its Java API is essentially a JNI wrapper around the underlying C implementation. For the embedded database, I’m looking for a solution that is cross-platform, and doesn’t require the creation of separate packages for each OS. I’m definitely intrigued by JDBM, as it seems to offer efficient persistence in a pure Java implementation. However, I’ve yet to actually try it out, so I’ll leave any further comment until that time.