The JVM Fanboy Blog 

Apache Derby part 1: Introduction

by vincent


Posted on Tuesday Jul 07, 2015 at 12:08AM in Relational Database


Apache Derby part 1: Introduction

This will be a series of blog posts about Apache Derby, an open-source "traditional" related database management system, fully implemented in Java.

I've been a happy Derby user for a few years now (in fact, this blog is powered by a Derby database) and I'd like to share my enthusiasm. As I don't see Derby mentioned a lot in recent blogs, articles etc., I thought maybe this article will be of some help for people who are looking for more information. I've used Derby successfully in several smaller-scale web applications.

This first part of the series is a generic introduction, the following parts will be much more technical and actually contain code :) .

History

The early years

The history of Derby is widely documented.

In 1997 a start-up called Cloudscape Inc released a commercial database management system called Cloudscape (initially it was released with the generic "JDBMS" name). Two years later Informix Software acquired Cloudscape. In 2001 IBM acquired their database assets and renamed it to IBM Cloudscape.

IBM donated the code to Apache Software Foundation in 2004 and one year later it became an open-source sub-project of the Apache's DB top-level project. IBM continued to sell IBM Cloudscape by taking a snapshot of the then-current Apache Derby version, adding installers, offering support contracts and adding additional proprietary components that provided ODBC and Microsoft MDAC compatibility, among other things. IBM retired the IBM Cloudscape product in 2007. The additional proprietary components are no longer available.

Sun/Oracle JavaDB

Sun joined the project when the code was donated to the Apache Software Foundation. They took a snapshot of the then-current Apache Derby version, added it to the Java Developers Kit (JDK) and called it the "JavaDB" component. They did not modify one single line of code, the project name is still Apache Derby, but Sun/Oracle call it JavaDB in their documentation. So every user of a recent JDK has access to a full copy of Apache Derby. JavaDB was Derby 10.6 for Sun's JDK 6, 10.8 for Oracle's JDK 7 and 10.11 for JDK 8.

While JavaDB is very convenient while testing Apache Derby, I'd normally recommend to use the latest available Derby version for production use (10.11.1.1 released on 27th of August 2014 at the time of this writing).

Why choose Apache Derby?

Apache Derby was not created to replace generic, full enterprise-ready DBMS'es like Oracle Database, IBM DB2, Postgresql, etc. Derby was, in my opinion, also not envisioned with "big data" applications in mind. One of the reasons is that Derby can store its database on one logical disk only (unless RAID is used...). Also, LOB fields are 'limited' to 2GB each. 

An old 2005 presentation sheet from the IBM website about IBM Cloudscape (that was still on-line at the time of this writing) suggests that Derby should be used when the total stored data is expected to be 50GB or less. Note that this is not a documented maximum, there does not seem to be a hard-coded maximum database size (apart from the mentioned 2 GB per LOB field). I should note many of the shortcomings mentioned in the presentation sheet have been fixed by the Derby developers in the last 10 years.

If your applications can live without storing terabytes worth of data and serving hundreds of users concurrently, here are some of the highlights that I am aware of:

  • Derby can run in two modes: Client/Server and Embedded mode, more on this below.

  • The engine makes full use of JVM's powerful multi-threading, features. I'd call the performance very good (but remember the name of this blog, I'm a bit biased on JVM performance probably). I'd like to create some proper benchmarks, but I'll have to research how to do this carefully.

  • Stored Procedures can be written in any JVM language that can compile classes that implement Interfaces. Those classes are then loaded by Derby and can be triggered using SQL or JDBC code.

  • You can create Derby-style table functions by creating custom classes that implement the standard JDBC "Resultset" interface. Those classes are loaded in Derby and are available to SQL queries like normal, but read-only, Derby database tables. This is a great feature to import foreign data from external text/XML files, other databases (perhaps even NoSQL(!) databases), etc. The user can use the full SQL syntax to filter the data dynamically returned by the object. This is very powerful stuff. "Yes SQL", please! :)

  • A more optimized version of table functions are also available, the class must additionally implement one of Derby's own interfaces. Classes that will do this will be much quicker when the WHERE-clause is used for example, but give up some flexibility as the columns and their data-types must have been declared beforehand.

  • Custom data types are supported, but classes must implement the standard Serializable, or, preferably, the Externalizable interface, so this will in many cases not work on your classes without some additional work.

  • In Embedded mode, Derby can load JAR files stored in the database as if they were on the user's ClassPath.

  • Temporary in-memory only databases can easily be created. Those databases are removed from memory when disconnecting or shutting down the server. Those are supported both in Embedded and in Server/Client mode.

  • The newest version contains (currently experimental) built-in support for the Apache Lucene high-performance search indexer project.

  • SQL support is getting better and better with each release. New is the MERGE statement that INSERTS, UPDATES and DELETES records with one single statement.

  • Derby comes with built-in support for the Java Management Extensions (JMX) , to remotely monitor the Derby processes in real-time.

  • For testing and development purposes only, Derby comes with a simple Servlet web-application that can be used to do some common administration tasks

Embedded / Client/Server Modes

Embedded Mode 

Derby comes with a JDBC driver that contains its complete database engine embedded in it. The JAR file containing this driver is only about 2.5 MB (yes, still a lot bigger than the public-domain SQLite written in C, but still very acceptable). In Embedded Mode applications store their databases on the local file-system, or in temporary in-memory only databases.  Databases are available on a single JVM instance only, but can still accept multiple connections and fully supports multi-threading (but, as mentioned, those are not accessable from other JVM instances). 

As mentioned in the previous section, in this mode Derby can load JAR files stored in the database as if they were on the user's classpath. This, like stored procedures written in Java, requires some extremely careful planning on the developer's side, but could probably be useful in some specific situations. Also, the driver has some additional features to load read-only databases directly from local JAR files that are available on the user's classpath.

Client/Server Mode

In Client/Server Mode, a Derby Network Server runs on a server that is available via a standard TCP/IP port. A specific Derby Network JDBC Driver is now used to connect client connections to the databases. In this mode databases are available to any user that can connect to it, from any JVM instance.

Several options of user authentication are available, including storing name/password in simple plain text Java properties file format, using a LDAP server or storing usernames and passwords directly in the Derby database. User authorization can be controlled per database by standard SQL GRANT/REVOKE commands, or users can be grouped in either READ/WRITE or READ-ONLY groups. The authorization features could be improved, from what I understand any user that has server access level can shutdown the Network Server with a single command... :(

A standard Java Security Manager can be used to restrict the rights of the database server.

Derby in use

  • Derby only supports the JDBC standard to let clients connect to its databases. Many non-JVM programming languages, like Node.js and Python, have libraries that provide basic JDBC support.

  • As mentioned, Derby's SQL support is strong. It is quite strict though. If you, like me, are familiar with default Postgresql setups, it will take some time to get used to put every schema, table and column names in your SQL queries between double quotes, because otherwise they are automatically stored in UPPERCASE format. Once you use quoted names, working with unquoted names does not work any more ("SELECT * FROM person", does not work when the table name is "Person"). This restriction is part of the official SQL standard, but most DBMS'es are more forgiving or can be configured to be completely case insensitive. As far as I know this is not possible with Derby. I believe, however, that being forced to write strict SQL queries that adhere fully to the official SQL standard is not necessarily a bad thing.

  • I believe most major ORM libraries offer Apache Derby support. Oracle's JavaDB component probably helps here. For example, the popular Hibernate ORM comes with a Derby dialect.

  • Derby comes with the "ij" command-line tool that can be used to create databases, write queries, etc. Since this is a default Java console application, I really miss some basic features like cursor keys support. I believe this can be added by adding some external JAR files to the classpath, but could not get this to work yet unfortunately. This makes using this application a frustrating experience to me every time I boot it up.

  • Luckily Derby is supported by a lot of Java developer and JDBC SQL tools. For example, I am very happy with NetBeans IDE's Databases features on the Services tab. I find it a joy to add records, tables to, and execute queries on my Derby database this way. I'm sure the other famous IDEs offer similar features as well.  I had reasonable success with SQuirreL SQL Client as well.

  • The quality of the standard documentation is generally quite good, especially for an open source project. I presume this is because of the commercial roots of Derby. The organization of the information sometimes leaves something to be desired, as content is spread between different PDF/HTML files and sometimes content you'd expect to be in one manual is actually described in one of the others.

To be continued...

In the next installment of this series, i plan to write about setting up and configuring Derby.



No one has commented yet.
Comments are closed for this entry.