Introducing Apache Commons Runtime

February 8, 2011

Actually, this is a pretty old post. I’m currently redefining the Apache Commons Runtime as a concept as well as a project. I decided to move some parts of it outside the ASF, since I wish to have total control over the code as an individual rather than as a community member. I might put it back into the ASF, since it’s Apache licensed, but I cannot be sure.

In today’s world Java is used mainly to develop server-based applications. On the other hand, Java doesn’t offer an API that can fully benefit from the features most modern operating systems provide.

The reason I started the Apache Commons Runtime was the lack of features in two other projects I’m working on: Apache Commons Daemon and Tomcat Native. Those two projects cover two areas of developing server-based applications but are not related to one another in any way. Tomcat Native uses APR to create high-performance networking applications, and Apache Commons Daemon turns Java applications into daemons (or services, for those coming from the Microsoft world), allowing those server-based applications to run as such.

The major missing feature is a concept from another project I’m working on: the nice and shiny Apache Httpd that basically drives today’s Internet. That feature is graceful restart, which allows zero downtime for networking applications. In theory it is a very simple concept. You create a socket in one process, which then creates a process that actually handles the connections. Since a socket can be bound to a given ip:port combination only once, the bind has to be done a single time. That socket is then passed to the handler processes, allowing an unlimited number of them. When you need a zero-downtime restart, you just create a new process and inform the old one to stop accepting new connections. The old process handles all pending connections until they are closed and then shuts itself down. Meanwhile the new process continues accepting new connections, so although you restarted the process, your server didn’t refuse any connection.

Something like that is impossible with Java, because there is no way to pass native object descriptors between distinct processes. This leads to service downtime during the shutdown/restart period, which can last from a few seconds to a couple of minutes. That is obviously not acceptable in production systems. The standard way of handling restarts is to introduce multiple backend instances behind a front-end proxy that knows how to handle session affinity, and then restart the backend nodes in sequential order. Anyone who has tried to set up such a system knows how painful its maintenance can be.
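The hand-over between the old and the new process described above can be sketched in a few lines. This is only a model of the bookkeeping (stop accepting, drain pending connections, then shut down); it does not pass any real socket between processes, which is exactly the part that needs native code. The class name Generation and all of its methods are made up for this illustration and are not part of any existing API.

```java
/** Hypothetical model of one server generation; not a real API. */
final class Generation {
    private boolean accepting = true; // false once a newer generation took over
    private int active = 0;           // connections still being served

    /** Registers a new connection; returns false once draining has begun. */
    synchronized boolean tryAccept() {
        if (!accepting) {
            return false;
        }
        active++;
        return true;
    }

    /** Called when a served connection has been closed. */
    synchronized void connectionClosed() {
        if (active > 0) {
            active--;
        }
    }

    /** Called by the new generation to tell this one to stop accepting. */
    synchronized void stopAccepting() {
        accepting = false;
    }

    /** True when this generation is draining and has nothing left to serve. */
    synchronized boolean canShutDown() {
        return !accepting && active == 0;
    }
}
```

In a real server the old process would poll something like canShutDown() after being told to stop, and exit once it returns true, while the new process keeps accepting on the shared listening socket.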

To be able to pass native object descriptors from one process to another inside the JVM, we need a native JNI layer that allows us to duplicate those objects inside the child process. Unfortunately duplicating is not enough, because you don’t have an option to create a Socket object directly from a native descriptor using the standard Java runtime classes. Thus a complete rewrite of the Socket class was needed, and we already had that inside the Tomcat Native project. But…

The problem with APR

Tomcat Native uses the Apache Portable Runtime (APR), which is an operating system abstraction layer. When you look at APR, it is basically what rt.jar is for a Java developer. It offers a consistent API across multiple operating systems. However, APR has a few minor drawbacks and one major one. The major one comes from its being designed for use inside Apache Httpd. For each and every operation APR uses its own memory pool, which allows easier programming by taking care of object life cycle and memory leaks. Sounds familiar? When you try to embed such a system inside an environment that has its own memory and life cycle management, like Java with its Garbage Collector, you are obviously duplicating things. Not only that, APR pools are not thread safe, so everything accessing the pools has to be synchronized.

Inside Tomcat Native the APR pools are used in a way they were not designed for. Each and every object creates its own pool, thus introducing a huge memory overhead for small objects. Each APR pool when created uses 8K of memory, which makes a 90% memory overhead for each socket. Another major problem is object destruction, which is really hard to sync with Java’s GC, since the GC destroys objects in its own order. APR, however, requires objects to be destroyed in LIFO order, and that requires some nasty tricks just to prevent the JVM from crashing. Inside Tomcat Native we represent each native object by a primitive long value, which allows zero GC for those objects, but on the other hand it doesn’t allow using those objects as standard Java objects. Writing an InputStream implementation is almost impossible, because there is no way to inform the object about the native object’s lifetime. In the end we cannot pass the Servlet socket to a third-party library for custom processing, which is something anyone who has tried to use a PDF library with Tomcat Native is aware of.
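To get a feeling for what one pool per socket means in practice, here is a bit of back-of-the-envelope arithmetic. It assumes that "90% overhead" means roughly 90% of each 8K pool stays unused, and the figure of 10,000 open sockets is picked purely for illustration; neither is a measurement. The class PoolOverhead is hypothetical.

```java
/** Rough arithmetic only; the constants are assumptions taken from the text. */
final class PoolOverhead {
    static final long POOL_BYTES = 8L * 1024; // one APR pool, 8K when created
    static final long WASTED_PERCENT = 90;    // assumed share of the pool left unused

    /** Memory held by pools when each socket owns its own pool. */
    static long totalPoolBytes(int sockets) {
        return POOL_BYTES * sockets;
    }

    /** The part of that memory that is pure overhead under the 90% assumption. */
    static long wastedBytes(int sockets) {
        return totalPoolBytes(sockets) * WASTED_PERCENT / 100;
    }
}
```

With 10,000 sockets, totalPoolBytes gives 81,920,000 bytes (about 78 MB) held by pools, of which wastedBytes reports 73,728,000 bytes as overhead.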

For Apache Commons Runtime we are using a forked APR. Well, it’s not actually forked, but rather rewritten without APR pools and without the things that just sit there doing nothing. Like any abstraction layer, APR comes with a huge code base for things that are already done inside the JVM, like string, XML, date and URL processing, to name a few, so we obviously don’t need that code.

Design

The major design requirement for Apache Commons Runtime was ease of use. Java developers are simply scared of anything with native in the title, because this sort of thing breaks their vision of the JVM as a platform-independent system. Thus we have a nice automatic native loader that loads all the needed libraries in the correct dependency order directly from the .jar file. All the user has to do is call the o.a.c.r.Library.load method, and the runtime will dynamically load the native library matching the operating system and the CPU the JVM is running on.
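To give an idea of what such a loader has to decide first, here is a minimal sketch that maps the standard os.name and os.arch system properties to a platform directory name. It is not the actual Library.load implementation; the class NativePlatform, the method resolve and the directory naming scheme are all assumptions made for this example.

```java
import java.util.Locale;

/** Hypothetical helper; the naming scheme is an assumption, not the real loader. */
final class NativePlatform {
    /** Maps os.name / os.arch style values to a directory such as "linux-amd64". */
    static String resolve(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String platform;
        if (os.startsWith("linux")) {
            platform = "linux";
        } else if (os.startsWith("sunos")) {   // Solaris reports os.name as "SunOS"
            platform = "solaris";
        } else if (os.startsWith("mac")) {     // "Mac OS X"
            platform = "macosx";
        } else if (os.startsWith("hp-ux")) {
            platform = "hpux";
        } else if (os.startsWith("windows")) {
            platform = "windows";
        } else {
            throw new UnsupportedOperationException("Unsupported OS: " + osName);
        }
        return platform + "-" + osArch.toLowerCase(Locale.ROOT);
    }

    /** Convenience overload that reads the properties of the running JVM. */
    static String current() {
        return resolve(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

A loader built on top of this would typically copy the matching library out of the .jar into a temporary file and hand its absolute path to System.load, which is the standard way to load a native library from an arbitrary location.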

Supported Operating systems

Unlike the JDK itself, the Apache Commons Runtime is focused on the platforms that are used for server-type applications. Also, since it’s a new project, we can focus on the future, because the applications using it still have to be developed. This means that the selection of supported operating systems is based on how widely they are deployed for server-based applications.

Initially supported operating systems are:

  • Linux
    Kernel 2.6.18 and later. The reason is that we need the kernel to support AIO.
  • Solaris
    Version 10 and later.
  • Mac OS X
    Version 10.5 and later.
  • HP-UX
    Version 11 and later on PA-RISC and Intel Itanium.

Adding support for other operating systems is relatively easy because of the way the native source code is organized. Each platform has its own implementation directory with its own source files for platform-specific code, alongside the common shared code.

This also makes it possible to have strictly platform-specific code together with the corresponding Java classes. Windows Registry and WMI as well as SELinux are examples of such platform-specific features. An exception will be thrown if one tries to use the Windows Registry classes on a Linux platform. Although these classes are not portable, they allow writing portable applications that behave differently and use platform-specific features depending on the platform the application is running on. For example, if the application is running on Windows, the developer can choose to store some configuration data inside the Windows registry, but use files or something else on other platforms.
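The registry-or-files example could look roughly like the sketch below. None of these types exist in Apache Commons Runtime; ConfigStore, RegistryConfigStore, FileConfigStore and ConfigStores are invented here only to show the pattern of trying a platform-specific class and falling back when it throws.

```java
import java.util.Locale;

/** Hypothetical storage abstraction; all names are illustrative only. */
interface ConfigStore {
    String kind();
}

/** Stand-in for a platform-specific class that only works on Windows. */
final class RegistryConfigStore implements ConfigStore {
    RegistryConfigStore(String osName) {
        if (!osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            throw new UnsupportedOperationException(
                "Windows Registry is not available on " + osName);
        }
    }

    public String kind() {
        return "registry";
    }
}

/** Portable fallback that would keep the configuration in plain files. */
final class FileConfigStore implements ConfigStore {
    public String kind() {
        return "file";
    }
}

final class ConfigStores {
    /** Prefers the registry where it exists and falls back to files elsewhere. */
    static ConfigStore forPlatform(String osName) {
        try {
            return new RegistryConfigStore(osName);
        } catch (UnsupportedOperationException e) {
            return new FileConfigStore();
        }
    }
}
```

The application code only ever talks to ConfigStore, so it stays portable even though one of the implementations is not.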

Where to use it and for what

Hard to answer, but its primary target would be high-performance server applications. As an example, take a look at any networking server that occasionally needs a full restart.

Although a modern Java application usually consists of some container that uses class reloading and dynamically reloads almost all classes that make up a server application, if you need to reload the container itself, you will need a full restart. Also, if your system starts throwing out-of-memory errors, you will need to reconfigure the JVM and again do a full restart. With Apache Commons Runtime it is possible to make the application aware of multiple generations, so that while you start a new application instance with a reconfigured JVM, the old generation continues to serve the pending requests.

Another feature that is a consequence of passing socket descriptors between processes is the possibility to run the connection handler, or the application itself, under a different security context from the one used to create those sockets. On Unixes only privileged processes can create sockets that are bound to ports lower than 1024. However, running a process with such high privileges is a security concern, and Apache Commons Runtime allows dynamically downgrading the security context under which the application runs. Unlike the traditional Apache Commons Daemon, where only the effective user id is changed, here you can fully change the user credentials. This means that your custom application will not be able to create any objects that require a privilege not granted to the effective user.

Finally, you have the option to launch multiple child instances, or workers. This increases application availability, because if one of the instances dies because of JVM issues or faulty code, only the client connections that were bound to that instance will be lost. Today this is usually done by running multiple application instances on the same box and fronting them with an Apache Httpd web server with mod_proxy connecting to those instances. This of course introduces an additional point of failure (actually three of them) into the system.

An additional benefit of the worker concept is that you can fine-tune your JVM. Since multiple JVM instances can be launched, each serving a limited number of connections, there is no need for huge memory settings or for 64-bit JVMs just to get past the 3GB limit.
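With plain Java, launching such small workers boils down to building one command line per child JVM and handing it to java.lang.ProcessBuilder. The sketch below only builds the command; the class WorkerCommand, the worker.id property and the main class name passed in are hypothetical, and the hand-over of the listening socket to the worker is not covered because it needs the native layer.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that prepares the command line for one worker JVM. */
final class WorkerCommand {
    /**
     * @param javaHome  JVM installation to use, e.g. System.getProperty("java.home")
     * @param maxHeapMb heap limit for this worker, kept small on purpose
     * @param mainClass worker entry point (made-up name in this example)
     * @param workerId  number handed to the worker via a made-up system property
     */
    static List<String> build(String javaHome, int maxHeapMb, String mainClass, int workerId) {
        List<String> cmd = new ArrayList<>();
        cmd.add(new File(new File(javaHome, "bin"), "java").getPath());
        cmd.add("-Xmx" + maxHeapMb + "m");       // standard HotSpot heap limit option
        cmd.add("-Dworker.id=" + workerId);      // hypothetical property
        cmd.add(mainClass);
        return cmd;
    }
}
```

A parent process could then start a worker with new ProcessBuilder(WorkerCommand.build(...)).inheritIO().start(), repeating that for as many workers as it wants to run.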

Conclusion

Apache Commons Runtime is currently in the development stage as a sandbox project, meaning it’s currently my own toy. When I finalize the API I’ll apply for it to become a standard commons project. Any comments are more than welcome, but keep in mind that sandbox means:

Don’t knock over my castle.
If you want to create your own castle, then ask for your own sandbox
(I’ll copy your castle later if it looks better than mine).
