Wednesday 18 May 2016

New Weka 3.8.0 Stable Release

On Friday 15th April NZT we released Weka 3.8.0. I've only just got around to writing about it because, after collapsing from exhaustion right after the release went out, I required immediate vacational therapy with my son at the theme parks on Australia's Gold Coast :-)

Weka uses the Linux model of releases, where an even second digit of a release number indicates a "stable" release and an odd second digit indicates a "development" release. Weka 3.8.0 is the first stable release of Weka since 2008! There are tons of new features and improvements in 3.8.0 compared to 3.6 (the previous stable release). Furthermore, it supports the newest Weka MOOC that launched on the 25th of April and the forthcoming 4th edition of the Data Mining book.

Some of the features added since Weka 3.6 include:

A package management system

The Weka software has evolved considerably since Weka 3.6. Many new algorithms and features (too many to detail here) have been added to the system, a number of which have been contributed by the community. With so many algorithms on offer we felt that the software could be considered overwhelming to the new user. Therefore a number of algorithms and community contributions were removed and placed into plugin packages. A package management system was added that allows the user to browser for, and selectively install, packages of interest. To date, there are 177 packages that can be installed via the package manager.

Another motivation for introducing the package management system was to make the process of contributing to the Weka software easier, and to ease the maintenance burden on the Weka development team. A contributor of a plugin package is responsible for maintaining its code and hosting the installable archive; while Weka simply tracks the package metadata. The package system also opens the door to the use of third-party libraries, something that we'd discouraged in the past in order to keep a lightweight footprint for Weka.

Plugin packages for Weka can be viewed online at the central package metadata repository.

A completely rewritten Knowledge Flow

Weka's Knowledge Flow user interface received a graphical makeover a few years ago, but for 3.8 it has been completely rewritten from scratch. The rewrite includes a brand new engine that is fully multithreaded and supports pluggable execution environments. There is also a radically simplified API for developers to use. New features in the knowledge flow include:
  • Automatic execution of individual steps in separate threads
  • Single-threaded execution for streaming flows
  • Separate executor service for resource intensive steps and tasks
  • Support for attribute selection and prediction boundary visualisation
  • JSON-based flow persistence
  • Support for loading legacy .kfml flow files
  • Settings and preferences at the application and perspective level
  • User-configurable logging level
  • New and simplified API

MTJ-based linear algebra

The old JAMA-based linear algebra routines have been replaced with MTJ. MTJ provides faster pure JVM routines than JAMA and, more importantly, can seamlessly use reference and system-optimised versions of native libraries based on BLAS, LAPACK and ARPACK if available. To that end we have provided three plugin packages - one for each of the major OS's - providing JNI native libraries. Mac OSX users are lucky because system-optimised versions are pre-installed, giving the biggest speed increases. Linux and Windows users get a native reference implementation in their respective packages (which is faster than a pure JVM implementation), but will have to install a system-optimised library if they want the ultimate speed. Multiple linear regression, Gaussian processes, PCA and LDA are some of the schemes in Weka to benefit from MTJ.

Core improvements

Numerous efficiency improvements to core data structures, filters and some classifiers have been made over the years. All of these add up to better memory utilisation and faster execution.

Workbench

Aside from the Knowledge Flow, the other major graphical user interfaces in Weka (Explorer and Experimenter) have remained largely unchanged from Weka 3.6. However, one new user interface - the Workbench - has been added in 3.8.0. This is a unified graphical interface that combines the other three (and any plugins that the user has installed) into one application. The Workbench is highly configurable, allowing the user to specify which applications and plugins will appear, along with settings relating to them.