Tuesday, 30 June 2015

CPython Integration in Weka

Continuing the interoperability in Weka that was started with R integration a few years ago, we now have integration with Python. Whilst Weka has had the ability to do Python scripting via Jython for quite some time, the latest effort adds CPython integration in the form of a "wekaPython" package that can be installed via Weka's package manager. This opens the door to all the highly optimised scientific libraries in Python - such as numpy, scipy, pandas and scikit-learn - that have components written in C or Fortran. Scikit-learn is a relatively new machine learning library that is increasing in popularity very rapidly (see the latest KDNuggets software poll).

Like the R integration in Weka, the CPython support allows for general scripting via a Knowledge Flow Python scripting step. This allows arbitrary scripts to be executed and one or more variables to be extracted from the Python runtime. Weka instances are transferred into Python as pandas data frames, and pandas data frames can be extracted from Python and converted back into instances. Furthermore, arbitrary variables can be extracted in textual form, and matplotlib graphics can be extracted as PNG images.


The package also provides a wrapper classifier and wrapper clusterer for the supervised and unsupervised learning algorithms implemented in scikit-learn. This allows the scikit-learn algorithms to be used and evaluated within Weka's framework, just as the MLRClassifier from the RPlugin package allows ML algorithms from R to be used. With both RPlugin and wekaPython installed it is quite cool to run comparisons between implementations in the different frameworks - e.g. here is a quick comparison on some UCI datasets (using Weka's Experiment environment to run a 10 times 10-fold cross-validation) between random forest implementations in Weka, R and scikit-learn. All default settings were used except for the number of trees, which was set to 500 for each implementation. Since scikit-learn only handles numeric input variables, both Weka's random forest and the MLRClassifier running R random forest were wrapped in the FilteredClassifier to apply unsupervised nominal to binary encoding (one-hot encoding) so that all three implementations received the same input:


weka.classifiers.sklearn.ScikitLearnClassifier

This classifier wraps the majority of the supervised learning algorithms in scikit-learn. The wrapper supports retrieving the underlying model from Python (as a pickled string) so that the ScikitLearnClassifier can be serialised and used for prediction at a later date.
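
The idea of carrying a Python model around as a pickled string can be sketched in plain Python. This is an illustration only and not the wekaPython implementation: the dictionary of parameters is a hypothetical stand-in for a fitted scikit-learn estimator, the function names are made up for this example, and base64 is assumed here as one way of making the pickled bytes safe to store as text.

```python
import base64
import pickle


def model_to_string(model):
    """Pickle a model object and encode the resulting bytes as ASCII text."""
    return base64.b64encode(pickle.dumps(model)).decode("ascii")


def model_from_string(text):
    """Reverse model_to_string. Only unpickle data from a source you trust."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


def predict(model, values):
    """Label each value 1 if it exceeds the stored threshold, else 0."""
    return [1 if v > model["threshold"] else 0 for v in values]


# Hypothetical stand-in for a fitted estimator: just its learned parameters.
model = {"algorithm": "threshold", "threshold": 0.5}

stored = model_to_string(model)        # text that a wrapper could keep and serialise
restored = model_from_string(stored)   # later: rebuild the model for prediction
predictions = predict(restored, [0.2, 0.9])
```

Because the stored form is an ordinary string, it can travel inside whatever object holds it, which is what makes prediction "at a later date" possible without retraining.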


weka.clusterers.ScikitLearnClusterer

This clusterer wraps clustering algorithms in scikit-learn. It functions in the same way as the ScikitLearnClassifier, so it can be used in any Weka UI or from Weka's command line interface.

Under the hood

The underlying integration works via a micro-server written in Python that is launched by Weka automatically. Communication is done over plain sockets and messages are encoded as JSON structures. Datasets are transmitted as plain CSV and image data as base64-encoded PNG.
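
To make the description above more concrete, here is a rough sketch of that style of exchange using only the Python standard library. It is not the actual wekaPython protocol: the message keys (`command`, `dataset_csv`, `image_png_base64`), the command name and the newline-terminated framing are all assumptions made for illustration, and a local socket pair stands in for the connection between Weka and the server.

```python
import base64
import csv
import io
import json
import socket


def table_to_csv(header, rows):
    """Serialise a small table to CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def send_message(sock, message):
    """Send one JSON message terminated by a newline (assumed framing)."""
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def receive_message(sock):
    """Read until the newline terminator, then decode the JSON payload."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
        if chunk.endswith(b"\n"):
            break
    return json.loads(b"".join(chunks).decode("utf-8"))


# A connected pair of sockets stands in for the Weka <-> Python link.
client, server = socket.socketpair()

fake_png = b"\x89PNG\r\n\x1a\n"  # only the PNG signature, not a real image
request = {
    "command": "put_instances",  # hypothetical command name
    "dataset_csv": table_to_csv(
        ["sepal_length", "class"], [[5.1, "setosa"], [6.3, "virginica"]]
    ),
    "image_png_base64": base64.b64encode(fake_png).decode("ascii"),
}

send_message(client, request)
received = receive_message(server)
client.close()
server.close()

# Decode the payloads on the receiving side.
rows = list(csv.reader(io.StringIO(received["dataset_csv"], newline="")))
image_bytes = base64.b64decode(received["image_png_base64"])
```

JSON escapes any line breaks inside the CSV text, so the single raw newline at the end of each message is enough to mark where it stops in this sketch.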

wekaPython works with both Python 2.7.x and 3.x. As it relies on a few new features in core Weka, a snapshot build of the development version (3.7) of Weka is required until Weka 3.7.13 is released. Numpy, pandas, matplotlib and scikit-learn must be installed in Python for the wekaPython package to operate. Anaconda is a nice Python distribution that comes with all the requirements (and lots more).
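
If it is unclear whether a given Python installation has the required libraries, a quick check along the following lines may help. This is a generic sketch for Python 3 and not part of wekaPython; `check_requirements` is a name made up for this example. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util
import sys

# Module names of the libraries the post lists as requirements.
REQUIRED_MODULES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_requirements(modules=REQUIRED_MODULES):
    """Map each module name to True if this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_requirements()
print("Python", sys.version.split()[0], "at", sys.executable)
for name, found in status.items():
    print(name + ": " + ("found" if found else "MISSING"))
```

Running this with the same `python` executable that is on the PATH when Weka is launched shows which interpreter is being picked up and which libraries it can see.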

20 comments:

  1. Thanks for providing this great functionality. I am however unable to use Python as I get the message "Python Environment not available:" even though I have Anaconda 3 and have defined the environment variables. Can you please advise? Thanks

  2. Assuming that the python executable is in your PATH, and that this PATH is available to Weka when you launch Weka, then there may be an issue with write permissions for python. Where did you install Anaconda, and as which user? The easiest way to get things working is to install Anaconda into your own account as you.

    Cheers,
    Mark.

    Replies
    1. Hello Mark. When I click “get frame fields”, it raises an error like this: “java.net.SocketException: writed failed”. Can you please advise? Thanks.

  3. OK, I reinstalled Anaconda 3 in my own account as you suggested and it now works. Thank you!

  12. Thanks for the article! Could I use these features with the Java API? Could you please provide a code example?

  13. Hi, I am building a GUI-based application in Python that classifies data. It sends the data to Weka, Weka classifies the data, and the result is then shown in the Python GUI application. Could you help with this task?
    Thanks

  17. I want to implement an association rule mining algorithm in Python and run it in Weka 3.9. I then want to compare my newly designed algorithm with the existing association rule mining algorithms in Weka. Is there any way to integrate my Python algorithm into Weka?
