Tyler Clemons

Weka and Ruby

Posted on May 06, 2010, under Computer Science, Data Mining, rjb, Ruby, Weka

Weka is a collection of machine learning tools used for data mining.  Weka is written in Java; however, it is possible to use Weka’s libraries inside Ruby.  To do this, we must install Java and Rjb, and of course obtain the Weka jar file.  One could use JRuby, but I wanted to try this method to avoid the dependency on JRuby.

Setup

The first step is to install Java.  That’s very simple: just follow this link to download Java.  Be sure to download the JDK.

Rjb, which stands for Ruby Java Bridge, is a Ruby gem.  You can either install the gem from the command line:


gem install rjb

or download the gem from this link.  The readme inside the gem is very easy to follow.

The last step in setup is obtaining Weka.  On the download page of Weka, choose “Other platforms (Linux, etc.)”.  Inside the download, we find a jar file called “weka.jar”.  This is what we need to continue.
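Before moving on, it can help to confirm that the pieces are in place.  The following is only a minimal sketch and not part of the original setup: setup_problems is a hypothetical helper name, and it assumes weka.jar was copied into the current directory.  It checks for the jar file and the rjb gem, nothing more.

```ruby
# Hypothetical helper (not part of Rjb or Weka): lists what still looks missing.
def setup_problems(jar_path = './weka.jar')
  problems = []

  # weka.jar must exist at the path we later hand to Rjb::load
  problems << "weka.jar not found at #{jar_path}" unless File.exist?(jar_path)

  # The rjb gem must be installed for `require 'rjb'` to succeed
  begin
    require 'rjb'
  rescue LoadError
    problems << 'rjb gem is not installed (try: gem install rjb)'
  end

  problems
end

problems = setup_problems
if problems.empty?
  puts 'Setup looks complete.'
else
  problems.each { |problem| puts "Missing: #{problem}" }
end
```

An empty list only means the two files were found; it does not prove that the JVM itself will start.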

Running Weka

The following is an example of running the SimpleKMeans clusterer.


require 'rjb'

#----------------------
def kmeans()
  # Load the Java jar
  dir = "./weka.jar"
  # Have Rjb load the jar file, and pass Java command line arguments
  Rjb::load(dir, ["-Xmx1000M"])

  # Make the k-means clusterer
  obj = Rjb::import("weka.clusterers.SimpleKMeans")
  kmeans = obj.new

  # Load the data using Java and Weka
  labor_src = Rjb::import("java.io.FileReader").new("labor.arff")
  labor_data = Rjb::import("weka.core.Instances").new(labor_src)

  # Build the clusterer and output the k-means data
  kmeans.buildClusterer(labor_data)
  puts kmeans.toString

  # Examine the particular data points
  points = labor_data.numInstances
  points.times {|instance|
    cluster = kmeans.clusterInstance(labor_data.instance(instance))
    point = labor_data.instance(instance).toString
    puts "#{point} \t #{cluster}"
  }
end

#----------------------
kmeans()

This is a simple example of how to use K-means.  We first tell Rjb to load the specified classpath, which for us is our jar file.  The JVM argument I passed, -Xmx1000M, sets the maximum Java heap size to 1000 MB.
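If more than one jar is needed, the classpath handed to Rjb::load is an ordinary Java classpath string.  Here is a minimal sketch of building one.  Note that build_classpath is a hypothetical helper and ./lib/extra.jar is a made-up path, neither comes from Weka or Rjb.  The Rjb::load call itself is left as a comment so the snippet runs without a JVM.

```ruby
# Hypothetical helper: joins jar paths into one Java classpath string.
# File::PATH_SEPARATOR is ":" on Linux/macOS and ";" on Windows,
# which matches what the JVM expects on each platform.
def build_classpath(jars)
  jars.map { |jar| File.expand_path(jar) }.join(File::PATH_SEPARATOR)
end

# './lib/extra.jar' is a made-up example path, not something Weka ships.
classpath = build_classpath(['./weka.jar', './lib/extra.jar'])

# JVM options are plain strings, the same as on the java command line.
jvm_args = ['-Xmx1000M']

puts classpath
# With the rjb gem installed, the JVM would then be started with:
#   require 'rjb'
#   Rjb::load(classpath, jvm_args)
```

Expanding each path to an absolute one is a design choice here, so the classpath keeps working if the script later changes its working directory.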

Rjb::import loads a specific class by its fully qualified name, which is resolved against our classpath.

I call a class’s constructor with Ruby’s .new method.  Afterward, I can use the resulting object like any other Ruby object.  The method names are the same as in the Java source files.  For an explanation of data type conversions, click here.

The dataset, labor.arff, is found inside the data folder downloaded with weka.jar.

I will probably look at the other classifiers and post some more examples.  As I said before, this can also be done in JRuby.  A great example can be found in this blog post, which inspired mine.
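If the data folder is not at hand, a small hand-written ARFF file can stand in for labor.arff.  The sketch below writes one.  The relation name, attributes, values, and the file name tiny.arff are all made up for illustration; only the overall layout (an @relation line, @attribute lines, then rows under @data) follows the ARFF format.

```ruby
# Made-up toy data in Weka's ARFF format: a header (@relation, @attribute)
# followed by comma-separated rows under @data.
TINY_ARFF = <<~ARFF
  @relation tiny

  @attribute width numeric
  @attribute height numeric

  @data
  1.0,1.2
  0.9,1.1
  5.0,5.3
  5.2,4.9
ARFF

# Write the file next to the script; 'tiny.arff' is a hypothetical name.
File.write('tiny.arff', TINY_ARFF)
puts "Wrote #{TINY_ARFF.lines.count} lines to tiny.arff"
```

Passing "tiny.arff" instead of "labor.arff" to the FileReader in the script above should then exercise the same code path, just with far fewer rows.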


3 Comments for this entry

  • Ryan

    Excellent post! Helped me a lot. I am trying to get a classifier working in ruby but keep running into a “Class attribute not set!” error. Hopefully you get those classifier examples up soon 🙂

  • Tyler

    iirc that error means you have not specified which one of your attributes is the class attribute. That is usually your last variable.

    I have an example of a classifier; I made a post using decision trees here.

    The line I used:
    labor_data.setClassIndex(labor_data.numAttributes() - 1)

    should set the last attribute as the class variable.

  • Janit

    I want to build a decision tree on the Ruby platform (using data in MySQL) and want to use Weka to implement the tree. Can you please help me, or email me, about how to go about it?
