Tyler Clemons

Weka, Ruby, Decision Trees

Posted on May 15, 2010, under Data Mining, Ruby, Weka

Weka is a collection of machine learning tools used for data mining. Weka is written in Java; however, it is possible to use Weka’s libraries inside Ruby. To do this, we must install Java and Rjb, and of course obtain Weka itself (the weka.jar file used below). In this example, I look at decision trees.

Refer to my previous example for setup instructions.

Running Weka

The following is an example of running a Decision Tree.


require 'rjb'

#----------------------
def dtree()
  # Path to the Weka jar
  dir = "./weka.jar"
  # Have Rjb load the jar file, and pass Java command line arguments
  Rjb::load(dir, ["-Xmx1000M"])

  # make the initial tree
  obj = Rjb::import("weka.classifiers.trees.J48")
  dtree = obj.new

  # load the data using Java and Weka
  labor_src = Rjb::import("java.io.FileReader").new("labor.arff")
  labor_data = Rjb::import("weka.core.Instances").new(labor_src)

  # set the class attribute, here it's the last value, and then build the classifier
  labor_data.setClassIndex(labor_data.numAttributes() - 1)
  dtree.buildClassifier(labor_data)
  puts dtree.toString

  # examine the particular datapoints
  points = labor_data.numInstances
  points.times {|instance|
    theclass = dtree.classifyInstance(labor_data.instance(instance))
    point = labor_data.instance(instance).toString
    puts "#{point} \t #{theclass}"
  }
end

#----------------------
dtree()

We first tell Rjb to load the specified classpath, which for us is the Weka jar file. I also passed a JVM command line argument, -Xmx1000M, which caps the Java heap at 1000 MB.
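As a rough sketch of that loading step, the two arguments to Rjb::load (a classpath string and an array of JVM options) can be built by small helpers. The names weka_classpath, jvm_args and load_weka are my own and are not part of Rjb; the sketch also assumes that extra jars, if any, are joined with the platform's classpath separator.

```ruby
# Hypothetical helpers (not part of Rjb) that build the two arguments
# Rjb::load takes: a classpath string and an array of JVM options.

# Join one or more jar paths with the platform's classpath separator
# (":" on Unix-like systems, ";" on Windows).
def weka_classpath(*jars)
  jars.join(File::PATH_SEPARATOR)
end

# Build the JVM option that caps the heap, e.g. 1000 -> "-Xmx1000M".
def jvm_args(max_heap_mb)
  ["-Xmx#{max_heap_mb}M"]
end

# Start the JVM through Rjb. This needs the rjb gem and a Java install,
# so rjb is only required when the method is actually called.
def load_weka(jars = ['./weka.jar'], max_heap_mb = 1000)
  require 'rjb'
  Rjb::load(weka_classpath(*jars), jvm_args(max_heap_mb))
end
```

Calling load_weka with its defaults should be equivalent to the Rjb::load call in the listing; raising max_heap_mb is the knob to try if Weka runs out of memory on a larger dataset.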

Rjb::import loads a specific Java class by its fully qualified name. Classes are resolved against the classpath we loaded.

I call the constructor of each imported class with Ruby’s .new method. Afterward, I can use the resulting object like any other Ruby object. The method names are the same as in the Java source files. For an explanation of data type conversions, click here.
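Because the proxy objects answer to the Java method names, small helpers can be written in plain Ruby on top of them. One case where this is handy: in Weka's API, classifyInstance returns a double holding the index of the predicted class value, not the label itself, so the loop in the listing prints numbers such as 0.0 or 1.0. The sketch below turns that index into a label through classAttribute and value, both of which are Weka methods. The helper name predicted_label is my own, and I am assuming Rjb hands the Java String back as a Ruby string.

```ruby
# Hypothetical helper: translate the double returned by classifyInstance
# into the class label. `data` is expected to be a weka.core.Instances
# proxy whose class index has already been set, or any object that
# responds to classAttribute.value(i) in the same way.
def predicted_label(data, prediction)
  # classifyInstance returns a double, while Attribute.value takes an int
  data.classAttribute.value(prediction.to_i)
end
```

Inside the loop of the listing it could be used as puts "#{point} \t #{predicted_label(labor_data, theclass)}" to print the label next to each data point.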

The dataset, labor.arff, is found in the data folder that is downloaded along with weka.jar.

As I said before, this can also be done in JRuby. A good example can be found in this great blog post, which inspired mine.


3 Comments for this entry

  • Jimmy

    Minor suggestion: I had to make the following change, otherwise Ruby throws a tIDENTIFIER error.

    classindexX = (labor_data.numAttributes() - 1)
    labor_data.setClassIndex(classindexX)

    Thanks, great post.

  • Ronnie

    How do I evaluate the model using a test dataset that I have? The code I’m using is this:

    evaluator = Rjb::import("weka.classifiers.Evaluation").new(test_data)
    evaluator.evaluateModel(adtree, test_data)

    I keep getting the error:

    `method_missing': Fail: unknown method name `evaluateModel' (RuntimeError)

  • Sham

    The tIDENTIFIER error happens when you copy-paste the code: the minus sign is copied as a dash. The code works perfectly otherwise.
