Weka, Ruby, Decision Trees
by Tyler on May.15, 2010, under Data Mining, Ruby, Weka
Weka is a collection of machine learning tools used for data mining. Weka is written in Java; however, it is possible to use Weka’s libraries from Ruby. To do this, we must install Java and Rjb, and of course obtain the Weka jar (weka.jar). In this example, I look at decision trees.
Refer to my previous example for setup instructions.
Running Weka
The following is an example of running a Decision Tree.
require 'rjb'
#----------------------
def dtree()
  # Load Java Jar
  dir = "./weka.jar"
  # Have Rjb load the jar file, and pass Java command line arguments
  Rjb::load(dir, ["-Xmx1000M"])
  # make the initial tree
  obj = Rjb::import("weka.classifiers.trees.J48")
  dtree = obj.new
  # load the data using Java and Weka
  labor_src = Rjb::import("java.io.FileReader").new("labor.arff")
  labor_data = Rjb::import("weka.core.Instances").new(labor_src)
  # set the class attribute, here it's the last value, and then build the classifier
  labor_data.setClassIndex(labor_data.numAttributes() - 1)
  dtree.buildClassifier(labor_data)
  puts dtree.toString
  # examine the particular datapoints
  points = labor_data.numInstances
  points.times {|instance|
    theclass = dtree.classifyInstance(labor_data.instance(instance))
    point = labor_data.instance(instance).toString
    puts "#{point} \t #{theclass}"
  }
end
#----------------------
dtree()
We first tell Rjb to load the specified classpath, which for us is just the jar file. I also passed a JVM command line argument, -Xmx1000M, which caps the Java heap at 1000 MB.
Rjb::import loads specific classes by their fully qualified names, which are resolved against our classpath.
I call the constructor for the new classes by using the .new method from Ruby. Afterward, I can use the new object like any other Ruby object. The method names are as they are found in their Java source files. For an explanation of data type conversions, click here.
The dataset is found inside the data folder downloaded with weka.jar.
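Since the code above simply takes the last attribute as the class, it can be worth checking the ARFF header before handing the file to Weka. The following is a minimal pure-Ruby sketch and not part of Weka or Rjb: the sample header is a made-up toy (not the contents of labor.arff), and arff_attribute_names is a hypothetical helper that only understands simple @attribute lines.

```ruby
# Hypothetical toy ARFF header for illustration; NOT the contents of labor.arff.
SAMPLE_ARFF = <<~ARFF
  @relation toy
  @attribute hours numeric
  @attribute pension {none, some, full}
  @attribute class {bad, good}
  @data
  40, some, good
ARFF

# Collect attribute names from the header, stopping at the @data marker.
# Assumes one @attribute declaration per line; names may be quoted.
def arff_attribute_names(text)
  names = []
  text.each_line do |line|
    line = line.strip
    break if line =~ /\A@data\b/i
    match = line.match(/\A@attribute\s+('[^']*'|"[^"]*"|\S+)/i)
    names << match[1].delete(%q('")) if match
  end
  names
end

names = arff_attribute_names(SAMPLE_ARFF)
class_index = names.length - 1 # same rule as numAttributes() - 1 above
puts "#{names.length} attributes, class index #{class_index} (#{names[class_index]})"
# prints "3 attributes, class index 2 (class)"
```

To inspect a real file, the same helper could be called with File.read("labor.arff"), assuming the file sits in the working directory.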
As I said before, this can also be done in JRuby. A good example can be found in this blog post, which inspired my own.
April 11th, 2014 on 1:16 am
Minor suggestion. I had to make the following change, otherwise Ruby throws a tIDENTIFIER syntax error:
classindexX = (labor_data.numAttributes() - 1)
labor_data.setClassIndex(classindexX)
Thanks. Great post.
December 4th, 2014 on 9:54 pm
How do I evaluate the model using a test dataset that I have? The code I’m using is this:
evaluator = Rjb::import("weka.classifiers.Evaluation").new(test_data)
evaluator.evaluateModel(adtree, test_data)
I keep getting the error:
`method_missing': Fail: unknown method name `evaluateModel' (RuntimeError)
March 3rd, 2015 on 6:44 am
The tIDENTIFIER error happens when you copy-paste the code: the minus sign is copied as a dash. The code works perfectly otherwise.
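One way to guard against this when pasting code from a web page is to convert typographic characters back to ASCII before running it. Below is a minimal sketch under that assumption; asciify_code and the mapping table are hypothetical names for illustration, and the helper replaces every such character, including any that were intended inside strings or comments.

```ruby
# Typographic characters that blog software often substitutes into code,
# mapped back to the ASCII characters Ruby expects.
TYPOGRAPHIC_TO_ASCII = {
  "\u201C" => '"', # left double quote
  "\u201D" => '"', # right double quote
  "\u2018" => "'", # left single quote
  "\u2019" => "'", # right single quote
  "\u2013" => "-", # en dash
  "\u2014" => "-"  # em dash
}.freeze

# Replace each typographic character with its ASCII counterpart.
def asciify_code(source)
  source.gsub(/[\u201C\u201D\u2018\u2019\u2013\u2014]/, TYPOGRAPHIC_TO_ASCII)
end

pasted = "labor_data.setClassIndex(labor_data.numAttributes() \u2013 1)"
puts asciify_code(pasted)
# prints "labor_data.setClassIndex(labor_data.numAttributes() - 1)"
```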