Tyler Clemons

Weka, Ruby, Association Mining

Posted on May 15, 2010, under Data Mining, Ruby, Weka

Weka is a collection of machine learning tools used for data mining.  Weka is written in Java; however, it is possible to use Weka’s libraries inside Ruby.  To do this, we must install Java and Rjb, and of course obtain the Weka jar.  In this example, I look at association mining.

Refer to my previous example for setup instructions.

Running Weka

The following is an example of a frequent itemset finder.


require 'rjb'

#----------------------
def asscm
  # Path to the Weka jar
  dir = "./weka.jar"
  # Have Rjb load the jar file, and pass JVM command-line arguments
  Rjb::load(dir, ["-Xmx1000M"])

  # Make the initial association miner
  obj = Rjb::import("weka.associations.Apriori")
  assc = obj.new

  # Load the data using Java and Weka
  weather_src = Rjb::import("java.io.FileReader").new("weather.nominal.arff")
  weather_data = Rjb::import("weka.core.Instances").new(weather_src)

  # Find the frequent itemsets
  assc.setCar(true)                   # mine class association rules
  assc.setLowerBoundMinSupport(0.25)  # set the minimum support
  assc.buildAssociations(weather_data)
  puts assc.toString
end

#----------------------
asscm

We first tell Rjb to load the specified classpath, which for us is the jar file.  I also passed a JVM argument, -Xmx1000M, which sets the maximum heap size the JVM may use.
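If the jar sits in another folder, or several jars are needed, the classpath string can be assembled before it is handed to Rjb. Here is a minimal sketch in plain Ruby. The helper name jar_classpath and the placeholder file other.jar are my own inventions for illustration, and the Rjb call is left as a comment so the snippet runs without Java installed.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: join every .jar directly inside dir into one
# classpath string, using the platform's separator (":" or ";").
def jar_classpath(dir)
  Dir.glob(File.join(dir, '*.jar')).sort.join(File::PATH_SEPARATOR)
end

# Demonstration with a throwaway directory and empty placeholder jars.
Dir.mktmpdir do |tools|
  FileUtils.touch(File.join(tools, 'weka.jar'))
  FileUtils.touch(File.join(tools, 'other.jar')) # made-up second jar
  puts jar_classpath(tools)
  # With Rjb installed, the result would be passed on like this:
  # Rjb::load(jar_classpath(tools), ['-Xmx1000M'])
end
```

Listing each jar explicitly matters because a bare directory on a Java classpath is searched for class files, not for jars inside it.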

Rjb::import loads a specific Java class by its fully qualified name, which is resolved against our classpath.

I call the constructor for the new classes by using the .new method from Ruby.  Afterward, I can use the new object like any other Ruby object.  The method names are as they are found in their Java source files.  For an explanation of data type conversions, click here.

The dataset I used is different from the one in the previous example because Apriori can’t use numerical values; every attribute has to be nominal.  The dataset is found inside the data folder downloaded with weka.jar.
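To make the idea of minimum support concrete, here is a small pure-Ruby sketch of a level-wise frequent itemset search in the spirit of Apriori. It is not Weka's implementation. Each nominal value is treated as an attribute=value item, the function names are my own, and the rows are made up for illustration rather than taken from weather.nominal.arff.

```ruby
# Fraction of transactions that contain every item of the itemset.
def support(itemset, transactions)
  transactions.count { |t| (itemset - t).empty? } / transactions.size.to_f
end

# Level-wise search: frequent k-itemsets are joined into (k+1)-candidates,
# candidates with an infrequent subset are pruned, and the rest are counted.
def frequent_itemsets(transactions, min_support)
  items = transactions.flatten.uniq.sort
  current = items.map { |i| [i] }
                 .select { |s| support(s, transactions) >= min_support }
  frequent = []
  until current.empty?
    frequent.concat(current)
    size = current.first.size + 1
    candidates = current.combination(2)
                        .map { |a, b| (a | b).sort }
                        .select { |c| c.size == size }
                        .uniq
    candidates.select! do |c|
      c.combination(size - 1).all? { |sub| current.include?(sub) }
    end
    current = candidates.select { |c| support(c, transactions) >= min_support }
  end
  frequent
end

# Made-up rows for illustration only.
rows = [
  %w[outlook=sunny windy=false play=no],
  %w[outlook=sunny windy=true play=no],
  %w[outlook=overcast windy=false play=yes],
  %w[outlook=rainy windy=false play=yes]
]

frequent_itemsets(rows, 0.5).each do |set|
  puts "#{set.join(', ')} (support #{support(set, rows)})"
end
```

With a threshold of 0.5, an itemset has to appear in at least two of the four rows to be kept. Lowering the threshold, as setLowerBoundMinSupport(0.25) does in the Weka example, lets rarer combinations through.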

As I said before, this can also be done in JRuby.  A great example can be found in this blog post, which inspired my post.


4 Comments for this entry

  • ksenia

    Great post, thank you. I am trying to integrate WEKA with my Rails app. I keep getting a Java error:
    ERROR retrieving feed: java.lang.NoClassDefFoundError: weka/clusterers/SimpleKMeans

    Where do you put weka.jar file?

    Thank you,
    Ksenia

  • Tyler

    In one of my old class projects, I created a subfolder called “tools” in the rails root directory. I placed the Weka files there.

    dir = "./tools/"
    Rjb::load(dir, ["-Xmx1000M"])

  • Ksenia

    Another question. Have you done anything with weka.core.Attribute? I am trying to do class evaluation and keep getting the error java.lang.IllegalArgumentException: Illegal pattern character 'e'

    Here is my code:
    fastVector = Rjb::import('weka.core.FastVector')
    attribute = Rjb::import('weka.core.Attribute')

    inputClassVector = fastVector.new(2)
    inputClassVector.addElement("english")
    inputClassVector.addElement("none_english")
    wekaClassAttribute = attribute.new("class", inputClassVector)

    It seems the FastVector may not be correct; I get the error while trying to create the new attribute.

    Thank you,
    Ksenia

  • Tyler

    I think it’s parsing your second parameter as a string with a date format. Attribute has a few constructors and one of them is for date attributes. I’m not sure why it’s confusing the two lol. Try this:

    wekaClassAttribute = attribute.new_with_sig('Ljava.lang.String;Lweka.core.FastVector;', "class", inputClassVector)

    new_with_sig calls new and specifies the signature. The first parameter specifies the exact signature we want, and the remaining are the arguments.
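    The signature string is just each parameter type written as L, the fully qualified class name, and a semicolon, one after another. Here is a small sketch of a helper that builds it. The name rjb_signature is made up, and the single-letter primitive codes follow the JNI convention, which I am assuming Rjb accepts in the same way.

```ruby
# JNI-style codes for primitive types.
PRIMITIVE_CODES = {
  'boolean' => 'Z', 'byte' => 'B', 'char' => 'C', 'short' => 'S',
  'int' => 'I', 'long' => 'J', 'float' => 'F', 'double' => 'D'
}.freeze

# Hypothetical helper: turn a list of Java type names into a signature string.
# Anything that is not a primitive is treated as a class name: L<name>;
def rjb_signature(*types)
  types.map { |t| PRIMITIVE_CODES.fetch(t) { "L#{t};" } }.join
end

puts rjb_signature('java.lang.String', 'weka.core.FastVector')
# prints Ljava.lang.String;Lweka.core.FastVector;
```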
