Christoph Kindl edited this page Jan 24, 2015 · 14 revisions

How to start the REST webservice

  1. Download the following TAR file and extract it into the repo directory: download (it contains the folder structure and files needed by the classifier).
  2. Run mvn compile exec:java -Dexec.mainClass="at.ac.tuwien.infosys.dsg.aic.ws2014.g4.t1.webservice.rest.JettyTwitterSentimentRestService" to start the webservice.
  3. Use the GUI (gui/index.html) to test the webservice.

Classifier training -- files

(Processed) training data (ARFF files) -- Sentiment140

(see http://help.sentiment140.com/for-students/ for the original data)

  • 10k entries (5k+,5k-) [download](https://kindl.io/owncloud/public.php?service=files&t=99513d59c214bd98f275e8235c93ae01) (MD5: c1aed1c76841314250997141ca554d6a)
  • 20k entries (10k+,10k-) download (MD5: 347156a815170354da0e9da8aee2f544)
  • 100k entries (50k+,50k-) download (MD5: 313db80631934d4cb09bffa32031b8f1)
  • 200k entries (100k+,100k-) download (MD5: 5b69335910ce5fdad597746d34612b29)
  • 500k entries (250k+,250k-) download (MD5: a5bd44a08fa06b2b27222892f23c9b89)
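
The MD5 checksums listed on this page can be used to confirm that a download arrived intact. The following is a minimal sketch using only the Java standard library; the class name Md5Check and its command-line arguments are illustrative assumptions and not part of the project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of the repository): compares a file's MD5
// digest with the checksum published next to its download link.
class Md5Check {

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the MD5 digest of the given bytes as a lowercase hex string. */
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    /** Streams the file through MD5 so large ARFF or model files need not fit in memory. */
    static String md5Hex(Path file) {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toHex(md.digest());
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java Md5Check <downloaded-file> <expected-md5>");
            return;
        }
        String actual = md5Hex(Paths.get(args[0]));
        boolean ok = actual.equalsIgnoreCase(args[1]);
        System.out.println((ok ? "OK       " : "MISMATCH ") + actual);
    }
}
```

For example, after saving the 10k ARFF file under a name of your choice, pass that path together with c1aed1c76841314250997141ca554d6a; a MISMATCH line suggests a corrupted or incomplete download.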

Trained classifier (WEKA)

(trained with WEKA's standard options; see the WEKA API documentation for the classifier implementations listed below)

  • 10k entries (5k+,5k-)

    • (attributes file): download (MD5: 4d03d6d701b623b2ca4251ae06ccf61c)
    • IBk: download (MD5: c5bba5d081e15986549841ab4dab95a6)
    • NaiveBayes: download (MD5: 66787c1e5010e22ca68c47062d027d78)
    • SMO: download (MD5: 693ce3917e0eb8385a500346f274daad)
  • 20k entries (10k+,10k-)

    • (attributes file): download (MD5: d93e97174dc684852e0c9aa9300a3f43)
    • IBk: download (MD5: 07bf8a04b137dcf27aae5b42e5e49937)
    • NaiveBayes: download (MD5: 44af3b90716d50fb91e57b3d706b640c)
    • SMO: download (MD5: 2b8781b7f8f128b9b4362bbd9813428e)
  • 100k entries (50k+,50k-)

    • (attributes file): download (MD5: 4761367340aa06ca9267464dfa41202c)
    • IBk: download (MD5: 0d08c984cc0ecb22f7346174ee1757f4)
    • NaiveBayes: download (MD5: cc0fd9556b0fc85b6cef2741885e7588)
    • SMO: download (MD5: 38af148e5919cb2a45650612660ea040)
  • 200k entries (100k+,100k-)

    • (attributes file): download (MD5: 119e9759b7e24dc325f2e73042763250)
    • IBk: download (MD5: 926d3455942f0b40cb9b6275dd92a072)
    • NaiveBayes: download (MD5: e60e9b000fdae016162f1cf752623cbb)
    • SMO: download (MD5: 3e08ef1ca8da376bd445a975f69d9709)
  • 500k entries (250k+,250k-)

    • TODO (training took too long and needed too many resources)

Evaluation of classifier

  • 10k entries (5k+,5k-)

    • IBk: correct: 207 instances (41.5663%); incorrect: 291 instances (58.4337%)
    • NaiveBayes: correct: 234 instances (46.988%); incorrect: 264 instances (53.012%)
    • SMO: correct: 262 instances (52.6104%); incorrect: 236 instances (47.3896%)
  • 20k entries (10k+,10k-)

    • IBk: correct: 215 instances (43.1727%); incorrect: 283 instances (56.8273%)
    • NaiveBayes: correct: 232 instances (46.5863%); incorrect: 266 instances (53.4137%)
    • SMO: correct: 259 instances (52.008%); incorrect: 239 instances (47.992%)
  • 100k entries (50k+,50k-)

    • IBk: correct: 238 instances (47.7912%); incorrect: 260 instances (52.2088%)
    • NaiveBayes: correct: 240 instances (48.1928%); incorrect: 258 instances (51.8072%)
    • SMO: correct: 272 instances (54.6185%); incorrect: 226 instances (45.3815%)
  • 200k entries (100k+,100k-)

    • IBk: correct: 242 instances (48.5944%); incorrect: 256 instances (51.4056%)
    • NaiveBayes: correct: 244 instances (48.996%); incorrect: 254 instances (51.004%)
    • SMO: correct: 262 instances (52.6104%); incorrect: 236 instances (47.3896%)
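
Each percentage above is the share of correctly classified instances among all evaluated instances (correct + incorrect, which sums to 498 in every row). The following minimal sketch reproduces the figures from the raw counts; the class and method names are illustrative assumptions and not part of the project.

```java
import java.util.Locale;

// Hypothetical helper (not part of the repository): recomputes the accuracy
// percentages from the correct/incorrect instance counts.
class AccuracyCheck {

    /** Percentage of correctly classified instances. */
    static double accuracyPercent(int correct, int incorrect) {
        int total = correct + incorrect;
        if (total <= 0) {
            throw new IllegalArgumentException("no instances evaluated");
        }
        return 100.0 * correct / total;
    }

    /** Formats with four decimal places, independent of the system locale. */
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.4f", percent);
    }

    public static void main(String[] args) {
        // SMO trained on 100k entries: 272 correct, 226 incorrect.
        System.out.println(format(accuracyPercent(272, 226)) + " %"); // prints "54.6185 %"
    }
}
```

The listed values drop trailing zeros (46.988% corresponds to 46.9880 here), so a comparison should be numeric rather than textual.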