Sean Steinle | [email protected]
- cleaning/prepping the data, loading it into a DataFrame (DF), all that jazz.
- I understand this will be harder than I'm making it sound, but I'm not sure what the exact steps are.
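One possible first step for the cleaning/loading item above, as a rough sketch only: parse each raw message into a flat record using Python's standard `email` module. The function name `parse_email`, the chosen fields, and the sample message are all assumptions for illustration; the real corpus layout and the exact cleaning steps are still open.

```python
from email import message_from_string


def parse_email(raw: str) -> dict:
    """Turn one raw RFC 822 style message into a flat record (one future DF row)."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    # Multipart messages return a list of parts; join them into one string.
    if isinstance(body, list):
        body = "\n".join(str(part.get_payload()) for part in body)
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "date": msg.get("Date", ""),
        "subject": msg.get("Subject", ""),
        "body": " ".join(body.split()),  # collapse runs of whitespace
    }


# Made-up sample message, not taken from the Enron corpus.
SAMPLE = """From: alice@example.com
To: bob@example.com
Subject: Q3 numbers
Date: Mon, 14 May 2001 09:00:00 -0700

Please   review the
attached numbers.
"""

record = parse_email(SAMPLE)
print(record["subject"], "|", record["body"])
```

Assuming pandas is installed, a list of such records can be handed straight to `pd.DataFrame(records)` to get the DF.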
- run basic sentiment analysis on all Enron emails
- general plan:
- use a pre-loaded, simple tagger (likely just Naive Bayes)
- questions:
- how to partition the data (e.g. train/test)?
- I need tagged examples, e.g. ("blah blah blah happy email here", "POS"), right?
- get the tags from the pre-loaded tagger?
- tag manually?
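To make the partitioning and tagging questions above concrete, here is a minimal from-scratch sketch of a Naive Bayes sentiment tagger trained on (text, tag) pairs, plus a seeded train/test split. This is a stand-in, not the pre-loaded tagger the plan refers to; the class and function names, the toy examples, and the use of Laplace (add-one) smoothing are all assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def split(data, test_fraction=0.2, seed=0):
    """Shuffle once with a fixed seed, then hold out the tail as a test set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, examples):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.label_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text):
        """Unnormalized log P(label) + sum of log P(word | label)."""
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy hand-tagged examples (invented, not from the Enron corpus).
TAGGED = [
    ("great news thanks for the happy update", "POS"),
    ("happy to help great work", "POS"),
    ("thanks this is great", "POS"),
    ("terrible results very bad quarter", "NEG"),
    ("bad news this is terrible", "NEG"),
    ("very bad awful meeting", "NEG"),
]

train, test = split(TAGGED, test_fraction=0.5)
model = NaiveBayes().fit(TAGGED)  # trained on everything here only because the toy set is tiny
print(model.predict("great happy thanks"), model.predict("terrible bad"))
```

On real data the model would be fit on `train` only and scored on `test`; where the tags come from (another tagger or manual labeling) is still the open question.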
- customize base ML model
- general plan:
- tweak the original sentiment analysis model with my own features
- I'm almost certainly at the mercy of the best-fitting model out there... right?
- questions:
- which features?
- should I be reading books about Enron for more domain specific knowledge?
- unsupervised ML (Jevon mentioned this)
- which ML technique would be most effective (and practical)?
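As a sketch of what "my own features" could look like in practice, the function below turns one email into a small dictionary of hand-built features. Every feature here is a guess for illustration, and the `WATCH_TERMS` list in particular is hypothetical; real terms would have to come from domain reading or from the data itself.

```python
import re

# Hypothetical watch-list, purely illustrative.
WATCH_TERMS = {"shred", "delete", "offshore", "urgent"}


def extract_features(subject: str, body: str) -> dict:
    """Build a few hand-crafted features from one email's subject and body."""
    words = re.findall(r"[a-z']+", body.lower())
    letters = sum(c.isalpha() for c in body)
    return {
        "n_words": len(words),
        "n_exclaim": body.count("!"),
        # Share of letters that are uppercase; max(1, ...) avoids dividing by zero.
        "caps_ratio": sum(c.isupper() for c in body) / max(1, letters),
        "watch_hits": sum(w in WATCH_TERMS for w in words),
        "is_reply": subject.lower().startswith("re:"),
    }


feats = extract_features("RE: files", "Please DELETE the files! Urgent!")
print(feats)
```

A dictionary like this could be fed to a supervised classifier as extra inputs, or used as the input vectors for an unsupervised technique such as clustering.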
- data analysis
- general plan:
- basic metrics:
- classifier accuracy
- classifier confidence (I would really like a probabilistic rating scheme for each email, e.g. 1-10, with 10 being the most incriminating)
- effectiveness on other email corpora?
- this is probably very hard if at all possible, but it would be really cool.
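The two basic metrics above can be sketched in a few lines: plain accuracy against held-out tags, and a 1-10 rating derived from a class probability. The helper names and the equal-width bucketing are assumptions; a real rating would also need a check that the probabilities are well calibrated.

```python
import math


def accuracy(predicted, actual):
    """Fraction of predictions that match the true tags."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def to_probabilities(log_scores):
    """Softmax over per-label log scores, giving probabilities that sum to 1."""
    top = max(log_scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def rating(probability, levels=10):
    """Map a probability in [0, 1] to an integer rating from 1 to `levels`."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return min(levels, int(probability * levels) + 1)


acc = accuracy(["POS", "NEG", "POS", "NEG"], ["POS", "NEG", "NEG", "NEG"])
# "incriminating" / "benign" are hypothetical class labels.
probs = to_probabilities({"incriminating": math.log(3), "benign": math.log(1)})
print(acc, probs, rating(probs["incriminating"]))
```

The same `accuracy` function could be run against a second email corpus to get a first read on how well the model transfers, assuming tagged examples exist for it.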