"Hidden Markov modeling is a powerful statistical framework for time varying quasi-stationary process..." I found this to be a VERY succinct statement of what Hidden Markov models are best for.
Here is another set of quotes from the same chapter that I found interesting:
"...regardless of the practical effectiveness of HMM...it should not be taken as the true distribution form ..."
"...HMM is not going to achieve the minimum error rate implied in the true Bayes MAP decision."

Though I've heard this before, I felt it was well stated here and important to remember. Basically, I believe they are saying that HMMs are effective in practice, and this gives us the ILLUSION that they represent the true distribution when in fact they do not. HMMs are not going to achieve the minimum error rate of MAP even if they achieve a good estimate. Again, this is something that is easy to forget when you use them regularly.
"...without knowledge of the form of the class posterior probabilities required in the classical Bayes decision theory, classifier design by distribution estimation often does not lead to optimal performance."

"This motivates effort of searching for other alternative criteria in classification design...MMI (maximum mutual information) and MDI (minimum discriminative information)..."

I especially liked this because it reminded me that HMM is distribution "estimation," and it linked together, for me, the reasoning for exploring MMI and MDI. I've often wondered why these other criteria are used, and this passage made it clear to me why they are explored.
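To make the "distribution estimation" part concrete, here is a minimal sketch of what an HMM actually computes when scoring data: the forward algorithm for the likelihood of an observation sequence. This is a plain-Python illustration; the two-state weather model and all its parameters are made up for this example, not taken from the book.

```python
# Minimal HMM forward algorithm in pure Python: computes P(observations)
# under an assumed model. The model below (states, transitions, emissions)
# is a hypothetical toy example for illustration only.

def forward(obs, start, trans, emit):
    """Return P(obs) under the HMM defined by (start, trans, emit).
    obs: sequence of observation symbols
    start[s]: initial probability of state s
    trans[s][t]: probability of transitioning from state s to state t
    emit[s][o]: probability of emitting symbol o while in state s
    """
    states = list(start)
    # alpha[s] = P(obs so far, current state = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][o]
                 for t in states}
    return sum(alpha.values())

# Hypothetical two-state weather model.
start = {"rain": 0.5, "sun": 0.5}
trans = {"rain": {"rain": 0.7, "sun": 0.3},
         "sun":  {"rain": 0.4, "sun": 0.6}}
emit  = {"rain": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
         "sun":  {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

likelihood = forward(("walk", "shop", "clean"), start, trans, emit)
```

Training by maximum likelihood just pushes this number up for the training data; the quotes above are pointing out that doing so only gives Bayes-optimal classification if the model family is the true one.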
I ended up putting down "Pattern Recognition in Speech and Language Processing". When scanning through the pattern recognition book below, I found myself losing interest in the Chou book and anxious to pick up the Bishop book. So this week I started reading http://books.google.com/books?id=kTNoQgAACAAJ "Pattern Recognition and Machine Learning" by Bishop. I am finding it easier to understand, mostly because the amount of new material that I haven't been exposed to isn't as dense. I'm only about halfway through the first chapter, but the review is good for me. I'm excited to get to the Neural Network parts because all my study of Neural Networks to date has been about building classifier networks. I'm also interested in building a network that predicts an actual value.
I also stumbled across this article this week: "Natural Language Processing (almost) from Scratch" by Collobert et al. (http://leon.bottou.org/morefiles/nlp.pdf). I've scanned through it quickly and hope to dig into it further when I have more time.

I came across this paper this week as well: "An empirical comparison of supervised learning algorithms" by A. Niculescu-Mizil and R. Caruana (http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf). In it they compare the performance of:
- Boosted Trees
- Random forests
- Bagged Trees
- Support Vector Machines
- Neural Networks
- k nearest neighbors
- Boosted Stumps
- Decision Trees
- Logistic Regression
- Naive Bayes

I'm only familiar with the highlighted ones and am interested in looking into the others when I get a chance. It was interesting that the paper said that Neural Networks seem to be the best choice for general-purpose machine learning, though many of the other techniques can perform better if you tune them to your problem.
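For anyone else unfamiliar with some entries on that list, k-nearest-neighbors is one of the simplest to see in code. Here is a minimal pure-Python sketch (the toy training points are made up for illustration; the paper itself used real benchmark datasets and tuned implementations):

```python
# Minimal k-nearest-neighbors classifier in pure Python.
# Predicts the majority label among the k training points
# closest (Euclidean distance) to the query point.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label) pairs; returns the
    majority label of the k nearest training points to query."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters, labels "a" and "b".
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]

pred_a = knn_predict(train, (0.1, 0.1))   # query near the "a" cluster
pred_b = knn_predict(train, (1.0, 0.95))  # query near the "b" cluster
```

It needs no training phase at all, which is part of why papers like this one use it as a baseline against the heavier methods on the list.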