Tuesday, April 21, 2015

Feature comparison of Machine Learning Libraries

Machine learning is a subfield of computer science stemming from research into artificial intelligence. It is a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model from example inputs and using that to make predictions or decisions, rather than following strictly static program instructions.

Every machine learning algorithm constitutes two phases:

1. Training Phase: When the algorithm learns from the input data and creates a model for reference.
2. Testing Phase: When the algorithm predicts the results based on it’s learnings stored in the model.

Machine learning is categorized into:

1. Supervised Learning: In supervised learning, the model defines the effect one set of observations, called inputs, has on another set of observations, called outputs.
2. Unsupervised Learning: In unsupervised learning, all the observations are assumed to be caused by latent variables, that is, the observations are assumed to be at the end of the causal chain.

There is wide range of machine learning libraries that provide implementations of various classes of algorithms. In my coming posts, we shall be evaluating the following open-source machine learning APIs on performance, scalability, range of algorithms provided and extensibility.

1. H2O
2. SparkMLlib
3. Sparkling Water
4. Weka


In the following posts, we shall be installing each of the above libraries and run one implementation of an algorithm available in all. This would give us an insight into the ease of use/execution, performance of the algorithm and accuracy of the algorithm.

No comments:

Post a Comment