The modern digital enterprise collects data on an unprecedented scale. Andrew Ng, currently at the startup deeplearning.ai, formerly chief scientist at Chinese internet giant Baidu, and co-founder of education startup Coursera, compares AI to electricity 100 years ago, saying “AI will change pretty much every major industry.” Machine learning (ML) is a popular application of AI that uses algorithms that learn iteratively from data. At its best, ML lets companies find hidden insights in data without explicitly programming where to look.
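Here is a minimal illustration of what “learning iteratively from data” can mean. It assumes the simplest possible model (a straight line) and plain gradient descent. The data points and the `fit_line` helper are hypothetical. The program is never told the rule behind the data. It starts from a guess and repeatedly adjusts its two parameters to shrink the prediction error.

```python
# Hypothetical toy data generated by y = 2*x + 1; the code below is never told this rule.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys, lr=0.02, epochs=2000):
    """Learn slope w and intercept b by repeatedly nudging them to reduce squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient: this repeated update is the "iterative learning".
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_line(xs, ys)
print(round(w, 2), round(b, 2))  # 2.0 1.0
```

Real ML libraries apply the same idea to models with millions of parameters. They also use far more sophisticated optimizers.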
Applications built on ML are proliferating quickly, and the list of well-known uses is long and growing every day. Apple’s Siri, Amazon’s recommendation engine, and IBM’s Watson are just a few prominent examples. All of these applications sift through vast amounts of data and deliver insights matched to users’ needs.
Why is ML exploding in popularity? Largely because its foundational technology is openly available and accessible to organizations without specialized skill sets. Open source provides key technologies that make ML easy to learn, integrate, and deploy into existing applications. This has lowered the barrier to entry and quickly opened ML to a much larger audience.
In the past two years, there has been an explosion of projects and development tools, and most of the consequential ones are open source. TensorFlow is one key example. It is a powerful system for building and training neural networks that detect and decipher patterns and correlations, in a way loosely analogous to human learning and reasoning. Google open-sourced it at the end of 2015.
Main Languages for ML – Open Source Dominates
Open source programming languages are extremely popular in ML thanks to their broad user bases, supportive communities, and suitability for quick prototyping and testing.
Among application languages, Python has a clear lead: almost every major ML package offers a Python interface and robust Python tooling. Python is also practically ubiquitous. It is easy to integrate with applications and provides a wide ecosystem of libraries for web development, microservices, games, UI, and more.
Beyond Python, other open-source languages used in ML include R, Octave, and Go, with more coming along. R and Octave are statistical and numerical computing languages with many tools for data analysis and interactive, sandboxed experimentation. Go, developed and backed by Google, is relatively new and is an excellent server and systems language with a growing library of data science tools. It compiles to native code and runs fast, and its adoption is growing quickly.
Python Tools and Libraries for ML – An Introduction
The great strength of open source lies in the proliferation of powerful tools and libraries that get you up and running quickly. At the core of the Python numerical and scientific computing ecosystem are NumPy and SciPy. These are the foundational libraries on which many other ML and data science packages are built. NumPy provides fast multidimensional arrays and the numerical routines that operate on them. It has been in development since 2006 and just received US$645,000 in funding this summer.
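The short sketch below shows the kind of numerical programming NumPy enables. It standardizes each column of a small matrix without writing explicit Python loops. The matrix values are hypothetical placeholders.

```python
import numpy as np

# A small matrix of hypothetical measurements: 4 samples (rows), 3 features (columns).
data = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
])

# Per-column mean and standard deviation, computed in vectorized C code rather than Python loops.
mean = data.mean(axis=0)
std = data.std(axis=0)

# Broadcasting subtracts each column's mean from every row and divides by its std in one expression.
standardized = (data - mean) / std

print(mean)                       # [5.5 6.5 7.5]
print(standardized.mean(axis=0))  # approximately zero in every column
```

This scaling step is a common preprocessing operation. The higher-level libraries described below build on exactly these array primitives.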
scikit-learn, with 20k stars and 10.7k forks on GitHub, provides simple and efficient tools for data mining and data analysis. It is accessible to everybody and reusable in various contexts. Built on NumPy, SciPy, and matplotlib, scikit-learn is very actively maintained. It covers a wide variety of the most common tasks, including classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. Its permissive BSD license makes it ready for commercial use out of the box.
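As a sketch of scikit-learn’s fit/predict workflow, the example below chains preprocessing and classification into one pipeline. It assumes scikit-learn is installed and uses the Iris dataset bundled with the library. The choice of `LogisticRegression`, the 30% test split, and the random seed are illustrative and not prescribed by the library.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples to measure accuracy on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Preprocessing (scaling) and classification behind a single fit/predict interface.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

All scikit-learn estimators share the same `fit`/`predict` interface. Trying a different algorithm, such as a random forest, usually means changing only the estimator in the pipeline.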
Keras is a Python deep learning library that allows for easy and fast prototyping and does not require significant ML expertise. It was developed with a focus on fast experimentation, so that you can go from idea to result with the least possible delay. Keras can use TensorFlow, Microsoft Cognitive Toolkit (CNTK), or Theano as its backend, and you can switch among the three. Keras has 17.7k stars and 6.3k forks on GitHub. It supports convolutional networks, recurrent networks, and combinations of the two, and runs seamlessly on both CPU and GPU.
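Keras hides the arithmetic of each layer behind a short declaration. The sketch below uses plain NumPy rather than Keras itself, so it runs without any backend. It shows roughly what a stack of fully connected (dense) layers computes: `activation(inputs @ kernel + bias)`. The `dense` helper is hypothetical, and the weights are random placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def relu(z):
    # Rectified linear unit: negative values become zero.
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract each row's max for numerical stability, then normalize rows to probabilities.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def dense(inputs, kernel, bias, activation):
    # Hypothetical helper mirroring the computation of a fully connected layer.
    return activation(inputs @ kernel + bias)


# A batch of 2 samples with 4 features each (placeholder values).
x = rng.normal(size=(2, 4))

# Layer 1 maps 4 inputs to 8 hidden units; layer 2 maps 8 hidden units to 3 class scores.
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

hidden = dense(x, w1, b1, relu)
probs = dense(hidden, w2, b2, softmax)
print(probs.shape)  # (2, 3)
```

In Keras the same stack is declared as two `Dense` layers, `Dense(8, activation="relu")` followed by `Dense(3, activation="softmax")`. The weights are then learned during training instead of being drawn at random, and the chosen backend handles running the math on CPU or GPU.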
TensorFlow is Google’s library for ML, which expresses calculations as a computation graph. With 64k stars and 31k forks, it is among the most popular projects on GitHub and is increasingly used as a common intermediate format for ML projects. Google recommends Python as its primary language, though bindings exist for other languages such as C++, Java, and Go.
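To make “expresses calculations as a computation graph” concrete, here is a toy define-then-run graph in plain Python. This is not TensorFlow’s API. The `Node`, `placeholder`, and `evaluate` names are hypothetical. The sketch only shows the core idea: the graph is built first, and values flow through it only when it is evaluated with concrete inputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One vertex in a hypothetical dataflow graph: an operation plus the nodes feeding it."""
    name: str
    op: Callable[..., float]
    inputs: List["Node"] = field(default_factory=list)


def placeholder(name: str) -> Node:
    # A graph input whose value must be supplied at run time through `feeds`.
    def missing() -> float:
        raise ValueError(f"no value fed for placeholder {name!r}")
    return Node(name, op=missing)


def evaluate(node: Node, feeds: Dict[str, float], cache: Dict[str, float]) -> float:
    """Evaluate a node by first evaluating its inputs (depth-first), caching shared results."""
    if node.name in feeds:
        return feeds[node.name]
    if node.name not in cache:
        args = [evaluate(n, feeds, cache) for n in node.inputs]
        cache[node.name] = node.op(*args)
    return cache[node.name]


# Build the graph for y = a * b + b; nothing is computed at this stage.
a = placeholder("a")
b = placeholder("b")
product = Node("product", op=lambda p, q: p * q, inputs=[a, b])
y = Node("y", op=lambda p, q: p + q, inputs=[product, b])

# Run the graph with concrete values fed in for the placeholders.
result = evaluate(y, feeds={"a": 3.0, "b": 4.0}, cache={})
print(result)  # 16.0
```

Separating graph construction from execution is what lets a framework like TensorFlow analyze the whole calculation before running it. For example, it can compute gradients automatically or place parts of the graph on different devices.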
These three foundational ML tools are all open source and represent just a taste of the many important libraries available to companies building ML strategies.
The Importance of ML Open Source Communities
Open source is built by communities that connect developers, users, and enthusiasts in a common endeavor. Communities give developers useful examples, support, and the motivation of knowing that others are working on the same problems, which proprietary tools often lack. This further lowers the barrier to entry. Many active ML communities are also backed by large players such as Google, Microsoft, Apple, Amazon, and the Apache Software Foundation.