My 30-day Python learning challenge is in its last phase. Throughout the challenge, I have explored the fundamental concepts of the Python language and experimented with some intermediate and advanced concepts as well. I wanted to use this challenge as an opportunity to explore another domain that I have heard a lot about but never tried myself: Machine Learning and Data Science. So for the final days of this challenge, I would like to understand the basic concepts of Machine Learning and Data Science using Python, try building some projects, and share whatever I learn. This should be a good starting point for exploring the domain in more depth in the future.
What is Machine Learning?
Machine Learning is the field of computer science in which computers build models from the data provided to them and gradually improve their ability to solve problems by analyzing that data. It is a subset of Artificial Intelligence that gives computers the ability to perform tasks without being explicitly programmed for them.
Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so. - Wikipedia
Why Machine Learning?
There is a famous quote by Phil Knight, the co-founder of Nike:
“Don't tell people how to do things, tell them what to do and let them surprise you with their results.”
This quote came to my mind while studying the reasons for using machine learning.
If we replace 'people' with 'computers' in the above quote, it becomes:
“Don't tell computers how to do things, tell them what to do and let them surprise you with their results.”
I think that pretty much summarizes why machine learning is useful and so relevant.
Here are some articles in case you are interested in knowing more
Types of Machine Learning
Machine Learning is all about predicting results based on incoming data. It can be broadly classified into the following types:
- Supervised - The input data set is labeled, so the machine learns from examples with known answers.
  - Classification - The labels are categories, and the machine learns to assign new inputs to one of them. For example, given a labeled data set of apples and oranges, it learns to tell apples from oranges.
  - Regression - The labels are continuous numeric values, and the machine learns to predict a number, such as a stock price.
- Unsupervised - When the input data is not labeled, unsupervised algorithms are used to find structure in it.
  - Clustering - Grouping similar data points together
  - Association Rule Learning - Finding items or events that tend to occur together
- Reinforcement - Often also referred to as skill acquisition or real-time learning. Here the machine uses trial and error to determine the best outcome. For example, a computer can teach itself by playing a game a million times to find out how to obtain the highest score.
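To make the supervised case a little more concrete, here is a minimal sketch of classification in plain Python: a 1-nearest-neighbour classifier over a tiny, made-up set of fruit measurements. The feature values (weight in grams and a surface smoothness score) and the function name `classify` are my own illustrative assumptions, not real data or a library API.

```python
import math

# Hypothetical labeled training data:
# (weight in grams, smoothness score from 0 to 10) -> label
training_data = [
    ((150, 8), "apple"),
    ((170, 9), "apple"),
    ((140, 3), "orange"),
    ((130, 2), "orange"),
]


def classify(sample, data=training_data):
    """Return the label of the training point closest to `sample`."""
    # Pick the training item with the smallest Euclidean distance to the sample
    _, nearest_label = min(
        data, key=lambda item: math.dist(item[0], sample)
    )
    return nearest_label


print(classify((160, 7)))  # a heavy, smooth fruit
print(classify((135, 1)))  # a lighter, rough fruit
```

The labels in `training_data` are what make this supervised: the machine never receives a rule for telling the fruits apart, only examples with known answers. In a real project the features would usually be scaled first so that weight does not dominate the distance.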
Basic Steps of Machine Learning
Machine Learning, in general, involves these main steps:
- Importing data from some source
- Cleaning up the data to remove any irrelevant data if needed
- Splitting up data into Training Set and Test Set.
- Creating a model (an algorithm or a function) and training it on the Training Set
- Checking the output against the Test Set
- Improving the model and repeating the above steps
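The steps above can be sketched end to end in plain Python. This is only a rough illustration under my own assumptions: the data is a made-up list of (hours studied, exam score) pairs that follows y = 2x + 50 exactly so the result is easy to check, and `fit_line` and `mean_absolute_error` are hypothetical helper names, not functions from any library.

```python
import random

# 1. Import data: hypothetical (hours studied, exam score) pairs.
#    None marks a missing value.
raw_data = [
    (1, 52), (2, 54), (3, None), (4, 58), (5, 60),
    (6, 62), (7, 64), (8, 66), (None, 70), (10, 70),
]

# 2. Clean up: drop rows with missing values
clean = [(x, y) for x, y in raw_data if x is not None and y is not None]

# 3. Split into a Training Set and a Test Set (75% / 25%)
rng = random.Random(0)  # fixed seed so the split is repeatable
rng.shuffle(clean)
split = int(len(clean) * 0.75)
train, test = clean[:split], clean[split:]


# 4. Create a model: fit y = slope * x + intercept by least squares
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in points)
        / sum((x - mean_x) ** 2 for x, _ in points)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(train)


# 5. Check the output: average prediction error on the unseen Test Set
def mean_absolute_error(points, slope, intercept):
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


error = mean_absolute_error(test, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test error={error:.2f}")

# 6. Improve and repeat: with real data, a large test error would send us
#    back to cleaning the data or choosing a different model.
```

Holding back the Test Set matters because a model can look perfect on the data it was trained on and still do badly on data it has never seen.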
Python for Machine Learning
Machine Learning, or in fact AI in general, is language agnostic. Python, however, is one of the most popular languages among Machine Learning experts, data scientists and large companies, due to its vast community and the plethora of community-built tools that let anyone start exploring the possibilities of this domain.
Python offers:
- A rich library ecosystem
- A low entry barrier - Easy to pick up for someone with no prior experience in programming
- Flexibility
- Platform independence
- Readable syntax
- A huge community
Machine Learning Tools
The tools that I will be exploring to go through the basics of Machine Learning and Data Science are:
- Jupyter Notebook - Comes as a part of the Anaconda distribution
- NumPy - A package for scientific computing with Python
- Pandas - A library for data analysis with Python
- Matplotlib - A library for creating visualizations with Python
There are several other tools, but I will mostly go over the basics of these during the last few days of the challenge. I will explore other tools such as TensorFlow in the future as I learn more about Machine Learning.
I decided to bookmark a few handpicked resources related to Machine Learning and Data Science which would serve as a good reference in future.
Here is an inspiring documentary which I found deeply engrossing
I am quite excited about learning this new domain using Python. I will be sharing the projects that I develop in the upcoming days and hope to learn new concepts in the journey.
Have a great one!