Training a Machine to Learn

Machine learning: it’s a buzzword that many people like to throw around when discussing a myriad of tech-related subjects, often as a way to convey that this product or that service is “state of the art” and “cutting edge” because it uses machine learning. While in some instances that can absolutely be the case, in most modern applications it’s a bit of an oversell. Regardless, teaching a computer to do a task, and then to improve at it as more data comes in, is no small feat, and it takes a lot of know-how to do properly and efficiently. In this blog post I hope to share what I have learned about machine learning and how it works, to help others who find themselves in a similar spot to me: they somewhat understand how it works, but don’t know the specifics behind machine learning.

So how does machine learning work? In essence, machine learning is a set of techniques that give computers the ability to learn and improve at a task without being specifically programmed to do so. That is what most people familiar with the term understand it to be. But as with many things, we can go deeper. The underlying science of actually having the machine learn its task is pretty complex, and I won’t pretend that I somehow taught myself the math and logic behind it in the time I spent researching this blog post. Simply put, to start the process of machine learning you need to have a computer run through its set task over and over again while introducing more data points and allowing it to sort them out on its own.

This is not as hands-off as many people may think. Just because you wrote the initial algorithm doesn’t mean you can just run it on a dataset and call it a day. Not only do you have to procure and categorize the initial training datasets, you also need to guide the machine in its initial learning process. And I’m not talking about a dataset of tens or even a few hundred data points. For any machine learning algorithm to be successful, you realistically need thousands of data points in each set. The datasets also need to vary in their content in order to give the algorithm the best chance at learning its task.

Take, for example, a common use of machine learning: image identification. Let’s say you want to teach a computer to identify photos that contain a monarch butterfly. You write your algorithm, and now you need to procure a dataset to teach it. What kinds of photos should you train the algorithm on? The simple answer is obviously photos containing monarch butterflies. While that approach is sound, it leaves the door open for false positives and false negatives. Realistically, you need a balance of photos containing monarch butterflies, photos not containing monarch butterflies, photos containing things that look like but aren’t monarch butterflies, photos containing other species of butterflies, and so on. Now keep in mind that you realistically need thousands of data points to teach a machine to do something, and each data point in the training dataset initially needs to be categorized by a human. So take each of those individual data variants, multiply them by at least a thousand, and you can see how teaching a machine to do something can become very difficult. This isn’t to say that you can’t teach a machine to do something with much less data; it just won’t be as accurate or reliable as a machine that was taught using a much, much larger dataset.

Now, with all the data that has been given to the monarch-butterfly-identifying machine, it can begin to chew through it, identify patterns across the images, and form its own method of categorizing the initial training images. There is an entire postgraduate field of study dedicated to what actually happens INSIDE the algorithm, so I will definitely not be covering the ins and outs of that here. Simply put, the algorithm comes up with its own way to categorize these images and applies that method to any new data that comes in, continuously adjusting its own model with each additional data point. This, in its absolute simplest form, is how many machine learning algorithms work.
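To make that a little more concrete, here is a minimal sketch of the workflow described above. This is not the monarch butterfly project itself: the feature arrays are filled with random numbers so the script runs on its own, the group names are made up for illustration, and scikit-learn’s SGDClassifier is just one convenient stand-in for “an algorithm that adjusts its model as more data arrives.” What it shows is a human-labelled, balanced dataset (positives, plain negatives, and look-alike negatives), an initial training pass, and an incremental update as new labelled examples come in.

```python
# A hedged sketch, not the actual project: the arrays below are random
# stand-ins for "features extracted from labelled photos," and SGDClassifier
# is just one model that can keep adjusting itself as new data points arrive.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_per_group, feature_dim = 1000, 64  # thousands of labelled examples per variant

# A balanced, human-labelled dataset: positives, plain negatives, and the
# trickier look-alike negatives mentioned above.
monarchs   = rng.normal(loc=1.0, size=(n_per_group, feature_dim))  # label 1
no_insects = rng.normal(loc=0.0, size=(n_per_group, feature_dim))  # label 0: no butterflies at all
lookalikes = rng.normal(loc=0.6, size=(n_per_group, feature_dim))  # label 0: things that resemble monarchs
other_spp  = rng.normal(loc=0.3, size=(n_per_group, feature_dim))  # label 0: other butterfly species

X = np.vstack([monarchs, no_insects, lookalikes, other_spp])
y = np.array([1] * n_per_group + [0] * (3 * n_per_group))

# Initial training pass over the whole labelled dataset.
model = SGDClassifier(random_state=0)
model.fit(X, y)
print("accuracy on the training set:", model.score(X, y))

# Later, as newly labelled photos come in, the model is nudged rather than
# rebuilt from scratch: the "continuously adjusting" idea from the post.
new_X = rng.normal(loc=1.0, size=(50, feature_dim))
new_y = np.ones(50, dtype=int)
model.partial_fit(new_X, new_y, classes=np.array([0, 1]))
```

In a real project, the random arrays would be replaced by features pulled from actual photos, and the initial fit would be followed by many rounds of incremental updates as humans label more images.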
