In the previous part, I mentioned that there are two ways an AI can be trained: unsupervised machine learning and supervised machine learning.
The first method doesn’t require human supervision: the student can learn on its own and quickly become the master. With recent developments, this unsupervised approach can even include one AI system training another, a topic I will expand on in a later blog post.
The second method requires humans to be excellent teachers, because for most tasks an AI is unable to verify the correctness of its own results.
A typical task where AI can learn without our supervision is playing games, even the most difficult ones, like chess and Go. We simply show the AI what counts as a “Game Over”, then tell it to try any combination of moves that follows the rules of the game and does not eventually lead to a Game Over. Showing it only what checkmate is, and how the pieces move, is rather simple.
Teaching AI how to play computer games that end with a Game Over screen is even easier. One of the first video games that Google’s DeepMind researchers taught an AI to master was Space Invaders, an arcade classic from 1978. They essentially showed the AI the “Game Over” screen, told it to try key presses that did not lead to that screen, and went home. The next day, when people came back to work, the machine was already better at Space Invaders than presumably any human could ever be. It could even anticipate the trap in the endgame, where you need to shoot the aliens in advance so as not to be shot. All this because the AI could successfully train without supervision.
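To make the trial-and-error idea concrete, here is a minimal sketch of such a learning loop. It is not the system Google used; it is a tiny made-up “corridor” game in plain Python, where the only signal the program ever receives is whether a move ended the game, and it gradually learns which moves to avoid. All names and the game itself are hypothetical, for illustration only.

```python
import random

# A made-up "corridor" game: positions 0..4. Falling into position 0 is Game Over,
# reaching position 4 ends the game safely. The program is never told the rules,
# only whether its last move ended the game and how badly.
ACTIONS = [-1, +1]          # step left or step right
START, PIT, EXIT = 2, 0, 4

def play_move(position, action):
    """Apply one move and return (new_position, reward, game_over)."""
    new_position = position + action
    if new_position == PIT:
        return new_position, -1.0, True    # the "Game Over" screen
    if new_position == EXIT:
        return new_position, +1.0, True    # survived the whole game
    return new_position, 0.0, False

# Score table: how good each action looks from each position, learned from scratch.
q = {(p, a): 0.0 for p in range(5) for a in ACTIONS}
learning_rate, discount, exploration = 0.5, 0.9, 0.2

for episode in range(2000):
    position, game_over = START, False
    while not game_over:
        # Mostly pick the move that looked best so far, sometimes experiment.
        if random.random() < exploration:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(position, a)])
        new_position, reward, game_over = play_move(position, action)
        # Learn: moves that eventually lead to Game Over get lower scores.
        future = 0.0 if game_over else max(q[(new_position, a)] for a in ACTIONS)
        q[(position, action)] += learning_rate * (
            reward + discount * future - q[(position, action)]
        )
        position = new_position

# After training, the learned behaviour steers away from the pit at every position.
for p in range(1, 4):
    best = max(ACTIONS, key=lambda a: q[(p, a)])
    print(f"from position {p}: step {'right' if best == 1 else 'left'}")
```

The real systems replace the small score table with a neural network and the toy corridor with the actual game screen, but the loop of trying moves and penalizing the ones that end in Game Over is the same basic idea.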
Unfortunately, nearly all AI tasks besides playing games are not as simple to formulate, and their results cannot be verified automatically. The output of human face detection, or of detecting animals, plants or objects in pictures and video, cannot be judged as right or wrong by the machine itself. Now you may wonder: how can recognizing human faces be more difficult for AI than winning a chess game against a chess master? The reason is that we inherited the skill of recognizing faces as a valuable survival trait from our ancestors. We did not, however, inherit any skills for playing chess. AI can learn chess all by itself because it knows when a game is won or lost. With faces, however, AI cannot automatically tell whether two faces belong to the same person or to two different people. We can, because our brains are wired to know this.
All this means that most of the time we need to supervise the AI learning process. We need to prepare the input training data and label each example with the correct output. This requires a significant amount of work from us and is time-consuming, to say the least.
Taking the dog breed detection example from the previous part, we need to prepare thousands of dog pictures and label them manually. In the end, even if we are thorough, the AI will still surprise us now and then by making incorrect decisions, because the data we fed it was incomplete or biased, as with the husky dog.
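To give a rough feel for what that supervision looks like in code, here is a hedged sketch of training an image classifier on a folder of manually labelled dog pictures with TensorFlow/Keras, the toolkit I mention at the end of this post. The folder layout, file path and numbers are hypothetical, and a real project would need far more data preparation than this.

```python
import tensorflow as tf

# Hypothetical folder layout: dog_photos/<breed_name>/*.jpg
# Every picture has been labelled by a human simply by placing it in the right folder.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dog_photos", validation_split=0.2, subset="training", seed=42,
    image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dog_photos", validation_split=0.2, subset="validation", seed=42,
    image_size=(224, 224), batch_size=32)
num_breeds = len(train_ds.class_names)

# Reuse a network pre-trained on generic photos and only teach it the breed labels.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # scale pixels to [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_breeds, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The human-provided labels are what make this "supervised" learning.
model.fit(train_ds, validation_data=val_ds, epochs=5)
```

The few dozen lines are the easy part; collecting and correctly labelling the thousands of pictures behind them is where the human work goes.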
The larger the training set, the less likely this is to happen, but the harder it becomes to remove possible garbage, which is a big problem when analyzing database data. Every big database in a company contains incorrect and unwanted data; think of it as a picture of a cow accidentally labelled as a genuine dog and included in the dog breed training set. If we make the training set smaller, the problem can again be incomplete data. In face recognition, a woman with long hair falling over her face, or with her face partially covered by her hands, is still easy for us to recognize, but for the AI it can look like a different person, unless we train it with such pictures as well.
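One common way to “train it with such pictures as well”, without photographing every possible pose, is to generate slightly varied copies of the pictures we already have. Here is a small, hedged sketch using Keras augmentation layers on the training dataset from the previous sketch; which transformations are actually worth using depends entirely on the task.

```python
import tensorflow as tf

# Randomly flip, rotate and zoom each training picture a little, so the network
# also sees subjects that are tilted, mirrored or partly out of frame.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.2),
])

# Applied to the earlier train_ds; the human-made labels stay untouched.
augmented_train_ds = train_ds.map(
    lambda images, labels: (augment(images, training=True), labels))
```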
All the problems described above apply to any data, including data from databases and other data sources, which is what interests companies most. Preparing and cleaning the data, and ensuring it is as complete as possible, is required if you want AI training to be useful and successful. Simply put, we need to master the data ourselves before AI can master it. This is where data automation tools can help tremendously when training AI from any data source. Tools like Discovery Hub® simplify the process of controlling data and can drastically improve its quality. With that control, processes become more efficient and easier to handle, and we gain more confidence that our data is unbiased, complete and of high quality.
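To give a feel for what “mastering the data ourselves” means in practice, here is a small, hedged sketch of the kind of checks a data preparation step performs before any AI training starts. It uses pandas with invented file and column names; dedicated tools such as Discovery Hub® automate far more than this.

```python
import pandas as pd

# Hypothetical export of customer records from a company database.
df = pd.read_csv("customers_export.csv")

# Remove exact duplicates and rows missing values the model will depend on.
df = df.drop_duplicates()
df = df.dropna(subset=["customer_id", "country", "signup_date"])

# Normalise obviously inconsistent values before they become "garbage" in training.
df["country"] = df["country"].str.strip().str.upper()
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
df = df[df["signup_date"] <= pd.Timestamp.today()]

# A quick report of what still looks suspicious, for a human to review.
print(df["country"].value_counts().tail(10))   # rare values are often typos
print(df.isna().sum())                          # remaining gaps per column
```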
To summarize the story, AI will require human intelligence for the foreseeable future in order to be trained well. I also believe that augmenting humans with AI, which many scientists predict as the next logical step in AI development, will likely be our future.
As for myself, my current road into AI includes Google’s TensorFlow and various video and image categorizing neural networks, plus Python as the lingua franca of AI. For any developer who wants to learn how to use AI, I recommend the following links. I promise that once you have touched AI programs and the theory behind them, you will no longer feel wary of computers and robots that can see and talk, and you will treat them as what they are: computers or micro-controllers with an often surprisingly simple AI program and one or more big files of trained data.
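To illustrate that last point, here is a hedged sketch of what such a “surprisingly simple AI program plus a big file of trained data” can look like: a few lines of Python with TensorFlow that load a pre-trained image network (the big file is its downloaded weights) and describe what is in a photo. The photo path is hypothetical.

```python
import numpy as np
import tensorflow as tf

# The "big file with trained data": roughly 14 MB of weights downloaded on first use.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

# Load any photo (hypothetical path) and shape it the way the network expects.
image = tf.keras.utils.load_img("my_photo.jpg", target_size=(224, 224))
batch = np.expand_dims(tf.keras.utils.img_to_array(image), axis=0)
batch = tf.keras.applications.mobilenet_v2.preprocess_input(batch)

# The whole "AI program": one prediction call, then human-readable labels.
predictions = model.predict(batch)
for _, label, score in tf.keras.applications.mobilenet_v2.decode_predictions(
        predictions, top=3)[0]:
    print(f"{label}: {score:.2f}")
```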
Resources for further research