One of the hottest topics in the field of artificial intelligence is machine learning. The algorithms used by the likes of Netflix are able to nudge us to make decisions we did not even know we wanted to take. But at heart, machine learning algorithms are simply extremely good pattern spotters and will beat humans every time in this field. However, many of the interesting problems in cognition are not represented by the classification problems that machines are good at. We should thus beware some of the hype and treat machine learning as a complement to human skills rather than a replacement.
Machine learning is currently one of the hot topics in the field of artificial intelligence. Its goal is to construct systems that allow computers to progressively improve their performance in a specific task in much the same way that humans learn by repetition. The concept is not new – the name was coined by the pioneer computer scientist Arthur Samuel as long ago as 1959, and it was he who broadly defined the discipline as ‘a field of study that gives computers the ability to learn without being explicitly programmed’. But it is only in recent years that computing power has advanced sufficiently to make significant strides towards realising this goal.
Traditional computer programs represent a series of instructions designed to perform a specific task in a predictable manner. But they run into difficulties in the case of big data applications because the decision trees built into the program (the if-then branches) can simply become too big. Moreover, a traditional program does not change once it has been written, which may not be ideal in a situation where we gather more data and begin to understand it better. A machine learning algorithm (MLA) is designed to be much more flexible. Rather than being based on a series of hard-coded decision rules, an MLA incorporates a very large number of basic rules which can be turned off and on via a series of weights derived by mathematical optimisation routines. It turns out that MLAs are more successful than traditional computer programs in areas such as handwriting and speech recognition and are much better able to deal with tasks such as driving.
MLAs are designed to ‘learn’ how to map from a series of input data to the outputs. Given a series of inputs and outputs, the training phase of the process requires the system to accurately identify the outputs (e.g. how to recognise a car) given the inputs (a digital photograph). The system can be said to have ‘learned’ when it can identify the output with an acceptable degree of accuracy on inputs it has not seen before; total accuracy is rarely achievable in practice. There are essentially five broad categories of machine learning: supervised, semi-supervised, active, unsupervised and reinforcement.
- Supervised learning is currently the most widely practised form of machine learning. In a supervised learning environment, input and output data are labelled (i.e. tagged with informative labels that aid identification), and the MLA's predictions are checked against the labels supplied by a human supervisor and corrected in order to improve its accuracy. Many of us will have encountered login processes on websites that force us to identify all the pictures on the screen that contain particular objects before being allowed to proceed. The resulting data feed a project in which the website visitor is being used as a supervisor.
- In a semi-supervised environment, only a small part of the data is labelled. The MLA learns from this labelled portion together with a much larger pool of unlabelled data.
- An active learning approach allows the MLA to query an external information source for labels to aid in identification.
- Unsupervised learning forces the MLA to find structure in the input data without recourse to labels and without any correction from a supervisor.
- Reinforcement learning involves an autonomous, self-teaching MLA that learns by trial and error with the aim of achieving the best outcome. It has been likened to learning to ride a bicycle, in which early efforts often involve falling off but fine-tuning actions in order to avoid mistakes eventually leads to success.
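As a rough illustration of the supervised case in the list above, the following Python sketch trains a perceptron, one of the simplest weight-based learners, on four labelled examples of the logical AND function. The function names, the learning rate and the choice of data are assumptions made purely for this example; it is a toy under those assumptions, not a description of how any production system is built.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, inputs: Sequence[float]) -> int:
    """Return 1 if the weighted sum of the inputs exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 10,
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Adjust the weights whenever a prediction disagrees with its label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            # The label plays the role of the supervisor: the error is
            # zero when the prediction is right, +1 or -1 when it is wrong.
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical labelled data: the logical AND of two binary inputs.
SAMPLES = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
LABELS = [0, 0, 0, 1]

weights, bias = train_perceptron(SAMPLES, LABELS)
for inputs, label in zip(SAMPLES, LABELS):
    print(inputs, "label:", label, "prediction:", predict(weights, bias, inputs))
```

Nothing in the code states the rule for AND explicitly. The weights are nudged each time the prediction disagrees with the label until the two agree, which is the sense in which the system is said to learn.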
One of the basic tenets of machine learning is the ‘no free lunch’ theorem which says that there is no single algorithm that works for all types of problem. This does not prevent practitioners from seeking one, but like the alchemists of old who sought to turn base metals into gold, it has been a long and (so far) fruitless search. Accordingly, the user has to choose a particular form of MLA to match the type of question they are seeking to answer. Some of the most commonly used algorithms are as follows:
- Nearest neighbour algorithm (k-NN): This method estimates how likely a data point is to belong to one group depending on what group the nearest data points are in. It is a so-called ‘lazy learning’ system because it only starts to run computations when it is required to do so (in contrast to an ‘eager learning’ system which processes data before receiving a query). This is most frequently used when looking for similar items amongst a group of inputs (e.g. given a user likes a particular item, the system will recommend similar items for them).
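- A decision tree algorithm creates a tree structure in which each internal node tests one feature of the input, each branch represents a possible outcome of that test, and each leaf assigns the input to a group. Decision trees are mainly used when we want to allocate a series of inputs to more precise groups.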
- Bayesian methods allow us to encode our prior beliefs about what the data generation process looks like independently of the data. This is a useful technique when we do not have much initial data to start with. As more data become available we are able to update our view of the process to achieve more accurate guesses of the output given the input.
- Neural networks are complex systems in which individual components are linked via artificial synapses which send pulses to each other and trigger responses from other linked components. They are frequently used in deep learning applications.
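To make the first item in the list above more concrete, here is a minimal nearest neighbour sketch in Python. The data points, the group labels and the choice of k = 3 are hypothetical and chosen only for illustration; real recommendation systems are considerably more elaborate.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(
    train_points: Sequence[Tuple[float, ...]],
    train_labels: Sequence[str],
    query: Tuple[float, ...],
    k: int = 3,
) -> str:
    """Return the most common label among the k training points nearest to query."""
    # 'Lazy learning': no work is done until a query arrives, at which
    # point the distance to every stored point is computed.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical data: two well-separated groups of two-dimensional points.
POINTS: List[Tuple[float, float]] = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
]
GROUPS = ["group A", "group A", "group A", "group B", "group B", "group B"]

print(knn_predict(POINTS, GROUPS, (1.1, 1.0)))
print(knn_predict(POINTS, GROUPS, (5.0, 5.1)))
```

The function stores the data as given and does all of its computation at query time, which is what distinguishes a lazy learner from an eager one.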
The pros and cons of machine learning
Although machine learning is currently the next big thing, it is not without its downsides. Obviously, machines can identify trends and patterns more quickly than humans, but systems are susceptible to data errors which can lead to undesirable outcomes. If, for example, a machine is trained on, or encounters, a faulty batch of data the lessons derived may take us in the wrong direction. Suppose, for example, that data on murder rates in Bogotá in Colombia are transposed with those of Bogota in Texas. The machine would simply allocate the rate of 19 per 100,000 people in the former case to the small Texan town without questioning why it has suddenly quadrupled. Any policy implications derived from such analysis would clearly be invalid.
Another common problem attributed to MLAs is that of overfitting. An overfitting model is one that models the training data too well, accounting for all the noise in the data, but is completely flummoxed when faced with out-of-sample data. Conversely, an underfitting model cannot replicate either the training data or the out-of-sample data, which makes it useless for decision-making (see Charts 1 to 3). Overfitting is by far the bigger problem, and the simplest safeguard is to split the data set into smaller subsets, train the model on some of them and test it on the remainder, rotating the held-out subset in order to build up an estimate of the model’s performance on unseen data.
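The splitting procedure described above is usually called k-fold cross-validation. The Python sketch below is one simple way to write it, assuming contiguous folds and mean absolute error as the score. The two toy models (a straight-line fit and a 'memoriser' that stores every training point) and the data are hypothetical, invented to show how a model can be perfect in-sample yet poor out-of-sample.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists, holding out each contiguous fold once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def mean_abs_error(model, predict, xs, ys) -> float:
    """Average absolute gap between predictions and actual values."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, fit, predict, k: int = 5) -> float:
    """Average out-of-sample error over k train/test splits."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        test_xs = [xs[i] for i in test_idx]
        test_ys = [ys[i] for i in test_idx]
        errors.append(mean_abs_error(model, predict, test_xs, test_ys))
    return sum(errors) / len(errors)


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def fit_memoriser(xs, ys):
    """Overfitting toy: store every training pair, fall back to the mean."""
    return dict(zip(xs, ys)), sum(ys) / len(ys)


def predict_memoriser(model, x):
    table, fallback = model
    return table.get(x, fallback)


# Hypothetical data: y = 2x plus a small, fixed disturbance.
NOISE = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.2]
XS = [float(x) for x in range(10)]
YS = [2.0 * x + e for x, e in zip(XS, NOISE)]

memoriser = fit_memoriser(XS, YS)
print("memoriser, in-sample error:", mean_abs_error(memoriser, predict_memoriser, XS, YS))
print("memoriser, cross-validated:", cross_validate(XS, YS, fit_memoriser, predict_memoriser))
print("line fit,  cross-validated:", cross_validate(XS, YS, fit_line, predict_line))
```

The memoriser reproduces its training data exactly, so judging it in-sample would be misleading. Only the cross-validated figures show that the simpler straight line copes better with data it has not seen.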
A final problem is to ensure that the MLA has learned what we want it to. In one early experiment, data scientists tried to teach a system to differentiate between battle tanks and civilian vehicles. It turned out that it learned only to differentiate between sunny and cloudy days and it proved to be useless in real world situations. Again, one solution is to hold back more of the data sample for testing purposes. But more generally it demonstrates the old adage that if you ask a stupid question, you get a stupid answer, and highlights the importance of setting up the MLA in order that it focuses on the question of interest.
Not just science fiction
Most of us have encountered some form of MLA in the course of our everyday life. A simple example is predictive texting which has improved considerably over the course of the last decade based on repeated correction of the initial results (an example of supervised learning). Machines are extremely adept at pattern spotting which makes them very useful for online search engines. If, for example, we input a word into a search engine the MLA is able to find matches based on the most frequently occurring hits, which are likely to be the ones that are most relevant to the biggest number of people.
MLAs are also increasingly useful as a way of facilitating online language translation. There has been a significant improvement in the quality of output from Google Translate over the past couple of years, whilst DeepL is another translation application which is making big inroads. By setting the algorithm to work on a huge range of data, the system not only knows the literal translation of every word but is able to work out the grammatical construction for the sentence given the context in which it appears. Native speakers of many languages may quibble with the results derived from many of these programs but they represent a very good first approximation and can only improve over time as we gain more experience.
Recommendation systems are another everyday example. Around 80% of the content watched on Netflix is chosen on the basis of suggestions made by its recommendation algorithm. Those of you familiar with the Netflix offerings will be aware of the menu of choices which pops up when you log in. Many of those choices are based on previous viewing experience – in that sense it operates like a nearest neighbour algorithm. But it also subtly encourages viewers to look at shows they may not have considered. Each user can be assigned to a number of ‘taste brackets’ depending on the content of the shows they watch and the viewing patterns of other people who have watched the same show. Putting all the information together gives a dynamic profile that is used to make viewing recommendations. It seems to work: Netflix has around 100 million global subscribers and is adding content all the time.
Of course, some of the applications of machine learning can be used for a wider variety of purposes. Take, for example, facial recognition. Facebook has a system that allows people to be notified of any photographs they appear in, even though they have not been tagged. The underlying MLA scans each pixel of a photo to build up a unique template of an individual, against which each uploaded photo is compared.
There are some obvious fraud prevention benefits from such a system, which would prevent one person from passing themselves off as someone else. Such facial recognition software can also be useful for more formal identification purposes such as border crossing – the world’s major airports increasingly deploy automatic gates at the border rather than relying only on manual checks. Security services are also increasingly able to use this technology in their work. Those responsible for the attempted assassination of a former Soviet spy now living in the UK were tracked down by facial recognition methods. The Chinese government is reported to be working to combine the country’s 170 million security cameras with facial recognition systems to create a huge surveillance network. Whilst this could potentially deliver major security benefits it also has significant implications for privacy.
The future of machine learning – beware the hype
It seems that the future of machine learning is bright, and experts in the field tell us that the possibilities are limited only by what we can imagine. The most complex tasks will be cracked by deep learning techniques, which use neural networks with multiple layers and can operate at far greater levels of complexity than most MLAs. Whilst such applications currently require a lot of processing power they also deliver results. One of the best-known applications of deep learning was the program developed by DeepMind, a subsidiary of Alphabet, to master the fiendishly complicated strategy game Go. This is a complex game to program because there are so many permutations that a traditional computer program would require a prohibitively large number of branches to process all the possible moves. But by training a machine using deep learning and reinforcement techniques, which allow the MLA to quickly iterate towards a human strategy, the company designed a program that has beaten the world’s top players.
But equally we have to beware of overdoing the hype. Even deep learning algorithms, for all their potential, are essentially a statistical technique for classifying patterns. But many of the interesting problems in cognition are not the kind of classification problems that MLAs are good at solving. Moreover, MLAs tend to be data-hungry whereas a human can learn abstract relationships with much less data input. All of us have learned to communicate using the abstract concept of language with far smaller data samples than are available to MLAs during their training period. Another well-aimed criticism is that deep learning is akin to a black box system, which makes it difficult even for experts to know what is going on. This has obvious implications for the ability to find errors in routines. It also implies giving the MLA a form of control which people may be uncomfortable with. Finally, deep learning algorithms are only approximations to the data generation process – albeit good ones – which suggests that we cannot always rely on the answers which they generate.
The pace of automation has picked up rapidly in recent years, and advances in machine learning are at the forefront of the dash towards a brave new tech world. Obviously, concerns have been expressed that this will result in machines replacing people for a huge range of tasks. Whilst the potential offered by machine learning is huge, there are many things that it cannot do, so we should not overdo these fears. One thing that humans do well is adaptation, and it is more likely that machines will complement humans in the workplace rather than replace them altogether.