We expect a skyscraper to meet its specifications: at a minimum, to support its own weight and withstand an earthquake. Not so with one of the most important technologies of the modern world. With it, we grope in the dark: we tinker with configurations, but until they have been tested, we don’t really know what they can do or where they will fail. This technology is the artificial neural network, which underpins the most advanced artificial intelligence systems to date. Ever more deeply integrated into our society, it filters the social media feeds we receive, helps doctors diagnose diseases, and even influences the length of the prison sentence given to a person convicted of a crime.

In fact, according to mathematician Boris Hanin, “the best approximation of what we know is that… we know next to nothing about how neural networks actually work, or what a really relevant theory on the subject would say.” This professor at Princeton University likes the analogy with another revolutionary technology: the steam engine. At first, these machines were good only for pumping water. Then they powered locomotives, which corresponds perhaps to the degree of sophistication reached today by neural networks. Finally, thanks to the theory of thermodynamics, scientists and mathematicians were able to understand exactly what goes on inside any engine. In the end, this technology led us to the Moon. “First, good engineering, then excellent trains, then a bit of theoretical understanding to get to the rockets,” summarizes Boris Hanin.

Within the sprawling community of artificial intelligence researchers, however, a small group of mathematically minded researchers is trying to build a theory that explains how neural networks work and guarantees that if you build a network by following a prescribed method, it will perform specific tasks. This work is still in its infancy, but in 2018 several studies detailed the relationship between form and function in neural networks. The objective is to lay the very foundations. These studies show that long before you can certify that neural networks can drive cars, you have to prove that they are capable of simple multiplication. This requires some explanation.

### A dog is four hairy legs

Neural networks aim to imitate the human brain, an organ that can be thought of as working by building ever broader abstractions. The complexity of a thought is then measured by the range of basic abstractions you can draw on and by how many times you can combine them into new, higher-level ones. That’s how we learn to distinguish dogs from birds.

“As a human being, learning to recognize a dog means learning to identify four hairy legs,” says Maithra Raghu, a researcher at Google Brain. “Ideally, we’d like our neural networks to do the same.” Abstraction comes naturally to the human brain. Not to neural networks.

They are made of building blocks called “neurons”, connected in various ways. Each can represent an attribute, or a combination of attributes, that the network takes into account at each level of abstraction. When linking these neurons, engineers have to make many choices, including how many layers of neurons the network should have, in other words, how “deep” it is.

Imagine that the challenge is to recognize objects in images. The image enters the system at the first layer. At the next layer, the network may have neurons that simply detect edges in the image. The following layer combines those edges into curves, the next one combines the curves into shapes and textures, and the last one processes shapes and textures to reach a conclusion about the nature of the object: it’s a woolly mammoth!

Each layer combines several aspects of the previous one. Thus, a circle can be seen as a combination of curves, a curve as a combination of segments. Engineers also need to decide on the “width” of each layer, which is how many different features the network takes into account at each level of abstraction. In the case of image recognition, this width can be the number of types of segments, curves and shapes considered.
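To make the two notions concrete, here is a minimal sketch in plain Python of a fully connected network whose depth (number of hidden layers) and width (neurons per hidden layer) are explicit parameters. The function names, the random weights and the ReLU activation are illustrative assumptions, not something taken from the studies discussed in this article.

```python
import random

def build_network(n_inputs, width, depth, n_outputs, seed=0):
    """Return a list of layers; each layer is a (weights, biases) pair.

    depth = number of hidden layers, width = neurons per hidden layer.
    Weights are random placeholders: this network is untrained.
    """
    rng = random.Random(seed)
    sizes = [n_inputs] + [width] * depth + [n_outputs]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers

def relu(x):
    return max(0.0, x)

def forward(layers, inputs):
    """Pass the inputs through every layer; ReLU on all but the last."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_last = index == len(layers) - 1
        activations = pre if is_last else [relu(v) for v in pre]
    return activations

def count_connections(layers):
    """Number of weighted connections in the whole network."""
    return sum(len(row) for weights, _ in layers for row in weights)

# 4 inputs, 2 hidden layers of 3 neurons each, 1 output.
net = build_network(n_inputs=4, width=3, depth=2, n_outputs=1)
print(count_connections(net))  # 4*3 + 3*3 + 3*1 = 24
```

Changing `width` or `depth` changes the number of connections and what each layer can combine, which is exactly the design choice engineers face.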

Beyond the depth and width of a network, engineers must also choose how to wire neurons within and between layers, and what weight to give each connection. Under these conditions, once the objective has been set, how do you know which neural network architecture will best achieve it? A few general rules have emerged. For tasks involving images, engineers use “convolutional” neural networks, which repeat the same pattern of connections between layers over and over. For natural language processing, such as automatic speech recognition or text generation, “recurrent” neural networks seem to work better: in these, neurons can be connected to non-adjacent layers.
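The “same pattern of connections, repeated over and over” can be illustrated with a one-dimensional convolution: a single small set of weights is slid across the input, so every output neuron is wired in the same way. The sketch below is a simplified, hypothetical example in plain Python, not the code of any particular library.

```python
def conv1d(signal, kernel):
    """Slide one shared set of weights (the kernel) across the signal.

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# The same two weights detect a rising or falling edge anywhere in the input.
edge_detector = [-1, 1]
signal = [0, 0, 1, 1, 1, 0]
print(conv1d(signal, edge_detector))  # [0, 1, 0, 0, -1]
```

Because the two weights are reused at every position, the layer needs far fewer distinct parameters than a layer in which each connection has its own weight.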

### Success as a product of trial and error

Beyond these few guidelines, engineers have no choice but to proceed experimentally: they run 1,000 different neural networks and simply observe which one does the job. “Their choices are the product of trial and error,” observes Boris Hanin. “It’s hard, because there are an infinite number of possible choices and no one knows which is the best.”

It would be useful to reduce the amount of trial and error and to understand in advance what a given neural network architecture can deliver. Some recent studies point in this direction. According to David Rolnick, now a professor of computer science at McGill University, “this work aims to write a cookbook that allows you to concoct the right neural network. Depending on what you want to get out of it, you will choose this or that recipe.”

One of the oldest theoretical guarantees concerning the architecture of neural networks is already thirty years old. In 1989, computer scientists proved that a network with only a single computing layer, but with an unlimited number of neurons and unlimited connections between them, can perform any task.

Despite its sweeping wording, this statement turns out to be intuitive and ultimately not very useful. It amounts to saying that if you can identify an unlimited number of lines in an image, you can distinguish all objects using only one layer. In principle, this may be true, but in practice, good luck getting there!

Today, researchers call these wide, flat networks “expressive”, meaning that they are theoretically able to capture a richer set of connections between input data (for example, an image) and output (here, a description of this image). In practice, they are extremely difficult to train: teaching them to actually produce this output is nearly impossible. Their appetite for computing power also exceeds the capacity of our computers.
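The single-layer guarantee described above can be made tangible with a small experiment. The sketch below builds a network with one hidden layer whose weights are set by hand (not learned) so that it interpolates a target function at evenly spaced points; adding neurons shrinks the error. The ReLU activation, the choice of the sine function and all function names are assumptions made for illustration and are not necessarily the setting of the 1989 result.

```python
import math

def relu(x):
    return max(0.0, x)

def fit_single_layer(f, a, b, n_neurons):
    """Hand-set one hidden ReLU layer that interpolates f on [a, b]."""
    step = (b - a) / n_neurons
    knots = [a + i * step for i in range(n_neurons + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step
              for i in range(n_neurons)]
    # Output weight of neuron i = change of slope at knot i.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1]
                                 for i in range(1, n_neurons)]
    offset = f(a)

    def network(x):
        # zip stops at the last neuron, so the final knot is unused.
        return offset + sum(w * relu(x - k)
                            for w, k in zip(out_weights, knots))

    return network

def max_error(f, a, b, n_neurons, samples=1001):
    """Largest gap between f and the network on a grid of sample points."""
    net = fit_single_layer(f, a, b, n_neurons)
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - net(x)) for x in xs)

for n in (4, 16, 64):
    print(n, max_error(math.sin, 0.0, math.pi, n))
```

The error falls as the layer gets wider, but only because the number of neurons keeps growing, which hints at why this guarantee is of limited practical use.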

### Width or depth?

More recently, researchers have tried to figure out how far they can push neural networks in the opposite direction: narrowing the width (fewer neurons per layer) and increasing the depth (more layers). Maybe it’s enough to spot 100 different lines, as long as there are enough connections to turn those 100 lines into 50 curves, which you can combine into 10 different shapes, which give you all the building blocks you need to recognize most objects.

In 2018, David Rolnick and Max Tegmark, then both at the Massachusetts Institute of Technology (MIT), proved that a deeper, narrower network can perform the same functions as a shallow, wide one with exponentially fewer neurons. If you have 100 input variables, the same reliability can be obtained by using 2^{100} neurons in one layer or just 2^{10} neurons distributed over two layers. Combining small elements at successively higher levels of abstraction offers more potential than trying to capture all levels of abstraction at once.

“The notion of depth in a neural network translates the idea that you can reduce anything complicated into a series of simpler things,” explains David Rolnick. “Like on an assembly line.”
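To get a feel for the gap between the two neuron counts quoted above for 100 input variables, the few lines of Python below simply do the arithmetic. The variable names are illustrative; the two figures are the ones given in the text.

```python
# Neuron counts quoted in the text for 100 input variables.
shallow_neurons = 2 ** 100   # one layer
deep_neurons = 2 ** 10       # distributed over two layers

ratio = shallow_neurons // deep_neurons  # equals 2 ** 90

print(deep_neurons)               # 1024
print(len(str(shallow_neurons)))  # number of decimal digits in 2**100
print(ratio == 2 ** 90)           # True
```

A thousand or so neurons is trivial for any computer, whereas the single-layer count is a 31-digit number, far beyond what any machine could store.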

David Rolnick and Max Tegmark proved the usefulness of depth by having neural networks perform a simple task: multiplying polynomial functions. These are expressions in which variables are raised to non-negative integer powers, for example y = x^{3} + 1. They trained the networks by showing them examples of equations and their products. Then they submitted new equations to them. The deeper neural networks learned this task with far fewer neurons than the shallower ones.

Although multiplication is hardly a spectacular operation, David Rolnick believes that the study makes an important point: “If a shallow network is unable to multiply, we should not place any confidence in it.”

### Punk rock sheep

Other researchers are estimating the minimum width necessary. Also in 2018, mathematician Jesse Johnson, who was a researcher at Oklahoma State University before being recruited by Sanofi, proved that, beyond a certain point, no amount of depth can compensate for a lack of width.

To understand this, imagine sheep in a meadow. These are special sheep, of the punk rock type: each one has been dyed one of several possible colors. The task of your neural network is to draw a boundary around all sheep of the same color. In spirit, this resembles image classification: the network has a collection of images (which it represents as points in a higher-dimensional space), and it must group those that look alike.

Jesse Johnson demonstrated that a neural network will fail at this task if the width of the layers is less than or equal to the number of inputs. Each sheep can be described with two inputs: two coordinates, *x* and *y*, that specify its position in the meadow. The neural network labels each sheep with a color and draws a boundary around those of the same color. You will then need at least three neurons per layer to solve this problem.

Specifically, Jesse Johnson showed that if the ratio of width to number of variables isn’t right, the neural network will be unable to draw closed loops, the kind it would need to draw if, say, all the red sheep were grouped in the middle of the pasture. “If none of the layers are wider than the number of input dimensions, some shapes will be impossible to create no matter how many layers you have,” says Jesse Johnson.
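Read as a rule of thumb, the condition quoted above can be written as a one-line check on an architecture. The sketch below is only an illustration of that statement: the function name is hypothetical, and the check expresses a necessary condition as described in the text, not a guarantee that a network passing it will succeed.

```python
def has_layer_wider_than_input(n_inputs, layer_widths):
    """True if at least one layer is wider than the number of inputs.

    Per the statement quoted above, a network failing this check
    cannot create some shapes (such as closed loops), however deep it is.
    Passing the check does not guarantee success.
    """
    return any(width > n_inputs for width in layer_widths)

# Sheep positions have two inputs (x and y).
print(has_layer_wider_than_input(2, [2, 2, 2, 2]))  # False: too narrow
print(has_layer_wider_than_input(2, [3, 3]))        # True
```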

This type of study lays the foundation for a theory of neural networks. For the time being, researchers can make only elementary observations about the relationship between architecture and function, and these observations are very few given the number of tasks that neural networks carry out. While the way these systems are designed won’t change anytime soon, we can already see the beginnings of a new theory of how computers learn, one that promises humanity a journey of greater impact than a round trip to the Moon.


