One of Python's strengths is the large set of libraries associated with the language. The developers who have gathered around Python build on each other's contributions, and the result is an incredibly rich ecosystem. Originally dedicated to scientific computing but used well beyond the world of research, NumPy is a good example of a cleverly designed tool that nevertheless retains the accessibility characteristic of Python.
What is NumPy?
NumPy is a Python library dedicated to computation on arrays, that is, large volumes of numbers. This library provides “standard” representations of tabular numeric data.
Python data structures
By nature, Python provides rich data structure tools:
- tuples, i.e. sequences of heterogeneous elements that are immutable (they cannot be updated);
- sets, which are unordered collections of unique, immutable objects;
- lists, which are ordered collections of objects;
- ranges (range), which are sequences with a mandatory end value and, optionally, a start value and a step value;
- dictionaries, which associate unique, immutable keys with values that can change…
However, such structures are not ideal for high-performance computing. NumPy was therefore created for such a purpose: to facilitate fast operations on large arrays.
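As a minimal sketch of the structures listed above, and of why bulk numeric work on them requires explicit loops, consider the following (all variable names are purely illustrative and do not come from the article):

```python
import numpy as np

# Built-in Python structures described above
point = (3, "north", 2.5)        # tuple: immutable, heterogeneous
tags = {"a", "b", "a"}           # set: the duplicate "a" collapses
values = [1, 2, 3, 4]            # list: ordered and mutable
steps = range(0, 10, 2)          # range: start, mandatory stop, step
ages = {"ada": 36, "alan": 41}   # dict: unique keys mapped to values

# Doubling every element of a list needs an explicit loop (here a comprehension)...
doubled_list = [2 * v for v in values]

# ...whereas a NumPy array applies the operation to the whole array at once.
doubled_array = 2 * np.array(values)
```

Both results hold the same numbers; the difference is that the NumPy version hands the loop to compiled code, which is what makes it attractive for large arrays.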
How was NumPy created?
It was an American researcher, Travis Oliphant, who created NumPy in the early 2000s, with the aim of defining a data structure that facilitates calculation on arrays.
Oliphant’s ambition was to be able to combine two calculation libraries that had appeared before:
- Numeric, created in 1995, already made it possible to manipulate arrays of numbers in Python. Numeric was created by several developers, including the programmer Jim Hugunin and Konrad Hinsen, a CNRS researcher;
- Numarray (2001), which was the work of three researchers from the Space Telescope Science Institute, an organization founded by NASA: Peter Greenfield, Rick White and Todd Miller. Their ambition was to help analyze the vast volumes of data transmitted by the Hubble Space Telescope.
Numeric was coded in C for efficiency, while Numarray was written entirely in Python. As a result, Numarray, although easier to maintain, ran more slowly than Numeric, and tension arose in the scientific world between the proponents of the two approaches.
During a collaborative workshop, the developers of Numeric and Numarray asked themselves: could there be a single library that would embrace both models and even surpass them?
Travis Oliphant, a young professor at a university in Utah, undertook the colossal task of building such a library, which he named NumPy, at the risk of jeopardizing his academic career. NumPy was written partly in Python. However, some elements requiring very fast calculations were written in C or C++.
NumPy was ready in 2005. On its release, users were surprised by the number of functions it offered, by the flexibility of its tools (especially when used through an interface such as Jupyter Notebook), and by its speed. Travis Oliphant then set about convincing the Python scientific community to adopt his tool widely, which meant that those interested had to adapt programs written for Numeric or Numarray.
And so, NumPy has gradually established itself as the standard for calculations on arrays. The NumPy library is completely open source and accessible to any Python developer from the site numpy.org.
A memory management model
Gaël Varoquaux, Research Director at Inria, explains what makes NumPy special: “One of the reasons NumPy has been so successful is that it has a well-defined way of accessing memory and allows memory to be shared between various numerical tools without having to worry about managing that memory. In other words, with NumPy, memory can be manipulated for numerical computation without having to deal with questions such as ‘where is a particular portion of memory located’, ‘which portion has just been released’, ‘which is not free’ or ‘which piece of memory corresponds to what’, etc.
“Python basically does this for the objects it manipulates, and NumPy applies this approach to large arrays of numbers that can be passed to scientific libraries. To take an analogy: if I give someone a person's phone number, I am not handing over the person, only their contact details. Similarly, the idea of a tool such as NumPy is not to copy information but to pass around pointers to the values,” continues Gaël Varoquaux.
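The idea of passing references rather than copies can be illustrated with array views. The following is a small sketch, not taken from the article; the variable names are hypothetical, while `np.shares_memory` and `.copy()` are standard NumPy calls:

```python
import numpy as np

data = np.arange(10)     # one block of memory holding 0..9
window = data[2:6]       # a slice is a view: no values are copied

window[0] = 100          # writing through the view...
# ...changes the original array, because both names refer to the same memory
shared = np.shares_memory(data, window)

independent = data[2:6].copy()   # an explicit copy owns its own memory
independent[0] = -1              # leaves `data` untouched
```

After this code runs, `data[2]` holds 100 (written through the view), and the change made to `independent` is not visible in `data`.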
How does NumPy work?
The NumPy package consists of:
- the NumPy array structure, known as the N-dimensional array (ndarray);
- associated mathematical operations: exponential, sine, statistical tools…;
- functions for manipulating the elements of an array.
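To make these three components concrete, here is a brief sketch using a few well-known NumPy functions (`np.sin`, `np.exp`, `np.mean`, `np.sort`, `np.concatenate`); the choice of functions and the sample values are illustrative assumptions, not taken from the article:

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])

# Mathematical operations applied element by element
sines = np.sin(angles)
growth = np.exp(np.array([0.0, 1.0]))

# A simple statistical tool
average = np.mean(np.array([2, 4, 6, 8]))

# Functions for manipulating the elements of an array
ordered = np.sort(np.array([3, 1, 2]))
joined = np.concatenate([ordered, np.array([4, 5])])
```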
A NumPy array is a regular, multidimensional set of elements. It is characterized, first of all, by the type of the elements it contains.
A NumPy array can contain all kinds of elements: dates, booleans, complex numbers… However, each array can only contain a single type of element. For example, we might have an array of 10,000 32-bit floating-point numbers, so we would know from the start exactly how much memory is being used.
A NumPy array can have any number of dimensions, but its shape must be clearly specified. Thus, for the array of 10,000 numbers mentioned above, the shape could be 10 rows x 1,000 columns, 20 x 500, 100 x 100…
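The 10,000-number example can be checked directly. This sketch assumes the array is created with `np.zeros` (any constructor would do) and uses the standard `dtype`, `itemsize`, `nbytes` and `reshape` members:

```python
import numpy as np

a = np.zeros(10_000, dtype=np.float32)   # 10,000 32-bit floats

item_bytes = a.itemsize    # 4 bytes per element
total_bytes = a.nbytes     # 10,000 * 4 = 40,000 bytes

# The same 10,000 values can be given different shapes
grid_a = a.reshape(10, 1000)
grid_b = a.reshape(20, 500)
grid_c = a.reshape(100, 100)
```

Because the element type is fixed, the memory footprint is known in advance and does not depend on the shape chosen.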
Another important element that must be defined is the stride, which indicates how to go from one number to the next. Besides its element type, a NumPy array is thus described by two main attributes:
- shape: the dimensions of the array. For example, (8, 8) defines a two-dimensional array of 8 rows and 8 columns; (4, 4, 4) would define a three-dimensional array;
- strides: the number of bytes that must be skipped in memory to reach the next element along each dimension. For example, in an (8, 8) array of one-byte elements, the strides would be (8, 1): 8 bytes to reach the next row and 1 byte to reach the next column.
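Strides depend on the size of each element, which the following sketch makes visible. It assumes NumPy's default row-major memory layout; the array names are illustrative:

```python
import numpy as np

bytes_grid = np.zeros((8, 8), dtype=np.uint8)      # 1 byte per element
floats_grid = np.zeros((8, 8), dtype=np.float64)   # 8 bytes per element

print(bytes_grid.shape, bytes_grid.strides)     # (8, 8) (8, 1)
print(floats_grid.shape, floats_grid.strides)   # (8, 8) (64, 8)

# Transposing swaps the strides instead of moving any data
print(floats_grid.T.strides)                    # (8, 64)
```

The (8, 1) strides only hold for one-byte elements; with 64-bit floats each step is eight times larger.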
Basically, we could say that a NumPy array describes one or more blocks of data stored in computer memory. This results in an extremely simple manipulation of the elements of an array, and the possibility of applying numerical operations to them.
The code is easy to use as shown in this example where we square a series of numbers:
>>> import numpy
>>> a = numpy.array([2, 4, 6, 8])
>>> a ** 2
array([ 4, 16, 36, 64])
Arrays and matrices
In fact, NumPy introduced two new data types in Python:
- arrays, which can have any number of dimensions;
- matrices, which take the form of arrays but only have two dimensions and benefit from the mathematics specific to matrix calculation, namely linear algebra.
For the record, linear algebra deals with concepts such as lines, planes and vector spaces… And so, if you ask yourself whether NumPy can be used to manipulate matrices, the answer is yes: NumPy offers the np.matrix structure and knows how to handle the linear algebra relating to matrices.
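As a hedged illustration of matrix calculation, the sketch below uses plain two-dimensional arrays with the `@` operator and `np.linalg.solve` rather than `np.matrix`. That is a stylistic choice made here, in line with the current NumPy documentation, which favors regular arrays for new code. The matrix `A` and vector `b` are made-up sample values:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

product = A @ A              # matrix product of A with itself
x = np.linalg.solve(A, b)    # solve the linear system A x = b
```

Multiplying `A` by the solution `x` gives back `b` up to floating-point rounding, which is a quick way to verify the result.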
What is NumPy used for?
As we have seen, NumPy is used for numerical calculation. And since numerical calculation takes the most diverse forms, NumPy is used in many fields. Its applications are countless and range from video games to space exploration. They include:
- image manipulation;
- big data analysis;
- meteorology;
- traffic forecasting;
- calculations of physical deformation of structures;
- climatology applications;
- medical imaging…
Who uses it?
Thanks to the scope of its functions, NumPy has been widely adopted by the scientific community, in laboratories as well as in universities. Because NumPy has many scientific computing modules, it has won over researchers in this field, and a real NumPy ecosystem has developed, particularly in research and experimental mathematics. However, NumPy is also used in industry. In particular, it helps with the numerical simulation of new products and materials that may involve complex physical parameters. Many websites use NumPy for data analysis, so quite often, when a site makes purchase recommendations, the underlying numerical calculations were done using NumPy.
What are the pros and cons of NumPy?
The advantage of NumPy is that, in keeping with the logic of Python, it allows an interactive approach, which makes it feel like using a calculator. And like Python, it produces code that is clean, clear even to the uninitiated, and easy to extend.
The question is often asked: if you want to perform mathematical calculations, why adopt NumPy rather than Pandas? Simply put, the Pandas interface is more user-friendly, but NumPy is much faster, especially on large volumes of data.
On the other hand, NumPy is not well suited to handling data of irregular types. If this matters to you, Pandas may be more appropriate. Another disadvantage of NumPy is that it is not designed to work easily on graphics cards (GPUs), since these operate on their own memory. As a result, other similar libraries have taken off in this environment, including JAX, which Google has developed for artificial intelligence.
How to learn to use it?
Can you consider launching your Data Scientist career with NumPy? Absolutely, if you are willing to learn the ins and outs of this library. The NumPy community offers various learning tools. The numpy.org site takes the novice by the hand with many introductory tutorials. You can also visit the Scipy Lecture Notes website, which has the advantage of covering introductions to NumPy as well as to Python and other libraries such as Matplotlib. The only drawback: these tutorials are in English. However, various training courses are available in French, particularly on YouTube.
Travis Oliphant himself has produced a NumPy learning guide published exclusively on Amazon, but again only in English and at a price that may seem excessive.
What are similar tools?
Before NumPy appeared, the Fortran language was the one prized by the scientific community, and despite its age and the difficulty of learning it, it retains a certain popularity.
Similarly, historically, it is the MATLAB tool that has been particularly popular in this area. It was the first to offer a high-level computing environment and maintains a large user base. The same is true for the Mathematica software published by Wolfram Research since 1988.
In the Python universe, the main alternatives are Google's JAX library and Dask, which builds on NumPy and supports out-of-core computation on arrays too large to fit in memory.