This exercise teaches students the basics of the Jupyter interface (e.g., creating cells, executing cells, tab completion), and teaches them to use `numpy.lookfor` to look up documentation about specific topics.
The main purpose of this exercise is to ensure that students are comfortable with Jupyter and that their environments are functioning properly.
In this section we motivate the need for special-purpose data structures like the `numpy.ndarray` in numerical computing.
The goal of this section is for students to develop an intuition for why Python's built-in data structures are inefficient for large-scale numerical computing.
The goal of this section is to briefly review Python lists. This review will serve as a point of comparison when we introduce numpy arrays in subsequent sections. (Notes in parentheses indicate points of future comparison).
This section also serves to point out features of Python lists that we often take for granted (e.g., the fact that we can put values of different types into a list), which we'll give up with numpy in exchange for better performance.
- Indexing
- Scalar access is 0-indexed. (Same as numpy)
- Negative indexing counts from end of list. (Same as numpy)
- Slicing copies elements at regular intervals from a region of the list. (Syntax same as numpy; different copying semantics)
- Efficiently size-mutable (unlike numpy).
- Can hold objects of different types (unlike numpy).
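A minimal sketch of these behaviors (the values are just illustrative):

```python
values = [1, 2.5, "three", None]  # lists can hold objects of different types

values[0]         # scalar access is 0-indexed -> 1
values[-1]        # negative indices count from the end -> None
values[1:3]       # slicing copies a sub-list -> [2.5, 'three']
values[::2]       # slicing with a step -> [1, 'three']

values.append(4)  # lists are efficiently size-mutable
```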
The goal of this section is to show that numerical algorithms are unacceptably slow when implemented using Python's built-in lists and integers.
We demonstrate this by implementing a matrix multiplication function using matrices represented as lists-of-lists and showing how long it takes to multiply matrices of various sizes. We compare this performance to an alternative implementation in Fortran using f2py (via the Jupyter Notebook's `%%fortran` magic), and we learn that Python's performance for this task is several orders of magnitude worse than the native Fortran implementation.
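A rough sketch of the kind of pure-Python implementation we time here (the exact code and matrix sizes used in the tutorial may differ):

```python
import random

def matmul(a, b):
    """Multiply two matrices represented as lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for x in range(k):
                acc += a[i][x] * b[x][j]
            out[i][j] = acc
    return out

n = 100  # illustrative size
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]
# In the notebook: %timeit matmul(a, b)
```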
In this section, we explain why the Python code we showed in the previous section is so much slower than the native Fortran code.
We highlight two important observations:
- Numerical algorithms on lists of Python ints/floats are slow because we have to pay the overhead of dynamic dispatch on every data element in our lists.
- Numerical algorithms on lists of Python ints/floats are slow because we have to pay the overhead of bytecode interpretation on every operation we perform.
We observe that, very often, both of these overheads are wasteful. If we know we have a list containing only values of a single type, then we don't need dynamic dispatch on each element. Similarly, if we want to perform the same operation in every iteration of a loop, then bytecode interpretation is unnecessary.
In this section, we explain how numpy allows us to fix the performance problems observed in the previous section by giving up some flexibility in how it allows us to use its data structures.
Numpy solves the performance problems listed above by:
1. Providing containers that can only hold one kind of element (which removes the cost of dynamic dispatch).
2. Providing "vectorized" functions that operate on entire arrays at once (which removes the cost of interpretation).

- Show an example of constructing a numpy array from a list/list of lists.
- Illustrate (1) by trying (and failing) to insert an object into an array of ints.
- Illustrate (2) by showing that `ndarray` overloads binary operators like `+` and `*` to perform element-wise operations.
- Show that a dot product using numpy arrays is competitive with Fortran.
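An illustrative sketch of these points (exact values and dtypes may vary by platform):

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])  # construct an array from a list of lists
arr.dtype                         # every element shares one dtype, e.g. dtype('int64')

# (1) Arrays are homogeneous: assigning an incompatible object fails.
# arr[0, 0] = "a string"          # raises ValueError

# (2) Operators are overloaded to work element-wise ("vectorized").
arr + arr                         # element-wise addition
arr * 10                          # element-wise multiplication by a scalar

# Vectorized matrix multiplication stays in compiled code.
arr.dot(arr)
```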
In this section, we show core features and numpy idioms that should form users' primary day-to-day toolbox. This section is the core content of the tutorial.
The main goals of this section are:
- Provide an overview of important numpy features and concepts.
- Provide a set of examples of solving problems by "thinking with arrays".
We're going to use many different ways of creating arrays in the upcoming examples, so we start by surveying the most commonly-used array-creation functions.
The important takeaway from this section is just that there are lots of useful built-in ways to create arrays.
- `np.array`
- `np.arange` / `np.linspace` / `np.logspace`
- `np.full` / `np.empty` / `np.ones` / `np.zeros`
- `np.full_like` / `np.empty_like` / `np.ones_like` / `np.zeros_like`
- `np.eye` / `np.identity` / `np.diag`
- `np.random.RandomState`
- `np.load` / `np.save`
- `np.loadtxt` / `pandas.read_csv`
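A few illustrative calls (not exhaustive; the arguments are just examples):

```python
import numpy as np

np.array([1, 2, 3])                     # from an existing list
np.arange(0, 10, 2)                     # evenly-spaced integers: [0, 2, 4, 6, 8]
np.linspace(0.0, 1.0, 5)                # 5 evenly-spaced floats between 0 and 1
np.zeros((2, 3))                        # 2x3 array of zeros
np.full((2, 2), 7.0)                    # 2x2 array filled with 7.0
np.ones_like(np.arange(4))              # array of ones with the same shape/dtype
np.eye(3)                               # 3x3 identity matrix
np.random.RandomState(42).randn(2, 2)   # reproducible random numbers
```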
Now that we know how to create arrays, it's useful to know how to change arrays' shape. In this section, we introduce the `.reshape` and `.transpose` methods, which are commonly used to change arrays' shapes.
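A small illustrative example:

```python
import numpy as np

a = np.arange(6)      # shape (6,)
b = a.reshape(2, 3)   # shape (2, 3); no data is copied
c = b.transpose()     # shape (3, 2); also a view (b.T is shorthand)
```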
Universal Functions (UFuncs) allow us to apply operations to every element of an array (or on every pair of elements in multiple arrays).
UFuncs can be implemented either with functions (e.g. `np.sqrt`) or with operators (e.g. `+`, `-`, `==`). All of the binary operators have a corresponding function as well (e.g. `+` -> `np.add`).
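A small illustrative example:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([1.0, 2.0, 3.0])

np.sqrt(a)      # ufunc as a function -> [1., 2., 3.]
a + b           # ufunc via an operator -> [2., 6., 12.]
np.add(a, b)    # same operation as `a + b`
a == b          # element-wise comparison -> [True, False, False]
```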
Binary UFuncs all provide a suite of methods that implement reduction/traversal algorithms in terms of their associated binary operator.
- `ufunc.reduce`: Folds the binary operator over a dimension. For example, `add.reduce` -> `sum`.
- `ufunc.accumulate`: Folds the binary operator over a dimension, keeping a running tally of results. For example, `add.accumulate` -> `cumsum`.
- `ufunc.outer`: Applies the binary operator to pairs of elements from two arrays.
- `ufunc.reduceat`: Applies `reduce` to successive slices of an array. I've never seen this used in the wild. We're not going to spend time on it.
- `ufunc.at`: Also never seen this used.
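A minimal sketch of the first three methods:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

np.add.reduce(a)         # 10, equivalent to a.sum()
np.add.accumulate(a)     # [1, 3, 6, 10], equivalent to np.cumsum(a)
np.multiply.outer(a, a)  # 4x4 multiplication table of pairs from `a`
```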
Often we want to operate on just a part of an array. Numpy provides many ways to select sub-sections out of an array.
All the intuitions we have from `list` carry over straightforwardly:
- `a[i]`
- `a[i:j]`
- `a[i:]`
- `a[:j]`
- `a[i:j:k]`
One important difference is that slicing from an array doesn't make a copy; the underlying data is still shared. This can be a powerful tool for saving memory and CPU, but it can also bite you if you're not careful. We show examples of surprising behavior due to numpy's view semantics.
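A minimal sketch of the kind of surprise we have in mind:

```python
import numpy as np

a = np.arange(5)        # array([0, 1, 2, 3, 4])
view = a[1:4]           # a view, not a copy
view[0] = 100           # writing through the view...
a                       # ...also changes the original: array([0, 100, 2, 3, 4])

copied = a[1:4].copy()  # an explicit .copy() breaks the link
```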
Slicing also generalizes to multiple dimensions in the way you'd expect:
- `a[i, j]`
- `a[i:, j:]`
- ...
In addition to the usual list-like semantics for slicing, there are two other important techniques for selecting subsets of an array: index arrays and masks.
Passing an array of indices selects the elements at each index we passed. `a[[1, 3, 5]]` gets the elements at indices 1, 3, and 5. This is often useful for permutations. For example, we can use `argsort` to build a permutation array that, when used as an indexer, would sort another array.
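A small illustrative example:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])
a[[1, 3, 5]]                 # array([20, 40, 60])

unsorted = np.array([3, 1, 2])
order = unsorted.argsort()   # permutation array: [1, 2, 0]
unsorted[order]              # array([1, 2, 3])
```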
"Masking" allows us to select elements from an array by indexing with a boolean array. This selects (after broadcasting) all the elements from the first array where the boolean array was True.
Masking is most often used for filtering expressions like `a[a > 5]` or `a[(a > 3) & (a < 5)]` (note the parentheses here; they're necessary because of the relative precedence of `&` and `<`).
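For example:

```python
import numpy as np

a = np.arange(10)
mask = a > 5          # boolean array
a[mask]               # array([6, 7, 8, 9])
a[(a > 3) & (a < 5)]  # array([4]); parentheses required around each comparison
```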
Many common operations on numpy arrays are "reductions". These operations take an N-dimensional array and reduce it to a lower-dimensional array by aggregating the values of the array across one or more axes. For example, `np.sum` takes an array and computes the sum of the array across one or more axes, resulting in a lower-dimensional array.
In this section, we introduce the concept of reductions, and show examples of common reductions such as `sum`, `mean`, and `median`.
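A small illustrative example:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

a.sum()          # 15: reduce over all axes -> scalar
a.sum(axis=0)    # [3, 5, 7]: collapse the rows
a.mean(axis=1)   # [1., 4.]: mean of each row
np.median(a)     # 2.5
```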
So far, we've only seen operations on arrays that have exactly the same shape, but one of the most powerful tools in numpy is its ability to combine arrays of different dimensions. This technique is called "broadcasting". Broadcasting allows us to apply operations on two arrays of unequal shape as long as the arrays' shapes are "compatible".
Examples:
- `array + scalar`
- `array2d - array2d.mean(axis=0)`
- `array2d - array2d.mean(axis=1)` (this one needs `keepdims=True` or a reshape to align the shapes)
- `(n x 1)` + `(1 x m)` -> `n x m`
- `(n,)` + `(m x 1)` -> `m x n`
The general intuition for "compatible" is that two arrays are compatible if they can be converted to the same shape by replicating themselves along missing axes.
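A minimal sketch of these cases:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)    # shape (2, 3)

a + 10                            # scalar broadcasts to every element
a - a.mean(axis=0)                # (2, 3) - (3,): demean each column

col = np.arange(2).reshape(2, 1)  # shape (2, 1)
row = np.arange(3).reshape(1, 3)  # shape (1, 3)
col + row                         # shapes (2, 1) + (1, 3) -> (2, 3)
```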
NOTE: The broadcasting exercises bring together skills learned in all the previous sections to solve a more complex, real-world problem, so we spend more time in the exercises here than in previous sections.
The purpose of this section is to reinforce the material we covered in the preceding sections and to emphasize the ways in which the different features of numpy complement one another.
- Array Creation
- UFuncs
- Broadcasting
- Slicing and Selection
We've seen how we can use numpy's core features to solve problems, but we haven't talked much about how those features work. In this section, we talk about how Numpy arrays are represented in memory, and we discuss how that representation enables some surprising optimizations (and occasionally some unexpected behaviors).
The main benefit of understanding this material for a numpy user is that it helps build intuition for reasoning about the efficiency of various operations. In particular, seeing how arrays are implemented is important for understanding what operations create copies vs. what operations just create views.
It's OK if some portion of this section gets cut for time.
An array consists of the following information:
- Data: Pointer to the start of a `char *` array in C that contains the actual data stored in the array.
- DType: Describes the data stored in each element of the array. In particular, this encodes the size of each element, as well as information used to dispatch UFuncs to the appropriate low-level implementation.
- Shape: Tuple describing the number and length of the dimensions of the array.
- Strides: Tuple describing how to get to an element at a particular coordinate.
Of these attributes, `strides` is the most interesting. If the array has strides `(x, y, z)`, then the array element at index `(i, j, k)` is stored at byte offset `i*x + j*y + k*z` from the start of the data buffer. This representation enables lots of clever tricks for avoiding copies and creating "virtual" copies of arrays.
Example 1: Zero-Copy Slices
We can implement `a[::2]` without copying the underlying data by cutting the shape in half and multiplying the strides by 2:
In [128]: data = np.arange(10)
In [129]: data
Out[129]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [130]: data.strides, data.shape
Out[130]: ((8,), (10,))
In [131]: data2 = data[::2]
In [132]: data2
Out[132]: array([0, 2, 4, 6, 8])
In [133]: data2.strides, data2.shape
Out[133]: ((16,), (5,))
In general, we can implement any pure slicing operation without making a copy of the underlying data:
- Slicing with a `start` just moves the `data` pointer forward in the new array.
- Slicing with a `stop` just shrinks the `shape` of the new array.
- Slicing with a `step` shrinks the shape and applies a multiplier to the strides.
Example 2: Virtual Copies
Sometimes we want the semantics of copying an array, but we don't actually want to pay the memory and CPU cost of making the copy. We can play tricks with array strides to create arrays that appear to make copies, but without actually making the copies.
In [138]: data
Out[138]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [139]: from numpy.lib.stride_tricks import as_strided
In [140]: data
Out[140]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [141]: as_strided(data, shape=(3, 10), strides=(0, 8))
Out[141]:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [142]: as_strided(data, shape=(10, 3), strides=(8, 0))
Out[142]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7],
[8, 8, 8],
[9, 9, 9]])
Although the above operations look like they've created copies, the underlying data is never duplicated. This can be a powerful tool, but can also create nasty and confusing bugs, so be careful when playing these sorts of games.
In [155]: x = as_strided(data, shape=(3, 10), strides=(0, 8))
In [156]: x
Out[156]:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [157]: data
Out[157]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [158]: data[0] = 100
In [159]: data
Out[159]: array([100, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [160]: x
Out[160]:
array([[100, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[100, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[100, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
Key Takeaways:
- Numerical algorithms are slow in pure Python because the overhead of dynamic dispatch and interpretation dominates our runtime.
- Numpy solves these problems by imposing additional restrictions on the contents of arrays and by moving the inner loops of our algorithms into compiled C code.
- Using Numpy effectively often requires reworking algorithms to use vectorized operations instead of explicit loops, but the resulting operations are usually simpler, clearer, and (much) faster than the pure Python equivalents.
Next Steps:
- `scipy` implements many algorithms for optimization, signal processing, linear algebra, and other important scientific needs.
- `pandas` adds labelled, heterogeneous data structures on top of numpy. It also provides efficient I/O routines and implementations of rolling/grouped algorithms.
- `matplotlib` provides high-quality plotting routines using numpy arrays as its primary data format.
- `scikit-learn` provides machine learning algorithms using numpy arrays as its primary data format.
- `dask` provides the same API as numpy, but supports parallel and distributed execution.