NumPy Advanced Operations#

import numpy as np

Copying Arrays#

Note

simply using = does not make a copy, but much like with lists, you will just have multiple names pointing to the same ndarray object

We need to understand if two arrays, A and B point to:

  • the same array, including shape and data/memory space

  • the same data/memory space, but perhaps different shapes (a view)

  • a separate copy of the data (i.e. stored completely separately in memory)

All of these are possible:

  • B = A

    this is assignment. No copy is made. A and B point to the same data in memory and share the same shape, etc. They are just two different labels for the same object in memory

  • B = A[:]

    this is a view or shallow copy. The shape info for A and B are stored independently, but both point to the same memory location for data

  • B = A.copy()

    this is a deep copy. A completely separate object will be created in memory, with a completely separate location in memory.

Let’s look at examples

a = np.arange(10)
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here is assignment—we can just use the is operator to test for equality

b = a
b is a
True

Since b and a are the same, changes to the shape of one are reflected in the other—no copy is made.

b.shape = (2, 5)
print(b)
a.shape
[[0 1 2 3 4]
 [5 6 7 8 9]]
(2, 5)
b is a
True
a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

a shallow copy creates a new view into the array—the data is the same, but the array properties can be different

a = np.arange(12)
c = a[:]
a.shape = (3,4)

print(a)
print(c)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[ 0  1  2  3  4  5  6  7  8  9 10 11]

since the underlying data is the same memory, changing an element of one is reflected in the other

c[1] = -1
a
array([[ 0, -1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Even slices into an array are just views, still pointing to the same memory

d = c[3:8]
d
array([3, 4, 5, 6, 7])
d[:] = 0 
print(a)
print(c)
print(d)
[[ 0 -1  2  0]
 [ 0  0  0  0]
 [ 8  9 10 11]]
[ 0 -1  2  0  0  0  0  0  8  9 10 11]
[0 0 0 0 0]

There are lots of ways to inquire if two arrays are the same, views, own their own data, etc

print(c is a)
print(c.base is a)
print(c.flags.owndata)
print(a.flags.owndata)
False
True
False
True

to make a copy of the data of the array that you can deal with independently of the original, you need a deep copy

d = a.copy()
d[:,:] = 0.0

print(a)
print(d)
[[ 0 -1  2  0]
 [ 0  0  0  0]
 [ 8  9 10 11]]
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]

Boolean Indexing#

There are lots of fun ways to index arrays to access only those elements that meet a certain condition

a = np.arange(12).reshape(3,4)
a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Here we set all the elements in the array that are > 4 to zero

a[a > 4] = 0
a
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])

and now, all the zeros to -1

a[a == 0] = -1
a
array([[-1,  1,  2,  3],
       [ 4, -1, -1, -1],
       [-1, -1, -1, -1]])
a == -1
array([[ True, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])

if we have 2 tests, we need to use logical_and() or logical_or()

a = np.arange(12).reshape(3,4)
a[np.logical_and(a > 3, a <= 9)] = 0.0
a
array([[ 0,  1,  2,  3],
       [ 0,  0,  0,  0],
       [ 0,  0, 10, 11]])

Our test that we index the array with returns a boolean array of the same shape:

a > 4
array([[False, False, False, False],
       [False, False, False, False],
       [False, False,  True,  True]])

Avoiding Loops#

In general, you want to avoid loops over elements on an array.

Here, let’s create 1-d x and y coordinates and then try to fill some larger array

M = 128
N = 256
xmin = ymin = 0.0
xmax = ymax = 1.0

x = np.linspace(xmin, xmax, M, endpoint=False)
y = np.linspace(ymin, ymax, N, endpoint=False)

print(x.shape)
print(y.shape)
(128,)
(256,)

we’ll time out code

import time
t0 = time.time()

g = np.zeros((M, N))

for i in range(M):
    for j in range(N):
        g[i,j] = np.sin(2.0*np.pi*x[i]*y[j])
        
t1 = time.time()
print(f"time elapsed: {t1-t0} s")
time elapsed: 0.0397031307220459 s

Now let’s instead do this using all array syntax. First will extend our 1-d coordinate arrays to be 2-d. NumPy has a function for this (meshgrid())

Let’s look at how meshgrid() works first

x2d, y2d = np.meshgrid([0, 1, 2, 3], [10, 20, 30], indexing="ij")
x2d
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
y2d
array([[10, 20, 30],
       [10, 20, 30],
       [10, 20, 30],
       [10, 20, 30]])

We see that this creates 2 two-dimensional arrays, one with the x-values changing across rows and one with y-values changing across columns. This means we can index all points (x[i], y[j]) through the pair x2d, y2d

Now let’s do our same example this method.

t0 = time.time()
x2d, y2d = np.meshgrid(x, y, indexing="ij")
g2 = np.sin(2.0*np.pi*x2d*y2d)
t1 = time.time()
print(f"time elapsed: {t1-t0} s")
time elapsed: 0.022920846939086914 s

A final check to make sure they give the same answer

print(np.max(np.abs(g2-g)))
0.0

Numerical differencing on NumPy arrays#

Now we want to construct a derivative,

\[ \frac{d f}{dx} \]
x = np.linspace(0, 2*np.pi, 25)
f = np.sin(x)

We want to do this without loops—we’ll use views into arrays offset from one another. Recall from calculus that a derivative is approximately:

\[ \frac{df}{dx} = \frac{f(x+h) - f(x)}{h} \]

Here, we’ll take \(h\) to be a single adjacent element

dx = x[1] - x[0]
dfdx = (f[1:] - f[:-1])/dx
dfdx
array([ 0.98861593,  0.92124339,  0.79108963,  0.60702442,  0.38159151,
        0.13015376, -0.13015376, -0.38159151, -0.60702442, -0.79108963,
       -0.92124339, -0.98861593, -0.98861593, -0.92124339, -0.79108963,
       -0.60702442, -0.38159151, -0.13015376,  0.13015376,  0.38159151,
        0.60702442,  0.79108963,  0.92124339,  0.98861593])