Clustering

Clustering seeks to group data into clusters based on their properties, and then lets us predict which cluster a new point belongs to.

import numpy as np
import matplotlib.pyplot as plt

We’ll use a dataset generator from scikit-learn called make_moons. It generates points that fall into 2 different sets, each shaped like a half-moon.

from sklearn import datasets
def generate_data():
    # make_moons returns the (N, 2) point coordinates and their 0/1 labels
    x, v = datasets.make_moons(200, noise=0.2)
    return x, v
x, v = generate_data()

Let’s look at a point and its value

print(f"x = {x[0]}, value = {v[0]}")
x = [ 0.16387768 -0.11163774], value = 1
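As a quick sanity check (a sketch, assuming the same make_moons call as in generate_data above), we can count how many points carry each label; make_moons splits the samples evenly between the two sets:

```python
import numpy as np
from sklearn import datasets

# regenerate the same kind of dataset as generate_data() above
x, v = datasets.make_moons(200, noise=0.2)

# count how many points carry each label
labels, counts = np.unique(v, return_counts=True)
print(labels, counts)   # two labels, 100 points each
```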

Now let’s plot the data

def plot_data(x, v):
    xpt = [q[0] for q in x]
    ypt = [q[1] for q in x]

    fig, ax = plt.subplots()
    ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
    ax.set_aspect("equal")
    return fig
fig = plot_data(x, v)

We want to partition this domain into 2 regions, such that when we come in with a new point, we know which group it belongs to.

First we set up and train our network

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Input
from keras.optimizers import RMSprop
model = Sequential()
model.add(Input(shape=(2,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(20, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
rms = RMSprop()
model.compile(loss='binary_crossentropy',
              optimizer=rms, metrics=['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 50)             │           150 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 20)             │         1,020 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            21 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 1,191 (4.65 KB)
 Trainable params: 1,191 (4.65 KB)
 Non-trainable params: 0 (0.00 B)
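The parameter counts in the summary are easy to verify by hand: a Dense layer with n_in inputs and n_out units stores an n_in × n_out weight matrix plus one bias per unit. A small check of that arithmetic:

```python
def dense_params(n_in, n_out):
    # weight matrix entries plus one bias per output unit
    return n_in * n_out + n_out

print(dense_params(2, 50))   # 150
print(dense_params(50, 20))  # 1020
print(dense_params(20, 1))   # 21
print(dense_params(2, 50) + dense_params(50, 20) + dense_params(20, 1))  # 1191 total
```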

We seem to need a lot of epochs here to get a good result

epochs = 100
results = model.fit(x, v, batch_size=50, epochs=epochs, verbose=2)
Epoch 1/100
4/4 - 0s - 121ms/step - accuracy: 0.8100 - loss: 0.6525
Epoch 2/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.6133
Epoch 3/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.5837
Epoch 4/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.5557
Epoch 5/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.5293
Epoch 6/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.5043
Epoch 7/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.4818
Epoch 8/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.4609
Epoch 9/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.4440
Epoch 10/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.4263
Epoch 11/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.4110
Epoch 12/100
4/4 - 0s - 6ms/step - accuracy: 0.8500 - loss: 0.3976
Epoch 13/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.3837
Epoch 14/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3718
Epoch 15/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.3611
Epoch 16/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.3513
Epoch 17/100
4/4 - 0s - 7ms/step - accuracy: 0.8600 - loss: 0.3415
Epoch 18/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3333
Epoch 19/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3265
Epoch 20/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3179
Epoch 21/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.3111
Epoch 22/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.3045
Epoch 23/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2987
Epoch 24/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2931
Epoch 25/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2886
Epoch 26/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2835
Epoch 27/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2790
Epoch 28/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2745
Epoch 29/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2714
Epoch 30/100
4/4 - 0s - 6ms/step - accuracy: 0.8750 - loss: 0.2676
Epoch 31/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2632
Epoch 32/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2619
Epoch 33/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2584
Epoch 34/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2546
Epoch 35/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2526
Epoch 36/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2498
Epoch 37/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2484
Epoch 38/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2477
Epoch 39/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2465
Epoch 40/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2425
Epoch 41/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2416
Epoch 42/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2396
Epoch 43/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2388
Epoch 44/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2361
Epoch 45/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2361
Epoch 46/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2332
Epoch 47/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2337
Epoch 48/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2341
Epoch 49/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2305
Epoch 50/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2290
Epoch 51/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2271
Epoch 52/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2283
Epoch 53/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2258
Epoch 54/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2236
Epoch 55/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2231
Epoch 56/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2216
Epoch 57/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2203
Epoch 58/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2193
Epoch 59/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2202
Epoch 60/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2179
Epoch 61/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2154
Epoch 62/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2158
Epoch 63/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2141
Epoch 64/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2138
Epoch 65/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2114
Epoch 66/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2112
Epoch 67/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2103
Epoch 68/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2108
Epoch 69/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2099
Epoch 70/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2085
Epoch 71/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2067
Epoch 72/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2049
Epoch 73/100
4/4 - 0s - 7ms/step - accuracy: 0.9150 - loss: 0.2075
Epoch 74/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.2024
Epoch 75/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.2017
Epoch 76/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.2011
Epoch 77/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1998
Epoch 78/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1996
Epoch 79/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1987
Epoch 80/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1981
Epoch 81/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1965
Epoch 82/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1957
Epoch 83/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.1929
Epoch 84/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1925
Epoch 85/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.1911
Epoch 86/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1898
Epoch 87/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1915
Epoch 88/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.1881
Epoch 89/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1896
Epoch 90/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1848
Epoch 91/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1836
Epoch 92/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.1829
Epoch 93/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.1811
Epoch 94/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1799
Epoch 95/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1798
Epoch 96/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1807
Epoch 97/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.1775
Epoch 98/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.1764
Epoch 99/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1741
Epoch 100/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.1762
score = model.evaluate(x, v, verbose=0)
print(f"score = {score[0]}")
print(f"accuracy = {score[1]}")
score = 0.17096355557441711
accuracy = 0.925000011920929
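Note that we evaluated on the same points we trained on, so this accuracy may be optimistic. One common refinement (a sketch, not part of the run above) is to hold out a test set with scikit-learn’s train_test_split and evaluate only on the held-out points:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

x, v = datasets.make_moons(200, noise=0.2)

# hold back 25% of the points for evaluation
x_train, x_test, v_train, v_test = train_test_split(x, v, test_size=0.25,
                                                    random_state=0)

print(x_train.shape, x_test.shape)   # (150, 2) (50, 2)
# we would then train with model.fit(x_train, v_train, ...)
# and measure accuracy with model.evaluate(x_test, v_test)
```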

Let’s look at a prediction. We need to feed in a single point as an array of shape (N, 2), where N is the number of points

res = model.predict(np.array([[-2, 2]]))
res
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
array([[1.3234276e-09]], dtype=float32)

We see that we get a floating point number. We will need to convert this to 0 or 1 by rounding.
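For instance, thresholding at 0.5 maps the sigmoid output to a cluster label (a sketch using a stand-in for the prediction array above):

```python
import numpy as np

# stand-in for model.predict() output: shape (1, 1), a probability in (0, 1)
res = np.array([[1.3234276e-09]])

# anything below 0.5 rounds to cluster 0, anything above to cluster 1
label = int(res[0, 0] > 0.5)
print(label)   # 0
```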

Let’s plot the partitioning

M = 128
N = 128

xmin = -1.75
xmax = 2.5
ymin = -1.25
ymax = 1.75

xpt = np.linspace(xmin, xmax, M)
ypt = np.linspace(ymin, ymax, N)

To make the prediction go faster, we want to feed in a vector of these points, of the form:

[[xpt[0], ypt[0]],
 [xpt[1], ypt[1]],
 ...
]

The combination of meshgrid, transpose, and reshape packs them into this form:

pairs = np.array(np.meshgrid(xpt, ypt)).T.reshape(-1, 2)
pairs[0]
array([-1.75, -1.25])
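On a tiny grid we can verify what this packing does: each row of pairs holds [xpt[i], ypt[j]], with the x index varying slowest, which is why reshaping the predictions to (M, N) below recovers the grid.

```python
import numpy as np

xs = np.array([0.0, 1.0])
ys = np.array([10.0, 20.0])

# same packing as above: all ys values for xs[0], then all ys values for xs[1]
tiny = np.array(np.meshgrid(xs, ys)).T.reshape(-1, 2)
print(tiny)
# [[ 0. 10.]
#  [ 0. 20.]
#  [ 1. 10.]
#  [ 1. 20.]]
```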

Now we do the prediction. We will get a vector out, which we reshape to match the original domain.

res = model.predict(pairs, verbose=0)
res.shape = (M, N)

Finally, round to 0 or 1

domain = np.where(res > 0.5, 1, 0)

and we can plot the data

fig, ax = plt.subplots()
ax.imshow(domain.T, origin="lower",
          extent=[xmin, xmax, ymin, ymax], alpha=0.25)
xpt = [q[0] for q in x]
ypt = [q[1] for q in x]

ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")