Clustering


Clustering seeks to group data into clusters based on their properties, so that we can then predict which cluster a new member belongs to.

import numpy as np
import matplotlib.pyplot as plt

We’ll use a dataset generator that is part of scikit-learn called make_moons. This generates data that falls into 2 different sets with a shape that looks like half-moons.

from sklearn import datasets
def generate_data():
    xvec, val = datasets.make_moons(200, noise=0.2)

    # make_moons already returns NumPy arrays: xvec has shape (200, 2)
    # and val holds the integer class label (0 or 1) of each point
    return xvec, val
x, v = generate_data()
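To get a feel for what kind of data this is, here is a rough NumPy-only sketch of how two noisy half-moons could be built by hand. It is an illustration, not scikit-learn's actual implementation; the function name `half_moons_sketch` and its `seed` argument are our own.

```python
import numpy as np

def half_moons_sketch(n_samples=200, noise=0.2, seed=0):
    """Illustrative stand-in for make_moons (not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    n_upper = n_samples // 2
    n_lower = n_samples - n_upper

    # class 0: upper half of the unit circle
    t_upper = np.linspace(0.0, np.pi, n_upper)
    upper = np.column_stack([np.cos(t_upper), np.sin(t_upper)])

    # class 1: a half circle flipped upside down and shifted right and up
    t_lower = np.linspace(0.0, np.pi, n_lower)
    lower = np.column_stack([1.0 - np.cos(t_lower), 0.5 - np.sin(t_lower)])

    points = np.vstack([upper, lower])
    labels = np.concatenate([np.zeros(n_upper, dtype=int),
                             np.ones(n_lower, dtype=int)])

    # Gaussian scatter blurs the gap between the two sets
    points += rng.normal(scale=noise, size=points.shape)
    return points, labels

pts, labs = half_moons_sketch()
print(pts.shape, labs.shape)
```

Each row of `pts` is one (x, y) point and the matching entry of `labs` says which moon it came from, which is the same layout as the `x` and `v` arrays used here.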

Let’s look at a point and its value

print(f"x = {x[0]}, value = {v[0]}")
x = [-0.88315383  0.81492065], value = 0

Now let’s plot the data

def plot_data(x, v):
    xpt = [q[0] for q in x]
    ypt = [q[1] for q in x]

    fig, ax = plt.subplots()
    ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
    ax.set_aspect("equal")
    return fig
fig = plot_data(x, v)
../_images/8a680210ad8b43a9ea761c3025b31f07d7bf35574389eb86e750c619fa2d87e8.png

We want to partition this domain into 2 regions, such that when we come in with a new point, we know which group it belongs to.

First we set up and train our network

from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import RMSprop
model = Sequential()
model.add(Input(shape=(2,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(20, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
rms = RMSprop()
model.compile(loss='binary_crossentropy',
              optimizer=rms, metrics=['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 50)             │           150 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 20)             │         1,020 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            21 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 1,191 (4.65 KB)
 Trainable params: 1,191 (4.65 KB)
 Non-trainable params: 0 (0.00 B)
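The parameter counts in the summary can be checked by hand: a dense layer has one weight for every (input, unit) pair plus one bias per unit. A small sketch of that arithmetic, where the helper name `dense_params` is our own:

```python
def dense_params(n_inputs, n_units):
    # one weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

# input width, followed by the number of units in each Dense layer
layer_sizes = [2, 50, 20, 1]

counts = [dense_params(a, b)
          for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts, sum(counts))   # [150, 1020, 21] 1191
```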

We seem to need a lot of epochs here to get a good result

epochs = 100
results = model.fit(x, v, batch_size=50, epochs=epochs, verbose=2)
Epoch 1/100
4/4 - 0s - 115ms/step - accuracy: 0.7400 - loss: 0.6595
Epoch 2/100
4/4 - 0s - 6ms/step - accuracy: 0.8400 - loss: 0.6193
Epoch 3/100
4/4 - 0s - 7ms/step - accuracy: 0.8400 - loss: 0.5919
Epoch 4/100
4/4 - 0s - 6ms/step - accuracy: 0.8350 - loss: 0.5681
Epoch 5/100
4/4 - 0s - 6ms/step - accuracy: 0.8200 - loss: 0.5472
Epoch 6/100
4/4 - 0s - 6ms/step - accuracy: 0.8250 - loss: 0.5283
Epoch 7/100
4/4 - 0s - 6ms/step - accuracy: 0.8300 - loss: 0.5107
Epoch 8/100
4/4 - 0s - 6ms/step - accuracy: 0.8250 - loss: 0.4932
Epoch 9/100
4/4 - 0s - 6ms/step - accuracy: 0.8300 - loss: 0.4777
Epoch 10/100
4/4 - 0s - 6ms/step - accuracy: 0.8300 - loss: 0.4623
Epoch 11/100
4/4 - 0s - 6ms/step - accuracy: 0.8350 - loss: 0.4485
Epoch 12/100
4/4 - 0s - 6ms/step - accuracy: 0.8350 - loss: 0.4349
Epoch 13/100
4/4 - 0s - 6ms/step - accuracy: 0.8350 - loss: 0.4218
Epoch 14/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.4096
Epoch 15/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3982
Epoch 16/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3873
Epoch 17/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3774
Epoch 18/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3680
Epoch 19/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3595
Epoch 20/100
4/4 - 0s - 6ms/step - accuracy: 0.8500 - loss: 0.3522
Epoch 21/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3447
Epoch 22/100
4/4 - 0s - 6ms/step - accuracy: 0.8500 - loss: 0.3384
Epoch 23/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.3323
Epoch 24/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.3261
Epoch 25/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3212
Epoch 26/100
4/4 - 0s - 8ms/step - accuracy: 0.8650 - loss: 0.3164
Epoch 27/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3112
Epoch 28/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.3066
Epoch 29/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.3025
Epoch 30/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2986
Epoch 31/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2957
Epoch 32/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2923
Epoch 33/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2882
Epoch 34/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2872
Epoch 35/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2827
Epoch 36/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2810
Epoch 37/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2793
Epoch 38/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2761
Epoch 39/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2743
Epoch 40/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2724
Epoch 41/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2702
Epoch 42/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2683
Epoch 43/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2658
Epoch 44/100
4/4 - 0s - 6ms/step - accuracy: 0.8750 - loss: 0.2645
Epoch 45/100
4/4 - 0s - 6ms/step - accuracy: 0.8750 - loss: 0.2652
Epoch 46/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2612
Epoch 47/100
4/4 - 0s - 7ms/step - accuracy: 0.8700 - loss: 0.2599
Epoch 48/100
4/4 - 0s - 7ms/step - accuracy: 0.8750 - loss: 0.2593
Epoch 49/100
4/4 - 0s - 7ms/step - accuracy: 0.8750 - loss: 0.2596
Epoch 50/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2563
Epoch 51/100
4/4 - 0s - 7ms/step - accuracy: 0.8750 - loss: 0.2552
Epoch 52/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2537
Epoch 53/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2537
Epoch 54/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2519
Epoch 55/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2516
Epoch 56/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2495
Epoch 57/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2489
Epoch 58/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2479
Epoch 59/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2463
Epoch 60/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2459
Epoch 61/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2456
Epoch 62/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2455
Epoch 63/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.2431
Epoch 64/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2420
Epoch 65/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2428
Epoch 66/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2411
Epoch 67/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2412
Epoch 68/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2388
Epoch 69/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2378
Epoch 70/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2380
Epoch 71/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2390
Epoch 72/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2375
Epoch 73/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2370
Epoch 74/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2350
Epoch 75/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2340
Epoch 76/100
4/4 - 0s - 8ms/step - accuracy: 0.8800 - loss: 0.2366
Epoch 77/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2340
Epoch 78/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2341
Epoch 79/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2315
Epoch 80/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2311
Epoch 81/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2309
Epoch 82/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2299
Epoch 83/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2300
Epoch 84/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2298
Epoch 85/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2278
Epoch 86/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2286
Epoch 87/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2272
Epoch 88/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2295
Epoch 89/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2261
Epoch 90/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2251
Epoch 91/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2253
Epoch 92/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2240
Epoch 93/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2243
Epoch 94/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2226
Epoch 95/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2221
Epoch 96/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2225
Epoch 97/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2219
Epoch 98/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2197
Epoch 99/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2216
Epoch 100/100
4/4 - 0s - 9ms/step - accuracy: 0.8950 - loss: 0.2225
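The `4/4` at the start of each line is the number of batches processed in that epoch. With 200 points and a batch size of 50, each epoch takes 4 steps, so 100 epochs amounts to 400 weight updates. As a quick check of that arithmetic:

```python
import math

n_samples = 200    # number of points from make_moons
batch_size = 50
epochs = 100

# the last batch may be partial, hence the ceiling
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)   # 4 400
```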
score = model.evaluate(x, v, verbose=0)
print(f"score = {score[0]}")
print(f"accuracy = {score[1]}")
score = 0.21789388358592987
accuracy = 0.8899999856948853
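Here `score[0]` is the loss (binary cross-entropy) and `score[1]` is the accuracy metric we asked for when compiling. A minimal NumPy sketch of what these two numbers measure is shown below. The function names and the sample values are made up for illustration, and the clipping constant `eps` is an assumption rather than a statement about Keras's internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y log p + (1 - y) log(1 - p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # keep probabilities away from 0 and 1 so the logs stay finite
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob)
                   + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(per_sample))

def accuracy(y_true, y_prob):
    """Fraction of samples whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob) > 0.5).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# made-up labels and predicted probabilities
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.4, 0.2]

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss = {loss:.4f}, accuracy = {acc}")
```

The loss rewards confident correct predictions, so it can keep decreasing even while the accuracy stays flat, as happens over many of the epochs above.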

Let’s look at a prediction. Even for a single point, we need to pass an array of shape (N, 2), where N is the number of points (here N = 1).

res = model.predict(np.array([[-2, 2]]))
res
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step
array([[2.2588192e-07]], dtype=float32)

We see that we get a floating point number between 0 and 1, the output of the final sigmoid. We will need to convert this to 0 or 1 by rounding.
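As a small sketch of that conversion, we can threshold at 0.5. The first value below is the prediction from above; the other two are made-up values added to show both outcomes.

```python
import numpy as np

# shape (N, 1), like the output of model.predict
probs = np.array([[2.2588192e-07], [0.5], [0.73]])

# anything strictly above 0.5 is assigned to class 1, everything else to class 0
labels = (probs > 0.5).astype(int).ravel()
print(labels)   # [0 0 1]
```

Note that a value of exactly 0.5 goes to class 0 with this choice; where to put the tie is a convention.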

Let’s plot the partitioning

M = 128
N = 128

xmin = -1.75
xmax = 2.5
ymin = -1.25
ymax = 1.75

xpt = np.linspace(xmin, xmax, M)
ypt = np.linspace(ymin, ymax, N)

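To make the prediction go faster, we want to feed in all of the grid points at once, as a single array with one (x, y) pair per row. With the construction used below, every x is paired with every y, and the y index varies fastest:

[[xpt[0], ypt[0]],
 [xpt[0], ypt[1]],
 ...
]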

We can see that this packs them into the vector

pairs = np.array(np.meshgrid(xpt, ypt)).T.reshape(-1, 2)
pairs[0]
array([-1.75, -1.25])
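To see the ordering of the rows more clearly, it helps to try the same expression on a tiny grid. The arrays `xs` and `ys` here are made-up stand-ins for `xpt` and `ypt`.

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])   # 3 x-values
ys = np.array([10.0, 20.0])      # 2 y-values

# same packing expression as above, on the small grid
pairs_small = np.array(np.meshgrid(xs, ys)).T.reshape(-1, 2)
print(pairs_small)
# [[ 0. 10.]
#  [ 0. 20.]
#  [ 1. 10.]
#  [ 1. 20.]
#  [ 2. 10.]
#  [ 2. 20.]]
```

Every x is paired with every y, and for each x we run through all of the y values before moving to the next x.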

Now we do the prediction. We will get a vector out, which we reshape to match the original domain.

res = model.predict(pairs, verbose=0)
res = res.reshape(M, N)
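The reshape to (M, N) means the first index runs over x and the second over y. A small sketch of this on a tiny grid, using a made-up function in place of `model.predict` so that the result is easy to check by eye:

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])   # M = 3 x-values
ys = np.array([10.0, 20.0])      # N = 2 y-values
pairs_small = np.array(np.meshgrid(xs, ys)).T.reshape(-1, 2)

# hypothetical stand-in for model.predict: one value per row, shape (M*N, 1)
fake_res = (pairs_small[:, 0] + pairs_small[:, 1]).reshape(-1, 1)

# grid[i, j] holds the value at (xs[i], ys[j])
grid = fake_res.reshape(len(xs), len(ys))

# an image wants rows to follow y and columns to follow x, hence the transpose
image = grid.T
print(grid.shape, image.shape)   # (3, 2) (2, 3)
print(grid[2, 1], image[1, 2])   # 22.0 22.0
```

This is why the plotting code passes the transposed array to `imshow`.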

Finally, round to 0 or 1

domain = np.where(res > 0.5, 1, 0)

and we can plot the data

fig, ax = plt.subplots()
ax.imshow(domain.T, origin="lower",
          extent=[xmin, xmax, ymin, ymax], alpha=0.25)
xpt = [q[0] for q in x]
ypt = [q[1] for q in x]

ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
../_images/fe76c595c370f344878c4a79deade21cde763944ba8bbdd5d15dea6102d1b616.png