Clustering

Clustering seeks to group data into clusters based on their properties, and then lets us predict which cluster a new point belongs to.

import numpy as np
import matplotlib.pyplot as plt

We’ll use a dataset generator from scikit-learn called make_moons. It generates points that fall into 2 different sets, each shaped like a half-moon.

from sklearn import datasets
def generate_data():
    # make_moons returns the (N, 2) point coordinates and their 0/1 labels
    x, v = datasets.make_moons(200, noise=0.2)
    return x, v
x, v = generate_data()

Let’s look at a point and its value

print(f"x = {x[0]}, value = {v[0]}")
x = [ 0.16387768 -0.11163774], value = 1
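As a quick sanity check (a sketch, assuming the same make_moons call as in generate_data above), we can count how many points carry each label; make_moons splits the samples evenly between the two sets:

```python
import numpy as np
from sklearn import datasets

# regenerate the same kind of dataset as generate_data() above
x, v = datasets.make_moons(200, noise=0.2)

# count how many points carry each label
labels, counts = np.unique(v, return_counts=True)
print(labels, counts)   # two labels, 100 points each
```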

Now let’s plot the data

def plot_data(x, v):
    xpt = [q[0] for q in x]
    ypt = [q[1] for q in x]

    fig, ax = plt.subplots()
    ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
    ax.set_aspect("equal")
    return fig
fig = plot_data(x, v)

We want to partition this domain into 2 regions, such that when we come in with a new point, we know which group it belongs to.

First we set up and train our network

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Input
from keras.optimizers import RMSprop
model = Sequential()
model.add(Input(shape=(2,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(20, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
rms = RMSprop()
model.compile(loss='binary_crossentropy',
              optimizer=rms, metrics=['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 50)             │           150 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 20)             │         1,020 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            21 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 1,191 (4.65 KB)
 Trainable params: 1,191 (4.65 KB)
 Non-trainable params: 0 (0.00 B)
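The parameter counts in the summary are easy to verify by hand: a Dense layer with n_in inputs and n_out units stores an n_in × n_out weight matrix plus one bias per unit. A small check of that arithmetic:

```python
def dense_params(n_in, n_out):
    # weight matrix entries plus one bias per output unit
    return n_in * n_out + n_out

print(dense_params(2, 50))   # 150
print(dense_params(50, 20))  # 1020
print(dense_params(20, 1))   # 21
print(dense_params(2, 50) + dense_params(50, 20) + dense_params(20, 1))  # 1191 total
```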

We seem to need a lot of epochs here to get a good result

epochs = 100
results = model.fit(x, v, batch_size=50, epochs=epochs, verbose=2)
Epoch 1/100
4/4 - 0s - 121ms/step - accuracy: 0.8100 - loss: 0.6525
Epoch 2/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.6133
Epoch 3/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.5837
Epoch 4/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.5557
Epoch 5/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.5293
Epoch 6/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.5043
Epoch 7/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.4818
Epoch 8/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.4609
Epoch 9/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.4440
Epoch 10/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.4263
Epoch 11/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.4110
Epoch 12/100
4/4 - 0s - 6ms/step - accuracy: 0.8500 - loss: 0.3976
Epoch 13/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.3837
Epoch 14/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3718
Epoch 15/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.3611
Epoch 16/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.3513
Epoch 17/100
4/4 - 0s - 7ms/step - accuracy: 0.8600 - loss: 0.3415
Epoch 18/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3333
Epoch 19/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3265
Epoch 20/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3179
Epoch 21/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.3111
Epoch 22/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.3045
Epoch 23/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2987
Epoch 24/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2931
Epoch 25/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2886
Epoch 26/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2835
Epoch 27/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2790
Epoch 28/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2745
Epoch 29/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2714
Epoch 30/100
4/4 - 0s - 6ms/step - accuracy: 0.8750 - loss: 0.2676
Epoch 31/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2632
Epoch 32/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2619
Epoch 33/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2584
Epoch 34/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2546
Epoch 35/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2526
Epoch 36/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2498
Epoch 37/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2484
Epoch 38/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2477
Epoch 39/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2465
Epoch 40/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2425
Epoch 41/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2416
Epoch 42/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2396
Epoch 43/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2388
Epoch 44/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2361
Epoch 45/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2361
Epoch 46/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2332
Epoch 47/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2337
Epoch 48/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2341
Epoch 49/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2305
Epoch 50/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2290
Epoch 51/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2271
Epoch 52/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2283
Epoch 53/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2258
Epoch 54/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2236
Epoch 55/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2231
Epoch 56/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2216
Epoch 57/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2203
Epoch 58/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2193
Epoch 59/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2202
Epoch 60/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2179
Epoch 61/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2154
Epoch 62/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2158
Epoch 63/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2141
Epoch 64/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2138
Epoch 65/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2114
Epoch 66/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2112
Epoch 67/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2103
Epoch 68/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2108
Epoch 69/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2099
Epoch 70/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2085
Epoch 71/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2067
Epoch 72/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2049
Epoch 73/100
4/4 - 0s - 7ms/step - accuracy: 0.9150 - loss: 0.2075
Epoch 74/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.2024
Epoch 75/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.2017
Epoch 76/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.2011
Epoch 77/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1998
Epoch 78/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1996
Epoch 79/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1987
Epoch 80/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1981
Epoch 81/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1965
Epoch 82/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1957
Epoch 83/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.1929
Epoch 84/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1925
Epoch 85/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.1911
Epoch 86/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1898
Epoch 87/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1915
Epoch 88/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.1881
Epoch 89/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1896
Epoch 90/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1848
Epoch 91/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1836
Epoch 92/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.1829
Epoch 93/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.1811
Epoch 94/100
4/4 - 0s - 7ms/step - accuracy: 0.9250 - loss: 0.1799
Epoch 95/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1798
Epoch 96/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1807
Epoch 97/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.1775
Epoch 98/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.1764
Epoch 99/100
4/4 - 0s - 6ms/step - accuracy: 0.9250 - loss: 0.1741
Epoch 100/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.1762
score = model.evaluate(x, v, verbose=0)
print(f"score = {score[0]}")
print(f"accuracy = {score[1]}")
score = 0.17096355557441711
accuracy = 0.925000011920929
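Note that we evaluated on the same points we trained on, so this accuracy may be optimistic. One common refinement (a sketch, not part of the run above) is to hold out a test set with scikit-learn’s train_test_split and evaluate only on the held-out points:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

x, v = datasets.make_moons(200, noise=0.2)

# hold back 25% of the points for evaluation
x_train, x_test, v_train, v_test = train_test_split(x, v, test_size=0.25,
                                                    random_state=0)

print(x_train.shape, x_test.shape)   # (150, 2) (50, 2)
# we would then train with model.fit(x_train, v_train, ...)
# and measure accuracy with model.evaluate(x_test, v_test)
```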

Let’s look at a prediction. We need to feed in a single point as an array of shape (N, 2), where N is the number of points

res = model.predict(np.array([[-2, 2]]))
res
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
array([[1.3234276e-09]], dtype=float32)

We see that we get a floating point number. We will need to convert this to 0 or 1 by rounding.
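For instance, thresholding at 0.5 maps the sigmoid output to a cluster label (a sketch using a stand-in for the prediction array above):

```python
import numpy as np

# stand-in for model.predict() output: shape (1, 1), a probability in (0, 1)
res = np.array([[1.3234276e-09]])

# anything below 0.5 rounds to cluster 0, anything above to cluster 1
label = int(res[0, 0] > 0.5)
print(label)   # 0
```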

Let’s plot the partitioning

M = 128
N = 128

xmin = -1.75
xmax = 2.5
ymin = -1.25
ymax = 1.75

xpt = np.linspace(xmin, xmax, M)
ypt = np.linspace(ymin, ymax, N)

To make the prediction go faster, we want to feed in a vector of these points, of the form:

[[xpt[0], ypt[0]],
 [xpt[1], ypt[1]],
 ...
]

The combination of meshgrid, transpose, and reshape packs them into this form:

pairs = np.array(np.meshgrid(xpt, ypt)).T.reshape(-1, 2)
pairs[0]
array([-1.75, -1.25])
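On a tiny grid we can verify what this packing does: each row of pairs holds [xpt[i], ypt[j]], with the x index varying slowest, which is why reshaping the predictions to (M, N) below recovers the grid.

```python
import numpy as np

xs = np.array([0.0, 1.0])
ys = np.array([10.0, 20.0])

# same packing as above: all ys values for xs[0], then all ys values for xs[1]
tiny = np.array(np.meshgrid(xs, ys)).T.reshape(-1, 2)
print(tiny)
# [[ 0. 10.]
#  [ 0. 20.]
#  [ 1. 10.]
#  [ 1. 20.]]
```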

Now we do the prediction. We will get a vector out, which we reshape to match the original domain.

res = model.predict(pairs, verbose=0)
res.shape = (M, N)

Finally, round to 0 or 1

domain = np.where(res > 0.5, 1, 0)

and we can plot the data

fig, ax = plt.subplots()
ax.imshow(domain.T, origin="lower",
          extent=[xmin, xmax, ymin, ymax], alpha=0.25)
xpt = [q[0] for q in x]
ypt = [q[1] for q in x]

ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")