Clustering
Clustering seeks to group data into clusters based on their properties and then allows us to predict which cluster a new member belongs to.
import numpy as np
import matplotlib.pyplot as plt
We’ll use a dataset generator that is part of scikit-learn called make_moons. This generates data that falls into 2 different sets with a shape that looks like half-moons.
from sklearn import datasets
def generate_data():
    # 200 noisy points in 2 interleaved half-moons, with labels 0 or 1
    xvec, val = datasets.make_moons(200, noise=0.2)
    # collect each point and its label, then convert back to arrays
    x = []
    v = []
    for xv, vv in zip(xvec, val):
        x.append(np.array(xv))
        v.append(vv)
    return np.array(x), np.array(v)
x, v = generate_data()
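Since make_moons already returns NumPy arrays, the explicit loop above is mostly illustrative. A minimal sketch of a loop-free variant follows; the name generate_data_direct, the seed argument, and the _demo variables are our own additions for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn import datasets

def generate_data_direct(n_samples=200, noise=0.2, seed=None):
    """Return the points and labels from make_moons without an explicit loop."""
    # make_moons returns (points, labels) as NumPy arrays already
    x, v = datasets.make_moons(n_samples, noise=noise, random_state=seed)
    return x, v

# hypothetical demo variables, kept separate from x and v used in the text
x_demo, v_demo = generate_data_direct(seed=0)
print(x_demo.shape, v_demo.shape)   # (200, 2) (200,)
print(np.unique(v_demo))            # [0 1]
```

Passing a fixed seed makes the generated points reproducible from run to run, which can help when comparing training results.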
Let’s look at a point and its value
print(f"x = {x[0]}, value = {v[0]}")
x = [0.75648382 1.03602905], value = 0
Now let’s plot the data
def plot_data(x, v):
    xpt = [q[0] for q in x]
    ypt = [q[1] for q in x]
    fig, ax = plt.subplots()
    ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
    ax.set_aspect("equal")
    return fig
fig = plot_data(x, v)

We want to partition this domain into 2 regions, such that when we come in with a new point, we know which group it belongs to.
First we set up and train our network
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Input
from keras.optimizers import RMSprop
model = Sequential()
model.add(Input(shape=(2,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(20, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
rms = RMSprop()
model.compile(loss='binary_crossentropy',
              optimizer=rms, metrics=['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 50)             │           150 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 20)             │         1,020 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            21 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 1,191 (4.65 KB)
Trainable params: 1,191 (4.65 KB)
Non-trainable params: 0 (0.00 B)
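The parameter counts in the summary can be checked by hand: a dense layer with n_in inputs and n_out outputs has n_in * n_out weights plus n_out biases. The short sketch below reproduces the numbers; the helper name dense_params is ours, introduced only for this illustration.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# input size, two hidden layers, and the single output unit
layer_sizes = [2, 50, 20, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts)       # [150, 1020, 21]
print(sum(counts))  # 1191
```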
We seem to need a lot of epochs here to get a good result
epochs = 100
results = model.fit(x, v, batch_size=50, epochs=epochs, verbose=2)
Epoch 1/100
4/4 - 0s - 118ms/step - accuracy: 0.4900 - loss: 0.6761
Epoch 2/100
4/4 - 0s - 7ms/step - accuracy: 0.6050 - loss: 0.6478
Epoch 3/100
4/4 - 0s - 7ms/step - accuracy: 0.7150 - loss: 0.6252
Epoch 4/100
4/4 - 0s - 6ms/step - accuracy: 0.7600 - loss: 0.6069
Epoch 5/100
4/4 - 0s - 6ms/step - accuracy: 0.8000 - loss: 0.5906
Epoch 6/100
4/4 - 0s - 7ms/step - accuracy: 0.8200 - loss: 0.5754
Epoch 7/100
4/4 - 0s - 6ms/step - accuracy: 0.8150 - loss: 0.5611
Epoch 8/100
4/4 - 0s - 6ms/step - accuracy: 0.8150 - loss: 0.5480
Epoch 9/100
4/4 - 0s - 6ms/step - accuracy: 0.8200 - loss: 0.5336
Epoch 10/100
4/4 - 0s - 7ms/step - accuracy: 0.8150 - loss: 0.5206
Epoch 11/100
4/4 - 0s - 8ms/step - accuracy: 0.8150 - loss: 0.5080
Epoch 12/100
4/4 - 0s - 6ms/step - accuracy: 0.8250 - loss: 0.4959
Epoch 13/100
4/4 - 0s - 6ms/step - accuracy: 0.8300 - loss: 0.4834
Epoch 14/100
4/4 - 0s - 6ms/step - accuracy: 0.8300 - loss: 0.4720
Epoch 15/100
4/4 - 0s - 6ms/step - accuracy: 0.8350 - loss: 0.4600
Epoch 16/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.4491
Epoch 17/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.4377
Epoch 18/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.4268
Epoch 19/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.4175
Epoch 20/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.4065
Epoch 21/100
4/4 - 0s - 8ms/step - accuracy: 0.8600 - loss: 0.3963
Epoch 22/100
4/4 - 0s - 7ms/step - accuracy: 0.8750 - loss: 0.3873
Epoch 23/100
4/4 - 0s - 7ms/step - accuracy: 0.8700 - loss: 0.3775
Epoch 24/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3687
Epoch 25/100
4/4 - 0s - 7ms/step - accuracy: 0.8750 - loss: 0.3614
Epoch 26/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3535
Epoch 27/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3460
Epoch 28/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3392
Epoch 29/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3342
Epoch 30/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3276
Epoch 31/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3220
Epoch 32/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3173
Epoch 33/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.3128
Epoch 34/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.3095
Epoch 35/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3043
Epoch 36/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.3020
Epoch 37/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2986
Epoch 38/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2947
Epoch 39/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2916
Epoch 40/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2892
Epoch 41/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2880
Epoch 42/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2841
Epoch 43/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2817
Epoch 44/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2814
Epoch 45/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2799
Epoch 46/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2766
Epoch 47/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2751
Epoch 48/100
4/4 - 0s - 7ms/step - accuracy: 0.8950 - loss: 0.2732
Epoch 49/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2742
Epoch 50/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2713
Epoch 51/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2689
Epoch 52/100
4/4 - 0s - 7ms/step - accuracy: 0.8950 - loss: 0.2686
Epoch 53/100
4/4 - 0s - 7ms/step - accuracy: 0.8950 - loss: 0.2661
Epoch 54/100
4/4 - 0s - 7ms/step - accuracy: 0.8950 - loss: 0.2653
Epoch 55/100
4/4 - 0s - 7ms/step - accuracy: 0.8950 - loss: 0.2636
Epoch 56/100
4/4 - 0s - 7ms/step - accuracy: 0.8950 - loss: 0.2616
Epoch 57/100
4/4 - 0s - 7ms/step - accuracy: 0.8950 - loss: 0.2606
Epoch 58/100
4/4 - 0s - 7ms/step - accuracy: 0.9000 - loss: 0.2600
Epoch 59/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2603
Epoch 60/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2569
Epoch 61/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2563
Epoch 62/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2547
Epoch 63/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2547
Epoch 64/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2516
Epoch 65/100
4/4 - 0s - 7ms/step - accuracy: 0.9000 - loss: 0.2535
Epoch 66/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2500
Epoch 67/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2494
Epoch 68/100
4/4 - 0s - 7ms/step - accuracy: 0.9000 - loss: 0.2520
Epoch 69/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2471
Epoch 70/100
4/4 - 0s - 7ms/step - accuracy: 0.9000 - loss: 0.2476
Epoch 71/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2453
Epoch 72/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2447
Epoch 73/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2443
Epoch 74/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2441
Epoch 75/100
4/4 - 0s - 7ms/step - accuracy: 0.9000 - loss: 0.2423
Epoch 76/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2406
Epoch 77/100
4/4 - 0s - 7ms/step - accuracy: 0.9000 - loss: 0.2415
Epoch 78/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2392
Epoch 79/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2391
Epoch 80/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2371
Epoch 81/100
4/4 - 0s - 8ms/step - accuracy: 0.9050 - loss: 0.2370
Epoch 82/100
4/4 - 0s - 7ms/step - accuracy: 0.9050 - loss: 0.2363
Epoch 83/100
4/4 - 0s - 7ms/step - accuracy: 0.9100 - loss: 0.2358
Epoch 84/100
4/4 - 0s - 7ms/step - accuracy: 0.9100 - loss: 0.2360
Epoch 85/100
4/4 - 0s - 7ms/step - accuracy: 0.9150 - loss: 0.2337
Epoch 86/100
4/4 - 0s - 7ms/step - accuracy: 0.9150 - loss: 0.2350
Epoch 87/100
4/4 - 0s - 7ms/step - accuracy: 0.9100 - loss: 0.2338
Epoch 88/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2339
Epoch 89/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2322
Epoch 90/100
4/4 - 0s - 6ms/step - accuracy: 0.9200 - loss: 0.2311
Epoch 91/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2290
Epoch 92/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2290
Epoch 93/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2280
Epoch 94/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2271
Epoch 95/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2267
Epoch 96/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2273
Epoch 97/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2256
Epoch 98/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2239
Epoch 99/100
4/4 - 0s - 7ms/step - accuracy: 0.9150 - loss: 0.2235
Epoch 100/100
4/4 - 0s - 7ms/step - accuracy: 0.9200 - loss: 0.2239
score = model.evaluate(x, v, verbose=0)
print(f"score = {score[0]}")
print(f"accuracy = {score[1]}")
score = 0.2208416610956192
accuracy = 0.9200000166893005
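Here score[0] is the loss we asked for in model.compile (binary cross-entropy) and score[1] is the accuracy metric. As a rough sketch of what these two numbers measure, the NumPy code below computes both from a handful of made-up probabilities. The helper names, the sample values, and the clipping constant eps are assumptions for illustration, not taken from the network above.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean of -[y log p + (1 - y) log(1 - p)] over all samples."""
    # clip to avoid log(0); the value of eps is our choice for this sketch
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def accuracy(y_true, p, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((p > threshold).astype(int) == y_true))

y_true = np.array([0, 1, 1, 0])
p = np.array([0.1, 0.8, 0.4, 0.3])  # made-up predicted probabilities

print(binary_crossentropy(y_true, p))
print(accuracy(y_true, p))  # 0.75
```

The third sample is misclassified (probability 0.4 for a true label of 1), so the accuracy is 3/4, while the loss also penalizes how confident each prediction is.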
Let’s look at a prediction. We need to feed in the points as an array of shape (N, 2), where N is the number of points, so a single point has shape (1, 2).
res = model.predict(np.array([[-2, 2]]))
res
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
array([[5.165168e-06]], dtype=float32)
We see that we get a floating point number. We will need to convert this to 0 or 1 by rounding.
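A small sketch of that conversion, using the value predicted above together with two made-up outputs (0.73 and 0.5) to show how the threshold behaves; the names res_demo and labels are ours.

```python
import numpy as np

# first entry is the prediction shown above; the other two are made up
res_demo = np.array([[5.165168e-06], [0.73], [0.5]], dtype=np.float32)

# anything strictly above 0.5 becomes class 1, everything else class 0
labels = np.where(res_demo > 0.5, 1, 0)
print(labels.ravel())  # [0 1 0]
```

With a strict comparison, an output of exactly 0.5 is assigned to class 0.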
Let’s plot the partitioning
M = 128
N = 128
xmin = -1.75
xmax = 2.5
ymin = -1.25
ymax = 1.75
xpt = np.linspace(xmin, xmax, M)
ypt = np.linspace(ymin, ymax, N)
To make the prediction go faster, we want to feed in all of the grid points at once, as a single array with one (x, y) pair per row, covering every combination of xpt and ypt:
[[xpt[0], ypt[0]],
 [xpt[0], ypt[1]],
 ...
]
We can see that this packs them into the array
pairs = np.array(np.meshgrid(xpt, ypt)).T.reshape(-1, 2)
pairs[0]
array([-1.75, -1.25])
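To see the ordering that this meshgrid, transpose, and reshape combination produces, it can help to try it on a tiny grid. The sketch below uses 3 x-values and 2 y-values; the names xs, ys, and pairs_demo are hypothetical and chosen so they do not clash with the variables above.

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])   # 3 x-values
ys = np.array([10.0, 20.0])      # 2 y-values

# same construction as in the text, on a grid small enough to print
pairs_demo = np.array(np.meshgrid(xs, ys)).T.reshape(-1, 2)
print(pairs_demo)
# [[ 0. 10.]
#  [ 0. 20.]
#  [ 1. 10.]
#  [ 1. 20.]
#  [ 2. 10.]
#  [ 2. 20.]]
```

Every combination appears once, with the y-value varying fastest and the x-value varying slowest.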
Now we do the prediction. We will get a vector out, which we reshape to match the original domain.
res = model.predict(pairs, verbose=0)
res.shape = (M, N)
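Because the pairs are ordered with x varying slowest, reshaping to (M, N) gives an array whose first index follows x and whose second index follows y. The sketch below checks this on a small grid, using a simple stand-in for the network (a made-up rule that returns 1 where x > 1.5); all names ending in _demo and the fake_res array are assumptions for illustration.

```python
import numpy as np

M_demo, N_demo = 4, 3
xs = np.linspace(0.0, 3.0, M_demo)   # [0, 1, 2, 3]
ys = np.linspace(0.0, 2.0, N_demo)   # [0, 1, 2]
pairs_demo = np.array(np.meshgrid(xs, ys)).T.reshape(-1, 2)

# stand-in for model.predict: one "probability" per row, shape (M*N, 1)
fake_res = (pairs_demo[:, 0] > 1.5).astype(float).reshape(-1, 1)

# first axis follows x, second axis follows y
grid = fake_res.reshape(M_demo, N_demo)
print(grid[:, 0])  # [0. 0. 1. 1.]
print(grid[0, :])  # [0. 0. 0.]
```

imshow treats the first index as the vertical direction, which is why the plotting code transposes the array before displaying it.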
Finally, round to 0 or 1
domain = np.where(res > 0.5, 1, 0)
and we can plot the data
fig, ax = plt.subplots()
ax.imshow(domain.T, origin="lower",
          extent=[xmin, xmax, ymin, ymax], alpha=0.25)
xpt = [q[0] for q in x]
ypt = [q[1] for q in x]
ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
