netneurotools.cluster.match_assignments

netneurotools.cluster.match_assignments(assignments, target=None, seed=None)

Re-label clusters in columns of assignments to best match target.

Uses match_cluster_labels() to align cluster assignments.

Parameters:
  • assignments ((N, M) array_like) – Array of M clustering assignments for N subjects

  • target ((N,) array_like, optional) – Target clustering assignments to which all columns should be matched. If provided as an integer, the corresponding column in assignments is used as the target. If not specified, a (semi-)random column in assignments is chosen: because matching an N-cluster solution to an N+1-cluster solution can introduce discontinuities, the “random” target is drawn from the columns with the fewest clusters. See Examples for more information. Default: None

  • seed ({int, np.random.RandomState instance, None}, optional) – Seed for random number generation; only used if target is not provided. Default: None
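The default target selection described above can be sketched in plain Python. This is only an illustration of the stated rule (choose at random among the columns with the fewest clusters), not the library's implementation; the helper name pick_default_target is hypothetical, and it uses the standard-library random module in place of np.random.RandomState.

```python
import random


def pick_default_target(assignments, seed=None):
    """Return the index of a column with the fewest distinct labels.

    Hypothetical helper for illustration; not part of netneurotools.
    """
    n_cols = len(assignments[0])
    # Count the distinct cluster labels in each column
    n_clusters = [len({row[j] for row in assignments}) for j in range(n_cols)]
    fewest = min(n_clusters)
    candidates = [j for j, k in enumerate(n_clusters) if k == fewest]
    # Ties are broken at random; the seed makes the choice reproducible
    return random.Random(seed).choice(candidates)


assignments = [[0, 0, 1],
               [0, 0, 1],
               [1, 2, 0],
               [1, 2, 0],
               [1, 1, 2]]
# Column 0 has two clusters and columns 1 and 2 have three, so column 0
# is the only candidate regardless of the seed
print(pick_default_target(assignments, seed=1234))  # prints 0
```

When every column has the same number of clusters, all columns are candidates and the seed alone determines which one is picked.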

Returns:

assignments – The provided array with cluster solutions re-labeled to better match across the M assignments

Return type:

(N, M) numpy.ndarray

Examples

>>> import numpy as np
>>> from netneurotools import cluster

First, we can construct a matrix of N samples clustered M times (in this case, M is three). Since cluster labels are generally arbitrary, we can see that, while the same clusters were found each time, they were given different labels:

>>> assignments = np.array([[0, 0, 1],
...                         [0, 0, 1],
...                         [0, 0, 1],
...                         [1, 2, 0],
...                         [1, 2, 0],
...                         [1, 2, 0],
...                         [2, 1, 2],
...                         [2, 1, 2]])
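To confirm that the three columns above really describe the same clusters under different labels, one option is to check that the labels of two columns are in one-to-one correspondence. The helper below is a hypothetical, standard-library-only sketch and is not part of netneurotools.

```python
def same_partition(labels_a, labels_b):
    """Return True if two labelings group the samples identically."""
    # Collect every (label in a, label in b) pair that occurs together.
    # The groupings are identical exactly when these pairs form a
    # one-to-one mapping between the two label sets.
    pairs = set(zip(labels_a, labels_b))
    return len(pairs) == len(set(labels_a)) == len(set(labels_b))


# The three columns of the example matrix
col0 = [0, 0, 0, 1, 1, 1, 2, 2]
col1 = [0, 0, 0, 2, 2, 2, 1, 1]
col2 = [1, 1, 1, 0, 0, 0, 2, 2]
print(same_partition(col0, col1), same_partition(col0, col2))  # prints True True
```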

We would like to match the assignments so they’re all the same. Since one of the columns will be randomly picked as the “target” solution, we provide a seed to ensure reproducibility in the selection:

>>> cluster.match_assignments(assignments, seed=1234)
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [2, 2, 2],
       [2, 2, 2]])
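The output above equals the third column of the input, so that column served as the target here. To illustrate what matching a single column to a target involves, the sketch below tries every permutation of the labels and keeps the one that agrees with the target at the most positions. It is a brute-force illustration only: the library delegates this step to match_cluster_labels(), whose algorithm is not reproduced here, and the name relabel_to_match is hypothetical.

```python
from itertools import permutations


def relabel_to_match(source, target):
    """Relabel `source` to agree with `target` at as many positions as possible.

    Brute force over label permutations; only practical for a few clusters.
    """
    labels = sorted(set(source) | set(target))
    best, best_score = list(source), -1
    for perm in permutations(labels):
        # Map each original label to a candidate new label
        mapping = dict(zip(labels, perm))
        candidate = [mapping[x] for x in source]
        # Score the candidate by the number of positions matching the target
        score = sum(c == t for c, t in zip(candidate, target))
        if score > best_score:
            best, best_score = candidate, score
    return best


target = [1, 1, 1, 0, 0, 0, 2, 2]  # third column of the example
source = [0, 0, 0, 2, 2, 2, 1, 1]  # second column of the example
print(relabel_to_match(source, target))  # prints [1, 1, 1, 0, 0, 0, 2, 2]
```

The number of permutations grows factorially with the number of labels, so this approach is only suitable for small examples like this one.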

Alternatively, if assignments has clustering solutions with different numbers of clusters and no target is specified, the chosen target will be one of the columns with the smallest number of clusters:

>>> assignments = np.array([[0, 0, 1],
...                         [0, 0, 1],
...                         [0, 0, 1],
...                         [1, 2, 0],
...                         [1, 2, 0],
...                         [1, 2, 0],
...                         [1, 1, 2],
...                         [1, 1, 2]])
>>> cluster.match_assignments(assignments)
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 2, 2],
       [1, 2, 2]])