velocyto.neighbors module

velocyto.neighbors.balance_knn_loop[source]

Fast and greedy algorithm to balance a K-NN graph so that no node is the NN to more than maxl other nodes

Parameters:
  • dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
  • dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
  • lsi (np.ndarray (samples,)) – degree of connectivity (l) sorted indexes
  • maxl (int) – max degree of connectivity (from others to self) allowed
  • k (int) – number of neighbours in the final graph
  • return_distance (bool) – whether to return distance
Returns:

  • dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
  • dist_new (np.ndarray (samples, k+1)) – distances to the NN
  • l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
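
The greedy pass described above can be illustrated with a small NumPy sketch. This is not the library's source: the function name greedy_balance_sketch, the brute-force distance computation, and the choice to pad unfilled slots with the sample itself at distance 0 are assumptions made for illustration.

```python
import numpy as np

def greedy_balance_sketch(dsi, dist, lsi, maxl, k):
    """Hypothetical, simplified greedy balancing (not velocyto's code)."""
    n_samples, sight = dsi.shape
    dsi_new = np.empty((n_samples, k + 1), dtype=np.int64)
    dist_new = np.zeros((n_samples, k + 1), dtype=np.float64)
    l = np.zeros(n_samples, dtype=np.int64)  # incoming connections per sample
    for el in lsi:                  # visit samples in the order given by lsi
        dsi_new[el, 0] = el         # column 0 holds the sample itself
        p = 0                       # neighbours accepted so far
        for j in range(sight):
            if p == k:
                break
            m = dsi[el, j]
            if m == el or l[m] >= maxl:   # skip self and saturated nodes
                continue
            p += 1
            dsi_new[el, p] = m
            dist_new[el, p] = dist[el, j]
            l[m] += 1
        dsi_new[el, p + 1:] = el    # assumption: pad unfilled slots with self
    return dsi_new, dist_new, l

# Toy data: five points on a line, full sight (every sample sees all others).
pts = np.array([0.0, 1.0, 1.1, 1.2, 5.0])
d = np.abs(pts[:, None] - pts[None, :])
dsi = np.argsort(d, axis=1, kind="stable")
dist = np.take_along_axis(d, dsi, axis=1)
l0 = np.bincount(dsi[:, 1:3].ravel(), minlength=len(pts))  # unbalanced in-degree
lsi = np.argsort(l0, kind="stable")[::-1]
dsi_new, dist_new, l = greedy_balance_sketch(dsi, dist, lsi, maxl=2, k=2)
```

After the call, no entry of l exceeds maxl, and every row of dsi_new starts with the sample's own index.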

velocyto.neighbors.balance_knn_loop_constrained[source]

Fast and greedy algorithm to balance a K-NN graph so that no node is the NN to more than maxl other nodes

Parameters:
  • dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
  • dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
  • lsi (np.ndarray (samples,)) – degree of connectivity (l) sorted indexes
  • groups (np.ndarray (samples,)) – labels of the samples that constrain the connectivity
  • maxl (int) – max degree of connectivity (from others to self) allowed
  • k (int) – number of neighbours in the final graph
  • return_distance (bool) – whether to return distance
Returns:

  • dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
  • dist_new (np.ndarray (samples, k+1)) – distances to the NN
  • l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
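
As a rough illustration of how the groups labels could restrict the candidates of a single sample, the sketch below builds a boolean mask over one row of dsi. It assumes that a connection is allowed only between samples carrying the same label; the helper name allowed_candidates is hypothetical and is not part of velocyto.

```python
import numpy as np

def allowed_candidates(dsi_row, el, groups, l, maxl):
    """Mask of candidates in dsi_row that sample el may still connect to."""
    same_group = groups[dsi_row] == groups[el]   # assumed constraint: same label
    not_self = dsi_row != el
    not_full = l[dsi_row] < maxl                 # candidate is not yet saturated
    return same_group & not_self & not_full

groups = np.array([0, 0, 1, 1, 0])
l = np.array([0, 3, 0, 0, 1])            # current incoming connections
dsi_row = np.array([0, 1, 2, 4, 3])      # candidates of sample 0, nearest first
mask = allowed_candidates(dsi_row, el=0, groups=groups, l=l, maxl=3)
candidates = dsi_row[mask]               # only sample 4 remains
```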

velocyto.neighbors.knn_balance(dsi: numpy.ndarray, dist: numpy.ndarray = None, maxl: int = 200, k: int = 60, constraint: numpy.ndarray = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Balance a K-NN graph so that no node is the NN to more than maxl other nodes

Parameters:
  • dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
  • dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
  • maxl (int) – max degree of connectivity allowed
  • k (int) – number of neighbours in the final graph
  • constraint (np.ndarray (samples,)) – labels of the samples that constrain the connectivity
Returns:

  • dist_new (np.ndarray (samples, k+1)) – distances to the NN
  • dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
  • l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
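
The lsi argument of the loop functions above is described as the indexes sorted by degree of connectivity. One plausible way to derive it from dsi is sketched below; excluding the first (self) column from the count and sorting in descending order are both assumptions.

```python
import numpy as np

dsi = np.array([[0, 1, 2],
                [1, 0, 2],
                [2, 1, 0],
                [3, 2, 1]])

# Count how often each sample appears as a neighbour of another sample.
l = np.bincount(dsi[:, 1:].ravel(), minlength=dsi.shape[0])

# Indexes of the samples ordered from most to least connected (assumed order).
lsi = np.argsort(l, kind="stable")[::-1]
```

Here l is [2, 3, 3, 0], so samples 1 and 2 are the most connected and sample 3 is never chosen as a neighbour.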

class velocyto.neighbors.BalancedKNN(k: int = 50, sight_k: int = 100, maxl: int = 200, constraint: numpy.ndarray = None, mode: str = 'distance', metric: str = 'euclidean', n_jobs: int = 4)[source]

Bases: object

Greedy algorithm to balance a K-nearest neighbour graph

It has an API similar to scikit-learn

Parameters:
  • k (int (default=50)) – the number of neighbours in the final graph
  • sight_k (int (default=100)) – the number of neighbours in the initialization graph. It corresponds to the farthest neighbour that a sample is allowed to connect to when no closer neighbours are allowed. If sight_k is reached then the matrix is filled with the sample itself
  • maxl (int (default=200)) – max degree of connectivity allowed. It avoids the presence of hubs in the graph: it is the maximum number of neighbours that are allowed to contact a node before the node is blocked
  • constraint (np.ndarray (default=None)) – a numpy array defining the different groups within which connectivity is allowed. If “clusters”, it uses the clusters as in self.clusters_ix
  • mode (str (default="distance")) – decides which kind of output is returned, “distance” or “connectivity”
  • n_jobs (int (default=4)) – parallelization of the standard KNN search performed at initialization
n_samples
fit(data: numpy.ndarray, sight_k: int = None) → Any[source]

Fits the model

Parameters:
  • data (np.ndarray (samples, features))
  • sight_k (int) – the farthest point that a node is allowed to connect to when its closest neighbours are not allowed
kneighbors(X: numpy.ndarray = None, maxl: int = None, mode: str = 'distance') → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Finds the K-neighbors of a point.

Returns indices of and distances to the neighbors of each point.

Parameters:
  • X (array-like, shape (n_query, n_features)) – The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
  • maxl (int) – max degree of connectivity allowed
  • mode ("distance" or "connectivity") – Decides the kind of output
Returns:

  • dist_new (np.ndarray (samples, k+1)) – distances to the NN
  • dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
  • l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i

NOTE: the first column (0) corresponds to the sample itself; the nearest neighbour is at the second column (1)
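
The column layout of the returned arrays can be illustrated with a plain, unbalanced brute-force search in NumPy. This sketch does not call BalancedKNN; it only mimics the (samples, k+1) layout, and the random data and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))     # 6 samples, 2 features
k = 3

# Pairwise Euclidean distances; the diagonal is exactly zero.
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Keep the k+1 closest samples per row: column 0 is the sample itself.
dsi = np.argsort(d, axis=1, kind="stable")[:, :k + 1]
dist = np.take_along_axis(d, dsi, axis=1)

# Drop column 0 to obtain the k actual neighbours of each sample.
neighbours_only = dsi[:, 1:]
```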

kneighbors_graph(X: numpy.ndarray = None, maxl: int = None, mode: str = 'distance') → scipy.sparse.csr.csr_matrix[source]

Return the K-neighbors graph as a sparse csr matrix

Parameters:
  • X (array-like, shape (n_query, n_features)) – The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
  • maxl (int) – max degree of connectivity allowed
  • mode ("distance" or "connectivity") – Decides the kind of output
Returns:

  • neighbor_graph (scipy.sparse.csr_matrix) – The values are either distances or connectivity, depending on the mode parameter

NOTE: the diagonal will be zero even though the value 0 is actually stored
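
The remark about stored zeros can be reproduced with SciPy alone. The sketch below assumes that the graph is assembled from (samples, k+1) index and distance arrays whose first column is the sample itself at distance 0; this construction is an illustration and is not taken from the library's source.

```python
import numpy as np
from scipy import sparse

dsi = np.array([[0, 1, 2],
                [1, 0, 2],
                [2, 1, 0]])
dist = np.array([[0.0, 1.0, 2.0],
                 [0.0, 1.0, 1.5],
                 [0.0, 1.5, 2.0]])

n, width = dsi.shape
indptr = np.arange(0, n * width + 1, width)   # each row stores `width` entries
graph = sparse.csr_matrix((dist.ravel(), dsi.ravel(), indptr), shape=(n, n))

stored = graph.nnz                 # counts explicitly stored zeros as well
nonzero = graph.count_nonzero()    # counts only the true non-zero distances
```

In this toy case stored is 9 and nonzero is 6: the three diagonal zeros occupy storage. Calling graph.eliminate_zeros() would drop them in place.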

smooth_data(data_to_smooth: numpy.ndarray, X: numpy.ndarray = None, maxl: int = None, mutual: bool = False, only_increase: bool = True) → numpy.ndarray[source]

Use the weights learned from knn to smooth any data matrix

Parameters:
  • data_to_smooth (np.ndarray (features, samples)) – NOTE: this orientation is different from the input of fit (for speed reasons). If the data is provided as (samples, features), this will be detected and the correct operation performed at the cost of some efficiency. In the case where samples == features, the shape (features, samples) will be assumed
velocyto.neighbors.knn_distance_matrix(data: numpy.ndarray, metric: str = None, k: int = 40, mode: str = 'connectivity', n_jobs: int = 4) → scipy.sparse.csr.csr_matrix[source]

Calculate a nearest neighbour distance matrix

Note that k is meant as the actual number of neighbors, NOT INCLUDING the sample itself. To achieve that, kneighbors_graph is called with X = None

velocyto.neighbors.make_mutual(knn: scipy.sparse.csr.csr_matrix) → scipy.sparse.coo.coo_matrix[source]

Removes edges between neighbours that are not mutual
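
A common way to keep only mutual edges of a binary connectivity matrix is to take the element-wise minimum of the matrix and its transpose. The sketch below shows that idea with SciPy; it is an assumption that make_mutual follows the same approach.

```python
import numpy as np
from scipy import sparse

# Directed edges: 0->1, 0->2, 1->0, 2->1
knn = sparse.csr_matrix(np.array([[0, 1, 1],
                                  [1, 0, 0],
                                  [0, 1, 0]]))

# An edge survives only if it is present in both directions.
mutual = knn.minimum(knn.T.tocsr())
mutual_dense = mutual.toarray()
```

Only the pair (0, 1) is connected in both directions, so it is the only edge left in mutual_dense.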

velocyto.neighbors.connectivity_to_weights(mknn: scipy.sparse.csr.csr_matrix, axis: int = 1) → scipy.sparse.lil.lil_matrix[source]

Convert a binary connectivity matrix to weights ready to be multiplied to smooth a data matrix
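
One way to turn a binary connectivity matrix into smoothing weights is to divide each row by its sum, so that every row adds up to 1. The sketch below assumes that this is what axis=1 refers to, and that no row is empty (an empty row would cause a division by zero).

```python
import numpy as np
from scipy import sparse

mknn = sparse.csr_matrix(np.array([[1, 1, 0],
                                   [1, 1, 1],
                                   [0, 1, 1]], dtype=float))

# Sum of each row, flattened to a 1-D array.
row_sums = np.asarray(mknn.sum(axis=1)).ravel()

# Left-multiplying by a diagonal matrix rescales the rows.
weights = sparse.diags(1.0 / row_sums) @ mknn
weights_dense = weights.toarray()
```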

velocyto.neighbors.min_n(row_data: numpy.ndarray, row_indices: numpy.ndarray, n: int) → Tuple[numpy.ndarray, numpy.ndarray][source]

Find the n smallest entries of a row and their corresponding indices
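
A minimal sketch of this kind of selection, assuming that row_data holds the stored values of one sparse row and row_indices the matching column indices. The helper name n_smallest is hypothetical.

```python
import numpy as np

def n_smallest(row_data, row_indices, n):
    """Return the n smallest values of a row and their column indices."""
    order = np.argsort(row_data, kind="stable")[:n]
    return row_data[order], row_indices[order]

values, indices = n_smallest(np.array([0.9, 0.2, 0.5, 0.1]),
                             np.array([7, 3, 8, 4]),
                             n=2)
```

Here values is [0.1, 0.2] and indices is [4, 3].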

velocyto.neighbors.take_top(matrix: scipy.sparse.base.spmatrix, n: int) → scipy.sparse.lil.lil_matrix[source]

Filter the top nearest neighbours from a sparse distance matrix

velocyto.neighbors.convolve_by_sparse_weights(data: numpy.ndarray, w: scipy.sparse.csr.csr_matrix) → numpy.ndarray[source]

Use the weights learned from knn to convolve any data matrix

NOTE: An improved implementation could detect which one is sparse and which kind of sparse format it uses, and perform a faster computation
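
The convolution amounts to replacing each sample by a weighted average of its neighbours. The sketch below assumes a (features, samples) data matrix and a weight matrix in which row i holds the weights used to smooth sample i; both conventions are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse

data = np.array([[1.0, 2.0, 3.0],
                 [10.0, 20.0, 30.0]])              # (features, samples)
w = sparse.csr_matrix(np.array([[0.5, 0.5, 0.0],
                                [0.0, 0.5, 0.5],
                                [0.5, 0.0, 0.5]]))  # (samples, samples), rows sum to 1

# Sparse-dense product on the (samples, features) view, then transpose back.
smoothed = np.asarray((w @ data.T).T)
```

The first feature becomes [1.5, 2.5, 2.0]: sample 0 is averaged with sample 1, sample 1 with sample 2, and sample 2 with sample 0.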

velocyto.neighbors.knn_smooth_weights(matrix: numpy.ndarray, metric: str = 'euclidean', k_search: int = 20, k_mutual: int = 10, n_jobs: int = 10) → Tuple[scipy.sparse.base.spmatrix, scipy.sparse.csr.csr_matrix][source]

Find the weights to smooth the dataset using efficient sparse matrix operations

Parameters:
  • matrix – (genes, cells) expression matrix
  • metric
  • k_search (int) – the number of neighbours used in the first k-nearest-neighbour search
  • k_mutual (int) – the number of mutual neighbours to select
  • n_jobs
  • return_knn
Returns:

  • weights (, knn)