velocyto.neighbors module

velocyto.neighbors.balance_knn_loop

Fast, greedy algorithm to balance a K-NN graph so that no node is the NN of more than maxl other nodes.

Parameters:

dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
lsi (np.ndarray (samples,)) – degree of connectivity (l) sorted indexes
maxl (int) – max degree of connectivity (from others to self) allowed
k (int) – number of neighbours in the final graph
return_distance (bool) – whether to return distance

Returns:

dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
dist_new (np.ndarray (samples, k+1)) – distances to the NN
l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
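The greedy loop described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not velocyto's implementation: the function name greedy_balance_sketch is hypothetical, each sample is assumed to appear in its own row of dsi, and padded entries are given distance 0.

```python
import numpy as np


def greedy_balance_sketch(dsi, dist, lsi, maxl, k):
    # Hypothetical helper, not part of velocyto: greedy K-NN balancing.
    # dsi, dist: (samples, K) neighbour indexes/distances sorted by distance.
    # lsi: order in which samples pick their neighbours.
    n = dsi.shape[0]
    dsi_new = np.empty((n, k + 1), dtype=np.int64)
    dist_new = np.zeros((n, k + 1), dtype=np.float64)
    l = np.zeros(n, dtype=np.int64)  # l[m]: incoming connections of sample m
    for el in lsi:
        dsi_new[el, 0] = el  # column 0 is always the sample itself
        p = 0  # neighbours accepted so far
        for j in range(dsi.shape[1]):
            if p >= k:
                break
            m = dsi[el, j]
            if m == el or l[m] >= maxl:
                continue  # skip itself and nodes already at maxl incoming edges
            p += 1
            dsi_new[el, p] = m
            dist_new[el, p] = dist[el, j]
            l[m] += 1
        while p < k:  # sight exhausted: pad with the sample itself at distance 0
            p += 1
            dsi_new[el, p] = el
    return dsi_new, dist_new, l


# Four points on a line at x = 0, 1, 2, 3; each row is sorted by distance.
dsi = np.array([[0, 1, 2, 3], [1, 0, 2, 3], [2, 1, 3, 0], [3, 2, 1, 0]])
dist = np.array([[0, 1, 2, 3], [0, 1, 1, 2], [0, 1, 1, 2], [0, 1, 2, 3]], dtype=float)
dsi_new, dist_new, l = greedy_balance_sketch(dsi, dist, np.arange(4), maxl=1, k=1)
```

Here sample 1 is the first choice of both samples 0 and 2; with maxl=1, sample 2 falls back to its next-closest neighbour, 3.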

velocyto.neighbors.balance_knn_loop_constrained

Fast, greedy algorithm to balance a K-NN graph so that no node is the NN of more than maxl other nodes.

Parameters:

dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
lsi (np.ndarray (samples,)) – degree of connectivity (l) sorted indexes
groups (np.ndarray (samples,)) – labels of the samples that constrain the connectivity
maxl (int) – max degree of connectivity (from others to self) allowed
k (int) – number of neighbours in the final graph
return_distance (bool) – whether to return distance

Returns:

dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
dist_new (np.ndarray (samples, k+1)) – distances to the NN
l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
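The constrained variant differs in which candidates a sample may pick. Assuming that the constraint means connections are only allowed between samples sharing the same label in groups, the per-sample candidate filter could look like this (allowed_candidates is a hypothetical helper, not a velocyto function):

```python
import numpy as np


def allowed_candidates(el, dsi, groups, l, maxl):
    # Hypothetical helper: candidates that sample `el` may still pick, assuming
    # the constraint means "same group label only".
    row = dsi[el]
    mask = (groups[row] == groups[el]) & (row != el) & (l[row] < maxl)
    return row[mask]  # still in distance order


dsi = np.array([[0, 1, 2, 3], [1, 0, 2, 3], [2, 1, 3, 0], [3, 2, 1, 0]])
groups = np.array([0, 0, 1, 0])
l = np.array([0, 0, 0, 1])  # sample 3 already has one incoming edge
print(allowed_candidates(0, dsi, groups, l, maxl=1))  # [1]
```

Sample 2 is alone in group 1, so it gets no candidates at all and would be padded with itself.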

velocyto.neighbors.knn_balance(dsi: numpy.ndarray, dist: numpy.ndarray = None, maxl: int = 200, k: int = 60, constraint: numpy.ndarray = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Balance a K-NN graph so that no node is the NN of more than maxl other nodes.

Parameters:

dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
maxl (int) – max degree of connectivity allowed
k (int) – number of neighbours in the final graph
constraint (np.ndarray (samples,)) – labels of the samples that constrain the connectivity

Returns:

dist_new (np.ndarray (samples, k+1)) – distances to the NN
dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
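The loop functions take a connectivity-sorted index, lsi. A plausible way to derive it from dsi is sketched here, under the assumption that samples are processed from the most to the least connected; the helper name connectivity_order is hypothetical.

```python
import numpy as np


def connectivity_order(dsi):
    # l[i]: how many times sample i appears in the neighbour lists
    l = np.bincount(dsi.ravel(), minlength=dsi.shape[0])
    # Sample indexes from most to least connected (deterministic for ties)
    lsi = np.argsort(l, kind="mergesort")[::-1]
    return l, lsi


dsi = np.array([[0, 1], [1, 0], [2, 1]])
l, lsi = connectivity_order(dsi)  # l = [2, 3, 1], lsi = [1, 0, 2]
```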

class velocyto.neighbors.BalancedKNN(k: int = 50, sight_k: int = 100, maxl: int = 200, constraint: numpy.ndarray = None, mode: str = 'distance', metric: str = 'euclidean', n_jobs: int = 4)

Bases: object

Greedy algorithm to balance a K-nearest neighbour graph.

It has an API similar to scikit-learn.

Parameters:

k (int (default=50)) – the number of neighbours in the final graph
sight_k (int (default=100)) – the number of neighbours in the initialization graph. It corresponds to the farthest neighbour that a sample is allowed to connect to when no closer neighbours are allowed. If sight_k is reached before k neighbours are found, the remaining entries are filled with the sample itself.
maxl (int (default=200)) – max degree of connectivity allowed. It prevents hubs in the graph: it is the maximum number of neighbours that are allowed to connect to a node before the node is blocked.
constraint (np.ndarray (default=None)) – a numpy array defining the different groups within which connectivity is allowed. If “clusters”, the clusters in self.clusters_ix are used.
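mode (str (default="distance")) – decides which kind of output is returned: “distance” or “connectivity”
metric (str (default="euclidean")) – the distance metric of the standard KNN search performed at initialization
n_jobs (int (default=4)) – parallelization of the standard KNN search performed at initialization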

n_samples

fit(data: numpy.ndarray, sight_k: int = None) → Any

Fit the model.

Parameters:

data (np.ndarray (samples, features)) – the data matrix
sight_k (int) – the farthest neighbour that a node is allowed to connect to when its closest neighbours are not allowed
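The initial sight_k search is expected to return distance-sorted indexes like those produced by scikit-learn's NearestNeighbors. For small inputs, a brute-force NumPy version illustrates that layout; sorted_neighbours is a hypothetical helper and assumes Euclidean distance.

```python
import numpy as np


def sorted_neighbours(data, sight_k):
    # Brute-force Euclidean search over (samples, features) data.
    diff = data[:, None, :] - data[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # (samples, samples) distances
    dsi = np.argsort(d, axis=1, kind="stable")[:, :sight_k]  # nearest first
    dist = np.take_along_axis(d, dsi, axis=1)
    return dsi, dist


data = np.array([[0.0], [1.0], [3.0]])
dsi, dist = sorted_neighbours(data, sight_k=2)
# dsi = [[0, 1], [1, 0], [2, 1]], dist = [[0, 1], [0, 1], [0, 2]]
```

Each sample lands in column 0 of its own row because its distance to itself is 0. Exact duplicate points with a lower index could take that slot instead.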

kneighbors(X: numpy.ndarray = None, maxl: int = None, mode: str = 'distance') → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Finds the K-neighbors of a point.

Returns indices of and distances to the neighbors of each point.

Parameters:

X (array-like, shape (n_query, n_features)) – The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
maxl (int) – max degree of connectivity allowed
mode ("distance" or "connectivity") – Decides the kind of output

Returns:

dist_new (np.ndarray (samples, k+1)) – distances to the NN
dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
NOTE: The first column (0) corresponds to the sample itself; the nearest neighbour is in the second column (1).
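Given arrays in the layout described above, selecting the actual neighbours means slicing off column 0. The arrays below are made-up example values, not real output:

```python
import numpy as np

# Made-up output in the (samples, k+1) layout: column 0 holds each sample itself.
dist_new = np.array([[0.0, 1.0, 2.0], [0.0, 1.0, 1.0]])
dsi_new = np.array([[0, 1, 2], [1, 0, 2]])

self_ix = dsi_new[:, 0]          # each sample's own index
neighbours = dsi_new[:, 1:]      # the k actual neighbours, nearest first
neighbour_dist = dist_new[:, 1:]
nearest = dsi_new[:, 1]          # nearest neighbour of each sample
```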

kneighbors_graph(X: numpy.ndarray = None, maxl: int = None, mode: str = 'distance') → scipy.sparse.csr.csr_matrix

Return the K-neighbors graph as a sparse CSR matrix.

Parameters:

X (array-like, shape (n_query, n_features)) – The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
maxl (int) – max degree of connectivity allowed
mode ("distance" or "connectivity") – Decides the kind of output

Returns:

neighbor_graph (scipy.sparse.csr_matrix) – The values are either distances or connectivity, depending on the mode parameter.
NOTE: The diagonal will be zero, even though the value 0 is explicitly stored.
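A sketch of how such a graph can be assembled from the (samples, k+1) arrays returned by kneighbors, assuming every row has exactly k+1 entries; to_csr_graph is a hypothetical helper, not the library's code. Building the matrix from (data, indices, indptr) keeps the stored zeros of the self column in distance mode instead of dropping them.

```python
import numpy as np
from scipy import sparse


def to_csr_graph(dsi_new, dist_new, mode="distance"):
    n, width = dsi_new.shape
    indptr = np.arange(0, n * width + 1, width)  # every row stores `width` entries
    if mode == "distance":
        values = dist_new.ravel().astype(float)
    else:  # "connectivity"
        values = np.ones(n * width)
    # The (data, indices, indptr) form stores every entry, explicit zeros included
    return sparse.csr_matrix((values, dsi_new.ravel(), indptr), shape=(n, n))


dsi_new = np.array([[0, 1], [1, 0], [2, 1]])
dist_new = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 2.0]])
G = to_csr_graph(dsi_new, dist_new)  # 6 stored entries, 3 of them zeros
```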

smooth_data(data_to_smooth: numpy.ndarray, X: numpy.ndarray = None, maxl: int = None, mutual: bool = False, only_increase: bool = True) → numpy.ndarray

Use the weights learned from the KNN graph to smooth any data matrix.

Parameters:

data_to_smooth ((features, samples)) – the data matrix to smooth. NOTE: this orientation differs from the input to fit, for speed reasons. If the data is provided as (samples, features), this is detected and the correct operation is performed at the cost of some efficiency. When the matrix is square (features == samples), the shape (features, samples) is assumed.
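The orientation handling can be illustrated with dense NumPy arrays. This sketch assumes a (samples, samples) weight matrix w whose row i holds the weights of sample i's neighbours; smooth_with_weights is a hypothetical name, not the method's actual code.

```python
import numpy as np


def smooth_with_weights(data_to_smooth, w):
    # Column i of the result is sum_j w[i, j] * data[:, j].
    n = w.shape[0]
    if data_to_smooth.shape[1] == n:  # (features, samples), also when square
        return data_to_smooth @ w.T
    if data_to_smooth.shape[0] == n:  # (samples, features): keep that orientation
        return w @ data_to_smooth
    raise ValueError("data does not match the number of samples")


w = np.array([[0.5, 0.5], [0.0, 1.0]])             # 2 samples
data = np.array([[1.0, 3.0], [2.0, 4.0], [0.0, 2.0]])  # (3 features, 2 samples)
smoothed = smooth_with_weights(data, w)  # [[2, 3], [3, 4], [1, 2]]
```

Passing data.T gives the transposed result, which is the detection behaviour described above; the square case falls into the first branch.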

velocyto.neighbors.knn_distance_matrix(data: numpy.ndarray, metric: str = None, k: int = 40, mode: str = 'connectivity', n_jobs: int = 4) → scipy.sparse.csr.csr_matrix

Calculate a nearest neighbour distance matrix.

Note that k is the actual number of neighbours, NOT INCLUDING the sample itself. To achieve that, kneighbors_graph is called with X = None.
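scikit-learn's kneighbors_graph excludes each sample from its own neighbours when X is None. The same effect can be reproduced by brute force by setting the diagonal of the distance matrix to infinity before sorting, as in this hypothetical knn_distance_matrix_sketch (Euclidean distance only):

```python
import numpy as np
from scipy import sparse


def knn_distance_matrix_sketch(data, k, mode="connectivity"):
    n = data.shape[0]
    d = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # a sample never counts as its own neighbour
    ix = np.argsort(d, axis=1, kind="stable")[:, :k]
    if mode == "distance":
        values = np.take_along_axis(d, ix, axis=1)
    else:
        values = np.ones(ix.shape)
    rows = np.repeat(np.arange(n), k)
    return sparse.csr_matrix((values.ravel(), (rows, ix.ravel())), shape=(n, n))


data = np.array([[0.0], [1.0], [3.0]])
G = knn_distance_matrix_sketch(data, k=1)  # exactly k entries per row
```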

velocyto.neighbors.make_mutual(knn: scipy.sparse.csr.csr_matrix) → scipy.sparse.coo.coo_matrix

Remove edges between neighbours that are not mutual.
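For a binary connectivity matrix, one way to keep only mutual edges is the element-wise minimum of the matrix and its transpose: an edge survives only if it exists in both directions. A minimal sketch (make_mutual_sketch is hypothetical; for distance matrices it also works as long as the distances are symmetric):

```python
import numpy as np
from scipy import sparse


def make_mutual_sketch(knn):
    knn = sparse.csr_matrix(knn)
    # Edge (i, j) is kept only if both (i, j) and (j, i) are stored
    return knn.minimum(knn.T).tocoo()


knn = np.array([[0, 1, 0], [1, 0, 0], [0, 1, 0]])
mutual = make_mutual_sketch(knn)  # the 2 -> 1 edge is dropped
```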

velocyto.neighbors.connectivity_to_weights(mknn: scipy.sparse.csr.csr_matrix, axis: int = 1) → scipy.sparse.lil.lil_matrix

Convert a binary connectivity matrix to weights ready to be multiplied with a data matrix to smooth it.
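A straightforward way to turn connectivity into smoothing weights is to divide each row (axis=1) or column (axis=0) by its sum, so the weights add up to 1. The sketch below assumes that normalization and is not necessarily what velocyto does; connectivity_to_weights_sketch is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def connectivity_to_weights_sketch(mknn, axis=1):
    mknn = sparse.csr_matrix(mknn, dtype=float)
    sums = np.asarray(mknn.sum(axis=axis)).ravel()
    sums[sums == 0] = 1.0  # isolated samples keep an all-zero row/column
    scale = sparse.diags(1.0 / sums)
    # axis=1: each row sums to 1; axis=0: each column sums to 1
    weights = scale @ mknn if axis == 1 else mknn @ scale
    return weights.tolil()


W = connectivity_to_weights_sketch([[0, 1, 1], [1, 0, 0], [0, 0, 0]])
```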

velocyto.neighbors.min_n(row_data: numpy.ndarray, row_indices: numpy.ndarray, n: int) → Tuple[numpy.ndarray, numpy.ndarray]

Find the n smallest entries of a row and their indices.
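A minimal sketch of this operation on the data and column indices of one sparse row, returning the smallest values first; min_n_sketch is a hypothetical name:

```python
import numpy as np


def min_n_sketch(row_data, row_indices, n):
    order = np.argsort(row_data, kind="stable")[:n]  # positions of the n smallest
    return row_data[order], row_indices[order]


values, cols = min_n_sketch(np.array([0.5, 0.1, 0.3]), np.array([7, 2, 9]), 2)
# values = [0.1, 0.3], cols = [2, 9]
```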

velocyto.neighbors.take_top(matrix: scipy.sparse.base.spmatrix, n: int) → scipy.sparse.lil.lil_matrix

Filter the top n nearest neighbours from a sparse distance matrix.
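Applying the same selection row by row gives a simple, unoptimized version of this filter. take_top_sketch is hypothetical and assumes that only explicitly stored entries count as neighbours:

```python
import numpy as np
from scipy import sparse


def take_top_sketch(matrix, n):
    m = sparse.csr_matrix(matrix)
    out = sparse.lil_matrix(m.shape)
    for i in range(m.shape[0]):
        start, end = m.indptr[i], m.indptr[i + 1]
        data, cols = m.data[start:end], m.indices[start:end]
        order = np.argsort(data, kind="stable")[:n]  # n smallest distances
        for c, v in zip(cols[order], data[order]):
            out[i, c] = v
    return out


dists = np.array([[0, 2, 1, 3], [4, 0, 0, 1]], dtype=float)
top = take_top_sketch(dists, 2)  # row 0 loses the distance 3
```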

velocyto.neighbors.convolve_by_sparse_weights(data: numpy.ndarray, w: scipy.sparse.csr.csr_matrix) → numpy.ndarray

Use the weights learned from the KNN graph to convolve any data matrix.

NOTE: An improved implementation could detect which operand is sparse, and in which sparse format, to perform a faster computation.
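For a (features, samples) data matrix the convolution amounts to data @ w.T. The sketch below computes the same product with the sparse operand on the left, so SciPy's sparse-times-dense routine does the work; convolve_sketch is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def convolve_sketch(data, w):
    # data: (features, samples) dense; w: (samples, samples) sparse weights.
    # (w @ data.T).T equals data @ w.T
    return np.asarray((w @ data.T).T)


w = sparse.csr_matrix([[0.5, 0.5], [0.0, 1.0]])
data = np.array([[1.0, 3.0], [2.0, 4.0]])
out = convolve_sketch(data, w)  # [[2, 3], [3, 4]]
```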

velocyto.neighbors.knn_smooth_weights(matrix: numpy.ndarray, metric: str = 'euclidean', k_search: int = 20, k_mutual: int = 10, n_jobs: int = 10) → Tuple[scipy.sparse.base.spmatrix, scipy.sparse.csr.csr_matrix]

Find the weights to smooth the dataset using efficient sparse matrix operations.

Parameters:

matrix – (genes, cells) expression matrix
metric (str) – distance metric of the nearest neighbour search
k_search (int) – number of neighbours in the first k nearest neighbour search
k_mutual (int) – number of mutual neighbours to select
n_jobs (int) – number of parallel jobs for the nearest neighbour search
return_knn –

Returns: weights, knn
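The functions documented above can be chained into a compact, brute-force sketch of the whole procedure: a k_search neighbour search on the cells, mutual-edge filtering, a k_mutual cap and normalization. The function name, the Euclidean metric and the added self-loops are assumptions made for illustration, not a description of velocyto's exact code.

```python
import numpy as np
from scipy import sparse


def knn_smooth_weights_sketch(matrix, k_search=20, k_mutual=10):
    cells = np.asarray(matrix, dtype=float).T  # (cells, genes)
    n = cells.shape[0]
    d = np.sqrt(((cells[:, None, :] - cells[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    k = min(k_search, n - 1)
    ix = np.argsort(d, axis=1, kind="stable")[:, :k]
    rows = np.repeat(np.arange(n), k)
    vals = np.take_along_axis(d, ix, axis=1).ravel()
    knn = sparse.csr_matrix((vals, (rows, ix.ravel())), shape=(n, n))
    # Mutual edges only (distances are symmetric, so minimum() acts as an AND)
    mutual = knn.minimum(knn.T).tocsr()
    # Keep at most k_mutual closest mutual neighbours per cell, as binary edges
    top = sparse.lil_matrix((n, n))
    for i in range(n):
        s, e = mutual.indptr[i], mutual.indptr[i + 1]
        order = np.argsort(mutual.data[s:e], kind="stable")[:k_mutual]
        for c in mutual.indices[s:e][order]:
            top[i, c] = 1.0
    # Assumption: add self-loops, then row-normalize so each row sums to 1
    top = top.tocsr() + sparse.identity(n, format="csr")
    weights = sparse.diags(1.0 / np.asarray(top.sum(axis=1)).ravel()) @ top
    return weights, knn


# One gene, four cells forming two tight pairs: {0, 1} and {2, 3}
weights, knn = knn_smooth_weights_sketch(np.array([[0.0, 1.0, 5.0, 6.0]]), k_search=1, k_mutual=1)
```

Each cell ends up averaging itself with its single mutual neighbour, so every row of weights holds two entries of 0.5.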