velocyto.neighbors module

velocyto.neighbors.balance_knn_loop
Fast and greedy algorithm to balance a KNN graph so that no node is the NN to more than maxl other nodes
Parameters:  dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
 dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
 lsi (np.ndarray (samples,)) – degree of connectivity (l) sorted indexes
 maxl (int) – max degree of connectivity (from others to self) allowed
 k (int) – number of neighbours in the final graph
 return_distance (bool) – whether to return distance
Returns:  dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
 dist_new (np.ndarray (samples, k+1)) – distances to the NN
 l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
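
The greedy idea described above can be illustrated with a minimal, dependency-free sketch. It is not velocyto's implementation: the name `balance_knn_sketch` and the `order` argument (standing in for `lsi`) are hypothetical, plain lists replace NumPy arrays, and distances are omitted.

```python
def balance_knn_sketch(dsi, order, maxl, k):
    """Greedy sketch: no node may be chosen as neighbour more than maxl times.

    dsi   : list of lists; dsi[i] holds the candidate neighbours of i, nearest first
    order : order in which samples are processed (hypothetical stand-in for lsi)
    """
    n = len(dsi)
    in_degree = [0] * n                 # plays the role of l
    balanced = [[i] for i in range(n)]  # column 0 is the sample itself
    for i in order:
        for candidate in dsi[i]:
            if len(balanced[i]) == k + 1:
                break                   # sample i already has k neighbours
            if candidate == i or in_degree[candidate] >= maxl:
                continue                # skip self and saturated (blocked) nodes
            balanced[i].append(candidate)
            in_degree[candidate] += 1
        # Candidates exhausted: pad the remaining slots with the sample itself.
        while len(balanced[i]) < k + 1:
            balanced[i].append(i)
    return balanced, in_degree


# Toy graph in which node 0 is a hub: every other node lists it first.
dsi = [
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [2, 0, 1, 3],
    [3, 0, 2, 1],
]
balanced, in_degree = balance_knn_sketch(dsi, order=[0, 1, 2, 3], maxl=2, k=2)
print(balanced)   # the last sample finds every candidate blocked and is padded with itself
print(in_degree)
```

Because the procedure is greedy, the result depends on the processing order: samples handled late may find all of their candidates blocked.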

velocyto.neighbors.balance_knn_loop_constrained
Fast and greedy algorithm to balance a KNN graph so that no node is the NN to more than maxl other nodes, with connectivity constrained by the groups labels
Parameters:  dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
 dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
 lsi (np.ndarray (samples,)) – degree of connectivity (l) sorted indexes
 groups (np.ndarray (samples,)) – labels of the samples that constrain the connectivity
 maxl (int) – max degree of connectivity (from others to self) allowed
 k (int) – number of neighbours in the final graph
 return_distance (bool) – whether to return distance
Returns:  dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
 dist_new (np.ndarray (samples, k+1)) – distances to the NN
 l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
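
A minimal sketch of how a label constraint could enter the greedy loop. It assumes that "constrain the connectivity" means a neighbour must carry the same label as the sample; this reading, the function name, and the fixed processing order are assumptions for illustration, not velocyto's code.

```python
def balance_knn_constrained_sketch(dsi, groups, maxl, k):
    """Greedy balancing where neighbours must share the sample's group label."""
    n = len(dsi)
    in_degree = [0] * n
    balanced = [[i] for i in range(n)]  # column 0 is the sample itself
    for i in range(n):                  # index order, instead of an lsi ordering
        for candidate in dsi[i]:
            if len(balanced[i]) == k + 1:
                break
            if candidate == i:
                continue
            if groups[candidate] != groups[i]:
                continue                # assumed constraint: same label only
            if in_degree[candidate] >= maxl:
                continue                # node already has maxl incoming edges
            balanced[i].append(candidate)
            in_degree[candidate] += 1
        # Pad with the sample itself if too few candidates were admissible.
        balanced[i] += [i] * (k + 1 - len(balanced[i]))
    return balanced, in_degree


groups = ["a", "a", "b", "b"]
dsi = [
    [0, 2, 1, 3],   # the nearest other sample (2) belongs to another group
    [1, 2, 0, 3],
    [2, 0, 3, 1],
    [3, 1, 2, 0],
]
balanced, in_degree = balance_knn_constrained_sketch(dsi, groups, maxl=3, k=1)
print(balanced)
```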

velocyto.neighbors.knn_balance(dsi: numpy.ndarray, dist: numpy.ndarray = None, maxl: int = 200, k: int = 60, constraint: numpy.ndarray = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Balance a KNN graph so that no node is the NN to more than maxl other nodes
Parameters:  dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
 dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
 maxl (int) – max degree of connectivity allowed
 k (int) – number of neighbours in the final graph
 constraint (np.ndarray (samples,)) – labels of the samples that constrain the connectivity
Returns:  dist_new (np.ndarray (samples, k+1)) – distances to the NN
 dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
 l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i

class velocyto.neighbors.BalancedKNN(k: int = 50, sight_k: int = 100, maxl: int = 200, constraint: numpy.ndarray = None, mode: str = 'distance', metric: str = 'euclidean', n_jobs: int = 4)
Bases: object
Greedy algorithm to balance a k-nearest-neighbour graph
It has an API similar to scikit-learn
Parameters:  k (int (default=50)) – the number of neighbours in the final graph
 sight_k (int (default=100)) – the number of neighbours in the initialization graph. It corresponds to the farthest neighbour that a sample is allowed to connect to when its closer neighbours are not allowed. If sight_k is reached, the remaining entries are filled with the sample itself
 maxl (int (default=200)) – max degree of connectivity allowed. Avoids the presence of hubs in the graph: it is the maximum number of neighbours that are allowed to contact a node before the node is blocked
 constraint (np.ndarray (default=None)) – a numpy array defining the different groups within which connectivity is allowed. If “clusters”, it uses the clusters as in self.clusters_ix
 mode (str (default="distance")) – decides which kind of output to produce: “distance” or “connectivity”
 n_jobs (int (default=4)) – parallelization of the standard KNN search performed at initialization

n_samples

fit(data: numpy.ndarray, sight_k: int = None) → Any
Fits the model
Parameters:  data (np.ndarray (samples, features))
 sight_k (int) – the farthest point that a node is allowed to connect to when its closest neighbours are not allowed

kneighbors(X: numpy.ndarray = None, maxl: int = None, mode: str = 'distance') → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Finds the k-neighbors of a point.
Returns indices of and distances to the neighbors of each point.
Parameters:  X (array-like, shape (n_query, n_features)) – The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
 maxl (int) – max degree of connectivity allowed
 mode ("distance" or "connectivity") – Decides the kind of output
Returns:  dist_new (np.ndarray (samples, k+1)) – distances to the NN
 dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
 l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
 NOTE – The first column (0) corresponds to the sample itself; the nearest neighbour is in the second column (1)

kneighbors_graph(X: numpy.ndarray = None, maxl: int = None, mode: str = 'distance') → scipy.sparse.csr.csr_matrix
Return the k-neighbors graph as a sparse CSR matrix
Parameters:  X (array-like, shape (n_query, n_features)) – The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
 maxl (int) – max degree of connectivity allowed
 mode ("distance" or "connectivity") – Decides the kind of output
Returns:  neighbor_graph (scipy.sparse.csr_matrix) – The values are either distances or connectivity, depending on the mode parameter
 NOTE – The diagonal will be zero even though the value 0 is actually stored
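
To make the note about stored zeros concrete, here is a minimal sketch of how (samples, k+1) neighbour tables map onto the three CSR arrays. It is an illustration only, with a hypothetical helper name and plain lists; with SciPy, the same three arrays could be passed as `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n, n))`.

```python
def knn_to_csr_arrays(dsi_new, dist_new):
    """Flatten (samples, k+1) neighbour tables into CSR data/indices/indptr."""
    width = len(dsi_new[0])                          # k + 1 entries per row
    data = [d for row in dist_new for d in row]      # stored values (distances)
    indices = [j for row in dsi_new for j in row]    # column index of each value
    # Every row has the same number of entries, so row starts are evenly spaced.
    indptr = list(range(0, len(dsi_new) * width + 1, width))
    return data, indices, indptr


# First column is the sample itself, at distance 0.
dsi_new = [[0, 1, 2], [1, 0, 2], [2, 1, 0]]
dist_new = [[0.0, 1.0, 2.5], [0.0, 1.0, 1.5], [0.0, 1.5, 2.5]]
data, indices, indptr = knn_to_csr_arrays(dsi_new, dist_new)

# Row 1 occupies the slice indptr[1]:indptr[2] of data and indices.
row1 = dict(zip(indices[indptr[1]:indptr[2]], data[indptr[1]:indptr[2]]))
print(row1)  # the diagonal entry (column 1) is explicitly stored with value 0.0
```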

smooth_data(data_to_smooth: numpy.ndarray, X: numpy.ndarray = None, maxl: int = None, mutual: bool = False, only_increase: bool = True) → numpy.ndarray
Use the weights learned from knn to smooth any data matrix
Parameters: data_to_smooth (np.ndarray (features, samples)) – NOTE: this orientation is different from the input to fit (for speed reasons). If the data is provided as (samples, features), this will be detected and the correct operation performed at the cost of some efficiency. In the case where the two dimensions are equal (samples == features), the shape (features, samples) will be assumed

velocyto.neighbors.knn_distance_matrix(data: numpy.ndarray, metric: str = None, k: int = 40, mode: str = 'connectivity', n_jobs: int = 4) → scipy.sparse.csr.csr_matrix
Calculate a nearest neighbour distance matrix
Note that k is the actual number of neighbors, NOT INCLUDING the sample itself. To achieve that, kneighbors_graph is called with X = None

velocyto.neighbors.make_mutual(knn: scipy.sparse.csr.csr_matrix) → scipy.sparse.coo.coo_matrix
Removes edges between neighbours that are not mutual
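
As a small illustration of what "mutual" means here, the sketch below keeps an edge i → j only when j → i is also present. It uses a hypothetical dict-of-sets adjacency rather than a sparse matrix; for a binary SciPy connectivity matrix, an element-wise minimum with the transpose (`knn.minimum(knn.T)`) expresses the same idea.

```python
def make_mutual_sketch(neighbours):
    """Keep the edge i -> j only if the reverse edge j -> i also exists."""
    return {
        i: {j for j in nbrs if i in neighbours.get(j, set())}
        for i, nbrs in neighbours.items()
    }


knn = {0: {1, 2}, 1: {0}, 2: {1}}
mutual = make_mutual_sketch(knn)
print(mutual)  # only the 0 <-> 1 pair is mutual; node 2 keeps no edges
```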

velocyto.neighbors.connectivity_to_weights(mknn: scipy.sparse.csr.csr_matrix, axis: int = 1) → scipy.sparse.lil.lil_matrix
Convert a binary connectivity matrix to weights that can be multiplied with a data matrix to smooth it
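
One plausible reading of this conversion, sketched under the assumption that the weights are obtained by dividing each row by its sum (corresponding to axis = 1), so that every sample's weights add up to 1. The dense list-of-lists layout, the function name, and the guard for empty rows are additions for illustration.

```python
def connectivity_to_weights_sketch(connectivity):
    """Row-normalise a binary connectivity matrix so each row sums to 1."""
    weights = []
    for row in connectivity:
        total = sum(row)
        # Guard against samples without neighbours (an addition of this sketch).
        weights.append([v / total if total else 0.0 for v in row])
    return weights


mknn = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
weights = connectivity_to_weights_sketch(mknn)
print(weights[0])  # two neighbours, so each receives weight 0.5
```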

velocyto.neighbors.min_n(row_data: numpy.ndarray, row_indices: numpy.ndarray, n: int) → Tuple[numpy.ndarray, numpy.ndarray]
Find the n smallest entries of a row and their corresponding indices
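
A minimal sketch of the selection this describes, assuming `row_data` holds the stored values of one sparse row and `row_indices` their column indices. Plain lists are used and the function name is hypothetical.

```python
def min_n_sketch(row_data, row_indices, n):
    """Return the n smallest values of a row together with their column indices."""
    # Positions of the entries, sorted by value, truncated to the first n.
    order = sorted(range(len(row_data)), key=row_data.__getitem__)[:n]
    return [row_data[i] for i in order], [row_indices[i] for i in order]


values, columns = min_n_sketch([0.9, 0.2, 0.5, 0.1], [4, 7, 8, 12], n=2)
print(values, columns)
```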

velocyto.neighbors.take_top(matrix: scipy.sparse.base.spmatrix, n: int) → scipy.sparse.lil.lil_matrix
Filter the top nearest neighbours from a sparse distance matrix

velocyto.neighbors.convolve_by_sparse_weights(data: numpy.ndarray, w: scipy.sparse.csr.csr_matrix) → numpy.ndarray
Use the weights learned from knn to convolve any data matrix
NOTE: An improved implementation could detect which of the two matrices is sparse, and which kind of sparse format it uses, and perform a faster computation
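
The convolution can be pictured as replacing each sample's value with a weighted average over its neighbours. The sketch below assumes data in (features, samples) orientation and stores the weights as a hypothetical dict of dicts instead of a sparse matrix; it illustrates the arithmetic only.

```python
def convolve_sketch(data, weights):
    """Smooth each feature row: sample i becomes sum_j weights[i][j] * row[j].

    data    : list of feature rows, each of length n_samples
    weights : {sample i: {neighbour j: w_ij}}, each inner dict summing to 1
    """
    smoothed = []
    for feature_row in data:
        smoothed.append([
            sum(w * feature_row[j] for j, w in weights[i].items())
            for i in range(len(feature_row))
        ])
    return smoothed


data = [[0.0, 4.0, 8.0]]          # one feature measured in three samples
weights = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.5, 2: 0.5},
}
smoothed = convolve_sketch(data, weights)
print(smoothed)  # [[2.0, 4.0, 6.0]]
```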

velocyto.neighbors.knn_smooth_weights(matrix: numpy.ndarray, metric: str = 'euclidean', k_search: int = 20, k_mutual: int = 10, n_jobs: int = 10) → Tuple[scipy.sparse.base.spmatrix, scipy.sparse.csr.csr_matrix]
Find the weights to smooth the dataset using efficient sparse matrix operations
Parameters:  matrix (np.ndarray (genes, cells)) – expression matrix
 metric (str) –
 k_search (int) – the number of neighbours in the first k-nearest-neighbour search
 k_mutual (int) – the number of mutual neighbours to select
 n_jobs (int) –
 return_knn –
Returns:  weights (, knn)