velocyto.neighbors module
-
velocyto.neighbors.
balance_knn_loop
Fast and greedy algorithm to balance a K-NN graph so that no node is the nearest neighbour (NN) of more than maxl other nodes
Parameters: - dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
- dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
- lsi (np.ndarray (samples,)) – degree of connectivity (l) sorted indexes
- maxl (int) – max degree of connectivity (from others to self) allowed
- k (int) – number of neighbours in the final graph
- return_distance (bool) – whether to return distance
Returns: - dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
- dist_new (np.ndarray (samples, k+1)) – distances to the NN
- l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
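The greedy idea described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not velocyto's implementation: the helper name greedy_balance is hypothetical, samples are visited in the order given by lsi, a candidate is skipped when it is the sample itself or already has maxl incoming connections, and unfilled slots keep the sample's own index.

```python
import numpy as np

def greedy_balance(dsi, dist, lsi, maxl, k):
    """Illustrative greedy balancing (hypothetical helper, not the library code)."""
    n, sight = dsi.shape
    # Every slot starts as the sample itself, so unfilled slots stay "self".
    dsi_new = np.tile(np.arange(n)[:, None], (1, k + 1))
    dist_new = np.zeros((n, k + 1))
    l = np.zeros(n, dtype=int)  # incoming connections per sample
    for el in lsi:  # visit samples in the supplied order
        p = 0  # neighbours accepted so far for this sample
        for j in range(sight):
            if p >= k:
                break
            m = dsi[el, j]
            if m == el or l[m] >= maxl:  # skip self and saturated nodes
                continue
            dsi_new[el, p + 1] = m
            dist_new[el, p + 1] = dist[el, j]
            l[m] += 1
            p += 1
    return dsi_new, dist_new, l

# Toy input: 30 random 2-D points, each seeing itself plus 10 candidates.
rng = np.random.default_rng(0)
pts = rng.normal(size=(30, 2))
d = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
dsi = np.argsort(d, axis=1)[:, :11]
dist = np.take_along_axis(d, dsi, axis=1)
l0 = np.bincount(dsi[:, 1:].ravel(), minlength=30)  # in-degree before balancing
lsi = np.argsort(l0, kind="mergesort")[::-1]        # most connected first
dsi_new, dist_new, l = greedy_balance(dsi, dist, lsi, maxl=6, k=4)
```

After the call, no entry of l exceeds maxl, and the first column of dsi_new holds each sample's own index.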
-
velocyto.neighbors.
balance_knn_loop_constrained
Fast and greedy algorithm to balance a K-NN graph so that no node is the nearest neighbour (NN) of more than maxl other nodes
Parameters: - dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
- dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
- lsi (np.ndarray (samples,)) – degree of connectivity (l) sorted indexes
- groups (np.ndarray (samples,)) – labels of the samples that constrain the connectivity
- maxl (int) – max degree of connectivity (from others to self) allowed
- k (int) – number of neighbours in the final graph
- return_distance (bool) – whether to return distance
Returns: - dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
- dist_new (np.ndarray (samples, k+1)) – distances to the NN
- l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
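The effect of the groups argument can be pictured as a mask over the candidate neighbours: an edge is only eligible when both samples carry the same label. The snippet below is a small NumPy sketch of that reading, with made-up toy arrays; it is not the library's code.

```python
import numpy as np

# Toy candidate lists: row i holds the distance-sorted candidates of sample i.
dsi = np.array([[0, 1, 2, 3],
                [1, 0, 2, 3],
                [2, 3, 1, 0],
                [3, 2, 1, 0]])
groups = np.array([0, 0, 1, 1])  # samples 0, 1 in one group; 2, 3 in another

# allowed[i, j] is True when candidate dsi[i, j] shares the label of sample i.
allowed = groups[dsi] == groups[:, None]
```

In a constrained balancing loop, candidates where allowed is False would simply be skipped.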
-
velocyto.neighbors.
knn_balance
(dsi: numpy.ndarray, dist: numpy.ndarray = None, maxl: int = 200, k: int = 60, constraint: numpy.ndarray = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Balance a K-NN graph so that no node is the NN to more than maxl other nodes
Parameters: - dsi (np.ndarray (samples, K)) – distance sorted indexes (as returned by sklearn NN)
- dist (np.ndarray (samples, K)) – the actual distance corresponding to the sorted indexes
- maxl (int) – max degree of connectivity allowed
- k (int) – number of neighbours in the final graph
- constraint (np.ndarray (samples,)) – labels of the samples that constrain the connectivity
Returns: - dist_new (np.ndarray (samples, k+1)) – distances to the NN
- dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
- l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
-
class
velocyto.neighbors.
BalancedKNN
(k: int = 50, sight_k: int = 100, maxl: int = 200, constraint: numpy.ndarray = None, mode: str = 'distance', metric: str = 'euclidean', n_jobs: int = 4)
Bases: object
Greedy algorithm to balance a K-nearest neighbour graph.
It has an API similar to scikit-learn.
Parameters: - k (int (default=50)) – the number of neighbours in the final graph
- sight_k (int (default=100)) – the number of neighbours in the initialization graph. It corresponds to the farthest neighbour that a sample is allowed to connect to when its closer neighbours are not allowed. If sight_k is reached, the remaining entries are filled with the sample itself
- maxl (int (default=200)) – max degree of connectivity allowed. Avoids the presence of hubs in the graph: it is the maximum number of neighbours that are allowed to contact a node before the node is blocked
- constraint (np.ndarray (default=None)) – a numpy array defining the different groups within which connectivity is allowed. If “clusters”, it uses the clusters as in self.clusters_ix
- mode (str (default="distance")) – decides which kind of output: “distance” or “connectivity”
- n_jobs (int (default=4)) – parallelization of the standard KNN search performed at initialization
-
n_samples
-
fit
(data: numpy.ndarray, sight_k: int = None) → Any
Fits the model
Parameters: - data (np.ndarray (samples, features))
- sight_k (int) – the farthest point that a node is allowed to connect to when its closest neighbours are not allowed
-
kneighbors
(X: numpy.ndarray = None, maxl: int = None, mode: str = 'distance') → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Finds the K-neighbors of a point.
Returns indices of and distances to the neighbors of each point.
Parameters: - X (array-like, shape (n_query, n_features)) – The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
- maxl (int) – max degree of connectivity allowed
- mode ("distance" or "connectivity") – Decides the kind of output
Returns: - dist_new (np.ndarray (samples, k+1)) – distances to the NN
- dsi_new (np.ndarray (samples, k+1)) – indexes of the NN, first column is the sample itself
- l (np.ndarray (samples)) – l[i] is the number of connections from other samples to the sample i
NOTE: The first column (0) corresponds to the sample itself; the nearest neighbour is in the second column (1)
-
kneighbors_graph
(X: numpy.ndarray = None, maxl: int = None, mode: str = 'distance') → scipy.sparse.csr.csr_matrix
Return the K-neighbors graph as a sparse csr matrix
Parameters: - X (array-like, shape (n_query, n_features)) – The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
- maxl (int) – max degree of connectivity allowed
- mode ("distance" or "connectivity") – Decides the kind of output
Returns: - neighbor_graph (scipy.sparse.csr_matrix) – The values are either distances or connectivity depending on the mode parameter
NOTE: The diagonal will be zero even though the value 0 is actually stored
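To make the link between the (samples, k+1) arrays returned by kneighbors and a graph matrix concrete, the sketch below builds a dense adjacency matrix with NumPy. It is only an illustration: the method itself returns a scipy.sparse CSR matrix, and the helper name dense_graph and the toy arrays are hypothetical.

```python
import numpy as np

def dense_graph(dsi_new, dist_new, mode="distance"):
    """Dense stand-in for the sparse neighbour graph (hypothetical helper)."""
    n, k1 = dsi_new.shape
    rows = np.repeat(np.arange(n), k1)  # row index of every stored edge
    values = dist_new.ravel() if mode == "distance" else np.ones(n * k1)
    graph = np.zeros((n, n))
    graph[rows, dsi_new.ravel()] = values
    np.fill_diagonal(graph, 0.0)  # self-edges carry no weight
    return graph

dsi_new = np.array([[0, 1, 2], [1, 0, 2], [2, 1, 0]])
dist_new = np.array([[0.0, 1.0, 3.0], [0.0, 1.0, 2.0], [0.0, 2.0, 3.0]])
dist_graph = dense_graph(dsi_new, dist_new, mode="distance")
conn_graph = dense_graph(dsi_new, dist_new, mode="connectivity")
```

Row i of either matrix describes the neighbours of sample i, with distances or ones as values depending on mode.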
-
smooth_data
(data_to_smooth: numpy.ndarray, X: numpy.ndarray = None, maxl: int = None, mutual: bool = False, only_increase: bool = True) → numpy.ndarray
Use the weights learned from knn to smooth any data matrix
Parameters: data_to_smooth (np.ndarray (features, samples)) – the data matrix to smooth. NOTE: this orientation is different from the (samples, features) input of fit (for speed reasons). If the data is provided as (samples, features), this will be detected and the correct operation performed at the cost of some efficiency. In the case where samples == features, the shape (features, samples) will be assumed
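The orientation rule described for data_to_smooth can be written down as a small check. The function below is a sketch of that rule only, assuming the number of fitted samples is known; its name is hypothetical and it is not taken from the library.

```python
import numpy as np

def orient_features_by_samples(data, n_samples):
    """Return data as (features, samples) and whether a transpose was needed."""
    if data.shape[1] == n_samples:  # already (features, samples); also the square case
        return data, False
    if data.shape[0] == n_samples:  # given as (samples, features)
        return data.T, True
    raise ValueError("neither axis matches the number of fitted samples")

a, flipped_a = orient_features_by_samples(np.zeros((3, 5)), n_samples=5)
b, flipped_b = orient_features_by_samples(np.zeros((5, 3)), n_samples=5)
c, flipped_c = orient_features_by_samples(np.zeros((5, 5)), n_samples=5)
```

Checking the second axis first is what makes a square matrix default to the (features, samples) interpretation.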
-
velocyto.neighbors.
knn_distance_matrix
(data: numpy.ndarray, metric: str = None, k: int = 40, mode: str = 'connectivity', n_jobs: int = 4) → scipy.sparse.csr.csr_matrix
Calculate a nearest neighbour distance matrix
Notice that k is meant as the actual number of neighbors NOT INCLUDING itself. To achieve that we call kneighbors_graph with X = None
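What "k not including itself" means can be shown with a brute-force NumPy version. This is a sketch for small inputs, assuming Euclidean distance and a dense connectivity output; the library instead delegates to a nearest neighbour search and returns a sparse matrix, and the function name here is hypothetical.

```python
import numpy as np

def knn_connectivity_excluding_self(data, k):
    """Dense connectivity matrix with exactly k non-self neighbours per row."""
    d = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # a sample can never pick itself
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest other samples
    graph = np.zeros(d.shape, dtype=int)
    graph[np.arange(len(data))[:, None], nn] = 1
    return graph

data = np.array([[0.0], [1.0], [3.0], [7.0]])
graph = knn_connectivity_excluding_self(data, k=2)
```

Every row of graph sums to k and the diagonal stays zero.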
-
velocyto.neighbors.
make_mutual
(knn: scipy.sparse.csr.csr_matrix) → scipy.sparse.coo.coo_matrix
Removes edges between neighbours that are not mutual
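For a binary connectivity matrix, keeping only mutual edges amounts to an element-wise minimum with the transpose. The dense NumPy sketch below illustrates the idea; the function operates on sparse matrices in the library, and the helper name is hypothetical.

```python
import numpy as np

def keep_mutual_edges(knn):
    """Dense illustration: an edge survives only if present in both directions."""
    return np.minimum(knn, knn.T)

knn = np.array([[0, 1, 1],
                [1, 0, 0],
                [0, 1, 0]])
mutual = keep_mutual_edges(knn)
```

Here only the edge between samples 0 and 1 is reciprocated, so it is the only one that remains.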
-
velocyto.neighbors.
connectivity_to_weights
(mknn: scipy.sparse.csr.csr_matrix, axis: int = 1) → scipy.sparse.lil.lil_matrix
Convert a binary connectivity matrix to weights ready to be multiplied to smooth a data matrix
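One plausible reading of this conversion is a normalization along the chosen axis, so that each node's weights sum to one. The dense sketch below assumes that reading; the guard for nodes without any connection is an addition for the example, and the function name is hypothetical.

```python
import numpy as np

def normalize_connectivity(conn, axis=1):
    """Divide each row (axis=1) or column (axis=0) by its sum."""
    totals = conn.sum(axis=axis, keepdims=True).astype(float)
    totals[totals == 0] = 1.0  # assumption: isolated nodes stay all-zero
    return conn / totals

conn = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 0, 1]])
weights = normalize_connectivity(conn)
```

With axis=1, each row of weights averages over that sample's neighbours.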
-
velocyto.neighbors.
min_n
(row_data: numpy.ndarray, row_indices: numpy.ndarray, n: int) → Tuple[numpy.ndarray, numpy.ndarray]
Find the n smallest entries of a row and their indices
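Selecting the n smallest stored values of a sparse row, together with their column indices, can be sketched with np.argsort. The function name n_smallest and the toy arrays are hypothetical; row_data and row_indices stand for the values and column indices of one row.

```python
import numpy as np

def n_smallest(row_data, row_indices, n):
    """Return the n smallest values and the column indices they belong to."""
    order = np.argsort(row_data)[:n]
    return row_data[order], row_indices[order]

values, indices = n_smallest(np.array([0.5, 0.1, 0.9, 0.3]),
                             np.array([7, 2, 5, 11]), n=2)
# values -> [0.1, 0.3], indices -> [2, 11]
```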
-
velocyto.neighbors.
take_top
(matrix: scipy.sparse.base.spmatrix, n: int) → scipy.sparse.lil.lil_matrix
Filter the top nearest neighbours from a sparse distance matrix
-
velocyto.neighbors.
convolve_by_sparse_weights
(data: numpy.ndarray, w: scipy.sparse.csr.csr_matrix) → numpy.ndarray
Use the weights learned from knn to convolve any data matrix
NOTE: An improved implementation could detect which one is sparse and which kind of sparse, and perform faster computation
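Convolving a data matrix by neighbour weights is, in essence, a matrix product. The dense sketch below assumes data is laid out as (features, samples) and that row i of w holds the weights of the neighbours of sample i, summing to one; both the layout and the helper name are assumptions for illustration.

```python
import numpy as np

def convolve_dense(data, w):
    """data: (features, samples); w: (samples, samples) with rows summing to 1."""
    assert np.allclose(w.sum(axis=1), 1.0), "each row of w should sum to 1"
    return data @ w.T  # column i becomes the weighted average over i's neighbours

data = np.array([[2.0, 4.0, 6.0]])  # one feature, three samples
w = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
smoothed = convolve_dense(data, w)
# smoothed -> [[3.0, 5.0, 6.0]]
```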
-
velocyto.neighbors.
knn_smooth_weights
(matrix: numpy.ndarray, metric: str = 'euclidean', k_search: int = 20, k_mutual: int = 10, n_jobs: int = 10) → Tuple[scipy.sparse.base.spmatrix, scipy.sparse.csr.csr_matrix]
Find the weights to smooth the dataset using efficient sparse matrix operations
Parameters: - matrix – (genes, cells) expression matrix
- metric –
- k_search (int) – the number of neighbours in the first k nearest neighbour search
- k_mutual (int) – the number of mutual neighbours to select
- n_jobs –
- return_knn –
Returns: weights (, knn)