velocyto.analysis module¶

class
velocyto.analysis.
VelocytoLoom
(loom_filepath: str) → None[source]¶ Bases:
object
A convenient object to store the data of a velocyto loom file.
Data will be stored in memory
Examples
>>> import velocyto as vcy >>> vlm = vcy.VelocytoLoom("foo.loom") >>> vlm.normalize("S", size=True, log=True) >>> vlm.default_filter_and_norm() >>> vlm.default_fit_preparation() >>> vlm.fit_gammas() >>> vlm.predict_U() >>> vlm.calculate_velocity() >>> vlm.calculate_shift(assumption="constant_velocity") >>> vlm.extrapolate_cell_at_t(delta_t=1) >>> vlm.perform_TSNE() >>> vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts") >>> vlm.calculate_embedding_shift(sigma_corr = 0.05) >>> vlm.calculate_grid_arrows(smooth=0.8, steps=(40, 40), n_neighbors=300) >>> vlm.plot_grid_arrows(scatter_kwargs_dict={"alpha":0.35, "lw":0.35, "edgecolor":"0.4", "s":38, "rasterized":True}, min_mass=24, angles='xy', scale_units='xy', headaxislength=2.75, headlength=5, headwidth=4.8, quiver_scale=0.47)
Variables:  S (np.ndarray) – Expressed spliced molecules
 U (np.ndarray) – Unspliced molecule count
 A (np.ndarray) – Ambiguous molecule count
 ca (dict) – Column attributes of the loom file
 ra (dict) – Row attributes of the loom file
 loom_filepath (str) – The original path the loom files has been read from
 initial_cell_size (int) – The sum of spliced molecules
 initial_Ucell_size (int) – The sum of unspliced molecules

to_hdf5
(filename: str, **kwargs) → None[source]¶ Serialize the VelocytoLoom object in its current state
Parameters:  filename – The name of the file that will be generated (the suffix hdf5 is suggested but not enforced)
 **kwargs – The fucntion accepts the arguments of dump_hdf5
Returns: Return type: Nothing. It saves a file that can be used to recreate the object in another session.
Note
The object can be reloaded calling
load_velocyto_hdf5(filename)

plot_fractions
(save2file: str = None) → None[source]¶ Plots a barplot showing the abundance of spliced/unspliced molecules in the dataset
Parameters: save2file (str (default: None)) – If not None specifies the file path to wich plots get saved Returns: Return type: Nothing, it plots a barplot

filter_cells
(bool_array: numpy.ndarray) → None[source]¶ Filter cells using a boolean array.
Parameters: bool_array (np.ndarray (size )) – array describing the cells to keep (True). Returns: Return type: Nothing but it removes some cells from S and U.

set_clusters
(cluster_labels: numpy.ndarray, cluster_colors_dict: Dict[str, List[float]] = None, colormap: Any = None) → None[source]¶ Utility function to set cluster labels, names and colormap
Parameters:  cluster_labels (np.ndarray) – A vector of strings containing the name of the cluster for each cells
 cluster_colors_dict (dict[str, List[float]]) – A mapping cluster_name > rgb_color_triplette for example “StemCell”:[0.65,0.1,0.4]
 colormap – (optional) In alternative to cluster_colors_dict a colormap object (e.g. from matplotlib or similar callable) can be passed
Returns: Return type: Notghing, the attributes cluster_labels, colorandum, cluster_ix, cluster_uid are created.

cluster_uid
¶

cluster_ix
¶

score_cv_vs_mean
(N: int = 3000, min_expr_cells: int = 2, max_expr_avg: float = 20, svr_gamma: float = None, winsorize: bool = False, winsor_perc: Tuple[float, float] = (1, 99.5), plot: bool = False) → numpy.ndarray[source]¶ Rank genes on the basis of a CV vs mean fit, it uses a nonparametric fit (Support Vector Regression)
Parameters:  N (int) – the number to select
 min_expr_cells (int) – minimum number of cells that express that gene for it to be considered in the fit
 max_expr_avg (float) – The maximum average accepted before discarding from the the gene as housekeeping/outlier
 svr_gamma (float) – the gamma hyperparameter of the SVR
 winsorize (bool) – Wether to winsorize the data for the cv vs mean model
 winsor_perc (tuple, default=(1, 99.5)) – the up and lower bound of the winsorization
 plot (bool, default=False) – whether to show a plot
Returns:  Nothing but it creates the attributes
 cv_mean_score (np.ndarray) – How much the observed CV is higher than the one predicted by a noise model fit to the data
 cv_mean_selected (np.ndarray bool) – on the basis of the N parameter
 Note (genes excluded from the fit will have in the output the same score as the less noisy gene in the dataset.)
 To perform the filtering use the method filter_genes

score_cluster_expression
(min_avg_U: float = 0.02, min_avg_S: float = 0.08) → numpy.ndarray[source]¶ Prepare filtering genes on the basis of clusterwise expression threshold
Parameters:  min_avg_U (float) – Include genes that have unspliced average bigger than min_avg_U in at least one of the clusters
 min_avg_S (float) – Include genes that have spliced average bigger than min_avg_U in at least one of the clusters
 Note (the two conditions are combined by and "&" logical operator) –
Returns:  Nothing but it creates the attribute
 clu_avg_selected (np.ndarray bool) – The gene cluster that is selected
 To perform the filtering use the method filter_genes

score_detection_levels
(min_expr_counts: int = 50, min_cells_express: int = 20, min_expr_counts_U: int = 0, min_cells_express_U: int = 0) → numpy.ndarray[source]¶ Prepare basic filtering of genes on the basis of their detection levels
Parameters:  min_expr_counts (float) – The minimum number of spliced molecules detected considering all the cells
 min_cells_express (float) – The minimum number of cells that express spliced molecules of a gene
 min_expr_counts_U (float) – The minimum number of unspliced molecules detected considering all the cells
 min_cells_express_U (float) – The minimum number of cells that express unspliced molecules of a gene
 Note (the conditions are combined by and "&" logical operator) –
Returns:  Nothing but an attribute self.detection_level_selected is created
 To perform the filtering by detection levels use the method filter_genes

filter_genes
(by_detection_levels: bool = False, by_cluster_expression: bool = False, by_cv_vs_mean: bool = False, by_custom_array: Any = None, keep_unfiltered: bool = False) → None[source]¶ Filter genes taking care that all the matrixes and all the connected annotation get filtered accordingly
Attributes affected: .U, .S, .ra
Parameters:  by_detection_levels (bool, default=False) – filter genes by the score_detection_levels result
 by_cluster_expression (bool, default=False) – filter genes by the score_cluster_expression result
 by_cv_vs_mean (bool, default=False) – filter genes by the score_cluster_expression result
 np.ndarray, default=None (by_custom_array,) – provide a boolean or index array
 keep_unfiltered (bool, default=False) – whether to create attributes self.S_prefilter, self.U_prefilter, self.ra_prefilter, (array will be made sparse to minimize memory footprint) or just overwrite the previous arrays
Returns: Return type: Nothing but it updates the self.S, self.U, self.ra attributes

custom_filter_attributes
(attr_names: List[str], bool_filter: numpy.ndarray) → None[source]¶ Filter attributes given a boolean array. attr_names can be dictionaries or numpy arrays
Parameters:  attr_names (List[str]) – a list of the attributes to be modified. The can be 1d arrays, dictionary of 1d arrays, ndarrays, will be filtered by axis=0 if .T is specified by axis=1
 bool_filter – the boolean filter to be applied
Returns: Return type: Nothing it filters the specified attributes

normalize
(which: str = 'both', size: bool = True, log: bool = True, pcount: float = 1, relative_size: numpy.ndarray = None, use_S_size_for_U: bool = False, target_size: Tuple[float, float] = (None, None)) → None[source]¶ Normalization interface
Parameters:  which (either 'both', 'S', 'U', "imputed", "Sx", "Ux") – which attributes to normalize. “both” corresponds to “S” and “U” “imputed” corresponds to “Sx” and “Ux”
 size (bool) – perform size normalization
 log (bool) – perform log normalization (if size==True, this comes after the size normalization)
 pcount (int, default: 1) – The extra count added when logging (log2)
 relative_size (np.ndarray, default=None) – if None it calculate the sums the molecules per cell (self.S.sum(0)) if an array is provided it use it for the normalization
 use_S_size_for_U (bool) – U is size normalized using the sum of molecules of S
 target_size (float or Tuple[float, float] (depending if the which parameter implies 1 or more normalizations)) – the size of the cells after normalization will be set to. If tuple the order is (S, U) or (Sx, Ux) If None the target size is the average of the cell sizes
Returns:  Nothing but creates the attributes U_norm, U_sz and S_norm, “S_sz”
 or Ux_norm, Ux_sz and Sx_norm, “Sx_sz”

perform_PCA
(which: str = 'S_norm', n_components: int = None, div_by_std: bool = False) → None[source]¶ Perform PCA (cells as samples)
Parameters:  which (str, default="S_norm") – The name of the attribute to use for the calculation (e.g. S_norm or Sx_norm)
 n_components (int, default=None) – Number of components to keep. If None all the components will be kept.
 div_by_std (bool, default=False) – Wether to divide by standard deviation
Returns:  Returns nothing but it creates the attributes
 pca (np.ndarray) – a numpy array of shape (cells, npcs)

normalize_by_total
(min_perc_U: float = 0.5, plot: bool = False, skip_low_U_pop: bool = True) → None[source]¶ Normalize the cells using the (initial) total molecules as size estimate
Parameters:  min_perc_U (float) – the percentile to use as a minimum value allowed for the size normalization
 plot (bool, default=False) – whether
 skip_low_U_pop (bool, default=True) –
Returns:  Returns nothing but it creates the attributes
 small_U_pop (np.ndarray) – Cells with extremely low unspliced count

adjust_totS_totU
(skip_low_U_pop: bool = True, normalize_total: bool = False, fit_with_low_U: bool = True, svr_C: float = 100, svr_gamma: float = 1e06, plot: bool = False) → None[source]¶ Adjust the spliced count on the base of the relation S_sz_tot and U_sz_tot
Parameters:  skip_low_U_pop (bool, default=True) – Do not normalize the low unspliced molecules cell population to avoid overinflated values
 normalize_total (bool, default=False) – If this is True the function results in a normalization by median of both U and S. NOTE: Legacy compatibility, I might want to split this into a different function.
 fit_with_low_U (bool, default=True) – Wether to consider the low_U population for the fit
 svr_C (float) – The C parameter of scikitlearn Support Vector Regression
 svr_gamma (float) – The gamma parameter of scikitlearn Support Vector Regression
 plot (bool) – Whether to plot the results of the fit
Returns:  Nothing but it modifies the attributes
 U_sz (np.ndarray)

normalize_median
(which: str = 'imputed', skip_low_U_pop: bool = True) → None[source]¶ Normalize cell size to the median, for both S and U.
Parameters:  which (str, default="imputed") – “imputed” or “renormalized”
 skip_low_U_pop (bool=True) – Whether to skip the low U population defined in normalize_by_total
Returns:  Nothing but it modifies the attributes
 S_sz (np.ndarray)
 U_sz (np.ndarray)
 or
 Sx_sz (np.ndarray)
 Ux_sz (np.ndarray)

plot_pca
(dim: List[int] = [0, 1, 2], elev: float = 60, azim: float = 140) → None[source]¶ Plot 3d PCA

plot_pca_imputed
(dim: List[int] = [0, 1, 2], elev: float = 60, azim: float = 140) → None[source]¶ Plot 3d PCA of the smoothed data

knn_imputation
(k: int = None, pca_space: float = True, metric: str = 'euclidean', diag: float = 1, n_pca_dims: int = None, maximum: bool = False, size_norm: bool = True, balanced: bool = False, b_sight: int = None, b_maxl: int = None, n_jobs: int = 8) → None[source]¶ Performs knn smoothing of the data matrix
Parameters:  k (int) – number of neighbors. If None the default it is chosen to be 0.025 * Ncells
 pca_space (bool, default=True) – if True the knn will be performed in PCA space (pcs) otherwise it will use log2 size normalized data (S_norm)
 metric (str) – “euclidean” or “correlation”
 diag (int, default=1) – before smoothing this value is substituted in the diagonal of the knn contiguity matrix Resulting in a reduction of the smoothing effect. E.g. if diag=8 and k=10 value of Si = (8 * S_i + sum(S_n, with n in 5nn of i)) / (8+5)
 maximum (bool, default=False) – If True the maximum value of the smoothing and the original matrix entry is taken.
 n_pca_dims (int, default=None) – number of pca to use for the knn distance metric. If None all pcs will be used. (used only if pca_space == True)
 balanced (bool) – whether to use BalancedKNN version
 b_sight (int) – the sight parameter of BalancedKNN (used only if balanced == True)
 b_maxl (int) – the maxl parameter of BalancedKNN (used only if balanced == True)
 n_jobs (int, default 8) – number of parallel jobs in knn calculation
Returns:  Nothing but it creates the attributes
 knn (scipy.sparse.csr_matrix) – knn contiguity matrix
 knn_smoothig_w (scipy.sparse.lil_matrix) – the weights used for the smoothing
 Sx (np.ndarray) – smoothed spliced
 Ux (np.ndarray) – smoothed unspliced

knn_imputation_precomputed
(knn_smoothig_w: scipy.sparse.lil.lil_matrix, maximum: bool = False) → None[source]¶ Performs knn imputation (like .knn_imputation()) but with a precomputed weight matrix
Parameters:  knn_smoothig_w (sparse.lil_matrix) – the sparse matrix to be convolved with self.S_sz and self.U_sz This should be the result of somethign like: connectivity.setdiag(diagonal_value) knn_smoothig_w = connectivity_to_weights(connectivity)
 maximum (bool, default=False) – whether to take the maximum value of the smoothing and the original matrix
Returns:  Nothing but it creates the attributes
 Sx (np.ndarray) – smoothed spliced
 Ux (np.ndarray) – smoothed unspliced

gene_knn_imputation
(k: int = 15, pca_space: float = False, metric: str = 'correlation', diag: float = 1, scale_weights: bool = True, balanced: bool = True, b_sight: int = 100, b_maxl: int = 18, n_jobs: int = 8) → None[source]¶ Performs genes knn smoothing of the genes
Parameters:  k (int, default=15) – number of neighbors
 pca_space (bool, default=False) – if True the knn will be performed in PCA space (pcs) otherwise it will use log2 size normalized data (S_norm)
 metric (str, default="correlation") – “euclidean” or “correlation”
 diag (int, default=1) – before smoothing this value is substituted in the diagonal of the knn contiguity matrix Resulting in a reduction of the smoothing effect E.g. if diag=8 and k=10 value of Si = (8 * S_i + sum(S_n, with n in 5nn of i)) / (8+5)
 scale_weights (bool, default=True) – whether to scale weights by gene total expression/yield
 balanced (bool, default=True) – whether to use BalancedKNN version
 b_sight (int, default=100) – the sight parameter of BalancedKNN (used only if balanced == True)
 b_maxl (int, default=18) – the maxl parameter of BalancedKNN (used only if balanced == True)
 n_jobs (int, default=8) – number of parallel jobs in knn calculation
Returns:  Nothing but it creates the attributes
 gknn (scipy.sparse.csr_matrix) – genes knn contiguity matrix
 gknn_smoothig_w (scipy.sparse.lil_matrix) – the weights used for the smoothing of the genes
 Sx (np.ndarray) – smoothed spliced
 Ux (np.ndarray) – smoothed unspliced

fit_gammas
(steady_state_bool: numpy.ndarray = None, use_imputed_data: bool = True, use_size_norm: bool = True, fit_offset: bool = True, fixperc_q: bool = False, weighted: bool = True, weights: numpy.ndarray = 'maxmin_diag', limit_gamma: bool = True, maxmin_perc: List[float] = [2, 98], maxmin_weighted_pow: float = 15) → None[source]¶ Fit gamma using spliced and unspliced data
Parameters:  steady_state_bool (np.ndarray, default=None) – if a boolean array is specified, gamma is fitted using only the corresponding cells
 use_imputed_data (bool, default=True) – use knn smoothed data
 use_size_norm (bool, default=False) – use size normalized data for the fit
 fit_offset (bool, default=True) – Fit with offset
 fixperc_q (bool, default=False) – (when fit_offset==False) Wether to fix the offset to a lower percentile of the unspliced
 weighted (bool, default=True) – use weights for the least squares fit
 weights (string or np.ndarray, default="maxmin_diag") – the method to determine the weights of the least squares fit. “maxmin_diag”, “maxmin”, “sum”, “prod”, “maxmin_weighted” are supported if a 2d np.ndarray is provided the entry (i,j) is the weight of the cell j when fitting gamma to gene i
 limit_gamma (np.ndarray, default=True) – whether to limit gamma when unspliced is much higher than spliced
 maxmin_perc (List[flaot], default=[2,98]) – the percentile to use if weights = “maxmin” or “maxmin_diag”
Returns:  Nothing it just creates the attributes
 gammas (np.ndarray) – the vector of the gammas fit to each gene
 q (np.ndarray) – the vector of offsets of the fit
 R2 (np.ndarray (optional)) – The vector of squared coefficient of determination

filter_genes_good_fit
(minR: float = 0.1, min_gamma: float = 0.01) → None[source]¶ Use the coefficient of determination to filter away genes that have an irregular/complex phase portrait
Parameters:  minR (float, default=0.1) – Minimum coefficient of determination allowed
 min_gamma (float, default=0.01) – Filter away low gammas as a secondary filter
Returns:  Nothing but modifies it filters out the genes that do not satisfy the conditions
 This affects (“U”, “U_sz”, “U_norm”, “Ux”, “Ux_sz”, “Ux_norm”, “S”, “S_sz”, “S_norm”, “Sx”, “Sx_sz”, “Sx_norm”, “gammas”, “q”, “R2”)

predict_U
(which_gamma: str = 'gammas', which_S: str = 'Sx_sz', which_offset: str = 'q') → None[source]¶ Predict U (gamma * S) given the gamma model fit
Parameters:  which_gamma (str, default="gammas") – name of the attribute to use as gamma
 which_S (str, default="Sx_sz") – name of the attribute to use as S
 which_offset (str, default="q") – name of the attribute containing the offset
Returns:  Noting but it creates the attribute
 Upred (np.ndarray) – unspliced esimated as gamma * S

calculate_velocity
(kind: str = 'residual') → None[source]¶ Calculate velocity
Parameters:  kind (str, default="residual") – “residual” calculates the velocity as U_measured  U_predicted
 Results –
  –
 but it creates the attribute (Nothing) –
 velocity (np.ndarray) – U_measured  U_predicted

calculate_shift
(assumption: str = 'constant_velocity', kind: str = 'difference', delta_t: float = 1, min_S: float = 0.0001) → None[source]¶ Find the change (deltaS) in gene expression for every cell
Parameters:  assumption (str, default="constant_velocity") – constant_velocity (described in the paper as Model I) constant_unspliced (described in the paper as Model II)
 delta_t (float, default=1) – the time step for extrapolation
 kind (str, default="difference") – For now only difference is supported. (DEPRECATED)
 min_S (float, default=min_S) – constant small value added in case of log2_ratio estimation. (DEPRECATED)
Returns:  Nothing it only creates the following attributes
 delta_S (np.ndarray) – The variation in gene expression

extrapolate_cell_at_t
(delta_t: float = 1, clip: bool = True) → None[source]¶ Extrapolate the gene expression profile for each cell after delta_t
Parameters:  delta_t (float, default=1) – the time step considered for the extrapolation
 clip (bool, default=True) – If True negative values are clipped to zero
Returns:  Nothing but it creates the attributes
 Sx_sz_t (np.ndarray) – the extrapolated expression profile
 used_delta_t (float) – stores delta_t for future usage

perform_TSNE
(n_dims: int = 2, perplexity: float = 30, initial_pos: numpy.ndarray = None, theta: float = 0.5, n_pca_dim: int = None, max_iter: int = 1000) → None[source]¶ Perform TSNE on the PCA using barnes hut approximation

estimate_transition_prob
(hidim: str = 'Sx_sz', embed: str = 'ts', transform: str = 'sqrt', ndims: int = None, n_neighbors: int = None, psc: float = 1.0, knn_random: bool = False, sampled_fraction: float = 0.3, delta_kind: str = 'clipped', sampling_pobs: Tuple[float, float] = (0.5, 0.1), max_dist_embed: float = None, n_jobs: int = 4, threads: int = None, random_seed: int = 15071990) → None[source]¶ Use correlation to estimate transition probabilities for every cells to its embedding neighborhood
Parameters:  hidim (str, dafault="Sx_sz") – The name of the attribute containing the high dimensional space. It will be retrieved as getattr(self, hidim) The updated vector at time t is assumed to be getattr(self, hidim + “_t”) Appending .T to the string will transpose the matrix (usefull in case we want to use S or Sx)
 embed (str, dafault="ts") – The name of the attribute containing the embedding. It will be retrieved as getattr(self, embed)
 transform (str, default="sqrt") – The transformation that is applies on the high dimensional space. If None the raw data will be used
 ndims (int, default=None) – The number of dimensions of the high dimensional space to work with. If None all will be considered It makes sense only when using principal components
 n_neighbors (int, default=None) – The number of neighbours to take into account
 psc (float, default=1) – pseudocount added in variance normalizing tranform
 knn_random (bool, default=True) – whether to random sample the neighboroods to speedup calculation
 delta_kind (clipped, unclipped, residual) – whether to clip the delta
 sampling_pobs (Tuple, default=(0.5, 1)) –
 max_dist_embed (float, default=None) – CURRENTLY NOT USED The maximum distance allowed If None it will be set to 0.25 * average_distance_two_points_taken_at_random
 n_jobs (int, default=4) – number of jobs to calkulate knn this only applies to the knn search, for the more time consuming correlation computation see threads
 threads (int, default=None) – The threads will be used for the actual correlation computation by default half of the total.
 random_seed (int, default=15071990) – Random seed to make knn_random mode reproducible

calculate_embedding_shift
(sigma_corr: float = 0.05) → None[source]¶ Use the transition probability to project the velocity direction on the embedding
Parameters: sigma_corr (float, default=0.05) – the kernel scaling Returns:  Nothing but it creates the following attributes
 transition_prob (np.ndarray) – the transition probability calculated using the exponential kernel on the correlation coefficient
 delta_embedding (np.ndarray) – The resulting vector

calculate_grid_arrows
(embed: str = 'embedding', smooth: float = 0.5, steps: Tuple = (40, 40), n_neighbors: int = 100, n_jobs: int = 4) → None[source]¶ Calculate the velocity using a points on a regular grid and a gaussian kernel
Note: the function should work also for ndimensional grid
Parameters:  embed (str, default=embedding) – The name of the attribute containing the embedding. It will be retrieved as getattr(self, embed) The updated vector at time t is assumed to be getattr(self, embed + ‘_t’) Or the difference vector is getattr(self, ‘delta’ + ‘_’ + embed)
 smooth (float, smooth=0.5) – Higher value correspond to taking in consideration further points the standard deviation of the gaussian kernel is smooth * stepsize
 steps (tuple, default) – the number of steps in the grid for each axis
 n_neighbors – number of neighbors to use in the calculation, bigger number should not change too much the results.. …as soon as smooth is small Higher value correspond to slower execution time
 n_jobs – number of processes for parallel computing
Returns:  Nothing but it sets the attributes
 flow_embedding (np.ndarray) – the coordinates of the embedding
 flow_grid (np.ndarray) – the gridpoints
 flow (np.ndarray) – vector field coordinates
 flow_magnitude (np.ndarray) – magnitude of each vector on the grid
 total_p_mass (np.ndarray) – density at each point of the grid

prepare_markov
(sigma_D: numpy.ndarray, sigma_W: numpy.ndarray, direction: str = 'forward', cells_ixs: numpy.ndarray = None) → None[source]¶ Prepare a transition probability for Markov process
Parameters:  sigma_D (float) – the standard deviation used on the localitylimiting component
 sigma_W (float) – the standard deviation used on the noise component
 direction (str, default="backwards") – whether to diffuse forward of backwards
 cells_ixs (np.ndarray, default=None) – Cells to use, if None all the cells will be considered.
Returns:  Nothing but it creates the following attributes
 tr (np.ndarray) – the transition probability matrix

run_markov
(starting_p: numpy.ndarray = None, n_steps: int = 2500, mode: str = 'time_evolution') → None[source]¶ Run a Markov process
Parameters:  starting_p (np.ndarray, default=None) – sepcifies the starting density if None is passed an array of 1/self.tr.shape[0] will be created
 n_steps (np.ndarray, default=2500) – Numbers of steps to be performed
 mode (str, default="time_evolution") – this argument is passed to the Diffusion.diffuse call
Returns:  Nothing but it creates the attribute
 diffused (np.ndarray) – The probability to be found at any of the states

default_filter_and_norm
(min_expr_counts: int = None, min_cells_express: int = None, N: int = None, min_avg_U: float = None, min_avg_S: float = None) → None[source]¶ Useful function to get started with velocyto: it performs initial filtering and feature selection, it uses some heuristics to determine the thresholds, results might be suboptimal.
See the analysis quick start guide for further info.
Parameters:  min_expr_counts (int, default=None) – filtering condition: the minimum spliced counts
 min_cells_express (int, default=None) – filtering condition: the minimum number of cells expressing the gene
 N (int, default=None) – number of genes selected by the feature selection procedure
 min_avg_U (float, default=None) – if cluster have been specified beforehand (using the function set_clusters) then this is the minimum average unspliced molecules per cluster
 min_avg_S (float, default=None) – if cluster have been specified beforehand (using the function set_clusters) then this is the minimum average spliced molecules per cluster

default_fit_preparation
(k: int = None, n_comps: int = None) → None[source]¶ Useful function to get started with velocyto: it performs PCA and kNN smoothing, it uses some heuristics to determine the parameters, results might be suboptimal.
See the analysis quick start guide for further info.
Parameters:  k (int, default=None) – k in kNearestNeighbours smoothing
 n_comps (int, default=None) – numbed of components in pca

plot_phase_portraits
(genes: List[str]) → None[source]¶ Plot splicedunspliced scatterplots resembling phase portraits
Parameters: genes (List[str]) – A list of gene symbols.

plot_grid_arrows
(quiver_scale: float = None, min_mass: float = 1, min_magnitude: float = None, scatter_kwargs_dict: Dict = None, plot_dots: bool = False, **quiver_kwargs) → None[source]¶ Plots vector field averaging velocity vectors on a grid
Parameters:  quiver_scale (float, dafault=None) – Rescaling factor applied to the arrow field to enhance visibility
 min_mass (float, default=1) – the minimum density around a grid point for it to be considered
 min_magnitude (float, default=None) – the minimum magnitude of the velocity to plot
 scatter_kwargs_dict (dict, default=None) – the keyword arguments to pass to scatter
 plot_dots (bool, default= True) – whether to plot the cell symbols
 **quiver_kwargs (dict) – keyword arguments to pass to quiver

plot_arrows_embedding
(choice: int = 1000, plot_scatter: bool = False, color_arrow: str = 'cluster', quiver_scale: float = None, new_fig: bool = False, **quiver_kwargs) → None[source]¶ Plots the results of the arrows_from_hidim_to_embedding
Parameters:  choice (int, default = 1000) – the number of cells to randomly pick to plot the arrows
 plot_scatter (bool, default = False) – whether to plot the points
 color_arrow (str, default = "cluster) – the color of the arrows, if “cluster” the arrows are colored the same as the cluster
 epsilon (float, default = None) – the minimal size of the arrow that can be plotted before it becomes invisible
 quiver_scale (float) – scale argument passed to the matplotlib quiver function
Returns: Return type: Nothing, just plots the tsne with arrows

plot_cell_transitions
(cell_ix: int = 0, alpha: float = 0.1, alpha_neigh: float = 0.2, cmap_name: str = 'RdBu_r', plot_arrow: bool = True, mark_cell: bool = True, head_width: int = 3) → None[source]¶ Plot the probability of a cell to transition to any other cell
This function is untested

plot_velocity_as_color
(gene_name: str = None, cmap: Any = <matplotlib.colors.LinearSegmentedColormap object>, gs: Any = None, which_tsne: str = 'ts', **kwargs) → None[source]¶ Plot velocity as color on the Tsne embedding
Parameters:  gene_name (str) – The name of the gene, should be present in self.S
 cmap (maplotlib.cm.Colormap, dafault=maplotlib.cm.RdBu_r) – Colormap to use, devergent are better, RdBu_r is default Notice that 0 will be always set as the center of the colormap. (e.g. white in RdBu_r)
 gs (Gridspec subplot) – Gridspec subplot to plot on.
 which_tsne (str, default="ts") – the name of the attributed where the desired embedding is stored
 **kwargs (dict) – other keywords arguments will be passed to the plt.scatter call
Returns: Return type: Nothing

plot_expression_as_color
(gene_name: str = None, imputed: bool = True, cmap: Any = <matplotlib.colors.LinearSegmentedColormap object>, gs: Any = None, which_tsne: str = 'ts', **kwargs) → None[source]¶ Plot expression as color on the Tsne embedding
Parameters:  gene_name (str) – The name of the gene, should be present in self.S
 imputed (bool, default=True) – whether to plot the smoothed or the raw data
 cmap (maplotlib.cm.Colormap, dafault=maplotlib.cm.Greens) – Colormap to use.
 gs (Gridspec subplot) – Gridspec subplot to plot on.
 which_tsne (str, default="ts") – the name of the attributed where the desired embedding is stored
 **kwargs (dict) – other keywords arguments will be passed to the plt.scatter call
Returns: Return type: Nothing

velocyto.analysis.
scatter_viz
(x: numpy.ndarray, y: numpy.ndarray, *args, **kwargs) → Any[source]¶ A wrapper of scatter plot that guarantees that every point is visible in a very crowded scatterplot
Parameters:  x (np.ndarray) – x axis coordiantes
 y (np.ndarray) – y axis coordiantes
 and kwargs (args) – positional and keyword arguments as in matplotplib.pyplot.scatter
 Retruns –
  –
 the graph and returns the axes object (Plots) –

velocyto.analysis.
ixs_thatsort_a2b
(a: numpy.ndarray, b: numpy.ndarray, check_content: bool = True) → numpy.ndarray[source]¶ This is super duper magic sauce to make the order of one list to be like another

velocyto.analysis.
scale_to_match_median
(sparse_matrix: scipy.sparse.csr.csr_matrix, genes_total: numpy.ndarray) → scipy.sparse.csr.csr_matrix[source]¶ Normalize contribution of different neighbor genes to match the median totals
Parameters:  sparse_matrix (sparse.csr_matrix) – weights matrix
 genes_total (sparse.csr_matrix shape=(sparse_matrix.shape[0])) – array of the total molecules detected for each gene
Returns:  knn_weights (sparse.csr_matrix) – sparse_matrix after the normalization
 # NOTE, since the use I made of this later I could have changed sparse_matrix in place

velocyto.analysis.
gaussian_kernel
(X: numpy.ndarray, mu: float = 0, sigma: float = 1) → numpy.ndarray[source]¶ Compute gaussian kernel

velocyto.analysis.
load_velocyto_hdf5
(filename: str) → velocyto.analysis.VelocytoLoom[source]¶ Loads a Velocyto loom object from an hdf5 file
Parameters: filename (str) – The name of the serialized file Returns: Return type: A VelocytoLoom object Note
The hdf5 file must have been created using
VelocytoLoom.to_hdf5
or thedump_hdf5
function