Algorithms

At the moment, the following algorithms are supported.

MOTMLE (Fits the MOT number from CMOS images)
Peak (Displays a peak in the SSD pulse data)
PeakFinder (Finds peaks in the SSD pulse data)

class data_eng_utokyo.algorithms.MOTMLE(c, references: list, do_subtract_dead_pixels: bool = True, dead_pixel_percentile: float = 5.0)[source]

Bases: object

Applies Maximum Likelihood Estimation to extract the MOT number from an image.

Parameters:

c – Lookup for the constants
references (list[str]) – List of files (images) which are to be used as reference for subtracting dead pixels.
do_subtract_dead_pixels (bool) – Whether to estimate and subtract the dead pixels before fitting and plotting.
dead_pixel_percentile (float) – Guess of the percentage of dead pixels in the image.

Example

from data_eng_utokyo.algorithms import MOTMLE

# c_ccd is the lookup of the setup constants and must be defined beforehand.
mot_mle = MOTMLE(c=c_ccd, references=[], do_subtract_dead_pixels=False)
perform_analysis = mot_mle.perform_analysis
perform_analysis(
    source="path_to_image_file.xlsx",
    target="visualization.png",
    mode="mot number",
    min_signal=0,
    time="1st of January 2000 at 1 p.m."
)

c[source]: Lookup for the constants

references[source]

List of files (images) which are to be used as reference for subtracting dead pixels.

Type:: list[str]

do_subtract_dead_pixels[source]

Should we guess and subtract the dead pixels before the fitting and plotting.

Type:: bool

dead_pixels_percentile[source]

Guess of the fraction of dead pixels in the image.

Type:: float

self.dead_pixels[source]

Array with the dead pixels and their mean value.

Type:: np.array

self.dead_pixel_sum[source]

Sum of the values of the dead pixels.

Type:: int

perform_analysis(source: str, target: str, mode: str, min_signal: int = 0, time: str = 'unknown time')[source]

Executes the fitting for a single image.

Loads the image data, fits a 2D Gaussian model to it, generates a plot of the original data and the fit, saves the plot, and returns the statistics of the fit.

source is the filepath of the original data and target is the filepath of the plot. The mode can be either ‘power’ or ‘mot number’. If the total sum of the image data is less than min_signal, the analysis is terminated early.

Parameters:

source (str) – Filepath of the image file.
target (str) – Filepath of the plot we want to create.
mode (str) – Either ‘power’ or ‘mot number’, depending on what observable we want to fit.
min_signal (int) – Threshold. If the sum of the image is less than this value, the image is skipped.
time (str) – Time at which the image was taken, will be added to the plot.

Returns:

Lookup of the results (statistics). Contains at least the keys “fit_successful”, “total_sum”, and “enough_pulses”, and more when the fit is successful.

Return type:

dict

_load(source: str) → DataFrame[source]

Load the pandas dataframe and the right constants.

Parameters: source (str) – Filepath to image.
Returns: The image as pandas dataframe.
Return type: DataFrame

_df_to_array(df: DataFrame) → array[source]

Takes the image as df and returns it as np.array.

Parameters: df (pd.DataFrame) – The dataframe representing the image.
Returns: Image as np.array
Return type: array

_precalculate_dead_pixels()[source]

Calculates a heuristic for finding the dead pixels.

Takes the list of reference images and computes, for each pixel, the ratio standard deviation / max(average, 1) across those images. The dead pixels are the ones with the lowest standard deviation. Sets the member variables that are later used by the method _subtract_dead_pixels().

The assumption is that dead pixels have a high median value, so the dead pixels are the ones with the smallest ratio 1 / max(median, 1).

Stores an array that holds the mean value of each dead pixel, with the healthy pixels set to zero.
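The heuristic described above could be sketched roughly as follows. This is only an illustration, not the library's implementation: the helper name find_dead_pixels, the use of in-memory arrays instead of image files, and the way the percentile is turned into a threshold on the ratio are all assumptions.

```python
import numpy as np


def find_dead_pixels(references, percentile=5.0):
    """Hypothetical helper: flag pixels whose std / max(mean, 1) ratio is lowest."""
    stack = np.stack(references).astype(float)  # shape: (n_images, rows, cols)
    signal_mean = stack.mean(axis=0)
    signal_std = stack.std(axis=0)
    ratio = signal_std / np.maximum(signal_mean, 1.0)
    # Assumption: pixels at or below the given percentile of the ratio are dead.
    threshold = np.percentile(ratio, percentile)
    is_dead = ratio <= threshold
    # Mean value of the dead pixels, with the healthy pixels set to zero.
    dead_pixels = np.where(is_dead, signal_mean, 0.0)
    return dead_pixels, ratio


rng = np.random.default_rng(0)
refs = [rng.poisson(20, size=(10, 10)).astype(float) for _ in range(8)]
for img in refs:
    img[3, 4] = 200.0  # a pixel stuck at the same value in every reference image
dead_pixels, ratio = find_dead_pixels(refs, percentile=5.0)
```

The pixel that never changes across the reference images has a ratio of zero and is therefore flagged, while its mean value is kept for the later subtraction.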

_plot_dead_pixels(signal_mean, ratio, signal_std, dead_pixels)[source]

Create heatmaps of the signal mean, ratio, std and estimated dead pixels.

Parameters:

signal_mean (np.array) – Mean calculated signal as given by _precalculate_dead_pixels() method.
ratio (np.array) – See _precalculate_dead_pixels() method.
signal_std (np.array) – See _precalculate_dead_pixels() method.
dead_pixels (np.array) – See _precalculate_dead_pixels() method.

_subtract_dead_pixels(data: dict)[source]

Subtracts the values of the dead pixels from the z-values of the data.

Values that become negative are replaced with 0. The data argument is modified in place, as dicts are passed by reference.

Parameters: data (dict) – Lookup of the data with keys x, y, z and arrays as values.
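A minimal sketch of such an in-place subtraction, assuming the dead-pixel values are available as an array of the same size as the z-values. The helper name subtract_dead_pixels and its second argument are hypothetical.

```python
import numpy as np


def subtract_dead_pixels(data, dead_pixels):
    """Hypothetical helper: subtract dead-pixel values from data["z"] in place."""
    z = data["z"] - dead_pixels.reshape(data["z"].shape)
    data["z"] = np.clip(z, 0, None)  # negative values become 0


data = {
    "x": np.array([0, 1, 2]),
    "y": np.array([0, 0, 0]),
    "z": np.array([5.0, 2.0, 7.0]),
}
subtract_dead_pixels(data, dead_pixels=np.array([0.0, 3.0, 1.0]))
```

Because the dict itself is mutated, the caller sees the corrected z-values without needing a return value.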

_preprocess(df: DataFrame, mode: str) → dict[source]

Takes the image data as a pandas dataframe and converts it into numpy arrays. Converts the unit of the z-axis.

The conversion of the z axis is based on the setup constants and the mode.

Parameters:

df (pd.DataFrame) – The dataframe representing the image.
mode (str) – Either ‘power’ or ‘mot number’, depending on what observable we want to fit.

Returns:

Lookup representing the data with keys x, y, and z. Values are np.arrays.

Return type:

dict
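The conversion could look roughly like the sketch below. It assumes that column indices map to x and row indices to y, and that the unit conversion is a plain multiplication by the scaling factor; the helper name preprocess is hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df, scaling_factor):
    """Hypothetical helper: flatten an image DataFrame into x, y, z arrays."""
    image = df.to_numpy(dtype=float)
    rows, cols = image.shape
    # Assumption about orientation: column index -> x, row index -> y.
    x, y = np.meshgrid(np.arange(cols), np.arange(rows))
    return {"x": x.ravel(), "y": y.ravel(), "z": image.ravel() * scaling_factor}


df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
data = preprocess(df, scaling_factor=2.0)
```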

_get_scaling_factor(mode: str) → float[source]

Loads a physical scaling parameter depending on the mode.

The unit of the z axis can be converted using a scaling factor. This function returns the scaling factor, which can be determined from the mode, which is either ‘power’ or ‘mot number’.

Parameters: mode (str) – Either ‘power’ or ‘mot number’.
Returns: The scaling factor as float.
Return type: float

_fitting(model: callable, data: dict, mode: str)[source]

Fits the model to the data.

Parameters:

model (callable) – Model to be fitted.
data (dict) – Lookup of the data.
mode (str) – Either ‘power’ or ‘mot number’. Changes the initial guess of the fitting procedure.

Returns:

Lookup with the statistics of the fit.
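As an illustration of this kind of fit, the sketch below fits a 2D Gaussian to synthetic data in the x, y, z lookup format using scipy.optimize.curve_fit. The model gaussian_2d, its parameterization, and the initial guess are assumptions and may differ from what the class uses internally.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_2d(xy, amplitude, x0, y0, sigma_x, sigma_y, offset):
    """2D Gaussian without rotation; a stand-in for the library's model."""
    x, y = xy
    return offset + amplitude * np.exp(
        -((x - x0) ** 2) / (2 * sigma_x**2) - ((y - y0) ** 2) / (2 * sigma_y**2)
    )


# Synthetic image data in the x, y, z lookup format.
x, y = np.meshgrid(np.arange(40), np.arange(30))
data = {"x": x.ravel().astype(float), "y": y.ravel().astype(float)}
true_params = (10.0, 18.0, 12.0, 4.0, 3.0, 1.0)
data["z"] = gaussian_2d((data["x"], data["y"]), *true_params)

p0 = (8.0, 20.0, 15.0, 5.0, 5.0, 0.0)  # rough initial guess
popt, pcov = curve_fit(gaussian_2d, (data["x"], data["y"]), data["z"], p0=p0)
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties
```

Here popt holds the optimal parameters, pcov their covariance matrix, and perr the uncertainties derived from its diagonal.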

_extract_statistics(r_squared, chi2, popt, pcov, perr, signal_sum)[source]

Convert the fit results to a convenient lookup.

Parameters:

r_squared
chi2
popt (np.array) – Optimal parameters.
pcov (np.array) – Covariance matrix of the optimal fit parameters. Used to get the uncertainty of the fit.
perr
signal_sum – Sum of the array to be fitted.

_get_initial_guess(data: dict, mode: str)[source]

Proposes the initial guesses for the fitting parameters with heuristics.

Parameters:

data (dict) – Data as lookup table.
mode (str) – Either ‘power’ or ‘mot number’. Changes the initial guess of the fitting procedure.
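One common heuristic for such initial guesses uses the image moments. The sketch below is an assumption about what a reasonable guess could look like, not the method's actual logic; the helper name initial_guess is hypothetical and the dependence on mode is left out.

```python
import numpy as np


def initial_guess(data):
    """Hypothetical heuristic: estimate 2D Gaussian parameters from image moments."""
    x, y, z = data["x"], data["y"], data["z"]
    offset = z.min()
    weights = z - offset  # background-subtracted signal used as weights
    total = weights.sum()
    x0 = (x * weights).sum() / total
    y0 = (y * weights).sum() / total
    sigma_x = np.sqrt((weights * (x - x0) ** 2).sum() / total)
    sigma_y = np.sqrt((weights * (y - y0) ** 2).sum() / total)
    amplitude = z.max() - offset
    return amplitude, x0, y0, sigma_x, sigma_y, offset


x, y = np.meshgrid(np.arange(21), np.arange(21))
data = {"x": x.ravel().astype(float), "y": y.ravel().astype(float)}
r2 = (data["x"] - 10) ** 2 + (data["y"] - 10) ** 2
data["z"] = 1.0 + 5.0 * np.exp(-r2 / (2 * 2.0**2))
amplitude, x0, y0, sigma_x, sigma_y, offset = initial_guess(data)
```

A completely flat image would make the weights sum to zero, so a real implementation would need to guard against that case.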

_generate_fit_data(model: callable, data: dict, statistics: dict, df: DataFrame)[source]

Takes the x, y values of the data and the fit parameters, and returns the fitted z values.

Does this on an x, y grid in the same format as the data.

Parameters:

model (callable) – Model to be fitted.
data (dict) – Lookup of the data.
statistics (dict) – Lookup of the statistics of the fit result.
df (DataFrame)

Returns:

Fit data as lookup in the same format as the original data. Has three keys x, y, z, corresponding to the coordinates x, y, and the fitted z value, respectively.

_plot_fit_result(data: dict, fit_data: dict, target: str, mode: str, time: str)[source]

Plots the 3D data and the fit. Saves the image to target.

Parameters:

data (dict) – Original data.
fit_data (dict) – Fitted data.
target (str) – Filename of the plot which is to be created and saved.
mode (str) – Either ‘power’ or ‘mot number’. Changes the initial guess of the fitting procedure.
time (str) – Time when the image was taken. Is added to the title of the plot.

_plot_heatmap(data: dict, fit_data, target: str, mode: str, time: str, df: DataFrame)[source]

Plots the 3D data and the fit as a heatmap. Saves the image to target.

Parameters:

data (dict) – Original data.
fit_data (dict) – Fitted data.
target (str) – Filename of the plot which is to be created and saved.
mode (str) – Either ‘power’ or ‘mot number’. Changes the initial guess of the fitting procedure.
time (str) – Time when the image was taken. Is added to the title of the plot.
df (DataFrame)

_print_stats(statistics: dict)[source]

Prints the fit statistics.

Parameters: statistics (dict) – Statistics of the fit result.

class data_eng_utokyo.algorithms.Peak(timestamp: int, events: list, background: float)[source]

Bases: object

Represents a peak of the SSD data and can perform analysis on itself.

Parameters:

timestamp (int) – Time of the peak.
events (list) – List of pulses making up the peak.
background (float) – Pulse rate [1/s] representing the background.

timestamp[source]

Time of the peak.

Type: int

events[source]

List of pulses making up the peak.

Type: list

background[source]

Pulse rate [1/s] representing the background.

Type: float

pulses[source]

Number of pulses making up the peak.

Type: int

pulses_background[source]

Expected number of background pulses in the time interval provided by the events.

Type: int

pulses_peak[source]

Excess of pulses, i.e. how many more pulses are observed than expected from the background.

Type: int

half_life_time[source]

Estimated half-life in ns.

Type: int

estimate()[source]

Calculates the half-life using maximum likelihood estimation (MLE).

The result can be derived with MLE by adapting the reasoning from here: https://math.stackexchange.com/questions/101481/calculating-maximum-likelihood-estimation-of-the-exponential-distribution-and-pr
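For a pure exponential decay, the MLE of the decay rate is the inverse of the mean delay, so the half-life is ln(2) times the mean. The sketch below shows only this textbook result; it ignores the background and assumes the delays are measured from the start of the peak, which may differ from what estimate() does. The helper name half_life_mle is hypothetical.

```python
import math


def half_life_mle(decay_times_ns):
    """Hypothetical helper: MLE half-life for exponentially distributed delays."""
    mean_delay = sum(decay_times_ns) / len(decay_times_ns)
    # MLE of the rate is 1 / mean, and half-life = ln(2) / rate.
    return math.log(2) * mean_delay


delays_ns = [100, 250, 400, 50, 200]  # delays relative to the start of the peak
half_life_ns = half_life_mle(delays_ns)
```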

plot(url: str)[source]

Visualizes the data and fit and saves the image to the url.

Parameters: url (str) – Name of the file to which the plot should be saved.

_ts_ns_to_timestamp(ts_list: list)[source]

Takes a list of timestamps in ns and returns a list of datetimes.

Parameters:

ts_list (list[int]) – List of timestamps in ns.

Returns:

A list of datetimes.
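Such a conversion can be sketched with the standard library alone. The sketch assumes the timestamps are nanoseconds since the Unix epoch and that UTC is wanted; the helper name ts_ns_to_datetimes is hypothetical.

```python
from datetime import datetime, timezone


def ts_ns_to_datetimes(ts_list):
    """Hypothetical helper: convert Unix timestamps in ns to UTC datetimes."""
    return [datetime.fromtimestamp(ts / 1e9, tz=timezone.utc) for ts in ts_list]


datetimes = ts_ns_to_datetimes([0, 1_500_000_000])
```

Note that datetime only resolves microseconds, so sub-microsecond detail in the ns timestamps is lost in this conversion.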

as_dataframe() → DataFrame[source]

Return the pulse as pandas dataframe.

Return type: DataFrame

class data_eng_utokyo.algorithms.PeakFinder(recorder, plot_filename: str = '')[source]

Bases: object

Class for finding peaks in the SSD data, which come from the release of many atoms by heating the Yttrium.

Parameters: plot_filename (str)

get_new_peaks(df) → list[source]

Loads the new data, estimates the background, finds the peaks and returns them.

Return type: list

_get_new_data(df) → DataFrame[source]

Returns the part of the data which has not been processed yet.

Return type: DataFrame

_calculate_new_background(new_df: DataFrame)[source]

Update the estimation of the background based on the new data.

Parameters: new_df (DataFrame)

_find_peaks(new_df: DataFrame) → list[source]

Uses a sliding window approach to find times (peaks) in which many signals were recorded. Returns the peaks as a list of timestamps.

Parameters: new_df (DataFrame)
Return type: list
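A sliding-window count over pulse timestamps could be sketched as follows. The window length, the step size, and the helper name sliding_window_counts are assumptions made for illustration; the values used by the class are not documented here.

```python
import numpy as np


def sliding_window_counts(timestamps, window, step):
    """Hypothetical helper: count pulses in windows [start, start + window)."""
    timestamps = np.sort(np.asarray(timestamps))
    starts = np.arange(timestamps[0], timestamps[-1] + step, step)
    # Binary search gives the index range of pulses inside each window.
    left = np.searchsorted(timestamps, starts, side="left")
    right = np.searchsorted(timestamps, starts + window, side="left")
    return starts, right - left


ts = [0, 1, 2, 50, 51, 52, 53, 54, 100]
starts, counts = sliding_window_counts(ts, window=10, step=10)
busiest_start = starts[np.argmax(counts)]
```

Windows with an unusually high count are candidates for peaks; choosing a step smaller than the window makes the windows overlap.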

_find_peaks_in_1d_array(array: list, timestamps: list)[source]

Takes the array and finds local maxima which have at least a certain time difference.

Parameters:

array (list)
timestamps (list)
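One way to find local maxima that are separated by a minimum time difference is sketched below. The greedy strategy (highest maxima first), the min_distance parameter, and the helper name find_peaks_1d are assumptions, not the documented behaviour of the method.

```python
def find_peaks_1d(array, timestamps, min_distance):
    """Hypothetical helper: local maxima at least min_distance apart in time."""
    candidates = [
        i for i in range(1, len(array) - 1)
        if array[i - 1] < array[i] >= array[i + 1]
    ]
    # Accept the highest candidates first and drop any that sit too close.
    accepted = []
    for i in sorted(candidates, key=lambda i: array[i], reverse=True):
        if all(abs(timestamps[i] - timestamps[j]) >= min_distance for j in accepted):
            accepted.append(i)
    return sorted(timestamps[i] for i in accepted)


counts = [0, 3, 1, 4, 0, 0, 2, 0]
times = [0, 10, 20, 30, 40, 50, 60, 70]
peaks = find_peaks_1d(counts, times, min_distance=25)
```

In this example the maximum at time 10 is dropped because it lies within 25 time units of the higher maximum at time 30.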

_find_maximum(df) → float[source]

Generates a histogram of the timestamps with 50 bins. Returns the timestamp of the left side of the bin with the highest count.

Return type: float
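The histogram step can be sketched with numpy.histogram. The sketch takes a plain array of timestamps rather than a dataframe, and the helper name find_maximum is only borrowed for illustration.

```python
import numpy as np


def find_maximum(timestamps, bins=50):
    """Hypothetical helper: left edge of the fullest histogram bin."""
    counts, edges = np.histogram(timestamps, bins=bins)
    return float(edges[np.argmax(counts)])


# Evenly spread pulses plus a burst of 20 pulses at t = 42.5.
ts = np.concatenate([np.linspace(0, 100, 51), np.full(20, 42.5)])
peak_ts = find_maximum(ts)
```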

_generate_peaks(df: DataFrame, peak_timestamps: list) → list[source]

Takes the new data and a list of the timestamps of the new peaks, and builds a list of the Peak instances.

Parameters:

df (DataFrame)
peak_timestamps (list)

Return type:

list

_get_data_near_peak(df: DataFrame, ts: int, left: int, right: int)[source]

Returns all rows of df with timestamp in [ts-left, ts+right].

Parameters:

df (DataFrame)
ts (int)
left (int)
right (int)
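Selecting the rows in the closed interval [ts-left, ts+right] can be sketched with a boolean mask. The column name "timestamp" is an assumption, since the layout of the dataframe is not documented here.

```python
import pandas as pd


def get_data_near_peak(df, ts, left, right, column="timestamp"):
    """Hypothetical helper; the column name "timestamp" is an assumption."""
    mask = (df[column] >= ts - left) & (df[column] <= ts + right)
    return df[mask]


df = pd.DataFrame({"timestamp": [5, 10, 15, 20, 25, 30]})
near = get_data_near_peak(df, ts=20, left=5, right=10)
```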

_is_peak(df: DataFrame, ts)[source]

Returns whether the peak has more pulses than the background would produce.

Parameters: df (DataFrame)
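The comparison against the background can be sketched as below, assuming the expected background count is the background rate multiplied by the length of the interval. The function signature is hypothetical and simpler than the method's, which works on a dataframe.

```python
def is_peak(n_pulses, background_rate, duration_s):
    """Hypothetical check: more pulses than the background alone would give."""
    expected_background = background_rate * duration_s
    return n_pulses > expected_background


strong = is_peak(n_pulses=120, background_rate=10.0, duration_s=5.0)
weak = is_peak(n_pulses=40, background_rate=10.0, duration_s=5.0)
```

A stricter criterion could require the excess to exceed the statistical fluctuation of the background, but that goes beyond what is described here.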