Data

class nnero.data.DataPartition(early_train: ndarray, early_valid: ndarray, early_test: ndarray, total_train: ndarray, total_valid: ndarray, total_test: ndarray)[source]

Bases: object

DataPartition class.

Partitioning of the data into a training set, a testing set and a validation set.

Parameters:

early_train (np.ndarray) – indices of the data array with an early enough reionization used for training
early_valid (np.ndarray) – indices of the data array with an early enough reionization used for validation
early_test (np.ndarray) – indices of the data array with an early enough reionization used for testing
total_train (np.ndarray) – all indices of the data array used for training
total_valid (np.ndarray) – all indices of the data array used for validation
total_test (np.ndarray) – all indices of the data array used for testing

classmethod load(path: str) → Self[source]

Load a previously saved data partition.

Parameters: path (str) – path to the saved data partition file.
Return type: DataPartition

save(name: str) → None[source]

Save the data partition.

Parameters: name (str) – name of the data partition file
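As an illustration of what the six index arrays could look like, here is a minimal sketch that builds them with NumPy alone. It does not call nnero; the helper split_indices, the toy is_early mask and the shuffle-then-cut strategy are assumptions made for the example, not the library's actual procedure.

```python
import numpy as np


def split_indices(n_samples, frac_test=0.1, frac_valid=0.1, seed=1994):
    """Hypothetical helper: shuffle all indices, then cut them into three disjoint sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(frac_test * n_samples)
    n_valid = int(frac_valid * n_samples)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


# Assumed toy data: 100 samples, the first 80 flagged as having an early enough reionization.
n_samples = 100
is_early = np.zeros(n_samples, dtype=bool)
is_early[:80] = True

total_train, total_valid, total_test = split_indices(n_samples)

# The "early" arrays keep only those indices whose sample is flagged as early.
early_train = total_train[is_early[total_train]]
early_valid = total_valid[is_early[total_valid]]
early_test = total_test[is_early[total_test]]
```

Arrays built this way have the types the DataPartition constructor expects; whether the library derives its early subsets in the same way is not stated here.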

class nnero.data.DataSet(file_path: str, z: ndarray | None = None, *, frac_test: float = 0.1, frac_valid: float = 0.1, seed_split: int = 1994, extras: list[str] | None = None)[source]

Bases: object

DataSet class

Compile the data necessary for training.

Parameters:

file_path (str) – path to the file that contains the raw data
z (np.ndarray) – array of redshifts at which the neural network interpolates
use_PCA (bool, optional) – prepare the data to perform the regression in the principal component basis, default is True
precision_PCA (float, optional) – if use_PCA is True, sets how many eigenvectors are kept: only eigenvectors whose eigenvalues are larger than precision_PCA times the largest eigenvalue are considered useful
frac_test (float, optional) – fraction of test data out of the total sample, default is 0.1
frac_valid (float, optional) – fraction of validation data out of the total sample, default is 0.1
seed_split (int, optional) – random seed for data partitioning, default is 1994

init_principal_components(pca_precision: float = 0.001) → int[source]

Initialise the principal component analysis decomposition

Parameters: pca_precision (float, optional) – precision for the principal component analysis reconstruction, by default 1e-3
Returns: number of eigenvectors necessary to reach the desired precision
Return type: int
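The eigenvalue criterion described for precision_PCA (keep eigenvectors whose eigenvalue exceeds the precision times the largest eigenvalue) can be sketched with NumPy as follows. This is a sketch under assumptions: the helper count_useful_components and the toy data are hypothetical, and it is not confirmed that init_principal_components applies exactly this rule.

```python
import numpy as np


def count_useful_components(y, precision=1e-3):
    """Hypothetical helper: count eigenvalues above precision * largest eigenvalue."""
    centered = y - y.mean(axis=0)
    cov = centered.T @ centered / (y.shape[0] - 1)
    # eigvalsh returns real eigenvalues in ascending order for a symmetric matrix.
    eigenvalues = np.linalg.eigvalsh(cov)
    threshold = precision * eigenvalues[-1]
    return int(np.sum(eigenvalues > threshold))


# Assumed toy data: 200 samples of 5 outputs that vary along 2 directions only,
# plus a very small noise term on every output.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[3.0, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0, 0.0]])
y = latent @ mixing + 1e-6 * rng.normal(size=(200, 5))

n_components = count_useful_components(y, precision=1e-3)
```

With this toy data only the two dominant directions pass the threshold, so the noise-only directions are discarded.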

class nnero.data.MetaData(z: ndarray, parameters_name: list | ndarray, parameters_min_val: ndarray, parameters_max_val: ndarray)[source]

Bases: object

MetaData class

Metadata that is saved with the neural network for predictions.

Parameters:

z (np.ndarray) – array of redshifts
parameters_name (list | np.ndarray) – name of the parameters (input features)
parameters_min_val (np.ndarray) – minimum value of the parameters (input features)
parameters_max_val (np.ndarray) – maximum value of the parameters (input features)

classmethod load(path: str) → Self[source]

Load a previously saved metadata file.

Parameters: path (str) – path to the saved metadata file.
Return type: MetaData

save(name: str) → None[source]

Save the metadata.

Parameters: name (str) – name of the metadata file

class nnero.data.TorchDataset(x_data: ndarray, y_data: ndarray)[source]

Bases: Dataset

Wrapper of torch Dataset.

Parameters:

x_data (np.ndarray) – input features
y_data (np.ndarray) – output labels

nnero.data.latex_labels(labels: list[str]) → list[str][source]

nnero.data.preprocess_raw_data(file_path: str, *, random_seed: int = 1994, frac_test: float = 0.1, frac_valid: float = 0.1, extras: list[str] | None = None) → None[source]

Preprocess a raw .npz file. Creates another numpy archive that can be directly used to create a DataSet object.

Parameters:

file_path (str) – Path to the raw data file. The raw data must be a .npz file containing the following entries:
- z (or z_glob): redshift array
- features_run: sequence of drawn input parameters for which there is a value for the ionization fraction
- features_fail: sequence of drawn input parameters for which the simulator failed because reionization was too late
- …
random_seed (int, optional) – Random seed for splitting data into a training/validation/testing subset, by default 1994.
frac_test (float, optional) – Fraction of the total data points in the test subset, by default 0.1.
frac_valid (float, optional) – Fraction of the total data points in the validation subset, by default 0.1.
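To make the archive layout concrete, the sketch below writes and re-reads a .npz file containing only the three entries named above. The array shapes, the redshift grid and the file name are assumptions, and because the list of required entries is truncated in this documentation, the resulting file should not be expected to be sufficient input for preprocess_raw_data.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(1994)

# Assumed shapes: one row per drawn parameter set, one column per input parameter.
z = np.linspace(4.0, 35.0, 32)              # assumed redshift grid
features_run = rng.uniform(size=(50, 3))    # draws for which the simulator returned a value
features_fail = rng.uniform(size=(5, 3))    # draws for which reionization was too late

# Write the three named entries to a temporary archive (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "raw_data.npz")
np.savez(path, z=z, features_run=features_run, features_fail=features_fail)

# Read the archive back and list the entries it contains.
with np.load(path) as raw:
    keys = sorted(raw.files)
    z_loaded = raw["z"]
```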

nnero.data.true_to_uniform(x: float | ndarray, min: float | ndarray, max: float | ndarray) → float | ndarray[source]

Transform features uniformly distributed on [a, b] into features uniformly distributed on [0, 1], as fed to the neural networks.

Parameters:

x (float | np.ndarray) – input features distributed uniformly on [a, b]
min (float | np.ndarray) – minimum value a
max (float | np.ndarray) – maximum value b

Return type:

float | np.ndarray

Raises:

ValueError – min should be less than max
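A mapping from [a, b] to [0, 1] of this kind is commonly written as (x - a) / (b - a). The sketch below implements that formula under a hypothetical name, to_unit_interval; it is an assumption that true_to_uniform uses the same expression and the same strict check on the bounds.

```python
import numpy as np


def to_unit_interval(x, lo, hi):
    """Hypothetical stand-in: rescale x from [lo, hi] to [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    # Assumed check: reject bounds where the minimum is not strictly below the maximum.
    if np.any(lo >= hi):
        raise ValueError("min should be less than max")
    return (x - lo) / (hi - lo)


# Values at the lower bound, the midpoint and the upper bound of [2, 5].
scaled = to_unit_interval(np.array([2.0, 3.5, 5.0]), 2.0, 5.0)
# scaled is [0.0, 0.5, 1.0]
```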

nnero.data.uniform_to_true(x: float | ndarray, min: float | ndarray, max: float | ndarray) → float | ndarray[source]

Inverse transformation of true_to_uniform.

Parameters:

x (float | np.ndarray) – input features distributed uniformly on [0, 1]
min (float | np.ndarray) – minimum value a
max (float | np.ndarray) – maximum value b

Return type:

float | np.ndarray

Raises:

ValueError – min should be less than max
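The inverse mapping from [0, 1] back to [a, b] can be sketched as a + u (b - a). The function name from_unit_interval is hypothetical, and it is an assumption that uniform_to_true uses this expression; the last lines check that applying the forward formula recovers the original values.

```python
import numpy as np


def from_unit_interval(u, lo, hi):
    """Hypothetical stand-in: rescale u from [0, 1] back to [lo, hi]."""
    u = np.asarray(u, dtype=float)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    # Assumed check, mirroring the documented ValueError.
    if np.any(lo >= hi):
        raise ValueError("min should be less than max")
    return lo + u * (hi - lo)


u = np.array([0.0, 0.25, 1.0])
x = from_unit_interval(u, -1.0, 3.0)
# x is [-1.0, 0.0, 3.0]

# Round trip: the forward formula (x - lo) / (hi - lo) gives back u.
recovered = (x - (-1.0)) / (3.0 - (-1.0))
```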