Data
- class nnero.data.DataPartition(early_train: ndarray, early_valid: ndarray, early_test: ndarray, total_train: ndarray, total_valid: ndarray, total_test: ndarray)[source]
Bases:
object
DataPartition class.
Partitioning of the data into a training set, a testing set and a validation set.
- Parameters:
early_train (np.ndarray) – indices of the data array with an early enough reionization used for training
early_valid (np.ndarray) – indices of the data array with an early enough reionization used for validation
early_test (np.ndarray) – indices of the data array with an early enough reionization used for testing
total_train (np.ndarray) – all indices of the data array used for training
total_valid (np.ndarray) – all indices of the data array used for validation
total_test (np.ndarray) – all indices of the data array used for testing
- class nnero.data.DataSet(file_path: str, z: ndarray | None = None, *, frac_test: float = 0.1, frac_valid: float = 0.1, seed_split: int = 1994, extras: list[str] | None = None)[source]
Bases:
object
DataSet class
Compile the data necessary for training.
- Parameters:
file_path (str) – path to the file that contains the raw data
z (np.ndarray) – array of the redshits of interpolation of the nn
use_PCA (bool, optional) – prepare the data to perform the regression in the principal component basis, default is True
precision_PCA (float, optional) – if use_PCA is True, select the number of useful eigenvectors from this coefficient – only the eigenvectors with eigenvalues larger than precision_PCA * the largest eigenvalue are considered as useful
frac_test (float, optional) – fraction of test data out of the total sample, default is 0.1
frac_valid (float, optional) – fraction of validation data out of the total sample, default is 0.1
seed_split (int, optional) – random seed for data partitioning, default is 1994
- init_principal_components(pca_precision: float = 0.001) int [source]
Initialise the principal component analysis decomposition
- Parameters:
pca_precision (float, optional) – precision for the principal analysis reconstruction, by default 1e-3
- Returns:
number of necessary eigenvectors to reach the desired precision
- Return type:
int
- class nnero.data.MetaData(z: ndarray, parameters_name: list | ndarray, parameters_min_val: ndarray, parameters_max_val: ndarray)[source]
Bases:
object
MetaData class
Metadata that is saved with the neural network for predictions.
- Parameters:
z (np.ndarray) – array of redshifts
parameters_name (list | np.ndarray) – name of the parameters (input features)
parameters_min_val (np.ndarray) – minimum value of the parameters (input features)
parameters_max_val (np.ndarray) – maximum value of the parameters (input features)
- class nnero.data.TorchDataset(x_data: ndarray, y_data: ndarray)[source]
Bases:
Dataset
Wrapper of torch Dataset.
- Parameters:
x_data (np.ndarray) – input features
y_data (np.ndarray) – output labels
- nnero.data.preprocess_raw_data(file_path: str, *, random_seed: int = 1994, frac_test: float = 0.1, frac_valid: float = 0.1, extras: list[str] | None = None) None [source]
Preprocess a raw .npz file. Creates another numpy archive that can be directly used to create a
DataSet
object.- Parameters:
file_path (str) – Path to the raw data file. The raw data must be a .npz file with the following information. - z (or z_glob): redshift array - features_run: Sequence of drawn input parameters for which there is a value for the ionization fraction - features_fail: Sequence of drawn input parameters for which the simulator failed because reionization was too late - …
random_seed (int, optional) – Random seed for splitting data into a training/validation/testing subset, by default 1994.
frac_test (float, optional) – Fraction of the total data points in the test subset, by default 0.1.
frac_valid (float, optional) – Fraction of the total data points in the validation subset, by default 0.1
- nnero.data.true_to_uniform(x: float | ndarray, min: float | ndarray, max: float | ndarray) float | ndarray [source]
Transforms features uniformely distributed along [a, b] into features uniformely distributed between [0, 1] as fed to the neural networks.
- Parameters:
x (float | np.ndarray) – input featurs distributed uniformely on [a, b]
min (float | np.ndarray) – minimum value a
max (float | np.ndarray) – maximum value b
- Return type:
float | np.ndarray
- Raises:
ValueError – min should be less than max
- nnero.data.uniform_to_true(x: float | ndarray, min: float | ndarray, max: float | ndarray) float | ndarray [source]
Inverse transformation of true_to_uniform.
- Parameters:
x (float | np.ndarray) – input featurs distributed uniformely on [0, 1]
min (float | np.ndarray) – minimum value a
max (float | np.ndarray) – maximum value b
- Return type:
float | np.ndarray
- Raises:
ValueError – min should be less than max