asi_core.make_dataset

This module provides functions to create ASI datasets, e.g., for machine learning applications.

Attributes

Q25_LEFT_CENTRATION_THRESHOLD

Q25_RIGHT_CENTRATION_THRESHOLD

Functions

load_asi_list(csv_files[, asi_root, col_timestamp, ...])

Loads list of ASI from csv files. If asi_root is passed, the absolute path of each ASI image is appended to the resulting dataframe.

read_asi_meta_data(filename[, is_mobotix, ...])

Extracts meta data of an all-sky image from its name.

check_asi_list(asi_list[, tz, limit_exp_time, ...])

Checks a list of all-sky image files for corruption and returns dataframe with additional data.

check_asi(filename[, tz, is_mobotix, limit_exp_time, ...])

Checks a single ASI (All-Sky Image) file for corruption and extracts metadata.

load_transform_save_asi(rel_path, all_sky_imager, ...)

Loads, transforms and saves transformed all-sky image.

create_asi_list(asi_root[, do_check, name_convention, ...])

Gets all ASIs within asi_root and saves the list to csv.

create_asi_dataset(asi_series, source_dir, target_dir, ...)

Creates an ASI dataset from all passed filenames in target_dir.

read_asi_dataset(csv_file[, img_dir, asi_path_col, ...])

Reads an ASI dataset from a CSV file and optionally filters by date.

merge_meteo_and_asi_data(df_meteo, df_asi[, ...])

Merges meteorological data with ASI data based on timestamps.

map_asi_to_timestamps(df[, round_to, max_delta_t, ...])

Maps ASI acquisition time to a rounded timestamp.

select_by_dni_var_classes(dni_var_classes, ...[, ...])

Selects timestamps by DNI variability class. A timestamp is selected if the timestamp itself or the included time frame has a DNI variability class contained in selected_classes.

check_Q25_asi_cropping(rel_path_to_image, asi_root)

This function can be used to determine how ASIs from the Q25 all-sky imager have been cropped for custom resolution.

get_dates_from_csv(csv_file[, col_name])

Extracts unique dates from a specified column in a CSV file.

filter_timestamps_by_sun_elevation(ts, min_el[, ...])

Filters timestamps based on minimum solar elevation.

Module Contents

asi_core.make_dataset.Q25_LEFT_CENTRATION_THRESHOLD = 200
asi_core.make_dataset.Q25_RIGHT_CENTRATION_THRESHOLD = 400
asi_core.make_dataset.load_asi_list(csv_files, asi_root=None, col_timestamp='timestamp', col_rel_path='rel_path', col_filename='file_name')

Loads list of ASI from csv files. If asi_root is passed, the absolute path of each ASI image is appended to the resulting dataframe.

Parameters:
  • csv_files – list of csv files (full paths).

  • asi_root – root directory of asi images.

  • col_timestamp – column name of timestamps of ASI.

  • col_rel_path – column name of relative path of ASI wrt root directory.

  • col_filename – column name of asi file names.

Returns:

dataframe of merged csv files.

asi_core.make_dataset.read_asi_meta_data(filename, is_mobotix=True, name_convention='dlr', tz='UTC+0100')

Extracts meta data of an all-sky image from its name.

Parameters:
  • filename – file path of all-sky image.

  • name_convention – determines naming convention of all-sky images.

Returns:

meta data as dict.

asi_core.make_dataset.check_asi_list(asi_list, tz=1, limit_exp_time=None, name_convention='dlr', n_workers=0)

Checks a list of all-sky image files for corruption and returns dataframe with additional data.

Parameters:
  • asi_list – list of asi files.

  • tz – timezone as int (+/- UTC).

  • limit_exp_time – limit of valid exposure time.

  • name_convention – asi file name convention.

  • n_workers – number of workers to use for parallel processing.

Returns:

meta data of asi files as dataframe.

asi_core.make_dataset.check_asi(filename, tz='UTC+0100', is_mobotix=True, limit_exp_time=None, name_convention='dlr')

Checks a single ASI (All-Sky Image) file for corruption and extracts metadata.

Parameters:
  • filename – Path to the ASI image file.

  • tz – Timezone for parsing metadata timestamps. Default is “UTC+0100”.

  • is_mobotix – Boolean indicating whether the image follows the Mobotix format. Default is True.

  • limit_exp_time – Optional threshold for maximum exposure time. If exceeded, a warning is logged.

  • name_convention – Naming convention used for parsing metadata. Default is ‘dlr’.

Returns:

A dictionary containing:

  • ‘name’ – Extracted image name from metadata (or NaN if unavailable).

  • ‘timestamp’ – Extracted timestamp from metadata (or NaN if unavailable).

  • ‘exposure_time’ – Extracted exposure time from metadata (or NaN if unavailable).

  • ‘illuminance’ – Extracted illuminance value from metadata (or NaN if unavailable).

  • ‘width’ – Image width in pixels.

  • ‘height’ – Image height in pixels.

  • ‘corrupted’ – Boolean indicating if the image is corrupted (True if corrupted, False otherwise).

Raises:

Warning – Logs a warning if the exposure time exceeds limit_exp_time.
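
The dictionary keys listed above lend themselves to simple post-filtering. The following is a minimal sketch, not part of asi_core: keep_valid is a hypothetical helper, the records are made-up stand-ins for check_asi results, and dropping over-exposed images is an assumed downstream choice (check_asi itself only logs a warning).

```python
import math


def keep_valid(records, limit_exp_time=None):
    """Return names of records that are not corrupted and within the exposure limit.

    Hypothetical helper: it only relies on the documented keys
    'name', 'exposure_time' and 'corrupted'.
    """
    valid = []
    for rec in records:
        if rec["corrupted"]:
            continue  # skip files flagged as corrupted
        exp = rec["exposure_time"]
        too_long = (
            limit_exp_time is not None
            and not math.isnan(exp)
            and exp > limit_exp_time
        )
        if too_long:
            continue  # assumed policy: drop instead of only warning
        valid.append(rec["name"])
    return valid


# Made-up records mimicking the documented return structure.
records = [
    {"name": "img_a.jpg", "exposure_time": 160.0, "corrupted": False},
    {"name": "img_b.jpg", "exposure_time": 160.0, "corrupted": True},
    {"name": "img_c.jpg", "exposure_time": 2000.0, "corrupted": False},
    {"name": "img_d.jpg", "exposure_time": float("nan"), "corrupted": False},
]
valid_names = keep_valid(records, limit_exp_time=1000)
```

Records with a NaN exposure time are kept here because no limit can be evaluated for them; a stricter policy would drop them as well.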

asi_core.make_dataset.load_transform_save_asi(rel_path, all_sky_imager, source_dir, target_dir)

Loads, transforms and saves transformed all-sky image.

Parameters:
  • rel_path – relative file path of image.

  • all_sky_imager (AllSkyImager) – camera used to take image.

  • source_dir – directory of raw images.

  • target_dir – directory of transformed images.

Returns:

True/False depending on success.

asi_core.make_dataset.create_asi_list(asi_root, do_check=False, name_convention='dlr', csv_file=None, n_workers=0)

Gets all ASIs within asi_root and saves the list to csv.

Parameters:
  • asi_root – root folder where images are stored.

  • do_check – if true, all images are checked for validity.

  • name_convention – asi file name convention.

  • csv_file – csv file to save results.

  • n_workers – number of workers to use for parallel processing.

Returns:

None.

asi_core.make_dataset.create_asi_dataset(asi_series, source_dir, target_dir, camera_data_dir, n_workers=0, asi_tfms=None)

Creates an ASI dataset from all passed filenames in target_dir.

Parameters:
  • asi_series – pd.Series of all-sky images, with timestamp of acquisition as index and camera name as name.

  • source_dir – directory of raw images.

  • target_dir – directory of transformed images.

  • camera_data_dir – directory of yaml files containing camera data.

  • n_workers – number of workers to use for parallel processing.

  • asi_tfms – keyword arguments for applying the transformation.

Returns:

pd.Series of successfully saved (transformed) images.

asi_core.make_dataset.read_asi_dataset(csv_file, img_dir=None, asi_path_col='rel_path', drop_asi_filepath=True, filter_dates=None)

Reads an ASI dataset from a CSV file and optionally filters by date.

Parameters:
  • csv_file – Path to the CSV file containing ASI metadata.

  • img_dir – Optional directory path where ASI images are stored. If provided, file paths will be adjusted accordingly.

  • asi_path_col – Column name in the CSV that contains the relative file paths of ASI images. Default is ‘rel_path’.

  • drop_asi_filepath – Whether to drop the ASI file path column from the returned DataFrame. Default is True.

  • filter_dates – Optional list of dates to filter the dataset. Only entries matching these dates will be retained.

Returns:

  • asi_files: A Pandas Series containing file paths to ASI images.

  • df: A Pandas DataFrame with metadata, optionally filtered and with the ASI path column removed.

asi_core.make_dataset.merge_meteo_and_asi_data(df_meteo, df_asi, temporal_resolution='30s', max_delta_t=15, parameters_to_cast=None)

Merges meteorological data with ASI data based on timestamps.

Parameters:
  • df_meteo – Pandas DataFrame containing meteorological data indexed by timestamp.

  • df_asi – Pandas DataFrame containing ASI metadata indexed by timestamp.

  • temporal_resolution – Time rounding resolution for ASI timestamps (e.g., ’30s’ for 30 seconds). Default is ’30s’.

  • max_delta_t – Maximum allowed time difference (in seconds) for matching ASI data to meteorological data. Default is 15 seconds.

  • parameters_to_cast – Optional dictionary specifying data types for certain parameters after merging.

Returns:

A Pandas DataFrame with meteorological and ASI data merged, indexed by timestamp.
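
The matching idea described by temporal_resolution and max_delta_t can be illustrated without pandas. This is a rough sketch under stated assumptions, not the library's implementation: nearest_slot and merge_by_timestamp are hypothetical names, plain dicts keyed by datetime stand in for the DataFrames, and an inner join is assumed.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def nearest_slot(ts, resolution_s=30):
    """Round a naive datetime to the nearest multiple of resolution_s seconds."""
    seconds = (ts - EPOCH).total_seconds()
    return EPOCH + timedelta(seconds=round(seconds / resolution_s) * resolution_s)


def merge_by_timestamp(meteo, asi, resolution_s=30, max_delta_t=15):
    """Join ASI rows onto meteo rows that share the same rounded timestamp."""
    merged = {}
    for ts, asi_row in asi.items():
        slot = nearest_slot(ts, resolution_s)
        # With the defaults (30 s slots, 15 s tolerance) this check never
        # rejects anything; it matters for tighter tolerances.
        if abs((ts - slot).total_seconds()) > max_delta_t:
            continue
        if slot in meteo:  # assumed inner join: unmatched rows are dropped
            merged[slot] = {**meteo[slot], **asi_row}
    return merged


# Made-up sample data.
meteo = {
    datetime(2020, 6, 1, 12, 0, 0): {"dni": 850.0},
    datetime(2020, 6, 1, 12, 0, 30): {"dni": 845.0},
}
asi = {
    datetime(2020, 6, 1, 12, 0, 2): {"rel_path": "a.jpg"},
    datetime(2020, 6, 1, 12, 1, 1): {"rel_path": "b.jpg"},
}
merged = merge_by_timestamp(meteo, asi)
```

The second image rounds to 12:01:00, for which no meteorological row exists, so only the first image appears in the result.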

asi_core.make_dataset.map_asi_to_timestamps(df, round_to='60s', max_delta_t=10, valid_exp_times=None, max_delta_exp_time=10, multi_exposure=False, inplace=False)

Maps ASI acquisition time to a rounded timestamp.

Parameters:
  • df – dataframe containing a column ‘timestamp’.

  • round_to – string of resolution to round timestamps to.

  • max_delta_t – maximal allowed deviation to rounded timestamp in seconds.

  • valid_exp_times – tuple of valid exposure times to be considered.

  • inplace – if true, overwrites existing dataframe.

Returns:

dataframe with rounded timestamp as index.
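
To make the roles of round_to and max_delta_t concrete, here is a small stdlib-only sketch. It is an illustration under assumptions rather than the function's actual code: round_timestamp and map_to_rounded are hypothetical names, and keeping the closest image when several fall on the same rounded timestamp is an assumed tie-breaking rule.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def round_timestamp(ts, round_to_s=60):
    """Round a naive datetime to the nearest multiple of round_to_s seconds."""
    seconds = (ts - EPOCH).total_seconds()
    return EPOCH + timedelta(seconds=round(seconds / round_to_s) * round_to_s)


def map_to_rounded(timestamps, round_to_s=60, max_delta_t=10):
    """Map rounded timestamps to the closest acquisition time within max_delta_t."""
    mapped = {}
    for ts in timestamps:
        target = round_timestamp(ts, round_to_s)
        delta = abs((ts - target).total_seconds())
        if delta > max_delta_t:
            continue  # deviates too much from the rounded timestamp
        best = mapped.get(target)
        if best is None or delta < abs((best - target).total_seconds()):
            mapped[target] = ts  # assumed rule: the closest image wins
    return mapped


# Made-up acquisition times.
acquired = [
    datetime(2020, 6, 1, 12, 0, 3),   # 3 s after 12:00:00, kept
    datetime(2020, 6, 1, 12, 0, 31),  # 29 s before 12:01:00, dropped
    datetime(2020, 6, 1, 12, 0, 58),  # 2 s before 12:01:00, kept
]
result = map_to_rounded(acquired)
```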

asi_core.make_dataset.select_by_dni_var_classes(dni_var_classes, selected_classes, include_by='H')

Selects timestamp by dni variability class. A timestamp is selected if the timestamp itself or the included time frame has a dni variability class contained in selected_classes.

Parameters:
  • dni_var_classes – pd.Series of dni var classes with DatetimeIndex

  • selected_classes – list of dni var classes to filter by.

  • include_by – determines size of time frame (e.g., ‘H’ means hour)

Returns:

selected timestamps as DatetimeIndex.
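
The selection rule can be sketched in plain Python as follows. This reflects one reading of the description above and is not taken from the library: select_by_classes is a hypothetical name, a dict keyed by datetime stands in for the pd.Series, and only the hourly time frame (include_by='H') is modelled.

```python
from datetime import datetime


def select_by_classes(dni_var_classes, selected_classes):
    """Return sorted timestamps whose hour contains at least one selected class."""
    selected = set(selected_classes)

    def hour_of(ts):
        # Truncate to the start of the hour, i.e. the assumed 'H' time frame.
        return ts.replace(minute=0, second=0, microsecond=0)

    hit_hours = {hour_of(ts) for ts, cls in dni_var_classes.items() if cls in selected}
    return sorted(ts for ts in dni_var_classes if hour_of(ts) in hit_hours)


# Made-up class assignments.
classes = {
    datetime(2020, 6, 1, 10, 0): 1,
    datetime(2020, 6, 1, 10, 30): 4,
    datetime(2020, 6, 1, 11, 0): 2,
    datetime(2020, 6, 1, 11, 30): 2,
}
picked = select_by_classes(classes, selected_classes=[4])
```

Both timestamps in the 10:00 hour are returned, because that hour contains a timestamp of class 4, even though the 10:00 timestamp itself has class 1.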

asi_core.make_dataset.check_Q25_asi_cropping(rel_path_to_image, asi_root)

This function can be used to determine how ASIs from the Q25 all-sky imager have been cropped for custom resolution. The function is used for images of the Kontas camera in the interval from ‘20160920’ to ‘20190612’.

Parameters:
  • rel_path_to_image – path to asi image, relative to asi root directory

  • asi_root – absolute path to asi root directory

Returns:

string specifying how the image has been cropped (left, center, right). If the image can’t be opened, the function returns nan.

asi_core.make_dataset.get_dates_from_csv(csv_file, col_name='date')

Extracts unique dates from a specified column in a CSV file.

Parameters:
  • csv_file – Path to the CSV file containing date information.

  • col_name – Column name in the CSV that contains date values. Default is ‘date’.

Returns:

A NumPy array of unique dates extracted from the specified column.
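
Extracting unique dates from a column amounts to the following stdlib-only sketch. It is illustrative rather than the function's implementation: unique_dates is a hypothetical name, it reads CSV text instead of a file path, and it returns a plain list where the documented function returns a NumPy array.

```python
import csv
import io


def unique_dates(csv_text, col_name="date"):
    """Return the unique values of col_name in order of first appearance."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(row[col_name] for row in reader))


# Made-up CSV content with a 'date' column.
sample = "date,ghi\n2020-06-01,500\n2020-06-01,510\n2020-06-02,480\n"
dates = unique_dates(sample)
```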

asi_core.make_dataset.filter_timestamps_by_sun_elevation(ts, min_el, sun_el=None, latitude=None, longitude=None, altitude=None)

Filters timestamps based on minimum solar elevation.

Parameters:
  • ts – Pandas DatetimeIndex of timestamps to be filtered.

  • min_el – Minimum solar elevation angle (in degrees) required for timestamps to be retained.

  • sun_el – Optional Pandas Series containing precomputed solar elevations for the timestamps. If None, solar elevation will be computed using latitude, longitude, and altitude.

  • latitude – Latitude of the location (required if sun_el is not provided).

  • longitude – Longitude of the location (required if sun_el is not provided).

  • altitude – Altitude of the location in meters (optional, used when computing solar elevation).

Returns:

A filtered Pandas DatetimeIndex containing only timestamps where solar elevation exceeds min_el.
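
For the case where sun_el is supplied, the filter reduces to a comparison per timestamp. The sketch below assumes precomputed elevations and uses plain lists in place of the Pandas objects; filter_by_elevation is a hypothetical name and the elevation values are invented for illustration, not computed from a location.

```python
from datetime import datetime


def filter_by_elevation(ts, sun_el, min_el):
    """Keep timestamps whose solar elevation strictly exceeds min_el (degrees)."""
    return [t for t, el in zip(ts, sun_el) if el > min_el]


# Made-up timestamps and elevations in degrees.
ts = [
    datetime(2020, 6, 1, 5, 0),
    datetime(2020, 6, 1, 8, 0),
    datetime(2020, 6, 1, 12, 0),
]
sun_el = [2.5, 10.0, 61.3]
kept = filter_by_elevation(ts, sun_el, min_el=10.0)
```

Since the description says the elevation must exceed min_el, the comparison is strict and the 08:00 timestamp at exactly 10.0 degrees is dropped.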