exofop.extract

exofop.extract#

This sub-package provides functionality for extracting data downloaded from the ExoFOP website into a uniform format facilitating further analysis.

Outline#

exofop.extract.LightCurveTableList (class)

A list of LightCurveTable instances with additional functionalities.

exofop.extract.LightCurveTable (class)

A class representing a light curve table extracted from ExoFOP.

exofop.extract.SynonymMap (class)

A class for handling the standardisation of column names.

exofop.extract.SynonymMapLc (class)

A class for handling the standardisation of column names, with additional functionalities for the essential light curve attributes ‘time’, ‘flux’, and ‘flux_err’.

exofop.extract.EssentialLightcurveAttributes (class)

A class for handling the essential light curve attributes ‘time’, ‘flux’, and ‘flux_err’.

Example

The following is a basic example of how to use the class exofop.extract.LightCurveTableList to extract data downloaded from ExoFOP:

>>> from exofop.extract import LightCurveTableList
>>> target_dir = "path/to/your/directory/with/downloaded/measurements"
>>> lctl = LightCurveTableList.load_exofop_data(target_dir=target_dir)
>>> lctl.standardise_column_names()
>>> lctl.apply_time_correction_in_case_of_discrepancy()
>>> print(lctl.info_df)
>>> lctl_complete = lctl.complete
>>> lctl_complete.save()

For a more detailed introduction, consult the tutorial Extract and standardise ExoFOP time series observations.

Classes#

class exofop.extract.EssentialLightcurveAttributes(time: str = 'time', flux: str = 'flux', flux_err: str = 'flux_err')[source]#

Bases: NamedTuple

A class for handling the essential light curve attributes ‘time’, ‘flux’, and ‘flux_err’.

time#

The primary alias for the time attribute.

Type:

str

flux#

The primary alias for the flux attribute.

Type:

str

flux_err#

The primary alias for the flux error attribute.

Type:

str

flux: str#

Alias for field number 1

flux_err: str#

Alias for field number 2

time: str#

Alias for field number 0

update_primary_alias(old_primary_alias, new_primary_alias)[source]#

Rename a primary alias and update its synonyms accordingly.

Parameters:
  • old_primary_alias (str) – The primary alias to be renamed.

  • new_primary_alias (str) – The new primary alias to replace the old one.

Raises:

KeyError – If the old primary alias does not exist in the mapping.
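To illustrate what renaming a primary alias can look like for a NamedTuple-based container, here is a minimal sketch. The class `EssentialAttrs` and the helper `with_renamed_alias` are hypothetical stand-ins, not part of exofop, and the real `update_primary_alias` may behave differently. Because named tuples are immutable, the sketch returns a new instance rather than modifying the old one.

```python
from typing import NamedTuple


class EssentialAttrs(NamedTuple):
    """Hypothetical stand-in for EssentialLightcurveAttributes."""
    time: str = "time"
    flux: str = "flux"
    flux_err: str = "flux_err"


def with_renamed_alias(attrs: EssentialAttrs, old: str, new: str) -> EssentialAttrs:
    """Return a copy of `attrs` in which the alias `old` is replaced by `new`."""
    for field in attrs._fields:
        if getattr(attrs, field) == old:
            return attrs._replace(**{field: new})
    # Mirror the documented behaviour: unknown aliases raise KeyError.
    raise KeyError(old)


attrs = EssentialAttrs()
renamed = with_renamed_alias(attrs, "time", "BJD_TDB")
print(renamed)
```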

class exofop.extract.LightCurveTable(*args, name: str | None = None, file_name: str | None = None, **kwargs)[source]#

Bases: Table

A custom table class for representing light curve data.

This class extends the functionality of astropy.table.Table and includes an additional attribute ‘complete’ to store information about the completeness of the light curve data.

Parameters:
  • data (array-like, dict, or Table, optional) – The data to be stored in the table. This can be an array-like object, a dictionary, or another Table. If not provided, an empty table will be created.

  • name (str, optional) – The name of the observation.

  • file_name (str, optional) – The measurement filename of the observation.

  • meta (dict, optional) – A dictionary containing meta data for the table.

  • args

  • kwargs

Class Attributes:
  • synonym_map (SynonymMapLc) – A map containing synonyms for the standardisation of column names.

  • time_threshold (float) – Threshold value for considering a time difference as significant (default is 3 days).

  • simple_synonym_map (SynonymMapLc) – A minimalistic map for the standardisation of column names.

  • default_synonym_map (SynonymMapLc) – A map containing the default synonyms for the standardisation of column names.

Examples

>>> from exofop.extract import LightCurveTable
>>> lc = LightCurveTable(
...     [[1., 2, 3], [1., 0.9, 1.1], [0.05, 0.1, 0.075], [0.1, 0.1, 0.1], [0.2, 0.3, 0.4]],
...     names=["time", "flux", "flux_err", "sky", "airmass"],
...     file_name='TIC254113311.01_20200805_ASTEP-ANTARCTICA_Rc_compstar-lightcurves.csv'
... )
>>> lc
<LightCurveTable length=3>
  time    flux  flux_err   sky   airmass
float64 float64 float64  float64 float64
------- ------- -------- ------- -------
    1.0     1.0     0.05     0.1     0.2
    2.0     0.9      0.1     0.1     0.3
    3.0     1.1    0.075     0.1     0.4

We can standardise the column names using the synonym map:

>>> lc.standardise_column_names()
>>> lc.is_complete
True

After standardisation, we can access time, flux, and flux_err columns as attributes:

>>> lc.time.value
array([1., 2., 3.])
>>> lc.flux.value
array([1. , 0.9, 1.1])

We can also apply a time correction if there is a discrepancy between the time in the file name and the time array:

>>> lc.apply_time_correction_in_case_of_discrepancy()

We can get the cotrending basis vectors (CBVs), i.e. all primary aliases of lc.synonym_map that do not represent time, flux or flux_err:

>>> lc.cbvs()
<LightCurveTable length=3>
AIRMASS Sky/Pixel_T1
float64   float64
------- ------------
    0.2          0.1
    0.3          0.1
    0.4          0.1

apply_time_correction_in_case_of_discrepancy(time_threshold: float | None = 3)[source]#

Apply time correction to the time array of a table if needed.

This function checks for a discrepancy between the starting time of the array and the starting time given in the file_name_components, which was obtained by parsing the measurement file name. If the difference is significant, it applies a time correction by subtracting an integer number of days from the time array.

Parameters:

time_threshold (float, optional) – Threshold value, in days, above which a time difference is considered significant (default is 3).

Returns:

The table is modified in place.

Return type:

None
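The correction described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper `correct_time` is hypothetical, it compares the first time stamp with an expected start time, and it rounds the difference to the nearest whole day. The library's actual procedure may differ in these details.

```python
from typing import List, Sequence


def correct_time(times: Sequence[float], expected_start: float,
                 time_threshold: float = 3.0) -> List[float]:
    """Shift `times` by a whole number of days if it starts too far from `expected_start`."""
    offset = times[0] - expected_start
    if abs(offset) <= time_threshold:
        return list(times)        # no significant discrepancy, leave unchanged
    whole_days = round(offset)    # integer number of days to subtract
    return [t - whole_days for t in times]


# Times stored as JD - 2400000, while the file name implies a start near JD 2459066.5.
times = [59066.60, 59066.70, 59066.80]
corrected = correct_time(times, expected_start=2459066.5)
print(corrected[0])
```

Subtracting only whole days leaves the fractional part of each time stamp untouched, so the shape of the light curve is preserved.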

cbvs(synonym_map: SynonymMapLc | None = None) → LightCurveTable[source]#

Get the cotrending basis vectors (CBVs) contained in a standardized table.

This is a convenience function to get the CBVs, i.e. all primary aliases of synonym_map that do not represent time, flux or flux_err.

Parameters:

synonym_map (SynonymMapLc, optional) – A synonym map for the column names (default is None).

Returns:

A table containing the CBVs.

Return type:

LightCurveTable

Notes

This function assumes that the table is already standardised, otherwise it might return an incomplete set of CBVs.
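The selection rule can be sketched with plain dictionaries and lists. The helper `cbv_column_names` and the shortened synonym map below are illustrative assumptions, not the library's implementation. The sketch keeps every primary alias that is present in the table and is not one of the three essential attributes.

```python
# Primary aliases of the essential light curve attributes (taken from the default map).
ESSENTIAL = ("BJD_TDB", "rel_flux_T1_n", "rel_flux_err_T1_n")

# Shortened, illustrative synonym map: primary alias -> synonyms.
synonym_map = {
    "BJD_TDB": ("time", "TIME", "BJD"),
    "rel_flux_T1_n": ("flux", "FLUX"),
    "rel_flux_err_T1_n": ("flux_err", "ERRFLUX"),
    "AIRMASS": ("airmass",),
    "FWHM_T1": ("fwhm", "FWHM"),
    "Sky/Pixel_T1": ("sky", "SKY"),
}


def cbv_column_names(columns, synonym_map, essential=ESSENTIAL):
    """Return the primary aliases found in `columns` that are not essential attributes."""
    return [alias for alias in synonym_map
            if alias not in essential and alias in columns]


# Column names of an already standardised table.
columns = ["BJD_TDB", "rel_flux_T1_n", "rel_flux_err_T1_n", "Sky/Pixel_T1", "AIRMASS"]
print(cbv_column_names(columns, synonym_map))  # ['AIRMASS', 'Sky/Pixel_T1']
```

If the table has not been standardised, a column such as "airmass" would not match the primary alias "AIRMASS" and would be skipped, which is why standardisation should come first.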

check_completeness(synonym_map: SynonymMapLc | None = None, log_level: int | None = 20) → bool[source]#

Check if a complete light curve is contained in the table.

Parameters:
  • synonym_map (SynonymMapLc, optional) – An object representing the synonym map for the table columns (default is None).

  • log_level (int, optional) – The log level for logging missing columns (default is 20, i.e. ‘info’).

Returns:

A boolean indicating whether the table is complete and non-degenerate.

Return type:

bool
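A minimal sketch of such a check, assuming that "complete" means all three essential columns are present under their primary aliases. The helper `is_complete` is hypothetical. It does not model the "non-degenerate" part of the check, whose exact meaning is not spelled out here.

```python
import logging

logger = logging.getLogger(__name__)

# Primary aliases of time, flux and flux error in the default synonym map.
ESSENTIAL = ("BJD_TDB", "rel_flux_T1_n", "rel_flux_err_T1_n")


def is_complete(columns, essential=ESSENTIAL, log_level=logging.INFO):
    """Return True if every essential column name occurs in `columns`."""
    missing = [name for name in essential if name not in columns]
    if missing and log_level is not None:
        # logging.INFO equals 20, the documented default log level.
        logger.log(log_level, "Missing essential columns: %s", missing)
    return not missing


print(is_complete(["BJD_TDB", "rel_flux_T1_n", "rel_flux_err_T1_n", "AIRMASS"]))  # True
print(is_complete(["BJD_TDB", "rel_flux_T1_n"]))  # False
```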

default_synonym_map: SynonymMapLc = {'AIRMASS': ('airmass',), 'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'), 'FWHM_T1': ('fwhm', 'FWHM'), 'Sky/Pixel_T1': ('sky', 'SKY'), 'X(IJ)_T1': ('x_coord',), 'Y(IJ)_T1': ('y_coord',), 'rel_flux_T1_n': ('flux', 'FLUX', 'rel_flux', 'rel_flux_T1', 'flux_star_0'), 'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')}#

errorbar(ax=None, **kwargs)[source]#

property file_name#

The measurement filename of the observation.

property flux#

The flux column of the observation.

property flux_err#

The flux error column of the observation.

classmethod from_pandas(data_frame: DataFrame, name: str | None = None, file_name: str | None = None, meta: dict | None = None, **kwargs)[source]#

Create a LightCurveTable from a pandas DataFrame.

Parameters:
  • data_frame (pandas.DataFrame) – The DataFrame containing the light curve data.

  • name (str) – The name of the observation.

  • file_name (str) – The measurement filename of the observation, e.g. ‘TIC254113311.01_20200805_ASTEP-ANTARCTICA_Rc_compstar-lightcurves.csv’, from which the metadata is derived where possible.

  • **kwargs – Additional keyword arguments to be passed to the astropy.table.Table constructor.

property is_complete: bool#

A boolean indicating whether the table is complete and non-degenerate.

light_curve(synonym_map: SynonymMapLc | None = None) → LightCurveTable[source]#

Get the light curve contained in a standardised table, i.e. the time, flux and flux_err columns.

Parameters:

synonym_map (SynonymMapLc, optional) – A synonym map for the column names (default is None).

Returns:

A table containing the light curve data.

Return type:

LightCurveTable

Notes

This function assumes that the table is already standardised, otherwise it might return an incomplete light curve.

property name#

The name of the observation. Synonym of observation_name.

property observation_name#

The name of the observation. Synonym of name.

parse_and_format_exofop_file_name(file_name: str | None = None) → dict[source]#

plot(ax=None, **kwargs)[source]#

scatter(ax=None, **kwargs)[source]#

simple_synonym_map: SynonymMapLc = {'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'), 'rel_flux_T1_n': ('flux', 'FLUX', 'rel_flux', 'rel_flux_T1', 'flux_star_0'), 'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')}#

standardise_column_names(synonym_map: SynonymMap | dict | None = None, inplace: bool = True, log_level: int | None = 20)[source]#

Renames the columns of the table according to the synonym map.

property synonym_map: SynonymMapLc#

Synonym map to standardise column names.

property time#

The time column of the observation.

time_threshold: float = 3#

class exofop.extract.LightCurveTableList(*args, target_dir: str | None = None, **kwargs)[source]#

Bases: list

A list-like container for LightCurveTable objects with additional functionalities.

Parameters:

target_dir (str, optional) – The directory containing the downloaded tag directories (default is None).

Examples

>>> from exofop.extract import LightCurveTableList
>>> tbl_list = LightCurveTableList.load_exofop_data(
...     target_dir='path/to/your/directory/with/downloaded/measurements',
... )
>>> tbl_list.info_df

Notes

This class provides a convenient way to handle multiple light curve observations. It is designed to be used in combination with the LightCurveTable class and provides the following functionalities:

  • Loads potentially heterogeneous measurement files from the specified directory.

  • Extracts information from measurement file names.

  • Creates or loads a summary DataFrame with information on the observations.

  • Standardizes column and index names of the measurement DataFrames.

  • Applies time corrections in case of time discrepancies between file_name and data.

  • Identifies essential columns for light curves and thus checks their completeness.

See also

exofop.extract.LightCurveTable

A custom table class for representing light curve data.

exofop.extract.SynonymMapLc

A map containing synonyms for the standardisation of column names.

apply_time_correction_in_case_of_discrepancy(time_threshold: float | None = 3)[source]#

Apply time correction to the time array of individual observations in the list if needed.

This function checks for a discrepancy between the starting time of the array and the starting time given in the file_name_components, which was obtained by parsing the measurement file name. If the difference is significant, it applies a time correction by subtracting an integer number of days from the time array.

Parameters:

time_threshold (float, optional) – Threshold value, in days, above which a time difference is considered significant (default is 3).

property complete: LightCurveTableList#

Subset of complete observations.

property default_save_dir: str | None#

Default directory for saving the data, namely target_dir/output.

property incomplete: LightCurveTableList#

Subset of incomplete observations.

property info_df: DataFrame#

pandas.DataFrame summarizing information on observations.

classmethod load(load_dir: str = '.') → LightCurveTableList[source]#

Load the LightCurveTableList from a directory.

Parameters:

load_dir (str, optional) – The directory from which the data should be loaded (default is ‘.’).

Returns:

The loaded LightCurveTableList.

Return type:

LightCurveTableList

Notes

You can edit the info.csv file to exclude observations from being loaded, e.g. if they are incomplete, corrupted or superfluous.

See also

exofop.extract.LightCurveTableList.save

Save the data to a directory.

classmethod load_exofop_data(target_dir: str, observation_names: List[str] | None = None, synonym_map_lc: SynonymMapLc = {'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'), 'rel_flux_T1_n': ('flux', 'FLUX', 'rel_flux', 'rel_flux_T1', 'flux_star_0'), 'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')}, allow_multiple_filetypes=True, **kwargs) → LightCurveTableList[source]#

Load and standardize ExoFOP data for a given system.

Parameters:
  • target_dir (str) – The directory where the ExoFOP data of the given system is stored.

  • observation_names (List[str], optional) – A list of observation names to consider. If not provided, the function will use all available observations sorted by tag.

  • time_threshold (int, optional) – A threshold for applying time corrections to data in case of discrepancies in time specifications between the file name and the data in units of days, default is 3.

Returns:

A list of LightCurveTables containing the ExoFOP data extracted from target_dir.

Return type:

LightCurveTableList

Notes

This function loads and standardizes ExoFOP data for specific planets. It performs the following steps:

  1. Loads potentially heterogeneous measurement files from the specified directory.

  2. Extracts information from measurement file names.

  3. Unpacks multiple measurements from the same tag into separate observations.

Examples

>>> from exofop.extract import LightCurveTable, LightCurveTableList
>>> target_dir = '/path/to/data'
>>> observation_names = None
>>> synonym_map_lc = LightCurveTable.simple_synonym_map
>>> allow_multiple_filetypes = True
>>> exofop_data = LightCurveTableList.load_exofop_data(
...     target_dir=target_dir,
...     observation_names=observation_names,
...     synonym_map_lc=synonym_map_lc,
...     allow_multiple_filetypes=allow_multiple_filetypes,
... )
>>> exofop_data

classmethod load_from_pickle(file_path) → LightCurveTableList[source]#

Load an instance from a saved file.

missing_cbvs(synonym_map: SynonymMapLc | None = None) → dict | None[source]#

Check for missing CBVs in the observations contained in the list.

This is a convenience function to check which CBVs are missing in which observations.

Parameters:

synonym_map (SynonymMapLc, optional) – A synonym map for the column names (default is None).

Returns:

A dictionary containing the names of the observations and the missing CBVs. If all observations contain all CBVs, None is returned.

Return type:

dict or None
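The return convention (a dictionary of missing CBVs per observation, or None if nothing is missing) can be sketched like this. The helper `find_missing_cbvs` and the sample data are illustrative assumptions, not the library's code.

```python
from typing import Dict, List, Optional


def find_missing_cbvs(observations: Dict[str, List[str]],
                      cbv_names: List[str]) -> Optional[Dict[str, List[str]]]:
    """Map each observation name to the CBVs absent from its columns, or return None."""
    missing = {}
    for name, columns in observations.items():
        absent = [cbv for cbv in cbv_names if cbv not in columns]
        if absent:
            missing[name] = absent
    return missing or None


# Observation name -> column names of the standardised table (made-up data).
observations = {
    "obs1": ["BJD_TDB", "rel_flux_T1_n", "rel_flux_err_T1_n", "AIRMASS", "FWHM_T1"],
    "obs2": ["BJD_TDB", "rel_flux_T1_n", "rel_flux_err_T1_n", "AIRMASS"],
}
print(find_missing_cbvs(observations, ["AIRMASS", "FWHM_T1"]))  # {'obs2': ['FWHM_T1']}
print(find_missing_cbvs(observations, ["AIRMASS"]))  # None
```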

property names: List[str]#

Names of all observations.

number_of_cbvs(synonym_map: SynonymMapLc | None = None)[source]#

Number of CBVs for all observations.

save(save_dir: str | None = None)[source]#

Save the LightCurveTableList to a directory.

Parameters:

save_dir (str, optional) – The directory where the data should be saved. Defaults to the target_dir.

Notes

The data is saved in the following format:

  • The info DataFrame is saved to a file ‘info.csv’.

  • Each observation is saved to a file ‘observation_name.ecsv’.

  • The observation names are used as file names.

See also

exofop.extract.LightCurveTableList.load

Load the data from a directory.

save_to_pickle(file_name: str | None = None)[source]#

Save the instance to a file using pickle serialization.

standardise_column_names(synonym_map: SynonymMap | dict | None = None, log_level: int | None = 20)[source]#

Standardise column names of all observations in the list.

Parameters:
  • synonym_map (SynonymMap or dict, optional) – A synonym map for the column names (default is None).

  • log_level (int, optional) – The log level for logging missing columns (default is 20, i.e. ‘info’).

property synonym_map: SynonymMapLc#

Synonym map to standardise column names.

property time: List#

List of time arrays of all observations.

update_info_df() → DataFrame[source]#

Update or create a DataFrame with extracted file name components.

This function updates an existing DataFrame or creates a new one by populating it with extracted components from the ‘file_name_components_dict’. The dictionary should contain observation names as keys and corresponding file name components as values.

If ‘info_df’ is not provided, a new DataFrame will be created.

Parameters:
  • info_df (pandas.DataFrame or None, optional) – An existing DataFrame to be updated (default is None).

  • file_name_components_dict (dict or None, optional) – A dictionary containing observation names and corresponding file name components.

Returns:

The updated or newly created DataFrame containing the extracted components.

Return type:

pandas.DataFrame

Examples

>>> file_name_components_dict = {
...     'obs1': parse_and_format_exofop_file_name(
...         'TIC254113311.01_20200805_ASTEP-ANTARCTICA_Rc_measurements.csv'),
...     'obs2': parse_and_format_exofop_file_name(
...         'TIC254113311.02_20200901_ASTEP-ANTARCTICA_Rc_measurements.csv'),
... }
>>> updated_df = update_info_df_by_file_name_components(
...     file_name_components_dict=file_name_components_dict
... )
>>> updated_df
            Date      BJD       Observatory  ... pp Measurement file name    full_file_name
obs1  2020-08-05  2459066  ASTEP-ANTARCTICA  ...  1      measurements.csv  TIC254113311....
obs2  2020-09-01  2459093  ASTEP-ANTARCTICA  ...  2      measurements.csv  TIC254113311....

class exofop.extract.SynonymMap[source]#

Bases: defaultdict

A dictionary-like class that allows the association of synonyms with a primary alias.

This class plays a pivotal role in achieving consistent column naming across ExoFOP (Exoplanet Follow-up Observing Program) data files. Because a wide range of observatories and processing pipelines contribute to ExoFOP, different column names are often used to denote the same underlying concept. For instance, the column representing time may be labelled ‘BJD TDB’, ‘time’, ‘TIME’, and so on.

This class addresses this complexity by facilitating the organization of mappings between a primary concept alias and its corresponding synonyms. This systematic approach ensures that a consistent and intuitive nomenclature can be achieved, regardless of the original source.

Parameters:

None

Examples

Initialise the synonym map

>>> from exofop.extract import SynonymMap
>>> synonym_map = SynonymMap()
>>> synonym_map['time'] = ('Time', 'TIME', 'BJD TDB', 'another_synonym')
>>> synonym_map.get_rename_dict('time')
{'Time': 'time', 'TIME': 'time', 'BJD TDB': 'time', 'another_synonym': 'time'}

Add a new primary alias and its synonyms

>>> synonym_map['flux'] = ('flux', 'FLUX', 'rel_flux')

Add a new synonym to an existing primary alias

>>> synonym_map.add_synonyms('flux', 'rel_flux_T1')

Rename a primary alias and update its synonyms accordingly

>>> synonym_map.rename_primary_alias('flux', 'FLUX')
>>> synonym_map.keys()
dict_keys(['time', 'FLUX'])

add_synonyms(primary_alias: str, synonym: tuple | str) → None[source]#

Add synonyms to an existing primary alias.

classmethod from_dict(data: dict)[source]#

get_primary_alias(synonym: str) → str | None[source]#

Retrieve the primary alias associated with a synonym, if it is contained.

get_rename_dict(primary_alias) → Dict[str, str][source]#

Retrieve a dictionary for renaming synonyms to their primary alias.

Parameters:

primary_alias (str) – The primary alias for which to retrieve the renaming dictionary

Returns:

Rename dict.

Return type:

dict
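To show how such a renaming dictionary can be used to standardise column names, here is a minimal sketch with a plain dict standing in for the synonym map. The helper `build_rename_dict` and the sample column names are assumptions for illustration only.

```python
def build_rename_dict(synonym_map: dict, primary_alias: str) -> dict:
    """Map every synonym of `primary_alias` to the primary alias itself."""
    return {synonym: primary_alias for synonym in synonym_map[primary_alias]}


# Plain-dict stand-in for a synonym map: primary alias -> synonyms.
synonym_map = {
    "time": ("Time", "TIME", "BJD"),
    "flux": ("FLUX", "rel_flux"),
}

# Merge the per-alias dictionaries into one lookup table.
rename = {}
for primary_alias in synonym_map:
    rename.update(build_rename_dict(synonym_map, primary_alias))

# Columns without a known synonym keep their original name.
columns = ["TIME", "rel_flux", "airmass"]
standardised = [rename.get(name, name) for name in columns]
print(standardised)  # ['time', 'flux', 'airmass']
```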

classmethod load_from_yaml(file_path)[source]#

property primary_alias_to_synonyms: Dict[str, Dict[str, str]]#

A dictionary mapping primary aliases to their synonyms.

rebuild_synonym_mapping()[source]#

Update the synonym_to_primary_mapping dictionary based on the current state of the SynonymMap.

remove_synonym(primary_alias: str, synonym: str) → None[source]#

rename_primary_alias(old_primary_alias, new_primary_alias)[source]#

Rename a primary alias and update its synonyms accordingly.

Parameters:
  • old_primary_alias (str) – The primary alias to be renamed.

  • new_primary_alias (str) – The new primary alias to replace the old one.

Raises:

KeyError – If the old primary alias does not exist in the mapping.
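For a dictionary-based map, renaming a key can be sketched as a pop followed by a re-insert. This is an assumption about the mechanism, not the library's implementation. The helper `rename_key` is hypothetical and does not attempt to reproduce how the real method updates synonyms.

```python
def rename_key(mapping: dict, old: str, new: str) -> None:
    """Move the synonyms stored under `old` to the key `new`, in place."""
    # dict.pop raises KeyError if `old` is absent, matching the documented error.
    mapping[new] = mapping.pop(old)


synonym_map = {
    "time": ("Time", "TIME"),
    "flux": ("flux", "FLUX", "rel_flux"),
}
rename_key(synonym_map, "flux", "FLUX")
print(list(synonym_map))  # ['time', 'FLUX']
```

Since dictionaries preserve insertion order, the renamed key moves to the end, which is consistent with the key order shown in the examples on this page.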

save_to_yaml(file_path)[source]#

update(synonym_map: Dict[str, str | Set[str] | List[str] | Tuple[str, ...]]) → None[source]#

Update the synonym map with a dictionary of primary aliases and their synonyms.

Parameters:

synonym_map (Dict[str, Tuple[str, ...]]) – A dictionary of primary aliases and their synonyms.

class exofop.extract.SynonymMapLc(light_curve_attributes: EssentialLightcurveAttributes | None = None, time: str = 'BJD_TDB', flux: str = 'rel_flux_T1_n', flux_err='rel_flux_err_T1_n')[source]#

Bases: SynonymMap

A subclass of SynonymMap specifically tailored for handling light curve attributes, with the ability to manipulate primary aliases and their synonyms.

Parameters:
  • light_curve_attributes (Optional[EssentialLightcurveAttributes], optional) – An instance of EssentialLightcurveAttributes defining primary aliases for time, flux, and flux error. If None, default aliases will be used.

  • time (str, optional) – Primary alias for time attribute. Default is “BJD_TDB”.

  • flux (str, optional) – Primary alias for flux attribute. Default is “rel_flux_T1_n”.

  • flux_err (str, optional) – Primary alias for flux error attribute. Default is “rel_flux_err_T1_n”.

light_curve_attributes#

The primary aliases for the essential light curve attributes.

Type:

EssentialLightcurveAttributes

Example

>>> from exofop.extract import SynonymMapLc
>>> synonym_map_lc = SynonymMapLc(time="BJD_TDB", flux="flux", flux_err="flux_err")
>>> synonym_map_lc["BJD_TDB"] = ["time", "BJD", "BJD_TDB_MOBS", "#BJD_TDB"]
>>> synonym_map_lc["flux"] = ["flux", "FLUX", "rel_flux", "rel_flux_T1"]
>>> synonym_map_lc["flux_err"] = ["flux_err", "ERRFLUX", "rel_flux_err_T1"]
>>> synonym_map_lc["cbv_0"] = ["cbv_0_synonym_0", "cbv_0_synonym_1"]

You can access the primary aliases and their synonyms as attributes

>>> synonym_map_lc.light_curve_attributes
EssentialLightcurveAttributes(time='BJD_TDB', flux='flux', flux_err='flux_err')

For convenience, you can access the synonyms of the light curve attributes as properties

>>> synonym_map_lc.time
('time', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB')
>>> synonym_map_lc.rename_primary_alias("BJD_TDB", "time")
>>> synonym_map_lc.keys()
dict_keys(['flux', 'flux_err', 'cbv_0', 'time'])

property cbv_names: List[str]#

List of primary aliases for the cotrending basis vectors (CBVs).

copy() → a shallow copy of D.[source]#

deepcopy() → SynonymMapLc[source]#

property flux: Tuple[str, ...]#

Tuple of synonyms for the flux attribute.

property flux_err: Tuple[str, ...]#

Tuple of synonyms for the flux error attribute.

classmethod load_from_config(reset=False) → SynonymMapLc[source]#

Load the synonym map from the local config directory.

Parameters:

reset (bool, optional) – If True, the synonym map is reset to the default configuration. If False, the synonym map is loaded from the local config directory. Default is False.

Returns:

The synonym map instance.

Return type:

SynonymMapLc

Examples

Load the user-modified default synonym map from the local config directory

>>> synonym_map_lc = SynonymMapLc.load_from_config()

Load the default synonym map from the local config directory as shipped with the package

>>> synonym_map_lc = SynonymMapLc.load_from_config(reset=True)

classmethod load_from_yaml(file_path) → SynonymMapLc[source]#

Load the synonym map from a YAML file.

Parameters:

file_path (str) – The path to the YAML file.

rename_primary_alias(old_primary_alias, new_primary_alias)[source]#

Rename a primary alias and update its synonyms accordingly.

Parameters:
  • old_primary_alias (str) – The primary alias to be renamed.

  • new_primary_alias (str) – The new primary alias to replace the old one.

Raises:

KeyError – If the old primary alias does not exist in the mapping.

save_to_config()[source]#

Save the synonym map to the local config directory.

The synonym map is saved to the local config directory as ‘synonym_map_lc_local.yaml’.

save_to_yaml(file_path)[source]#

property time: Tuple[str, ...]#

Tuple of synonyms for the time attribute.