exofop.extract
This sub-package provides functionality for extracting data downloaded from the ExoFOP website into a uniform format facilitating further analysis.
Outline
exofop.extract.LightCurveTableList
class: A list of LightCurveTable instances with additional functionalities.
exofop.extract.LightCurveTable
class: A class representing a light curve table extracted from ExoFOP.
exofop.extract.SynonymMap
class: A class for handling the standardisation of column names.
exofop.extract.SynonymMapLc
class: A class for handling the standardisation of column names, with additional functionalities for the essential light curve attributes ‘time’, ‘flux’, and ‘flux_err’.
exofop.extract.EssentialLightcurveAttributes
class: A class for handling the essential light curve attributes ‘time’, ‘flux’, and ‘flux_err’.
Example
The following is a basic example of how to use the class exofop.extract.LightCurveTableList
to extract data downloaded from ExoFOP:
>>> from exofop.extract import LightCurveTableList
>>> target_dir = "path/to/your/directory/with/downloaded/measurements"
>>> lctl = LightCurveTableList.load_exofop_data(target_dir=target_dir)
>>> lctl.standardise_column_names()
>>> lctl.apply_time_correction_in_case_of_discrepancy()
>>> print(lctl.info_df)
>>> lctl_complete = lctl.complete
>>> lctl_complete.save()
For a more detailed introduction, consult the tutorial Extract and standardise ExoFOP time series observations.
Classes
- class exofop.extract.EssentialLightcurveAttributes(time: str = 'time', flux: str = 'flux', flux_err: str = 'flux_err')[source]#
Bases:
NamedTuple
A class for handling the essential light curve attributes ‘time’, ‘flux’, and ‘flux_err’.
- time#
The primary alias for the time attribute.
- Type:
str
- flux#
The primary alias for the flux attribute.
- Type:
str
- flux_err#
The primary alias for the flux error attribute.
- Type:
str
- flux: str#
Alias for field number 1
- flux_err: str#
Alias for field number 2
- time: str#
Alias for field number 0
- update_primary_alias(old_primary_alias, new_primary_alias)[source]#
Rename a primary alias and update its synonyms accordingly.
- Parameters:
old_primary_alias (str) – The primary alias to be renamed.
new_primary_alias (str) – The new primary alias to replace the old one.
- Raises:
KeyError – If the old primary alias does not exist in the mapping.
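The "Alias for field number" entries above follow from the NamedTuple base class. As a minimal sketch of that behaviour, using a hypothetical stand-in class (`LightcurveAttributes` is not part of exofop) with the same three fields and defaults as in the signature:

```python
from typing import NamedTuple


class LightcurveAttributes(NamedTuple):
    """Hypothetical stand-in with the same fields and defaults as the signature above."""

    time: str = "time"
    flux: str = "flux"
    flux_err: str = "flux_err"


# Override one primary alias and keep the defaults for the others.
attrs = LightcurveAttributes(time="BJD_TDB")

# Fields can be read by name or by position (time is field number 0).
first_field = attrs[0]

# Named tuples are immutable, so changing an alias produces a new instance.
renamed = attrs._replace(flux="rel_flux_T1_n")
```

Whether `update_primary_alias` returns a new instance in this way is not stated in this reference; the sketch only illustrates plain NamedTuple semantics.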
- class exofop.extract.LightCurveTable(*args, name: str | None = None, file_name: str | None = None, **kwargs)[source]#
Bases:
Table
A custom table class for representing light curve data.
This class extends the functionality of astropy.table.Table and adds a property ‘is_complete’ that reports whether the table contains a complete light curve.
- Parameters:
data (array-like, dict, or Table, optional) – The data to be stored in the table. This can be an array-like object, a dictionary, or another Table. If not provided, an empty table will be created.
name (str, optional) – The name of the observation.
file_name (str, optional) – The measurement filename of the observation.
meta (dict, optional) – A dictionary containing meta data for the table.
args –
kwargs –
- Class attributes:
synonym_map (SynonymMapLc) – A map containing synonyms for the standardisation of column names.
time_threshold (float) – Threshold value for considering a time difference as significant (default is 3 days).
simple_synonym_map (SynonymMapLc) – A minimalistic map for the standardisation of column names.
default_synonym_map (SynonymMapLc) – A map containing the default synonyms for the standardisation of column names.
Examples
>>> from exofop.extract import LightCurveTable
>>> lc = LightCurveTable(
...     [[1., 2, 3], [1., 0.9, 1.1], [0.05, 0.1, 0.075], [0.1, 0.1, 0.1], [0.2, 0.3, 0.4]],
...     names=["time", "flux", "flux_err", "sky", "airmass"],
...     file_name='TIC254113311.01_20200805_ASTEP-ANTARCTICA_Rc_compstar-lightcurves.csv'
... )
>>> lc
<LightCurveTable length=3>
  time    flux  flux_err   sky   airmass
float64 float64 float64  float64 float64
------- ------- -------- ------- -------
    1.0     1.0     0.05     0.1     0.2
    2.0     0.9      0.1     0.1     0.3
    3.0     1.1    0.075     0.1     0.4
We can standardise the column names using the synonym map:
>>> lc.standardise_column_names()
>>> lc.is_complete
True
After standardisation, we can access time, flux, and flux_err columns as attributes:
>>> lc.time.value
array([1., 2., 3.])
>>> lc.flux.value
array([1. , 0.9, 1.1])
We can also apply a time correction if there is a discrepancy between the time in the file name and the time array:
>>> lc.apply_time_correction_in_case_of_discrepancy()
We can get the cotrending basis vectors (CBVs), i.e. all primary aliases of lc.synonym_map that do not represent time, flux or flux_err:
>>> lc.cbvs()
<LightCurveTable length=3>
AIRMASS Sky/Pixel_T1
float64   float64
------- ------------
    0.2          0.1
    0.3          0.1
    0.4          0.1
- apply_time_correction_in_case_of_discrepancy(time_threshold: float | None = 3)[source]#
Apply time correction to the time array of a table if needed.
This function checks for a discrepancy between the starting time of the array and the starting time given in the file_name_components, which was obtained by parsing the measurement file name. If the difference is significant, it applies a time correction by subtracting an integer number of days from the time array.
- Parameters:
time_threshold (float, optional) – Threshold value for considering a time difference as significant (default is 3 days).
- Returns:
The table is modified in place.
- Return type:
None
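The correction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `correct_time_offset`, its arguments, and the choice to round the discrepancy to the nearest whole day are all hypothetical.

```python
from typing import List


def correct_time_offset(times: List[float], expected_start: float,
                        time_threshold: float = 3.0) -> List[float]:
    """Shift `times` by a whole number of days if its start deviates from `expected_start`."""
    if not times:
        return times
    difference = times[0] - expected_start
    if abs(difference) <= time_threshold:
        # The start times agree within the threshold: nothing to correct.
        return times
    # Assumption of this sketch: use the nearest integer number of days as the offset.
    offset = round(difference)
    return [t - offset for t in times]


# Time stamps stored as BJD - 2450000, while the file name implies a start near BJD 2459066.
# The difference is negative, so subtracting the offset shifts the array upwards.
reduced = [9066.4, 9066.5, 9066.6]
corrected = correct_time_offset(reduced, expected_start=2459066)
```

With these inputs the offset is -2450000 days, so the corrected array starts close to 2459066.4. Rounding to the nearest day is only one possible choice and can be off by a day when the observation starts late in the night.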
- cbvs(synonym_map: SynonymMapLc | None = None) LightCurveTable [source]#
Get the cotrending basis vectors (CBVs) contained in a standardized table.
This is a convenience function to get the CBVs, i.e. all primary aliases of synonym_map that do not represent time, flux or flux_err.
- Parameters:
synonym_map (SynonymMapLc, optional) – A synonym map for the column names (default is None).
- Returns:
A table containing the CBVs.
- Return type:
LightCurveTable
Notes
This function assumes that the table is already standardised, otherwise it might return an incomplete set of CBVs.
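As a rough sketch of the selection rule described above (every primary alias that is present in the table and is not time, flux or flux_err), using plain lists instead of a table. The function name `select_cbv_columns` is hypothetical; the alias names are taken from default_synonym_map.

```python
from typing import Iterable, List

# Primary aliases of the essential light curve attributes, as in default_synonym_map.
ESSENTIAL = ("BJD_TDB", "rel_flux_T1_n", "rel_flux_err_T1_n")


def select_cbv_columns(column_names: Iterable[str],
                       primary_aliases: Iterable[str],
                       essential: Iterable[str] = ESSENTIAL) -> List[str]:
    """Return the columns that are primary aliases but not essential attributes."""
    known = set(primary_aliases)
    excluded = set(essential)
    return [name for name in column_names if name in known and name not in excluded]


columns = ["BJD_TDB", "rel_flux_T1_n", "rel_flux_err_T1_n", "AIRMASS", "Sky/Pixel_T1", "comment"]
primary_aliases = ["AIRMASS", "BJD_TDB", "FWHM_T1", "Sky/Pixel_T1", "X(IJ)_T1", "Y(IJ)_T1",
                   "rel_flux_T1_n", "rel_flux_err_T1_n"]
cbv_columns = select_cbv_columns(columns, primary_aliases)
```

The unstandardised column "comment" is dropped because it is not a primary alias, which is why the note above asks for a standardised table.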
- check_completeness(synonym_map: SynonymMapLc | None = None, log_level: int | None = 20) bool [source]#
Check if a complete light curve is contained in the table.
- Parameters:
synonym_map (SynonymMapLc, optional) – An object representing the synonym map for the table columns.
log_level (int, optional) – The log level for logging missing columns (default is 20, i.e. ‘info’).
- Returns:
A boolean indicating whether the table is complete and non-degenerate.
- Return type:
bool
- default_synonym_map: SynonymMapLc = {'AIRMASS': ('airmass',), 'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'), 'FWHM_T1': ('fwhm', 'FWHM'), 'Sky/Pixel_T1': ('sky', 'SKY'), 'X(IJ)_T1': ('x_coord',), 'Y(IJ)_T1': ('y_coord',), 'rel_flux_T1_n': ('flux', 'FLUX', 'rel_flux', 'rel_flux_T1', 'flux_star_0'), 'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')}#
- property file_name#
The measurement filename of the observation.
- property flux#
The flux column of the observation.
- property flux_err#
The flux error column of the observation.
- classmethod from_pandas(data_frame: DataFrame, name: str | None = None, file_name: str | None = None, meta: dict | None = None, **kwargs)[source]#
Create a LightCurveTable from a pandas DataFrame.
- Parameters:
data_frame (pandas.DataFrame) – The DataFrame containing the light curve data.
name (str) – The name of the observation.
file_name (str) – The measurement filename of the observation, e.g. ‘TIC254113311.01_20200805_ASTEP-ANTARCTICA_Rc_compstar-lightcurves.csv’, from which the metadata is derived where possible.
**kwargs – Additional keyword arguments to be passed to the astropy.table.Table constructor.
- property is_complete: bool#
A boolean indicating whether the table is complete and non-degenerate.
- light_curve(synonym_map: SynonymMapLc | None = None) LightCurveTable [source]#
Get the light curve contained in a standardised table, i.e. the time, flux and flux_err columns.
- Parameters:
synonym_map (SynonymMapLc, optional) – A synonym map for the column names (default is None).
- Returns:
A table containing the light curve data.
- Return type:
LightCurveTable
Notes
This function assumes that the table is already standardised, otherwise it might return an incomplete light curve.
- property name#
The name of the observation. Synonym of observation_name.
- property observation_name#
The name of the observation. Synonym of name.
- simple_synonym_map: SynonymMapLc = {'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'), 'rel_flux_T1_n': ('flux', 'FLUX', 'rel_flux', 'rel_flux_T1', 'flux_star_0'), 'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')}#
- standardise_column_names(synonym_map: SynonymMap | dict | None = None, inplace: bool = True, log_level: int | None = 20)[source]#
Renames the columns of the table according to the synonym map.
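The renaming step can be pictured as a reverse lookup from each synonym to its primary alias. The sketch below works on plain lists and dictionaries rather than on a table, and the helper names `build_rename_dict` and `standardise` are hypothetical; the synonyms are a subset of those in default_synonym_map.

```python
from typing import Dict, List, Tuple


def build_rename_dict(synonym_map: Dict[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Map every synonym to its primary alias."""
    return {synonym: primary
            for primary, synonyms in synonym_map.items()
            for synonym in synonyms}


def standardise(column_names: List[str],
                synonym_map: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Replace known synonyms by their primary alias and keep other names unchanged."""
    rename = build_rename_dict(synonym_map)
    return [rename.get(name, name) for name in column_names]


synonym_map = {
    "BJD_TDB": ("time", "TIME", "BJD"),
    "rel_flux_T1_n": ("flux", "FLUX", "rel_flux"),
    "rel_flux_err_T1_n": ("flux_err", "ERRFLUX"),
    "AIRMASS": ("airmass",),
}
standardised = standardise(["time", "flux", "flux_err", "airmass", "comment"], synonym_map)
```

Column names without a known synonym, such as "comment" here, pass through unchanged in this sketch.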
- property synonym_map: SynonymMapLc#
Synonym map to standardise column names.
- property time#
The time column of the observation.
- time_threshold: float = 3#
- class exofop.extract.LightCurveTableList(*args, target_dir: str | None = None, **kwargs)[source]#
Bases:
list
A list-like container for LightCurveTable objects with additional functionalities.
- Parameters:
target_dir (str, optional) – The directory containing the downloaded tag directories (default is None).
Examples
>>> tbl_list = LightCurveTableList.load_exofop_data(
...     target_dir='path/to/your/directory/with/downloaded/measurements',
... )
>>> tbl_list.info_df
Notes
This class provides a convenient way to handle multiple light curve observations. It is designed to be used in combination with the LightCurveTable class and provides the following functionalities:
Loads potentially heterogeneous measurement files from the specified directory.
Extracts information from measurement file names.
Creates or loads a summary DataFrame with information on the observations.
Standardizes column and index names of the measurement DataFrames.
Applies time corrections in case of time discrepancies between file_name and data.
Identifies essential columns for light curves and thus checks their completeness.
See also
exofop.extract.LightCurveTable
A custom table class for representing light curve data.
exofop.extract.SynonymMapLc
A map containing synonyms for the standardisation of column names.
- apply_time_correction_in_case_of_discrepancy(time_threshold: float | None = 3)[source]#
Apply time correction to the time array of individual observations in the list if needed.
This function checks for a discrepancy between the starting time of the array and the starting time given in the file_name_components, which was obtained by parsing the measurement file name. If the difference is significant, it applies a time correction by subtracting an integer number of days from the time array.
- Parameters:
time_threshold (float, optional) – Threshold value for considering a time difference as significant (default is 3 days).
- property complete: LightCurveTableList#
Subset of complete observations.
- property default_save_dir: str | None#
Default directory for saving the data, namely target_dir/output.
- property incomplete: LightCurveTableList#
Subset of incomplete observations.
- property info_df: DataFrame#
pandas.DataFrame summarizing information on observations.
- classmethod load(load_dir: str = '.') LightCurveTableList [source]#
Load the LightCurveTableList from a directory.
- Parameters:
load_dir (str, optional) – The directory from which the data should be loaded (default is ‘.’).
- Returns:
The loaded LightCurveTableList.
- Return type:
LightCurveTableList
Notes
You can modify the info.csv file to remove observations from being loaded, e.g. if they are incomplete, corrupted or superfluous.
See also
exofop.extract.LightCurveTableList.save
Save the data to a directory.
- classmethod load_exofop_data(target_dir: str, observation_names: List[str] | None = None, synonym_map_lc: SynonymMapLc = {'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'), 'rel_flux_T1_n': ('flux', 'FLUX', 'rel_flux', 'rel_flux_T1', 'flux_star_0'), 'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')}, allow_multiple_filetypes=True, **kwargs) LightCurveTableList [source]#
Load and standardize ExoFOP data for a given system.
- Parameters:
target_dir (str) – The directory where the ExoFOP data of the given system is stored.
observation_names (List[str], optional) – A list of observation names to consider. If not provided, the function will use all available observations sorted by tag.
time_threshold (float, optional) – Threshold, in days, above which a discrepancy between the time given in the file name and the time array triggers a time correction (default is 3).
- Returns:
A list of LightCurveTables containing the ExoFOP data extracted from target_dir.
- Return type:
LightCurveTableList
Notes
This function loads and standardizes ExoFOP data for specific planets. It performs the following steps:
Loads potentially heterogeneous measurement files from the specified directory.
Extracts information from measurement file names.
Unpacks multiple measurements from the same tag into separate observations.
Examples
>>> target_dir = '/path/to/data'
>>> observation_names = None
>>> synonym_map_lc = SYNONYM_MAP_LC
>>> allow_multiple_filetypes = True
>>> exofop_data = LightCurveTableList.load_exofop_data(
...     target_dir=target_dir,
...     observation_names=observation_names,
...     synonym_map_lc=synonym_map_lc,
...     allow_multiple_filetypes=allow_multiple_filetypes,
... )
>>> exofop_data
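The step "Extracts information from measurement file names" listed in the notes can be illustrated with a small parser. This is a sketch under assumptions: the function `parse_file_name`, the dictionary keys, and the underscore-separated layout are inferred from the example file names in this reference and are not the package's own parser.

```python
from datetime import datetime


def parse_file_name(file_name: str) -> dict:
    """Split an ExoFOP-style file name into hypothetical components.

    Assumes the layout target_date_observatory_filter_rest; names with fewer
    than five underscore-separated parts raise a ValueError.
    """
    target, date_str, observatory, filter_name, rest = file_name.split("_", 4)
    target_id, _, planet = target.partition(".")
    obs_date = datetime.strptime(date_str, "%Y%m%d").date()
    # The Julian date at 0h UT equals the proleptic ordinal plus 1721424.5;
    # only the integer part is kept here.
    julian_day = int(obs_date.toordinal() + 1721424.5)
    return {
        "target": target_id,
        "pp": int(planet) if planet else None,
        "Date": obs_date.isoformat(),
        "BJD": julian_day,
        "Observatory": observatory,
        "Filter": filter_name,
        "Measurement file name": rest,
    }


components = parse_file_name(
    "TIC254113311.01_20200805_ASTEP-ANTARCTICA_Rc_compstar-lightcurves.csv"
)
```

The integer day obtained this way is the kind of reference value against which the start of the time array can be compared when checking for a time discrepancy.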
- classmethod load_from_pickle(file_path) LightCurveTableList [source]#
Load an instance from a saved file.
- missing_cbvs(synonym_map: SynonymMapLc | None = None) dict | None [source]#
Check for missing CBVs in the observations contained in the list.
This is a convenience function to check which CBVs are missing in which observations.
- Parameters:
synonym_map (SynonymMapLc, optional) – A synonym map for the column names (default is None).
- Returns:
A dictionary containing the names of the observations and the missing CBVs. If all observations contain all CBVs, None is returned.
- Return type:
dict or None
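The return convention (a dictionary of missing CBVs per observation, or None when nothing is missing) can be sketched as follows. The function `find_missing_cbvs` and its inputs are hypothetical and operate on plain column-name lists rather than on LightCurveTable objects.

```python
from typing import Dict, List, Optional


def find_missing_cbvs(observations: Dict[str, List[str]],
                      cbv_names: List[str]) -> Optional[Dict[str, List[str]]]:
    """Return {observation name: missing CBVs}, or None if no CBV is missing."""
    missing = {}
    for name, columns in observations.items():
        absent = [cbv for cbv in cbv_names if cbv not in columns]
        if absent:
            missing[name] = absent
    # An empty dictionary means every observation contains all CBVs.
    return missing or None


observations = {
    "obs1": ["BJD_TDB", "rel_flux_T1_n", "AIRMASS", "FWHM_T1"],
    "obs2": ["BJD_TDB", "rel_flux_T1_n", "AIRMASS"],
}
missing = find_missing_cbvs(observations, cbv_names=["AIRMASS", "FWHM_T1"])
```

Here only "obs2" is reported, because it lacks the "FWHM_T1" column.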
- property names: List[str]#
Names of all observations.
- number_of_cbvs(synonym_map: SynonymMapLc | None = None)[source]#
Number of CBVs for all observations.
- save(save_dir: str | None = None)[source]#
Save the LightCurveTableList to a directory.
- Parameters:
save_dir (str, optional) – The directory where the data should be saved. Defaults to the target_dir.
Notes
The data is saved in the following format:
The info DataFrame is saved to a file ‘info.csv’.
Each observation is saved to a file ‘observation_name.ecsv’.
The observation names are used as file names.
See also
exofop.extract.LightCurveTableList.load
Load the data from a directory.
- save_to_pickle(file_name: str | None = None)[source]#
Save the instance to a file using pickle serialization.
- standardise_column_names(synonym_map: SynonymMap | dict | None = None, log_level: int | None = 20)[source]#
Standardise column names of all observations in the list.
- Parameters:
synonym_map (SynonymMap or dict, optional) – A synonym map for the column names (default is None).
log_level (int, optional) – The log level for logging missing columns (default is 20, i.e. ‘info’).
- property synonym_map: SynonymMapLc#
Synonym map to standardise column names.
- property time: List#
List of time arrays of all observations.
- update_info_df() DataFrame [source]#
Update or create a DataFrame with extracted file name components.
This function updates an existing DataFrame or creates a new one by populating it with extracted components from the ‘file_name_components_dict’. The dictionary should contain observation names as keys and corresponding file name components as values.
If ‘info_df’ is not provided, a new DataFrame will be created.
- Parameters:
info_df (pandas.DataFrame or None, optional) – An existing DataFrame to be updated (default is None).
file_name_components_dict (dict or None, optional) – A dictionary containing observation names and corresponding file name components.
- Returns:
The updated or newly created DataFrame containing the extracted components.
- Return type:
pandas.DataFrame
Examples
>>> file_name_components_dict = {
...     'obs1': parse_and_format_exofop_file_name(
...         'TIC254113311.01_20200805_ASTEP-ANTARCTICA_Rc_measurements.csv'),
...     'obs2': parse_and_format_exofop_file_name(
...         'TIC254113311.02_20200901_ASTEP-ANTARCTICA_Rc_measurements.csv'),
... }
>>> updated_df = update_info_df_by_file_name_components(
...     file_name_components_dict=file_name_components_dict
... )
>>> updated_df
            Date      BJD       Observatory ... pp Measurement file name full_file_name
obs1  2020-08-05  2459066  ASTEP-ANTARCTICA ...  1      measurements.csv TIC254113311....
obs2  2020-09-01  2459093  ASTEP-ANTARCTICA ...  2      measurements.csv TIC254113311....
- class exofop.extract.SynonymMap[source]#
Bases:
defaultdict
A dictionary-like class that allows the association of synonyms with a primary alias.
This class plays a pivotal role in achieving consistent column naming across ExoFOP (Exoplanet Follow-up Observing Program) data files. With a wide range of observatories and processing pipelines contributing to ExoFOP, there are cases where different column names are used to denote the same underlying concept. For instance, the column representing time may be labelled as ‘BJD TDB’, ‘time’, ‘TIME’, and so on.
This class addresses this complexity by facilitating the organization of mappings between a primary concept alias and its corresponding synonyms. This systematic approach ensures that a consistent and intuitive nomenclature can be achieved, regardless of the original source.
- Parameters:
None –
Examples
Initialise the synonym map
>>> synonym_map = SynonymMap()
>>> synonym_map['time'] = ('Time', 'TIME', 'BJD TDB', 'another_synonym')
>>> synonym_map.get_rename_dict('time')
{'Time': 'time', 'TIME': 'time', 'BJD TDB': 'time', 'another_synonym': 'time'}
Add a new primary alias and its synonyms
>>> synonym_map['flux'] = ('flux', 'FLUX', 'rel_flux')
Add a new synonym to an existing primary alias
>>> synonym_map.add_synonyms('flux', 'rel_flux_T1')
Rename a primary alias and update its synonyms accordingly
>>> synonym_map.rename_primary_alias('flux', 'FLUX')
>>> synonym_map.keys()
dict_keys(['time', 'FLUX'])
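To make the idea of a primary alias with attached synonyms concrete, the following is a minimal sketch of a defaultdict-based map. `MiniSynonymMap` is a hypothetical illustration that mirrors two of the method names documented here; it is not the package's implementation and omits the synonym-to-primary bookkeeping.

```python
from collections import defaultdict


class MiniSynonymMap(defaultdict):
    """Hypothetical, simplified synonym map: primary alias -> tuple of synonyms."""

    def __init__(self):
        # Unknown primary aliases start with an empty tuple of synonyms.
        super().__init__(tuple)

    def add_synonyms(self, primary_alias, synonym):
        """Append one synonym (str) or several (tuple) to a primary alias."""
        new = (synonym,) if isinstance(synonym, str) else tuple(synonym)
        self[primary_alias] = tuple(self[primary_alias]) + new

    def get_rename_dict(self, primary_alias):
        """Return {synonym: primary_alias} for all synonyms of the alias."""
        return {synonym: primary_alias for synonym in self[primary_alias]}


mini_map = MiniSynonymMap()
mini_map["flux"] = ("FLUX", "rel_flux")
mini_map.add_synonyms("flux", "rel_flux_T1")
rename_dict = mini_map.get_rename_dict("flux")
```

A dictionary of this form can be passed to any rename routine that accepts an old-name to new-name mapping.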
- add_synonyms(primary_alias: str, synonym: tuple | str) None [source]#
Add synonyms to an existing primary alias.
- get_primary_alias(synonym: str) str | None [source]#
Retrieve the primary alias associated with a synonym, if it is contained.
- get_rename_dict(primary_alias) Dict[str, str] [source]#
Retrieve a dictionary for renaming synonyms to their primary alias.
- Parameters:
primary_alias (str) – The primary alias for which to retrieve the renaming dictionary
- Returns:
A dictionary mapping each synonym to the primary alias.
- Return type:
dict
- property primary_alias_to_synonyms: Dict[str, Dict[str, str]]#
A dictionary mapping primary aliases to their synonyms.
- rebuild_synonym_mapping()[source]#
Update the synonym_to_primary_mapping dictionary based on the current state of the SynonymMap.
- rename_primary_alias(old_primary_alias, new_primary_alias)[source]#
Rename a primary alias and update its synonyms accordingly.
- Parameters:
old_primary_alias (str) – The primary alias to be renamed.
new_primary_alias (str) – The new primary alias to replace the old one.
- Raises:
KeyError – If the old primary alias does not exist in the mapping.
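The renaming and the KeyError behaviour can be sketched on a plain dictionary. This is an illustrative assumption about the semantics: the free function below is hypothetical, and it does not adjust the synonyms themselves, which the real method may do.

```python
def rename_primary_alias(mapping, old_primary_alias, new_primary_alias):
    """Move the synonyms of `old_primary_alias` to `new_primary_alias`."""
    if old_primary_alias not in mapping:
        raise KeyError(f"Unknown primary alias: {old_primary_alias!r}")
    # Popping and re-inserting places the new key at the end of the dictionary.
    mapping[new_primary_alias] = mapping.pop(old_primary_alias)


synonyms = {"time": ("Time", "TIME"), "flux": ("rel_flux",)}
rename_primary_alias(synonyms, "flux", "FLUX")
keys_after = list(synonyms)

# Renaming an alias that no longer exists raises a KeyError.
try:
    rename_primary_alias(synonyms, "flux", "FLUX")
    raised = False
except KeyError:
    raised = True
```

Because the renamed key is re-inserted, it moves to the end of the key order, which is consistent with the key order shown in the examples of this reference.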
- class exofop.extract.SynonymMapLc(light_curve_attributes: EssentialLightcurveAttributes | None = None, time: str = 'BJD_TDB', flux: str = 'rel_flux_T1_n', flux_err='rel_flux_err_T1_n')[source]#
Bases:
SynonymMap
A subclass of SynonymMap specifically tailored for handling light curve attributes, with the ability to manipulate primary aliases and their synonyms.
- Parameters:
light_curve_attributes (Optional[EssentialLightcurveAttributes], optional) – An instance of EssentialLightcurveAttributes defining primary aliases for time, flux, and flux error. If None, default aliases will be used.
time (str, optional) – Primary alias for time attribute. Default is “BJD_TDB”.
flux (str, optional) – Primary alias for flux attribute. Default is “rel_flux_T1_n”.
flux_err (str, optional) – Primary alias for flux error attribute. Default is “rel_flux_err_T1_n”.
- light_curve_attributes#
The primary aliases for the essential light curve attributes.
Example
>>> synonym_map_lc = SynonymMapLc(time="BJD_TDB", flux="flux", flux_err="flux_err")
>>> synonym_map_lc["BJD_TDB"] = ["time", "BJD", "BJD_TDB_MOBS", "#BJD_TDB"]
>>> synonym_map_lc["flux"] = ["flux", "FLUX", "rel_flux", "rel_flux_T1"]
>>> synonym_map_lc["flux_err"] = ["flux_err", "ERRFLUX", "rel_flux_err_T1"]
>>> synonym_map_lc["cbv_0"] = ["cbv_0_synonym_0", "cbv_0_synonym_1"]
You can access the primary aliases and their synonyms as attributes
>>> synonym_map_lc.light_curve_attributes
EssentialLightcurveAttributes(time='BJD_TDB', flux='flux', flux_err='flux_err')
For convenience, you can access the synonyms of the light curve attributes as properties, and rename a primary alias:
>>> synonym_map_lc.time
('time', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB')
>>> synonym_map_lc.rename_primary_alias("BJD_TDB", "time")
>>> synonym_map_lc.keys()
dict_keys(['flux', 'flux_err', 'cbv_0', 'time'])
- property cbv_names: List[str]#
List of primary aliases for the cotrending basis vectors (CBVs).
- deepcopy() SynonymMapLc [source]#
- property flux: Tuple[str, ...]#
Tuple of synonyms for the flux attribute.
- property flux_err: Tuple[str, ...]#
Tuple of synonyms for the flux error attribute.
- classmethod load_from_config(reset=False) SynonymMapLc [source]#
Load the synonym map from the local config directory.
- Parameters:
reset (bool, optional) – If True, the synonym map is reset to the default configuration. If False, the synonym map is loaded from the local config directory. Default is False.
- Returns:
The synonym map instance.
- Return type:
SynonymMapLc
Examples
Load the user-modified default synonym map from the local config directory
>>> synonym_map_lc = SynonymMapLc.load_from_config()
Load the default synonym map from the local config directory as shipped with the package
>>> synonym_map_lc = SynonymMapLc.load_from_config(reset=True)
- classmethod load_from_yaml(file_path) SynonymMapLc [source]#
Load the synonym map from a YAML file.
- Parameters:
file_path (str) – The path to the YAML file.
- rename_primary_alias(old_primary_alias, new_primary_alias)[source]#
Rename a primary alias and update its synonyms accordingly.
- Parameters:
old_primary_alias (str) – The primary alias to be renamed.
new_primary_alias (str) – The new primary alias to replace the old one.
- Raises:
KeyError – If the old primary alias does not exist in the mapping.
- save_to_config()[source]#
Save the synonym map to the local config directory.
The synonym map is saved to the local config directory as ‘synonym_map_lc_local.yaml’.
- property time: Tuple[str, ...]#
Tuple of synonyms for the time attribute.