Extract and standardise ExoFOP time series observations#

This notebook demonstrates how to conveniently load and standardise time series observations from ExoFOP.

The main challenge here is achieving consistent column naming across ExoFOP data files. With a wide range of observatories and processing pipelines contributing to ExoFOP, different column names are often used to denote the same underlying concept. For instance, the column representing time may be labelled ‘BJD_TDB’, ‘time’, ‘TIME’, and so on. By mapping all of these synonyms to a primary alias, a consistent and intuitive nomenclature can be achieved regardless of the original source. To this end, we use SynonymMap, a dictionary-like class (subclassing collections.defaultdict) that associates synonyms with a primary alias, facilitating the standardisation of column names.

Here is an example that illustrates how this class works.

from exofop.extract import LightCurveTableList, LightCurveTable, SynonymMap, SynonymMapLc

# For nice logs, we reformat the root logger's StreamHandler
from exofop.utils.logger import configure_root_logger
configure_root_logger()
# Initialise a synonym map from a dictionary
synonym_map = SynonymMap()

# Define a primary alias and its synonyms
synonym_map['primary_alias_a'] = ['synonym_a_0', 'synonym_a_1', 'synonym_a_2']

# Alternatively, we can use `update` to add multiple primary aliases at once
synonym_map.update(
    {
        'primary_alias_b': ['synonym_b_0', 'synonym_b_1', 'synonym_b_2'],
        'primary_alias_c': ['synonym_c_0'],
    }
)

# If we realise that we were missing a synonym of `primary_alias_a`, we can add it
synonym_map.add_synonyms('primary_alias_a', 'synonym_a_3')

# And if we want to rename `primary_alias_b` to `new_primary_alias_b_new`, we can do so
synonym_map.rename_primary_alias('primary_alias_b', 'new_primary_alias_b_new')

synonym_map
SynonymMap(tuple,
           {'primary_alias_a': ('synonym_a_0',
             'synonym_a_1',
             'synonym_a_2',
             'synonym_a_3'),
            'primary_alias_c': ('synonym_c_0',),
            'new_primary_alias_b_new': ('synonym_b_0',
             'synonym_b_1',
             'synonym_b_2',
             'primary_alias_b')})
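Conceptually, a structure like this is a defaultdict storing a tuple of synonyms per primary alias, together with a reverse lookup from any synonym to its primary alias. The following minimal sketch illustrates the idea; it is not the actual exofop implementation, and the class name `MiniSynonymMap` is made up for this example:

```python
from collections import defaultdict


class MiniSynonymMap(defaultdict):
    """Simplified illustration of a synonym map: primary alias -> tuple of synonyms."""

    def __init__(self):
        super().__init__(tuple)  # missing keys default to an empty tuple

    def __setitem__(self, primary, synonyms):
        # Always store the synonyms as an immutable tuple
        super().__setitem__(primary, tuple(synonyms))

    def add_synonyms(self, primary, *synonyms):
        super().__setitem__(primary, self[primary] + synonyms)

    def primary_alias_of(self, name):
        # Reverse lookup: map any synonym (or the alias itself) to the primary alias
        for primary, synonyms in self.items():
            if name == primary or name in synonyms:
                return primary
        return None


sm = MiniSynonymMap()
sm['BJD_TDB'] = ['time', 'TIME', 'BJD']
sm.add_synonyms('BJD_TDB', 'BJD_TDB_MOBS')
print(sm.primary_alias_of('TIME'))  # BJD_TDB
```

The reverse lookup is what drives column standardisation: every column name found in a table can be resolved to its primary alias, or left untouched if no alias is known.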

Extract the data#

The extracted data is contained in a list of LightCurveTable objects. This class extends the functionality of its parent class astropy.table.Table, giving access to additional attributes stored in the table's meta attribute and providing various convenience methods.

target_dir = './TOI_1130'  # As generated in the previous tutorial

exofop_data = LightCurveTableList.load_exofop_data(
        target_dir=target_dir,
        observation_names=None,  # defaults to all folders in data_dir
        allow_multiple_filetypes=True,
    )
exofop_data
13:56:12 :: INFO     :: Tags of datasets processed: 5903
LightCurveTableList(['5903'])

All successfully loaded tables can be inspected by index or name using

exofop_data[-1][:5]
LightCurveTable length=5
(Table with 15 float64 columns: BJD_TDB, rel_flux_T1_n, rel_flux_err_T1_n, mag, mag_err, var_imag, comp_imag, x_coord, y_coord, dist_center, fwhm, airmass, sky, rel_flux_mag, rel_flux_err_mag; the first five rows start at BJD_TDB = 2458758.021817.)

All information extracted from the file names is contained in the .meta attribute of each LightCurveTable.

exofop_data[0].meta
{'target': 'TIC254113311',
 'pp': 1,
 'yyyymmdd': '2019-10-01',
 'observatory': 'PEST',
 'filter': 'Rc',
 'filetype': 'data',
 'ext': 'txt',
 'full_file_name': 'TIC254113311-01_20191001_PEST_Rc_data.txt',
 'jd': 2458757.5,
 'is_complete': True,
 'name': '5903'}
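The file-name parsing itself happens inside exofop. Conceptually, a file name following the convention `<target>-<pp>_<yyyymmdd>_<observatory>_<filter>_<filetype>.<ext>` can be decomposed with a regular expression along these lines (a sketch, not the library's actual pattern; `parse_exofop_file_name` is a made-up helper):

```python
import re

# Assumed naming convention: <target>-<pp>_<yyyymmdd>_<observatory>_<filter>_<filetype>.<ext>
FILENAME_PATTERN = re.compile(
    r"(?P<target>[A-Za-z0-9]+)-(?P<pp>\d+)"
    r"_(?P<yyyymmdd>\d{8})"
    r"_(?P<observatory>[^_]+)"
    r"_(?P<filter>[^_]+)"
    r"_(?P<filetype>[^.]+)\.(?P<ext>\w+)$"
)


def parse_exofop_file_name(file_name):
    """Extract metadata fields from an ExoFOP-style measurement file name."""
    match = FILENAME_PATTERN.match(file_name)
    if match is None:
        return None
    meta = match.groupdict()
    d = meta.pop('yyyymmdd')
    meta['yyyymmdd'] = f"{d[:4]}-{d[4:6]}-{d[6:]}"  # reformat as an ISO date
    return meta


meta = parse_exofop_file_name('TIC254113311-01_20191001_PEST_Rc_data.txt')
print(meta['target'], meta['yyyymmdd'], meta['observatory'])  # TIC254113311 2019-10-01 PEST
```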

Standardise essential light curve attributes#

Standardisation of the column names in LightCurveTable objects is achieved using instances of either SynonymMap or SynonymMapLc. SynonymMapLc is a subclass of SynonymMap specifically tailored to light-curve attributes: it provides additional functionality to manipulate primary aliases and their synonyms, with default attributes for time, flux, and flux error. By default, LightCurveTable().synonym_map provides the set of synonyms used for standardisation. However, creating a custom synonym map allows for tailored adjustments to better suit specific needs.

Let’s delve into creating a custom synonym map to demonstrate its utility and ease of refinement.

simple_synonym_map = SynonymMapLc()
simple_synonym_map.update(
    {
        'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'),
        'rel_flux_T1_n': ('flux', 'FLUX', 'rel_flux', 'rel_flux_T1'),
        'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')
    }
)
print(simple_synonym_map.light_curve_attributes)
print(simple_synonym_map)
EssentialLightcurveAttributes(time='BJD_TDB', flux='rel_flux_T1_n', flux_err='rel_flux_err_T1_n')
SynonymMapLc(<class 'tuple'>, {'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'), 'rel_flux_T1_n': ('flux', 'FLUX', 'rel_flux', 'rel_flux_T1'), 'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')})

Next, we utilize simple_synonym_map to search for secondary aliases within the column names of each table. This search is performed for each primary alias that is not already present in the table.

Following this, we then check for any discrepancies between the time given in the table and the time stamp specified in the measurement file name. This verification step is crucial because CSV files may sometimes express time relative to a reference point, such as 2 450 000 BJD. If a discrepancy is detected, a reference time is inferred based on the disparity between the timestamp of the file name and the time column. This reference time is then rounded and added to the time column to ensure consistency. A warning log informs the user whenever a time correction is applied.
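The correction described above amounts to simple arithmetic: if the time column and the file-name timestamp differ by a large offset, that offset is rounded to a convenient reference value and added back to the time column. A hedged sketch of that logic follows; `infer_time_offset` and its rounding granularity are illustrative choices, not exofop's internal implementation:

```python
import numpy as np


def infer_time_offset(time_column, file_name_jd, round_to=100.0):
    """Infer a reference-time offset, rounded to a multiple of `round_to` days."""
    discrepancy = file_name_jd - np.median(time_column)
    if abs(discrepancy) < 1.0:
        return 0.0  # times already agree, nothing to do
    return round(discrepancy / round_to) * round_to


# Example: times given relative to BJD 2,450,000, file name implies JD 2458757.5
times = np.array([8758.0218, 8758.0226, 8758.0235])
offset = infer_time_offset(times, file_name_jd=2458757.5)
corrected = times + offset
print(offset)  # 2450000.0
```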

exofop_data.standardise_column_names(synonym_map=simple_synonym_map)
exofop_data.apply_time_correction_in_case_of_discrepancy()
13:56:12 :: INFO     :: All observations contain a complete light curve.

We can also view an information summary table for all measurements.

exofop_data.info_df
Date BJD Observatory Filter target pp Measurement file name full_file_name texp Duration Efficiency cbvs ncols
5903 2019-10-01 2458757 PEST Rc TIC254113311 1 data.txt TIC254113311-01_20191001_PEST_Rc_data.txt 23.852154 3.968364 75.4 15 15

If matplotlib is installed, you can also visually inspect any complete and standardised light curve:

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

fig, ax = plt.subplots(1, figsize=(6.69, 4.14))
exofop_data[-1].errorbar(ax=ax, ls='', marker='.', markersize=2, alpha=1, color='black')

plt.show()
(Output: error-bar plot of the standardised light curve.)

Standardise naming of cotrending basis vectors (CBVs)#

In this package, we employ the term Cotrending Basis Vectors (CBVs) to denote a collection of vectors commonly utilized for mitigating systematic effects in time-series data. Examples of such CBVs include:

  • AIRMASS: The optical path length through Earth’s atmosphere when observing a celestial object. It accounts for atmospheric refraction and absorption, impacting the observed brightness and position of celestial objects.

  • Sky/Pixel: The background sky brightness associated with a particular set of pixels in an image. This metric helps account for varying levels of sky illumination across the detector array, which can affect the measured signal from celestial objects.

  • X(IJ)/Y(IJ): Spatial coordinates of a pixel within the detector array. These coordinates define the precise location of a pixel in the two-dimensional imaging plane, aiding in accurate positional calibration and image alignment.

  • FWHM: Full Width at Half Maximum is a measurement used to quantify the width of a distribution. In the context of photometry, it quantifies the extent of blurring or spreading of the point spread function (PSF) of a star or celestial object in an image. FWHM provides crucial information about the spatial resolution and image quality, influencing the precision of photometric measurements.

Unlike time, flux and flux_err, which are essential for a complete light curve, CBVs are optional - although highly desirable to decrease the amount of red noise present in the data.
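exofop itself only standardises the data; the actual detrending happens downstream. To illustrate why CBVs are worth collecting, here is a minimal linear-regression detrend against synthetic CBVs, using numpy least squares on made-up data (a sketch, not part of exofop):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
airmass = np.linspace(1.0, 2.5, n)           # synthetic airmass trend over the night
fwhm = 3.0 + 0.5 * rng.standard_normal(n)    # synthetic seeing variations

# Flux with an airmass-correlated systematic trend plus white noise
flux = 1.0 - 0.02 * (airmass - airmass.mean()) + 0.001 * rng.standard_normal(n)

# Design matrix: constant term plus the CBVs
design = np.column_stack([np.ones(n), airmass, fwhm])
coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
systematics = design @ coeffs

detrended = flux - systematics + 1.0  # renormalise around 1
print(f"scatter before: {flux.std():.4f}, after: {detrended.std():.4f}")
```

In real analyses the linear model is often replaced by something more flexible (e.g. a Gaussian process with CBVs as regressors), but the principle is the same: the standardised CBV columns become the regressors that soak up red noise.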

Let’s now standardise their column names.

synonym_map = simple_synonym_map.deepcopy()
synonym_map.update(
    {
        "AIRMASS": ("AIRMASS"),  # ("airmass"),
        "Sky/Pixel_T1": ("sky", "SKY"),
        "X(IJ)_T1": ("x_coord"),
        "Y(IJ)_T1": ("y_coord"),
        "FWHM_T1": ("fwhm", "FWHM"),
    }
)

# To get the names of all primary aliases in the synonym map that are not part of the light curve
print("CBVs:", synonym_map.cbv_names)

exofop_data.standardise_column_names(synonym_map)
13:56:13 :: INFO     :: All observations contain a complete light curve.
CBVs: ['AIRMASS', 'Sky/Pixel_T1', 'X(IJ)_T1', 'Y(IJ)_T1', 'FWHM_T1']

Let’s see if there are datasets that are missing any CBVs.

# We seem to be missing the cotrending basis vector airmass for the first light curve
exofop_data.missing_cbvs(synonym_map=synonym_map)
{'5903': ('PEST', ['AIRMASS'])}

The observations from PEST seem to be missing the AIRMASS column. Let’s see if there is indeed no column corresponding to airmass in this table, or if it uses a synonym that we haven’t come across yet.

import numpy as np

[np.array(exofop_data[name].colnames) for name in exofop_data.missing_cbvs()]
[array(['BJD_TDB', 'rel_flux_T1_n', 'rel_flux_err_T1_n', 'mag', 'mag_err',
        'var_imag', 'comp_imag', 'X(IJ)_T1', 'Y(IJ)_T1', 'dist_center',
        'FWHM_T1', 'airmass', 'Sky/Pixel_T1', 'rel_flux_mag',
        'rel_flux_err_mag'], dtype='<U17')]

So the table does in fact have a column called airmass. Let’s add that synonym and rerun the standardisation.

synonym_map.add_synonyms('AIRMASS', 'airmass')
exofop_data.standardise_column_names(synonym_map)

exofop_data.missing_cbvs(synonym_map=synonym_map)
13:56:13 :: INFO     :: All observations contain a complete light curve.

If no synonym map is provided, LightCurveTable().synonym_map is used. If you find additional synonyms that you want to keep permanently, you can save them using synonym_map.save() or synonym_map.save_to_config().

LightCurveTable().synonym_map
SynonymMapLc(tuple,
             {'AIRMASS': ('airmass',),
              'BJD_TDB': ('time', 'TIME', 'BJD', 'BJD_TDB_MOBS', '#BJD_TDB'),
              'FWHM_T1': ('fwhm', 'FWHM'),
              'Sky/Pixel_T1': ('sky', 'SKY'),
              'X(IJ)_T1': ('x_coord',),
              'Y(IJ)_T1': ('y_coord',),
              'rel_flux_T1_n': ('flux',
               'FLUX',
               'rel_flux',
               'rel_flux_T1',
               'flux_star_0'),
              'rel_flux_err_T1_n': ('flux_err', 'ERRFLUX', 'rel_flux_err_T1')})

Now the data is in a standardised form, ready to be analysed further with the probabilistic time series modelling tool of your choice.

The light_curve and cbvs of each table can be conveniently accessed and converted to pandas or numpy.

exofop_data[0].light_curve().to_pandas()
BJD_TDB rel_flux_T1_n rel_flux_err_T1_n
0 2.458758e+06 1.003414 0.001479
1 2.458758e+06 1.006932 0.001484
2 2.458758e+06 1.000645 0.001475
3 2.458758e+06 1.002582 0.001477
4 2.458758e+06 1.005634 0.001482
... ... ... ...
150 2.458758e+06 1.006932 0.002319
151 2.458758e+06 1.005912 0.002316
152 2.458758e+06 1.004708 0.002313
153 2.458758e+06 1.012512 0.002425
154 2.458758e+06 1.006653 0.002411

155 rows × 3 columns

exofop_data[0].cbvs().to_pandas()
AIRMASS FWHM_T1 Sky/Pixel_T1 X(IJ)_T1 Y(IJ)_T1
0 1.0847 2.9442 472.4629 813.5899 462.2884
1 1.0864 3.1278 444.0818 812.7702 461.7691
2 1.0882 3.3411 450.5804 813.2566 461.4115
3 1.0918 2.9926 425.4003 813.8261 460.7488
4 1.0936 2.8176 443.6741 813.3081 460.5687
... ... ... ... ... ...
150 2.3950 3.6697 537.8053 817.3280 466.8125
151 2.4137 3.4809 537.7516 817.5697 467.7563
152 2.4337 3.6952 514.5619 818.7373 468.5323
153 2.4531 4.0062 529.9705 817.5272 469.4299
154 2.5160 4.0115 545.7678 817.2219 472.1663

155 rows × 5 columns
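For samplers or fitters that expect plain arrays, the pandas views above can be converted to numpy in one step. A small sketch, using a miniature DataFrame shaped like the `light_curve().to_pandas()` output above in place of the real one:

```python
import pandas as pd

# Miniature stand-in for exofop_data[0].light_curve().to_pandas()
lc = pd.DataFrame({
    'BJD_TDB': [2458758.0218, 2458758.0226, 2458758.0235],
    'rel_flux_T1_n': [1.003414, 1.006932, 1.000645],
    'rel_flux_err_T1_n': [0.001479, 0.001484, 0.001475],
})

# One numpy array per column, ready for downstream modelling
time, flux, flux_err = (lc[col].to_numpy() for col in lc.columns)
print(time.shape, flux.dtype)  # (3,) float64
```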

Finally, we can update the overview table and now see that both datasets contain the 5 CBVs we selected above.

exofop_data.update_info_df()
exofop_data.info_df
Date BJD Observatory Filter target pp Measurement file name full_file_name texp Duration Efficiency cbvs ncols
5903 2019-10-01 2458757 PEST Rc TIC254113311 1 data.txt TIC254113311-01_20191001_PEST_Rc_data.txt 23.852154 3.968364 75.4 5 15

Save standardised data#

There are two options to save the standardised data:

  • exofop_data.save(save_dir: str): Saves all tables individually to save_dir with tbl.name as file name. Additionally, the info_df is saved to the same folder. LightCurveTableList.load(cls, load_dir: str) can then be used to reload the data.

  • exofop_data.save_to_pickle(file_path: str): Saves the object to a pickle file, which can be reloaded using LightCurveTableList.load_from_pickle(file_path: str)

exofop_data.save(save_dir=None)
print("Saved to",  exofop_data.default_save_dir)
Saved to ./TOI_1130/output

We can later reload all files again using

LightCurveTableList.load(load_dir=exofop_data.default_save_dir)
LightCurveTableList(['5903'])

Conclusion#

In this notebook, we have shown how to extract ExoFOP light curves from a set of folders and bring the measurement files of each observation into a standardised format.