Using MonoTools.lightcurve

The fit and search modules take in, as default, lightcurve.lc objects. Here we’ll give a very quick overview of how to use them.

The simplest way to use a lightcurve object, is simply to load externally your data, and make a lightcurve object from them, by passing time, flux and flux_err arrays.

First let’s import the module and other useful stuff:

[1]:

%load_ext autoreload
%autoreload 2
from MonoTools import lightcurve as lc
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

/Users/hosborn/miniconda3/envs/newxo/lib/python3.9/site-packages/theano/configparser.py:255: UserWarning: Theano does not recognise this flag: gcc.cxxflags
  warnings.warn(f"Theano does not recognise this flag: {key}")
Matplotlib created a temporary config/cache directory at /var/folders/p0/tmr0j01x4jb3qrbc5b0gnxcw0000gn/T/matplotlib-hmgdrkv4 because the default path (/Users/hosborn/.matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.

Loading a TESS, K2 or Kepler lightcurve

The easiest way to use MonoTools.lightcurve is to not bother with the load_lc function defined below, but instead to jump straight to extracting a multilc class from a search of Kepler, K2 and TESS data.

This is possible simply by passing an id and mission argument to the multilc class.

It will search the given mission, but if do_search is True, it will also search other missions for corresponding data using the multilc.get_all_lightcurves() function. This function is assisted by including a radec in astropy.coordinates.SkyCoord format as an argument when initialising the multilc class (as otherwise, we have to access the input catalogue and then use the ra & dec to search for extra data).

Set load=False if you do not want to load from file.

TESS Example: Nu2 Lupi

[12]:

tesslc=lc.multilc(136916387, 'tess')

Getting all IDs
Accessing online catalogues to match ID to RA/Dec (may be slow) mission= tess
          ID         ra        dec     pmRA    pmDEC    Tmag objType  typeSrc  \
0  136916387  230.45063 -48.317628 -1624.05 -276.024  5.0494    STAR  tmgaia2

    version    HIP  ... splists      e_RA     e_Dec     RA_orig   Dec_orig  \
0  20190415  75181  ...     NaN  2.864235  2.064396  230.440115 -48.318817

  e_RA_orig e_Dec_orig raddflag wdflag  dstArcSec
0  0.068131   0.057039        1      0        0.0

[1 rows x 125 columns]
Empty TableList
Sector 48 not (yet) found on MAST | RESPONCE:404
Sector 49 not (yet) found on MAST | RESPONCE:404

If other missions have been searched, the info should end up in multilc.all_ids.

You can see that the TESS input catalog has been searched and stored at tesslc.all_ids['tess']['data'], which is useful.

[13]:

#In this case there are no extra lightcurves from e.g. Kepler, K2 or CoRoT:
tesslc.all_ids

[13]:

{'tess': {'id': 136916387,
  'data': ID            136916387
  ra            230.45063
  dec          -48.317628
  pmRA           -1624.05
  pmDEC          -276.024
                  ...
  e_RA_orig      0.068131
  e_Dec_orig     0.057039
  raddflag              1
  wdflag                0
  dstArcSec           0.0
  Name: 0, Length: 125, dtype: object,
  'search': [12, 38]},
 'k2': {},
 'kepler': {},
 'corot': {}}

Plotting

Now you have a lightcurve, then you can plot using lc.plot(). For high-cadence data (<30 minutes), it will perform binning and plot these are stronger points.

Useful arguments include: - plot_rows How many rows would you like to split the data over? - timeseries: If you want to plot more than just flux, then include those timeseries here (e.g. bg_flux, raw_flux, flux_flat etc). These other timeseries can be offset using the y_offset argument - ylim and xlim: If you don’t want to plot the whole x or y span, then cut them here. - plot_ephem: You can easily mark where the transits of planets are by including a dictionary in the form plot_ephem={'planet_name':{'p':10.0,'t0':2300.0,'tdur':0.4}}. Transits are then marked with triangles

[14]:

tesslc.plot(plot_rows=2,plot_ephem={'b':{'p':11.57797,'t0':1944.3726,'tdur':3.9/24},
                                    'c':{'p':27.59221,'t0':1954.4099,'tdur':3.2/24}})

[16]:

tesslc.flatten(knot_dist=0.6,ephems={'b':{'p':11.57797,'t0':1944.3726,'tdur':3.9/24},
                                    'c':{'p':27.59221,'t0':1954.4099,'tdur':3.2/24}})
tesslc.plot(timeseries=['flux_flat'])

TESS + K2 Example - K2-167

As the lightcurve module cross-checked with other catalogues, it needs and RA & Dec, so unless you specify an RA & Dec along with the initialisation, it goes searching for one from the input catalogue (which can be slow).

[63]:

from astropy.coordinates import SkyCoord,FK5
from astropy import units as u

k2lc=lc.multilc(248777106,'k2',radec=SkyCoord("10h18m41.06s +10d07m43.85s",unit=(u.hourangle,u.deg)))

Getting all IDs
DEBUG [urllib3.connectionpool._new_conn()]: Starting new HTTPS connection (1): exofop.ipac.caltech.edu:443
DEBUG [urllib3.connectionpool._make_request()]: https://exofop.ipac.caltech.edu:443 "GET /k2/download_target.php?id=248777106 HTTP/1.1" 200 6893
DEBUG [urllib3.connectionpool._new_conn()]: Starting new HTTPS connection (1): mast.stsci.edu:443
DEBUG [urllib3.connectionpool._make_request()]: https://mast.stsci.edu:443 "GET /api/v0/invoke?request=%7B%22service%22%3A%20%22Mast.Name.Lookup%22%2C%20%22params%22%3A%20%7B%22input%22%3A%20%22EPIC%20248777106%22%2C%20%22format%22%3A%20%22json%22%7D%7D HTTP/1.1" 200 None
DEBUG [urllib3.connectionpool._make_request()]: https://mast.stsci.edu:443 "POST /api/v0/invoke HTTP/1.1" 200 None
DEBUG [urllib3.connectionpool._make_request()]: https://mast.stsci.edu:443 "POST /api/v0/invoke HTTP/1.1" 200 None
Sector 48 not (yet) found on MAST | RESPONCE:404
Sector 49 not (yet) found on MAST | RESPONCE:404
INFO  [everest.user.DownloadFile()]: Found cached file.
INFO  [everest.user.load_fits()]: Loading FITS file for 248777106.

We can see which sectors/campaigns (and with which detrending) are loaded using the cadence_list:

[62]:

k2lc.cadence_list

[62]:

['ts_120_pdc_45', 'ts_120_pdc_46', 'k2_1800_ev_14']

[61]:

k2lc.plot()

TESS + Kepler Example - Kepler-25

Once again, we need the KIC (4349452 in this case).

Accessing Kepler data can take a bit more time (as there’s 4-years worth!).

[76]:

keplc=lc.multilc(4349452,'kepler')

Getting all IDs
DEBUG [urllib3.connectionpool._new_conn()]: Starting new HTTP connection (1): vizier.u-strasbg.fr:80
DEBUG [urllib3.connectionpool._make_request()]: http://vizier.u-strasbg.fr:80 "POST /viz-bin/votable HTTP/1.1" 200 660
DEBUG [urllib3.connectionpool._new_conn()]: Starting new HTTP connection (1): vizier.u-strasbg.fr:80
DEBUG [urllib3.connectionpool._make_request()]: http://vizier.u-strasbg.fr:80 "POST /viz-bin/votable HTTP/1.1" 200 689

[77]:

keplc.cadence_list

[77]:

['ts_120_pdc_14',
 'ts_120_pdc_40',
 'ts_120_pdc_41',
 'k1_1800_pdc_0',
 'k1_1800_pdc_1',
 'k1_59_pdc_2',
 'k1_1800_pdc_3',
 'k1_1800_pdc_4',
 'k1_59_pdc_5',
 'k1_59_pdc_6',
 'k1_59_pdc_7',
 'k1_59_pdc_8',
 'k1_59_pdc_9',
 'k1_59_pdc_10',
 'k1_59_pdc_11',
 'k1_59_pdc_12',
 'k1_59_pdc_13',
 'k1_59_pdc_14',
 'k1_59_pdc_15',
 'k1_59_pdc_16',
 'k1_59_pdc_17']

[90]:

5703.42004-7000

[90]:

-1296.57996

Given all this data, plotting will also take some time…

[91]:

keplc.plot(plot_ephem={'b':{'p':6.2385,'t0':5703.42004-7000,'tdur':0.1},
                            'c':{'p':12.7204,'t0':5711.15013-7000,'dur':0.15}},
           cadences=['k1_1800_pdc_0', 'k1_1800_pdc_1', 'k1_59_pdc_2', 'k1_1800_pdc_3', 'k1_1800_pdc_4', 'k1_59_pdc_5'])

[92]:

keplc.plot(plot_ephem={'b':{'p':6.2385,'t0':5703.42004-7000,'tdur':0.1},
                       'c':{'p':12.7204,'t0':5711.15013-7000,'dur':0.15}},
           cadences=['k1_1800_pdc_6', 'k1_1800_pdc_7', 'k1_59_pdc_8', 'k1_1800_pdc_9', 'k1_1800_pdc_10', 'k1_59_pdc_11'])

[93]:

keplc.plot(plot_ephem={'b':{'p':6.2385,'t0':5703.42004-7000,'tdur':0.1},
                       'c':{'p':12.7204,'t0':5711.15013-7000,'dur':0.15}},
           cadences=['k1_1800_pdc_12', 'k1_1800_pdc_13', 'k1_59_pdc_14', 'k1_1800_pdc_15', 'k1_1800_pdc_16', 'k1_59_pdc_17'],)

[94]:

keplc.plot(plot_ephem={'b':{'p':6.2385,'t0':5703.42004-7000,'tdur':0.1},
                       'c':{'p':12.7204,'t0':5711.15013-7000,'dur':0.15}},
           cadences=['ts_120_pdc_14', 'ts_120_pdc_40', 'ts_120_pdc_41'])

Creating an interactive plot:

We can also create an interactivet plot using the command interactive_plot. This creates a bokeh stand-alone html page:

[20]:

tesslc.interactive_plot(timeseries=['flux','flux_flat'],yoffset=0.8, plot_height=900,overwrite=True,ylim=(-1,2),
                        plot_ephem={'b':{'p':11.57797,'t0':1944.3726,'tdur':3.9/24},
                                    'c':{'p':27.59221,'t0':1954.4099,'tdur':3.2/24}})

binning: ['flux', 'flux_flat']
[nan, nan, nan, 0.0062, 0.0914699, nan, nan, 0.021, 0.03, 0.226, 0.0015675, 110.683, 0.0756251, 0.1, 0.0451056, 0.127035, 0.189496, 0.01985]

Initialising a simple lightcurve from pre-loaded data

The above lightcurve methods create a MonoTools.lightcurve.multilc method. But this inherits the base MonoTools.lightcurve.lc class and is built from combinations of individual lc objects.

The base class MonoTools.lightcurve.lc can also be simply loaded using time/flux/flux_err arrays.

Here’s some important info for calling lc.load_lc().

You must specify the “flux system” underlying the lightcurve flux - ppm: normalised lightcurve with median at 0.0 with units in parts per million - ppt: normalised lightcurve with median at 0.0 with units in parts per thousand (this is the default) - norm0: normalised lightcurve with median at 0.0 with units as a ratio (0->1) - norm1: normalised lightcurve with median at 1.0 with units as a ratio (0->1) - elec: un-normalised lightcurve with units of pure electrons where the median is proportional to stellar magnitude

The flux system can be changed using lc.change_flux_system(new_flx_system) (and is automatically made uniform when stacking lc objects).

As with flx_system, we need to know the base system used for the lightcurve time. The default is the TESS system using jd_base=2457000

The arguments identify the source of the lightcurve, and are necessary if you plan to stack multiple lightcurves together. - mission: the photometric mission providing the lightcurve in lowercase (e.g. k2, tess, kepler, etc) - sect: the sector, campaign or quarter within that mission the data comes from - src: a identifier for the source of the lightcurve detrending. e.g. pdc (most kepler quarters), vand, ev or pdc (for k2 campaigns), pdc, qlp or el (for tess sectors)

Here is a simple example of loading data:

[94]:

fakeflux=np.random.normal(7*np.sin(np.linspace(-4,5,500)),5,500)
fakeflux[np.random.choice(500,5)]*=5

simplelc1 = lc.lc()
simplelc1.load_lc(time=np.arange(2000,2050,0.1),
                  fluxes=fakeflux,
                  flux_errs=np.random.normal(0.01,0.0005,500),
                  flx_system='ppt',jd_base=2457000,
                  mission='tess',sect='99',src='test')
simplelc1.plot()

If we have multiple types of flux (for example, ‘raw’ and ‘detrended’ fluxes, background flux, etc) then we can make our input flux and flux_err arguments dictionaries:

[79]:

fakeflux2=np.random.normal(7*np.sin(np.linspace(5,20,500)),5,500)
fakeflux2[np.random.choice(500,5)]*=5

simplelc2 = lc.lc()
simplelc2.load_lc(time=np.arange(2050,2100,0.1),
                  fluxes={'flux':fakeflux2,'bg_flux':np.random.normal(40,2.5,500)},
                  flux_errs={'flux_err':np.random.normal(0.01,0.0005,500),'bg_flux_err':np.random.normal(2.5,0.01,500)},
                  flx_system='ppt',jd_base=2457000,
                  mission='tess',sect='100',src='test')

#Plotting both of our flux arrays:
simplelc2.plot(timeseries=['flux','bg_flux'],ylim=(-20,60))

One thing to note is the cadence array that is created for these times. This is effectively the unique identifier for the data when stacking lightcurves with the format:

[telescope ID k1/k2/co/te/ch]_[cadence in secs]_[pipeline source]_[Sector/Q/camp]

[8]:

print(simplelc1.cadence[0])

[8]:

'ts_8600_test_99'

Masking, binning & flattening

The lightcurve.lc class is built to allow quick masking, binning & flattening of the data.

Using `lc.make_mask()`

This function calls the make_fluxmask function to mask anomalous points.

Some useful arguments when making a mask: * flux_arr_name: Which timeseries of mask * cut_all_anom_lim: The limit in sigma above which we cut all anomalies (default is 5) * end_of_orbit: Whether to mask end-of-orbit flux jumps (often seen in TESS). Default is True * use_flat: Whether to use the flattened flux array to perform masking. Default is False * mask_islands: Whether to mask small “islands” of flux (defined as <0.5d long segments >0.5d from other data). Default is False * input_mask & in_transit: Two already-determined masks which allow inclusion of custom masks and make sure in-transit points are not cut. * extreme_anom_limit: An extreme flux limit above/below which we cut points (default:0.25 meaning flux must be >25% median flux & <400%)

In the case of a multilc (i.e. a lightcurve stacked from multiple sectors/sources), make_cadmask is also called, which masks cadences in the lc.mask_cadences list. This is useful when there are multiple lightcurve sources for the same data (i.e. qlp & pdc), but we only want to consider one as part of the defauly lightcurve as it is higher-quality.

(Note that this is performed by default when accessing a mission lightcurve from lc.multilc)

[37]:

simplelc1.make_mask()

plt.plot(simplelc1.time,simplelc1.flux,'.')
plt.plot(simplelc1.time[~simplelc1.mask],simplelc1.flux[~simplelc1.mask],'xr')

[37]:

[<matplotlib.lines.Line2D at 0x7fd97dd45e20>]

Using `lc.flatten()`

This is the in-built flattening function. It has two ways of peforming the flattening which can be selected with the flattype argument: - bspline (default): fits smooth splines while iterating away anomalies/transits/etc - polystep: uses out-of-box polynomial fits to smooth data without influencing transit depth

To specify which timeseries to flatten, use the timeseries (default is only the flux array)

To tweak the flattening function, the knot_dist,maxiter,sigmaclip,stepsize, reflect, & polydegree arguments can be used

transit_mask is a key input to the flattten function if transits have been identifed in the timeseries, as this masks the transits from being removed.

lc.flatten() will add a new timeseries to the lc class with the suffix _flat, and in both cases, the best-fit splines are saved to _spline

[48]:

simplelc1.flatten()
simplelc1.plot(timeseries=['flux','flux_spline','flux_flat'],yoffset=12.5,ylim=(-25,50))

Using `lc.bin()`

This is the in-built binning function.

Key arguments: * timeseries: As with plot() and flatten(), you can include the timeseries to be binned * binsize, this determines the size of the bines and defaults to 30mins (e.g. 1/48) * use_masked and extramask affect the masks used when binning.

[61]:

simplelc1.bin(binsize=1.5)

[63]:

#simplelc1.plot() won't show the bins, as the cadence is too large, so let's go back to a standard plotting:

plt.plot(simplelc1.time[simplelc1.mask],simplelc1.flux[simplelc1.mask],'.',markersize=2,alpha=0.5)
plt.errorbar(simplelc1.bin_time,simplelc1.bin_flux,yerr=simplelc1.bin_flux_err,fmt='o')

[63]:

<ErrorbarContainer object of 3 artists>

More useful functions

Stacking `lc` objects

Under the hood of the multilc class is the ability to stack lc objects.

This is done by making a new multilc object (which can be empty) and calling multilc.stack([newlcs]) with a list of lc objects as arguments. For example:

[80]:

testlc = lc.multilc(123456789,'tess',do_search=False) #we need do_search =False here, as we don't actually want to search for other lightcurves for the star TIC123456789
testlc.stack([simplelc1,simplelc2])
testlc.plot(savepng=False)

Saving

Lightcurves are saved, by default, in the same ID-specific folder that other MonoTools modules access, and which can be modified using the $MONOTOOLSPATH system environment variable but is by default in the MonoTools/data path. So, in this case, it would be in the MonoTools/data/TIC00123456789/ folder.

There are two ways to save the lightcurves found by lc and multilc:

This retainins most metadata, although we delete binned and flattened arrays (which are re-derivable from the other timeseries).

[ ]:

tools.MonoData_savepath

[87]:

testlc.save()

This does not retain metadata and only stores those timeseries with the same length as the time array (i.e. time, fluxes, flux_errs, cadence, mask, etc):

[88]:

testlc.save_csv()

[93]:

#Let's see what got stored:
import pandas as pd
import os
from MonoTools.MonoTools import tools

df=pd.read_csv(os.path.join(tools.MonoData_savepath,"TIC00123456789","TIC00123456789_lc.csv"),index_col=0)
df

[93]:

	bg_flux	bg_flux_err	cadence	flux	flux_err	flux_mask	flux_spline	mask	time
0	NaN	NaN	ts_8600_test_99	11.022208	0.010108	True	7.669597	True	2459000.0
1	NaN	NaN	ts_8600_test_99	0.809182	0.009523	True	7.631497	True	2459000.1
2	NaN	NaN	ts_8600_test_99	12.440341	0.010081	True	7.514247	True	2459000.2
3	NaN	NaN	ts_8600_test_99	3.618393	0.008787	True	7.329670	True	2459000.3
4	NaN	NaN	ts_8600_test_99	10.173530	0.010226	True	7.089586	True	2459000.4
...	...	...	...	...	...	...	...	...	...
995	42.598741	2.515070	ts_8600_test_100	0.064905	0.009867	True	NaN	True	2459099.5
996	36.871784	2.505943	ts_8600_test_100	6.688908	0.009028	True	NaN	True	2459099.6
997	43.892122	2.487309	ts_8600_test_100	6.159992	0.010139	True	NaN	True	2459099.7
998	39.806309	2.503166	ts_8600_test_100	2.217753	0.010212	True	NaN	True	2459099.8
999	41.508409	2.518718	ts_8600_test_100	9.515834	0.009651	True	NaN	True	2459099.9

1000 rows × 9 columns

Loading

If you have already called multilc with a given id/mission pair, then the data is usually automatically stored in the MonoTools datapath. The next time you call multilc, it will check this path for existing data and load it using multilc.load_pickle(). This can occasionally cause bugs, so often it’s good to check if load=False fixes problems with initialising multilc objects.