Using Computational Models to Uncover the Parameters of Three Kepler Binaries: KIC 5957123, KIC 8314879, and KIC 10727668*

Padraic E. Odesse; Catherine Lovekin

doi:10.3847/1538-4357/ac4156

1. Introduction

Eclipsing binary (EB) star systems are ideal candidates for testing theoretical stellar models. Using photometric observations, the parameters of EBs can be measured to a high degree of precision, which enables us to test theories and make predictions with tight constraints and greater certainty (Southworth 2019). Past research into stellar core convection suggests that there must be some overshoot beyond the theoretical convective boundary (Lovekin & Guzik 2017; Pedersen et al. 2018; Johnston 2021), but the behavior of overshoot as a function of stellar characteristics such as mass, age, or metallicity, has not yet been fully characterized. Since overshoot can be studied by examining pulsation modes through asteroseismology, any star for which we can perform a pulsation analysis provides insight into how overshoot varies. Therefore, identifying a binary system in which pulsation modes are present would provide a valuable target for accurately probing stellar core convection.

Recent exoplanet surveys such as Kepler (Borucki et al. 2004) and TESS (Ricker 2015) measure variations in stellar flux with very high precision, and as a result are also used to study variable stars. Kepler data are open-access, and over 2800 binary stars have been identified in the Kepler catalog (Kirk et al. 2016). If we could identify binary stars with pulsating components from the Kepler catalog and model them to obtain parameter constraints and extract light-curve residuals, then we could better probe the behavior of convective core overshoot.

Generally, radial velocity measurements are necessary to fully constrain the parameters of eclipsing binary systems (Southworth 2019). However, radial velocities are derived from stellar spectra, which Kepler does not obtain. Obtaining spectra for Kepler binaries through ground-based observations is difficult; many of these binaries are faint (14th to 16th magnitude) and so require long observation times. For instance, Rappaport et al. (2015) report 15–30 s exposure times at 11 epochs over five months to analyze one Kepler binary. For this research, we did not have access to an adequate spectrograph or telescope to conduct our own observations of these systems. A method of constraining stellar parameters using only time-series photometry would be undoubtedly useful, since the observing time required to follow up all of these systems with ground-based spectroscopic observations is prohibitive. In this paper, we hope to evaluate the feasibility of constraining binary system and stellar parameters without stellar spectra, instead using detailed stellar models and marginalizing over appropriate parameters with numerical methods.

Theoretical binary models are quite adept at identifying the light curve produced by a binary system with a given set of physical characteristics (Prša & Zwitter 2005). The inverse of this problem—identifying the parameter values that best describe a binary system that produced a given light curve—is more difficult to solve, but we can attempt to do so by applying various numerical methods to our theoretical binary models (Conroy et al. 2020). Generally, this involves creating an ensemble of binary models with varying parameter values, generating a light curve for each model, and comparing it to the observed photometric data using a χ² evaluation. We can begin to estimate the physical parameters of this binary by identifying which model parameters yield synthetic observables that best match the photometric data. Markov Chain Monte Carlo (MCMC) analysis is used to infer which distribution of parameter values best describes the observed photometric data.

MCMC methods sample the posterior probability of parameter combinations. That is to say, these methods evaluate the probability that a given parameter combination actually generates the observed data. Performing MCMC ensemble sampling yields an estimate of the posterior density function for the parameter space, which provide regions of confidence around our parameter estimates. Although the posterior distributions do not directly correspond to parameter uncertainties (see Hogg et al. 2010 for a mathematical overview, or Maxted et al. 2020, who acknowledge this issue in the context of EBs), the results of MCMC sampling provide insight into parameter correlations and, if used correctly, are a key step toward identifying expectation values and uncertainties for our parameter estimates.

In this work, we investigate three binary systems from the Kepler Input Catalog: KIC 5957123, KIC 8314879, and KIC 10727668. The Fourier spectra of KIC 8314879 and 10727668 show additional frequencies that suggest these are pulsating stars, as well as EBs. Section 2 describes the data obtained from the Kepler mission, and the steps taken to prepare these data for analysis. Section 3 describes the computational models of each binary, and Section 4 describes the numerical methods used to evaluate our parameter fits. Section 5 presents the resulting parameter estimates with uncertainties, followed by an evaluation of our models and their limitations in Section 6. We provide a brief summary of our conclusions in Section 7.

2. Data

2.1. The Kepler Input Catalog

The three systems discussed in this paper were observed by the Kepler spacecraft with a cadence of 29.4 minutes. Four quarters of data are available for each of the three binaries, culminating in 316.8 days' worth of observations and roughly 13,500 data points from which a light curve was constructed for each system.

2.2. Flux Calibration

To allow for proper modeling with PHOEBE, it was necessary to calibrate the differential photometry of the Kepler data. Preflight estimates suggest that flux from a 12th magnitude solar-type star incident on one of Kepler's CCDs releases a current of 1.74 × 10⁵ electrons per second (Farmer et al. 2013).

The Kepler magnitude system is based on the AB magnitude system (Brown et al. 2011). Equation (4) from Tonry et al. (2012) may be used to compute the magnitude (m_AB) of the Sun through the Kepler passband:

$\begin{eqnarray}&&{m}_{\mathrm{AB}}=-2.5{\mathrm{log}}_{10}\left(\displaystyle \frac{\int {f}_{\nu }{\left(h\nu \right)}^{-1}A(\nu )d\nu }{\int 3631\mathrm{Jy}{\left(h\nu \right)}^{-1}A(\nu )d\nu }\right)\end{eqnarray} \tag{ 1 }$

where A(ν) is the Kepler passband transmission function taken from the Kepler Instrument Handbook supplemental (Van Cleve & Caldwell 2009), and f_ν is the spectral flux density of the Sun, obtained from solar spectral irradiance data collected by the Solar Radiation and Climate Experiment (SORCE; Woods et al. 2020).³

Using Equation (1), the magnitude of the Sun in the Kepler passband was computed to be −26.86. By scaling the flux density by a factor of k ≈ 2.8646 × 10⁻¹⁶, Equation (1) will produce a magnitude of 12. So, to obtain the associated flux of a 12th magnitude solar-type star, the total flux of the scaled spectral flux density through the Kepler passband was computed using Equation (2):

$\begin{eqnarray}&&F=k\int {f}_{\nu }A(\nu )d\nu .\end{eqnarray} \tag{ 2 }$

This computation suggests that a solar-type star with a Kepler magnitude of 12 produces a flux of 1.155 × 10⁻¹³ W m⁻². Finally, using the preflight estimate from Farmer et al. (2013), a conversion factor (Equation (3)) was derived to relate flux in electrons per second to flux in Watts per square meter for stars observed in the Kepler passband:

$\begin{eqnarray}&&1\ {{\rm{e}}}^{-}\,{{\rm{s}}}^{-1}=6.640\times {10}^{-19}\ {\rm{W}}\ {{\rm{m}}}^{-2}.\end{eqnarray} \tag{ 3 }$

In our models, luminosities are computed in absolute units based on the mesh intensities, the distance, and third light of our systems. We can therefore constrain the passband luminosity of our models by sampling over temperature, radius, distance, and gravity brightening and limb darkening coefficients, and generating all observables in the Kepler passband.

In PHOEBE, third light produces an additive effect on the light curve, which manifests as a vertical shift upwards in flux. Upon inspection of the Kepler Full Frame Images, the stars were found to be widely separated from neighboring stars, and there was no indication that any specific background sources needed to be considered. Since we could not obtain any prior information to constrain third light, we chose a value of 0 for the third light in each of our binary systems. If this assumption is incorrect, it would lead to an overestimate of the effective temperature and radius of each star.

2.3. Polynomial Fitting to Detrend Kepler Data

Data from the Kepler spacecraft often exhibit significant systematic errors resulting from temperature gradients across the spacecraft, incident cosmic rays, and the planned quarterly rotation of the instrument (Stumpe et al. 2012). Such errors can obscure the intrinsic variability of stars and introduce artificial trends that complicate any comparisons between the data and synthetic models.

Although presearch data conditioning modules are in place within the Kepler data pipeline to correct for these errors (Stumpe et al. 2012), significant trends and discontinuities persisted in the photometric data acquired from the Barbara A. Mikulski Archive for Space Telescopes (MAST). To reduce the significance of such effects, the data were fitted with a set of fifth-order Legendre polynomials connected at each data gap in the light curve, as described in Section 2.6 of Maxted et al. (2020) and shown in Figure 1 for KIC 10727668. For KIC 5957123 and KIC 8314879, linear regressions better characterized the trends, and were used instead of the Legendre polynomials to treat the data. Once all trends were flattened, the data were scaled to the median of all quarters.

Figure 1. Refer to the following caption and surrounding text. — **Figure 1.** Left: fifth-order Legendre polynomials fitted to the second quarter of KIC 10727668 raw light-curve data. Right: the final processed data obtained by subtracting off these polynomials and averaging the median flux of all quarters. Color alternates to indicate data gaps, where each subset of the data is fitted by a separate polynomial curve (solid line).
Download figure:
Standard image High-resolution image

The flux calibration was applied before adjusting the data to match the median of all quarters. By subtracting the fifth-order Legendre polynomials or linear regressions, we preserve light trends intrinsic to the binary that inform our MCMC sampling, such as ellipsoidal variation, and assume that any additional light trends come from outside the system, namely from the aforementioned errors in the Kepler satellite.

2.4. Interstellar Extinction

Updated foreground-reddening estimates based on work by Schlafly & Finkbeiner (2011) were obtained from the NASA Infrared Science Archive (IRSA)⁴ and used to estimate line-of-sight interstellar extinction values for each of the three binary systems. All three binaries are in a nearby region of our galaxy, so a value of R_V = 3.1 was assumed. Given A_V and R_V, our modeling software (Section 3.1) can process extinction on the fly when computing the forward model.

2.5. The Gaia Mission

Parallax measurements (π'') from the third Gaia data release (Brown et al. 2020) were used to further constrain the binary models. Parallax values for these three star systems are detailed in Table 1. These distance values, alongside the calibrated fluxes described in Section 2.2, constrain the passband luminosity of each binary.

Table 1. Gaia Parallax Measurements

Star	π''	d (pc)
KIC 5957123	0.27 ± 0.02	3700 ± 300
KIC 8314879	0.13 ± 0.02	7700 ± 1400
KIC 10727668	0.18 ± 0.03	5600 ± 990

Download table as: ASCII Typeset image

3. Models

A model was created for each of the three binary systems from which synthetic observables could be computed and compared to the observed photometry. Initial parameter fits were established as described in Section 3.3. Then, MCMC methods were used to explore the parameter space and uncover posterior distributions for each system.

3.1. PHOEBE

The models analyzed in this research were created using the PHysics Of Eclipsing BinariEs (PHOEBE)⁵ stellar modeling package. Originally an adaptation of the 1971 Wilson−Devinney binary modeling code (Wilson & Devinney 1971; Prša & Zwitter 2005), PHOEBE has since been developed into a standalone binary analysis package that supports a number of inverse problem-solving techniques and data analysis tools (Prša et al. 2016; Conroy et al. 2020). This research was performed using PHOEBE version 2.3.41. To aid in reproducibility, we will strive to record the specific settings used for all models and analysis tools throughout this work. For more comprehensive details, a Zenodo repository containing all PHOEBE models generated in this research is available at 10.5281/zenodo.5711245. The data are also available in GitHub.⁶

For each PHOEBE model, synthetic light curves were computed through the Kepler :mean passband, and the passband luminosity mode was set to absolute. Distance estimates from Gaia and extinction estimates from the IRSA were then provided to complete the flux calibration. In most systems, gravity darkening and reflection coefficients were set to 0.9 and 1.0, respectively, as these values are characteristic of stars with radiative atmospheres. However, both of these coefficients were sampled in the later MCMC analysis, as it was not always clear which specific value was acceptable given the temperature of the star. This is especially prevalent when modeling stars in the 6000–8000 K temperature range, for which it is not always obvious if a convective atmosphere or radiative atmosphere is more appropriate.

Synthetic observables were generally computed at 100 time steps over one full orbit. This synthetic light curve was then extrapolated to the time series of the full data set. The eclipses of KIC 10727668 were more densely sampled, so this system's forward model was computed at a total of 119 time steps to better resolve the model during eclipses.

Limb darkening lookup tables were used to interpolate the limb darkening coefficients for most stars with PHOEBE's logarithmic interpolation law. Generally, ck2004 model atmospheres (Castelli & Kurucz 2004) were used when determining the emergent passband intensity of most stars. The only exception was the primary star in the model of KIC 10727668, which appears to be a white dwarf star. For this star, we supplied limb darkening coefficients manually from Claret et al. (2020), and used a blackbody atmosphere model rather than ck2004 model atmospheres, as support for white dwarf atmospheres in PHOEBE is still under development.

For full data set analysis, the noise-nuisance parameter σ_lnf was introduced, which is detailed further in Section 4.2. Most MCMC runs suggested a value for σ_lnf of between −6 and −7. The most reliable method of identifying an appropriate range for σ_lnf seems to be through repeated tests of different values.

3.2. Procedure for Analysis

First, solvers bundled with PHOEBE were used to establish some fundamental parameters of the binaries. The light-curve periodogram was applied to the full data set to estimate the orbital period. Then, the light-curve geometry solver was used to establish the time of superior conjunction (t_0,supconj), the orbital eccentricity (e), and the argument of periastron (ω) of the system. It should be noted that this estimator only works with EBs, so these values were fit manually for the ellipsoidal variable system KIC 5957123.

Any nonzero value for the orbital eccentricity greatly increases computation time, but in many cases its effect on the light-curve geometry could not be ignored. In further analysis, the parameters $e\sin \omega$ and $e\cos \omega$ were sampled over rather than the eccentricity and the argument of periastron, since e and ω are highly correlated, whereas $e\sin \omega$ and $e\cos \omega$ are more orthogonal.

Next, parameter values were manipulated manually to obtain an initial fit, followed by use of PHOEBE's Nelder−Mead optimizer to improve on these manual estimates. This greatly helps reduce the burn-in time of the MCMC sampler, and allowed us to define more reasonable initializing distributions.

Finally, MCMC sampling via the emcee Python package (Foreman-Mackey et al. 2013) was used to obtain estimates of the posterior distributions on all of our parameters. Initially, light-curve data were phase-folded and averaged into 500 bins, and uncertainty on each bin was assigned based on the standard deviation of points in that bin. An initial MCMC run was performed on these binned light-curve data to encourage fast convergence and obtain information on the expected shape of the posterior distribution. Once adequate sampling in the binned data set run was complete, the resulting posterior distributions were used as initializing distributions for MCMC analysis of the full data set, from which more robust posteriors may be derived. This follow-up run was necessary, as analysis of a binned light curve generally does not yield appropriate uncertainty estimates. This issue is discussed in more detail in Sections 5.1 and 6.5.2.

3.3. Initial Parameters

Initial solutions were obtained for each system using PHOEBE solvers, manual fitting, and prepublished estimates (where available). We cannot generalize the procedure used to obtain initial solutions, as each system required unique treatment. Instead, we will detail how we arrived at an initial model for each system. Initial parameters for each model are displayed in Table 2, and the geometry of each system at the time of secondary eclipse is shown in the mesh plots of Figure 2.

Figure 2. Refer to the following caption and surrounding text. — **Figure 2.** Mesh plots illustrating the geometry of each system. All plots are displayed at phase ϕ = 0.5, corresponding to the time of secondary eclipse when the primary star eclipses the secondary. KIC 5957123 does not eclipse; KIC 8314879 eclipses only partially; and the white dwarf primary of KIC 10727668 fully transits its companion.
Download figure:
Standard image High-resolution image

Table 2. Initial Model Parameters

Parameters	Symbol	KIC 5957123	KIC 8314879	KIC 10727668
System Parameters
Period	P	2.82232 days	0.87776 days	2.30592 days
Superior Conjunction	t_0,supconj	1274.621 days	1274.578 days	1275.519 days
Inclination	i	5499	7799	7942
Mass Ratio	q	1.650	0.6916	17.31
Semimajor Axis	a	11.89R_⊙	7.694R_⊙	10.67R_⊙
Temperature Ratio	$\tfrac{{T}_{\mathrm{eff},2}}{{T}_{\mathrm{eff},1}}$	0.9814	0.7662	0.6559
Radius Ratio	$\tfrac{{R}_{2}}{{R}_{1}}$	1.082	0.6838	10.96
Fractional Radii	$\tfrac{{R}_{1}+{R}_{2}}{a}$	0.3837	0.5847	0.2640
Distance	d	3715 pc	7902 pc	5325 pc
Eccentricity	e	0.01037	0.0	0.01038
Arg. of Periastron	ω	2713	00	8886

Stellar Parameters—Primary
Effective Temperature	T_eff,1	8005 K	10,194 K	13,546 K
Equivalent Radius	R_equiv,1	2.190R_⊙	2.672R_⊙	0.2354R_⊙
Mass	M₁	1.067M_⊙	4.689M_⊙	0.1673M_⊙
Gravity Brightening	Gr. Bright.	0.7578	0.9477	0.9463
Reflection Fraction	Refl. Frac.	0.7479	0.9462	0.9006

Stellar Parameters—Secondary
Effective Temperature	T_eff,2	7856 K	7811 K	8885 K
Equivalent Radius	R_equiv,2	2.370R_⊙	1.827R_⊙	2.581R_⊙
Mass	M₂	1.761M_⊙	3.243M_⊙	2.897M_⊙
Gravity Brightening	Gr. Bright.	0.7621	0.7559	0.9509
Reflection Fraction	Refl. Frac.	0.7645	0.8646	0.9037

Download table as: ASCII Typeset image

3.3.1. KIC 5957123

The light-curve geometry of this system presented several difficulties when trying to identify an initial solution. KIC 5957123 appears to be an ellipsoidal variable system with a period P ≈ 2.8223 days. Often, initial solutions identified a small semimajor axis, which caused the model to adopt low masses uncharacteristic of stars at this temperature. A more robust initial solution was eventually obtained by rearranging several constraints to manipulate sums and ratios of parameters, and manually fitting to reasonable values.

Rather than manipulate temperatures and radii directly, the sums and ratios $\tfrac{{T}_{\mathrm{eff},2}}{{T}_{\mathrm{eff},1}}$ , $\tfrac{{R}_{2}}{{R}_{1}}$ , and $\tfrac{{R}_{1}+{R}_{2}}{a}$ , alongside T_eff,1, were sampled over when solving for the initial solution. First, the semimajor axis a was scaled to a value that yielded masses greater than 1M_⊙, as temperature estimates from the KIC suggested T_eff > T_eff,⊙. To manipulate the fractional equivalent radii $\tfrac{{R}_{1}+{R}_{2}}{a}$ to fit the synthetic light-curve amplitude to the photometric data, the passband luminosity mode (pblum_mode) was temporarily switched to data set-scaled. In this mode, PHOEBE automatically adjusts the passband luminosity so that the synthetic light curve is calibrated to the observed light curve. Generally, this behavior is undesirable as it prevents us from properly constraining the luminosity of the system. In this case, however, it allowed us to fit $\tfrac{{R}_{1}+{R}_{2}}{a}$ without requiring additional calibration of the distance or primary effective temperature, both of which scale the light curve without significantly altering its shape. Then, holding $\tfrac{{R}_{1}+{R}_{2}}{a}$ constant, pblum_mode was switched back to absolute whereby the luminosity of the system is fully determined by the intensity of the stellar surfaces and the distance, and the light curve was scaled to match the observed data by manipulating T_eff,1 and a. Finally, values for $\tfrac{{T}_{\mathrm{eff},2}}{{T}_{\mathrm{eff},1}}$ , $\tfrac{{R}_{2}}{{R}_{1}}$ , $e\cos \omega$ , and $e\sin \omega$ were adjusted as appropriate to obtain an adequate initial fit. These values are displayed in Table 2.

3.3.2. KIC 8314879

The light curve for KIC 8314879 has pointed eclipses indicative of a partially eclipsing binary. Some ellipsoidal variation is apparent between eclipses, suggesting that these two stars orbit in close proximity. Initial parameter estimates were quickly derived for this system using PHOEBE's estimators and some manual fitting, followed by optimization with the Nelder−Mead optimizer. These parameter estimates are summarized in Table 2.

3.3.3. KIC 10727668

In an eclipsing binary, the deeper eclipse occurs when the hotter star is obscured. A flat-bottomed eclipse occurs if the smaller of the two stars is completely obscured. The deeper eclipse of KIC 10727668's light curve has a flat eclipse bottom, a combination indicative of a white dwarf component with a F-, A-, or B-type companion (Rappaport et al. 2015). Previous investigations of this system support this assumption (Rappaport et al. 2015; Wang et al. 2018).

Taking the primary star to be the star at superior conjunction near orbital phase ϕ = 0 (Prša 2011), we assigned parameters characteristic of a white dwarf to the primary component. These parameters are presented in Table 2. Support for white dwarf atmospheres in PHOEBE is still under development, so our primary star was computed with a blackbody atmosphere. Limb darkening coefficients were assigned manually for this star; white dwarf limb darkening coefficients were obtained from Claret et al. (2020). Based on the temperature and surface gravity (T_eff,1 = 13,526 K, $\mathrm{log}{g}_{1}=4.95$ ) of the white dwarf in our initial model, we chose bolometric limb darkening coefficients of [0.4885 0.3903].⁷ These are the logarithmic limb darkening coefficients e and f, respectively, for the Kepler passband.

4. Markov Chain Monte Carlo

Generally, both radial velocity data and photometric data are required to constrain the full physical properties of an eclipsing binary, with radial velocities constraining stellar masses, radii, effective temperatures, and the orbital semimajor axis, as detailed in Table 1 of Southworth (2019). As there remain several degeneracies we cannot eliminate without spectroscopic data, any solution we propose is bound to have a high degree of uncertainty. To account for this degeneracy, we include these unconstrained parameters in our MCMC analysis. By drawing upon unconstrained parameters, we obtain a more realistic sampling of the parameter space.

We initialized our MCMC sampler with 64 walkers, a value chosen to simplify parallelization across 64 CPUs. These walkers were initialized at positions drawn from a Gaussian distribution around our initial parameter estimates, the width of which was tuned based on the sensitivity of each system's light curve to different parameter values. To achieve fair and adequate sampling, the ideal number of iterations is over 100 times the autocorrelation time for each parameter. Given the autocorrelation times characteristic of our models, this would mean processing between 10,000 and 100,000 iterations.

Generally, we sampled up to 15 parameters as listed in Table 3. The widths presented for Gaussian distribution types correspond to 1σ widths. Meanwhile, for uniform distributions, the range of values covered by that distribution is displayed.

Table 3. Example Initializing Distribution

Parameter	Width	Distribution
i	20	Gaussian
q	0.1	Gaussian
a	0.1R_⊙	Gaussian
T_eff,1	100 K	Gaussian
$\tfrac{{T}_{2}}{{T}_{1}}$	0.05	Gaussian
$\tfrac{{R}_{2}}{{R}_{1}}$	0.05	Gaussian
$\tfrac{{R}_{1}+{R}_{2}}{a}$	0.025	Gaussian
$e\sin \omega$	0.01	Gaussian
$e\cos \omega$	0.01	Gaussian
d	160 pc	Gaussian
Gr. Bright._1,2	[0.9, 1.0]	Uniform
Refl. Frac._1,2	[0.8, 1.0]	Uniform
σ_lnf	1	Gaussian

Download table as: ASCII Typeset image

In the analysis of KIC 5957123, the time of superior conjunction, t_0,supconj, was also sampled, as the light-curve geometry estimator could not be applied to obtain an adequate fit for t_0,supconj prior to this analysis. Initial estimates for the eccentricity of KIC 8314879 found e = 0.088, which is very close to circular. Thus, $e\cos \omega$ and $e\sin \omega$ were not sampled for this system, since it was found that models with strictly circular orbits (e = 0) could be computed nearly four times as quickly than those with an eccentric orbit. However, more realistic uncertainties may be achieved by sampling $e\cos \omega$ and $e\sin \omega$ .

When sampling a full data set, we included a noise-nuisance parameter, σ_lnf, to account for observational noise and underestimated per-point uncertainties. This parameter is described in more detail in Section 4.2. We chose to sample over the sums and ratios ( $\tfrac{{T}_{\mathrm{eff},2}}{{T}_{\mathrm{eff},2}}$ , $\tfrac{{R}_{2}}{{R}_{1}}$ , and $\tfrac{{R}_{1}+{R}_{2}}{a}$ ), rather than absolute parameter values (T_eff,2, R₁, and R₂) because these sums and ratios are more directly constrained by the photometric data (see Figure 11 of Conroy et al. 2020 for a similar comparison of sampling orthogonal parameters). These parameter combinations produce posteriors that are less correlated and closer to Gaussian distributions, enabling a more robust interpretation of parameter values.

MCMC methods sample the posterior distribution, which describes the probability that a given parameter combination generates our data. This concept follows from Bayesean statistics: Bayes' Theorem is listed in Equation (4), which gives the posterior probability ${ \mathcal P }({\rm{\Theta }})$ of a parameter combination Θ (Speagle 2020):

$\begin{eqnarray}&&{ \mathcal P }({\rm{\Theta }})=\displaystyle \frac{{ \mathcal L }({\rm{\Theta }})\pi ({\rm{\Theta }})}{{ \mathcal Z }},\end{eqnarray} \tag{ 4 }$

where π(Θ) is the prior probability (see Section 4.1) and ${ \mathcal L }({\rm{\Theta }})$ is the likelihood function (see Section 4.2). The evidence, ${ \mathcal Z }$ , is difficult to compute, so we ignore the absolute value of our distributions and perform the simplification shown in Equation (5); (Hogg et al. 2010; Prša 2021):

$\begin{eqnarray}&&{ \mathcal P }({\rm{\Theta }})\propto { \mathcal L }({\rm{\Theta }})\pi ({\rm{\Theta }})\end{eqnarray} \tag{ 5 }$

Inherent in this simplification is the assumption that our model appropriately represents the data we are given. We will cover posterior distributions in more detail in Section 5.1, but for now it should be noted that the widths of the posterior distributions are not necessarily indicative of parameter uncertainties.

4.1. Prior Distributions

For most parameters sampled by MCMC, we did not introduce custom prior distributions, since prior information was only available for a select few parameters. PHOEBE intrinsically places some restrictions on the forward model, and runs checks to ensure that the model is physical before computing. These restrictions act as uninformative priors in extreme cases where walkers attempt to sample regions of the parameter space that are obviously nonphysical.

A Gaussian prior was introduced over the distance for each binary system, with a standard deviation equal to the measurement uncertainties published by Gaia, as shown in Table 1.

Uniform (uninformative) priors were placed on gravity darkening and reflection coefficients based on the temperatures of the primary and secondary stars from the initial solutions. For stars with T_eff > 8000 K, we expect radiative atmospheres for which the most appropriate gravity darkening and reflection coefficients are ≥ 0.9 and ≥ 0.8, respectively. For stars with high-temperature initial solutions, uniform priors were set over this range. There are a few systems where the effective temperature of at least one star is within the range of 6000 K ≤ T_eff ≤ 8000 K. For such stars, the appropriate gravity darkening and reflection coefficients are somewhat ambiguous; with the models available to us in PHOEBE, we cannot determine whether the stellar atmosphere should be radiative or convective for a given set of parameters. Therefore, for stars with 6000 K ≤ T_eff ≤ 8000 K, uniform priors were set that enclose the entire range of possible values, including gravity darkening and reflection coefficients in the ranges [0.32, 1.0] and [0.6, 1.0], respectively.

4.2. The Likelihood Function

To each parameter combination, the MCMC method assigns a probability based on the chosen likelihood function. With PHOEBE, this involves computing the forward model, computing residuals between the model and the observed data, and performing a χ² evaluation. In this research, we used PHOEBE's default log-likelihood function, the exact form of which is given in Equation (6). Here, the log of probability, p, is given as a sum in terms of the model data y_m, the observed data y_o, the per-point uncertainty on the observed data, σ_o, and our noise-nuisance parameter σ_lnf:

$\begin{eqnarray}&&\mathrm{ln}(p)=-0.5\sum \left(\displaystyle \frac{{\left({y}_{m}-{y}_{o}\right)}^{2}}{{\sigma }^{2}}+\mathrm{ln}({\sigma }^{2})\right),\end{eqnarray} \tag{ 6 }$

where

$\begin{eqnarray}&&{\sigma }^{2}={{\sigma }_{o}}^{2}+{{y}_{m}}^{2}{e}^{2{\sigma }_{\mathrm{lnf}}}.\end{eqnarray} \tag{ 7 }$

It is worth bringing to attention the ways in which Equation (6) differs from the expected, simpler form of the log-likelihood function, $\mathrm{ln}(p)=s-0.5{\chi }^{2}$ , notably through the addition of a $\mathrm{ln}({\sigma }^{2})$ term in the sum. The denominator (Equation (7)) contains an additional term ${{y}_{m}}^{2}{e}^{2{\sigma }_{\mathrm{lnf}}}$ . The convergence of an MCMC run and the resulting widths of the posterior distributions are both highly dependent on the per-point observational uncertainties (Conroy et al. 2020). In their paper on solving the inverse problem with PHOEBE, Conroy et al. (2020) explain that in cases where the observational uncertainties are thought to be underestimated by a constant factor (such as in Kepler data; see Jenkins 2017) the noise-nuisance parameter is introduced to increase the scale of the observational uncertainties. We sample over σ_lnf in our MCMC analysis so that any resulting degeneracies with other nuisance parameters are encoded in the posteriors.

The additional $\mathrm{ln}({\sigma }^{2})$ term serves as somewhat of a counter to the noise-nuisance parameter. If we do not wish to include the noise-nuisance parameter in our computation, we assign σ_lnf a value of −∞; this simplifies our uncertainty term to a constant ${\sigma }^{2}={{\sigma }_{o}}^{2}$ . In such a case, we are effectively adding an arbitrary constant to all of our log-likelihood calculations, which does not impact the MCMC walkers' behavior in any significant way.

If instead we elect to include the noise-nuisance parameter (taking σ_lnf to be some finite value), we expect Equation (7) to increase, leading to a more favorable log-likelihood estimate. The $\mathrm{ln}({\sigma }^{2})$ term in Equation (6) prevents $\mathrm{ln}(p)$ from improving arbitrarily as larger values of σ_lnf are sampled, since $\mathrm{ln}({\sigma }^{2})$ increases linearly with σ_lnf, whereas the χ² term decreases exponentially with σ_lnf. Once the noise-nuisance parameter suitably represents the noise of the data, any further improvement we gain in χ² by increasing σ_lnf is smaller than the detriment incurred upon the log-probability by the $\mathrm{ln}({\sigma }^{2})$ term. Modifying the log-probability function in this manner deters overfitting data noise when sampling over σ_lnf.

5. Results of MCMC Analysis

We have performed the MCMC analysis on both binned and full data sets for all three binary systems, uncovering the posterior distributions in the region of the parameter space near our initial solution. The topology of the posterior distribution, especially the modality, should indicate whether or not the MCMC runs are adequately converged (Hogg & Foreman-Mackey 2018). Although MCMC sampling cannot guarantee that we have identified the global minimum of the parameter space, a converged MCMC run that is independent of initialization is likely to have enclosed the global minimum (Hogg & Foreman-Mackey 2018).

5.1. Posterior Distributions

Posterior distributions indicate the probability that a given parameter set will describe the observed data. The shapes of the posterior distributions provide us with some knowledge of the local topology of the parameter space near our solution, indicating parameter correlations and regions of confidence around our parameter estimates.

When drawing parameter uncertainties directly from the widths of posterior distributions, however, we run the risk of misinterpreting the results of our MCMC sampling. We must ensure that we have marginalized over all nuisance parameters—any parameter that has a significant effect on the shape of the light curve—or else we may fail to sample all possible parameter combinations that yield an acceptable fit for our system.

We must also ensure that our MCMC walkers have completed an adequate number of iterations to both converge on the global solution and fully explore that region the parameter space. If too few iterations are completed, then the walkers may not sample an acceptable volume of the posterior distribution, leading to smaller than expected posterior widths.

Finally, we must ensure that our model adequately represents the data provided. As discussed in Section 4.2, Kepler data contain stochastic noise and underestimated errors, which our model attempts to fit with a noise-nuisance parameter. This method often struggles to converge, in which case we turn to performing the MCMC analysis on binned light-curve data. With larger per-point uncertainties and fewer data points, these binned data enable the MCMC walkers to converge quickly and more readily accept new positions, leading to a fuller exploration of the parameter space. However, binned data are not necessarily representative of the binary system observed by Kepler, and lead to posterior distributions with larger widths than our observational precision would allow. Therefore, we must strive to strike a balance between data that encourage convergence and a model that appropriately generates the observed data.

Many parameters were sampled in the MCMC analysis of these systems. As such, the corner plots displaying two-dimensional cross sections of the posterior distributions are quite large. To aid in legibility, full corner plots for each run are shown in the Appendix. The plots displayed in the following subsections do not show the full array of sampled parameters, but rather highlight specific parameter correlations within the larger distribution collection.

Although we cannot practically know whether the MCMC walkers have fully sampled the posterior distribution, we can define some heuristic characteristics, such as autocorrelation time and acceptance fraction, to evaluate the completion of these MCMC runs. Hogg & Foreman-Mackey (2018) suggest that an MCMC run has completed adequate sampling when each walker has traversed the high-probability region of the parameter space multiple times. More precisely, Hogg & Foreman-Mackey (2018) define the autocorrelation time per-parameter as the number of steps needed for the walkers to draw independent samples of that parameter. This can be examined qualitatively with a trace plot, like the one shown in Figure 3. The MCMC sampler is performing well if the chains overlap and take on parameter values across the entire range of values sampled. Ideally, we may consider an MCMC run as converged when the number of iterations completed is 10–100 times greater than the largest autocorrelation time.

Figure 3. Refer to the following caption and surrounding text. — **Figure 3.** An example of the trace plot from the MCMC analysis of KIC 10727668 using binned data. Each chain is represented by a different color, and every parameter value sampled is shown. This plot examines $\tfrac{{T}_{2}}{{T}_{1}}$ , which has an estimated autocorrelation time of 896.8 iterations. The walkers readily take on new parameter combinations, and the ensemble of walkers appears to have traversed the parameter space multiple times.
Download figure:
Standard image High-resolution image

**Figure 3.** An example of the trace plot from the MCMC analysis of KIC 10727668 using binned data. Each chain is represented by a different color, and every parameter value sampled is shown. This plot examines $\tfrac{{T}_{2}}{{T}_{1}}$ , which has an estimated autocorrelation time of 896.8 iterations. The walkers readily take on new parameter combinations, and the ensemble of walkers appears to have traversed the parameter space multiple times.
Download figure:
Standard image High-resolution image

The acceptance fraction is defined as the ratio of moves accepted by a walker to the total number of iterations completed. In the extreme cases, an acceptance fraction of 0 indicates that the walkers never change position, whereas an acceptance fraction of 1 indicates that the walkers accept every new position proposed. An acceptance fraction in the range of 0.25–0.5 is most desirable, striking a balance between convergence and exploration of the parameter space (Hogg & Foreman-Mackey 2018). We comment on how the acceptance fraction may be improved for our models in Section 6.1.

5.2. KIC 5957123

A total of 7600 iterations were completed in the MCMC analysis of KIC 5957123. Autocorrelation times for each parameter were estimated using emcee's autocorrelation function, given by Equation (12) of Foreman-Mackey et al. (2013). The average autocorrelation time for the parameters was estimated to be 483 ± 117 iterations; however, autocorrelation times could not be estimated for the inclination (i) or the secondary gravity brightening coefficient. This suggests that too few iterations have been completed to estimate the autocorrelation time, or that sampling of these two values is insufficient due to a poor initializing distribution. However, we suspect that our model may not appropriately generate the data, in which case fair sampling of the parameter space will never be achieved. This concern is explored in more detail throughout this section. The average acceptance fraction for this MCMC run was 0.0001. Notably, one walker had an acceptance fraction of 0, suggesting that it did not once change position throughout the 7600 iterations of sampling.

Since the light curve for KIC 5957123, shown in Figure 5, exhibits no distinct eclipses, the effects of mass ratio, temperatures, and stellar radii cannot be decoupled. We therefore expect to uncover broader and less centralized posteriors in our MCMC sampling with strong correlations between parameters. The posterior for q, seen in the right corner plot of Figure 4, is distinctly multimodal, and posteriors for i, d, a, and $\tfrac{{R}_{1}+{R}_{2}}{a}$ appear to have smaller secondary peaks. These posteriors suggest that the MCMC walkers have not converged on a single solution, and as a consequence, we cannot infer parameter uncertainties from their widths. Furthermore, the contour plots in both Figure 4 and the full corner plot shown in the Appendix suggest correlations between several parameters. Due to the sparse sampling of parameter combinations, these correlations are difficult to quantify; however in many of the two-dimensional histograms such as those relating i and $\tfrac{{T}_{2}}{{T}_{1}}$ in Figure 4, samples appear to lie along a line rather than be clustered around one central point. Strong correlations were not present between these parameters in the binned data set analysis (see the full corner plots presented in the Appendix, where the posteriors are more centralized), so the correlations seen in the full data set posteriors may be a result of walkers stuck in separate regions of the parameter space. Nonetheless, this indicates that our model is struggling to generate the observed data.

Figure 4. Refer to the following caption and surrounding text. — **Figure 4.** Subsets of the corner plot of the posterior distributions uncovered by the MCMC analysis of KIC 5957123. The left corner plot contains parameters that we expect to be constrained by photometric data (i, $\tfrac{{T}_{2}}{{T}_{1}}$ , $\tfrac{{R}_{2}}{{R}_{1}}$ , and $\tfrac{{R}_{1}+{R}_{2}}{a}$ ), while the right plot shows a collection of parameters sampled as nuisance parameters (q, a, T_eff,1, and d). One-dimensional posterior distributions for each parameter are displayed along the main diagonal, where the dashed lines indicate the 1σ spread of samples.
Download figure:
Standard image High-resolution image

The residuals in Figure 5 show phase-space periodic trends, indicating that our synthetic light curve is a poor fit to the observed data. Additionally, the residuals at phase −0.4 are more positive than those at phase 0.4, suggesting that one peak of the light curve is higher than the other. Unfortunately, our model of KIC 5957123 does not account for this phenomenon. Although we marginalized over many nuisance parameters, the failure of this MCMC run to converge indicates that this difference in peak brightness must result from some additional parameters that we have not considered. Therefore, our model cannot appropriately generate the data, and as such the MCMC walkers will never converge. Realizing this, we stopped this run early at 7600 iterations.

Figure 5. Refer to the following caption and surrounding text. — **Figure 5.** Phase-folded light curve of KIC 5957123 overplotted with 10 synthetic solutions drawn from the posterior distributions. Below, residuals between these synthetic models and the photometric data are plotted. These residuals exhibit phase-space trends, suggesting that our model does not generate observables that agree with the photometric data.
Download figure:
Standard image High-resolution image

5.3. KIC 8314879

A total of 15,100 iterations were completed in the MCMC analysis of KIC 8314879. The average autocorrelation time for the parameters was estimated to be 1340 ± 260 iterations, where $\tfrac{{T}_{2}}{{T}_{1}}$ had the maximum autocorrelation time of 1540 iterations. In all, 138,900 more iterations would need to be performed to reach our benchmark for fair and unbiased sampling of 100 times the maximum autocorrelation time. The average acceptance fraction for this MCMC run was 0.0006.

Most MCMC walkers eventually converged to the same local minimum; however, some of those spent thousands of iterations stuck in other local minima. A log-probability cutoff of 516,555 was applied to center our results on the highest probability region, fully removing one walker from this solution and truncating the results of those walkers that took longer to converge. However, three walkers identified a different solution above the log-probability cutoff. This solution is not enclosed in the 1σ widths of the posterior distributions, but can be seen on the fringes of Figure 6.

Figure 6. Refer to the following caption and surrounding text. — **Figure 6.** Similar to Figure 4, but for the posteriors uncovered by the MCMC analysis of KIC 8314879. The log-probability was cut off at 516,555, eliminating one walker from this solution. Posterior distributions for these parameters are predominantly unimodal, although three walkers found a separate high-probability region visible along the edges of several posteriors.
Download figure:
Standard image High-resolution image

The light-curve residuals presented in Figure 7 exhibit minimal phase-space trends, however the spread of data points is larger near phases 0.0 and 0.5. The synthetic model has not quite captured the shape of the eclipse at these points, most likely due to the resolution of the forward model near these phases. If the spread is a result of limited resolution, it should be possible to reduce the spread by generating the forward model at more times during primary and secondary eclipse. If future testing shows that this change does not improve the residuals, then it may be necessary to sample over $e\sin \omega$ and $e\cos \omega$ instead of fixing these values at 0. The lack of phase-space trends indicates that any pulsations that exist in this star are unlikely to be tidally excited. However, the scatter in the residuals remains relatively large, and may be due to pulsations of one or both components. Future work will model pulsations of this system, using the binary constraints to narrow the possible parameter space. The mass range we find here suggests this system likely contains at least one slowly pulsating B (SPB) star.

Figure 7. Refer to the following caption and surrounding text. — **Figure 7.** Phase-folded light curve of KIC 8314879 overplotted with 10 synthetic solutions drawn from the posterior distributions. Below, residuals between these synthetic models and the photometric data are plotted. There is some noticeable spread in the residuals near each eclipse. This may be resolved by increasing the number of points computed within the eclipse regions, or sampling over e and ω.
Download figure:
Standard image High-resolution image

Parameter correlations uncovered by this MCMC analysis are more densely sampled than in the ellipsoidal variable, KIC 5957123. The one-dimensional posteriors of Figure 6 are unimodal within 1σ, or even 2σ, of the median value, suggesting that most MCMC walkers have converged on a single solution. Computing more iterations will always provide a more precise estimate of the posterior distributions, but eventually the computing cost becomes unfeasible to manage. With limited computing resources, we must aim to strike a balance between time spent computing and relative improvement in parameter estimates. When deciding to conclude an MCMC run, we must rely on heuristic qualitative tests to evaluate convergence, as there is no simple way to evaluate how long an MCMC analysis must be performed to yield reliable results (Hogg & Foreman-Mackey 2018). Since we have marginalized over appropriate nuisance parameters, and 60 of the 64 walkers converged to the same minimum after exploring the parameter space for 15,100 iterations, we report parameter uncertainties from the widths of the posterior distributions. These uncertainties are displayed in Table 4.

Table 4. KIC 8314879 and KIC 10727668 Parameter Values

Parameter	KIC 8314879	KIC 10727668
System Parameters
i (°)	${78.259}_{-0.016}^{+0.04}$	${79.33}_{-0.04}^{+0.3}$
q	${0.689}_{-0.004}^{+0.004}$	${17.2}_{-0.3}^{+0.3}$
a (R_⊙)	${7.52}_{-0.14}^{+0.1}$	${11.1}_{-0.2}^{+0.3}$
$\tfrac{{T}_{\mathrm{eff},2}}{{T}_{\mathrm{eff},1}}$	${0.753}_{-0.003}^{+0.005}$	${0.6501}_{-0.002}^{+0.0013}$
$\tfrac{{R}_{2}}{{R}_{1}}$	${0.6710}_{-0.0015}^{+0.0005}$	${11.05}_{-0.04}^{+0.05}$
$\tfrac{{R}_{1}+{R}_{2}}{a}$	${0.5818}_{-0.0002}^{+0.0002}$	${0.2645}_{-0.004}^{+0.0004}$
d (pc)	${8060}_{-200}^{+160}$	${5460}_{-130}^{+150}$
e	...	${0.018}_{-0.007}^{+0.002}$
ω (°)	...	${88.9}_{-0.9}^{+0.98}$

Stellar Parameters—Primary
T_eff,1 (K)	${10,660}_{-180}^{+78}$	${13,580}_{-40}^{+50}$
R_equiv,1 (R_⊙)	${2.62}_{-0.05}^{+0.04}$	${0.242}_{-0.006}^{+0.008}$
M₁ (M_⊙)	${4.39}_{-0.2}^{+0.19}$	${0.189}_{-0.012}^{+0.011}$
Gr. Bright.	${0.933}_{-0.015}^{+0.02}$	${0.941}_{-0.009}^{+0.008}$
Refl. Frac.	${0.918}_{-0.012}^{+0.019}$	${0.89}_{-0.03}^{+0.05}$

Stellar Parameters—Secondary
T_eff,2 (K)	${8030}_{-80}^{+40}$	${8820}_{-30}^{+30}$
R_equiv,2 (R_⊙)	${1.76}_{-0.03}^{+0.02}$	${2.68}_{-0.07}^{+0.07}$
M₂ (M_⊙)	${3.03}_{-0.17}^{+0.12}$	${3.2}_{-0.2}^{+0.3}$
Gr. Bright.	${0.92}_{-0.09}^{+0.05}$	${0.943}_{-0.007}^{+0.009}$
Refl. Frac.	${0.830}_{-0.006}^{+0.011}$	${0.896}_{-0.009}^{+0.017}$

Download table as: ASCII Typeset image

5.4. KIC 10727668

A total of 14,200 iterations were completed in the MCMC analysis of KIC 10727668. The average autocorrelation time for the parameters was estimated to be 1150 ± 300 iterations, where $\tfrac{{R}_{2}}{{R}_{1}}$ had the maximum autocorrelation time of 1430 iterations. In all, 128,800 more iterations would need to be performed to achieve fair and unbiased sampling. The average acceptance fraction for this MCMC run was 4 × 10⁻⁵.

For this solution, we adopted a log-probability cutoff of 537,075, eliminating three walkers that were stuck in a lower probability region of the parameter space. This allows us to better resolve the two-dimensional Gaussian plots in the high-probability region around the proposed solution, shown in Figure 8. The width of the T_eff,1 posterior distribution was consistently small throughout all runs, indicating that this system is much more sensitive to a change in temperature than the other binaries.

Figure 8. Refer to the following caption and surrounding text. — **Figure 8.** Similar to Figure 4, but for the posteriors uncovered by the MCMC analysis of KIC 10727668. Three walkers with log-probability below 537,075 were eliminated from this solution. Several posterior distributions are centralized and unimodal, however, i and $\tfrac{{R}_{1}+{R}_{2}}{a}$ appear to have secondary modes that are included in the 1σ spread.
Download figure:
Standard image High-resolution image

**Figure 8.** Similar to Figure 4, but for the posteriors uncovered by the MCMC analysis of KIC 10727668. Three walkers with log-probability below 537,075 were eliminated from this solution. Several posterior distributions are centralized and unimodal, however, i and $\tfrac{{R}_{1}+{R}_{2}}{a}$ appear to have secondary modes that are included in the 1σ spread.
Download figure:
Standard image High-resolution image

The light-curve residuals displayed in Figure 9 show no phase-space trends. This indicates that the synthetic solutions drawn from the posterior distributions of our MCMC analysis appropriately model the light-curve variation of this system. Any deviation from this synthetic model appears to result from noise inherent in the data. Previous studies of this system suggest that one component may be a δ Scuti variable star (Wang et al. 2018), which is a likely contributor to the scatter seen throughout the residuals. Indeed, the Fourier spectrum for this system shows a small cluster of frequencies between 10 and 15 c day⁻¹, again consistent with a δ Scuti variable. As for KIC 8314879, future work will involve modeling the pulsations of this system with the binary parameters found here as constraints.

Figure 9. Refer to the following caption and surrounding text. — **Figure 9.** Phase-folded light curve of KIC 10727668 overplotted with 10 synthetic solutions drawn from the posterior distributions. Below, residuals between these synthetic models and the photometric data are plotted. These residuals exhibit few obvious trends in phase space, suggesting that the parameter sets drawn from the posteriors uncovered by our MCMC analysis suitably model this system.
Download figure:
Standard image High-resolution image

Most posterior distributions are centralized and unimodal; however, some parameters seem to exhibit a lower probability secondary mode, such as i and $\tfrac{{R}_{1}+{R}_{2}}{a}$ in the left plot of Figure 8. Walkers do not appear to pass from one mode to the other, suggesting that these secondary modes might be a consequence of stuck walkers that cannot find a path to the high-probability solution, rather than a phenomenon of two distinct solutions. This seems a likely explanation given the rather low acceptance fraction for this system.

The posterior distributions for $e\sin \omega$ and $e\cos \omega$ were excluded from Figure 8, since they did not form any significant correlations with other parameters. Rather, their posteriors appear to be independent and Gaussian, which can be seen in Figure 13, the posterior distribution from the binned data set analysis. This behavior, alongside the relatively uniform light-curve residuals, suggests that we have likely sampled over appropriate nuisance parameters. Since our walkers have also converged to a solution that yields an appropriate fit to the photometric data, we may infer parameter uncertainties from the widths of the posterior distributions. These uncertainties are presented in Table 4.

6. Discussion

This procedure was successful in obtaining robust parameter estimates for the two EBs studied, but struggled to identify parameters and uncertainties for the ellipsoidal variable system. Throughout this section, we identify issues with this procedure and propose modifications for the benefit of future studies.

6.1. Feasibility of This Analysis

MCMC analysis was performed on binned light-curve data to quickly identify the topology of the parameter space, and obtain initializing distributions for a full MCMC run. This significantly improved the effectiveness of the full data set run, since walkers converged quickly and were less likely to get stuck in local minima. Should this procedure be replicated, we strongly recommend that the model resolution be reduced for the initial binned data set run to encourage faster computation times. Larger per-point uncertainties and a smaller number of data points should permit a lower-precision model to yield similar posterior distributions. This would permit quick initial sampling so that more compute resources may be allocated to process a higher-resolution model in the full data set analysis.

The stellar models used in this research struggle to account for noise in the Kepler data, even when the noise-nuisance parameter σ_lnf is implemented. Gaussian processes are supported by PHOEBE, but were not implemented in this analysis, as these options require that our light curve be computed over a range of times that spans the entire time series of the observed data. It is redundant to compute our stellar models for more than one period, since no intrinsic variability is modeled in the stars themselves. Applying Gaussian processes after interpolating from our one-period forward model to the entire time series could help account for stochastic noise without incurring longer computation times, and this could be investigated in future work on these stars.

One goal of this project was to examine the feasibility of establishing parameters for a larger set of Kepler binaries using only time-series photometry. Considering that full MCMC analysis using PHOEBE models required roughly one CPU year of computation time, however, such a procedure is likely not feasible, especially considering the manual tuning required to optimize each MCMC run.

6.1.1. Acceptance Fraction

The acceptance fraction of all three full data set MCMC runs was very low, with values of 0.0001, 0.0006, and 4 × 10⁻⁵ for KIC 5957123, KIC 8314879, and KIC 10727668, respectively. This is much smaller than the acceptance fraction from MCMC analysis of the binned data, which had acceptance fractions of 0.002, 0.003, and 0.008, respectively. A comparison of log-probability functions in Figure 10 shows the differing behaviors of walkers in the binned data set analysis versus the full data set analysis, notably their tendency to accept new positions near the global minimum. Although the initial analysis with binned data successfully promotes sampling of the parameter space and provides a useful (albeit not robust) estimate of the posterior distribution, it does not directly encourage sampling in the subsequent full data set analysis. Rather, it only prevents initialization of walkers in a low-probability region of the parameter space.

Figure 10. Refer to the following caption and surrounding text. — **Figure 10.** Comparison of log-probability functions between the MCMC analyses of KIC 10727668, with burn-in and thinning applied. Each walker is represented with a curve of different color. On the left is the log-probability function from the binned data set run. These chains are converged and readily accept new parameter combinations. On the right is the log-probability function from the full data set run. Notice that walkers in this run tend to become stuck—they will spend many iterations in the same parameter combination, yielding an unchanging log-probability.
Download figure:
Standard image High-resolution image

Hogg & Foreman-Mackey (2018) suggest that the average acceptance fraction can be improved by reducing the step size (the distance between the current position in parameter space and the proposed point) for the MCMC walkers, or by decreasing the variance of the proposal distribution. We found many of our models to be very sensitive to certain parameters (d, T_eff,1, a), which makes sampling for their correlations difficult. We suggest that the step size of emcee be reduced in any further analysis to encourage more appropriate sampling of these parameters.

6.2. Ellipsoidal Variables

Ellipsoidal variables (ELVs) do not exhibit distinct eclipses in their light curves, and ellipsoidal variations cannot be decoupled from heating or reflection effects. As such, ELVs suffer from a lack of radial velocity measurements far more than EBs, since values for i and $\tfrac{{R}_{1}+{R}_{2}}{a}$ cannot be measured directly from their light curves. Using PHOEBE's robust treatment of Roche effects and irradiation alongside MCMC sampling, we expected to produce parameter correlations for the ellipsoidal variable KIC 5957123, albeit to less precision than we could otherwise achieve for an EB.

It was difficult to propose an adequate initial solution for this system, and based on our results, it appears that our MCMC analysis could not find a suitable solution for KIC 5957123. Therefore, we must consider the possibility that our model does not adequately represent this system. Notably, our model does not fit the two peaks of differing height; other researchers have had success fitting similar features by implementing star spots (NegmEldin et al. 2019). There is little prior information available to suggest that either star in this system has spots, nor where such spots should be placed. However, spots are relatively common in F-type main-sequence stars, and Doppler imaging techniques have been used to identify the properties and evolution of star spots in other Kepler binaries (Wang et al. 2021). If spots were to be introduced to this PHOEBE model, then the spot parameters (temperature reduction, radius, latitude, and longitude) should be sampled in the MCMC analysis.

If we had stellar spectra, and subsequently radial velocity measurements, we could measure $a\sin i$ , M₁, M₂ (and by extension q), V_γ (the systemic velocity), T_eff,1, and T_eff,2. Since this is not an eclipsing binary, we do not have constraints on the sum of fractional radii; however, by using the distance estimate from Gaia we may constrain R₁ and R₂ and potentially construct a satisfactory stellar model. With such constraints, it might be possible to identify which phenomenon is causing the difference in peak brightness at ϕ = 0.25 and ϕ = 0.75.

Without spectra, however, we struggle to confirm that our model adequately describes the data. Failing that implies our MCMC analysis of this system is not mathematically sound. We cannot claim that our walkers have adequately sampled the parameter space in the region around the true solution, and so despite our marginalization efforts we cannot quote the posterior widths as parameter uncertainties for this system.

6.3. Eclipsing Binaries

The method presented in this paper proved rather successful for establishing parameter correlations and values for EBs. This is to be expected, as the geometry of an eclipsing binary's light curve provides information about several system parameters and ratios (Prša et al. 2008; Southworth 2019). If steps are taken to appropriately marginalize (read: sample) over all nuisance parameters, it is possible to construct an appropriate model for an eclipsing binary system using only photometric data and distance measurements.

This method required roughly 10,000 iterations of MCMC sampling of the binned data, followed by roughly 14,000 iterations of MCMC sampling over the full data set. For the 64 MCMC walkers, this required a total of roughly 2 core years per system, not accounting for any of the failed runs or trial runs performed while refining this procedure! Therefore, this method is not possible without access to a high performance computing cluster. Successful sampling requires much computing time and human intervention.

Other binary modeling software (ellc, PHOEBE Legacy) will run faster than PHOEBE 2.3.41, but with simplified physics. For this procedure, we wanted to use the most robust physical model possible, especially when trying to marginalize over nuisance parameters that may be constrained by second-order effects. For instance, we can constrain q by the Roche-lobe deformation that causes ellipsoidal variation in the out-of-eclipse regions of the light curves. This would not be possible in software that approximates stellar surfaces as spherical, and may be inaccurate in those that implement simplified potential surfaces. In order to appropriately marginalize over all nuisance parameters, the most thorough physical model was used at the cost of computation time.

In summary, this method of MCMC analysis works for EBs, but requires extensive computing resources and management. One of the main draws of studying eclipsing binary stars is to constrain investigations of asteroseismology, in which case, so long as the light-curve data are treated ahead of time and the same treated data are used across both projects, the posterior distributions obtained from MCMC analysis may be used as prior distributions on asteroseismological models.

6.4. Stellar Pulsations

At least two of the systems investigated here show some evidence of pulsations. KIC 8314879 shows relatively large residuals, and the mass found here is consistent with an SPB variable. B stars have convective cores, and the g-mode pulsations typical of SPB variables can be used to probe the convective overshoot (e.g., Pápics et al. 2017). With the addition of constraints from the binary orbit of the system, it will be possible to use the observed pulsations to find tight constraints on the behavior of interior convection in this star.

KIC 10727668 has been previously proposed to contain a δ Scuti variable (Wang et al. 2018), and that is borne out in our analysis. Based on binary fitting, we find the mass of the secondary to be ${3.2}_{-0.2}^{+0.3}$ M_⊙, which is slightly higher than the typical range of masses seen in δ Scuti stars (1.4–2.5 M_⊙). In the Fourier spectrum of this star, we see a cluster of frequencies between 10 and 15 c day⁻¹, which are likely to be p-mode pulsations. As for KIC 8314879, these pulsation frequencies can be used to constrain the interior structure, including convective overshoot above the core. This analysis will be the subject of future work.

6.5. Concerning the Treatment of Data

6.5.1. Method of Flux Calibration

Although our method of flux calibration appears to have yielded realistic stellar models, a more precise estimate of the Kepler signal to flux conversion may exist elsewhere, and each CCD may require a separate calibration. In the raw data, each quarter had a different median flux—the Kepler spacecraft was rotated between quarters, so light from our binaries is detected by a different CCD each quarter, likely contributing to this flux discrepancy. A more robust method to detrend our data might include calibrating each CCD separately, rather than applying one calibration and averaging the median flux of each quarter. An even better method may be to instead analyze each quarter separately. This would lead to fewer data points and less noise in each run, and allows for the flux of each quarter to be calibrated independently.

6.5.2. Parameter Uncertainties

The widths of the posterior distributions returned by an MCMC run are highly dependent on the per-point uncertainties (Hogg et al. 2010), and to a lesser extent on the number of data points used in our calculation of χ². Binning the light-curve data effectively involves transforming a data set of 13,500 points into a data set of 500 points. The uncertainty on each point is then defined to be the standard deviation of all points in each bin. For all three binaries, these per-point uncertainties were larger than the per-point uncertainties of the points from the Kepler light curve. It would be erroneous to claim that the posterior widths from MCMC analysis on these 500 points with large per-point uncertainties appropriately describes the parameter uncertainties for the binary system that produced those 13,500 points with smaller per-point uncertainties. Binning the light curve as done in this procedure encourages convergence and exploration of the parameter space. This is useful, especially as initial MCMC trial runs with a full data set did not converge in a reasonable time, and did a poor job exploring the parameter space, even with the inclusion of the noise-nuisance parameter. However, binning the light curve also artificially increases the widths of the posterior distributions, which would lead us to claim larger uncertainties than the data should allow.

The binning of light-curve data has its place and its uses. For instance, it eliminates artificial instrumental noise that is not characteristic of the intrinsic variability of the binary, with less risk of eliminating intrinsic variability such as ellipsoidal variation in the out-of-eclipse regions. However, binning light-curve data should not be used when attempting to obtain robust parameter uncertainties through MCMC analysis. Instead, the stellar model should be manipulated whenever possible to account for the behavior of the data set, such as through the implementation of Gaussian processes or the noise-nuisance parameter (as discussed in Section 4.2). We acknowledge, however, that working with Kepler data in this manner often proves difficult. Underestimated per-point uncertainty and high instrumental noise deter convergence and yield unreasonably small posterior widths, mainly because the log-probability function becomes so sensitive that walkers struggle to adopt new positions while exploring the parameter space.

Starting with MCMC analysis of a binned data set and using the resulting posteriors as an initializing distribution for a full data set run ensures that all walkers are initialized in a high-probability region of the parameter space. However, due to the small acceptance fractions and large autocorrelation times characteristic of a full data set run, the full run still requires far more computational time and many more walkers than we would like in order to adequately sample the high-probability region around the global minimum.

7. Conclusions

Using the PHOEBE stellar modeling package, we have constructed computational models of three Kepler binary stars: KIC 5957123, KIC 8314879, and KIC 10727668. By applying MCMC methods to these models, we obtained estimates of the posterior probability distributions of the parameter space for each binary. From these posterior distributions, we attempted to justify our parameter estimates and extract uncertainties on each parameter. These posterior distributions also enabled us to extract light-curve residuals for each binary.

This procedure proved relatively successful for the EBs KIC 8314879 and KIC 10727668, but struggled to produce an adequate model for the ellipsoidal variable system KIC 5957123. There may be some phenomenon present in this system that was not fully realized by our model. It may be necessary to obtain radial velocities for this system in order to apply further constraints and generate a physically appropriate model.

In all models, MCMC walkers struggled to converge due to stochastic noise in the Kepler data and inadequate tuning of sampling parameters such as step size. Although the noise-nuisance parameter can be sampled to account for underestimated uncertainties, it may be necessary to include a more thorough characterization of scatter to appropriately model noisy Kepler data.

Unfortunately, the faint magnitude of these stars (m_K =15.1–16.3) makes spectroscopic follow-up extremely challenging, requiring time on large telescopes with high-resolution spectroscopy. For the KIC 5957123 system, the models presented here are likely as tightly constrained as possible without further observations.

Appropriate residuals were obtained for KIC 8314879 and KIC 10727668, from which we may begin to probe pulsation characteristics in further analysis. Both of these systems show signs of residual pulsation, possibly associated with SPB and δ Scuti variables, respectively. Asteroseismic modeling has been shown to be effective at constraining the internal structure of stars, including the internal rotation profile and convective core overshoot (e.g., Pápics et al. 2017; Pedersen et al. 2018). Future work on these systems will focus on improving the binary models presented here to provide tight constraints on the stellar properties. The addition of the constraints provided from the binary modeling will enable us to place tighter constraints on the internal structure than would be possible with asteroseismology alone.

This research was funded by resources from NSERC. The authors would also like to thank Compute Canada, the PHOEBE team, the MAST archive, and the Kepler GO program. The authors are grateful to the anonymous referee for their helpful comments. This research has made use of NASA's Astrophysics Data System.

Facilities: Kepler - The Kepler Mission, Gaia - , SORCE - .

Software: astropy (Astropy Collaboration et al. 2013, 2018), PHOEBE (Prša & Zwitter 2005), emcee (Foreman-Mackey et al. 2013), numpy (Harris et al. 2020).

Appendix: Posterior Distribution Corner Plots

Here, we display the full corner plots from our MCMC runs, representing all parameters sampled. We have excluded the gravity brightening and reflection fraction parameters from these corner plots, as they produced uniform distributions that did not exhibit any correlation with the other parameters sampled.

A.1. Corner Plots from MCMC Analysis of Binned Data

Figures 11, 12, and 13 show the posteriors for the binned data set runs of KIC 5957123, KIC 8314879, and KIC 10727668, respectively.

Figure 11. Refer to the following caption and surrounding text. — **Figure 11.** Full corner plot showing the estimated posterior distribution from MCMC analysis of the binned light-curve data from KIC 5957123.
Download figure:
Standard image High-resolution image

Figure 12. Refer to the following caption and surrounding text. — **Figure 12.** Full corner plot showing the estimated posterior distribution from MCMC analysis of the binned light-curve data from KIC 8314879.
Download figure:
Standard image High-resolution image

Figure 13. Refer to the following caption and surrounding text. — **Figure 13.** Full corner plot showing the estimated posterior distribution from MCMC analysis of the binned light-curve data from KIC 10727668.
Download figure:
Standard image High-resolution image

A.2. Corner Plots from MCMC Analysis of Full Data

Figures 14, 15, and 16 show the posteriors for the full data set analysis of KIC 5957123, KIC 8314879, and KIC 10727668, respectively. Again, gravity darkening and reflection fractions are excluded from these plots, as is the noise-nuisance parameter.

Figure 14. Refer to the following caption and surrounding text. — **Figure 14.** Full corner plot showing the estimated posterior distribution from MCMC analysis of KIC 5957123.
Download figure:
Standard image High-resolution image

Figure 15. Refer to the following caption and surrounding text. — **Figure 15.** Full corner plot showing the estimated posterior distribution from MCMC analysis of KIC 8314879.
Download figure:
Standard image High-resolution image

Figure 16. Refer to the following caption and surrounding text. — **Figure 16.** Full corner plot showing the estimated posterior distribution from MCMC analysis of KIC 10727668.
Download figure:
Standard image High-resolution image

Author e-mails

Author affiliations

ORCID iDs

Dates

3.3.1. KIC 5957123

3.3.2. KIC 8314879

3.3.3. KIC 10727668

6.1.1. Acceptance Fraction

6.5.1. Method of Flux Calibration

6.5.2. Parameter Uncertainties

Using Computational Models to Uncover the Parameters of Three Kepler Binaries: KIC 5957123, KIC 8314879, and KIC 10727668*

Article metrics

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Data

2.1. The Kepler Input Catalog

2.2. Flux Calibration

2.3. Polynomial Fitting to Detrend Kepler Data

2.4. Interstellar Extinction

2.5. The Gaia Mission

3. Models

3.1. PHOEBE

3.2. Procedure for Analysis

3.3. Initial Parameters

3.3.1. KIC 5957123

3.3.2. KIC 8314879

3.3.3. KIC 10727668

4. Markov Chain Monte Carlo

4.1. Prior Distributions

4.2. The Likelihood Function

5. Results of MCMC Analysis

5.1. Posterior Distributions

5.2. KIC 5957123

5.3. KIC 8314879

5.4. KIC 10727668

6. Discussion

6.1. Feasibility of This Analysis

6.1.1. Acceptance Fraction

6.2. Ellipsoidal Variables

6.3. Eclipsing Binaries

6.4. Stellar Pulsations

6.5. Concerning the Treatment of Data

6.5.1. Method of Flux Calibration

6.5.2. Parameter Uncertainties

7. Conclusions

Appendix: Posterior Distribution Corner Plots

A.1. Corner Plots from MCMC Analysis of Binned Data

A.2. Corner Plots from MCMC Analysis of Full Data

Footnotes

Using Computational Models to Uncover the Parameters of Three Kepler Binaries: KIC 5957123, KIC 8314879, and KIC 10727668^*