December 21, 2025
I often train by running up and down the rows of concrete stairs at Harvard Stadium, near where I live. My watch records a lot of information during these runs: for example, latitude, longitude, and elevation are saved at regular time intervals. Just for fun, I decided to take a closer look at this data.
In previous posts, I showed how I retrieved metrics and weather data from my Garmin watch, and presented some basic analysis of this data, concluding that external factors (weather conditions) had an effect on my running performance. Here I want to look at "internal" factors: how do I pace myself during the run?
First, we need to retrieve the data from the Garmin servers. Here is the code I used:
#!/usr/bin/env python
import argparse
import json

from garminconnect import Garmin, GarminConnectAuthenticationError
from garth.exc import GarthHTTPError
# lat, lon of the stadium
LAT_STADIUM = 42.3669
LON_STADIUM = -71.1260
def is_stadium_run(activity):
    # heuristic: the activity started near the stadium and has the
    # elevation gain of a full stadium workout
    try:
        lat = activity['startLatitude']
        lon = activity['startLongitude']
        el = activity['elevationGain']
    except (KeyError, TypeError):
        return False
    # crude flat-earth conversion: roughly 110 km per degree
    m_per_degrees = 110 * 1000
    dist = m_per_degrees * ((lat - LAT_STADIUM)**2 + (lon - LON_STADIUM)**2)**(1/2)
    dist_threshold = 100  # m
    return dist < dist_threshold and 300 < el < 400
def retrieve_stadium_data(garmin):
activities = garmin.get_activities(start=0, limit=1000, activitytype='running')
all_data = []
for activity in activities:
if is_stadium_run(activity):
activity_id = activity["activityId"]
details = garmin.get_activity_details(activity_id)
metrics = details["activityDetailMetrics"]
descriptors = details["metricDescriptors"]
desc_key2metrics_id = {}
for desc in descriptors:
key = desc['key']
metrics_id = desc['metricsIndex']
desc_key2metrics_id[key] = metrics_id
time_id = desc_key2metrics_id['directTimestamp']
time = [m['metrics'][time_id] for m in metrics]
duration_id = desc_key2metrics_id['sumDuration']
duration = [m['metrics'][duration_id] for m in metrics]
distance_id = desc_key2metrics_id['sumDistance']
distance = [m['metrics'][distance_id] for m in metrics]
elevation_id = desc_key2metrics_id['directElevation']
elevation = [m['metrics'][elevation_id] for m in metrics]
all_data.append(
{
'time': time.copy(),
'duration': duration.copy(),
'distance': distance.copy(),
'elevation': elevation.copy(),
}
)
return all_data
def main():
parser = argparse.ArgumentParser()
parser.add_argument('--out', type=str, default='data.json')
args = parser.parse_args()
    tokenstore = '~/.garminconnect'
    try:
        garmin = Garmin()
        garmin.login(tokenstore)
    except (FileNotFoundError, GarthHTTPError, GarminConnectAuthenticationError):
        print(f"Login tokens not found in {tokenstore}.")
        return
    data = retrieve_stadium_data(garmin)
with open(args.out, 'w') as f:
json.dump(data, f)
if __name__ == '__main__':
main()
This gives time series of the elevation and of the duration elapsed since the start of each run. The time series are remarkably smooth: we can easily distinguish the ups and downs of each row (recall: there are a total of 37 flights of stairs to run up).
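Each entry of the resulting JSON list corresponds to one stadium run and holds parallel time, duration, distance, and elevation arrays. As a quick sanity check, something like this works (a sketch; data.json is the default output name of the script above):

import json

with open('data.json') as f:
    data = json.load(f)

print(f"{len(data)} stadium runs retrieved")
run = data[0]
print(sorted(run))  # ['distance', 'duration', 'elevation', 'time']
print(f"first run: {run['duration'][-1] / 60:.1f} min, "
      f"{max(run['elevation']) - min(run['elevation']):.0f} m of elevation range")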
Elevation against time for 4 of my runs. Only the first five minutes are shown, for clarity.
In the above plot, I marked the locations of the peaks, computed with the scipy function find_peaks. These peaks correspond to the highest point of each flight of stairs, and they provide regularly spaced markers of overall progress. This is a very robust way of keeping track of each run's progress: using the GPS-computed distance would be a lot noisier, especially due to the sharp turns at the ends of each row.
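For reference, here is a minimal sketch of that peak detection, continuing from the snippet above (the parameters mirror the analysis scripts at the end of this post):

import numpy as np
from scipy.signal import find_peaks

t = np.array(run['duration'])
e = np.array(run['elevation'])

# a peak must stand out from the noise (prominence) and be at least
# half an average row-to-row spacing away from its neighbors (distance)
peaks, _ = find_peaks(e, prominence=np.std(e), distance=len(e) // 37 // 2)

# running time between consecutive tops, used in the next plot
p2p_times = np.diff(t[peaks])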
We can now look at the running time between consecutive rows:
Running time between consecutive rows against the index of the starting row. Symbols show the mean over all my recorded runs, and error bars indicate the 5th to 95th percentiles.
We can observe a few things; the most striking to me is a steady decrease in pace over the course of the run. These observations make sense, although I had never noticed that steady decrease in pace before.
Is there a better way to pace these runs? Let's add the average of my 10 best runs on top of this data:
Same plot as before, with an added curve showing the mean over my best 10 runs of all time.
Well, it seems like I won't learn too much from the global progress alone: my best runs were simply consistently faster than the average run. This can't be the whole story. Let's zoom in on individual segments of the time series.
Below, I plot the elevation profile of each up and down, centered around the top of each row. Each plot thus contains 36 curves (I skip row 37, which is only up, no down). These are 3 of my best runs:
And 3 slower runs:
From these plots, we can see that my runs follow a three-phase structure for each up and down:
1. run up the first part of the flight of stairs;
2. walk up the remaining steps to the top;
3. run back down.
This is clearly visible in the first set of graphs. Going down takes about the same time as going up.
Let's quantify the time spent in each zone. For each up-and-down trajectory, I assume a constant vertical speed within each zone, which leads to a piecewise linear approximation of the elevation against time.
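Concretely, writing $t = 0$ for the moment I reach the top of a row (so that the run-to-walk transition time $\tau_{12}$ is negative), the fitted model is

$$
e(t) =
\begin{cases}
\max(b_1 + a_1 t,\, 0) & \text{for } t < \tau_{12} \text{ (zone 1: run up)},\\
b_2 + a_2 t & \text{for } \tau_{12} \le t < 0 \text{ (zone 2: walk up)},\\
b_2 + a_d t & \text{for } t \ge 0 \text{ (zone 3: run down)},
\end{cases}
$$

with continuity imposed at the transition: $b_1 + a_1 \tau_{12} = b_2 + a_2 \tau_{12}$. The parameter names map onto the piecewise_linear function in the last script below. A fit over all cycles is illustrated below for the three "fast" runs: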
The black dot indicates the transition from running to walking up. The average time spent in each zone can be extracted from this approximation. It is shown below for all runs and for my ten fastest runs:
Average time spent in each zone. Zone 1: run up; zone 2: walk up; zone 3: run down.
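Explicitly, the zone durations follow from the fitted parameters: zone 1 starts when the elevation leaves the base at $t = -b_1/a_1$ and ends at $\tau_{12}$, zone 2 spans $[\tau_{12}, 0]$, and zone 3 ends when the elevation returns to the base at $t = -b_2/a_d$, so that

$$
T_1 = \tau_{12} + \frac{b_1}{a_1}, \qquad T_2 = -\tau_{12}, \qquad T_3 = -\frac{b_2}{a_d}.
$$

These are the quantities averaged in the bar chart above.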
In my best 10 runs, I was faster in zone 1 (run up) and zone 3 (run down), but not in zone 2 (walk up).
This was a fun deep dive into the elevation data recorded by my watch over my past 77 stadium runs. I now have evidence that my best strategy so far is not to sprint the whole thing, but to go hard at the start of each flight of stairs, follow with a steady walk up to the top, and then keep a steady but relatively high pace on the way down. All that repeated 36 times, before running up the last flight of stairs as fast as I can.
The results were produced with the following three scripts:
#!/usr/bin/env python
import argparse
import json
import matplotlib.pyplot as plt
import numpy as np
import os
from scipy.signal import find_peaks
NUM_ROWS = 37
def plot_peaks(data, samples, out_dir=None):
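    # plot elevation against time for the given sample runs, marking detected peaks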
fig, ax = plt.subplots()
for j, sid in enumerate(samples):
t = np.array(data[sid]['duration']) / 60
e = np.array(data[sid]['elevation'])
pids, _ = find_peaks(e, prominence=np.std(e), distance=len(e) // NUM_ROWS // 2)
ax.plot(t, e, color=f'C{j}')
ax.plot(t[pids], e[pids], 'o', ms=10, color=f'C{j}')
ax.set_xlim(0,5)
ax.set_xlabel('duration (min)')
ax.set_ylabel('elevation (m)')
plt.tight_layout()
if out_dir is None:
plt.show()
else:
plt.savefig(os.path.join(out_dir, 'peaks.svg'))
plt.close(fig)
def plot_p2p_durations(data, out_dir=None):
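    # mean and 5th-95th percentile spread of the running time between consecutive row tops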
diffs = []
for d in data:
t = np.array(d['duration'])
e = np.array(d['elevation'])
pids, _ = find_peaks(e, prominence=np.std(e), distance=len(e) // NUM_ROWS // 2)
pids = np.append(pids, [len(t)-1])
if len(pids) >= NUM_ROWS:
t_ = t[pids[:NUM_ROWS]]
diffs.append(np.diff(t_))
diffs = np.array(diffs)
fig, ax = plt.subplots()
x = np.arange(1, NUM_ROWS)
y = np.mean(diffs, axis=0)
ylo = np.quantile(diffs, q=0.05, axis=0)
yhi = np.quantile(diffs, q=0.95, axis=0)
yerr = np.stack((y-ylo, yhi-y), axis=0)
ax.errorbar(x, y, yerr, ls='', marker='o', capsize=5, clip_on=False)
ax.set_xlabel('row number')
ax.set_ylabel('time between two tops (s)')
ax.set_xlim(0, NUM_ROWS)
plt.tight_layout()
if out_dir is None:
plt.show()
else:
plt.savefig(os.path.join(out_dir, 'p2pdurations.svg'))
plt.close(fig)
def plot_p2p_durations_and_best(data, out_dir=None):
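    # same as plot_p2p_durations, with the mean over the 10 fastest runs overlaid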
diffs = []
durations = []
for d in data:
t = np.array(d['duration'])
e = np.array(d['elevation'])
pids, _ = find_peaks(e, prominence=np.std(e), distance=len(e) // NUM_ROWS // 2)
pids = np.append(pids, [len(t)-1])
if len(pids) >= NUM_ROWS:
t_ = t[pids[:NUM_ROWS]]
diffs.append(np.diff(t_))
durations.append(t[-1])
diffs = np.array(diffs)
durations = np.array(durations)
order = np.argsort(durations)
durations = durations[order]
diffs = diffs[order]
fig, ax = plt.subplots()
x = np.arange(1, NUM_ROWS)
y = np.mean(diffs, axis=0)
ylo = np.quantile(diffs, q=0.05, axis=0)
yhi = np.quantile(diffs, q=0.95, axis=0)
yerr = np.stack((y-ylo, yhi-y), axis=0)
ax.errorbar(x, y, yerr, ls='', marker='o', capsize=5, clip_on=False)
y = np.mean(diffs[:10,:], axis=0)
ax.plot(x, y, '-o', ms=7)
ax.set_xlabel('row number')
ax.set_ylabel('time between two tops (s)')
ax.set_xlim(0, NUM_ROWS)
plt.tight_layout()
if out_dir is None:
plt.show()
else:
plt.savefig(os.path.join(out_dir, 'p2pdurations_and_best.svg'))
plt.close(fig)
def main():
parser = argparse.ArgumentParser()
parser.add_argument('json', type=str, help='data to plot')
parser.add_argument('--out-dir', type=str, default=None, help='output directory for plots')
    args = parser.parse_args()
    out_dir = args.out_dir
    if out_dir is not None:
        os.makedirs(out_dir, exist_ok=True)
    with open(args.json, 'r') as f:
        data = json.load(f)
plot_peaks(data, samples=[20, 21, 22, 23], out_dir=out_dir)
plot_p2p_durations(data, out_dir=out_dir)
plot_p2p_durations_and_best(data, out_dir=out_dir)
if __name__ == '__main__':
main()
#!/usr/bin/env python
import argparse
import json
import matplotlib.pyplot as plt
import numpy as np
import os
from scipy.signal import find_peaks
NUM_ROWS = 37
def main():
parser = argparse.ArgumentParser()
parser.add_argument('json', type=str, help='data to plot')
parser.add_argument('--run-ids', type=int, nargs='+', default=None, help='plot only for these run ids')
parser.add_argument('--out-dir', type=str, default=None, help='output directory for plots')
args = parser.parse_args()
out_dir = args.out_dir
run_ids = args.run_ids
if out_dir is not None:
os.makedirs(out_dir, exist_ok=True)
with open(args.json, 'r') as f:
data = json.load(f)
plot_id = 0
for run_id, d in enumerate(data):
duration = np.array(d['duration'])
elevation = np.array(d['elevation'])
# High peaks (top of each row)
hpids, _ = find_peaks(elevation,
prominence=np.std(elevation),
distance=len(elevation) // NUM_ROWS // 2)
hpids = np.append(hpids, [len(duration)-1])
if len(hpids) != NUM_ROWS:
continue
# Low peaks (bottom of each row)
lpids, _ = find_peaks(-elevation,
prominence=np.std(elevation),
distance=len(elevation) // NUM_ROWS // 2)
lpids = np.append([0], lpids)
if len(lpids) != NUM_ROWS:
continue
        if run_ids is not None and run_id not in run_ids:
continue
profiles = []
        # go over every up-and-down cycle;
        # row 37 doesn't count as we do not go down afterwards.
fig, ax = plt.subplots(figsize=(4,4))
for i in range(NUM_ROWS-1):
start = lpids[i]
mid = hpids[i]
end = lpids[i+1]
tup = duration[start:mid+1]
eup = elevation[start:mid+1]
tdown = duration[mid:end]
edown = elevation[mid:end]
            # normalize: t = 0 at the top of the row
            tshift = tup[-1]
            tup = tup - tshift
            tdown = tdown - tshift
eshift = eup[0]
escale = edown[0] - eup[0]
edown = (edown - eshift) / escale
eup = (eup - eshift) / escale
            # later rows are drawn more opaque; (i + 1) keeps the first row visible
            ax.plot(tup, eup, color='C0', alpha=(i + 1) / NUM_ROWS)
            ax.plot(tdown, edown, color='C1', alpha=(i + 1) / NUM_ROWS)
        ax.set_xlabel(r'$t$ (s)')
ax.set_ylabel('normalized elevation')
ax.set_xlim(-30, 30)
ax.set_ylim(0, 1)
ax.set_title(f"{int(duration[-1]//60)} minutes {int(duration[-1]%60)} sec")
plt.tight_layout()
print(run_id)
if out_dir is None:
plt.show()
else:
plt.savefig(os.path.join(out_dir, f'{plot_id:03d}.svg'))
plot_id += 1
plt.close(fig)
if __name__ == '__main__':
main()
#!/usr/bin/env python
import argparse
import json
import matplotlib.pyplot as plt
import numpy as np
import os
from scipy.signal import find_peaks
from scipy.optimize import least_squares
NUM_ROWS = 37
def piecewise_linear(t, aup1, tau12, aup2, bup2, ado):
    # three-phase model with t = 0 at the top of the row:
    #   t < tau12      -> run up (slope aup1), clipped at the base level 0
    #   tau12 <= t < 0 -> walk up (slope aup2)
    #   t >= 0         -> run down (slope ado)
    # continuity at tau12: bup1 + aup1 * tau12 = bup2 + aup2 * tau12
    bup1 = bup2 + tau12 * (aup2 - aup1)
    bdo = bup2
return np.where(
t < tau12,
np.maximum(bup1 + aup1 * t, 0),
np.where(t < 0,
bup2 + aup2 * t,
bdo + ado * t))
def find_piecewise_linear_params(t, e):
def fun(p):
aup1, tau12, aup2, bup2, ado = p
pred = piecewise_linear(t, *p)
diff = pred - e
        # soft "prior": extra residuals penalizing implausible parameters,
        # added to the data misfit minimized by least_squares
logprior_res = [(tau12 + 15) / 5,
max(aup2 - aup1, 0) / 0.01, # prior: aup1 > aup2
max(-aup1, 0) / 0.01, # prior: aup1 > 0
max(-aup2, 0) / 0.01, # prior: aup2 > 0
max(ado, 0) / 0.01] # prior: ado < 0
diff = np.append(diff, logprior_res)
return diff
res = least_squares(fun, [0.6, -15, 0.4, 13, -0.5])
return res.x
def main():
parser = argparse.ArgumentParser()
parser.add_argument('json', type=str, help='data to plot')
parser.add_argument('--run-ids', type=int, nargs='+', default=None,
help='plot details for these run ids')
parser.add_argument('--out-dir', type=str, default=None, help='output directory for plots')
args = parser.parse_args()
out_dir = args.out_dir
run_ids = args.run_ids
    if out_dir is not None:
        os.makedirs(out_dir, exist_ok=True)
        if run_ids is not None:
            os.makedirs(os.path.join(out_dir, 'detailed'), exist_ok=True)
with open(args.json, 'r') as f:
data = json.load(f)
all_params = []
durations = []
plot_id = 0
for run_id, d in enumerate(data):
duration = np.array(d['duration'])
elevation = np.array(d['elevation'])
# High peaks (top of each row)
hpids, _ = find_peaks(elevation,
prominence=np.std(elevation),
distance=len(elevation) // NUM_ROWS // 2)
hpids = np.append(hpids, [len(duration)-1])
if len(hpids) != NUM_ROWS:
continue
# Low peaks (bottom of each row)
lpids, _ = find_peaks(-elevation,
prominence=np.std(elevation),
distance=len(elevation) // NUM_ROWS // 2)
lpids = np.append([0], lpids)
if len(lpids) != NUM_ROWS:
continue
        # go over every up-and-down cycle;
        # row 37 doesn't count as we do not go down afterwards.
ts = []
es = []
ps = []
for i in range(NUM_ROWS-1):
start = lpids[i]
mid = hpids[i]
end = lpids[i+1]
            t = duration[start:end].copy()  # copy: avoid mutating the run data in place
            e = elevation[start:end].copy()
            # normalize: t = 0 at the top of the row, e = 0 at its base
            t -= t[mid - start]
            e -= e[0]
ps.append(find_piecewise_linear_params(t, e))
ts += t.tolist()
es += e.tolist()
ts = np.array(ts)
es = np.array(es)
ps = np.array(ps)
        if run_ids is not None and run_id in run_ids:
fig, ax = plt.subplots(figsize=(4,4))
aup1, tau12, aup2, bup2, ado = find_piecewise_linear_params(ts, es)
t_ = np.linspace(np.min(ts), np.max(ts), 1000)
e_ = piecewise_linear(t_, aup1, tau12, aup2, bup2, ado)
ax.plot(ts, es, '+', color='C0')
ax.plot(t_, e_, '-', color='C1')
ax.plot([tau12], [bup2 + aup2 * tau12], 'ok')
            ax.set_xlabel(r'$t$ (s)')
ax.set_ylabel('elevation (m)')
ax.set_xlim(-30, 30)
ax.set_ylim(0, 14)
ax.set_title(f"{int(duration[-1]//60)} minutes {int(duration[-1]%60)} sec")
plt.tight_layout()
if out_dir is None:
plt.show()
else:
plt.savefig(os.path.join(out_dir, 'detailed', f"{plot_id:03d}.svg"))
plt.close(fig)
plot_id += 1
all_params.append(ps.copy())
durations.append(duration[-1])
durations = np.array(durations)
all_params = np.array(all_params) # shape (nsamples, nrows, nparams)
order = np.argsort(durations)
durations = durations[order]
all_params = all_params[order]
    aup1, tau12, aup2, bup2, ado = np.transpose(all_params, (2, 0, 1))  # each: shape (nsamples, nrows)
bup1 = bup2 + tau12 * (aup2 - aup1)
bdo = bup2
tzone1 = bup1 / aup1 + tau12
tzone2 = -tau12
tzone3 = -bdo / ado
tzones = [
np.mean(tzone1),
np.mean(tzone2),
np.mean(tzone3)
]
tzones_fast = [
np.mean(tzone1[:10]),
np.mean(tzone2[:10]),
np.mean(tzone3[:10])
]
fig, ax = plt.subplots()
ax.bar(np.arange(1,4)-0.15, tzones, width=0.3, label='all runs',
hatch='///', facecolor='none', edgecolor='C0', linewidth=2)
ax.bar(np.arange(1,4)+0.15, tzones_fast, width=0.3, label='faster runs',
hatch='\\\\\\', facecolor='none', edgecolor='C1', linewidth=2)
ax.set_ylabel('duration (s)')
ax.set_xticks([1, 2, 3])
ax.set_xticklabels([f'zone {i}' for i in range(1,4)])
ax.legend()
plt.tight_layout()
if out_dir is None:
plt.show()
else:
plt.savefig(os.path.join(out_dir, "time_zones_comp.svg"))
plt.close(fig)
if __name__ == '__main__':
main()