Lume Bench Annotations

Lume TLF+ToF sensor: a controlled bench characterization (2026-04-18 → 2026-04-19) plus continuing desktop-dashboard recordings (2026-04-22 onward). Annotations mark each operator-driven condition change.

[Interactive time-series chart. Legend: air (sensor exposed) · water, no shake · twist shake / submerged. Gaps > 1 h are compressed.]

TLF vs Temperature

scatter, coloured by annotation; dashed lines are per-category OLS fits


TLF vs ToF

scatter, coloured by annotation; dashed lines are per-category OLS fits


3D surface — TLF as f(ToF, Temperature)

binned median TLF on a ToF×Temp grid (twist-shake + production combo); coloured points overlay all categories


Bias-corrected TLF

Premise: this is clean tap water, so the true fluorescence signal should be constant across the record. The twist-shake condition is the bubble-free baseline: any residual TLF variation in those periods reflects sensor, optical, or temperature drift rather than bubble scatter. A correct bias model should therefore drive σ(twist-shake TLF) toward zero using only quantities the deployed sensor can compute on its own.

The features are strictly deployment-realistic: ToF, temp_c, and ToF×temp_c (matching the production CFU regression channel set), plus a single global burn-in transient exp(−t_{since_power_on} / τ) with τ = 60 min. The burn-in term has one global coefficient, learned at calibration and applied universally, on the assumption that the firmware tracks time since power-on. Here that time is proxied from gaps longer than 1 h in the recording, the same signal a power cycle would produce.

Earlier drafts of this fit added per-session intercept dummies and per-session burn-in amplitudes, which drove σ(corrected) lower on this dataset. Those are calibration-time artefacts: a deployed sensor in a new session has no way to compute a session-specific dummy, and per-session burn-in amplitudes cannot be transferred either. Likewise, there are no per-segment data cutoffs (e.g. dropping the opening of a segment to avoid a labelling-error spike), because those are post-hoc edits the field cannot replicate. The honest cost of removing them is that residual cross-session offsets remain in corrected_TLF (sensor reseating, optical-window contact, calibration drift over weeks); the per-session diagnostic block below shows their magnitude.

Upstream preprocessing matches production: rows are filtered to led_raw == 512 and sipm_bias_raw ∈ [2980, 3020] before fitting, because the TLF response is bias-dependent and the CFU/NTU calibration only applies in that operating mode. The fit uses twist-shake samples only, restricted to the water band (model_tof_raw < 45). Three further calibration-time pre-filters, which do not need to run in deployment, exclude rows within ±2 min of any operator annotation, points flagged by a Hampel filter on TLF, and the top 2% by |residual| after a first-pass OLS.

Note: turb_raw and hdc2080_temp_c are dropped because both are byte-identical to other columns in this dataset (turb_raw == model_tof_raw, hdc2080_temp_c == model_temp_c), so they are not independent measurements. The fitted bias is subtracted from raw TLF (mean-preserving), and a centred rolling-mean low-pass filter over LP_WINDOW samples is layered on top.
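
The bias model described above can be sketched in plain Python. This is a minimal illustration, not the project's actual fitting code: the function names are hypothetical, the OLS is solved through the normal equations for self-containment, and "mean-preserving" is assumed to mean that the mean of the fitted bias is added back after subtraction.

```python
import math
from statistics import fmean

TAU_MIN = 60.0  # burn-in time constant quoted in the text, in minutes


def design_row(tof, temp_c, minutes_since_power_on):
    """Intercept, ToF, temperature, ToF x temperature, burn-in transient."""
    burn_in = math.exp(-minutes_since_power_on / TAU_MIN)
    return [1.0, tof, temp_c, tof * temp_c, burn_in]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def correct_tlf(rows, tlf, beta):
    """Subtract the fitted bias; add back its mean (assumed 'mean-preserving')."""
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    offset = fmean(fitted)
    return [raw - f + offset for raw, f in zip(tlf, fitted)]


def rolling_mean_centred(x, window):
    """Centred rolling mean; the window shrinks at the edges of the series."""
    half = window // 2
    return [
        fmean(x[max(0, i - half): min(len(x), i + half + 1)])
        for i in range(len(x))
    ]
```

In use, beta would be fitted on the filtered twist-shake rows only and then applied to every row through correct_tlf, followed by rolling_mean_centred with window set to LP_WINDOW. A library least-squares routine would be the more usual choice on real data, since the normal equations lose precision when ToF and temperature vary over a narrow range.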


Test Setup

A single Lume sensor sits in a bench fixture, sampling at one reading per minute (~0.017 Hz). Whenever the operator changes the sample condition (lifts the sensor into air, returns it to still water, or twist-shakes it), an annotation is written inline to the data stream. Recording is continuous across sessions — the chart shows the original 2026-04-18 → 2026-04-19 run plus subsequent observations from the same fixture appended via the desktop dashboard. Three signals are plotted in stacked panels:

sipm_mon2_raw: SiPM monitor channel on the TLF (tryptophan-like fluorescence) detector; proportional to the fluorescence signal.
model_temp_c: Model-input temperature (°C) derived from the HDC2080.
model_tof_raw: Raw signal from the ToF (time-of-flight) distance sensor that looks into the sample cell.

Conditions

Air — sensor lifted out of the cell and exposed to ambient air.
Water, no shake — sensor submerged in tap water with no deliberate agitation.
Twist shake / submerged — sensor submerged and twist-shaken to dislodge bubbles. The initial “submerged and air cleared” period is grouped with this category because it represents the same intent: a cleanly wetted optical window with no trapped air.

Findings

1. The ToF channel is a clean air-vs-water discriminator.

model_tof_raw is essentially bimodal: it sits near 25 whenever the sensor is in water and jumps to ~70 whenever the sensor is in air. Transitions between the two states are effectively step functions — a fixed threshold near 45 separates the two conditions with no ambiguity on this dataset.

2. The TLF channel alone cannot separate air from water.

Air-phase sipm_mon2_raw values (roughly 2000–2600) overlap heavily with still-water values. The signal is temperature- and condition-sensitive, so TLF cannot gate air versus water on its own; any such gate needs the ToF channel as a prior.

3. Still water traps micro-bubbles that drive the TLF strongly and shift the ToF only modestly.

In every water-no-shake region the TLF reads noticeably higher than it does immediately after a twist-shake. The ToF channel does shift — its median moves from ~24.5 in post-shake water to ~27.1 in bubbly water, and its standard deviation inflates roughly 6× (0.6 → 3.4) — but it stays well below the ~45 air/water threshold throughout, so a binary water-vs-air gate built on ToF won’t see the bubbles. The TLF rises far more sharply, consistent with small bubbles scattering extra light into the SiPM. Note: the dataset has a column called turb_raw, but it’s byte-identical to model_tof_raw on every row of this CSV — this sensor doesn’t carry a separate independent turbidity channel, so all the “turbidity” signal we see is just the ToF channel.

4. Twist-shaking reveals the true water-only TLF baseline.

Each twist-shake event produces an immediate drop in sipm_mon2_raw to well below the still-water reading that preceded it; typical post-shake TLF values across the recording fall in the ~480–1300 range. Because the water is clean tap water (no measurable real turbidity or fluorophore content), the post-shake level is the true bubble-free optical baseline of the sensor in water. That baseline is not perfectly repeatable across shake events. With turbidity ruled out by the clean-water condition, the residual variation must come from sensor-side effects: partial re-bubbling between annotations, sensor seating angle, optical-window contact, or temperature-driven drift in the SiPM channel. The bubble-free TLF level is therefore best treated as a recent reference value rather than a fixed threshold.

Implication for the classifier

A two-stage decision makes physical sense: (a) use model_tof_raw as a hard gate for air vs. water, and (b) within water, treat the TLF reading as an upper bound that can be inflated by trapped bubbles. A periodic agitation cycle or a bubble-scrub routine before measurement would make the TLF reading reflect the water itself rather than the air trapped against the window. Because the absolute TLF baseline drifts over time, bubble-detection logic should reference a recent post-disturbance baseline rather than a hard-coded threshold.
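
As a rough sketch of that two-stage decision: the threshold of 45 comes from this dataset, while the function names, the returned fields, and the 1.5× ratio against the recent baseline are illustrative assumptions rather than tuned values.

```python
AIR_WATER_TOF_THRESHOLD = 45  # approximate value reported for this dataset


def classify_medium(model_tof_raw):
    """Stage (a): hard air/water gate on the ToF channel."""
    return "water" if model_tof_raw < AIR_WATER_TOF_THRESHOLD else "air"


def interpret_reading(model_tof_raw, sipm_mon2_raw, recent_baseline=None, ratio=1.5):
    """Stage (b): in water, treat TLF as an upper bound on the true signal.

    recent_baseline is a recent post-disturbance TLF reading; ratio is a
    hypothetical multiplier above which the reading is marked bubble-suspect.
    """
    if classify_medium(model_tof_raw) == "air":
        return {"medium": "air", "tlf_upper_bound": None, "bubble_suspect": None}
    suspect = None
    if recent_baseline is not None:
        suspect = sipm_mon2_raw > ratio * recent_baseline
    return {
        "medium": "water",
        "tlf_upper_bound": sipm_mon2_raw,
        "bubble_suspect": suspect,
    }
```

With the medians reported for the bench run, interpret_reading(27.1, 1944, recent_baseline=1008) marks the reading as bubble-suspect, while a post-shake reading near 1008 does not.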

Toward a bubble-aware field algorithm

The field problem. In deployment we cannot inspect the optical window — we only see what the sensor reports. ToF reliably tells us the sensor is in water (a hard threshold around 45 separates water from air with no ambiguity in this dataset), but within the water band ToF moves only modestly when bubbles trap against the window. A bubble-inflated TLF reading would be misread by the E. coli classifier as biological signal, producing false positives in microbiologically clean water.

What makes this dataset uniquely useful. The water in the recording is clean tap water with no measurable real turbidity and no real fluorophore content. So any TLF excursion or sub-air-threshold ToF excursion observed while submerged must be coming from bubbles trapped on the window — not from microbes, not from suspended sediment. The numbers below were computed from the 2026-04-18 → 2026-04-19 portion of the recording, which has the densest controlled water-no-shake / twist-shake comparisons. Note: turb_raw in the CSV is byte-identical to model_tof_raw — this sensor does not carry a separate independent turbidity channel, so what was previously called the “turbidity” signal is just ToF in its water band:

TLF in water + bubbles: median 1944 (range 955–2629)
TLF in water, post twist-shake: median 1008 (range 465–1295)
ToF in water + bubbles: median 27.1 (std 3.4)
ToF post twist-shake: median 24.5 (std 0.6)
5-min rolling std of TLF, with bubbles: 1.8
5-min rolling std of TLF, post-shake: 0.8
TLF vs ToF correlation, with bubbles: |r| = 0.98 — ToF tracks the same bubble events that drive TLF, just much more weakly in absolute magnitude
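
Statistics of this kind can be computed per category along the following lines. This is a sketch only: the (category, tlf, tof) row layout and the function names are assumptions, and the 5-min rolling std is left out for brevity.

```python
import math
from statistics import fmean, median, stdev


def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError if either series is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarise(rows):
    """rows: iterable of (category, tlf, tof) tuples, one per reading."""
    by_category = {}
    for category, tlf, tof in rows:
        by_category.setdefault(category, []).append((tlf, tof))
    summary = {}
    for category, values in by_category.items():
        tlf = [v[0] for v in values]
        tof = [v[1] for v in values]
        summary[category] = {
            "tlf_median": median(tlf),
            "tlf_range": (min(tlf), max(tlf)),
            "tof_median": median(tof),
            "tof_std": stdev(tof),  # sample standard deviation
            "abs_r": abs(pearson_r(tlf, tof)),
        }
    return summary
```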

Three signals separate bubbly water from bubble-free water:

ToF deviation within the water band. ToF doesn’t cross its air threshold during bubble periods, but it shifts measurably: median rises from 24.5 to 27.1 (~10%) and its std inflates roughly 6× (0.6 → 3.4). A binary water-vs-air gate misses this; a deviation-from-baseline check inside the water band catches it.
TLF temporal variance jumps. Bubbles form and detach on second-to-minute timescales, producing a noisier TLF trace. The 5-min rolling standard deviation of TLF is roughly 2× higher when bubbles are present.
TLF magnitude jumps. Bubble TLF roughly doubles the bubble-free baseline. The bubble-free baseline itself drifts between sensor reseatings (here 600–1300), so this is most useful as a relative signal against a recent post-disturbance reading rather than a fixed threshold.

Proposed bubble-aware classifier

Run the existing E. coli model exactly as today, then attach a bubble-confidence layer that gates the prediction. Because there is no independent turbidity channel, the gate has to be derived from the same two channels (ToF and TLF) that the prediction itself depends on:

1. ToF-deviation flag (inside the water band): Maintain a rolling baseline (e.g. 24-hour 5th-percentile) of model_tof_raw while ToF reports water (below ~45). If the current reading deviates from that baseline by more than ~1.5 σ while still in the water band, raise the bubble flag.
2. TLF-variance flag: Compute the rolling 5-min std of sipm_mon2_raw. If it exceeds ~1.5× the per-site post-shake std (call it σ₀), raise the bubble flag. This is the only signal that’s reasonably independent of the ToF channel.
3. Confidence downgrade: If either flag is raised, do not emit an alert from the E. coli prediction. Either suppress the reading entirely, or report it with explicit “low-confidence: bubble-suspect” metadata so the consuming dashboard can grey-out or annotate the point.
4. Periodic re-baseline: Whenever a known-disturbance event occurs (programmed agitation, a flow surge, a maintenance touch), record the immediate-post-event ToF and TLF as fresh post-shake baselines. This re-anchors the deviation and variance thresholds and corrects for sensor reseating drift.
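
The four steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not firmware: the class and function names are hypothetical, σ in step 1 is taken to be the post-shake ToF std, window lengths assume one reading per minute, and re-baselining is modelled simply as clearing the rolling history so it rebuilds from post-event readings.

```python
import math
from collections import deque
from statistics import stdev

WATER_MAX_TOF = 45       # approximate air/water threshold from the text
BASELINE_SAMPLES = 1440  # 24 h at one reading per minute
STD_SAMPLES = 5          # 5 min at one reading per minute


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


class BubbleGate:
    def __init__(self, tof_sigma0, tlf_sigma0, k=1.5):
        self.tof_sigma0 = tof_sigma0  # assumed: post-shake ToF std
        self.tlf_sigma0 = tlf_sigma0  # per-site post-shake 5-min TLF std
        self.k = k                    # ~1.5 in the text; illustrative here
        self.tof_history = deque(maxlen=BASELINE_SAMPLES)
        self.tlf_recent = deque(maxlen=STD_SAMPLES)

    def rebaseline(self):
        """Step 4: forget history after a known disturbance event."""
        self.tof_history.clear()
        self.tlf_recent.clear()

    def update(self, tof, tlf):
        """Steps 1 and 2: return the flags for one (ToF, TLF) reading."""
        if tof >= WATER_MAX_TOF:  # in air, so the bubble logic does not apply
            self.tlf_recent.clear()
            return {"in_water": False, "bubble_suspect": False}
        self.tof_history.append(tof)
        self.tlf_recent.append(tlf)
        baseline = percentile(self.tof_history, 5)
        tof_flag = abs(tof - baseline) > self.k * self.tof_sigma0
        tlf_flag = (
            len(self.tlf_recent) == STD_SAMPLES
            and stdev(self.tlf_recent) > self.k * self.tlf_sigma0
        )
        return {
            "in_water": True,
            "tof_flag": tof_flag,
            "tlf_flag": tlf_flag,
            "bubble_suspect": tof_flag or tlf_flag,
        }


def gate_prediction(ecoli_prediction, status):
    """Step 3: keep the model output but downgrade it when bubbles are suspected."""
    if status["bubble_suspect"]:
        return {
            "prediction": ecoli_prediction,
            "confidence": "low-confidence: bubble-suspect",
            "alert_allowed": False,
        }
    return {
        "prediction": ecoli_prediction,
        "confidence": "normal",
        "alert_allowed": status["in_water"],
    }
```

The gate wraps the model output rather than feeding it new features, which matches the stated goal of leaving the existing E. coli model unchanged.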

Calibration caveat for real water sources

Real source water has actual particulate scatter from suspended sediment, organic matter, and biota — and ToF responds to particulate scatter the same way it responds to bubbles (both reduce the clean returned pulse from the optical window). Deploying these thresholds verbatim would mistake real turbidity for bubbles. The fix is to calibrate per site: on first deployment, capture (a) the natural ToF baseline of the source under quiescent conditions, and (b) the post-agitation baseline. The delta between these two states is the bubble signal — the same quantity this bench dataset measures, just on top of a non-zero source-water ToF floor instead of zero. A more robust long-term fix is to add an independent turbidity channel to the sensor; the bubble-vs-sediment ambiguity is fundamental to a single-optical-channel design.
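
The per-site calibration could be reduced to two medians, as in this minimal sketch. The function name and return fields are hypothetical, and taking the post-agitation median as the source-water floor is an assumption about how the two captures would be used.

```python
from statistics import median


def site_bubble_delta(quiescent_tof, post_agitation_tof):
    """Estimate the site's ToF floor and the bubble-attributable shift.

    quiescent_tof: ToF readings from the source under quiescent conditions.
    post_agitation_tof: ToF readings taken right after agitation.
    """
    floor = median(post_agitation_tof)  # assumed bubble-free source-water level
    return {
        "tof_floor": floor,
        "bubble_delta": median(quiescent_tof) - floor,
    }
```

On the bench medians quoted earlier (27.1 quiescent, 24.5 post-shake) this gives a delta of about 2.6 ToF counts.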

The existing E. coli model uses TLF + temperature + ToF and has no input that can distinguish bubble-driven TLF from microbial TLF. Adding ToF-deviation and TLF-variance as gates (rather than features, which would require retraining) is the lightest-weight path to deploying a bubble-aware classifier without changing the model itself.

Data & Reproducibility

All three panels share one timebase. Annotation rows in the source CSV contain only a timestamp and a note field; each annotation marks the start of a section that runs until the next annotation. The three plotted categories merge the raw labels: “Air”/“air” → air, “water no shake” → water no shake, “twist shake”/“submerged”/“air cleared” → twist shake. Time gaps longer than 1 hour are visually compressed in the chart with a // break marker.
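
The label merge and the "runs until the next annotation" rule can be sketched as below. The exact-match lookup on lower-cased notes is an assumption (free-text notes might need looser matching), and the function names are hypothetical.

```python
from bisect import bisect_right

# Raw annotation label (lower-cased) -> plotted category, per the mapping above.
CATEGORY_BY_LABEL = {
    "air": "air",
    "water no shake": "water no shake",
    "twist shake": "twist shake",
    "submerged": "twist shake",
    "air cleared": "twist shake",
}


def merge_label(note):
    """Map a raw annotation note to a plotted category, or None if unknown."""
    return CATEGORY_BY_LABEL.get(note.strip().lower())


def label_samples(sample_times, annotations):
    """Assign each sample the category of the latest annotation at or before it.

    annotations: list of (timestamp, note) pairs sorted by timestamp.
    Samples recorded before the first annotation get None.
    """
    starts = [t for t, _ in annotations]
    labels = []
    for ts in sample_times:
        i = bisect_right(starts, ts) - 1
        labels.append(merge_label(annotations[i][1]) if i >= 0 else None)
    return labels
```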

Source: data.csv in this project. The interactive chart fetches the CSV directly on each page load via Plotly — refresh to pick up new data. A static fallback (plot.png) is also produced by python3 plot.py. New rows from the Lume desktop dashboard are appended via ./update.sh, which pulls lumelog.csv from SweetSenseInc/lume_desktop_dashboard (branch pc-sandbox), appends only timestamps strictly newer than the latest in data.csv, regenerates the static plot, and redeploys.
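
The "strictly newer" append rule is easy to state in code. The sketch below is not the contents of update.sh; the function name, the "timestamp" column name, and ISO-8601 timestamp strings are all assumptions.

```python
from datetime import datetime


def rows_to_append(existing_rows, incoming_rows, ts_field="timestamp"):
    """Return incoming rows whose timestamp is strictly newer than the latest
    timestamp already present. Rows are dicts, e.g. from csv.DictReader."""
    incoming_rows = list(incoming_rows)
    existing_rows = list(existing_rows)
    if not existing_rows:
        return incoming_rows
    latest = max(datetime.fromisoformat(r[ts_field]) for r in existing_rows)
    return [
        r for r in incoming_rows
        if datetime.fromisoformat(r[ts_field]) > latest
    ]
```

Comparing parsed datetimes rather than raw strings avoids surprises if the two files format their timestamps slightly differently.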