Lume TLF+ToF sensor: a controlled bench characterization (2026-04-18 → 2026-04-19) plus continuing desktop-dashboard recordings (2026-04-22 onward). Annotations mark each operator-driven condition change.
Premise: this is clean tap water, so the true fluorescence signal should be constant across the record. The twist-shake condition is the bubble-free baseline: any residual TLF variation in those periods reflects sensor, optical, or temperature drift, not bubble scatter. A multivariate OLS model is fit only on twist-shake samples (model_tof_raw < 45), after three pre-filters have removed rows within ±2 min of any operator annotation, points flagged by a Hampel filter on TLF, and the top 2% by |residual| after a first-pass OLS. Features: ToF, temp_c, temp_c² (thermal nonlinearity), ToF×temp_c (interaction), and one-hot session dummies (sensor reseating between recordings shifts the intercept). Note: turb_raw and hdc2080_temp_c are dropped because both are byte-identical to other columns in this dataset (turb_raw == model_tof_raw, hdc2080_temp_c == model_temp_c), so they are not independent measurements. The fitted bias is subtracted from raw TLF (mean-preserving), and a centred rolling-mean low-pass filter over LP_WINDOW samples is applied on top.
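The correction described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual fitting code: the function names are hypothetical, the annotation and Hampel pre-filters are assumed to have been applied already, and "mean-preserving" is interpreted here as subtracting only the deviation of the fitted bias from its own mean. The rolling mean assumes an odd window length.

```python
import numpy as np

def design_matrix(tof, temp_c, session_ids):
    """Columns: intercept, ToF, temp, temp^2, ToF*temp, then session dummies."""
    tof = np.asarray(tof, dtype=float)
    temp_c = np.asarray(temp_c, dtype=float)
    session_ids = np.asarray(session_ids)
    cols = [np.ones_like(tof), tof, temp_c, temp_c ** 2, tof * temp_c]
    # Drop the first session's dummy so the dummies are not collinear with
    # the intercept (an assumption; the text only says "one-hot").
    for s in sorted(set(session_ids.tolist()))[1:]:
        cols.append((session_ids == s).astype(float))
    return np.column_stack(cols)

def fit_bias(tlf, X, trim_frac=0.02):
    """Two-pass OLS: fit, drop the top trim_frac of rows by |residual|, refit."""
    tlf = np.asarray(tlf, dtype=float)
    coef, *_ = np.linalg.lstsq(X, tlf, rcond=None)
    abs_resid = np.abs(tlf - X @ coef)
    keep = abs_resid <= np.quantile(abs_resid, 1.0 - trim_frac)
    coef, *_ = np.linalg.lstsq(X[keep], tlf[keep], rcond=None)
    return coef

def correct(tlf, X, coef):
    """Subtract the fitted bias while leaving the overall TLF mean unchanged."""
    fitted = X @ coef
    return np.asarray(tlf, dtype=float) - (fitted - fitted.mean())

def centred_rolling_mean(x, window):
    """Centred moving average; the window shrinks at the edges (odd window assumed)."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window)
    return np.convolve(x, kernel, mode="same") / np.convolve(np.ones_like(x), kernel, mode="same")
```

In use, the design matrix would be built from the twist-shake rows only, the coefficients fit there, and the correction then applied to the series of interest before smoothing with LP_WINDOW.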
A single Lume sensor sits in a bench fixture, sampling at one reading per minute (~0.017 Hz). Whenever the operator changes the sample condition (lifts the sensor into air, returns it to still water, or twist-shakes it), an annotation is written inline to the data stream. Recording is continuous across sessions — the chart shows the original 2026-04-18 → 2026-04-19 run plus subsequent observations from the same fixture appended via the desktop dashboard. Three signals are plotted in stacked panels:
model_tof_raw is essentially bimodal: it sits near 25 whenever the sensor is in water and jumps to ~70 whenever the sensor is in air. Transitions between the two states are effectively step functions — a fixed threshold near 45 separates the two conditions with no ambiguity on this dataset.
Air-phase sipm_mon2_raw values (roughly 2000–2600) overlap heavily with still-water values. The signal is temperature- and condition-sensitive, so TLF alone cannot serve as an air/water gate; it needs the ToF channel as a prior.
In every water-no-shake region the TLF reads noticeably higher than it does immediately after a twist-shake. The ToF channel does shift — its median moves from ~24.5 in post-shake water to ~27.1 in bubbly water, and its standard deviation inflates roughly 6× (0.6 → 3.4) — but it stays well below the ~45 air/water threshold throughout, so a binary water-vs-air gate built on ToF won’t see the bubbles. The TLF rises far more sharply, consistent with small bubbles scattering extra light into the SiPM. Note: the dataset has a column called turb_raw, but it’s byte-identical to model_tof_raw on every row of this CSV — this sensor doesn’t carry a separate independent turbidity channel, so all the “turbidity” signal we see is just the ToF channel.
Each twist-shake event produces an immediate drop in sipm_mon2_raw to well below the still-water reading that preceded it; typical post-shake TLF values across the recording fall in the ~480–1300 range. Because the water is clean tap water (no measurable real turbidity or fluorophore content), the post-shake level is the true bubble-free optical baseline of the sensor in water. That baseline is not perfectly repeatable across shake events. With turbidity ruled out by the clean-water condition, the residual variation must come from sensor-side effects: partial re-bubbling between annotations, sensor seating angle, optical-window contact, or temperature-driven drift in the SiPM channel. The bubble-free TLF level is therefore best treated as a recent reference value rather than a fixed threshold.
A two-stage decision makes physical sense: (a) use model_tof_raw as a hard gate for air vs. water, and (b) within water, treat the TLF reading as an upper bound that can be inflated by trapped bubbles. A periodic agitation cycle or a bubble-scrub routine before measurement would make the TLF reading reflect the water itself rather than the air trapped against the window. Because the absolute TLF baseline drifts over time, bubble-detection logic should reference a recent post-disturbance baseline rather than a hard-coded threshold.
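As a rough sketch of the two-stage decision above: the class name, the baseline length, and the inflate_ratio parameter are all hypothetical, and the median of recent post-disturbance readings is just one plausible way to define a "recent baseline". Only the ~45 ToF threshold comes from the bench data.

```python
from collections import deque
from statistics import median

TOF_AIR_WATER_THRESHOLD = 45.0  # bench data: water sits near 25, air near 70

class TwoStageGate:
    """Stage (a): ToF hard gate. Stage (b): TLF vs. a recent clean baseline."""

    def __init__(self, baseline_len=10, inflate_ratio=1.3):
        # Most recent post-disturbance (bubble-free) TLF readings.
        self.recent_clean = deque(maxlen=baseline_len)
        # Hypothetical tolerance: how far above baseline counts as inflated.
        self.inflate_ratio = inflate_ratio

    def note_post_disturbance(self, tlf):
        """Record a TLF reading taken right after an agitation or scrub."""
        self.recent_clean.append(tlf)

    def classify(self, tof, tlf):
        if tof >= TOF_AIR_WATER_THRESHOLD:
            return "air"
        if not self.recent_clean:
            return "water_unreferenced"  # no baseline yet, cannot judge bubbles
        baseline = median(self.recent_clean)
        if tlf > self.inflate_ratio * baseline:
            return "water_bubble_suspect"
        return "water_clean"
```

Because the baseline is a bounded queue of recent readings, it follows slow drift in the sensor rather than relying on a hard-coded TLF level.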
The field problem. In deployment we cannot inspect the optical window; we only see what the sensor reports. ToF reliably tells us the sensor is in water (a hard threshold around 45 separates water from air with no ambiguity in this dataset), but within the water band ToF moves only modestly when bubbles are trapped against the window. A bubble-inflated TLF reading would be misread by the E. coli classifier as biological signal, producing false positives in microbiologically clean water.
What makes this dataset uniquely useful. The water in the recording is clean tap water with no measurable real turbidity and no real fluorophore content. So any TLF excursion or sub-air-threshold ToF excursion observed while submerged must be coming from bubbles trapped on the window — not from microbes, not from suspended sediment. The numbers below were computed from the 2026-04-18 → 2026-04-19 portion of the recording, which has the densest controlled water-no-shake / twist-shake comparisons. Note: turb_raw in the CSV is byte-identical to model_tof_raw — this sensor does not carry a separate independent turbidity channel, so what was previously called the “turbidity” signal is just ToF in its water band:
Three signals separate bubbly water from bubble-free water:
Run the existing E. coli model exactly as today, then attach a bubble-confidence layer that gates the prediction. Because there is no independent turbidity channel, the gate has to be derived from the same two channels (ToF and TLF) that the prediction itself depends on:
ToF deviation: track a baseline of model_tof_raw while ToF reports water (< ~45). If the current reading deviates from that baseline by more than ~1.5 σ while still in the water band, raise the bubble flag.
TLF variance: track the short-term standard deviation of sipm_mon2_raw. If it exceeds the per-site post-shake std (call it σ0) by ~1.5×, raise the bubble flag. This is the only signal that is reasonably independent of the ToF channel.
Real source water has actual particulate scatter from suspended sediment, organic matter, and biota, and ToF responds to particulate scatter the same way it responds to bubbles (both reduce the clean returned pulse from the optical window). Deploying these thresholds verbatim would mistake real turbidity for bubbles. The fix is to calibrate per site: on first deployment, capture (a) the natural ToF baseline of the source under quiescent conditions, and (b) the post-agitation baseline. The delta between these two states is the bubble signal, the same quantity this bench dataset measures, just on top of a non-zero source-water ToF floor instead of zero. A more robust long-term fix is to add an independent turbidity channel to the sensor; the bubble-vs-sediment ambiguity is fundamental to a single-optical-channel design.
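The two gates and the per-site calibration could be combined along these lines. This is a sketch under stated assumptions: the function names and the dictionary layout are hypothetical, the baseline is taken as the mean and sample standard deviation of post-agitation readings, and "exceeds σ0 by ~1.5×" is read as "recent std > 1.5 · σ0".

```python
from statistics import mean, stdev

TOF_WATER_MAX = 45.0  # air/water threshold from the bench data
K = 1.5               # the ~1.5 multiplier quoted for both gates

def calibrate(post_shake_tof, post_shake_tlf):
    """Per-site calibration from post-agitation (bubble-free) readings."""
    return {
        "tof_mean": mean(post_shake_tof),
        "tof_sigma": stdev(post_shake_tof),
        "tlf_sigma0": stdev(post_shake_tlf),
    }

def bubble_flag(cal, tof_now, recent_tlf, k=K):
    """Return True/False in water, or None when ToF says the sensor is in air."""
    if tof_now >= TOF_WATER_MAX:
        return None  # bubble gating only applies inside the water band
    tof_deviates = abs(tof_now - cal["tof_mean"]) > k * cal["tof_sigma"]
    tlf_noisy = stdev(recent_tlf) > k * cal["tlf_sigma0"]
    return tof_deviates or tlf_noisy

def gated_prediction(model_output, flag):
    """Attach the gate to an unchanged model output rather than altering it."""
    return {"prediction": model_output, "bubble_suspect": bool(flag)}
```

Keeping the gate outside the model means the E. coli prediction is computed exactly as before and is only annotated, which matches the no-retraining intent of the gating approach.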
The existing E. coli model uses TLF + temperature + ToF and has no input that can distinguish bubble-driven TLF from microbial TLF. Adding ToF-deviation and TLF-variance as gates (rather than features, which would require retraining) is the lightest-weight path to deploying a bubble-aware classifier without changing the model itself.
All three panels share one timebase. Annotation rows in the source CSV contain only a timestamp and a note field; each annotation marks the start of a section that runs until the next annotation. The three plotted categories merge the raw labels: “Air”/“air” → air, “water no shake” → water no shake, “twist shake”/“submerged”/“air cleared” → twist shake. Time gaps longer than 1 hour are visually compressed in the chart with a // break marker.
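The label merging and the "runs until the next annotation" rule can be illustrated with a short sketch. The function names are hypothetical, annotations are assumed to be sorted by time, and lowercasing the note before lookup is a slight generalisation of the listed “Air”/“air” case.

```python
from bisect import bisect_right
from datetime import datetime

# Raw annotation notes (lowercased) mapped to the three plotted categories.
CATEGORY = {
    "air": "air",
    "water no shake": "water no shake",
    "twist shake": "twist shake",
    "submerged": "twist shake",
    "air cleared": "twist shake",
}

def normalise(note):
    """Map a raw note to its plotted category, or None if unrecognised."""
    return CATEGORY.get(note.strip().lower())

def label_samples(sample_times, annotations):
    """Give each sample the category of the latest annotation at or before it.

    annotations: list of (datetime, note) tuples sorted by time.
    Samples before the first annotation get None.
    """
    starts = [t for t, _ in annotations]
    labels = []
    for ts in sample_times:
        i = bisect_right(starts, ts) - 1
        labels.append(normalise(annotations[i][1]) if i >= 0 else None)
    return labels
```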
Source: data.csv in this project. The interactive chart fetches the CSV directly on each page load via Plotly — refresh to pick up new data. A static fallback (plot.png) is also produced by python3 plot.py. New rows from the Lume desktop dashboard are appended via ./update.sh, which pulls lumelog.csv from SweetSenseInc/lume_desktop_dashboard (branch pc-sandbox), appends only timestamps strictly newer than the latest in data.csv, regenerates the static plot, and redeploys.
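The "append only strictly newer timestamps" rule might look like the following in Python. This is not the contents of update.sh, which is not shown here; the function name, the "timestamp" column name, and the ISO 8601 timestamp format are all assumptions made for illustration.

```python
from datetime import datetime

def rows_to_append(existing_rows, incoming_rows, ts_key="timestamp"):
    """Return incoming rows whose timestamp is strictly newer than the latest existing one.

    Rows are dicts (for example from csv.DictReader); ts_key is an assumed column name.
    """
    latest = max(
        (datetime.fromisoformat(r[ts_key]) for r in existing_rows),
        default=None,
    )
    if latest is None:
        return list(incoming_rows)  # nothing recorded yet, so take everything
    return [r for r in incoming_rows if datetime.fromisoformat(r[ts_key]) > latest]
```

Using a strict comparison means re-running the update against an unchanged upstream file appends nothing, so the step is safe to repeat.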