Strategy for dealing with intermittent data events if you can only poll?

cjard · May 15, 2023

Thought exercise for you folks

I have a device here, let's think of it like an electricity smart meter. It takes a reading every 30 seconds and sends it to a data warehouse. The warehouse has an API that I can poll to get the reading. The device has a clock and it timestamps the readings and I get the timestamp that the device recorded, but I have no idea if the time is accurate.

I'm aiming to poll for a new reading as soon as possible after it becomes available, without knowing exactly when it was taken or when it can first be fetched. Transmission delays and unreliability mean that the timestamp may be minutes old by the time I get it and, remember, I have no guarantee that the time is accurate, but it does seem useful for determining whether the value I've just read is the same as one I saw before. Observationally, one reading being late or missing doesn't seem to imply that the next one will be too.

For example, suppose the device takes readings at 10 and 40 seconds past the minute. The recent readings support this notion: the timestamps always end in 10 or 40 seconds. I might do a read now and get a reading timestamped 12:34:10. I might do another read in 30 seconds and get 12:34:40. Or I might get 12:34:10 again, thanks to delays or unreliability.

I don't want to just blindly read every 0.5s, as this will burn through my data allowance, but I'd like to get to a scenario where I can mostly get a reading within a few seconds of it becoming available.

--

I've employed one strategy in which the polling period is flexible: if I detect a re-read (the same reading as last time), I schedule another read in 2 seconds, then 4, then 8, then 16, and so on up to 128 seconds. If a device is offline I'll keep trying it every 128 seconds. If I read a different reading this time than last time, I shorten the re-read interval to somewhere between 16 and 29.825 seconds, depending on how many re-reads it took to get a new value after a repeat. Because the interval is less than 30 seconds, my read attempts gradually creep closer to the moment when the data becomes available.
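For anyone who wants to play with this, here is a minimal Python sketch of the back-off scheme described above. The 2 to 128 second back-off and the 16 to 29.825 second range come from the post; the class name `PollScheduler` and the linear mapping from retry count to the shortened interval are assumptions made purely for illustration, since the post does not say how that interval is chosen.

```python
from dataclasses import dataclass

MAX_BACKOFF = 128.0                 # cap on the retry delay, seconds (from the post)
FAST_MIN, FAST_MAX = 16.0, 29.825   # interval range after a new reading (from the post)


@dataclass
class PollScheduler:
    """Hypothetical helper: decides how long to wait before the next poll."""
    repeats: int = 0  # consecutive polls that returned an already-seen timestamp

    def next_delay(self, timestamp, last_timestamp):
        """Return the number of seconds to wait before polling again."""
        if timestamp == last_timestamp:
            # Re-read: back off 2, 4, 8, ... seconds, capped at 128.
            self.repeats += 1
            return min(2.0 ** self.repeats, MAX_BACKOFF)
        # New reading: wait a little under the 30 s reading period.
        # ASSUMPTION: each retry that was needed trims 2 s off the interval,
        # clamped to the 16 s floor. The real mapping is not given in the post.
        delay = max(FAST_MIN, FAST_MAX - 2.0 * self.repeats)
        self.repeats = 0
        return delay
```

The caller would sleep for the returned delay, poll, and pass in the newly fetched device timestamp together with the previous one.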

I'm also pondering a strategy where I look at the time delta between my own clock at the moment I got a new reading and the time the meter claims the reading was taken. If I take the mean or mode of those deltas, I should be able to predict when, according to my own clock, the next reading should be available. But I haven't coded anything for that.
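A rough Python sketch of that delta idea might look like the following. The function name `predict_next_available` and the shape of the `observations` list are made up for illustration, and it assumes the device really does read every 30 seconds. The measured delta lumps together the device's clock error, the transmission delay, and however late the poll itself was, so it is an upper bound on the true delay rather than an exact figure.

```python
from datetime import datetime, timedelta
from statistics import mean

READING_PERIOD = timedelta(seconds=30)  # reading interval stated in the post


def predict_next_available(observations, period=READING_PERIOD, average=mean):
    """Estimate, on the local clock, when the next reading should be visible.

    observations: list of (device_timestamp, local_time_first_seen) pairs,
    oldest first. Both values are datetime objects.
    """
    if not observations:
        raise ValueError("need at least one observation")
    # Delta between when we first saw each reading and when the device says it took it.
    deltas = [(seen - stamped).total_seconds() for stamped, seen in observations]
    typical = timedelta(seconds=average(deltas))
    last_stamped = observations[-1][0]
    # Next device reading is one period after the last; add the typical delta
    # to translate that into local time.
    return last_stamped + period + typical
```

Passing `average=statistics.median` instead of the default mean would make the estimate less sensitive to the occasional reading that arrives minutes late.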

What strategies have you come across, if any, or can envisage for this kind of scenario?

Skydiver · May 15, 2023

The rule of thumb is that you want a sampling rate of at least double the data rate. So if your data is supposed to arrive every 30 seconds, you'll want to be sampling every 15 seconds.
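To put rough numbers on a fixed polling period, here is a small Python sketch. The function name `fixed_poll_cost` is hypothetical, and the figures assume that a reading becomes available at a uniformly random point between two polls.

```python
def fixed_poll_cost(poll_period_s, reading_period_s=30.0):
    """Latency and request volume when polling at a fixed period.

    ASSUMPTION: new data appears at a uniformly random phase relative to the polls.
    """
    return {
        "worst_latency_s": poll_period_s,        # data lands just after a poll
        "mean_latency_s": poll_period_s / 2,     # average wait until the next poll
        "polls_per_hour": 3600 / poll_period_s,
        "polls_per_reading": reading_period_s / poll_period_s,
    }
```

Under that assumption, polling every 15 seconds costs two requests per reading and picks up new data about 7.5 seconds late on average, 15 seconds at worst.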

As for the problem of trying to get those samples as close to the edge of the event as possible, I would suggest phase shifting the polls more and more to the left (i.e. closer to the edge event) while keeping a success rate of 80% (or whatever percentage you want). If you've gone too far to the left, or the data shifts, then obviously shift to the right to compensate. The distance to shift may be interesting; perhaps use the natural logarithm as a basis?
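One way this phase shifting could be sketched in Python is shown below. The class name `PhaseShifter`, the window size, the fixed 0.5 second left step, and the use of `log1p` of the recent miss count for the right step are all assumptions for illustration; the logarithm is just one guess at how it might set the shift distance.

```python
import math
from collections import deque


class PhaseShifter:
    """Hypothetical helper: nudges the poll phase within each reading period."""

    def __init__(self, period=30.0, target=0.8, window=10, left_step=0.5):
        self.period = period          # seconds between device readings
        self.target = target          # desired fraction of polls that find new data
        self.left_step = left_step    # seconds to move earlier while on target
        self.results = deque(maxlen=window)  # recent hit/miss history
        self.offset = 0.0             # poll phase in seconds, within [0, period)

    def record(self, got_new_reading):
        """Record one poll result and return the updated phase offset."""
        self.results.append(bool(got_new_reading))
        misses = self.results.count(False)
        rate = 1.0 - misses / len(self.results)
        if rate >= self.target:
            shift = -self.left_step       # on target: creep left, towards the edge
        else:
            shift = math.log1p(misses)    # below target: back off to the right
        # Phase wraps around, so keep it inside one period.
        self.offset = (self.offset + shift) % self.period
        return self.offset
```

Each poll would then be scheduled at the start of the next period plus `offset` seconds, with `record` called on the outcome.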

JasinCole · May 16, 2023

Probably stupid to you, but when I think of data being read and adjustments being made based on that input for optimal output, I always think of a PID controller.
