Conversation
|
Looks definitely useful. It has some overlap with LocalStats (which actually can also be requested remotely). Maybe it makes sense to include the new metrics from this module there as well? Then you don't need to enable this for nodes you can reach with a client app. Also, probably good to enforce a minimum broadcast interval to avoid this becoming an airtime hog just like NeighborInfo was. |
I had no idea that LocalStats was remotely requestable. That's good to know. I do think it serves a different purpose though - this one is very much intended as a live diagnostic tool, which is why most of the metrics are windowed. The stuff in LocalStats is mostly on a since-boot basis, isn't it?
Fair point. IMO it should be imposed after a timeout though - because this is a diagnostic tool, I want to be able to set it to e.g. a two minute window specifically in order to get a better resolution on peaks. Would having the value automatically clamped to max(1 hour, configured interval) after e.g. 4 hours of runtime be acceptable? That way small values can stll be configured while actively testing, but it ensures the device doesn't remain in that state indefinitely. |
I believe it's only available using the Python CLI with
Correct. Indeed it's a slightly different purpose and periodic broadcasts can be valuable.
Sounds good to me. |
|
Not actually stale... could someone reopen this please? |
|
@NomDeTom Is your cumulative uptime thing split out somewhere? I'd like to make use of it from this module, in order to have it self-disable if nobody has actively indicated they want it for a while. |
|
We probably should not have two different big packets for stats (local stats already exists and is integrated in the apps and cli), some of these stats are also in device metrics. This diagnostic data burns us over and over by being large. |
This serves a different purpose (see #8455 (comment)). Local stats doesn't have the windowing stuff unfortunately, and thus can't do what this is intended to do. I totally get where you're coming from with stats bloat on the air though - that's why I was asking @NomDeTom about his cumulative uptime thing. Ideally, this module would be default-off, manually enabled when needed, and turn itself off again after a while. It's intended as a testing / diagnostic / planning tool for remote infrastructure sites, not a thing that gets left on all the time, and especially not at a high frequency / short window. However, high frequency is a capability that it needs to have. It doesn't need to be exposed in the apps either IMO; this could easily be a CLI-only thing. I have no issue with it being pretty seriously gated away from casual users. They aren't the target audience for this. |
|
Need to consider the worst case scenario always, assume this will wind up in mesh sense and be queried every minute. CLI calls functions in the firmware, it does not stop huge airtime use. This would probably have to be an admin message as large as it looks and be throttled pretty heavily. |
Every minute is actually a legitimate use-case for this. I use it at that frequency occasionally on some of our high sites here to watch what happens during traffic spikes, so I can get a better idea of what's going on. It's definitely not intended for meshsense. However, ultimately if someone wants to abuse the mesh, they are going to be able to do so no matter how nicely we gate things. There comes a point at which one just needs to accept that people can be dicks about stuff if they are motivitated to be so, and simply hope that they do not. Meshtastic by its very nature is easy to DoS. The trick is just to make sure that people can't do it by accident.
Admin-enabled only, yes. There's zero reason to have this be solicited by random other nodes, and IMO it shouldn't be possible to request it at all; it's something that only the node owner should be able to turn on. The actual stats packets themselves don't need to be admin ones though... they don't contain confidential info, and having the payload be observable outside of an admin session is important.
I strongly disagree on this point. Have it self-disable, yes. And have it automatically increase the interval after a short period, yes. Those measures will ensure that it cannot be accidentally left in an obnoxious state. But throttling it somewhat defeats the purpose; it's intended to be occasionally utilised at a high rate. |
The only place it is located at the moment is in #8378 . @ford-jones was going to look at it, but since the uptime accumulator is a relatively simple piece and I know you're capable of modularising PRs, it could be implemented separately than the role-retirement feature. Adding total-uptime, uptime-in-role, uptime-in-temporary-promotion, etc. would help hang the other functions off it at a later date. My only caveat would be that the total-uptime figure was intended to be quite coarse and long-running - fine enough to capture daytime-only solar nodes, but infrequent enough not to add to the flash burden. I don't know if that aligns with the self-promotion plan. |
Adds some basic routing stats to assist with infrastructure planning and identifying problems. The idea is to have this enabled on infrastructure high sites. I also intend to add replay stats to this later, as part of the replay PR.
Tracked metrics:
Requires meshtastic/protobufs#799
🤝 Attestations