[RFC] Improve package delivery for RF24 modules
-
This is a SN7440 clone from Soviet times, right?
Exactly. Still have a handful of those sitting somewhere in the closet.
A wait() is not a good idea because it introduces recursion of unknown depth, depending on the user's receive() implementation.
Yep, I thought the same. Something like (ARD * 15 / 2) seems like a good delay.
But exponential backoff is also not a good idea, because it can grow into a delay of a second or more, which blocks the main loop the whole time. A short random delay of a few tens of ms (as mentioned by @skywatch) worked for me in my setup.
Exponential backoff would be a nice feature for NodeManager, but it would only take care of delivering sensor data to the uplink (unless we ack data back all the time). It wouldn't help if the traffic jam occurs somewhere further up.
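To make the two suggestions concrete, here is a minimal sketch of a short, bounded random delay whose upper limit follows the "(ARD * 15 / 2)" idea. It is plain C++ so it can be tried on a PC; the function names are made up for illustration and are not part of MySensors, and the ARD value used in any call is just an example. The only datasheet fact used is that the nRF24L01+ encodes the retransmit delay as (ARD + 1) * 250 µs.

```cpp
#include <cstdint>
#include <random>

// Retransmit delay encoded by the nRF24L01+ ARD field: (ARD + 1) * 250 us.
constexpr uint32_t ardToMicros(uint8_t ard)
{
    return (static_cast<uint32_t>(ard & 0x0F) + 1u) * 250u;
}

// Upper bound of the random wait, following the "(ARD * 15 / 2)" suggestion,
// with ARD taken as a time in microseconds. (Assumption: that is what was meant.)
constexpr uint32_t backoffWindowMicros(uint8_t ard)
{
    return ardToMicros(ard) * 15u / 2u;
}

// Hypothetical helper: pick a uniformly distributed wait in [0, window].
// The wait never exceeds the window, so the main loop is blocked only briefly.
template <typename Rng>
uint32_t randomBackoffMicros(uint8_t ard, Rng &rng)
{
    std::uniform_int_distribution<uint32_t> dist(0, backoffWindowMicros(ard));
    return dist(rng);
}
```

With an ARD value of 5 the window is 11250 µs, so the wait stays in the range of the "few tens of ms" that worked in practice. On an Arduino, the built-in random() would take the place of the C++ generator.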
Another idea I thought of would be listening for whether any carrier is present (the nRF24L01+ has that feature) and sending packets only once there's no carrier, plus a random delay. This would hopefully avoid collisions, even with other things like WiFi.
@Necromant said in [RFC] Improve package delivery for RF24 modules:
Another idea I thought of would be listening for whether any carrier is present (the nRF24L01+ has that feature) and sending packets only once there's no carrier, plus a random delay. This would hopefully avoid collisions, even with other things like WiFi.
Yes, this sounds interesting. Are you thinking of the RPD (Received Power Detector) register of the RF24?
But it will only work with the RF24 transport drivers (as @skywatch remarked above).
Meanwhile, I have the first setup ready to try to reproduce packet loss with your idea of sending multiple messages immediately one after another. The test project repo with a gateway and one node is:
MySensors.IssueProjects
The MySensors core changes are in my project fork branch:
topic/mkr/issueTransportHalRetry
Unfortunately, the #1477/Setting-01 setup doesn't produce any lost packets with an ESP8266/160MHz as gateway and a Nano/12MHz as node. So I would say that a fast gateway sending multiple messages to the same node in sequence, without wait() in between, is not the reason for lost packets. I have checked with up to 256 packets in a row. All were transferred perfectly, without a single one lost.
My next attempt is to use two nodes which send messages to the gateway at the same time. I am thinking of synchronizing the nodes with a wire between GPIOs. I will let you know.
But maybe someone has a better idea for a setup which reliably produces packet loss?
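For a burst test like this, one simple way to count losses on the receiving side is to put a running sequence number into each payload and look for gaps. The sketch below is only an illustration of that idea; the struct and its members are hypothetical and not taken from the test repo, and it assumes packets arrive in order and without duplicates.

```cpp
#include <cstdint>

// Hypothetical receiver-side bookkeeping for a burst test.
// Assumes the sender puts an 8-bit sequence number into every payload.
struct LossCounter {
    bool     started  = false;
    uint8_t  expected = 0;   // next sequence number we expect to see
    uint32_t received = 0;   // packets that actually arrived
    uint32_t lost     = 0;   // packets skipped according to the sequence gap

    void onPacket(uint8_t seq)
    {
        if (started) {
            // uint8_t arithmetic wraps at 256, so the gap is still correct
            // when the sequence number rolls over from 255 to 0.
            lost += static_cast<uint8_t>(seq - expected);
        }
        started  = true;
        expected = static_cast<uint8_t>(seq + 1);
        ++received;
    }
};
```

After a run of 256 packets, received should be 256 and lost should be 0 if nothing went missing.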
-
@virtualmkr I gave your sketches a quick look. It seems you have debug enabled on the gateway. The ESP8266 has to deal with the WiFi stack and MQTT handling, despite running at a pretty high speed. This may introduce a lot of delays.
And my wild guess would be that this is enough for the m328p to chew them up.
I'll set up my 'staging' gateway in a radio-noisy environment and give your test a spin this weekend.
-
@Necromant You are right, an ESP8266 with WiFi may behave differently than an STM32. Unfortunately I don't have a working STM32 as gateway.
Meanwhile I created two additional test settings with gateway and two nodes, where the used gateway type is not important. The test settings are available in my repo MySensors.IssueProjects.
- Setting-02 creates a race condition of two nodes sending a message to the gateway at the same time.
- Setting-03 creates a conflict when both nodes send a message to each other at the same moment. For that one node is in repeater mode and the other node is associated to the repeater.
The approach of synchronizing the two nodes via GPIOs and a wire, so that they send at the same moment, works very well and reproducibly in Setting-02. The only problem was that when all sent messages in a row fail, the internal self-healing mechanism of the MySensors transport logic reinitializes the radio. To avoid this, I added logic that sends a successful message after 4 failed messages. This keeps the MySensors self-healing mechanism satisfied.
Setting-03 does not work properly yet, because the N2N logic causes every message from node to the repeater to be sent twice.
I will create an issue about this in the MySensors repo.
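The "send a successful message after 4 failed messages" logic can be pictured roughly like this. This is only a sketch of the idea and not the code from the test repo; the class name and its methods are invented for illustration.

```cpp
#include <cstdint>

// Hypothetical test-node helper: decides whether the next message may be sent
// in the synchronized (colliding) slot or must be sent alone so that it succeeds.
class CollisionTestGuard {
public:
    explicit CollisionTestGuard(uint8_t maxFailures = 4) : maxFailures_(maxFailures) {}

    // true  -> send together with the other node (collision expected)
    // false -> send a "clean" message so the transport sees a success again
    bool nextSendMayCollide() const { return failuresInRow_ < maxFailures_; }

    // Report the result of the last send; any success resets the counter.
    void onSendResult(bool ok)
    {
        failuresInRow_ = ok ? 0 : static_cast<uint8_t>(failuresInRow_ + 1);
    }

    uint8_t failuresInRow() const { return failuresInRow_; }

private:
    uint8_t maxFailures_;
    uint8_t failuresInRow_ = 0;
};
```

The node would call nextSendMayCollide() before each send and skip the GPIO synchronization whenever it returns false.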
-
This is what Setting-02 looks like on my good old Saleae LA when both nodes send at the same moment:

You can see that both nodes send the message 16 times, because the auto-retransmit count (ARC) is 15 by default: one initial transmission plus 15 retries.
And both without success.
-
@virtualmkr Great work. Meanwhile, I have updated my homeassistant installation and set up a second, 'staging' network with the modules I have around. I think I can arrange remote access to this setup later, if that's needed.
It seems to me that the issue I've been experiencing is partially related to HomeAssistant's way of working with mysensors actuators. If I 'gang-switch' a bunch of lights I see the following in the log: (note the 7;X;1;1;2;1)
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 7 child 8
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [mysensors.transport] Sending 7;3;1;1;2;1
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [mysensors.transport] Sending 7;4;1;1;2;1
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [mysensors.transport] Sending 7;5;1;1;2;1
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [mysensors.transport] Sending 7;6;1;1;2;1
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [mysensors.transport] Sending 7;7;1;1;2;1
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [mysensors.transport] Sending 7;8;1;1;2;1
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [homeassistant.components.mysensors.device] Entity update: Red Wisp 7 8: value_type 2, value = 1
    Mar 08 19:14:20 bladeling hass[26390]: 2021-03-08 19:14:20 DEBUG (MainThread) [homeassistant.components.mysensors.device] Entity update: Red Wisp 7 8: value_type 3, value = 100
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [mysensors.transport] Receiving 4;2;1;0;2;0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 4 child 2
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [mysensors.transport] Receiving 4;2;1;0;2;0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 4 child 2
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [mysensors.transport] Receiving 4;2;1;0;2;0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 4 child 2
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [mysensors.transport] Receiving 4;2;1;0;3;0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 4 child 2
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [mysensors.transport] Receiving 4;2;1;0;3;0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 4 child 2
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.device] Entity update: Strip Wisp 4 2: value_type 2, value = 0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.device] Entity update: Strip Wisp 4 2: value_type 3, value = 0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [mysensors.transport] Receiving 4;2;1;0;3;0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 4 child 2
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [mysensors.transport] Receiving 7;253;1;0;37;-45
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 7 child 253
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [mysensors.transport] Receiving 7;253;1;0;37;-45
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.gateway] Node update: node 7 child 253
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.device] Entity update: Strip Wisp 4 2: value_type 2, value = 0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.device] Entity update: Strip Wisp 4 2: value_type 3, value = 0
    Mar 08 19:14:21 bladeling hass[26390]: 2021-03-08 19:14:21 DEBUG (MainThread) [homeassistant.components.mysensors.device] Entity update: Red Wisp 7 253: value_type 37, value = -45

HomeAssistant requests acks for all packets and dumps them all at once to the gateway. With half-duplex nodes this is just asking for trouble. I'm diving right now into hass component code to see if I can add a little extra debugging.
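For reading these log lines: the MySensors serial protocol uses the field order node-id;child-sensor-id;command;ack;type;payload, so the fourth field of 7;X;1;1;2;1 is the ack request flag. The small parser below is only an illustrative sketch for picking the fields apart on a PC; the struct and function names are made up and are not part of MySensors or the Home Assistant integration.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Fields of one gateway message: node-id;child-sensor-id;command;ack;type;payload
struct GatewayMessage {
    int nodeId = 0;
    int childId = 0;
    int command = 0;
    int ack = 0;       // 1 = the sender asks for an ack (echo)
    int type = 0;
    std::string payload;
};

// Hypothetical helper: returns false if fewer than five ';'-separated
// numeric fields are present. Numeric validation is deliberately minimal.
bool parseGatewayMessage(const std::string &line, GatewayMessage &out)
{
    std::istringstream in(line);
    std::string field[5];
    for (auto &f : field) {
        if (!std::getline(in, f, ';') || f.empty()) {
            return false;
        }
    }
    out.payload.clear();
    std::getline(in, out.payload);   // the rest of the line; may be empty
    out.nodeId  = std::atoi(field[0].c_str());
    out.childId = std::atoi(field[1].c_str());
    out.command = std::atoi(field[2].c_str());
    out.ack     = std::atoi(field[3].c_str());
    out.type    = std::atoi(field[4].c_str());
    return true;
}
```

Parsed this way, every "Sending" line in the log has ack set to 1, while the "Receiving" lines have it set to 0.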
-
@virtualmkr Thanks for the nice pictures. With that in mind, I'm pretty sure I'm getting a clash with the acks going back to the gw somewhere on the way, especially when a repeater or two is involved.
-
Hi @necromant, thanks for your HA setup and investigations. Yes, when a message is sent to an actuator, the actuator needs to respond with its new status for HA. If you control multiple actuators quickly one after the other, it creates a perfect traffic jam in the MySensors network.
Now we just have to find an algorithm that best resolves these collisions.
Your HA project is then a perfect real-world test for the algorithm.
-
I see 2 ways and one hacky way:
- Aggressive buffering with some delays, e.g. only switch modes N ms after the last packet is received, so that repeaters absorb all data bursts. Perhaps even TX buffering.
- Adjust HomeAssistant controller code: It should not request ack, but instead expect devices to send back the new state some time soon after flipping state (and retry those packets)
- The hacky way: Make HomeAssistant wait for the ack, before sending the next command for the node. Perhaps the easiest, but we'll have to bother some of the devs working on that integration.
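The "hacky way" amounts to keeping at most one un-acked command in flight per node. The logic is sketched here in C++ only to show the state handling; the Home Assistant integration is a different code base, and the class and method names are purely hypothetical.

```cpp
#include <deque>
#include <string>

// Hypothetical controller-side queue (one instance per node):
// the next command is released only after the previous one was acked.
class AckGatedQueue {
public:
    void enqueue(const std::string &cmd) { pending_.push_back(cmd); }

    // Returns true and fills `cmd` if a command may be sent right now.
    bool nextToSend(std::string &cmd)
    {
        if (waitingForAck_ || pending_.empty()) {
            return false;
        }
        cmd = pending_.front();
        waitingForAck_ = true;
        return true;
    }

    // Call when the ack for the in-flight command arrives.
    void onAck()
    {
        if (!waitingForAck_) {
            return;
        }
        pending_.pop_front();
        waitingForAck_ = false;
    }

    // Call when the ack timed out: the same command becomes sendable again.
    void onTimeout() { waitingForAck_ = false; }

    bool idle() const { return pending_.empty() && !waitingForAck_; }

private:
    std::deque<std::string> pending_;
    bool waitingForAck_ = false;
};
```

A gang-switch of six lights on node 7 would then go out as six sequential exchanges instead of one burst, at the cost of some added latency.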
-
I gave the HAL code a more detailed review, and I think there's a possibility to implement something like a simple 'collision avoidance' using the RPD register. First, here's some info from the datasheet:

Now, let's do some calculations.
We have ~42 byte packets max (1 + 5 + 2 + 32 + 2). At 8 bits per byte, these take 42 * 8 / 250000.0 * 1000 * 1000 = 1344 µs of radio time at 250 kbps. With that in mind, I'd try something like this.

    LOCAL bool RF24_sendMessage(const uint8_t recipient, const void *buf, const uint8_t len, const bool noACK)
    {
        int retry = 5;
        while (retry--) {
            RF24_stopListening();
            if (RF24_testRPD()) {
                // Something was talking on the radio, we have to wait for a while
                RF24_startListening(); // Start listening again and wait.
                delay_us(1344 * 2);    // Delay enough time to chew up at least 2 radio packets at 250 kbps
            }

But that would be RF24-specific. Another idea is making the delays dependent on NODE_ID and implementing something like a simple bus arbiter, so that nodes with a lower NODE_ID have higher priority. I will only be able to give this a shot on the weekend, so feel free to try it out.
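As a cross-check of the airtime arithmetic, counting eight bits per byte, and as a rough picture of the NODE_ID-based arbiter idea, here is a small sketch. Both function names are invented for illustration; the "one slot per node ID" rule is just one possible reading of the arbiter idea, not an existing MySensors mechanism.

```cpp
#include <cstdint>

// Airtime of a frame in microseconds: bytes * 8 bits / bit rate.
constexpr uint32_t airtimeMicros(uint32_t frameBytes, uint32_t bitsPerSecond)
{
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(frameBytes) * 8u * 1000000u) / bitsPerSecond);
}

// Hypothetical arbiter rule: a node waits nodeId slots before sending,
// so nodes with a lower ID get on the air first.
constexpr uint32_t priorityDelayMicros(uint8_t nodeId, uint32_t slotMicros)
{
    return static_cast<uint32_t>(nodeId) * slotMicros;
}
```

For a 42-byte frame at 250 kbps, airtimeMicros(42, 250000) gives 1344 µs. With one frame per slot, node 7 would wait about 9.4 ms, while a node with an ID near 254 would wait over 340 ms, so in practice the slot count would probably need a cap.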
-
Hi @Necromant, thank you for your comments regarding the RPD feature of the nRF24+.
I have done some experiments with it. The result is a new tool, TrafficDetectorRF24, which is available in my MySensors.Tools repository. The tool scans a single channel and outputs the current status via a debug pin, which can be connected to an LED or, better, to an input of a logic analyser.
At first I tried to use the RPD feature based on the MySensors example PassiveNode.
Unfortunately, that didn't work at all for me. I then adapted the code from Rolf Henkel's Poor Man's Wireless 2.4GHz Scanner for my purposes.
The detector works quite accurately (resolution approx. 140 µs), so that you can usually detect the transmitted telegram and the ACK response of the receiver individually:

I hope you have more success in your attempts with the RPD feature.
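For anyone post-processing such a capture in software rather than on a logic analyser, the sketch below groups a series of RPD samples into bursts, so that a telegram and its ACK show up as two separate entries. It is an assumption-laden illustration and not code from TrafficDetectorRF24: the sample period is a parameter, and the names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Burst {
    uint32_t startMicros;   // time of the first active sample
    uint32_t lengthMicros;  // number of active samples * sample period
};

// Groups consecutive "carrier detected" samples into bursts.
// rpdSamples[i] is the RPD state at time i * sampleMicros.
std::vector<Burst> findBursts(const std::vector<bool> &rpdSamples, uint32_t sampleMicros)
{
    std::vector<Burst> bursts;
    std::size_t runStart = 0;
    bool inRun = false;
    // Iterate one step past the end so a burst that is still active
    // at the last sample gets closed as well.
    for (std::size_t i = 0; i <= rpdSamples.size(); ++i) {
        const bool active = (i < rpdSamples.size()) && rpdSamples[i];
        if (active && !inRun) {
            runStart = i;
            inRun = true;
        } else if (!active && inRun) {
            bursts.push_back({static_cast<uint32_t>(runStart * sampleMicros),
                              static_cast<uint32_t>((i - runStart) * sampleMicros)});
            inRun = false;
        }
    }
    return bursts;
}
```

At a 140 µs sample period, gaps shorter than one sample cannot be resolved, so two transmissions closer together than that would merge into one burst.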