Node stops receiving after some time when using MY_RX_MESSAGE_BUFFER_FEATURE



  • Hi,

    My NRF24L01+ based MySensors network is struggling with nodes that randomly stop receiving messages. Until this week I was not able to find a pattern in this behaviour.

    Now I think I have found a link: the nodes that use MY_RX_MESSAGE_BUFFER_FEATURE with the transceiver's IRQ pin connected all fail after some time. They are still able to send messages but don't receive anything anymore. Only a reboot fixes the problem. I found that I can reproduce this behaviour by sending messages to more than one child ID of the node in quick succession. But even without doing this the node fails after some time.

    I have played around with different buffer size values (between 10 and 35). It seems like the node is working longer with a higher buffer size.

    As soon as I disable the buffer feature the node works OK and does not stop receiving.

    I can observe this behaviour with version 2.2.0, 2.3.0 and the test version of the 2.3.0 with the modified CE handling. Also on my gateway I am using the PA version of the transceiver, on all of my nodes I have the normal version with PCB antenna.

    Does anybody else have the same problem? Is there maybe a bug in the buffer code? I would really like to use this feature, as it gives me much quicker and more reliable acknowledgement handling as long as the nodes are running.


  • Mod

    @mathea90 I've used it for over 2 years now on my ethernet gateway and it never failed me.

    I can only imagine that it stops receiving because the stack missed some interrupts from the radio, causing the internal radio queue to be filled. When this queue is totally filled any new incoming messages will be dropped by the radio.
    I don't know why the stack would miss interrupts from the radio though; I assume cabling (especially the interrupt line) is solid. I don't think size of the buffer matters, unless you use slow hardware at a very high message rate.

    What arduino are you using for the nodes?
    Are you sending at a very high rate?
    Could you post node logs and try with serial debug output disabled?



  • @yveaux Thanks for your answer. I am using my own designed PCBs with a bare Atmega328 soldered onto it. The interrupt traces are short with no sharp corners. I have this problem across multiple PCB layouts so I would rule out a design fault. Some of my boards run the Atmega on its internal oscillator at 8MHz and some have an external 16MHz crystal. Both with the same behaviour.

    My gateway is also based on the Atmega328, coupled with a Wiznet W5500 ethernet module. As the NRF24L01+ has to be driven by soft-spi in this configuration, the IRQ pin is not used on my gateway.

    Unfortunately I have not mapped out the RX / TX pins on my PCBs, therefore I have no possibility of logging the serial data 😕

    My send rate is not high. One use case, for example, is an LED dimmer with a controllable ramp-time. Each time before my home automation server sends a new dim value, it also sends the desired ramp-time. So it's only two messages directly one after another. Admittedly, I do not know how fast my HA server is sending those messages. But in my understanding the MYS gateway should take care of the send timing, right?

    Could it be plausible that the second interrupt fires while the µC is still reading the first message from the transceiver, thus getting a messed up message stack? Or maybe there is a problem with clearing the stack after a message has been received. Consequently, it fills up and after that everything is rejected... Unfortunately my C skills are very limited, so I cannot look for a probable cause in the code myself.


  • Mod

    @mathea90 said in Node stops receiving after some time when using MY_RX_MESSAGE_BUFFER_FEATURE:

    Could it be plausible that the second interrupt fires while the µC is still reading the first message from the transceiver, thus getting a messed up message stack?

    The interrupt triggering the message reception from the radio will not preempt a running message reception.

    Or maybe there is a problem with clearing the stack after a message has been received. Consequently, it fills up and after that everything is rejected... Unfortunately my C skills are very limited, so I cannot look for a probable cause in the code myself.

    If you don't empty the buffer fast enough it will fill up and new messages get lost. See https://github.com/mysensors/MySensors/blob/development/hal/transport/RF24/MyTransportRF24.cpp#L44

    The variable transportLostMessageCount will be increased for each lost message. Having serial debug output could really help here...
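
As a rough illustration of the behaviour described here, the following is a minimal C++ model of a fixed-size receive queue that counts dropped messages. It is a sketch under stated assumptions, not the MySensors implementation: `RxQueueModel`, `lostAfterBurst` and `lostCount` are hypothetical names, and `lostCount` only plays the role that `transportLostMessageCount` plays in the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a fixed-size receive queue. NOT the MySensors code.
template <std::size_t Capacity>
class RxQueueModel {
    static_assert(Capacity > 0, "queue needs at least one slot");

public:
    // Simulated radio interrupt: store one message id,
    // or count it as lost when the queue is already full.
    bool push(std::uint8_t messageId) {
        if (count_ == Capacity) {
            ++lostCount_;  // stands in for transportLostMessageCount
            return false;
        }
        slots_[(head_ + count_) % Capacity] = messageId;
        ++count_;
        return true;
    }

    // Simulated main loop: take the oldest message, if there is one.
    bool pop(std::uint8_t &messageId) {
        if (count_ == 0) {
            return false;
        }
        assert(head_ < Capacity);
        messageId = slots_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }
    std::uint32_t lostCount() const { return lostCount_; }

private:
    std::array<std::uint8_t, Capacity> slots_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
    std::uint32_t lostCount_ = 0;
};

// Push `burst` messages without ever draining the queue,
// then report how many of them were dropped.
template <std::size_t Capacity>
std::uint32_t lostAfterBurst(std::size_t burst) {
    RxQueueModel<Capacity> queue;
    for (std::size_t i = 0; i < burst; ++i) {
        queue.push(static_cast<std::uint8_t>(i));
    }
    return queue.lostCount();
}
```

In this model, an undrained burst of 12 messages loses 2 with a capacity of 10 and none with a capacity of 35. A larger buffer only postpones the first loss if the main loop stops draining the queue.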



  • @yveaux I'll try to hack something together to get serial debug data if I find some time this week. I'm personally really curious what I will see there.



  • @yveaux said in Node stops receiving after some time when using MY_RX_MESSAGE_BUFFER_FEATURE:

    @mathea90 I've used it for over 2 years now on my ethernet gateway and it never failed me.

    Sorry for bringing back an old post, but I tried to enable the message buffer on my ethernet gateway and failed with this error:

    #error RF24 IRQ usage cannot be used with Soft SPI

    My setup is Arduino Uno + W5100 Ethernet Shield + NRF24.
    I'm using soft SPI as explained in the ethernet gw page.

    How did you overcome this problem?
    Thanks
    Daniele


  • Mod

    @danielef by not using SoftSPI 😉
    The W5100 ethernet shield can be used with regular SPI, but there is an issue with the SPI CS lines on it.
    The CS line for the SD card needs to be set inactive before initializing the MySensors stack.
    On mobile right now, I can post an example later.
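
The chip-select step described above could look roughly like this. It is a minimal sketch, assuming the SD card chip select is on pin 4 (the usual wiring of the Arduino Ethernet shield) and relying on the `before()` hook that MySensors runs ahead of transport initialisation when the sketch defines it. `SD_CARD_CS_PIN` is a name chosen for illustration, and the `#else` branch only provides desktop stand-ins so the snippet compiles without the Arduino core.

```cpp
#include <stdint.h>

#ifdef ARDUINO
#include <Arduino.h>
#else
// Desktop stand-ins (assumption, for illustration only) so the snippet
// can be compiled and checked without Arduino hardware.
#include <assert.h>
constexpr uint8_t OUTPUT = 1;
constexpr uint8_t HIGH = 1;
uint8_t pinModes[20] = {};
uint8_t pinLevels[20] = {};
void pinMode(uint8_t pin, uint8_t mode) {
    assert(pin < 20);
    pinModes[pin] = mode;
}
void digitalWrite(uint8_t pin, uint8_t level) {
    assert(pin < 20);
    pinLevels[pin] = level;
}
#endif

// Assumption: the SD card chip select is wired to pin 4.
constexpr uint8_t SD_CARD_CS_PIN = 4;

// Runs before the MySensors transport is started, if the sketch defines it.
void before() {
    pinMode(SD_CARD_CS_PIN, OUTPUT);
    // Chip select is active low, so HIGH keeps the SD card off the SPI bus.
    digitalWrite(SD_CARD_CS_PIN, HIGH);
}
```

In a real gateway sketch this would sit alongside `#define MY_RX_MESSAGE_BUFFER_FEATURE` and a `MY_RF24_IRQ_PIN` definition, with `MY_SOFTSPI` removed and the radio wired to the hardware SPI pins.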



  • An example would be great!
    Thank you very much


  • Mod



  • It works like a charm!
    Thank you for the suggestion, maybe it could be useful to add to the guide on building the ethernet gateway.



  • Hello @Mathea90, I have exactly the same issue. Did you find something to fix it?



  • @Mathea90 did you find a solution for this?

    After enabling MY_RX_MESSAGE_BUFFER_FEATURE my serial gateway stops receiving messages. It usually takes about 8 to 12 hours.
    I'm using an NRF24L01+LNA+PA and an Arduino Nano.

    When I added the buffer feature, I felt like my nodes were working better, so I would like to keep using it. However, the lockup makes it useless.

    thanks

