MySensors 2.3 and RFM69 new driver. Problems and solutions


  • Contest Winner

    Hi everyone. I want to share the results of a small investigation into MySensors 2.3 with the new RFM69 driver. I decided to do it after I bumped into some problems.
    I hope it will be interesting for everyone. Newbies can find simple recommendations to improve MySensors stability. Experienced users and core team members can implement my ideas in the mainline version in the right way.
    Let me remind you of the problem briefly. If you run this code to send two messages from the node to the gateway, only the first message will be sent properly.

    send(message1, true);
    send(message2, true);
    

    Chapter 1. Second ACK
    Let’s see how MySensors with RFM69 processes incoming messages. When the RFM69 receives a message, it generates an interrupt, which is handled by the RFM69 driver. But… the driver does not process the message inside this interrupt. It only sets the RFM69_irq flag.

    // IRQ handler: PayloadReady (RX) & PacketSent (TX) mapped to DI0
    LOCAL void RFM69_interruptHandler(void)
    {
    	// set flag
    	RFM69_irq = true;
    }
    

    Later the process() function reads this flag and reads the message from the RFM69. So here is the first important point: incoming messages can be processed in only two places in the application code:

    1. Inside the process() function at the beginning of the main mysensors application loop.
    2. Inside the wait() function anywhere you call it (it actually calls the process() function too).

    0_1531061212559_MysLoopDiagram.png
    There is only one exception to this rule. When you send a message with an ACK request, the RFM69 driver waits for the ACK message, so your application can receive a message at this moment too. If it is the expected ACK it will be processed; otherwise it is just received and not processed.
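The flag-then-poll pattern described above can be sketched in plain C++ without any hardware. This is only an illustration: `FakeRadio`, `onPacket` and `processPending` are hypothetical names, not MySensors or driver functions, and the single-slot buffer that gets overwritten is a simplifying assumption rather than the behaviour of the real RFM69.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the radio (assumption: it holds one packet only).
struct FakeRadio {
    volatile bool irqFlag = false;  // plays the role of RFM69_irq
    uint8_t fifo = 0;               // last received payload
    int overwritten = 0;            // packets replaced before anyone read them

    // Called "from the interrupt": only records that something arrived.
    void onPacket(uint8_t payload) {
        if (irqFlag) {
            ++overwritten;          // the previous packet was never read
        }
        fifo = payload;
        irqFlag = true;
    }
};

// Called from the main loop, like process() or wait() in the text.
// Returns true and stores the payload if a packet was pending.
bool processPending(FakeRadio &radio, std::vector<uint8_t> &handled) {
    if (!radio.irqFlag) {
        return false;
    }
    radio.irqFlag = false;
    handled.push_back(radio.fifo);
    return true;
}
```

In this model, nothing is handled until `processPending` runs, and two packets arriving between two calls leave only the last one to be handled.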

    Now we know a little more about how MySensors processes incoming messages, so I will describe how a very simple use case works. Let’s imagine that the node sends one message to the gateway:

    send(message1, true);
    

    In the following diagram I tried to explain in detail the behavior of the application from both sides – gateway and the node.

    0_1531061319581_OneMessageDiagram.png

    As you can see this case works properly.
    Now we can imagine a more complex case. The node sends two messages to the gateway one by one.

    send(message1, true);
    send(message2, true);
    

    Yes, this is the same code that I wrote at the start of this topic. The next diagram describes how it works.

    0_1531061380920_TwoMessagesDiagram.png

    Oops… Will the second ACK break the communication in this case? Yes, in about 99% of cases it causes problems. But why is this second ACK needed at all? I am not entirely sure, but I will try to explain my understanding of this second ACK (how it should work).

    0_1531061521771_SoftwareAck.png

    So in theory this second ACK can tell you whether your message was delivered to a node that is behind the gateway. But in real life it breaks the communication.

    How can we fix it? How can we send two messages one by one without collision? I will describe some solutions.
    The first solution is very simple but very bad, and I strongly recommend against implementing it. Just add wait(100) between the lines:

    send(message1, true);
    wait(100);
    send(message2, true);
    

    Why is this solution bad? First, it causes a dramatic rise in battery consumption. There are only two messages in this case; if you want to send 4-5 messages, the sketch will stay awake about 400 ms longer! Second, it is absolutely wrong from the application design point of view: we are trying to fix a transport layer error in the user application layer. So implement this solution only as a last resort.

    The second solution is not a very good one, but I would recommend it to anyone who uses a simple MySensors setup with relatively simple sketches and no complicated customization. Just disable the second ACK!

    // MyTransport.cpp, line 723
    // send ACK, use transportSendRoute since ACK reply is not internal, i.e. if !transportOK do not reply
    #ifndef MY_RADIO_RFM69
      (void)transportSendRoute(_msgTmp);
    #endif
    

    or just comment it out 🙂

    // send ACK, use transportSendRoute since ACK reply is not internal, i.e. if !transportOK do not reply
    //(void)transportSendRoute(_msgTmp);
    

    This simple hack will sharply increase your network speed and reliability. For example, this code

    for (int m = 1; m < 200; m++) {
      send(msg_sw.set(m), true);
    }
    

    sends 199 messages with ACK requests one by one, and not a single NACK has ever appeared. I tried it many times.
    What do you lose if you implement this solution? You will not see this message

    45899 TSF:MSG:READ,0-0-144,s=1,c=1,t=2,pt=1,l=1,sg=0:0
    45905 TSF:MSG:ACK
    

    in the debug serial output of the node. I think 95% of users do not process such messages in the sketch, so it is not a big loss.
    If you want to process these ACK messages to check delivery to a node that is behind the gateway, you can implement the third, more clever solution:

    // send ACK, use transportSendRoute since ACK reply is not internal, i.e. if !transportOK do not reply
    // skip the software ACK only when this build is an RFM69 gateway
    #if !(defined(MY_RADIO_RFM69) && defined(MY_GATEWAY_FEATURE))
      (void)transportSendRoute(_msgTmp);
    #endif
    

    Or

    // send ACK, use transportSendRoute since ACK reply is not internal, i.e. if !transportOK do not reply
    // on RFM69, skip the software ACK when the message came directly from its sender
    #ifdef MY_RADIO_RFM69
    if (_msg.last != _msg.sender)
    #endif
    {
      (void)transportSendRoute(_msgTmp);
    }
    

    This makes your network more stable too. In the first example we disable the software ACK (second ACK) from the gateway to the node. In the second example we disable the software ACK from one node to another if there are no nodes between them (I haven’t tested this case, but it should work).
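As a cross-check of the two variants described above, the decision they are meant to make can be written as two small pure functions. This is a sketch based on the description in the text, not code from MySensors; the function names and parameters are hypothetical.

```cpp
#include <cstdint>

// Variant 1 (assumed intent): with RFM69, a gateway never sends the software ACK.
constexpr bool sendSoftwareAckGatewayVariant(bool rfm69, bool gateway) {
    return !(rfm69 && gateway);
}

// Variant 2 (assumed intent): with RFM69, skip the software ACK when the
// message arrived directly from its original sender (no hops in between).
constexpr bool sendSoftwareAckDirectVariant(bool rfm69, uint8_t last, uint8_t sender) {
    return !rfm69 || last != sender;
}
```

With other radios both functions always return true, so the stock behaviour stays unchanged there.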

    But imagine you do not want to implement this hack at all. How can we fix this problem differently? Let’s move to the next chapter.

    Chapter 2. The message queue.
    A message queue is standard in many systems. It makes no sense to omit this feature in a serious system. Moreover, the MySensors team has implemented this feature… but for NRF24, not for RFM69.
    I will not describe in detail how a message queue works. You only need to know that messages are read immediately after they are received (inside the interrupt handler) and put into the queue. They can be processed later, but not a single one is lost.
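The idea of such a queue can be shown with a minimal fixed-size ring buffer. This is a generic sketch, not the NRF24 implementation from MySensors: `RxQueue` and its methods are hypothetical names, the payload is reduced to a single byte, and a real driver would also have to protect the counters against concurrent access from the interrupt handler.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Fixed-capacity FIFO: the "interrupt" pushes, the main loop pops.
template <std::size_t Capacity>
class RxQueue {
public:
    // Producer side. Returns false if the queue is full (packet dropped).
    bool push(uint8_t packet) {
        if (count_ == Capacity) {
            return false;
        }
        slots_[head_] = packet;
        head_ = (head_ + 1) % Capacity;
        ++count_;
        return true;
    }

    // Consumer side. Returns false if there is nothing to process.
    bool pop(uint8_t &packet) {
        if (count_ == 0) {
            return false;
        }
        packet = slots_[tail_];
        tail_ = (tail_ + 1) % Capacity;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<uint8_t, Capacity> slots_{};
    std::size_t head_ = 0;   // next slot to write
    std::size_t tail_ = 0;   // next slot to read
    std::size_t count_ = 0;  // number of stored packets
};
```

Packets come out in the order they arrived, and nothing is lost as long as the main loop drains the queue before it fills up.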
    I tried to implement a message queue at the RFM69 transport and driver level. I had a very good example for NRF24, so it was not very hard. I bumped into only one serious problem (so far :)). The NRF24 driver uses the auto-ACK function, which is not supported by the RFM69. If we want to send an ACK inside the interrupt handler, we can’t use system time functions (millis(), delay() and so on). So I implemented an ACK sending function that does not use these forbidden functions.
    You can see my draft solution on the easysensors GitHub (I will share it soon, maybe today). Most of the changes are in MyTransportRF24.cpp and RFM69_new.cpp. If you want to try it, you should also remove the MySensors capability guards like this one in some places:

    #if defined(MY_RADIO_RFM69)
    #error Receive message buffering not supported for RFM69!
    #endif
    

    I haven’t tested it thoroughly, but I have checked whether the «DOUBLE ACK» problem is solved. And yes, it is solved. The next diagram shows how this code works with my new transport and driver:

    0_1531062561670_TwoMessagesDiagram_queue.png

    I hope core team members will help me integrate this driver into the mainline code.

    Chapter 3. General recommendations and simple driver fixes.
    You can easily increase communication speed for the new driver. Just change one line in RFM69_new.h:

    #define MY_RFM69_CSMA_LIMIT_DBM             (-95)
    

    -95 dBm is a VERY optimistic expected noise level for such popular frequency bands and amateur power supplies. I suggest using -90 or -85.

    It is foolish to send a message if the air is busy. But the new driver (and the old one too, as far as I remember) tries to send a message even if the air is busy: it checks the air, waits for 500 ms… and sends anyway 🙂. We should add some lines:

    // RFM69_new.cpp, line 382 (before "// set radio to standby to load fifo")
    if ((hwMillis() - CSMA_START_MS) >= MY_RFM69_CSMA_TIMEOUT_MS) {
      // channel still busy after the CSMA timeout: do not transmit
      return false;
    }
    

    And of course it’s irrational to wait for an ACK if the message wasn’t sent. So we need to replace one line to prevent it.

    // RFM69_new.cpp, line 620 (before "// radio is in RX")
    if (!RFM69_send(recipient, (uint8_t *)buffer, bufferSize, flags, !retry)) {
      continue;
    }
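The two fixes above boil down to two rules: give up when the channel stays busy until the timeout, and wait for an ACK only if the packet really went out. A hardware-free sketch of that control flow follows; `sendIfChannelFree`, `sendWithRetry` and `TxResult` are hypothetical names, and the radio and clock are passed in as callbacks so the logic can be tried on a PC.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical outcome of one transmission attempt.
enum class TxResult { Sent, ChannelBusy };

// Wait for a free channel, but abort instead of transmitting into a busy one.
TxResult sendIfChannelFree(const std::function<bool()> &channelFree,
                           const std::function<uint32_t()> &nowMs,
                           uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    while (!channelFree()) {
        if (nowMs() - start >= timeoutMs) {
            return TxResult::ChannelBusy;  // timeout: nothing is transmitted
        }
    }
    // A real driver would load the FIFO and transmit here.
    return TxResult::Sent;
}

// Retry loop: the ACK wait is skipped whenever nothing was put on the air.
bool sendWithRetry(const std::function<TxResult()> &trySend,
                   const std::function<bool()> &waitForAck,
                   uint8_t retries) {
    for (uint8_t attempt = 0; attempt <= retries; ++attempt) {
        if (trySend() != TxResult::Sent) {
            continue;  // same role as the "continue" in the fix above
        }
        if (waitForAck()) {
            return true;
        }
    }
    return false;
}
```

Skipping the ACK wait after a failed attempt means a busy channel costs one CSMA timeout per retry instead of a CSMA timeout plus an ACK timeout.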
    

    Many thanks for reading this article till the end.


  • Mod

    Great work @koresh !

    How much more ram does the queue require? (I assume the queue size is configurable, but what would a reasonable size be?)

    Edit: I guess you'll use MY_RX_MESSAGE_BUFFER_SIZE just as for nrf24. So the ram impact will be MY_RX_MESSAGE_BUFFER_SIZE x (MAX_MESSAGE_LENGTH + 1) which is 20 x (32 + 1) = 660 bytes. That'll be tough to fit on atmega328 nodes, right? But esp8266, esp32, samd and nrf5x will handle it without problems.


  • Contest Winner

    I've forked MySensors to share the draft transport and RFM driver. It is available here

    @mfalkvidd
    Thanks. You answered your own question about the memory 🙂


  • Mod

    @mfalkvidd a 20 entry buffer is rather large. 10 or even 5 will cover most cases I suppose.


  • Mod

    @yveaux I agree. But I would assume that the default value (20) was chosen for a reason?


  • Mod

    @mfalkvidd for a standard (serial) gateway with nrf24 and atmega328 it fits nicely, so that's where the 20 comes from. More doesn't hurt, it only costs ram.
    But if ram is limited, it can be reduced.
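For reference, the RAM cost of the buffer sizes mentioned in this thread can be computed with the formula quoted above. A small sketch, assuming MAX_MESSAGE_LENGTH is 32 as stated in the thread and taking the 2048 bytes of SRAM of an ATmega328; the constant and function names are made up for the example.

```cpp
#include <cstddef>

constexpr std::size_t kMaxMessageLength = 32;      // value quoted in the thread
constexpr std::size_t kAtmega328SramBytes = 2048;  // total SRAM of an ATmega328

// RAM used by the receive buffer: entries x (MAX_MESSAGE_LENGTH + 1).
constexpr std::size_t rxBufferBytes(std::size_t entries) {
    return entries * (kMaxMessageLength + 1);
}

// Share of the ATmega328 SRAM taken by the buffer, rounded down to whole percent.
constexpr std::size_t sramPercent(std::size_t entries) {
    return rxBufferBytes(entries) * 100 / kAtmega328SramBytes;
}

static_assert(rxBufferBytes(20) == 660, "matches the 20 x (32 + 1) figure from the thread");
```

This gives 165, 330 and 660 bytes for 5, 10 and 20 entries, which is roughly 8%, 16% and 32% of the SRAM on an ATmega328.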


 
