Hi everyone. I want to share the results of a small investigation into MySensors 2.3 with the new RFM69 driver. I decided to do it after I ran into some problems.
I hope it will be interesting for everyone. Newbies can find simple recommendations for improving MySensors stability, and experienced users and core team members can implement my ideas in the mainline version in the right way.
Let me briefly remind you of the problem. If you run this code to send two messages from a node to the gateway, only the first message is sent properly:
send(message1, true);
send(message2, true);
Chapter 1. Second ACK
Let’s see how MySensors with the RFM69 processes incoming messages. When the RFM69 receives a message, it generates an interrupt, which is handled by the RFM69 driver. But… the driver does not process the message in this interrupt. It only sets the RFM69_irq flag:
// IRQ handler: PayloadReady (RX) & PacketSent (TX) mapped to DI0
LOCAL void RFM69_interruptHandler(void)
{
// set flag
RFM69_irq = true;
}
Later, the process() function checks this flag and reads the message from the RFM69. So here is the first important point: incoming messages can be processed in only two places in application code:
- Inside the process() function, at the beginning of the main MySensors application loop.
- Inside the wait() function, wherever you call it (it calls process() too).
There is only one exception to this rule. When you send a message with an ACK request, the RFM69 driver waits for the ACK message, so your application can receive a message at that moment too. If it is the expected ACK, it is processed; otherwise it is just received and not processed.
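The deferred-handling pattern described above can be modelled on a PC with a few lines of C++. Everything here is a hypothetical stand-in (radioIrq for RFM69_irq, fakeProcess() for process()), and std::atomic replaces the volatile flag an AVR build would typically use; it illustrates the idea, not the driver's actual code.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the driver's RFM69_irq flag.
static std::atomic<bool> radioIrq{false};

// Hypothetical stand-in for the radio's receive buffer (one frame at a time).
static std::vector<uint8_t> radioFifo;

// Frames that have actually reached the application.
static std::vector<std::vector<uint8_t>> processed;

// "Interrupt handler": does no work except raising the flag.
void fakeInterruptHandler() {
    radioIrq = true;
}

// Stand-in for process()/wait(): only here is the frame read out of the
// buffer and handed to the application.
void fakeProcess() {
    if (radioIrq.exchange(false)) {  // read and clear the flag in one step
        processed.push_back(radioFifo);
        radioFifo.clear();
    }
}
```

The model makes the consequence explicit: a frame only reaches the application when fakeProcess() runs, and in this one-slot model a second frame arriving before that call replaces the first.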
Now that we know a little more about how MySensors processes incoming messages, let’s look at how a very simple use case works. Imagine that the node sends one message to the gateway:
send(message1, true);
The following diagram shows in detail how the application behaves on both sides, the gateway and the node.
As you can see, this case works properly.
Now let’s consider a more complex case: the node sends two messages to the gateway, one right after the other:
send(message1, true);
send(message2, true);
Yes, this is the same code I showed at the start of this topic. The next diagram shows how it works.
Oops… Does the second ACK break the communication in this case? Yes, in 99% of cases it causes problems. But why is this second ACK needed at all? I am not entirely sure, but I will try to explain my understanding of how it is supposed to work.
In theory, this second ACK lets you know whether your message was delivered to the node behind the gateway. But in real life it breaks the communication.
How can we fix it? How can we send two messages one after the other without a collision? I will describe several solutions.
The first solution is very simple but very bad, and I strongly recommend against it: just add wait(100) between the two sends:
send(message1, true);
wait(100);
send(message2, true);
Why is this solution bad? First, it dramatically increases battery consumption. This example has only two messages; if you need to send 4-5 messages, the sketch will stay awake up to 400 ms longer! Second, it is simply wrong from an application design point of view: we are trying to fix a transport-layer error at the user application layer. So use this solution only as a last resort.
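The cost is easy to quantify: with one wait(100) between each pair of sends, n messages keep the node awake an extra (n - 1) x 100 ms. A small C++ helper for that arithmetic follows; the average current passed in is an assumption you have to take from your own hardware's datasheet or measurements, not a figure from this article.

```cpp
#include <cstdint>

// Extra time the sketch stays awake when wait(waitMs) is placed between
// every pair of consecutive send() calls: n messages need n - 1 waits.
uint32_t extraAwakeMs(uint32_t messages, uint32_t waitMs) {
    return messages < 2 ? 0 : (messages - 1) * waitMs;
}

// Extra charge in microamp-hours for an assumed average current (mA)
// drawn while the node waits. mA * ms = uA * s; divide by 3600 s/h.
double extraChargeUah(uint32_t messages, uint32_t waitMs, double currentMa) {
    return currentMa * extraAwakeMs(messages, waitMs) / 3600.0;
}
```

For example, five messages with wait(100) cost 400 ms per wake-up; with an assumed 18 mA average draw that is 2 uAh every time the sketch runs, which adds up quickly on a node that wakes many times a day.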
The second solution is not ideal, but I would recommend it for anyone running a simple MySensors setup with relatively simple sketches and no complicated customization. Just disable the second ACK!
//MyTransport.cpp line723
// send ACK, use transportSendRoute since ACK reply is not internal, i.e. if !transportOK do not reply
#ifndef MY_RADIO_RFM69
(void)transportSendRoute(_msgTmp);
#endif
or just comment it out:
// send ACK, use transportSendRoute since ACK reply is not internal, i.e. if !transportOK do not reply
//(void)transportSendRoute(_msgTmp);
This simple hack sharply increases network speed and reliability. For example, this code
for (int m = 1; m <= 200; m++) {
send(msg_sw.set(m), true);
}
sends 200 messages with ACK requests one after another without a single NACK. I tried it many times.
What do you lose with this solution? You will no longer see lines like these
45899 TSF:MSG:READ,0-0-144,s=1,c=1,t=2,pt=1,l=1,sg=0:0
45905 TSF:MSG:ACK
in the node’s debug serial output. I think about 95% of users do not process such messages in their sketches, so it is not a big loss.
If you want to process these ACK messages to check delivery to the node behind the gateway, you can implement a third, cleverer solution:
// send ACK, use transportSendRoute since ACK reply is not internal, i.e. if !transportOK do not reply
// skip the software ACK only on an RFM69 gateway
#if !defined(MY_RADIO_RFM69) || !defined(MY_GATEWAY_FEATURE)
(void)transportSendRoute(_msgTmp);
#endif
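Or:
// send ACK, use transportSendRoute since ACK reply is not internal, i.e. if !transportOK do not reply
#if defined(MY_RADIO_RFM69)
// on RFM69, reply only if the message came through a repeater (last hop != original sender)
if (_msg.last != _msg.sender) {
(void)transportSendRoute(_msgTmp);
}
#else
(void)transportSendRoute(_msgTmp);
#endif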
Both variants make your network more stable too. The first one disables the software ACK (second ACK) from the gateway to the node. The second one disables the software ACK from one node to another when there are no nodes between them (I haven’t tested this case, but I expect it to work).
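To make the difference between the variants explicit, here is a plain C++ model of the decision each one makes on an RFM69 network. The enum and function are hypothetical illustrations for reasoning about the preprocessor logic; they are not MySensors code, and last/sender simply mirror the fields of the same name in the snippet above.

```cpp
#include <cstdint>

// Which "second ACK" (software ACK) policy is compiled in.
enum class SoftAckPolicy {
    Always,          // stock behaviour: always send the software ACK
    Never,           // solution 2: never send it
    NotFromGateway,  // solution 3, first variant: only the gateway skips it
    OnlyViaRepeater  // solution 3, second variant: skip it for direct neighbours
};

// Returns true if a software ACK should be sent for a received message.
// last   = node id of the previous hop
// sender = node id of the original sender
bool shouldSendSoftAck(SoftAckPolicy policy, bool isGateway,
                       uint8_t last, uint8_t sender) {
    switch (policy) {
        case SoftAckPolicy::Always:          return true;
        case SoftAckPolicy::Never:           return false;
        case SoftAckPolicy::NotFromGateway:  return !isGateway;
        case SoftAckPolicy::OnlyViaRepeater: return last != sender;
    }
    return true;  // unreachable for valid enum values
}
```

When last equals sender, the message arrived directly with no repeater in between, which is exactly the case the second variant skips.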
But imagine you do not want to implement this hack at all. How can we fix this problem differently? Let’s move to the next chapter.
Chapter 2. The message queue.
A message queue is a standard feature in many systems, and it makes no sense to omit it from a serious one. In fact, the MySensors team has implemented this feature… but for the NRF24, not for the RFM69.
I will not describe in detail how a message queue works. All you need to know is that received messages are read immediately after they arrive (inside the interrupt handler) and put into the queue. They can then be processed later without losing a single one.
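A minimal sketch of such a queue, assuming a single producer (the interrupt handler) and a single consumer (process()). This is a generic ring buffer written for illustration; it is not the buffer used by the MySensors NRF24 transport, and the Frame layout and 32-byte payload are assumptions.

```cpp
#include <array>
#include <cstdint>

// Hypothetical frame type; the real driver's packet struct differs.
struct Frame {
    uint8_t sender;
    uint8_t length;
    std::array<uint8_t, 32> payload;
};

// Single-producer / single-consumer ring buffer. One slot is always kept
// free so that "full" and "empty" can be told apart: capacity is N - 1.
template <uint8_t N>
class FrameQueue {
    static_assert(N >= 2, "need at least one usable slot");

public:
    // Producer side (interrupt handler). Returns false if the queue is
    // full, in which case the frame is dropped.
    bool push(const Frame &f) {
        const uint8_t next = static_cast<uint8_t>((head_ + 1) % N);
        if (next == tail_) {
            return false;
        }
        buf_[head_] = f;
        head_ = next;
        return true;
    }

    // Consumer side (main loop). Returns false if nothing is queued.
    bool pop(Frame &out) {
        if (tail_ == head_) {
            return false;
        }
        out = buf_[tail_];
        tail_ = static_cast<uint8_t>((tail_ + 1) % N);
        return true;
    }

    bool empty() const { return head_ == tail_; }

private:
    std::array<Frame, N> buf_{};
    volatile uint8_t head_ = 0;  // written only by the producer
    volatile uint8_t tail_ = 0;  // written only by the consumer
};
```

Single-byte indices keep reads atomic on 8-bit MCUs, but this model does not deal with compiler reordering; on real hardware the consumer should copy the frame with interrupts briefly disabled or use proper barriers.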
I tried to implement a message queue at the RFM69 transport and driver level. I had a very good example in the NRF24 code, so it was not very hard. I ran into only one serious problem (so far :)). The NRF24 driver relies on the radio’s auto-ACK feature, which the RFM69 does not support. If we want to send the ACK inside the interrupt handler, we can’t use the system time functions (millis(), delay() and so on), so I implemented an ACK-sending function that does without them.
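One way to bound a wait without millis() is to count polling iterations instead of milliseconds. On AVR-based Arduinos, interrupts are disabled while an ISR runs, so millis() stops advancing and delay() can hang. The helper below is a hypothetical sketch (the name waitPolls and the poll budget are assumptions); it is not taken from the draft driver.

```cpp
#include <cstdint>

// Polls done() up to maxPolls times without touching millis()/delay().
// Returns true as soon as done() reports success, false when the poll
// budget runs out (treated as a timeout by the caller).
template <typename Predicate>
bool waitPolls(Predicate done, uint32_t maxPolls) {
    for (uint32_t i = 0; i < maxPolls; ++i) {
        if (done()) {
            return true;
        }
        // On real hardware a short calibrated busy loop could go here so
        // that each poll takes a known amount of time.
    }
    return false;
}
```

The trade-off is that the timeout is only as accurate as the per-poll duration, which depends on clock speed and on how long done() takes.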
You can see my draft solution on the easysensors GitHub (I will share it soon, maybe today). Most of the changes are in MyTransportRF24.cpp and RFM69_new.cpp. If you want to try it, you also need to remove MySensors capability guards like this one in a few places:
#if defined(MY_RADIO_RFM69)
#error Receive message buffering not supported for RFM69!
#endif
I haven’t tested it thoroughly, but I have tested whether it solves the «DOUBLE ACK» problem, and yes, it does. The next diagram shows how this code works with my new transport and driver.
I hope core team members will help me integrate this driver into the mainline code.
Chapter 3. General recommendations and simple driver fixes.
You can easily increase communication speed with the new driver. Just change one line in RFM69_new.h:
#define MY_RFM69_CSMA_LIMIT_DBM (-95)
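-95 dBm is a VERY optimistic expected noise level for such popular frequency bands and amateur power supplies. I suggest using -90 or -85. The driver treats any signal above this limit as a busy channel, so with a limit below the real noise floor it keeps waiting for the air to clear.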
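The effect of the limit can be illustrated with a small model of the carrier-sense rule, assuming a simple "RSSI below the limit means the channel is free" comparison (the real driver converts RSSI to its internal representation, so details may differ). It counts how many readings of a hypothetical noise floor around -92 dBm would be treated as busy.

```cpp
#include <cstdint>

// Carrier-sense rule (assumed): the channel is free only when the
// measured RSSI is below the limit. Both values are in dBm.
bool csmaChannelFree(int16_t rssiDbm, int16_t limitDbm) {
    return rssiDbm < limitDbm;
}

// Counts how many noise readings would make the driver consider the
// channel busy, and therefore wait, for a given limit.
int busyReadings(const int16_t *samples, int count, int16_t limitDbm) {
    int busy = 0;
    for (int i = 0; i < count; ++i) {
        if (!csmaChannelFree(samples[i], limitDbm)) {
            ++busy;
        }
    }
    return busy;
}
```

With readings of -93, -92, -91, -94 and -92 dBm, a -95 dBm limit marks every one of them busy, while -90 dBm marks none of them busy.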
It is foolish to send a message when the air is busy. But the new driver (and the old one too, as far as I remember) sends the message even if the air is still busy: it checks the air, waits up to 500 ms… and then sends anyway. We should add a few lines:
//RFM69_new.cpp line 382 (before // set radio to standby to load fifo)
// give up if the CSMA timeout expired while the channel was still busy
if ((hwMillis() - CSMA_START_MS) >= MY_RFM69_CSMA_TIMEOUT_MS) {
return false;
}
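To see the effect of this check in isolation, here is a host-side C++ model of the carrier-sense step. nowMs() and isChannelFree() are hypothetical stand-ins for hwMillis() and the driver’s RSSI check, injected so the logic can be tested with a fake clock. One deliberate difference from the patch: this variant decides from the last channel reading instead of re-reading the clock, so a frame is not rejected when the channel clears just as the timeout expires.

```cpp
#include <cstdint>
#include <functional>

// Returns true if the frame would be transmitted, false if the channel
// stayed busy for the whole timeout (the behaviour the patch adds).
bool csmaThenSend(const std::function<uint32_t()> &nowMs,
                  const std::function<bool()> &isChannelFree,
                  uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    bool free = isChannelFree();
    // Keep listening until the channel clears or the timeout expires.
    while (!free && (nowMs() - start) < timeoutMs) {
        free = isChannelFree();
    }
    if (!free) {
        return false;  // still busy: do not transmit
    }
    // ... load FIFO and transmit (omitted in this model) ...
    return true;
}
```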
And of course, it makes no sense to wait for an ACK if the message wasn’t sent, so we need to replace one line to prevent that:
//RFM69_new.cpp line 620 (before // radio is in RX)
// if the frame was not sent, skip the ACK wait and go to the next retry
if (!RFM69_send(recipient, (uint8_t *)buffer, bufferSize, flags, !retry)) {
continue;
}
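A host-side model of the retry loop shows what this change buys: failed transmissions no longer cost an ACK wait. trySend() and waitForAck() are hypothetical stand-ins for RFM69_send() and the ACK wait that follows it; broadcast handling, sequence numbers and ATC are left out.

```cpp
#include <cstdint>
#include <functional>

struct RetryStats {
    bool delivered = false;
    int ackWaits = 0;  // how many times we actually waited for an ACK
};

RetryStats sendWithRetry(uint8_t retries,
                         const std::function<bool(uint8_t)> &trySend,
                         const std::function<bool()> &waitForAck) {
    RetryStats stats;
    for (uint8_t retry = 0; retry < retries; retry++) {
        if (!trySend(retry)) {
            continue;  // frame not transmitted: no ACK can arrive
        }
        ++stats.ackWaits;
        if (waitForAck()) {
            stats.delivered = true;
            break;
        }
    }
    return stats;
}
```

If the first two attempts fail carrier sense and the third succeeds, the model waits for an ACK only once instead of three times.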
Many thanks for reading this article to the end.