nrf24 : transmission of data works fine, but constant NACK's produced
I use an Arduino Uno as a serial gateway, with nrf24 radios.
What I find strange is that data transmission is reliable overall up to roughly 30 meters, through walls: MySensors nodes reliably send data to the GW and actuators receive commands.
But I cannot use the ACK-required feature, because the logs systematically show NACKs. For troubleshooting I used the nrf24doctor sketches ( https://forum.mysensors.org/topic/9178/nrf24doctor )
So what I find is:
When I am at ~7m range, the nrf24doctor node shows no faults and the gateway produces clean log entries like this:
0;255;3;0;9;227408 TSF:MSG:ACK REQ
But when the distance increases by one or two meters, messages are still received, but the log on the GW consistently shows NACKs:
0;255;3;0;9;558817 TSF:MSG:ACK REQ
The same pattern holds when the distance increases further: messages are received (at ~25-30 meters I see 1-2% faults, but about 99% of messages go through).
So do I understand this right, that my node manages to deliver the message with its payload to the gateway, but fails to deliver the ACK? What could be the explanation?
Which version are you using, 2.3.0 or 2.2.0? If 2.3.0, please try 2.2.0 and report back.
@rzylius there could be three cases:
- The node is unable to receive the message, so it doesn't know it should send an ack
- The node receives the message, but fails to send the ack
- The node sends the ack but the gateway is unable to receive the ack
Checking the node logs will show whether it is case 1 or (2 or 3). Radio debug logs (#define MY_DEBUG_VERBOSE_RF24) and gateway debug logs (#define MY_DEBUG_VERBOSE_GATEWAY) might help to get more information.
If there is any repeater in your network you'll need to check those logs as well since these acks only live for one hop.
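As a configuration fragment, the two debug switches mentioned above would sit near the top of the sketch, before the MySensors include. MY_DEBUG is the general debug switch and is added here as an assumption; the radio selection and the rest of the sketch stay as they are.

```cpp
// Place these before the MySensors include in the sketch.
#define MY_DEBUG                  // general debug output (assumed to be wanted as well)
#define MY_DEBUG_VERBOSE_RF24     // verbose output from the RF24 radio driver
#define MY_DEBUG_VERBOSE_GATEWAY  // verbose gateway output (gateway sketch only)

#include <MySensors.h>
```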
@rzylius The NACKs indicate a hardware acknowledge from the receiving radio was not received by the sending radio. It doesn't say that the receiving part didn't receive the message!
The nRF24 contains a retry mechanism in hardware, which retries sending the same message a configurable number of times, with a certain delay in between, if the acknowledge is not received by the sender.
MySensors configures the maximum number of retries to 15 (https://github.com/mysensors/MySensors/blob/development/drivers/RF24/RF24registers.h#L47).
I performed extensive tests in my environment and in practice I typically see either 1 or 2 retries, or 15 with NACK. The transition from a few to NACK happens in only a few metres. This confirms that the hardware ack is useful to correct an occasional failed transmission, but it will not improve range issues.
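A minimal sketch of how the retry settings and the retry counter are laid out in the nRF24L01+ registers, going by the Nordic datasheet (SETUP_RETR at 0x04, OBSERVE_TX at 0x08). The helper names are hypothetical and not part of MySensors. The retry count of 15 is the value mentioned above; the delay setting of 5 is only an assumed value for illustration.

```cpp
#include <cassert>
#include <cstdint>

// SETUP_RETR (0x04): bits 7:4 = ARD (retransmit delay), bits 3:0 = ARC (retry count).
inline uint8_t makeSetupRetr(uint8_t ard, uint8_t arc) {
    assert(ard <= 15 && arc <= 15);  // both fields are 4 bits wide
    return static_cast<uint8_t>((ard << 4) | arc);
}

// ARD encodes the delay between attempts in steps of 250 us: 0 -> 250 us, 15 -> 4000 us.
inline uint32_t retransmitDelayUs(uint8_t ard) {
    assert(ard <= 15);
    return (static_cast<uint32_t>(ard) + 1u) * 250u;
}

// OBSERVE_TX (0x08): bits 3:0 = ARC_CNT, the retransmissions used for the last packet.
// (Bits 7:4 hold PLOS_CNT, the lost-packet counter, ignored here.)
inline uint8_t retriesUsed(uint8_t observeTx) {
    return observeTx & 0x0F;
}

// Hypothetical helper: total time spent waiting between attempts before the
// radio gives up (MAX_RT). Airtime of the packets themselves is not included.
inline uint32_t minGiveUpDelayUs(uint8_t ard, uint8_t arc) {
    return retransmitDelayUs(ard) * arc;
}
```

With the assumed delay setting of 5 and 15 retries, `makeSetupRetr(5, 15)` gives 0x5F and the radio waits at least 22500 us in total before reporting a failure. A reading of 15 from `retriesUsed()` together with a NACK corresponds to the "15 with NACK" case described above.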
I have no repeaters. I tried to experiment to answer your questions. I went to a distance of ~15 meters, where the GW shows persistent NACKs.
I understand the normal scenario is that the node sends a message, the GW receives it, the GW sends an ACK to the node, and the node receives the ACK from the gateway. That should be the end of the transaction?
GATEWAY LOG ENTRIES:
0;255;3;0;9;477674 TSF:MSG:ACK REQ
0;255;3;0;9;478758 TSF:MSG:ACK REQ
Channel:76 PaLevel:MAX DataRate:250KBPS
0;255;3;0;9;480973 TSF:MSG:ACK REQ
This is what a node shows for the same transmissions:
LOG FROM NODE
Message "226": the node sent a message, the GW received it, the GW sent an ACK and the node received the ACK. Why does the GW entry show a NACK? What am I missing?
Looking at node entry "227": the node sent a message, the GW did not receive it (there is no entry in the GW log), so the node did not receive an ACK from the GW. This all makes sense.
I do not quite understand how transmission can be reliable in my case up to 25-30 metres (node and GW communicate reliably), while NACKs appear once the distance exceeds 5-6 metres.
I guess I am talking about software ACK's (though not sure).
But, as @itbeyond advised, I downgraded to 2.2.0, recompiled, and the problems disappeared. The log is clean, no more NACKs.
@rzylius thanks for that testing. It matches almost exactly the problems I have seen since the release of 2.3.0, which are the reason I no longer use 2.3.0 and have reverted my entire network to 2.2.0. Sorry @mfalkvidd, this is another example of the same 2.3.0 problems. I feel I am coming across as a problem to the community, but I did extensive testing on 2.3.0 and posted in the release page, received no reply, and the errors seem to continue. The logs above look very similar to those from my own testing, so I am not sure what other logs you may need, but I am happy to help if there is something I can do. It is hard to diagnose anything when the radio network just starts NACK'ing and eventually the node stops working.
@itbeyond @rzylius are you using nrf24l01+ with pa + lna?
Can you test if the problems still persist using this version: https://github.com/tekka007/MySensors/tree/RF24Test
@tekka I have downloaded it and will test and report shortly. Yes, most of my gateway and repeater nodes are nrf24l01+ with pa + lna, but I have also seen the problem with the E01-ML01DP5, which I think uses the same underlying technology.
@tekka Loaded onto my MEGA-based Ethernet gateway, connected to openHAB and using an E01-ML01DP5 with a 9dB antenna. This setup previously lasted less than 6 hours. I will report as testing goes on. This unit mostly receives and sends little, but I have some test nodes I will code to toggle back and forth. I am unable to add my signed nodes to this as yet!
I will also grab my MEGA-based repeater on a different network and see what happens; that one is an nrf24l01+ with pa + lna.
@tekka I think you have solved it. I looked at the code changes you made, removing the 10us pulse, and wonder how that could be the cause, but without reading the rest of the code and understanding the specs of the card I am only guessing. Can this version of the code still work well with regular nrf24 modules, or will this pulse adjustment affect them differently? So far I have only loaded it on the 2 nodes I mentioned. Will this modification be released to the community as a point release, something like 2.3.1, or how will it make its way into a stable release?
Great work by the way and thanks for fixing it.
@itbeyond the CE pulse is apparently used by amplified nrf24 radios to enable the lna receiver. This is not an issue when only sending, but for hardware acks the lna receiver should be enabled until the transmitted message has been acknowledged by the receiving node.
Shortening the CE pulse, and thereby disabling reception causes nacks to occur...
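The explanation above can be sketched as a toy timing model. This is not MySensors driver code: the struct, the function name and the assumption that the external LNA is powered only while CE is high are all illustrative, and any timing values fed into it are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy timing model, NOT MySensors driver code.
// Assumption: on a PA+LNA module the external LNA is powered only while CE is high.
struct TxAttempt {
    uint32_t ceHighUs;      // how long the driver keeps CE high after starting the transmission
    uint32_t ackArrivesUs;  // when the receiver's hardware ack arrives (hypothetical value)
};

// True if the LNA is still enabled at the moment the hardware ack arrives.
inline bool lnaActiveWhenAckArrives(const TxAttempt &a) {
    assert(a.ackArrivesUs > 0);  // the ack cannot arrive before the packet has been sent
    return a.ceHighUs >= a.ackArrivesUs;
}
```

In this model a 10 us CE pulse with an ack arriving after, say, 2000 us gives `false`, while keeping CE high until the transmission has been acknowledged gives `true`. With the LNA off the ack may presumably still get through when the signal is strong, which would fit the reports that everything looked fine at ~7m and NACKs started a few meters further out.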
@yveaux The comment above the section indicates that TX starts after 10us, and that setting CE high also enables PA+LNA mode, so I wonder why we would set it low after 10us. That seems like a conflicting set of statements. Then there is the datasheet statement "Pulse CE at least 10us", which is confusing: does it mean for at least 10us, or after 10us, and should the pulse be LOW then HIGH for more than 10us? Reading these statements without the datasheet, I would hold CE high for at least 10us and then pulse it quickly LOW/HIGH until the status updates, since CE needs to be high to enable PA+LNA mode. Anyway, I am only looking at a very small part of the code and reading people's comments, and I would need more time to cross-check the datasheet. But with the "set it LOW after 10us" step removed, it is still working.
@itbeyond to me the datasheet is clear, and the MySensors implementation was changed to match the Nordic reference implementation.
However, as said, that doesn't accommodate non-Nordic, undocumented pa+lna modules.
Guess we found out the hard way...
@itbeyond yes that's generally how the MySensors releases work
Will the 2.3 release be updated with this fix, @tekka?
@sundberg84 Yes, the patch will be included in 2.3.1.