MQTT - RFM69 Gateway stops communicating randomly and doesn't recover
I have built a MySensors network based on RFM69H modules, with an Ethernet MQTT Gateway built on Leonardo which I detailed here: https://forum.mysensors.org/topic/6249/mqtt-ethernet-gateway-using-leonardo-32u4-w5100-rfm69h-hard-spi
My three sensor nodes are mains-powered, a mix of Atmega328 and 32u4 doing temperature, environment, energy and HVAC control.
I'm building on Windows 7, Arduino 1.8.1, MySensors 2.1.1
It works really great most of the time, but I have an issue which has cropped up randomly and is starting to bug me a lot. Occasionally it seems like the gateway will just stop sending or receiving radio messages. None of the sensor reports get in nor do control signals get out. To resolve this I have to power cycle the gateway, and since I'm often away for long periods my system is completely paralysed until then. Sometimes it will work for a week or more without issue, and sometimes it will occur several times a day. I haven't been able to correlate the occurrences with anything that happens on the sensor network or the OpenHAB bus. Once I reset the gateway all of my other nodes re-join the network and start working again, so for now I have assumed they're all working properly and the fault lies within the gateway.
I started logging the debug output from the gateway, and the issue has occurred twice since then. Both times it seems to be related to the radio receiving a garbage packet of some kind. After that, it seems like the gateway code cannot talk to the radio module correctly, but doesn't do anything about it, e.g. reset the radio. It seems to pass sanity checks okay and assume everything is good.
Here are two examples of the garbage packets being received:
0;255;3;0;9;TSF:MSG:READ,11-47-153,s=240,c=4,t=205,pt=6,l=22,sg=0:A4709004C242000524F00D0381C00057D048B18C0808
0;255;3;0;9;!TSF:MSG:LEN,59!=29
0;255;3;0;9;TSM:READY:NWD REQ
0;255;3;0;9;TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
0;255;3;0;9;TSF:SAN:OK

0;255;3;0;9;TSF:MSG:READ,144-0-0,s=160,c=5,t=0,pt=5,l=0,sg=1:117440612
0;255;3;0;9;!TSF:MSG:LEN,61!=32
0;255;3;0;9;TSF:SAN:OK
0;255;3;0;9;TSM:READY:NWD REQ
0;255;3;0;9;TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
0;255;3;0;9;TSF:SAN:OK
The NWD - MSG SEND - SAN OK cycle repeats continuously thereafter, sometimes punctuated by attempts from the controller to send commands, which never get out:
0;255;3;0;9;Message arrived on topic: sensors-in/2/4/1/0/2
0;255;3;0;9;!TSF:MSG:SEND,0-0-2-2,s=4,c=1,t=2,pt=0,l=1,sg=0,ft=0,st=NACK:0
0;255;3;0;9;TSM:READY:NWD REQ
0;255;3;0;9;TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
0;255;3;0;9;TSF:SAN:OK
It looks like the network discovery routine is succeeding, but I never see any activity on the gateway LEDs after the fault occurs so I assume it's returning a successful result without actually interrogating the network at all (maybe that's by design? I haven't gone digging to see).
The garbage packets could possibly be related to the Oregon Scientific weather station I have on my roof. I believe it communicates using PAM 433MHz and I wouldn't be surprised if it isn't friendly about sharing the spectrum and just talks over the MyS packets. This would fit with the randomness of the failure, since it requires an OS weather sensor transmission to clash with a MyS sensor message. I would've thought (hoped) that the gateway would be able to ignore and recover from such an event though.
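To spot these events in a long capture, the length-mismatch entries can be picked out automatically. The following is a minimal sketch, not part of MySensors: it assumes each debug line follows the gateway's serial layout (node;child;command;ack;type;payload) and that the two numbers in `!TSF:MSG:LEN,a!=b` are the received and expected lengths, which fits the first example above (a 7-byte header plus l=22 gives 29). The names `LengthMismatch`, `logPayload` and `parseLengthMismatch` are made up for illustration.

```cpp
#include <cstdio>
#include <string>

// Lengths reported by a "!TSF:MSG:LEN,<received>!=<expected>" debug entry.
// (Interpretation of the two numbers is an assumption, see the text above.)
struct LengthMismatch {
    int received;
    int expected;
};

// Copies the text after the fifth ';' into 'payload'.
// Returns false if the line has fewer than five ';' separators.
bool logPayload(const std::string& line, std::string& payload) {
    std::size_t pos = 0;
    for (int i = 0; i < 5; ++i) {
        pos = line.find(';', pos);
        if (pos == std::string::npos) {
            return false;
        }
        ++pos;  // step past the separator
    }
    payload = line.substr(pos);
    return true;
}

// Returns true and fills 'out' only when the line is a length-mismatch entry.
bool parseLengthMismatch(const std::string& line, LengthMismatch& out) {
    std::string payload;
    if (!logPayload(line, payload)) {
        return false;
    }
    int received = 0;
    int expected = 0;
    if (std::sscanf(payload.c_str(), "!TSF:MSG:LEN,%d!=%d", &received, &expected) != 2) {
        return false;
    }
    out.received = received;
    out.expected = expected;
    return true;
}
```

Feeding each logged line through `parseLengthMismatch` and noting the timestamp of every match would make it easier to check whether each lock-up really is preceded by one of these packets.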
Try adding a watchdog timer that will reset the gateway in case the code hangs somewhere. Or you could move to MySensors 2.2.0, which has some bugs fixed.
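One caveat: in the logs above the gateway keeps printing NWD requests, so its main loop still seems to be running, and a hardware watchdog fed from the loop might never fire. A possible complement is to track how long the radio has been silent and treat a long silence as the trigger. This is only a sketch under that assumption; `RadioSilenceTimer` is a hypothetical helper, not a MySensors class, and the timeout is whatever suits the reporting interval of the nodes.

```cpp
#include <cstdint>

// Hypothetical helper: tracks the time since the last valid radio message.
// Timestamps are 32-bit millisecond counters (as Arduino's millis() returns).
class RadioSilenceTimer {
public:
    explicit RadioSilenceTimer(uint32_t timeoutMs)
        : timeoutMs_(timeoutMs), lastRxMs_(0) {}

    // Call whenever a valid message has been received from any node.
    void noteReceived(uint32_t nowMs) { lastRxMs_ = nowMs; }

    // True once the radio has been silent for at least the timeout.
    // Unsigned subtraction keeps this correct across counter rollover.
    bool expired(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastRxMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastRxMs_;
};
```

In a sketch the timestamps would come from `millis()`. Once `expired()` returns true, one common AVR approach is to call `wdt_enable()` from `<avr/wdt.h>` and stop feeding the watchdog so the board resets, though that would need testing with the bootloader on the board in question.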
@gohan Thanks, I didn't realise 2.2 had been released yet
Edit: My bad, 2.2 hasn't been released, it's the dev branch.
Yes, it's dev branch but it is working. Give it a try
I have upgraded the whole network to 2.2.0-beta using the new RFM69 driver. I started by upgrading just the gateway but it wasn't backwards compatible with the sensors. Maybe the new radio driver changes something.
Will wait and see if it fails again.
If you have upgraded to the new RFM69 driver, you need all nodes and the gateway updated too.
@scalz Thanks yeah I have upgraded all the nodes and they're all talking again on the new driver
So far so good. I'm feeling tentatively like this problem may be solved in 2.2-beta, and my sensor network is starting to gain my trust again.
@Carywin Is your gateway still working without losing connection ?
I have the same problem as you: my gatewayMQTTclient with RFM69 over SoftSPI keeps losing connection with the nodes. Sometimes it works for a couple of days, sometimes only a few hours.
I have tried to use the new RFM69 driver from MySensors 2.2 (dev), but Arduino would not compile my sketch because of errors in the RFM69_new.cpp file (some variables are not defined). With the old driver the sketch compiles.
How did you get your gateway running with MySensors 2.2?
Have you defined the MY_RFM69_NEW_DRIVER parameter to use the new driver?
@gieemek Yes my gateway and sensors are very reliable now, using 2.2-beta from the dev branch and the new RFM69 driver.
Are you just trying to use the new driver by itself? Or have you updated the whole MySensors library to 2.2? I upgraded the whole library, I'm not sure how much success you'd have just updating the driver itself.
A few definitions have changed in the library: anywhere the old defs had MY_RF69 you have to change it to MY_RFM69. Note the added M.
Here's the radio block from my gateway sketch:
#define MY_RADIO_RFM69
#define MY_RFM69_NETWORKID 137
#define MY_RFM69_ENABLE_ENCRYPTION
#define MY_RFM69_NEW_DRIVER
#define MY_IS_RFM69HW
#define MY_RFM69_FREQUENCY RFM69_433MHZ
#define MY_RFM69_IRQ_PIN 2
//#define MY_RFM69_IRQ_NUM 1
#define MY_RF69_SPI_CS 6
#define MY_RFM69_ATC_MODE_DISABLED
#define MY_RFM69_TX_POWER_DBM 20
IRQ_NUM seems deprecated now, but it works without it.
I disabled ATC_MODE just on my gateway when I was troubleshooting a different issue, but I don't care much about power consumption there so I didn't bother to enable it again.
@Carywin The compile problem in Arduino is with SoftSPI. When I switch it off, my sketch compiles properly. I will have to adapt the new driver to work with the SoftSPI library. Thanks.
@gieemek Oh right, well I'm using hardware SPI so that would explain it. Good luck!
@Carywin OK, my gatewayMQTTclient with RFM69 via SoftSPI is working now. The problem was in the RFM69_new.cpp file, with SPI_HAS_TRANSACTION, which is defined in the hardware SPI.h but not in SoftSPI.
Now I am testing my gateway and I hope it will work without losing connection. Thank you once more for the help.
@gieemek Glad to hear it's working. I don't think SoftSPI needs to be atomic in its transactions the same way hardSPI does, so you're probably not missing anything important.
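The thread doesn't show the actual change made to RFM69_new.cpp, so the following is only a sketch of one common pattern for this kind of problem: compile the transaction calls only when the SPI library advertises them through the SPI_HAS_TRANSACTION macro. `FakeSoftSpi` and `writeRegister` are hypothetical names for illustration, and the transaction calls are simplified (the real Arduino `beginTransaction()` takes an `SPISettings` argument).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bit-banged SPI bus without transaction support.
// It records the bytes it is asked to send instead of toggling pins.
struct FakeSoftSpi {
    std::vector<uint8_t> sent;
    uint8_t transfer(uint8_t value) {
        sent.push_back(value);
        return 0;
    }
};

// Writes one radio register over the given bus.
// The begin/end transaction calls exist only when SPI_HAS_TRANSACTION is
// defined, so a bus type without them still compiles.
template <typename Bus>
void writeRegister(Bus& bus, uint8_t address, uint8_t value) {
#if defined(SPI_HAS_TRANSACTION)
    bus.beginTransaction();  // simplified; see note in the text above
#endif
    bus.transfer(address | 0x80);  // RFM69: MSB set in the address marks a write
    bus.transfer(value);
#if defined(SPI_HAS_TRANSACTION)
    bus.endTransaction();
#endif
}
```

With the macro undefined, as with the SoftSPI case described above, only the two `transfer()` calls remain, which matches the point that a bit-banged bus has no shared hardware state for a transaction to protect.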