Multiple sensor node freeze
I have built a network using the dev branch of MySensors with an MQTT gateway and HomeAssistant. Each of my 5 nodes has several sensors and relays, so each node has to communicate up to 16 messages to the gateway.
If I stress the nodes with commands in quick succession, they freeze sooner or later and do not recover from this state. I ran the "stress test" via the HA interface, with an IR remote (the relays can be triggered by a remote), or by triggering the water sensor. If I just let the nodes work, freezes are rare and can stay away for a few days. Is there a limit on the number of messages that can be sent or received at one time, per node or per network? I suspect the freeze happens while another node is updating its state, that is, when the network is dealing with many messages.
Could it also be related to the bug I mentioned in another post: the gateway sensors react when commands are sent to nodes with the same Child_ID?
I plan to modify my code to give each message some time to be sent, so instead of:
Hum = dht.getHumidity();
Temp = dht.getTemperature();
send(msgTemp.set(Temp, 1));
send(msgHum.set(Hum, 1));
I would reorder the sequence like this:
Hum = dht.getHumidity();
send(msgTemp.set(Temp, 1));  // note: Temp still holds the previous reading here
Temp = dht.getTemperature();
send(msgHum.set(Hum, 1));
Could this help? Thank you.
I added a 1000uF capacitor to the power supply, but the node still blocked. I then replaced the Arduino Nano and reduced the number of messages, and now it seems to work, although I have only tested it for a few continuous hours.
Adding a delay between sends has helped several people so your re-arranging is probably a good idea.
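For illustration, one way to space out sends without a fixed blocking pause is to remember when the last send happened and wait only for the remainder of a minimum gap. This is a minimal host-side sketch, not MySensors code: the SendPacer class and its method names are hypothetical, and the timestamps are assumed to be milliseconds such as those returned by millis() on Arduino.

```cpp
#include <cstdint>

// Hypothetical helper (not part of MySensors): enforces a minimum gap
// between consecutive sends. Timestamps are in milliseconds.
class SendPacer {
public:
    explicit SendPacer(uint32_t minGapMs) : minGapMs_(minGapMs) {}

    // Milliseconds the caller should still wait before the next send.
    uint32_t delayBeforeSend(uint32_t nowMs) const {
        if (!sentBefore_) {
            return 0;  // nothing sent yet, no need to wait
        }
        // Unsigned subtraction stays correct across a timer rollover.
        uint32_t elapsed = nowMs - lastSendMs_;
        return elapsed >= minGapMs_ ? 0 : minGapMs_ - elapsed;
    }

    // Record that a message was just sent.
    void markSent(uint32_t nowMs) {
        lastSendMs_ = nowMs;
        sentBefore_ = true;
    }

private:
    uint32_t minGapMs_;
    uint32_t lastSendMs_ = 0;
    bool sentBefore_ = false;
};
```

On a node, the idea would be to call wait(pacer.delayBeforeSend(millis())) before each send(...) and pacer.markSent(millis()) right after it, assuming wait() is the MySensors call mentioned later in this thread. The gap itself (for example 50 ms) is a guess that would have to be tuned by testing.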
What does the serial debug log on the sensor and the gateway say when things fail?
Have you seen the troubleshooting tips at http://forum.mysensors.org/topic/666/debug-faq-and-how-ask-for-help ?
It's strange, but when I activate the debug option the freezing just goes away. However, debug should be off in "commercial" use. I have checked most of the advice, and beyond that I have already added capacitors to the NRF24 (100uF + 100nF), to the relay shield (100uF) and to the Arduino Nano (100uF + 220nF). They are not explicitly required, but I thought they might help.
My problem may be related to the power supply and to the Arduino Nano, and the message re-arrangement may help as well.
After some more testing I plan to publish my project (most of my code is published on GitHub, with links available on MySensors and HomeAssistant).
Thank you for the advice, any help is always very welcome!
I managed to capture the error message!
Setup: MQTT gateway connected via Ethernet to the broker, only one node in the same MySensors network (development), HomeAssistant, and a group of 4 switches being toggled very aggressively. In my sketch each message is sent three times with wait() between them.
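Sending every message three times triples the radio traffic even when the first attempt gets through. An alternative, sketched here under assumptions, is to stop at the first attempt that reports success. The code is a host-side model in standard C++: sendWithRetry, RetryResult and the two callbacks are hypothetical names, and std::function is used only so the logic can be tried on a PC. On a node the callbacks would stand in for a send(...) call (assuming its return value reports delivery to the next hop) and for wait(...).

```cpp
#include <functional>

// Outcome of a bounded retry (hypothetical type for this sketch).
struct RetryResult {
    bool delivered;  // true if some attempt reported success
    int attempts;    // number of attempts actually made
};

// Calls trySend up to maxAttempts times and stops at the first success.
// pause(pauseMs) is called between failed attempts, never after the last one.
RetryResult sendWithRetry(const std::function<bool()>& trySend,
                          const std::function<void(unsigned long)>& pause,
                          int maxAttempts,
                          unsigned long pauseMs) {
    RetryResult result{false, 0};
    while (result.attempts < maxAttempts) {
        ++result.attempts;
        if (trySend()) {
            result.delivered = true;
            break;
        }
        if (result.attempts < maxAttempts) {
            pause(pauseMs);  // give the radio time before trying again
        }
    }
    return result;
}
```

With this shape a message that succeeds on the first try costs one transmission instead of three, and a message that never succeeds is dropped after a bounded number of attempts rather than blocking the loop.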
The last few messages are:
read: 0-0-10 s=4,c=1,t=2,pt=0,l=3,sg=0:OFF
send: 10-10-0-0 s=4,c=1,t=2,pt=0,l=3,sg=0,st=fail:OFF
send: 10-10-0-0 s=4,c=1,t=2,pt=0,l=3,sg=0,st=ok:OFF
read: 0-0-10 s=9,c=1,t=2,pt=0,l=3,sg=0:OFF
send: 10-10-0-0 s=9,c=1,t=2,pt=0,l=3,sg=0,st=fail:OFF
send: 10-10-0-0 s=9,c=1,t=2,pt=0,l=3,sg=0,st=ok:OFF
read: 0-0-10 s=1,c=1,t=2,pt=0,l=2,sg=0:ON
send: 10-10-0-0 s=1,c=1,t=2,pt=0,l=2,sg=0,st=fail:ON
send: 10-10-0-0 s=1,c=1,t=2,pt=0,l=2,sg=0,st=ok:ON
read: 0-0-10 s=2,c=1,t=2,pt=0,l=2,sg=0:ON
send: 10-10-0-0 s=2,c=1,t=2��J*���+���=fail:ON
send: 10-10-0-0 s=2,c=1,t=2,pt=0,l=2,sg=�+�� 99re����0-10 s=4,c=1,t=2,pt=0,l=2,sg=0:ON
setIK`}Starting repeater (RNNRA-, 2.0.0-beta)
Radio init failed. Check wiring.
Neither the gateway nor the node was moved at all. After power-cycling the node (keeping it in position, to check whether the wiring was the problem), it came back alive. Shaking the node while toggling switches from the interface less aggressively did not cause problems: the node continued to work fine.
When I again send a lot of messages by toggling the switches, the node produces errors from time to time, but it recovers from them and continues to work:
read: 0-0-10 s=4,c=1,t=2,pt=0,l=3,sg=0:OFF
send: 10-10-0-0 s=4,c=1,t=2,pt=0,l=3,sg=0,st=fail:OFF
send: 10-10-�0 s=4,c=1,t=2,pt=0,l=3,sg=0,st=ok:OFF
read: 0-0-10 s=9,c=1,t=�J�0,l=3,sg=0:OFF
send: 10-10-0-0 s=9,,�|������Y���3�]��(Starting repeater (RNNRA-, 2.0.0-beta)
Radio init successful.
send: 10-10-0-0 s=255,c=3,t=15,pt=0,l=2,sg=0,st=ok:
send: 10-10-0-0 s=255,c=0,t=18,pt=0,l=10,sg=0,st=ok:2.0.0-beta
send: 10-10-0-0 s=255,c=3,t=6,pt=1,l=1,sg=0,st=fail:0
read: 0-0-10 s=4,c=1,t=2,pt=0,l=2,sg=0:ON
So from time to time, after a burst of messages, some messages are corrupted and the node re-initializes the radio. Sometimes it fails to recover from that and start over, and it blocks.
While also monitoring the NRF24 supply current (powered from a source other than the Arduino itself), I see it draw ~11-13mA while working and drop to zero when the node blocks.
I have tested 4 switches/relays on the gateway and they work fine, with no blocking during the stress test (mostly group switching, as in the node test).
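To compare runs, it can help to count failed sends and restarts in a captured log. The sketch below is a small host-side tool, assuming the log text has been saved from the serial monitor. The names summarizeLog and LogStats are made up for this example, and treating "Starting repeater" as a startup banner (so that seeing it mid-log means the sketch started over) is an assumption based on the excerpts above.

```cpp
#include <cstddef>
#include <string>

// Counters extracted from a saved serial log (hypothetical type).
struct LogStats {
    int sendOk = 0;           // occurrences of "st=ok"
    int sendFail = 0;         // occurrences of "st=fail"
    int restarts = 0;         // occurrences of the assumed startup banner
    int radioInitFailed = 0;  // occurrences of "Radio init failed"
};

// Counts non-overlapping occurrences of needle in text.
static int countOccurrences(const std::string& text, const std::string& needle) {
    if (needle.empty()) {
        return 0;  // avoid an endless loop on an empty search string
    }
    int count = 0;
    for (std::size_t pos = text.find(needle); pos != std::string::npos;
         pos = text.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}

// Plain substring search, so intact markers are still counted even when
// neighbouring bytes in the log are corrupted.
LogStats summarizeLog(const std::string& log) {
    LogStats stats;
    stats.sendOk = countOccurrences(log, "st=ok");
    stats.sendFail = countOccurrences(log, "st=fail");
    stats.restarts = countOccurrences(log, "Starting repeater");
    stats.radioInitFailed = countOccurrences(log, "Radio init failed");
    return stats;
}
```

A rising ratio of failed to successful sends shortly before a banner line would point at the radio link or the supply, while a banner with no preceding failures would point more towards a reset caused by the sketch itself.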
I have checked the free memory of a node using http://playground.arduino.cc/Code/AvailableMemory , with the following results.
During normal usage, at a low message rate, the measurement is the following and it remains constant:
After a burst of messages, on an occasion when the node did not block (I mention this because it blocks quite often while handling many messages), I got this:
Another measurement, close to the blocking point:
freeMemory()=-28671
freeMemory()=-27900
freeMemory()=-27900
freeMemory()=-27900
freeMemory()=-27900
freeMemory()=25761
The problem could be related to variables overrunning their memory bounds, or to free memory being lost to fragmentation and never recovered.
I once observed that, after some errors, I got "Radio init successful" and freeMemory returned to its normal value, 825 in my case.
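Readings such as -28671 or 25761 cannot be real free-memory figures on a board with about 2 KB of SRAM, so they more likely indicate corrupted bookkeeping than an actual amount of free RAM. As a rough host-side model (an assumption about how the playground routine works: it subtracts the end of the heap from the current stack address using 16-bit int arithmetic), the sketch below shows how out-of-range values can be flagged. The 2048-byte SRAM size assumes an ATmega328P-based Nano, and the function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: ATmega328P-based board with 2048 bytes of SRAM.
constexpr int kSramBytes = 2048;

// Rough model of the measurement: stack address minus end of heap,
// truncated to a 16-bit signed value as an int would be on AVR.
int16_t freeBytes(uint16_t stackAddr, uint16_t heapEnd) {
    return static_cast<int16_t>(static_cast<uint16_t>(stackAddr - heapEnd));
}

// A reading can only be genuine if it fits inside the physical SRAM.
bool isPlausibleReading(int reading) {
    return reading >= 0 && reading <= kSramBytes;
}
```

In this model a negative result means the heap end appears to lie above the stack address, which cannot happen with intact pointers. Logging isPlausibleReading(...) next to each measurement would make it easier to see at which point the values stop being trustworthy.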
I might have the same issue. I have nodes with 4 and 6 relays on them and they stop communicating with the gateway from time to time. The node still works when I press the physical buttons attached to the Arduino, only the communication hangs. After power off/on it works again.
This only happens on nodes with more than one message.
I am using 1.5.4 and never saw anything in logs.
What kind of delay did you use? Was anything changed in the meantime on the code?
@parachutesj Just inserting delay() into the code does not always help and may introduce other problems, but putting some other lines of code between one send and the next helps a little.
My error messages only showed up after a lot of testing and stressing of the nodes.
I am waiting for improvements to the MySensors library (I have seen a lot of work going on there), so I will keep the current config and cut back on the testing nights for now. Of course I hard reset nodes from time to time, when they fail to respond. The gateway seems to work fine.
Thank you. I added a delay of 2 sec into my controller rules. This does not harm, those rules are really not time critical. Will see if this changes something at the end.
Resetting nodes is not a permanent option, as most of them are behind the switches in my walls and I do not want to open those up all the time.
Can you report an update? I also have problems with freezing nodes. Did you find a solution?
@siod I changed most of the power supplies for the nodes with relays. Seems to be fixed. All simple switches or sensors with temp etc. never hang. It is only with actuators and relays. Also upgrading to latest version might have improved the situation. But I am almost certain that I had a few bad power supplies.
I have read elsewhere that relays can cause power fluctuations, which in turn can cause problems.