[solved] RS485 nodes stop sending data after some hours or days
-
Can you measure the bus voltage when everything is "dead"?
Is the bus idle, or is one of the nodes pulling it up or down?
-
@pjr I'll try to do some measurements the next time everything is really dead.
But I really doubt that this is only a bus problem; it may be an unlucky combination of at least two things:
- Bus:
Before I read your post this morning, I noticed everything was offline again, so I began resetting some of the nodes.
Some more background: Yesterday I noticed that Node_2 started sending again when I reset Node_1, so my first attempt was to start with that one, blaming it for being somehow faulty, and I expected the rest to show up automatically. It did indeed start sending again, and so did Node_3 (without a reset!). But Node_2 still showed no sign of life, so I reset that one as well, after which it reported data as expected. Node_4 also showed no PIR data, so I finally reset that one too.
- Second possible root cause:
https://forum.mysensors.org/topic/7743/node-with-ds18b20-relay-dies-also-with-watchdog
Three of my nodes also have relay functionality, two of them with several DS18B20.
Now there's someone reporting nodes "dying" with the same combination of attached hardware...
The only exception here is Node_4: it has no temp at all, and it is also the node with the least data to write to the bus. So the only node that comes back is the one without a "relay" and with just a BME280.
-
@rejoe2 said in RS485 nodes stop sending data after some hours or days:
[...]
Now there's someone reporting nodes "dying" also with the same combination of attached hardware... [...]
Yes, the same combination of sensors. I did not fully understand your entire setup (sorry, I'm a bit of a noob :) ), but we have the same sensor combination.
I will swap the temp sensor for an SHT31 and, more importantly, the barebone ATmega for an Arduino Mini 3.3V. I will update ASAP.
Good luck with your investigation. I'm really interested :)
-
One more:
Node_2 stopped transmitting for a longer period during the night, but was online again some minutes ago.
Node_1 was not transmitting, but still showed PIR functionality. So the code still seemed to work; only communication was broken.
Node_3 was also transmitting, most likely also after a period of inactivity.
Now I have cut power to Node_1 and then measured 0.03 V between A and B. So I'll leave the other three nodes online and see if they work fine.
Most likely I will have to thoroughly review the entire wiring on Node_1 one more time, including the 1-Wire networks attached to it.
-
@rejoe2 There is one thing I did not understand: which microcontroller are you using? Barebone ATmega328? If so, how is the BOD set up?
Tonight I re-flashed the bootloader on my faulty node with BOD at 2.7V. It seems more stable after about 7h. Just an idea...
-
@sineverba All nodes are ATMega32-based Chinese Arduino clones running at 16MHz, 5V. The GW is an FTDI-based Nano, Node_1 is a CH340G Nano, the others are Pro Micros. Communication is via LC-Tech RS485 modules.
When I checked the states some minutes ago, the situation was as follows: Node_2 sent its last messages around 4:30pm; Node_4 had been reset at around the same time (no watchdog defined), but no PIR messages were sent when entering the room, so it seemed to be offline. Node_3 was alive; voltage between A and B: around 0.03V.
So now I have pulled the LC-Tech module off Node_2 and powered Node_1 up again. I'll see if and when this one goes offline. If this also leads to no clear conclusion, I will think about first adding some caps on 5V or changing the 12V power supply.
Or is it necessary to also remove the modules completely when there's no power to them?
Should I try an older board definition? (GWs built with board defs from 1.6.13 on had some reboot troubles until version 1.6.18 or so; this is pretty unfunny shooting in the dark...)
Other ideas or recommendations?
-
@pjr As Node_2 was not sending any data some minutes ago, I measured between A and B: 2.23V...
Then I powered everything down. Shortly after powering up again, I get around 0.03V.
What to do with this info?
-
±200mV is the magic number with RS485. An RS485 line has 3 states:
- Va - Vb < -0.2V = "1"
- Va - Vb > 0.2V = "0"
- |Va - Vb| < 0.2V = "idle"
As far as I know, the line should be in the idle state when nobody is sending.
So to me it looks like something is constantly pulling the line to state "1" or "0", depending on which way round you measured it. This could be caused by a faulty transceiver, a bug in the library code, a bug in your code...
Next time, can you measure what's coming from the Arduino? Measure between GND and TX (or pin 9 if using AltSoftSerial), and of course between GND and the DE pin. That way we can work out whether the problem is on the Arduino side or the transceiver side.
-
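The three states listed above can be turned into a small helper for interpreting a multimeter reading between A and B. This is only a sketch: the names `BusState`, `classifyBus` and `describe` are made up for illustration, and the sign convention simply follows the list in the post (with the probes swapped, "1" and "0" trade places).

```cpp
#include <cassert>
#include <cmath>

// Threshold from the post above: +-200 mV differential.
constexpr double kThresholdVolts = 0.2;

// Hypothetical names, for illustration only.
enum class BusState { One, Zero, Idle };

// Classify a reading of Va - Vb in volts using the convention from the post:
//   Va - Vb < -0.2 V  -> "1"
//   Va - Vb > +0.2 V  -> "0"
//   otherwise         -> idle (nobody is driving the bus)
BusState classifyBus(double vaMinusVb) {
    assert(std::isfinite(vaMinusVb));  // a meter reading should be a real number
    if (vaMinusVb < -kThresholdVolts) return BusState::One;
    if (vaMinusVb > kThresholdVolts) return BusState::Zero;
    return BusState::Idle;
}

// Human-readable label for a state.
const char* describe(BusState state) {
    switch (state) {
        case BusState::One:  return "driven to \"1\"";
        case BusState::Zero: return "driven to \"0\"";
        default:             return "idle";
    }
}
```

With the readings from this thread, `classifyBus(0.03)` gives `BusState::Idle`, while `classifyBus(2.23)` gives `BusState::Zero` (or `One` with the probes swapped), i.e. something is actively driving the bus.
-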
@rejoe2 Hi,
just to share (I will also write a separate post in a few days): I got a configuration that ran for 96h non-stop. Well, with some stops, but no trouble on restart.
Power-supply-fed node: optiboot 6.2 with BOD at 2.7V.
Battery-fed nodes: optiboot 6.2 with BOD at 1.8V.
Watchdog enabled on startup at 2s.
3 tries on startup, then go into the loop.
If no ack is received 3 times on any single send (e.g. getting the link, sketch name, temp, relay state, et cetera), delay for 5 sec. << This delay does the "magic": the watchdog restarts the node(s) and the loop starts again.
I tested this by disconnecting the serial Arduino acting as gateway for 1h and/or holding down the reset push button for 20 minutes (my poor finger :D ).
As soon as the gateway is back on, all nodes are alive and transmitting within several minutes. I also tried removing and re-inserting the radio on nodes while they were live. They reconnect like a charm.
So, I would force all your nodes to do a deep restart if some trouble occurs. Just my 2 cents...
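The retry scheme described above (three attempts per send, then a blocking delay that lets the watchdog reset the node) might be sketched as follows. This is hardware-independent C++ so the logic can be checked on a PC; `sendWithRetry` and `sendOrStall` are hypothetical names, not part of MySensors. On a real node, `trySend` would presumably wrap the library's send call with ack requested, and `stall` would be a plain `delay(5000)` that does not feed a watchdog armed with `wdt_enable(WDTO_2S)`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: call trySend up to maxTries times and
// return true as soon as one attempt is acknowledged.
bool sendWithRetry(const std::function<bool()>& trySend, int maxTries = 3) {
    assert(maxTries > 0);
    for (int attempt = 0; attempt < maxTries; ++attempt) {
        if (trySend()) {
            return true;
        }
    }
    return false;
}

// Hypothetical wrapper: if every attempt fails, call stall().
// On a node, stall() would block longer than the watchdog timeout,
// so the watchdog resets the MCU and the sketch starts over.
bool sendOrStall(const std::function<bool()>& trySend,
                 const std::function<void()>& stall,
                 int maxTries = 3) {
    if (sendWithRetry(trySend, maxTries)) {
        return true;
    }
    stall();
    return false;
}
```

Keeping the "give up" action as a callback makes it easy to swap the blocking delay for something else (for example an explicit reset) without touching the retry loop.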
-
@sineverba I have some problems with my RS485 sensors too. They work like a charm for a few days, and then one of them stops sending and receiving data. Most of the time it happens when I press a button and the relay switches the light. My wiring is OK: I have pull-ups and pull-downs in the middle on the master and termination on both ends. I have the watchdog enabled:
    #include <avr/wdt.h>

    void before() {
      wdt_disable();        // maybe redundant
      wdt_enable(WDTO_8S);  // reset the node if the watchdog is not fed for 8 s
      // sensors.begin();
    }
But even with that the node doesn't reboot, so I think it may not be hanging and has only lost communication. Maybe there is something wrong with the AltSoftSerial lib?
I should mention that I'm using the OneButton lib to extend the functionality of my pushbuttons with long press and double click. Maybe that library has some issues with AltSoftSerial or MySensors?
-
@nofox Try to strip out as much code as you can. Does it still work if you operate it with the button? Is the relay opto-isolated?
-
Check the position of the nodes on the bus under failure conditions.
With RS485 bus drivers it is easily possible for one node to block communication on the entire bus by sending a dominant state.
In this situation, nodes near the gateway can still "push" their messages to the gateway, while other nodes cannot.
-
Hi! Everything is working pretty well, but sometimes a random node stops communicating and reacting to button presses. I have watchdogs in every node, so I think only the communication is hanging. Is it possible that only the AltSoftSerial library is hanging inside the Arduino code?
-
As I could nail down some more parts (but still do not have a reliable network), here is a short update from my side as well:
- Node_1 (multiple DS18B20 (12 on three pins) + other things) is the biggest troublemaker. It simply pulled the voltage between A and B to +2.8V after some time. There is a delay of some hours between the last messages and the node also stopping its PIR functionality (no wdt code implemented).
- Node_2 (also multiple DS18B20 (5 on three pins) and other stuff) also stops communicating after some time (it originally worked, so this may be related to whatever changed in between). But this one doesn't kill the entire bus communication and seems to work internally (it switches the relay on when a rise in temperature is detected). This node also holds my pull-up and pull-down resistors for RS485.
Yesterday I switched Node_1 over to HW_SERIAL, as I also suspected AltSoftSerial to be one of the root causes. At first sight this seems to improve things a lot.
Next, I will review Node_2 for the use of HW_SERIAL.
What I have in mind (may not be correct):
- HW_SERIAL uses less memory, so this may keep the node from running into some kind of overflow.
- There may be a conflict between internal timers, as 1-Wire may also need a timer (among others, I also use pin 10 for 1-Wire).
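For readers who want to try the same switch, a MySensors RS485 node is moved from AltSoftSerial to the hardware UART with a handful of defines, roughly as in the fragment below. Treat it as a sketch under assumptions: the DE pin number and baud rate are placeholders, not values taken from this thread, and the fragment has to sit above the `MySensors.h` include in the sketch.

```cpp
// Configuration fragment for a MySensors RS485 node on the hardware UART.
// Place these lines before the MySensors.h include in the sketch.
// Pin number and baud rate are assumptions for illustration.

#define MY_RS485                     // use the RS485 transport
#define MY_RS485_DE_PIN 2            // pin wired to DE/RE of the transceiver (assumed)
#define MY_RS485_BAUD_RATE 9600      // must match on all nodes and the gateway (assumed)
#define MY_RS485_HWSERIAL Serial     // hardware UART instead of AltSoftSerial

// MY_DEBUG is deliberately left undefined: on a board with a single UART,
// debug output would go out over the same port as the bus traffic.
```

On boards with a second UART, `Serial1` could be used instead, which keeps `Serial` free for debug output.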
-
There is one thing that we all need to try. When you use RS485, the power supply is usually somewhere far, far away from the nodes. Longer power lines mean higher inductance and far more noise on the power lines. I think we need to try putting a 10 - 100uF electrolytic cap on every node (I have 10uF on each node) and a few 100nF ceramic caps near the microcontroller on every node. If you use an ATmega328P you need at least three 100nF caps (I forgot to put them on my nodes). I've read that these 100nF caps are a very big improvement for the ATmega's power supply.
-
Hi all,
I have had the same problem since I changed some nodes to RS485.
Setup:
FHEM 5.8
MySensors 2.2rc1
Gateway: Arduino Nano via USB to FHEM.
Nodes: 5 pcs, all Arduino Pro Mini; energy meter, relay, temp DS18B20
What I found out is that the nodes are not hanging. There is just no communication with the gateway. Rebooting the node does not help. After a reboot of the node, pairing does not work.
Reconnecting the gateway to FHEM does not help.
A FHEM restart works (shutdown restart). After the restart all nodes reappear by themselves.
Today I will build a new gateway, using ESP8266 and RS485.
I have used a USB gateway before, with several problems. After I changed the gateways for NRF24 and RFM69 from USB to ESP8266, most of the problems were gone.
I hope this will work. If not, RS485 is history, and for this use case MySensors will be replaced with 1-Wire.
It will take one or two days to get the first results.
Have a good day
Stefan
-
@Stefan_NE
Strange, imo a serial GW is the most reliable option. (This may be different if the Nano is a bad fake; problems with CH340G Nanos have also been reported in VM environments.)
Are all nodes referring to the correct IO, and how is the RS485 GW defined?
Explanation: I also use a second GW, and in some cases nodes get assigned to the wrong GW. If you use several /dev/ttyUSBx defines, the IO may not be functional. See the output of "ls -l /dev/serial/by-id".
Most likely there is an electrical problem on your bus. Did you measure voltages, esp. between A and B? What type of modules or PCBs do you use?
-
@rejoe2
Good idea, that's what I thought in the meantime too, and I have set up a serial gateway with an original Arduino Mega using hardware serial for the RS485. It has been running for 90 min now. Let's see what happens. The serial gateway is the only serial one I use. All other nodes use wireless gateways.
-
If you have more than one MySensors GW defined, imo it doesn't matter what type they are. Under some circumstances, nodes may be routed through the "wrong" GW. I would recommend checking that first (it may be confusing, but even with the wrong GW assigned as IO, some readings are nevertheless updated when the node is reset (presentation info)).
Hardware serial is a good idea, but at least in my personal experience (and contrary to what I expected in the beginning) my (FTDI Nano) GW is one of the most reliable parts of my MySensors RS485 environment.
Node_1, my "troublemaker Nr. 1", also performs reliably now (running without issues for 5+ days) since I switched it to HW serial. But an AltSoftSerial node with a BME280 has also been working at the same level of reliability for several weeks now (with less free memory left!).
Adding capacitors to address powering issues may also be helpful, as @nofox suggested. I may do some tests on this after switching Node_2 to HW serial, in case it's still not performing as expected.
Last: What modules do you use? In case of the LC-Tech ones, I would recommend desoldering at least the 120 Ohm resistor on the "middle" nodes.