RS485 nodes stop sending data after some hours or days



  • Hi all,

    my serial GW has recently started to stop transferring messages to the controller (FHEM) after several hours (or even days) of operation. As soon as a "connect" command is issued in FHEM, new messages from my nodes are processed again, but after a longer period it fails again. The "connect" seems to cause a complete reboot of the GW, even though the date of the initial binding in the Linux filesystem does not change (the date shown by ls -l /dev/serial remains unchanged). Rebooting all or individual nodes does not have any effect.

    Does anyone else have similar observations?

    Some background information:

    • All nodes+GW use MySensors 2.2.0-beta, programmed via Arduino-IDE@linux

    • GW-Arduino is FTDI-based (seems to be no fake - I changed the USB identifiers -, Test-PIN is connected to ground)

    • Everything was fine for weeks using just one of my nodes (Node_1) and the respective GW

    • Problems started as soon as I added two new nodes some days ago.

    • Wiring:
      -- All nodes are wired on just one line, no stubs (I had one to Node_2 at first, but have already changed this)
      -- cable (CAT7) starts at GW, 15-20m (just one pair for data) to Node_2, 6-7m to Node_1 (+12V for power supply, provided at Node_1), 8-9m to Node_3 (again: data+12V)
      -- besides the screw connections to the RS485 modules, there are one or two additional connections via small WAGO clamps (4 in total so far, at least one between every pair of nodes)
      -- all nodes+GW have the "long" modules (Chinese source, eBay), all resistors still in place.

    • Nodes (only the sending child IDs are listed):
      -- Node_1: 7 DS18B20, 1 Counter = 10 infos to be sent every 5 minutes (some of them occasionally a bit more often)
      -- Node_2: 12 DS18B20, 1 Counter = 15 infos to be sent every 5 minutes (plus additional things like 3x motion and a switch when relevant)
      -- Node_3: BME280 = 3-4 infos every minute

    • Power:
      -- GW is powered by an active USB hub, that also powers 4 other Arduino-based devices
      -- all other nodes are powered by the mentioned 12V-line using just internal regulator (Node_3) + an additional adjustable step-down-module (ok for up to 36V DC in).

    • Some delay in sending is already implemented in Node_2, and this seems to help a bit (I was originally convinced that this node had powering issues, so I applied the known workaround from nRF transceivers. But in fact, the node seems to work properly and the infos are lost on the GW side).

    • All nodes have #define MY_TRANSPORT_WAIT_READY_MS xxx, with xxx being different on each node (3, 15, 30 seconds)
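
    For reference, a minimal sketch of what such a node header might look like. This is not the actual code of any node in this thread; the DE pin, node ID and baud rate are assumptions chosen only for illustration, and 15000 corresponds to the 15-second setting mentioned above.

```cpp
// Minimal MySensors RS485 node skeleton (illustrative, not the poster's code).
// Assumed values: DE pin 2, node ID 2, 9600 baud.
#define MY_RS485                          // select the RS485 transport
#define MY_RS485_DE_PIN 2                 // assumption: driver-enable pin of the module
#define MY_RS485_BAUD_RATE 9600           // assumption: bus speed
#define MY_NODE_ID 2                      // assumption: static node ID
#define MY_TRANSPORT_WAIT_READY_MS 15000  // give up waiting for the transport after 15 s

#include <MySensors.h>

void presentation() {}  // present child sensors here
void setup() {}         // sensor initialisation
void loop() {}          // read sensors and send() values
```

    With a non-zero MY_TRANSPORT_WAIT_READY_MS the node continues into setup() and loop() after the timeout even if the transport is not ready, instead of waiting indefinitely.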

    Possible root causes and next steps:

    • Change GW-HW (Arduino+Transceiver module), maybe I damaged the latter when running tests while adding the new nodes (I really doubt this now, given the longer times of correct operation, but we will see).
    • Replace the adjustable power modules by some LMS1117 modules (should provide more power), esp. on Node_2, where I can see that one of the 3 buses for DS18B20 (5 sensors) is also not working reliably
    • Try to reduce the amount of data sent by the nodes by adding some delay?
    • your ideas...

    Will keep you updated in any case!



  • Short update on the issue: I tested how things go when Node_1 is not online.
    Everything seemed to be fine until yesterday afternoon. Then both remaining nodes stopped working, but not at the same point in time:

    • last info from Node_2 was received around 5 pm, (all 8 values that should be updated have been received)
    • Node_3 sent its last infos at 7 pm (2 of 3 values, temp+hum) resp. 7:30 pm (pressure).

    I then tried the GW reboot as mentioned in post#1, but I still don't get updates from the nodes.
    Conclusion: it does not seem to be a GW problem as originally suspected, but what else?!?

    Next, I will also power down and reboot the remaining two nodes, see when they fail next, and check the log of Node_2 to find out whether it was sending as regularly as it should until the failure.

    Any ideas how to solve this nasty problem?


  • Did you put termination resistors at both ends of the bus? I remember there was a suggestion to modify the rs485 library to send 3 times the header message before sending the payload in order to avoid collisions.



  • Along the lines of what @gohan was saying about termination resistors, many of the modules that you buy these days have the termination resistors built in. The image below shows examples of two modules and their termination resistors.
    [image: two RS485 modules and their termination resistors]
    If you have multiple devices on your RS485 bus, typically only the devices at the two ends of the bus should have the termination resistors. This image shows a typical RS485 master/slave bus with termination.
    [image: typical RS485 master/slave bus with termination]
    If slaves 1, 2 or 3 had termination resistors, there is the potential that the bus signal could get attenuated to the point of dropping off. Having only a few devices on the bus, all with termination resistors, may work, but you may get dropouts like the ones you are seeing in your case. The more devices you put on your bus, the greater the chance of attenuation if the resistors are in place. You may want to try removing the termination resistors on your middle nodes, if any, and see if that fixes your problem. Take note of where these resistors are placed in case you need to re-solder them to the module. On the two modules shown above you have 20K ohm (203) resistors that go from VCC to B and GND to A, and then there is a 120 ohm (121) resistor between A and B. You need to remove all 3 for the middle nodes.
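
    To put a rough number on the attenuation argument, here is a small sketch that computes the combined resistance the driver sees between A and B when several 120 ohm terminators sit in parallel. The helper name parallelOhms is made up for this example, and the model ignores cable resistance and the 20K bias resistors.

```cpp
#include <cassert>
#include <vector>

// Combined resistance of several resistors connected in parallel
// between the A and B lines (hypothetical helper for illustration).
double parallelOhms(const std::vector<double>& resistorsOhm) {
    assert(!resistorsOhm.empty());
    double conductance = 0.0;
    for (double r : resistorsOhm) {
        assert(r > 0.0);
        conductance += 1.0 / r;  // parallel conductances add up
    }
    return 1.0 / conductance;
}
```

    Two terminators, parallelOhms({120, 120}), give 60 ohm. A gateway plus three nodes that all still carry their 120 ohm resistor, parallelOhms({120, 120, 120, 120}), give 30 ohm, which is well below the roughly 54 to 60 ohm load that RS485 drivers are normally specified for.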



  • @dbemowsk & @gohan Thanks for pointing me back to the resistor topic, my recent observations also pointed towards problems in the electrical design of my bus.
    The modules used are all similar to the second one in @dbemowsk's picture (I have two versions differing slightly in colour); as already stated in post#1, all resistors are still present.
    The irritating thing is that the bus works for hours before problems become visible.

    ( I missed the most recent discussion on resistors and rs485 in the "Build"-section, sorry for that).

    So first step will be to use two desoldered modules for nodes 1&2, shouldn't be a big issue to change these. I'll keep you updated.

    @gohan Just in case this doesn't lead to a permanently working solution: Do you have a link to the suggestion of sending the header?



  • Those 20K resistors have so little effect that removing them might not be necessary, but removing extra termination is. I'm also using 600 ohm pull-ups and pull-downs in the middle of the bus, but I think 1k might also be ok and it needs a little less juice from VCC.
    This might be helpful: http://alciro.org/tools/RS-485/RS485-resistor-termination-calculator.jsp
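
    As a rough cross-check of the bias values, the sketch below estimates the idle voltage between A and B for one pull-up to VCC, one pull-down to GND and the termination in between. It is a simplified voltage-divider model, not a replacement for the calculator linked above: it ignores the receiver input impedance and the cable, and it assumes a 5 V supply and two 120 ohm terminators (60 ohm combined). The function name is hypothetical.

```cpp
#include <cassert>

// Idle differential voltage across the bus for the divider
// VCC - pull-up - (termination between A and B) - pull-down - GND.
// Hypothetical helper; both bias resistors are assumed to be equal.
double idleBiasVolts(double vccVolts, double biasOhm, double terminationOhm) {
    assert(vccVolts > 0.0 && biasOhm > 0.0 && terminationOhm > 0.0);
    return vccVolts * terminationOhm / (2.0 * biasOhm + terminationOhm);
}
```

    Under these assumptions, idleBiasVolts(5.0, 600.0, 60.0) comes out at about 0.24 V and idleBiasVolts(5.0, 1000.0, 60.0) at about 0.15 V, while 20K resistors alone give less than 0.01 V. RS485 receivers are only guaranteed to detect differences of 200 mV or more, so in this model 1k is on the low side, although transceivers with built-in failsafe may still cope with it.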

    @gohan said in Serial RS485 Gateway stops receiving after some hours or days:

    I remember there was a suggestion to modify the rs485 library to send 3 times the header message before sending the payload in order to avoid collisions.

    For the library code, I made this modification to MyTransportRS485.cpp.

    I added this to the beginning of the lib file:

    #if !defined(MY_RS485_SOH_COUNT)
      #define MY_RS485_SOH_COUNT 3
    #endif
    

    And in the sending code:

    // Start of header by writing multiple SOH
    for(byte w=0; w<1; w++) {
        _dev.write(SOH);
    }
    

    Changed to this:

    // Start of header by writing multiple SOH
    for(byte w=0; w<MY_RS485_SOH_COUNT; w++) {
        _dev.write(SOH);
    }
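
    Because of the #if !defined guard, the count could then also be set per sketch. A minimal sketch of that, assuming the patched MyTransportRS485.cpp from this post is installed and that it is compiled as part of the sketch through MySensors.h; the value 5 is only an illustration:

```cpp
// Assumes the patched MyTransportRS485.cpp described in this post.
#define MY_RS485                // select the RS485 transport
#define MY_RS485_SOH_COUNT 5    // assumption: illustrative value, the patch defaults to 3

#include <MySensors.h>          // must come after the defines
```

    The define has to appear before the #include, otherwise the default from the library file is used.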
    


  • So one more intermediate update:

    • Desoldered all resistors (R5 to R7 on the LC-tech rs485-modules) on nodes 1&2 (it was not much more work, so I also removed the 20k's, just to be sure...) and added 1k pullup/pulldown resistors on Node_2 (which is roughly in the middle with respect to other planned nodes and near the 12V power source) as proposed.
    • Didn't change anything on the code yet.

    Until now, everything looks fine, I get updated values as expected from all of the three.

    I'll keep you updated, but (I hope so) this will take some time to have longer-term-results.



  • (Changed title because problem seems not to be the GW only)

    Another update on the topic, unfortunately not with good news:

    • I added one more node (Node_4) at the end of my wiring. It is equipped with a "full-resistor" version of the (LC-tech) module (just 2 motion sensors attached, not regularly reporting data)
    • Resistors on Node_3 have been removed. So only GW (cable start) and Node_4 (wire end) have 120Ohm (and other) resistors on the RS485 modules, all other resistors have been removed. Additionally pullups/pulldowns at Node_2 (2*1k) are installed.
    • Code base still is "standard"

    Everything starts fine when I switch the 12V on (this powers all my nodes together). The last time I did this was yesterday around 5:30 pm. Today's findings:

    • Node_1 still is sending data as expected, code see here
    • Node_2 stopped transmission around 7:00 pm
    • Node_3 (BME280) sent its last data around half an hour later. I tried to bring it back online by pressing the reset button, but that didn't have any effect.
    • Node_4 seems to be still online, last motion was reported today 7:13 am

    Conclusions and working hypothesis for now:

    • The Bus itself seems to be ok, also the GW. Or did I miss something essential?
    • Also all transceiver modules in general are working (especially no hardware defect(?)), but at some point in time they fail and cannot be reset other than by depowering them.

    So next step will be to apply @pjr's modified version of MyTransportRS485.cpp...



  • @rejoe2 said in RS485 nodes stop sending data after some hours or days:

    So next step will be to apply @pjr's modified version of MyTransportRS485.cpp...

    That should only help in case of collisions.

    I wonder if you could check whether those "dead" nodes are still trying to send anything: attach a USB-RS485 adapter to the bus and check if there is any activity after pressing reset on a "dead" node.

    Other helpful thing could be a "datalogger" for testing nodes that will die eventually: https://forum.mysensors.org/topic/6340/debug-to-a-sd-card-module



  • @pjr Thx for the hint with the USB-RS485 adapter, I will try to find out if I get additional info this way. Building an SD-card logging node would be a new experience for me (the necessary hardware for both is lying around)...

    Wrt collisions: to me, it is not unlikely that this is the root cause of my troubles. Two of the nodes are sending an unusually (?) high amount of data and using more or less the same timing (300000 ms), and the third is sending every minute, so some overlap in transmission timings is very likely at some point in time.
    This also correlates with a recent finding: sometimes I have missing data for a longer period, but then signals come in again, until finally the node really seems to stop all transmissions.
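
    One way to reduce the overlap described above would be to give every node its own phase within the shared 300000 ms interval. The following is only a sketch of that idea, with hypothetical helper names; it assumes the nodes start roughly together (they share the 12V supply) and it does nothing about clock drift or about the node that reports every minute.

```cpp
#include <cassert>
#include <cstdint>

// Spread nodeCount nodes evenly across one reporting interval
// (hypothetical helper, nodeIndex counts from 0).
uint32_t sendOffsetMs(uint8_t nodeIndex, uint8_t nodeCount, uint32_t intervalMs) {
    assert(nodeCount > 0 && nodeIndex < nodeCount);
    return (intervalMs / nodeCount) * nodeIndex;
}

// Earliest time t >= nowMs for which t % intervalMs == offsetMs.
uint32_t nextSendTimeMs(uint32_t nowMs, uint32_t offsetMs, uint32_t intervalMs) {
    assert(intervalMs > 0 && offsetMs < intervalMs);
    uint32_t phase = nowMs % intervalMs;                           // position in the current interval
    uint32_t wait = (offsetMs + intervalMs - phase) % intervalMs;  // time until the node's slot
    return nowMs + wait;
}
```

    With three nodes on a 300000 ms interval the offsets are 0, 100000 and 200000 ms, so the two large bursts would start 100 seconds apart instead of at nearly the same moment.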

    So there seems also to exist some kind of buffering on this type of transceiver. Could failure also be related to a kind of buffer-overload? Is the arduino expecting some kind of feedback from the transceiver or just writing data to it as it would do to any serial line?

    One additional thought: Could results be more reliable if I use a higher baud rate on the bus? This measure should shorten transmission times and be ok wrt. the length of my bus. First trial could be with 38400.

    But this would not help if any buffer overflow leads to blocked transceivers, only available time slots for sending data would be increased.
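
    To estimate how much a higher baud rate would shorten each transmission, here is a small calculation sketch. It assumes 8N1 framing (10 bits on the wire per byte); the 20-byte frame and the 9600 baud starting point in the note below are assumptions for illustration, not measured values from this bus.

```cpp
#include <cassert>
#include <cstdint>

// Time a frame occupies the bus, assuming 8N1 framing:
// 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte.
double frameTimeMs(uint32_t frameBytes, uint32_t baud) {
    assert(baud > 0);
    const double bitsPerByte = 10.0;
    return frameBytes * bitsPerByte * 1000.0 / baud;
}
```

    For an assumed 20-byte frame this gives about 20.8 ms at 9600 baud and about 5.2 ms at 38400 baud, so the window in which two nodes can collide shrinks to a quarter.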

    What to do first?

