MySensors gateway and network reliability



  • I've been working with MySensors for several years and have 20 active nodes doing all sorts of jobs.
    One thing I've constantly battled with is reliability. Missed messages are the primary problem. To understand the scale of the problem, I wrote an app for my controller (Home Assistant) that sends a V_TEXT message 'PING' every 10 seconds, to which my nodes reply 'ACK'. I can then visualise this data in Grafana to get plots of reliability:
    [Image: Grafana plot of event logs (ping tests)]
    I used a Raspberry Pi gateway for a long time, but I'd get quite a few dropouts, sometimes with nodes down for hours at a time. I'm now using an Arduino Mega Ethernet gateway and, after trying a few RF24 radios (and buying the recommended good-quality ones), I get very few dropouts, although as you can see I'm still getting some. More annoying is the whole gateway dropping out, as it did this morning, on the first day of a week away. Hence all nodes are currently marked FAIL. The 'fix' is to hit the reset button on the Mega and it'll go again, but that's not possible from 1000+ miles away... It only fails totally every few weeks at most, but that's still no good as now all my sensors are off. Very glad it's now cold or my garden irrigation would be down!
    I'm writing to ask what tips others have on improving MySensors reliability. Even a short dropout is annoying or, in the case of door sensors, renders my security system useless. But the whole gateway going down is very annoying. I switched away from the Pi because it would crash from time to time, but I didn't expect the Mega to crash too. Perhaps I could build some device to hard reset the gateway if it goes down. I can still ping the gateway but no controllers can connect to it.
    As for the nodes, perhaps I could add to my sketches a way to get confirmation that a message has arrived at the controller, and otherwise keep retrying up to x times until it does. For temperature sensors there's no point, but for door/window sensors, or perhaps motion sensors too, it would be worthwhile.
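
    A minimal sketch of the retry idea above, written as plain C++ so the logic can be tried off-device. sendWithRetry and its parameters are hypothetical names, not part of the MySensors API. On a node, trySend would presumably wrap send(msg, true) followed by a wait for the echoed message to show up in receive(), and pauseBetween would call wait(); that wiring is an assumption and is not shown here.

    ```cpp
    #include <cassert>
    #include <functional>

    // Hypothetical helper, not part of the MySensors API.
    // Calls trySend up to maxAttempts times and stops at the first attempt
    // that reports success. Returns the number of attempts used, or 0 if
    // every attempt failed. pauseBetween runs after each failed attempt
    // except the last, and receives the number of the attempt that failed.
    inline int sendWithRetry(const std::function<bool()>& trySend,
                             int maxAttempts,
                             const std::function<void(int)>& pauseBetween = [](int) {}) {
        assert(maxAttempts > 0);
        for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
            if (trySend()) {
                return attempt;            // confirmed, stop retrying
            }
            if (attempt < maxAttempts) {
                pauseBetween(attempt);     // back off before the next try
            }
        }
        return 0;                          // gave up without confirmation
    }
    ```

    One caveat: as far as I understand it, the echo requested by send(msg, true) comes from the destination node (normally the gateway), so it would confirm delivery to the gateway rather than to the controller.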

    What are others doing here to improve their MySensors network? I already switched to channel 50 on the MySensors network but haven't experimented much further with channels, as it's a big job to reprogram every node to change channels. All my nodes use the Newbie PCB with caps on the radios, etc. I try to use good power supplies (from Ikea mostly). All nodes and the gateway use MySensors 2.3.1. I'm in an apartment block in the city centre so I'm sure there's a lot of radio and electrical noise. But my wifi always works (well, almost always) and I'd like MySensors to be at least mostly reliable, especially the gateway.

    Bit of a long rambly message but any tips very welcomed!



  • @nick-willis I have been battling exactly the same problem for a couple of years, and it is the reason I have not built any new nodes for months now.

    Whilst things are good enough for knowing weather conditions and experimenting, I have not had sufficient confidence to put more critical things into place like intruder alarms, heating, lighting, critical equipment alarms etc.

    Stability and lack of a clear way to make sure a message is received and acted on are both areas that need urgent attention IMO.

    I have had an nrf24l01+ gateway attached directly to a pi 3, then one on a promini via a serial converter, and now have one on an uno connected via USB. All three setups had dropouts and occasional hang-ups where http and ssh access ceased mysteriously... always needing a reboot of the pi to get things going again.

    I have tried many different SD cards (including brand new ones) and now boot and run from an SSD. Still get problems.

    I have tried different power supplies and power cables and have made small capacitor boards to filter spikes and add reserve capacity. Still get problems.

    I have replaced the RF modules many times (all 'good' ones from cdebyte, according to the threads on here). Still get problems.

    I have added ceramic caps to all radio boards (nodes and gateway). Still get problems.

    I have soldered a lot of connections instead of using dupont connectors. Still get problems.

    I have moved the gateway/controller to another room in the house. Still get problems.

    I have added watchdogs to the pi and gateway, still get problems.

    I wish I could suggest something new but I am out of ideas now.



  • @nick-willis I realise this helps neither you nor @skywatch over the technical issues faced, but perhaps a change of band may be worth considering?

    Despite reading about the many perfectly reliable setups when I started out over 2 years ago, it was the problem ones and the ever more crowded Wifi which swayed me away from the recommended NRF toward rfm69s and 433MHz. I have not had a single problem aside from power failures on the Gateway, and even then the battery-powered Nodes always recovered 100% once power to the Gateway was restored. That is now solved permanently with a UPS.
    After 2 years, I'd have given up and thrown something in the bucket...
    Good luck.



  • Thanks for the responses, which are kind of what I expected. The lack of a 'guarantee of service' is a big, big problem in MySensors. My system works 'most' of the time but that isn't good enough for anything critical. And with no real pattern to what makes things reliable, it seems we can only get 95% of messages through at the very best.
    I really like the concept of MySensors especially as buying lots of 'smart' devices that go on the IP network worries me from a security point of view. I like that you can just hook up any old sensor to an Arduino and get data logged and automations going very quickly.

    I have considered switching to RFM69, but I'd basically be starting again as all my existing nodes would need new PCBs and radios. I guess I could try it out, then shift the 'critical' stuff over to the reliable network and keep the existing one up for environmental sensing. It's hard when it's basically working 'most' of the time. But would that stop the gateway from crashing? I don't know why it fails, so it's tricky to know how to respond.



  • @nick-willis + @zboblamont I used nrf24's before I discovered mysensors and never had an issue with them. A DC/DC ups is on my todo list, hopefully by month's end.



  • @skywatch "A DC/DC ups is on my todo list..." - After one of the frequent power cuts here finally screwed up the HDD and Domoticz, the DC/DC UPS finally came off my to-do list. The Gateway/Controller has sailed through many outages since. Luckily I only lost 10 days' data, plus some battery voltage on the Nodes, presumably from retrying calls to a dead end. As with backups, lesson learned.
    MySensors and Domoticz were my initial objective. Having established the crude limits where my mobile could pick up Wifi outside, I had doubts about 2.4GHz comms, and problems reported by some others convinced me to go with the RFMs in preference to the recommended NRFs. 433MHz offered a simpler protocol, better structural penetration, and a less "busy" part of the spectrum.
    The router through which I access the Pi suffers an occasional brainfart which I cannot fathom but tolerate as it's non-critical, unlike Node/Gateway comms.
    Not knocking NRFs or 2.4GHz or MySensors, plenty happily use this combo reliably, but problems such as those @Nick-Willis and yourself describe were what put me off it at the start.


  • Hero Member

    @nick-willis On my leak detectors, which I consider to be in the must-not-fail category, I installed an external watchdog, namely the TPL5010:
    https://www.openhardware.io/view/534/Extremely-Simple-Arduino-Pro-Mini-LoRa-Water-Leak-Detector

    I thought it might be overkill, but listening to you guys, maybe not. Anyhow, overkill is better than underkill. 🙂



  • @neverdie No offence intended, but I gathered a watchdog was tried and failed on at least one installation. What is common to both complainants appears to be the NRF comms medium itself, which your setup does not use.
    Or were you suggesting an external watchdog (if not already trialled) might shed some light on their issues?



  • I guess the idea of the watchdog is to keep checking the Arduino to see if it's responding and, if not, do a hard reset.
    I was trying to make something similar for my router as although it almost never crashes, if it does it's always when I'm away!
    The MySensors gateway that fails every few weeks would certainly benefit from this, so I will look into the TPL5010 and try to find a way to isolate just the reset portion of the rather cool looking water leak detector.


  • Plugin Developer

    This is actually a good reminder to implement a watchdog on the gateway 🙂

    // Update:
    Here's my new code:

    /*
    * 
    * The Candle receiver acts as the bridge between the Candle devices and the Candle Controller. 
    * 
    * It only allows communication with other Candle devices that use the same encryption password as it uses itself. 
    * When you install the Candle Manager, a random password is generated for you. If you ever want to change the encryption password used by your network, this can be done in the Candle Manager settings. 
    * Be warned that you will have to re-create this receiver as well as all your devices, since they will all need to have new code with the new password in it.
    * 
    * If you have already installed the MySensors add-on, please temporarily disable it before creating this receiver. Otherwise the MySensors add-on may try to connect to it during the creation process, and thus disrupt it.
    * 
    *
    * SETTINGS */ 
    
    // You can enable and disable the settings below by adding or removing double slashes ( // ) in front of a line.
    
    #define RF_NANO                                     // RF-Nano. Enable this if you are using the RF-Nano Arduino, which has a built in radio. The Candle project uses the RF-Nano.
    
    /* END OF SETTINGS
    *
    *
    *
    */
    
    
    // Enable MySensors debug output to the serial monitor, so you can check if the radio is working ok.
    //#define MY_DEBUG 
    
    #ifdef RF_NANO
    // If you are using an RF-Nano, you have to switch CE and CS pins.
    #define MY_RF24_CS_PIN 9                            // Used by the MySensors library.
    #define MY_RF24_CE_PIN 10                           // Used by the MySensors library.
    #endif
    
    // Enable and select radio type attached
    #define MY_RADIO_RF24
    //#define MY_RADIO_NRF5_ESB
    //#define MY_RADIO_RFM69
    //#define MY_RADIO_RFM95
    
    // Set LOW transmit power level as default, if you have an amplified NRF-module and
    // power your radio separately with a good regulator you can turn up PA level.
    //#define MY_RF24_PA_LEVEL RF24_PA_MIN
    //#define MY_RF24_PA_LEVEL RF24_PA_LOW
    //#define MY_RF24_PA_LEVEL RF24_PA_HIGH
    #define MY_RF24_PA_LEVEL RF24_PA_MAX
    
    // Enable serial gateway
    #define MY_GATEWAY_SERIAL
    
    
    // Mysensors advanced security
    #define MY_ENCRYPTION_SIMPLE_PASSWD "changeme"      // The Candle Manager add-on will change this into the actual password your network uses.
    //#define MY_SECURITY_SIMPLE_PASSWD "changeme"      // Be aware, the length of the password has an effect on memory use.
    //#define MY_SIGNING_SOFT_RANDOMSEED_PIN A7         // Setting a pin to pickup random electromagnetic noise helps make encryption more secure.
    
    // Mysensors advanced settings
    //#define MY_RF24_CHANNEL 100                       // In EU the default channel 76 overlaps with wifi, so you could try using channel 100. But you will have to set this up on every device, and also on the controller. You can even try 115.
    //#define MY_RF24_DATARATE RF24_250KBPS             // Slower datarate increases the range, but the RF-Nano does not support this slow speed.
    #define MY_RF24_DATARATE RF24_1MBPS                 // This datarate is supported by pretty much all NRF24 radios, including the RF-Nano.
    #define MY_SPLASH_SCREEN_DISABLED                   // Saves a little memory.
    
    #include <MySensors.h>                              // The MySensors library, which takes care of creating the wireless network.
    #include <avr/wdt.h>                                // The watchdog timer - if the device becomes unresponsive and doesn't periodically reset the timer, then it will automatically reset once the timer reaches 0.
    
    // Clock for the watchdog
    #define INTERVAL 1000                               // Every second we reset the watchdog timer. If the device freezes, the watchdog will not be reset, and the device will reboot.
    unsigned long previousMillis = 0;                   // Used to run the internal clock
    
    void setup()
    {
    	// Setup locally attached sensors
    
      wdt_enable(WDTO_2S);                              // Starts the watchdog timer. If it is not reset once every 2 seconds, then the entire device will automatically restart.                                 
    }
    
    void presentation()
    {
    	// The receiver does not have any extra children itself.
    }
    
    void loop()
    {
      if(millis() - previousMillis >= INTERVAL){        // Main loop, runs every second.
        previousMillis = millis();                      // Store the current time as the previous measurement start time.
        
        wdt_reset();                                    // Reset the watchdog timer
      }
    }
    
    
    /**
    * The MySensors Arduino library handles the wireless radio link and protocol
    * between your home built sensors/actuators and HA controller of choice.
    * The sensors forms a self healing radio network with optional repeaters. Each
    * repeater and gateway builds a routing tables in EEPROM which keeps track of the
    * network topology allowing messages to be routed to nodes.
    *
    * Created by Henrik Ekblad <henrik.ekblad@mysensors.org>
    * Copyright (C) 2013-2018 Sensnology AB
    * Full contributor list: https://github.com/mysensors/MySensors/graphs/contributors
    *
    * Documentation: http://www.mysensors.org
    * Support Forum: http://forum.mysensors.org
    *
    * This program is free software; you can redistribute it and/or
    * modify it under the terms of the GNU General Public License
    * version 2 as published by the Free Software Foundation.
    */
    

  • Mod

    @alowhum MySensors resets the watchdog every time loop() is exited, so there is no need to call wdt_reset() unless you do something that takes a long time. If you do stuff that takes a long time you should call wait() instead, so messages can be processed.
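
    To illustrate the timing described above, here is a small host-side model, not real AVR code. SimulatedWatchdog and firstResettingIteration are made-up names, and the model assumes the watchdog is reset only when each pass of loop() returns, which is how the behaviour is described in this post.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Host-side model of a watchdog timer, for illustration only.
    // This is not avr/wdt.h and does not touch any hardware.
    class SimulatedWatchdog {
    public:
        explicit SimulatedWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
            assert(timeoutMs > 0);
        }
        // Equivalent of wdt_reset(): restart the countdown.
        void kick(uint32_t nowMs) { lastKickMs_ = nowMs; }
        // True if the timeout has elapsed since the last kick.
        bool wouldReset(uint32_t nowMs) const {
            return nowMs - lastKickMs_ >= timeoutMs_;
        }
    private:
        uint32_t timeoutMs_;
        uint32_t lastKickMs_ = 0;
    };

    // Runs loop() passes of the given durations, kicking the watchdog each
    // time a pass returns. Returns the index of the first pass that takes
    // long enough to trigger a reset, or -1 if none does.
    inline int firstResettingIteration(SimulatedWatchdog& wdt,
                                       const uint32_t* durationsMs, int count) {
        uint32_t now = 0;
        wdt.kick(now);
        for (int i = 0; i < count; ++i) {
            now += durationsMs[i];         // time spent inside this pass
            if (wdt.wouldReset(now)) {
                return i;                  // pass blocked for too long
            }
            wdt.kick(now);                 // pass returned in time
        }
        return -1;
    }
    ```

    With a 2 second timeout, passes of a few milliseconds never trip the model, while a single pass that blocks for 2.5 seconds does, which is the case where an extra wdt_reset() or a call to wait() inside the pass would matter.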



  • So I should be able to solve my gateway crashes easily by just doing the following in my gateway code:

    #include <avr/wdt.h>
    
    void setup()
    {
     wdt_enable(WDTO_2S);
    }
    

    And that's it?


  • Hero Member

    @nick-willis Sounds like it.

    My gateway runs on an ESP8266, which has its own watchdog enabled by the default ESP8266 Arduino code. i.e. on an ESP8266, even the blink demo program has watchdog enabled without having to do so explicitly. But on an AVR platform, you need to explicitly enable the watchdog or it won't be active.

    Maybe the mysensors code should enable watchdog by default? I bet a lot of people are running without a watchdog and don't even know it.


  • Plugin Developer

    @mfalkvidd said in MySensors gateway and network reliability:

    @alowhum MySensors resets the watchdog every time loop() is exited, so there is no need to call wdt_reset() unless you do something that takes a long time. If you do stuff that takes a long time you should call wait() instead, so messages can be processed.

    Very interesting! If I don't call wait() manually, when does the library process messages? Also at the end of the loop?

    "Maybe the mysensors code should enable watchdog by default?"

    Sounds good to me! What is the code/memory overhead of doing this?

    If the wdt_reset function is already being called in every loop, doesn't that imply that Arduino also already loads the entire watchdog code? Otherwise, wouldn't that cause a "function not found" error?


  • Hero Member

    @zboblamont What is it that you recommend instead? Maybe (?) there are other issues, but my take is that, at the very least, his watchdog isn't working. So, why not start with that? At least then maybe he wouldn't be locked out of his system while he's on vacation.

    But if you think he should put his effort elsewhere, then where exactly, and why?


  • Mod

    @alowhum said in MySensors gateway and network reliability:

    Very interesting! If I don't call wait() manually, when does the library process messages? Also at the end of the loop?

    Correct



  • @neverdie My query was whether, given @skywatch's earlier failure with a WDT, an external device offers a different approach that covers something the soft version does not.
    I have never experienced the problems these lads are trying to resolve, as I use different comms hardware which has been 100% reliable, so I had no need to learn about WDTs. As I understand it, the WDT initiates a reset. If that clears the problem through re-initialisation, it implies a lockup on the comms to the NRF or within the NRF itself, since the processor and sketch should be fairly standard.
    Why SOME have problems but not others is a curiosity.



  • @nick-willis

    I also have "wdt_reset();" in the loop part. I thought that would be needed to reset the WDT, but maybe I misunderstood?



  • I just thought I'd share this with you all in case someone has an idea about it.

    This morning I got no response from the pi via http or ssh. Usual situation when it 'crashes'. But this time I noticed that whilst everything was still powered on the network socket leds on the pi3 were out. Both of them completely unlit.

    So maybe the issue is with the networking on the pi? It would explain why both http and ssh always die at the same time. Maybe if someone else gets another outage they could check to see if the same condition applies?

    Another thing was that the 'act' led was flashing 7 times repeatedly. Apparently this happens if 'kernel.img' cannot be found. Anyone have an idea on why kernel.img might suddenly not be found on a pi running from ssd?



  • If you really enjoy fiddling with connection problems and gateways freezing you should try using the gsm gateway. At first it froze up every day or two. I now run a second mcu just to check connectivity to mycontroller and reset the gateway when it fails. This sometimes happens a couple of times a day.
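
    The second-MCU approach can be sketched as a small state machine, written here as plain C++ so it can be tried off-device. GatewaySupervisor and its threshold are hypothetical. How the check is made (for example a TCP connection attempt to the gateway rather than a ping, since a hung gateway may still answer pings) and how the reset pin is pulsed are left out and would depend on the hardware.

    ```cpp
    #include <cassert>

    // Hypothetical supervisor logic for a second MCU watching a gateway.
    // It only decides WHEN to reset; the connectivity check and the reset
    // pulse itself are left to the caller.
    class GatewaySupervisor {
    public:
        explicit GatewaySupervisor(int failuresBeforeReset)
            : threshold_(failuresBeforeReset) {
            assert(failuresBeforeReset > 0);
        }

        // Feed in the result of one connectivity check.
        // Returns true when the caller should pulse the gateway's reset line.
        bool report(bool checkPassed) {
            if (checkPassed) {
                failures_ = 0;             // healthy again, forget old failures
                return false;
            }
            if (++failures_ < threshold_) {
                return false;              // tolerate a few isolated failures
            }
            failures_ = 0;                 // start counting afresh after a reset
            ++resets_;
            return true;
        }

        // Number of resets requested so far, useful for logging.
        int resets() const { return resets_; }

    private:
        int threshold_;
        int failures_ = 0;
        int resets_ = 0;
    };
    ```

    Requiring several consecutive failures before resetting is a design choice to avoid rebooting the gateway on a single missed check; the right threshold would need tuning per setup.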

