Looking to incorporate fail-over into my setup



  • So between yesterday and today I experienced a catastrophic gateway failure that took my house down. It seems that both the radio regulator and the Pro Mini failed; which one went first, I am not sure. The Pro Mini seems to have only partially died, because when it was plugged into the FTDI adapter, its light was on. The failed board, however, caused the FTDI adapter to lock up to the point where the Vera controller couldn't see it. The gateway is built on an Easy Newbie board, so I ended up desoldering the 3.3 V regulator and replacing it, flashed a new Pro Mini, and I was back up and running.

    A good number of the lights in my house are controlled with my Vera controller and rely heavily on my MySensors gateway. My basic setup for a light uses one of my in-wall scene controller switches, which talks to my Vera, which in turn uses scenes to send a signal to the light. At the moment, most of the lights are controlled with Sonoffs.

    Where I am going with this is that I am looking for ideas on how to add some fail-over in case the gateway goes down again. I may need to switch out some of the Sonoffs and create MySensors relay nodes to replace them. One question is: can I have my scene switches do node-to-node communication and still report status back to the gateway? This would effectively do away with the man-in-the-middle setup that I have right now, where everything goes through the Vera. Vera could still be the controller that handles the advanced automation, but the lights would still be able to function if that connection were to fail. Or does anyone have a better way of fixing this situation?



  • You can do node-to-node communication and keep the gateway informed. The sending node just needs to know the ID of the receiving node, and the receiving node has to handle the message. I do this with my doorbell button. It was too slow sending a button press to the gateway and then having the gateway turn on the ringer (the button and ringer are 2 different MySensors nodes). Now I have the button send a message directly to the ringer, then afterwards send a message to the gateway to let it know the button was pressed. It works well.

    You might have to experiment to see which variable type works best with the data you are sending. I send a V_PERCENTAGE message with a number as the value. When the ringer gets a V_PERCENTAGE, it looks at the number and knows which button was pressed and which sound it should play.

    I usually use signing on my nodes, but it is more difficult when going node to node. For now, I just set my ringer node to not request signing.
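    The pattern described above (message the target node directly first, then report to the gateway) can be sketched as a small simulation. This is plain Python, not MySensors code: the `Network`, `Ringer` and `Button` classes and the node IDs 10 and 20 are hypothetical stand-ins for the radio layer and the two nodes. The only detail taken from MySensors itself is that the gateway has node ID 0.

```python
from dataclasses import dataclass, field

GATEWAY_ID = 0  # in MySensors the gateway is node 0


@dataclass
class Message:
    sender: int
    destination: int
    msg_type: str  # e.g. "V_PERCENTAGE"
    payload: str


@dataclass
class Network:
    """Toy stand-in for the radio network (hypothetical, for illustration)."""
    gateway_up: bool = True
    handlers: dict = field(default_factory=dict)     # node ID -> receive callback
    gateway_log: list = field(default_factory=list)  # messages the gateway saw

    def send(self, msg):
        """Deliver a message; return True if the destination received it."""
        if msg.destination == GATEWAY_ID:
            if not self.gateway_up:
                return False
            self.gateway_log.append(msg)
            return True
        handler = self.handlers.get(msg.destination)
        if handler is None:
            return False
        handler(msg)
        return True


class Ringer:
    """Receiving node: picks a sound based on the number in the payload."""

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.played = []
        network.handlers[node_id] = self.receive

    def receive(self, msg):
        if msg.msg_type == "V_PERCENTAGE":
            self.played.append(f"sound-{msg.payload}")


class Button:
    """Sending node: knows the ID of the ringer it talks to."""

    def __init__(self, node_id, ringer_id, network):
        self.node_id = node_id
        self.ringer_id = ringer_id
        self.network = network

    def press(self, button_number):
        payload = str(button_number)
        # 1. Node to node: act immediately, no gateway involved.
        delivered = self.network.send(
            Message(self.node_id, self.ringer_id, "V_PERCENTAGE", payload))
        # 2. Afterwards, let the gateway know what happened.
        reported = self.network.send(
            Message(self.node_id, GATEWAY_ID, "V_PERCENTAGE", payload))
        return delivered, reported
```

    With the simulated gateway switched off (`gateway_up = False`), `press()` still reaches the ringer and only the report to the gateway fails, which is the fail-over behaviour being asked about.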



  • Well, personally I think the best approach is to design the home automation system so that it remains usable when the network, the automation hardware, or even (to some extent) the electricity fails. I mean there should always be a manual override. For example, every light in my house has an actuator with a manual switch, which can also be operated by remote control. Of course, if the actuator fails, that light will not be available, but only that one. The controller only adds things on top, like scenes and other rules. I also plan to have motorized blinds and want to use something like Axis Gear, which also allows the blinds to be controlled manually. The only things in my house that are not usable without the controller are the floor lights (triggered by a motion sensor), but I can live without them.



  • @nagelc Thanks for the info. As mentioned in my OP, I currently use my MySensors scene switches with Sonoffs connected to my Vera. You mentioned that your doorbell was too slow sending to the gateway and having the gateway react. I have been AMAZED with the setup that I have. For the sequence that needs to happen:

    • push the button
    • button sends request to Vera
    • Vera has to check the scene I have defined for that button press and send a signal to the Sonoff
    • signal is sent to the Sonoff to turn the light on/off

    All this happens in a split second for the most part. Most times it is near instantaneous. If I push a button to turn a light on, and immediately after push a button to turn it off, the light will come on immediately, but it will take a few seconds for it to go off.

    If I am going to do node-to-node communication though, I will have to switch out the Sonoffs for MySensorized nodes.



  • IMO none of the ways to get around this kind of problem is perfect.

    • Node-to-node isn't really flexible and amounts to a kind of hard-coding. It requires at least pretty good documentation of all the interdependencies. On the other hand, this kind of infrastructure relies, at least to some extent, on nothing other than the MySensors part (provided the nodes start up without a connection to a controller!). Once it is installed, one should be careful about changes.

    • Troubleshooting becomes more complex if there are a lot of (different) elements in the chain. WiFi in particular requires a lot of underlying infrastructure to work, so I personally try to avoid any dependency on it (apart from some comfort functionality).

    • My personal approach is to put some kind of "isolated logic" on single nodes and let them act independently. E.g. the PIR function and the relay on one node instead of a separate PIR node plus relay node, and the node will act as if it is dark if there has been no update on the light level for more than ... hours. Stuff like that.

    • With regard to failure of important (IO) hardware: I try to keep a second unit on hand (tested as working).
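    The "isolated logic" idea from the list above can be sketched as a single decision function that a combined PIR and relay node might run locally. It is written in Python for readability rather than as node firmware. The function name, the 50 lux threshold and the 6 hour staleness limit are all assumptions made for illustration; the post itself leaves the number of hours open.

```python
def light_should_switch_on(motion, lux, lux_age_hours,
                           dark_below_lux=50.0, stale_after_hours=6.0):
    """Decide locally whether the relay should switch the light on.

    motion            -- True if the PIR on this node currently sees motion
    lux               -- last light level received, or None if never received
    lux_age_hours     -- hours since that light level was updated
    dark_below_lux    -- assumed threshold below which it counts as dark
    stale_after_hours -- assumed age after which the reading is not trusted
    """
    if not motion:
        return False
    # No usable light level (never received, or too old): act as if it is dark,
    # so the light keeps working even when the rest of the network is down.
    if lux is None or lux_age_hours > stale_after_hours:
        return True
    return lux < dark_below_lux
```

    The design choice here is to fail towards "light works": a missing update costs a little energy during the day, while the opposite default would leave the light off when the network is down.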



  • @dbemowsk Interesting. I am using Domoticz and it usually works well. Maybe it is something in my setup. It would be nicer to have the controller doing the work as in your setup.

    In Domoticz, there is a setting to reset the gateway if it hasn't received any messages for a given time. I don't know if Vera has a similar setting. It would be great if you could have the controller fail over to a different gateway instead of doing the reset. That way you could have a backup gateway ready to take over. I don't think Domoticz can do this, but perhaps there is a way with macros. Might be a good feature request.



  • @nagelc I think an alternate gateway is a good idea. Redundancy is never a bad thing in something like this. If an alternate is defined, all nodes would need to know the ID of the alternate, for obvious reasons. The alternate gateway would also need to have all of the same control as the main gateway. A node would need to have a set of rules for dealing with the alternate if the main was not accessible. I think it would be something doable.
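    The node-side rule set described here could look roughly like the sketch below: try the main gateway first and fall back to the alternate only if the main one does not acknowledge. This is a thought experiment in Python, not an existing MySensors feature. The `send_with_fallback` function, the `transmit` callback and the gateway names are hypothetical.

```python
def send_with_fallback(payload, gateways, transmit):
    """Try each gateway in order and return the one that acknowledged.

    gateways -- ordered list of gateway identifiers, main gateway first
    transmit -- callable(gateway, payload) -> bool, True if acknowledged
                (hypothetical stand-in for the node's radio send)
    Returns the identifier of the gateway that took the message,
    or None if none of them answered.
    """
    for gateway in gateways:
        if transmit(gateway, payload):
            return gateway
    return None


def make_transmit(reachable):
    """Build a fake transmit function for trying the logic out on a PC."""
    def transmit(gateway, payload):
        return gateway in reachable
    return transmit
```

    For example, `send_with_fallback("light on", ["main", "alternate"], make_transmit({"alternate"}))` hands the message to the alternate because the main gateway is not in the reachable set.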


  • @dbemowsk

    Maybe it was a surge that knocked out your gateway. If you can run it off solar power, then you'll be immune to the power surges that can happen to mains-connected devices. After all, most surge protectors don't really work all that well. Alternatively, you could run off a battery that rapidly charges from mains and then disconnects. Properly done, that would considerably reduce your risk exposure to surges. Then, connect your gateway using Wi-Fi rather than Ethernet, and you become almost unassailable.


  • Other thoughts: you could have a watchdog which activates a backup gateway if the first one becomes unresponsive. It would make for an interesting project. Probably not too difficult either.
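    A minimal sketch of such a watchdog, in Python and with the clock passed in so that it can be tried without real hardware. The `GatewayWatchdog` class, its method names and the `activate_backup` callback are assumptions for illustration. How the backup gateway is actually powered up or switched in is left open.

```python
class GatewayWatchdog:
    """Activate a backup gateway if the primary stays silent too long."""

    def __init__(self, timeout_s, activate_backup):
        self.timeout_s = timeout_s              # allowed silence in seconds
        self.activate_backup = activate_backup  # hypothetical callback
        self.last_seen = None
        self.backup_active = False

    def heartbeat(self, now):
        """Call whenever any message arrives from the primary gateway."""
        self.last_seen = now

    def check(self, now):
        """Call periodically; returns True once the backup is active."""
        if self.backup_active:
            return True
        if self.last_seen is None:
            # Nothing heard yet: start the clock at the first check.
            self.last_seen = now
            return False
        if now - self.last_seen > self.timeout_s:
            self.activate_backup()  # fires exactly once
            self.backup_active = True
        return self.backup_active
```

    In use, `heartbeat()` would be called from wherever gateway traffic is received and `check()` from a timer, both with the same clock (for example `time.monotonic()`).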


 
