[SOLVED] GatewayESP8266MQTTClient - Recovery failure after gateway outage



  • Hello,

    I am using the "GatewayESP8266MQTTClient" and "LightSensor" Sketch with default settings.

    My System: MySensorsNet 2.0.0, Arduino IDE 1.6.10 hourly (22.07.2016), Windows 8.1 64bit

    I wanted to find out how robust this setup is, so I forcibly reset the gateway (power off) and waited until the "Light Sensor" node entered its failure state:

    **"Light Sensor" node**

    !TSP:SEND:TNR<\n>
    Send failed!<\r><\n>
    !TSP:SEND:TNR<\n>
    vcc: 3280<\r><\n>
    511, 1017<\r><\n>
    !TSP:SEND:TNR<\n>
    Send failed!<\r><\n>
    !TSP:SEND:TNR<\n>
    vcc: 3280<\r><\n>
    511, 1015<\r><\n>
    !TSP:SEND:TNR<\n>
    Send failed!<\r><\n>
    !TSP:SEND:TNR<\n>
    vcc: 3280<\r><\n>
    511, 1017<\r><\n>
    !TSP:SEND:TNR<\n>
    Send failed!<\r><\n>
    !TSP:SEND:TNR<\n>
    vcc: 3280<\r><\n>
    511, 1015<\r><\n>
    !TSP:SEND:TNR<\n>
    Send failed!<\r><\n>
    TSM:FPAR<\n>
    TSP:MSG:SEND 10-10-255-255 s=255,c=3,t=7,pt=0,l=0,sg=0,ft=0,st=bc:<\n>
    !TSP:SEND:TNR<\n>
    vcc: 3280<\r><\n>
    511, 1018<\r><\n>
    !TSP:SEND:TNR<\n>
    Send failed!<\r><\n>
    !TSP:SEND:TNR<\n>
    vcc: 3280<\r><\n>
    511, 1017<\r><\n>
    !TSP:SEND:TNR<\n>
    Send failed!<\r><\n>
    !TSP:SEND:TNR
    

    After the gateway is up again, the "Light Sensor" node still can't find its way back to the gateway.

    And the gateway seems to be doing nothing useful; at least I do not understand what it is doing.

    .scandone<\n>
    state: 0 -> 2 (b0)<\n>
    state: 2 -> 3 (0)<\n>
    state: 3 -> 5 (10)<\n>
    add 0<\n>
    aid 1<\n>
    cnt <\n>
    <\n>
    connected with FRITZ!Box Fon WLAN 7390, channel 11<\n>
    dhcp client start...<\n>
    .............ip:192.168.178.67,mask:255.255.255.0,gw:192.168.178.1<\n>
    .IP: 192.168.178.67<\r><\n>
    0;255;3;0;9;No registration required<\n>
    0;255;3;0;9;Init complete, id=0, parent=0, distance=0, registration=1<\n>
    IP: 192.168.178.67<\r><\n>
    0;255;3;0;9;Attempting MQTT connection...<\n>
    0;255;3;0;9;MQTT connected<\n>
    0;255;3;0;9;TSP:MSG:READ 10-10-255 s=255,c=3,t=7,pt=0,l=0,sg=0:<\n>
    0;255;3;0;9;TSP:MSG:BC<\n>
    0;255;3;0;9;TSP:MSG:FPAR REQ (sender=10)<\n>
    0;255;3;0;9;TSP:CHKUPL:OK (FLDCTRL)<\n>
    0;255;3;0;9;TSP:MSG:GWL OK<\n>
    0;255;3;0;9;!TSP:MSG:SEND 0-0-10-10 s=255,c=3,t=8,pt=1,l=1,sg=0,ft=0,st=fail:0<\n>
    pm open,type:2 0<\n>
    0;255;3;0;9;TSP:MSG:READ 10-10-255 s=255,c=3,t=7,pt=0,l=0,sg=0:<\n>
    0;255;3;0;9;TSP:MSG:BC<\n>
    0;255;3;0;9;TSP:MSG:FPAR REQ (sender=10)<\n>
    0;255;3;0;9;TSP:CHKUPL:OK<\n>
    0;255;3;0;9;TSP:MSG:GWL OK<\n>
    0;255;3;0;9;!TSP:MSG:SEND 0-0-10-10 s=255,c=3,t=8,pt=1,l=1,sg=0,ft=0,st=fail:0<\n>
    0;255;3;0;9;TSP:MSG:READ 10-10-255 s=255,c=3,t=7,pt=0,l=0,sg=0:<\n>
    0;255;3;0;9;TSP:MSG:BC<\n>
    0;255;3;0;9;TSP:MSG:FPAR REQ (sender=10)<\n>
    0;255;3;0;9;TSP:CHKUPL:OK<\n>
    0;255;3;0;9;TSP:MSG:GWL OK<\n>
    0;255;3;0;9;!TSP:MSG:SEND 0-0-10-10 s=255,c=3,t=8,pt=1,l=1,sg=0,ft=0,st=fail:0<\n>
    0;255;3;0;9;TSP:SANCHK:OK<\n>
    0;255;3;0;9;TSP:SANCHK:OK<\n>
    0;255;3;0;9;TSP:MSG:READ 10-10-255 s=255,c=3,t=7,pt=0,l=0,sg=0:<\n>
    0;255;3;0;9;TSP:MSG:BC<\n>
    0;255;3;0;9;TSP:MSG:FPAR REQ (sender=10)<\n>
    0;255;3;0;9;TSP:CHKUPL:OK<\n>
    0;255;3;0;9;TSP:MSG:GWL OK<\n>
    0;255;3;0;9;!TSP:MSG:SEND 0-0-10-10 s=255,c=3,t=8,pt=1,l=1,sg=0,ft=0,st=fail:0<\n>
    0;255;3;0;9;TSP:MSG:READ 10-10-255 s=255,c=3,t=7,pt=0,l=0,sg=0:
    

    I can see that the gateway is receiving some messages from the node but it "fails?!" to respond accordingly?!

    Could anybody enlighten me?

    Gateway code = 99.99% vanilla, except for the credentials and the MQTT broker (Mosquitto)

    The "Light Sensor" node code is included only for completeness; ignore the stuff that is commented out

    #define MY_NODE_ID 10 <--- manual node id is set!

    #include <Streaming.h>
    
    
    /**
     * The MySensors Arduino library handles the wireless radio link and protocol
     * between your home built sensors/actuators and HA controller of choice.
     * The sensors forms a self healing radio network with optional repeaters. Each
     * repeater and gateway builds a routing tables in EEPROM which keeps track of the
     * network topology allowing messages to be routed to nodes.
     *
     * Created by Henrik Ekblad <henrik.ekblad@mysensors.org>
     * Copyright (C) 2013-2015 Sensnology AB
     * Full contributor list: https://github.com/mysensors/Arduino/graphs/contributors
     *
     * Documentation: http://www.mysensors.org
     * Support Forum: http://forum.mysensors.org
     *
     * This program is free software; you can redistribute it and/or
     * modify it under the terms of the GNU General Public License
     * version 2 as published by the Free Software Foundation.
     *
     *******************************
     *
     * REVISION HISTORY
     * Version 1.0 - Henrik EKblad
     * 
     * DESCRIPTION
     * Example sketch showing how to measure light level using a LM393 photo-resistor
     * http://www.mysensors.org/build/light
     */
    
    #define MY_NODE_ID 10
    #define MY_BAUD_RATE 9600
    
    // Enable debug prints to serial monitor
    #define MY_DEBUG 
    
    // Enable and select radio type attached
    #define MY_RADIO_NRF24
    //#define MY_RADIO_RFM69
    
    #include <SPI.h>
    #include <MySensors.h>  
    
    #define CHILD_ID_LIGHT 0
    #define LIGHT_SENSOR_ANALOG_PIN A3
    
    unsigned long SLEEP_TIME = 1000; // Sleep time between reads (in milliseconds)
    
    MyMessage msg(CHILD_ID_LIGHT, V_LIGHT_LEVEL);
    int lastLightLevel;
    
    //#include "WatchdogAVR.h"
    //typedef WatchdogAVR WatchdogType;
    //WatchdogType Watchdog;
    
    void setup()
    {
      //Watchdog.disable();
      //int countdownMS = Watchdog.enable();
      
      //Serial.print("Enabled the watchdog with max countdown of ");
      //Serial.print(countdownMS, DEC);
      //Serial.println(" milliseconds!");
      //Serial.println();
      
      //Serial.begin(57600);
      Serial.println("setup() begin");
        // LightSensor
      pinMode(A3,INPUT_PULLUP);
      pinMode(A2,OUTPUT);
      digitalWrite(A2,LOW);
    }
    void presentation()  {
      // Send the sketch version information to the gateway and Controller
      sendSketchInfo("Light Sensor", "1.0");
    
      // Register all sensors to gateway (they will be created as child devices)
      present(CHILD_ID_LIGHT, S_LIGHT_LEVEL);
    }
    
    void loop()      
    {     
      static long vcc = readVcc();
      static int vccpercent = map(vcc,2400,3007,0,100);
      sendBatteryLevel(max(min(vccpercent,100),0),false);
      Serial << "vcc: " << vcc << endl;
      // Required for ack
      //wait(100);
      
      analogRead(LIGHT_SENSOR_ANALOG_PIN);
      int lightLevel_raw = analogRead(LIGHT_SENSOR_ANALOG_PIN);
      int lightLevel = (1023-lightLevel_raw)/10.23; // as of 1023 !!
      lightLevel = 511;
      Serial.print(lightLevel);
      Serial.print(", ");
      Serial.println(lightLevel_raw);
      //if (lightLevel != lastLightLevel) {
          if(!send(msg.set(lightLevel),true))
            Serial.println("Send failed!");
          lastLightLevel = lightLevel;
      //}
      //Serial.print("RETR=");
      //Serial.println((0x0F & RF24_retrycount()));
      wait(100);
      sleep(SLEEP_TIME);
    }
    // https://forum.mysensors.org/topic/3463/m_ack_variable-or-m_set_variable/2
    void receive(const MyMessage &message) {
      if (message.isAck()) {
          Serial.println("This is an ack from gateway");
          //Serial.println("Reset Watchdog!");
          //Watchdog.reset();
          }
    }
    

    This line gives me a headache:

    0;255;3;0;9;!TSP:MSG:SEND 0-0-10-10 s=255,c=3,t=8,pt=1,l=1,sg=0,ft=0,st=fail:0<\n>
    

    After the node requests a parent, isn't the node supposed to listen for the response? Where should this happen in the code?!


  • Admin

    @cimba007 I assume

    wait(100);
    sleep(SLEEP_TIME);
    

    is too short to fully reconnect.

    Try this code instead (it prevents the node from sleeping while the transport is not operational):

    if (isTransportOK()) {
      wait(100);
      sleep(SLEEP_TIME);
    }
    else {
      wait(SLEEP_TIME);
    }
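
    To see why the sleep matters, it helps to decode the failing gateway line from the first post. The sketch below is a host-side helper (plain C++, not Arduino code) that splits a TSP:MSG:SEND debug line into its fields. The field names are my reading of the 2.0 debug output and should be treated as assumptions: the four numbers are sender-last-next-destination, c=3 is an internal message, t=7 and t=8 are the find-parent request and response, and st is the send status. SendLog, parseSendLog and internalTypeName are made-up names for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Fields of a MySensors 2.0 "TSP:MSG:SEND" debug line. The names reflect an
// assumed reading of the log format, not an official definition.
struct SendLog {
    int sender, last, next, destination;  // routing: sender-last-next-destination
    int sensor, command, type;            // s=, c=, t=
    int payloadType, length, isSigned;    // pt=, l=, sg=
    int failedTx;                         // ft= (failed transmission counter)
    std::string status;                   // st= ("ok", "fail" or "bc")
    std::string payload;                  // text after the final ':'
};

// Parses one log line; any prefix such as "0;255;3;0;9;!" is skipped.
// Returns false if the line is not a well-formed TSP:MSG:SEND entry.
bool parseSendLog(const std::string& line, SendLog& out) {
    const char* p = std::strstr(line.c_str(), "TSP:MSG:SEND ");
    if (p == nullptr) return false;

    char status[8] = {0};
    int consumed = 0;  // set by %n only if everything up to ':' matched
    int n = std::sscanf(p,
        "TSP:MSG:SEND %d-%d-%d-%d s=%d,c=%d,t=%d,pt=%d,l=%d,sg=%d,ft=%d,st=%7[a-z]:%n",
        &out.sender, &out.last, &out.next, &out.destination,
        &out.sensor, &out.command, &out.type,
        &out.payloadType, &out.length, &out.isSigned, &out.failedTx,
        status, &consumed);
    if (n != 12 || consumed == 0) return false;

    out.status = status;
    out.payload = p + consumed;
    return true;
}

// Names for the two internal (c=3) message types that appear in this thread.
const char* internalTypeName(int type) {
    switch (type) {
        case 7:  return "I_FIND_PARENT";
        case 8:  return "I_FIND_PARENT_RESPONSE";
        default: return "other";
    }
}
```

    Fed with the gateway line, this gives destination 10, type 8 and status "fail": the gateway does answer the find-parent request from node 10, but the reply is not delivered. That would be consistent with the node having gone back to sleep before the response arrives.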
    


  • @tekka I will try it at once! I was thinking about something like this too. Thanks for pointing out that using isTransportOK() might be the solution.

    A quick first test is looking very good. :relieved:

    Is it possible to include this wait in the send function? The self-healing feature of the network would then be completely transparent to the end user. Maybe add a configurable timeout or another parameter to send?
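
    Until something like that exists in the library, a wrapper in the sketch could do roughly the same. The following is only an illustration of the idea, written as plain C++ so it can be tried on a PC. sendWithRecovery and RecoveryHooks are made-up names and not part of MySensors; on a real node the three hooks would be small lambdas around send(), isTransportOK() and wait().

```cpp
#include <cassert>

// Hypothetical hooks, so the logic can run without the MySensors library.
// On a node they could be captureless lambdas, for example:
//   trySend     = [] { return send(msg.set(lightLevel), true); }
//   transportOk = [] { return isTransportOK(); }
//   idle        = [](unsigned long ms) { wait(ms); }
struct RecoveryHooks {
    bool (*trySend)();
    bool (*transportOk)();
    void (*idle)(unsigned long ms);
};

// Tries to send once. If that fails, stays awake in steps of stepMs until the
// transport reports OK or timeoutMs has passed, then tries exactly once more.
// Returns true if either attempt succeeded.
bool sendWithRecovery(const RecoveryHooks& hooks,
                      unsigned long timeoutMs,
                      unsigned long stepMs = 100) {
    if (hooks.trySend()) return true;

    if (stepMs == 0) stepMs = 1;  // avoid a loop that never advances
    unsigned long waited = 0;
    while (!hooks.transportOk() && waited < timeoutMs) {
        hooks.idle(stepMs);
        waited += stepMs;
    }
    // Only retry when the transport actually came back.
    return hooks.transportOk() && hooks.trySend();
}
```

    The timeout keeps a battery node from staying awake forever when the gateway is really gone; after it expires the caller can still decide to sleep and try again on the next cycle.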



  • I have the exact same problem: the node never reconnects to gateway after a gateway outage.

    In my case it is an ethernet MQTTClientGateway and a humidity node.

    I have experimented with your suggestion and found that 1000ms was still not enough for a full reconnect before the node goes to sleep, so I made it 3000ms. This seems to allow a reconnect all the time (at least in my environment).

      // @TODO remove after testing gateway outage
      if (!isTransportOK()) {
        #ifdef MY_DEBUG
        Serial.println("Transport ERROR. Waiting for a proper transport reconnect.");
        #endif
        wait(3000);
      }
    
      // Sleep for a while to save energy
      sleep(UPDATE_INTERVAL); 
    

    I am unsure why MySensors doesn't have this self-healing element built-in. Maybe it is only a bug, maybe it has some deeper reason.
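
    Instead of guessing a fixed delay, the wait could also poll the transport in short steps and stop as soon as it is back, which additionally tells you how long a reconnect takes in your environment. A rough sketch in plain C++ follows; waitForTransport is a made-up name and the two function pointers stand in for isTransportOK() and wait(), so none of this is library API.

```cpp
#include <cassert>

// Hypothetical helper, not part of MySensors: keeps the node awake in steps
// of stepMs until transportOk() returns true.
// Returns the time spent waiting in ms, or -1 if timeoutMs passed first.
long waitForTransport(bool (*transportOk)(),
                      void (*idle)(unsigned long ms),
                      unsigned long timeoutMs,
                      unsigned long stepMs) {
    if (stepMs == 0) stepMs = 1;  // avoid a loop that never advances
    unsigned long waited = 0;
    while (!transportOk()) {
        if (waited >= timeoutMs) return -1;  // give up, caller may sleep
        idle(stepMs);
        waited += stepMs;
    }
    return static_cast<long>(waited);
}
```

    On a node the call might look like waitForTransport([] { return isTransportOK(); }, [](unsigned long ms) { wait(ms); }, 3000, 100), with the result printed under MY_DEBUG to see whether 3000 ms is generous or barely enough.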

