MQTT ethernet gateway Works great for 3+ hours then stops



  • I've built an Ethernet gateway from scratch, using the ENC28J60 Ethernet module and an NRF24L01 radio. I loaded the MQTT gateway sketch onto my ATMega268 and started it going...

    For the first few hours everything works perfectly. I can see it receiving updates from the sensors (temperature, for the moment) and sending them on to openHAB. I can see a TCP connection on the openHAB server and all is well.

    After 3+ hours, though, the MQTT gateway still receives traffic from the sensors but stops all Ethernet communication and no longer sends anything on to openHAB. I suspect it's an overflow issue somewhere. As soon as I reboot the MQTT gateway all is well again.

    I've seen a few people report similar issues and discuss using a watchdog to reboot the gateway when it fails. I'd rather not do this, though; I'd prefer to find the underlying issue. I also read that not all ATmega chips support it.



  • @GuyP You are lucky, my MQTT gateway worked for only 3 minutes 🙂


  • Code Contributor

    There have been some updates to the MQTT gateway recently. Are you using the latest version of MySensors?



  • Yup, I'm on the latest library (1.4) and the latest posted version of the MQTT gateway.

    I'm in the process of designing a PCB and 3D printed case for my temp sensors. But having to reboot the gateway 3 times a day or more is really putting a damper on the project.


  • Code Contributor

    Do you still see the node messages being printed on the serial output of the gateway after the problem happens?



  • @celonunes said:

    Do you still see the node messages being printed on the serial output of the gateway after the problem happens?

    Yup... They continue to be received by the MQTT gateway, so the issue appears to lie with the Ethernet part.


  • Code Contributor

    @GuyP @C.r.a.z.y.
    What happens if you ping the gateway IP?
    What happens if you restart openhab?
    Do you see any error messages in the openhab console?
    Could you try to connect to the gateway using another MQTT client and see if it responds? Like paho-MQTT, MQTT-fx, mosquitto_sub



  • @celonunes said:

    @GuyP @C.r.a.z.y.
    What happens if you ping the gateway IP?

    GP> The MQTT gateway stops responding to pings.

    What happens if you restart openhab?

    GP> Nothing! 😞 It doesn't reconnect to the MQTT gateway; in fact I can't even telnet to the gateway on port 1883 once it stops responding. I just have to reboot the gateway to get it back on track.

    Do you see any error messages in the openhab console?

    GP> No, nothing.

    Could you try to connect to the gateway using another MQTT client and see if it responds? Like paho-MQTT, MQTT-fx, mosquitto_sub

    GP> I'll give it a go, but as I'm not able to telnet to it I don't think that will fare any better.



  • @GuyP
    You can find MQTT and serial logs for openHAB 1.6.2 and 1.7.0 in the zip files logs-1.6.2.zip and logs-1.7.0.zip.



  • Have had this problem, and I am pretty sure it's a power issue: running 5V lines into a 3.3V part. It works flawlessly when I pipe the Ethernet lines through a 74LVC245, or when I replaced my "base station" with a Uno+Shield.

    So try a level shifter, or try running it from a 3.3V (3.7V) power supply as a test? (Remember to set the Arduino to 8 MHz so it's not overclocked.)

    /Mads



  • @Magiske said:

    Have had this problem, and I am pretty sure it's a power issue: running 5V lines into a 3.3V part. It works flawlessly when I pipe the Ethernet lines through a 74LVC245, or when I replaced my "base station" with a Uno+Shield.

    So try a level shifter, or try running it from a 3.3V (3.7V) power supply as a test? (Remember to set the Arduino to 8 MHz so it's not overclocked.)

    /Mads

    Interesting... I am running the whole circuit at 3.3V from a regulator (LE33CZ-TR) and I have a capacitor across the power lines as well. But it's clocked at 16 MHz.

    I don't have an 8 MHz crystal at hand to test with. I guess I could try the internal clock.
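
    For reference, a rough sketch of why 16 MHz at 3.3V is a concern. It assumes the chip is an ATmega328P, whose datasheet ties the maximum clock to the supply voltage: 4 MHz at 1.8V, 10 MHz at 2.7V and 20 MHz at 4.5V, with straight lines in between. The function names are made up for illustration.

```cpp
// Hypothetical helper: maximum clock in MHz for an ATmega328P (assumed part)
// at a given supply voltage, piecewise linear between the datasheet points
// (1.8V, 4 MHz), (2.7V, 10 MHz) and (4.5V, 20 MHz).
double maxSafeClockMHz(double vcc) {
    if (vcc < 1.8 || vcc > 5.5) return 0.0;  // outside the operating range
    if (vcc < 2.7) return 4.0 + (vcc - 1.8) * (10.0 - 4.0) / (2.7 - 1.8);
    if (vcc < 4.5) return 10.0 + (vcc - 2.7) * (20.0 - 10.0) / (4.5 - 2.7);
    return 20.0;                             // 4.5V to 5.5V
}

// Hypothetical helper: true if the chosen clock is within spec at this voltage.
bool clockWithinSpec(double vcc, double clockMHz) {
    return clockMHz <= maxSafeClockMHz(vcc);
}
```

    At 3.3V this works out to about 13.3 MHz, so 16 MHz is outside the guaranteed range there, while 8 MHz is well inside it.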



  • Just rechecking the specs: shouldn't the ENC28J60 be 5V tolerant? I ran mine at 8 MHz using the internal clock.
    So whether it was the level shifter or the 8 MHz clock that "fixed" mine is a good question.
    I have to add that the Uno+Shield I am using now for the gateway is a W5100, simply because I ordered the W5100 shield when I started having the issues.
    (And this has also been running flawlessly from day one!)



  • I set up a continuous ping every 60 seconds to the MQTT gateway, and when it stops working the pings stop as well. I also see that the green and yellow LEDs on the ENC28J60 are no longer lit.



  • Sounds like the same issue I had. Sometimes it would run for a day, other times it would stop responding after an hour or so.
    After a reboot it was back right away. So it looks like the ENC28J60 chip is crashing.
    For me, fiddling with it, dropping the lines to 3.3V and running at 8 MHz worked (where "working" means having it up and running for at least 3-4 days, compared to "normal"). But I remember seeing others having ENC28J60 issues too. Maybe you should check whether you are running the latest ENC28J60 library?



  • I've been doing a great deal of research on this and I think I might have found the issue.

    It seems that the radio is very sensitive to power changes. In my original circuit I had the Arduino and the radio running from the 3.3V line. Since I moved the Arduino to the 5V line the gateway has stayed up for over a day now.

    So to sum up: everything is now powered by 5V except the radio, which gets 3.3V from the LE33CZ-TR regulator. It is also powered directly from a USB power adapter now, rather than from the Uno I had before.

    Once I have etched and tested it I will post the circuit and PCB layout.



  • So to follow up: yes indeed, it seems that the power was my issue.
    IMG_0402.jpg

    The connector just above the radio is the 5V and ground. The single connection on the other side is serial TX, so I can see the serial diagnostics.



  • @Magiske said:

    ENC28J60

    You connected the ENC28J60 to 5V?



  • @AlinClaudiu said:

    @Magiske said:

    ENC28J60

    You connected the ENC28J60 to 5V?

    Yes, the ENC28J60 and the Arduino are running on 5V. The only thing connected to the 3.3V regulator is the radio, the NRF24L01+.



  • And now in its 3D printed case.

    IMG_0403.jpg



  • I was still seeing it stop working. However, after further diagnostics it seems the issue lies with the Ethernet side of things; the Arduino and radio continue to function.

    I tried to get the Arduino to reboot itself, which didn't work, and in any case that would not have rebooted the Ethernet module and thus would not have solved this.

    I've ended up building a 555 timer circuit which is triggered from the Arduino (digital pin 3). The timer fires and triggers a relay to remove power from everything for 2 seconds, so we end up with a complete reboot.

    Here's my circuit diagram. It takes 5V directly, and PWR is then fed into my previous circuit; that way, when the relay clicks over, all power to the Arduino and Ethernet module is removed.

    GWReset.png
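
    For anyone sizing a similar circuit, a small sketch of the timing arithmetic. It assumes the 555 is wired as a standard monostable, where the output pulse lasts ln(3) × R × C (about 1.1 × R × C). The function names and the component values are illustrative only and are not taken from the diagram.

```cpp
#include <cmath>

// Pulse width in seconds of a 555 monostable: t = ln(3) * R * C.
double monostablePulseSeconds(double resistanceOhms, double capacitanceFarads) {
    return std::log(3.0) * resistanceOhms * capacitanceFarads;
}

// Timing resistor in ohms needed for a given pulse width and capacitor.
double resistorForPulseOhms(double seconds, double capacitanceFarads) {
    return seconds / (std::log(3.0) * capacitanceFarads);
}
```

    With a 10 µF timing capacitor, a 2 second pulse needs roughly 182 kΩ, so a standard 180 kΩ resistor gets close (just under 2 seconds).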



  • Have you tried re-initializing the ENC every ~2 hours, or whenever a lost connection is detected? (Brute force, but better than powering everything down with a relay.)



  • Well, it's very simple to detect if the network has gone away.

    In the processEthernetMessages() function, it checks

    if (client) {

    If there's a client connection then the network must be working. If not, then either the network has stopped or nothing is connected. But openHAB always maintains a connection.

    So I've just added the

    } else {

    part to flag that the connection has gone. I also set up a timeout by tracking micros() since the last client activity, so it's not constantly firing the reboot.

    I then raise digital pin 3 which is linked to my 555 timer circuit, which causes the power cycle.

    unsigned long resetCounter = micros();  // time of the last client activity
    int rebootAborted = 0;                  // 1 once the reset pin has been pulsed for this outage

    void processEthernetMessages() {
        char inputString[MQTT_MAX_PACKET_SIZE] = "";
        uint8_t inputSize = 0;
        EthernetClient client = server.available();
        if (client) {
            // A client has data for us, so the network is alive: restart the timeout.
            resetCounter = micros();
            if (rebootAborted) {
                Serial.println("Rebooting Aborted");
                rebootAborted = 0;
            }
            // Stop at the end of the buffer so a long packet cannot overrun inputString.
            while (client.available() && inputSize < MQTT_MAX_PACKET_SIZE) {
                char inChar = client.read();
                inputString[inputSize] = inChar;
                inputSize++;
            }
    #ifdef TCPDUMP
            Serial.print("<<");
            char buf[4];
            for (uint8_t a = 0; a < inputSize; a++) {
                sprintf(buf, "%02X ", (uint8_t)inputString[a]);
                Serial.print(buf);
            }
            Serial.println("");
    #endif
            gw.processMQTTMessage(inputString, inputSize);
        } else {
            // No client activity for 100 seconds (100,000,000 us): pulse pin 3 to fire the 555.
            if ((micros() - resetCounter) >= 100000000UL && !client.connected() && !rebootAborted) {
                rebootAborted = 1;
                Serial.println("Rebooting");
                pinMode(3, OUTPUT);
                digitalWrite(3, HIGH);
                delay(100);
                digitalWrite(3, LOW);
            }
        }
    }
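
    A note on the timeout arithmetic, as a sketch rather than a change to the gateway code. micros() returns a 32-bit unsigned count that wraps after roughly 71.6 minutes, but subtracting two unsigned readings still gives the right elapsed time across the wrap, as long as the interval is shorter than one full wrap. The helper name is made up for illustration, and uint32_t stands in for the AVR's unsigned long.

```cpp
#include <cstdint>

// Hypothetical helper: true once at least timeoutMicros have passed since 'since'.
// Unsigned subtraction is modulo 2^32, so the result stays correct when the
// counter wraps between the two readings.
bool timeoutElapsed(uint32_t now, uint32_t since, uint32_t timeoutMicros) {
    return static_cast<uint32_t>(now - since) >= timeoutMicros;
}
```

    This is why the 100 second check keeps working after the counter rolls over; a comparison written as now >= since + timeout would not.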
    


  • Yes and no 🙂 Right, you are able to detect the connection loss (good to know), but why trigger the NE555 rather than re-initialize the ENC28? Have you checked whether it still responds via SPI?



  • It completely locks up. I tried to talk to it at that point, but it doesn't seem to listen to anything.
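
    Putting the two ideas from this thread together, one possible approach is to try the cheap recovery first and fall back to the power cycle only when the chip does not answer. Below is a rough sketch of that decision logic, with the hardware access passed in as callbacks. Every name here is hypothetical; none of it comes from the gateway code or the ENC28J60 library, and on the AVR itself plain function pointers would replace std::function.

```cpp
#include <functional>

enum class Recovery { None, Reinitialized, PowerCycled };

// Hypothetical hooks; each would wrap real hardware access on the gateway.
struct RecoveryHooks {
    std::function<bool()> linkAlive;     // e.g. a client has been seen recently
    std::function<bool()> chipResponds;  // e.g. an SPI register read looks sane
    std::function<void()> reinitChip;    // soft re-init of the Ethernet controller
    std::function<void()> powerCycle;    // pulse the pin that fires the 555/relay
};

// Escalating recovery: do nothing if the link is up, re-init if the chip
// still answers, and power cycle only as the last resort.
Recovery recover(const RecoveryHooks& hooks) {
    if (hooks.linkAlive()) return Recovery::None;
    if (hooks.chipResponds()) {
        hooks.reinitChip();
        return Recovery::Reinitialized;
    }
    hooks.powerCycle();
    return Recovery::PowerCycled;
}
```

    With a chip that locks up completely, as described above, the chipResponds check fails and the power cycle is still what runs.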

