Problems with ENC28J60 losing connection/freezing (using UIPEthernet or etherShield)? READ THIS!
-
I'm using an Arduino Nano and still they are losing connection regularly :-(
-
@Thomas-Ihmann About how often? ~4hrs every time? Does it come back by itself and if so what's the downtime?
-
After 2-8 hours. Then I have to repower it; until I do, it doesn't respond to ping.
-
@ntruchsess after I added the timeout my sketch has been running for almost 24h without crashing. The timeout has been triggered 19 times or so, and the library has recovered nicely. I committed my changes to my fork of the repo:
https://github.com/frolswe/arduino_uip/tree/fix_errata12
I don't really think this is a fix to the problem. The hang should not happen at all, but maybe we can use it as a workaround.
@Thomas-Ihmann please try this branch, hopefully your issue is the same as mine :)
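A minimal sketch of the kind of bounded wait described above, written as plain C++ so it can be run off-target. It is not the code from the fix_errata12 branch: `waitForTx`, `txBusy` and `nowMs` are hypothetical names, with `nowMs` standing in for Arduino's `millis()` and `txBusy` for whatever check reports that the ENC28J60 is still transmitting.

```cpp
#include <cstdint>
#include <functional>

// Outcome of waiting for a transmission to finish.
enum class TxResult { Done, TimedOut };

// Hypothetical helper (not part of UIPEthernet): polls `txBusy` until it
// returns false or until `timeoutMs` milliseconds have passed.
// `nowMs` is an assumed millisecond clock, e.g. a wrapper around millis().
TxResult waitForTx(const std::function<bool()>& txBusy,
                   const std::function<uint32_t()>& nowMs,
                   uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    while (txBusy()) {
        // Unsigned subtraction stays correct when the counter wraps around.
        if (static_cast<uint32_t>(nowMs() - start) >= timeoutMs) {
            return TxResult::TimedOut;  // give up instead of hanging forever
        }
    }
    return TxResult::Done;
}
```

On `TimedOut` the caller would presumably recover the chip and carry on, which is the behaviour reported above (timeout triggered about 19 times, library recovered each time).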
-
@frol I will try it as soon as I am back at home, probably tomorrow evening. At the moment I am abroad without access to the Arduino..
-
@frol Thank you for the data. The bad thing is that TXABRT is not set when the timeout occurs. I have to give this workaround a second thought. I hope I can somehow avoid the 1 sec busy wait. It's not about the 1 sec of unresponsive ethernet, but it stalls any other processing during that time as well :-(
I merged your pull request so others can test it easily.
- Norbert
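One way the busy wait mentioned above might be avoided is to record when a transmission started and check it once per `loop()` pass, so other processing keeps running. The following is only a sketch of that idea in plain C++; `TxWatchdog` and all of its members are hypothetical and do not exist in the library.

```cpp
#include <cstdint>

// Hypothetical non-blocking transmit watchdog; not part of UIPEthernet.
class TxWatchdog {
public:
    enum class State { Idle, Pending, Done, TimedOut };

    explicit TxWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call right after a transmission has been started.
    void start(uint32_t nowMs) {
        startMs_ = nowMs;
        active_ = true;
    }

    // Call once per main-loop pass; always returns immediately.
    // `txBusy` is the caller's reading of the chip's "still sending" status.
    State poll(bool txBusy, uint32_t nowMs) {
        if (!active_) return State::Idle;
        if (!txBusy) {
            active_ = false;
            return State::Done;
        }
        // Unsigned subtraction handles wrap-around of the millisecond counter.
        if (static_cast<uint32_t>(nowMs - startMs_) >= timeoutMs_) {
            active_ = false;
            return State::TimedOut;
        }
        return State::Pending;
    }

private:
    uint32_t timeoutMs_;
    uint32_t startMs_ = 0;
    bool active_ = false;
};
```

The trade-off is that the caller must not start another transmission while `poll()` still reports `Pending`.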
-
@ntruchsess Thank you very much. I have uploaded the new version to the three Arduinos just now. So we just have to wait and see....
-
All Arduinos: Running for almost 12 hours, looking good so far :-) :+1: . I will keep you updated. Update: 18h and no lost connection. Update 2: 24h and no lost connection, looking good.
-
@Thomas-Ihmann that sounds great, thanks for reporting back :-)
I'm not so lucky. But I am pretty sure my current problems are unrelated to this bug. I haven't had time to debug it further though.
-
I was unlucky as well: after 25h I got a disconnection with one Arduino, though that is the one which used to disconnect after a few hours (I am using the Arduinos in connection with FHEM). Any ideas for further investigation?
-
Hmm..., guess mine isn't perfect after all.
At 2.30 last night my sensors had stopped reporting. After I repowered the gateway and reloaded vera-luup it was up again. Both Vera and the gateway had been running continuously for at least a week before this happened, and it's not impossible that it's related to my use of the Datamine plugin prior to this (due to the datamine-to-nas logging). I'll keep a good look at it from now on..
-
@frol, @Thomas-Ihmann, @m26872
My sensors are still working after I changed the code according to my own findings and solution from the first post. That means I don't have a while() loop which could hang the whole system. In my case I probably don't always get feedback if the packet transmitted OK or not and I probably will lose some packets too. But my gateway never hung itself either :).
I might have some hints for debugging you guys may or may not already have checked.
- Don't just reset the gateway, try other things first.
- Like pinging the gateway
- If you can ping the gateway, try Reload(ing) the Vera system which also will restart the MySensors-plugin and re-establish the connection. Check if this helps.
- Make sure the sensor(s) are still working, start the serial monitor on the gateway and check for incoming messages (might add some debugging code for that)
- DON'T activate the DEBUG-flag in MyConfig.h because this definitely will 'break' the gateway.
- In case you are using a pressure sensor BMP085/180, make sure NOT to use the sample() function because this will/might make your sensor hang after 180 minutes.
I didn't test Norbert's fix myself, might do this when I feel for it... I have my system running perfectly now and am logging with DataMine so will rather not f... this up.
-
I discovered that if I power on my Vera and Enc-gateway without my nas connected (but it's still mounted to be used by datamine) the gateway will no longer respond to ping.
@MagKas I will try your while-loop solution... when I feel for it. :-)
-
@MagKas great that your sensors are working correctly.
My sketch to debug this is basically a loop that each second opens a TCP connection to my server, sends a timestamp, and closes the connection, so not much else is involved. I think my current problem is that not all code paths in the library that call sendPacket() correctly handle sendPacket() returning FALSE. The reason I say this is that I can see the memhandle used when calling sendPacket() increasing when FALSE is returned in some cases. Eventually the library runs out of available memhandles and my sketch stops working. (This is mostly speculation, as I haven't debugged / understood it enough yet.)
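To illustrate the suspected failure mode only (the post itself calls it speculation), here is a toy model in plain C++. `HandlePool`, `sendLeaky` and `sendFixed` are hypothetical stand-ins, not the library's real memhandle allocator or call sites; the `sendOk` flag plays the role of sendPacket()'s return value.

```cpp
#include <array>
#include <cstddef>

// Toy fixed-size handle pool; a stand-in for memhandles, not the real allocator.
class HandlePool {
public:
    static constexpr int kNone = -1;

    // Returns the index of a free handle, or kNone if the pool is exhausted.
    int alloc() {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return static_cast<int>(i);
            }
        }
        return kNone;
    }

    void release(int h) {
        if (h >= 0 && static_cast<std::size_t>(h) < used_.size()) used_[h] = false;
    }

    int inUse() const {
        int n = 0;
        for (bool u : used_) if (u) ++n;
        return n;
    }

private:
    std::array<bool, 4> used_{};
};

// Hypothetical caller that forgets its handle when the send fails.
bool sendLeaky(HandlePool& pool, bool sendOk) {
    int h = pool.alloc();
    if (h == HandlePool::kNone) return false;
    if (!sendOk) return false;  // BUG: h is never released on this path
    pool.release(h);
    return true;
}

// Same caller with the handle released on both paths.
bool sendFixed(HandlePool& pool, bool sendOk) {
    int h = pool.alloc();
    if (h == HandlePool::kNone) return false;
    pool.release(h);
    return sendOk;
}
```

With a pool of four handles, four failed sends through `sendLeaky` exhaust it and every later send fails, even one that would otherwise succeed.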
Is the error return value in sendPacket() needed?
Even if the packet is sent correctly, it may be dropped by the network, thus the library still needs to handle lost packets. Knowing that a packet failed to send is just a special case of lost packets that I don't think really needs to be handled separately. @MagKas's solution without the while-loop basically does this by returning OK even when the enc-chip transmit logic freezes and doesn't send the packet, and that appears to work. The enc-chip will always be reset before sending the next packet, so it won't hang in any strange state. (I can imagine one problem when sending large packets and re-entering sendPacket() before the previous packet has finished transmitting. I don't know if this actually happens in real life.)
Maybe @ntruchsess or anyone else can educate me as to why the return value is needed or why it is a bad idea to remove it?
-
For TCP you are right: the boolean return value is not required, as the library does not free memory until a packet is acknowledged or the connection is closed. I guess I can remove some code from the lib here. UDP, however, does not retransmit, and without the return value it would lose packets just because of collisions. UDP should lose packets only when they time out or get dropped due to physical failure.
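The TCP argument above can be pictured with a toy model: data stays queued until it is acknowledged, so a segment that never left the chip is simply sent again later. This is a sketch under simplifying assumptions (hypothetical `RetransmitQueue`, no sequence-number wrap-around, no timers), not the library's actual buffer management.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model of "keep data until it is acknowledged"; names are hypothetical.
class RetransmitQueue {
public:
    // Remember a segment that has been handed to the network layer.
    void queue(uint32_t seq) { unacked_.push_back(seq); }

    // Drop every segment up to and including `ackedSeq`.
    void onAck(uint32_t ackedSeq) {
        while (!unacked_.empty() && unacked_.front() <= ackedSeq) {
            unacked_.pop_front();
        }
    }

    // Segment to send again when a retransmission is due; false if none is pending.
    bool oldestUnacked(uint32_t& seq) const {
        if (unacked_.empty()) return false;
        seq = unacked_.front();
        return true;
    }

    std::size_t pending() const { return unacked_.size(); }

private:
    std::deque<uint32_t> unacked_;
};
```

Whether a segment was lost on the wire or never transmitted makes no difference here: in both cases no ACK arrives and the segment is still in the queue. UDP has no such queue, which is why a failed send matters more there.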
-
Hi Everyone,
I wanted to report my experience here as I thought you all would probably be able to understand it far better than I can, and maybe even find it interesting. I'm a newbie, so bear with me...
For the longest time, I was using the UIPEthernet library 'improperly'. I wanted to open a connection and keep it open (indefinitely if necessary), and "stream" my data to my program: a long string of text parameters, about 200 bytes. The easiest way to do this seemed to be simply not to disconnect. Using the basic examples as a starting point however, this results in the need to "ping" the Arduino every time you want data, resulting in a perpetual game of "ping/pong". The connection never actually closes, but using client = server.available(); to call your actions causes this I guess, since it only returns 1 if there is data waiting to be read. This is what all of the examples seemed to use, so I thought it was the right way. Adding fuel to the flames, I found a blog that seemed to call this a bug, and showed how to fix it with the stock Ethernet library, so I thought I was on the right track. I wanted to use UIPEthernet though, and the ping/pong scheme seemed workable, if kludgey, so I implemented it. I had it working for a long time, a couple of months, without any trouble overall. I mean, I could open a telnet session with PuTTY and have it sit there open for days and randomly "ping" the Arduino and get a packet.
However, I just knew it didn't seem right, and like I said it was kludgey and made all of the other things I was having the Arduino do all the more difficult. I wanted to fix it, to the point of making a fool of myself asking about it on the Arduino forums.
So anyway, after studying some other examples for a while, and reading some other things, I figured out the "proper" way to do things: store your connections (up to 4) in an array; that way you can check them for (dis)connections, send data to individual ones, etc. Elementary I assume, but like I said .. you gotta start somewhere. ;) Fantastic! No more pinging!
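A rough sketch of the "connections in an array" idea, written as plain C++ so it runs off-target. `FakeClient` is a stand-in that models only `connected()` and `stop()`, and `ClientSlots` is a hypothetical helper, not part of UIPEthernet; on an AVR board a plain C array would likely replace `std::array`.

```cpp
#include <array>
#include <cstddef>

// Stand-in for an Ethernet client object; only the members used here are modelled.
struct FakeClient {
    bool open = false;
    bool connected() const { return open; }
    void stop() { open = false; }
};

// Hypothetical fixed-size table of client connections (4 slots by default).
template <typename Client, std::size_t N = 4>
class ClientSlots {
public:
    // Stores a client in the first free slot; returns its index, or -1 if full.
    int add(const Client& c) {
        for (std::size_t i = 0; i < N; ++i) {
            if (!inUse_[i]) {
                slots_[i] = c;
                inUse_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Frees every slot whose client has disconnected; returns how many were freed.
    int prune() {
        int freed = 0;
        for (std::size_t i = 0; i < N; ++i) {
            if (inUse_[i] && !slots_[i].connected()) {
                slots_[i].stop();
                inUse_[i] = false;
                ++freed;
            }
        }
        return freed;
    }

    // Access one client to send it data; nullptr if the slot is empty or invalid.
    Client* get(int i) {
        if (i < 0 || static_cast<std::size_t>(i) >= N || !inUse_[i]) return nullptr;
        return &slots_[i];
    }

    int active() const {
        int n = 0;
        for (bool u : inUse_) if (u) ++n;
        return n;
    }

private:
    std::array<Client, N> slots_{};
    std::array<bool, N> inUse_{};
};
```

In a sketch, `prune()` would run once per `loop()` pass, newly accepted clients would go through `add()`, and data would be written to each client returned by `get()`.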
Except now I've lost the stability I had. Couldn't keep a connection for more than 12 hours it seemed, let alone the days and weeks I had before. Frustrated, I basically dropped the project for a couple of weeks. The other day I decided to pick it up in earnest again. I couldn't find anything wrong with my code after going over and over it, so I hit google again..
This thread came up, and seemed pertinent. I installed the fix version of the library, and what do you know - stability. :) Been almost 48 hours now. Nothing else has changed, so I'm going to call it fixed. I don't really know what to make of all that, but I find it interesting that I had such great stability using the first method overall, even though it was "wrong".
I want to thank everyone for their efforts, especially Norbert for the superb support of his library. It truly makes it a pleasure to use. I can only hope I can contribute to the general community on such a high level one day.
Edit: 72 hours+ and still going strong.
-
Using official Arduino Mega ADK via proper 3.3V level shifter. Powering ENC with 3.3V from the same Arduino.
ENC stopped responding to pings after just one night. The main program continues to work.
The main program polls 1-wire temperature sensor, displays the temperature to an SPI display and controls another device using standard digital outputs.
I strongly suspect that the problem is in Errata 14.
If I have some time (which is not likely to happen any time soon) I'll look into it and will try to fix it for the EtherCard library, as it is smaller and I will need all that stuff to fit into a Pro Mini based on the ATmega328.
-
ha!
Everyone has to pay attention to the power source. I had to measure the supply current and added a multimeter. This probably adds a very small voltage drop, but it was more than enough for ping to start losing packets! It is very sensitive to the power source. After adding just 0.2V, pings stopped being lost.