Problems with ENC28J60 losing connection/freezing (using UIPEthernet or etherShield)? READ THIS!
-
This looks great, thanks Magnus.
I am currently testing one of my projects using the fix_errata12 branch of UIPEthernet. It has been running for more than an hour with no problems, while it previously froze very often.
Hopefully this solves my enc28 problems :-)
Thanks
/Fredrik
EDIT:
Froze after three hours or so. Still responds to ping, but not to TCP connections. :(
-
@frol Thank you for the feedback. If it still responds to ping, that indicates the enc28j60 transmit logic does not stall and the changes in the fix_errata12 branch seem to work. I think the freeze must have a cause unrelated to this low-level patch. What sketch are you using for testing?
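For readers unfamiliar with the errata item: as far as the Microchip ENC28J60 silicon errata describe it, the workaround is to reset the transmit logic (set and then clear ECON1.TXRST) before each transmit request, and to clear the transmit error flag afterwards, because releasing the reset can itself raise TXERIF. A minimal sketch of that sequence follows. It is not the code from the fix_errata12 branch; `FakeEnc` and `startTransmit` are hypothetical names, and the registers are simulated in memory instead of being accessed over SPI.

```cpp
#include <cstdint>

// Bit masks from the ENC28J60 register map.
constexpr uint8_t ECON1_TXRST = 0x80;  // transmit logic reset
constexpr uint8_t ECON1_TXRTS = 0x08;  // transmit request to send
constexpr uint8_t EIR_TXIF    = 0x08;  // transmit done interrupt flag
constexpr uint8_t EIR_TXERIF  = 0x02;  // transmit error interrupt flag

// Hypothetical in-memory stand-in for the chip (assumption: a real
// driver would issue SPI bit-field set/clear commands instead).
struct FakeEnc {
  uint8_t econ1 = 0;
  uint8_t eir = 0;

  void econ1Set(uint8_t mask) { econ1 |= mask; }
  void econ1Clear(uint8_t mask) {
    // Simulate the side effect noted in the errata: releasing TXRST
    // may raise TXERIF.
    if (econ1 & mask & ECON1_TXRST) eir |= EIR_TXERIF;
    econ1 = static_cast<uint8_t>(econ1 & ~mask);
  }
  void eirClear(uint8_t mask) { eir = static_cast<uint8_t>(eir & ~mask); }
};

// Reset the transmit logic, clear stale flags, then request the send.
template <typename Hw>
void startTransmit(Hw& hw) {
  hw.econ1Set(ECON1_TXRST);             // hold transmit logic in reset
  hw.econ1Clear(ECON1_TXRST);           // release it again
  hw.eirClear(EIR_TXERIF | EIR_TXIF);   // drop flags left by the reset
  hw.econ1Set(ECON1_TXRTS);             // start the transmission
}
```

The order matters in this sketch: clearing the flags before releasing the reset would leave a spurious TXERIF behind, which a later error check would misread as a failed send.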
-
@ntruchsess (Sorry for the delay..)
That's kind of what I expected; it didn't seem right that it wasn't completely hung. I tried to reproduce it using a simpler sketch, and couldn't get to the "responds to pings" state. But several times my new simple sketch has hung completely in sendpacket(), waiting for the send to complete. To help debug it, I set the ENC28J60DEBUG define and added a timeout in sendpacket() after 10 seconds. To me this looks like what is described in errata12, but the workaround implemented in the branch looks correct, so I don't know :(
My sketch is available here. Do you have any tips on what I could try to understand this better?
https://github.com/frolswe/uip_debug
-
@frol Thank you for the data.
Well, it looks as if the workaround for errata12/13 doesn't work as described. TXERIF is never set in the 4 uip_debug logs :-(, hence the timeout. (BTW: it would make sense to return false when the timeout occurs, so the packet can be retransmitted. As TXERIF is not set, sendPackage returns true even in case of error.) The strange thing is that even if a packet is not transmitted (and that goes undetected), the next outgoing packet will reset the transmit logic anyway, but this doesn't seem to re-enable transmission, does it? (Any output truncated after the timeout?)
Maybe one should poll ESTAT for TXABRT instead of TXERIF (or both at the same time)?
- Norbert
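The suggestion above (give up after a timeout, report failure so the packet can be retransmitted, and check both EIR.TXERIF and ESTAT.TXABRT) could be sketched roughly as follows. This is an illustration under stated assumptions, not UIPEthernet's actual sendpacket(): `SimulatedEnc`, `waitForTransmit` and `sendSucceeded` are hypothetical names, and the chip and the millisecond clock are simulated so the logic can be run on a desktop.

```cpp
#include <cstdint>

// Bit masks from the ENC28J60 register map.
constexpr uint8_t ECON1_TXRTS  = 0x08;  // set while a transmit is pending
constexpr uint8_t EIR_TXERIF   = 0x02;  // transmit error interrupt flag
constexpr uint8_t ESTAT_TXABRT = 0x02;  // transmit abort flag

enum class TxResult { Ok, Aborted, TimedOut };

// Hypothetical simulated chip: TXRTS clears itself once the simulated
// clock reaches txDoneAt; the default (never) models the stall.
struct SimulatedEnc {
  uint8_t econ1 = ECON1_TXRTS;
  uint8_t eir = 0;
  uint8_t estat = 0;
  uint32_t now = 0;
  uint32_t txDoneAt = UINT32_MAX;

  uint32_t millis() { return now++; }  // each call advances time by 1 ms
  uint8_t readEcon1() {
    if (now >= txDoneAt) econ1 = static_cast<uint8_t>(econ1 & ~ECON1_TXRTS);
    return econ1;
  }
  uint8_t readEir() const { return eir; }
  uint8_t readEstat() const { return estat; }
};

// Wait for TXRTS to clear, but never longer than timeoutMs.
template <typename Hw>
TxResult waitForTransmit(Hw& hw, uint32_t timeoutMs) {
  const uint32_t start = hw.millis();
  while (hw.readEcon1() & ECON1_TXRTS) {
    // Unsigned subtraction stays correct across counter wraparound.
    if (hw.millis() - start >= timeoutMs) return TxResult::TimedOut;
  }
  // The transmit ended; check both error indications.
  if ((hw.readEir() & EIR_TXERIF) || (hw.readEstat() & ESTAT_TXABRT)) {
    return TxResult::Aborted;
  }
  return TxResult::Ok;
}

// Report false on timeout or abort so the caller can retransmit.
template <typename Hw>
bool sendSucceeded(Hw& hw, uint32_t timeoutMs) {
  return waitForTransmit(hw, timeoutMs) == TxResult::Ok;
}
```

Keeping the timeout as a separate result (rather than folding it into "aborted") makes it easy to log how often the stall actually occurs.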
-
I tried it with a sketch. After approx. 4 hours the Arduino disappeared: no ping, no TCP communication. There has to be a different problem...
-
@ntruchsess I'll try checking ESTAT as well (probably tonight). There is no truncated output since I deliberately stopped with a while(1); whenever the timeout happened. I can try to return false instead and see if everything recovers.
/Fredrik
-
I've been running the fix_errata12 branch since the day after release without any freeze or lost connection. (Maybe one or two restarts for other reasons.) I only have some inclusion mode issues, but those are probably not related to this. (Using VeraLiteUI5.)
-
Well, for me it was actually @ntruchsess's latest UIP release prior to fix_errata12 that did the real fix. I had stopped using my Enc-Uip-gw a few weeks before due to increasingly frequent hang-ups (from weekly/daily to hourly). (I wrongly thought then that it was a memory issue and planned to try without the bootloader.) Unfortunately I don't remember my uptime before fix_errata. I just know it was more than enough to conclude that (one) problem was solved.
I'm using this Nano and this Enc28j60. I have 10 nodes with 1-3 sensors each. Mean update period ~2 min. Datamine 40 channels to NAS.
-
@m26872 That sounds interesting, what kind of sensors do you use? Also, how do you power the enc28? I'm using a 3.3V Mini Pro and this enc28. Vraw on the Arduino and Vcc on the enc28 breakout are connected to 5V from the USB adapter. Currently my sensors (DS18B20) are connected directly to my computer for logging, but my plan is to connect them via this Arduino instead.
-
Shouldn't the ENC28J60 be connected to 3.3V? I know it is 5V tolerant, but still. Maybe it makes a difference?
@frol Have you been able to run some tests with ESTAT?
-
@frol Then maybe you should try a Nano instead. I tried really hard to get it working with my Arduino Pro Minis (3.3V, 5V or both, I don't remember) at first, but with a Nano it worked right away. I know it sounds silly, and I can't explain why. Maybe a power issue? Now the Nano is powered from a "wall wart" to its USB port and the enc module from the Nano's 5V pin.
My nodes look like this, with some variations. Sensors used are mostly DS18B20 (one or more per node), DHT22, BMP180, and digital switches.
@Thomas-Ihmann My Enc module (linked above) has "5V" printed on the pin and that's what it wants. I've tried 3.3V, but then the power LED and everything else is dead.
-
@ntruchsess I have now added polling of ESTAT as well. When the timeout happens nothing different is seen (no TXABRT). I do see some TXERIF in my logs, seemingly caused by collisions, and it looks like they are handled correctly. I pushed new logs and a diff to GitHub, and I would be very grateful for any more tips.
Currently I am testing returning false when the timeout happens, as you suggested. I'll let you know the results tomorrow.
@Thomas-Ihmann The enc28 chip can only handle 3.3V, but the breakout board I use has a dedicated 3.3V regulator.
@m26872 Maybe I should try a different Arduino.. But I still believe these should work, so I won't give up yet. I also suspected power issues before, but now I have one 3.3V regulator for the Arduino and one for the enc28, so I don't think that should be an issue anymore.
Nice sensor node, very similar to the design I had in mind for mine (even the same box etc..) :-)
-
I'm using an Arduino Nano and they are still losing connection regularly :-(
-
@Thomas-Ihmann About how often? ~4hrs every time? Does it come back by itself and if so what's the downtime?
-
After 2-8 hours. After that I have to power-cycle it; until then it doesn't respond to ping.
-
@ntruchsess After I added the timeout, my sketch has been running for almost 24h without crashing. The timeout has been triggered 19 times or so, and the library has recovered nicely. I committed my changes to my fork of the repo:
https://github.com/frolswe/arduino_uip/tree/fix_errata12
I don't really think this is a fix for the problem. The hang should not happen at all, but maybe we can use it as a workaround.
@Thomas-Ihmann Please try this branch, hopefully your issue is the same as mine :)
-
@frol I will try it as soon as I am back at home, probably tomorrow evening. At the moment I am abroad without access to the Arduino..
-
@frol Thank you for the data. The bad thing is that TXABRT is not set when the timeout occurs. I have to give this workaround a second thought. I hope I can somehow avoid the 1 sec busy wait. It's not about having 1 sec of unresponsive Ethernet, but it stalls any other processing during that time as well :-(
I merged your pull request so others can test it easily.
- Norbert
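One possible way to avoid the busy wait is to record when the transmit started and check the elapsed time once per pass through loop(), so other processing keeps running in between. The sketch below only illustrates that idea and is not what the library does: `TxWatcher` and `TxState` are hypothetical names, and the caller is assumed to pass in the current ECON1 value and a millis()-style timestamp.

```cpp
#include <cstdint>

constexpr uint8_t ECON1_TXRTS = 0x08;  // set while a transmit is pending

enum class TxState { Idle, Sending, Done, TimedOut };

// Hypothetical helper: tracks one pending transmit without blocking.
class TxWatcher {
 public:
  explicit TxWatcher(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

  // Call right after setting ECON1.TXRTS.
  void begin(uint32_t nowMs) {
    startMs_ = nowMs;
    state_ = TxState::Sending;
  }

  // Call once per loop() pass with the current ECON1 value and time.
  TxState poll(uint8_t econ1, uint32_t nowMs) {
    if (state_ != TxState::Sending) return state_;
    if (!(econ1 & ECON1_TXRTS)) {
      state_ = TxState::Done;       // hardware cleared TXRTS
    } else if (nowMs - startMs_ >= timeoutMs_) {
      state_ = TxState::TimedOut;   // stalled: caller may reset and retry
    }
    return state_;
  }

 private:
  uint32_t timeoutMs_;
  uint32_t startMs_ = 0;
  TxState state_ = TxState::Idle;
};
```

The trade-off is that the caller has to keep the packet buffer untouched until poll() reports Done or TimedOut, which a blocking wait gets for free.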