Problems with ENC28J60 losing connection/freezing (using UIPEthernet or etherShield)? READ THIS!

  • MagKas wrote:

    Hi all!

    I just found a bug/flaw in the code for the ENC28J60 chip which is widely used in combination with Arduino cards.

    The problem I encountered was that the MySensors plugin in my Vera Lite controller inexplicably stopped updating values from sensors. At first I thought it was caused by the nRF24L01 radios, which can become unstable in some cases. A restart of the gateway solved the problem temporarily.

    But while debugging I noticed that the messages from the sensors were arriving at the gateway but not reaching the plugin. The gateway didn't hang, but the Ethernet part controlled by the ENC28J60 chip simply stopped sending packets to the Vera controller (MySensors plugin). The best way to test this is to ping the gateway from your PC. If you don't get a response, you probably have the same problem I had.

    This 'hanging' would sometimes occur after a few minutes, and sometimes hours would pass between two hangs.

    After many hours of debugging and searching the internet I found that a lot of people experience the same problem, and most of them simply changed to a W5100-based Ethernet card, which also seems to have some problems in combination with the nRF24L01 radio, but that's a different story.

    I then simplified the sketch in the gateway so that it is no longer a real gateway but just sends a 'temperature message' for Node 1 to the gateway every second. This way I was certain that the problem was only in the ENC28J60 chip or its code.

    And now to the solution:
    Most developers who have written or adapted libraries for the ENC28J60 chip are more or less aware of the problems that come with this chip, which are documented in the following document from Microchip:
    http://ww1.microchip.com/downloads/en/DeviceDoc/80349c.pdf
    It is the ENC28J60 Silicon Errata and Data Sheet Clarification and the above problem is caused by point 12 (and maybe even point 14?).

    If you look at the code for the sendPacket() function (in Enc28J60Network.cpp) in the UIPEthernet library, you will find some lines that are meant to take care of this problem, but the fix is implemented incorrectly:

    // TX start
    writeRegPair(ETXSTL, start);
    
    // Set the TXND pointer to correspond to the packet size given
    writeRegPair(ETXNDL, end);
    
    // send the contents of the transmit buffer onto the network
    writeOp(ENC28J60_BIT_FIELD_SET, ECON1, ECON1_TXRTS);
    
    // Reset the transmit logic problem. See Rev. B4 Silicon Errata point 12.
    if( (readReg(EIR) & EIR_TXERIF) )
      {
        writeOp(ENC28J60_BIT_FIELD_CLR, ECON1, ECON1_TXRTS);
      }
    

    In the errata document it says that you have to "reset the internal transmit logic" BEFORE setting the TXRTS flag. In the code above the reset code comes AFTER the code that sets this flag. The second problem is that the wrong bit is used: ECON1_TXRTS instead of the intended reset bit ECON1_TXRST. It's just a mix-up of the last two letters, S and T, but because of it the fix doesn't work at all.
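
To make the S/T mix-up concrete, here is a minimal off-target sketch that models ECON1 as a plain byte. The bit masks (TXRST = 0x80, TXRTS = 0x08) are assumed to match the ENC28J60 data sheet; the `FakeEcon1` struct and the two helper functions are hypothetical and not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// ECON1 bit masks, assumed to match the ENC28J60 data sheet.
constexpr uint8_t ECON1_TXRST = 0x80; // transmit logic reset
constexpr uint8_t ECON1_TXRTS = 0x08; // transmit request to send

// Hypothetical stand-in for the chip register; real code goes through SPI.
struct FakeEcon1 {
    uint8_t value = 0;
    bool resetPulsed = false; // becomes true once TXRST has gone 1 -> 0

    void bitFieldSet(uint8_t mask) { value |= mask; }
    void bitFieldClr(uint8_t mask) {
        if ((value & ECON1_TXRST) && (mask & ECON1_TXRST)) resetPulsed = true;
        value &= static_cast<uint8_t>(~mask);
    }
};

// The faulty "fix" (error case): request the transmission, then clear TXRTS.
FakeEcon1 faultyFix() {
    FakeEcon1 econ1;
    econ1.bitFieldSet(ECON1_TXRTS);
    econ1.bitFieldClr(ECON1_TXRTS); // withdraws the request, resets nothing
    return econ1;
}

// The intended order: pulse TXRST first, then request the transmission.
FakeEcon1 intendedFix() {
    FakeEcon1 econ1;
    econ1.bitFieldSet(ECON1_TXRST);
    econ1.bitFieldClr(ECON1_TXRST);
    econ1.bitFieldSet(ECON1_TXRTS);
    return econ1;
}
```

In this model the faulty variant never pulses the reset bit and ends with the transmit request withdrawn, while the intended variant pulses the reset and leaves TXRTS set.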

    Because all of the libraries available for the ENC28J60 are based on each other, the faulty fix has been copied again and again. Searching for just this fix, I found a few different versions; the ones from the tuxgraphics.org, EtherCard and NanodeUIP libraries are identical and the best ones.
    The only problem I encountered when implementing their fix is that a few times per day my whole Arduino hangs (deadlock). I have the same problem with my weather station based on the tuxgraphics board, where I 'solved' it by enabling the watchdog. Once I removed their while() loop, the problem of the hanging Arduino disappeared.

    So, after this VERY long explanation my solution to everyone experiencing problems with this ENC-chip is to change or add the following lines to the enc28j60.cpp or Enc28J60Network.cpp file for the function sendPacket(), like this:

    // Check no transmit in progress
    // while (readOp(ENC28J60_READ_CTRL_REG, ECON1) & ECON1_TXRTS) // Might lead to deadlocks and not explicitly advised by Microchip Errata point 12 so commented out this, MagKas 2014-10-25
    // {
    // Reset the transmit logic problem. See Rev. B4 Silicon Errata point 12.
    if( (readReg(EIR) & EIR_TXERIF) )
      {
        writeOp(ENC28J60_BIT_FIELD_SET, ECON1, ECON1_TXRST);
        writeOp(ENC28J60_BIT_FIELD_CLR, ECON1, ECON1_TXRST);
        writeOp(ENC28J60_BIT_FIELD_CLR, EIR, EIR_TXERIF); // Might be overkill but advised by Microchip Errata point 12, MagKas 2014-10-25
      }
    // }
    

    This code has to come BEFORE setting the ECON1_TXRTS flag according to Microchip. And don't forget to remove all other existing code in sendPacket() function that tried to fix this problem.
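
The ordering described above can be checked off-target with a minimal sketch that records register writes instead of talking to the chip. `FakeChip`, `startTransmit` and `writesFor` are hypothetical names used only for illustration, and the bit masks are assumed to match the ENC28J60 data sheet.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Bit masks, assumed to match the ENC28J60 data sheet.
constexpr uint8_t EIR_TXERIF  = 0x02; // transmit error interrupt flag
constexpr uint8_t ECON1_TXRST = 0x80; // transmit logic reset
constexpr uint8_t ECON1_TXRTS = 0x08; // transmit request to send

// Hypothetical fake chip that logs every write instead of using SPI.
struct FakeChip {
    uint8_t eir = 0;
    uint8_t econ1 = 0;
    std::vector<std::string> log;

    void setEcon1(uint8_t m, const char* name) {
        econ1 |= m;
        log.push_back(std::string("set ") + name);
    }
    void clrEcon1(uint8_t m, const char* name) {
        econ1 &= static_cast<uint8_t>(~m);
        log.push_back(std::string("clr ") + name);
    }
    void clrEir(uint8_t m, const char* name) {
        eir &= static_cast<uint8_t>(~m);
        log.push_back(std::string("clr ") + name);
    }
};

// Order proposed in the post: reset only after a transmit error, then send.
void startTransmit(FakeChip& chip) {
    if (chip.eir & EIR_TXERIF) {
        chip.setEcon1(ECON1_TXRST, "TXRST");
        chip.clrEcon1(ECON1_TXRST, "TXRST");
        chip.clrEir(EIR_TXERIF, "TXERIF");
    }
    chip.setEcon1(ECON1_TXRTS, "TXRTS");
}

// Helper for trying the sketch: returns the recorded write order.
std::vector<std::string> writesFor(uint8_t initialEir) {
    FakeChip chip;
    chip.eir = initialEir;
    startTransmit(chip);
    return chip.log;
}
```

With no pending error the only write is the TXRTS request; with TXERIF set, the reset pulse and the flag clear are logged first and the TXRTS request comes last.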

    Please let me know if this solved your problem!

    I already contacted Guido Socher (Tuxgraphics.com), Jean-Claude Wippler (EtherCard), Norbert Truchsess (UIPEthernet), Pascal Stang (AVRLib), Stephen Early (NanodeUIP) and Jonathan Oxer (etherShield) about this and asked them to verify my findings and if necessary to update their libraries.

    Of the above libraries, UIPEthernet and etherShield are the ones with the wrong implementation of this fix. The others have implemented it correctly, but with the while() statement, which in my opinion could lead to deadlocks.

    With best regards,
    Magnus Kasper, Sweden

    ntruchsess (Plugin Developer) wrote (#4):

    @MagKas

    Thank you for this very detailed analysis. That is really a great finding!

    • Norbert

      ntruchsess (Plugin Developer) wrote (#5):

      @MagKas I did investigate a bit: the implementation in UIPEthernet was based on an older version of the Silicon Errata. Rev. B7 clarifies this issue in more detail; in fact Issue 13 contains pseudocode that should also solve your 'deadlock' on 'while (readOp(ENC28J60_READ_CTRL_REG, ECON1) & ECON1_TXRTS)'.
      The issue is that in some cases TXRTS is never cleared by the transmission logic after packet transmission, so the while loop will never exit. As a workaround the code should wait for either TXIF or TXERIF to be set.

      Here is my code that I just committed to UIPEthernet (https://github.com/ntruchsess/arduino_uip/blob/fix_errata12/utility/Enc28J60Network.cpp#L233):

      // Reset the transmit logic problem. See Rev. B7 Silicon Errata issues 12 and 13
      writeOp(ENC28J60_BIT_FIELD_SET, ECON1, ECON1_TXRST);
      writeOp(ENC28J60_BIT_FIELD_CLR, ECON1, ECON1_TXRST);
      writeOp(ENC28J60_BIT_FIELD_CLR, EIR, EIR_TXERIF | EIR_TXIF);
      // send the contents of the transmit buffer onto the network
      writeOp(ENC28J60_BIT_FIELD_SET, ECON1, ECON1_TXRTS);
      // wait for transmission to complete or fail
      while (((eir = readReg(EIR)) & (EIR_TXIF | EIR_TXERIF)) == 0);
      writeOp(ENC28J60_BIT_FIELD_CLR, ECON1, ECON1_TXRTS);
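
Why polling the EIR flags helps where polling TXRTS deadlocks can be illustrated with a small simulation. `StuckChip` is a hypothetical model of the misbehaving chip (TXRTS stays set forever, TXERIF is raised on the third poll); the bit masks are assumed to match the ENC28J60 data sheet, and the poll budget exists only so the simulation terminates.

```cpp
#include <cassert>
#include <cstdint>

// Bit masks, assumed to match the ENC28J60 data sheet.
constexpr uint8_t EIR_TXIF    = 0x08; // transmit done interrupt flag
constexpr uint8_t EIR_TXERIF  = 0x02; // transmit error interrupt flag
constexpr uint8_t ECON1_TXRTS = 0x08; // transmit request to send

// Hypothetical model of the stall: TXRTS never clears, TXERIF appears
// on the third register read.
struct StuckChip {
    int polls = 0;
    uint8_t readEcon1() { ++polls; return ECON1_TXRTS; }
    uint8_t readEir()   { ++polls; return polls >= 3 ? EIR_TXERIF : 0; }
};

// Old wait condition: returns the poll count, or -1 if the budget ran out.
int pollsUntilTxrtsClears(int maxPolls) {
    StuckChip chip;
    while (chip.polls < maxPolls) {
        if ((chip.readEcon1() & ECON1_TXRTS) == 0) return chip.polls;
    }
    return -1;
}

// New wait condition: stop as soon as TXIF or TXERIF is set.
int pollsUntilTxFlag(int maxPolls) {
    StuckChip chip;
    while (chip.polls < maxPolls) {
        if (chip.readEir() & (EIR_TXIF | EIR_TXERIF)) return chip.polls;
    }
    return -1;
}
```

In this model the TXRTS loop exhausts any budget it is given, while the flag loop exits on the third poll.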

      The retransmission-logic that is described in Issue 13 is implemented outside of sendPacket-method. On transmission-error it returns false, the packet will not be freed in UIPEthernet::network_send() and transmission will be reattempted on next call to UIPEthernet.tick().
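
The retry behaviour described here can be pictured with a minimal sketch: the packet is released only when the send succeeds, otherwise it stays queued for the next tick. `TxQueue` and `ticksToDrain` are hypothetical; the actual bookkeeping in UIPEthernet is different and is not reproduced here.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical outgoing queue, for illustration only.
struct TxQueue {
    std::deque<int> pending;             // packet handles awaiting transmission
    std::function<bool(int)> sendPacket; // returns false on a transmit error

    // One pass of the periodic tick: try the oldest packet and free it
    // only if the send succeeded.
    void tick() {
        if (pending.empty()) return;
        if (sendPacket(pending.front())) {
            pending.pop_front();         // success: release the packet
        }                                // failure: keep it for the next tick
    }
};

// Tries the sketch with a sender that fails `failures` times before it
// succeeds. Returns the number of ticks needed to drain one queued packet.
int ticksToDrain(int failures) {
    int attempts = 0;
    TxQueue q;
    q.pending.push_back(42);
    q.sendPacket = [&](int) { return ++attempts > failures; };
    int ticks = 0;
    while (!q.pending.empty()) {
        q.tick();
        ++ticks;
    }
    return ticks;
}
```

Keeping the retry outside the send routine means the send itself never has to loop on a failed packet.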

      I also added a fix that allocates the 7 bytes of the transmit status vector to prevent corruption of other outstanding packets. The code is in branch 'fix_errata12' (https://github.com/ntruchsess/arduino_uip/tree/fix_errata12). Maybe you'd like to test before I release?

      • Norbert

        frol wrote (#6):

        This looks great, thanks Magnus.

        I am currently testing one of my projects using the fix_errata12 branch of UIPEthernet. It has been running for more than an hour with no problems, while it previously froze very often.

        Hopefully this solves my enc28 problems :-)
        Thanks
        /Fredrik

        EDIT:
        Froze after three hours or so. Still responds to ping, but not tcp connections. :(

          ntruchsess (Plugin Developer) wrote (#7):

          @frol thank you for the feedback. If it still responds to ping, that indicates the ENC28J60 transmit logic does not stall and the changes in the fix_errata12 branch seem to work. I think the freeze must have a cause that is unrelated to this low-level patch. What sketch are you using for testing?

            frol wrote (#8):

            @ntruchsess (Sorry for the delay..)
            That's kind of what I expected; it didn't seem right that it wasn't completely hung. I tried to reproduce it using a simpler sketch, and couldn't get to the "responds to pings" state. But several times my new simple sketch has hung completely in sendPacket(), waiting for the send to complete. To help debug it, I set the ENC28J60DEBUG define and added a timeout in sendPacket() after 10 seconds. To me this looks like what is described in errata 12, but the workaround implemented in the branch looks correct, so I don't know :(

            My sketch is available here. Do you have any tips what I could try to understand this better?
            https://github.com/frolswe/uip_debug

              ntruchsess (Plugin Developer) wrote (#9):

              @frol Thank you for the data.
              Well, it looks as if the workaround for errata 12/13 doesn't work as described. TXERIF is never set in the 4 uip_debug logs :-(, hence the timeout. (BTW: it would make sense to return false when the timeout occurs so that the packet can be retransmitted. As TXERIF is not set, sendPacket returns true even though the transmission failed.) The strange thing is that even if a packet is not transmitted (and that goes undetected), the next outgoing packet resets the transmit logic anyway, but this doesn't seem to re-enable transmission, does it? (Any output truncated after the timeout?)
              Maybe one should poll ESTAT for TXABRT instead of TXERIF (or both at the same time)?

              • Norbert

                Thomas Ihmann wrote (#10):

                I tried it with a sketch. After approx. 4 hours the Arduino disappeared. No ping, no TCP communication. There has to be a different problem...

                  frol wrote (#11):

                  @ntruchsess I'll try checking ESTAT as well (probably tonight). There is no truncated output since I deliberately stopped with a while(1); whenever the timeout happened. I can try to return false instead and see if everything recovers.
                  /Fredrik

                    m26872 (Hardware Contributor) wrote (#12):

                    I've been running the fix_errata12 branch since the day after release without any freeze or lost connection. (Maybe one or two restarts for other reasons.) I only have some inclusion mode issues, but they are probably not related to this.. (Using VeraLiteUI5).

                      frol wrote (#13):

                      @m26872 that is good to hear, it is possible to get enc28 stable :) Maybe I should try with a different switch / network environment.
                      Did you have the problem with hangs prior to the fix_errata12 branch?

                        m26872 (Hardware Contributor) wrote (#14):

                        Well, for me it was actually @ntruchsess's latest UIP release prior to fix_errata12 that did the real fix. I had stopped using my Enc-Uip-gw a few weeks before due to increasingly frequent hang-ups (from weekly/daily to hourly). (I wrongly thought then that it was a memory issue and planned to try without a bootloader.) Unfortunately I don't remember my uptime before fix_errata12. I just know it was more than enough to conclude that (one) problem was solved.

                        I'm using this Nano and this Enc28j60. I have 10 nodes with 1-3 sensors. Mean update period ~2min. Datamine 40 channels to NAS.

                          frol wrote (#15):

                          @m26872 that sounds interesting, what kind of sensors do you use? Also, how do you power the enc28? I'm using a 3.3V Mini Pro and this enc28. Vraw on the Arduino and Vcc on the enc28 breakout are connected to 5V from the USB adapter. Currently my sensors (DS18B20) are connected directly to my computer for logging, but my plan is to connect them via this Arduino instead.

                            Thomas Ihmann wrote (#16):

                            Shouldn't the ENC28J60 be connected to 3.3V? I know it is 5V tolerant, but still. Maybe it makes a difference?
                            @frol Have you been able to run some tests with ESTAT?

                              m26872 (Hardware Contributor) wrote (#17):

                              @frol Then maybe you should try a Nano instead. I tried really hard to get it working with my Arduino Pro Minis (3.3V, 5V or both, I don't remember) at first, but with the Nano it worked right away. I know it sounds silly... and I can't explain why, maybe a power issue? Now the Nano is powered from a "wall wart" to its USB port and the enc module from the Nano's 5V pin.
                              My nodes look like this with some variations. Sensors used are mostly DS18B20 (one or more per node), DHT22, BMP180, and digital switch.

                              @Thomas-Ihmann My Enc module (linked above) has "5V" printed on the pin and that's what it wants. I've tried 3.3V, but then the power LED and everything else is dead.

                                frol wrote (#18):

                                @ntruchsess I now added polling of ESTAT also. When the timeout happens nothing different is seen (no TXABRT). I do see some TXERIF in my logs, seemingly caused by collisions, and it looks like they are handled correctly. I pushed new logs and diff to github, and I would be very grateful for any more tips.
                                Currently I am testing to return false when the timeout happens as you suggested. I'll let you know the results tomorrow.

                                @Thomas-Ihmann the enc28 chip can only handle 3.3V, but the breakout board I use has a dedicated 3.3V regulator.

                                @m26872 maybe I should try a different Arduino.. But I still believe these should work, so I won't give up yet. I also suspected power issues before, but now I have one 3.3V regulator for the Arduino and one for the enc28, so I don't think that should be an issue anymore.
                                Nice sensor node, very similar to the design I had in mind for mine (even the same box etc.) :-)

                                  hek (Admin) wrote (#19):

                                  Tut tut.. no Swedish :)

                                    Thomas Ihmann wrote (#20):

                                    I'm using an Arduino Nano and it still loses the connection regularly :-(

                                      m26872 (Hardware Contributor) wrote (#21):

                                      @Thomas-Ihmann About how often? ~4hrs every time? Does it come back by itself and if so what's the downtime?

                                        Thomas Ihmann wrote (#22):

                                        After 2-8 hours. Then I have to repower it; until I do, it doesn't respond to ping.

                                          frol wrote (#23):

                                          @ntruchsess after I added the timeout my sketch has been running for almost 24h without crashing. The timeout has been triggered 19 times or so, and the library has recovered nicely. I committed my changes to my fork of the repo:
                                          https://github.com/frolswe/arduino_uip/tree/fix_errata12

                                          I don't really think this is a fix for the problem. The hang should not happen at all, but maybe we can use it as a workaround.

                                          @Thomas-Ihmann please try this branch, hopefully your issue is the same as mine :)
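
For readers who want to see the shape of such a timeout workaround, here is a minimal off-target sketch of a bounded wait that reports failure both on TXERIF and on a silent stall. It is not the code from the branch above: `waitForTransmit` and `simulate` are hypothetical names, the register read and the clock are injected so the sketch runs on a PC (on an Arduino they would wrap the library's register read and `millis()`), and the bit masks are assumed to match the ENC28J60 data sheet.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Bit masks, assumed to match the ENC28J60 data sheet.
constexpr uint8_t EIR_TXIF   = 0x08; // transmit done interrupt flag
constexpr uint8_t EIR_TXERIF = 0x02; // transmit error interrupt flag

// Waits until TXIF or TXERIF is set, or until timeoutMs has elapsed.
// Returns true only when the chip reports a completed transmission, so the
// caller can keep the packet and retry in every other case.
bool waitForTransmit(const std::function<uint8_t()>& readEir,
                     const std::function<uint32_t()>& nowMs,
                     uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    for (;;) {
        const uint8_t eir = readEir();
        if (eir & EIR_TXERIF) return false;             // reported error
        if (eir & EIR_TXIF)   return true;              // transmission finished
        if (nowMs() - start >= timeoutMs) return false; // silent stall
    }
}

// Helper for trying the sketch: the fake clock advances 1 ms per call and
// the fake chip raises `flag` once the clock reaches flagAtMs.
bool simulate(uint8_t flag, uint32_t flagAtMs, uint32_t timeoutMs) {
    uint32_t clock = 0;
    auto nowMs   = [&]() -> uint32_t { return clock++; };
    auto readEir = [&]() -> uint8_t { return clock >= flagAtMs ? flag : 0; };
    return waitForTransmit(readEir, nowMs, timeoutMs);
}
```

The unsigned subtraction `nowMs() - start` stays correct across a counter wrap-around, which matters for long uptimes.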
