Hi all!
I just found a bug/flaw in the library code for the ENC28J60 chip, which is widely used in combination with Arduino boards.
The problem I encountered was that the MySensors plugin in my Vera Lite controller stopped updating values from sensors for no apparent reason. At first I thought it was because of the NRF24L01 radios, which can become unstable in some cases. A restart of the gateway solved the problem temporarily.
But while debugging I noticed that the messages from the sensors were arriving at the gateway but not reaching the plugin. The gateway itself didn't hang; the Ethernet part controlled by the ENC28J60 chip just stopped sending packets to the Vera controller (MySensors plugin). The best way to test this is to ping the gateway from your PC. If you don't get a response, you probably have the same problem I had.
This 'hanging' would sometimes occur after a few minutes, and sometimes it could take hours between two hangs.
After many hours of debugging and searching the internet I found that a lot of people are experiencing the same problem, and most of them simply switched to a W5100-based Ethernet card. That card also seems to have some problems in combination with the NRF24L01 radio, but that's a different story.
I then simplified the sketch in the gateway so that it is not a real gateway anymore but just sends a 'temperature message' for Node 1 every second. This way I was certain that the problem was only in the ENC28J60 chip or its code.
And now to the solution:
Most of the developers who have written or adapted libraries for the ENC28J60 chip are more or less aware of the problems that come with this chip, which are written down in the following document from Microchip:
http://ww1.microchip.com/downloads/en/DeviceDoc/80349c.pdf
It is the ENC28J60 Silicon Errata and Data Sheet Clarification, and the above problem is caused by point 12 (and maybe even point 14?).
Looking at the code for the sendPacket() function (in Enc28J60Network.cpp) in the UIPEthernet library, you will find some lines of code which ought to take care of this problem, but the fix is implemented incorrectly:
// TX start
writeRegPair(ETXSTL, start);
// Set the TXND pointer to correspond to the packet size given
writeRegPair(ETXNDL, end);
// send the contents of the transmit buffer onto the network
writeOp(ENC28J60_BIT_FIELD_SET, ECON1, ECON1_TXRTS);
// Reset the transmit logic problem. See Rev. B4 Silicon Errata point 12.
if( (readReg(EIR) & EIR_TXERIF) )
{
    writeOp(ENC28J60_BIT_FIELD_CLR, ECON1, ECON1_TXRTS); // note: TXRTS here, not TXRST
}
The errata document says that you have to "reset the internal transmit logic" BEFORE setting the TXRTS flag. In the code above, the reset code comes AFTER the code that sets this flag. The second problem is that the wrong flag/bit is used for the reset: ECON1_TXRTS instead of the intended reset bit ECON1_TXRST. It's just a mix-up of the last two letters, S and T, but because of it the fix doesn't work at all.
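To make the effect of the swapped letters concrete, here is a minimal host-side sketch in plain C++ (not library code). The bit values 0x80 for ECON1_TXRST and 0x08 for ECON1_TXRTS are taken from the ENC28J60 data sheet; the MockEnc struct, its members and the helper functions are hypothetical stand-ins for the real SPI access functions and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions in ECON1 as listed in the ENC28J60 data sheet.
constexpr uint8_t ECON1_TXRST = 0x80; // transmit logic reset
constexpr uint8_t ECON1_TXRTS = 0x08; // transmit request to send

// Hypothetical stand-in for the chip: one register plus a counter that
// records how often the transmit logic was actually reset.
struct MockEnc {
    uint8_t econ1 = 0;
    int txLogicResets = 0;

    void bitFieldSet(uint8_t mask) { econ1 |= mask; }

    void bitFieldClr(uint8_t mask) {
        // A reset pulse is complete when TXRST goes from 1 back to 0.
        if ((econ1 & ECON1_TXRST) && (mask & ECON1_TXRST)) {
            ++txLogicResets;
        }
        econ1 &= static_cast<uint8_t>(~mask);
    }
};

// The faulty fix: clears TXRTS (the send request) and never touches TXRST.
int resetsAfterFaultyFix() {
    MockEnc enc;
    enc.bitFieldSet(ECON1_TXRTS); // transmission requested
    enc.bitFieldClr(ECON1_TXRTS); // "fix" using the wrong bit
    return enc.txLogicResets;
}

// The intended fix: pulse TXRST (set it, then clear it).
int resetsAfterCorrectFix() {
    MockEnc enc;
    enc.bitFieldSet(ECON1_TXRST);
    enc.bitFieldClr(ECON1_TXRST);
    return enc.txLogicResets;
}

void demonstrate() {
    assert(resetsAfterFaultyFix() == 0);  // transmit logic is never reset
    assert(resetsAfterCorrectFix() == 1); // transmit logic is reset once
}
```

In this model the faulty variant only withdraws the send request, so a stalled transmit logic would stay stalled.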
Because all of the libraries available for the ENC28J60 are based on each other, the faulty fix has been copied again and again. When I searched for just this fix, I found a few different versions of it; the ones in the tuxgraphics.org, EtherCard and NanodeUIP libraries are the same and are the best ones.
The only problem I encountered when implementing their fix was that a few times per day my whole Arduino hung (deadlock). I have the same problem with my weather station based on the tuxgraphics board, where I 'solved' it by enabling the watchdog. After I removed their while() loop, the problem of the hanging Arduino disappeared.
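For anyone who would rather keep a wait than remove it, a bounded poll is one possible alternative to an open-ended while() loop. This is only a sketch under assumptions, not what any of the libraries mentioned here do: waitForTxIdle(), its maxPolls parameter and the two lambdas are hypothetical names, and the register read is passed in as a callable so the idea can be tried on a PC without the chip.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t ECON1_TXRTS = 0x08; // transmit request to send (data sheet value)

// Polls ECON1 at most maxPolls times. Returns true as soon as TXRTS is
// clear, false if it is still set after the last poll. Unlike an
// unbounded while() loop, this always returns.
template <typename ReadFn>
bool waitForTxIdle(ReadFn readEcon1, unsigned maxPolls) {
    for (unsigned i = 0; i < maxPolls; ++i) {
        if ((readEcon1() & ECON1_TXRTS) == 0) {
            return true;
        }
    }
    return false;
}

void demonstrateBoundedWait() {
    // Stalled transmit logic: TXRTS never clears.
    auto stuck = []() -> uint8_t { return ECON1_TXRTS; };
    assert(!waitForTxIdle(stuck, 1000)); // gives up instead of hanging

    // Healthy chip: TXRTS is clear on the third read.
    int polls = 0;
    auto healthy = [&polls]() -> uint8_t {
        return ++polls < 3 ? ECON1_TXRTS : uint8_t{0};
    };
    assert(waitForTxIdle(healthy, 1000));
    assert(polls == 3);
}
```

What the caller should do when the function returns false (reset the transmit logic, drop the packet, or something else) is a separate design decision that this sketch leaves open.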
So, after this VERY long explanation, my solution for everyone experiencing problems with this ENC chip is to change or add the following lines in the sendPacket() function in the enc28j60.cpp or Enc28J60Network.cpp file, like this:
// Check no transmit in progress
// The while() below might lead to deadlocks and is not explicitly advised by Microchip Errata point 12, so it is commented out. MagKas 2014-10-25
// while (readOp(ENC28J60_READ_CTRL_REG, ECON1) & ECON1_TXRTS)
// {
    // Reset the transmit logic problem. See Rev. B4 Silicon Errata point 12.
    if( (readReg(EIR) & EIR_TXERIF) )
    {
        writeOp(ENC28J60_BIT_FIELD_SET, ECON1, ECON1_TXRST);
        writeOp(ENC28J60_BIT_FIELD_CLR, ECON1, ECON1_TXRST);
        writeOp(ENC28J60_BIT_FIELD_CLR, EIR, EIR_TXERIF); // Might be overkill but advised by Microchip Errata point 12. MagKas 2014-10-25
    }
// }
According to Microchip, this code has to come BEFORE setting the ECON1_TXRTS flag. And don't forget to remove all other existing code in the sendPacket() function that tries to fix this problem.
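The resulting order of operations for one transmission can be checked without hardware using a small host-side sketch like the one below. The register and bit names match the snippets in this post; the MockChip type, its log and the sendPacketSequence() function are hypothetical and only record which step would be issued when. Placing the reset ahead of the pointer writes is an assumption of the sketch; the errata only requires it to happen before TXRTS is set.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical recorder standing in for the SPI calls of the real library.
struct MockChip {
    bool txErrorFlag = false;     // models EIR.TXERIF
    std::vector<std::string> log; // operations in the order they were issued

    void op(const std::string& what) { log.push_back(what); }
};

// Sequence for one transmission, following errata point 12:
// reset the transmit logic first (here only if a transmit error is flagged),
// then program the buffer pointers, then request the transmission.
void sendPacketSequence(MockChip& chip) {
    if (chip.txErrorFlag) {
        chip.op("SET ECON1.TXRST");
        chip.op("CLR ECON1.TXRST");
        chip.op("CLR EIR.TXERIF");
        chip.txErrorFlag = false;
    }
    chip.op("WRITE ETXST");
    chip.op("WRITE ETXND");
    chip.op("SET ECON1.TXRTS");
}

void demonstrateOrder() {
    MockChip chip;
    chip.txErrorFlag = true; // a previous transmission was aborted
    sendPacketSequence(chip);
    assert(chip.log.size() == 6);
    assert(chip.log.front() == "SET ECON1.TXRST"); // reset comes first
    assert(chip.log.back() == "SET ECON1.TXRTS");  // send request comes last

    sendPacketSequence(chip); // no error pending: no reset is issued
    assert(chip.log.size() == 9);
}
```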
Please let me know if this solved your problem!
I have already contacted Guido Socher (tuxgraphics.org), Jean-Claude Wippler (EtherCard), Norbert Truchsess (UIPEthernet), Pascal Stang (AVRLib), Stephen Early (NanodeUIP) and Jonathan Oxer (etherShield) about this and asked them to verify my findings and, if necessary, to update their libraries.
Of the above libraries, UIPEthernet and etherShield are the ones that have the wrong implementation of this fix. The others have implemented it correctly, but with the while() statement, which in my opinion could lead to deadlocks.
With best regards,
Magnus Kasper, Sweden