Nrf52 gateway crashes



  • Hey There,
    Something annoying is happening;

    My setup:
    An nRF52832 + ESP8266 as a gateway.

    Several nodes, all with an nRF52832. Most run the same sketch, and everything is fine with those.
    However, one node throws a curveball once in a while, and I then need to cut power to the gateway to reset it (I assume the nRF needs to be powered down).

    I've been working on it for a couple of days now but haven't found the answer yet; maybe some of you will.

    The last 2 lines from each of the 4 times the gateway crashed:

    0;255;3;0;22;2665656
    0;255;3;0;9;2674632 TSF:MSG:READ,114-114-0,s=0,c=3,t=16,pt=0,l=0,sg=1:
    
    216;0;1;0;1;67.6
    0;255;3;0;9;10584601 TSF:MSG:READ,114-114-0,s=0,c=3,t=16,pt=0,l=0,sg=1:
    
    216;0;1;0;1;67.0
    0;255;3;0;9;2311492 TSF:MSG:READ,114-114-0,s=0,c=3,t=16,pt=0,l=0,sg=1:
    
    216;0;1;0;1;59.4
    0;255;3;0;9;11077691 TSF:MSG:READ,114-114-0,s=0,c=3,t=16,pt=0,l=0,sg=1:
    

    It's always node 114 that does this, but sometimes all goes well. I have some good readings from it, so it seems that not every message makes my gateway crash.

    The sketch for this node is similar to https://www.mysensors.org/build/temp and I've been using MY_SECURITY_SIMPLE_PASSWD for a couple of weeks.

    Can anyone tell me why this is happening?
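For readers decoding the log lines above: the part after `TSF:MSG:READ,` is sender-last-destination, followed by sensor (`s`), command (`c`), type (`t`), payload type (`pt`), length (`l`) and the signing flag (`sg`). Below is a minimal sketch of a host-side parser for such lines; it is not part of MySensors, and the names `ReadMsg`, `parseReadMsg` and `isNonceRequest` are made up for illustration. Reading `c=3,t=16` as a nonce request follows my understanding of the MySensors message numbering, and fits the `t=17 ... <NONCE>` reply visible later in this thread.

```cpp
#include <cstdio>
#include <string>

// Fields of a "TSF:MSG:READ" debug line, e.g.
// TSF:MSG:READ,114-114-0,s=0,c=3,t=16,pt=0,l=0,sg=1:
struct ReadMsg {
    int sender = -1, last = -1, destination = -1;
    int sensor = -1, command = -1, type = -1;
    int payloadType = -1, length = -1, signedFlag = -1;
    std::string payload;  // text after the final ':' (may be empty)
};

// Returns true if `line` contains a well-formed TSF:MSG:READ record.
bool parseReadMsg(const std::string& line, ReadMsg& out) {
    const std::string tag = "TSF:MSG:READ,";
    const std::size_t pos = line.find(tag);
    if (pos == std::string::npos) return false;

    const char* p = line.c_str() + pos + tag.size();
    int consumed = 0;  // set by %n only if the whole header matched
    const int n = std::sscanf(
        p, "%d-%d-%d,s=%d,c=%d,t=%d,pt=%d,l=%d,sg=%d:%n",
        &out.sender, &out.last, &out.destination,
        &out.sensor, &out.command, &out.type,
        &out.payloadType, &out.length, &out.signedFlag, &consumed);
    if (n != 9 || consumed == 0) return false;

    out.payload = p + consumed;
    return true;
}

// Assumption: command 3 is "internal" and internal type 16 is a nonce request.
bool isNonceRequest(const ReadMsg& m) {
    return m.command == 3 && m.type == 16;
}
```

Fed the crash lines above, `isNonceRequest` returns true for each of them, which would mean the gateway stops right after a node asks for a signing nonce.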



  • Perhaps it is related to the ESP. Do you have the possibility to exchange and test it with another module?



  • @electrik yup, switched the ESP. I use esp-link, and when the gateway crashes I can still reach the web interface. So my guess is that the nRF is the problem: it somehow crashes or refuses to send/read data through the RX/TX.

    It just crashed again, same node:

    0;255;3;0;9;11077691 TSF:MSG:READ,114-114-0,s=0,c=3,t=16,pt=0,l=0,sg=1:
    
    

  • Mod

    @omemanti adding

    #define MY_DEBUG_VERBOSE_RF24
    

    to the gateway might give some insight into what is happening.



  • @mfalkvidd

    Does this give me more information compared to the normal MY_DEBUG? Skip that, I had to change things in MyConfig. Let's see what happens.

    To be complete: I used #define MY_DEBUG_VERBOSE_NRF5_ESB
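For anyone following along, the debug setup discussed here might look roughly like the fragment below. This is a sketch under assumptions, not the poster's actual gateway sketch: it assumes MySensors 2.3.x, the built-in nRF5 radio in ESB mode, and a serial gateway (the ESP8266 only forwards the serial port). `MY_DEBUG_VERBOSE_RF24` belongs to the nRF24L01 driver, which is why the nRF5 variant is used instead.

```cpp
// Gateway sketch fragment. All defines must come before MySensors.h is included.
#define MY_DEBUG                   // core and transport messages (TSF:...)
#define MY_DEBUG_VERBOSE_NRF5_ESB  // radio driver messages (NRF5:SND:..., NRF5:RX:...)

#define MY_RADIO_NRF5_ESB          // assumption: built-in nRF5 radio in ESB mode
#define MY_GATEWAY_SERIAL          // assumption: gateway talks to the controller over serial

#include <MySensors.h>
```

The extra `NRF5:` lines are what make it possible to see whether a send was started and whether it ever completed.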



  • I got 3 crashes, each of them within 20 minutes.

    All ended like this:

    0;255;3;0;9;759816 TSF:MSG:READ,215-215-0,s=2,c=3,t=16,pt=0,l=0,sg=1:
    0;255;3;0;9;759818 NRF5:SND:TO=215,LEN=32,PID=2,NOACK=0
    

    The strange part: it's now node 215 instead of 114. Both are located in a room some distance from the gateway.

    Normal communication looks like this:

    0;255;3;0;9;750507 NRF5:RX:LEN=32,NOACK=0,PID=0,RSSI=-34,RX=0
    0;255;3;0;9;750508 TSF:MSG:READ,216-216-0,s=0,c=3,t=16,pt=0,l=0,sg=1:
    0;255;3;0;9;750510 NRF5:SND:TO=216,LEN=32,PID=1,NOACK=0
    0;255;3;0;9;750514 NRF5:SND:END=1,ACK=1,RTRY=1,RSSI=-35,WAKE=5
    0;255;3;0;9;750515 TSF:MSG:SEND,0-0-216-216,s=255,c=3,t=17,pt=6,l=25,sg=1,ft=0,st=OK:<NONCE>
    0;255;3;0;9;750535 NRF5:RX:LEN=32,NOACK=0,PID=1,RSSI=-34,RX=0
    0;255;3;0;9;750537 TSF:MSG:READ,216-216-0,s=0,c=1,t=1,pt=7,l=5,sg=1:60.9
    


  • Every time it crashes, it stops at this line:

    0;255;3;0;9;1268917 NRF5:SND:TO=216,LEN=32,PID=0,NOACK=0
    

    The "good" part: it happens with all nodes.

    Could it have something to do with power? The next line should also be an "SND" line (the NRF5:SND:END result), and it never appears.

    Or could it be an encryption issue, something that happens before the send completes?
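The pattern described in this post (a `NRF5:SND:TO=` line that never gets its `NRF5:SND:END=` line) can be checked mechanically when going through long captures. The following is a small host-side sketch, not part of MySensors; `findUnfinishedSend` is a made-up helper name, and it only looks at the two substrings seen in the logs in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the last "NRF5:SND:TO=" line that has no
// "NRF5:SND:END=" line after it, or -1 if every send was completed.
int findUnfinishedSend(const std::vector<std::string>& lines) {
    int pending = -1;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (lines[i].find("NRF5:SND:TO=") != std::string::npos) {
            pending = static_cast<int>(i);  // a send was started
        } else if (lines[i].find("NRF5:SND:END=") != std::string::npos) {
            pending = -1;                   // the send finished
        }
    }
    return pending;
}
```

For a healthy exchange like the one shown earlier the function returns -1; for a capture ending in `NRF5:SND:TO=216,LEN=32,PID=0,NOACK=0` it returns the index of that last line.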



  • I've been troubleshooting for the last couple of days now.

    So far:

    • swapped the Wemos modules => both crashed
    • swapped the Ebyte modules => both crashed
    • powered the Ebyte modules separately from the ESP8266 => no luck either

    2 things that came up "positive":

    • removed #define MY_SECURITY_SIMPLE_PASSWD => it ran all night without any errors
    • FTDI + nRF52832 (serial gateway) + #define MY_SECURITY_SIMPLE_PASSWD => ran for the last couple of hours without any incident

    I don't know if it makes any sense, but when I combine the Wemos with an nRF52832 (using Serial Gateway) I get bumps in the road. Separately they work like a charm.



  • @omemanti said in Nrf52 gateway crashes:

    powered the Ebyte modules separately from the ESP8266 => no luck either

    So you powered the Ebyte module with an external regulator?
    Are your power supply and regulator powerful enough?



  • @electrik

    It's an assumption, but I guess so. It's an ST-Link V2 that powers the Ebyte module.
    I use it to test my nodes. So far none have broken down.

    The Wemos has its own USB power.



  • It is worth investigating the specs of the ST-Link's regulator. Did you try adding a capacitor on the power supply?



  • @electrik,

    I'll take a look at it. Yup, a 100 nF and a 100 µF next to the nRF52832. One setup had a 470 µF for good measure.



  • Tonight I let a node send data to the gateway. This one hung after a couple of hours, but this time I also hooked up an FTDI to the node to get a readout from it as well.

    It also broke down at the same stage as all the others did:

    45381108 TSF:MSG:SEND,215-215-0-0,s=1,c=1,t=0,pt=7,l=5,sg=1,ft=0,st=OK:13.3
    45381165 NRF5:SND:TO=0,LEN=32,PID=1,NOACK=0

    Why would it always hang on this same line?

    While operating, the node stays at a solid 3.0 V.



  • A month ago, I changed my sketch.

    I replaced "MY_SECURITY_SIMPLE_PASSWD" with "MY_ENCRYPTION_SIMPLE_PASSWD" because encryption was most important to me. Nothing bad happened; I received everything in perfect order.

    For the sake of testing, I switched back to "MY_SECURITY_SIMPLE_PASSWD" a couple of days ago. Guess what has been happening since then.

    So there are two options to consider: either the implementation of MY_SECURITY_SIMPLE_PASSWD has a bug, or the simple signing is messing with my gateway.


  • Contest Winner

    @omemanti it is not clear from your message what actually happened. Did something stop working?
    Remember that you need to share the "simple" flag setting across all nodes in the network for it to work properly. You cannot have the password option on one node and the security option on another.
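To make the "same flag everywhere" point concrete, here is a sketch of the three password-based options as I understand them from the MySensors documentation. The password string is a placeholder, and this is not taken from any sketch in this thread. As far as I know, `MY_SECURITY_SIMPLE_PASSWD` turns on both software signing and encryption, while the other two enable only one of them.

```cpp
// Pick ONE option and use it, with the same password, in the gateway sketch
// and in every node sketch. "change-me" is a placeholder, not a real value.

// Option 1: signing and encryption together
#define MY_SECURITY_SIMPLE_PASSWD "change-me"

// Option 2: encryption only
// #define MY_ENCRYPTION_SIMPLE_PASSWD "change-me"

// Option 3: signing only
// #define MY_SIGNING_SIMPLE_PASSWD "change-me"

#include <MySensors.h>
```

Mixing options across nodes (for example option 1 on the gateway and option 2 on a node) is the situation the post above warns about.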



  • @anticimex said in Nrf52 gateway crashes:

    @omemanti it is not clear from your message what actually happened. Did something stop working?

    As posted a month ago, it "sometimes" stops working at the following line:

    0;255;3;0;9;759816 TSF:MSG:READ,215-215-0,s=2,c=3,t=16,pt=0,l=0,sg=1:
    0;255;3;0;9;759818 NRF5:SND:TO=215,LEN=32,PID=2,NOACK=0
    

    All nodes in the network send data every 5 to 10 minutes (depending on the node). It all runs smoothly until a line like the one above comes around. So all nodes send data and use the same password etc.

    All went OK when I changed to encryption only; when I went back to security it started breaking again.
    The time from rebooting the gateway until it crashes ranges from 30 minutes to 15 hours (yesterday I rebooted the gateway at 8:00 and it stopped working at 23:30).


  • Contest Winner

    @omemanti ok. I am not sure if there is anything specific in the nrf52 port but I think that the signing code is pretty much the same across all ports for software signing. Perhaps @d00616 has a clue?



  • I posted a log of the gateway from boot (around 2 hours ago) to the last crash.

    https://github.com/Omemanti/Paste/blob/master/Gateway_log_01-01-2019_security.txt

    Everything seems normal (to me) except the crash at the end.



  • FYI:

    Yesterday I tried using MY_ENCRYPTION_SIMPLE_PASSWD together with signing (so not MY_SECURITY, everything separate). The gateway also crashed after a couple of hours.

    So I reverted all my sketches and now only have MY_ENCRYPTION_SIMPLE_PASSWD on all my nodes. Since then I've been receiving everything and have had no crashes.


  • Plugin Developer

    I'm in the same situation, but with an Arduino Nano.

    Did you ever get MY_SECURITY_SIMPLE_PASSWD to work ok in the end?

    When I tried MY_ENCRYPTION_SIMPLE_PASSWD instead it worked straight away, so now I'm moving my network to that first. But I'd like to have optimal security if possible.


  • Contest Winner

    Please remember that the simple security flags use a software implementation for signing (and for encryption as well, unless the radio has native support), so they claim more resources. This is noticeable on resource-limited devices such as the atmega328p.
    Nowadays, running both software encryption and software signing on an atmega328p at the same time is almost doomed to fail due to the heap and stack colliding.


  • Plugin Developer

    @anticimex I've become quite good at saving memory precisely because I anticipated that I wanted to enable full security. Are you saying that's a fool's errand?

    I was hoping that future versions of the security functionality might save some memory too?


  • Contest Winner

    @alowhum security v3 will most likely not be less resource intensive. The aim there is to make it more secure and less complicated to use.
    However, I am currently at a stage in life where I simply do not have the time to actively work on that, so either someone else has to look into it (details of the plans are on GitHub) or it has to wait for now.
    There is still the option to use hw accelerated signing, so atmega328p users can, at least in theory, still use it. And if you do manage to squeeze out enough RAM to keep the Arduino environment from warning about it, you should be fine.
    If the environment does give warnings on memory usage, it might still work, but that is less guaranteed and might very well lead to crashes.


  • Plugin Developer

    @anticimex said in Nrf52 gateway crashes:

    There is still the option to use hw accelerated signing, so atmega328p users can, at least in theory, still use it. And if you do manage to squeeze out enough RAM to keep the Arduino environment from warning about it, you should be fine.
    If the environment does give warnings on memory usage, it might still work, but that is less guaranteed and might very well lead to crashes.

    Interesting, I didn't know there was hardware acceleration in the Arduino for encryption type things.

    I've built all my nodes to leave about 20% memory for security, as you said. I believe 20% should be quite doable for most uses, as some of my most outrageous nodes still have enough available. I've come to love PROGMEM, using byte instead of int, etc. For me the Arduino Nano is still the most beginner-friendly device out there, bar perhaps the Micro:bit (nRF51), and I believe there's still some life left in her.

    Talking about which: glad to hear usability is a focus for the next version, and I'm sorry to hear you won't have much time to work on it. I totally get it though. Life always has priority 🙂


  • Contest Winner

    @alowhum I never said there was hw acceleration in the atmega328p for encryption. There is hw acceleration for signing with the atsha204a. And some radios, such as the rfm69, have hw acceleration for encryption.
    I don't recall stating 20% RAM for security either. I do not have a % for security at all. But I do recommend you follow the build environment's warnings on memory usage where it states that stability problems might occur.


  • Plugin Developer

    @anticimex The Arduino IDE always gives a warning at 80%? Or is that actually variable?

    I was wondering why I had never heard of hardware acceleration on the nano 😉


  • Contest Winner

    @alowhum the Arduino IDE is not very clever, and I believe the 20% left when it starts to warn is the % of RAM available for the stack. Your sketch, the libraries you use, the functions you call and the order in which they are called all affect how much stack is claimed, and if the stack needs more than what is left of the RAM, bad and strange things start to happen.
    The 20% is just a PUMA (PUlled out of My Ass) number the Arduino folks came up with for what they think is a reasonable stack usage for an average user. And so, if they see that you use more RAM after linking your sketch, they think you should be warned that you might get into trouble.
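To put numbers on the headroom discussed here: the ATmega328P has 2048 bytes of SRAM, so a sketch whose global variables take about 80% leaves roughly 400 bytes for stack and heap together. The helpers below are a plain arithmetic sketch with made-up names (`ramHeadroom`, `staticUsePercent`); they say nothing about how much stack the signing or encryption code actually needs.

```cpp
#include <cstdint>

// Bytes left for stack and heap once statically allocated data is placed.
constexpr std::uint32_t ramHeadroom(std::uint32_t totalRam, std::uint32_t staticUse) {
    return staticUse >= totalRam ? 0u : totalRam - staticUse;
}

// Share of RAM taken by static data, in percent, rounded down.
constexpr std::uint32_t staticUsePercent(std::uint32_t totalRam, std::uint32_t staticUse) {
    return totalRam == 0u ? 0u : (staticUse * 100u) / totalRam;
}
```

For example, with the "Global variables use ... bytes" figure from the build output at 1640 bytes, `staticUsePercent(2048, 1640)` gives 80 and `ramHeadroom(2048, 1640)` gives 408 bytes.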


  • Plugin Developer

    @anticimex said in Nrf52 gateway crashes:

    If the environment does give warnings on memory usage, it might still work, but that is less guaranteed and might very well lead to crashes.

    Ah, I assumed this referred to the Arduino IDE's warning message. Which environment do you mean?


  • Contest Winner

    @alowhum the environment known as the arduino IDE 😉



  • After a year with almost no problems, some of my nodes have started crashing.

    I left my repeater running overnight and monitored its behaviour.

    I got this again:

    08:47:27.607 -> 22440 NRF5:SND:TO=0,LEN=16,PID=0,NOACK=0
    08:47:27.641 -> 22447 NRF5:SND:END=1,ACK=1,RTRY=2,RSSI=-45,WAKE=8
    08:47:27.641 -> 22452 TSF:MSG:SEND,78-78-0-0,s=1,c=1,t=0,pt=7,l=5,sg=0,ft=0,st=OK:24.3
    08:47:27.707 -> 22508 NRF5:SND:TO=0,LEN=16,PID=1,NOACK=0
    08:47:27.707 -> 22515 NRF5:SND:END=1,ACK=1,RTRY=2,RSSI=-45,WAKE=7
    08:47:27.707 -> 22520 TSF:MSG:SEND,78-78-0-0,s=0,c=1,t=1,pt=7,l=5,sg=0,ft=0,st=OK:45.2
    08:47:28.435 -> 23226 NRF5:SND:TO=0,LEN=16,PID=2,NOACK=0
    08:47:28.435 -> 23233 NRF5:SND:END=1,ACK=1,RTRY=2,RSSI=-45,WAKE=7
    08:47:28.435 -> 23237 TSF:MSG:SEND,78-78-0-0,s=2,c=1,t=16,pt=1,l=1,sg=0,ft=0,st=OK:1
    08:47:28.766 -> Si7021 Found
    08:47:29.331 -> 24145 NRF5:SND:TO=0,LEN=16,PID=3,NOACK=0
    08:47:29.331 -> 24152 NRF5:SND:END=1,ACK=1,RTRY=2,RSSI=-45,WAKE=7
    08:47:29.331 -> 24157 TSF:MSG:SEND,78-78-0-0,s=1,c=1,t=0,pt=7,l=5,sg=0,ft=0,st=OK:24.3
    08:47:29.397 -> 24213 NRF5:SND:TO=0,LEN=16,PID=0,NOACK=0
    08:47:29.397 -> 24220 NRF5:SND:END=1,ACK=1,RTRY=2,RSSI=-45,WAKE=7
    08:47:29.397 -> 24225 TSF:MSG:SEND,78-78-0-0,s=0,c=1,t=1,pt=7,l=5,sg=0,ft=0,st=OK:45.2
    08:47:32.310 -> 27131 NRF5:SND:TO=0,LEN=16,PID=1,NOACK=0
    

    Does anyone have an idea what this message means?

    It crashes after the last line.

