Testing development branch with RFM69HW is not working as it should


  • Hero Member

    Hi,

    I've had some troubles with my sensors lately, but I have not really had time to dig into it. I therefore thought I would start from scratch with the latest development branch to see if things got better.

    I'm trying to get two Moteinos with the RFM69HW radio at 868 MHz to communicate with each other. I have loaded one with the Gateway example with the following two defines:

    #define MY_RADIO_RFM69
    #define MY_IS_RFM69HW
    

    The other I have loaded with the energy pulse meter, since that is the sensor I am trying to get to work. The first time I did this everything seemed to initialise fine, and the Gateway received the initialisation messages from the sensor. However, there were a few st=fail messages from the sensor. I uploaded the sketches multiple times with minor changes to the sensor, and at some point nothing got through: every message from the sensor failed. I did not change anything in the initialisation, only the logic that counts the energy pulses. I even tried different boards and swapped the Gateway and sensor roles, but nothing helped. Am I missing anything in the setup for the radio?

    I should also mention that I have had issues with version 1.5 of the library as well on the same hardware. Therefore, I am not sure whether it is a hardware problem or my use of the library that is at fault. Still, as I said, I have tried with multiple boards, both old ones that were working before and one that has never been used.

    Any suggestions are appreciated.

    Thanks.


  • Hero Member

    For reference, the full sensor start-up log looks like this:

    Starting sensor (RRNNA-, 2.0.0-beta)
    Radio init successful.
    send: 3-3-0-0 s=255,c=3,t=15,pt=0,l=2,sg=0,st=fail:
    send: 3-3-0-0 s=255,c=0,t=17,pt=0,l=10,sg=0,st=fail:2.0.0-beta
    send: 3-3-0-0 s=255,c=3,t=6,pt=1,l=1,sg=0,st=fail:0
    send: 3-3-0-0 s=255,c=3,t=11,pt=0,l=12,sg=0,st=fail:Energy Meter
    send: 3-3-0-0 s=255,c=3,t=12,pt=0,l=3,sg=0,st=fail:1.0
    send: 3-3-0-0 s=1,c=0,t=13,pt=0,l=0,sg=0,st=fail:
    find parent
    send: 3-3-255-255 s=255,c=3,t=7,pt=0,l=0,sg=0,st=bc:
    Init complete, id=3, parent=0, distance=255
    

    And the Gateway looks like this:

    0;255;3;0;9;Starting gateway (RRNGA-, 2.0.0-beta)
    0;255;3;0;9;Radio init successful.
    0;255;3;0;14;Gateway startup complete.
    0;255;3;0;9;Init complete, id=0, parent=0, distance=0
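
    The gateway lines above follow the MySensors serial format `node-id;child-sensor-id;command;ack;type;payload`. As a minimal sketch for reading such logs on a PC (the struct and function names are made up for illustration and are not part of the MySensors library), one line can be split like this:

```cpp
#include <sstream>
#include <string>

// One line of gateway serial output:
// node-id;child-sensor-id;command;ack;type;payload
struct GatewayLine {
    int node = 0, child = 0, command = 0, ack = 0, type = 0;
    std::string payload;
};

// Splits one line into its fields. Returns false if any of the five
// numeric fields is missing or is not a number.
bool parseGatewayLine(const std::string& line, GatewayLine& out) {
    std::istringstream in(line);
    std::string field;
    int* targets[] = {&out.node, &out.child, &out.command, &out.ack, &out.type};
    for (int* target : targets) {
        if (!std::getline(in, field, ';') || field.empty()) return false;
        try {
            *target = std::stoi(field);
        } catch (...) {
            return false;
        }
    }
    std::getline(in, out.payload);  // rest of the line; may itself contain ';'
    return true;
}
```

    For `0;255;3;0;14;Gateway startup complete.` this yields node 0, child 255, command 3 (internal), ack 0, type 14 and the text as payload.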
    

    Some messages do get through, sometimes, though:

    Sensor:

    find parent
    send: 3-3-255-255 s=255,c=3,t=7,pt=0,l=0,sg=0,st=bc:
    read: 0-0-3 s=255,c=3,t=15,pt=0,l=2,sg=0:
    

    Gateway:

    0;255;3;0;9;read: 3-3-0 s=255,c=3,t=15,pt=0,l=2,sg=0:
    0;255;3;0;9;send: 0-0-3-3 s=255,c=3,t=15,pt=0,l=2,sg=0,st=ok:
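
    To make the debug lines easier to compare, here is a rough host-side sketch that splits a `send:` or `read:` line into its fields. It assumes the layout `send: sender-last-next-destination s=sensor,c=command,t=type,pt=payload type,l=length,sg=signed,st=status:payload` (and `read: sender-last-destination ...` without `st=`). The struct, function and field names are my own labels, not library identifiers.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Fields of one "send:" or "read:" debug line.
struct RadioLogLine {
    bool isSend = false;
    int sender = 0, last = 0, next = -1, destination = 0;  // next only exists on send lines
    int sensor = 0, command = 0, type = 0, payloadType = 0, length = 0, isSigned = 0;
    std::string status;   // "ok", "fail" or "bc" on send lines; empty on read lines
    std::string payload;  // text after the final ':'
};

// Parses a node line or a gateway line (the "0;255;3;0;9;" prefix is skipped).
bool parseRadioLogLine(const std::string& line, RadioLogLine& out) {
    out = RadioLogLine{};
    const char* send = std::strstr(line.c_str(), "send: ");
    const char* read = std::strstr(line.c_str(), "read: ");
    int consumed = 0;  // set by %n only when the whole header matched
    if (send != nullptr) {
        char status[8] = "";
        int n = std::sscanf(send,
                            "send: %d-%d-%d-%d s=%d,c=%d,t=%d,pt=%d,l=%d,sg=%d,st=%7[a-z]:%n",
                            &out.sender, &out.last, &out.next, &out.destination,
                            &out.sensor, &out.command, &out.type, &out.payloadType,
                            &out.length, &out.isSigned, status, &consumed);
        if (n != 11 || consumed == 0) return false;
        out.isSend = true;
        out.status = status;
        out.payload = send + consumed;
        return true;
    }
    if (read != nullptr) {
        int n = std::sscanf(read,
                            "read: %d-%d-%d s=%d,c=%d,t=%d,pt=%d,l=%d,sg=%d:%n",
                            &out.sender, &out.last, &out.destination,
                            &out.sensor, &out.command, &out.type, &out.payloadType,
                            &out.length, &out.isSigned, &consumed);
        if (n != 9 || consumed == 0) return false;
        out.payload = read + consumed;
        return true;
    }
    return false;
}

// A message whose last hop differs from its original sender came via a repeater.
bool wasRelayed(const RadioLogLine& l) { return l.sender != l.last; }
```

    Counting lines with `status == "fail"` per node, or checking `wasRelayed` on the gateway side, gives a quick picture of which links are failing and which messages took a detour.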

  • Hero Member

    Some further information. I have tested with the library directly from the Moteino site. I have the Gateway sketch running on one device and the "node" sketch running on another. The first time I ran the sketches things looked pretty good, apart from a few lost packets. However, on subsequent runs it appears that the Gateway receives the messages (RSSI = -29), but it is not able to send the ACK to the node (or the node is not able to hear it). The node prints "nothing..." after each transmission attempt, while the Gateway sends at least two ACKs per message it receives from the node. Likewise, the Gateway does not receive any response when it tries to ping the node.

    I have changed which device is the node and the Gateway, and even included a third device as both, and the behaviour is always the same. The Gateway receives packets, but not the node.

    I'm aware that the above explanation makes no sense if you're not familiar with the Gateway and node test sketches for moteino 🙂


  • Admin

    Weird. And routing seems to be correct? From what I can see from the log it looks ok.

    Not much has changed in the Moteino library beyond some ESP adaptation and an init-fail test.
    https://github.com/mysensors/Arduino/commits/development/libraries/MySensors/drivers/RFM69/RFM69.cpp


  • Hero Member

    @hek It has to be something wrong with my devices, although that doesn't make sense when I've tested so many. Anyway, I posted on the moteino forum, so let's see what they have to say about it.



  • What kind of protoboard are you using? I once used a double sided perf board to mount a Moteino on and I couldn't get a signal to the gateway even if my life depended on it. Later (after a week or so) I used single side, with the same Moteino and it went without a hitch. I think the reflections from the copper on the component side threw either the Mega328 or the RFM in a frenzy so no working link could be made.


  • Hero Member

    I'm using a standard Moteino USB with nothing else attached (apart from the USB cable and antenna).



  • @kolaf I've been having issues as well. I was running 1.5 and then 1.6-beta and now 2.0.0-beta. My setup has 6 "SwitchMotes" from LowPowerLab, all of them with the RFM69HW radios. Given they are always powered from mains, it made sense to make them all repeaters. I have several battery powered nodes which are just plain nodes. I moved to 2.0.0-beta due to a routing loop issue in 1.6 (at least that was what the debugs were telling me), hoping that 2.0.0 would have it resolved. I cleared the EEPROMs of all the nodes and then applied the regular scripts. I watched each one boot and they all found the gateway as their parent directly. Everything worked well for a few days, and now the nodes have completely stopped passing messages to the gateway. I can tell visually (TX/RX LED flashing nearly continuously at the SwitchMotes with no activity on the gateway) that something is looping. It happened before, and a route loop ended up in the EEPROM tables, so the nodes would reboot into a loop again. I'll pull some debugs to follow this up.


  • Hero Member

    Weird, I'm curious to hear what you find.

    I have an ongoing thread at the LowPowerLab forum trying to figure out my own problems: https://lowpowerlab.com/forum/index.php/topic,1821.0.html

    For my part the problem is clearly unrelated to MySensors, but I find it difficult to believe that it is a hardware problem when it affects so many devices (unless it is an age thing or I managed to break them all at once).



  • Well, here is the boot up of my test node. Looks like it talked to the gateway directly (I'm in range, so that makes sense) but then gives up and looks for another route, which it finds via node 7. Ugh.. that's not right.

    The "Encryption Enabled" is something I added to let me know that the library pulled the AES key for the radio from NVRAM. Just wanted to have that comfort. 🙂

    Starting repeater (RRORAS, 2.0.0-beta)
    Encryption Enabled
    Radio init successful.
    Signing required
    Skipping security for command 3 type 15
    send: 2-2-0-0 s=255,c=3,t=15,pt=0,l=2,sg=0,st=ok:␁␁
    Waiting for GW to send signing preferences...
    Skipping security for command 3 type 15
    read: 0-0-2 s=255,c=3,t=15,pt=0,l=2,sg=0:␁␁
    Mark node 0 as one that require signed messages
    Mark node 0 as one that do not require whitelisting
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-0-2 s=255,c=3,t=17,pt=6,l=25,sg=0:4C91BE7376270BCAF2A0A6C325C530D8ECF3EF0224048E74A6
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 0200560012FF322E302E302D62657461
    Current nonce: 4C91BE7376270BCAF2A0A6C325C530D8ECF3EF0224048E74A6AAAAAAAAAAAAAA
    HMAC: B8BB1477E160F460500921C54A585F70101B73C2DE51CA103625F4324D1C0267
    Signature in message: 01BB1477E160F460500921C54A585F
    Message signed
    Message to send has been signed
    send: 2-2-0-0 s=255,c=0,t=18,pt=0,l=10,sg=1,st=ok:2.0.0-beta
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-0-2 s=255,c=3,t=17,pt=6,l=25,sg=0:DC48AD1E7510EEEB2254E9CEEB7ECB227644A0B14B5A4006C0
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 02000E2306FF00
    Current nonce: DC48AD1E7510EEEB2254E9CEEB7ECB227644A0B14B5A4006C0AAAAAAAAAAAAAA
    HMAC: 4BE1D45B70696CD8C227A186FB1415EE19FE29630CE33AD625473756CB7CA5AD
    Signature in message: 01E1D45B70696CD8C227A186FB1415EE19FE29630CE33AD6
    Message signed
    Message to send has been signed
    send: 2-2-0-0 s=255,c=3,t=6,pt=1,l=1,sg=1,st=ok:0
    Skipping security for command 3 type 16
    read: 0-0-2 s=255,c=3,t=16,pt=0,l=0,sg=0:
    Signing backend: ATSHA204Soft
    SHA256: B030BFA7AF89236B91517DC2F2AD0EBBB0062FF4B1BEA63159AAAAAAAAAAAAAA
    Transmittng nonce
    Skipping security for command 3 type 17
    send: 2-2-0-0 s=255,c=3,t=17,pt=6,l=25,sg=0,st=fail:B030BFA7AF89236B91517DC2F2AD0EBBB0062FF4B1BEA63159
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-0-2 s=255,c=3,t=17,pt=6,l=25,sg=0:D16A3D55860DB1DEF5C912297147A7ED2EDD258A14853CE8B5
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 020056C400FFFFFFFFFFFFFFFFFF0300
    Current nonce: D16A3D55860DB1DEF5C912297147A7ED2EDD258A14853CE8B5AAAAAAAAAAAAAA
    HMAC: 8658EBD21DE1970E2F459374C11868D5A83CCE6FD324D7E8C35DF3B97FE653E6
    Signature in message: 0158EBD21DE1970E2F459374C11868
    Message signed
    Message to send has been signed
    send: 2-2-0-0 s=255,c=4,t=0,pt=6,l=10,sg=1,st=ok:FFFFFFFFFFFFFFFF0300
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=fail:
    Failed to transmit nonce request!
    sign fail
    send: 2-2-0-0 s=255,c=3,t=11,pt=0,l=11,sg=1,st=ok:SwitchMote3
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-0-2 s=255,c=3,t=17,pt=6,l=25,sg=0:279A2B8C1C0BD8E53A825A4C811EABC05CED3C9C8C5A8B2127
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 02002E030CFF312E312E30
    Current nonce: 279A2B8C1C0BD8E53A825A4C811EABC05CED3C9C8C5A8B2127AAAAAAAAAAAAAA
    HMAC: E52A3C1FFD45BB263AB85CEA2512E6556BE4334D7E8263690D7A4C445F916801
    Signature in message: 012A3C1FFD45BB263AB85CEA2512E6556BE4334D
    Message signed
    Message to send has been signed
    send: 2-2-0-0 s=255,c=3,t=12,pt=0,l=5,sg=1,st=fail:1.1.0
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=1,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Message to send could not be signed!
    sign fail
    send: 2-2-0-0 s=1,c=0,t=3,pt=0,l=0,sg=1,st=fail:
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=2,c=3,t=16,pt=0,l=0,sg=0,st=fail:
    Failed to transmit nonce request!
    sign fail
    send: 2-2-0-0 s=2,c=0,t=3,pt=0,l=0,sg=1,st=fail:
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=3,c=3,t=16,pt=0,l=0,sg=0,st=fail:
    Failed to transmit nonce request!
    sign fail
    send: 2-2-0-0 s=3,c=0,t=3,pt=0,l=0,sg=1,st=fail:
    Skipping security for command 3 type 16
    send: 2-2-0-0 s=4,c=3,t=16,pt=0,l=0,sg=0,st=fail:
    find parent
    send: 2-2-255-255 s=255,c=3,t=7,pt=0,l=0,sg=1,st=bc:
    Verification timeout
    Skipping security for command 3 type 8
    read: 7-7-2 s=255,c=3,t=8,pt=1,l=1,sg=1:1
    parent=7, d=2
    Skipping security for command 3 type 8
    read: 3-3-2 s=255,c=3,t=8,pt=1,l=1,sg=1:3
    Skipping security for command 3 type 8
    read: 6-6-2 s=255,c=3,t=8,pt=1,l=1,sg=1:1
    Skipping security for command 3 type 8
    read: 5-5-2 s=255,c=3,t=8,pt=1,l=1,sg=1:2
    Skipping security for command 3 type 8
    read: 4-4-2 s=255,c=3,t=8,pt=1,l=1,sg=1:2
    Failed to transmit nonce request!
    sign fail
    send: 4-2-2-2 s=255,c=3,t=8,pt=1,l=1,sg=1,st=fail:2
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=5,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:2AF1CDE94D97445AE90B28F5BB248C2C404887D6448243F396
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 020006000405
    Current nonce: 2AF1CDE94D97445AE90B28F5BB248C2C404887D6448243F396AAAAAAAAAAAAAA
    HMAC: 1038545352465B7D9F74F932A3F3218D4AFA5702A8CE6FE055D7BE6CE3EBDD82
    Signature in message: 0138545352465B7D9F74F932A3F3218D4AFA5702A8CE6FE055
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=5,c=0,t=4,pt=0,l=0,sg=1,st=ok:
    Init complete, id=2, parent=7, distance=2
    

    What do you make of all of this?
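
    A side note for anyone reading the signing lines in this log: judging purely from the values printed, the "Current nonce" is the 25-byte nonce padded to 32 bytes with 0xAA, and the "Signature in message" looks like the HMAC with its first byte replaced by 0x01 and truncated to whatever room is left in a 25-byte payload (25 minus `l`). The sketch below reproduces that relationship on hex strings. The constant and function names are invented for illustration, and the 25-byte limit is inferred from the `l=25` nonce messages, so treat it as an assumption rather than the library's definition.

```cpp
#include <cstddef>
#include <string>

// Assumed maximum payload size in bytes, inferred from the l=25 nonce messages.
constexpr std::size_t kMaxPayloadBytes = 25;

// Pads a hex-encoded nonce to 32 bytes (64 hex digits) with 0xAA,
// which is what the "Current nonce:" lines appear to show.
std::string padNonceHex(std::string nonceHex) {
    while (nonceHex.size() < 64) nonceHex += "AA";
    return nonceHex;
}

// Rebuilds the "Signature in message:" value from the "HMAC:" value:
// first byte becomes 01, then as many HMAC bytes as fit after the payload.
// Returns an empty string if the payload leaves no room for a signature.
std::string signatureFromHmacHex(const std::string& hmacHex, std::size_t payloadLen) {
    if (payloadLen >= kMaxPayloadBytes || hmacHex.size() < 2) return "";
    std::size_t sigBytes = kMaxPayloadBytes - payloadLen;
    if (sigBytes * 2 > hmacHex.size()) sigBytes = hmacHex.size() / 2;
    return "01" + hmacHex.substr(2, (sigBytes - 1) * 2);
}
```

    Feeding in the first HMAC from the log with `l=10` gives `01BB1477E160F460500921C54A585F`, which matches the logged signature, so the signing itself appears to behave consistently; the failures are on the radio link (`st=fail`).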



  • Ok, here's the gateway's view of the transaction..

      [0;255;3;0;9;read: 2-6-0 s=255,c=3,t=11,pt=0,l=11,sg=0:SwitchMote3]
      [2;255;3;0;11;SwitchMote3]
      [0;255;3;0;9;read: 2-6-0 s=255,c=3,t=16,pt=0,l=0,sg=0:]
      [0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:E206B5FF9C387BB7E1B3BB1FD3C88C05CE14FB6BE015DF204F]
      [0;255;3;0;9;read: 2-6-0 s=255,c=3,t=12,pt=0,l=5,sg=0:1.1.0]
      [2;255;3;0;12;1.1.0]
      [0;255;3;0;9;read: 2-6-0 s=1,c=3,t=16,pt=0,l=0,sg=0:]
      [0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:A1EC8E72F0B4B834D9DB0A5099942545B9D1A775D35EE05237]
      [0;255;3;0;9;read: 2-6-0 s=1,c=0,t=3,pt=0,l=0,sg=0:]
      [2;1;0;0;3;]
      [0;255;3;0;9;read: 2-6-0 s=2,c=3,t=16,pt=0,l=0,sg=0:]
      [0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:76F66319A6DC1BD70A3D0AF214BC445E0947F38D2A0A087DFE]
      [0;255;3;0;9;read: 2-6-0 s=2,c=0,t=3,pt=0,l=0,sg=0:]
      [2;2;0;0;3;]
      [0;255;3;0;9;read: 2-6-0 s=3,c=3,t=16,pt=0,l=0,sg=0:]
      [0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:E00D2258F1A1838AA0B6DAC5F5AB612916F04D94681281D322]
      [0;255;3;0;9;read: 2-6-0 s=3,c=0,t=3,pt=0,l=0,sg=0:]
      [2;3;0;0;3;]
      [0;255;3;0;9;read: 2-6-0 s=4,c=3,t=16,pt=0,l=0,sg=0:]
      [0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:42EF7EAB7EF35224564ABCBB832EF8452220C72E6FE571BB33]
      [0;255;3;0;9;read: 2-6-0 s=4,c=0,t=3,pt=0,l=0,sg=0:]
      [2;4;0;0;3;]
      [0;255;3;0;9;read: 2-6-0 s=5,c=3,t=16,pt=0,l=0,sg=0:]
      [0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:0FA8B16F58BF135EB9E55E2DA402BAF4BEF26ADDA1812398FB]
      [0;255;3;0;9;read: 2-6-0 s=5,c=0,t=4,pt=0,l=0,sg=0:]
      [2;5;0;0;4;]
      [0;255;3;0;9;sign fail]
      [0;255;3;0;9;send: 2-0-0-0 s=5,c=0,t=4,pt=0,l=0,sg=0,st=fail:]
    

    I'm using MyController. Here is the node's view of that specific boot.

    Starting repeater (RRORAS, 2.0.0-beta)
    Encryption Enabled
    Radio init successful.
    Signing required
    Skipping security for command 3 type 15
    send: 2-2-7-0 s=255,c=3,t=15,pt=0,l=2,sg=0,st=ok:␁␁
    Waiting for GW to send signing preferences...
    Skipping security for command 3 type 15
    read: 0-7-2 s=255,c=3,t=15,pt=0,l=2,sg=0:␁␁
    Mark node 0 as one that require signed messages
    Mark node 0 as one that do not require whitelisting
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:AB31DBA98EE07CB671B2963E007346CE5FDD2DA182E2A9C5F6
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 0200560012FF322E302E302D62657461
    Current nonce: AB31DBA98EE07CB671B2963E007346CE5FDD2DA182E2A9C5F6AAAAAAAAAAAAAA
    HMAC: 8B06824B6BF99F22F06D1F40563FB0A6ABFDF0F2844C6B202CF1FA2454454746
    Signature in message: 0106824B6BF99F22F06D1F40563FB0
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=255,c=0,t=18,pt=0,l=10,sg=1,st=ok:2.0.0-beta
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:E05E83E7C9D9A40DF0709B227D6A40B0542D1CE5B6819BB03C
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 02000E2306FF07
    Current nonce: E05E83E7C9D9A40DF0709B227D6A40B0542D1CE5B6819BB03CAAAAAAAAAAAAAA
    HMAC: A35B9352DAE7B008EB1A7FD07A5F63CC61C424202996C5862294717CD4840399
    Signature in message: 015B9352DAE7B008EB1A7FD07A5F63CC61C424202996C586
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=255,c=3,t=6,pt=1,l=1,sg=1,st=ok:7
    Skipping security for command 3 type 16
    read: 0-7-2 s=255,c=3,t=16,pt=0,l=0,sg=0:
    Signing backend: ATSHA204Soft
    SHA256: 51D9E435DB0D76325F2B043EC3C65CFDB4941DDA1783F29641AAAAAAAAAAAAAA
    Transmittng nonce
    Skipping security for command 3 type 17
    send: 2-2-7-0 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:51D9E435DB0D76325F2B043EC3C65CFDB4941DDA1783F29641
    Signature in message: 01F9305455406EC03185820C564790A2CA
    Message to process: 0002460B06FF496D70657269616C
    Current nonce: 51D9E435DB0D76325F2B043EC3C65CFDB4941DDA1783F29641AAAAAAAAAAAAAA
    HMAC: 20F9305455406EC03185820C564790A2CA36CA74EDE1C0D7714D5D17E64A0A16
    Signature OK
    read: 0-7-2 s=255,c=3,t=6,pt=0,l=8,sg=0:Imperial
    Skipping security for ACK on command 3 type 6
    send: 2-2-7-0 s=255,c=3,t=6,pt=0,l=8,sg=0,st=ok:Imperial
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:D3152DA42E7827D878A6A3B3069BFE3CF37D015F1CC4B786ED
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 020056C400FFFFFFFFFFFFFFFFFF0300
    Current nonce: D3152DA42E7827D878A6A3B3069BFE3CF37D015F1CC4B786EDAAAAAAAAAAAAAA
    HMAC: E789304B6589D6457D5465A5C3059CE062434B1E0834523764E55032503E0BB9
    Signature in message: 0189304B6589D6457D5465A5C3059C
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=255,c=4,t=0,pt=6,l=10,sg=1,st=ok:FFFFFFFFFFFFFFFF0300
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:8B3B8DB4F3DAC3070F287736FDD9D47407E6760C5925984F3F
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 02005E030BFF5377697463684D6F746533
    Current nonce: 8B3B8DB4F3DAC3070F287736FDD9D47407E6760C5925984F3FAAAAAAAAAAAAAA
    HMAC: 07EC6A51765F376E8D16CD2C4E091FBBDE5695FED00CABF27E13EF4543429520
    Signature in message: 01EC6A51765F376E8D16CD2C4E09
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=255,c=3,t=11,pt=0,l=11,sg=1,st=ok:SwitchMote3
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:E206B5FF9C387BB7E1B3BB1FD3C88C05CE14FB6BE015DF204F
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 02002E030CFF312E312E30
    Current nonce: E206B5FF9C387BB7E1B3BB1FD3C88C05CE14FB6BE015DF204FAAAAAAAAAAAAAA
    HMAC: 656EE1D40B058730D45C690128D07914514295556F3EC0CDE8D5758A9974C95A
    Signature in message: 016EE1D40B058730D45C690128D0791451429555
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=255,c=3,t=12,pt=0,l=5,sg=1,st=ok:1.1.0
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=1,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:A1EC8E72F0B4B834D9DB0A5099942545B9D1A775D35EE05237
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 020006000301
    Current nonce: A1EC8E72F0B4B834D9DB0A5099942545B9D1A775D35EE05237AAAAAAAAAAAAAA
    HMAC: 582AD81BE5DCDDACCC095AC487A4006BE34C7F9CA04239DCFE2F778BCCCDA407
    Signature in message: 012AD81BE5DCDDACCC095AC487A4006BE34C7F9CA04239DCFE
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=1,c=0,t=3,pt=0,l=0,sg=1,st=ok:
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=2,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:76F66319A6DC1BD70A3D0AF214BC445E0947F38D2A0A087DFE
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 020006000302
    Current nonce: 76F66319A6DC1BD70A3D0AF214BC445E0947F38D2A0A087DFEAAAAAAAAAAAAAA
    HMAC: C03E0B0D759CE6CA1D9D2AB097129FFD1CCA295F80E7825BE495F6EE6AC54233
    Signature in message: 013E0B0D759CE6CA1D9D2AB097129FFD1CCA295F80E7825BE4
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=2,c=0,t=3,pt=0,l=0,sg=1,st=ok:
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=3,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:E00D2258F1A1838AA0B6DAC5F5AB612916F04D94681281D322
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 020006000303
    Current nonce: E00D2258F1A1838AA0B6DAC5F5AB612916F04D94681281D322AAAAAAAAAAAAAA
    HMAC: 382B1AF5470719EE1C00D1B5E70137F69A5F098A75C21685442FBCE764C360B5
    Signature in message: 012B1AF5470719EE1C00D1B5E70137F69A5F098A75C2168544
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=3,c=0,t=3,pt=0,l=0,sg=1,st=ok:
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=4,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:42EF7EAB7EF35224564ABCBB832EF8452220C72E6FE571BB33
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 020006000304
    Current nonce: 42EF7EAB7EF35224564ABCBB832EF8452220C72E6FE571BB33AAAAAAAAAAAAAA
    HMAC: EA04A9123E20CAF354189809E30AA51A3F57B61A878489427C6F20B33A3D5A2E
    Signature in message: 0104A9123E20CAF354189809E30AA51A3F57B61A878489427C
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=4,c=0,t=3,pt=0,l=0,sg=1,st=ok:
    Skipping security for command 3 type 16
    send: 2-2-7-0 s=5,c=3,t=16,pt=0,l=0,sg=0,st=ok:
    Nonce requested from 0. Waiting...
    Skipping security for command 3 type 17
    read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:0FA8B16F58BF135EB9E55E2DA402BAF4BEF26ADDA1812398FB
    Nonce received from 0. Proceeding with signing...
    Signing backend: ATSHA204Soft
    Message to process: 020006000405
    Current nonce: 0FA8B16F58BF135EB9E55E2DA402BAF4BEF26ADDA1812398FBAAAAAAAAAAAAAA
    HMAC: 203850D0A1BF3EEBB94A59ADCC377CC2B0EEA1DB4546ADF05CD47EB65CA1FD58
    Signature in message: 013850D0A1BF3EEBB94A59ADCC377CC2B0EEA1DB4546ADF05C
    Message signed
    Message to send has been signed
    send: 2-2-7-0 s=5,c=0,t=4,pt=0,l=0,sg=1,st=ok:
    Init complete, id=2, parent=7, distance=2
    

    So what we see is that my network has "re-converged": instead of everyone talking directly to the gateway, I'm now bouncing from node 2 via 7 to 6 and then to the gateway. What gives? These are all one hop from the gateway and should talk to it directly...

    Now, this is the first time I've used the Signing Feature. But this routing thing has happened twice before without that feature enabled. It's a real pain in the butt as now the routing is stuck in the NVRAM and even rebooting the nodes won't fix it. The messages are very intermittent and that makes the controls not reliable. I have to pull the nodes out of the wall, run the clear config script and then reload the switch script to heal the network. I'm at a loss... thanks for any help.

    Oh, and my code for the Mote is here: https://github.com/TheCranston/MY-Sensors.git


  • Hero Member

    I'm afraid we might be fighting two different problems. But anyway, we've made a breakthrough on my problems :-). It appears to be related to the radio having trouble catching the entire first packet after waking from standby. It seems to work better when waking from sleep. If you check my last post in this thread https://lowpowerlab.com/forum/index.php/topic,1821.msg13160.html#msg13160 you can see what I have done to change the idle behaviour of the radio. The node is now able to process acknowledgements to all the messages it sends. There are still some issues, but for me this is a great improvement.

    Following from this, I have patched my copy of the MySensors development branch with the latest RFM69 library plus my small patch. I also made a small change to the RFMTransport to change when/how it sends acknowledgements to messages in transportReceive. I have run a gateway and energy meter sensor since last night (around 14 hours) and the communication has worked flawlessly for that period. This is the first time in a very long time 🙂


  • Admin

    Great work investigating the issue @kolaf!



  • @kolaf There is a distinct possibility that it's two different issues. I'll give your patch a try on a few of my nodes. At one time I had tried to take the recent RFM69 library from Felix's GitHub and use it with MySensors. I really liked the ATC idea that is in the current codebase. I wish I had the stuff to dig down into the code like you did. I'm recovering from a major illness (still on disability) and the meds make it very hard to brain for longer than a few hours a day.

    @hek should I be sending periodic heartbeats from my mains powered devices? Would that help the network to maintain convergence?


  • Admin

    @BenCranston said:

    should I be sending periodic heartbeats from my mains powered devices? Would that help the network to maintain convergence?

    Not sure.. It would probably recover faster, as the find-new-parent thing only happens at transmission time (if the node has lost its parent). So potentially a heartbeat could have solved any routing problems by the time a new message should be sent.



  • Greetings! I've been trying a few things and am reporting in....

    I added a 5 minute heartbeat to each of my nodes. I can see them checking in now. However the network still melts down within 24 hours. I replaced the gateway Moteino and have had the same result. The patch that @kolaf suggested basically quadrupled the functional time of the network, which is really cool. Looking at the routing each node is offering up "stale" routes to the gateway thereby creating a loop. Graphically something like this:
    [Image: State 1.png]

    What I've been able to determine is that the trigger, at least several times, is related to the gateway basically going to sleep. A power cycle and we are back in business. The cascade of the routing loop is something like this:
    [Image: State 2.png]

    Now, that's two issues..

    Looking at just the routing stability: does it make sense to do something like a probe to determine that a route is valid before installing it in the table? I've yet to review the code base, but a Time To Live in a message would also stop the loop, since looping messages would effectively age out. I'm sure there is a lively discussion archived somewhere on how the routing works...
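
    To illustrate the Time To Live idea, here is a small host-side simulation. It is purely hypothetical: the `Packet` struct, the `ttl` field and the `forward` function are invented for this sketch, and nothing in this thread suggests MySensors messages carry such a field. Each hop decrements the TTL, and a packet whose budget runs out is dropped instead of circulating forever.

```cpp
#include <cstdint>
#include <map>

// Hypothetical packet with a hop budget (not a MySensors structure).
struct Packet {
    uint8_t destination;
    uint8_t ttl;
};

enum class Outcome { Delivered, DroppedTtl, NoRoute };

// Walks the packet along each node's stored next hop (node id -> parent id).
// hopsOut, if given, receives the number of hops actually taken.
Outcome forward(const std::map<uint8_t, uint8_t>& nextHop, uint8_t start,
                Packet packet, int* hopsOut = nullptr) {
    uint8_t at = start;
    int hops = 0;
    Outcome result = Outcome::Delivered;
    while (at != packet.destination) {
        if (packet.ttl == 0) {           // budget exhausted: break the loop
            result = Outcome::DroppedTtl;
            break;
        }
        auto it = nextHop.find(at);
        if (it == nextHop.end()) {       // node has no stored route at all
            result = Outcome::NoRoute;
            break;
        }
        at = it->second;
        --packet.ttl;
        ++hops;
    }
    if (hopsOut != nullptr) *hopsOut = hops;
    return result;
}
```

    With the route 2 → 7 → 6 → 0 a packet from node 2 is delivered in three hops. With a stale table such as 2 → 7 → 6 → 2 and a TTL of 8, it is dropped after eight hops rather than keeping the repeaters busy indefinitely.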

    The other issue is that my gateway RFM69HW radios "appear" to be going to sleep, and then I have to power cycle the Moteino to get it back on the network.. I'm wondering if something is putting the radio into some sort of sleep or low power mode and it's getting stuck there...

    sorry for the rambling.


  • Hero Member

    @BenCranston I'm glad the proposed fix helped, but too bad it was not good enough. It might be worth catching up on the latest few developments in the thread. Basically, it turns out that changing all references to standby to sleep in the setMode function is overkill. Maybe this is also causing some of your trouble. The current version of the fix consists of putting the radio to sleep in receiveBegin, like this:

    void RFM69::receiveBegin() {
        DATALEN = 0;
        SENDERID = 0;
        TARGETID = 0;
        PAYLOADLEN = 0;
        ACK_REQUESTED = 0;
        ACK_RECEIVED = 0;
        RSSI = 0;
        // The fix: put the radio to sleep (rather than standby) before re-entering RX
        setMode(RF69_MODE_SLEEP);
        if (readReg(REG_IRQFLAGS2) & RF_IRQFLAGS2_PAYLOADREADY)
            writeReg(REG_PACKETCONFIG2, (readReg(REG_PACKETCONFIG2) & 0xFB) | RF_PACKET2_RXRESTART); // avoid RX deadlocks
        writeReg(REG_DIOMAPPING1, RF_DIOMAPPING1_DIO0_01); // set DIO0 to "PAYLOADREADY" in receive mode
        setMode(RF69_MODE_RX);
    }
    

    In my case it also turned out that the RF environment around 868 MHz was a bit noisy. This messed with the CSMA function which always caused the node to wait a second before transmitting the message since the channel was never quiet enough. This limit is controlled by CSMA_LIMIT which I set to -40 instead of -90. Actually, what I ended up doing was to switch the frequency down 1 MHz, to 867, which was a much quieter band. The trouble with the high noise floor was that the gateway had trouble hearing the nodes that were far enough away to have a received RSSI less than -60 when the noise floor was -55. It could be worth continuously printing the RSSI of the channel at the gateway without anyone transmitting to see what your background noise is.
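
    The effect of the noise floor on CSMA can be shown with a small host-side model. This is a sketch under assumptions: as I understand it, the library treats the channel as free when the measured RSSI is below CSMA_LIMIT and gives up waiting after about a second. The function names, the 10 ms polling interval and the helper `linkMarginDb` are invented for illustration.

```cpp
#include <vector>

// Channel counts as free when the measured RSSI is below the CSMA limit.
bool channelClear(int rssiDbm, int csmaLimitDbm) {
    return rssiDbm < csmaLimitDbm;
}

// Returns how many milliseconds a sender waits before transmitting, given one
// RSSI sample per poll. Waiting stops at the first clear sample, when the
// timeout is reached, or when the samples run out.
int csmaWaitMs(const std::vector<int>& rssiSamples, int csmaLimitDbm,
               int pollMs = 10, int timeoutMs = 1000) {
    int waited = 0;
    for (int rssi : rssiSamples) {
        if (waited >= timeoutMs || channelClear(rssi, csmaLimitDbm)) break;
        waited += pollMs;
    }
    return waited;
}

// Positive margin means the signal is above the noise floor.
int linkMarginDb(int signalRssiDbm, int noiseFloorDbm) {
    return signalRssiDbm - noiseFloorDbm;
}
```

    With a steady noise floor of -55 and a limit of -90 the model waits the full 1000 ms before every transmission; with a limit of -40 it sends immediately. A node received at -60 sits 5 dB below that noise floor, whereas one at -29 is 26 dB above it.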



  • @kolaf Excellent! I'll give that a try. Thanks for pointing it out. How are you determining the noise floor on the various frequencies? I'm running my network at 915 MHz, for what it's worth.

    What are your thoughts on the routing loops I've been seeing? I've been able to sort of clean things up for a little while if I can re-establish connectivity right after a node reset and then send an I_CHILDREN with a payload of "C" to clear the route table. Or at least that's what I think I asked them to do based on reading the API. 🙂

    Currently I gave up on repeaters in the network a few days ago and moved them all to simple nodes. Still having issues with stability..


  • Hero Member

    @BenCranston For testing the noise I simply print the result from readRSSI() inside the radio library inside the canSend function to the serial connection. The reason for doing it like this is that the radio is very picky about which mode it has to be in for reporting the rssi value. I used the node example from the RFM69 as the basis for this test. At the beginning of RFM69.cpp there are three lines that initialise the radio with the correct frequency. This can be changed to shift the frequency up or down a few megahertz.
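
    For shifting the frequency, I am assuming the three initialisation lines are the carrier frequency register bytes (MSB, MID, LSB). On the RFM69 the 24-bit register value is the carrier frequency divided by the synthesizer step, FSTEP = 32 MHz / 2^19 (about 61 Hz). The helper below is a host-side sketch with made-up names that computes the three bytes for a given frequency.

```cpp
#include <cstdint>

// The three bytes of the 24-bit carrier frequency register value.
struct FrfBytes {
    uint8_t msb;
    uint8_t mid;
    uint8_t lsb;
};

// FRF = f_carrier / FSTEP, with FSTEP = FXOSC / 2^19 and FXOSC = 32 MHz.
FrfBytes frfForHz(uint64_t carrierHz) {
    const uint64_t fxoscHz = 32000000ULL;
    const uint32_t frf = static_cast<uint32_t>((carrierHz << 19) / fxoscHz);
    return FrfBytes{static_cast<uint8_t>(frf >> 16),
                    static_cast<uint8_t>(frf >> 8),
                    static_cast<uint8_t>(frf)};
}
```

    For 868 MHz this gives 0xD9, 0x00, 0x00; for 867 MHz it gives 0xD8, 0xC0, 0x00; and for 915 MHz it gives 0xE4, 0xC0, 0x00. Both radios need the same values, of course.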

    I've never had a chance/need to look into the routing functionality (although I actually have a PhD in network routing), so I cannot comment much on this. From your description the basic problem is that the gateway for some reason fails to respond, or that the response from the gateway is not captured by the node. The resulting routing flood seems like the natural consequence of this. This is why I pointed to the latest developments in my testing, since you're better off solving the thing that triggers the rerouting rather than fixing any rerouting problems yourself 🙂


  • Hero Member

    A simple thing you can do in RFM69Transport is to increase the retry count for the messages that are sent. The default value is 2 (implicit); you can increase it by changing the following:

    return _radio.sendWithRetry(to,data,len);
    

    to

    return _radio.sendWithRetry(to,data,len, 5);
    

    To have it retry five times.

    My guess is that this will greatly increase the operation time of your network.
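
    As a back-of-the-envelope check on why more retries should help, here is a tiny model. It assumes each attempt succeeds independently with the same probability, which a real radio link will not strictly obey, and that a retry count of N means one initial attempt plus N retries (that is my reading of the library, so treat it as an assumption). The function names are invented for the sketch.

```cpp
#include <cmath>

// Assumed: a retry count of N means 1 initial attempt plus N retries.
int totalAttempts(int retries) {
    return retries + 1;
}

// Probability that at least one attempt gets through (and is acknowledged),
// given the success probability of a single attempt.
double deliveryProbability(double perAttemptSuccess, int retries) {
    return 1.0 - std::pow(1.0 - perAttemptSuccess, totalAttempts(retries));
}
```

    On a poor link where only half the attempts are acknowledged, the default of 2 retries gives 87.5% delivery, while 5 retries gives about 98.4%. The cost is more airtime per message, which matters if the channel is already busy.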

