Testing development branch with RF69HW is not working as it should
-
Hi,
I've had some troubles with my sensors lately, but I have not really had time to dig into it. I therefore thought I would start from scratch with the latest development branch to see if things got better.
I'm trying to get two Moteinos with the RFM69HW radio at 868 MHz to communicate with each other. I have loaded one with the Gateway example with the following two defines:
#define MY_RADIO_RFM69
#define MY_IS_RFM69HW
The other I have loaded with the energy pulse meter sketch, since that is the sensor I am trying to get to work. The first time I did this everything seemed to initialise fine, and the Gateway received the initialisation messages from the sensor. However, there were a few st=fail messages from the sensor. I uploaded the sketches multiple times with minor changes to the sensor, and at some point nothing got through: every message from the sensor failed. I did not change anything in the initialisation, only the logic that counts the energy pulses. I even tried different boards and swapped the Gateway and sensor roles, but nothing helped. Am I missing anything in the setup for the radio?
I should also mention that I have had issues with version 1.5 of the library on the same hardware. Therefore, I am not sure whether it is a hardware problem or my use of the library that is at fault. Still, as I said, I have tried multiple boards, both old ones that were working before and one that has never been used before.
Any suggestions are appreciated.
Thanks.
-
For reference, the full sensor start-up log looks like this:
Starting sensor (RRNNA-, 2.0.0-beta)
Radio init successful.
send: 3-3-0-0 s=255,c=3,t=15,pt=0,l=2,sg=0,st=fail:
send: 3-3-0-0 s=255,c=0,t=17,pt=0,l=10,sg=0,st=fail:2.0.0-beta
send: 3-3-0-0 s=255,c=3,t=6,pt=1,l=1,sg=0,st=fail:0
send: 3-3-0-0 s=255,c=3,t=11,pt=0,l=12,sg=0,st=fail:Energy Meter
send: 3-3-0-0 s=255,c=3,t=12,pt=0,l=3,sg=0,st=fail:1.0
send: 3-3-0-0 s=1,c=0,t=13,pt=0,l=0,sg=0,st=fail:
find parent
send: 3-3-255-255 s=255,c=3,t=7,pt=0,l=0,sg=0,st=bc:
Init complete, id=3, parent=0, distance=255
And the Gateway looks like this:
0;255;3;0;9;Starting gateway (RRNGA-, 2.0.0-beta)
0;255;3;0;9;Radio init successful.
0;255;3;0;14;Gateway startup complete.
0;255;3;0;9;Init complete, id=0, parent=0, distance=0
Some messages do get through, sometimes, though:
Sensor:
find parent
send: 3-3-255-255 s=255,c=3,t=7,pt=0,l=0,sg=0,st=bc:
read: 0-0-3 s=255,c=3,t=15,pt=0,l=2,sg=0:
Gateway:
0;255;3;0;9;read: 3-3-0 s=255,c=3,t=15,pt=0,l=2,sg=0:
0;255;3;0;9;send: 0-0-3-3 s=255,c=3,t=15,pt=0,l=2,sg=0,st=ok:
-
Some further information. I have tested with the library directly from the Moteino site. I have the Gateway sketch running on one device and the "node" sketch running on another. The first time I ran the sketches things looked pretty good, apart from a few lost packets. However, on subsequent runs it appears that the Gateway is receiving the messages (RSSI = -29), but it is not able to send the ACK to the node (or the node is not able to hear it). The node reports "nothing..." after each transmission attempt, while the Gateway sends at least two ACKs per message it receives from the node. Also, the Gateway does not receive any response when it tries to ping the node.
I have changed which device is the node and the Gateway, and even included a third device as both, and the behaviour is always the same. The Gateway receives packets, but not the node.
I'm aware that the above explanation makes no sense if you're not familiar with the Gateway and Node test sketches for the Moteino.
-
Weird. And routing seems to be correct? From what I can see from the log it looks ok.
Not much has changed in the Moteino library apart from some ESP adaptation and an init failure test.
https://github.com/mysensors/Arduino/commits/development/libraries/MySensors/drivers/RFM69/RFM69.cpp
-
@hek It has to be something wrong with my devices, although that doesn't make sense when I've tested so many. Anyway, I posted on the moteino forum, so let's see what they have to say about it.
-
What kind of protoboard are you using? I once mounted a Moteino on a double-sided perf board and I couldn't get a signal to the gateway even if my life depended on it. Later (after a week or so) I used a single-sided board with the same Moteino and it went without a hitch. I think the reflections from the copper on the component side threw either the Mega328 or the RFM into a frenzy, so no working link could be made.
-
I'm using a standard Moteino USB with nothing else attached (apart from the USB cable and antenna).
-
@kolaf I've been having issues as well. I was running 1.5, then 1.6-beta, and now 2.0.0-beta. My setup has 6 "SwitchMotes" from LowPowerLab, all of them with RFM69HW radios. Given they are always powered from mains, it made sense to make them all repeaters. I also have several battery-powered nodes which are just plain nodes. I moved to 2.0.0-beta due to a routing loop issue in 1.6 (at least that was what the debugs were telling me), hoping that 2.0.0 would have it resolved. I cleared the EEPROMs of all the nodes and then applied the regular scripts. I watched each one boot and they all found the gateway as their parent directly. Everything worked well for a few days, and now the nodes have completely stopped passing messages to the gateway. I can tell visually that something is looping (the tx/rx LEDs flash nearly continuously on the SwitchMotes with no activity on the gateway). It happened before, and a route loop ended up in the EEPROM tables, so the nodes would reboot into a loop again. I'll pull some debugs to follow this up.
-
Weird, I'm curious to hear what you find.
I have an ongoing thread at the LowPowerLab forum trying to figure out my own problems: https://lowpowerlab.com/forum/index.php/topic,1821.0.html
For my part the problem is clearly unrelated to MySensors, but I find it difficult to believe that it is a hardware problem when it affects so many devices (unless it is an age thing or I managed to break them all at once).
-
Well, here is the boot-up of my test node. It looks like it talked to the gateway directly (I'm in range, so that makes sense) but then gives up and looks for another route, which it finds via node 7. Ugh.. that's not right.
The "Encryption Enabled" line is something I added to let me know that the library pulled the AES key for the radio from NVRAM. I just wanted to have that comfort.
Starting repeater (RRORAS, 2.0.0-beta) Encryption Enabled Radio init successful. Signing required Skipping security for command 3 type 15 send: 2-2-0-0 s=255,c=3,t=15,pt=0,l=2,sg=0,st=ok:␁␁ Waiting for GW to send signing preferences... Skipping security for command 3 type 15 read: 0-0-2 s=255,c=3,t=15,pt=0,l=2,sg=0:␁␁ Mark node 0 as one that require signed messages Mark node 0 as one that do not require whitelisting Skipping security for command 3 type 16 send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-0-2 s=255,c=3,t=17,pt=6,l=25,sg=0:4C91BE7376270BCAF2A0A6C325C530D8ECF3EF0224048E74A6 Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 0200560012FF322E302E302D62657461 Current nonce: 4C91BE7376270BCAF2A0A6C325C530D8ECF3EF0224048E74A6AAAAAAAAAAAAAA HMAC: B8BB1477E160F460500921C54A585F70101B73C2DE51CA103625F4324D1C0267 Signature in message: 01BB1477E160F460500921C54A585F Message signed Message to send has been signed send: 2-2-0-0 s=255,c=0,t=18,pt=0,l=10,sg=1,st=ok:2.0.0-beta Skipping security for command 3 type 16 send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-0-2 s=255,c=3,t=17,pt=6,l=25,sg=0:DC48AD1E7510EEEB2254E9CEEB7ECB227644A0B14B5A4006C0 Nonce received from 0. Proceeding with signing... 
Signing backend: ATSHA204Soft Message to process: 02000E2306FF00 Current nonce: DC48AD1E7510EEEB2254E9CEEB7ECB227644A0B14B5A4006C0AAAAAAAAAAAAAA HMAC: 4BE1D45B70696CD8C227A186FB1415EE19FE29630CE33AD625473756CB7CA5AD Signature in message: 01E1D45B70696CD8C227A186FB1415EE19FE29630CE33AD6 Message signed Message to send has been signed send: 2-2-0-0 s=255,c=3,t=6,pt=1,l=1,sg=1,st=ok:0 Skipping security for command 3 type 16 read: 0-0-2 s=255,c=3,t=16,pt=0,l=0,sg=0: Signing backend: ATSHA204Soft SHA256: B030BFA7AF89236B91517DC2F2AD0EBBB0062FF4B1BEA63159AAAAAAAAAAAAAA Transmittng nonce Skipping security for command 3 type 17 send: 2-2-0-0 s=255,c=3,t=17,pt=6,l=25,sg=0,st=fail:B030BFA7AF89236B91517DC2F2AD0EBBB0062FF4B1BEA63159 Skipping security for command 3 type 16 send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-0-2 s=255,c=3,t=17,pt=6,l=25,sg=0:D16A3D55860DB1DEF5C912297147A7ED2EDD258A14853CE8B5 Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 020056C400FFFFFFFFFFFFFFFFFF0300 Current nonce: D16A3D55860DB1DEF5C912297147A7ED2EDD258A14853CE8B5AAAAAAAAAAAAAA HMAC: 8658EBD21DE1970E2F459374C11868D5A83CCE6FD324D7E8C35DF3B97FE653E6 Signature in message: 0158EBD21DE1970E2F459374C11868 Message signed Message to send has been signed send: 2-2-0-0 s=255,c=4,t=0,pt=6,l=10,sg=1,st=ok:FFFFFFFFFFFFFFFF0300 Skipping security for command 3 type 16 send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=fail: Failed to transmit nonce request! sign fail send: 2-2-0-0 s=255,c=3,t=11,pt=0,l=11,sg=1,st=ok:SwitchMote3 Skipping security for command 3 type 16 send: 2-2-0-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-0-2 s=255,c=3,t=17,pt=6,l=25,sg=0:279A2B8C1C0BD8E53A825A4C811EABC05CED3C9C8C5A8B2127 Nonce received from 0. Proceeding with signing... 
Signing backend: ATSHA204Soft Message to process: 02002E030CFF312E312E30 Current nonce: 279A2B8C1C0BD8E53A825A4C811EABC05CED3C9C8C5A8B2127AAAAAAAAAAAAAA HMAC: E52A3C1FFD45BB263AB85CEA2512E6556BE4334D7E8263690D7A4C445F916801 Signature in message: 012A3C1FFD45BB263AB85CEA2512E6556BE4334D Message signed Message to send has been signed send: 2-2-0-0 s=255,c=3,t=12,pt=0,l=5,sg=1,st=fail:1.1.0 Skipping security for command 3 type 16 send: 2-2-0-0 s=1,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Message to send could not be signed! sign fail send: 2-2-0-0 s=1,c=0,t=3,pt=0,l=0,sg=1,st=fail: Skipping security for command 3 type 16 send: 2-2-0-0 s=2,c=3,t=16,pt=0,l=0,sg=0,st=fail: Failed to transmit nonce request! sign fail send: 2-2-0-0 s=2,c=0,t=3,pt=0,l=0,sg=1,st=fail: Skipping security for command 3 type 16 send: 2-2-0-0 s=3,c=3,t=16,pt=0,l=0,sg=0,st=fail: Failed to transmit nonce request! sign fail send: 2-2-0-0 s=3,c=0,t=3,pt=0,l=0,sg=1,st=fail: Skipping security for command 3 type 16 send: 2-2-0-0 s=4,c=3,t=16,pt=0,l=0,sg=0,st=fail: find parent send: 2-2-255-255 s=255,c=3,t=7,pt=0,l=0,sg=1,st=bc: Verification timeout Skipping security for command 3 type 8 read: 7-7-2 s=255,c=3,t=8,pt=1,l=1,sg=1:1 parent=7, d=2 Skipping security for command 3 type 8 read: 3-3-2 s=255,c=3,t=8,pt=1,l=1,sg=1:3 Skipping security for command 3 type 8 read: 6-6-2 s=255,c=3,t=8,pt=1,l=1,sg=1:1 Skipping security for command 3 type 8 read: 5-5-2 s=255,c=3,t=8,pt=1,l=1,sg=1:2 Skipping security for command 3 type 8 read: 4-4-2 s=255,c=3,t=8,pt=1,l=1,sg=1:2 Failed to transmit nonce request! sign fail send: 4-2-2-2 s=255,c=3,t=8,pt=1,l=1,sg=1,st=fail:2 Skipping security for command 3 type 16 send: 2-2-7-0 s=5,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:2AF1CDE94D97445AE90B28F5BB248C2C404887D6448243F396 Nonce received from 0. Proceeding with signing... 
Signing backend: ATSHA204Soft Message to process: 020006000405 Current nonce: 2AF1CDE94D97445AE90B28F5BB248C2C404887D6448243F396AAAAAAAAAAAAAA HMAC: 1038545352465B7D9F74F932A3F3218D4AFA5702A8CE6FE055D7BE6CE3EBDD82 Signature in message: 0138545352465B7D9F74F932A3F3218D4AFA5702A8CE6FE055 Message signed Message to send has been signed send: 2-2-7-0 s=5,c=0,t=4,pt=0,l=0,sg=1,st=ok: Init complete, id=2, parent=7, distance=2
What do you make of all of this?
-
Ok, here's the gateway's view of the transaction..
[0;255;3;0;9;read: 2-6-0 s=255,c=3,t=11,pt=0,l=11,sg=0:SwitchMote3]
[2;255;3;0;11;SwitchMote3]
[0;255;3;0;9;read: 2-6-0 s=255,c=3,t=16,pt=0,l=0,sg=0:]
[0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:E206B5FF9C387BB7E1B3BB1FD3C88C05CE14FB6BE015DF204F]
[0;255;3;0;9;read: 2-6-0 s=255,c=3,t=12,pt=0,l=5,sg=0:1.1.0]
[2;255;3;0;12;1.1.0]
[0;255;3;0;9;read: 2-6-0 s=1,c=3,t=16,pt=0,l=0,sg=0:]
[0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:A1EC8E72F0B4B834D9DB0A5099942545B9D1A775D35EE05237]
[0;255;3;0;9;read: 2-6-0 s=1,c=0,t=3,pt=0,l=0,sg=0:]
[2;1;0;0;3;]
[0;255;3;0;9;read: 2-6-0 s=2,c=3,t=16,pt=0,l=0,sg=0:]
[0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:76F66319A6DC1BD70A3D0AF214BC445E0947F38D2A0A087DFE]
[0;255;3;0;9;read: 2-6-0 s=2,c=0,t=3,pt=0,l=0,sg=0:]
[2;2;0;0;3;]
[0;255;3;0;9;read: 2-6-0 s=3,c=3,t=16,pt=0,l=0,sg=0:]
[0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:E00D2258F1A1838AA0B6DAC5F5AB612916F04D94681281D322]
[0;255;3;0;9;read: 2-6-0 s=3,c=0,t=3,pt=0,l=0,sg=0:]
[2;3;0;0;3;]
[0;255;3;0;9;read: 2-6-0 s=4,c=3,t=16,pt=0,l=0,sg=0:]
[0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:42EF7EAB7EF35224564ABCBB832EF8452220C72E6FE571BB33]
[0;255;3;0;9;read: 2-6-0 s=4,c=0,t=3,pt=0,l=0,sg=0:]
[2;4;0;0;3;]
[0;255;3;0;9;read: 2-6-0 s=5,c=3,t=16,pt=0,l=0,sg=0:]
[0;255;3;0;9;send: 0-0-6-2 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:0FA8B16F58BF135EB9E55E2DA402BAF4BEF26ADDA1812398FB]
[0;255;3;0;9;read: 2-6-0 s=5,c=0,t=4,pt=0,l=0,sg=0:]
[2;5;0;0;4;]
[0;255;3;0;9;sign fail]
[0;255;3;0;9;send: 2-0-0-0 s=5,c=0,t=4,pt=0,l=0,sg=0,st=fail:]
I'm using MyController. Here is the node's view of that specific boot.
Starting repeater (RRORAS, 2.0.0-beta) Encryption Enabled Radio init successful. Signing required Skipping security for command 3 type 15 send: 2-2-7-0 s=255,c=3,t=15,pt=0,l=2,sg=0,st=ok:␁␁ Waiting for GW to send signing preferences... Skipping security for command 3 type 15 read: 0-7-2 s=255,c=3,t=15,pt=0,l=2,sg=0:␁␁ Mark node 0 as one that require signed messages Mark node 0 as one that do not require whitelisting Skipping security for command 3 type 16 send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:AB31DBA98EE07CB671B2963E007346CE5FDD2DA182E2A9C5F6 Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 0200560012FF322E302E302D62657461 Current nonce: AB31DBA98EE07CB671B2963E007346CE5FDD2DA182E2A9C5F6AAAAAAAAAAAAAA HMAC: 8B06824B6BF99F22F06D1F40563FB0A6ABFDF0F2844C6B202CF1FA2454454746 Signature in message: 0106824B6BF99F22F06D1F40563FB0 Message signed Message to send has been signed send: 2-2-7-0 s=255,c=0,t=18,pt=0,l=10,sg=1,st=ok:2.0.0-beta Skipping security for command 3 type 16 send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:E05E83E7C9D9A40DF0709B227D6A40B0542D1CE5B6819BB03C Nonce received from 0. Proceeding with signing... 
Signing backend: ATSHA204Soft Message to process: 02000E2306FF07 Current nonce: E05E83E7C9D9A40DF0709B227D6A40B0542D1CE5B6819BB03CAAAAAAAAAAAAAA HMAC: A35B9352DAE7B008EB1A7FD07A5F63CC61C424202996C5862294717CD4840399 Signature in message: 015B9352DAE7B008EB1A7FD07A5F63CC61C424202996C586 Message signed Message to send has been signed send: 2-2-7-0 s=255,c=3,t=6,pt=1,l=1,sg=1,st=ok:7 Skipping security for command 3 type 16 read: 0-7-2 s=255,c=3,t=16,pt=0,l=0,sg=0: Signing backend: ATSHA204Soft SHA256: 51D9E435DB0D76325F2B043EC3C65CFDB4941DDA1783F29641AAAAAAAAAAAAAA Transmittng nonce Skipping security for command 3 type 17 send: 2-2-7-0 s=255,c=3,t=17,pt=6,l=25,sg=0,st=ok:51D9E435DB0D76325F2B043EC3C65CFDB4941DDA1783F29641 Signature in message: 01F9305455406EC03185820C564790A2CA Message to process: 0002460B06FF496D70657269616C Current nonce: 51D9E435DB0D76325F2B043EC3C65CFDB4941DDA1783F29641AAAAAAAAAAAAAA HMAC: 20F9305455406EC03185820C564790A2CA36CA74EDE1C0D7714D5D17E64A0A16 Signature OK read: 0-7-2 s=255,c=3,t=6,pt=0,l=8,sg=0:Imperial Skipping security for ACK on command 3 type 6 send: 2-2-7-0 s=255,c=3,t=6,pt=0,l=8,sg=0,st=ok:Imperial Skipping security for command 3 type 16 send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:D3152DA42E7827D878A6A3B3069BFE3CF37D015F1CC4B786ED Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 020056C400FFFFFFFFFFFFFFFFFF0300 Current nonce: D3152DA42E7827D878A6A3B3069BFE3CF37D015F1CC4B786EDAAAAAAAAAAAAAA HMAC: E789304B6589D6457D5465A5C3059CE062434B1E0834523764E55032503E0BB9 Signature in message: 0189304B6589D6457D5465A5C3059C Message signed Message to send has been signed send: 2-2-7-0 s=255,c=4,t=0,pt=6,l=10,sg=1,st=ok:FFFFFFFFFFFFFFFF0300 Skipping security for command 3 type 16 send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... 
Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:8B3B8DB4F3DAC3070F287736FDD9D47407E6760C5925984F3F Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 02005E030BFF5377697463684D6F746533 Current nonce: 8B3B8DB4F3DAC3070F287736FDD9D47407E6760C5925984F3FAAAAAAAAAAAAAA HMAC: 07EC6A51765F376E8D16CD2C4E091FBBDE5695FED00CABF27E13EF4543429520 Signature in message: 01EC6A51765F376E8D16CD2C4E09 Message signed Message to send has been signed send: 2-2-7-0 s=255,c=3,t=11,pt=0,l=11,sg=1,st=ok:SwitchMote3 Skipping security for command 3 type 16 send: 2-2-7-0 s=255,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:E206B5FF9C387BB7E1B3BB1FD3C88C05CE14FB6BE015DF204F Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 02002E030CFF312E312E30 Current nonce: E206B5FF9C387BB7E1B3BB1FD3C88C05CE14FB6BE015DF204FAAAAAAAAAAAAAA HMAC: 656EE1D40B058730D45C690128D07914514295556F3EC0CDE8D5758A9974C95A Signature in message: 016EE1D40B058730D45C690128D0791451429555 Message signed Message to send has been signed send: 2-2-7-0 s=255,c=3,t=12,pt=0,l=5,sg=1,st=ok:1.1.0 Skipping security for command 3 type 16 send: 2-2-7-0 s=1,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:A1EC8E72F0B4B834D9DB0A5099942545B9D1A775D35EE05237 Nonce received from 0. Proceeding with signing... 
Signing backend: ATSHA204Soft Message to process: 020006000301 Current nonce: A1EC8E72F0B4B834D9DB0A5099942545B9D1A775D35EE05237AAAAAAAAAAAAAA HMAC: 582AD81BE5DCDDACCC095AC487A4006BE34C7F9CA04239DCFE2F778BCCCDA407 Signature in message: 012AD81BE5DCDDACCC095AC487A4006BE34C7F9CA04239DCFE Message signed Message to send has been signed send: 2-2-7-0 s=1,c=0,t=3,pt=0,l=0,sg=1,st=ok: Skipping security for command 3 type 16 send: 2-2-7-0 s=2,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:76F66319A6DC1BD70A3D0AF214BC445E0947F38D2A0A087DFE Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 020006000302 Current nonce: 76F66319A6DC1BD70A3D0AF214BC445E0947F38D2A0A087DFEAAAAAAAAAAAAAA HMAC: C03E0B0D759CE6CA1D9D2AB097129FFD1CCA295F80E7825BE495F6EE6AC54233 Signature in message: 013E0B0D759CE6CA1D9D2AB097129FFD1CCA295F80E7825BE4 Message signed Message to send has been signed send: 2-2-7-0 s=2,c=0,t=3,pt=0,l=0,sg=1,st=ok: Skipping security for command 3 type 16 send: 2-2-7-0 s=3,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:E00D2258F1A1838AA0B6DAC5F5AB612916F04D94681281D322 Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 020006000303 Current nonce: E00D2258F1A1838AA0B6DAC5F5AB612916F04D94681281D322AAAAAAAAAAAAAA HMAC: 382B1AF5470719EE1C00D1B5E70137F69A5F098A75C21685442FBCE764C360B5 Signature in message: 012B1AF5470719EE1C00D1B5E70137F69A5F098A75C2168544 Message signed Message to send has been signed send: 2-2-7-0 s=3,c=0,t=3,pt=0,l=0,sg=1,st=ok: Skipping security for command 3 type 16 send: 2-2-7-0 s=4,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... 
Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:42EF7EAB7EF35224564ABCBB832EF8452220C72E6FE571BB33 Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 020006000304 Current nonce: 42EF7EAB7EF35224564ABCBB832EF8452220C72E6FE571BB33AAAAAAAAAAAAAA HMAC: EA04A9123E20CAF354189809E30AA51A3F57B61A878489427C6F20B33A3D5A2E Signature in message: 0104A9123E20CAF354189809E30AA51A3F57B61A878489427C Message signed Message to send has been signed send: 2-2-7-0 s=4,c=0,t=3,pt=0,l=0,sg=1,st=ok: Skipping security for command 3 type 16 send: 2-2-7-0 s=5,c=3,t=16,pt=0,l=0,sg=0,st=ok: Nonce requested from 0. Waiting... Skipping security for command 3 type 17 read: 0-7-2 s=255,c=3,t=17,pt=6,l=25,sg=0:0FA8B16F58BF135EB9E55E2DA402BAF4BEF26ADDA1812398FB Nonce received from 0. Proceeding with signing... Signing backend: ATSHA204Soft Message to process: 020006000405 Current nonce: 0FA8B16F58BF135EB9E55E2DA402BAF4BEF26ADDA1812398FBAAAAAAAAAAAAAA HMAC: 203850D0A1BF3EEBB94A59ADCC377CC2B0EEA1DB4546ADF05CD47EB65CA1FD58 Signature in message: 013850D0A1BF3EEBB94A59ADCC377CC2B0EEA1DB4546ADF05C Message signed Message to send has been signed send: 2-2-7-0 s=5,c=0,t=4,pt=0,l=0,sg=1,st=ok: Init complete, id=2, parent=7, distance=2
So what we see is that my network has "re-converged" from everyone talking directly to the gateway, and now I'm bouncing from node 2 via 7 to 6 and then to the gateway.. What gives? These are all one hop from the gateway and should talk with it directly...
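A note for anyone reading these logs: as far as I can tell from the MySensors debug output of this era, the four numbers after "send:" are sender, last hop, next hop and final destination, and the three after "read:" are sender, last hop and destination. Under that assumption, a small sketch like the one below pulls the route out of a log line, which makes lines such as "send: 2-2-7-0" (next hop 7 rather than gateway 0) easy to spot. The Route struct and the parseRoute and isRelayed names are made up for illustration and are not part of the library.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper type for reading MySensors debug lines; not part of the library.
struct Route {
    bool isSend;      // true for "send:", false for "read:"
    int sender;       // node that originated the message
    int last;         // node that last forwarded it
    int next;         // next hop ("send:" lines only, -1 for "read:" lines)
    int destination;  // final destination
};

// Parses the route prefix of a debug line.
// Returns false if the line contains neither a "send:" nor a "read:" entry.
bool parseRoute(const char* line, Route& out) {
    assert(line != nullptr);
    const char* p = std::strstr(line, "send: ");
    if (p != nullptr &&
        std::sscanf(p, "send: %d-%d-%d-%d",
                    &out.sender, &out.last, &out.next, &out.destination) == 4) {
        out.isSend = true;
        return true;
    }
    p = std::strstr(line, "read: ");
    if (p != nullptr &&
        std::sscanf(p, "read: %d-%d-%d",
                    &out.sender, &out.last, &out.destination) == 3) {
        out.isSend = false;
        out.next = -1;
        return true;
    }
    return false;
}

// True when a sent message is handed to a node other than its final destination,
// i.e. it goes through a repeater instead of straight to the target.
bool isRelayed(const Route& r) {
    return r.isSend && r.next != r.destination;
}
```

Running the node log through parseRoute and filtering on isRelayed would list every message that took the detour via node 7.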
Now, this is the first time I've used the Signing Feature. But this routing thing has happened twice before without that feature enabled. It's a real pain in the butt as now the routing is stuck in the NVRAM and even rebooting the nodes won't fix it. The messages are very intermittent and that makes the controls not reliable. I have to pull the nodes out of the wall, run the clear config script and then reload the switch script to heal the network. I'm at a loss... thanks for any help.
Oh, and my code for the Mote is here: https://github.com/TheCranston/MY-Sensors.git
-
I'm afraid we might be fighting two different problems. But anyway, we've made some breakthrough on my problems :-). It appears to be related to the radio having trouble catching the entire first packet after waking from standby. It seems to work better when waking from sleep. If you check my last post in this thread https://lowpowerlab.com/forum/index.php/topic,1821.msg13160.html#msg13160 you can see what I have done to change the idle behaviour for the radio. The node is now able to process acknowledgements to all the messages it sends. There are still some issues, but for me this is a great improvement.
Following from this I have patched my copy of the MySensors development branch with the latest RFM69 library plus my small patch. I also made a small change to the RFMTransport to change when/how it sends acknowledgements to messages in transportReceive. I have run a gateway and energy meter sensor since last night (around 14 hours) and the communication has worked flawlessly for that period. This is the first time in a very long time.
-
Great work investigating the issue @kolaf!
-
@kolaf There is a distinct possibility that it's two different issues. I'll give your patch a try on a few of my nodes. At one time I had tried to take the recent RFM69 library from Felix's GitHub and use it with MySensors. I really liked the ATC idea that is in the current codebase. I wish I had the stuff to dig down into the code like you did. I'm recovering from a major illness (still on disability) and the meds make it very hard to brain for longer than a few hours a day.
@hek should I be sending periodic heartbeats from my mains powered devices? Would that help the network to maintain convergence?
-
@BenCranston said:
should I be sending periodic heartbeats from my mains powered devices? Would that help the network to maintain convergence?
Not sure.. It would probably recover faster, as the find-new-parent thing only happens at transmission time (if the node has lost its parent). So potentially it could already have solved any routing problems by the time a new message needs to be sent.
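A periodic heartbeat can be scheduled without blocking the main loop. The sketch below only shows the timing logic, written so that it survives the 32-bit millis() rollover. HeartbeatTimer is a hypothetical helper, and what the node actually sends when it fires (a battery level, a dummy value, ...) is left to the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reports "due" once every intervalMs milliseconds.
// Unsigned subtraction keeps the comparison correct across the millis() rollover.
class HeartbeatTimer {
public:
    HeartbeatTimer(uint32_t intervalMs, uint32_t startMs)
        : interval_(intervalMs), last_(startMs) {
        assert(intervalMs > 0);
    }

    // Call from loop() with the current millis() value.
    // Returns true when a heartbeat should be sent and restarts the interval.
    bool due(uint32_t nowMs) {
        if (static_cast<uint32_t>(nowMs - last_) >= interval_) {
            last_ = nowMs;
            return true;
        }
        return false;
    }

private:
    uint32_t interval_;  // time between heartbeats, in milliseconds
    uint32_t last_;      // millis() value at the last heartbeat
};
```

In an Arduino sketch this would be constructed once with the current millis() value and checked at the top of loop(); any transmission then gives the node a chance to notice a lost parent.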
-
Greetings! I've been trying a few things and am reporting in....
I added a 5 minute heartbeat to each of my nodes. I can see them checking in now. However the network still melts down within 24 hours. I replaced the gateway Moteino and have had the same result. The patch that @kolaf suggested basically quadrupled the functional time of the network, which is really cool. Looking at the routing each node is offering up "stale" routes to the gateway thereby creating a loop. Graphically something like this:
What I've been able to determine is that the trigger, at least several times, is related to the gateway basically going to sleep. A power cycle and we are back in business. The cascade of the routing loop is something like this:
Now, that's two issues..
Looking at just the routing stability: does it make sense to do something like a probe to determine that a route is valid before installing it in the table? I've yet to review the code base, but a Time To Live in a message would also stop the loop, since looping messages would effectively age out. I'm sure there is a lively discussion archived somewhere on how the routing works...
The other issue is that my gateway RFM69HW radios "appear" to be going to sleep, and then I have to power cycle the Moteino to get it back on the network.. I'm wondering if something is putting the radio into some sort of sleep or low power mode and it's getting stuck there...
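To make the Time To Live idea concrete, here is a minimal sketch of hop-limited forwarding. It is not how MySensors routes messages; the Packet struct, its ttl field, and the forwardOnce and hopsBeforeDrop functions are all invented for illustration. Each repeater decrements the counter and drops the message at zero, so a packet caught in a loop dies after a bounded number of relays instead of circulating indefinitely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical message header carrying a hop budget.
struct Packet {
    uint8_t sender;
    uint8_t destination;
    uint8_t ttl;  // remaining number of relays this packet is allowed
};

// Called by a repeater before relaying. Returns true if the packet may be
// forwarded (consuming one hop), false if it must be dropped.
bool forwardOnce(Packet& p) {
    if (p.ttl == 0) {
        return false;  // hop budget exhausted: drop instead of relaying again
    }
    --p.ttl;
    return true;
}

// Simulates a packet bouncing around a routing loop and counts how many
// relays happen before the hop limit stops it.
int hopsBeforeDrop(Packet p) {
    int hops = 0;
    while (forwardOnce(p)) {
        ++hops;
    }
    assert(p.ttl == 0);  // the loop only ends once the budget is used up
    return hops;
}
```

With an initial ttl of 4, a message stuck between looping repeaters is relayed four times and then discarded.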
sorry for the rambling.
-
@BenCranston I'm glad the fix I proposed helped out, but too bad it was not good enough. It might be worth catching up on the latest few developments in the thread. Basically, it turns out that changing all references to standby to sleep in the setMode function is a bit overkill. Maybe this is also causing some of your trouble. The current version of the fix consists of putting the radio to sleep in receiveBegin, like this:
void RFM69::receiveBegin() {
  DATALEN = 0;
  SENDERID = 0;
  TARGETID = 0;
  PAYLOADLEN = 0;
  ACK_REQUESTED = 0;
  ACK_RECEIVED = 0;
  RSSI = 0;
  setMode(RF69_MODE_SLEEP);
  if (readReg(REG_IRQFLAGS2) & RF_IRQFLAGS2_PAYLOADREADY)
    writeReg(REG_PACKETCONFIG2, (readReg(REG_PACKETCONFIG2) & 0xFB) | RF_PACKET2_RXRESTART); // avoid RX deadlocks
  writeReg(REG_DIOMAPPING1, RF_DIOMAPPING1_DIO0_01); // set DIO0 to "PAYLOADREADY" in receive mode
  setMode(RF69_MODE_RX);
}
In my case it also turned out that the RF environment around 868 MHz was a bit noisy. This messed with the CSMA function which always caused the node to wait a second before transmitting the message since the channel was never quiet enough. This limit is controlled by CSMA_LIMIT which I set to -40 instead of -90. Actually, what I ended up doing was to switch the frequency down 1 MHz, to 867, which was a much quieter band. The trouble with the high noise floor was that the gateway had trouble hearing the nodes that were far enough away to have a received RSSI less than -60 when the noise floor was -55. It could be worth continuously printing the RSSI of the channel at the gateway without anyone transmitting to see what your background noise is.
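The CSMA effect is easy to reproduce on paper. The sketch below is a simplified model, not the library code: it assumes the radio may transmit only when the measured channel RSSI is below the limit, and that it gives up waiting after a timeout (one second, as described above). The csmaWaitMs function, the sampling interval and the RSSI values are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of carrier sensing: one RSSI reading (in dBm) is taken
// every stepMs. Returns how long the sender waits before transmitting.
// If the channel never drops below limitDbm, or the readings run out first
// (treated as "still busy" in this model), the full timeoutMs is returned.
uint32_t csmaWaitMs(const std::vector<int>& rssiSamples, int limitDbm,
                    uint32_t stepMs, uint32_t timeoutMs) {
    assert(stepMs > 0);
    uint32_t waited = 0;
    for (int rssi : rssiSamples) {
        if (waited >= timeoutMs) {
            break;  // timeout reached, stop listening
        }
        if (rssi < limitDbm) {
            return waited;  // channel considered quiet enough to send
        }
        waited += stepMs;
    }
    return timeoutMs;
}
```

Feeding this a constant -55 dBm noise floor shows the behaviour from the post: with a limit of -90 every send waits the full timeout, while a limit of -40 lets the message go out immediately.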
-
@kolaf Excellent! I'll give that a try. Thanks for pointing it out. How are you determining the noise floor on the various frequencies? I'm running my network at 915 MHz, for what it's worth.
What are your thoughts on the routing loops I've been seeing? I've been able to sort of clean it up for a little while if I can re-establish connectivity right after a node reset and then send an I_CHILDREN with a payload of "C" to clear the route table. Or at least that's what I think I asked them to do, based on reading the API.
Currently I gave up on repeaters in the network a few days ago and moved them all to simple nodes. Still having issues with stability..
-
@BenCranston For testing the noise I simply print the result of readRSSI() to the serial connection from inside the canSend function of the radio library. The reason for doing it like this is that the radio is very picky about which mode it has to be in for reporting the RSSI value. I used the node example from the RFM69 library as the basis for this test. At the beginning of RFM69.cpp there are three lines that initialise the radio with the correct frequency. These can be changed to shift the frequency up or down a few megahertz.
I've never had a chance/need to look into the routing functionality (although I actually have a PhD in network routing), so I cannot comment much on this. From your description, the basic problem is that the gateway for some reason fails to respond, or that the response from the gateway is not captured by the node. The resulting routing flood seems like the natural consequence of this. This is why I pointed to the latest developments in my testing, since you're better off solving the thing that triggers the rerouting rather than fixing any rerouting problems yourself.
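For reference, the three frequency bytes form the 24-bit FRF value of the radio, which is the carrier frequency divided by the synthesizer step of 32 MHz / 2^19 (about 61 Hz). This assumes the standard 32 MHz crystal on the RFM69 module. A small sketch for working out the bytes for a shifted frequency follows; frfForHz and FrfBytes are illustrative names, not library functions.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative container for the three frequency register bytes;
// not part of the RFM69 library.
struct FrfBytes {
    uint8_t msb;
    uint8_t mid;
    uint8_t lsb;
};

// FRF = f_carrier / FSTEP, with FSTEP = FXOSC / 2^19 and FXOSC assumed to be 32 MHz.
// Computed in 64-bit integers as f * 2^19 / FXOSC to avoid floating-point rounding.
FrfBytes frfForHz(uint32_t carrierHz) {
    const uint64_t fxoscHz = 32000000ULL;
    const uint64_t frf = (static_cast<uint64_t>(carrierHz) << 19) / fxoscHz;
    assert(frf <= 0xFFFFFFULL);  // must fit the 24-bit register
    return FrfBytes{static_cast<uint8_t>(frf >> 16),
                    static_cast<uint8_t>((frf >> 8) & 0xFF),
                    static_cast<uint8_t>(frf & 0xFF)};
}
```

For 868 MHz this gives 0xD9, 0x00, 0x00, and for 867 MHz it gives 0xD8, 0xC0, 0x00, so moving down 1 MHz means subtracting 0x4000 from the 24-bit value.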
-
A simple thing you can do in RFM69Transport is to increase the retry count for the messages that are sent. The default value is 2 (implicit); you can increase it by changing the following:
return _radio.sendWithRetry(to,data,len);
to
return _radio.sendWithRetry(to,data,len, 5);
This makes it retry five times.
My guess is that this will greatly increase the operation time of your network.
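As a rough illustration of why more retries help: suppose each attempt is acknowledged independently with probability p (a simplifying assumption, since real losses tend to come in bursts), and suppose the library makes retries + 1 attempts in total (worth checking against your copy of sendWithRetry). Then the chance that at least one attempt gets through is 1 - (1 - p)^(retries + 1). The deliveryProbability function below is an illustrative helper, not library code.

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of (retries + 1) independent attempts is
// acknowledged, given a per-attempt success probability p.
double deliveryProbability(double p, int retries) {
    assert(p >= 0.0 && p <= 1.0);
    assert(retries >= 0);
    return 1.0 - std::pow(1.0 - p, retries + 1);
}
```

Under these assumptions, on a poor link where only half of the attempts are acknowledged, going from 2 to 5 retries raises the delivery chance from 87.5% to about 98.4%.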