Option non blocking registration at gateway
-
Is it possible to make registration at the gateway non-blocking, so that the node can start processing other things and send messages once the gateway is reachable?
Use case:
A node that switches lights and is mounted behind a switch in the wall. For emergency situations it should always be possible to switch on the lights even when the gateway fails.
When the switch is pressed, the node notifies the gateway the lights are switched on or off. The lights can also be switched on or off via the gateway.
-
@eric_smid said:
non blocking
This would require buffering of transmission messages, which currently isn't implemented in the library.
For your particular use case I would handle the button presses in an interrupt handler, which triggers sending a message to the GW.
If updating the GW blocks and new button presses come in, the interrupt handler can always act on them.
-
Hi Yveaux. In my opinion it is not necessary for the library to do the buffering.
I'm looking for an option where the node enters void loop() even if there is no gateway when the node boots, so it can do things locally.
Because there is no gateway, it is not possible to send or receive messages. It is no problem if they get lost.
On the node, a user can create a buffer for messages that should be reported once the node detects that a gateway is available.
The function isTransportOK() can, for instance, be used to discover whether a gateway is available.
So I think it is only necessary to implement an option to enter void loop() when no gateway is detected and to periodically retry registration with the gateway.
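The idea above (act locally first, report later) can be sketched in plain C++ so the logic can be tried without hardware. This is only a sketch under assumptions: `LightNode`, `onButtonPress` and `service` are hypothetical names and not part of the MySensors library. In a real sketch, `transportUp` would presumably be fed from `isTransportOK()` and `sendState` would wrap the library's send call. Only the most recent state is kept, so nothing outdated is reported.

```cpp
#include <cassert>
#include <functional>

// Hypothetical node model: the light is switched locally first, and the
// latest state is reported once the gateway link is available.
struct LightNode {
    bool lightOn = false;        // actual relay state
    bool reportPending = false;  // true while the gateway has not been told

    // Called on every button press; never touches the transport.
    void onButtonPress() {
        lightOn = !lightOn;
        reportPending = true;
    }

    // Called from the main loop. transportUp stands in for a link check,
    // sendState for the function that transmits the state (returns success).
    void service(bool transportUp, const std::function<bool(bool)>& sendState) {
        assert(static_cast<bool>(sendState));  // a send function must be supplied
        if (reportPending && transportUp && sendState(lightOn)) {
            reportPending = false;  // gateway now knows the current state
        }
    }
};
```

Calling `service()` on every pass through the loop is cheap: it does nothing unless a report is pending and the link is up, and a failed send simply leaves the report pending for the next pass.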
-
I think what @eric_smid suggests can be an important feature. The manual control must work, even if the gateway is temporarily out of order. Otherwise the rest of the family will complain a lot.
-
@mfalkvidd @Yveaux There was a similar discussion a couple of weeks ago. The gateway should always be reachable. Unless there's a power cut. But to be honest I don't know what happens then. Might take a couple of hours before my network is completely healed. Because the nodes might come up earlier than the gateway once the power is back on. Also the gateway might not be able to handle all the node registrations. But in those cases nodes need to register themselves with a random delay. Just to give the gateway the time to handle the load. Because I'm not convinced it can.
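To make the random-delay idea concrete, here is a minimal host-side sketch. `registrationDelayMs` is a hypothetical helper, not a library function, and the choice of seed is an assumption: on a real node it would have to come from something that differs per node or per boot, and an AVR build would use the Arduino random functions rather than `<random>`.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: pick a startup delay in [0, maxDelayMs] so that
// nodes powered up together do not all try to register at the same instant.
inline std::uint32_t registrationDelayMs(std::uint32_t seed,
                                         std::uint32_t maxDelayMs) {
    std::mt19937 rng(seed);  // seeded per node, so each node gets its own delay
    std::uniform_int_distribution<std::uint32_t> dist(0, maxDelayMs);
    return dist(rng);
}
```

A node would wait for the returned number of milliseconds before its first registration attempt; the upper bound is a trade-off between gateway load and how long the network takes to come back.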
My personal statement is: Devices should work independently from your home automation if they are a vital part of your life. Like light and so. Home Automation and IoT should enhance your life, and not take over your life.
And to be honest. I don't see the need of buffering messages. Because the messages you'll be buffering might already be outdated when the gateway and your controller come up again. Meaning you could have switched the lights on and then off. So what would be the value of buffering the outdated data? We should just simply accept the fact that we've lost some sensor values. The data can be interpolated in the long run anyway. Which Domoticz already does, when it shows you the graphs.
I can't remember the last power cut we had over here. So to me losing some sensor values in a situation like that is acceptable. Certainly if all of my MySensors devices keep working in a situation like this. Then losing data isn't that bad.
This is also the reason why you should send the current state of sensors and actuators to your controller on a regular basis. That way the controller and the nodes will be in sync again at some point after a power cut.
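A periodic state report needs little more than a timer check. The sketch below is plain C++ with hypothetical names (`PeriodicReport`, `due`); it assumes unsigned millisecond timestamps like the ones Arduino's millis() returns, so the subtraction stays correct when the counter wraps around.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides when the current state should be re-sent.
struct PeriodicReport {
    std::uint32_t intervalMs;      // time between two reports
    std::uint32_t lastSentMs = 0;  // timestamp of the previous report

    explicit PeriodicReport(std::uint32_t interval) : intervalMs(interval) {
        assert(interval > 0);  // a zero interval would report on every call
    }

    // Returns true (and restarts the interval) when a report is due.
    bool due(std::uint32_t nowMs) {
        // Unsigned subtraction gives the elapsed time even across a wrap.
        if (static_cast<std::uint32_t>(nowMs - lastSentMs) >= intervalMs) {
            lastSentMs = nowMs;
            return true;
        }
        return false;
    }
};
```

In the main loop the node would call `due()` with the current time and, when it returns true, send whatever the state is at that moment.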
This feature would indeed help us create a solution that enhances our lives instead of the opposite. But that's just my 50 cents. Curious how Z-Wave handles this?
-
I have Z-wave modules (Fibaro) installed. The switches can be used without gateway.
When the gateway boots (I use Razzberry) it starts detecting the nodes. I assume it is up to the nodes to send the current state.
I agree that buffering messages is useless in most cases. A periodic update (initiated by the user program) is in my opinion the correct solution.
-
I would love to see a start on this, maybe a relay example that works without the gateway. As @TheoL said, lights, and I think that's one place to start. The best would of course be to make something generic that works for all nodes.
-
I have the same issue with a gas sensor. I would really like the buzzer to sound even if the gateway is unreachable. Letting the home automation know there is a gas leak in order to take other measures would be optimal, but sounding the alarm is mandatory.
-
It's only during startup that the node does a handshake with the parent node (which can be disabled using MY_TRANSPORT_DONT_CARE_MODE in the dev branch). During normal operation the library never blocks, even if the communication link is down.
-
Important: enabling MY_TRANSPORT_DONT_CARE_MODE requires setting MY_PARENT_NODE_ID
-
Great this option is available!
Is it possible to let the node use the last used parent, instead of configuring the parent?
-
@eric_smid No, not in the current implementation
-
@hek Just curious. How important is the handshake? Because if it's important, I think it would be a good option to let nodes do a delayed handshake when the initial one fails. That way they can continue, but just won't use the MySensors part until a delayed handshake is done.
A delayed hand shake would only be needed when there's a power cut.
-
Coming from this thread, where I learned that an implementation of this feature is in progress.
In my opinion it is an essential architectural feature. Reasons for this you can see above.
Here I summarize what I think is necessary:
General requirements
- Devices should work independently from the gateway
- Enter loop() even if gateway is not reachable
- Buffering of messages not necessary. It's up to the user to implement this.
Needed functionality
- retrieve available gateways (gets a list of available gateways)
- request to register ( gets success or failure with reason)
- de-register ( gets success or failure with reason)
-
I've been thinking about this as well. Probably for different reasons though.
I find myself with time to spare and my laptop at hand quite often, but I'm not home at those moments. I'd love to be able to write code and test sensors at those times, but that's impossible because there's no gateway.
For that I'd like to propose a "Test Mode".
- test port to Transport.
- NO connection to gateway.
- simulate the Transport calls in the sketch with just a line in the debug with ACK.
- when possible: send "messages" to node with the Arduino serial monitor.
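Roughly what such a test mode could look like from the sketch's side, written as plain C++. `MockTransport` and its `send` signature are hypothetical and do not exist in the library; the point is only that every transport call is replaced by a debug line with a simulated ACK.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for a radio transport: nothing is transmitted,
// every send is written to the debug output and reported as acknowledged.
struct MockTransport {
    std::vector<std::string> log;  // record of everything "sent"

    bool send(std::uint8_t childId, const std::string& payload) {
        std::string line = "MOCK SEND child=" + std::to_string(childId) +
                           " payload=" + payload + " ACK";
        std::cout << line << '\n';  // the debug line seen by the developer
        log.push_back(line);
        return true;  // simulated acknowledgement
    }
};
```

Keeping the log in memory as well as printing it makes it possible to check afterwards what the sketch tried to send.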
I am aware that this might pose some biiiiiiig changes in the lib. So this might be something for the 2.5 or 3.0 release.
Thoughts?
-
My node is now ready and I urgently need the non-blocking feature. Is there an estimate of when this feature will be available, or can anyone provide a workaround?
-
If you use the development branch and want to skip the registration wait, add
#define MY_TRANSPORT_WAIT_READY_MS 1
@tekka, correct me if I'm wrong please
-
@hek correct
@Heizelmann Would be good to get some feedback on this feature, please let us know how it works in your hands.
-
Here my report after a quick test.
In short: generally I am happy with it, it works very well. There is only one small problem.
In detail:
When the connection is initially off, the sketch enters the loop after the defined wait.
before()
5 TSM:INIT
7 TSF:WUR:MS=1000
13 !TSM:INIT:TSP FAIL
15 TSM:FAIL:CNT=1
17 TSM:FAIL:PDT
1008 MCO:BGN:STP
Setup()
1510 MCO:BGN:INIT OK,TSP=0
If the sketch tries to send within the loop a message is logged.
145312 !MCO:SND:NODE NOT REG
If the connection comes back, the sketch connects to the gateway on the fly within the defined wait time.
244251 TSM:FPAR:OK
244252 TSM:ID
244254 TSM:ID:OK
244255 TSM:UPL
244259 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=24,pt=1,l=1,sg=0,ft=0,st=OK:1
244265 TSF:MSG:READ,0-0-42,s=255,c=3,t=25,pt=1,l=1,sg=0:1
244271 TSF:MSG:PONG RECV,HP=1
244274 TSM:UPL:OK
244276 TSM:READY:ID=42,PAR=0,DIS=1
244281 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=15,pt=6,l=2,sg=0,ft=0,st=OK:0100
244288 TSF:MSG:READ,0-0-42,s=255,c=3,t=15,pt=6,l=2,sg=0:0100
244295 TSF:MSG:SEND,42-42-0-0,s=255,c=0,t=17,pt=0,l=10,sg=0,ft=0,st=OK:2.1.0-beta
244304 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=6,pt=1,l=1,sg=0,ft=0,st=OK:0
244321 TSF:MSG:READ,0-0-42,s=255,c=3,t=6,pt=0,l=1,sg=0:M
Presentation()
244329 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=11,pt=0,l=22,sg=0,ft=0,st=OK:My Node
244339 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=12,pt=0,l=3,sg=0,ft=0,st=OK:1.2
244347 TSF:MSG:SEND,42-42-0-0,s=1,c=0,t=1,pt=0,l=0,sg=0,ft=0,st=OK:
244356 TSF:MSG:SEND,42-42-0-0,s=2,c=0,t=1,pt=0,l=0,sg=0,ft=0,st=OK:
244364 TSF:MSG:SEND,42-42-0-0,s=3,c=0,t=3,pt=0,l=0,sg=0,ft=0,st=OK:
244370 MCO:REG:REQ
244373 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=26,pt=1,l=1,sg=0,ft=0,st=OK:2
244380 TSF:MSG:READ,0-0-42,s=255,c=3,t=27,pt=1,l=1,sg=0:1
244385 MCO:PIM:NODE REG=1
The only problem is that, when the network connection is lost again while the sketch is in the loop, you get heavy logging traffic for as long as the connection stays lost.
605458 TSF:MSG:READ,1-1-1,s=1,c=1,t=1,pt=0,l=0,sg=0:
605463 !TSF:MSG:LEN,0!=7
605465 TSF:MSG:READ,1-1-1,s=1,c=1,t=1,pt=0,l=0,sg=0:
605470 !TSF:MSG:LEN,0!=7
605472 TSF:MSG:READ,1-1-1,s=1,c=1,t=1,pt=0,l=0,sg=0:
605477 !TSF:MSG:LEN,0!=7
...
-
@Heizelmann Thanks for your testing. I couldn't reproduce this behavior, can you post your sketch for further analysis?
-
@tekka Maybe it is the way I am testing. To get quick results I simply plug and unplug the radio module (NRF24L01+) from the socket in my sensor node. I know this is not a real use case, but it was easy and I didn't think there would be a difference in behaviour. How do you switch the network connection off and on in your tests?
-
@Heizelmann yes, unplugging the radio results in what you observed. Simply unplug the GW (and repeaters)...
-
@tekka OK unplugging the GW leads to a different behaviour.
At first the node does not detect the failure, as long as it is not trying to send.
When it tries to send, it fails:
131268 !TSF:MSG:SEND,42-42-0-0,s=1,c=1,t=16,pt=0,l=1,sg=0,ft=0,st=NACK:1
and after some more failures, this:
238494 !TSM:READY:UPL FAIL,SNP
238497 TSM:FPAR
238534 TSF:MSG:SEND,42-42-255-255,s=255,c=3,t=7,pt=0,l=0,sg=0,ft=6,st=OK:
238995 !TSF:SND:TNR
240002 !TSF:SND:TNR
240542 !TSM:FPAR:NO REPLY
240544 TSM:FPAR
240581 TSF:MSG:SEND,42-42-255-255,s=255,c=3,t=7,pt=0,l=0,sg=0,ft=0,st=OK:
246683 !TSM:FPAR:FAIL
246685 TSM:FAIL:CNT=1
246687 TSM:FAIL:PDT
and after a while mostly only
609748 !TSF:SND:TNR
When the gateway comes back, a proper reinitialization takes place:
620572 TSM:FAIL:RE-INIT
620574 TSM:INIT
620581 TSM:INIT:TSP OK
620583 TSM:INIT:STATID=42
620587 TSF:SID:OK,ID=42
620589 TSM:FPAR
620625 TSF:MSG:SEND,42-42-255-255,s=255,c=3,t=7,pt=0,l=0,sg=0,ft=0,st=OK:
Current: 0.00 0:0:0
Current: 0.00 0:0:0
621443 TSF:MSG:READ,0-0-42,s=255,c=3,t=8,pt=1,l=1,sg=0:0
621448 TSF:MSG:FPAR OK,ID=0,D=1
Current: 0.00 0:0:0
Current: 0.00 0:0:0
622633 TSM:FPAR:OK
622635 TSM:ID
622637 TSM:ID:OK
622638 TSM:UPL
622642 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=24,pt=1,l=1,sg=0,ft=0,st=OK:1
622652 TSF:MSG:READ,0-0-42,s=255,c=3,t=25,pt=1,l=1,sg=0:1
622657 TSF:MSG:PONG RECV,HP=1
622660 TSM:UPL:OK
622662 TSM:READY:ID=42,PAR=0,DIS=1
From now the sensor can send again without errors.
So far my tests.
Except for the case where the radio of the sensor goes off and causes a lot of logging, all test cases were satisfactory.
I think this is really an edge case and not so serious.
From this point of view I would welcome the release of this beta development version.
-
I tested my node with the "#define MY_TRANSPORT_WAIT_READY_MS 1" feature.
It seems to work fine!
Tnx for the work.
-
I detected another problem. From the debug log it looks like the loop() function starts before initialization is finished. This can cause problems when trying to send data very early in loop().
Here a log example:
0 MCO:BGN:INIT NODE,CP=RNNNA--,VER=2.1.0-beta
4 MCO:BGN:BFR
5 before() begin
5 before() end
5 TSM:INIT
9 TSF:WUR:MS=1000
16 TSM:INIT:TSP OK
17 TSM:INIT:STATID=42
20 TSF:SID:OK,ID=42
21 TSM:FPAR
58 TSF:MSG:SEND,42-42-255-255,s=255,c=3,t=7,pt=0,l=0,sg=0,ft=0,st=OK:
651 TSF:MSG:READ,0-0-42,s=255,c=3,t=8,pt=1,l=1,sg=0:0
656 TSF:MSG:FPAR OK,ID=0,D=1
1010 MCO:BGN:STP
1011 Setup() begin
1512 Setup() end
1512 MCO:BGN:INIT OK,TSP=0
1516 loop() begin
2017 loop() end
2017 loop() begin
2065 TSM:FPAR:OK
2066 TSM:ID
2067 TSM:ID:OK
2069 TSM:UPL
2072 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=24,pt=1,l=1,sg=0,ft=0,st=OK:1
2078 TSF:MSG:READ,0-0-42,s=255,c=3,t=25,pt=1,l=1,sg=0:1
2083 TSF:MSG:PONG RECV,HP=1
2086 TSM:UPL:OK
2087 TSM:READY:ID=42,PAR=0,DIS=1
2093 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=15,pt=6,l=2,sg=0,ft=0,st=OK:0100
2100 TSF:MSG:READ,0-0-42,s=255,c=3,t=15,pt=6,l=2,sg=0:0100
2107 TSF:MSG:SEND,42-42-0-0,s=255,c=0,t=17,pt=0,l=10,sg=0,ft=0,st=OK:2.1.0-beta
2116 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=6,pt=1,l=1,sg=0,ft=0,st=OK:0
2126 TSF:MSG:READ,0-0-42,s=255,c=3,t=6,pt=0,l=1,sg=0:M
2131 Presentation() begin
2135 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=11,pt=0,l=22,sg=0,ft=0,st=OK:Sink Pump Control Node
2145 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=12,pt=0,l=3,sg=0,ft=0,st=OK:1.2
2153 TSF:MSG:SEND,42-42-0-0,s=1,c=0,t=1,pt=0,l=0,sg=0,ft=0,st=OK:
2161 TSF:MSG:SEND,42-42-0-0,s=2,c=0,t=1,pt=0,l=0,sg=0,ft=0,st=OK:
2168 TSF:MSG:SEND,42-42-0-0,s=3,c=0,t=3,pt=0,l=0,sg=0,ft=0,st=OK:
2174 Presentation() end
2176 MCO:REG:REQ
2181 TSF:MSG:SEND,42-42-0-0,s=255,c=3,t=26,pt=1,l=1,sg=0,ft=0,st=OK:2
2187 TSF:MSG:READ,0-0-42,s=255,c=3,t=27,pt=1,l=1,sg=0:1
2192 MCO:PIM:NODE REG=1
2518 loop() end
2518 loop() begin
...
-
@Heizelmann I'm not sure I understood your concern.
Initialization is finished here, right before the main loop is entered:
1512 MCO:BGN:INIT OK,TSP=0
Since your timeout is set to 1sec (which obviously is too short to fully establish the transport link) the node will enter the main loop() after the timeout. However, since the link is not established, only init messages are sent at that point (i.e. library init, signing preferences, presentation). Sensor data will not be sent until the node is registered.