GaryStofer

GaryStofer

I made a little PCB that connects an Arduino Pro Mini (see here) to an nRF24L module and carries two LEDs, one switch, and a voltage divider for measuring the battery voltage. The board can be ordered from OSH PCBs (here) for about $3 for 3 PCBs, in the US at least.

Send me a message if you want the Eagle PCB files.

See schematic below -- follow notes on schematic -- Do not power with 5V serial adapter, radio will not like 5V.

There is a slight silkscreen mistake on the pcb near SV2: Mosi should read Miso.
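
As a rough illustration of how a battery-voltage divider like the one on this board is typically read, here is a minimal sketch of the ADC-to-volts conversion. The resistor values and the 1.1 V reference are assumptions made for the example, not values taken from the schematic, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// All constants below are illustrative assumptions, not schematic values.
constexpr double kAdcReferenceVolts = 1.1;  // assumed: internal 1.1 V ADC reference
constexpr double kAdcSteps = 1024.0;        // 10-bit ADC
constexpr double kRTopOhms = 1000000.0;     // hypothetical: battery to ADC pin
constexpr double kRBottomOhms = 470000.0;   // hypothetical: ADC pin to ground

// Converts a raw 10-bit ADC reading into the battery voltage
// seen at the top of the divider.
double batteryVoltsFromAdc(uint16_t raw) {
    assert(raw <= 1023);  // a 10-bit ADC cannot return more than this
    const double pinVolts = raw * kAdcReferenceVolts / kAdcSteps;
    return pinVolts * (kRTopOhms + kRBottomOhms) / kRBottomOhms;
}
```

On the node itself, the raw value would come from the analog pin the divider is wired to; the divider ratio has to be chosen so that a full battery stays below the ADC reference.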

Pro mini used :

GaryStofer

I made a MySensors gateway node that works as a bridge to an existing X10 system. It lets the home automation software control the X10 light and appliance modules. The setup consists of an X10 remote that has been modified so that the digital signal from the gateway keys the transmitter. The X10 controller then receives the data packet and does its thing on the AC line to control the switches. Any existing X10 remote or alarm sensor can be used. I had an old key fob with broken buttons that became the donor transmitter. When hacking the remote, look for the signal from its IC that normally keys the transmitter and disconnect it, then feed in the signal from the MySensors node instead. A scope is handy for finding the right pin.

The Git repository with the Arduino sketch for the gateway/X10 controller can be found at [https://github.com/garyStofer/MyMySensors2.1]. The X10 house code is hard-coded to 'A', and I arbitrarily limited the number of switches to 12, but the avid Arduino enthusiast will quickly see how to modify the code to suit. The X10 part does not have to be implemented on the MySensors gateway; it could just as well run on a separate node.
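
To give an idea of what the house code and switch limit amount to, here is a minimal sketch of one possible mapping from a child sensor ID to an X10 address. This is not the code from the repository: the function name and the one-to-one mapping (child ID n controls unit n) are assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>

constexpr char kHouseCode = 'A';      // the hard-coded house code mentioned above
constexpr uint8_t kMaxSwitches = 12;  // the arbitrary 12-switch limit mentioned above

// Hypothetical mapping: child sensor ID n (1-based) -> X10 unit n on kHouseCode.
// Returns an empty string for IDs outside the supported range.
std::string x10AddressForChild(uint8_t childId) {
    if (childId < 1 || childId > kMaxSwitches) {
        return std::string();
    }
    return std::string(1, kHouseCode) + std::to_string(static_cast<int>(childId));
}
```

With a layout like this, changing the house code or raising the limit (X10 allows house codes A to P and unit codes 1 to 16) only means editing the two constants.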

If there is any interest I will add some pictures and schematics.

GaryStofer

@alex28 Hi Alex, sorry for your frustration.

I think documentation, either incomplete or outdated, is the crux of most open source projects.

I cannot speak to the hardware you used to try to get a network up, as I have stuck to the simpler approach of using plain Arduino (ATmega328P) nodes for sensors, repeaters, and gateways alike, with an RPi as the network controller running something like Domoticz. I started years ago, even before the RFM69 was an option, and made PCBs that incorporate the NRF24 and the ATmega328P along with the necessary voltage regulators etc. Maybe it was easier to get started then because there were fewer options and less misleading documentation around, but I don't recall running into any problems worth mentioning or having the level of frustration you have encountered.

I have two sites running, with 8 and 12 nodes respectively. All sensors run on batteries. The range of the NRF24 is limited: I only get through one or two sheetrock walls inside the house, but using the NRF24 module with the built-in PA/LNA on the gateway and one repeater opened up the range considerably. I'm fairly positive that all the NRF modules I have are clones....

I use the serial gateway on the ATmega328P and connect it directly to the serial port of the RPi Zero W, without USB adapters, then run Domoticz on the Pi to get onto the internet.

If you look on OSH PCB you will find many good PCBs for building MySensors nodes on the simpler Arduino platform.

Most of my frustrations stemmed from the Linux configuration for the RPI so that it doesn't clobber the SD card on surprise power failure.

Cheers -- Gary

GaryStofer

@Yveaux So I found the problem now.

The disconnect between the sniffer and the network running V2.1 stemmed from the change in the way the NETWORK_BASE_ID is defined.

In the sniffer and V1.51 of the lib it is defined as a single uint64_t but in the V2.1 lib it is defined as a list of 5 individual bytes which are then later used to initialize a 5 byte char array via a macro.

Since I had earlier chosen a NETWORK_BASE_ID different from the default, this uint64_t-style ID was now used to initialize the 5-byte array, except of course that only the first location got assigned anything. Luckily all of this happened without a compiler warning or error...

After I changed the NETWORK_BASE_ID in my nodes and gateway to the format with 5 individual bytes, in the correct order, all is well again. Even my V1.51 sensors that have not been recompiled yet seem to be just fine with the network.
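
To illustrate the mismatch, here is a minimal host-side sketch. The macro names and the ID value are made up for the example (they are not the library's names), the explicit cast stands in for the truncation that otherwise happens silently, and the byte order shown (lowest byte first) is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical macro names and ID value, for illustration only.
#define OLD_STYLE_BASE_ID 0xA8A8E1FC01ULL               // one 64-bit constant
#define NEW_STYLE_BASE_ID 0x01, 0xFC, 0xE1, 0xA8, 0xA8  // five bytes, lowest byte first (assumed)

constexpr std::size_t kAddrWidth = 5;
using Address = std::array<uint8_t, kAddrWidth>;

// What happens when the 64-bit constant ends up in the byte initializer:
// it is cut down to a single byte, fills element 0, and the rest stay zero.
Address addressFromOldStyle() {
    return Address{{ static_cast<uint8_t>(OLD_STYLE_BASE_ID) }};
}

// The intended form: every element gets its own byte.
Address addressFromNewStyle() {
    return Address{{ NEW_STYLE_BASE_ID }};
}

// Rebuilds the 64-bit value from the byte list (element 0 is the lowest byte),
// to check that both notations describe the same ID.
uint64_t toUint64(const Address& addr) {
    uint64_t value = 0;
    for (std::size_t i = kAddrWidth; i-- > 0;) {
        value = (value << 8) | addr[i];
    }
    return value;
}
```

The old-style definition yields an address of 01 00 00 00 00, so a node built that way listens on a different address from one built with the five-byte form.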

Thanks

GaryStofer

First, let me say that the test "network" consists of only a single node and a serial GW with a Domoticz controller. No repeater nodes are present, and it operates at 250 kbps on a frequency outside the WiFi spectrum that I verified to have no traffic. The test subjects are within 10 feet of each other, and the protocol sniffer receiver is placed roughly midway between the two communicating nodes. The test node has a pluggable connector for the RF module so that I can swap modules out individually. The GW has the LNA/PA module and the node uses the standard modules. Both GW and node are set to "High" power.

Looking at the problem with the promiscuous protocol sniffer and Wireshark, I can clearly see that packets collide in mid-air when the node and the GW transmit at the same time. This happens when the controller/GW talks to the node with the protocol ACK request enabled and the node does not have the extra wait/delay programmed.

Going through 12 RF modules on the node side, I can see that about 1/3 show the problem consistently, to the extent that all 16 retransmissions are used up, 1/3 show it to various degrees, and the rest show no problems at all. Before you say "ohh, fake chips etc.": these modules all came from the same batch, so it is more than likely not a matter of fake versus genuine, but rather of some parameter for which the lib assumes too tight a margin.

With the protocol-level ACK turned off via the controller/GW, the packet goes out without a request for the node to send the packet back; only the HW ACK remains in the picture. Set up like this, I can see that the third of the modules that cause the persistent collisions evoke a resend of the packet by the GW after ~2 to 2.5 ms, while the group that shows no problem shows one single transmission from the GW only. Since the sniffer is not fast enough to capture the quick turnaround (spec says 130 us) for the short HW ACK to go out, I cannot say whether the node is simply not sending it or just not sending it quickly enough. All I can say is that the GW code did not see a HW ACK and therefore sent out the packet a second time.

Meanwhile, on the node side all is well; it just might get the same packet twice, and since no status changed there is no harm done. However, when you now turn on the protocol-level ACK request in the controller (as you should), the node has to turn around and send out the echo packet, which then collides with the resend of the original packet from the GW. Because of this the node doesn't get a HW ACK either and tries to resend the echo packet over and over until it runs out of tries.

When I then introduce the wait/delay in the code above to wait out the 2.5 ms for the GW to resend the initial packet, this ACK/resend storm goes away because the air is clear when the node sends its echo packet. If I wait/delay for 2 ms the problem is greatly reduced, and if I wait 3 ms I see no more evidence of collisions. Either wait() or delay() works; however, I chose delay() to be positively sure that no other communication is started by the node during that time.

I conclude that, as a workaround, this solves the issue we are having with the RF24 transport. Without investing in faster RF sniffing equipment, though, I can't say for sure whether this could be solved better on the TX side by waiting longer for the HW ACK to arrive from the receiving node; I think there is a good chance of that.

There is also a chance that the PA/LNA module used on the GW is part of the problem, by not being able to switch back to RX mode quickly enough and therefore losing the HW ACK from nodes that respond quicker than others.

Edited: Checked with a non PA/LNA module on the GW and no change could be observed, so this theory can be discounted.

BTW: All of this is very reminiscent of an email communication in 2015 I had with Ekblad !!

GS

GaryStofer

@skywatch yes, wait() works too. My thinking in using delay() was that I did not want any possibility of an incoming message being processed during that time, hence delay() instead of wait(). But I did not dig any further into the code to assess that risk, or do any stress testing with the node also acting as a repeater, etc. I figure the implementers of the lib would know what's best when implementing a fix. It's also possible that the root cause lies in the serial GW code, if other gateways do not have this problem. It could be that it just takes the GW code a little too long to switch into RX mode to catch the ACK. I figured @hek or @Yveaux would get to the bottom of it and know what to do. Do they still read the forums, or do I need to open a pull request on the Git repository?
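
The difference I had in mind can be shown with a small host-side model. This is not MySensors code: the struct, its members, and the one-millisecond-per-pass clock are all hypothetical. It only illustrates that a wait()-style loop keeps servicing incoming messages, while a delay()-style pause does not.

```cpp
#include <cstdint>

// Host-side model with hypothetical names; not taken from the library.
struct FakeNode {
    uint32_t nowMs = 0;       // simulated clock
    int pendingMessages = 0;  // messages waiting to be handled
    int handledMessages = 0;  // messages handled so far

    // Stand-in for the library's internal message processing step.
    void processIncoming() {
        if (pendingMessages > 0) {
            --pendingMessages;
            ++handledMessages;
        }
    }

    // delay()-style pause: time passes and nothing else runs.
    void blockingDelay(uint32_t ms) {
        nowMs += ms;
    }

    // wait()-style pause: keeps processing incoming messages until time is up.
    void cooperativeWait(uint32_t ms) {
        const uint32_t start = nowMs;
        while (nowMs - start < ms) {
            processIncoming();
            ++nowMs;  // one simulated millisecond per loop pass
        }
    }
};
```

In this model a node with two queued messages handles none of them during blockingDelay(4) and both of them during cooperativeWait(4), which is the behaviour I wanted to rule out during the turnaround gap.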

GS

GaryStofer

When communicating from the gateway to a node, there seems to be no delay, or an insufficient one, for the TX/RX turnaround time. This causes the GW to not fully receive the incoming ACK message and therefore fail the transmission. With a protocol sniffer it can be observed that the receiving node sends out the ACK message immediately, but otherwise properly.

The mere inclusion of MY_DEBUG serial print messages on the receiving node "solves" the issue by providing a slight delay between reception and the subsequent transmission of the ACK/ECHO packet, allowing the transmitting node to switch from transmission to reception properly.

A delay of 3-4 ms in file MyTransport.cpp, line 703, in the code block

if ( msg.getRequestEcho()) {
     TRANSPORT_DEBUG(PSTR("TSF:MSG:ECHO REQ\n"));
#ifdef MY_RADIO_RF24
     delay(4);   // give the sender time to switch from TX to RX before the echo goes out
#endif
....
...
..
}

solves the issue of the lost ACK messages.

This has been tested on multiple RF24 modules -- With and without LNA/PA both on the gateway and node sides.

I suggest adding the above code to the code base so that other people don't have to go through days of debugging to find this problem again.

See post further down for details and explanation.

G.S.

GaryStofer

@mfalkvidd No, the issue with sleep(0) still exists in V2.3.2, which is the version the Arduino library installer currently installs. The workaround is to call hwPowerDown(WDTO_SLEEP_FOREVER), as in the code above, if you want the battery to last more than a day.

GS

GaryStofer

Solved by adding a short delay between the RX/TX turnaround when replying with an ACK message.

In file MyTransport.cpp at line 706, in the if (msg.getRequestEcho()) block, right after the debug message "TSF:MSG:ECHO REQ", I added a 2 ms delay to allow the caller to switch from TX to RX mode.

GS

GaryStofer

@ferro , @monte, @hek , @Yveaux
I guess I have not mentioned this before :
When turning on MY_DEBUG or MY_DEBUG_VERBOSE_RF the ACK problem goes away. The debug trace looks as expected and the GW is happy with what it gets back from the node in terms of ACK. No error message .... Turn debug off and the problem reappears.

This is why I think there is a timing issue with the RX/TX changeover time. Adding the serial prints slows the RX/TX changeover down enough for the GW to be able to catch the ACK packet and be happy.

If Hek or Yveaux is not going to throw his hat in the ring here, I guess I have to either run with debug permanently on -- heck, a messy fix -- or dive down and debug the library myself.

GS

GaryStofer

@ferro The systems use the serial MySensors GW -- some connect via USB, some connect directly to the serial port on a Pi.

I can't change the controller to anything else, not that I think this is controller related -- the customer is happy with Domo and has many man-hours invested in scripts and interfacing.

ACK, when transmitting from node to GW/controller, seems to work fine, both HW-ACK and "protocol" ack. ACK from GW/Controller to node seems to be flat out broken, but I can't tell yet if the node is not sending out, or the GW is not receiving it properly.

Also noticed that the GW code is not using the IRQ line from the radio, I see the SPI clock running on the GW constantly.

Thanks

Also

GaryStofer

@monte
Yes --- I know why it "Works" without ACK enabled -- That's not the point. But of course one needs to have the ACK working on the GW otherwise the controller and Node can get out of sync status wise.

Yes -- all of the above you mentioned was tried -- It's not a HW problem, nor RF interference, nor a range problem -- Spectrum analyzer shows no other traffic on the chosen freq.

Like I said -- All this worked under 1.5 reliably -- with the same HW -- Flawlessly, even on a somewhat noisy RF channel.

GaryStofer

@ejlane No -- Not every time -- Just when I need it ---

GaryStofer
