Over the air updates

axillent

@hek that is true.
But do we plan to relay OTA updates?
Sure, we can do it, but is it a reasonable complication?
Even the Z-Wave standard does not relay inclusion/exclusion messages.

ToSa

From my POV, that's one of the biggest benefits of OTA updates: you can do updates "in place" without the need to move the sensor towards the gateway or the other way around.
If we use the same message structure, the additional complexity is limited: the gateway and relay nodes already know how to deal with it, and the only additional step for the sensor is to find the correct relay address. Error handling (switching relays during an OTA update etc.) would be limited or nonexistent, keeping the bootloader as small as possible: if something unexpected happens, like a relay disappearing during the update, the entire update fails, the sensor reboots and tries again.
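The "find the correct relay address" step can reuse what MySensors nodes already do: broadcast a find-parent request and pick the neighbour that reports the fewest hops to the gateway. A minimal sketch of that selection, assuming each reply carries the sender ID and its distance to the gateway (the names `ParentReply`, `choose_parent` and `NO_PARENT` are hypothetical, not taken from the library):

```c
#include <stddef.h>
#include <stdint.h>

#define NO_PARENT 0xFFu   /* hypothetical "no parent found" marker */

/* One answer to a broadcast find-parent request: who replied and how many
   hops that node is from the gateway (0 = the gateway itself). */
typedef struct {
    uint8_t sender;
    uint8_t distance;
} ParentReply;

/* Pick the reply with the fewest hops to the gateway. On success the node's
   own distance becomes that value + 1; returns NO_PARENT if nobody answered. */
uint8_t choose_parent(const ParentReply *replies, size_t n, uint8_t *my_distance)
{
    uint8_t best = NO_PARENT;
    uint8_t best_distance = 0xFF;   /* 0xFF is treated as "unreachable" */

    for (size_t i = 0; i < n; i++) {
        if (replies[i].distance < best_distance) {
            best_distance = replies[i].distance;
            best = replies[i].sender;
        }
    }
    if (best != NO_PARENT)
        *my_distance = (uint8_t)(best_distance + 1);
    return best;
}
```

Because the bootloader keeps no fallback state, a parent lost during the transfer simply ends in a reboot, after which this selection runs again from scratch.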

hek

@ToSa

Yes, agree!
Need to discuss something with you. Are you available on your registered forum email?

axillent

Probably we can reuse this http://ncrmnt.org/wp/2014/02/27/rf24boot-a-universal-over-the-air-bootloader-for-all-those-ucs/

ToSa

Quick update: I have the low-level hardware access code ready (the ability to communicate with the nRF24 without the library, as the library is too big for the bootloader) and most of the other Arduino-side bootloader code as well. The Raspberry Pi side of the story is behind, as binary data submission and a database layer are prerequisites. I started from the initial MongoDB setup in the 1.4 dev branch, but I'm not sure if that's the longer-term strategy.
I had some initial success testing the bootloader with some dirty hacks on the Raspberry side (removing all debugging that would fail on binary data, removing the handling of trailing 0 etc.) when my hardware started to fail. I replaced the Arduino and the nRF24 on both ends and even the Raspberry Pi, without success... I loaded old code that I knew was working on both ends and it still doesn't work... Both Arduino and RPi seem to work fine, but once the first packet arrives from the Arduino at the RPi, the RPi reports receiving it and then is stuck until I reboot it... I'll retry once I'm back from China in two weeks. Don't expect to hear anything from my end in the meantime, as I won't be able to take any hardware with me.

@axillent: the universal bootloader is great, but it would not be able to use the infrastructure (routing / packet format) to communicate and hence would not allow updating sensors that are out of direct radio range of the central node providing the updates (gateway or separate).

ToSa

The OTA bootloader was merged into the development branch some time ago. It currently consists of two components: the OTA bootloader itself and a quick & dirty NodeJsController that connects through a standard SerialGateway or EthernetGateway and serves as the repository and sketch distributor for the sensors.
I've created another pull request just now to include a couple of additional tweaks/fixes and an installation guide to get you started (NodeJsController/Readme.html).

Damme

@ToSa I might have missed it, but is there any documentation of the protocol used for OTA transfers?
(I looked in the source and might have missed it... o:) ) How big is the installed bootloader?

Zeph

The initial description sounds like a "pull" architecture, where the sensor node's bootloader figures out whether it needs to update itself and then invokes the bootloading of the appropriate binary.

I lean towards a more "push" oriented approach, where the central code can (1) ask the node about its current code and version if it has any doubt and (2) command the node to go into bootloading mode.

The advantage is that we don't have to anticipate the future upgrade path in the sensor node's code, and different nodes even with the same hardware could be "told" to program themselves with different code.

In my own case, I might want to change the code in some nodes to go to a higher-bandwidth "Christmas lights control" mode, then later change it back to a low-bandwidth "sensor reports" mode. In other words, there is no implied "upgrade sequence for node type 23", just the ability to arbitrarily reload code in any node from the central controlling software (I'm deliberately being vague about what that central software is: a smart gateway, an HA controller via a gateway, a separate laptop or whatever).

I would initiate that by changing a config in one place. The config could be as simple as a text file with lines containing a node identifier and a reference to the hex file (or binary equivalent) we currently want in that node. The central software could compute a checksum or hash of the desired binary code, and the node could report the checksum of its current PROGMEM, from which the central software could decide to command the node to go into OTA bootloading mode.
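A sketch of that central-side check, assuming the node reports a CRC of its current PROGMEM and the config file has already been loaded into a table; `DesiredImage`, `decide` and the example file names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* One line of the hypothetical central config: node id -> desired image. */
typedef struct {
    uint8_t     node_id;
    const char *hex_file;   /* informational only in this sketch */
    uint16_t    crc;        /* precomputed checksum of the desired binary */
} DesiredImage;

typedef enum { NODE_UNKNOWN, NODE_UP_TO_DATE, NODE_NEEDS_UPDATE } UpdateDecision;

/* Compare what the node reports with what the config says it should run. */
UpdateDecision decide(const DesiredImage *table, size_t n,
                      uint8_t node_id, uint16_t reported_crc)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].node_id == node_id)
            return table[i].crc == reported_crc ? NODE_UP_TO_DATE
                                                : NODE_NEEDS_UPDATE;
    }
    return NODE_UNKNOWN;    /* not managed centrally: leave it alone */
}
```

Nodes missing from the table come back as unknown rather than being forced into an update, so only nodes the config explicitly manages get reprogrammed.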

The code running in the sensor node needs very little to support this. At minimum, nothing: you just do a power cycle and the update happens while the bootloader has control. (A variant of this uses a reed switch to trigger rebooting, so you don't even have to open the case of a battery-powered node.) Or, for nodes that are physically inaccessible, "send me the hash of your current PROGMEM" and "reboot into the bootloader" commands could be added to the MySensors operational set.

(The "push" OTA bootloading process could differ somewhat in the details, e.g. it could be initiated by the node checking with central for any update, rather than central sending a command to the node, and otherwise work as above. The key thing in the push approach is that the sensor node just loads whatever program central wants it to load, which is controlled by flexible config at central rather than by the sensor node deciding what it will next be programmed with.)

(edit: the other difference is that "the proper code to load" is controlled per node, not per node type. So central could load the same code into every type 19 node, or it could differ per physical node.)

Would the OTA bootloader you are writing accommodate "push" bootloading like this, as well as "pull"?

Zeph

Out of curiosity, what's the real world speed like, when doing the bootloading over the MySensor network, with and without a relay node in the middle? Presumably you want reliable delivery so as to not load corrupt code.

Damme

@Zeph
Hmm, my approach would be: the server side decides 'Node 23 needs an update'
Send RESET to node 23 (Hmm, I don't know if a soft reset executes the bootloader?)
Node 23 asks the server 'Do you have an update for me?'
Server: YES! and sends it over

ToSa

Nobody ever mentioned it would be fast :)
It takes a couple of minutes to load the DallasTemperatureSensor sketch that I use for testing, but depending on where your sensor sits, that's still less time than going up two floors, moving that big cabinet to the side, getting the sensor out, going down two floors again, disassembling the enclosure, connecting it to the PC/Mac, flashing the new firmware and then doing all of this in reverse to get it back where it belongs...
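To put "a couple of minutes" into numbers: every block costs one request/response round trip, so the transfer time is roughly the block count times the round-trip time. The helpers below are illustrative; the 16-byte block size and 100 ms round trip used with them are assumptions, not measurements.

```c
#include <stdint.h>

/* Number of blocks needed to carry an image, rounding up. */
uint32_t blocks_needed(uint32_t image_bytes, uint32_t block_size)
{
    return (image_bytes + block_size - 1) / block_size;
}

/* Rough transfer time in ms: one request/response round trip per block. */
uint32_t estimate_ms(uint32_t image_bytes, uint32_t block_size,
                     uint32_t round_trip_ms)
{
    return blocks_needed(image_bytes, block_size) * round_trip_ms;
}
```

For a hypothetical 20,000-byte image: 20000 / 16 = 1250 blocks, and 1250 × 100 ms = 125 s, i.e. about two minutes. Going through a relay adds a hop to every round trip, so expect the figure to grow accordingly.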

ToSa

@Damme that's exactly how it works - and yes, the soft reset (using the watchdog) executes the bootloader and asks the server if a new version is available.

Zeph

@ToSa said:

...For this step to work, the details about the node can't be stored "somewhere" in the EEPROM but need to be at a well-defined address so that both the bootloader and the program itself can access the same data.

the master replies with details about the latest program version for the sensor node type

if the response from the master lists the same version as the one installed, the node boots into the existing program

if the master has a new program version then the node starts fetching the new program in small chunks and writes to the 328p program mem
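The fixed-address requirement in the quoted description can be illustrated like this: both the bootloader and the sketch read the same struct from the same EEPROM offset. A sketch under assumptions: the offset `0x10`, the field order and the helper names are hypothetical, and a plain array stands in for the ATmega328P's 1 KB EEPROM so the code runs on a PC.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EEPROM_FW_CONFIG_ADDR 0x10u   /* hypothetical agreed-upon offset */
#define EEPROM_SIZE           1024u   /* ATmega328P EEPROM size */

typedef struct {
    uint16_t type;
    uint16_t version;
    uint16_t blocks;
    uint16_t crc;
} FirmwareConfig;

/* Host-side stand-in for the EEPROM. */
static uint8_t eeprom[EEPROM_SIZE];

void fw_config_write(const FirmwareConfig *cfg)
{
    memcpy(&eeprom[EEPROM_FW_CONFIG_ADDR], cfg, sizeof *cfg);
}

void fw_config_read(FirmwareConfig *cfg)
{
    memcpy(cfg, &eeprom[EEPROM_FW_CONFIG_ADDR], sizeof *cfg);
}

/* Boot decision from the description: same type and version as installed
   means boot the existing program, anything else means fetch new blocks. */
bool needs_update(const FirmwareConfig *installed, const FirmwareConfig *offered)
{
    return installed->type != offered->type
        || installed->version != offered->version;
}
```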

This is the part where it sounds like "pull": the node decides whether to update based on its own sensor node type, in essence fetching an update for itself if there is newer code for its type.

The "push" alternative would have the central authority make that decision on a node by node basis (not just node type by node type) and then tell a specific node to go into update mode.

Implications of pull vs push.

One is that when you change the "latest release" for a node type, in the "pull" case all of the nodes of that type could try to update themselves at once. In the push case, the server could do them one after another, and even space out the updates if desired to reduce bandwidth.
For another, suppose you had several nodes of the "same node type". Even though two "heater control" nodes are the same type, they might have different hardware attached. Suppose you want to upgrade JUST one of those nodes, say because there's a safety feature you need to add to just that one based on the heater it's connected to. In the "pull by node type" model, all of your heater control nodes will have to be updated if any of them are. In the "push by node" model, the server can choose to update just the one node.
Or suppose you want to split node types. Sometimes there's no exact sensor type defined in Vera, so you pick the closest approximation. Later a better and more specific node type gets defined. But you can't change the node type of a given node, because all nodes of that type will "pull down" the same code.
I'm not a big fan of "node type" as a primary concept of a wireless sensor network anyway (in the current sense). The current concept of "node type" seems more like a "vera_mapping_of_several_variables". What is the "node type" of a sensor node with a DHT-11 on pins 5 and 6 and an LDR on A2? If you swap out the DHT-11 for a DHT-22, this needs different code in the node, so it needs to be a different node type. If you move the LDR to A0, new code is needed => new node type, so it can fetch the right code upon update. Each combination of inputs and outputs needs its own unique code and thus its own "node type" for pulling updates.
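The first point above, having the server update nodes one after another, only needs a small queue on the server side. A hypothetical sketch (`UpdateQueue`, `queue_add` and `queue_start_next` are illustrative names), assuming the server learns when each transfer has finished:

```c
#include <stddef.h>
#include <stdint.h>

/* Nodes waiting for an update, served strictly one at a time so only a
   single transfer uses the radio at once. */
typedef struct {
    uint8_t pending[16];
    size_t  count;
    size_t  next;
    int     active;     /* node id being updated, or -1 when idle */
} UpdateQueue;

void queue_add(UpdateQueue *q, uint8_t node_id)
{
    if (q->count < sizeof q->pending)
        q->pending[q->count++] = node_id;
}

/* Call at startup and whenever the previous transfer finished. Returns the
   node to command next, or -1 when nothing is left. */
int queue_start_next(UpdateQueue *q)
{
    q->active = (q->next < q->count) ? q->pending[q->next++] : -1;
    return q->active;
}
```

Spacing updates out further would just mean delaying the call to `queue_start_next` after each completed transfer.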

I've discussed that elsewhere. I see "sensor type" as part of the mapping configuration for a given HA controller, not as something the node itself should care about. "Node type" is even worse, because of the mix-and-match combination of sensors a node may have. "Node type for pulling updates" is worse still, since the code needs to change not just based on the combination of sensors but based on the specific hardware (DHT-11 vs DHT-22) and the pin assignments.

ToSa

@Zeph It actually depends on how that "somewhere" I mentioned in the initial description is coded.

The bootloader sends a message to the controller like "I'm node 23. I'm a temperature node and I'm currently running version 5 of the temperature node sketch which has a CRC of 0xABCD"

The controller sends something back like "You should be running version 6 of the temperature node sketch with CRC 0xFEDC"

So it's truly the controller (the central authority) that decides. At this point, what I've done in the NodeJsController (which really is pretty dumb and only meant for testing) is that I did not care about the nodeID but only submitted a response based on the latest version available in the database for the given node type. You could obviously maintain a list of "expected sketches / sketch versions" for each nodeID and drive the decision on what the controller sends back based on that list instead of the node type only.

It really does exactly what you want it to do: the "pull" truly is a "pull for information on whether the central authority wants me to update". The big benefit of this "pull" setup is that the controller is stateless and just answers each request coming from the node, making the code much cleaner and the overall setup much more reliable.
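A stateless handler along those lines might look like this: the answer depends only on the incoming request and two tables, a per-node assignment list checked first, with a fallback to the latest version of the requested type (the behaviour described above for the NodeJsController). The types and names are hypothetical; in practice the tables would come from the controller's database.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t type, version, blocks, crc; } FirmwareRecord;
typedef struct { uint8_t node_id; uint16_t type, version; } NodeAssignment;

/* Stateless: nothing is remembered between requests. Returns false if no
   matching firmware record exists. */
bool answer_config_request(const FirmwareRecord *fw, size_t n_fw,
                           const NodeAssignment *nodes, size_t n_nodes,
                           uint8_t node_id, uint16_t req_type,
                           FirmwareRecord *out)
{
    uint16_t want_type = req_type, want_version = 0;
    bool pinned = false;

    for (size_t i = 0; i < n_nodes; i++) {       /* per-node override */
        if (nodes[i].node_id == node_id) {
            want_type = nodes[i].type;
            want_version = nodes[i].version;
            pinned = true;
            break;
        }
    }

    const FirmwareRecord *best = NULL;
    for (size_t i = 0; i < n_fw; i++) {
        if (fw[i].type != want_type)
            continue;
        if (pinned) {
            if (fw[i].version == want_version) { best = &fw[i]; break; }
        } else if (best == NULL || fw[i].version > best->version) {
            best = &fw[i];                       /* latest for this type */
        }
    }
    if (best == NULL)
        return false;
    *out = *best;
    return true;
}
```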

Zeph

@ToSa said:

The bootloader sends a message to the controller like "I'm node 23. I'm a temperature node and I'm currently running version 5 of the temperature node sketch which has a CRC of 0xABCD"

But that's not quite the right info. What it needs to say is "I'm running version 5 of the '18B20 temp sensor on pin 7' sketch", because a temperature node running with a DHT-22, or even an 18B20 on pin 8, needs to use different code.

Or "I'm running version 52 of the '18B20 temp sensor on pin 7 and power blind relays on pins 5 and 6 and an IR detector on pin 12' sketch".

So I'm suggesting that the node say "I'm node 23, my PROGMEM has CRC 0xABCD, do you want me to load anything different?". The rest is up to the server.

The bootloading code does not need to know what "type" the node is, only a signature of the PROGMEM. The server can then decide what code that specific node should be running instead, if any. Concepts like sensor types or node types or even sequences of versions are irrelevant to bootloading as seen from the node end.
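For the signature, a CRC-16 over the application area is enough. The sketch below uses the bitwise CRC-16 update with reflected polynomial 0xA001 and initial value 0xFFFF (the algorithm avr-libc documents for `_crc16_update()`); whether the bootloader uses exactly this variant is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* One byte of CRC-16, reflected polynomial 0xA001 (x^16 + x^15 + x^2 + 1). */
uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u)
                         : (uint16_t)(crc >> 1);
    return crc;
}

/* Signature of an application image (e.g. the PROGMEM region below the
   bootloader), starting from 0xFFFF. */
uint16_t image_crc(const uint8_t *image, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc16_update(crc, image[i]);
    return crc;
}
```

For the standard check string "123456789" this variant yields 0x4B37, which is a quick way to confirm that node and server compute the same value.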

At the server end, it has a table that says "node 23 should be running XYZZY.hex which has a signature of 0xAC3E". If that's not what it's doing, then at a time of the server's choosing, it can tell node 23 to update itself and send the appropriate program bytes. (At this point, the actual transfer of bytes from server to the node's PROGMEM, your current approach is fine, I'm talking about a higher level of the protocol or architecture).

ToSa

@Zeph: yep, that means in your case you don't really care about the node type. In other scenarios, where you have 60 nodes installed and 20 of them are relay nodes, another 20 are switch detectors and another 20 are temperature sensors (all of them having the same hardware setup), the node type is pretty useful. For your specific need you probably should not care about the node type at all: maybe set node type == nodeID and that's it. The additional 2-byte payload should not matter much.

The ideal setup from my perspective would look like this (dreaming): based on the information shared back (combination of sensors and pin connections) the controller would reassemble the source code and build a new sketch for the given configuration, compile it and send it (I'm not kidding - I worked on a very similar approach a few years back).

Reality is: this is meant to be a bootloader for MySensors. The way MySensors currently works is that the combination you mentioned (18B20 temp sensor on pin 7 and power blind relays on pins 5 and 6 and an IR detector on pin 12) requires a specific sketch to be loaded that has these pin assignments etc. hard-coded.

This is the piece of code you would want to adjust. At this point it queries the firmware records for the given type, sorted by version in descending order, which returns the highest available version as the first record:

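db.collection('firmware', function(err, c) {
	c.findOne({
		$query: {
			'type': fwtype
		},
		$orderby: {
			'version': -1
		}
	}, function(err, result) {
		if (err || !result) return; // no firmware known for this type
		// result: highest available version for fwtype
		// ... build and send the config response ...
	});
});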

Instead, the "expected firmware" type and version could be attributes of the given node in the "node" collection, which is maintained manually:

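db.collection('node', function(err, c) {
	c.findOne({
		'id': destination
	}, function(err, noderesult) {
		if (err || !noderesult) return; // unknown node: nothing to send
		db.collection('firmware', function(err, c) {
			c.findOne({
				'type': noderesult.expected_firmware_type,
				'version': noderesult.expected_firmware_version
			}, function(err, result) {
				// ... build and send the config response for result ...
			});
		});
	});
});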

Damme

@ToSa Still, I wonder if there is any OTA bootloader / protocol readme :) (so I don't have to dissect the NodeJS code to write my own implementation)

ToSa

@Damme look at NodeJsController/Readme.html. Actually, for now better look at this version, which has a couple of updates (I will send another pull request tomorrow for the documentation as well as some minor changes).

If you are looking for tech documentation (protocol etc.), that's not included yet, but the communication is fairly easy (the complexity is mainly in making it robust, i.e. not killing a node if something goes wrong etc.):

the bootloader is using the same procedure to find its parent / request a nodeID etc. as a normal MySensors sketch would do
then a config request / config response is exchanged between node and controller
assuming an update is needed a series of code block requests / responses is executed until the full firmware is submitted
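The third step can be sketched as a loop on the node that requests each announced block and only accepts a response that echoes the requested type, version and block number. The exchange callback, `SimController` and the 16-byte block size are assumptions for illustration; the field layout mirrors the payload structs in MyOtaBootloader.h.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FIRMWARE_BLOCK_SIZE 16u   /* assumed value */

typedef struct { uint16_t type, version, block; } FirmwareRequest;
typedef struct {
    uint16_t type, version, block;
    uint8_t  data[FIRMWARE_BLOCK_SIZE];
} FirmwareResponse;

/* Hypothetical round trip: send one request, wait for one response. */
typedef bool (*exchange_fn)(const FirmwareRequest *req,
                            FirmwareResponse *resp, void *ctx);

/* Request every block announced in the config response and store it.
   Any failure or mismatched reply aborts the whole download. */
bool download_firmware(uint16_t type, uint16_t version, uint16_t blocks,
                       uint8_t *dest, exchange_fn exchange, void *ctx)
{
    for (uint16_t b = 0; b < blocks; b++) {
        FirmwareRequest req = { type, version, b };
        FirmwareResponse resp;
        if (!exchange(&req, &resp, ctx))
            return false;
        if (resp.type != type || resp.version != version || resp.block != b)
            return false;                     /* stray or stale reply */
        memcpy(dest + (size_t)b * FIRMWARE_BLOCK_SIZE, resp.data,
               FIRMWARE_BLOCK_SIZE);
    }
    return true;
}

/* Simulated controller serving an in-memory image of one firmware version. */
typedef struct { const uint8_t *image; uint16_t type, version; } SimController;

bool sim_exchange(const FirmwareRequest *req, FirmwareResponse *resp, void *ctx)
{
    const SimController *c = ctx;
    resp->type = c->type;
    resp->version = c->version;
    resp->block = req->block;
    memcpy(resp->data, c->image + (size_t)req->block * FIRMWARE_BLOCK_SIZE,
           FIRMWARE_BLOCK_SIZE);
    return true;
}
```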

Data is submitted as binary - you can see the message payload details in MyOtaBootloader.h:

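#include <stdint.h>

#ifndef FIRMWARE_BLOCK_SIZE
#define FIRMWARE_BLOCK_SIZE 16  // assumed here; the real value comes from MyOtaBootloader.h
#endif

// node -> controller: the firmware the node is currently running
typedef struct
{
	uint16_t type;
	uint16_t version;
} FirmwareConfigRequest;

// controller -> node: the firmware the node should be running
typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t blocks;  // number of FIRMWARE_BLOCK_SIZE-byte blocks in the image
	uint16_t crc;     // checksum of the complete image
} FirmwareConfigResponse;

// node -> controller: request one block of the given firmware
typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t block;
} FirmwareRequest;

// controller -> node: one block of firmware data
typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t block;
	uint8_t data[FIRMWARE_BLOCK_SIZE];
} FirmwareResponse;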
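Since these payloads go over the air as raw bytes, a controller running on a different CPU has to reproduce the AVR's little-endian layout. A minimal encoding sketch for the config response, plus a compile-time check that one block fits in a message; the 16-byte block size and the 25-byte payload limit (32-byte nRF24 frame minus a 7-byte header) are assumptions here.

```c
#include <stddef.h>
#include <stdint.h>

#define FIRMWARE_BLOCK_SIZE 16u   /* assumed value */
#define MAX_PAYLOAD         25u   /* assumed: 32-byte frame minus 7-byte header */

/* Wire sizes of the payload structs: uint16_t fields packed back to back,
   block data appended to FirmwareResponse. */
enum {
    CONFIG_REQUEST_SIZE  = 2 * 2,
    CONFIG_RESPONSE_SIZE = 4 * 2,
    FW_REQUEST_SIZE      = 3 * 2,
    FW_RESPONSE_SIZE     = 3 * 2 + FIRMWARE_BLOCK_SIZE
};
_Static_assert(FW_RESPONSE_SIZE <= MAX_PAYLOAD, "block does not fit in one message");

static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);   /* low byte first (little-endian) */
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

size_t encode_config_response(uint8_t *out, uint16_t type, uint16_t version,
                              uint16_t blocks, uint16_t crc)
{
    put_u16(out + 0, type);
    put_u16(out + 2, version);
    put_u16(out + 4, blocks);
    put_u16(out + 6, crc);
    return CONFIG_RESPONSE_SIZE;
}

void decode_config_response(const uint8_t *in, uint16_t *type, uint16_t *version,
                            uint16_t *blocks, uint16_t *crc)
{
    *type    = get_u16(in + 0);
    *version = get_u16(in + 2);
    *blocks  = get_u16(in + 4);
    *crc     = get_u16(in + 6);
}
```

With these assumed sizes a FirmwareResponse takes 22 of the 25 available payload bytes, so one block always travels in a single message.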

Zeph

@ToSa said:

yep - that means in your case you don't really care about the node type. In other scenarios where you have 60 nodes installed and 20 of them are relay nodes, another 20 are switch detectors and another 20 are temperature sensors (all of them having the same hardware setup) the node type is pretty useful

Suppose you do have 20 identical temperature nodes. It's trivially simple for the server to tell each one of them to update to the same code in the "push by node" model. Not only that, but the server gets to decide when to allocate the bandwidth for each node.

Unfortunately, in the "pull by node type" model, you have no way to update some nodes of the given type and not other nodes of that type.

The "push by node" model easily handles any case the "pull by node type" model does, but the opposite is not true.

To even approach the "push by node" dynamics with "pull by node type" design, you have to have two concepts of "node type" which must not be conflated.

node type for purposes of the user interface
node type for purposes of updating the code in the ATMega328p

When you say "20 temperature nodes" the concept of "node type" would be meaningful in the first sense if you mean "20 nodes containing only a temperature sensor for the HA Controller to display".

But for updating the PROGMEM, the concept of "node type" needs to be "20 nodes containing only a temperature sensor of type DHT-11 on pin 6".

A node with a DHT-22 or 18b20 on pin 6, or a node with a DHT-11 on pin 5, would be the same "node type" for purposes of the user interface (which doesn't care), but different "node types" for purposes of updating PROGMEM.

Once you start considering node type = node id in some cases, it becomes simpler to just ignore the already messy and problematic "node type for purpose of update" concept (as seen by the node) and just do updates per node, period. If you want it, you get the functionality of "update all nodes that use identical code" essentially for free at the server end with the push-by-node model anyway, PLUS the ability to update individual nodes of any type to run any code you want, whenever you (the server) want to schedule it. I don't see the downside of push-by-node here.

At worst, the server could have a table of node-id to "node type" for lookup and then follow your same dynamics. That's not how I'd do it (this model allows even simpler and more flexible options), but it would be a tiny "shim" to allow the more flexible "push by node" model to emulate the "pull by node type" dynamics if a given implementer so desired.

(Just by the way, this discussion is for me fun and mutually respectful brainstorming, I hope it lands that way).

ToSa

@Zeph not sure what you are asking for, as I mentioned above that you can use the bootloader as it is today to update just specific nodes (by nodeID: update one and not another even if they have the same node type). The implementation is not a "pull by node type" but a "pull by node ID, node type, node version"; which of this information the controller uses to decide whether an update should be executed is up to you!

Terminology: the "node type" I'm referring to means the specific setup of the hardware - only if that's the same then the node type would be the same (combination of sensors / pin connections / same sketch to be used). The back-end cares about a node type because it needs to know which sketch to use/send.
The user interface ideally never cares about a node type but really cares about the specific sensor type(s). This "translation" needs to happen in the background no matter if you use an OTA bootloader or not.

Examples:

Let's assume you have two nodes in the living room - the user interface should just show "living room temperature" no matter if the temperature sensor is connected to node 1 together with the light switch or connected to node 2 together with the blinds. This "translation" needs to happen anyways - ideally in the controller.
Let's assume you have two temperature sensors connected to one node - one measures the room temperature at 1.5m height and one is a floor temperature (not unusual for floor heating). Just knowing that there are two temperature sensors but not knowing which one is which will not be sufficient for the heating controller to make the correct adjustments. Again that translation from "node 23 with one XYZ temp sensor on pin 5 and one XYZ temp sensor on pin 7" to "node 23 temp sensor at pin 5 is the floor temperature" needs to happen anyways.
