Over the air updates

Damme

@ToSa I might have missed it but is there any documentation of the protocol used to transmit OTA?
(I looked in the source and might have missed it .. o:) ) How big is the bootloader installed?

Zeph

The initial description sounds like a "pull" architecture, where the sensor node's bootloader figures out whether it needs to update itself and then invokes the bootloading of the appropriate binary.

I have some tendency towards a more "push" oriented approach, where the central code can (1) ask the node about its current code and version if it has any doubt and (2) command the node to go into bootloading mode.

The advantage is that we don't have to anticipate the future upgrade path in the sensor node's code, and different nodes even with the same hardware could be "told" to program themselves with different code.

In my own case, I might want to change the code in some nodes to go to a higher bandwidth "christmas lights control" mode, then later change it back to a low bandwidth "sensor reports" mode. In other words, there is no implied "upgrade sequence for node type 23", just the ability to arbitrarily reload code in any node from the central controlling software (I'm deliberately being vague about what that central software is: a smart gateway, or a HA controller via a gateway, or a separate laptop or whatever).

I would initiate doing that by changing a config in one place. The config could be as simple as a text file with lines containing a node identifier and a reference to which hex file (or binary equivalent) we currently want in that node. The central software could compute a checksum or hash of the desired binary code, and the node could report the checksum of the current PROGMEM, from which the central software could decide to command the node to go into OTA bootloading mode.
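A minimal sketch of that server-side check, in JavaScript. The config format ("node id, then file name" per line), the function names, and the CRC variant (CRC-16 with reflected polynomial 0xA001 and initial value 0xFFFF) are all assumptions for illustration; the real bootloader may use a different checksum.

```javascript
// Sketch only: config format, names, and CRC variant are assumptions.

// Parse lines of the form "<nodeId> <firmwareFile>".
// Blank lines and lines starting with '#' are skipped.
function parseNodeConfig(text) {
  const wanted = new Map();
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const [id, file] = line.split(/\s+/);
    wanted.set(Number(id), file);
  }
  return wanted;
}

// CRC-16, reflected polynomial 0xA001, initial value 0xFFFF (assumed variant).
function crc16(bytes) {
  let crc = 0xFFFF;
  for (const b of bytes) {
    crc ^= b;
    for (let i = 0; i < 8; i++) {
      crc = (crc & 1) ? (crc >>> 1) ^ 0xA001 : crc >>> 1;
    }
  }
  return crc;
}

// The server commands an update only when the checksum reported by the node
// differs from the checksum of the binary the config asks for.
function needsUpdate(reportedCrc, desiredImage) {
  return reportedCrc !== crc16(desiredImage);
}
```

With this, the server would look up the file for a node with `parseNodeConfig(text).get(23)`, load its bytes, and call `needsUpdate(reportedCrc, bytes)` whenever the node reports in.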

The code running in the sensor node needs very little to support this. At minimum - nothing, you just do a power cycle and the update happens while the bootloader has control. (A variant of this uses a reed switch to trigger rebooting, so you don't even have to open the case of a battery powered node). Or for nodes that are physically inaccessible, there could be "send me the hash of your current PROGMEM" and "reboot into the bootloader" commands added to the MySensors operational set.

(The "push" OTA bootloading process could differ some in the details, eg: it could be initiated by the node checking with central for any update, rather than central sending a command to the node, and otherwise work as above. The key thing in the push approach is that the sensor node just reloads whatever program central wants it to load, which is controlled by flexible config at central rather than expecting the sensor node to decide what it will next be programmed with.)

(edit: the other difference is that "the proper code to load" is controlled per node, not per node type. So central could load the same code into every type 19 node, or it could differ per physical node.)

Would the OTA bootloader you are writing accommodate "push" bootloading like this, as well as "pull"?

Zeph

Out of curiosity, what's the real world speed like, when doing the bootloading over the MySensors network, with and without a relay node in the middle? Presumably you want reliable delivery so as to not load corrupt code.

Damme

@Zeph
Hmm, my approach would be for the server side to decide 'Node 23 needs an update'.
Send RESET to node 23 (Hmm, I don't know if a soft reset executes the bootloader?)
Node 23 asks the server 'Do you have an update for me?'
Server: YES! and sends it over

ToSa

Nobody ever mentioned it would be fast :)
Takes a couple of minutes to load the DallasTemperatureSensor sketch that I use for testing - but depending on where your sensor sits that's still less time than going two stairs up, moving that big cabinet to the side, getting the sensor out, going two stairs down again, disassembling the enclosure, connecting it to the PC/Mac, flashing the new firmware and then all of this in the opposite sequence to get it back where it belongs...

ToSa

@Damme that's exactly how it works - and yes, the soft reset (using the watchdog) executes the bootloader and asks the server if a new version is available.

Zeph

@ToSa said:

...For this step to work the details about the node can't be stored "somewhere" in the EEPROM but need to be at a well-defined address so that both the bootloader and the program itself can access the same data.

the master replies with details about the latest program version for the sensor node type

if the response from the master lists the same version as the one installed, the node boots into the existing program

if the master has a new program version then the node starts fetching the new program in small chunks and writes to the 328p program mem

This is the part where it sounds like "pull" - the node decides whether to update based on its own sensor node type - in essence fetching an update for itself if there is newer code for its type.

The "push" alternative would have the central authority make that decision on a node by node basis (not just node type by node type) and then tell a specific node to go into update mode.

Implications of pull vs push.

One is that when you change the "latest release" for a node type, in the "pull" case all of the nodes of that type could try to update themselves at once. In the push case, the server could do them one after another, and even space out the updating of nodes if desired to reduce bandwidth.
For another, suppose you had several nodes of the "same node type". Even though two "heater control" nodes are the same type, they might have different hardware attached. Suppose you decide you want to upgrade JUST one of those nodes, say because there's a safety feature you need to add to just that one based on the heater it's connected to. In the "pull by node type" model, all of your heater control nodes will have to be updated if any of them are updated. In the "push by node" model, the server could also choose to update just the one node.
Or suppose you want to split node types. Sometimes there's no exact sensor type defined in Vera, so you pick the closest approximation. Later a better and more specific node type gets defined. But you can't change the node type of a given node, because all nodes of that type will "pull down" the same code.
I'm not a big fan of "node type" as a primary concept of a wireless sensor network anyway (in the current sense). The current concept of "node type" seems more like a "vera_mapping_of_several_variables". What is the "node type" of a sensor node with a DHT-11 on pins 5 and 6 and a LDR on A2? If you swap out the DHT-11 for a DHT-22, this needs different code in the node, so it needs to be a different node type. If you move the LDR to A0, new code needed => new node type so it can fetch the right code upon update. Each combination of inputs and outputs needs its own unique code and thus "node type" for pulling updates.

I've discussed that elsewhere. I see "sensor type" as part of the mapping configuration for a given HA controller, not as something the node itself should care about. "Node type" is even worse, because of the mix and match combination of sensors it may have. "Node type for pulling updates" gets worse still, since the code needs to change not just based on the combination of sensors but based on the specific hardware (DHT-11 vs DHT-22) and the pin assignments.

ToSa

@Zeph It actually depends on how that "somewhere" I mentioned in the initial description is coded.

The bootloader sends a message to the controller like "I'm node 23. I'm a temperature node and I'm currently running version 5 of the temperature node sketch which has a CRC of 0xABCD"

The controller sends something back like "You should be running version 6 of the temperature node sketch with CRC 0xFEDC"

So it's truly the controller (the central authority) that decides. At this point what I've done in the NodeJsController (which really is pretty dumb and only meant for testing) is that I did not care about the nodeID but only submitted a response based on the latest version available in the database for the given node type. You could obviously maintain a list of "expected sketches / sketch versions" for each nodeID and drive the decision on what the controller sends back based on that list instead of the node type only.

It really does exactly what you want it to do - the "pull" truly is a "pull for information if the central authority wants me to update". The big benefit of this "pull" setup is that the controller is stateless and just answers each request coming from the node making the code way cleaner and the overall setup way more reliable.
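As a rough illustration of such a stateless answer, here is a sketch in JavaScript. In-memory arrays stand in for the controller's database, and the field names (`expectedType`, `expectedVersion`, `nodeId`) are hypothetical; this is not the NodeJsController code.

```javascript
// Sketch only: arrays replace the database, field names are hypothetical.
// The function keeps no state between calls; every answer is derived from
// the request plus the controller's tables.
function answerConfigRequest(request, nodes, firmwares) {
  const node = nodes.find(n => n.id === request.nodeId);
  let candidates;
  if (node && node.expectedType !== undefined) {
    // Per-node decision: the controller's list says what this node should run.
    candidates = firmwares.filter(f =>
      f.type === node.expectedType && f.version === node.expectedVersion);
  } else {
    // Fallback: latest version available for the type the node reported.
    candidates = firmwares.filter(f => f.type === request.type);
  }
  if (candidates.length === 0) return null; // nothing known: leave the node alone
  const best = candidates.reduce((a, b) => (b.version > a.version ? b : a));
  return { type: best.type, version: best.version, blocks: best.blocks, crc: best.crc };
}
```

The node would then compare the returned type, version, and CRC with what it has installed and either boot the existing program or start requesting blocks.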

Zeph

@ToSa said:

The bootloader sends a message to the controller like "I'm node 23. I'm a temperature node and I'm currently running version 5 of the temperature node sketch which has a CRC of 0xABCD"

But that's not quite the right info. What it needs to say is "I'm running version 5 of the 18B20 temp sensor on pin 7 sketch", because a temperature node running with a DHT-22, or even an 18B20 on pin 8, needs to use different code.

Or "I'm running version 52 of the 18B20 temp sensor on pin 7 and power blind relays on pins 5 and 6 and an IR detector on pin 12".

So I'm suggesting that the node say "I'm node 23, my PROGMEM has CRC 0xABCD, do you want me to load anything different?". The rest is up to the server.

The bootloading code does not need to know what "type" the node is, only a signature of the PROGMEM. The server can then decide what code that specific node should be running instead, if any. Concepts like sensor types or node types or even sequences of versions are irrelevant to bootloading as seen from the node end.

At the server end, it has a table that says "node 23 should be running XYZZY.hex which has a signature of 0xAC3E". If that's not what it's doing, then at a time of the server's choosing, it can tell node 23 to update itself and send the appropriate program bytes. (At this point, the actual transfer of bytes from server to the node's PROGMEM, your current approach is fine, I'm talking about a higher level of the protocol or architecture).

ToSa

@Zeph: yep - that means in your case you don't really care about the node type. In other scenarios where you have 60 nodes installed and 20 of them are relay nodes, another 20 are switch detectors and another 20 are temperature sensors (all of them having the same hardware setup) the node type is pretty useful. For your specific need you probably should not care about the node type at all - maybe set node type == nodeID and that's it. The additional 2-byte payload should not matter too much.

The ideal setup from my perspective would look like this (dreaming): based on the information shared back (combination of sensors and pin connections) the controller would reassemble the source code and build a new sketch for the given configuration, compile it and send it (I'm not kidding - I worked on a very similar approach a few years back).

Reality is: this is meant to be a bootloader for MySensors. The way MySensors currently works is that the combination you mentioned (18B20 temp sensor on pin 7 and power blind relays on pins 5 and 6 and an IR detector on pin 12) requires a specific sketch to be loaded that has these pin assignments etc. hard-coded.

This is the piece of code you would want to adjust - at this point it pulls all available firmware records for the given type and sorts descending by version - which delivers the highest available version back as the first record:

db.collection('firmware', function(err, c) {
	c.findOne({
		$query: {
			'type': fwtype
		},
		$orderby: {
			'version': -1
		}
	}, function(err, result) {
		// ... build the response from result (rest omitted in the original post)
	});
});

Instead the "expected firmware" type and version could be an attribute for the given node in the "node" collection which is manually maintained:

db.collection('node', function(err, c) {
	c.findOne({
		'id': destination
	}, function(err, noderesult) {
		db.collection('firmware', function(err, c) {
			c.findOne({
				'type': noderesult.expected_firmware_type,
				'version': noderesult.expected_firmware_version
			}, function(err, result) {
				// ... build the response from result (rest omitted in the original post)
			});
		});
	});
});

Damme

@ToSa Still I wonder if there is any OTA bootloader / protocol readme :) (So I don't have to dissect the nodejs code to write my own implementation)

ToSa

@Damme look at NodeJsController/Readme.html - actually for now better look at this version which has a couple of updates (will send another pull request tomorrow for the documentation as well as some minor changes).

If you are looking for tech documentation (protocol etc.) that's not yet included but the communication is fairly easy (complexity is mainly to make it robust - not kill a node if something goes wrong etc.):

the bootloader is using the same procedure to find its parent / request a nodeID etc. as a normal MySensors sketch would do
then a config request / config response is exchanged between node and controller
assuming an update is needed a series of code block requests / responses is executed until the full firmware is submitted

Data is submitted as binary - you can see the message payload details in MyOtaBootloader.h:

typedef struct
{
	uint16_t type;
	uint16_t version;
} FirmwareConfigRequest;

typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t blocks;
	uint16_t crc;
} FirmwareConfigResponse;

typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t block;
} FirmwareRequest;

typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t block;
	uint8_t data[FIRMWARE_BLOCK_SIZE];
} FirmwareResponse;
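For anyone writing their own controller, the payloads above could be packed and unpacked roughly as follows in Node.js. This is a sketch under stated assumptions: the structs are taken to be packed with little-endian fields (as on AVR), `FIRMWARE_BLOCK_SIZE` is assumed to be 16, and block N is assumed to start at byte offset N times the block size. Check MyOtaBootloader.h for the real values.

```javascript
// Sketch only: packed little-endian layout and a 16-byte block are assumptions.
const FIRMWARE_BLOCK_SIZE = 16; // assumption, see MyOtaBootloader.h

// FirmwareRequest: type, version, block (three uint16_t values).
function decodeFirmwareRequest(payload) {
  return {
    type: payload.readUInt16LE(0),
    version: payload.readUInt16LE(2),
    block: payload.readUInt16LE(4),
  };
}

// FirmwareConfigResponse: type, version, blocks, crc (four uint16_t values).
function encodeFirmwareConfigResponse({ type, version, blocks, crc }) {
  const out = Buffer.alloc(8);
  out.writeUInt16LE(type, 0);
  out.writeUInt16LE(version, 2);
  out.writeUInt16LE(blocks, 4);
  out.writeUInt16LE(crc, 6);
  return out;
}

// FirmwareResponse: echoes type/version/block, followed by one block of data.
// A short final block is padded with 0xFF, the value of erased flash.
function encodeFirmwareResponse({ type, version, block }, image) {
  const out = Buffer.alloc(6 + FIRMWARE_BLOCK_SIZE, 0xFF);
  out.writeUInt16LE(type, 0);
  out.writeUInt16LE(version, 2);
  out.writeUInt16LE(block, 4);
  const start = block * FIRMWARE_BLOCK_SIZE;
  image.subarray(start, start + FIRMWARE_BLOCK_SIZE).copy(out, 6);
  return out;
}
```

Under these assumptions a FirmwareResponse is 22 bytes, which fits in the radio payload once the MySensors header is added.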

Zeph

@ToSa said:

yep - that means in your case you don't really care about the node type. In other scenarios where you have 60 nodes installed and 20 of them are relay nodes, another 20 are switch detectors and another 20 are temperature sensors (all of them having the same hardware setup) the node type is pretty useful

Suppose you do have 20 identical temperature nodes. It's trivially simple for the server to tell each one of them to update to the same code in the "push by node" model. Not only that, but the server gets to decide when to allocate the bandwidth for each node.

Unfortunately, in the "pull by node type" model, you have no way to update some nodes of the given type and not other nodes of that type.

The "push by node" model easily handles any case the "pull by node type" model does, but the opposite is not true.

To even approach the "push by node" dynamics with "pull by node type" design, you have to have two concepts of "node type" which must not be conflated.

node type for purposes of the user interface
node type for purposes of updating the code in the ATMega328p

When you say "20 temperature nodes" the concept of "node type" would be meaningful in the first sense if you mean "20 nodes containing only a temperature sensor for the HA Controller to display".

But for updating the PROGMEM, the concept of "node type" needs to be "20 nodes containing only a temperature sensor of type DHT-11 on pin 6".

A node with a DHT-22 or 18b20 on pin 6, or a node with a DHT-11 on pin 5, would be the same "node type" for purposes of the user interface (which doesn't care), but different "node types" for purposes of updating PROGMEM.

Once you start considering node type = node id in some cases, it becomes simpler to just ignore the already messy and problematic "node type for purpose of update" concept (as seen by the node) and just do updates per node, period. If you want it, you get the functionality of "update all nodes that use identical code" essentially for free at the server end with the push by node model anyway, PLUS the ability to update individual nodes of any type to run any code you want, whenever you (the server) want to schedule it. I don't see the downside of push-by-node here.

At worst, the server could have a table of node-id to "node type" for lookup and then follow your same dynamics. That's not how I'd do it (this model allows even simpler and more flexible options), but it would be a tiny "shim" to allow the more flexible "push by node" model to emulate the "pull by node type" dynamics if a given implementer so desired.

(Just by the way, this discussion is for me fun and mutually respectful brainstorming, I hope it lands that way).

ToSa

@Zeph not sure what you are asking for as I mentioned above that you can use the bootloader as it is today to just update specific nodes (by nodeID, update one and not update another even if they have the same node type). The implementation is not a "pull by node type" but it's a "pull by node ID, node type, node version" - which information the controller uses to decide if an update should be executed is up to you!!!

Terminology: the "node type" I'm referring to means the specific setup of the hardware - only if that's the same then the node type would be the same (combination of sensors / pin connections / same sketch to be used). The back-end cares about a node type because it needs to know which sketch to use/send.
The user interface ideally never cares about a node type but really cares about the specific sensor type(s). This "translation" needs to happen in the background no matter if you use an OTA bootloader or not.

Examples:

Let's assume you have two nodes in the living room - the user interface should just show "living room temperature" no matter if the temperature sensor is connected to node 1 together with the light switch or connected to node 2 together with the blinds. This "translation" needs to happen anyways - ideally in the controller.
Let's assume you have two temperature sensors connected to one node - one measures the room temperature at 1.5m height and one is a floor temperature (not unusual for floor heating). Just knowing that there are two temperature sensors but not knowing which one is which will not be sufficient for the heating controller to make the correct adjustments. Again that translation from "node 23 with one XYZ temp sensor on pin 5 and one XYZ temp sensor on pin 7" to "node 23 temp sensor at pin 5 is the floor temperature" needs to happen anyways.

Zeph

OK, maybe we are converging in some ways. I'll try to list the similarities as well as differences.

So we agree that a "node type" doesn't mean a generic "temperature node" but a very specific "my sketch for controlling an 18b20 on pin 7".

Let's suppose the sketch was called "DevDuino_18b20_7.ino" (for the moment let's leave out auto-scripting).
This compiled into DevDuino_18b20_7.hex and the binary equivalent.

We would know that nodes 7, 12, and 15 should have the latest version of this sketch. (node 4 might also measure temperature but with different hardware or pin configuration, so it would not use this sketch).

If we just want to update the sketch, we would send copies of the new binary to nodes 7, 12, and 15.

So far I think we are on nearly the same page, at the generic level described. Where the "push by node" and "pull by node type" models differ is in where the knowledge that nodes 7, 12, and 15 run the same code resides. In push-by-node, the Server knows that it should send the same code to them; in the "pull by node type" model, those three nodes themselves know they want updates for a given numeric "node type".

Also, each node knows it has a given version of the firmware of its node type, and decides when to upgrade by comparing that with what the server offers.

The differences are highlighted better when we make more than a version update.

Suppose we decide to make use of pin 2 of node 7 to control an LED. At this point we need to load different firmware into node 7, and we write a sketch called "DevDuino_18b20_7_LED_2.ino". (forgive the naming, it's an example). So now we want to change the overall system configuration so that the server will load DevDuino_18b20_7_LED_2.hex into node 7. (Nodes 12 and 15 still have the other sketch without LED control and maybe always will, the new sketch is not a new version of the old sketch)

I am suggesting that all you have to do is create the new sketch (or rather its hex or binary compiled form) and configure the server to send that to node 7 instead of the previous sketch. It doesn't matter one bit to the node whether it's switching to a completely new sketch versus a new version of the current sketch. That's in the server logic only, not OTA.

I think you are saying that your node.js server can accommodate changing what sketch (not just version) runs in each node, because it can load arbitrarily different (or identical) binary files to each node by node id, with no limitations based on what "type" that node used to be, right? Or not?

In my thinking, the node has no need to know that it's running "node type 453, version 16".

The server is free to conceptually organize firmwares by "node type" and "version" if it wishes, but those concepts do not need to be pushed down to the node level.

Pull by node type:

 Node: I'm type 453, what is the latest version of 453?
 Server: Queries database for max version of node type 453, and says "Latest 453 version is 17"
 Node: Checks that it has version 16, asks server to send 453 ver 17 for OTA programming
 Server: sends version requested by node
 (programming done)
 Node: asks server for latest version for node type 453
 Server: ... 17
 Node: I'm version 17, no change needed

Push by node:

Node: I'm node 7 and my PROGMEM signature is 0x54FE
Server: My config says node 7 should have the latest "DevDuino_18b20_7_LED_2.hex" with Sig 0x3EE5
Server: Please load the following binary into your PROGMEM (sends appropriate version)
(programming done)
Node: I'm node 7 and my PROGMEM sig is 0x3EE5
Server: Mark that one as updated

Notice that there's no problem of forgetting to update the version number - if the signature (eg: CRC or hash) in PROGMEM isn't what the server wants there, then it starts an OTA programming session, period. Even if the sig was wrong because the programming had a glitch rather than because it is out of date, the server knows it's wrong and sends again.
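The "send again until the signature matches" behaviour can be sketched as a small simulation in JavaScript. Everything here is a hypothetical stand-in: the `node` object models PROGMEM as an array, `signatureOf` is a toy 16-bit hash rather than the bootloader's CRC, and `transfer` represents one complete OTA programming session. In a real system the node would report its signature over the radio instead of the server reading `node.progmem`.

```javascript
// Sketch only: node, signatureOf, and transfer are hypothetical stand-ins.

// Toy 16-bit rolling hash, NOT the bootloader's actual signature.
function signatureOf(bytes) {
  let sig = 0;
  for (const b of bytes) sig = (sig * 31 + b) & 0xFFFF;
  return sig;
}

// Keep sending the wanted image until the node reports the wanted signature.
// No version numbers are involved: a stale image and a glitched transfer
// look the same to the server and are handled the same way.
function pushUntilMatch(node, wantedImage, transfer, maxAttempts = 3) {
  const wantedSig = signatureOf(wantedImage);
  for (let attempt = 0; attempt <= maxAttempts; attempt++) {
    if (signatureOf(node.progmem) === wantedSig) {
      return { updated: true, transfers: attempt };
    }
    if (attempt < maxAttempts) node.progmem = transfer(wantedImage);
  }
  return { updated: false, transfers: maxAttempts };
}
```

A node that already holds the wanted image returns with `transfers: 0`, which corresponds to the "Mark that one as updated" step without any programming session.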

In the push-by-node model, the system is not limited to "updating all nodes with node type 453 versions <= 16 to version 17" -- it can load an entirely different sketch (node type 763 version 0 if the server thinks in those terms) into the node if it wishes. And the node doesn't care, it doesn't need to know what "node type" it is or compare versions (that's in the logic of the server), all the node needs to know is that the server wants it to load some code into PROGMEM, period.

That's how I imagine things working. As I have understood you, you are pushing the concept of "node type" and "version number" and the comparison of version numbers down to the node itself, rather than letting the server handle that (if it chooses). I don't see the advantage of that; the server seems a more logical place for that information - both simpler and more flexible.

Your node.js implementation might organize source code for nodes by numeric "node type" (where your node type 7 isn't the same as my node type 7) and by version within node type. That would still be supported by the "push by node" model.

But another server might choose to organize firmwares by string filename plus signature (eg: CRC). It's configured with a simple table of node_id,filename. It computes the signature of the binary file to be compared with what the node reports it has in PROGMEM. There's no numeric "node type" or "version number" needed. (If you really want to also keep obsolete versions of the binary firmware on the server, there are easy workarounds for that too). The config is dead simple: (NodeID, Filename)* Or optionally (nodelist, Filename)* if you want to reduce the number of times you spell out the filename. Use the latest (or only) copy of the given filename.

The cool thing about the "push by node" approach is that the same node bootloader can easily accommodate both server approaches (node type # + version # OR nodeID->filename) - since the only concepts the node uses are

"I can tell you the signature of what I have in PROGMEM now", and
"If you tell me to I'll load something of your choice".

For OTA bootloading the node doesn't need to know or care about "node type" numbers or versions, nor about file names or date stamps.

So I'm not trying to eliminate your concept of the server assigning numeric IDs to each combination of sensors and pins, and using ordered version numbers within each "node type". I just don't see why those concepts need to also be pushed down into the node and OTA bootloader protocol. With the "push by node" model, there's more flexibility to organize changing firmwares in the server as you prefer OR in other ways, with no meaningful cost, because the node end of the OTA programming system has been distilled to just the essence that it really needs to understand, leaving higher levels of management as a server-internal affair.

Damme

@ToSa I've been looking through the OTA bootloader and noticed there are a lot of uint16_t which can be replaced with uint8_t; that saves 128 bytes of code. It still needs to shrink by ~900 bytes to fit a 1024-word bootloader, but it makes more space for other stuff :)

JeJ

When I'm trying to run the NodeJsController.js script I always end up with "Error: Cannot open /dev/ttyAMA0"

I'm running a RPi with a serial gateway.

Any ideas?

mikeones

On my RPi, the serial gateway is detected as /dev/ttyUSB0.

Damme

@ToSa I've been working on getting OTA to work with MQTTgateway with some success.

But I do have a problem with some packets going missing, and I think the communication should be something like this:
The bootloader checks id and version and the server says there is an update (no change from today),
but then:

[bootloader] 0000 has CHK FF(just filler in first package) REQ 0000 type 01 version 01
[server] load 0000 from hex, send addr 0000 0C9428030C9447240C9474240C947605 C7
[bootloader] 0000 has CHK FF, REQ 0010 type 01 version 01
[server] (checksum mismatch) send addr 0000 0C9428030C9447240C9474240C947605 C7
[bootloader] 0000 has CHK C7, REQ 0010 type 01 version 01
[server] load 0010 from hex send addr 0010 0C94A3050C94D0050C9480100C945003 00
And so on.. :)

What do you think about this? The total packet is 32 bytes and the MySensors header is 7 bytes, and this layout would need 19 bytes from server to bootloader.
I have also seen some Intel HEX files that are not in order (0010, 0020, 0030 etc.) but jump between addresses. I do not think the Arduino IDE does this, but you never know.
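One way for a server to cope with out-of-order records is to place each record by its address instead of assuming sequence. The JavaScript sketch below does that; the function name is made up, and it only handles data (00) and end-of-file (01) records, which is an assumption that should hold for images below 64 KB. The two data records in the usage note carry the same bytes as the trace above, whose trailing C7 and 00 appear to be the Intel HEX record checksums.

```javascript
// Sketch only: supports record types 00 (data) and 01 (end of file).
function parseIntelHex(text) {
  const chunks = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '') continue;
    if (line[0] !== ':') throw new Error('record must start with ":"');
    const bytes = [];
    for (let i = 1; i < line.length; i += 2) {
      bytes.push(parseInt(line.slice(i, i + 2), 16));
    }
    // All bytes of a record, checksum included, must sum to 0 modulo 256.
    if (bytes.reduce((a, b) => a + b, 0) % 256 !== 0) throw new Error('bad checksum');
    const [length, addrHi, addrLo, recordType] = bytes;
    if (bytes.length !== length + 5) throw new Error('bad record length');
    if (recordType === 0x01) break; // end of file
    if (recordType !== 0x00) throw new Error('unsupported record type');
    chunks.push({ address: (addrHi << 8) | addrLo, data: bytes.slice(4, 4 + length) });
  }
  // Place every chunk by address; gaps keep 0xFF, the value of erased flash.
  const size = chunks.reduce((max, c) => Math.max(max, c.address + c.data.length), 0);
  const image = new Array(size).fill(0xFF);
  for (const c of chunks) {
    c.data.forEach((b, i) => { image[c.address + i] = b; });
  }
  return image;
}
```

For example, feeding it `:100010000C94A3050C94D0050C9480100C94500300` followed by `:100000000C9428030C9447240C9474240C947605C7` and `:00000001FF` yields a 32-byte image in address order, even though the records arrived reversed. The server can then slice that image into blocks by address when answering the bootloader.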

EDIT:
I haven't read this one yet but I guess there is a lot of good stuff in it :)
http://www.nordicsemi.com/eng/nordic/download_resource/10878/2/94069421

ToSa

@JeJ @mikeones: how is your serial gateway connected? Using a USB-RS232 cable or via the GPIO pins on the RPi? Did you check the Readme.md in the NodeJsController directory?
