Over the air updates

ToSa · 23 Mar 2014, 21:38

Really really nice project !!!

I worked on a similar setup about two years ago with different RF modules but didn't finish. Now I was about to restart and realized that the nRF24 modules are waaaaay less expensive. I just started adjusting the old code for the nRF24 modules I ordered when I found this great project. The raspberry PI for me is the way to go as I own two sitting almost idle and don't own a Vera.

The one feature I'm missing after reading through the majority of the available documentation is over the air updates of the sensor node software. As this is one of the features I completed for my old design, I'll go ahead and try to port the bootloader and the RPi based state-less firmware server to work with the protocol and routing implemented here... if successful I'll post the results.

If you worked already on over the air updates or you know somebody who did, please let me know and I'll focus my efforts on something else

Tobias

hek · 23 Mar 2014, 21:50

We've dreamt about this

It requires a bit more advanced custom PCBs to handle OTA updates but we're thinking in the same lines here.

Our first generation PCB will be of a simpler kind but if you have bootloader knowledge that's awesome and would be really helpful!

axillent · 27 Mar 2014, 08:28

Tobias, it is a cool and valuable idea!

I have only thought it through and have the following conclusions:

the maximum bootloader size of atmega328p is 2048 bytes. There is still a chance to fit a radio bootloader inside it, but the chance is minimal
most probably the design will require an external MCU to execute the update on the main atmega328p. This external MCU can be:
-1 the cheapest and best-known atmega8a. It is the same hardware platform as the atmega328p, the most popular MCU in the arduino world
-2 or we can shift from the radio chip NRF24L01+ to NRF24LE01. The latter is an NRF24L01+ with an 8051 MCU on the same die.

From the price point of view NRF24L01+ + atmega8a can be the same as NRF24LE01 or even cheaper. But NRF24LE01 will be more compact (2 chips instead of 3).

i think a good idea is to create a bridge from the PC to the device by radio
this bridge will be separate from the Gateway and for simplicity should not support relaying - just use your notebook to reach the device being upgraded
for the bridge we have 3 options:
-1 to bridge serial with DTR. This will allow using the native arduino way of doing firmware download or even serial monitoring. I'm not sure this will actually be possible to implement
-2 to simulate USBASP on USB and use an external programmer from Arduino IDE
-3 to create our own protocol
1 and 2 do not require development on the PC/Macbook side but we will probably face some restrictions in the implementation.
3 will require deep PC/Macbook development but will allow us to get rid of most of the possible restrictions

ToSa · 27 Mar 2014, 17:15

Agree it's not an easy exercise to shrink the RF code into the bootloader section.

I did an OTA bootloader within the limited space available in the 328p using a different RF module (which shouldn't be a major issue as the specs were pretty similar) and a different protocol (which might be an issue). I had to rewrite the init/rx/tx components of the RF module driver to be rather static and honestly no longer stick to the OSI model but go right through from app layer to hardware layer to make it fit, and I had to reduce the use cases so that the node in bootloader mode essentially only sends very specific package types and reacts to very specific package types being received.

I did not add any specific hardware (getting back to the initial comment about more complex PCB). The bootloader is inside the 328p and software reboot was handled by setting the watchdog to a short period and starting an endless loop (instead of adding some additional hardware components to handle that).

I'll share more details when I had a chance to look into the protocol a bit further probably over the weekend. If I can't find the time, I'll share the code as it is...

ToSa · 1 Apr 2014, 00:50

I couldn't get the connection working by just downloading and installing the Arduino and Raspberry components from GIT. It appears that even the defaults for the connection settings are different (e.g. 1M vs 2M transfer rate). I did not dig deeper but I would highly recommend to merge the projects or at least reference on a common set of header files for e.g. protocol specific details (e.g. the enums). That way the risk to run out of sync is way smaller. That said I don't have a running environment yet to test but started reviewing the protocol...

the max bootloader size is 2048 words -> 4096 bytes (still not a lot)
there are two ways to initiate the bootloader: either on power on of the sensor node or by submitting a specific package to the node that causes a reboot
during bootloading the node does not route any packets to other nodes - it is a pure leaf node for that short period of time even if it's a router under normal condition
the bootloader first of all submits its identity (unique address / type of node / current firmware version) to the master and waits for some time for a response. If the identity is not known (first start / no address known), a new address is requested similar to DHCP. If no response, it does a few retries and then finally starts the existing program. For this step to work the details about the node can't be stored "somewhere" in the EEPROM but need to be at a well-defined address so that both the bootloader and the program itself can access the same data.
the master replies with details about the latest program version for the sensor node type
if the response from the master lists the same version as the one installed, the node boots into the existing program
if the master has a new program version then the node starts fetching the new program in small chunks and writes to the 328p program mem - once done it reboots running through the full process again assuming that this time the program is the latest available
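
The boot-time decision in the steps above can be sketched as a small pure function. This is only an illustration: `BootAction`, `MasterReply` and `decide_boot_action` are hypothetical names, and treating any version that differs from the installed one as "new" is an assumption (the post only says "a new program version").

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical outcome of the boot-time check; names are illustrative only. */
typedef enum {
    BOOT_EXISTING_PROGRAM,  /* no answer, or the master lists the installed version */
    BOOT_FETCH_UPDATE       /* the master lists another version: download it first */
} BootAction;

/* Reply from the master; got_reply is false when all retries timed out. */
typedef struct {
    bool     got_reply;
    uint16_t latest_version;
} MasterReply;

BootAction decide_boot_action(uint16_t installed_version, MasterReply reply)
{
    if (!reply.got_reply) {
        /* Master unreachable after the retries: keep running what we have. */
        return BOOT_EXISTING_PROGRAM;
    }
    if (reply.latest_version == installed_version) {
        return BOOT_EXISTING_PROGRAM;
    }
    /* Assumption: any differing version triggers a download. */
    return BOOT_FETCH_UPDATE;
}
```

After a successful download the node reboots and runs the same check again, which should then end in `BOOT_EXISTING_PROGRAM`.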

In my previous setup I used the master node as software distributor - but now realized that some folks might want to keep the two separate, hence I'll add the option to use a different node for software distribution. This should allow e.g. users with a Vera to keep the Vera as master but use e.g. a Linux PC with an RF module connected (e.g. USB<->ATmega<->nRF24L) for software updates...

Announcing / requesting a new address etc is already covered by the protocol (auto assignment of radioID). A couple of packet types need to be added for the submission of the program etc. All of this looks doable at this point.

I'll post about progress in this thread.

hek · 1 Apr 2014, 07:05

@ToSa said:

I couldn't get the connection working by just downloading and installing the Arduino and Raspberry components from GIT. It appears that even the defaults for the connection settings are different (e.g. 1M vs 2M transfer rate). I did not dig deeper but I would highly recommend to merge the projects or at least reference on a common set of header files for e.g. protocol specific details (e.g. the enums). That way the risk to run out of sync is way smaller. That said I don't have a running environment yet to test but started reviewing the protocol...

The RPI code is in a weird state right now.
http://forum.mysensors.org/topic/8/code-for-beta-testing#19

Your OTA knowledge is very valuable. It might be too complicated to make Vera the firmware distributor (because of the plugin-system-limitations). It might be better to leave this functionality to the RPI gateway only where we have full control.

Keep us updated

axillent · 2 Apr 2014, 13:16

I see two possible solutions:

firmware upload is done by a separate device connected directly to the PC, with no relation to the RPI or vera
In this case the upload can be very similar to how we upload firmware using USB
RPI can be a helper. I have seen a similar thing with the ConnectPort X2 from Digi. It is an Xbee to Ethernet gateway and it is also an Xbee/Zigbee router node.
I can select a Zigbee device from the list using the WEB interface of the ConnectPort X2 and can use file upload.
This one looks more robust and may be the right design, but for the DIY we do it will require more steps to download updated firmware

ToSa · 20 Apr 2014, 21:21

Now that the Arduino and the RPi talk to each other (http://forum.mysensors.org/topic/8/code-for-beta-testing#27) I'm looking into ota updates.

The bootloader needs to know what firmware to request / what firmware the sensor is running - finally it needs to know what hardware is built into the sensor to load the correct firmware update. One option would be to have a separate bootloader for each sensor hardware - but that would be a nightmare to maintain over time.
Looking at the code the sketch info provides this kind of detail - but the sketch info is part of the firmware which ends up in a catch 22. We need to store the sketch info in EEPROM at an address known to the bootloader as well as to the firmware later on. Instead of storing the sketch name I would recommend to store some kind of a sensor type ID to reduce EEPROM consumption - sensors with the same hardware would get the same sensor type ID. The ota bootloader could then check with the firmware provider if there is a new version firmware for this kind of hardware...

I would propose the following: instead of storing separate bytes for address / relay / ... in EEPROM, create a struct and store that struct in EEPROM - load the struct from EEPROM to RAM on startup and check if the crc is valid... The most sophisticated approach would probably be to add a kind of unique MAC address to each sensor and to separate hardware specific and software specific configuration - something like:

struct HardwareConfig {
	uint16_t hwType;
	uint8_t hwVersion;
	uint8_t macAddress[6];
	uint16_t crc;
};

struct SensorConfig {
	uint8_t address;
	uint8_t relayAddress;
	uint16_t fwVersion;
	uint16_t crc;
};
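
A minimal sketch of the "load the struct and check if the crc is valid" idea, shown for `SensorConfig` only. The CRC-16 variant (reflected polynomial 0xA001, start value 0xFFFF) and the helper names are assumptions; the post does not say which checksum is meant. Starting from 0xFFFF rather than 0 means an all-zero struct does not validate by accident.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef struct {
    uint8_t  address;
    uint8_t  relayAddress;
    uint16_t fwVersion;
    uint16_t crc;           /* checksum over all bytes before this field */
} SensorConfig;

/* Bitwise CRC-16, reflected polynomial 0xA001, one byte per call. */
static uint16_t crc16_update(uint16_t crc, uint8_t data)
{
    crc ^= data;
    for (int i = 0; i < 8; i++) {
        crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001) : (uint16_t)(crc >> 1);
    }
    return crc;
}

/* Checksum over everything except the crc field itself. */
uint16_t sensor_config_crc(const SensorConfig *cfg)
{
    const uint8_t *bytes = (const uint8_t *)cfg;
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < offsetof(SensorConfig, crc); i++) {
        crc = crc16_update(crc, bytes[i]);
    }
    return crc;
}

/* Call before writing the struct to EEPROM. */
void sensor_config_seal(SensorConfig *cfg)
{
    cfg->crc = sensor_config_crc(cfg);
}

/* Call after reading the struct back from EEPROM. */
bool sensor_config_is_valid(const SensorConfig *cfg)
{
    return cfg->crc == sensor_config_crc(cfg);
}
```

The bootloader and the sketch would both read the struct from the same fixed EEPROM address and treat an invalid checksum as "not configured yet". `HardwareConfig` would need extra care on platforms that insert padding between `macAddress` and `crc`.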

Rough bootloader code could look like this:

--if hardware config is not valid (crc)
----// this is probably the very first startup of the sensor after flashing of the bootloader
----send "who am i" request to firmware provider
----wait for "who am i" response from firmware provider
----store response in hardware config -> EEPROM
--if sensor config is not valid (CRC)
----// this is equivalent to the current implementation of "no radioID in EEPROM"
----request radio ID (same way as implemented already)
----store response in sensor config -> EEPROM
--send "what is the latest firmware for hardware type x" request to firmware provider
--wait for "what is the latest firmware for hardware type x" response from firmware provider
--if newer firmware available
----invalidate firmware version/crc in EEPROM
----for each block (small chunks due to max packet size)
------send firmware request to firmware provider
------wait for firmware response from firmware provider
------write to progmem
----write new firmware version/crc to EEPROM
--boot into sensor firmware
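
The download part of the pseudocode (invalidate version, fetch blocks, write new version) could be modelled like this. It is a host-side sketch under assumptions: the block size of 16 bytes, the `FW_INVALID` marker and all function names are made up for illustration, and a RAM buffer stands in for program memory and EEPROM.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE  16       /* assumed chunk size, not given in the thread */
#define FW_INVALID  0xFFFF   /* assumed marker for "no usable firmware" */

typedef struct {
    uint16_t fw_version;      /* stands in for the version field kept in EEPROM */
    uint16_t target_version;  /* version announced by the firmware provider */
    uint16_t block_count;     /* number of blocks the provider will deliver */
    uint16_t next_block;      /* next block the node expects */
} UpdateState;

/* Step "invalidate firmware version": done before the first block is written. */
void update_begin(UpdateState *s, uint16_t target_version, uint16_t block_count)
{
    s->fw_version = FW_INVALID;
    s->target_version = target_version;
    s->block_count = block_count;
    s->next_block = 0;
}

/* Step "write to progmem": `image` is a RAM buffer standing in for flash.
 * Only the expected block is accepted; stale or out-of-order responses are ignored. */
bool update_accept_block(UpdateState *s, uint8_t *image, uint16_t block,
                         const uint8_t data[BLOCK_SIZE])
{
    if (block != s->next_block || block >= s->block_count) {
        return false;
    }
    memcpy(image + (size_t)block * BLOCK_SIZE, data, BLOCK_SIZE);
    s->next_block++;
    return true;
}

/* Step "write new firmware version": only once every block has arrived. */
bool update_finish(UpdateState *s)
{
    if (s->next_block != s->block_count) {
        return false;
    }
    s->fw_version = s->target_version;
    return true;
}
```

Because the version is invalidated first, a download that is interrupted half way leaves `fw_version` at `FW_INVALID`, so the next boot asks for the firmware again instead of starting a partly written program.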

Any comments/suggestions?
If none then I'll share some code in the next few days to get the ota going - will start with the most sophisticated and then reduce / adjust if needed e.g. to get the code small enough to fit into the bootloader section

hek · 20 Apr 2014, 21:55

Looks good!

It would be great to store a link to a specific git-url sketch and/or sha(ish) somehow. This way if the user switches controller or loses all data, the new one can pick up where the last died.
If we later set up a headless build-server in the cloud we could automate the whole compilation/upload process. That would be super clean!
Is it possible to add a hook in the bootloader to trigger a reset (which does the actual upgrade) from the normal sensor program? This way we can "push" a new firmware to a sensor from controller side.

Also note that relaying nodes keep routing information in EEPROM (256 bytes).

Cheers
Henrik

ToSa · 21 Apr 2014, 09:52

I need to transfer binary data for the ota updates, to keep the packet rate at a minimum - but the code is currently tailored for text. Especially using \0 for message termination and printing message directly in e.g. debug statements won't work. As this is pretty deep in the Sensor class reused all over the place I'm a bit hesitant to change (e.g. Sensor::readMessage()). This wouldn't be an issue for the bootloader as I won't use the Sensor class anyways (too big) but for the RPi the entire chain would be used: RadioGateway <- Gateway <- Relay <- Sensor ...

hek · 21 Apr 2014, 13:49

Yes you are right, I'm doing some work on allowing binary payload right now. In fact I'm considering letting all data be transferred "binary". But there are some cross-platform (endian) issues that need to be solved.
OT: Hmm.. Are both arduino and rpi using ieee floats? Can I send the 4 float-bytes over the air? Need to do some googling...
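
One way to sidestep the endian question is to fix the byte order on the wire and convert explicitly on both ends. The sketch below assumes a 32-bit IEEE 754 `float` on both platforms (which, to my knowledge, holds for avr-gcc and for Linux on the Raspberry Pi) and picks little-endian as the wire order; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check (C11): this sketch only makes sense for 32-bit floats. */
_Static_assert(sizeof(float) == 4, "float must be 32 bits");

/* Write a float as 4 bytes, least significant byte first. */
void pack_float_le(float value, uint8_t out[4])
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* reinterpret without aliasing issues */
    out[0] = (uint8_t)(bits);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)(bits >> 16);
    out[3] = (uint8_t)(bits >> 24);
}

/* Rebuild the float from 4 little-endian bytes, independent of host byte order. */
float unpack_float_le(const uint8_t in[4])
{
    uint32_t bits = (uint32_t)in[0]
                  | ((uint32_t)in[1] << 8)
                  | ((uint32_t)in[2] << 16)
                  | ((uint32_t)in[3] << 24);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The shifts work on the integer value, not on memory, so the same code produces the same bytes on a big-endian host as well.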

axillent · 21 Apr 2014, 16:18

Let me add my two cents. I do not think that ota updates should use the same message structure as the Sensor class is using. We only need one common thing - a clear identification of an ota message.
Sensor class should ignore any ota message. Actually it will ignore it without any class modification because we are ignoring any message with bad CRC.

hek · 21 Apr 2014, 16:39

@axillent

If we want to route firmware messages through relaying nodes it must have the same structure.
We need to introduce a new message type to identify firmware messages.

axillent · 21 Apr 2014, 16:42

@hek that is true.
but do we plan to relay ota updates?
sure we can do it, but is it a reasonable complication?
even the Z-Wave standard is not relaying inclusion/exclusion messages

ToSa · 21 Apr 2014, 16:59

From my pov that's one of the biggest benefits of ota updates : you can do updates "in place" without the need to move the sensor towards the gateway or the other way around.
If we use the same message structure, the additional complexity is limited: gateway and relay nodes know how to deal with it and the only additional step for the sensor is to find the correct relay address. Error handling (switching relay during ota update etc.) would be limited or not existent keeping the bootloader as small as possible - if something unexpected happens like a disappearing relay during the update, the entire update would fail, the sensor reboots and tries again.

hek · 21 Apr 2014, 17:31

@ToSa

Yes, agree!
Need to discuss something with you. Are you available on your registered forum email?

axillent · 29 Apr 2014, 05:38

Probably we can reuse this http://ncrmnt.org/wp/2014/02/27/rf24boot-a-universal-over-the-air-bootloader-for-all-those-ucs/

ToSa · 7 May 2014, 21:55

Quick update: I have the low level hardware access code ready (ability to communicate with the nRF24 without the library as the library is too big for the bootloader) and most of the other arduino side bootloader code as well. The raspberry pi side of the story is behind as binary data submission and a database layer are a prereq. I started based on the initial mongodb setup in the 1.4 dev branch but not sure if that's the strategy longer term.
I had some initial success testing the bootloader with some dirty hacks on the raspberry side (removing all debugging that would fail on binary data / removing the handling of trailing 0 etc.) when my hardware started to fail. I replaced the arduino / the nRF24 on both ends and even the raspberry Pi - without success... loaded old code that I knew was working on both ends and it still doesn't work... Both Arduino and RPi seem to work fine but once the first packet arrives from the Arduino to the RPi it reports retrieval and then is stuck unless I reboot the RPi... I'll retry once I'm back from China in two weeks - don't expect to hear anything from my end in the meantime as I won't be able to take any hardware with me.

@axillent : the universal bootloader is great but would not be able to utilize the infrastructure (routing / packet format) to communicate and hence would not allow to update sensors that are out of reach for direct communication to the central node providing the updates (gateway or separate).

ToSa · 23 Aug 2014, 16:51

The OTA bootloader was merged into the development branch some time ago. It consists of two components at this point: the OTA bootloader itself and a quick&dirty NodeJSController that connects through a standard SerialGateway or EthernetGateway and is used as repository and sketch distributor for the sensors.
I've created another pull request just now to include a couple of additional tweaks/fixes and an installation guide to get you started (NodeJsController/Readme.html).

Damme · 23 Aug 2014, 17:39

@ToSa I might have missed it but is there any documentation of the protocol used to transmit OTA?
(I looked in the source and might have missed it .. o:) ) How big is the bootloader installed?

Zeph · 23 Aug 2014, 19:53

The initial description sounds like a "pull" architecture, where the sensor node's bootloader figures out whether it needs to update itself and then invokes the bootloading of the appropriate binary.

I have some tendency towards a more "push" oriented approach, where the central code can (1) ask the node about its current code and version if it has any doubt and (2) command the node to go into bootloading mode.

The advantage is that we don't have to anticipate the future upgrade path in the sensor node's code, and different nodes even with the same hardware could be "told" to program themselves with different code.

In my own case, I might want to change the code in some nodes to go to a higher bandwidth "christmas lights control" mode, then later change it back to a low bandwidth "sensor reports" mode. In other words, there is no implied "upgrade sequence for node type 23", just the ability to arbitrarily reload code in any node from the central controlling software (i'm deliberately being vague about what that central software is: a smart gateway, or a HA controller via a gateway, or a separate laptop or whatever).

I would initiate doing that by changing a config in one place. The config could be as simple as a text file with lines containing a node identifier and a reference to which hex file (or binary equivalent) we currently want in that node. The central software could compute a checksum or hash of the desired binary code, and the node could report the checksum of the current PROGMEM, from which the central software could decide to command the node to go into OTA bootloading mode.

The code running in the sensor node needs very little to support this. At minimum - nothing, you just do a power cycle and the update happens while the bootloader has control. (A variant of this uses a reed switch to trigger rebooting, so you don't even have to open the case of a battery powered node). Or for nodes that are physically inaccessible, there could be "send me the hash of your current PROGMEM" and "reboot into the bootloader" commands added to the MySensors operational set.

(The "push" OTA bootloading process could differ some in the details eg: it could be initiated by the node checking with central for any update, rather than central sending a command to the node, and otherwise work as above. The key thing in the push approach is that the sensor node just reloads whatever program central wants it to load, which is controlled by flexible config at central rather than expecting the sensor node to decide what it will next be programmed with)

(edit: the other difference is that "the proper code to load" is controlled per node, not per node type. So central could load the same code into every type 19 node, or it could differ per physical node.)

Would the OTA bootloader you are writing accommodate "push" bootloading like this, as well as "pull"?

Zeph · 23 Aug 2014, 19:57

Out of curiosity, what's the real world speed like, when doing the bootloading over the MySensor network, with and without a relay node in the middle? Presumably you want reliable delivery so as to not load corrupt code.

Damme · 23 Aug 2014, 20:21

@Zeph
Hmm, My approach would be at the server side decide 'Node 23 needs an update''
Send RESET node 23 (Hmm, I dont know if soft reset executes the bootloader?)
Node 23 ask server 'Do you have an update for me?'
Server : YES! and throws it away

ToSa · 23 Aug 2014, 20:24

Nobody ever mentioned it would be fast
Takes a couple of minutes to load the DallasTemperatureSensor sketch that I use for testing - but depending on where your sensor sits that's still less time than going two stairs up, moving that big cabinet to the side, getting the sensor out, going two stairs down again, disassembling the enclosure, connecting it to the PC/Mac, flashing the new firmware and then all of this in the opposite sequence to get it back where it belongs...

ToSa · 23 Aug 2014, 20:25

@Damme that's exactly how it works - and yes, the soft reset (using the watchdog) executes the bootloader and asks the server if a new version is available.

Zeph · 23 Aug 2014, 21:00

@ToSa said:

...For this step to work the details about the node can't be stored "somewhere" in the EEPROM but need to be at a well-defined address so that both the bootloader and the program itself can access the same data.

the master replies with details about the latest program version for the sensor node type

if the response from the master lists the same version as the one installed, the node boots into the existing program

if the master has a new program version then the node starts fetching the new program in small chunks and writes to the 328p program mem

This is the part where it sounds like "pull" - the node decides whether to update based on its own sensor node type - in essence fetching an update for itself if there is newer code for its type.

The "push" alternative would have the central authority make that decision on a node by node basis (not just node type by node type) and then tell a specific node to go into update mode.

Implications of pull vs push.

One is that when you change the "latest release" for a node type, in the "pull" case all of the nodes of that type could try to update themselves at once. In the push case, the server could do them one after another, and even space out the updating of nodes if desired to reduce bandwidth.
For another, suppose you had several nodes of the "same node type". Even tho two "heater control" nodes are the same type, they might have different hardware attached. Suppose you decide you want to upgrade JUST one of those nodes, say because there's a safety feature you need to add to just that one based on the heater it's connected to. In the "pull by node type" model, all of your heater control nodes will have to be updated if any of them are updated. In the "push by node" model, the server could also choose to update just the one node.
Or suppose you want to split node types. Sometimes there's no exact sensor type defined in Vera, so you pick the closest approximation. Later a better and more specific node type gets defined. But you can't change the node type of a given node, because all nodes of that type will "pull down" the same code.
I'm not a big fan of "node type" as a primary concept of a wireless sensor network anyway (in the current sense). The current concept of "node type" seems more like a "vera_mapping_of_several_variables". What is the "node type" of a sensor node with a DHT-11 on pins 5 and 6 and a LDR on A2? If you swap out the DHT-11 for a DHT-22, this needs different code in the node, so it needs to be a different node type. If you move the LDR to A0, new code needed => new node type so it can fetch the right code upon update. Each combination of inputs and outputs needs its own unique code and thus "node type" for pulling updates.

I've discussed that elsewhere. I see "sensor type" as part of the mapping configuration for a given HA controller, not as something the node itself should care about. "Node type" is even worse, because of the mix and match combination of sensors it may have. "Node type for pulling updates" gets worse still, since the code needs to change not just based on the combination of sensors but based on the specific hardware (dht-11 vs dht-22) and the pin assignments.

ToSa · 23 Aug 2014, 21:11

@Zeph It actually depends on how that "somewhere" I mentioned in the initial description is coded.

The bootloader sends a message to the controller like "I'm node 23. I'm a temperature node and I'm currently running version 5 of the temperature node sketch which has a CRC of 0xABCD"

The controller sends something back like "You should be running version 6 of the temperature node sketch with CRC 0xFEDC"

So it's truly the controller (the central authority) that decides. At this point what I've done in the NodeJsController (which really is pretty dumb and only meant for testing) is that I did not care about the nodeID but only submitted a response based on the latest version available in the database for the given node type. You could obviously maintain a list of "expected sketches / sketch versions" for each nodeID and drive the decision on what the controller sends back based on that list instead of the node type only.

It really does exactly what you want it to do - the "pull" truly is a "pull for information if the central authority wants me to update". The big benefit of this "pull" setup is that the controller is stateless and just answers each request coming from the node making the code way cleaner and the overall setup way more reliable.

Zeph · 23 Aug 2014, 21:19

@ToSa said:

The bootloader sends a message to the controller like "I'm node 23. I'm a temperature node and I'm currently running version 5 of the temperature node sketch which has a CRC of 0xABCD"

But that's not quite the right info. What it needs to say is "I'm running version 5 of the 18B20 temp sensor on pin 7" sketch, because the temperature node running with a DHT-22, or even an 18B20 on pin 8, needs to use different code.

Or "I'm running version 52 of the 18B20 temp sensor on pin 7 and power blind relays on pins 5 and 6 and an IR detector on pin 12"

So I'm suggesting that the node say "I'm node 23, my PROGMEM has CRC 0xABCD, do you want me to load anything different". The rest is up to the server.

The bootloading code does not need to know what "type" the node is, only a signature of the PROGMEM. The server can then decide what code that specific node should be running instead, if any. Concepts like sensor types or node types or even sequences of versions are irrelevant to bootloading as seen from the node end.

At the server end, it has a table that says "node 23 should be running XYZZY.hex which has a signature of 0xAC3E". If that's not what it's doing, then at a time of the server's choosing, it can tell node 23 to update itself and send the appropriate program bytes. (At this point, the actual transfer of bytes from server to the node's PROGMEM, your current approach is fine, I'm talking about a higher level of the protocol or architecture).
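
The server-side table described here could be as small as the following sketch. The entry for node 23 uses the file name and signature from the post; the second entry, the struct layout and the function names are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* One line of the hypothetical per-node configuration. */
typedef struct {
    uint8_t     node_id;
    const char *hex_file;       /* firmware image this node should be running */
    uint16_t    expected_crc;   /* signature of that image */
} NodeTarget;

static const NodeTarget TARGETS[] = {
    { 23, "XYZZY.hex", 0xAC3E },   /* example taken from the post */
    { 24, "other.hex", 0x1234 },   /* made-up second entry */
};

/* Returns the configured target for a node, or NULL if the node is unknown. */
const NodeTarget *find_target(uint8_t node_id)
{
    for (size_t i = 0; i < sizeof TARGETS / sizeof TARGETS[0]; i++) {
        if (TARGETS[i].node_id == node_id) {
            return &TARGETS[i];
        }
    }
    return NULL;
}

/* The node only reports its id and the CRC of its PROGMEM; the server decides. */
bool node_needs_update(uint8_t node_id, uint16_t reported_crc)
{
    const NodeTarget *target = find_target(node_id);
    return target != NULL && target->expected_crc != reported_crc;
}
```

Unknown nodes are left alone in this sketch; a real controller might instead log them or assign a default image.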

ToSa · 23 Aug 2014, 21:52

@Zeph: yep - that means in your case you don't really care about the node type. In other scenarios where you have 60 nodes installed and 20 of them are relay nodes, another 20 are switch detectors and another 20 are temperature sensors (all of them having the same hardware setup) the node type is pretty useful. For your specific need you probably should not care about the node type at all - maybe set node type == nodeID and that's it. The additional 2byte payload should not matter too much.

The ideal setup from my perspective would look like this (dreaming): based on the information shared back (combination of sensors and pin connections) the controller would reassemble the source code and build a new sketch for the given configuration, compile it and send it (I'm not kidding - I worked on a very similar approach a few years back).

Reality is: this is meant to be a bootloader for MySensors. The way MySensors currently works is that the combination you mentioned (18B20 temp sensor on pin 7 and power blind relays on pins 5 and 6 and an IR detector on pin 12) requires a specific sketch to be loaded that has these pin assignments etc. hard-coded.

This is the piece of code you would want to adjust - at this point it pulls all available firmware records for the given type and sorts descending by version - which delivers the highest available version back as the first record:

db.collection('firmware', function(err, c) {
	c.findOne({
		$query: {
			'type': fwtype
		},
		$orderby: {
			'version': -1
		}
	}, function(err, result) {

Instead the "expected firmware" type and version could be an attribute for the given node in the "node" collection which is manually maintained:

db.collection('node', function(err, c) {
	c.findOne({
		'id': destination
	}, function(err, noderesult) {
		db.collection('firmware', function(err, c) {
			c.findOne({
				'type': noderesult.expected_firmware_type,
				'version': noderesult.expected_firmware_version
			}, function(err, result) {

Damme · 23 Aug 2014, 21:56

@ToSa Still I wonder if there is any OTA bootloader / protocol readme (So I dont have to dissect the nodejs code to write my own implementation)

ToSa · 23 Aug 2014, 22:15

@Damme look at NodeJsController/Readme.html - actually for now better look at this version which has a couple of updates (will send another pull request tomorrow for the documentation as well as some minor changes).

If you are looking for tech documentation (protocol etc.) that's not yet included but the communication is fairly easy (complexity is mainly to make it robust - not kill a node if something goes wrong etc.):

the bootloader is using the same procedure to find its parent / request a nodeID etc. as a normal MySensors sketch would do
then a config request / config response is exchanged between node and controller
assuming an update is needed a series of code block requests / responses is executed until the full firmware is submitted

Data is submitted as binary - you can see the message payload details in MyOtaBootloader.h:

typedef struct
{
	uint16_t type;
	uint16_t version;
} FirmwareConfigRequest;

typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t blocks;
	uint16_t crc;
} FirmwareConfigResponse;

typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t block;
} FirmwareRequest;

typedef struct
{
	uint16_t type;
	uint16_t version;
	uint16_t block;
	uint8_t data[FIRMWARE_BLOCK_SIZE];
} FirmwareResponse;
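
For anyone writing their own controller, a rough sketch of two things that follow from these structs: how many blocks an image needs, and how a `FirmwareRequest` might look as raw bytes. The block size of 16 and the little-endian byte order are assumptions (the AVR is little-endian, so a struct sent as-is would come out that way), and the helper names are hypothetical; check `MyOtaBootloader.h` for the real values.

```c
#include <stdint.h>

#define FIRMWARE_BLOCK_SIZE 16   /* assumed value; the thread does not state it */

typedef struct {
    uint16_t type;
    uint16_t version;
    uint16_t block;
} FirmwareRequest;

/* Number of blocks needed for an image, rounding the last partial block up. */
uint16_t firmware_block_count(uint32_t image_bytes)
{
    return (uint16_t)((image_bytes + FIRMWARE_BLOCK_SIZE - 1) / FIRMWARE_BLOCK_SIZE);
}

/* Store a 16-bit value least significant byte first. */
static void put_u16_le(uint8_t *out, uint16_t value)
{
    out[0] = (uint8_t)(value & 0xFF);
    out[1] = (uint8_t)(value >> 8);
}

/* Serialise field by field so host byte order and struct padding do not matter. */
void firmware_request_pack(const FirmwareRequest *req, uint8_t out[6])
{
    put_u16_le(out + 0, req->type);
    put_u16_le(out + 2, req->version);
    put_u16_le(out + 4, req->block);
}
```

With a 16-byte block, a 1000-byte image would take 63 requests, the last block being only half used.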

Zeph · 23 Aug 2014, 23:12

@ToSa said:

yep - that means in your case you don't really care about the node type. In other scenarios where you have 60 nodes installed and 20 of them are relay nodes, another 20 are switch detectors and another 20 are temperature sensors (all of them having the same hardware setup) the node type is pretty useful

Suppose you do have 20 identical temperature nodes. It's trivially simple for the server to tell each one of them to update to the same code in the "push by node" model. Not only that, but the server gets to decide when to allocate the bandwidth for each node.

Unfortunately, in the "pull by node type" model, you have no way to update some nodes of the given type and not other nodes of that type.

The "push by node" model easily handles any case the "pull by node type" model does, but the opposite is not true.

To even approach the "push by node" dynamics with "pull by node type" design, you have to have two concepts of "node type" which must not be conflated.

node type for purposes of the user interface
node type for purposes of updating the code in the ATMega328p

When you say "20 temperature nodes" the concept of "node type" would be meaningful in the first sense if you mean "20 nodes containing only a temperature sensor for the HA Controller to display".

But for updating the PROGMEM, the concept of "node type" needs to be "20 nodes containing only a temperature sensor of type DHT-11 on pin 6".

A node with a DHT-22 or 18b20 on pin 6, or a node with a DHT-11 on pin 5, would be the same "node type" for purposes of the user interface (which doesn't care), but different "node types" for purposes of updating PROGMEM.

Once you start considering node type = node id in some cases, it becomes simpler to just ignore the already messy and problematic "node type for purpose of update" concept (as seen by the node) and just do updates per node, period. If you want it, you get the functionality of "update all nodes that use identical code" essentially for free at the server end with the push by node model anyway, PLUS the ability to update individual nodes of any type to run any code you want, whenever the server wants to schedule it. I don't see the downside of push-by-node here.

At worst, the server could have a table of node-id to "node type" for lookup and then follow your same dynamics. That's not how I'd do it (this model allows even simpler and more flexible options), but it would be a tiny "shim" to allow the more flexible "push by node" model to emulate the "pull by node type" dynamics if a given implementer so desired.

(Just by the way, this discussion is for me fun and mutually respectful brainstorming, I hope it lands that way).

ToSa · 23 Aug 2014, 23:54

@Zeph not sure what you are asking for, as I mentioned above that you can use the bootloader as it is today to just update specific nodes (by nodeID, update one and not update another even if they have the same node type). The implementation is not a "pull by node type" but a "pull by node ID, node type, node version" - which information the controller uses to decide if an update should be executed is up to you!!!

Terminology: the "node type" I'm referring to means the specific setup of the hardware - only if that's the same then the node type would be the same (combination of sensors / pin connections / same sketch to be used). The back-end cares about a node type because it needs to know which sketch to use/send.
The user interface ideally never cares about a node type but really cares about the specific sensor type(s). This "translation" needs to happen in the background no matter if you use an OTA bootloader or not.

Examples:

Let's assume you have two nodes in the living room - the user interface should just show "living room temperature" no matter if the temperature sensor is connected to node 1 together with the light switch or connected to node 2 together with the blinds. This "translation" needs to happen anyways - ideally in the controller.
Let's assume you have two temperature sensors connected to one node - one measures the room temperature at 1.5m height and one is a floor temperature (not unusual for floor heating). Just knowing that there are two temperature sensors but not knowing which one is which will not be sufficient for the heating controller to make the correct adjustments. Again that translation from "node 23 with one XYZ temp sensor on pin 5 and one XYZ temp sensor on pin 7" to "node 23 temp sensor at pin 5 is the floor temperature" needs to happen anyways.

Zeph · 24 Aug 2014, 00:32

OK, maybe we are converging in some ways. I'll try to list the similarities as well as differences.

So we agree that a "node type" doesn't mean a generic "temperature node" but a very specific "my sketch for controlling an 18b20 on pin 7".

Let's suppose the sketch was called "DevDuino_18b20_7.ino" (for the moment let's leave out auto-scripting).
This compiled into DevDuino_18b20_7.hex and the binary equivalent.

We would know that nodes 7, 12, and 15 should have the latest version of this sketch. (node 4 might also measure temperature but with different hardware or pin configuration, so it would not use this sketch).

If we just want to update the sketch, we would send copies of the new binary to nodes 7, 12, and 15.

So far I think we are on nearly the same page, at the generic level described. Where the "push by node" and "pull by node type" models differ is in where the knowledge that nodes 7, 12, and 15 run the same code resides. In push-by-node, the Server knows that it should send the same code to them; in the "pull by node type" model, those three nodes themselves know they want updates for a given numeric "node type".

Also, each node knows it has a given version of the firmware of its node type, and decides when to upgrade by comparing that with what the server offers.

The differences are highlighted better when we make more than a version update.

Suppose we decide to make use of pin 2 of node 7 to control an LED. At this point we need to load different firmware into node 7, and we write a sketch called "DevDuino_18b20_7_LED_2.ino". (forgive the naming, it's an example). So now we want to change the overall system configuration so that the server will load DevDuino_18b20_7_LED_2.hex into node 7. (Nodes 12 and 15 still have the other sketch without LED control and maybe always will, the new sketch is not a new version of the old sketch)

I am suggesting that all you have to do is create the new sketch (or rather its hex or binary compiled form) and configure the server to send that to node 7 instead of the previous sketch. It doesn't matter one bit to the node whether it's switching to a completely new sketch versus a new version of the current sketch. That's in the server logic only, not OTA.

I think you are saying that your node.js server can accommodate changing what sketch (not just version) runs in each node, because it can load arbitrarily different (or identical) binary files to each node by node id, with no limitations based on what "type" that node used to be, right? Or not?

In my thinking, the node has no need to know that it's running "node type 453, version 16".

The server is free to conceptually organize firmwares by "node type" and "version" if it wishes, but those concepts do not need to be pushed down to the node level.

Pull by node type:

 Node: I'm type 453, what is the latest version of 453?
 Server: Queries database for max version of node type 453, and says "Latest 453 version is 17"
 Node: Checks that it has version 16, asks server to send 453 ver 17 for OTA programming
 Server: sends version requested by node
 (programming done)
 Node: asks server for latest version for node type 453
 Server: ... 17
 Node: I'm version 17, no change needed

Push by node:

Node: I'm node 7 and my PROGMEM signature is 0x54FE
Server: My config says node 7 should have the latest "DevDuino_18b20_7_LED_2.hex" with sig 0x3EE5
Server: Please load the following binary into your PROGMEM (sends appropriate version)
(programming done)
Node: I'm node 7 and my PROGMEM sig is 0x3EE5
Server: Mark that one as updated

Notice that there's no problem of forgetting to update the version number - if the signature (eg: CRC or hash) in PROGMEM isn't what the server wants there, then it starts an OTA programming session, period. Even if the sig was wrong because the programming had a glitch rather than because it is out of date, the server knows it's wrong and sends again.

In the push-by-node model, the system is not limited to "updating all nodes with node type 453 versions <= 16 to version 17" -- it can load an entirely different sketch (node type 763 version 0 if the server thinks in those terms) into the node if it wishes. And the node doesn't care, it doesn't need to know what "node type" it is or compare versions (that's in the logic of the server), all the node needs to know is that the server wants it to load some code into PROGMEM, period.

That's how I imagine things working. As I have understood you, you are pushing the concept of "node type" and "version number" and the comparison of version numbers down to the node itself, rather than letting the server handle that (if it chooses). I don't see the advantage of that; the server seems a more logical place for that information - both simpler and more flexible.

Your node.js implementation might organize source code for nodes by numeric "node type" (where your node type 7 isn't the same as my node type 7) and by version within node type. That would still be supported by "push by node" model.

But another server might choose to organize firmwares by string filename plus signature (e.g. CRC). It's configured with a simple table of node_id,filename. It computes the signature of the binary file to be compared with what the node reports it has in PROGMEM. There's no numeric "node type" or "version number" needed. (If you really want to also keep obsolete versions of the binary firmware on the server, there are easy workarounds for that too). The config is dead simple: (NodeID, Filename)* Or optionally (nodelist, Filename)* if you want to reduce the number of times you spell out the filename. Use the latest (or only) copy of the given filename.
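
As a rough sketch of that server-side idea: a (NodeID, Filename) table, a signature computed over the firmware binary, and a push whenever the node reports something different. All names here (NodeAssignment, firmware_for_node, needs_push) are hypothetical, and CRC-16/CCITT-FALSE is only a stand-in signature; the thread does not say which CRC the bootloader uses.

```c
#include <stdint.h>
#include <stddef.h>

/* CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF), used here only as an
 * example signature over the firmware binary. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* One row of the server config: which file a given node should run. */
typedef struct {
    uint8_t node_id;
    const char *filename;
} NodeAssignment;

/* Look up the firmware file assigned to a node; NULL if the node is unknown. */
static const char *firmware_for_node(const NodeAssignment *table, size_t n,
                                     uint8_t node_id)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].node_id == node_id)
            return table[i].filename;
    return NULL;
}

/* Push decision: reprogram whenever the signature the node reports differs
 * from the signature of the image the server wants it to run. */
static int needs_push(uint16_t reported_sig, const uint8_t *wanted_image, size_t len)
{
    return crc16_ccitt(wanted_image, len) != reported_sig;
}
```

With this shape the node never sees a type or version number; switching node 7 to a different sketch is a one-line change in the table.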

The cool thing about the "push by node" approach is that the same node bootloader can easily accommodate both server approaches (node type # + version # OR nodeID->filename) - since the only concepts the node uses are

"I can tell you the signature of what I have in PROGMEM now", and
"If you tell me to I'll load something of your choice".

For OTA bootloading the node doesn't need to know or care about "node type" numbers or versions, nor about file names or date stamps.

So I'm not trying to eliminate your concept of the server assigning numeric IDs to each combination of sensors and pins, and using ordered version numbers within each "node type". I just don't see why those concepts need to also be pushed down into the node and OTA bootloader protocol. With the "push by node" model, there's more flexibility to organize changing firmwares in the server as you prefer OR in other ways, with no meaningful cost, because the node end of the OTA programming system has been distilled to just the essence that it really needs to understand, leaving higher levels of management as a server-internal affair.

Damme · 28 Aug 2014, 11:57

@ToSa I've been looking through the OTA bootloader and noticed there are a lot of uint16_t which can be replaced with uint8_t. That saves 128 bytes of code. It still needs to shrink by ~900 bytes to fit a 1024-word bootloader, but it makes more space for other stuff.

JeJ · 28 Aug 2014, 17:05

When I try to run the NodeJsController.js script I always end up with "Error: Cannot open /dev/ttyAMA0"

I'm running a RPi with a serial gateway.

Any ideas?

mikeones · 28 Aug 2014, 17:10

On my RPi, the serial gateway is detected as /dev/ttyUSB0.

Damme · 28 Aug 2014, 18:45

@ToSa I've been working on getting OTA to work with MQTTgateway with some success.

But I do have a problem with some packages going missing, and I think the communication should be something like this:
bootloader checks id and version and server says there is an update. (no change from today)
but then:

[bootloader] 0000 has CHK FF(just filler in first package) REQ 0000 type 01 version 01
[server] load 0000 from hex, send addr 0000 0C9428030C9447240C9474240C947605 C7
[bootloader] 0000 has CHK FF, REQ 0010 type 01 version 01
[server] (checksum mismatch) send addr 0000 0C9428030C9447240C9474240C947605 C7
[bootloader] 0000 has CHK C7, REQ 0010 type 01 version 01
[server] load 0010 from hex send addr 0010 0C94A3050C94D0050C9480100C945003 00
And so on..

What do you think about this? The total package is 32 bytes, the mysensors header is 7 bytes, and this layout would need 19 bytes from server to bootloader.
I have also seen some Intel hex that is not in order (0010 0020 0030 etc.) but jumps between addresses. I do not think the Arduino IDE does this, but you never know.

EDIT:
I haven't read this one yet but I guess there is a lot of good stuff in it
http://www.nordicsemi.com/eng/nordic/download_resource/10878/2/94069421

ToSa · 28 Aug 2014, 20:46

@JeJ @mikeones: how is your serial gateway connected? Using a USB-RS232 cable or via the GPIO pins on the RPi? Did you check the Readme.md in the NodeJsController directory?

Zeph · 28 Aug 2014, 20:49

@Damme said:

what do you think about this? the total package is 32bytes, mysensors header is 7bytes. and this layout would need 19 bytes from server to bootloader..

16 bit offset, 16 data bytes, one byte checksum, right?

I have also seen some Intel hex that is not in order (0010 0020 0030 etc.) but jumps between addresses. I do not think the Arduino IDE does this, but you never know.

I see that your descriptions say "0010 from hex" etc, but I thought you would be fetching from a binary blob to satisfy requests from the bootloader. As in:

Server reads the Intel hex and uses it to fill in an array of bytes. (one time, or each time a given file is requested)
Server sends requested 16 byte chunks of that array to bootloader

In that case, it doesn't matter what order the original hex lines are in, or even if they are 16 or 32 bytes wide (or less than 16 bytes at the end).
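
A minimal sketch of that approach, assuming a plain 64 KiB address space (only record types 00 and 01 matter, extended-address records are ignored) and an image buffer pre-filled with 0xFF. The function names are hypothetical. Each record is checksum-verified and copied to its own address, so the order and width of the lines stop mattering.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode two hex digits; -1 on any non-hex character (including end of string). */
static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    if (hi < 0) return -1;
    int lo = hex_nibble(s[1]);
    if (lo < 0) return -1;
    return hi * 16 + lo;
}

/* Parse one Intel HEX line (":LLAAAATT<data>CC") into image.
 * Returns 1 for a data record, 0 for any other valid record (e.g. EOF),
 * -1 for a malformed line, bad checksum, or data outside the image. */
static int load_hex_record(const char *line, uint8_t *image, size_t image_size)
{
    uint8_t raw[255 + 5];   /* length, addr hi, addr lo, type, data, checksum */
    uint8_t sum = 0;

    if (line[0] != ':') return -1;
    int len = hex_byte(line + 1);
    if (len < 0) return -1;

    size_t total = (size_t)len + 5;
    for (size_t i = 0; i < total; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0) return -1;
        raw[i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
    }
    if (sum != 0) return -1;            /* all bytes incl. checksum sum to 0 */
    if (raw[3] != 0x00) return 0;       /* only type 00 carries data */

    size_t addr = ((size_t)raw[1] << 8) | raw[2];
    if (addr + (size_t)len > image_size) return -1;
    memcpy(image + addr, raw + 4, (size_t)len);
    return 1;
}

/* Serve a 16-byte chunk from the array, whatever the hex layout was. */
static const uint8_t *block_at(const uint8_t *image, uint16_t block)
{
    return image + (size_t)block * 16;
}
```

The server would call load_hex_record once per line of the file and then answer every block request from the array via block_at.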

Damme · 28 Aug 2014, 20:54

@Zeph true (array) and Yes, so the node can request same address twice (might be a timeout) and verify checksum on every 16byte data.
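
To make the 19-byte layout concrete, here is a hedged sketch: 2 bytes of offset, 16 data bytes, 1 checksum byte. The C7 and 00 values in the example exchange appear to be the Intel HEX record checksums of those two lines, so the sketch reuses that formula. The byte order of the offset, the function names and the two size constants are assumptions for illustration, not part of any existing MySensors code.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_DATA        16
#define BLOCK_WIRE_SIZE   (2 + BLOCK_DATA + 1)  /* offset + data + checksum = 19 */
#define MYSENSORS_HEADER  7                     /* header size quoted in the thread */
#define NRF24_MAX_PAYLOAD 32                    /* total package size quoted in the thread */

/* Intel HEX style checksum of a type-00 record: two's complement of the
 * sum of length, address bytes, record type (0) and data bytes. */
static uint8_t ihex_checksum(uint16_t addr, const uint8_t *data, uint8_t len)
{
    uint8_t sum = (uint8_t)(len + (addr >> 8) + (addr & 0xFF));
    for (uint8_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + data[i]);
    return (uint8_t)(0u - sum);
}

/* Serialize one block; the offset is sent high byte first (an assumption). */
static void pack_block(uint8_t out[BLOCK_WIRE_SIZE], uint16_t addr,
                       const uint8_t data[BLOCK_DATA])
{
    out[0] = (uint8_t)(addr >> 8);
    out[1] = (uint8_t)(addr & 0xFF);
    memcpy(out + 2, data, BLOCK_DATA);
    out[2 + BLOCK_DATA] = ihex_checksum(addr, data, BLOCK_DATA);
}

/* Returns 1 and fills addr/data if the checksum matches, 0 otherwise, so the
 * node can simply request the same address again. */
static int unpack_block(const uint8_t in[BLOCK_WIRE_SIZE], uint16_t *addr,
                        uint8_t data[BLOCK_DATA])
{
    uint16_t a = (uint16_t)(((uint16_t)in[0] << 8) | in[1]);
    if (ihex_checksum(a, in + 2, BLOCK_DATA) != in[2 + BLOCK_DATA])
        return 0;
    *addr = a;
    memcpy(data, in + 2, BLOCK_DATA);
    return 1;
}
```

Since 7 + 19 = 26 bytes, the layout leaves 6 bytes of headroom in a 32-byte package.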

ToSa · 28 Aug 2014, 21:09

@Damme said:

@ToSa I've been working on getting OTA to work with MQTTgateway with some success.

great!

[bootloader] 0000 has CHK FF(just filler in first package) REQ 0000 type 01 version 01
[server] load 0000 from hex, send addr 0000 0C9428030C9447240C9474240C947605 C7
[bootloader] 0000 has CHK FF, REQ 0010 type 01 version 01
[server] (checksum mismatch) send addr 0000 0C9428030C9447240C9474240C947605 C7
[bootloader] 0000 has CHK C7, REQ 0010 type 01 version 01
[server] load 0010 from hex send addr 0010 0C94A3050C94D0050C9480100C945003 00
And so on..

what do you think about this? the total package is 32bytes, mysensors header is 7bytes. and this layout would need 19 bytes from server to bootloader..

If I understand correctly you would send the CRC of the previous block back to the server together with the request for the next package - the server would then send the next block if the CRC is correct or resend the previous block if the CRC is not ok...
I'm wondering how the bootloader would ever run into that situation. The package itself is already checksummed and wouldn't be treated as a correctly received package if the checksum is incorrect. The next block is only requested if the previous block was received correctly. If that doesn't happen within a given amount of time, the same block is requested again.
Can you explain a bit further what issues you are running into?

@Damme said:

I have also seen some Intel hex that is not in order (0010 0020 0030 etc.) but jumps between addresses. I do not think the Arduino IDE does this, but you never know.

Yes, I've seen that as well - actually Codebender does that sometimes and the "HEX file loader" function on the NodeJsController would need to be adjusted to address that (I'm not reading from the .hex file when a package arrives but from a byte array in Mongo)
Probably the best approach for this one would be to add the next address the bootloader should request to the previous package (e.g. "here is the data for 0x0000 - next request should be for 0x0010") and allow for variable block length, because that's another one where Codebender sometimes uses <16 byte rows in the hex (not just at the end). The code on the bootloader side would get a little more complex as the flash is still written in pages and these might not line up anymore with the 16-byte blocks.
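
One way to picture the page alignment issue is a small page buffer that accepts chunks of any length at any address and flushes whenever a chunk crosses into the next flash page. This is only a sketch: the 128-byte page size is the ATmega328P value, the write callback stands in for the real flash programming routine, and PageWriter, pw_put, PageLog and friends are hypothetical names. It also assumes chunks arrive in ascending address order.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 128u   /* ATmega328P flash page: 64 words */

/* Stand-in for the routine that erases and programs one flash page. */
typedef void (*page_write_fn)(uint16_t page_addr, const uint8_t *page, void *ctx);

typedef struct {
    uint8_t buf[PAGE_SIZE];
    uint16_t page_addr;     /* start address of the page held in buf */
    int dirty;              /* buf holds bytes not yet written */
    page_write_fn write;
    void *ctx;
} PageWriter;

static void pw_init(PageWriter *pw, page_write_fn write, void *ctx)
{
    memset(pw->buf, 0xFF, PAGE_SIZE);   /* erased flash reads as 0xFF */
    pw->page_addr = 0;
    pw->dirty = 0;
    pw->write = write;
    pw->ctx = ctx;
}

/* Write out the buffered page, if any, and start over with an erased buffer. */
static void pw_flush(PageWriter *pw)
{
    if (pw->dirty) {
        pw->write(pw->page_addr, pw->buf, pw->ctx);
        memset(pw->buf, 0xFF, PAGE_SIZE);
        pw->dirty = 0;
    }
}

/* Accept a chunk of any length; flush whenever it moves on to another page. */
static void pw_put(PageWriter *pw, uint16_t addr, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint16_t a = (uint16_t)(addr + i);
        uint16_t page = (uint16_t)(a - a % PAGE_SIZE);
        if (pw->dirty && page != pw->page_addr)
            pw_flush(pw);
        pw->page_addr = page;
        pw->buf[a % PAGE_SIZE] = data[i];
        pw->dirty = 1;
    }
}

/* Test double that records what would have been programmed. */
typedef struct { int count; uint16_t last_addr; uint8_t last_first_byte; } PageLog;

static void log_page(uint16_t page_addr, const uint8_t *page, void *ctx)
{
    PageLog *log = ctx;
    log->count++;
    log->last_addr = page_addr;
    log->last_first_byte = page[0];
}
```

After the last chunk the caller still has to call pw_flush once, otherwise a partially filled final page never gets written.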

@Damme said:

I haven't read this one yet but I guess there is a lot of good stuff in it
http://www.nordicsemi.com/eng/nordic/download_resource/10878/2/94069421

That chip is a totally different animal - same RF but 8051 MCU. I'll have a look at the AppNote tomorrow.

ToSa · 28 Aug 2014, 22:00

@Damme said:

@ToSa I've been looking through the OTA bootloader and noticed there are a lot of uint16_t which can be replaced with uint8_t. That saves 128 bytes of code. It still needs to shrink by ~900 bytes to fit a 1024-word bootloader, but it makes more space for other stuff.

I'll have a look. I've taken the code from an earlier project and adjusted to MySensors - didn't review the variable types that much. I'm using CRC16 as well where CRC8 might be sufficient...

EDIT: got it down a little from 0x0E18 to 0x0DD0 (72 bytes) by changing a few loop counters from uint16 to uint8. I don't want to change type to 8bit, looking at the large number of sensors people are asking for / working on. For version, I'm planning to keep some of these running for a long time with as little maintenance as possible. With some software improvements over time and minor version changes during development 16bit for version seems to be the better fit as well.

mikeones · 28 Aug 2014, 23:37

@ToSa I use a Mini-B USB cable between my RPi and my gateway.

ToSa · 29 Aug 2014, 07:04

@mikeones said:

/dev/ttyUSB0

Then /dev/ttyUSB0 is correct. /dev/ttyAMA0 would only be valid for the on-board serial port on the GPIO pinheads.
Are you running into any issues once you set the port in NodeJsGateway accordingly?

JeJ · 29 Aug 2014, 09:42

@ToSa I have my gateway connected via the GPIO and I have followed the steps in the Readme.md.
I will try to use a USB-RS232 cable and see what happens.

ToSa · 29 Aug 2014, 09:50

@JeJ one potential reason is that the port is already in use. I mentioned somewhere that the startup script doesn't yet stop the NodeJsController correctly. Maybe you already have a NodeJsController process running? Try "sudo killall node" and then try starting it again. To check if the port itself is working you can try to open a simple terminal (minicom etc.) and reset the gateway.

Zeph · 30 Aug 2014, 04:02

@ToSa said:

With some software improvements over time and minor version changes during development 16bit for version seems to be the better fit as well.

Hmm. That seems like overkill, if I'm understanding correctly. (So maybe I am not understanding).

What I heard was:

Each sensactuator node has a "node type" and a "version" within that node type. Each combination of sensors and pin assignments has a unique "node type" (within a given wireless network). A node can only be OTA updated to a newer (higher) version of the same "node type" of the current firmware, and all nodes of that "node type" will be updated.

An extra byte for "version" isn't a big deal tho.

Will there be one or two bytes for "node type"?

Damme · 30 Aug 2014, 04:19

@Zeph 16bit calculations on an 8bit MCU will always come at a price. IMO we should try to keep things to 8bit as much as possible, but I don't know if it's possible to shave another 900 bytes off the bootloader to fit in the next smaller boot section (1024 words instead of 2048 words). It might be if we make a mini version of mysensors/mymessage.

ToSa · 30 Aug 2014, 10:41

@Zeph said:

Each combination of sensors and pin assignments has a unique "node type" (within a given wireless network).

Actually that's part of the question - as @hek mentioned there is a desire to sell MySensors hardware - at some point there might be not just generic pinhead PCBs but real fit-for-use devices. Ideally these would have a unique node type assigned, not just within a given network. New firmware could be published on mysensors.org (or via codebender or...) and, based on the unique (but common across networks) node type, less tech-savvy people could be protected from sending a firmware that doesn't fit the hardware... I know - a LOT of "IF"s...

@Damme
you are right - probably not the full 900 bytes but additional space could be used for encryption etc. so every reduced byte is beneficial at this point. I'll check later how much can be saved by using CRC8 instead of CRC16.
I'm already using a mini version of mysensors / mymessage: not using the cpp code files at all but just the headers and if you have a look at the "#ifdef __cplusplus" statements just added for that purpose, there is almost nothing left (the MyMessage class is stripped down to a struct and the MySensors class removed completely / enums and #defines should not consume space after compilation)

Zeph · 30 Aug 2014, 17:12

@ToSa

I'm realizing how similar the implementations of your model of updates and mine might be. This is just an early inspiration, not fully thought out.

uint8_t   node_type_id;  // same for multiple nodes
uint16_t version;   // loaded version for given node_type_id
... 
if(new_version > version) {  // test for OTA update needed

versus

uint8_t   node_id;   // unique per node
uint16_t  progmem_crc;  // calculated from PROGMEM
... 
if(new_progmem_crc != progmem_crc) { // test for OTA update needed

This might mean that I could (eventually) use a relatively minor fork of the OTA programming code to get the per-node flexibility that I seek.

ToSa · 30 Aug 2014, 17:55

@Zeph
yes, that's what I meant - you might not even need any fork of the bootloader itself and just a slight adjustment on the controller end - because the nodeID is contained in the packet (not in the payload but in the header as sender address) so you have all you need for your setup

Zeph · 30 Aug 2014, 18:14

@ToSa
The other half is testing inequality between the computed CRC of the application firmware in PROGMEM, with the CRC of the available replacement (rather than comparing for higher version number).

An example use case of the ability to load arbitrary new code into any given node: if I was diagnosing some kind of interference, I might temporarily replace the sensor firmware in some nodes (of varying node-type) with a custom radio test firmware, then later restore each with its original sensor node firmware.

Suppose we have:

 node 5, node type 17, version 2, PROGMEM CRC 0x4567  // attic
 node 6, node type 3, version 5, PROGMEM CRC 0xABCD  // crawlspace
 node 7, node type 3, version 5, PROGMEM CRC 0xABCD // living room

And I want to temporarily replace the firmware in node 5 and 6, but keep 7 still running as a sensor.

I make RF test code available on the server, with CRC 0x7E57. This is not type 17 or type 3.

I edit the server's table of firmware assignments:

node 5, 0x7E57
node 6, 0x7E57
node 7, 0xABCD  // unchanged

This causes node 5 and 6 (formerly of different types) to load the test firmware when reload is triggered.

Then when testing is done, I edit the table back:

node 5, 0x4567   // back to its old type and version
node 6, 0xABCD  // back to the same type and version as node 7
node 7, 0xABCD  // still unaffected

This causes the normal sensor firmware (type and version) to be loaded back in on the next reload.

There could be more than just a CRC to identify the firmware (in order to avoid the birthday paradox), this is just an example.

An alternate use case is loading in my Halloween firmware to the front yard nodes (but not other nodes) for a week or two, then back..

Or a beta version of type 3, version 6, which I'd like to load on some type 3 nodes for in-situ testing (eg: in the crawlspace), but not all of the type 3 nodes because I want most of the system to continue functioning normally while I test. If the beta is bad, I may revert the test nodes to version 5; once the new version is good, I may convert all type 3 nodes to version 6.

These are some of the reasons I'd like to be able to use OTA programming of any arbitrary firmware into any given node, without being constrained to:

  Only upgrades of the same node type
  Only upgrades to higher version numbers
  Only upgrades of all nodes of the same type or none

And so that's why inequality testing of the PROGMEM signature on a per-node basis is attractive, not just testing for a higher version number. For similar complexity, we can upgrade to a higher version number, downgrade to a different version number, or change the node type back and forth.

The type and version dynamics (which certainly IS a common use case) can be handled on the server. For example, the server can know what type every node is (kind of a good idea anyway), and can change the node -> signature entry for every node of type 3 to the signature of the next version, and then let it proceed as above to get them all updated. But that's just one option, centrally controlled.

ToSa · 30 Aug 2014, 19:10

@Zeph
from the MyOtaBootloader.c:

if (firmwareConfigResponse->version == fc.version)
	if (firmwareConfigResponse->blocks == fc.blocks)
		if (firmwareConfigResponse->crc == fc.crc)

so as long as you send the same version / blocks / crc back to the node as what is currently installed, no update is started. As soon as one of the three elements differs, an update is loaded. It's entirely up to the server if (and which) firmware is bootloaded.
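
Put as a standalone function, the check reads roughly like this. It is a sketch for illustration only: NodeFirmwareConfig and update_needed are hypothetical names, and the node-side struct just mirrors the fields the snippet above reads from fc.

```c
#include <stdint.h>

typedef struct {
    uint16_t type;
    uint16_t version;
    uint16_t blocks;
    uint16_t crc;
} FirmwareConfigResponse;

/* Hypothetical record of what the node currently has installed. */
typedef struct {
    uint16_t type;
    uint16_t version;
    uint16_t blocks;
    uint16_t crc;
} NodeFirmwareConfig;

/* An update starts as soon as version, block count or CRC differ.
 * The test is for inequality, so a lower version number (a downgrade)
 * triggers an update just like a higher one. */
static int update_needed(const FirmwareConfigResponse *res,
                         const NodeFirmwareConfig *fc)
{
    return !(res->version == fc->version &&
             res->blocks == fc->blocks &&
             res->crc == fc->crc);
}
```

Because the server picks the response per request, it can hand node 6 a different version / blocks / crc triple than node 7 even if both run the same firmware today.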

Zeph · 30 Aug 2014, 21:05

@ToSa

OK, so version is tested for != rather than for > ? Downgrading is OK?

And CRC is used as well (and block count?) where CRC is based on what's in PROGMEM now?

Cool.

Then I think all that would be needed is for the server to be able to potentially feed back a different firmwareConfigResponse to each node. In my above example (which has been edited for clarity recently BTW, so re-read it), node 6 could receive a different response than node 7 (even tho they both have the same type initially). And thus nodes 5 and 6 (but not 7) could be told to load the test firmware and then later to go back to the old version. Etc.

Is that correct?

It would be a nice enhancement if we could query the node for the CRC (and block count?) of the current PROGMEM, just to help the server stay in sync with what's out there (eg: after a node joins the network). That could be done in the application code, so we don't even have to invoke the bootloader. Then the server could figure out which nodes need to be bootloaded and trigger just those to go into the bootloader (possibly one at a time). These two together support what I call push dynamics.

Damme · 30 Aug 2014, 21:30

@Zeph I've been working on a read / write EEPROM address thing in MQTT to be able to reset a node and stuff. But it seems there are more uses for it then. This might be coded into mysensors instead (utilizing c_internal or something, as the protocol is today).

ToSa · 31 Aug 2014, 07:46

@Damme said:

@Zeph I've been working on a read / write EEPROM address thing in MQTT to be able to reset a node and stuff. But it seems there are more uses for it then. This might be coded into mysensors instead (utilizing c_internal or something, as the protocol is today).

Good idea - that would allow checking the current value in normal operation, not just during bootloading.

@Zeph
If you urgently want to have the CRC of the current firmware submitted during bootloading, we can add this as a third parameter to the FirmwareConfigRequest message. Actually I was thinking about getting rid of request/response and use the same format for both which would mean crc would be included anyways.

Damme · 2 Sept 2014, 10:54

This post is deleted!

Damme · 2 Sept 2014, 11:29

I deleted my last message because I thought I had made a big mistake.

I've been working on an SD <-> OTA loader node and got most of it working, but got stuck on the last piece, which is communication. (I'll release it when I'm finished. I've made a small change in myotabootloader: added msg.destination = OTAGATEWAY; on line ~156 to configure a custom OTA address.)

I can't figure the following out.
Just ignore the contents of the packages, they are not relevant.

Node: (Ota<->sd loader)

read: 34-0-254 s=255,c=4,t=0,pt=6,l=4:FFFFFFFF
send: 254-254-0-34 s=255,c=4,t=1,pt=8,l=4,st=ok:0100020000304200

GW:
0;0;3;0;9;read: 34-34-0 s=255,c=3,t=7,pt=0,l=0:
0;0;3;0;9;send: 0-0-34-34 s=255,c=3,t=8,pt=1,l=1,st=ok:0
0;0;3;0;9;read: 34-34-0 s=255,c=3,t=7,pt=0,l=0:
0;0;3;0;9;send: 0-0-34-34 s=255,c=3,t=8,pt=1,l=1,st=ok:0
0;0;3;0;9;read: 34-34-254 s=255,c=4,t=0,pt=6,l=4:FFFFFFFF
0;0;3;0;9;send: 34-0-254-254 s=255,c=4,t=0,pt=6,l=4,st=ok:FFFFFFFF
0;0;3;0;9;read: 254-254-34 s=255,c=4,t=1,pt=6,l=8:0100020000304200
0;0;3;0;9;send: 254-0-0-34 s=255,c=4,t=1,pt=6,l=8,st=fail:0100020000304200
OTA bootloader:
Go
<- 34,34,0,2,3,7,255,
<- 34,34,0,2,3,7,255,
-> 0,0,34,10,35,8,255,0,
<- 34,34,254,34,196,0,255,255,255,255,255,

What am I missing? The package from 254 to 34 won't get delivered.
I've also noticed that when 254 tries to send, it won't receive the next message transmitted by the OTA bootloader. The one after that is received.

ToSa · 2 Sept 2014, 11:40

@Damme
I need to better understand the setup to think about what's going on. My take from the above:

You have three nodes:

Gateway (address 0)
SD OTA Loader Node ?!? (address 254)
Sensor Node (address 34)

Is that right?

Damme · 2 Sept 2014, 11:46

@ToSa Yes, and I think I figured it out. By mistake I changed BROADCAST_ADDRESS to GATEWAY_ADDRESS in the bootloader when I was playing around. Testing the correct version now.

ToSa · 2 Sept 2014, 11:52

@Damme
interesting setup
to make it work with non-static addressed nodes you should probably keep the destination set to GATEWAY_ADDRESS for the REQUEST_ID call and only change afterwards.

Never mind - looking at the line number you mentioned that's probably what you did

Damme · 2 Sept 2014, 12:48

@ToSa Now I remember why I changed some things in there. (broadcast to gateway)

From the beginning I had a problem getting it to talk with the GW. It only sends out
<- 255,255,255,2,3,7,255, and gets no response. The GW tries to send but fails. (weird..) (I don't have any relay nodes)

This is with no modifications at all.

0;0;3;0;9;read: 255-255-255 s=255,c=3,t=7,pt=0,l=0:
0;0;3;0;9;send: 0-0-255-255 s=255,c=3,t=8,pt=1,l=1,st=fail:0
0;0;3;0;9;read: 255-255-255 s=255,c=3,t=7,pt=0,l=0:
0;0;3;0;9;send: 0-0-255-255 s=255,c=3,t=8,pt=1,l=1,st=fail:0
other packages send out works just fine.. (To other nodes)

and the OTA bootloader can receive other packages
Go
<- 255,255,255,2,3,7,255,
-> 23,23,0,42,225,1,11,205,204,90,66,1,
<- 255,255,255,2,3,7,255,
<- 255,255,255,2,3,7,255,
<- 255,255,255,2,3,7,255,

(from a temp / hum node)

Any ideas how to fix this?

Damme · 3 Sept 2014, 07:45

@ToSa I finally figured out why my OTA bootloader didn't read any answers from my GW (both on I_FIND_PARENT and I_ID_REQUEST): the answers came too quickly! First I tried hardcoding a 125ms delay on the GW and it worked, so I changed the send-and-wait code to the following and now all messages arrive. I've been testing it for a couple of reboots now. I'm using 5v (at 3.3v) and 16MHz.
edit: noticed it misses packages sometimes now, but not close to 100% like before, more like 5%. I'll investigate further when I'm trying to upload data.

  static uint8_t sendAndWait(uint8_t reqType, uint8_t resType) {
  	msg.type = reqType;
  	// up to 10 send attempts
  	for (uint8_t i = 0; i < 10; i++) {
  		sendWrite(msg);
  		// poll for the response 20 x 100 times with delaym(1) in between
  		// (about 2 s per attempt, assuming delaym(1) waits 1 ms)
  		for (uint8_t j = 0; j < 20; j++) {
  			// inner counter renamed to k so it no longer shadows j
  			for (uint8_t k = 0; k < 100; k++) {
  				uint8_t pipe;
  				boolean avail = available(&pipe);
  				wdt_reset();
  				if (avail && pipe<=6) {
  					read(rmsg.array,pipe);
  					// ignore packets from a different protocol version
  					if(!(mGetVersion(rmsg) == PROTOCOL_VERSION))
  						continue;
  					if (rmsg.destination == nc.nodeId) {
  						if (mGetCommand(rmsg) == C_INTERNAL) {
  							if (rmsg.type == I_FIND_PARENT_RESPONSE) {
  								// remember the closest parent seen so far
  								if (rmsg.data[0] < nc.distance - 1) {
  									nc.distance = rmsg.data[0] + 1;
  									nc.parentNodeId = rmsg.sender;
  									eeprom_write_byte((uint8_t*)EEPROM_PARENT_NODE_ID_ADDRESS, nc.parentNodeId);
  									eeprom_write_byte((uint8_t*)EEPROM_DISTANCE_ADDRESS, nc.distance);
  								}
  							}
  						}
  						// done once the expected response type arrives
  						if ((mGetCommand(rmsg) == mGetCommand(msg)) && (rmsg.type == resType))
  							return 1;
  					}
  				}
  				delaym(1);
  			}
  		}
  	}
  	// no matching response after all attempts
  	return 0;
  }

Damme · 5 Sept 2014, 07:48

I had to put my project in the trash bin. There is not enough RAM in the atmega328 to fit mysensors and the SD lib. Tried 3 different versions. Too bad! I could only transmit one package before SRAM overflowed.

Yveaux · 5 Sept 2014, 08:58

@Damme Now do you get why I abandoned the MQTT implementation on the ATMega itself?

Zeph · 5 Sept 2014, 16:39

@Damme said:

I had to put my project in the trash bin. There is not enough RAM in the atmega328 to fit mysensors and the SD lib. Tried 3 different versions. Too bad! I could only transmit one package before SRAM overflowed.

If you want to stay with the AVR:

ATMega1284 based: http://lowpowerlab.com/moteino/#whatisitMEGA $20+shipping. This can add a RF69* radio, but you could instead (or also) attach a nRF24L01+

ATMega2560 based: http://www.ebay.com/itm/121391548557 $15 shipped This has even more lines broken out than the Arduino Mega2560. (If you don't mind the larger form, Arduino Mega2560 clones start under $14 shipped).

Or you can switch to an embedded ARM system. Teensy 3.1 for $17+ship. STM32F103 board on eBay for $7. DUE clone on eBay for $18. STM Nucleo from distributors for $11+ship (mbed programmed).

And of course you can use the Raspberry Pi or BeagleBone Black with a faster but power hungry ARM running Linux.

ToSa · 5 Sept 2014, 17:25

@Damme said:

I had to put my project in the trash bin. There is not enough RAM in the ATmega328 to fit MySensors and the SD library. I tried 3 different versions. Too bad! I could only transmit one packet before SRAM was overrun.

Did you try not using MySensors but the reduced nRF24 driver version I used for the bootloader itself? If the node does not need to do routing and is expected to only respond to one or two types of messages, that might be an option and is definitely smaller...

gregl · 25 Sept 2014, 01:43

Hi @ToSa

Do you know if your OTA Bootloader uses more flash mem on the atmega328 vs the Optiboot bootloader?

I have a sketch which needs Optiboot to fit, and I'm really hoping I can one day use your cool OTA stuff.

Looking at your github it seems MyOtaBootloader.hex is 9.36k, whereas in Optiboot v5.0a the optiboot_atmega328.hex is 1.418k (but there is also an optiboot_atmega328.lst file at 19.778k; I don't know enough to tell whether this is part of the bootloader or just the source?)

Cheers,
Greg

Zeph · 25 Sept 2014, 04:17

OptiBoot fits in 1/2 KiB (the binary on-chip size, not the hex file).
The OTA Bootloader is obviously larger. If you are using close to 31.5KiB on an ATMega328p, it won't fit with any bootloader larger than OptiBoot.
Make sure you use the latest compiler - 1.5.7 seems to squeeze harder (smaller binary), and I presume 1.0.6 (which has also been upgraded to a newer compiler) will also do so.

How big does the compiler say your sketch is, at the end of a compile?

gregl · 25 Sept 2014, 04:32

@Zeph said:

use the latest compiler - 1.5.7 seems to squeeze harder (smaller binary), and I presume 1.0.6 (which has also been upgraded to a newer compiler) will also do so.

How big does the compiler say your sketch is, at the end of a compile?

I'll check when i get home tonight.

Very interesting about the new compiler. I'm pretty sure I'm using 1.0.5-r2, and I've recently begun using Atmel Studio with the Visual Micro add-on, which is just soooo much better when dealing with long sketches!!!

Cheers,
Greg

ToSa · 28 Sept 2014, 08:12

@gregl the fuses determine how much space is reserved for the bootloader. I don't have the datasheet at hand but I think it varies from .5k to 4k. Optiboot is one of the smallest out there, and the OTA bootloader consumes the full space because it needs to include the (shrunk) wireless driver.
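
To put numbers on that, here is a minimal sketch of the boot section sizes selected by the ATmega328P BOOTSZ1:0 fuse bits (256 to 2048 words, i.e. 0.5k to 4k bytes) and the flash that remains for a sketch. The function names are my own, and it assumes the sketch can use everything below the boot section.

```c
#include <stdint.h>

#define FLASH_BYTES 32768u /* ATmega328P total flash */

/* Boot section size in bytes for a BOOTSZ1:0 fuse value.
 * 0b11 selects the smallest section (256 words), 0b00 the largest (2048 words). */
uint16_t boot_section_bytes(uint8_t bootsz) {
    static const uint16_t words[4] = { 2048, 1024, 512, 256 };
    return (uint16_t)(words[bootsz & 0x03] * 2u); /* one flash word = 2 bytes */
}

/* Flash left for the application, assuming it may use everything
 * below the boot section. */
uint16_t app_space_bytes(uint8_t bootsz) {
    return (uint16_t)(FLASH_BYTES - boot_section_bytes(bootsz));
}

/* Returns 1 if a sketch of sketch_bytes fits next to that boot section. */
int sketch_fits(uint16_t sketch_bytes, uint8_t bootsz) {
    return sketch_bytes <= app_space_bytes(bootsz);
}
```

With the smallest section (the Optiboot case) this leaves 32256 bytes, which is the 31.5KiB mentioned earlier; with the full 4k section only 28672 bytes remain.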

Zeph · 29 Sept 2014, 18:54

I think the OTA bootloader which does not rely on any extra memory is a great option!

And I think the option of having external flash may work out well too. Sending an image to be written to SPI flash might not expand the application code as much (it already has the library). Then the bootloader just has to copy from SPI to application flash. I'm thinking that might make for a smaller total footprint, since there would be no need to fit 2 nRF libraries in the 32KiB flash - a trimmed down nRF library in the boot section plus the full nRF library in application section.

Of course, if the application gets hosed, you would not be able to do OTA bootloading and would have to physically access the node to recover, but if it's a matter of fitting or not fitting into FLASH, that might be a risk one is willing to take.
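
A rough host-side sketch of that "stage first, copy later" idea, under stated assumptions: the header layout, the magic value and all names below are hypothetical, plain arrays stand in for the SPI flash and the application flash, and memcpy stands in for the page erase and write a real bootloader would do. The checksum step is the same one as avr-libc's _crc16_update().

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   128u             /* ATmega328P flash page size in bytes */
#define APP_MAX     (4u * PAGE_SIZE) /* tiny stand-in for the application area */
#define IMAGE_MAGIC 0xB007u          /* hypothetical "download complete" marker */

/* Hypothetical header stored in front of the staged image. */
typedef struct {
    uint16_t magic;  /* must equal IMAGE_MAGIC */
    uint16_t length; /* image size in bytes */
    uint16_t crc;    /* CRC-16 over the image bytes */
} image_header_t;

/* Same update step as avr-libc's _crc16_update() (polynomial 0xA001). */
uint16_t crc16_update(uint16_t crc, uint8_t data) {
    crc ^= data;
    for (uint8_t i = 0; i < 8; i++)
        crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u) : (uint16_t)(crc >> 1);
    return crc;
}

uint16_t crc16(const uint8_t *buf, uint16_t len) {
    uint16_t crc = 0;
    for (uint16_t i = 0; i < len; i++)
        crc = crc16_update(crc, buf[i]);
    return crc;
}

/* Copies the staged image into app[] page by page, but only after the
 * header and checksum have been verified. Returns 1 on success and
 * leaves app[] untouched on failure. */
int copy_if_valid(const image_header_t *hdr, const uint8_t *staged, uint8_t *app) {
    if (hdr->magic != IMAGE_MAGIC)
        return 0;
    if (hdr->length == 0 || hdr->length > APP_MAX)
        return 0;
    if (crc16(staged, hdr->length) != hdr->crc)
        return 0;
    for (uint16_t off = 0; off < hdr->length; off += PAGE_SIZE) {
        uint16_t n = (uint16_t)(hdr->length - off);
        if (n > PAGE_SIZE)
            n = PAGE_SIZE;
        memcpy(app + off, staged + off, n); /* stand-in for page erase + write */
    }
    return 1;
}
```

The point of the design is that the running application is only overwritten once a complete, verified image is already sitting in the external flash.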

marceltrapman · 30 Sept 2014, 06:50

@Zeph said:

Sending an image to be written to SPI flash might not expand the application code as much (it already has the library).

Because I am sure that Flash memory will come in handy sooner or later I have added the Winbond W25X40 to the first version of my board already :).

tekka · 17 Nov 2014, 12:13

Just managed to optimize the OTA bootloader to under 2k, which leaves 2k more for sketches with OTA.

ToSa · 17 Nov 2014, 12:54

@tekka
That sounds great! Can you post your code (or share a pull request)? This would either allow us to free up the remaining space or to add encryption
The only neck-breaker would be if any of the size reduction increases the risk for a bricked node that needs manual intervention (e.g. reset / power cycle etc.).

tbowmo · 16 Jan 2015, 21:50

what is the current status of the OTA firmware updates? Is external flash necessary at all? Is someone working on it?

Right now I am trying to get DualOptiboot from lowpowerlab.com to work with my board and an external flash / EEPROM, but I was wondering if it would be used at all.

Once I get the bootloader working, I need to test firmware updates, but the road is still long and winding to get there (at the moment, I only have 1 hour here and there to work on the hardware)

tekka · 17 Jan 2015, 12:27

@tbowmo
Did some work on the OTA bootloader: combined optiboot (for uploads via IDE / avrdude) + OTA bootloader with some major modifications. Current version is stable and works for regular OTA updates

I will post the source once I find some spare time to clean and comment it.

klim · 18 Jan 2015, 12:13

Hi, is your work on OTA based on internal or external flash?

tekka · 18 Jan 2015, 12:15

@tbowmo
internal: FW streaming via controller

tbowmo · 18 Jan 2015, 12:54

@tekka

I have been thinking about OTA the last couple of days, while trying to get DualOptiboot working (using external SPI flash). If we program the flash directly, then the firmware needs to be sent in an ordered way by the controller, in order for the bootloader to get things done. What if a single packet is missed?

What if we have 100 nodes that all need a software update at the same time? Is the system able to handle that?

A part of me says, go with the direct method (that is, skipping external SPI flash) for my minimized module, but then again, I really want to have the added "security" of having an external flash that I can download to, so that only when the checksums are correct I can issue a "Reload software" command to the node.

In theory, I could send the software to 100 nodes, and then when they all are ready, broadcast a "restart node" to all the affected nodes. (future plans, I know..)

Just my thoughts rambling around in my head

tekka · 18 Jan 2015, 13:34

@tbowmo

Yes, in the current setup, the node requests FW blocks in the order they will be flashed, i.e. page-wise. If one block is missing, the bootloader will re-request that block several times and reboot after a few unsuccessful attempts. The nRF24L01 has CRC on the payload and auto-retransmission of corrupt payloads (see RFInit; 15x every 150us).
As soon as the OTA update is initiated, the CRC is invalidated and the sensor remains in the bootloader until the update is successful and the CRC is valid.
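
The request-and-retry flow described above could look roughly like this on a host machine. This is only a sketch under assumptions, not the actual bootloader code: the block size, the attempt limit, the callback type and the fake link are all invented for illustration, and the CRC invalidation step is left out.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   16u /* assumed payload bytes per firmware block */
#define MAX_ATTEMPTS 5u  /* assumed number of requests per block */

/* Hypothetical transport: returns 1 and fills out[] when block `index`
 * arrived, 0 when the request timed out. */
typedef int (*request_block_fn)(uint16_t index, uint8_t out[BLOCK_SIZE], void *ctx);

typedef enum { OTA_DONE, OTA_GIVE_UP } ota_result_t;

/* Requests the blocks in flashing order; a missing block is requested
 * again until MAX_ATTEMPTS is reached. */
ota_result_t fetch_image(uint16_t blocks, uint8_t *image,
                         request_block_fn request, void *ctx) {
    for (uint16_t i = 0; i < blocks; i++) {
        uint8_t buf[BLOCK_SIZE];
        uint8_t failed = 0;
        while (!request(i, buf, ctx)) {
            if (++failed >= MAX_ATTEMPTS)
                return OTA_GIVE_UP; /* a real node would reboot here */
        }
        memcpy(image + (uint32_t)i * BLOCK_SIZE, buf, BLOCK_SIZE);
    }
    return OTA_DONE;
}

/* Fake link for trying this out: ignores the first drops_left requests. */
typedef struct {
    const uint8_t *firmware; /* what the controller would send */
    uint8_t drops_left;      /* requests to ignore before answering */
} fake_link_t;

int fake_request(uint16_t index, uint8_t out[BLOCK_SIZE], void *ctx) {
    fake_link_t *link = (fake_link_t *)ctx;
    if (link->drops_left > 0) {
        link->drops_left--;
        return 0;
    }
    memcpy(out, link->firmware + (uint32_t)index * BLOCK_SIZE, BLOCK_SIZE);
    return 1;
}
```

With a fake link that drops three requests the image still arrives complete; with five drops in a row fetch_image() gives up, which is where the node would reboot and stay in the bootloader.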

Updating 100 nodes simultaneously: to my understanding, the limitations are whether the gateway and/or repeater nodes can handle the traffic, and the connection quality. Updating sensors semi-sequentially (e.g. 5 nodes at a time) works from my experience.
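
As a small illustration of the semi-sequential approach, a controller could split the node list into rounds like this. The helper names are hypothetical and the nodes are simply numbered 0..nodes-1; nothing here is taken from an existing controller.

```c
#include <stdint.h>

/* Number of update rounds when at most `batch` nodes are updated at once. */
uint16_t update_rounds(uint16_t nodes, uint16_t batch) {
    if (batch == 0)
        return 0; /* nothing can be scheduled */
    return (uint16_t)((nodes + batch - 1u) / batch);
}

/* Writes the node indices belonging to round `round` into ids[]
 * (which must hold at least `batch` entries) and returns how many
 * were written. */
uint16_t batch_members(uint16_t nodes, uint16_t batch, uint16_t round, uint16_t *ids) {
    uint32_t first = (uint32_t)round * batch;
    uint16_t n = 0;
    while (n < batch && first + n < nodes) {
        ids[n] = (uint16_t)(first + n);
        n++;
    }
    return n;
}
```

For 100 nodes in groups of 5 this gives 20 rounds, and the controller would only start the next round once the current one has finished.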

Having added "security" with an external flash is certainly a nice feature (and opens other very interesting applications), but is it that important for OTA updating nodes with a down-time of a few minutes? Again, one could instruct the controller to update the nodes in a controlled fashion...

tbowmo · 18 Jan 2015, 13:58

@tekka

Sure for a temperature sensor, downtime of a couple of minutes is not an issue, but there might be other types of nodes that shouldn't have downtime.

What if we bring in the WAF? Let's say the node we want to update is the one that turns on the light in the wife's walk-in closet, and she is getting ready for a night out with the girlfriends. Then "a few" minutes of downtime could be fatal to your own health

tekka · 18 Jan 2015, 14:29

@tbowmo
...lol shouldn't you be preparing for bar hopping instead of updating the sensors?
as mentioned previously, no big deal to have different updating options/sources in the bootloader, I will think about that

tbowmo · 18 Jan 2015, 18:14

@tekka said:

@tbowmo
...lol shouldn't you be preparing for bar hopping instead of updating the sensors?

Someone had to be at home watching the kids. And when the wife is out, we have the time to spend on fun projects, instead of doing the laundry or whatever tasks she could figure out she wanted help with

Dheeraj · 26 Feb 2015, 18:37

@Damme Is OTA working for you? I am also getting the same message from the GW.

skywatch · 1 Mar 2015, 11:21

I would really like to get OTA working here as it's freezing outside and I have to go there to update the software in the greenhouse control system.

So please, can we have a 'how to' step-by-step guide to OTA? Please?

S.
