Over the air updates


  • Code Contributor

    @Zeph It actually depends on how that "somewhere" I mentioned in the initial description is coded.

    The bootloader sends a message to the controller like "I'm node 23. I'm a temperature node and I'm currently running version 5 of the temperature node sketch which has a CRC of 0xABCD"

    The controller sends something back like "You should be running version 6 of the temperature node sketch with CRC 0xFEDC"

    So it's truly the controller (the central authority) that decides. At this point what I've done in the NodeJsController (which really is pretty dumb and only meant for testing) is that I did not care about the nodeID but only submitted a response based on the latest version available in the database for the given node type. You could obviously maintain a list of "expected sketches / sketch versions" for each nodeID and drive the decision on what the controller sends back based on that list instead of the node type only.

    It really does exactly what you want it to do - the "pull" truly is a "pull for information if the central authority wants me to update". The big benefit of this "pull" setup is that the controller is stateless and just answers each request coming from the node making the code way cleaner and the overall setup way more reliable.


  • Hero Member

    @ToSa said:

    The bootloader sends a message to the controller like "I'm node 23. I'm a temperature node and I'm currently running version 5 of the temperature node sketch which has a CRC of 0xABCD"

    But that's not quite the right info. What it needs to say is "I'm running version 5 of the '18B20 temp sensor on pin 7' sketch", because a temperature node running with a DHT-22, or even an 18B20 on pin 8, needs to use different code.

    Or "I'm running version 52 of the '18B20 temp sensor on pin 7, power blind relays on pins 5 and 6, and an IR detector on pin 12' sketch".

    So I'm suggesting that the node say "I'm node 23, my PROGMEM has CRC 0xABCD, do you want me to load anything different?" The rest is up to the server.

    The bootloading code does not need to know what "type" the node is, only a signature of the PROGMEM. The server can then decide what code that specific node should be running instead, if any. Concepts like sensor types or node types or even sequences of versions are irrelevant to bootloading as seen from the node end.

    At the server end, it has a table that says "node 23 should be running XYZZY.hex which has a signature of 0xAC3E". If that's not what it's doing, then at a time of the server's choosing, it can tell node 23 to update itself and send the appropriate program bytes. (At this point, the actual transfer of bytes from server to the node's PROGMEM, your current approach is fine, I'm talking about a higher level of the protocol or architecture).
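    A minimal sketch of that server-side lookup, assuming the node reports only its ID and a PROGMEM signature. The table contents and the names `expectedFirmware` and `decideUpdate` are hypothetical and not taken from the NodeJsController.

```javascript
// Hypothetical server-side table: node id -> firmware the node should run.
// File name and signature are the illustrative values from the post above.
const expectedFirmware = new Map([
  [23, { file: "XYZZY.hex", signature: 0xac3e }],
]);

// Decide what to do when a node reports the signature of its PROGMEM.
// Returns null when nothing needs to change, otherwise the firmware entry to load.
function decideUpdate(nodeId, reportedSignature, table = expectedFirmware) {
  const expected = table.get(nodeId);
  if (!expected) return null; // unknown node: leave it alone
  if (expected.signature === reportedSignature) return null; // already as configured
  return expected; // mismatch: this is the firmware the server should send
}
```

    With this table, `decideUpdate(23, 0xabcd)` returns the XYZZY.hex entry, while `decideUpdate(23, 0xac3e)` returns null because the node already runs what the server expects.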


  • Code Contributor

    @Zeph: yep - that means in your case you don't really care about the node type. In other scenarios where you have 60 nodes installed and 20 of them are relay nodes, another 20 are switch detectors and another 20 are temperature sensors (all of them having the same hardware setup) the node type is pretty useful. For your specific need you probably should not care about the node type at all - maybe set node type == nodeID and that's it. The additional 2byte payload should not matter too much.

    The ideal setup from my perspective would look like this (dreaming): based on the information shared back (combination of sensors and pin connections) the controller would reassemble the source code and build a new sketch for the given configuration, compile it and send it (I'm not kidding - I worked on a very similar approach a few years back).

    Reality is: this is meant to be a bootloader for MySensors. The way MySensors currently works is that the combination you mentioned (18B20 temp sensor on pin 7 and power blind relays on pins 5 and 6 and an IR detector on pin 12) requires a specific sketch to be loaded that has these pin assignments etc. hard-coded.

    This is the piece of code you would want to adjust - at this point it pulls all available firmware records for the given type and sorts descending by version - which delivers the highest available version back as the first record:

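    // Fetch the highest available version for the requested firmware type.
    db.collection('firmware', function(err, c) {
        c.findOne({
            $query: {
                'type': fwtype
            },
            $orderby: {
                'version': -1
            }
        }, function(err, result) {
            // result is the firmware record with the highest version for fwtype;
            // the rest of the handler is omitted here.
        });
    });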

    Instead the "expected firmware" type and version could be an attribute for the given node in the "node" collection which is manually maintained:

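    // Look up the node first, then fetch exactly the firmware configured for it.
    db.collection('node', function(err, c) {
        c.findOne({
            'id': destination
        }, function(err, noderesult) {
            db.collection('firmware', function(err, c) {
                c.findOne({
                    'type': noderesult.expected_firmware_type,
                    'version': noderesult.expected_firmware_version
                }, function(err, result) {
                    // result is the firmware record this specific node should run;
                    // the rest of the handler is omitted here.
                });
            });
        });
    });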

  • Code Contributor

    @ToSa Still, I wonder if there is any OTA bootloader / protocol readme 🙂 (So I don't have to dissect the Node.js code to write my own implementation)


  • Code Contributor

    @Damme look at NodeJsController/Readme.html - actually for now better look at this version which has a couple of updates (will send another pull request tomorrow for the documentation as well as some minor changes).

    If you are looking for tech documentation (protocol etc.) that's not yet included but the communication is fairly easy (complexity is mainly to make it robust - not kill a node if something goes wrong etc.):

    • the bootloader is using the same procedure to find its parent / request a nodeID etc. as a normal MySensors sketch would do
    • then a config request / config response is exchanged between node and controller
    • assuming an update is needed a series of code block requests / responses is executed until the full firmware is submitted

    Data is submitted as binary - you can see the message payload details in MyOtaBootloader.h:

    typedef struct
    {
        uint16_t type;
        uint16_t version;
    } FirmwareConfigRequest;

    typedef struct
    {
        uint16_t type;
        uint16_t version;
        uint16_t blocks;
        uint16_t crc;
    } FirmwareConfigResponse;

    typedef struct
    {
        uint16_t type;
        uint16_t version;
        uint16_t block;
    } FirmwareRequest;

    typedef struct
    {
        uint16_t type;
        uint16_t version;
        uint16_t block;
        uint8_t data[FIRMWARE_BLOCK_SIZE];
    } FirmwareResponse;
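
    For anyone writing their own controller, here is a rough sketch of how these payloads could be packed and unpacked with Node.js `Buffer`. It assumes little-endian byte order with no struct padding (as on an 8-bit AVR) and a `FIRMWARE_BLOCK_SIZE` of 16; both are assumptions to check against MyOtaBootloader.h, and the function names are made up for illustration.

```javascript
// Assumed value; the real FIRMWARE_BLOCK_SIZE is defined in MyOtaBootloader.h.
const FIRMWARE_BLOCK_SIZE = 16;

// FirmwareConfigResponse: type, version, blocks, crc (4 x uint16_t, little-endian assumed).
function encodeConfigResponse({ type, version, blocks, crc }) {
  const buf = Buffer.alloc(8);
  buf.writeUInt16LE(type, 0);
  buf.writeUInt16LE(version, 2);
  buf.writeUInt16LE(blocks, 4);
  buf.writeUInt16LE(crc, 6);
  return buf;
}

// FirmwareRequest: type, version, block (3 x uint16_t).
function decodeFirmwareRequest(buf) {
  if (buf.length < 6) throw new RangeError("FirmwareRequest needs 6 bytes");
  return {
    type: buf.readUInt16LE(0),
    version: buf.readUInt16LE(2),
    block: buf.readUInt16LE(4),
  };
}

// FirmwareResponse: type, version, block, followed by FIRMWARE_BLOCK_SIZE data bytes.
function encodeFirmwareResponse({ type, version, block }, data) {
  if (data.length !== FIRMWARE_BLOCK_SIZE) {
    throw new RangeError("data must be exactly FIRMWARE_BLOCK_SIZE bytes");
  }
  const buf = Buffer.alloc(6 + FIRMWARE_BLOCK_SIZE);
  buf.writeUInt16LE(type, 0);
  buf.writeUInt16LE(version, 2);
  buf.writeUInt16LE(block, 4);
  Buffer.from(data).copy(buf, 6);
  return buf;
}
```

    Under these assumptions a FirmwareResponse is 22 bytes long.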


  • Hero Member

    @ToSa said:

    yep - that means in your case you don't really care about the node type. In other scenarios where you have 60 nodes installed and 20 of them are relay nodes, another 20 are switch detectors and another 20 are temperature sensors (all of them having the same hardware setup) the node type is pretty useful

    Suppose you do have 20 identical temperature nodes. It's trivially simple for the server to tell each one of them to update to the same code in the "push by node" model. Not only that, but the server gets to decide when to allocate the bandwidth for each node.

    Unfortunately, in the "pull by node type" model, you have no way to update some nodes of the given type and not other nodes of that type.

    The "push by node" model easily handles any case the "pull by node type" model does, but the opposite is not true.

    To even approach the "push by node" dynamics with "pull by node type" design, you have to have two concepts of "node type" which must not be conflated.

    • node type for purposes of the user interface
    • node type for purposes of updating the code in the ATMega328p

    When you say "20 temperature nodes" the concept of "node type" would be meaningful in the first sense if you mean "20 nodes containing only a temperature sensor for the HA Controller to display".

    But for updating the PROGMEM, the concept of "node type" needs to be "20 nodes containing only a temperature sensor of type DHT-11 on pin 6".

    A node with a DHT-22 or 18b20 on pin 6, or a node with a DHT-11 on pin 5, would be the same "node type" for purposes of the user interface (which doesn't care), but different "node types" for purposes of updating PROGMEM.

    Once you start considering node type = node id in some cases, it becomes simpler to just ignore the already messy and problematic "node type for purpose of update" concept (as seen by the node) and just do updates per node, period. If you want it, you get the functionality of "update all nodes that use identical code" essentially for free at the server end with the push by node model anyway, PLUS the ability to update individual nodes of any type to run any code you want, at whatever time you, the server, want to schedule it. I don't see the downside of push-by-node here.

    At worst, the server could have a table of node-id to "node type" for lookup and then follow your same dynamics. That's not how I'd do it (this model allows even simpler and more flexible options), but it would be a tiny "shim" to allow the more flexible "push by node" model to emulate the "pull by node type" dynamics if a given implementer so desired.

    (Just by the way, this discussion is for me fun and mutually respectful brainstorming, I hope it lands that way).


  • Code Contributor

    @Zeph not sure what you are asking for, as I mentioned above that you can use the bootloader as it is today to just update specific nodes (by nodeID, update one and not update another even if they have the same node type). The implementation is not a "pull by node type" but a "pull by node ID, node type, node version". Which of that information the controller uses to decide if an update should be executed is up to you!

    Terminology: the "node type" I'm referring to means the specific setup of the hardware - only if that's the same then the node type would be the same (combination of sensors / pin connections / same sketch to be used). The back-end cares about a node type because it needs to know which sketch to use/send.
    The user interface ideally never cares about a node type but really cares about the specific sensor type(s). This "translation" needs to happen in the background no matter if you use an OTA bootloader or not.

    Examples:

    1. Let's assume you have two nodes in the living room - the user interface should just show "living room temperature" no matter if the temperature sensor is connected to node 1 together with the light switch or connected to node 2 together with the blinds. This "translation" needs to happen anyways - ideally in the controller.
    2. Let's assume you have two temperature sensors connected to one node - one measures the room temperature at 1.5m height and one is a floor temperature (not unusual for floor heating). Just knowing that there are two temperature sensors but not knowing which one is which will not be sufficient for the heating controller to make the correct adjustments. Again that translation from "node 23 with one XYZ temp sensor on pin 5 and one XYZ temp sensor on pin 7" to "node 23 temp sensor at pin 5 is the floor temperature" needs to happen anyways.

  • Hero Member

    OK, maybe we are converging in some ways. I'll try to list the similarities as well as differences.

    So we agree that a "node type" doesn't mean a generic "temperature node" but a very specific "my sketch for controlling an 18b20 on pin 7".

    Let's suppose the sketch was called "DevDuino_18b20_7.ino" (for the moment let's leave out auto-scripting).
    This compiled into DevDuino_18b20_7.hex and the binary equivalent.

    We would know that nodes 7, 12, and 15 should have the latest version of this sketch. (node 4 might also measure temperature but with different hardware or pin configuration, so it would not use this sketch).

    If we just want to update the sketch, we would send copies of the new binary to nodes 7, 12, and 15.

    So far I think we are on nearly the same page, at the generic level described. Where the "push by node" and "pull by node type" models differ is in where the knowledge that nodes 7, 12, and 15 run the same code resides. In push-by-node, the Server knows that it should send the same code to them; in the "pull by node type" model, those three nodes themselves know they want updates for a given numeric "node type".

    Also, each node knows it has a given version of the firmware of its node type, and decides when to upgrade by comparing that with what the server offers.

    The differences are highlighted better when we make more than a version update.

    Suppose we decide to make use of pin 2 of node 7 to control an LED. At this point we need to load different firmware into node 7, and we write a sketch called "DevDuino_18b20_7_LED_2.ino". (forgive the naming, it's an example). So now we want to change the overall system configuration so that the server will load DevDuino_18b20_7_LED_2.hex into node 7. (Nodes 12 and 15 still have the other sketch without LED control and maybe always will, the new sketch is not a new version of the old sketch)

    I am suggesting that all you have to do is create the new sketch (or rather its hex or binary compiled form) and configure the server to send that to node 7 instead of the previous sketch. It doesn't matter one bit to the node whether it's switching to a completely new sketch versus a new version of the current sketch. That's in the server logic only, not OTA.

    I think you are saying that your node.js server can accommodate changing what sketch (not just version) runs in each node, because it can load arbitrarily different (or identical) binary files to each node by node id, with no limitations based on what "type" that node used to be, right? Or not?

    In my thinking, the node has no need to know that it's running "node type 453, version 16".

    The server is free to conceptually organize firmwares by "node type" and "version" if it wishes, but those concepts do not need to be pushed down to the node level.

    Pull by node type:

    Node: I'm type 453, what is the latest version of 453?
    Server: Queries database for max version of node type 453, and says "Latest 453 version is 17"
    Node: Checks that it has version 16, asks server to send 453 ver 17 for OTA programming
    Server: Sends version requested by node
    (programming done)
    Node: Asks server for latest version for node type 453
    Server: ... 17
    Node: I'm version 17, no change needed
    

    Push by node:

    Node: I'm node 7 and my PROGMEM signature is 0x54FE
    Server: My config says node 7 should have the latest "DevDuino_18b20_7_LED_2.hex" with sig 0x3EE5
    Server: Please load the following binary into your PROGMEM (sends appropriate version)
    (programming done)
    Node: I'm node 7 and my PROGMEM sig is 0x3EE5
    Server: Mark that one as updated
    

    Notice that there's no problem of forgetting to update the version number - if the signature (eg: CRC or hash) in PROGMEM isn't what the server wants there, then it starts an OTA programming session, period. Even if the sig was wrong because the programming had a glitch rather than because it is out of date, the server knows it's wrong and sends again.

    In the push-by-node model, the system is not limited to "updating all nodes with node type 453 versions <= 16 to version 17" -- it can load an entirely different sketch (node type 763 version 0 if the server thinks in those terms) into the node if it wishes. And the node doesn't care, it doesn't need to know what "node type" it is or compare versions (that's in the logic of the server), all the node needs to know is that the server wants it to load some code into PROGMEM, period.

    That's how I imagine things working. As I have understood you, you are pushing the concept of "node type" and "version number" and the comparison of version numbers down to the node itself, rather than letting the server handle that (if it chooses). I don't see the advantage of that; the server seems a more logical place for that information - both simpler and more flexible.

    Your node.js implementation might organize source code for nodes by numeric "node type" (where your node type 7 isn't the same as my node type 7) and by version within node type. That would still be supported by "push by node" model.

    But another server might choose to organize firmwares by string filename plus signature (e.g. CRC). It's configured with a simple table of node_id,filename. It computes the signature of the binary file to be compared with what the node reports it has in PROGMEM. There's no numeric "node type" or "version number" needed. (If you really want to also keep obsolete versions of the binary firmware on the server, there are easy workarounds for that too). The config is dead simple: (NodeID, Filename)* Or optionally (nodelist, Filename)* if you want to reduce the number of times you spell out the filename. Use the latest (or only) copy of the given filename.
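
    A rough sketch of that (NodeID, Filename) scheme. The CRC shown is CRC-16 with the reflected polynomial 0xA001 and initial value 0xFFFF, which is only an assumption, since the thread does not say which CRC the bootloader computes. The config entries reuse the example sketch names from this post, and `nodeFirmware`, `crc16` and `firmwareFor` are hypothetical names.

```javascript
// CRC-16 with reflected polynomial 0xA001 and initial value 0xFFFF.
// Assumption: the bootloader's actual signature algorithm may differ.
function crc16(bytes, crc = 0xffff) {
  for (const b of bytes) {
    crc ^= b;
    for (let i = 0; i < 8; i++) {
      crc = crc & 1 ? (crc >>> 1) ^ 0xa001 : crc >>> 1;
    }
  }
  return crc & 0xffff;
}

// Hypothetical config: one (nodeId, filename) pair per node.
const nodeFirmware = new Map([
  [7, "DevDuino_18b20_7_LED_2.hex"],
  [12, "DevDuino_18b20_7.hex"],
  [15, "DevDuino_18b20_7.hex"],
]);

// images maps a filename to the firmware bytes already converted to binary.
// Returns null when the node is unknown, the image is missing, or the node
// already reports the signature of the configured image.
function firmwareFor(nodeId, reportedSignature, images, config = nodeFirmware) {
  const file = config.get(nodeId);
  const image = file === undefined ? undefined : images.get(file);
  if (!image) return null;
  return crc16(image) === reportedSignature ? null : { file, image };
}
```

    Note that the server never stores a version number here: replacing the file on disk changes its computed signature, and that alone makes the affected nodes eligible for an update.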

    The cool thing about the "push by node" approach is that the same node bootloader can easily accommodate both server approaches (node type # + version # OR nodeID->filename) - since the only concepts the node uses are

    • "I can tell you the signature of what I have in PROGMEM now", and
    • "If you tell me to I'll load something of your choice".

    For OTA bootloading the node doesn't need to know or care about "node type" numbers or versions, nor about file names or date stamps.

    So I'm not trying to eliminate your concept of the server assigning numeric IDs to each combination of sensors and pins, and using ordered version numbers within each "node type". I just don't see why those concepts need to also be pushed down into the node and OTA bootloader protocol. With the "push by node" model, there's more flexibility to organize changing firmwares in the server as you prefer OR in other ways, with no meaningful cost, because the node end of the OTA programming system has been distilled to just the essence that it really needs to understand, leaving higher levels of management as a server-internal affair.


  • Code Contributor

    @ToSa I've been looking through the OTA bootloader and noticed there are a lot of uint16_t which can be replaced with uint8_t. That saves 128 bytes of code. It still needs to shrink by ~900 bytes to fit a 1024-word bootloader, but it makes more space for other stuff 🙂



  • When I'm trying to run the NodeJsController.js script I always end up with "Error: Cannot open /dev/ttyAMA0"

    I'm running a RPi with a serial gateway.

    Any ideas?



  • On my RPi, the serial gateway is detected as /dev/ttyUSB0.


  • Code Contributor

    @ToSa I've been working on getting OTA to work with MQTTgateway with some success.

    But I do have a problem with some packages going missing, and I think the communication should be something like this:
    bootloader checks id and version and server says there is an update. (no change from today)
    but then:

    [bootloader] 0000 has CHK FF(just filler in first package) REQ 0000 type 01 version 01
    [server] load 0000 from hex, send addr 0000 0C9428030C9447240C9474240C947605 C7
    [bootloader] 0000 has CHK FF, REQ 0010 type 01 version 01
    [server] (checksum mismatch) send addr 0000 0C9428030C9447240C9474240C947605 C7
    [bootloader] 0000 has CHK C7, REQ 0010 type 01 version 01
    [server] load 0010 from hex send addr 0010 0C94A3050C94D0050C9480100C945003 00
    And so on.. 🙂

    What do you think about this? The total package is 32 bytes, the MySensors header is 7 bytes, and this layout would need 19 bytes from server to bootloader.
    I have also seen some Intel HEX files that are not in order (0010 0020 0030 etc.) but jump between addresses. I do not think the Arduino IDE does this, but you never know.

    EDIT:
    I haven't read this one yet but I guess there is a lot of good stuff in it 🙂
    http://www.nordicsemi.com/eng/nordic/download_resource/10878/2/94069421


  • Code Contributor

    @JeJ @mikeones : how is your serial gateway connected? using a USB-Rs232 cable or via the GPIO pins on the RPi? Did you check the Readme.md in the NodeJsController directory?


  • Hero Member

    @Damme said:

    What do you think about this? The total package is 32 bytes, the MySensors header is 7 bytes, and this layout would need 19 bytes from server to bootloader.

    16 bit offset, 16 data bytes, one byte checksum, right?

    I have also seen some Intel HEX files that are not in order (0010 0020 0030 etc.) but jump between addresses. I do not think the Arduino IDE does this, but you never know.

    I see that your descriptions say "0010 from hex" etc, but I thought you would be fetching from a binary blob to satisfy requests from the bootloader. As in:

    Server reads the Intel hex and uses it to fill in an array of bytes. (one time, or each time a given file is requested)
    Server sends requested 16 byte chunks of that array to bootloader
    

    In that case, it doesn't matter what order the original hex lines are in, or even if they are 16 or 32 bytes wide (or less than 16 bytes at the end).
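
    A sketch of that array approach, assuming only Intel HEX record types 00 (data) and 01 (end of file) occur and that gaps should read as 0xFF like erased flash. The names `parseIntelHex` and `getBlock` are made up for illustration; the two data records in the usage note are the ones from the trace earlier in this thread.

```javascript
// Parse Intel HEX text into one flat byte image. Records may appear in any
// order and with any length, because each one is placed by its own address.
function parseIntelHex(text) {
  const chunks = [];
  let size = 0;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    if (line[0] !== ":") throw new Error("not an Intel HEX record: " + line);
    const bytes = Buffer.from(line.slice(1), "hex");
    // Layout: length, address (2 bytes, big-endian), type, data..., checksum.
    if (bytes.length < 5 || bytes.length !== bytes[0] + 5) throw new Error("bad record length");
    // All bytes of a record, including the checksum, sum to 0 modulo 256.
    if (bytes.reduce((sum, b) => (sum + b) & 0xff, 0) !== 0) throw new Error("checksum mismatch");
    const address = bytes.readUInt16BE(1);
    const type = bytes[3];
    if (type === 0x01) break; // end of file
    if (type !== 0x00) throw new Error("unsupported record type " + type);
    const data = bytes.subarray(4, 4 + bytes[0]);
    chunks.push({ address, data });
    size = Math.max(size, address + data.length);
  }
  const image = Buffer.alloc(size, 0xff);
  for (const { address, data } of chunks) data.copy(image, address);
  return image;
}

// Return one fixed-size block of the image, padded with 0xFF past the end.
function getBlock(image, block, blockSize = 16) {
  const out = Buffer.alloc(blockSize, 0xff);
  const start = block * blockSize;
  if (start < image.length) {
    image.copy(out, 0, start, Math.min(image.length, start + blockSize));
  }
  return out;
}
```

    Feeding it the records `:100010000C94A3050C94D0050C9480100C94500300` and `:100000000C9428030C9447240C9474240C947605C7` in that (reversed) order still yields an image where block 0 starts with 0C 94 28 03, because placement is by address rather than by line order.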


  • Code Contributor

    @Zeph true (array) and Yes, so the node can request same address twice (might be a timeout) and verify checksum on every 16byte data.


  • Code Contributor

    @Damme said:

    @ToSa I've been working on getting OTA to work with MQTTgateway with some success.

    great!

    [bootloader] 0000 has CHK FF(just filler in first package) REQ 0000 type 01 version 01
    [server] load 0000 from hex, send addr 0000 0C9428030C9447240C9474240C947605 C7
    [bootloader] 0000 has CHK FF, REQ 0010 type 01 version 01
    [server] (checksum mismatch) send addr 0000 0C9428030C9447240C9474240C947605 C7
    [bootloader] 0000 has CHK C7, REQ 0010 type 01 version 01
    [server] load 0010 from hex send addr 0010 0C94A3050C94D0050C9480100C945003 00
    And so on.. 🙂

    What do you think about this? The total package is 32 bytes, the MySensors header is 7 bytes, and this layout would need 19 bytes from server to bootloader.

    If I understand correctly, you would send the CRC of the previous block back to the server together with the request for the next package. The server would then send the next block if the CRC is correct, or resend the previous block if the CRC is not OK...
    I'm wondering how the bootloader would ever run into that situation. The package itself is already checksummed and wouldn't be treated as a correctly received package if the checksum is incorrect. The next block is requested only if the previous block was received correctly. If that doesn't happen within a given amount of time, the same block is requested again.
    Can you explain a bit further what issues you are running into?

    @Damme said:

    I have also seen some Intel HEX files that are not in order (0010 0020 0030 etc.) but jump between addresses. I do not think the Arduino IDE does this, but you never know.

    Yes, I've seen that as well - actually Codebender does that sometimes, and the "HEX file loader" function in the NodeJsController would need to be adjusted to address that (I'm not reading from the .hex file when a package arrives but from a byte array in Mongo).
    Probably the best approach for this one would be to add the next address the bootloader should request to the previous package (e.g. "here is the data for 0x0000 - next request should be for 0x0010") and allow for variable block length, because that's another one where Codebender sometimes uses <16 byte rows in the hex (not just at the end). The code on the bootloader side would get a little more complex, as the flash is still written in pages and these might not line up anymore with the 16-byte blocks.

    @Damme said:

    I haven't read this one yet but I guess there is a lot of good stuff in it 🙂
    http://www.nordicsemi.com/eng/nordic/download_resource/10878/2/94069421

    That chip is a totally different animal - same RF but 8051 MCU. I'll have a look at the AppNote tomorrow.


  • Code Contributor

    @Damme said:

    @ToSa I've been looking through the OTA bootloader and noticed there are a lot of uint16_t which can be replaced with uint8_t. That saves 128 bytes of code. It still needs to shrink by ~900 bytes to fit a 1024-word bootloader, but it makes more space for other stuff 🙂

    I'll have a look. I've taken the code from an earlier project and adjusted to MySensors - didn't review the variable types that much. I'm using CRC16 as well where CRC8 might be sufficient...

    EDIT: got it down a little from 0x0E18 to 0x0DD0 (72 bytes) by changing a few loop counters from uint16 to uint8. I don't want to change type to 8 bit, looking at the large number of sensors people are asking for / working on. As for version, I'm planning to keep some of these running for a long time with as little maintenance as possible. With some software improvements over time and minor version changes during development, 16 bit for version seems to be the better fit as well.



  • @ToSa I use a Mini-B USB cable between my RPi and my gateway.


  • Code Contributor

    @mikeones said:

    /dev/ttyUSB0

    Then /dev/ttyUSB0 is correct. /dev/ttyAMA0 would only be valid for the on-board serial port on the GPIO pinheads.
    Are you running into any issues once you set the port in NodeJsGateway accordingly?



  • @ToSa I have my gateway connected via the GPIO and I have followed the steps in the Readme.md.
    I will try to use a USB-Rs232 cable and see what happens.


  • Code Contributor

    @JeJ one potential reason is that the port is already in use. I mentioned somewhere that the startup script doesn't yet stop the NodeJsController correctly. Maybe you already have a NodeJsController process running? Try "sudo killall node" and then try starting it again. To check if the port itself is working you can try to open a simple terminal (minicom etc.) and reset the gateway.


  • Hero Member

    @ToSa said:

    With some software improvements over time and minor version changes during development 16bit for version seems to be the better fit as well.

    Hmm. That seems like overkill, if I'm understanding correctly. (So maybe I am not understanding).

    What I heard was:

    Each sensactuator node has a "node type" and a "version" within that node type. Each combination of sensors and pin assignments has a unique "node type" (within a given wireless network). A node can only be OTA updated to a newer (higher) version of the same "node type" of the current firmware, and all nodes of that "node type" will be updated.

    An extra byte for "version" isn't a big deal, though.

    Will there be one or two bytes for "node type"?


  • Code Contributor

    @Zeph 16-bit calculations on an 8-bit MCU will always come at a price. IMO we should try to keep things to 8 bit as much as possible, but I don't know if it's possible to shave another 900 bytes off the bootloader to fit in the next smaller bootloader size (1024 words instead of 2048 words). It might be, if we make a mini version of mysensors/mymessage.


  • Code Contributor

    @Zeph said:

    Each combination of sensors and pin assignments has a unique "node type" (within a given wireless network).

    Actually that's part of the question - as @hek mentioned there is a desire to sell MySensors hardware - at some point there might be not just generic pinhead PCBs but real fit-for-use devices. Ideally these would have a unique node type assigned, not just within a given network. New firmware could be published on mysensors.org (or via codebender or...) and, based on the unique (but common across networks) node type, less tech-savvy people could be protected from sending a firmware that doesn't fit the hardware... I know - a LOT of "IF"s...

    @Damme
    you are right - probably not the full 900 bytes but additional space could be used for encryption etc. so every reduced byte is beneficial at this point. I'll check later how much can be saved by using CRC8 instead of CRC16.
    I'm already using a mini version of mysensors / mymessage: not using the cpp code files at all but just the headers and if you have a look at the "#ifdef __cplusplus" statements just added for that purpose, there is almost nothing left (the MyMessage class is stripped down to a struct and the MySensors class removed completely / enums and #defines should not consume space after compilation)


  • Hero Member

    @ToSa

    I'm realizing how similar the implementations of your model of updates and mine might be. This is just an early inspiration, not fully thought out.

    uint8_t  node_type_id;  // same for multiple nodes
    uint16_t version;       // loaded version for given node_type_id
    ...
    if (new_version > version) {  // test for OTA update needed
        ...
    }

    versus

    uint8_t  node_id;       // unique per node
    uint16_t progmem_crc;   // calculated from PROGMEM
    ...
    if (new_progmem_crc != progmem_crc) {  // test for OTA update needed
        ...
    }

    This might mean that I could (eventually) use a relatively minor fork of the OTA programming code to get the per-node flexibility that I seek.


  • Code Contributor

    @Zeph
    yes, that's what I meant - you might not even need any fork of the bootloader itself and just a slight adjustment on the controller end - because the nodeID is contained in the packet (not in the payload but in the header as sender address) so you have all you need for your setup


  • Hero Member

    @ToSa
    The other half is testing inequality between the computed CRC of the application firmware in PROGMEM, with the CRC of the available replacement (rather than comparing for higher version number).

    Here is an example use case for the ability to load arbitrary new code into any given node. If I was diagnosing some kind of interference, I might temporarily replace the sensor firmware in some nodes (of varying node-type) with a custom radio test firmware, then later restore each with its original sensor node firmware.

    Suppose we have:

    node 5, node type 17, version 2, PROGMEM CRC 0x4567  // attic
    node 6, node type 3, version 5, PROGMEM CRC 0xABCD   // crawlspace
    node 7, node type 3, version 5, PROGMEM CRC 0xABCD   // living room
    

    And I want to temporarily replace the firmware in node 5 and 6, but keep 7 still running as a sensor.

    I make RF test code available on the server, with CRC 0x7E57. This is not type 17 or type 3.

    I edit the server's table of firmware assignments:

    node 5, 0x7E57
    node 6, 0x7E57
    node 7, 0xABCD  // unchanged
    

    This causes node 5 and 6 (formerly of different types) to load the test firmware when reload is triggered.

    Then when testing is done, I edit the table back:

    node 5, 0x4567   // back to its old type and version
    node 6, 0xABCD  // back to the same type and version as node 7
    node 7, 0xABCD  // still unaffected
    

    This causes the normal sensor firmware (type and version) to be loaded back in on the next reload.

    There could be more than just a CRC to identify the firmware (in order to avoid the birthday paradox), this is just an example.

    An alternate use case is loading in my Halloween firmware to the front yard nodes (but not other nodes) for a week or two, then back..

    Or a beta version of type 3, version 6, which I'd like to load on some type 3 nodes for in-situ testing (e.g. in the crawlspace), but not all of the type 3 nodes because I want most of the system to continue functioning normally while I test. If the beta is bad, I may revert the test nodes to version 5; once the new version is good, I may convert all type 3 nodes to version 6.

    These are some of the reasons I'd like to be able to use OTA programming of any arbitrary firmware into any given node, without being constrained to:

      Only upgrades of the same node type
      Only upgrades to higher version numbers
      Only upgrades of all nodes of the same type or none
    

    And so that's why inequality testing of the PROGMEM signature on a per-node basis is attractive, not just testing for a higher version number. For similar complexity, we can upgrade to a higher version number, downgrade to a different version number, or change the node type back and forth.

    The type and version dynamics (which certainly IS a common use case) can be handled on the server. For example, the server can know what type every node is (kind of a good idea anyway), and can change the node -> signature entry for every node of type 3 to the signature of the next version, and then let it proceed as above to get them all updated. But that's just one option, centrally controlled.
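
    A minimal sketch of the per-node assignment table described above, written as host-side C. The struct, the function names and the rule that unlisted nodes are left alone are all assumptions made for illustration; nothing here is taken from an existing controller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical server-side table: the firmware signature each node should run. */
typedef struct {
    uint8_t  node_id;
    uint16_t wanted_crc;
} assignment_t;

static const assignment_t assignments[] = {
    { 5, 0x7E57 },  /* attic: temporarily the RF test firmware */
    { 6, 0x7E57 },  /* crawlspace: temporarily the RF test firmware */
    { 7, 0xABCD },  /* living room: unchanged */
};

/* Returns 1 and stores the assigned CRC if the node is listed, else 0. */
static int lookup_assignment(uint8_t node_id, uint16_t *wanted_crc) {
    for (size_t i = 0; i < sizeof assignments / sizeof assignments[0]; ++i) {
        if (assignments[i].node_id == node_id) {
            *wanted_crc = assignments[i].wanted_crc;
            return 1;
        }
    }
    return 0;
}

/* Inequality test: reload whenever the reported signature differs from the
   assigned one. Nodes that are not in the table are left alone (assumption). */
static int needs_reload(uint8_t node_id, uint16_t reported_crc) {
    uint16_t wanted;
    return lookup_assignment(node_id, &wanted) && wanted != reported_crc;
}
```

    With this table, nodes 5 and 6 (reporting 0x4567 and 0xABCD) would be told to reload, while node 7 would not. Editing the table back reverses the decision without any version ordering being involved.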


  • Code Contributor

    @Zeph
    from the MyOtaBootloader.c:

    if (firmwareConfigResponse->version == fc.version)
    	if (firmwareConfigResponse->blocks == fc.blocks)
    		if (firmwareConfigResponse->crc == fc.crc)
    

    so as long as you send the same version / blocks / crc back to the node as what is currently installed, no update is started. As soon as one of the three elements differs, an update is loaded. It's completely under the server's control whether (and which) firmware is bootloaded.
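
    The three nested checks can be restated as a single predicate. This is only an illustration: the field names follow the snippet above, but the field widths and the function name are assumptions.

```c
#include <stdint.h>

/* Field names follow the snippet above; the 16-bit widths are an assumption. */
typedef struct {
    uint16_t version;
    uint16_t blocks;
    uint16_t crc;
} firmware_config_t;

/* Returns 1 when any of the three fields differs, i.e. when an update
   would be started, and 0 when the response matches what is installed. */
static int update_required(const firmware_config_t *installed,
                           const firmware_config_t *response) {
    return installed->version != response->version
        || installed->blocks  != response->blocks
        || installed->crc     != response->crc;
}
```

    Because the test is for inequality rather than "greater than", a response carrying a lower version number triggers a reload just like a higher one.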


  • Hero Member

    @ToSa

    OK, so version is tested for != rather than for > ? Downgrading is OK?

    And CRC is used as well (and block count?) where CRC is based on what's in PROGMEM now?

    Cool.

    Then I think all that would be needed is for the server to be able to potentially feed back a different firmwareConfigResponse to each node. In my above example (which has been edited for clarity recently BTW, so re-read it), node 6 could receive a different response than node 7 (even tho they both have the same type initially). And thus nodes 5 and 6 (but not 7) could be told to load the test firmware and then later to go back to the old version. Etc.

    Is that correct?

    It would be a nice enhancement if we could query the node for the CRC (and block count?) of the current PROGMEM, just to help the server stay in sync with what's out there (eg: after a node joins the network). That could be done in the application code, so we don't even have to invoke the bootloader. Then the server could figure out which nodes need to be bootloaded and trigger just those to go into the bootloader (possibly one at a time). These two together support what I call push dynamics.


  • Code Contributor

    @Zeph I've been working on a read / write EEPROM address thing in MQTT to be able to reset a node and stuff. But it seems there are more uses for it, then. This might be coded into MySensors instead (utilizing C_INTERNAL or something, as the protocol is today).


  • Code Contributor

    @Damme said:

    @Zeph I've been working on a read / write EEPROM address thing in MQTT to be able to reset a node and stuff. But it seems there are more uses for it, then. This might be coded into MySensors instead (utilizing C_INTERNAL or something, as the protocol is today).

    Good idea - that would allow checking the current value in normal operation, not just during bootloading.

    @Zeph
    If you urgently want to have the CRC of the current firmware submitted during bootloading, we can add this as a third parameter to the FirmwareConfigRequest message. Actually I was thinking about getting rid of request/response and use the same format for both which would mean crc would be included anyways.


  • Code Contributor

    This post is deleted!

  • Code Contributor

    I deleted my last message because I thought I had made a big mistake..

    I've been working on an SD <-> OTA loader node, and got most of it working but got stuck on the last piece, which is communication. (I'll release it when I'm finished. I've made a small change in MyOtaBootloader: added msg.destination = OTAGATEWAY; on line ~156 to configure a custom OTA address.)

    I can't figure the following out (just ignore the contents of the packets, they're not relevant):

    • Node: (Ota<->sd loader)

      read: 34-0-254 s=255,c=4,t=0,pt=6,l=4:FFFFFFFF
      send: 254-254-0-34 s=255,c=4,t=1,pt=8,l=4,st=ok:0100020000304200
      
    • GW:
      0;0;3;0;9;read: 34-34-0 s=255,c=3,t=7,pt=0,l=0:
      0;0;3;0;9;send: 0-0-34-34 s=255,c=3,t=8,pt=1,l=1,st=ok:0
      0;0;3;0;9;read: 34-34-0 s=255,c=3,t=7,pt=0,l=0:
      0;0;3;0;9;send: 0-0-34-34 s=255,c=3,t=8,pt=1,l=1,st=ok:0
      0;0;3;0;9;read: 34-34-254 s=255,c=4,t=0,pt=6,l=4:FFFFFFFF
      0;0;3;0;9;send: 34-0-254-254 s=255,c=4,t=0,pt=6,l=4,st=ok:FFFFFFFF
      0;0;3;0;9;read: 254-254-34 s=255,c=4,t=1,pt=6,l=8:0100020000304200
      0;0;3;0;9;send: 254-0-0-34 s=255,c=4,t=1,pt=6,l=8,st=fail:0100020000304200

    • OTA bootloader:
      Go
      <- 34,34,0,2,3,7,255,
      <- 34,34,0,2,3,7,255,
      -> 0,0,34,10,35,8,255,0,
      <- 34,34,254,34,196,0,255,255,255,255,255,

    What am I missing? The packet from 254 to 34 won't get delivered.
    I've also noticed that when 254 tries to send, it won't receive the next message transmitted by the OTA bootloader. The one after that is received.
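
    When comparing logs like the ones above, it can help to pull the routing fields out of each line mechanically. The sketch below parses the part after "read: ". It assumes the three leading numbers are sender, last hop and destination, and that s/c/t/pt/l are sensor, command, type, payload type and length; that reading of the debug format is an interpretation, not something confirmed in this thread, and the struct and function names are made up.

```c
#include <stdio.h>

/* Assumed meaning of the fields in a gateway "read:" debug line. */
typedef struct {
    int sender, last, destination;
    int sensor, command, type, payload_type, length;
} read_line_t;

/* Parses text such as "34-34-254 s=255,c=4,t=0,pt=6,l=4:FFFFFFFF".
   Returns 1 if all eight numeric fields were found, else 0. */
static int parse_read_line(const char *text, read_line_t *out) {
    int n = sscanf(text, "%d-%d-%d s=%d,c=%d,t=%d,pt=%d,l=%d:",
                   &out->sender, &out->last, &out->destination,
                   &out->sensor, &out->command, &out->type,
                   &out->payload_type, &out->length);
    return n == 8;
}
```

    Under that reading, the line `34-34-254 s=255,c=4,t=0,pt=6,l=4:FFFFFFFF` is a message from node 34 addressed to node 254, which is the hop the gateway then has to forward.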


  • Code Contributor

    @Damme
    I need to better understand the setup to think about what's going on. My take from the above:

    You have three nodes:

    • Gateway (address 0)
    • SD OTA Loader Node ?!? (address 254)
    • Sensor Node (address 34)

    Is that right?


  • Code Contributor

    @ToSa Yes, and I think I figured it out.. By mistake I changed BROADCAST_ADDRESS to GATEWAY_ADDRESS in the bootloader when I was playing around. Testing the correct version now.. 🙂


  • Code Contributor

    @Damme
    interesting setup 👍
    to make it work with non-static addressed nodes you should probably keep the destination set to GATEWAY_ADDRESS for the REQUEST_ID call and only change afterwards.

    Never mind - looking at the line number you mentioned that's probably what you did 🙂


  • Code Contributor

    @ToSa Now I remember why I changed some things in there. (broadcast to gateway)

    From the beginning I had problems getting it to talk with the GW. It only sends out
    <- 255,255,255,2,3,7,255, and gets no response. The GW tries to send but fails. (weird..) (I don't have any relay nodes)

    This is with no modifications at all.

    0;0;3;0;9;read: 255-255-255 s=255,c=3,t=7,pt=0,l=0:
    0;0;3;0;9;send: 0-0-255-255 s=255,c=3,t=8,pt=1,l=1,st=fail:0
    0;0;3;0;9;read: 255-255-255 s=255,c=3,t=7,pt=0,l=0:
    0;0;3;0;9;send: 0-0-255-255 s=255,c=3,t=8,pt=1,l=1,st=fail:0
    Other packets sent out work just fine (to other nodes).

    and the OTA bootloader can receive other packages
    Go
    <- 255,255,255,2,3,7,255,
    -> 23,23,0,42,225,1,11,205,204,90,66,1,
    <- 255,255,255,2,3,7,255,
    <- 255,255,255,2,3,7,255,
    <- 255,255,255,2,3,7,255,

    (from a temp / hum node)

    Any ideas how to fix this?


  • Code Contributor

    @ToSa I finally figured out why my OTA bootloader didn't read any answers from my GW (both on I_FIND_PARENT and I_ID_REQUEST): the answers came too quickly! First I tried hardcoding a 125ms delay on the GW and it worked, so I changed the code around sendWrite to the following and now all messages arrive. Been testing it for a couple of reboots now. I'm using 5v (at 3.3v) and 16MHz
    edit: noticed it misses packets sometimes now, but nowhere near 100% like before, more like 5%. I'll investigate further when I'm trying to upload data.

      static uint8_t sendAndWait(uint8_t reqType, uint8_t resType) {
      	msg.type = reqType;
      	// Send the request up to 10 times
      	for (uint8_t i = 0; i < 10; i++) {
      		sendWrite(msg);
      		// After each send, poll for a reply: 20 x 100 iterations with delaym(1) in between
      		for (uint8_t j = 0; j < 20; j++) {
      			// Inner counter renamed to k so it no longer shadows j
      			for (uint8_t k = 0; k < 100; k++) {
      				uint8_t pipe;
      				boolean avail = available(&pipe);
      				wdt_reset();
      				if (avail && pipe<=6) {
      					read(rmsg.array,pipe);
      					// Ignore messages from a different protocol version
      					if (mGetVersion(rmsg) != PROTOCOL_VERSION)
      						continue;
      					if (rmsg.destination == nc.nodeId) {
      						if (mGetCommand(rmsg) == C_INTERNAL) {
      							if (rmsg.type == I_FIND_PARENT_RESPONSE) {
      								// Remember the closest parent seen so far
      								if (rmsg.data[0] < nc.distance - 1) {
      									nc.distance = rmsg.data[0] + 1;
      									nc.parentNodeId = rmsg.sender;
      									eeprom_write_byte((uint8_t*)EEPROM_PARENT_NODE_ID_ADDRESS, nc.parentNodeId);
      									eeprom_write_byte((uint8_t*)EEPROM_DISTANCE_ADDRESS, nc.distance);
      								}
      							}
      						}
      						// Done as soon as the expected response type arrives
      						if ((mGetCommand(rmsg) == mGetCommand(msg)) && (rmsg.type == resType))
      							return 1;
      					}
      				}
      				delaym(1);
      			}
      		}
      	}
      	// No matching response after all retries
      	return 0;
      }
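
    The loop bounds in the function above give a rough wait budget. The arithmetic below assumes that delaym(1) pauses for about one millisecond and ignores the time spent in sendWrite and read; the constant and function names are made up for the illustration.

```c
#include <stdint.h>

/* Loop bounds copied from sendAndWait above; POLL_DELAY_MS assumes delaym(1) is ~1 ms. */
enum { RETRIES = 10, OUTER_POLLS = 20, INNER_POLLS = 100, POLL_DELAY_MS = 1 };

/* Approximate listening time after one transmission of the request. */
static uint32_t wait_per_attempt_ms(void) {
    return (uint32_t)OUTER_POLLS * INNER_POLLS * POLL_DELAY_MS;
}

/* Approximate time before sendAndWait gives up and returns 0. */
static uint32_t worst_case_wait_ms(void) {
    return (uint32_t)RETRIES * wait_per_attempt_ms();
}
```

    Under those assumptions each transmission is followed by roughly two seconds of listening, and the function gives up after roughly twenty seconds in total.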

  • Code Contributor

    I had to put my project in the trash bin.. There is not enough RAM in the atmega328 to fit MySensors and the SD lib 🙂 Tried 3 different versions.. Too bad! I could only transmit one packet before SRAM got overrun.


  • Mod

    @Damme Now do you get why I abandoned the MQTT implementation on the ATMega itself? 😉


  • Hero Member

    @Damme said:

    I had to put my project in the trash bin.. There is not enough RAM in the atmega328 to fit MySensors and the SD lib 🙂 Tried 3 different versions.. Too bad! I could only transmit one packet before SRAM got overrun.

    If you want to stay with the AVR:

    ATMega1284 based: http://lowpowerlab.com/moteino/#whatisitMEGA $20+shipping. This can add a RF69* radio, but you could instead (or also) attach a nRF24L01+

    ATMega2560 based: http://www.ebay.com/itm/121391548557 $15 shipped This has even more lines broken out than the Arduino Mega2560. (If you don't mind the larger form, Arduino Mega2560 clones start under $14 shipped).

    Or you can switch to an embedded ARM system. Teensy 3.1 for $17+ship. STM32F103 board on eBay for $7. DUE clone on eBay for $18. STM Nucleo from distributors for $11+ship (mbed programmed).

    And of course you can use the Raspberry Pi or BeagleBone Black with a faster but power hungry ARM running Linux.


  • Code Contributor

    @Damme said:

    I had to put my project in the trash bin.. There is not enough RAM in the atmega328 to fit MySensors and the SD lib 🙂 Tried 3 different versions.. Too bad! I could only transmit one packet before SRAM got overrun.

    Did you try not using MySensors but the reduced nRF24 driver version I used for the bootloader itself? If the node does not need to do routing and is expected to only respond to one or two types of messages, that might be an option and is definitely smaller...


  • Hero Member

    Hi @ToSa

    Do you know if your OTA Bootloader uses more flash mem on the atmega328 vs the Optiboot bootloader?

    I have a sketch which needs Optiboot to fit and really hoping i can one day use your cool OTA stuff.

    Looking at your github it seems MyOtaBootloader.hex is 9.36k, whereas for Optiboot v5.0a the optiboot_atmega328.hex is 1.418k (but there is also an optiboot_atmega328.lst file at 19.778k - I don't know enough to know if this is part of the bootloader or if it's just the source?)

    Cheers,
    Greg


  • Hero Member

    OptiBoot fits in 1/2 KiB (the binary on-chip size, not the hex file).
    The OTA Bootloader is obviously larger. If you are using close to 31.5KiB on an ATMega328p, it won't fit with any bootloader larger than OptiBoot.
    Make sure you use the latest compiler - 1.5.7 seems to squeeze harder (smaller binary), and I presume 1.0.6 (which has also been upgraded to a newer compiler) will also do so.

    How big does the compiler say your sketch is, at the end of a compile?


  • Hero Member

    @Zeph said:

    use the latest compiler - 1.5.7 seems to squeeze harder (smaller binary), and I presume 1.0.6 (which has also been upgraded to a newer compiler) will also do so.

    How big does the compiler say your sketch is, at the end of a compile?

    I'll check when i get home tonight.

    Very interesting about the new compiler. I'm pretty sure I'm using 1.0.5-r2, and I've recently begun using Atmel Studio with the Visual Micro add-on - just soooo much better when dealing with long sketches!!!

    Cheers,
    Greg


  • Code Contributor

    @gregl the fuses determine how much space is reserved for the bootloader. I don't have the datasheet at hand but I think it varies from .5k to 4k. Optiboot is one of the smallest out there and the OTA bootloader consumes the full space because it needs to include the (shrunk) wireless driver.
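
    The space question comes down to simple subtraction. A small sketch, assuming the ATmega328P's 32 KiB of flash and a boot section of 512, 1024, 2048 or 4096 bytes as selected by the fuses; the function names are made up for the example.

```c
#include <stdint.h>

#define FLASH_TOTAL_BYTES 32768u  /* ATmega328P: 32 KiB of flash */

/* Bytes left for the sketch once the boot section is reserved. */
static uint32_t sketch_space(uint32_t boot_section_bytes) {
    return FLASH_TOTAL_BYTES - boot_section_bytes;
}

/* Returns 1 if a sketch of the given size fits next to the boot section. */
static int sketch_fits(uint32_t sketch_bytes, uint32_t boot_section_bytes) {
    return sketch_bytes <= sketch_space(boot_section_bytes);
}
```

    A 512-byte boot section leaves 32256 bytes (31.5 KiB) for the sketch, while a full 4096-byte section leaves 28672 bytes, so a sketch that is close to the Optiboot limit will not fit alongside a bootloader that fills the largest boot section.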


  • Hero Member

    I think the OTA bootloader which does not rely on any extra memory is a great option!

    And I think the option of having external flash may work out well too. Sending an image to be written to SPI flash might not expand the application code as much (it already has the library). Then the bootloader just has to copy from SPI to application flash. I'm thinking that might make for a smaller total footprint, since there would be no need to fit 2 nRF libraries in the 32KiB flash - a trimmed down nRF library in the boot section plus the full nRF library in application section.

    Of course, if the application gets hosed, you would not be able to do OTA bootloading and would have to physically access the node to recover, but if it's a matter of fitting or not fitting into FLASH, that might be a risk one is willing to take.


  • Mod

    @Zeph said:

    Sending an image to be written to SPI flash might not expand the application code as much (it already has the library).

    Because I am sure that Flash memory will come in handy sooner or later I have added the Winbond W25X40 to the first version of my board already :).


  • Admin

    Just managed to optimize the OTA bootloader to under 2k, which leaves 2k more for sketches with OTA.


  • Code Contributor

    @tekka
    That sounds great! Can you post your code (or share a pull request)? This would either allow us to free up the remaining space or to add encryption 🙂
    The only neck-breaker would be if any of the size reduction increases the risk for a bricked node that needs manual intervention (e.g. reset / power cycle etc.).


  • Admin

    what is the current status of the OTA firmware updates? Is external flash necessary at all? Is someone working on it?

    Right now I am trying to get DualOptiboot from lowpowerlab.com to work with my board and an external flash / EEPROM. But I was wondering if it would be used at all.

    Then once I get the bootloader working, I need to test firmware updates, but the road is still long and windy to get there (at the moment I only have an hour here and there to work on the hardware).


  • Admin

    @tbowmo
    Did some work on the OTA bootloader: combined optiboot (for uploads via IDE / avrdude) + OTA bootloader with some major modifications. Current version is stable and works for regular OTA updates 🙂

    I will post the source once I find some spare time to clean and comment it.



  • Hi, is your work on OTA based on internal or external flash?


  • Admin

    @tbowmo
    internal: FW streaming via controller


  • Admin

    @tekka

    I have been thinking about OTA the last couple of days, while trying to get DualOptiboot working (using external SPI flash). If we program directly, then the firmware needs to be sent in an ordered way by the controller, in order for the bootloader to get things done. What if a single packet is missed?

    What if we have 100 nodes, that all needs software update at the same time? Is the system able to handle that?

    A part of me says, go with the direct method (that is, skipping external SPI flash) for my minimized module, but then again.. I really want to have the added "security" of having an external flash, where I can download to, and only when the checksums are correct issue a "reload software" command to the node.

    In theory, I could send the software to 100 nodes, and then when they are all ready, broadcast a "restart node" to all the affected nodes.. (future plans, I know..)

    Just my thoughts rambling around in my head 🙂
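
    A sketch of the "verify first, then reload" idea, run here on an in-memory buffer standing in for the staged image. The CRC step is the reflected 0xA001 update that avr-libc documents as the C equivalent of _crc16_update(); whether the bootloader uses this CRC variant or this start value (0xFFFF) is an assumption, and image_crc and staged_image_ok are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One CRC-16 update step (reflected polynomial 0xA001). */
static uint16_t crc16_update(uint16_t crc, uint8_t byte) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
        crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u)
                         : (uint16_t)(crc >> 1);
    }
    return crc;
}

/* CRC over a whole staged image; the 0xFFFF start value is an assumption. */
static uint16_t image_crc(const uint8_t *image, size_t length) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < length; ++i) {
        crc = crc16_update(crc, image[i]);
    }
    return crc;
}

/* Returns 1 only if the staged image matches the CRC announced by the
   controller; only then would a "reload software" command be acted on. */
static int staged_image_ok(const uint8_t *image, size_t length,
                           uint16_t expected_crc) {
    return image_crc(image, length) == expected_crc;
}
```

    The point of staging is that a failed check costs nothing: the running application is untouched until the whole image has been verified.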


  • Admin

    @tbowmo

    Yes, in the current setup, the node requests FW blocks the way they will be flashed, i.e. page-wise. If one block is missing, the bootloader will re-request that block several times and reboot after a few unsuccessful attempts. The nRF24L01 has a CRC on the payload and auto-retransmission of corrupt payloads (see RFInit; 15x every 150us).
    As soon as the OTA update is initiated, the CRC is invalidated and the sensor remains in the bootloader until the update is successful and the CRC is valid.
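
    The re-request behaviour can be sketched as a bounded retry loop. This is not the bootloader's code: the attempt limit, the ascending block order, the callback type and the simulated flaky link are all assumptions made so the example can run on a PC.

```c
#include <stdint.h>

#define MAX_BLOCK_ATTEMPTS 3  /* hypothetical; the post only says "several times" */

/* Hypothetical transport hook: returns 1 if the block arrived intact. */
typedef int (*request_block_fn)(uint16_t block, void *ctx);

/* Returns 1 if every block was fetched, 0 if one block failed on every
   attempt (at which point the bootloader would reboot and start over). */
static int fetch_all_blocks(uint16_t block_count, request_block_fn request,
                            void *ctx) {
    for (uint16_t block = 0; block < block_count; ++block) {
        int ok = 0;
        for (int attempt = 0; attempt < MAX_BLOCK_ATTEMPTS && !ok; ++attempt) {
            ok = request(block, ctx);
        }
        if (!ok) {
            return 0;
        }
    }
    return 1;
}

/* Simulated link that drops the first drops_per_block requests for each block. */
typedef struct {
    int      drops_per_block;
    uint16_t current;  /* block the drop counter refers to */
    int      dropped;  /* requests dropped so far for that block */
} flaky_link_t;

static int flaky_request(uint16_t block, void *ctx) {
    flaky_link_t *link = ctx;
    if (block != link->current) {
        link->current = block;
        link->dropped = 0;
    }
    if (link->dropped < link->drops_per_block) {
        link->dropped++;
        return 0;
    }
    return 1;
}
```

    With two drops per block the transfer still completes, because the third request for each block succeeds; with three drops per block the first block exhausts its attempts and the whole update is abandoned.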

    Updating 100 nodes simultaneously: as far as I understand, the limitations are whether the gateway and/or repeater nodes can handle the traffic, and the connection quality. Updating sensors semi-sequentially (e.g. 5 nodes at a time) works in my experience.

    Having added "security" with an external flash is certainly a nice feature (and opens other very interesting applications), but is it that important for OTA updating nodes with a down-time of a few minutes? Again, one could instruct the controller to update the nodes in a controlled fashion...


  • Admin

    @tekka

    Sure for a temperature sensor, downtime of a couple of minutes is not an issue, but there might be other types of nodes that shouldn't have downtime.

    What if we bring in the WAF? Let's say the node we want to update is the one that turns on light in the wife's walk in closet, And she is getting ready for a night out with the girlfriends 🙂 Then "a few" minutes downtime could be fatal to your own health 🙂


  • Admin

    @tbowmo
    ...lol 😉 shouldn't you be preparing for bar hopping instead of updating the sensors?
    as mentioned previously, no big deal to have different updating options/sources in the bootloader, I will think about that 🙂


  • Admin

    @tekka said:

    @tbowmo
    ...lol 😉 shouldn't you be preparing for bar hopping instead of updating the sensors?

    Someone had to be at home watching the kids. And when the wife is out, we have the time to spend on fun projects, instead of doing the laundry or whatever tasks she could figure out she wanted help with 🙂


  • Plugin Developer

    @Damme Is OTA working for you? I am also getting the same message from the GW.



  • I would really like to get OTA working here as it's freezing outside and I have to go there to update the software in the greenhouse control system.

    So please, can we have a 'how to' step-by-step guide to OTA? Please?

    S.

