@Sergio-Rius The reason for the initialization PSK is to be able to differentiate a sensor you flashed yourself from a sensor someone else flashed and try to register on your network.Of course, you could use the sensor name as the key, but it is a weak key and it would also require to store the new sensor name on the GW every time you want to initialize a new sensor. The initialization PSK being common to all nodes, you don't need to change anything on the GW when you build a new sensor.
You can't use an initial key generated by the GW because it would then need to be sent OTA to the sensor. Since the sensor do not have a PSK yet, this generated key would have to be send in clear. A rogue sensor listening at this moment could read this key and register instead of your own sensor. Although limiting the time during which a GW would accept a new sensor is limited by the "WPS" button, you have to think about a rogue sensor constantly listening and trying to register on your network until the GW is in "WPS" mode. (Note that I only refer to the basic concept of "WPS", which is limiting the time when a new sensor can register by using a button to initiate a 5-10 seconds window where the GW would accept a new sensor. The idea is simply that, since the initialization is using a common PSK which is a security risk if it is stored in easily accessible EEPROM, you want to limit as much as possible when this initialization PSK can be used, so only once during the initialization then it is deleted, and only if a button has been pressed on the GW within the last 5-10 seconds).
Also, when using several separate networks (i.e. different GW linked to the same controllers, but each GW talking to its own sensor network), by using a different initialization PSK on each gateway (and on the sensors that must register on a specific gateway), you make sure that a sensor can only register on the right gateway. Then, each GW has only the list of sensor PSK for his own network so there is no risk a sensor attempt to talk to a GW it is not supposed to (if that's a concern, like having a low security sensor network and a high security sensor network).
Also, I assume that GW are always in a secure location (cannot be stolen and read). If a GW can be compromised, we're opening an entirely new can of worms.
And regarding your last comment: I, also, think async encryption is a better solution on non-constrained network (i.e. when memory and computing power is not a big concern). But, in this context, if we want to keep legacy support, we better look for the most lightweight encryption algorithms. Also, anything using a 256 key cannot be used with the current protocol since a 256 key or token would take the entire packet, with no room left for the routing information. Since "out-of-band" needs to send new tokens regularly, we have to stay with 128 bits. Going with 256 keys would require to also modify the protocol and make the entire thing much more complex to implement