Skip to content
  • MySensors
  • OpenHardware.io
  • Categories
  • Recent
  • Tags
  • Popular
Skins
  • Light
  • Brite
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Brand Logo
  1. Home
  2. Announcements
  3. 💬 Building a Raspberry Pi Gateway

💬 Building a Raspberry Pi Gateway

Scheduled Pinned Locked Moved Announcements
1.1k Posts 173 Posters 423.0k Views 131 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • O otto001

    Hi,

    I am running the current version (2.2.0) on a RPi 1.
    I noticed that sometimes the cpu usage of the service is increasing dramatically (around 90% and more) - and then the service stops working.
    As I am not a C-Pro, I wrote a little python-script, which is called every minute by cron and restarts the service, if it uses more than 50% cpu.
    Maybe someone will find it useful:

    # -*- coding: utf-8 -*-
    import psutil, sys, os, datetime
    
    PY3 = sys.version_info[0] == 2
    LOOKFOR = u"mysgw"
    for proc in psutil.process_iter():
        if proc.name() == "mysgw":
            dt = str(datetime.datetime.now()).split('.')[0]
            #print(proc)
            cpu = proc.cpu_percent(interval=1)
            #print(cpu)
            #print(proc.cpu_times())
            if cpu > 50:
                    print(dt + " high CPU-usage! --> restarting service")
                    os.system("/bin/systemctl restart mysgw.service")
            else:
                    if datetime.datetime.now().strftime('%M') == "45":
                            print(dt + " everything OK, cpu: " + str(cpu))
    

    and the crontab-entry (as root):

    * * * * * /home/pi/pstest.sh >> /var/log/mysgw-logger.log
    

    Cheers,
    Pula

    Nick WillisN Offline
    Nick WillisN Offline
    Nick Willis
    wrote on last edited by
    #800

    @otto001 Thanks for that! I'm also getting this annoying service failure on 2.2 on a RPi 2. It seems to happen at random at night when nothing much is going on in the house. It probably happens once every 1 to 2 weeks.
    CPU use is almost 100% and the service stops working.
    Are there any logs I can check? I'd prefer to know what's wrong although I'll go with the auto restart on high cpu use to prevent the house from ceasing to function at random!

    1 Reply Last reply
    1
    • alowhumA Offline
      alowhumA Offline
      alowhum
      Plugin Developer
      wrote on last edited by
      #801

      I once noticed this too, on a pi zero system with domoticz beta.

      1 Reply Last reply
      1
      • O Offline
        O Offline
        otto001
        wrote on last edited by otto001
        #802

        @Nick-Willis:
        I do run the gateway in debug mode, but I could not see anything special.
        Yesterday the service was auto-restarted by my script:

        2018-02-27 13:31:03 high CPU-usage! --> restarting service```
        

        On the same time in the "normal" mysgw.log (debugging enabled) (should be syslog normally, but I changed the logfile location for better investigation):

        Feb 27 13:30:36 raspiez mysgw: TSF:MSG:READ,14-14-0,s=1,c=1,t=23,pt=7,l=5,sg=0:0.11
        Feb 27 13:30:36 raspiez mysgw: TSF:MSG:READ,14-14-0,s=3,c=1,t=0,pt=7,l=5,sg=0:21.9
        Feb 27 13:30:36 raspiez mysgw: TSF:MSG:READ,14-14-0,s=4,c=1,t=1,pt=7,l=5,sg=0:31.4
        Feb 27 13:30:38 raspiez mysgw: New connection from 192.168.1.11 <-- fhem re-connects
        Feb 27 13:30:38 raspiez mysgw: GWT:TSA:C=1,CONNECTED
        Feb 27 13:31:04 raspiez mysgw: Received SIGTERM <-- AUTO-Restart
        

        The only thing that is a little bit unusual is the re-connect of my controller (fhem) - but this happens every 30 minutes for stability reasons.
        This should not happen, but I can not find out what the problem is.

        Maybe you could also enable debugging by changing the following file (maybe we can find a pattern why this is happening):

        /etc/systemd/system/mysgw.service
        

        from

        ExecStart=/usr/local/bin/mysgw
        

        to

        ExecStart=/usr/local/bin/mysgw -d
        

        Please do not forget to issue a

        sudo systemctl daemon-reload 
        

        followed by

        sudo systemctl restart mysgw.service
        

        to enable the change.

        My restarting-script might help until someone with better knowledge finds a solution...
        I also will have an eye on the logs for findig out how often this is happening...

        Cheers,
        Otto

        M 1 Reply Last reply
        0
        • O otto001

          @Nick-Willis:
          I do run the gateway in debug mode, but I could not see anything special.
          Yesterday the service was auto-restarted by my script:

          2018-02-27 13:31:03 high CPU-usage! --> restarting service```
          

          On the same time in the "normal" mysgw.log (debugging enabled) (should be syslog normally, but I changed the logfile location for better investigation):

          Feb 27 13:30:36 raspiez mysgw: TSF:MSG:READ,14-14-0,s=1,c=1,t=23,pt=7,l=5,sg=0:0.11
          Feb 27 13:30:36 raspiez mysgw: TSF:MSG:READ,14-14-0,s=3,c=1,t=0,pt=7,l=5,sg=0:21.9
          Feb 27 13:30:36 raspiez mysgw: TSF:MSG:READ,14-14-0,s=4,c=1,t=1,pt=7,l=5,sg=0:31.4
          Feb 27 13:30:38 raspiez mysgw: New connection from 192.168.1.11 <-- fhem re-connects
          Feb 27 13:30:38 raspiez mysgw: GWT:TSA:C=1,CONNECTED
          Feb 27 13:31:04 raspiez mysgw: Received SIGTERM <-- AUTO-Restart
          

          The only thing that is a little bit unusual is the re-connect of my controller (fhem) - but this happens every 30 minutes for stability reasons.
          This should not happen, but I can not find out what the problem is.

          Maybe you could also enable debugging by changing the following file (maybe we can find a pattern why this is happening):

          /etc/systemd/system/mysgw.service
          

          from

          ExecStart=/usr/local/bin/mysgw
          

          to

          ExecStart=/usr/local/bin/mysgw -d
          

          Please do not forget to issue a

          sudo systemctl daemon-reload 
          

          followed by

          sudo systemctl restart mysgw.service
          

          to enable the change.

          My restarting-script might help until someone with better knowledge finds a solution...
          I also will have an eye on the logs for findig out how often this is happening...

          Cheers,
          Otto

          M Offline
          M Offline
          marceloaqno
          Code Contributor
          wrote on last edited by
          #803

          Please, could all of you who experienced this issue post the configuration options you used to build the gateway?
          This would help to investigate the problem.

          1 Reply Last reply
          1
          • O Offline
            O Offline
            otto001
            wrote on last edited by otto001
            #804

            @marceloaqno : of course, thanks for the hint!

            I had the following configure:

            ./configure --my-transport=nrf24 --my-rf24-irq-pin=15 --my-gateway=ethernet --my-port=5004
            

            (The high CPU problem also occured before I connected PIN 15 to IRQ without the corresponding config option. I hoped to solve the Problem using IRQ, but this did not help. And I have a second GW with default port 5003 in my network. This is running an ancient version of Raspi GW Software and this is working fine -but slower). I am using current fhem as controller and I have it connecting every 30 minutes to the gateways due to stability issues in the past. There are only about 6 nodes (just motion, humidity & temp and lux) connected to this gateway.

            In addition to that I changed the default channel in MyConfig.h (because I have two different MySensor networks):

            #ifndef MY_RF24_CHANNEL
            #define MY_RF24_CHANNEL (82)
            #endif
            

            I am pretty sure, that this is not important, but I also set up rsyslog to write gw logs to an own file instead of syslog....

            Maybe this helps.

            Cheers,
            Otto

            M 1 Reply Last reply
            1
            • O otto001

              @marceloaqno : of course, thanks for the hint!

              I had the following configure:

              ./configure --my-transport=nrf24 --my-rf24-irq-pin=15 --my-gateway=ethernet --my-port=5004
              

              (The high CPU problem also occured before I connected PIN 15 to IRQ without the corresponding config option. I hoped to solve the Problem using IRQ, but this did not help. And I have a second GW with default port 5003 in my network. This is running an ancient version of Raspi GW Software and this is working fine -but slower). I am using current fhem as controller and I have it connecting every 30 minutes to the gateways due to stability issues in the past. There are only about 6 nodes (just motion, humidity & temp and lux) connected to this gateway.

              In addition to that I changed the default channel in MyConfig.h (because I have two different MySensor networks):

              #ifndef MY_RF24_CHANNEL
              #define MY_RF24_CHANNEL (82)
              #endif
              

              I am pretty sure, that this is not important, but I also set up rsyslog to write gw logs to an own file instead of syslog....

              Maybe this helps.

              Cheers,
              Otto

              M Offline
              M Offline
              marceloaqno
              Code Contributor
              wrote on last edited by
              #805

              @otto001 I think I found the problem. I wrote a fix for it:
              https://github.com/marceloaqno/MySensors/commit/ca8865d8e1fe1b08195943fae1640d5fddb81924

              M 1 Reply Last reply
              3
              • O Offline
                O Offline
                otto001
                wrote on last edited by otto001
                #806

                @marceloaqno : Wow, you are fast! Thanks!
                Just applied the patch and recompiled. I will report in a couple of days while my monitoring-script is still running (and logging)...

                Cheers,
                Otto

                1 Reply Last reply
                2
                • M marceloaqno

                  @otto001 I think I found the problem. I wrote a fix for it:
                  https://github.com/marceloaqno/MySensors/commit/ca8865d8e1fe1b08195943fae1640d5fddb81924

                  M Offline
                  M Offline
                  mvader
                  wrote on last edited by
                  #807

                  @marceloaqno I am also recompiling and testing.
                  thanks for the (hopeful) fix
                  I will report back with my status in a few days

                  1 Reply Last reply
                  2
                  • O Offline
                    O Offline
                    otto001
                    wrote on last edited by otto001
                    #808

                    @marceloaqno :

                    At first, THANKS for your help. CPU usage does not rise anymore.
                    But unfortunately no more node READs after the same situation:

                    Mar  4 07:00:09 raspiez mysgw: TSF:MSG:READ,14-14-0,s=255,c=3,t=6,pt=1,l=1,sg=0:0
                    Mar  4 07:00:09 raspiez mysgw: TSF:MSG:READ,12-12-0,s=255,c=3,t=6,pt=1,l=1,sg=0:0
                    Mar  4 07:00:09 raspiez mysgw: GWT:RFC:C=1,MSG=14;255;3;0;6;M
                    Mar  4 07:00:09 raspiez mysgw: !TSF:MSG:SEND,0-0-14-14,s=255,c=3,t=6,pt=0,l=1,sg=0,ft=0,st=NACK:M
                    Mar  4 07:00:09 raspiez mysgw: TSF:MSG:READ,11-11-0,s=255,c=0,t=18,pt=0,l=3,sg=0:1.5
                    Mar  4 07:00:09 raspiez mysgw: GWT:RFC:C=1,MSG=12;255;3;0;6;M
                    Mar  4 07:00:09 raspiez mysgw: TSF:MSG:SEND,0-0-12-12,s=255,c=3,t=6,pt=0,l=1,sg=0,ft=0,st=OK:M
                    Mar  4 07:00:13 raspiez mysgw: TSM:READY:NWD REQ
                    Mar  4 07:00:13 raspiez mysgw: TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
                    Mar  4 07:00:13 raspiez mysgw: TSF:SRT:OK
                    Mar  4 07:00:13 raspiez mysgw: TSF:SAN:OK
                    Mar  4 07:00:37 raspiez mysgw: Ethernet client disconnected.
                    Mar  4 07:00:37 raspiez mysgw: New connection from 192.168.1.11
                    Mar  4 07:00:37 raspiez mysgw: GWT:TSA:C=0,CONNECTED
                    Mar  4 07:00:37 raspiez mysgw: GWT:TSA:C=1,DISCONNECTED
                    

                    After this point (again after a reconnect from the controller) no more messages from any node PLUS the cpu usage stayed normal (what makes it even worse at this moment, as I would have to create a script which monitors the logfile and restarts the mysgw service in this situation). As soon as the service is restarted, everything is working fine again.

                    What happens after this point in the log:

                    Mar  4 07:00:37 raspiez mysgw: GWT:RFC:C=0,MSG=0;0;3;0;2;
                    Mar  4 07:15:13 raspiez mysgw: TSF:SAN:OK
                    Mar  4 07:20:13 raspiez mysgw: TSM:READY:NWD REQ
                    Mar  4 07:20:13 raspiez mysgw: TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
                    Mar  4 07:30:13 raspiez mysgw: TSF:SRT:OK
                    Mar  4 07:30:13 raspiez mysgw: TSF:SAN:OK
                    Mar  4 07:30:37 raspiez mysgw: GWT:TSA:C=0,DISCONNECTED
                    Mar  4 07:30:37 raspiez mysgw: Ethernet client disconnected.
                    Mar  4 07:30:37 raspiez mysgw: New connection from 192.168.1.11
                    Mar  4 07:30:37 raspiez mysgw: GWT:TSA:C=0,CONNECTED
                    Mar  4 07:30:37 raspiez mysgw: GWT:RFC:C=0,MSG=0;0;3;0;2;
                    Mar  4 07:40:13 raspiez mysgw: TSM:READY:NWD REQ
                    Mar  4 07:40:13 raspiez mysgw: TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
                    Mar  4 07:45:13 raspiez mysgw: TSF:SAN:OK
                    Mar  4 08:00:13 raspiez mysgw: TSM:READY:NWD REQ
                    Mar  4 08:00:13 raspiez mysgw: TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
                    Mar  4 08:00:13 raspiez mysgw: TSF:SRT:OK
                    Mar  4 08:00:13 raspiez mysgw: TSF:SAN:OK
                    Mar  4 08:00:37 raspiez mysgw: GWT:TSA:C=0,DISCONNECTED
                    Mar  4 08:00:37 raspiez mysgw: Ethernet client disconnected.
                    Mar  4 08:00:37 raspiez mysgw: New connection from 192.168.1.11
                    Mar  4 08:00:37 raspiez mysgw: GWT:TSA:C=0,CONNECTED
                    

                    and so on....

                    It seems that the increasing cpu usage is only a symptom for the real problem.
                    As I am using this in production I will dis-apply the patch and use the daemon as before (beacuse using the python script for cpu-usage-monitoring makes the situation somehow controllable for me).
                    Maybe you could have an additional eye on this please? Could I help you with further information? Maybe I could add some additional logging somewhere in the code?

                    Thanks again and cheers,
                    Otto

                    M 1 Reply Last reply
                    3
                    • O otto001

                      @marceloaqno :

                      At first, THANKS for your help. CPU usage does not rise anymore.
                      But unfortunately no more node READs after the same situation:

                      Mar  4 07:00:09 raspiez mysgw: TSF:MSG:READ,14-14-0,s=255,c=3,t=6,pt=1,l=1,sg=0:0
                      Mar  4 07:00:09 raspiez mysgw: TSF:MSG:READ,12-12-0,s=255,c=3,t=6,pt=1,l=1,sg=0:0
                      Mar  4 07:00:09 raspiez mysgw: GWT:RFC:C=1,MSG=14;255;3;0;6;M
                      Mar  4 07:00:09 raspiez mysgw: !TSF:MSG:SEND,0-0-14-14,s=255,c=3,t=6,pt=0,l=1,sg=0,ft=0,st=NACK:M
                      Mar  4 07:00:09 raspiez mysgw: TSF:MSG:READ,11-11-0,s=255,c=0,t=18,pt=0,l=3,sg=0:1.5
                      Mar  4 07:00:09 raspiez mysgw: GWT:RFC:C=1,MSG=12;255;3;0;6;M
                      Mar  4 07:00:09 raspiez mysgw: TSF:MSG:SEND,0-0-12-12,s=255,c=3,t=6,pt=0,l=1,sg=0,ft=0,st=OK:M
                      Mar  4 07:00:13 raspiez mysgw: TSM:READY:NWD REQ
                      Mar  4 07:00:13 raspiez mysgw: TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
                      Mar  4 07:00:13 raspiez mysgw: TSF:SRT:OK
                      Mar  4 07:00:13 raspiez mysgw: TSF:SAN:OK
                      Mar  4 07:00:37 raspiez mysgw: Ethernet client disconnected.
                      Mar  4 07:00:37 raspiez mysgw: New connection from 192.168.1.11
                      Mar  4 07:00:37 raspiez mysgw: GWT:TSA:C=0,CONNECTED
                      Mar  4 07:00:37 raspiez mysgw: GWT:TSA:C=1,DISCONNECTED
                      

                      After this point (again after a reconnect from the controller) no more messages from any node PLUS the cpu usage stayed normal (what makes it even worse at this moment, as I would have to create a script which monitors the logfile and restarts the mysgw service in this situation). As soon as the service is restarted, everything is working fine again.

                      What happens after this point in the log:

                      Mar  4 07:00:37 raspiez mysgw: GWT:RFC:C=0,MSG=0;0;3;0;2;
                      Mar  4 07:15:13 raspiez mysgw: TSF:SAN:OK
                      Mar  4 07:20:13 raspiez mysgw: TSM:READY:NWD REQ
                      Mar  4 07:20:13 raspiez mysgw: TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
                      Mar  4 07:30:13 raspiez mysgw: TSF:SRT:OK
                      Mar  4 07:30:13 raspiez mysgw: TSF:SAN:OK
                      Mar  4 07:30:37 raspiez mysgw: GWT:TSA:C=0,DISCONNECTED
                      Mar  4 07:30:37 raspiez mysgw: Ethernet client disconnected.
                      Mar  4 07:30:37 raspiez mysgw: New connection from 192.168.1.11
                      Mar  4 07:30:37 raspiez mysgw: GWT:TSA:C=0,CONNECTED
                      Mar  4 07:30:37 raspiez mysgw: GWT:RFC:C=0,MSG=0;0;3;0;2;
                      Mar  4 07:40:13 raspiez mysgw: TSM:READY:NWD REQ
                      Mar  4 07:40:13 raspiez mysgw: TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
                      Mar  4 07:45:13 raspiez mysgw: TSF:SAN:OK
                      Mar  4 08:00:13 raspiez mysgw: TSM:READY:NWD REQ
                      Mar  4 08:00:13 raspiez mysgw: TSF:MSG:SEND,0-0-255-255,s=255,c=3,t=20,pt=0,l=0,sg=0,ft=0,st=OK:
                      Mar  4 08:00:13 raspiez mysgw: TSF:SRT:OK
                      Mar  4 08:00:13 raspiez mysgw: TSF:SAN:OK
                      Mar  4 08:00:37 raspiez mysgw: GWT:TSA:C=0,DISCONNECTED
                      Mar  4 08:00:37 raspiez mysgw: Ethernet client disconnected.
                      Mar  4 08:00:37 raspiez mysgw: New connection from 192.168.1.11
                      Mar  4 08:00:37 raspiez mysgw: GWT:TSA:C=0,CONNECTED
                      

                      and so on....

                      It seems that the increasing cpu usage is only a symptom for the real problem.
                      As I am using this in production I will dis-apply the patch and use the daemon as before (beacuse using the python script for cpu-usage-monitoring makes the situation somehow controllable for me).
                      Maybe you could have an additional eye on this please? Could I help you with further information? Maybe I could add some additional logging somewhere in the code?

                      Thanks again and cheers,
                      Otto

                      M Offline
                      M Offline
                      marceloaqno
                      Code Contributor
                      wrote on last edited by
                      #809

                      @otto001 I'm trying to reproduce this problem, but so far I've got nothing beyond the normal behavior of mysgw.
                      I tried with a RPi1 and RPi3 using the fhem controller with the latest raspbian.

                      M 1 Reply Last reply
                      1
                      • M marceloaqno

                        @otto001 I'm trying to reproduce this problem, but so far I've got nothing beyond the normal behavior of mysgw.
                        I tried with a RPi1 and RPi3 using the fhem controller with the latest raspbian.

                        M Offline
                        M Offline
                        mvader
                        wrote on last edited by mvader
                        #810

                        @marceloaqno said in 💬 Building a Raspberry Pi Gateway:

                        @otto001 I'm trying to reproduce this problem, but so far I've got nothing beyond the normal behavior of mysgw.
                        I tried with a RPi1 and RPi3 using the fhem controller with the latest raspbian.

                        My uptime is at 4 days now. so far still working with your patch. CPU is at 2% and sensor network still working correctly. will report if that changes.

                        1 Reply Last reply
                        0
                        • Nick WillisN Offline
                          Nick WillisN Offline
                          Nick Willis
                          wrote on last edited by
                          #811

                          I've been using your restarting method and its been working ok and has done one restart in 5 days.
                          However I now see something else where altho cpu use is fine, none of my nodes can communicate to the gateway. My controller can connect to the gateway no problem but there are no messages received. Restarting nodes does not help. Only a manual restart of the gateway gets things going again - at least for a while anyway.
                          Where is the log output from the gateway to be found? I know it can be run from the bash prompt to write the output to screen but how about logging the installed service output?

                          mfalkviddM 1 Reply Last reply
                          0
                          • Nick WillisN Nick Willis

                            I've been using your restarting method and its been working ok and has done one restart in 5 days.
                            However I now see something else where altho cpu use is fine, none of my nodes can communicate to the gateway. My controller can connect to the gateway no problem but there are no messages received. Restarting nodes does not help. Only a manual restart of the gateway gets things going again - at least for a while anyway.
                            Where is the log output from the gateway to be found? I know it can be run from the bash prompt to write the output to screen but how about logging the installed service output?

                            mfalkviddM Offline
                            mfalkviddM Offline
                            mfalkvidd
                            Mod
                            wrote on last edited by mfalkvidd
                            #812

                            @nick-willis the log is sent to syslog by default. See https://www.mysensors.org/build/raspberry#troubleshooting for details.

                            1 Reply Last reply
                            1
                            • O Offline
                              O Offline
                              otto001
                              wrote on last edited by otto001
                              #813

                              @marceloaqno : This is strange. I am using a RPi 1 on this gateway with raspbian Jessie. Beside the GW-service there is only a small python programm which listens on an UPD-Port and plays an mp3 if it receives suitable data (for doorbell). Since March 4, 20:15 there were several restarts due to high cpu usage (after dis-applying your patch):

                              2018-03-04 23:01:03 cpu (99.7) very high! --> restarting service
                              2018-03-05 07:31:03 cpu (99.7) very high! --> restarting service
                              2018-03-05 11:01:03 cpu (98.7) very high! --> restarting service
                              2018-03-05 13:31:04 cpu (94.7) very high! --> restarting service
                              2018-03-06 11:31:03 cpu (99.7) very high! --> restarting service
                              2018-03-06 12:31:03 cpu (98.7) very high! --> restarting service
                              2018-03-07 06:01:03 cpu (94.8) very high! --> restarting service
                              2018-03-07 22:31:03 cpu (99.8) very high! --> restarting service
                              

                              With this "auto-restarts" the sensors are working fine, I did not notice any problems.

                              However, I just upgraded to the recent Stretch (what takes a LONG time on RPi 1 and made me struggle a little bit with the new network names). I also rebuilt the GW of course and will report here what will happen....

                              @Nick-Willis : did you apply the patch of marceloaqno or not? Did you enable debugging? (see here )

                              Cheers,
                              Otto

                              M 1 Reply Last reply
                              0
                              • O otto001

                                @marceloaqno : This is strange. I am using a RPi 1 on this gateway with raspbian Jessie. Beside the GW-service there is only a small python programm which listens on an UPD-Port and plays an mp3 if it receives suitable data (for doorbell). Since March 4, 20:15 there were several restarts due to high cpu usage (after dis-applying your patch):

                                2018-03-04 23:01:03 cpu (99.7) very high! --> restarting service
                                2018-03-05 07:31:03 cpu (99.7) very high! --> restarting service
                                2018-03-05 11:01:03 cpu (98.7) very high! --> restarting service
                                2018-03-05 13:31:04 cpu (94.7) very high! --> restarting service
                                2018-03-06 11:31:03 cpu (99.7) very high! --> restarting service
                                2018-03-06 12:31:03 cpu (98.7) very high! --> restarting service
                                2018-03-07 06:01:03 cpu (94.8) very high! --> restarting service
                                2018-03-07 22:31:03 cpu (99.8) very high! --> restarting service
                                

                                With this "auto-restarts" the sensors are working fine, I did not notice any problems.

                                However, I just upgraded to the recent Stretch (what takes a LONG time on RPi 1 and made me struggle a little bit with the new network names). I also rebuilt the GW of course and will report here what will happen....

                                @Nick-Willis : did you apply the patch of marceloaqno or not? Did you enable debugging? (see here )

                                Cheers,
                                Otto

                                M Offline
                                M Offline
                                marceloaqno
                                Code Contributor
                                wrote on last edited by marceloaqno
                                #814

                                @otto001 In a new attempt to trigger this problem, I am using a script that will force the fhem controller to reconnect every 30s to the gateway (built with mysensors master branch, unpatched), and sends me a note in case of high CPU usage:

                                I'll leave it running for a few days to see what happens.

                                Here is the script, if anyone is interested:

                                #!/bin/sh
                                
                                while true; do
                                  (echo "set mysgw disconnect"; echo "quit") | telnet localhost 7072 > /dev/null 2>&1
                                  sleep 1
                                  (echo "set mysgw connect"; echo "quit") | telnet localhost 7072 > /dev/null 2>&1
                                  sleep 1
                                
                                  cpu_percent=$(ps -C "mysgw" -o %cpu=)
                                  cpu_percent=${cpu_percent%%.*}
                                  if [ "$cpu_percent" -ge 90 ]
                                  then
                                    echo "ALERT: HIGH CPU USAGE"
                                    # pushbullet script from https://gist.github.com/outadoc/189bd3ccbf5d6e0f39e4
                                    /home/pi/pushbullet.sh "RPi1: HIGH CPU USAGE"
                                    exit
                                  fi
                                
                                  sleep 30
                                done
                                
                                1 Reply Last reply
                                0
                                • O Offline
                                  O Offline
                                  otto001
                                  wrote on last edited by
                                  #815

                                  @marceloaqno : Thank you!
                                  Just a notice: I do NOT disconnect, I just connect every 30 minutes. Has been working for me since more than 2 years (ancient ms-gw version on RPi).

                                  Yesterday I did a dist-upgrade to stretch on the "problematic" gw (as mentioned above) - without your patch but with my monitoring-script running. It is a little bit to early I presume, but no high cpu problems so far. I will report back here in a few days. Maybe this problem is really related to old libraries or something?! If this is running stable for a few days I will apply your patch and see what happens.
                                  Thanks again!

                                  Cheers,
                                  Pula

                                  1 Reply Last reply
                                  0
                                  • M Offline
                                    M Offline
                                    marceloaqno
                                    Code Contributor
                                    wrote on last edited by
                                    #816

                                    @otto001 After a few hours running the reconnect script, I noticed several error messages in my gateway log:

                                    accept(): Too many open files
                                    

                                    The problem was that the gateway wasn't releasing the socket descriptor after each disconnection of the controller. After reaching the limit (mine was 1036) the controller can't connect.

                                    I'm not entirely sure that this problem is related to what you're having. You can check how many file descriptors currently opened by the gateway with the command below

                                    sudo lsof -u root |grep mysgw |wc -l
                                    

                                    The fix:
                                    https://github.com/marceloaqno/MySensors/commit/a40d4441b7460225100398ff6f2581c2b0df36ea

                                    1 Reply Last reply
                                    2
                                    • ? Offline
                                      ? Offline
                                      A Former User
                                      wrote on last edited by
                                      #817

                                      How do I update from master branch to development branch?

                                      1 Reply Last reply
                                      0
                                      • gohanG Offline
                                        gohanG Offline
                                        gohan
                                        Mod
                                        wrote on last edited by
                                        #818

                                        Why do you want to do that? Development version is still work in progress

                                        1 Reply Last reply
                                        0
                                        • ? Offline
                                          ? Offline
                                          A Former User
                                          wrote on last edited by A Former User
                                          #819

                                          I'm using a raspberry pi gateway in home-assistant.
                                          Sometimes my gateway doesn't work anymore, and when I restart the service, it is running good
                                          I used ./configure --my-transport=nrf24 --my-rf24-irq-pin=15 --my-gateway=ethernet --my-port=5003

                                          M 1 Reply Last reply
                                          0
                                          Reply
                                          • Reply as topic
                                          Log in to reply
                                          • Oldest to Newest
                                          • Newest to Oldest
                                          • Most Votes


                                          17

                                          Online

                                          11.7k

                                          Users

                                          11.2k

                                          Topics

                                          113.0k

                                          Posts


                                          Copyright 2019 TBD   |   Forum Guidelines   |   Privacy Policy   |   Terms of Service
                                          • Login

                                          • Don't have an account? Register

                                          • Login or register to search.
                                          • First post
                                            Last post
                                          0
                                          • MySensors
                                          • OpenHardware.io
                                          • Categories
                                          • Recent
                                          • Tags
                                          • Popular