ModalAI Forum
    docker fills up /data, prevents code from running

    jaredjohansen

      I have been doing development with docker containers for some time on a particular VOXL. On occasion, I will save the docker image. I believe this has slowly filled up the disk space on /data.

      If I check the disk utilization on /data, it shows that there is plenty of disk space:

      yocto:/data$ df -h /data
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/sda9        15G  8.1G  6.5G  56% /data
      

      However, if I try to create a single file in /data, the kernel tells me there is no disk space left:

      yocto:/data$ touch test.txt
      touch: cannot touch 'test.txt': No space left on device
      

      These two commands present contradictory information about the remaining disk space. Since I cannot create even a single file, I believe the latter command is the one that reflects reality.
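      For anyone else hitting this contradiction: `df -h` only reports free blocks, but every new file also needs a free inode, and a filesystem can run out of inodes while blocks remain. A generic sketch (not VOXL-specific) to check both at once:

```shell
# Compare block usage vs inode usage on the same mount point.
# "No space left on device" from touch while df shows free blocks
# usually means the inode table is full.
check_fs() {
    mnt="$1"
    # -P forces POSIX one-line-per-filesystem output; field 5 is the Use% column
    buse=$(df -P "$mnt" | awk 'NR==2 {gsub(/%/,""); print $5}')
    iuse=$(df -Pi "$mnt" | awk 'NR==2 {gsub(/%/,""); print $5}')
    echo "blocks ${buse}% used, inodes ${iuse}% used"
    if [ "${iuse:-0}" -ge 100 ] 2>/dev/null && [ "${buse:-0}" -lt 100 ] 2>/dev/null; then
        echo "inode exhaustion: blocks free but no inodes left"
    fi
}
check_fs /
```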

      Before this occurred, I realized that I was running out of disk space and attempted to clean up old docker images. I tried commands like docker system prune or docker image prune, but the version of docker installed on the VOXL does not support them.

      Instead, I used docker image rm <name> to remove everything but two images. While that cleared up what was displayed by typing docker images, it did not appear to affect the disk utilization.

      It appears that docker preserves images and containers in /data/overlay (the directory names match the image IDs and container IDs in docker). After running the above commands, the corresponding directories were not removed from /data/overlay. I am afraid to remove them manually because of my (very limited) understanding of how docker works: images are built on top of other images, so I worry that deleting a particular directory could break the process used to build/start the latest docker image I am using.

      I currently believe that docker is somehow masking/hiding how much disk space is actually being used (or the kernel is unaware of disk space that docker is no longer using). Either way, I don't know how to fix it.

      At this point, when I turn on the VOXL, the docker-daemon will not stay running. If I try to restart it with systemctl restart docker-daemon, it dies after about 15 seconds. I am hoping to get it running so I can save off my latest code. After that, I'd be fine nuking /data and starting over. (If there is a better way to go about this, I'm all ears!)

      For completeness, I can use journalctl -u docker-daemon to see the error messages:

      Oct 25 22:45:49 apq8096 systemd[1]: Started docker service for VOXL.
      Oct 25 22:45:49 apq8096 docker-prepare.sh[4628]: preparing docker with docker-prepare.sh
      Oct 25 22:45:49 apq8096 docker-prepare.sh[4628]: this may take a few seconds
      Oct 25 22:45:50 apq8096 docker[4627]: time="2021-10-25T22:45:50.015279000Z" level=info msg="API listen on /var/run/docker.sock"
      Oct 25 22:45:50 apq8096 docker[4627]: time="2021-10-25T22:45:50.025349000Z" level=info msg="[graphdriver] using prior storage driver \"overlay\""
      Oct 25 22:45:50 apq8096 docker[4627]: time="2021-10-25T22:45:50.143791000Z" level=info msg="Firewalld running: false"
      Oct 25 22:45:50 apq8096 docker[4627]: time="2021-10-25T22:45:50.282820000Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address"
      Oct 25 22:45:50 apq8096 docker[4627]: time="2021-10-25T22:45:50.501537000Z" level=fatal msg="Error starting daemon: unable to open database file"
      Oct 25 22:45:50 apq8096 systemd[1]: docker-daemon.service: Main process exited, code=exited, status=1/FAILURE
      Oct 25 22:46:06 apq8096 docker-prepare.sh[4628]: docker-prepare: failed to see cpuset appear after 15 seconds
      Oct 25 22:46:06 apq8096 systemd[1]: docker-daemon.service: Control process exited, code=exited status=1
      Oct 25 22:46:06 apq8096 systemd[1]: docker-daemon.service: Unit entered failed state.
      Oct 25 22:46:06 apq8096 systemd[1]: docker-daemon.service: Failed with result 'exit-code'.
      

      I looked in /etc/systemd/system/docker-daemon.service to learn that this is the command issued at startup: /usr/bin/docker daemon -g /data. When I run that command, I get this error:

      yocto:/data$ /usr/bin/docker daemon -g /data
      INFO[0000] API listen on /var/run/docker.sock           
      INFO[0000] [graphdriver] using prior storage driver "overlay" 
      INFO[0000] Firewalld running: false                     
      INFO[0000] Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address 
      FATA[0000] Error starting daemon: unable to open database file 
      

      At some point earlier in my investigation (before I found the disk was full), I found someone in another ModalAI forum post with the same error message. They used this one-liner to fix things:

      rm /data/network/files/local-kv.db
      

      I don't know if this is somehow related to the issue at hand.

      Any guidance on how to recover from this situation would be appreciated.
      Any guidance on the best way to manage /data and keep track of its real disk utilization would be helpful too!

      jaredjohansen

        There is one other thing I did that is noteworthy. I reflashed the base image with 3.3 and installed the voxl-suite. In this process, I selected the option that left /data intact.

        Eric Katzfey (ModalAI Team)

          Can you show the output of # df -i /data?

          jaredjohansen

            Here you are:

            yocto:/$ df -i /data
            Filesystem     Inodes  IUsed IFree IUse% Mounted on
            /dev/sda9      977280 977280     0  100% /data
            
            jaredjohansen

              Here is some more info. Looks like docker is using all the inodes:

              yocto:/data$ sudo find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n              
                    1 adb_devid
                    1 db
                    1 dhcpcd-wlan0.info
                    1 dnsmasq.conf
                    1 dnsmasq_d.leases
                    1 l2tp_cfg.xml
                    1 linkgraph.db
                    1 mobileap_cfg.xml
                    1 modalai
                    1 network
                    1 repositories-overlay
                    2 usb
                    2 web_root
                    4 persist
                    7 iproute2
                   20 misc
                   53 containers
                  189 graph
              3265231 overlay
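              The same pipeline can be wrapped into a small helper and pointed at any directory, e.g. /data/overlay itself, to see which subdirectory holds the most files. Each regular file costs one inode, so the biggest counts are the biggest inode consumers. A sketch (generic, not from the original thread):

```shell
# Count regular files per immediate subdirectory of "$1", smallest
# count first. Mirrors the find | cut | sort | uniq -c pipeline above.
inodes_by_dir() {
    ( cd "$1" 2>/dev/null || exit 0
      find . -xdev -type f | cut -d/ -f2 | sort | uniq -c | sort -n )
}
inodes_by_dir /data/overlay
```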
              
              Eric Katzfey (ModalAI Team)

                System image 3.3.0 increases the number of inodes on /data to 3M. However, that may require that /data is completely wiped when you install it.

                jaredjohansen

                  I believe I already had system image 3.3 installed on my VOXL. (The number of inodes dedicated to overlay is around the 3M mark.) Earlier, when I said I reflashed the base image with 3.3, it was an effort to rule out the system image being corrupt.

                  Is there some way to clean up all the inodes being used by docker via the command line? (The docker-daemon still dies on bootup/restart, so I can't use docker commands.)
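                  Without the daemon running, one read-only starting point is to cross-reference the on-disk directories: in this old docker, layer data appears to live in /data/overlay/<id> while metadata lives in /data/graph/<id> and container state in /data/containers/<id>. An overlay entry with no matching graph or containers entry is a candidate orphan. That layout is an assumption based on the listing above, so this sketch only prints candidates and deletes nothing:

```shell
# List /data/overlay entries with no matching metadata directory.
# Read-only: prints candidate orphans, removes nothing. The layout
# (overlay/, graph/, containers/ under one root) is an assumption.
find_orphans() {
    root="$1"
    for layer in "$root"/overlay/*/; do
        [ -d "$layer" ] || continue
        id=$(basename "$layer")
        if [ ! -d "$root/graph/$id" ] && [ ! -d "$root/containers/$id" ]; then
            echo "$id"
        fi
    done
}
find_orphans /data
```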

                  jaredjohansen

                    I found a partial solution. I deleted some unused, empty directories in /data (e.g., audio). This freed up four inodes.

                    I was able to run systemctl restart docker-daemon and it worked for just a few seconds before crashing. In that time, I was able to run docker ps -a and see the list of containers. I tried to docker start <my_container> but the docker-daemon had already crashed.

                    I went into /data/containers and deleted the hello-world container.

                    I re-ran systemctl restart docker-daemon and was able to run docker rm <container_name> for some of the unused containers. Doing this a few times freed up ~200 inodes. The docker-daemon was able to run without crashing.

                    At that point, I was able to start <my_container> and enter it, and I pushed my code to my git repo. This is what I set out to do.

                    Now that I have my data saved, I could nuke the /data/overlay directory and reclaim most of my inodes. But I'd prefer a better way to keep the /data/overlay directory clean (perhaps as part of regular maintenance) rather than nuking the entire directory from time to time as it fills up. If ModalAI knows a good way to do this, please share!
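                    In the meantime, a rough stand-in for `docker system prune` on a docker this old is to remove exited containers and dangling images explicitly once the daemon is up. The `-f status=exited` and `-f dangling=true` filters existed around this era of docker, but verify them against `docker ps --help` on your build; this is a sketch, not a ModalAI-blessed procedure:

```shell
#!/bin/sh
# Rough stand-in for `docker system prune` on old docker versions.
# DOCKER is overridable so the logic can be dry-run against a stub.
DOCKER="${DOCKER:-docker}"

prune_docker() {
    # Remove all exited containers (frees /data/containers entries and
    # their read-write overlay layers).
    exited=$("$DOCKER" ps -a -q -f status=exited 2>/dev/null)
    if [ -n "$exited" ]; then
        "$DOCKER" rm $exited    # unquoted: one argument per container ID
    fi
    # Remove dangling (untagged) images, typically leftover build layers.
    dangling=$("$DOCKER" images -q -f dangling=true 2>/dev/null)
    if [ -n "$dangling" ]; then
        "$DOCKER" rmi $dangling
    fi
}

# Only run automatically if the docker client is actually present.
if command -v "$DOCKER" >/dev/null 2>&1; then
    prune_docker
fi
```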

                    (And thanks for the pointer about the inodes. I didn't consider that that was what was happening.)

                    jaredjohansen

                      @Eric-Katzfey, let me ask a follow-up question.

                      Currently, the docker version on the VOXL is v1.9. This version doesn't support commands like docker system prune or docker image prune.

                      Does ModalAI plan to update the docker version used on VOXL?

                      Is it possible to upgrade it myself? Or are there reasons why ModalAI is still using v1.9? (A Google search told me the latest version of docker is 20.10.)

                      Eric Katzfey (ModalAI Team)

                        We use Yocto to build the system image. It is a fairly old version (Jethro), and that really limits what we can do as far as upgrades go. When you try to upgrade a single component, you usually have to upgrade its dependencies, and then the dependencies of those dependencies, and so on. So upgrading the Docker version is likely a very large task and not something we are planning to do any time soon.

                        jaredjohansen

                          Good to know -- thanks, Eric!
