Aspiring Architect: Data Persistence Models in Docker Containers

A container is built from several layers: a minimal subset of the OS at the bottom, topped by the container filesystem, then the application layer, then the hosting layer. All of these layers are read-only.

On top of them sits the container runtime layer, which is in a read/write state.

Data in the container runtime layer survives only while the container is stopped and started again. If the container is deleted, this data is lost forever. This data is also isolated to that one container and cannot be shared with other containers.

So let's look at better data persistence models for sharing data between different containers on a host.

Volume and Bind Mount are two kinds of persistent data storage that are available on a host and can be accessed (read/write) by multiple containers.

A volume is storage created and managed by Docker. This means a container working with a volume stays within the boundaries of Docker.

A bind mount is storage taken directly from the filesystem of the host. So if malicious code is deployed in a container, it can damage the host by manipulating the part of the host filesystem that is mounted.
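On the command line, the two models differ only in the type and source fields of Docker's --mount option. Here is a minimal sketch, assuming a hypothetical helper named mount_spec (it is not part of Docker) that simply builds that string. The names sharedvolume and bindstorage are the ones used in the demonstration later in this post.

```shell
# mount_spec is a hypothetical helper, not a Docker command.
# It prints the value you would pass to `docker container run --mount ...`.
#   $1 = type (volume or bind), $2 = source, $3 = target path in the container
mount_spec() {
  printf 'type=%s,source=%s,target=%s\n' "$1" "$2" "$3"
}

# Volume: the source is a name that Docker manages.
mount_spec volume sharedvolume /app

# Bind mount: the source is an absolute path on the host.
mount_spec bind "$(pwd)/bindstorage" /app
```

With a volume, the container only ever sees a name. With a bind mount, it is handed a real host path.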

By now, looking at the color coding, you should have understood that a volume is the better/safer way of storing and sharing data between containers. Let me show you with a demonstration.

For the demonstration below, I have Docker Desktop, Windows Subsystem for Linux 2 (WSL 2) and Ubuntu 18.04 LTS installed on my dev machine.

This enables my Windows machine to run a Linux machine, which I will be using as the host.

Let's work with Volume first.

Let me open the Linux server and check the Docker version first.

Then I will create a volume and name it "sharedvolume". Even though no folders exist at this point, Docker will create the necessary folders inside its own storage area.

Then I will create a Linux container named "sender" with sharedvolume mounted as the "/app" folder inside the container.

docker --version
docker volume create sharedvolume
docker container run -dit --name sender --mount type=volume,source=sharedvolume,target=/app ubuntu

Let's list the running containers with the docker container ls command.

Make a note of the CONTAINER ID of the sender container.

Let's access the filesystem of the container. From inside the sender container, I will create a text file with the intent of storing some data in the volume (the /app folder).

docker attach <starting 4 letters of container id>
cd /app
echo "This is the data i want to share with receiver container">volumedata.txt
ls
more volumedata.txt
exit

The more volumedata.txt command prints the content of that file, which is saved inside the volume.

Now let's create the second Linux container, named "receiver", with the same sharedvolume mounted as the "/app" folder inside the container. Then we access the receiver container's filesystem, go to the mounted sharedvolume (the /app folder) and see if we can find the volumedata.txt file and its content.

docker container run -dit --name receiver --mount type=volume,source=sharedvolume,target=/app ubuntu
docker container ls
docker attach <first 4 letters of receiver container id>
cd /app
ls
more volumedata.txt

See, the data saved from the sender container is now accessible in the receiver container.

Let me show you the best part. If you open Explorer on your Windows dev machine and go to the "\\wsl$\docker-desktop-data\version-pack-data\community\docker\volumes\sharedvolume\_data" path, you can see the volumedata.txt file there.

If you look at the path, you can clearly see that this data is created and stored inside Docker's own storage area.
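You can also ask Docker where it keeps a volume instead of browsing to it. Here is a small sketch, assuming a hypothetical wrapper named volume_mountpoint around the docker volume inspect command. The path it prints is the one seen by the Docker engine, which on Docker Desktop runs in its own Linux environment, so it may not match the \\wsl$ path above.

```shell
# volume_mountpoint is a hypothetical wrapper around `docker volume inspect`.
# It prints the directory where Docker stores the data of the volume named $1.
volume_mountpoint() {
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# Usage (needs a running Docker engine and an existing volume):
#   volume_mountpoint sharedvolume
```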

Now let's stop both the sender and receiver containers and delete them.

docker container stop sender receiver
docker container rm sender receiver

Let's work with Bind Mount now.

Unlike a volume, a bind mount will not create folders, so you need to use an existing path/folder on your Linux host to store data.
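Because the --mount form of a bind mount expects the source folder to exist already, it can help to check for it before starting the container. Here is a minimal sketch, assuming a hypothetical helper named bind_source (it is not a Docker command).

```shell
# bind_source is a hypothetical helper, not a Docker command.
# It prints the absolute path of host folder $1 for use as a bind-mount
# source, or fails with a message if the folder does not exist.
bind_source() {
  if [ ! -d "$1" ]; then
    echo "bind source '$1' does not exist; create it first with mkdir" >&2
    return 1
  fi
  # cd in a subshell so the caller's working directory is unchanged.
  (cd "$1" && pwd)
}

# Usage:
#   src=$(bind_source bindstorage) &&
#     docker container run -dit --name sender \
#       --mount "type=bind,source=$src,target=/app" ubuntu
```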

The first step is to create a folder called "bindstorage".

Then create the sender container again, but this time with a bind mount as the data storage mapped to the /app folder.

Then I will access the sender container's filesystem and create a bindmountdata.txt file with some content inside the /app folder in the sender container.

mkdir bindstorage
ls
docker container run -dit --name sender --mount type=bind,source="$(pwd)"/bindstorage,target=/app ubuntu
docker container ls
docker attach <first 4 letters of sender container>
cd /app
echo "this is the data saved in bindmount">bindmountdata.txt
ls
more bindmountdata.txt
exit

Now let's create the receiver container with the same bind-mount storage for data sharing.

Then let's see if we can find the bindmountdata.txt file and its content.

docker container run -dit --name receiver --mount type=bind,source="$(pwd)"/bindstorage,target=/app ubuntu
docker container ls
docker attach <first 4 letters of receiver container id>
cd /app
ls
more bindmountdata.txt
exit

The data saved from sender container can now be read in receiver container.

You may have a question: "Apart from some syntax changes, both seem the same. Why is a volume the preferred one?"

Let me show you the worst part of the bind-mount storage model.

If you open File Explorer on your Windows dev machine and go to the "\\wsl$\Ubuntu-18.04\home\linuxusr\bindstorage" path, you can see the bindmountdata.txt file.

This means the storage being used is on the filesystem of the hosting machine. With a bit of tweaking, malicious code in a container can mess up the Linux host, break other containers on that host, and even scan for details of other images and containers on the host.
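If a container only needs to read the shared data, one way to narrow this exposure is to mount the folder read-only using the readonly option of --mount. Here is a minimal sketch; the helper name run_readonly is hypothetical. It only blocks writes through the mount and does not make a bind mount as isolated as a volume.

```shell
# run_readonly is a hypothetical helper, not a Docker command.
# It starts an ubuntu container named $1 with host folder $2 mounted
# read-only at /app, so the container can read the files but not change them.
run_readonly() {
  docker container run -dit --name "$1" \
    --mount "type=bind,source=$2,target=/app,readonly" ubuntu
}

# Usage (needs a running Docker engine and an existing folder):
#   run_readonly receiver "$(pwd)/bindstorage"
```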

Check out Dirty COW (CVE-2016-5195), a Linux kernel vulnerability that was famously used to break out of containers back in the day.

I hope this is helpful to some fellow developers.

24/05/2021
