Taking a look at MinIO

Recently I played around with MinIO, which describes itself as »The 100% Open Source, Enterprise-Grade, Amazon S3 Compatible Object Storage«. Let's take a closer look.

Installation

The MinIO installation is quite simple. If it's not in your distribution's package manager, it should be enough to download the server (and it might be helpful to download the client, too). Make both files executable and perhaps move them to a better place like /usr/bin or /usr/local/bin. For the latter you might want to make sure that /usr/local/bin is in your shell environment's PATH variable.

# The server
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
# The client
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
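# Optionally move both into your PATH
mv minio mc /usr/local/bin/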

Contrary to what I would have expected (its own filesystem), MinIO just uses whatever filesystem you're already running. You can feed it a single disk by simply creating a filesystem and a folder on that disk. Since all my systems are using ZVOLs (ZFS), I created one 500 GB volume on each of two ZFS systems, formatted them with ext4, and, to make sure MinIO does not do weird things with the lost+found folder, created a folder called data0. The ext4 filesystem is mounted to /storage.
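For reference, the disk preparation on each node looked roughly like this (the pool name tank is a placeholder, not my actual pool name):

zfs create -V 500G tank/minio
mkfs.ext4 /dev/zvol/tank/minio
mkdir /storage
mount /dev/zvol/tank/minio /storage
mkdir /storage/data0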

Running MinIO is as simple as running ./minio server /storage/data. My first impression was therefore very positive: instead of dealing with thousands of settings and finding my way through the documentation, a simple wget, chmod and the final ./minio command gets everything up and running. MinIO allows you to set up object storage within seconds!

How about SSL/TLS?

Just install certbot through your package manager, configure it standalone for a domain by issuing certbot certonly --standalone, and your certificates will end up in /etc/letsencrypt/live/. Now you can symlink them so that MinIO will use them (without ANY configuration). I symlink them to /etc/minio/certs:

root@minio1:/etc/minio/certs# ls -la
total 8
drwxr-xr-x 2 root root 4096 Jun 20 12:01 .
drwxr-xr-x 3 root root 4096 Jun 20 12:00 ..
lrwxrwxrwx 1 root root   57 Jun 20 12:01 private.key -> ../../letsencrypt/live/minio1.jeanbruenn.info/privkey.pem
lrwxrwxrwx 1 root root   59 Jun 20 12:00 public.crt -> ../../letsencrypt/live/minio1.jeanbruenn.info/fullchain.pem
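For reference, those symlinks can be created like this:

mkdir -p /etc/minio/certs
cd /etc/minio/certs
ln -s ../../letsencrypt/live/minio1.jeanbruenn.info/privkey.pem private.key
ln -s ../../letsencrypt/live/minio1.jeanbruenn.info/fullchain.pem public.crt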

Note that the key's filename has to be private.key and the certificate's filename has to be public.crt. Also note that I add --config-dir /etc/minio --certs-dir /etc/minio/certs to the ./minio command:

root@minio1:~# ./minio server --config-dir /etc/minio --certs-dir /etc/minio/certs /storage/data/
 
Endpoint:  https://185.37.145.134:9000  https://127.0.0.1:9000    
AccessKey: xxxx
SecretKey: xxxxxxxx
 
Browser Access:
   https://185.37.145.134:9000  https://127.0.0.1:9000    
 
Command-line Access: https://docs.min.io/docs/minio-client-quickstart-guide
   $ mc config host add myminio https://185.37.145.134:9000 xxxx xxxxxxxx
 
Object API (Amazon S3 compatible):
   Go:         https://docs.min.io/docs/golang-client-quickstart-guide
   Java:       https://docs.min.io/docs/java-client-quickstart-guide
   Python:     https://docs.min.io/docs/python-client-quickstart-guide
   JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
   .NET:       https://docs.min.io/docs/dotnet-client-quickstart-guide

The client

The client allows for some pretty cool stuff, e.g. the mirror functionality. You can mirror data from one MinIO instance to another:

root@minio1:~# ./mc mirror --watch /storage/data/photos minio2/photos
...40214.jpg:  6.52 GiB / 6.52 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 32.89 MiB/s

For that functionality you just need to add the instances to your configuration, as shown in the startup output of minio. I just replaced myminio with minio1 and minio2 respectively:

mc config host add minio2 https://minio2.jeanbruenn.info:9000 xxxx xxxxxxxx

It seems it doesn't matter whether you use a domain or an IP here (as long as the domain points to the correct IP). Right after that you can use minio2 in the mc mirror command. However, the client also allows you to mirror a non-MinIO directory or file to your MinIO storage:

jean@asuna:~$ ./mc mirror --watch /home/jean/Pictures minio1/photos
..._1063.jpg:  69.79 MiB / 79.57 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░┃ 1.04 MiB/s

Which is pretty cool. Not so cool is the lack of resume functionality in case you Ctrl+C the transfer. If the mirror transfer fails on its own, there should be a session which you're able to resume. Sadly I noticed that functionality way too late: when I stopped a 100 GB transfer and started the mirror again, hoping it would resume on its own, the old session was gone and I wasn't able to resume the large transfer. Rsync is a lot more mature here; definitely a feature I'll most likely ask MinIO to implement (resuming by comparing what's already transferred instead of requiring a session / session ID).
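If a session does survive, resuming should look roughly like this (a sketch; the exact subcommands are described by mc session --help):

./mc session list
./mc session resume <SessionID>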

The client offers a lot more features. Just to show you one more, let's take a look at ls:

jean@asuna:~$ ./mc ls minio1
[2019-06-21 10:04:53 CEST]      0B photos/
jean@asuna:~$ ./mc ls minio1/photos/
[2019-06-23 12:02:05 CEST]      0B .dtrash/
[2019-06-23 12:02:05 CEST]      0B 2006/
[2019-06-23 12:02:05 CEST]      0B 2007/
[2019-06-23 12:02:05 CEST]      0B 2008/
[2019-06-23 12:02:05 CEST]      0B 2009/
[2019-06-23 12:02:05 CEST]      0B 2010/
[2019-06-23 12:02:05 CEST]      0B 2011/
[2019-06-23 12:02:05 CEST]      0B 2012/
[2019-06-23 12:02:05 CEST]      0B 2013/
[2019-06-23 12:02:05 CEST]      0B 2014/
[2019-06-23 12:02:05 CEST]      0B 2015/
[2019-06-23 12:02:05 CEST]      0B 2016/
[2019-06-23 12:02:05 CEST]      0B 2017/
[2019-06-23 12:02:05 CEST]      0B 2018/
[2019-06-23 12:02:05 CEST]      0B 2019/
jean@asuna:~$ ./mc ls minio1/photos/2009
[2019-06-23 12:02:09 CEST]      0B 01/
[2019-06-23 12:02:09 CEST]      0B 02/
[2019-06-23 12:02:09 CEST]      0B 03/
[2019-06-23 12:02:09 CEST]      0B 04/
[2019-06-23 12:02:09 CEST]      0B 05/
jean@asuna:~$ ./mc ls minio1/photos/2009/01
[2019-06-23 12:02:12 CEST]      0B 01/
[2019-06-23 12:02:12 CEST]      0B 31/
jean@asuna:~$ ./mc ls minio1/photos/2009/01/01
[2019-06-20 21:12:23 CEST]  942KiB 100_1125.jpg
[2019-06-20 21:12:22 CEST]  933KiB 100_1126.jpg
[2019-06-20 21:12:23 CEST]  967KiB 100_1127.jpg

Pretty cool, isn't it? However, it would be nice if it displayed a size for the folders/buckets; another feature I'd love to see implemented (though I guess that would slow things down). There is also find:

jean@asuna:~$ ./mc find minio1/photos --name "*DSC03091*"
minio1/photos/2019/06/15/DSC03091.ARW
minio1/photos/2019/06/15/DSC03091.ARW.xmp

Now, if you keep in mind that you can access AWS, Google and other S3-compatible object storages the same way with mc, you'll notice how powerful this tool is.
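For instance, adding AWS S3 as a host works the same way (keys masked as above; some-bucket is a hypothetical bucket name):

mc config host add s3 https://s3.amazonaws.com xxxx xxxxxxxx
mc ls s3/some-bucket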

Is MinIO the ZFS of cloud storage?

The author of »Minio, the ZFS of cloud storage« thinks so. I'm somewhat skeptical. Don't get me wrong: he has a point. In his article he states what ZFS is best known for (abstracting away physical storage device boundaries, removing the need to manually handle physical storage or worry about individual capacities, the ability to detect and recover from data corruption, scalability, ...). What he did not do is compare the features themselves:

  • Does MinIO allow me to tune the blocksize depending on my use-case?
  • Does MinIO use a variable blocksize?
  • Does MinIO allow me to set the compression type?
  • May I enable compression on a per-bucket (as in ZFS per dataset/volume) basis?
  • A raidz1 (comparable to raid5) requires at least 3 disks. How many nodes do I need for full redundancy with MinIO?
  • Does MinIO really scale as much as ZFS or is there a limit in Nodes/Buckets?
  • May I create snapshots? Are those snapshots incremental?

I believe that such features make sense in cloud storage. Let's go through those questions (as of 23rd June 2019):

Blocksize: I believe the block size is hardcoded (I wasn't able to find anything to configure it, neither through a config file nor using mc when creating a bucket). Also, MinIO is bound to the block size used by the underlying filesystem: in case of ext4 you're bound to 4K blocks; in case of XFS you're bound to your page size, which usually is 4K.
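ZFS, for comparison, lets you tune this per dataset (the dataset names here are made up):

zfs set recordsize=1M tank/photos
zfs set recordsize=16K tank/databases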

Compression: No, MinIO does not allow me to set the compression type – it's using golang/snappy. So I'm not able to give my archive bucket gzip while using something fast for my documents bucket. One of the key features of ZFS which I really, really love is that I am able to configure compression on a per-dataset/per-volume basis. MinIO does not allow me to enable and disable compression per bucket. Furthermore, if you're using encryption you're not able to use compression at all.
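Again for comparison, ZFS makes this a per-dataset property (hypothetical dataset names):

zfs set compression=gzip-9 tank/archive
zfs set compression=lz4 tank/documents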

Nodes: In ZFS you can build a mirror with just two disks and a raidz1 with at least three disks. To use the distributed mode of MinIO you're going to need at least 4 disks (or nodes which together provide 4 disks). You MAY just create two folders on the same disk (data0, data1) and do the same on a 2nd node to fulfill the requirement of 4 disks. While for ZFS there is practically no maximum (those limits are bound to your CPU/RAM/mainboard/chassis... okay, to be honest, ZFS has limits as well – they're just so big that we won't likely reach them in the near future), the maximum for MinIO is 32 nodes (not very likely that one would need more, I believe).

Snapshots: No – I wasn't able to find anything about creating a snapshot of a bucket or of the whole MinIO-based cloud storage.
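In ZFS, a snapshot and an incremental replication are a one-liner each (pool and host names made up):

zfs snapshot tank/photos@2019-06-23
zfs send -i tank/photos@2019-06-16 tank/photos@2019-06-23 | ssh backuphost zfs receive backup/photos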

Comparing ZFS with MinIO is pretty unfair. Stating »MinIO, the ZFS of cloud storage« is, in my opinion, adventurous. Personally, for that claim to be true I would expect more (per-bucket configuration, snapshots, different compression settings, the possibility to configure per-bucket optimizations for my use-cases, like the block size). Just as I initially wrote, the author does have valid points, though: the bitrot detection is a nice thing. The use of extremely fast hashing algorithms as well as erasure coding is cool. The efforts spent on data recovery are cool as well. The pooling of nodes and disks is a cool thing – you do not have to worry much about the disk structure underlying your MinIO storage.

That's all stuff which makes me love working with MinIO. I'm just not that much of a fan of the article's claim.

Update

First, it looks like there is no dynamic expansion possible; you expand by adding more clusters / using federation. A few more things I noticed:

  • the get/set functionality for mc admin config is not ready yet; to apply a change you'll have to load the whole configuration.
  • --config-dir seems to be deprecated (Deprecate config-dir bring in certs-dir for TLS configuration).
  • some settings HAVE to be set as environment variables; I really, really dislike that – fundamental configuration shouldn't live in the environment. I'm not exactly sure why MinIO does so much with environment variables…
  • I was searching for a way to set the reduced redundancy storage class – on the MinIO Slack channel my question was answered quickly: -storage-class REDUCED_REDUNDANCY as a parameter to minio will do that. Another option is to use it together with mc mirror when synchronizing your data (see the sketch below).
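The mc mirror variant would presumably look like this (a sketch based on that Slack answer; I haven't verified the exact flag spelling for mc mirror):

./mc mirror --storage-class REDUCED_REDUNDANCY /home/jean/Pictures minio1/photos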

Distributed MinIO

MinIO will generate an ACCESS KEY as well as a SECRET KEY for you. You may, however, use your own by exporting two variables. Those keys have to match on all nodes if you're using a distributed setup.

export MINIO_ACCESS_KEY=xxxx
export MINIO_SECRET_KEY=xxxxxxxx

Just as explained before, you don't need two disks per node (one disk with different folders works as well – with the implication that if that one disk fails, two MinIO disks have failed). Assuming that your MinIO-specific filesystem is mounted on /storage, just issue:

mkdir /storage/data0
mkdir /storage/data1

Do that on both nodes. Now we're modifying the minio start command to contain all 4 disks. Instead of directly using the local path we're using an http/https URI (e.g. https://host.or.ip/path/to/disk):

./minio server --config-dir /etc/minio --certs-dir /etc/minio/certs \
  https://minio1.jeanbruenn.info/storage/data0 \
  https://minio1.jeanbruenn.info/storage/data1 \
  https://minio2.jeanbruenn.info/storage/data0 \
  https://minio2.jeanbruenn.info/storage/data1

MinIO will start up as usual. It will wait for all disks to be connected (so start minio the same way on the second node). One of the differences you can easily notice is that files are stored in split form across the disks:

jean@asuna:~$ ./mc ls minio1/photos/2006/01/08/100_1220.jpg
[2019-06-20 21:11:07 CEST]  960KiB 100_1220.jpg
root@minio1:~# ls -lah /storage/data0/photos/2006/01/08/100_1220.jpg/
total 496K
drwxr-xr-x 2 root root 4.0K Jun 20 21:11 .
drwxr-xr-x 7 root root 4.0K Jun 20 21:11 ..
-rw-r--r-- 1 root root 481K Jun 20 21:11 part.1
-rw-r--r-- 1 root root  511 Jun 20 21:11 xl.json
 
root@minio1:~# ls -lah /storage/data1/photos/2006/01/08/100_1220.jpg/
total 496K
drwxr-xr-x 2 root root 4.0K Jun 20 21:11 .
drwxr-xr-x 7 root root 4.0K Jun 20 21:11 ..
-rw-r--r-- 1 root root 481K Jun 20 21:11 part.1
-rw-r--r-- 1 root root  511 Jun 20 21:11 xl.json
 
root@minio2:~# ls -lah /storage/data1/photos/2006/01/08/100_1220.jpg/
total 496K
drwxr-xr-x 2 root root 4.0K Jun 20 21:11 .
drwxr-xr-x 7 root root 4.0K Jun 20 21:11 ..
-rw-r--r-- 1 root root 481K Jun 20 21:11 part.1
-rw-r--r-- 1 root root  511 Jun 20 21:11 xl.json

I checked the part.1 checksums (SHA-1): they all differ. By doing that I also noticed that I had a typo when starting MinIO, which led to it creating the path itself:

root@minio2:~# ls -la /stor
storage/ storge/

On the one hand that is cool (note how I missed the a in storage and MinIO just created that path on my limited root disk); on the other hand I'd prefer MinIO to fail if a path is not available instead of just creating it (at least for the data directory).

Adding a file to minio1 makes it appear on minio2, which is pretty cool. Again: I've built redundant cloud storage within seconds!

Here’s the output of mc admin info on a fresh install:

root@minio2:~# ./mc admin info minio1
●  minio1.jeanbruenn.info:9000
   Status : online
   Uptime : 38 minutes 
  Version : 2019-06-19T18:24:42Z
  Storage : Used 65 KiB
   Drives : 2/2 OK
 
●  minio2.jeanbruenn.info:9000
   Status : online
   Uptime : 38 minutes 
  Version : 2019-06-19T18:24:42Z
  Storage : Used 65 KiB
   Drives : 2/2 OK

All in all I really like MinIO. I'll cover connecting to it from PHP using Flysystem in my next blog post.
