ZFS storage upgrade

I upgraded a ZFS-based storage box (ZFS on Linux) from ~18 TB to ~36 TB of nominal usable capacity. That meant re-creating the pool. Here is how I did it…

First of all, I removed the mirrored ZIL pair and the hot spare, just to make sure neither would cause any trouble:

root@storage:~# zpool status
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Jan 15 14:14:53 2016
config:
 
        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz1-0           ONLINE       0     0     0
            WD-WCC4N2xx      ONLINE       0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
          raidz1-1           ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
          raidz1-2           ONLINE       0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
        logs
          mirror-3           ONLINE       0     0     0
            zil1             ONLINE       0     0     0
            zil2             ONLINE       0     0     0
        cache
          cache1             ONLINE       0     0     0
          cache2             ONLINE       0     0     0
        spares
          WD-WCC4N4xx        AVAIL  
 
errors: No known data errors

My first try to remove the ZIL:

root@storage:~# zpool remove storage zil1
cannot remove zil1: no such device in pool
root@storage:~# zpool remove storage zil2
cannot remove zil2: no such device in pool

Ah. I have to use the full path.

root@storage:~# zpool remove storage /dev/disk/by-partlabel/zil1
cannot remove /dev/disk/by-partlabel/zil1: operation not supported on this type of pool
root@storage:~# zpool remove storage /dev/disk/by-partlabel/zil2
cannot remove /dev/disk/by-partlabel/zil2: operation not supported on this type of pool

I had forgotten something specific about the ZIL device: if the ZIL is a mirrored pair, you have to remove the mirror vdev itself rather than the individual attached discs:

root@storage:~# zpool remove storage mirror-3

Now let me get rid of the hot spare:

root@storage:~# zpool remove storage /dev/disk/by-partlabel/WD-WCC4N4xx

Right after a zpool scrub storage, the pool looked like this:

root@storage:~# zpool status
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 1h1m with 0 errors on Wed Jan 27 18:38:57 2016
config:
 
        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz1-0           ONLINE       0     0     0
            WD-WCC4N2xx      ONLINE       0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
          raidz1-1           ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
          raidz1-2           ONLINE       0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
        cache
          cache1             ONLINE       0     0     0
          cache2             ONLINE       0     0     0
 
errors: No known data errors

The current pool consists of three raidz1 vdevs with three 3 TB discs each. You might assume that this gives 18 TB of usable space. However, after alignment, the internal reservation and the TB-vs-TiB difference, only about 14.7 TB of usable space is left. Anyway. The new pool will consist of four raidz1 vdevs with four 3 TB discs each. Instead of two vdevs of eight discs each I went with the four-vdev variant because of the higher IOPS.
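
For a rough sanity check of those numbers (bc is only used as a calculator here; raidz allocation overhead, metadata and the internal reservation then take their share on top of the TB-to-TiB conversion):

root@storage:~# echo "3 * (3-1) * 3" | bc                 # old layout: 3 vdevs, 2 data discs each, 3 TB per disc
18
root@storage:~# echo "4 * (4-1) * 3" | bc                 # new layout: 4 vdevs, 3 data discs each
36
root@storage:~# echo "scale=1; 18 * 10^12 / 2^40" | bc    # 18 "marketing" TB expressed in TiB
16.3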

First of all, I picked one of the new discs to hold a replication of the pool. Just to be safe, I also saved a .tar.gz with all the data on that disc. So: cfdisk /dev/sdk, followed by partprobe, then mkfs.ext4 /dev/sdk1 and finally mount /dev/sdk1 /mnt.
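
For reference, the same preparation in non-interactive form (a sketch only: I used the interactive cfdisk, parted stands in for it here, and the partition name "backup" is arbitrary):

root@storage:~# parted -s /dev/sdk mklabel gpt mkpart backup ext4 1MiB 100%
root@storage:~# partprobe /dev/sdk
root@storage:~# mkfs.ext4 /dev/sdk1
root@storage:~# mount /dev/sdk1 /mnt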

Now let’s create a replication stream and store it compressed:

root@storage:~# zfs snapshot -r storage@backup
root@storage:~# zfs send -R storage@backup | pigz > /mnt/backup.gz
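
If speed matters more than the size of the backup file, the same stream can also go through lz4 or be written out uncompressed; a sketch with made-up file names (assuming the lz4 CLI is installed):

root@storage:~# zfs send -R storage@backup | lz4 > /mnt/backup.lz4
root@storage:~# zfs send -R storage@backup > /mnt/backup.zstream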

This will take a while. If you have enough space and your target disc is fast enough, you can save some time by skipping the compression or using lz4 instead, as sketched above. If you already have spare backups somewhere else, you most likely do not need this step at all. Once the stream was finished, I destroyed the original pool so that I could create the new one:

root@storage:~# zpool destroy storage

As explained in my previous posts, I prepared the new discs (except for sdk, which holds my backup) with a GPT partition table, a single partition, and the serial number of the disc as the partition name. The type is 39 (Solaris root). That makes it easy for me to identify a failing disc later.
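
A sketch of that disc preparation with sgdisk instead of the interactive cfdisk (device and serial lookup are illustrative; bf00 is the code my gdisk lists for Solaris root, double-check yours with sgdisk -L):

DISK=/dev/sdX                                 # placeholder for one of the new discs
SERIAL=$(lsblk -dno SERIAL "$DISK")           # the disc serial becomes the partition name
sgdisk -o -n 1:0:0 -c 1:"$SERIAL" -t 1:bf00 "$DISK"
partprobe "$DISK"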

First I need to create a sparse fake disc, because ZFS does not know a keyword like "missing":

root@storage:~# dd if=/dev/zero of=/tmp/fakedisk bs=1 count=0 seek=3T
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00013862 s, 0.0 kB/s

Then let’s create the pool:

zpool create -o ashift=12 storage \
  raidz1 /dev/disk/by-partlabel/WD-WCC4N2xx     /dev/disk/by-partlabel/SG-W6A12Gxx \
         /dev/disk/by-partlabel/WD-WCC4N6xx     /dev/disk/by-partlabel/SG-Z5020Fxx \
  raidz1 /dev/disk/by-partlabel/WD-WCC4N6xx     /dev/disk/by-partlabel/SG-W6A12Fxx \
         /dev/disk/by-partlabel/WD-WCC4N4xx     /dev/disk/by-partlabel/SG-W6A12Fxx \
  raidz1 /dev/disk/by-partlabel/SG-W6A12Gxx /dev/disk/by-partlabel/WD-WCC4N6xx     \
         /dev/disk/by-partlabel/SG-W6A12Fxx /tmp/fakedisk \
  raidz1 /dev/disk/by-partlabel/WD-WCAWZ2xx     /dev/disk/by-partlabel/SG-Z501ZYxx \
         /dev/disk/by-partlabel/WD-WCAWZ2xx     /dev/disk/by-partlabel/SG-Z5020Gxx \
  log mirror /dev/disk/by-partlabel/zil1 /dev/disk/by-partlabel/zil2 \
  cache /dev/disk/by-partlabel/cache1 /dev/disk/by-partlabel/cache2

Now let’s kill the fake disk:

zpool offline storage /tmp/fakedisk
rm -rf /tmp/fakedisk
zpool scrub storage

Now let’s copy the data back:

root@storage:/mnt# unpigz -c backup.gz | zfs recv -F storage@backup

This will take a while again… Okay, it took too long (or I did something wrong), so I hit Ctrl+C, re-created the datasets by hand and unpacked the .tar.gz I had made earlier 🙂 I am impatient sometimes; a bad habit. Anyway, it is time to prepare the last drive and replace the fake device with the remaining disc.
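
For the record, that fallback restore looked roughly like this (dataset and file names are placeholders, not my actual ones):

root@storage:~# zfs create storage/data                              # re-create the dataset(s) by hand
root@storage:~# tar -xzf /mnt/backup-data.tar.gz -C /storage/data    # unpack the tarball into it

Back to the replacement: at this point the pool looks like this: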

root@storage:~# zpool status
  pool: storage
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu Jan 28 11:52:34 2016
config:
 
        NAME                 STATE     READ WRITE CKSUM
        storage              DEGRADED     0     0     0
          raidz1-0           ONLINE       0     0     0
            WD-WCC4N2xx      ONLINE       0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-Z5020Fxx      ONLINE       0     0     0
          raidz1-1           ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
            WD-WCC4N4xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
          raidz1-2           DEGRADED     0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
            /tmp/fakedisk    OFFLINE      0     0     0
          raidz1-3           ONLINE       0     0     0
            WD-WCAWZ2xx      ONLINE       0     0     0
            SG-Z501ZYxx      ONLINE       0     0     0
            WD-WCAWZ2xx      ONLINE       0     0     0
            SG-Z5020Gxx      ONLINE       0     0     0
        logs
          mirror-4           ONLINE       0     0     0
            zil1             ONLINE       0     0     0
            zil2             ONLINE       0     0     0
        cache
          cache1             ONLINE       0     0     0
          cache2             ONLINE       0     0     0
 
errors: No known data errors
 
root@storage:~# zpool iostat -v
                        capacity     operations    bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
storage               285G  43.2T      1    257  2.62K  24.4M
  raidz1             71.3G  10.8T      0     64    705  6.11M
    WD-WCC4N2xx          -      -      0     30    627  2.13M
    SG-W6A12Gxx          -      -      0     27    584  2.13M
    WD-WCC4N6xx          -      -      0     30    569  2.13M
    SG-Z5020Fxx          -      -      0     27    593  2.13M
  raidz1             71.3G  10.8T      0     64    668  6.11M
    WD-WCC4N6xx          -      -      0     30    602  2.13M
    SG-W6A12Fxx          -      -      0     27    539  2.13M
    WD-WCC4N4xx          -      -      0     30    596  2.13M
    SG-W6A12Fxx          -      -      0     27    590  2.13M
  raidz1             71.3G  10.8T      0     64    671  6.11M
    SG-W6A12Gxx          -      -      0     29    583  2.13M
    WD-WCC4N6xx          -      -      0     28    614  2.13M
    SG-W6A12Fxx          -      -      0     29   1003  2.13M
    /tmp/fakedisk        -      -      0      0      0      0
  raidz1             71.3G  10.8T      0     64    641  6.11M
    WD-WCAWZ2xx          -      -      0     29    546  2.13M
    SG-Z501ZYxx          -      -      0     27    585  2.13M
    WD-WCAWZ2xx          -      -      0     29    554  2.13M
    SG-Z5020Gxx          -      -      0     27    562  2.13M
logs                     -      -      -      -      -      -
  mirror                 0  15.2G      0      0      0      0
    zil1                 -      -      0      0    168    114
    zil2                 -      -      0      0    168    114
cache                    -      -      -      -      -      -
  cache1             68.5G  95.4G      0     64    341  8.05M
  cache2             68.5G  95.4G      0     64    332  8.04M
-------------------  -----  -----  -----  -----  -----  -----

First try at replacing the disc (I changed the partition type to Solaris root, added the partition name and issued partprobe):

root@storage:~# zpool replace storage /tmp/fakedisk /dev/disk/by-partlabel/SG-Z5020Gxx
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-partlabel/SG-Z5020Gxx contains a filesystem of type 'ext4'

pfft…

root@storage:~# zpool replace storage -f /tmp/fakedisk /dev/disk/by-partlabel/SG-Z5020Gxx

Now let’s take a look at the pool:

root@storage:~# zpool status
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 28 19:41:48 2016
        85.0G scanned out of 1.07T at 750M/s, 0h23m to go
        5.06G resilvered, 7.75% done
config:
 
        NAME                 STATE     READ WRITE CKSUM
        storage              DEGRADED     0     0     0
          raidz1-0           ONLINE       0     0     0
            WD-WCC4N2xx      ONLINE       0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-Z5020Fxx      ONLINE       0     0     0
          raidz1-1           ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
            WD-WCC4N4xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
          raidz1-2           DEGRADED     0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
            replacing-3      OFFLINE      0     0     0
              /tmp/fakedisk  OFFLINE      0     0     0
              SG-Z5020Gxx    ONLINE       0     0     0  (resilvering)
          raidz1-3           ONLINE       0     0     0
            WD-WCAWZ2xx      ONLINE       0     0     0
            SG-Z501ZYxx      ONLINE       0     0     0
            WD-WCAWZ2xx      ONLINE       0     0     0
            SG-Z5020Gxx      ONLINE       0     0     0
        logs
          mirror-4           ONLINE       0     0     0
            zil1             ONLINE       0     0     0
            zil2             ONLINE       0     0     0
        cache
          cache1             ONLINE       0     0     0
          cache2             ONLINE       0     0     0
 
errors: No known data errors

Oh my! Resilvering at ~750 MB/s? Nice. However, I cannot quite explain that bandwidth: zpool iostat 1 jumps between values like 26 and 83 MB/s, and iostat -dxh shows more like 95 MB/s if I sum up reads and writes in KB. Since a single disc can only write somewhere around 95-150 MB/s, those 750 MB/s are most likely an aggregate scan rate summed across all vdevs rather than the throughput of any single disc. Twenty minutes later the pool was back. Just to make sure, a scrub:

root@storage:~# zpool status
  pool: storage
 state: ONLINE
  scan: scrub in progress since Thu Jan 28 21:12:03 2016
        115G scanned out of 1.07T at 1.47G/s, 0h11m to go
        0 repaired, 10.49% done
config:
 
        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz1-0           ONLINE       0     0     0
            WD-WCC4N2xx      ONLINE       0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-Z5020Fxx      ONLINE       0     0     0
          raidz1-1           ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
            WD-WCC4N4xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
          raidz1-2           ONLINE       0     0     0
            SG-W6A12Gxx      ONLINE       0     0     0
            WD-WCC4N6xx      ONLINE       0     0     0
            SG-W6A12Fxx      ONLINE       0     0     0
            SG-Z5020Gxx      ONLINE       0     0     0
          raidz1-3           ONLINE       0     0     0
            WD-WCAWZ2xx      ONLINE       0     0     0
            SG-Z501ZYxx      ONLINE       0     0     0
            WD-WCAWZ2xx      ONLINE       0     0     0
            SG-Z5020Gxx      ONLINE       0     0     0
        logs
          mirror-4           ONLINE       0     0     0
            zil1             ONLINE       0     0     0
            zil2             ONLINE       0     0     0
        cache
          cache1             ONLINE       0     0     0
          cache2             ONLINE       0     0     0
 
errors: No known data errors
