LVM by Example

In this post I’ll be discussing the fundamentals of the Logical Volume Manager in Linux, usually referred to simply as LVM. I’ve used LVM occasionally over the years, but for the most part I would just create a single big partition on my disk, toss XFS on it, and call it a day. Recently that changed when I decided to replace my aging home media server with a new beast of a box that I wanted to do a lot more than simply serve up content. I knew I would need lots of storage, but I didn’t necessarily know how I wanted to partition my disks ahead of time. I also wanted to move away from btrfs; I never had a big problem with it, but I felt it would be better to use a more mainstream filesystem.

On top of the media needs, I wanted this box to act as a private file share. My laptop, with only a 500GB SSD, just isn’t big enough to hold the photos and videos I regularly shoot. A hundred thousand photos taken with a 24 megapixel camera take up a ton of space, and the videos I’m recording chew up even more. Not only do I need lots of raw storage, but I also want fast access to the stuff I’m working with at the moment. SSD-speed access matters when you’re working through hundreds of files, and I don’t want to be back in the days of slow spinning platters.

After a bit of soul searching and a ton of research I finally realized LVM would cover all my needs. Instead of partitioning disks ahead of time and having to get everything right the first time, I’ve decided to let LVM handle all the disk management. Not only can I trivially grow a volume, but I can also ensure fast access to my most frequently requested files by leveraging lvmcache.

To demonstrate basic LVM commands and usage, I’ve launched an i3 spot instance and attached an additional 100GB EBS volume. The i3 instances come with an NVMe drive, which we would normally use as our primary storage if this were a database server.

First, we can see our available devices using the lsblk command. xvdf and nvme0n1 are the two devices we’ll be working with:

root@ip-172-30-0-151:~# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda    202:0    0     8G  0 disk
└─xvda1 202:1    0     8G  0 part /
xvdf    202:80   0   100G  0 disk
nvme0n1 259:0    0 442.4G  0 disk

Gentle Overview

It’s important to understand three concepts when working with LVM.

  • Physical Volumes
  • Volume Groups
  • Logical Volumes

Physical volumes are usually disks. Volume groups are a bunch of physical volumes pooled together. Logical volumes are slices we carve out of a volume group. The mental model is simple: disks go into the pool, slices come out of the pool, and filesystems go on top of the slices.

Creating Volumes and Filesystems

The first thing we need to do is create a Physical Volume, which initializes a disk for use with LVM. This only takes a second; we’re simply telling LVM that we’ll be using the device later:

root@ip-172-30-0-151:~# pvcreate /dev/nvme0n1
  Physical volume "/dev/nvme0n1" successfully created

Once we’ve registered the disk for LVM usage, we can create our first Volume Group. A volume group can be associated with multiple physical volumes. You can think of a volume group as a pool of storage from which we’ll later be able to allocate space in the form of logical volumes. Creating a volume group is done with the vgcreate command. In the following example, I’ll create a volume group called “demo” and add my first physical volume:

root@ip-172-30-0-151:~# vgcreate demo /dev/nvme0n1
  Volume group "demo" successfully created

The vgs command can be used to list all the volume groups. The -v flag gives us more verbose output. You’ll see we now have a single volume group called demo that’s the size of the entire NVMe drive:

root@ip-172-30-0-151:~# vgs -v
    Using volume group(s) on command line.
  VG   Attr   Ext   #PV #LV #SN VSize   VFree   VG UUID                                VProfile
  demo wz--n- 4.00m   1   0   0 442.38g 442.38g dPu5pq-mxMM-dZbu-8vc1-PYsc-Snhf-f5qNWk

Next we’ll create a logical volume using lvcreate. We can pass a size using the -L flag.

root@ip-172-30-0-151:~# lvcreate -L100G demo
  Logical volume "lvol0" created.

You can see above that LVM created the new volume and named it for us. We can give it a name of our own by supplying the -n flag, which is what we’ll want most of the time:

root@ip-172-30-0-151:~# lvcreate -L100G -n mysecondlv demo
  Logical volume "mysecondlv" created.

When we view the logical volumes with lvs, we see the two volumes just created:

root@ip-172-30-0-151:~# lvs
  LV         VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0      demo -wi-a----- 100.00g
  mysecondlv demo -wi-a----- 100.00g
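
As an aside, lvcreate can also size a volume in extents or as a percentage with the lowercase -l flag, which is handy when you just want “whatever is left” rather than a specific number. The names here are placeholders and I didn’t run these on this box:

# take half of the volume group
lvcreate -l 50%VG -n halfofvg demo
# take whatever free space remains
lvcreate -l 100%FREE -n therest demo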

Now that we have a logical volume, we can put a filesystem on it. Let’s use XFS:

root@ip-172-30-0-151:~# mkfs.xfs /dev/demo/mysecondlv 
meta-data=/dev/demo/mysecondlv   isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

I’ll keep things simple and mount the volume at /root/myvolume. Note the total space reported in the df output (trimmed for readability):

root@ip-172-30-0-151:~# mkdir myvolume
root@ip-172-30-0-151:~# mount /dev/demo/mysecondlv myvolume
root@ip-172-30-0-151:~# df -h
Filesystem                   Size  Used Avail Use% Mounted on
udev                          15G     0   15G   0% /dev
tmpfs                        3.0G  8.6M  3.0G   1% /run
/dev/xvda1                   7.7G  847M  6.9G  11% /
...
/dev/mapper/demo-mysecondlv  100G   33M  100G   1% /root/myvolume
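
If you want the mount to survive a reboot, add it to /etc/fstab. A minimal sketch, assuming the same mount point as above:

# /etc/fstab
/dev/demo/mysecondlv  /root/myvolume  xfs  defaults  0  0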

We can remove the first volume (the one we let LVM name) easily using lvremove:

root@ip-172-30-0-151:~# lvremove /dev/demo/lvol0
Do you really want to remove and DISCARD active logical volume lvol0? [y/n]: y
  Logical volume "lvol0" successfully removed

Expanding a Volume

We have a ton of free space left in our demo volume group. Let’s give our filesystem a little more room to work with. The lvextend command lets us grow a volume. We can specify a relative size with the -L flag by prefixing the size with a +. For instance, we can grow the LV by 50GB by doing the following:

root@ip-172-30-0-151:~# lvextend -L +50G demo/mysecondlv
  Size of logical volume demo/mysecondlv changed from 100.00 GiB (25600 extents) to 150.00 GiB (38400 extents).
  Logical volume mysecondlv successfully resized.

We’ve increased the volume size, but the filesystem doesn’t automatically know to take advantage of the new space. We can use xfs_growfs to grow it into the rest of the available space. It’s an online operation, so there’s no need to unmount:

root@ip-172-30-0-151:~# xfs_growfs myvolume
meta-data=/dev/mapper/demo-mysecondlv isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 26214400 to 39321600

Checking df again we see our filesystem has increased capacity:

root@ip-172-30-0-151:~# df -h
Filesystem                   Size  Used Avail Use% Mounted on
udev                          15G     0   15G   0% /dev
tmpfs                        3.0G  8.6M  3.0G   1% /run
/dev/xvda1                   7.7G  847M  6.9G  11% /
...
/dev/mapper/demo-mysecondlv  150G   33M  150G   1% /root/myvolume
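
As a shortcut, lvextend can grow the filesystem in the same step if you pass the -r flag, which runs the appropriate resize tool for you and makes the separate xfs_growfs step unnecessary. Something like:

# grow the LV by another 50G and resize the filesystem in one shot
lvextend -r -L +50G demo/mysecondlv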

Now that we’ve gone through the exercise of creating physical volumes, volume groups, and logical volumes, let’s remove the volume group we just created using vgremove. Note that we have to unmount the volume first; if we don’t, LVM will complain:

root@ip-172-30-0-151:~# umount myvolume
root@ip-172-30-0-151:~# vgremove demo
Do you really want to remove volume group "demo" containing 1 logical volumes? [y/n]: y
Do you really want to remove and DISCARD active logical volume mysecondlv? [y/n]:
Do you really want to remove and DISCARD active logical volume mysecondlv? [y/n]: y
  Logical volume "mysecondlv" successfully removed
  Volume group "demo" successfully removed

At this point, we’ve created physical volumes, created (and removed) volume groups and logical volumes, and put a filesystem on an LV. We’ve also expanded the filesystem on the fly, which can be pretty handy.

Using SSD Cache with Spinning Rust

Let’s take a look at something a little more complex. Next we’ll create a logical volume on a spinning disk, using the SSD to cache the most frequently used blocks. Then we’ll tie it all together and create a filesystem.

First we’ll make sure we have two physical volumes. We’ve only set up the NVMe drive so far, so I’ll go ahead and prepare the slower EBS volume, which will serve as our origin (the slow backing device):

root@ip-172-30-0-151:~# pvcreate /dev/xvdf
  Physical volume "/dev/xvdf" successfully created

Note: Yes, I am using a smaller drive for my origin volume than for my cache volume. That’s only to save cash in case I forget to destroy the instance later. Normally your origin volume is much larger than your cache.

Then we’ll need to create a new volume group with the two physical volumes added to it. Using LVM’s block caching requires all of the volumes to be in the same volume group.

root@ip-172-30-0-151:~# vgcreate demo /dev/nvme0n1 /dev/xvdf
  Volume group "demo" successfully created

The vgdisplay command can tell us a lot about the volume group we’ve just created. Note the two physical volumes at the end:

root@ip-172-30-0-151:~# vgdisplay -v
    Using volume group(s) on command line.
  --- Volume group ---
  VG Name               demo
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               542.37 GiB
  PE Size               4.00 MiB
  Total PE              138847
  Alloc PE / Size       0 / 0
  Free  PE / Size       138847 / 542.37 GiB
  VG UUID               pxg3mf-Tdko-om17-Hs66-R4uZ-0MaG-2xoo19

  --- Physical volumes ---
  PV Name               /dev/nvme0n1
  PV UUID               HsreBN-6Low-fygm-mCWC-cAXe-NJrl-21PYwU
  PV Status             allocatable
  Total PE / Free PE    113248 / 113248

  PV Name               /dev/xvdf
  PV UUID               Mrltt7-BBi2-1ded-dRAQ-98GA-dsXc-fsaHQs
  PV Status             allocatable
  Total PE / Free PE    25599 / 25599

Now that we have our two disks in the volume group, we can set up the origin (the slow disk) and the cache. First, create the volume for the origin. Note that I’m explicitly specifying the slower drive, /dev/xvdf, so the volume is allocated there:

root@ip-172-30-0-151:~# lvcreate -n slow -L80G demo /dev/xvdf
  Logical volume "slow" created.

For the cache, we’ll need two volumes: one for the cache data itself, and one for the cache metadata. According to the man page:

The size of this LV should be 1000 times smaller than the cache data LV, with a minimum size of 8MiB.

I was feeling lazy, so I used convenient round numbers. Going by that rule, a 20GB cache only needs about 20MB of metadata, so yes, a 1GB metadata volume is wasting space:

root@ip-172-30-0-151:~# lvcreate -n cache -L20G demo /dev/nvme0n1
  Logical volume "cache" created.

root@ip-172-30-0-151:~# lvcreate -n meta -L 1G demo /dev/nvme0n1
  Logical volume "meta" created.

All three volumes can be seen by lvs:

root@ip-172-30-0-151:~# lvs -a demo
  LV    VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cache demo -wi-a----- 20.00g
  meta  demo -wi-a-----  1.00g
  slow  demo -wi-a----- 80.00g

We need to tell LVM to create a cache pool, and lvconvert is the command for that. We tell it which volume to use for the cache data and which to use for the metadata:

root@ip-172-30-0-151:~# lvconvert --type cache-pool --poolmetadata demo/meta demo/cache
  WARNING: Converting logical volume demo/cache and demo/meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert demo/cache and demo/meta? [y/n]: y
  Converted demo/cache to cache pool.

Now that we’ve converted our cache and meta volumes into a cache pool, they’ll no longer show up when we use lvs alone. We’ll need to pass the -a flag:

root@ip-172-30-0-151:~# lvs -a
  LV              VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cache           demo Cwi---C--- 20.00g
  [cache_cdata]   demo Cwi------- 20.00g
  [cache_cmeta]   demo ewi-------  1.00g
  [lvol0_pmspare] demo ewi-------  1.00g
  slow            demo -wi-a----- 80.00g

Next we associate the cache pool with the slow volume:

root@ip-172-30-0-151:~# lvconvert --type cache --cachepool demo/cache demo/slow
  Logical volume demo/slow is now cached.
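
A couple of notes before trusting this with real data. In the versions I’ve used, lvmcache defaults to writethrough mode, which is the safe choice; writeback is faster for writes, but losing the cache device can mean losing data. The cache can also be detached later without touching the origin. Both are lvconvert operations; these are sketches from memory, so check the man page for your version:

# switch the cached LV to writeback mode (faster writes, less safe)
lvconvert --cachemode writeback demo/slow
# remove the cache entirely, leaving just the slow origin volume
lvconvert --uncache demo/slow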

Now we can create our filesystem:

root@ip-172-30-0-151:~# mkfs.xfs /dev/demo/slow
meta-data=/dev/demo/slow         isize=512    agcount=16, agsize=1310704 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=20971264, imaxpct=25
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=10240, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

We’ll mount the volume somewhere convenient:

root@ip-172-30-0-151:~# mkdir whatever
root@ip-172-30-0-151:~# mount /dev/demo/slow whatever/
root@ip-172-30-0-151:~/whatever# df -h
Filesystem             Size  Used Avail Use% Mounted on
udev                   7.5G     0  7.5G   0% /dev
tmpfs                  1.5G  8.5M  1.5G   1% /run
/dev/xvda1             7.7G  848M  6.9G  11% /
tmpfs                  7.5G     0  7.5G   0% /dev/shm
tmpfs                  5.0M     0  5.0M   0% /run/lock
tmpfs                  7.5G     0  7.5G   0% /sys/fs/cgroup
tmpfs                  1.5G     0  1.5G   0% /run/user/1000
/dev/mapper/demo-slow   80G   33M   80G   1% /root/whatever
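
If you’re curious whether the cache is actually earning its keep, the device-mapper layer keeps hit and miss counters. Running dmsetup status against the mapper name (demo-slow here) prints them; the field layout is described in the kernel’s dm-cache documentation:

# show device-mapper status, including cache hit/miss statistics
dmsetup status demo-slow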

At this point we have the ability to leverage the cost effectiveness of big, slow spinning drives while getting SSD performance for the data we access most frequently. There’s quite a bit more of LVM to explore, much more than I can cover in a single coherent post. In a future post, I’ll show how to create more complex disk arrangements, take snapshots, and benchmark your configuration. If this post has been helpful, please reach out on Twitter; I’m @rustyrazorblade!

If you found this post helpful, please consider sharing it with your network. I'm also available to help you be successful with your distributed systems! Please reach out if you're interested in working with me, and I'll be happy to schedule a free one-hour consultation.