LVM by Example
In this post I’ll be discussing the fundamentals of the Logical Volume Manager in Linux, usually simply referred to as LVM. I’ve used LVM occasionally over the years, but for the most part I would just create a single big partition on my disk, toss XFS on it and call it a day. That changed recently when I decided to replace my aging home media server with a new beast of a box that I wanted to do a lot more than simply serve up content. I knew I would need lots of storage, but didn’t necessarily know ahead of time how I wanted to partition my disks. I also wanted to move away from btrfs. I never had a big problem with it, but I felt it would be better to use a more mainstream filesystem.
On top of my media needs, I wanted this box to act as a private file share. My laptop, with only a 500GB SSD, just isn’t big enough to hold the photos and videos I regularly shoot. A hundred thousand photos and videos taken with a 24 megapixel camera take up a ton of space, and the videos I’m recording chew up even more. Not only do I need lots of raw storage, but I want fast access to the stuff I’m working with at the moment. SSD-speed access is important when working with hundreds of files, and I don’t want to go back to the days of slow spinning platters.
After a bit of soul searching and a ton of research I finally realized LVM would help me with all my needs. Instead of partitioning disks ahead of time and having to get everything right the first time, I’ve decided to let LVM handle the management of the disks. Not only can I trivially grow a volume, but I can also get fast access to my most frequently requested files by leveraging lvmcache.
To demonstrate basic LVM commands and usage I’ve launched an i3 spot instance and attached an additional 100GB EBS volume. The i3 instances have an NVMe drive, which we would normally use as our primary storage if this were a database server.
First, we can see our available devices by using the lsblk command. The xvdf and nvme0n1 devices are the two we can work with:
root@ip-172-30-0-151:~# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda    202:0    0     8G  0 disk
└─xvda1 202:1    0     8G  0 part /
xvdf    202:80   0   100G  0 disk
nvme0n1 259:0    0 442.4G  0 disk
It’s important to understand three concepts when working with LVM.
- Physical Volumes
- Volume Groups
- Logical Volumes
Physical volumes are usually disks. Volume groups are a bunch of disks put together. Logical volumes are slices we can take from the volume groups. Here’s a crappy diagram that might help visualize what’s going on:
Creating Volumes and Filesystems
The first thing we need to do is create a Physical Volume, which initializes a disk for use with LVM. It only takes a second, and simply tells LVM that we’ll be using the device later:
root@ip-172-30-0-151:~# pvcreate /dev/nvme0n1
  Physical volume "/dev/nvme0n1" successfully created
Once we’ve registered the disk for LVM usage, we can create our first Volume Group. A volume group can be associated with multiple physical volumes. You can think of a volume group as a pool of storage from which we’ll later allocate space in the form of logical volumes. Creating a volume group can be done using the vgcreate command. In the following example, I’ll create a volume group called “demo”, and add my first physical volume:
root@ip-172-30-0-151:~# vgcreate demo /dev/nvme0n1
  Volume group "demo" successfully created
The vgs command can be used to list all the volume groups. The -v flag gives us more verbose output. You’ll see we now have a single volume group called demo that’s the size of the entire NVMe drive:
root@ip-172-30-0-151:~# vgs -v
    Using volume group(s) on command line.
  VG   Attr   Ext   #PV #LV #SN VSize   VFree   VG UUID                                VProfile
  demo wz--n- 4.00m   1   0   0 442.38g 442.38g dPu5pq-mxMM-dZbu-8vc1-PYsc-Snhf-f5qNWk
Next we’ll create a logical volume using lvcreate. We can pass a size using the -L flag:
root@ip-172-30-0-151:~# lvcreate -L100G demo
  Logical volume "lvol0" created.
You can see above that LVM has created a new volume and named it for us. We can have it use our own name by supplying the -n flag. We’ll probably want this most of the time:
root@ip-172-30-0-151:~# lvcreate -L100G -n mysecondlv demo
  Logical volume "mysecondlv" created.
When we view the logical volumes with lvs, we see the two volumes just created:
root@ip-172-30-0-151:~# lvs
  LV         VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0      demo -wi-a----- 100.00g
  mysecondlv demo -wi-a----- 100.00g
Now that we have a logical volume, we can put a filesystem on it. Let’s use XFS:
root@ip-172-30-0-151:~# mkfs.xfs /dev/demo/mysecondlv
meta-data=/dev/demo/mysecondlv   isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
I’ll keep things simple and mount the volume at /root/myvolume. Note the total space reported by df (output trimmed for readability):
root@ip-172-30-0-151:~# mkdir myvolume
root@ip-172-30-0-151:~# mount /dev/demo/mysecondlv myvolume
root@ip-172-30-0-151:~# df -h
Filesystem                   Size  Used Avail Use% Mounted on
udev                          15G     0   15G   0% /dev
tmpfs                        3.0G  8.6M  3.0G   1% /run
/dev/xvda1                   7.7G  847M  6.9G  11% /
...
/dev/mapper/demo-mysecondlv  100G   33M  100G   1% /root/myvolume
We can remove the first volume (the one we let LVM name) easily using lvremove:
root@ip-172-30-0-151:~# lvremove /dev/demo/lvol0
Do you really want to remove and DISCARD active logical volume lvol0? [y/n]: y
  Logical volume "lvol0" successfully removed
Expanding a Volume
We have a ton of free space in our demo volume group. Let’s give our filesystem a little more space to work with. The lvextend command lets us grow a volume. We can specify a relative size with the -L flag by prefixing the size with a +. For instance, we can grow the LV by 50GB by doing the following:
root@ip-172-30-0-151:~# lvextend -L +50G demo/mysecondlv
  Size of logical volume demo/mysecondlv changed from 100.00 GiB (25600 extents) to 150.00 GiB (38400 extents).
  Logical volume mysecondlv successfully resized.
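As a quick sanity check on the extent counts in the lvextend output, here's a small shell sketch that converts a size in GiB into a number of extents. It assumes the 4 MiB extent size that vgs -v reported earlier. The gib_to_extents helper is a hypothetical name for illustration, not part of LVM:

```shell
#!/bin/sh
# Hypothetical helper (not an LVM command): convert a size in GiB to a
# count of LVM extents. The extent size defaults to 4 MiB, which is an
# assumption taken from the "Ext 4.00m" column in the vgs -v output.
gib_to_extents() {
    gib=$1
    extent_mib=${2:-4}
    echo $(( gib * 1024 / extent_mib ))
}

gib_to_extents 100   # prints 25600, the size before lvextend
gib_to_extents 150   # prints 38400, the size after lvextend
```

Both numbers match what lvextend printed, which is a handy reminder that LVM always allocates space in whole extents.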
We’ve increased the volume size, but the filesystem doesn’t yet know about the new space. We can use xfs_growfs to take over the rest of the available space. It’s an online operation, so there’s no need to unmount:
root@ip-172-30-0-151:~# xfs_growfs myvolume
meta-data=/dev/mapper/demo-mysecondlv isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 26214400 to 39321600
Running df again, we see our filesystem has increased capacity:
root@ip-172-30-0-151:~# df -h
Filesystem                   Size  Used Avail Use% Mounted on
udev                          15G     0   15G   0% /dev
tmpfs                        3.0G  8.6M  3.0G   1% /run
/dev/xvda1                   7.7G  847M  6.9G  11% /
...
/dev/mapper/demo-mysecondlv  150G   33M  150G   1% /root/myvolume
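The block counts printed by xfs_growfs line up with the sizes df reports. Here's a rough shell sketch of that arithmetic, assuming the 4096-byte block size (bsize) shown in the XFS output. The xfs_blocks_to_gib helper is a made-up name for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert an XFS data block count to whole GiB.
# The 4096-byte default is an assumption taken from "bsize=4096" in the
# mkfs.xfs / xfs_growfs output.
xfs_blocks_to_gib() {
    blocks=$1
    bsize=${2:-4096}
    echo $(( blocks * bsize / 1024 / 1024 / 1024 ))
}

xfs_blocks_to_gib 26214400   # prints 100, before xfs_growfs
xfs_blocks_to_gib 39321600   # prints 150, after xfs_growfs
```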
Now we’ve gone through the exercise of creating physical volumes, volume groups, and logical volumes. Let’s remove the volume group we just created using vgremove. Note we have to unmount the volume first. If we don’t, LVM will complain:
root@ip-172-30-0-151:~# umount myvolume
root@ip-172-30-0-151:~# vgremove demo
Do you really want to remove volume group "demo" containing 1 logical volumes? [y/n]: y
Do you really want to remove and DISCARD active logical volume mysecondlv? [y/n]:
Do you really want to remove and DISCARD active logical volume mysecondlv? [y/n]: y
  Logical volume "mysecondlv" successfully removed
  Volume group "demo" successfully removed
At this point, we’ve created (and removed) physical volumes, volume groups, and logical volumes, and put a filesystem on an LV. We’ve also expanded the filesystem on the fly, which can be pretty handy.
Using SSD Cache with Spinning Rust
Let’s take a look at something a little more complex. Next we’ll create a logical volume on a spinning disk, using the SSD to cache the most frequently used blocks. Then we’ll tie it all together and create a filesystem.
First we’ll ensure we have two physical volumes. We’ve only used the NVMe drive so far, so I’ll go ahead and prepare the slower EBS volume (referred to as the origin) for use:
root@ip-172-30-0-151:~# pvcreate /dev/xvdf
  Physical volume "/dev/xvdf" successfully created
Note: Yes, I am using a smaller drive for my origin volume than the cache volume. This is only to save on cash in case I forget to destroy it later. Normally your origin volume is much larger than your cache.
Then we’ll need to create a new volume group with the two physical volumes added to it. Using LVM’s block caching requires all of the volumes to be in the same volume group.
root@ip-172-30-0-151:~# vgcreate demo /dev/nvme0n1 /dev/xvdf
  Volume group "demo" successfully created
The vgdisplay command can tell us a lot about the volume group we’ve just created. Note the two physical volumes at the end.
root@ip-172-30-0-151:~# vgdisplay -v
    Using volume group(s) on command line.
  --- Volume group ---
  VG Name               demo
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               542.37 GiB
  PE Size               4.00 MiB
  Total PE              138847
  Alloc PE / Size       0 / 0
  Free PE / Size        138847 / 542.37 GiB
  VG UUID               pxg3mf-Tdko-om17-Hs66-R4uZ-0MaG-2xoo19
  --- Physical volumes ---
  PV Name               /dev/nvme0n1
  PV UUID               HsreBN-6Low-fygm-mCWC-cAXe-NJrl-21PYwU
  PV Status             allocatable
  Total PE / Free PE    113248 / 113248
  PV Name               /dev/xvdf
  PV UUID               Mrltt7-BBi2-1ded-dRAQ-98GA-dsXc-fsaHQs
  PV Status             allocatable
  Total PE / Free PE    25599 / 25599
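The numbers in the vgdisplay output can be cross-checked by hand: the volume group's Total PE should be the sum of the extents on each physical volume, and the VG Size should be that total multiplied by the PE Size. Here's a small sketch of that check, using the values from the output and a hypothetical pe_to_gib helper (it relies on awk for the floating point division):

```shell
#!/bin/sh
# Hypothetical helper: convert a physical extent count to GiB with two
# decimal places. The 4 MiB default is an assumption taken from the
# "PE Size 4.00 MiB" line in the vgdisplay output.
pe_to_gib() {
    awk -v pe="$1" -v mib="${2:-4}" 'BEGIN { printf "%.2f\n", pe * mib / 1024 }'
}

# Extents on /dev/nvme0n1 plus extents on /dev/xvdf.
total_pe=$(( 113248 + 25599 ))
echo "$total_pe"        # prints 138847, matching Total PE
pe_to_gib "$total_pe"   # prints 542.37, matching VG Size
```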
Now that we have our two disks in the volume group, we can set up the cache and origin (slow disk). First, create the volume for the origin. Note that I’m explicitly specifying the slower drive, /dev/xvdf, for my origin:
root@ip-172-30-0-151:~# lvcreate -n slow -L80G demo /dev/xvdf
  Logical volume "slow" created.
For the cache, we’ll need two volumes: one for the cache itself, and one for the cache metadata. According to the man page:
The size of this LV should be 1000 times smaller than the cache data LV, with a minimum size of 8MiB.
I was feeling lazy so I used convenient numbers. Yes, I’m wasting space:
root@ip-172-30-0-151:~# lvcreate -n cache -L20G demo /dev/nvme0n1
  Logical volume "cache" created.
root@ip-172-30-0-151:~# lvcreate -n meta -L 1G demo /dev/nvme0n1
  Logical volume "meta" created.
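If you'd rather not waste the space, the rule of thumb from the man page quote is easy to apply. Here's a rough sketch that suggests a metadata size in MiB for a given cache size. The cache_meta_mib helper is hypothetical, and rounding up to the next whole MiB is my own assumption rather than something the man page specifies:

```shell
#!/bin/sh
# Hypothetical helper: suggest a cache metadata LV size in MiB for a
# cache data LV of the given size in MiB. Follows the quoted guidance:
# roughly 1/1000 of the cache data LV, with a minimum of 8 MiB.
# Rounding up is an assumption made for this sketch.
cache_meta_mib() {
    cache_mib=$1
    meta=$(( (cache_mib + 999) / 1000 ))
    if [ "$meta" -lt 8 ]; then
        meta=8
    fi
    echo "$meta"
}

cache_meta_mib $(( 20 * 1024 ))   # 20 GiB cache, prints 21
cache_meta_mib 1024               # 1 GiB cache, prints 8 (the minimum)
```

By that estimate, the 20G cache above would need a metadata volume of around 21 MiB, far less than the 1G I gave it.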
All three volumes can be seen by running lvs:
root@ip-172-30-0-151:~# lvs -a demo
  LV    VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cache demo -wi-a----- 20.00g
  meta  demo -wi-a-----  1.00g
  slow  demo -wi-a----- 80.00g
We need to tell LVM to create a cache pool. lvconvert is the command we’ll use for that. We tell LVM which volume to use as our cache and which to use as our meta:
root@ip-172-30-0-151:~# lvconvert --type cache-pool --poolmetadata demo/meta demo/cache
  WARNING: Converting logical volume demo/cache and demo/meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert demo/cache and demo/meta? [y/n]: y
  Converted demo/cache to cache pool.
Now that we’ve converted our cache and meta volumes into a cache pool, they’ll no longer show up when we use lvs alone. We’ll need to pass the -a flag:
root@ip-172-30-0-151:~# lvs -a
  LV              VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cache           demo Cwi---C--- 20.00g
  [cache_cdata]   demo Cwi------- 20.00g
  [cache_cmeta]   demo ewi-------  1.00g
  [lvol0_pmspare] demo ewi-------  1.00g
  slow            demo -wi-a----- 80.00g
Next we associate the cache pool with the slow volume:
root@ip-172-30-0-151:~# lvconvert --type cache --cachepool demo/cache demo/slow
  Logical volume demo/slow is now cached.
Now we can create our filesystem:
root@ip-172-30-0-151:~# mkfs.xfs /dev/demo/slow
meta-data=/dev/demo/slow         isize=512    agcount=16, agsize=1310704 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=20971264, imaxpct=25
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=10240, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
We’ll mount the drive somewhere convenient:
root@ip-172-30-0-151:~# mkdir whatever
root@ip-172-30-0-151:~# mount /dev/demo/slow whatever/
root@ip-172-30-0-151:~/whatever# df -h
Filesystem             Size  Used Avail Use% Mounted on
udev                   7.5G     0  7.5G   0% /dev
tmpfs                  1.5G  8.5M  1.5G   1% /run
/dev/xvda1             7.7G  848M  6.9G  11% /
tmpfs                  7.5G     0  7.5G   0% /dev/shm
tmpfs                  5.0M     0  5.0M   0% /run/lock
tmpfs                  7.5G     0  7.5G   0% /sys/fs/cgroup
tmpfs                  1.5G     0  1.5G   0% /run/user/1000
/dev/mapper/demo-slow   80G   33M   80G   1% /root/whatever
At this point we can leverage the cost effectiveness of large, slow spinning drives while getting the performance of SSDs for the data we access most frequently. There’s quite a bit more to LVM to explore, much more than I can cover in a single coherent post. In a future post, I’ll show how to create more complex disk arrangements, take snapshots, and benchmark your configuration. If this post has been helpful, please reach out on Twitter, I’m @rustyrazorblade!