Filesystems

Ben Clifford

benc@hawaga.org.uk

www.hawaga.org.uk/ben/tech/raspberry-pint-filesystems

Files

Storage devices

microsd

Read 512-byte block from storage device
Modify bytes
Write 512-byte block to storage device
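
The three steps above can be sketched in Python. This is only an illustration under some assumptions: an ordinary temporary file stands in for the storage device (the helper `make_fake_device` is made up for this example), and the change is assumed to fit inside a single 512-byte block.

```python
import os
import tempfile

BLOCK_SIZE = 512  # block size used on the slide


def make_fake_device(num_blocks=4):
    """Hypothetical stand-in for a storage device: a file full of zero bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(BLOCK_SIZE * num_blocks))
    return path


def modify_bytes(device_path, offset, new_bytes):
    """Change a few bytes by rewriting the whole block that contains them."""
    block_no = offset // BLOCK_SIZE
    start = offset % BLOCK_SIZE
    if start + len(new_bytes) > BLOCK_SIZE:
        raise ValueError("this sketch only handles changes inside one block")
    with open(device_path, "r+b") as dev:
        dev.seek(block_no * BLOCK_SIZE)
        block = bytearray(dev.read(BLOCK_SIZE))          # read 512-byte block
        block[start:start + len(new_bytes)] = new_bytes  # modify bytes
        dev.seek(block_no * BLOCK_SIZE)
        dev.write(block)                                 # write 512-byte block
    return block_no


path = make_fake_device()
print("changed block", modify_bytes(path, 700, b"hello"))
```

Even though only five bytes change, the whole block is read and written back. That is the unit the device works in.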

filesystems

$ df -hT / /boot
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/root      ext4   29G  5.7G   23G  21% /
/dev/mmcblk0p1 vfat  253M   40M  214M  16% /boot

There are lots of different types of filesystems, with different properties. On a regular Raspberry Pi OS install, the SD card is split into two sections, using two different kinds of file systems. The "root" filesystem is most of the disk, and is an ext4 filesystem. The "boot" filesystem is much much smaller, and uses vfat.

mountpoints

$ df -hT / /boot
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/root      ext4   29G  5.7G   23G  21% /
/dev/mmcblk0p1 vfat  253M   40M  214M  16% /boot

$ ls /etc/passwd
$ ls /boot/config.txt

One thing in the output of that df command is the mount point. This is to do with identifying which filesystem a particular path and filename refers to. A unix filesystem path looks like:


/home/benc/src/example/hello.py

This is a series of directory names with a filename at the end. How does the Pi know which filesystem this file lives on? Well, "normally" we start at the root filesystem, walk down the directory names, and end up in a directory on the root filesystem, where the file is stored. BUT! If we hit a "mount point" (such as /boot) on the way, then instead of going into a directory on the root filesystem, we move into the "mounted" filesystem. So with the df display I showed before, anything under /boot is on the small boot filesystem - for example, the file /boot/config.txt, which is often used to change the Pi's bootup configuration.
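
As a rough sketch of that lookup: take the table of mount points from the df output and pick the longest mount point that is a prefix of the path. The function name `find_mount` and the `MOUNTS` table are made up for this illustration. The real kernel walks the path one component at a time and also deals with things like symlinks, which this ignores.

```python
import posixpath

# Mount point -> device, copied from the df output on the slide.
MOUNTS = {
    "/": "/dev/root",
    "/boot": "/dev/mmcblk0p1",
}


def find_mount(path, mounts=MOUNTS):
    """Return (mount point, device) for the filesystem that holds `path`.

    Assumes `mounts` always contains the root filesystem "/".
    """
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    path = posixpath.normpath(path)
    best = "/"
    for mount_point in mounts:
        # A path is inside a mount point if it is the mount point itself
        # or continues with a "/" straight after it.
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        if inside and len(mount_point) > len(best):
            best = mount_point
    return best, mounts[best]


print(find_mount("/home/benc/src/example/hello.py"))  # on the root filesystem
print(find_mount("/boot/config.txt"))                 # on the boot filesystem
```

Note that a path like /bootstrap/x stays on the root filesystem: the match has to be on whole directory names, not just on the first few characters.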

(many) filesystems

pi@tyne:~ $ df -hTa
Filesystem     Type         Size  Used Avail Use% Mounted on
/dev/root      ext4          29G  5.7G   23G  21% /
devtmpfs       devtmpfs     459M     0  459M   0% /dev
sysfs          sysfs           0     0     0    - /sys
proc           proc            0     0     0    - /proc
tmpfs          tmpfs        464M     0  464M   0% /dev/shm
devpts         devpts          0     0     0    - /dev/pts
tmpfs          tmpfs        464M   47M  417M  11% /run
tmpfs          tmpfs        5.0M  4.0K  5.0M   1% /run/lock
tmpfs          tmpfs        464M     0  464M   0% /sys/fs/cgroup
cgroup2        cgroup2         0     0     0    - /sys/fs/cgroup/unified
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/systemd
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/pids
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/net_cls
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/cpu,cpuacct
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/memory
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/blkio
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/freezer
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/cpuset
cgroup         cgroup          0     0     0    - /sys/fs/cgroup/devices
sunrpc         rpc_pipefs      0     0     0    - /run/rpc_pipefs
systemd-1      -               -     -     -    - /proc/sys/fs/binfmt_misc
mqueue         mqueue          0     0     0    - /dev/mqueue
debugfs        debugfs         0     0     0    - /sys/kernel/debug
configfs       configfs        0     0     0    - /sys/kernel/config
/dev/mmcblk0p1 vfat         253M   40M  214M  16% /boot
/dev/sdb1      ext4         1.8T  1.7T   96G  95% /mnt
tmpfs          tmpfs         93M     0   93M   0% /run/user/1000
binfmt_misc    binfmt_misc     0     0     0    - /proc/sys/fs/binfmt_misc

If I use the -a parameter to df I can see all the mountpoints on my Pi. There are lots of them, and mostly we can ignore them.
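
One way to pick out the mounts that matter is to read the kernel's own mount table, /proc/mounts, where each line starts with the device, the mount point and the filesystem type. The sketch below parses text in that format and keeps only the entries whose device is a path under /dev/. The sample text is illustrative (the mount options are invented), and "device starts with /dev/" is only a rough rule of thumb for "backed by a block device".

```python
# Illustrative text in /proc/mounts format: device, mount point, type, options, 0 0
SAMPLE_MOUNTS = """\
/dev/root / ext4 rw,noatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
/dev/mmcblk0p1 /boot vfat rw,relatime 0 0
"""


def parse_mounts(text):
    """Return a list of (device, mount point, filesystem type) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mounts.append(tuple(fields[:3]))
    return mounts


def disk_backed(mounts):
    """Keep only mounts whose device looks like a block device under /dev/."""
    return [m for m in mounts if m[0].startswith("/dev/")]


for device, mount_point, fs_type in disk_backed(parse_mounts(SAMPLE_MOUNTS)):
    print(mount_point, fs_type, device)
```

On a real system the same functions could be fed the output of `open("/proc/mounts").read()`.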

default filesystems

vfat

ext4

So let's go back to the two types of filesystem that are in use on a Raspberry Pi OS SD card install:

$ df -hT / /boot
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/root      ext4   29G  5.7G   23G  21% /
/dev/mmcblk0p1 vfat  253M   40M  214M  16% /boot

One is ext4, one is vfat. What is the difference, and why? Remember I said that storage is a bunch of fixed-size blocks that somehow have to be organised so that they look like files? The filesystem is the mechanism by which that happens. vfat uses and combines blocks in one particular way; ext4 does it in a different way.

vfat is a variant of the FAT filesystem - a format that goes back to around 1980, is pretty simple to implement, and is supported on lots of different operating systems. So you can access the /boot filesystem easily from a Mac or a PC, which is good if you want to edit the basic boot configuration of your Pi from there. FAT is the standard format that you'll find on almost any removable media like a USB stick or SD card, whether it's going in a big computer, or a Pi, or a phone, or a digital camera.

ext4 is one of a series of similar filesystems (earlier ones are ext3, ext2...) which are in some sense the "native" filesystems of Linux. It has features that vfat doesn't have - such as tracking which users own which files, and which users are allowed to read and modify files - along with features to make it faster and more reliable. So if you're making a filesystem that is only going to be used on one Linux system (like the internal root filesystem for a Pi), this is a good choice. But you would have trouble reading it on a Mac or a PC, and slightly strange things can happen if you mount it on a different Linux system.
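
To make the ownership and permission tracking concrete, here is a small Python sketch that asks the filesystem who owns a file and who may read or write it. It assumes it is run on Linux with the temporary directory on a unix-style filesystem such as ext4 or tmpfs; the helper name `describe` is made up for this example.

```python
import os
import stat
import tempfile


def describe(path):
    """Return the ownership and permission details the filesystem keeps for a file."""
    info = os.stat(path)
    return {
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),  # same style as the first column of ls -l
    }


# Make a scratch file, let the owner read and write it and the group only read it.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o640)
details = describe(path)
print(details)
os.remove(path)
```

On a vfat filesystem there is nowhere on disk to record this, so Linux fills in the same owner and permissions for every file from the mount options.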

FAT on small devices

other disk file systems

zfs
btrfs

multiple block devices. snapshots.

network file systems

sshfs

$ mkdir ~/mnt/tyne
$ sshfs pi@tyne.cqx.ltd.uk:/home/pi ~/mnt/tyne
$ df -hT /home/benc/mnt/tyne/
Filesystem                  Type        Size  Used Avail Use% Mounted on
pi@tyne.cqx.ltd.uk:/home/pi fuse.sshfs   29G  5.7G   23G  21% /home/benc/mnt/tyne

other network filesystems

NFS - Unix tradition
Samba (SMB) - Windows tradition

Network Filesystem downsides

Software expects file access to be:

fast

reliable

The network is often not those things.
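
One way a program can protect itself is to refuse to wait forever. The sketch below runs a file operation in a background thread and gives up after a deadline. All of the names here (`call_with_timeout`, `read_file`, `stalled_server`) are made up for this illustration, and the "stalled server" is just a sleep. It only stops the caller from waiting: the stuck operation itself carries on in the background.

```python
import threading
import time


def call_with_timeout(func, timeout_s):
    """Run func() in a background thread; raise TimeoutError if it takes too long."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func()
        except Exception as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise TimeoutError(f"no answer after {timeout_s} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def read_file(path):
    """An ordinary file read, which may hang if `path` is on a dead network mount."""
    with open(path, "rb") as f:
        return f.read()


def stalled_server():
    """Pretend to be a read from a server that is very slow to answer."""
    time.sleep(1.0)
    return b"too late"


try:
    call_with_timeout(stalled_server, 0.1)
except TimeoutError as exc:
    print("gave up:", exc)
```

A real use might look like `call_with_timeout(lambda: read_file(some_remote_path), 5)`. Whether giving up early is the right behaviour depends on the program, which is exactly the problem.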

media devices

~ $ jmtpfs ./phone
Device 0 (VID=2717 and PID=ff40) is a Xiaomi Mi-2s (id2) (MTP).
Android device detected, assigning default bug flags

~ $ df -h ./phone
Filesystem      Size  Used Avail Use% Mounted on
jmtpfs          708M -2.7G  3.4G    - /home/pi/phone

~/phone/Internal shared storage/DCIM/Camera $ cp IMG_20210223_17* ~/tmp/p/

LEDs-as-filesystem

$ df -h /sys
Filesystem      Size  Used Avail Use% Mounted on
sysfs              0     0     0    - /sys

$ cat /sys/class/leds/led0/trigger
none rc-feedback kbd-scrolllock kbd-numlock kbd-capslock
kbd-kanalock kbd-shiftlock kbd-altgrlock kbd-ctrllock
kbd-altlock kbd-shiftllock kbd-shiftrlock kbd-ctrlllock
kbd-ctrlrlock timer oneshot heartbeat backlight gpio cpu
cpu0 cpu1 cpu2 cpu3 default-on input panic mmc1 [mmc0]
rfkill-any rfkill-none rfkill0 rfkill1

# echo none > trigger
# while true; do
    echo 255 > brightness ;
    sleep 0.2 ;
    echo 0 > brightness ;
    sleep 0.8 ;
  done

So I've talked about filesystems that are backed by block devices like hard drives, and I've talked about filesystems that are backed by other filesystems on other computers. What else could a filesystem be backed by? Well, the filesystem code can do whatever it wants - there is no requirement to have some actual thing that you would think of as a place to put files.

For example, on a Pi you can interact with the onboard LEDs via the filesystem. There's a directory /sys/class/leds/led0/ which doesn't exist on any disk - it's part of the sysfs filesystem. Reading the trigger file in that directory lists the possible triggers, with the one currently in use shown in square brackets ([mmc0] in the output above). Writing none to trigger and then writing numbers to brightness turns the LED off and on.

[todo: actually flash LED using a bash for loop, and embed a small video here]
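
Because the LED is "just files", any language that can write a file can blink it. Here is the same loop as a Python sketch. The function names are made up for this example, the directory is the one from the slide, and writing to it on a real Pi needs root. The loop is limited to a few blinks rather than running forever.

```python
import os
import time

LED_DIR = "/sys/class/leds/led0"  # directory from the slide


def current_trigger(trigger_text):
    """Pick out the active trigger: the one word shown in [square brackets]."""
    for word in trigger_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def write_value(led_dir, name, value):
    """Write a value to one of the LED's control files, like `echo value > name`."""
    with open(os.path.join(led_dir, name), "w") as f:
        f.write(f"{value}\n")


def blink(led_dir=LED_DIR, times=3, on_s=0.2, off_s=0.8):
    """Take manual control of the LED, then switch it on and off `times` times."""
    write_value(led_dir, "trigger", "none")
    for _ in range(times):
        write_value(led_dir, "brightness", 255)
        time.sleep(on_s)
        write_value(led_dir, "brightness", 0)
        time.sleep(off_s)


print(current_trigger("none timer heartbeat mmc1 [mmc0] rfkill0"))  # prints mmc0
```

Calling `blink()` as root on a Pi should flash the LED; pointing `led_dir` at an ordinary directory just leaves two small text files behind, which is a handy way to try the code without the hardware.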

let a thousand filesystems bloom

Write code in linux kernel
Filesystem in Userspace (FUSE)

I've shown three very different places for filesystems to "store" their content: block devices, somewhere over the network, and files made out of some hardware. Maybe that leads to the question: what else could there be? The df display that I showed earlier had a pile of other filesystems in use. It's fairly straightforward to implement a new filesystem using a tool called FUSE (Filesystem in Userspace), where you write the guts of your filesystem in one of several languages and have it mount like any other. Those guts can do whatever you can imagine and code up. Some of the filesystems I've mentioned (e.g. jmtpfs and sshfs) are implemented using FUSE. Others are compiled into the kernel.
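
To give a feel for what those "guts" look like, here is a sketch of a tiny read-only filesystem with a single file, written as a plain Python class. The method names getattr, readdir and read mirror three of the operations FUSE asks a filesystem to provide, but this class is not wired up to any FUSE library: actually mounting it would need a FUSE binding, which is not shown. The class name `HelloFS` and the file it serves are made up.

```python
import errno
import stat


class HelloFS:
    """A made-up read-only filesystem: one directory containing one file, all in memory."""

    FILES = {"/hello.txt": b"Hello from a made-up filesystem\n"}

    def getattr(self, path):
        """Answer 'what is at this path?' (what stat and ls -l need to know)."""
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        if path in self.FILES:
            return {
                "st_mode": stat.S_IFREG | 0o444,
                "st_nlink": 1,
                "st_size": len(self.FILES[path]),
            }
        raise OSError(errno.ENOENT, "no such file or directory", path)

    def readdir(self, path):
        """Answer 'what names are in this directory?'"""
        if path != "/":
            raise OSError(errno.ENOTDIR, "not a directory", path)
        return [".", ".."] + [name.lstrip("/") for name in self.FILES]

    def read(self, path, size, offset):
        """Answer 'give me up to `size` bytes starting at `offset`'."""
        self.getattr(path)  # raises ENOENT for unknown paths
        return self.FILES.get(path, b"")[offset:offset + size]


fs = HelloFS()
print(fs.readdir("/"))
print(fs.read("/hello.txt", 5, 0))
```

Nothing here touches a disk. The "files" exist only because the code says they do, which is the whole point.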

overlayfs: make multiple directories appear as one
encfs: transparently encrypts files on disk
tmpfs: stores files in memory (what in the 1980s would be called a RAM disk)
proc: Exposes lots of linux internals under /proc
ntfs: mount Windows NTFS filesystems
iso9660: access files on CD-ROMs
cernvmfs: for distributing software installations globally
exFAT: like FAT but with more modern features, e.g. files larger than 4 GB
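
As one example from that list, the overlayfs idea of making multiple directories appear as one can be sketched with two dictionaries standing in for a read-only lower directory and a writable upper directory. This is a toy model with made-up function names: it ignores subdirectories and deleted files, both of which the real overlayfs has to handle.

```python
def overlay_lookup(name, upper, lower):
    """Read a file through the overlay: the upper layer wins if it has the name."""
    if name in upper:
        return upper[name]
    return lower[name]


def overlay_listing(upper, lower):
    """List the merged directory: every name from either layer, once."""
    return sorted(set(upper) | set(lower))


def overlay_write(name, data, upper):
    """All writes go to the upper layer; the lower layer is never touched."""
    upper[name] = data


lower = {"config.txt": "original settings", "readme.txt": "docs"}  # e.g. a read-only image
upper = {}                                                         # local changes

overlay_write("config.txt", "my settings", upper)
print(overlay_listing(upper, lower))
print(overlay_lookup("config.txt", upper, lower))
print(lower["config.txt"])
```

The edited file is what you see through the overlay, while the original underneath is unchanged.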

- end -

more notes

So I make files and they get stored on my Pi's internal SD card.
Except if I plug in a USB stick it appears in this folder here:
and somehow that... doesn't get stored on the Pi's internal SD card?

(at this point, gentle intro to mount points, and perhaps `df` command line utility
with -h parameter (but no -T?) - just show mountpoints, size and block device)

now can introduce a diagram perhaps of tools / VFS / filesystems / block device
- deliberately simplify to exclude the non-block-based filesystems at this stage

discuss briefly what a block device is.

filesystems:
Two that you'll usually see in use on a bog-standard Pi installed "normally":
ext4 (3,2...)
fat (variants: vfat, exfat) - bit of history there. especially note that on a pi, /boot is a fat filesystem. get some old original PC picture? FAT stands for "file allocation table". Windows stream moved onto NTFS. but fat has stuck around as a fairly simple filesystem to implement that is commonly used on removable media - eg digital cameras. phones. even programming a BBC micro:bit or a Pi Pico (pi pico especially relevant to Raspberry Pi meetup, using somewhat weird https://github.com/Microsoft/uf2 format).
(show df -h with a microbit plugged in and a USB stick plugged in, and a pi pico plugged in (or two of them!), on a Pi? along with a photo of that physically - **even** MY SOLDERING IRON!)

Discuss unix permissions/ownership, and point out that regular FAT doesn't have these - its history is as a single user filesystem.

other filesystems ... that don't use block devices
network filesystems: doesn't need to use a block device - the files could be coming from some other computer, not from a block device.
two examples of that I use a lot are NFS and SSHFS - they look very different.
SSHFS especially interesting to me as a very low end way of getting from my linux laptop
to edit files on remote systems.
* request from richard to know about samba (although not sure if I can easily demo that?
perhaps I can set up a samba server on my laptop?)

and ... they don't even need to have any backing store that looks like storage.
The classic example of that is /proc: df -hT /proc
but on the pi, can see things like GPIO pins: (that mode where you read from a file and
see GPIO pin state is a nice Pi specific example of that)

It doesn't have a backing store (it just says proc) and it doesn't have a size. this is a
filesystem that makes every process running on your Pi appear as a directory, as well as
various other bits of operating system specific stuff
(give a couple of examples: eg a process one, and /proc/loadavg - one process based and one system-wide)
(or a process one, and a GPIO /sys one?)
* specific example: can I toggle the LED on a Pi Zero by saving in a text editor?
(assuming people can see my camera in speaker view)
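
A possible pair of examples for this, sketched in Python: one system-wide file (/proc/loadavg) and the per-process directories. The function names are made up; the sample line follows the /proc/loadavg layout of three load averages, running/total tasks, and the most recently used process ID.

```python
import os

SAMPLE_LOADAVG = "0.20 0.18 0.12 1/80 11206\n"  # illustrative contents of /proc/loadavg


def parse_loadavg(text):
    """Split one line of /proc/loadavg into named fields."""
    one, five, fifteen, tasks, last_pid = text.split()
    running, total = tasks.split("/")
    return {
        "load_1min": float(one),
        "load_5min": float(five),
        "load_15min": float(fifteen),
        "running_tasks": int(running),
        "total_tasks": int(total),
        "last_pid": int(last_pid),
    }


def process_ids(proc_dir="/proc"):
    """Every running process shows up as a directory whose name is a number."""
    return sorted(int(name) for name in os.listdir(proc_dir) if name.isdigit())


print(parse_loadavg(SAMPLE_LOADAVG))
```

On a Linux system, `parse_loadavg(open("/proc/loadavg").read())` and `process_ids()` read the live values; nothing is stored anywhere, the kernel makes up the contents each time the file is read.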

Also interesting:
* overlayfs - used to create a writeable overlay on top of a read-only filesystem. This is a technique you might have used 20 years ago, mounting a CD-ROM of a linux distro and then editing things locally; more recently that's how container images work.
* andrew mentioned ZFS - properties of this? (is it available on Pi?)
* also 'btrfs' - see bodil stokke's build a pi thing that came up on my feed - eg start here: https://twitter.com/bodil/status/1349091913274679301
* FUSE is one way to build your own filesystem implementation (can I get a 1 screen filesystem?)
- especially note that I'm a big fan of plugin architectures in general: build a core, build a clean-ish interface, plug stuff in across that interface.
- SSHFS is actually implemented on top of fuse.
- I also use encfs on top of fuse
- plenty of other examples here: https://en.wikipedia.org/wiki/Filesystem_in_Userspace
* another way to write a filesystem is to implement it in the kernel - lower level
* mention in passing iso9660 - CD-rom filesystem
* CERN VM FS

example of what FAT looks like on disk, briefly?

discuss partitions somewhere: so that a block device like an SD card is treated as several block devices. show /proc/partitions. /dev/sda /dev/sda1 (or mmcblk etc)
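
A sketch of reading that table from Python. /proc/partitions has a header line, a blank line, and then one row per block device with its size in 1 KiB blocks. The sample below is illustrative (the sizes are invented for a 32 GB card with a 256 MiB first partition), and the function name is made up.

```python
# Illustrative text in /proc/partitions format.
SAMPLE_PARTITIONS = """\
major minor  #blocks  name

 179        0   31166976 mmcblk0
 179        1     262144 mmcblk0p1
 179        2   30900736 mmcblk0p2
"""


def parse_partitions(text):
    """Return {device name: size in bytes} from /proc/partitions-style text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with the numeric major number.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks, name = fields
        sizes[name] = int(blocks) * 1024  # the #blocks column counts 1 KiB blocks
    return sizes


for name, size in parse_partitions(SAMPLE_PARTITIONS).items():
    print(f"/dev/{name}: {size / 2**20:.0f} MiB")
```

The whole card (mmcblk0) and each partition on it (mmcblk0p1, mmcblk0p2) all appear as block devices in their own right.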

mention "special" nodes like symlinks, device nodes, etc - without going into detail too much.
but also mention that these are traditionally unixy so non-unixy filesystems like vfat don't
deal with them. example of device nodes: the partitions (which i probably will have shown by now)
and serial ports - for example, /dev/ttyAMA0 which if you try to use the Pi UART pins, you
might have encountered.

mention extended attributes - merely in passing. "I don't encounter a lot of use for them". the main example is rsync fake root.

* "why use different filesystems on block devices?" - because of different properties - ZFS "interesting large scale stuff". FAT - broad compatibility across devices. In a Pi context, if I wanted to be able to put a config file on my SD card in a form that I could edit on almost any computer, making a FAT partition and putting it on that partition would be the way to do it.

* downsides of network file systems: they try to make something remote, subject to things like connection failure, remote server reboots, etc, work like something that is directly connected: so occasional weird problems: eg if I write to a remote filesystem (i.e. press save in my text editor), and that server has crashed, should the local system: a) wait (for days?) until that remote system is restarted or b) return an error right away? (neither is the right answer in all cases...). There are problems with distributed systems that need to be addressed, and using the filesystem interface isn't always the right way to address them.

* when demoing fuse, show that this is an example of what the kernel might ask of any filesystem
- so show a few (not many) filesystem-like API calls.

* THINGS FOR ME TO PRACTICALLY LEARN FOR THIS TALK AND THEN TALK ABOUT MY ADVENTURES:

* install ZFS on a pi (?)
* write a basic fuse filesystem (can I make it do something interesting?) -- perhaps something to talk to the micropython filesystem on a Pi Pico (separate from the bootloader FAT filesystem)...
* cernvmfs

* "mount" command - talk about how lots of desktop environments will do this mount automatically so a lot of the time if you plug it in it will appear somewhere automatically - put that alongside "df" as "some commands we can use to do stuff" (and umount too).

* TODO: can I format this as a blog post/article (series?) primarily and present that way? so that there is a better lasting artefact than a slide set / video? (or really make the slide set "online viewing first, presentation second"?)

* make a "christmas tree" slide which is my pi with as many different filesystems mounted on it as I have talked about - or many, at least - df -hT output

* show NFS between VMs on my linux VMs (not Pi, though, but still an example)

* should give some example of layout on disk: FAT might be an interesting one to learn about because it's used on SD cards a lot and not complicated - so relevant to the microcontroller side of things. - point is "here's a layout of things on disk, but what you see is 'files' - what hides this layout from you is the filesystem code"
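
A possible starting point for that, as a Python sketch: the very first 512-byte block of a FAT filesystem (the boot sector) holds a few fixed-position numbers that describe the rest of the layout. The field offsets used here are the standard FAT ones; the example sector is built by hand with made-up values rather than read from a real SD card, and the function names are invented for this illustration.

```python
import struct


def make_example_boot_sector():
    """Build a fake 512-byte FAT boot sector with made-up but plausible values."""
    sector = bytearray(512)
    sector[3:11] = b"MSDOS5.0"                         # OEM name, 8 bytes at offset 3
    struct.pack_into("<HBHB", sector, 11, 512, 8, 32, 2)
    sector[510:512] = b"\x55\xaa"                      # boot sector signature
    return bytes(sector)


def parse_boot_sector(sector):
    """Pull the basic layout numbers out of a FAT boot sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing boot sector signature")
    # Little-endian fields starting at offset 11:
    # bytes per sector (2 bytes), sectors per cluster (1),
    # reserved sectors (2), number of FAT copies (1).
    bytes_per_sector, sectors_per_cluster, reserved, num_fats = struct.unpack_from(
        "<HBHB", sector, 11
    )
    return {
        "oem_name": sector[3:11].decode("ascii", errors="replace").rstrip(),
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "cluster_bytes": bytes_per_sector * sectors_per_cluster,
        "reserved_sectors": reserved,
        "fat_copies": num_fats,
    }


print(parse_boot_sector(make_example_boot_sector()))
```

The first file allocation table starts straight after the reserved sectors, so these few numbers are enough to find where the table lives on disk. None of this is visible when you just look at files, which is the point of the slide.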

* need to be clear when I say filesystem, do I mean "the driver code (in kernel or FUSE)" or "the conceptual layout of things on disk" or "an instance of that conceptual layout on a particular block device instance/loopback"

* mention loopbacks as block devices

* mention "mount" command, and mention that if you have a desktop environment installed
then often stuff will be mounted automatically for you.