I have always liked the idea of everything being a file. It’s a simple and elegant abstraction that makes a lot of sense. In this post, I’ll briefly explain what contemporary systems mean by “file”, then what I’d like files to be, and finally why I think it is a good idea.

What is a file?

In many systems, a file is a sequence of bytes that can be read and written. It can be stored on a disk (using a file system), it can be stored in volatile memory, or it can be generated on the fly. Some files are special in that they represent a directory (a collection of files). Some are block devices, such as a hard drive. Others can be char devices (e.g. a serial port), sockets, named pipes, and I am sure there are more.

In these systems, files (except directory files) are opaque to the system. The system doesn’t know what is inside a file, it just knows its name, length, and a few more attributes. The system is not aware of the structure of the file, it doesn’t know what the file represents, and it doesn’t know what operations can be performed on the contents of the file.

This is ok in many cases. Except when it’s not.

When is it not ok?

Let’s take a look at some examples, each of which I have personally experienced.

Video player

Let’s say you want to play a video. You open a video player, you select a video file, and you press play. The video player usually reads the file, decodes the video data, and sends the decoded frames to the compositor.

Except, now it doesn’t. The video player doesn’t support the video format you need. But what if this video player is your favorite one, and you really don’t want to use any other? You try to look up some plugins for it, but alas, this format is simply not supported yet. You are out of luck. You have to use a different video player.

Driver issues

Finally, you have switched to a new video player, open the video file, and press play, and it works! But the video is stuttering. You check the expert settings, and you see that the video player is not using hardware acceleration because of some driver issues that are not present in other players. Again, you are out of luck.

Image editor

You need to edit an image, so you open an image editor of your choice that you paid for, because you really, really like it. You try to open the image file, but an error pops up. You try to open a different image and get the same error. You try to open the image in a different image editor, and it works. You try your favorite image editor again, but get the same error. Now what? You could convert the file to a different format in the second editor, but you would lose some information, e.g. layer names or color space. You contact support; they’re really sorry, but they can’t help you. The bugfix is scheduled for the next release, but it’s not ready yet. You have to use a different image editor.

File archiver

You need to decompress an archive, so you open a file archiver of your choice, open the archive, and press extract. A folder picker pops up, you select the destination, and press ok, but the archiver dies a horrible death. Why? The folder picker in your desktop environment supports opening network shares, but the archiver doesn’t know what to do with a path that starts with smb://. You have to extract the files locally and then copy them to the share.

Cross-file references

You have a project with a bunch of files. Some of them are source files, some of them are assets, and some of them are configuration files. Some of them reference other files, e.g. a source file references an asset, or a config file references another config file. You have installed some tools. You have a linter, a formatter, a test runner, a bundler with many intermediate steps, and so on. The usual frontend stack. You have configured the bundler so that when it encounters a reference to an asset, it copies the asset to the output directory. But the linter does not have this option. It stubbornly refuses to lint the source file because it says that the referenced file cannot be parsed as a source file. You have to use a different linter.

Performance

Finally, after you change the linter, you have set up every tool you need to work efficiently. You can now focus on your project. You write some code, hit save, and the fans start spinning. Why is that? You check the CPU usage and see that every tool you have configured noticed that you have changed some files and started parsing everything from scratch. You see, the tools don’t know about each other. They don’t know that they don’t need to parse the same files multiple times. If you want to get rid of this nonsense, you have to use a single tool that can do everything you need at once, but there is no such tool.

What do all these examples have in common?

In all these examples, the tools being used are required to implement every file format or protocol you want to work with. There is no such thing as a generic video player, a generic image editor, a generic file archiver, or a generic frontend stack. Implementing a new file format or file transfer protocol is a lot of work, and it’s not something that every tool can afford to do.

Some desktop environments support automatically mounting network shares as directories, but this is hacky and clunky at best. It’s not a generic solution, it’s just a workaround for a specific use case. What if the application used an input field instead of a folder picker? The automatic mounting would not work, and the application would have to implement the network share support itself.

The proposed solution

From now on, I’ll refer to file formats and protocols collectively as protocols, since file formats are just one-way protocols.

I propose that the system provide the protocols required by any application. Much like it does for directories, the system should know what is inside a file, be able to list its contents, and perform operations on them, be it a media file, a disk image, or a network share.

Like:

# ls
file video.mp4   video (mp4)
# ls video.mp4/
attr width       1920
attr height      1080
attr frame rate  60
dir  frames      123456
# ls video.mp4/frames/ | head -n 1
file 1           video frame
# ls video.mp4/frames/1 | head -n 1
file 1,1         color
# ls video.mp4/frames/1/1,1
attr red         0
attr green       20
attr blue        40

There should be a single implementation of each protocol, and every tool should be able to use it. If there is an issue with the implementation, it should be fixed in one place, and every tool should benefit. If there is a new source code dialect, compression format, media format, or anything else, it should be implemented once, and every tool should be able to use it. As with a directory, the user should be able to list the contents of a file and work with the individual items inside it.

For me, a file is not just a sequence of bytes that can be read and written. For me, every file has some structure and a set of operations that can be performed on it. For example, an image file has fields like width and height and pixel data, and it has operations like rotate, crop, and resize. It can also have layers and effects, and you can perform operations on those too. A video file has a frame rate, a sequence of frames, and operations like clip and merge. A text file has a sequence of characters and operations like search, replace, and spell check.
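As a sketch of what “structure plus operations” could look like, here is a minimal Python model of a structured image file. The class name, attribute names, and the `crop` operation are all invented for illustration; in the proposed system this logic would live in the system’s codec, not in each application:

```python
# A minimal, hypothetical model of a structured file: named attributes
# plus operations that the system (not each application) implements.

class ImageFile:
    def __init__(self, width, height, pixels):
        self.attrs = {"width": width, "height": height}
        self.pixels = pixels  # row-major list of (r, g, b) tuples

    def crop(self, x, y, w, h):
        """Return a new ImageFile containing the w x h region at (x, y)."""
        out = []
        for row in range(y, y + h):
            start = row * self.attrs["width"] + x
            out.extend(self.pixels[start:start + w])
        return ImageFile(w, h, out)

# Usage: crop the top-left 2x2 corner of a 4x4 grayscale gradient.
img = ImageFile(4, 4, [(i, i, i) for i in range(16)])
corner = img.crop(0, 0, 2, 2)
```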

If you want to clip a video file, you don’t strictly need a video editor. You can use one, because it drastically simplifies the process, but you can also just list the contents of the video file, find the frames you need, and write them to a new file:

# mkfile target.mp4 --like source.mp4
# cp -r source.mp4/frames/[0..1024] target.mp4/frames/
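The same clip operation can be sketched in Python, with files modeled as plain dicts. Everything here (the `mkfile_like` and `copy_frames` helpers, the frame representation) is hypothetical, mirroring the shell commands above:

```python
# Hypothetical sketch: clipping a video by copying a range of "frame"
# entries from one structured file to another, as `cp -r` does above.

def mkfile_like(source):
    """Create an empty file with the same attributes as `source`."""
    return {"attrs": dict(source["attrs"]), "frames": []}

def copy_frames(source, target, start, end):
    """Copy frames [start, end) from source into target."""
    target["frames"].extend(source["frames"][start:end])

# A 5000-frame "video"; frames are stand-in integers here.
source = {"attrs": {"width": 1920, "height": 1080}, "frames": list(range(5000))}
target = mkfile_like(source)
copy_frames(source, target, 0, 1024)
```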

If you want to draw a circle on a screenshot, you don’t need an image editor. Again, you can use one, and it’s easier, but you can also just list the image’s pixel data, find the pixels that are inside the circle, and change their color.
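A sketch of that, assuming pixels are exposed as a flat row-major list (an assumption of this example, not a defined interface):

```python
# Hypothetical sketch: drawing a filled circle by listing pixels and
# recoloring the ones inside the circle -- no image editor involved.

def draw_circle(pixels, width, cx, cy, radius, color):
    """Recolor every pixel within `radius` of (cx, cy) in place."""
    for i in range(len(pixels)):
        x, y = i % width, i // width
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
            pixels[i] = color

# An 8x8 all-white "screenshot"; draw a red circle at its center.
screen = [(255, 255, 255)] * (8 * 8)
draw_circle(screen, 8, 4, 4, 2, (255, 0, 0))
```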

Files can even be materialized on behalf of other files. The source code issue mentioned above can be solved by materializing the AST from the source code file. The linter can then just list the contents of the AST file and lint the AST nodes, irrespective of the source code dialect. The same goes for multiple tools watching for changes and recompiling, running tests, and so on. The system materializes everything that is needed at most once. I know I am being very hand-wavy here, but I have yet to figure out the details.

# cd my-project
# ls
file source.js      source code (JavaScript)
dir  source.js.ast  abstract syntax tree [materialized]
# ls source.js.ast/
attr language       JavaScript
attr version        ES2021
attr mode           strict
attr source         source.js
file 1              abstract syntax tree node (Program)
# ls source.js.ast/1
attr type           Program
dir  body           123
# ls source.js.ast/1/body
attr length         123
file 1              abstract syntax tree node (ImportDeclaration)
file 2              abstract syntax tree node (ImportDeclaration)
... 121 more entries
# ls source.js.ast/1/body/1
attr type           ImportDeclaration
file 1              abstract syntax tree node (ImportKeyword)
file 2              abstract syntax tree node (Identifier)
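The “materialize at most once” behavior in the listing above can be sketched as memoization keyed on the source file’s contents. The function names and the toy `parse` step are invented for illustration; a real system would also have to handle invalidation, partial reparsing, and so on:

```python
# Hypothetical sketch of "materialized" derived files: the system parses
# a source at most once, and every tool reads the same cached result.

_cache = {}

def materialize(path, contents, parse):
    """Return the derived file for `path`, reparsing only when the
    contents have changed since the last call."""
    key = (path, hash(contents))
    if key not in _cache:
        _cache[key] = parse(contents)
    return _cache[key]

calls = []
def parse(src):
    calls.append(src)  # record how many times we actually parse
    return {"language": "JavaScript", "tokens": src.split()}

# Two tools (say, a linter and a bundler) ask for the same AST:
ast1 = materialize("source.js", "import x from 'y'", parse)
ast2 = materialize("source.js", "import x from 'y'", parse)
```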

Hardware resources and system internals

I think it’s a good idea to have access to all and any hardware resources as files. For example, if you want to capture a screenshot, you do not need a tool specifically designed for your desktop environment. Instead, you can just copy the contents of the screen buffer file to an image file. Similarly, if you want to cast your screen, you can just pipe the screen buffer file to a network socket:

# cat /dev/screen/1 | mp4 | tcp my-phone:1234 

The same applies to basically any computer interface: where there is a place to read or write, receive or send, there is a file and a protocol implementation.

Even more, I think that applying this idea to system internals could be a game changer. Both the user and the system should be able to introspect and edit any part of the system. You don’t need any special tools to see what is happening inside the kernel; you can just list the contents of a special kernel file and see any objects that are currently in kernel memory. You can see the state of any TCP connection, all allocated memory pages and their purpose, the state of the scheduler and the reason for it, and so on. Because, if I squint a bit, an allocated resource is just a file materialized on behalf of some request file.

# cd /kernel/net/tcp
# ls | head -n 2
file 1  tcp connection (local:24568 -> remote:80, state:ESTABLISHED)
file 2  tcp connection (local:24569 -> remote:443, state:SYN_SENT)
# cd /kernel/mem/phys
# ls | head -n 2
file 1  memory page (address:0x00000000, size:4KiB, purpose:null page)
file 2  memory page (address:0x00001000, size:4KiB, purpose:kernel stack)
# cd /kernel/sched
# ls | head -n 1
file 1  process (pid:1, state:RUNNING, priority:0)
# cd /kernel/pci
# ls | head -n 1
file 1  pci device (vendor:0x8086, device:0x1234, class:network controller)

All of the above together then allows for some interesting use cases. For example, you can introspect the kernel on remote hosts and make adjustments if required, all without any specialized tools. Or you can have structured logs, where each log entry is a file that can be introspected, with a complete snapshot of the system at the time of the entry.

# cd /kernel/crash/2023-09-29T22:42:02+02:00
# ls | head -n 1
file 1  kernel dump (message:oops, reason:division by zero)

It’s codecs all the way down

Ideally, all this would be implemented as a set of codecs. This means that for each protocol there is one codec that every tool can use, and for each use-case there is one tool that can utilize any codec that is relevant for the job.

Then, building a video player is just a matter of writing a tool that can pipe video frames to a window file. The compositor then pipes the window file to the screen buffer file. As for hardware acceleration, remember that file contents can have references to other files, and files don’t necessarily need to live in system memory, which means that either

  • decoding is done on GPU and the video data never has to leave the GPU memory, or
  • decoding is done on CPU and the decoded pixels are copied to the GPU memory and the GPU does the compositing, or
  • there is no GPU and both decoding and compositing are done on CPU.

The nice thing about codecs is that they can be chained. If the video source is on a remote host, the video player can pipe from it to the decoder as usual, and since the “remote file” is in fact a codec pipeline itself from the network card through the network stack to system memory, everything works the same way as if the video was local.
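Chaining can be sketched as plain function composition: each codec maps bytes to bytes, and a pipeline is just a fold over the chain. The two stand-in codecs below are invented placeholders, not real formats:

```python
# Hypothetical sketch of chainable codecs: each codec is a function from
# bytes to bytes, and a pipeline is composition -- whether the data
# comes from a local disk or a "remote file" pipeline makes no difference.

def pipeline(*codecs):
    """Compose codecs left to right, like a shell pipe."""
    def run(data):
        for codec in codecs:
            data = codec(data)
        return data
    return run

# Stand-in codecs; real ones would decompress, decode frames, etc.
def gunzip(data):
    return data.replace(b"gz:", b"")

def decode(data):
    return data.upper()

play = pipeline(gunzip, decode)
result = play(b"gz:frame")
```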

If the window file is actually a network socket to a remote host, the compositor on the remote host can pipe from it to the screen buffer as usual. This allows for scenarios where video needs to be transcoded on a different machine due to performance reasons, or where the video player is on a different machine than the screen.

I am using media files here as an example, but the same goes for basically any data type or protocol. If you want to send an email, you just instantiate an SMTP codec, pipe the email file to it, and then pipe the codec output to a network socket:

# cat email | smtp --send | tcp smtp.example.com:25

Security

If everything is a file, then we can apply the same security mechanisms to everything. We can set permissions on individual pixels on a screen, on individual tokens in a source file, on individual remote hosts, and so on.

This allows for e.g. masking sensitive information in a screenshot or while sharing your screen. If you ever need to ban a specific domain or IP address for other users, you can just set permissions on the corresponding files, and the system will make sure that the ban is enforced, no matter what application is trying to access them.

Also, if your source file contains sensitive information, you can set permissions on specific lines or tokens, and the system will make sure that only authorized users can access them.
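One way this could work is a single permission check over paths in the file tree, with the most specific rule winning. The rule table, paths, and longest-prefix policy below are all assumptions made for this sketch:

```python
# Hypothetical sketch of uniform, fine-grained permissions: one check
# works for a pixel, a source-code token, or a remote host, because
# each is just a path in the same tree. The most specific rule wins.

rules = {
    "/screen/1": {"alice", "bob"},
    "/screen/1/pixels/10,20": {"alice"},          # a masked pixel
    "/project/source.js/tokens/api_key": {"alice"},
    "/net/hosts/203.0.113.7": set(),              # banned for everyone
}

def can_read(user, path):
    """Apply the longest rule path that prefixes `path`."""
    best, best_allowed = None, set()
    for rule_path, allowed in rules.items():
        if path == rule_path or path.startswith(rule_path + "/"):
            if best is None or len(rule_path) > len(best):
                best, best_allowed = rule_path, allowed
    return best is not None and user in best_allowed
```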

Conclusion

This could drastically simplify the development of basically any software, because you don’t have to worry about things that someone else has already solved. You just pipe the data to the right files and the system does the rest.

Also, if there’s only one implementation of the codec for a given protocol, there’s much more incentive to make it as good as possible, as performant as possible, and as bug-free as possible.

I think that this could be a game changer, even if it’s just a thought experiment for now, and I am excited to see where this idea will take me while trying to implement it in my operating system.