# Metanix: Design Document (WIP)

_Declarative infrastructure, but with fewer repeated DSL war crimes._

---

## 1. High-Level Overview

**Metanix** is a Nix library / flake that generates NixOS configurations and related infrastructure from a higher-level “world description” file, `meta.nix`, plus a `policy` section.

Instead of hand-writing:

- a zillion host-specific `configuration.nix` files
- DHCP, DNS, firewall rules
- user definitions
- dotfiles and home-manager configs

…you describe:

- **locations**, **subnets**, and **hosts**
- **systems** that correspond to those hosts
- **global identity and policy** (users, groups, ACLs, shared configs)

Metanix then:

- infers IPs, roles, and trust relationships
- builds NixOS configs for each system
- wires DNS, DHCP, WireGuard, firewalls, and home-manager for users where applicable

You still _can_ hand-write Nix when you care about specifics. You just don’t have to for the 90% of boilerplate that machines are objectively better at than you.

---

## 2. Goals & Non-Goals

### Goals

- **Reduce boilerplate**
  Generate as much as possible from a high-level description of the world.

- **Deterministic global identity**
  Users and groups have consistent UIDs/GIDs across all managed systems.

- **Declarative RBAC & network trust**
  Identity and access control defined once, applied consistently to:

  - firewalls
  - services
  - admin surfaces

- **Location-aware infrastructure**
  Use `locations` and `hosts` to drive:

  - IP addressing
  - control-plane vs data-plane
  - which systems are “upstream” vs “downstream”

- **Home-manager integration**
  User environments (dotfiles, tools, browser setup) managed from policy, not from random snowflake configs.

### Non-Goals

- Replacing NixOS modules
  Metanix composes and configures them; it doesn’t rewrite them.

- Being a one-click magic box
  This is not “paste YAML, receive Kubernetes.” You’re still expected to understand your network and your systems.

- Hiding complexity at all costs
  The complexity is still there. Metanix just centralizes it so you can reason about it.

---

## 3. Core Concepts

### 3.1 `meta.nix` structure (top level)

`meta.nix` is the main entrypoint to Metanix. Simplified structure:

```nix
{
  domain = "kasear.net";
  locations = { ... };
  systems = { ... };
  consumers = { ... }; # global defaults for resources
  policy = { ... }; # identity, RBAC, shared configs
}
```

Metanix is the flake; `meta.nix` is the data.

---

## 4. Locations, Subnets, Hosts

Locations describe _where_ things live. Subnets describe _how_ they’re sliced. Hosts are the concrete entries inside those subnets.

### 4.1 Locations

Shape:

```nix
locations.<location> = {
  owner = "yaro"; # default owner for this location
  admins = [ "ops" ]; # location-wide admins
  users = [ "monitor" ]; # location-relevant users

  <subnet> = { ... };
};
```

Example:

```nix
locations.home = {
  owner = "yaro";
  admins = [ "ops" ];

  main = { ... };
  dmz = { ... };
  iot = { ... };
};
```

Location-level identity is a _hint_:

- “These users are relevant here”
- “This person is probably in charge here”

Actual presence/privilege on a given system is resolved later via hosts and systems.

### 4.2 Subnets

Shape:

```nix
locations.<location>.<subnet> = {
  vlan = 10; # optional
  dhcp = { start = 1; end = 250; }; # optional

  owner = "ops"; # overrides location.owner
  admins = [ "sre" ];
  users = [ "resident" ];

  hosts = { ... };
};
```

Subnets:

- define per-VLAN semantics (e.g. main, dmz, iot)
- refine identity hints for systems in that subnet
- will eventually feed into IP allocation (e.g. via a deterministic scheme like mkIp)
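
Since `mkIp` is only named here and not yet specified, the following is a minimal sketch of what a deterministic scheme could look like. The `10.<location>.<subnet>.<host>` layout, the numeric `locationId` / `subnetId` arguments, and reusing values like `vlan` and `hostId` as inputs are all assumptions for illustration, not decided design.

```nix
# Hypothetical sketch: the address layout and the argument names are
# assumptions, not part of the Metanix design.
let
  mkIp = { locationId, subnetId, hostId }:
    # Each component has to fit into one octet.
    assert builtins.all (n: n >= 0 && n <= 255) [ locationId subnetId hostId ];
    "10.${toString locationId}.${toString subnetId}.${toString hostId}";
in
mkIp { locationId = 1; subnetId = 10; hostId = 42; }
# evaluates to "10.1.10.42"
```

The point of a pure function like this is that the same description always yields the same address, so DNS records, DHCP reservations, and firewall rules can all be derived from one place.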

### 4.3 Hosts

Hosts are **interfaces into contexts**, not necessarily 1:1 machines.

Shape:

```nix
locations.<location>.<subnet>.hosts.<hostname> = {
  role = "router" | "server" | "adminWorkstation" | "coreServer" | ...;
  hw-address = "aa:bb:cc:dd:ee:ff"; # optional
  aliases = [ "fqdn" ... ]; # optional
  interface = "eno2"; # optional
  dns = false; # optional, default true
  hostId = 42; # optional, special cases

  # Identity hints in THIS CONTEXT ONLY:
  owner = "yaro"; # host’s admin owner
  admins = [ "ops" "sre" ];
  users = [ "analytics" ];
};
```

Key points:

- One **system** can appear as **multiple hosts** across locations/subnets.
- Each host is “this system, as seen from this network plane.”
- Identity hints here are **per host context**, not global truth.

Examples:

```nix
# Home DMZ view of deimos
locations.home.dmz.hosts.deimos = {
  role = "server";
  hw-address = "10:98:36:a0:2c:b2";
  interface = "eno2";
  aliases = [ "kasear.net" "vpn.kasear.net" ... ];
  owner = "yaro";
  admins = [ "ops" ];
};

# Cloud DMZ view of same system
locations.cloud.dmz.hosts.deimos-cloud = {
  role = "server";
  interface = "wg0";
  users = [ "analytics" ]; # non-admin plane
};
```
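
Because subnets live next to `owner` / `admins` / `users` inside a location, anything that walks `locations` has to tell the two apart. The sketch below shows one way to flatten the tree into host contexts and look up every plane a system appears on. The helper names (`subnetsOf`, `hostContexts`, `contextsFor`) are hypothetical, and treating every attrset-valued key as a subnet is an assumption of this sketch.

```nix
let
  # Trimmed-down copy of the examples above.
  locations = {
    home = {
      owner = "yaro";
      admins = [ "ops" ];
      dmz.hosts.deimos = { role = "server"; interface = "eno2"; };
    };
    cloud.dmz.hosts.deimos-cloud = { role = "server"; interface = "wg0"; };
  };

  # Assumption: every attrset-valued key in a location is a subnet;
  # owner (string) and admins/users (lists) are skipped by the filter.
  subnetsOf = loc:
    builtins.filter (n: builtins.isAttrs loc.${n}) (builtins.attrNames loc);

  # Flatten locations into a list of { location, subnet, name, host }.
  hostContexts = builtins.concatLists (builtins.concatLists (
    map (locName:
      map (subName:
        let hosts = locations.${locName}.${subName}.hosts or { };
        in map (hostName: {
          location = locName;
          subnet = subName;
          name = hostName;
          host = hosts.${hostName};
        }) (builtins.attrNames hosts)
      ) (subnetsOf locations.${locName})
    ) (builtins.attrNames locations)
  ));

  # All contexts in which a system is visible, given its list of host keys.
  contextsFor = hostNames:
    builtins.filter (c: builtins.elem c.name hostNames) hostContexts;
in
map (c: "${c.location}.${c.subnet}.${c.name}")
  (contextsFor [ "deimos" "deimos-cloud" ])
# evaluates to [ "cloud.dmz.deimos-cloud" "home.dmz.deimos" ]
```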

---

## 5. Systems, Services, Resources, Consumers

Systems describe machines from Metanix’s point of view and how they connect to hosts, services, and resources.

### 5.1 Systems

Shape:

```nix
systems.<name> = {
  tags = [ "router" "public" "upstream" "downstream" ];
  location = "home";
  subnet = "dmz";

  # Host keys under locations.*
  hosts = [ "deimos" "deimos-cloud" ];

  # Optional system-level hints
  # owner = "yaro";
  # admins = [ "ops" ];
  # users = [ "monitor" ];

  services = { ... };
  resources = { ... };
  consumers = { ... };

  configuration = ./systems/x86_64-linux/<name>/default.nix;
};
```

**Tags** are semantic hints / profiles. Examples:

- `router` – network edge / routing logic
- `public` – exposed to the internet
- `upstream` – config-plane / authoritative system (e.g. Kea/Knot/Unbound/WG server)
- `downstream` – router profile consuming upstream config-plane

Metanix modules will key off these tags to decide default behaviors (e.g. unbound in “upstream” mode vs stub-only).
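
As a rough illustration of tag-driven defaults, a module could branch on tag membership like this. The helper `hasTag`, the mode strings, and the tags assigned to `eris` and `phobos` are assumptions made up for the sketch.

```nix
let
  # Hypothetical helper: true when the system carries the given tag.
  hasTag = tag: system: builtins.elem tag (system.tags or [ ]);

  # Pick a resolver mode from the tags (mode names are placeholders).
  unboundMode = system:
    if hasTag "upstream" system then "upstream" else "stub-only";

  # Example systems; their tags are invented for this sketch.
  systems = {
    eris = { tags = [ "upstream" "public" ]; };
    phobos = { tags = [ "router" "downstream" ]; };
  };
in
builtins.mapAttrs (_: unboundMode) systems
# evaluates to { eris = "upstream"; phobos = "stub-only"; }
```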

### 5.2 Services

Services are basically “NixOS modules we want on this system.”

Shape:

```nix
services = {
  <serviceName> = {
    enable = true; # optional; presence can imply true
    tags = [ "upstream" ]; # service-specific tags
    config = { }; # free-form module options
  };
};
```

Example:

```nix
services = {
  headscale = { enable = true; config = { }; };
  nginx-proxy = { enable = true; config = { }; };
  nginx = { enable = true; config = { }; };
  httpd = { enable = false; config = { }; }; # explicit off
  jellyfin = { enable = true; config = { }; };
};
```

Metanix will map these entries into:

- `services.<name>.enable = true/false`
- service-specific options
- containerization (if you decide that later) or native services
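
For the native-service case, the mapping could be as small as the sketch below. It assumes `config` is passed through unchanged as the service's options and that a missing `enable` means `true`; the function name `toNixosService` is hypothetical, and the Metanix-level `tags` are simply dropped here.

```nix
let
  # Entries as written in a system's `services` attribute.
  declared = {
    # No `enable` given: presence implies true.
    nginx = { config = { recommendedGzipSettings = true; }; };
    # Explicitly switched off.
    httpd = { enable = false; config = { }; };
  };

  # Free-form options first, then the resolved `enable` on top, so the
  # Metanix-level switch wins over any `enable` buried in `config`.
  toNixosService = _name: svc:
    (svc.config or { }) // { enable = svc.enable or true; };
in
{ services = builtins.mapAttrs toNixosService declared; }
# evaluates to:
# { services = {
#     httpd = { enable = false; };
#     nginx = { enable = true; recommendedGzipSettings = true; };
#   }; }
```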

### 5.3 Resources

Resources are logical capabilities this system **provides**:

```nix
resources = {
  dns = { };
  media = { };
  git = { };
  auth = { };
};
```

These serve as:

- symbolic handles for ACLs
- targets for other systems’ `consumers`
- hints for how to wire firewall / routing / DNS / etc.

### 5.4 Consumers

Consumers describe what this system **depends on**:

```nix
consumers = {
  dns = { provider = "eris"; };
  dhcp = { provider = "phobos"; };
  wireguard = { provider = "frontend.kasear.net"; };
};
```

Resolution order:

- `systems.<name>.consumers.<res>.provider`
  overrides
- top-level `consumers.<res>.provider` defaults
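
The override rule can be sketched as a per-resource merge. The function name `resolveConsumers` is hypothetical; the merge is done per resource (rather than with a single top-level `//`) on the assumption that a default may carry keys other than `provider` that a system-level entry should not wipe out.

```nix
let
  # Top-level defaults and one system's overrides.
  globalConsumers = {
    dns = { provider = "eris"; };
    dhcp = { provider = "phobos"; };
  };
  systemConsumers = {
    dns = { provider = "1.1.1.1"; };
    wireguard = { provider = "frontend.kasear.net"; };
  };

  # For every resource named on either side, start from the default
  # entry (if any) and let the system-level entry win key by key.
  resolveConsumers = defaults: overrides:
    builtins.mapAttrs
      (res: value: (defaults.${res} or { }) // value)
      (defaults // overrides);
in
resolveConsumers globalConsumers systemConsumers
# evaluates to:
# { dhcp = { provider = "phobos"; };
#   dns = { provider = "1.1.1.1"; };
#   wireguard = { provider = "frontend.kasear.net"; }; }
```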

Providers can be:

- a host / system name (e.g. `"phobos"`)
- FQDN (e.g. `"frontend.kasear.net"`)
- raw IP (e.g. `"1.1.1.1"`)

Metanix uses this to generate:

- `/etc/resolv.conf`
- DNS stub configurations
- DHCP relays / clients
- WireGuard peers

---

## 6. Identity Model & Policy

This is the spine of the whole thing. Try not to break it.

### 6.1 `policy.users` – global identity ledger

`policy.users` defines who exists in the universe and what they look like _if they exist on a system_.

Shape:

```nix
policy.users.<username> = {
  uid = int;
  type = "human" | "service";
  primaryGroup = "groupName";
  extraGroups = [ "group1" "group2" ];
  shell = "/run/current-system/sw/bin/bash";

  home = {
    type = "standard" | "shared" | "system" | "none";
    path = null | "/path";
  };

  sshAuthorizedKeys = [ "ssh-..." ];
  passwordHash = null | "hash";
  locked = bool;

  tags = [ "admin" "homelab" "monitoring" ];

  homeManager = {
    profiles = [ "desktopBase" "devTools" ];
    extraModules = [ ./home/yaro-extra.nix ];
    options = { programs.git.userName = "Yaro"; ... };
  };
};
```

Examples:

```nix
policy.users = {
  yaro = {
    uid = 10010;
    type = "human";
    primaryGroup = "yaro";
    extraGroups = [ "admins" "desktopUsers" ];
    shell = "/run/current-system/sw/bin/bash";
    home = { type = "standard"; path = null; };
    sshAuthorizedKeys = [ "ssh-ed25519 AAAA...yaro" ];
    passwordHash = null;
    locked = false;
    tags = [ "admin" "homelab" ];
    homeManager = {
      profiles = [ "desktopBase" "devTools" ];
      extraModules = [ ./home/yaro-extra.nix ];
      options = {
        programs.git.userName = "Yaro";
        programs.git.userEmail = "yaro@kasear.net";
      };
    };
  };

  monitoring = {
    uid = 10030;
    type = "service";
    primaryGroup = "monitoring";
    extraGroups = [ ];
    shell = "/run/current-system/sw/bin/nologin";
    home = { type = "system"; path = "/var/lib/monitoring"; };
    sshAuthorizedKeys = [ ];
    passwordHash = null;
    locked = true;
    tags = [ "service" "monitoring" ];
    homeManager = {
      profiles = [ ];
      extraModules = [ ];
      options = { };
    };
  };
};
```

**Important:**
`policy.users` does **not** decide _where_ a user exists. It defines the global, canonical identity when they do.
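
Once a user has been resolved as present on a system, the ledger entry still has to become a `users.users.<name>` definition. The sketch below shows one plausible translation onto standard NixOS user options. The function name `toNixosUser` is hypothetical, and how `home.type` and `locked` should map onto NixOS is not decided in this document, so both are deliberately left out.

```nix
let
  # Sketch only: `home.type` and `locked` are ignored because their
  # NixOS mapping is not defined in this document.
  toNixosUser = _name: u:
    {
      inherit (u) uid shell extraGroups;
      group = u.primaryGroup;
      isNormalUser = u.type == "human";
      isSystemUser = u.type == "service";
      openssh.authorizedKeys.keys = u.sshAuthorizedKeys;
      hashedPassword = u.passwordHash;
    }
    # Only set an explicit home directory when the ledger names one.
    // (if u.home.path != null then { home = u.home.path; } else { });
in
builtins.mapAttrs toNixosUser {
  monitoring = {
    uid = 10030;
    type = "service";
    primaryGroup = "monitoring";
    extraGroups = [ ];
    shell = "/run/current-system/sw/bin/nologin";
    home = { type = "system"; path = "/var/lib/monitoring"; };
    sshAuthorizedKeys = [ ];
    passwordHash = null;
  };
}
```

The result is an attrset that could be assigned to `users.users`, restricted to whichever users the resolution rules in section 7 decide should exist on that system.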

### 6.2 `policy.groups`

Global group ledger:

```nix
policy.groups = {
  admins = { gid = 20010; members = [ "yaro" ]; };
  ops = { gid = 20011; members = [ "ops" ]; };
  desktopUsers = { gid = 20020; members = [ ]; };
  monitoring = { gid = 20030; members = [ "monitoring" ]; };
};
```

Groups are used for:

- Unix group membership
- ACL principals
- targeting shared configurations

### 6.3 `policy.globals`

Global identity hints, mostly for presence / “tends to exist everywhere”:

```nix
policy.globals = {
  owner = [ ]; # global owners (use sparingly)
  admins = [ "yaro" ]; # global admins
  users = [ "monitoring" ]; # plain / service users
};
```

Metanix uses this as a baseline:

- to decide which users “naturally” appear everywhere
- before location/host/system-specific overrides are applied

---

## 7. Identity Resolution & Privilege Rules

This is the fun part where you avoid Schrödinger’s sudoer.

### 7.1 Privilege levels

For a given user `U` on a system `S`:

```text
none < user < admin < owner
```

Where:

- `user` → exists, no sudo by default
- `admin` → sudo-capable / elevated
- `owner` → top-level admin in that scope; default broad control

### 7.2 Scopes

Privilege hints appear in:

- `locations.<loc>.{owner,admins,users}`
- `locations.<loc>.<subnet>.{owner,admins,users}`
- `locations.<loc>.<subnet>.hosts.<host>.{owner,admins,users}`
- `policy.globals`
- (optionally) `systems.<name>.{owner,admins,users}` later

### 7.3 Two key stages

**Stage 1: Per-host privilege**

Per host:

1. Start from location level
2. Overlay subnet level
3. Overlay host level

> More local scope wins at this stage.

Result: for each host, you get a map like:

```nix
# home.dmz.deimos
{
  yaro = "owner";
  ops = "admin";
}

# cloud.dmz.deimos-cloud
{
  analytics = "user";
}
```
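
The overlay in Stage 1 maps naturally onto Nix's `//` operator, where the right-hand side wins. A minimal sketch, assuming each scope's `owner` is a single name (as in the location, subnet, and host shapes) and using the hypothetical helper names `scopeLevels` and `hostLevels`:

```nix
let
  # Turn one scope's { owner, admins, users } hints into a
  # user -> level map. Within a scope, owner beats admin beats user.
  scopeLevels = scope:
    let
      mark = level: names:
        builtins.listToAttrs (map (n: { name = n; value = level; }) names);
      owner = scope.owner or null;
    in
    mark "user" (scope.users or [ ])
    // mark "admin" (scope.admins or [ ])
    // (if owner == null then { } else { ${owner} = "owner"; });

  # Later (more local) scopes override earlier ones.
  hostLevels = location: subnet: host:
    scopeLevels location // scopeLevels subnet // scopeLevels host;

  # Identity hints only; subnets and hosts are omitted from the scopes.
  home = { owner = "yaro"; admins = [ "ops" ]; };
  dmz = { };
  deimos = { owner = "yaro"; admins = [ "ops" ]; };
in
hostLevels home dmz deimos
# evaluates to { ops = "admin"; yaro = "owner"; }
```

Note that under this reading a more local scope can also demote someone: listing a location-level admin under a host's `users` would leave them as `user` in that host context.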

**Stage 2: Per-system aggregation (multi-host)**

A system can have multiple hosts:

```nix
systems.deimos.hosts = [ "deimos" "deimos-cloud" ];
```

When the same user appears with different host-level privileges for the same system:

> **System-level privilege is the highest privilege seen across all its hosts.**

So if:

- `home.dmz.deimos.owner = "yaro"`
- `cloud.dmz.deimos-cloud.users = [ "yaro" ]`

Then:

- Host view:

  - home plane: `owner`
  - cloud plane: `user`

- System view:

  - `yaro` = `owner`

The system must see a single clear privilege; the network can see differing trust per plane.
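
The "highest privilege wins" rule can be sketched as a fold over the per-host maps from Stage 1, using a numeric rank for the ordering `none < user < admin < owner`. The names `rank`, `higher`, and `systemLevels` are hypothetical.

```nix
let
  # Numeric ranks for the privilege ordering.
  rank = { none = 0; user = 1; admin = 2; owner = 3; };
  higher = a: b: if rank.${a} >= rank.${b} then a else b;

  # Per-host maps as produced by stage 1.
  perHost = [
    { yaro = "owner"; ops = "admin"; }      # home.dmz.deimos
    { yaro = "user"; analytics = "user"; }  # cloud.dmz.deimos-cloud
  ];

  # Fold the host maps together, keeping the highest level per user.
  systemLevels = hostMaps:
    builtins.foldl'
      (acc: m:
        acc // builtins.mapAttrs (u: lvl: higher lvl (acc.${u} or "none")) m)
      { }
      hostMaps;
in
systemLevels perHost
# evaluates to { analytics = "user"; ops = "admin"; yaro = "owner"; }
```

The per-host maps are kept around unchanged for the network-plane decisions described in 7.5; only the Unix-level view is collapsed.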

### 7.4 Presence vs privilege

Existence (whether the user gets created at all) depends on:

- privilege level **and**
- host role

Examples:

- On a `server` / `workstation` role:

  - a user in `users` (non-admin) can be created as a plain user.

- On an `adminWorkstation` / `coreServer` / `router` role:

  - plain `users` entries may **not** create accounts by default
  - only `owner` / `admin` entries do
  - unless policy or an explicit host override says otherwise.

This prevents admin machines from being stuffed full of random user accounts by accident.
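
A small predicate captures the rule. The `allowPlainUsers` flag stands in for "policy or an explicit host override"; its name and shape are assumptions, as is the function name `createsAccount`.

```nix
let
  # Roles that only create accounts for owners/admins by default.
  restrictedRoles = [ "adminWorkstation" "coreServer" "router" ];

  # Hypothetical predicate: should this user get an account at all?
  createsAccount = { role, level, allowPlainUsers ? false }:
    if level == "none" then false
    else if level == "user" then
      allowPlainUsers || !(builtins.elem role restrictedRoles)
    else true; # admin and owner always get an account
in
{
  serverUser = createsAccount { role = "server"; level = "user"; };    # true
  routerUser = createsAccount { role = "router"; level = "user"; };    # false
  routerAdmin = createsAccount { role = "router"; level = "admin"; };  # true
  routerOverride = createsAccount {
    role = "router"; level = "user"; allowPlainUsers = true;           # true
  };
}
```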

### 7.5 Host-context semantics vs system-level semantics

System-level privilege:

- controls local Unix stuff:

  - `users.users.<name>.isNormalUser = true`
  - sudo / wheel membership
  - group membership

Host-context privilege:

- controls **network-plane trust**:

  - which interfaces are “admin planes”
  - which subnets can reach SSH, mgmt ports, control APIs
  - which subnets can only reach app ports

So you can have:

- `yaro` is owner on the system (sudo)
- from `home.dmz` plane, `yaro` is treated as admin-plane → SSH allowed
- from `cloud.dmz` plane, `yaro` is treated as regular → no SSH, only HTTP

That’s intentional: same identity, different trust by plane.

---

## 8. Policy Configurations & Home-manager

### 8.1 `policy.configurations`

This is where you define reusable config bundles that get attached to users, groups, systems, locations, etc.

Shape:

```nix
policy.configurations.<name> = {
  targets = {
    users = [ "yaro" { tag = "human"; } ];
    groups = [ "devs" "desktopUsers" ];
    systems = [ "deimos" "metatron" ];
    locations = [ "home" "cloud" ];
    subnets = [ "home.main" "cloud.infra" ];
  };

  nixos = {
    modules = [ ./policy/some-module.nix ];
    options = { services.foo.enable = true; };
  };

  homeManager = {
    modules = [ ./hm/profile.nix ];
    options = { programs.firefox.enable = true; };
  };
};
```

Examples:

```nix
policy.configurations = {
  desktopBase = {
    targets = {
      groups = [ "desktopUsers" ];
    };

    homeManager = {
      modules = [ ./hm/desktop-base.nix ];
      options = {
        programs.firefox.enable = true;
      };
    };
  };

  devTools = {
    targets = {
      users = [ "yaro" "ops" ];
    };

    homeManager = {
      modules = [ ./hm/dev-tools.nix ];
      options = { };
    };
  };

  firefoxProfile = {
    targets = {
      groups = [ "devs" ];
    };

    homeManager = {
      modules = [ ./hm/firefox-profile.nix ];
      options = {
        extensions = [
          "uBlockOrigin"
          "multi-account-containers"
        ];
        homepage = "https://intranet.kasear.net";
      };
    };
  };

  extraHosts = {
    targets = {
      systems = [ "deimos" "metatron" ];
    };

    nixos = {
      modules = [ ./policy/extra-hosts.nix ];
      options = {
        hosts = {
          "special.internal" = "203.0.113.7";
        };
      };
    };
  };
};
```

### 8.2 How home-manager is applied

For each **user on each system**, Metanix:

1. Determines if home-manager is available / integrated.

2. Collects:

   - `policy.users.<user>.homeManager.profiles`
   - `policy.users.<user>.homeManager.extraModules/options`
   - all `policy.configurations.*` whose `targets` match:

     - that user
     - any of their groups
     - the system
     - its location/subnet

3. Merges HM modules / options in a defined order, e.g.:

   ```text
   global / group bundles
   → profile bundles (from user.homeManager.profiles)
   → per-user extraModules / options
   ```

4. Emits a home-manager configuration for that user on that system.

End result:

> “This group of users will have Firefox installed with these extensions enabled.”

…is expressed once in `policy.configurations`, not copy-pasted.
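
The collection and ordering steps above could be sketched roughly as follows. This is a cut-down illustration under several assumptions: only plain user names and group membership via `extraGroups` are matched (tag targets, `primaryGroup`, and system/location/subnet targets are left out), module paths are written as strings so the expression evaluates standalone, and the helper names `matchesUser` and `hmModulesFor` are hypothetical.

```nix
let
  # Cut-down policy; in this sample `devTools` is only reached via a profile.
  policy = {
    users.yaro = {
      extraGroups = [ "admins" "desktopUsers" ];
      homeManager = {
        profiles = [ "devTools" ];
        extraModules = [ "./home/yaro-extra.nix" ];
      };
    };
    configurations = {
      desktopBase = {
        targets.groups = [ "desktopUsers" ];
        homeManager.modules = [ "./hm/desktop-base.nix" ];
      };
      devTools.homeManager.modules = [ "./hm/dev-tools.nix" ];
    };
  };

  # Does a configuration target this user by name or by group?
  matchesUser = name: user: cfg:
    let t = cfg.targets or { };
    in builtins.elem name (builtins.filter builtins.isString (t.users or [ ]))
      || builtins.any (g: builtins.elem g (user.extraGroups or [ ])) (t.groups or [ ]);

  hmModulesFor = name:
    let
      user = policy.users.${name};
      modulesOf = cfgName:
        policy.configurations.${cfgName}.homeManager.modules or [ ];
      bundles = builtins.filter
        (n: matchesUser name user policy.configurations.${n})
        (builtins.attrNames policy.configurations);
    in
    builtins.concatMap modulesOf bundles                        # 1. global / group bundles
    ++ builtins.concatMap modulesOf user.homeManager.profiles   # 2. profile bundles
    ++ user.homeManager.extraModules;                           # 3. per-user extras
in
hmModulesFor "yaro"
# evaluates to
# [ "./hm/desktop-base.nix" "./hm/dev-tools.nix" "./home/yaro-extra.nix" ]
```

One caveat worth keeping in mind: in the module system, import order does not by itself make a later module override an earlier one. If "per-user wins" is the intended semantics, the generated options would likely need explicit priorities such as `lib.mkDefault` on the bundle side.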

---

## 9. Output Artifacts

Given `meta.nix`, Metanix is expected to generate, for each system:

- NixOS module tree:

  - `users.users` and `users.groups`
  - `services.*` for DNS, DHCP, WireGuard, nginx, etc.
  - `/etc/hosts` with all local truths from `locations`
  - networking (IP, routes, VLANs) from deterministic IP schema

- DNS configuration:

  - authoritative zones (Knot)
  - stub/resolver configs (Unbound)
  - local zones for internal names

- DHCP configuration:

  - Kea pools
  - reservations from `hw-address` + derived IPs
  - DHCP relays (e.g. dnsmasq relay on downstream routers)

- WireGuard configuration:

  - upstream servers vs downstream clients
  - mesh based on `tags` + `consumers.wireguard`

- Firewall:

  - per-interface policies derived from:

    - host role (`router`, `adminWorkstation`, `server`, etc.)
    - host-context identity hints
    - `policy.acl` (capabilities → allowed flows)

- Home-manager configs:

  - per user, per system, based on `policy.users` and `policy.configurations`

---

## 10. Future Work / Open Questions

Because you’re not done tormenting yourself yet:

- Formalize IP derivation (e.g. mkIp using location/subnet/role bits).
- Define exact precedence rules for:

  - HM module merge order
  - NixOS module composition from policy and system configs

- Define a small ACL capability vocabulary:

  - `ssh`, `sudo`, `manage-services`, `mount-nfs`, `read-media`, `scrape-metrics`, etc.

- Define how “upstream/downstream” tags automatically:

  - wire DHCP relays over WG
  - configure Knot + Unbound correctly

- Add validation:

  - error on users referenced in locations but missing from `policy.users`
  - error on groups referenced but missing from `policy.groups`
  - warn when `adminWorkstation` has random non-admin users unless explicitly allowed
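
The first validation rule is simple enough to sketch already. The version below walks locations, subnets, and hosts, collects every name mentioned in `owner` / `admins` / `users`, and throws if any of them is missing from `policy.users`. The helper names are hypothetical, and as elsewhere it assumes every attrset-valued key in a location is a subnet.

```nix
let
  # Sample data: "ops" is referenced but has no ledger entry.
  policyUsers = { yaro = { }; monitoring = { }; };
  locations = {
    home = {
      owner = "yaro";
      admins = [ "ops" ];
      dmz = {
        users = [ "monitoring" ];
        hosts.deimos = { owner = "yaro"; };
      };
    };
  };

  # Names mentioned by one scope's owner/admins/users hints.
  namesIn = scope:
    (if scope ? owner then [ scope.owner ] else [ ])
    ++ (scope.admins or [ ])
    ++ (scope.users or [ ]);

  # Attrset-valued children of a location, i.e. its subnets (assumption).
  subScopes = scope:
    map (n: scope.${n})
      (builtins.filter (n: builtins.isAttrs scope.${n}) (builtins.attrNames scope));

  # Every name referenced at location, subnet, or host level.
  referenced = builtins.concatMap (loc:
    namesIn loc
    ++ builtins.concatMap (sub:
      namesIn sub
      ++ builtins.concatMap namesIn (builtins.attrValues (sub.hosts or { }))
    ) (subScopes loc)
  ) (builtins.attrValues locations);

  missing = builtins.filter (n: !(policyUsers ? ${n})) referenced;
in
if missing == [ ]
then "ok"
else throw "users referenced in locations but missing from policy.users: ${builtins.concatStringsSep ", " missing}"
```

With the sample data this aborts evaluation and names `ops`, which is the behavior you want from a build-time check: a typo in a location should fail the build, not silently produce a system without the account.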