This commit is contained in:
Yaro Kasear 2025-11-28 15:52:15 -06:00
parent ee31bfb7f6
commit 633a9c1856
12 changed files with 1308 additions and 1210 deletions

design.md Normal file
# Metanix: Design Document (WIP)
_Declarative infrastructure, but with fewer repeated DSL war crimes._
---
## 1. High-Level Overview
**Metanix** is a Nix library / flake that generates NixOS configurations and related infrastructure from a higher-level “world description” file, `meta.nix`, plus a `policy` section.
Instead of hand-writing:
- a zillion host-specific `configuration.nix` files
- DHCP, DNS, firewall rules
- user definitions
- dotfiles and home-manager configs
…you describe:
- **locations**, **subnets**, and **hosts**
- **systems** that correspond to those hosts
- **global identity and policy** (users, groups, ACLs, shared configs)
Metanix then:
- infers IPs, roles, and trust relationships
- builds NixOS configs for each system
- wires DNS, DHCP, WireGuard, firewalls, and home-manager for users where applicable
You still _can_ hand-write Nix when you care about specifics. You just don't have to for the 90% of boilerplate that machines are objectively better at than you.
---
## 2. Goals & Non-Goals
### Goals
- **Reduce boilerplate**
Generate as much as possible from a high-level description of the world.
- **Deterministic global identity**
Users and groups have consistent UIDs/GIDs across all managed systems.
- **Declarative RBAC & network trust**
Identity and access control defined once, applied consistently to:
- firewalls
- services
- admin surfaces
- **Location-aware infrastructure**
Use `locations` and `hosts` to drive:
- IP addressing
- control-plane vs data-plane
- which systems are “upstream” vs “downstream”
- **Home-manager integration**
User environments (dotfiles, tools, browser setup) managed from policy, not from random snowflake configs.
### Non-Goals
- **Replacing NixOS modules**
Metanix composes and configures them; it doesn't rewrite them.
- **Being a one-click magic box**
This is not “paste YAML, receive Kubernetes.” You're still expected to understand your network and your systems.
- **Hiding complexity at all costs**
The complexity is still there. Metanix just centralizes it so you can reason about it.
---
## 3. Core Concepts
### 3.1 `meta.nix` structure (top level)
`meta.nix` is the main entrypoint to Metanix. Simplified structure:
```nix
{
  domain = "kasear.net";
  locations = { ... };
  systems = { ... };
  consumers = { ... }; # global defaults for resources
  policy = { ... }; # identity, RBAC, shared configs
}
```
Metanix is the flake; `meta.nix` is the data.
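To make that split concrete, a consuming flake might look like the sketch below. This is hypothetical: `mkWorld` and the input URL are placeholders invented here, not a committed API.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    metanix.url = "git+https://..."; # wherever Metanix ends up living
  };

  outputs = { self, nixpkgs, metanix, ... }: {
    # Hypothetical entrypoint: Metanix turns the world description
    # into one nixosConfiguration per entry in `systems`.
    nixosConfigurations = metanix.lib.mkWorld (import ./meta.nix);
  };
}
```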
---
## 4. Locations, Subnets, Hosts
Locations describe _where_ things live. Subnets describe _how_ they're sliced. Hosts are the concrete entries inside those subnets.
### 4.1 Locations
Shape:
```nix
locations.<location> = {
  owner = "yaro"; # default owner for this location
  admins = [ "ops" ]; # location-wide admins
  users = [ "monitor" ]; # location-relevant users
  <subnet> = { ... };
};
```
Example:
```nix
locations.home = {
  owner = "yaro";
  admins = [ "ops" ];
  main = { ... };
  dmz = { ... };
  iot = { ... };
};
```
Location-level identity is a _hint_:
- “These users are relevant here”
- “This person is probably in charge here”
Actual presence/privilege on a given system is resolved later via hosts and systems.
### 4.2 Subnets
Shape:
```nix
locations.<location>.<subnet> = {
  vlan = 10; # optional
  dhcp = { start = 1; end = 250; }; # optional
  owner = "ops"; # overrides location.owner
  admins = [ "sre" ];
  users = [ "resident" ];
  hosts = { ... };
};
```
Subnets:
- define per-VLAN semantics (e.g. main, dmz, iot)
- refine identity hints for systems in that subnet
- will eventually feed into IP allocation (e.g. via a deterministic scheme like mkIp)
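As a sketch of what such a deterministic scheme could look like (nothing here is final; `mkIp` and the octet layout are placeholders invented for illustration):

```nix
# Hypothetical: addresses derived from location/VLAN/host index,
# so the same meta.nix always yields the same IPs.
let
  mkIp = { locationOctet, vlan, hostIndex }:
    "10.${toString locationOctet}.${toString vlan}.${toString hostIndex}";
in
  mkIp { locationOctet = 0; vlan = 10; hostIndex = 42; }
  # → "10.0.10.42"
```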
### 4.3 Hosts
Hosts are **interfaces into contexts**, not necessarily 1:1 machines.
Shape:
```nix
locations.<location>.<subnet>.hosts.<hostname> = {
  role = "router" | "server" | "adminWorkstation" | "coreServer" | ...;
  hw-address = "aa:bb:cc:dd:ee:ff"; # optional
  aliases = [ "fqdn" ... ]; # optional
  interface = "eno2"; # optional
  dns = false; # optional, default true
  hostId = 42; # optional, special cases

  # Identity hints in THIS CONTEXT ONLY:
  owner = "yaro"; # the host's owner in this context
  admins = [ "ops" "sre" ];
  users = [ "analytics" ];
};
```
Key points:
- One **system** can appear as **multiple hosts** across locations/subnets.
- Each host is “this system, as seen from this network plane.”
- Identity hints here are **per host context**, not global truth.
Examples:
```nix
# Home DMZ view of deimos
locations.home.dmz.hosts.deimos = {
  role = "server";
  hw-address = "10:98:36:a0:2c:b2";
  interface = "eno2";
  aliases = [ "kasear.net" "vpn.kasear.net" ... ];
  owner = "yaro";
  admins = [ "ops" ];
};

# Cloud DMZ view of same system
locations.cloud.dmz.hosts.deimos-cloud = {
  role = "server";
  interface = "wg0";
  users = [ "analytics" ]; # non-admin plane
};
```
---
## 5. Systems, Services, Resources, Consumers
Systems describe machines from Metanix's point of view and how they connect to hosts, services, and resources.
### 5.1 Systems
Shape:
```nix
systems.<name> = {
  tags = [ "router" "public" "upstream" "downstream" ];
  location = "home";
  subnet = "dmz";

  # Host keys under locations.*
  hosts = [ "deimos" "deimos-cloud" ];

  # Optional system-level hints
  # owner = "yaro";
  # admins = [ "ops" ];
  # users = [ "monitor" ];

  services = { ... };
  resources = { ... };
  consumers = { ... };

  configuration = ./systems/x86_64-linux/<name>/default.nix;
};
```
**Tags** are semantic hints / profiles. Examples:
- `router`: network edge / routing logic
- `public`: exposed to the internet
- `upstream`: config-plane / authoritative system (e.g. Kea/Knot/Unbound/WG server)
- `downstream`: router profile consuming the upstream config-plane
Metanix modules will key off these tags to decide default behaviors (e.g. Unbound in “upstream” mode vs stub-only resolution).
### 5.2 Services
Services are basically “NixOS modules we want on this system.”
Shape:
```nix
services = {
  <serviceName> = {
    enable = true; # optional; presence can imply true
    tags = [ "upstream" ]; # service-specific tags
    config = { }; # free-form module options
  };
};
```
Example:
```nix
services = {
  headscale = { enable = true; config = { }; };
  nginx-proxy = { enable = true; config = { }; };
  nginx = { enable = true; config = { }; };
  httpd = { enable = false; config = { }; }; # explicit off
  jellyfin = { enable = true; config = { }; };
};
```
Metanix will map these entries into:
- `services.<name>.enable = true/false`
- service-specific options
- containerization (if you decide that later) or native services
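A minimal sketch of that lowering step, assuming `lib` is the nixpkgs library (`servicesToNixos` is a name invented here, not an existing Metanix function):

```nix
let
  lib = (import <nixpkgs> { }).lib;

  # Hypothetical: turn Metanix service entries into NixOS `services.*` option sets.
  servicesToNixos = entries:
    lib.mapAttrs
      (name: e: { enable = e.enable or true; } // (e.config or { }))
      entries;
in
  servicesToNixos {
    nginx = { enable = true; config = { }; };
    httpd = { enable = false; config = { }; };
  }
  # → { nginx = { enable = true; }; httpd = { enable = false; }; }
```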
### 5.3 Resources
Resources are logical capabilities this system **provides**:
```nix
resources = {
  dns = { };
  media = { };
  git = { };
  auth = { };
};
```
These serve as:
- symbolic handles for ACLs
- targets for other systems' `consumers`
- hints for how to wire firewall / routing / DNS / etc.
### 5.4 Consumers
Consumers describe what this system **depends on**:
```nix
consumers = {
  dns = { provider = "eris"; };
  dhcp = { provider = "phobos"; };
  wireguard = { provider = "frontend.kasear.net"; };
};
```
Resolution order (most specific wins):
1. `systems.<name>.consumers.<res>.provider`
2. top-level `consumers.<res>.provider` defaults
Providers can be:
- a host / system name (e.g. `"phobos"`)
- FQDN (e.g. `"frontend.kasear.net"`)
- raw IP (e.g. `"1.1.1.1"`)
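The resolution order can be sketched as a single fallback lookup (`resolveProvider` is a name invented here for illustration):

```nix
# Hypothetical: system-level consumer entries win over top-level defaults.
let
  resolveProvider = meta: sysName: res:
    meta.systems.${sysName}.consumers.${res}.provider
      or (meta.consumers.${res}.provider
      or (throw "no provider for '${res}' on '${sysName}'"));
in
  resolveProvider
    {
      systems.deimos.consumers.dns.provider = "eris";
      consumers.dhcp.provider = "phobos";
    }
    "deimos" "dhcp"
  # → "phobos" (falls back to the top-level default)
```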
Metanix uses this to generate:
- `/etc/resolv.conf`
- DNS stub configurations
- DHCP relays / clients
- WireGuard peers
---
## 6. Identity Model & Policy
This is the spine of the whole thing. Try not to break it.
### 6.1 `policy.users` global identity ledger
`policy.users` defines who exists in the universe and what they look like _if they exist on a system_.
Shape:
```nix
policy.users.<username> = {
  uid = int;
  type = "human" | "service";
  primaryGroup = "groupName";
  extraGroups = [ "group1" "group2" ];
  shell = "/run/current-system/sw/bin/bash";

  home = {
    type = "standard" | "shared" | "system" | "none";
    path = null | "/path";
  };

  sshAuthorizedKeys = [ "ssh-..." ];
  passwordHash = null | "hash";
  locked = bool;

  tags = [ "admin" "homelab" "monitoring" ];

  homeManager = {
    profiles = [ "desktopBase" "devTools" ];
    extraModules = [ ./home/yaro-extra.nix ];
    options = { programs.git.userName = "Yaro"; ... };
  };
};
```
Examples:
```nix
policy.users = {
  yaro = {
    uid = 10010;
    type = "human";
    primaryGroup = "yaro";
    extraGroups = [ "admins" "desktopUsers" ];
    shell = "/run/current-system/sw/bin/bash";
    home = { type = "standard"; path = null; };
    sshAuthorizedKeys = [ "ssh-ed25519 AAAA...yaro" ];
    passwordHash = null;
    locked = false;
    tags = [ "admin" "homelab" ];
    homeManager = {
      profiles = [ "desktopBase" "devTools" ];
      extraModules = [ ./home/yaro-extra.nix ];
      options = {
        programs.git.userName = "Yaro";
        programs.git.userEmail = "yaro@kasear.net";
      };
    };
  };

  monitoring = {
    uid = 10030;
    type = "service";
    primaryGroup = "monitoring";
    extraGroups = [ ];
    shell = "/run/current-system/sw/bin/nologin";
    home = { type = "system"; path = "/var/lib/monitoring"; };
    sshAuthorizedKeys = [ ];
    passwordHash = null;
    locked = true;
    tags = [ "service" "monitoring" ];
    homeManager = {
      profiles = [ ];
      extraModules = [ ];
      options = { };
    };
  };
};
```
**Important:**
`policy.users` does **not** decide _where_ a user exists. It defines the global, canonical identity when they do.
### 6.2 `policy.groups`
Global group ledger:
```nix
policy.groups = {
  admins = { gid = 20010; members = [ "yaro" ]; };
  ops = { gid = 20011; members = [ "ops" ]; };
  desktopUsers = { gid = 20020; members = [ ]; };
  monitoring = { gid = 20030; members = [ "monitoring" ]; };
};
```
Groups are used for:
- Unix group membership
- ACL principals
- targeting shared configurations
### 6.3 `policy.globals`
Global identity hints, mostly for presence / “tends to exist everywhere”:
```nix
policy.globals = {
  owner = [ ]; # global owners (use sparingly)
  admins = [ "yaro" ]; # global admins
  users = [ "monitoring" ]; # plain / service users
};
```
Metanix uses this as a baseline:
- to decide which users “naturally” appear everywhere
- before location/host/system-specific overrides are applied
---
## 7. Identity Resolution & Privilege Rules
This is the fun part where you avoid Schrödinger's sudoer.
### 7.1 Privilege levels
For a given user `U` on a system `S`:
```text
none < user < admin < owner
```
Where:
- `user` → exists, no sudo by default
- `admin` → sudo-capable / elevated
- `owner` → top-level admin in that scope; default broad control
### 7.2 Scopes
Privilege hints appear in:
- `locations.<loc>.{owner,admins,users}`
- `locations.<loc>.<subnet>.{owner,admins,users}`
- `locations.<loc>.<subnet>.hosts.<host>.{owner,admins,users}`
- `policy.globals`
- (optionally) `systems.<name>.{owner,admins,users}` later
### 7.3 Two key stages
**Stage 1: Per-host privilege**
Per host:
1. Start from location level
2. Overlay subnet level
3. Overlay host level
> More local scope wins at this stage.
Result: for each host, you get a map like:
```nix
# home.dmz.deimos
{
yaro = "owner";
ops = "admin";
}
# cloud.dmz.deimos-cloud
{
analytics = "user";
}
```
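Stage 1 reduces to a plain attrset overlay, assuming `lib` from nixpkgs (`scopeMap` and `hostPrivs` are names invented here for the sketch):

```nix
let
  lib = (import <nixpkgs> { }).lib;

  # Turn one scope's identity hints into a user -> privilege map.
  scopeMap = s:
    lib.genAttrs (s.users or [ ]) (_: "user")
    // lib.genAttrs (s.admins or [ ]) (_: "admin")
    // lib.optionalAttrs (s ? owner) { ${s.owner} = "owner"; };

  # More local scope wins: host overlays subnet overlays location.
  hostPrivs = location: subnet: host:
    scopeMap location // scopeMap subnet // scopeMap host;
in
  hostPrivs { owner = "yaro"; } { admins = [ "ops" ]; } { }
  # → { ops = "admin"; yaro = "owner"; }
```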
**Stage 2: Per-system aggregation (multi-host)**
A system can have multiple hosts:
```nix
systems.deimos.hosts = [ "deimos" "deimos-cloud" ];
```
When the same user appears with different host-level privileges for the same system:
> **System-level privilege is the highest privilege seen across all its hosts.**
So if:
- `home.dmz.deimos.owner = "yaro"`
- `cloud.dmz.deimos-cloud.users = [ "yaro" ]`
Then:
- Host view:
- home plane: `owner`
- cloud plane: `user`
- System view:
- `yaro` = `owner`
The system must see a single clear privilege; the network can see differing trust per plane.
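The stage-2 aggregation is a fold over the privilege lattice (`rank`, `maxPriv`, and `systemPriv` are names invented here):

```nix
let
  rank = { none = 0; user = 1; admin = 2; owner = 3; };

  # Highest privilege wins when aggregating a system's hosts.
  maxPriv = a: b: if rank.${a} >= rank.${b} then a else b;
  systemPriv = hostLevels: builtins.foldl' maxPriv "none" hostLevels;
in
  systemPriv [ "owner" "user" ] # yaro as seen from the home and cloud planes
  # → "owner"
```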
### 7.4 Presence vs privilege
Existence (whether the user account gets created at all) depends on:
- privilege level **and**
- host role
Examples:
- On a `server` / `workstation` role:
- a user in `users` (non-admin) can be created as a plain user.
- On an `adminWorkstation` / `coreServer` / `router` role:
- plain `users` entries may **not** create accounts by default
- only `owner` / `admin` entries do
- unless policy or an explicit host override says otherwise.
This prevents admin machines from being stuffed full of random user accounts by accident.
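The default rule might reduce to something like this (the role list and `shouldCreate` are illustrative only; actual policy overrides are not shown):

```nix
let
  # Roles where plain `users` entries do not create accounts by default.
  restrictedRoles = [ "router" "adminWorkstation" "coreServer" ];

  shouldCreate = role: priv:
    priv == "owner"
    || priv == "admin"
    || (priv == "user" && !(builtins.elem role restrictedRoles));
in
  shouldCreate "router" "user"
  # → false
```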
### 7.5 Host-context semantics vs system-level semantics
System-level privilege:
- controls local Unix stuff:
- `users.users.<name>.isNormalUser = true`
- sudo / wheel membership
- group membership
Host-context privilege:
- controls **network-plane trust**:
- which interfaces are “admin planes”
- which subnets can reach SSH, mgmt ports, control APIs
- which subnets can only reach app ports
So you can have:
- `yaro` is owner on the system (sudo)
- from `home.dmz` plane, `yaro` is treated as admin-plane → SSH allowed
- from `cloud.dmz` plane, `yaro` is treated as regular → no SSH, only HTTP
That's intentional: same identity, different trust by plane.
---
## 8. Policy Configurations & Home-manager
### 8.1 `policy.configurations`
This is where you define reusable config bundles that get attached to users, groups, systems, locations, etc.
Shape:
```nix
policy.configurations.<name> = {
  targets = {
    users = [ "yaro" { tag = "human"; } ];
    groups = [ "devs" "desktopUsers" ];
    systems = [ "deimos" "metatron" ];
    locations = [ "home" "cloud" ];
    subnets = [ "home.main" "cloud.infra" ];
  };

  nixos = {
    modules = [ ./policy/some-module.nix ];
    options = { services.foo.enable = true; };
  };

  homeManager = {
    modules = [ ./hm/profile.nix ];
    options = { programs.firefox.enable = true; };
  };
};
```
Examples:
```nix
policy.configurations = {
  desktopBase = {
    targets = {
      groups = [ "desktopUsers" ];
    };
    homeManager = {
      modules = [ ./hm/desktop-base.nix ];
      options = {
        programs.firefox.enable = true;
      };
    };
  };

  devTools = {
    targets = {
      users = [ "yaro" "ops" ];
    };
    homeManager = {
      modules = [ ./hm/dev-tools.nix ];
      options = { };
    };
  };

  firefoxProfile = {
    targets = {
      groups = [ "devs" ];
    };
    homeManager = {
      modules = [ ./hm/firefox-profile.nix ];
      options = {
        extensions = [
          "uBlockOrigin"
          "multi-account-containers"
        ];
        homepage = "https://intranet.kasear.net";
      };
    };
  };

  extraHosts = {
    targets = {
      systems = [ "deimos" "metatron" ];
    };
    nixos = {
      modules = [ ./policy/extra-hosts.nix ];
      options = {
        hosts = {
          "special.internal" = "203.0.113.7";
        };
      };
    };
  };
};
```
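Target matching for a bundle could be sketched as follows. Tag selectors like `{ tag = "human"; }` are left out for brevity; `bundleMatches` and the `ctx` shape are assumptions made for this sketch:

```nix
let
  lib = (import <nixpkgs> { }).lib;

  # Hypothetical: does this bundle apply to this user on this system?
  bundleMatches = bundle: ctx:
    let t = bundle.targets; in
    builtins.elem ctx.user (t.users or [ ])
    || lib.any (g: builtins.elem g (t.groups or [ ])) ctx.userGroups
    || builtins.elem ctx.system (t.systems or [ ])
    || builtins.elem ctx.location (t.locations or [ ])
    || builtins.elem "${ctx.location}.${ctx.subnet}" (t.subnets or [ ]);
in
  bundleMatches
    { targets.groups = [ "desktopUsers" ]; }
    {
      user = "yaro"; userGroups = [ "desktopUsers" ];
      system = "deimos"; location = "home"; subnet = "main";
    }
  # → true (matched via group membership)
```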
### 8.2 How home-manager is applied
For each **user on each system**, Metanix:
1. Determines if home-manager is available / integrated.
2. Collects:
- `policy.users.<user>.homeManager.profiles`
- `policy.users.<user>.homeManager.extraModules/options`
- all `policy.configurations.*` whose `targets` match:
- that user
- any of their groups
- the system
- its location/subnet
3. Merges HM modules / options in a defined order, e.g.:
```text
global / group bundles
→ profile bundles (from user.homeManager.profiles)
→ per-user extraModules / options
```
4. Emits a home-manager configuration for that user on that system.
End result:
> “This group of users will have Firefox installed with these extensions enabled.”
…is expressed once in `policy.configurations`, not copy-pasted.
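In code, the merge order might look like this binding-style sketch (`matchedBundles` and `profileBundles` are hypothetical inputs produced by the collection step above):

```nix
# Hypothetical: assemble home-manager modules for one user on one system,
# in the order shown above, with per-user extras appended last.
hmModulesFor = { user, matchedBundles, profileBundles }:
  builtins.concatMap (b: b.homeManager.modules or [ ]) matchedBundles
  ++ builtins.concatMap (p: profileBundles.${p}.homeManager.modules or [ ])
       user.homeManager.profiles
  ++ user.homeManager.extraModules;
```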
---
## 9. Output Artifacts
Given `meta.nix`, Metanix is expected to generate, for each system:
- NixOS module tree:
- `users.users` and `users.groups`
- `services.*` for DNS, DHCP, WireGuard, nginx, etc.
- `/etc/hosts` with all local truths from `locations`
- networking (IP, routes, VLANs) from the deterministic IP scheme
- DNS configuration:
- authoritative zones (Knot)
- stub/resolver configs (Unbound)
- local zones for internal names
- DHCP configuration:
- Kea pools
- reservations from `hw-address` + derived IPs
- DHCP relays (e.g. dnsmasq relay on downstream routers)
- WireGuard configuration:
- upstream servers vs downstream clients
- mesh based on `tags` + `consumers.wireguard`
- Firewall:
- per-interface policies derived from:
- host role (`router`, `adminWorkstation`, `server`, etc.)
- host-context identity hints
- `policy.acl` (capabilities → allowed flows)
- Home-manager configs:
- per user, per system, based on `policy.users` and `policy.configurations`
---
## 10. Future Work / Open Questions
Because you're not done tormenting yourself yet:
- Formalize IP derivation (e.g. mkIp using location/subnet/role bits).
- Define exact precedence rules for:
- HM module merge order
- NixOS module composition from policy and system configs
- Define a small ACL capability vocabulary:
- `ssh`, `sudo`, `manage-services`, `mount-nfs`, `read-media`, `scrape-metrics`, etc.
- Define how “upstream/downstream” tags automatically:
- wire DHCP relays over WG
- configure Knot + Unbound correctly
- Add validation:
- error on users referenced in locations but missing from `policy.users`
- error on groups referenced but missing from `policy.groups`
- warn when `adminWorkstation` has random non-admin users unless explicitly allowed
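The first two validation checks might be sketched as follows (assuming `lib` from nixpkgs; `checkGroups` is a name invented here):

```nix
let
  lib = (import <nixpkgs> { }).lib;

  # Hypothetical: fail evaluation when a group references an unknown user.
  checkGroups = policy:
    let
      missing = lib.unique (builtins.filter
        (u: !(builtins.hasAttr u policy.users))
        (lib.concatMap (g: g.members) (lib.attrValues policy.groups)));
    in
      if missing == [ ] then policy
      else throw "policy.groups references unknown users: ${toString missing}";
in
  checkGroups
```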