# Metanix: Design Document (WIP)
_Declarative infrastructure, but with fewer repeated DSL war crimes._

---
## 1. High-Level Overview
**Metanix** is a Nix library / flake that generates NixOS configurations and related infrastructure from a higher-level “world description” file, `meta.nix`, plus a `policy` section.
Instead of hand-writing:
- a zillion host-specific `configuration.nix` files
- DHCP, DNS, firewall rules
- user definitions
- dotfiles and home-manager configs
…you describe:
- **locations**, **subnets**, and **hosts**
- **systems** that correspond to those hosts
- **global identity and policy** (users, groups, ACLs, shared configs)
Metanix then:
- infers IPs, roles, and trust relationships
- builds NixOS configs for each system
- wires DNS, DHCP, WireGuard, firewalls, and home-manager for users where applicable
You still _can_ hand-write Nix when you care about specifics. You just don't have to for the 90% of boilerplate that machines are objectively better at than you.

---
## 2. Goals & Non-Goals
### Goals
- **Reduce boilerplate**
Generate as much as possible from a high-level description of the world.
- **Deterministic global identity**
Users and groups have consistent UIDs/GIDs across all managed systems.
- **Declarative RBAC & network trust**
Identity and access control defined once, applied consistently to:
- firewalls
- services
- admin surfaces
- **Location-aware infrastructure**
Use `locations` and `hosts` to drive:
- IP addressing
- control-plane vs data-plane
- which systems are “upstream” vs “downstream”
- **Home-manager integration**
User environments (dotfiles, tools, browser setup) managed from policy, not from random snowflake configs.
### Non-Goals
- Replacing NixOS modules
Metanix composes and configures them; it doesn't rewrite them.
- Being a one-click magic box
This is not “paste YAML, receive Kubernetes.” You're still expected to understand your network and your systems.
- Hiding complexity at all costs
The complexity is still there. Metanix just centralizes it so you can reason about it.

---
## 3. Core Concepts
### 3.1 `meta.nix` structure (top level)
`meta.nix` is the main entrypoint to Metanix. Simplified structure:
```nix
{
domain = "kasear.net";
locations = { ... };
systems = { ... };
consumers = { ... }; # global defaults for resources
policy = { ... }; # identity, RBAC, shared configs
}
```
Metanix is the flake; `meta.nix` is the data.

---
## 4. Locations, Subnets, Hosts
Locations describe _where_ things live. Subnets describe _how_ they're sliced. Hosts are the concrete entries inside those subnets.
### 4.1 Locations
Shape:
```nix
locations.<location> = {
owner = "yaro"; # default owner for this location
admins = [ "ops" ]; # location-wide admins
users = [ "monitor" ]; # location-relevant users
<subnet> = { ... };
};
```
Example:
```nix
locations.home = {
owner = "yaro";
admins = [ "ops" ];
main = { ... };
dmz = { ... };
iot = { ... };
};
```
Location-level identity is a _hint_:
- “These users are relevant here”
- “This person is probably in charge here”
Actual presence/privilege on a given system is resolved later via hosts and systems.
### 4.2 Subnets
Shape:
```nix
locations.<location>.<subnet> = {
  vlan = 10; # optional
  dhcp = { start = 1; end = 250; }; # optional
  owner = "ops"; # overrides location.owner
  admins = [ "sre" ];
  users = [ "resident" ];
  hosts = { ... };
};
```
Subnets:
- define per-VLAN semantics (e.g. main, dmz, iot)
- refine identity hints for systems in that subnet
- will eventually feed into IP allocation (e.g. via a deterministic scheme like mkIp)
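That scheme does not exist yet. One possible shape, purely illustrative (a /24 per subnet keyed by VLAN, with the host's `hostId` as the final octet; none of these choices are decided anywhere):
```nix
# Sketch only: a deterministic IP from subnet + host data.
# The 10.<vlan>.0.0/24 layout here is an assumption, not a design decision.
let
  mkIp = { prefix ? "10", vlan, hostId }:
    "${prefix}.${toString vlan}.0.${toString hostId}";
in
  mkIp { vlan = 10; hostId = 42; } # => "10.10.0.42"
```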
### 4.3 Hosts
Hosts are **interfaces into contexts**, not necessarily 1:1 machines.
Shape:
```nix
locations.<location>.<subnet>.hosts.<hostname> = {
role = "router" | "server" | "adminWorkstation" | "coreServer" | ...;
hw-address = "aa:bb:cc:dd:ee:ff"; # optional
aliases = [ "fqdn" ... ]; # optional
interface = "eno2"; # optional
dns = false; # optional, default true
hostId = 42; # optional, special cases
# Identity hints in THIS CONTEXT ONLY:
owner = "yaro"; # hosts admin owner
admins = [ "ops" "sre" ];
users = [ "analytics" ];
};
```
Key points:
- One **system** can appear as **multiple hosts** across locations/subnets.
- Each host is “this system, as seen from this network plane.”
- Identity hints here are **per host context**, not global truth.
Examples:
```nix
# Home DMZ view of deimos
locations.home.dmz.hosts.deimos = {
role = "server";
hw-address = "10:98:36:a0:2c:b2";
interface = "eno2";
aliases = [ "kasear.net" "vpn.kasear.net" ... ];
owner = "yaro";
admins = [ "ops" ];
};
# Cloud DMZ view of same system
locations.cloud.dmz.hosts.deimos-cloud = {
role = "server";
interface = "wg0";
users = [ "analytics" ]; # non-admin plane
};
```

---
## 5. Systems, Services, Resources, Consumers
Systems describe machines from Metanix's point of view and how they connect to hosts, services, and resources.
### 5.1 Systems
Shape:
```nix
systems.<name> = {
tags = [ "router" "public" "upstream" "downstream" ];
location = "home";
subnet = "dmz";
# Host keys under locations.*
hosts = [ "deimos" "deimos-cloud" ];
# Optional system-level hints
# owner = "yaro";
# admins = [ "ops" ];
# users = [ "monitor" ];
services = { ... };
resources = { ... };
consumers = { ... };
configuration = ./systems/x86_64-linux/<name>/default.nix;
};
```
**Tags** are semantic hints / profiles. Examples:
- `router` → network edge / routing logic
- `public` → exposed to the internet
- `upstream` → config-plane / authoritative system (e.g. Kea/Knot/Unbound/WG server)
- `downstream` → router profile consuming upstream config-plane
Metanix modules will key off these tags to decide default behaviors (e.g. unbound in “upstream” mode vs stub-only).
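For example, such a module might gate Unbound's mode on the system's tags, roughly like this (a sketch; `metanix.tags` is a hypothetical option, not something that exists yet):
```nix
{ config, lib, ... }:
let
  # Hypothetical option carrying systems.<name>.tags into the module system.
  tags = config.metanix.tags or [ ];
  isUpstream = lib.elem "upstream" tags;
in
{
  # Upstream systems run a full recursive resolver by default;
  # everything else would get a stub pointed at its dns consumer instead.
  services.unbound.enable = lib.mkDefault isUpstream;
}
```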
### 5.2 Services
Services are basically “NixOS modules we want on this system.”
Shape:
```nix
services = {
  <serviceName> = {
    enable = true; # optional; presence can imply true
    tags = [ "upstream" ]; # service-specific tags
    config = { }; # free-form module options
  };
};
```
Example:
```nix
services = {
  headscale = { enable = true; config = { }; };
  nginx-proxy = { enable = true; config = { }; };
  nginx = { enable = true; config = { }; };
  httpd = { enable = false; config = { }; }; # explicit off
  jellyfin = { enable = true; config = { }; };
};
```
Metanix will map these entries into:
- `services.<name>.enable = true/false`
- service-specific options
- containerization (if you decide that later) or native services
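A minimal sketch of the enable-flag part of that mapping (the real module will also have to merge `config` and honor `tags`; the attrset here is just sample input):
```nix
# Sketch: presence implies enable = true unless explicitly set to false.
let
  entries = {
    nginx = { enable = true; config = { }; };
    httpd = { enable = false; config = { }; }; # explicit off
    jellyfin = { config = { }; };              # no enable -> implied true
  };
in
  builtins.mapAttrs (_name: svc: svc.enable or true) entries
# => { nginx = true; httpd = false; jellyfin = true; }
```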
### 5.3 Resources
Resources are logical capabilities this system **provides**:
```nix
resources = {
  dns = { };
  media = { };
  git = { };
  auth = { };
};
```
These serve as:
- symbolic handles for ACLs
- targets for other systems' `consumers`
- hints for how to wire firewall / routing / DNS / etc.
### 5.4 Consumers
Consumers describe what this system **depends on**:
```nix
consumers = {
dns = { provider = "eris"; };
dhcp = { provider = "phobos"; };
wireguard = { provider = "frontend.kasear.net"; };
};
```
Resolution order:
1. `systems.<name>.consumers.<res>.provider` (per-system override, wins)
2. top-level `consumers.<res>.provider` (global default)
Providers can be:
- a host / system name (e.g. `"phobos"`)
- FQDN (e.g. `"frontend.kasear.net"`)
- raw IP (e.g. `"1.1.1.1"`)
Metanix uses this to generate:
- `/etc/resolv.conf`
- DNS stub configurations
- DHCP relays / clients
- WireGuard peers
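The provider lookup feeding all of that is just a two-level fallback; a sketch using the shapes above (the helper name is not final):
```nix
# Sketch: per-system consumer wins, then the top-level default, then null
# ("this system doesn't consume that resource").
let
  topLevel = { dns.provider = "eris"; dhcp.provider = "phobos"; };
  thisSystem = { consumers = { dns.provider = "1.1.1.1"; }; };

  resolveProvider = res:
    thisSystem.consumers.${res}.provider
      or (topLevel.${res}.provider or null);
in
  map resolveProvider [ "dns" "dhcp" "ntp" ]
# => [ "1.1.1.1" "phobos" null ]
```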

---
## 6. Identity Model & Policy
This is the spine of the whole thing. Try not to break it.
### 6.1 `policy.users`: the global identity ledger
`policy.users` defines who exists in the universe and what they look like _if they exist on a system_.
Shape:
```nix
policy.users.<username> = {
  uid = int;
  type = "human" | "service";
  primaryGroup = "groupName";
  extraGroups = [ "group1" "group2" ];
  shell = "/run/current-system/sw/bin/bash";
  home = {
    type = "standard" | "shared" | "system" | "none";
    path = null | "/path";
  };
  sshAuthorizedKeys = [ "ssh-..." ];
  passwordHash = null | "hash";
  locked = bool;
  tags = [ "admin" "homelab" "monitoring" ];
  homeManager = {
    profiles = [ "desktopBase" "devTools" ];
    extraModules = [ ./home/yaro-extra.nix ];
    options = { programs.git.userName = "Yaro"; ... };
  };
};
```
Examples:
```nix
policy.users = {
  yaro = {
    uid = 10010;
    type = "human";
    primaryGroup = "yaro";
    extraGroups = [ "admins" "desktopUsers" ];
    shell = "/run/current-system/sw/bin/bash";
    home = { type = "standard"; path = null; };
    sshAuthorizedKeys = [ "ssh-ed25519 AAAA...yaro" ];
    passwordHash = null;
    locked = false;
    tags = [ "admin" "homelab" ];
    homeManager = {
      profiles = [ "desktopBase" "devTools" ];
      extraModules = [ ./home/yaro-extra.nix ];
      options = {
        programs.git.userName = "Yaro";
        programs.git.userEmail = "yaro@kasear.net";
      };
    };
  };
  monitoring = {
    uid = 10030;
    type = "service";
    primaryGroup = "monitoring";
    extraGroups = [ ];
    shell = "/run/current-system/sw/bin/nologin";
    home = { type = "system"; path = "/var/lib/monitoring"; };
    sshAuthorizedKeys = [ ];
    passwordHash = null;
    locked = true;
    tags = [ "service" "monitoring" ];
    homeManager = {
      profiles = [ ];
      extraModules = [ ];
      options = { };
    };
  };
};
```
**Important:**
`policy.users` does **not** decide _where_ a user exists. It defines the global, canonical identity when they do.
### 6.2 `policy.groups`
Global group ledger:
```nix
policy.groups = {
  admins = { gid = 20010; members = [ "yaro" ]; };
  ops = { gid = 20011; members = [ "ops" ]; };
  desktopUsers = { gid = 20020; members = [ ]; };
  monitoring = { gid = 20030; members = [ "monitoring" ]; };
};
```
Groups are used for:
- Unix group membership
- ACL principals
- targeting shared configurations
### 6.3 `policy.globals`
Global identity hints, mostly for presence / “tends to exist everywhere”:
```nix
policy.globals = {
  owner = [ ]; # global owners (use sparingly)
  admins = [ "yaro" ]; # global admins
  users = [ "monitoring" ]; # plain / service users
};
```
Metanix uses this as a baseline:
- to decide which users “naturally” appear everywhere
- before location/host/system-specific overrides are applied

---
## 7. Identity Resolution & Privilege Rules
This is the fun part where you avoid Schrödinger's sudoer.
### 7.1 Privilege levels
For a given user `U` on a system `S`:
```text
none < user < admin < owner
```
Where:
- `user` → exists, no sudo by default
- `admin` → sudo-capable / elevated
- `owner` → top-level admin in that scope; default broad control
### 7.2 Scopes
Privilege hints appear in:
- `locations.<loc>.{owner,admins,users}`
- `locations.<loc>.<subnet>.{owner,admins,users}`
- `locations.<loc>.<subnet>.hosts.<host>.{owner,admins,users}`
- `policy.globals`
- (optionally) `systems.<name>.{owner,admins,users}` later
### 7.3 Two key stages
**Stage 1: Per-host privilege**
Per host:
1. Start from location level
2. Overlay subnet level
3. Overlay host level
> More local scope wins at this stage.
Result: for each host, you get a map like:
```nix
# home.dmz.deimos
{
  yaro = "owner";
  ops = "admin";
}

# cloud.dmz.deimos-cloud
{
  analytics = "user";
}
```
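A sketch of how that per-host map could be computed (the normalization helper is illustrative, not an API):
```nix
# Sketch: fold location -> subnet -> host hints; more local scope wins.
# Each scope is normalized to { <user> = "user" | "admin" | "owner"; }.
let
  toMap = scope:
    builtins.listToAttrs (map (u: { name = u; value = "user"; }) (scope.users or [ ]))
    // builtins.listToAttrs (map (u: { name = u; value = "admin"; }) (scope.admins or [ ]))
    // (if scope ? owner then { ${scope.owner} = "owner"; } else { });

  hostPrivileges = { location, subnet, host }:
    toMap location // toMap subnet // toMap host;
in
  hostPrivileges {
    location = { owner = "yaro"; admins = [ "ops" ]; };
    subnet = { };
    host = { admins = [ "ops" "sre" ]; };
  }
# => { yaro = "owner"; ops = "admin"; sre = "admin"; }
```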
**Stage 2: Per-system aggregation (multi-host)**
A system can have multiple hosts:
```nix
systems.deimos.hosts = [ "deimos" "deimos-cloud" ];
```
When the same user appears with different host-level privileges for the same system:
> **System-level privilege is the highest privilege seen across all its hosts.**
So if:
- `home.dmz.deimos.owner = "yaro"`
- `cloud.dmz.deimos-cloud.users = [ "yaro" ]`
Then:
- Host view:
- home plane: `owner`
- cloud plane: `user`
- System view:
- `yaro` = `owner`
The system must see a single clear privilege; the network can see differing trust per plane.
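The aggregation is just a max over a small ordering. A sketch, reusing per-host maps like the ones above:
```nix
# Sketch: a system's privilege for a user is the highest level seen
# across all of that system's host contexts.
let
  rank = { none = 0; user = 1; admin = 2; owner = 3; };
  higher = a: b: if rank.${a} >= rank.${b} then a else b;

  hostMaps = [
    { yaro = "owner"; ops = "admin"; }     # home.dmz.deimos
    { yaro = "user"; analytics = "user"; } # cloud.dmz.deimos-cloud
  ];

  mergeHost = acc: m:
    acc // builtins.mapAttrs (user: level: higher level (acc.${user} or "none")) m;
in
  builtins.foldl' mergeHost { } hostMaps
# => { yaro = "owner"; ops = "admin"; analytics = "user"; }
```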
### 7.4 Presence vs privilege
Existence (whether the user gets created at all) depends on:
- privilege level **and**
- host role
Examples:
- On a `server` / `workstation` role:
- a user in `users` (non-admin) can be created as a plain user.
- On an `adminWorkstation` / `coreServer` / `router` role:
- plain `users` entries may **not** create accounts by default
- only `owner` / `admin` entries do
- unless policy or an explicit host override says otherwise.
This prevents admin machines from being stuffed full of random user accounts by accident.
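As a sketch, the rule might boil down to something like this (role names come from the host shapes above; the override flag is hypothetical):
```nix
# Sketch: does a resolved (role, privilege) pair produce an account?
let
  lockedDownRoles = [ "adminWorkstation" "coreServer" "router" ];

  shouldCreateUser = { role, privilege, allowPlainUsers ? false }:
    if builtins.elem role lockedDownRoles && !allowPlainUsers
    then builtins.elem privilege [ "admin" "owner" ]
    else privilege != "none";
in
  [
    (shouldCreateUser { role = "server"; privilege = "user"; })            # true
    (shouldCreateUser { role = "adminWorkstation"; privilege = "user"; })  # false
    (shouldCreateUser { role = "adminWorkstation"; privilege = "owner"; }) # true
  ]
```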
### 7.5 Host-context semantics vs system-level semantics
System-level privilege:
- controls local Unix stuff:
- `users.users.<name>.isNormalUser = true`
- sudo / wheel membership
- group membership
Host-context privilege:
- controls **network-plane trust**:
- which interfaces are “admin planes”
- which subnets can reach SSH, mgmt ports, control APIs
- which subnets can only reach app ports
So you can have:
- `yaro` is owner on the system (sudo)
- from `home.dmz` plane, `yaro` is treated as admin-plane → SSH allowed
- from `cloud.dmz` plane, `yaro` is treated as regular → no SSH, only HTTP
That's intentional: same identity, different trust by plane.
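In NixOS terms that trust difference lands in per-interface firewall policy; a rough sketch for the deimos example above (interface names come from the host entries; everything else is illustrative):
```nix
{ lib, ... }:
let
  # Hypothetical result of host-context resolution for this system:
  adminPlanes = [ "eno2" ]; # home.dmz view of deimos: admin-plane trust
  appPlanes = [ "wg0" ];    # cloud.dmz view: app traffic only
in
{
  # SSH only on admin planes; HTTP(S) on both.
  networking.firewall.interfaces =
    lib.genAttrs adminPlanes (_: { allowedTCPPorts = [ 22 80 443 ]; })
    // lib.genAttrs appPlanes (_: { allowedTCPPorts = [ 80 443 ]; });
}
```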

---
## 8. Policy Configurations & Home-manager
### 8.1 `policy.configurations`
This is where you define reusable config bundles that get attached to users, groups, systems, locations, etc.
Shape:
```nix
policy.configurations.<name> = {
  targets = {
    users = [ "yaro" { tag = "human"; } ];
    groups = [ "devs" "desktopUsers" ];
    systems = [ "deimos" "metatron" ];
    locations = [ "home" "cloud" ];
    subnets = [ "home.main" "cloud.infra" ];
  };
  nixos = {
    modules = [ ./policy/some-module.nix ];
    options = { services.foo.enable = true; };
  };
  homeManager = {
    modules = [ ./hm/profile.nix ];
    options = { programs.firefox.enable = true; };
  };
};
```
Examples:
```nix
policy.configurations = {
  desktopBase = {
    targets = {
      groups = [ "desktopUsers" ];
    };
    homeManager = {
      modules = [ ./hm/desktop-base.nix ];
      options = {
        programs.firefox.enable = true;
      };
    };
  };
  devTools = {
    targets = {
      users = [ "yaro" "ops" ];
    };
    homeManager = {
      modules = [ ./hm/dev-tools.nix ];
      options = { };
    };
  };
  firefoxProfile = {
    targets = {
      groups = [ "devs" ];
    };
    homeManager = {
      modules = [ ./hm/firefox-profile.nix ];
      options = {
        extensions = [
          "uBlockOrigin"
          "multi-account-containers"
        ];
        homepage = "https://intranet.kasear.net";
      };
    };
  };
  extraHosts = {
    targets = {
      systems = [ "deimos" "metatron" ];
    };
    nixos = {
      modules = [ ./policy/extra-hosts.nix ];
      options = {
        hosts = {
          "special.internal" = "203.0.113.7";
        };
      };
    };
  };
};
```
### 8.2 How home-manager is applied
For each **user on each system**, Metanix:
1. Determines if home-manager is available / integrated.
2. Collects:
- `policy.users.<user>.homeManager.profiles`
- `policy.users.<user>.homeManager.extraModules/options`
- all `policy.configurations.*` whose `targets` match:
- that user
- any of their groups
- the system
- its location/subnet
3. Merges HM modules / options in a defined order, e.g.:
```text
global / group bundles
→ profile bundles (from user.homeManager.profiles)
→ per-user extraModules / options
```
4. Emits a home-manager configuration for that user on that system.
End result:
> “This group of users will have Firefox installed with these extensions enabled.”
…is expressed once in `policy.configurations`, not copy-pasted.
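A sketch of the per-user assembly on one system (`matchingBundles` and `profileModules` stand in for the collection in step 2; none of this is a final API):
```nix
# Sketch: build the home-manager module list for one user on one system.
{ lib, policy, userName, matchingBundles, profileModules, ... }:
let
  u = policy.users.${userName};
in
{
  home-manager.users.${userName}.imports =
    lib.concatMap (b: b.homeManager.modules or [ ]) matchingBundles # group/system bundles
    ++ profileModules                                               # user.homeManager.profiles
    ++ (u.homeManager.extraModules or [ ])                          # per-user modules
    ++ [ (u.homeManager.options or { }) ];                          # per-user options last
}
```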

---
## 9. Output Artifacts
Given `meta.nix`, Metanix is expected to generate, for each system:
- NixOS module tree:
- `users.users` and `users.groups`
- `services.*` for DNS, DHCP, WireGuard, nginx, etc.
- `/etc/hosts` with all local truths from `locations`
- networking (IP, routes, VLANs) from deterministic IP schema
- DNS configuration:
- authoritative zones (Knot)
- stub/resolver configs (Unbound)
- local zones for internal names
- DHCP configuration:
- Kea pools
- reservations from `hw-address` + derived IPs
- DHCP relays (e.g. dnsmasq relay on downstream routers)
- WireGuard configuration:
- upstream servers vs downstream clients
- mesh based on `tags` + `consumers.wireguard`
- Firewall:
- per-interface policies derived from:
- host role (`router`, `adminWorkstation`, `server`, etc.)
- host-context identity hints
- `policy.acl` (capabilities → allowed flows)
- Home-manager configs:
- per user, per system, based on `policy.users` and `policy.configurations`

---
## 10. Future Work / Open Questions
Because you're not done tormenting yourself yet:
- Formalize IP derivation (e.g. mkIp using location/subnet/role bits).
- Define exact precedence rules for:
- HM module merge order
- NixOS module composition from policy and system configs
- Define a small ACL capability vocabulary:
- `ssh`, `sudo`, `manage-services`, `mount-nfs`, `read-media`, `scrape-metrics`, etc.
- Define how “upstream/downstream” tags automatically:
- wire DHCP relays over WG
- configure Knot + Unbound correctly
- Add validation:
- error on users referenced in locations but missing from `policy.users`
- error on groups referenced but missing from `policy.groups`
- warn when `adminWorkstation` has random non-admin users unless explicitly allowed