
Metanix: Design Document (WIP)

Declarative infrastructure, but with fewer repeated DSL war crimes.


1. High-Level Overview

Metanix is a Nix library / flake that generates NixOS configurations and related infrastructure from a higher-level “world description” file, meta.nix, plus a policy section.

Instead of hand-writing:

  • a zillion host-specific configuration.nix files
  • DHCP, DNS, firewall rules
  • user definitions
  • dotfiles and home-manager configs

…you describe:

  • locations, subnets, and hosts
  • systems that correspond to those hosts
  • global identity and policy (users, groups, ACLs, shared configs)

Metanix then:

  • infers IPs, roles, and trust relationships
  • builds NixOS configs for each system
  • wires DNS, DHCP, WireGuard, firewalls, and home-manager for users where applicable

You can still hand-write Nix when you care about specifics. You just don't have to for the 90% of boilerplate that machines are objectively better at than you.


2. Goals & Non-Goals

Goals

  • Reduce boilerplate: generate as much as possible from a high-level description of the world.

  • Deterministic global identity: users and groups have consistent UIDs/GIDs across all managed systems.

  • Declarative RBAC & network trust: identity and access control defined once, applied consistently to:

    • firewalls
    • services
    • admin surfaces
  • Location-aware infrastructure: use locations and hosts to drive:

    • IP addressing
    • control-plane vs data-plane
    • which systems are “upstream” vs “downstream”
  • Home-manager integration: user environments (dotfiles, tools, browser setup) managed from policy, not from random snowflake configs.

Non-Goals

  • Replacing NixOS modules: Metanix composes and configures them; it doesn't rewrite them.

  • Being a one-click magic box: this is not “paste YAML, receive Kubernetes.” You're still expected to understand your network and your systems.

  • Hiding complexity at all costs: the complexity is still there. Metanix just centralizes it so you can reason about it.


3. Core Concepts

3.1 meta.nix structure (top level)

meta.nix is the main entrypoint to Metanix. Simplified structure:

{
  domain    = "kasear.net";
  locations = { ... };
  systems   = { ... };
  consumers = { ... };  # global defaults for resources
  policy    = { ... };  # identity, RBAC, shared configs
}

Metanix is the flake; meta.nix is the data.
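
As a loose usage sketch only (the real flake API isn't pinned down yet; mkWorld and the URL below are placeholders, not committed names):

# Hypothetical sketch: hand meta.nix to Metanix from a consuming flake.
{
  inputs.metanix.url = "github:example/metanix";  # placeholder URL
  outputs = { self, metanix, ... }: {
    nixosConfigurations = metanix.lib.mkWorld (import ./meta.nix);
  };
}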


4. Locations, Subnets, Hosts

Locations describe where things live. Subnets describe how they're sliced. Hosts are the concrete entries inside those subnets.

4.1 Locations

Shape:

locations.<location> = {
  owner  = "yaro";        # default owner for this location
  admins = [ "ops" ];     # location-wide admins
  users  = [ "monitor" ]; # location-relevant users

  <subnet> = { ... };
};

Example:

locations.home = {
  owner  = "yaro";
  admins = [ "ops" ];

  main = { ... };
  dmz  = { ... };
  iot  = { ... };
};

Location-level identity is a hint:

  • “These users are relevant here”
  • “This person is probably in charge here”

Actual presence/privilege on a given system is resolved later via hosts and systems.

4.2 Subnets

Shape:

locations.<location>.<subnet> = {
  vlan = 10;                          # optional
  dhcp = { start = 1; end = 250; };   # optional

  owner  = "ops";                     # overrides location.owner
  admins = [ "sre" ];
  users  = [ "resident" ];

  hosts = { ... };
};

Subnets:

  • define per-VLAN semantics (e.g. main, dmz, iot)
  • refine identity hints for systems in that subnet
  • will eventually feed into IP allocation (e.g. via a deterministic scheme like mkIp)
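
As a loose illustration only (the real derivation scheme is still open; see section 10), a deterministic mkIp might look like:

# Hypothetical sketch: derive a host address from a /24 subnet base and a
# stable per-host index. The actual mkIp scheme is future work.
mkIp = { base, hostId }: "${base}.${toString hostId}";

# mkIp { base = "10.0.10"; hostId = 42; }  =>  "10.0.10.42"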

4.3 Hosts

Hosts are interfaces into network contexts, not necessarily 1:1 with machines.

Shape:

locations.<location>.<subnet>.hosts.<hostname> = {
  role      = "router" | "server" | "adminWorkstation" | "coreServer" | ...;
  hw-address = "aa:bb:cc:dd:ee:ff";  # optional
  aliases    = [ "fqdn" ... ];       # optional
  interface  = "eno2";               # optional
  dns        = false;                # optional, default true
  hostId     = 42;                   # optional, special cases

  # Identity hints in THIS CONTEXT ONLY:
  owner  = "yaro";                   # this host's owner
  admins = [ "ops" "sre" ];
  users  = [ "analytics" ];
};

Key points:

  • One system can appear as multiple hosts across locations/subnets.
  • Each host is “this system, as seen from this network plane.”
  • Identity hints here are per host context, not global truth.

Examples:

# Home DMZ view of deimos
locations.home.dmz.hosts.deimos = {
  role      = "server";
  hw-address = "10:98:36:a0:2c:b2";
  interface  = "eno2";
  aliases    = [ "kasear.net" "vpn.kasear.net" ... ];
  owner      = "yaro";
  admins     = [ "ops" ];
};

# Cloud DMZ view of same system
locations.cloud.dmz.hosts.deimos-cloud = {
  role      = "server";
  interface  = "wg0";
  users      = [ "analytics" ];   # non-admin plane
};

5. Systems, Services, Resources, Consumers

Systems describe machines from Metanix's point of view and how they connect to hosts, services, and resources.

5.1 Systems

Shape:

systems.<name> = {
  tags     = [ "router" "public" "upstream" "downstream" ];
  location = "home";
  subnet   = "dmz";

  # Host keys under locations.*
  hosts    = [ "deimos" "deimos-cloud" ];

  # Optional system-level hints
  # owner  = "yaro";
  # admins = [ "ops" ];
  # users  = [ "monitor" ];

  services = { ... };
  resources = { ... };
  consumers = { ... };

  configuration = ./systems/x86_64-linux/<name>/default.nix;
};

Tags are semantic hints / profiles. Examples:

  • router: network edge / routing logic
  • public: exposed to the internet
  • upstream: config-plane / authoritative system (e.g. Kea/Knot/Unbound/WG server)
  • downstream: router profile consuming the upstream config-plane

Metanix modules will key off these tags to decide default behaviors (e.g. unbound in “upstream” mode vs stub-only).

5.2 Services

Services are basically “NixOS modules we want on this system.”

Shape:

services = {
  <serviceName> = {
    enable = true;           # optional; presence can imply true
    tags   = [ "upstream" ]; # service-specific tags
    config = { };            # free-form module options
  };
};

Example:

services = {
  headscale   = { enable = true; config = { }; };
  nginx-proxy = { enable = true; config = { }; };
  nginx       = { enable = true; config = { }; };
  httpd       = { enable = false; config = { }; }; # explicit off
  jellyfin    = { enable = true; config = { }; };
};

Metanix will map these entries into:

  • services.<name>.enable = true/false
  • service-specific options
  • containerization (if you decide that later) or native services
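
A minimal sketch of that lowering (mkService is a hypothetical helper, and it assumes config maps directly onto the target module's options):

# Hypothetical sketch: lower one Metanix service entry into NixOS options.
mkService = name: svc: {
  services.${name} = { enable = svc.enable or true; } // (svc.config or { });
};

# mkService "jellyfin" { enable = true; config = { }; }
#   => { services.jellyfin = { enable = true; }; }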

5.3 Resources

Resources are logical capabilities this system provides:

resources = {
  dns   = { };
  media = { };
  git   = { };
  auth  = { };
};

These serve as:

  • symbolic handles for ACLs
  • targets for other systems' consumers
  • hints for how to wire firewall / routing / DNS / etc.

5.4 Consumers

Consumers describe what this system depends on:

consumers = {
  dns  = { provider = "eris"; };
  dhcp = { provider = "phobos"; };
  wireguard = { provider = "frontend.kasear.net"; };
};

Resolution order:

  • systems.<name>.consumers.<res>.provider overrides
  • top-level consumers.<res>.provider defaults
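
A minimal sketch of that lookup (resolveProvider is a hypothetical helper):

# Hypothetical sketch: system-level override wins, else the top-level default.
resolveProvider = meta: system: res:
  meta.systems.${system}.consumers.${res}.provider
    or (meta.consumers.${res}.provider or null);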

Providers can be:

  • a host / system name (e.g. "phobos")
  • FQDN (e.g. "frontend.kasear.net")
  • raw IP (e.g. "1.1.1.1")

Metanix uses this to generate:

  • /etc/resolv.conf
  • DNS stub configurations
  • DHCP relays / clients
  • WireGuard peers

6. Identity Model & Policy

This is the spine of the whole thing. Try not to break it.

6.1 policy.users global identity ledger

policy.users defines who exists in the universe and what they look like if they exist on a system.

Shape:

policy.users.<username> = {
  uid          = int;
  type         = "human" | "service";
  primaryGroup = "groupName";
  extraGroups  = [ "group1" "group2" ];
  shell        = "/run/current-system/sw/bin/bash";

  home = {
    type = "standard" | "shared" | "system" | "none";
    path = null | "/path";
  };

  sshAuthorizedKeys = [ "ssh-..." ];
  passwordHash      = null | "hash";
  locked            = bool;

  tags = [ "admin" "homelab" "monitoring" ];

  homeManager = {
    profiles     = [ "desktopBase" "devTools" ];
    extraModules = [ ./home/yaro-extra.nix ];
    options      = { programs.git.userName = "Yaro"; ... };
  };
};

Examples:

policy.users = {
  yaro = {
    uid          = 10010;
    type         = "human";
    primaryGroup = "yaro";
    extraGroups  = [ "admins" "desktopUsers" ];
    shell        = "/run/current-system/sw/bin/bash";
    home = { type = "standard"; path = null; };
    sshAuthorizedKeys = [ "ssh-ed25519 AAAA...yaro" ];
    passwordHash = null;
    locked = false;
    tags = [ "admin" "homelab" ];
    homeManager = {
      profiles     = [ "desktopBase" "devTools" ];
      extraModules = [ ./home/yaro-extra.nix ];
      options = {
        programs.git.userName  = "Yaro";
        programs.git.userEmail = "yaro@kasear.net";
      };
    };
  };

  monitoring = {
    uid          = 10030;
    type         = "service";
    primaryGroup = "monitoring";
    extraGroups  = [ ];
    shell        = "/run/current-system/sw/bin/nologin";
    home = { type = "system"; path = "/var/lib/monitoring"; };
    sshAuthorizedKeys = [ ];
    passwordHash = null;
    locked = true;
    tags = [ "service" "monitoring" ];
    homeManager = {
      profiles     = [ ];
      extraModules = [ ];
      options      = { };
    };
  };
};

Important: policy.users does not decide where a user exists. It defines the global, canonical identity when they do.

6.2 policy.groups

Global group ledger:

policy.groups = {
  admins = { gid = 20010; members = [ "yaro" ]; };
  ops    = { gid = 20011; members = [ "ops" ]; };
  desktopUsers = { gid = 20020; members = [ ]; };
  monitoring   = { gid = 20030; members = [ "monitoring" ]; };
};

Groups are used for:

  • Unix group membership
  • ACL principals
  • targeting shared configurations

6.3 policy.globals

Global identity hints, mostly for presence / “tends to exist everywhere”:

policy.globals = {
  owner  = [ ];               # global owners (use sparingly)
  admins = [ "yaro" ];        # global admins
  users  = [ "monitoring" ];  # plain / service users
};

Metanix uses this as a baseline:

  • to decide which users “naturally” appear everywhere
  • before location/host/system-specific overrides are applied

7. Identity Resolution & Privilege Rules

This is the fun part where you avoid Schrödinger's sudoer.

7.1 Privilege levels

For a given user U on a system S:

none < user < admin < owner

Where:

  • user → exists, no sudo by default
  • admin → sudo-capable / elevated
  • owner → top-level admin in that scope; default broad control

7.2 Scopes

Privilege hints appear in:

  • locations.<loc>.{owner,admins,users}
  • locations.<loc>.<subnet>.{owner,admins,users}
  • locations.<loc>.<subnet>.hosts.<host>.{owner,admins,users}
  • policy.globals
  • (optionally) systems.<name>.{owner,admins,users} later

7.3 Two key stages

Stage 1: Per-host privilege

Per host:

  1. Start from location level
  2. Overlay subnet level
  3. Overlay host level

More local scope wins at this stage.

Result: for each host, you get a map like:

# home.dmz.deimos
{
  yaro = "owner";
  ops  = "admin";
}

# cloud.dmz.deimos-cloud
{
  analytics = "user";
}
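
A minimal Nix sketch of this overlay (names are hypothetical; lib is nixpkgs lib):

# Hypothetical sketch: overlay identity hints location -> subnet -> host.
# Within a scope, owner beats admin beats user (later `//` wins); across
# scopes, more local scopes win.
resolveHostPrivileges = scopes:
  let
    scopePrivileges = scope:
      lib.genAttrs (scope.users or [ ]) (_: "user")
      // lib.genAttrs (scope.admins or [ ]) (_: "admin")
      // lib.optionalAttrs (scope ? owner) { ${scope.owner} = "owner"; };
  in
  lib.foldl' (acc: scope: acc // scopePrivileges scope) { } scopes;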

Stage 2: Per-system aggregation (multi-host)

A system can have multiple hosts:

systems.deimos.hosts = [ "deimos" "deimos-cloud" ];

When the same user appears with different host-level privileges for the same system:

System-level privilege is the highest privilege seen across all its hosts.

So if:

  • home.dmz.deimos.owner = "yaro"
  • cloud.dmz.deimos-cloud.users = [ "yaro" ]

Then:

  • Host view:

    • home plane: owner
    • cloud plane: user
  • System view:

    • yaro = owner

The system must see a single clear privilege; the network can see differing trust per plane.
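
A minimal sketch of that aggregation (hypothetical names; lib is nixpkgs lib):

# Hypothetical sketch: per-system privilege is the max across its hosts.
# hostMaps is one { user = privilege; } map per host of the system.
aggregateSystem = hostMaps:
  let
    rank    = { none = 0; user = 1; admin = 2; owner = 3; };
    maxPriv = a: b: if rank.${a} >= rank.${b} then a else b;
  in
  lib.zipAttrsWith (_: privs: lib.foldl' maxPriv "none" privs) hostMaps;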

7.4 Presence vs privilege

Existence (user gets created at all) depends on:

  • privilege level and
  • host role

Examples:

  • On a server / workstation role:

    • a user in users (non-admin) can be created as a plain user.
  • On an adminWorkstation / coreServer / router role:

    • plain users entries may not create accounts by default
    • only owner / admin entries do
    • unless policy or an explicit host override says otherwise.

This prevents admin machines from being stuffed full of random user accounts by accident.
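
A minimal sketch of that gate (shouldCreateUser is hypothetical; role names come from section 4.3, and callers only pass users that appear in some scope):

# Hypothetical sketch: does this user get an account on a host of this role?
shouldCreateUser = role: privilege:
  let adminOnlyRoles = [ "router" "adminWorkstation" "coreServer" ];
  in lib.elem privilege [ "owner" "admin" ]
     || !(lib.elem role adminOnlyRoles);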

7.5 Host-context semantics vs system-level semantics

System-level privilege:

  • controls local Unix stuff:

    • users.users.<name>.isNormalUser = true
    • sudo / wheel membership
    • group membership

Host-context privilege:

  • controls network-plane trust:

    • which interfaces are “admin planes”
    • which subnets can reach SSH, mgmt ports, control APIs
    • which subnets can only reach app ports

So you can have:

  • yaro is owner on the system (sudo)
  • from home.dmz plane, yaro is treated as admin-plane → SSH allowed
  • from cloud.dmz plane, yaro is treated as regular → no SSH, only HTTP

That's intentional: same identity, different trust by plane.
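
For the deimos example, that plane-dependent trust could lower to per-interface firewall rules roughly like this (illustrative only; ports and policy are assumptions, interface names come from the host entries in section 4.3):

# Illustrative only: plane-dependent trust as per-interface rules.
networking.firewall.interfaces = {
  eno2.allowedTCPPorts = [ 22 443 ];      # home.dmz plane: admin -> SSH allowed
  wg0.allowedTCPPorts  = [ 80 443 ];      # cloud.dmz plane: user -> HTTP(S) only
};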


8. Policy Configurations & Home-manager

8.1 policy.configurations

This is where you define reusable config bundles that get attached to users, groups, systems, locations, etc.

Shape:

policy.configurations.<name> = {
  targets = {
    users   = [ "yaro" { tag = "human"; } ];
    groups  = [ "devs" "desktopUsers" ];
    systems = [ "deimos" "metatron" ];
    locations = [ "home" "cloud" ];
    subnets   = [ "home.main" "cloud.infra" ];
  };

  nixos = {
    modules = [ ./policy/some-module.nix ];
    options = { services.foo.enable = true; };
  };

  homeManager = {
    modules = [ ./hm/profile.nix ];
    options = { programs.firefox.enable = true; };
  };
};

Examples:

policy.configurations = {
  desktopBase = {
    targets = {
      groups = [ "desktopUsers" ];
    };

    homeManager = {
      modules = [ ./hm/desktop-base.nix ];
      options = {
        programs.firefox.enable = true;
      };
    };
  };

  devTools = {
    targets = {
      users = [ "yaro" "ops" ];
    };

    homeManager = {
      modules = [ ./hm/dev-tools.nix ];
      options = { };
    };
  };

  firefoxProfile = {
    targets = {
      groups = [ "devs" ];
    };

    homeManager = {
      modules = [ ./hm/firefox-profile.nix ];
      options = {
        extensions = [
          "uBlockOrigin"
          "multi-account-containers"
        ];
        homepage = "https://intranet.kasear.net";
      };
    };
  };

  extraHosts = {
    targets = {
      systems = [ "deimos" "metatron" ];
    };

    nixos = {
      modules = [ ./policy/extra-hosts.nix ];
      options = {
        hosts = {
          "special.internal" = "203.0.113.7";
        };
      };
    };
  };
};

8.2 How home-manager is applied

For each user on each system, Metanix:

  1. Determines if home-manager is available / integrated.

  2. Collects:

    • policy.users.<user>.homeManager.profiles

    • policy.users.<user>.homeManager.extraModules/options

    • all policy.configurations.* whose targets match:

      • that user
      • any of their groups
      • the system
      • its location/subnet
  3. Merges HM modules / options in a defined order, e.g.:

global / group bundles
→ profile bundles (from user.homeManager.profiles)
→ per-user extraModules / options
  4. Emits a home-manager configuration for that user on that system.

End result:

“This group of users will have Firefox installed with these extensions enabled.”

…is expressed once in policy.configurations, not copy-pasted.
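
A minimal sketch of the collection step (collectHmModules is hypothetical; tag-based user targets and location/subnet targets are omitted for brevity):

# Hypothetical sketch: gather HM modules for one user on one system, in the
# merge order above: matching bundles -> profile bundles -> per-user extras.
collectHmModules = { policy, user, groups, system }:
  let
    u = policy.users.${user};
    matches = cfg:
      lib.elem user (cfg.targets.users or [ ])
      || lib.any (g: lib.elem g groups) (cfg.targets.groups or [ ])
      || lib.elem system (cfg.targets.systems or [ ]);
    bundles  = lib.filter matches (lib.attrValues policy.configurations);
    profiles = map (p: policy.configurations.${p}) (u.homeManager.profiles or [ ]);
  in
  lib.concatMap (c: c.homeManager.modules or [ ]) (bundles ++ profiles)
  ++ (u.homeManager.extraModules or [ ]);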


9. Output Artifacts

Given meta.nix, Metanix is expected to generate, for each system:

  • NixOS module tree:

    • users.users and users.groups
    • services.* for DNS, DHCP, WireGuard, nginx, etc.
    • /etc/hosts with all local truths from locations
    • networking (IP, routes, VLANs) from deterministic IP schema
  • DNS configuration:

    • authoritative zones (Knot)
    • stub/resolver configs (Unbound)
    • local zones for internal names
  • DHCP configuration:

    • Kea pools
    • reservations from hw-address + derived IPs
    • DHCP relays (e.g. dnsmasq relay on downstream routers)
  • WireGuard configuration:

    • upstream servers vs downstream clients
    • mesh based on tags + consumers.wireguard
  • Firewall:

    • per-interface policies derived from:

      • host role (router, adminWorkstation, server, etc.)
      • host-context identity hints
      • policy.acl (capabilities → allowed flows)
  • Home-manager configs:

    • per user, per system, based on policy.users and policy.configurations
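
For example, the yaro entry from section 6.1, resolved as owner on some system, might lower to something like this (illustrative only; the exact output shape is not committed):

# Illustrative only: plausible lowering of policy.users.yaro at "owner".
users.users.yaro = {
  uid = 10010;
  isNormalUser = true;
  group = "yaro";
  extraGroups = [ "admins" "desktopUsers" "wheel" ];  # wheel from owner privilege
  shell = "/run/current-system/sw/bin/bash";
  openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA...yaro" ];
};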

10. Future Work / Open Questions

Because you're not done tormenting yourself yet:

  • Formalize IP derivation (e.g. mkIp using location/subnet/role bits).

  • Define exact precedence rules for:

    • HM module merge order
    • NixOS module composition from policy and system configs
  • Define a small ACL capability vocabulary:

    • ssh, sudo, manage-services, mount-nfs, read-media, scrape-metrics, etc.
  • Define how “upstream/downstream” tags automatically:

    • wire DHCP relays over WG
    • configure Knot + Unbound correctly
  • Add validation (a sketch follows this list):

    • error on users referenced in locations but missing from policy.users
    • error on groups referenced but missing from policy.groups
    • warn when adminWorkstation has random non-admin users unless explicitly allowed
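
A minimal sketch of the first check (assertion plumbing and referencedUsers are assumed, not existing code):

# Hypothetical sketch: fail evaluation on dangling user references.
# referencedUsers is assumed to be collected from all location/subnet/host
# identity hints.
assertions = map (u: {
  assertion = policy.users ? ${u};
  message   = "user '${u}' referenced in a location but missing from policy.users";
}) referencedUsers;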