2025-09-18 11:59:13 +02:00
.gitignore Initial commit 2025-09-18 11:33:25 +02:00
LICENSE Initial commit 2025-09-18 11:33:25 +02:00
README.md Update README.md 2025-09-18 11:59:13 +02:00
rekey_realm.py Rename clean-realm.py to rekey_realm.py 2025-09-18 11:41:05 +02:00

Keycloak realm export/duplicate/import with deterministic re-ID

This document explains how to export a Keycloak realm, duplicate it for other environments (e.g., -dev, -tst), and import it automatically with Docker Compose while avoiding “Duplicate resource” errors. It also documents the Python script that rewrites IDs and their references deterministically.

Target environment: Linux/WSL (Ubuntu). Commands use docker compose.


1) Overview

When you export a realm and import multiple variants (dev/test/prod) into the same Keycloak database, you hit unique-key collisions because the JSON carries internal identifiers (UUIDs and 22-char Base64URL IDs) that are reused across files.

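The provided Python tool re-IDs every entity and updates all internal references deterministically, per realm file: the same input file always yields the same output. Rerunning it on an already rewritten file still produces a consistent file, but the IDs are derived again from the rewritten ones, so they change.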

What the re-ID tool does

  • Rewrites every id value, whether a UUID or a short Base64URL component ID (22 chars).
  • Rewrites reference fields that point to those IDs (e.g., containerId, parentId, resourceId, …) and list references (e.g., policies, resources, …).
  • Leaves role composites intact (they reference names, not IDs).
  • Removes every authenticationFlowBindingOverrides block it finds (you can rebind flows via the UI if needed).
  • Keeps the original of each JSON file as a .bak backup next to it (e.g., name-realm.json.bak).
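
The ID derivation can be illustrated in isolation. The sketch below mirrors the approach of the script in section 3 (UUIDv5 over realm_name|old_id, with non-UUID IDs emitted as 22-char Base64URL). The helper name derive_id and the sample IDs are made up for this illustration and are not part of the script.

```python
import base64
import uuid


def derive_id(realm_name: str, old_id: str) -> str:
    """Derive a replacement ID from the realm name and the old ID (illustrative helper)."""
    u = uuid.uuid5(uuid.NAMESPACE_URL, f"{realm_name}|{old_id}")
    # Same heuristic as the script: four dashes and at least 36 chars means "UUID-shaped".
    if old_id.count("-") == 4 and len(old_id) >= 36:
        return str(u)
    # Otherwise emit the 16 UUID bytes as unpadded Base64URL, which is always 22 chars.
    return base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("=")


old_uuid = "3f0c2a8e-5b1d-4c7a-9e2f-0a1b2c3d4e5f"  # made-up sample ID
old_short = "AbCdEfGhIjKlMnOpQrStUv"                # made-up 22-char component ID

dev_uuid = derive_id("name-dev", old_uuid)
tst_uuid = derive_id("name-tst", old_uuid)
dev_short = derive_id("name-dev", old_short)

print(len(dev_uuid), len(dev_short))                # 36 22
print(dev_uuid != tst_uuid)                         # True: the realm name is part of the hash input
print(dev_uuid == derive_id("name-dev", old_uuid))  # True: same input, same output
```

Because the realm name is hashed together with the old ID, two copies of the same export only get distinct IDs once their top-level realm values differ.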

2) File structure (example)

project/
├─ docker-compose.yml
├─ scripts/
│  └─ rekey_realm.py
└─ kc-realm/
   ├─ name-realm.json           # exported source realm
   ├─ name-dev-realm.json       # duplicated & renamed
   └─ name-tst-realm.json       # duplicated & renamed

3) The re-ID script (scripts/rekey_realm.py)

Deterministic per file: new IDs are derived from realm_name|old_id with UUIDv5. UUID-shaped IDs stay UUIDs; all other IDs are emitted as 22-char Base64URL.

#!/usr/bin/env python3
import json
import sys
import uuid
import base64
from pathlib import Path
from typing import Any, Dict, Set

# Keys holding a single ID reference
REF_KEYS_STR = {
    "containerId",
    "parentId",
    "flowId",
    "resourceServerId",
    "resourceId",
    "policyId",
    "realmId",
    "clientScopeId",
    "userId",
    "groupId",
    "roleId",
}

# Keys holding a list of ID references
REF_KEYS_LIST = {
    "resources",
    "scopes",
    "policies",
    "clients",
    "groups",
    "permissions",
    "dependentPolicies",
}

def is_non_empty_str(x: Any) -> bool:
    return isinstance(x, str) and x != ""

def b64url_22_from_uuid(u: uuid.UUID) -> str:
    """Encode UUID bytes into 22-char Base64URL (no padding)."""
    return base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("=")

def collect_all_ids(obj: Any, found: Set[str]):
    if isinstance(obj, dict):
        v = obj.get("id")
        if is_non_empty_str(v):
            found.add(v)
        for val in obj.values():
            collect_all_ids(val, found)
    elif isinstance(obj, list):
        for val in obj:
            collect_all_ids(val, found)

def build_id_map(found_ids: Set[str], realm_name: str) -> Dict[str, str]:
    id_map: Dict[str, str] = {}
    for old in found_ids:
        u = uuid.uuid5(uuid.NAMESPACE_URL, f"{realm_name}|{old}")
        if old.count("-") == 4 and len(old) >= 36:
            id_map[old] = str(u)                 # UUID format
        else:
            id_map[old] = b64url_22_from_uuid(u) # 22-char component ID
    return id_map

def rewrite_ids_and_refs(obj: Any, id_map: Dict[str, str], stats: Dict[str, int]):
    if isinstance(obj, dict):
        # Remove flow binding overrides to avoid cross-realm binding by raw ID
        if "authenticationFlowBindingOverrides" in obj:
            del obj["authenticationFlowBindingOverrides"]
            stats["flow_binding_overrides_removed"] += 1

        # Rewrite own id
        if "id" in obj and is_non_empty_str(obj["id"]):
            old = obj["id"]
            new = id_map.get(old)
            if new:
                obj["id"] = new
                stats["ids_rewritten"] += 1

        # Rewrite scalar references
        for k in list(obj.keys()):
            if k in REF_KEYS_STR and is_non_empty_str(obj[k]):
                old = obj[k]
                new = id_map.get(old)
                if new:
                    obj[k] = new
                    stats["refs_rewritten"] += 1

        # Rewrite list references
        for k in list(obj.keys()):
            if k in REF_KEYS_LIST and isinstance(obj[k], list):
                changed = 0
                new_list = []
                for item in obj[k]:
                    if is_non_empty_str(item) and item in id_map:
                        new_list.append(id_map[item])
                        changed += 1
                    else:
                        new_list.append(item)
                if changed:
                    obj[k] = new_list
                    stats["refs_rewritten"] += changed

        for v in obj.values():
            rewrite_ids_and_refs(v, id_map, stats)

    elif isinstance(obj, list):
        for v in obj:
            rewrite_ids_and_refs(v, id_map, stats)

def validate_refs(obj: Any, valid_ids: Set[str], bad: Set[str]):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k in REF_KEYS_STR and is_non_empty_str(v):
                if v not in valid_ids:
                    bad.add(f"{k} -> {v}")
            elif k in REF_KEYS_LIST and isinstance(v, list):
                for item in v:
                    if is_non_empty_str(item) and item not in valid_ids:
                        bad.add(f"{k}[] -> {item}")
        for v in obj.values():
            validate_refs(v, valid_ids, bad)
    elif isinstance(obj, list):
        for v in obj:
            validate_refs(v, valid_ids, bad)

def process_file(path: Path):
    data = json.loads(path.read_text(encoding="utf-8"))
    realm_name = data.get("realm") or path.stem

    found: Set[str] = set()
    collect_all_ids(data, found)

    id_map = build_id_map(found, realm_name)

    stats = {
        "ids_rewritten": 0,
        "refs_rewritten": 0,
        "flow_binding_overrides_removed": 0,
    }
    rewrite_ids_and_refs(data, id_map, stats)

    valid_ids = set(id_map.values())
    bad_refs: Set[str] = set()
    validate_refs(data, valid_ids, bad_refs)

    backup = path.with_suffix(path.suffix + ".bak")
    path.rename(backup)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")

    print(
        f"✅ {path} rewritten (backup -> {backup})  "
        f"ids_rewritten={stats['ids_rewritten']}  "
        f"refs_rewritten={stats['refs_rewritten']}  "
        f"flow_overrides_removed={stats['flow_binding_overrides_removed']}"
    )
    if bad_refs:
        print("⚠️  Unresolved references (diagnostic):")
        for s in sorted(bad_refs)[:50]:
            print("   -", s)
        if len(bad_refs) > 50:
            print(f"   ... +{len(bad_refs)-50} more")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 rekey_realm.py realm1.json [realm2.json ...]")
        sys.exit(1)
    for p in sys.argv[1:]:
        process_file(Path(p))

4) Export a realm from an existing Keycloak

Option A: CLI (kc.sh export)

Run the export command in a one-off Keycloak container, with the server stopped:

# Export a single realm to a file inside /opt/keycloak/data/import
# (adjust realm name and output path as needed)
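# The image's entrypoint is normally kc.sh, so set it explicitly and pass the
# export arguments directly instead of wrapping them in a shell
docker compose run --rm --entrypoint /opt/keycloak/bin/kc.sh keycloak \
  export --realm YourRealmName --file /opt/keycloak/data/import/name-realm.json

  • The container will exit after exporting.
  • Retrieve the file from the bind-mounted folder on your host (e.g., ./kc-realm/). The folder must be mounted read-write for the export; a read-only (:ro) mount like the one in the section 7 example makes the write fail.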

Option B: Admin Console (UI)

  • Realm Settings → Export (if available in your version) → select Export realm (including clients, roles, groups, etc.).

Tip: Prefer the CLI export for repeatable automation; it produces the structure this script expects.


5) Duplicate the realm JSON for other environments

  1. Copy the exported file and rename it, e.g.:

    cp kc-realm/name-realm.json kc-realm/name-dev-realm.json
    cp kc-realm/name-realm.json kc-realm/name-tst-realm.json
    
  2. Open each copy and change only:

    • Top-level realm (e.g., name-dev, name-tst).
    • Optionally displayName, displayNameHtml.
    • Do not edit internal id/containerId/… values manually.
  3. (Optional) If you previously had authenticationFlowBindingOverrides set, feel free to remove that block in these copies; the script strips it anyway, and you can rebind flows post-import via the UI.
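
The copy-and-rename steps above can also be scripted. This is a minimal sketch, assuming the file layout from section 2; the function name duplicate_realm and its parameters are hypothetical and not part of rekey_realm.py. It changes only the top-level realm (and optionally displayName) and leaves every ID untouched for the re-ID script to handle.

```python
import json
from pathlib import Path
from typing import Optional


def duplicate_realm(src: Path, dst: Path, new_realm: str,
                    display_name: Optional[str] = None) -> None:
    """Copy a realm export, changing only the top-level realm name (and display name)."""
    data = json.loads(src.read_text(encoding="utf-8"))
    data["realm"] = new_realm
    if display_name is not None:
        data["displayName"] = display_name
    # IDs are deliberately left as exported; the re-ID script rewrites them later.
    dst.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    source = Path("kc-realm/name-realm.json")
    if source.exists():
        for env in ("dev", "tst"):
            duplicate_realm(source, Path(f"kc-realm/name-{env}-realm.json"), f"name-{env}")
```

Setting the realm name before re-keying matters because the script derives the new IDs from that name.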


6) Run the re-ID script

From the project root (WSL/Ubuntu):

python3 scripts/rekey_realm.py kc-realm/name-realm.json kc-realm/name-dev-realm.json kc-realm/name-tst-realm.json

You should see nonzero ids_rewritten and refs_rewritten counts for each file.

Quick checks (optional):

# No duplicate IDs across files (should output nothing)
jq -r '.. | objects | .id? // empty' kc-realm/name*-realm.json | sort | uniq -d

# Sanity: length distribution of all id values (expect 36 for UUIDs and 22 for component IDs)
jq -r '.. | objects | .id? // empty' kc-realm/name*-realm.json | awk '{print length}' | sort -n | uniq -c
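
If jq is not available, the cross-file duplicate check can be approximated in Python. This is a sketch only; iter_ids and shared_ids are names invented here. Unlike the uniq -d pipeline, it reports only IDs shared between different files and ignores repeats inside a single file.

```python
import json
import sys
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict, Iterator, List, Set


def iter_ids(node: Any) -> Iterator[str]:
    """Yield every non-empty string stored under an "id" key, at any depth."""
    if isinstance(node, dict):
        value = node.get("id")
        if isinstance(value, str) and value:
            yield value
        for child in node.values():
            yield from iter_ids(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_ids(child)


def shared_ids(realms: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each ID found in more than one realm document to the labels that use it."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for label, doc in realms.items():
        for value in iter_ids(doc):
            owners[value].add(label)
    return {i: sorted(labels) for i, labels in owners.items() if len(labels) > 1}


if __name__ == "__main__":
    # Usage: python3 check_ids.py kc-realm/name*-realm.json  (script name is hypothetical)
    docs = {p: json.loads(Path(p).read_text(encoding="utf-8")) for p in sys.argv[1:]}
    for dup, files in sorted(shared_ids(docs).items()):
        print(dup, "->", ", ".join(files))
```

As with the jq check, no output means no ID is shared between the files.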

PowerShell equivalents (if you prep files on Windows):

Copy-Item .\kc-realm\name-realm.json .\kc-realm\name-dev-realm.json
Copy-Item .\kc-realm\name-realm.json .\kc-realm\name-tst-realm.json
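# PowerShell does not expand wildcards for native commands, so resolve the paths first
python .\scripts\rekey_realm.py (Get-ChildItem .\kc-realm\name*-realm.json).FullName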

7) Import with Docker Compose

Example docker-compose.yml snippet:

services:
  keycloak:
    image: quay.io/keycloak/keycloak:26.0
    command: ["start-dev", "--import-realm"]
    environment:
      KEYCLOAK_ADMIN: admin
      KEYCLOAK_ADMIN_PASSWORD: admin
      # Optional: control import strategy (default is IGNORE_EXISTING)
      # KC_OPTS: "-Dkeycloak.import.strategy=OVERWRITE_EXISTING"
      # DB settings if you use Postgres, etc.
      # KC_DB: postgres
      # KC_DB_URL: jdbc:postgresql://db:5432/keycloak
      # KC_DB_USERNAME: keycloak
      # KC_DB_PASSWORD: secret
    volumes:
      - ./kc-realm:/opt/keycloak/data/import:ro
    ports:
      - "8080:8080"

Start and watch the logs:

docker compose up -d keycloak
docker compose logs -f keycloak

You should see lines similar to:

Importing from directory /opt/keycloak/data/import
KC-SERVICES0030: Full model import requested. Strategy: IGNORE_EXISTING

Tip (dev only): If an import failed previously and left partial data in the DB, wipe the Keycloak DB volume before retrying to avoid ghost duplicates.


8) Post-import: rebind flows (if needed)

If the script removed authenticationFlowBindingOverrides, reassign the flow bindings in Realm Settings → Authentication → Bindings (e.g., set Browser flow) using the UI; per-client overrides are typically edited in the client's Advanced settings. This avoids cross-realm references by raw ID during import.


9) Troubleshooting

  • Duplicate resource error on Component(ID): you still have a 22-char Base64URL component ID collision → ensure you ran the script on all realm files that will be imported together.
  • Duplicate resource error on Role/ClientScope/ProtocolMapper: verify that these objects' IDs (not their names) were rewritten; rerun the script and check the counts.
  • Referenced client scope ... doesn't exist. Ignoring: harmless if you renamed realms but didn't import the referenced scope/file; confirm the scope names exist within the same realm file.
  • JSON deserialization mismatch (e.g., composites.client expected map, got array): indicates the JSON was modified incorrectly by hand or by a previous script → revert to the .bak backup and use this script only.
  • Determinism: running the script on the same input (for example, a file restored from its .bak) always produces the same IDs. Running it again on its own output produces a different, still deterministic, set of IDs and, on Linux, replaces the earlier .bak with the previously rewritten file.
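
The composites.client mismatch can be detected before import. The sketch below assumes the usual export layout, where roles.realm is a list of roles, roles.client maps a client ID to a list of roles, and each role's composites.client is a map of client ID to role names. The function names are made up for this example.

```python
from typing import Any, Dict, Iterator, List


def iter_roles(realm: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield realm-level roles, then client roles, from an exported realm dict."""
    roles = realm.get("roles")
    if not isinstance(roles, dict):
        return
    realm_roles = roles.get("realm")
    if isinstance(realm_roles, list):
        yield from (r for r in realm_roles if isinstance(r, dict))
    client_roles = roles.get("client")
    if isinstance(client_roles, dict):
        for role_list in client_roles.values():
            if isinstance(role_list, list):
                yield from (r for r in role_list if isinstance(r, dict))


def bad_composites(realm: Dict[str, Any]) -> List[str]:
    """Return names of roles whose composites.client is present but not a JSON object."""
    bad: List[str] = []
    for role in iter_roles(realm):
        composites = role.get("composites")
        if not isinstance(composites, dict):
            continue
        client_part = composites.get("client")
        if client_part is not None and not isinstance(client_part, dict):
            bad.append(str(role.get("name", "<unnamed>")))
    return bad
```

Calling bad_composites(json.load(f)) on a realm file and getting an empty list suggests this particular shape problem is absent; it does not validate the rest of the file.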

10) Summary checklist

  1. Export realm (kc.sh export) → get name-realm.json.
  2. Duplicate → name-dev-realm.json, name-tst-realm.json.
  3. Edit each copy: change only realm (and display names if desired).
  4. Run scripts/rekey_realm.py on all the files.
  5. Start Keycloak with --import-realm and mount the folder to /opt/keycloak/data/import.
  6. Rebind auth flow overrides via UI if needed.

11) Notes

  • The script doesn't alter names/aliases; reusing the same names across realms is fine (aliases only need to be unique per realm).
  • Keep realm files under version control; the process is fully automatable in CI.
  • For production, replace start-dev and configure HTTPS, DB, caches, etc.