Traefik Wildcard Certificate using Azure DNS

dns challenge letsencrypt Azure DNS

Using Traefik as edge router(reverse proxy) to http sites and enabling a Lets Encrypt ACME v2 wildcard certificate on the docker Traefik container. Verify ourselves using DNS, specifically the dns-01 method, because DNS verification doesn’t interrupt your web server and it works even if your server is unreachable from the outside world. Our DNS provider is Azure DNS.

Azure Configuration

Pre-req

  • azure cli setup
  • Wildcard DNS entry *.my.domain

Get subscription id

$ az account list | jq '.[] | .id'
"masked..."

Create role

$ az role definition create --role-definition role.json 
  {
    "assignableScopes": [
      "/subscriptions/masked..."
    ],
    "description": "Can manage DNS TXT records only.",
    "id": "/subscriptions/masked.../providers/Microsoft.Authorization/roleDefinitions/masked...",
    "name": "masked...",
    "permissions": [
      {
        "actions": [
          "Microsoft.Network/dnsZones/TXT/*",
          "Microsoft.Network/dnsZones/read",
          "Microsoft.Authorization/*/read",
          "Microsoft.Insights/alertRules/*",
          "Microsoft.ResourceHealth/availabilityStatuses/read",
          "Microsoft.Resources/deployments/read",
          "Microsoft.Resources/subscriptions/resourceGroups/read"
        ],
        "dataActions": [],
        "notActions": [],
        "notDataActions": []
      }
    ],
    "roleName": "DNS TXT Contributor",
    "roleType": "CustomRole",
    "type": "Microsoft.Authorization/roleDefinitions"
  }

NOTE: If you screwed up and need to delete do like like this:
az role definition delete --name "DNS TXT Contributor"

Create json file with correct subscription and create role definition

$ cat role.json
  {
    "Name":"DNS TXT Contributor",
    "Id":"",
    "IsCustom":true,
    "Description":"Can manage DNS TXT records only.",
    "Actions":[
      "Microsoft.Network/dnsZones/TXT/*",
      "Microsoft.Network/dnsZones/read",
      "Microsoft.Authorization/*/read",
      "Microsoft.Insights/alertRules/*",
      "Microsoft.ResourceHealth/availabilityStatuses/read",
      "Microsoft.Resources/deployments/read",
      "Microsoft.Resources/subscriptions/resourceGroups/read"
    ],
    "NotActions":[

    ],
    "AssignableScopes":[
      "/subscriptions/masked..."
    ]
  }

  $ az role definition create --role-definition role.json 
  {
    "assignableScopes": [
      "/subscriptions/masked..."
    ],
    "description": "Can manage DNS TXT records only.",
    "id": "/subscriptions/masked.../providers/Microsoft.Authorization/roleDefinitions/masked...",
    "name": "masked...",
    "permissions": [
      {
        "actions": [
          "Microsoft.Network/dnsZones/TXT/*",
          "Microsoft.Network/dnsZones/read",
          "Microsoft.Authorization/*/read",
          "Microsoft.Insights/alertRules/*",
          "Microsoft.ResourceHealth/availabilityStatuses/read",
          "Microsoft.Resources/deployments/read",
          "Microsoft.Resources/subscriptions/resourceGroups/read"
        ],
        "dataActions": [],
        "notActions": [],
        "notDataActions": []
      }
    ],
    "roleName": "DNS TXT Contributor",
    "roleType": "CustomRole",
    "type": "Microsoft.Authorization/roleDefinitions"
  }

Checking DNS and resource group

$ az network dns zone list
  [
    {
      "etag": "masked...",
      "id": "/subscriptions/masked.../resourceGroups/sites/providers/Microsoft.Network/dnszones/iqonda.net",
      "location": "global",
      "maxNumberOfRecordSets": 10000,
      "name": "masked...",
      "nameServers": [
        "ns1-09.azure-dns.com.",
        "ns2-09.azure-dns.net.",
        "ns3-09.azure-dns.org.",
        "ns4-09.azure-dns.info."
      ],
      "numberOfRecordSets": 14,
      "registrationVirtualNetworks": null,
      "resolutionVirtualNetworks": null,
      "resourceGroup": "masked...",
      "tags": {},
      "type": "Microsoft.Network/dnszones",
      "zoneType": "Public"
    }
  ]

$ az network dns zone list --output table
  ZoneName    ResourceGroup    RecordSets    MaxRecordSets
  ----------  ---------------  ------------  ---------------
  masked...  masked...            14            10000

$ az group list --output table
  Name                                Location        Status
  ----------------------------------  --------------  ---------
  cloud-shell-storage-southcentralus  southcentralus  Succeeded
  masked...                    eastus          Succeeded
  masked...                    eastus          Succeeded
  masked...                    eastus          Succeeded

role assign

  $ az ad sp create-for-rbac --name "Acme2DnsValidator" --role "DNS TXT Contributor" --scopes "/subscriptions/masked.../resourceGroups/sites/providers/Microsoft.Network/dnszones/masked..."
  Changing "Acme2DnsValidator" to a valid URI of "http://Acme2DnsValidator", which is the required format used for service principal names
  Found an existing application instance of "masked...". We will patch it
  Creating a role assignment under the scope of "/subscriptions/masked.../resourceGroups/sites/providers/Microsoft.Network/dnszones/masked..."
  {
    "appId": "masked...",
    "displayName": "Acme2DnsValidator",
    "name": "http://Acme2DnsValidator",
    "password": "masked...",
    "tenant": "masked..."
  }

  $ az ad sp create-for-rbac --name "Acme2DnsValidator" --role "DNS TXT Contributor" --scopes "/subscriptions/masked.../resourceGroups/masked..."
  Changing "Acme2DnsValidator" to a valid URI of "http://Acme2DnsValidator", which is the required format used for service principal names
  Found an existing application instance of "masked...". We will patch it
  Creating a role assignment under the scope of "/subscriptions/masked.../resourceGroups/masked..."
  {
    "appId": "masked...",
    "displayName": "Acme2DnsValidator",
    "name": "http://Acme2DnsValidator",
    "password": "masked...",
    "tenant": "masked..."
  }

  $ az role assignment list --all | jq -r '.[] | [.principalName,.roleDefinitionName,.scope]'
  [
    "http://Acme2DnsValidator",
    "DNS TXT Contributor",
    "/subscriptions/masked.../resourceGroups/masked..."
  ]
  [
    "masked...",
    "Owner",
    "/subscriptions/masked.../resourcegroups/masked.../providers/Microsoft.Storage/storageAccounts/masked..."
  ]
  [
    "http://Acme2DnsValidator",
    "DNS TXT Contributor",
    "/subscriptions/masked.../resourceGroups/masked.../providers/Microsoft.Network/dnszones/masked..."
  ]

$ az ad sp list | jq -r '.[] | [.displayName,.appId]'
  The result is not complete. You can still use '--all' to get all of them with long latency expected, or provide a filter through command arguments
...

  [
    "AzureDnsFrontendApp",
    "masked..."
  ]

  [
    "Azure DNS",
    "masked..."
  ]

Traefik Configuration

reference

Azure Credentials in environment file

$ cat .env
    AZURE_CLIENT_ID=masked...
    AZURE_CLIENT_SECRET=masked...
    AZURE_SUBSCRIPTION_ID=masked...
    AZURE_TENANT_ID=masked...
    AZURE_RESOURCE_GROUP=masked...
    #AZURE_METADATA_ENDPOINT=

Traefik Files

    $ cat traefik.yml 
    ## STATIC CONFIGURATION
    log:
      level: INFO

    api:
      insecure: true
      dashboard: true

    entryPoints:
      web:
        address: ":80"
      websecure:
        address: ":443"

    providers:
      docker:
        endpoint: "unix:///var/run/docker.sock"
        exposedByDefault: false

    certificatesResolvers:
      lets-encr:
        acme:
          #caServer: https://acme-staging-v02.api.letsencrypt.org/directory
          storage: acme.json
          email: admin@my.doman
          dnsChallenge:
            provider: azure

        $ cat docker-compose.yml 
        version: "3.3"

        services:

            traefik:
              image: "traefik:v2.2"
              container_name: "traefik"
              restart: always
              env_file:
                - .env
              command:
                #- "--log.level=DEBUG"
                - "--api.insecure=true"
                - "--providers.docker=true"
                - "--providers.docker.exposedbydefault=false"
              labels:
                 ## DNS CHALLENGE
                 - "traefik.http.routers.traefik.tls.certresolver=lets-encr"
                 - "traefik.http.routers.traefik.tls.domains[0].main=*.iqonda.net"
                 - "traefik.http.routers.traefik.tls.domains[0].sans=iqonda.net"
                 ## HTTP REDIRECT
                 #- "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
                 #- "traefik.http.routers.redirect-https.rule=hostregexp({host:.+})"
                 #- "traefik.http.routers.redirect-https.entrypoints=web"
                 #- "traefik.http.routers.redirect-https.middlewares=redirect-to-https"
              ports:
                - "80:80"
                - "8080:8080" #Web UI
                - "443:443"
              volumes:
                - "/var/run/docker.sock:/var/run/docker.sock:ro"
                - "./traefik.yml:/traefik.yml:ro"
                - "./acme.json:/acme.json"
              networks:
                - external_network

            whoami:
              image: "containous/whoami"
              container_name: "whoami"
              restart: always
              labels:
                - "traefik.enable=true"
                - "traefik.http.routers.whoami.entrypoints=web"
                - "traefik.http.routers.whoami.rule=Host(whoami.iqonda.net)"
                #- "traefik.http.routers.whoami.tls.certresolver=lets-encr"
                #- "traefik.http.routers.whoami.tls=true"
              networks:
                - external_network

            db:
              image: mariadb
              container_name: "db"
              volumes:
                - db_data:/var/lib/mysql
              restart: always
              environment:
                MYSQL_ROOT_PASSWORD: somewordpress
                MYSQL_DATABASE: wordpress
                MYSQL_USER: wordpress
                MYSQL_PASSWORD: wordpress
              networks:
                - internal_network

            wpsites:
              depends_on:
                - db
              ports:
                - 8002:80
              image: wordpress:latest
              container_name: "wpsites"
              volumes:
                - /d01/html/wpsites.my.domain:/var/www/html
              restart: always
              environment:
                WORDPRESS_DB_HOST: db:3306
                WORDPRESS_DB_USER: wpsites
                WORDPRESS_DB_NAME: wpsites
              labels:
                 - "traefik.enable=true"
                 - "traefik.http.routers.wpsites.rule=Host(wpsites.my.domain)"
                 - "traefik.http.routers.wpsites.entrypoints=websecure"
                 - "traefik.http.routers.wpsites.tls.certresolver=lets-encr"
                 - "traefik.http.routers.wpsites.service=wpsites-svc"
                 - "traefik.http.services.wpsites-svc.loadbalancer.server.port=80"
              networks:
                - external_network
                - internal_network

        volumes:
              db_data: {}

        networks:
          external_network:
          internal_network:
            internal: true

WARNING: If you are not using the staging endpoint for LetsEncrypt strongly reconside doing that while working on this. You can get blocked for a week.

Start Containers

$ docker-compose up -d --build
whoami is up-to-date
Recreating traefik ... 
db is up-to-date
...
Recreating traefik ... done

Showing some log issues you may see

$ docker logs traefik -f
    ...
    time="2020-05-17T21:17:40Z" level=info msg="Testing certificate renew..." providerName=lets-encr.acme
    ...
    time="2020-05-17T21:17:51Z" level=error msg="Unable to obtain ACME certificate for domains
    ..."AADSTS7000215: Invalid client secret is provided.

$ docker logs traefik -f
    ...
    \"keyType\":\"RSA4096\",\"dnsChallenge\":{\"provider\":\"azure\"},\"ResolverName\":\"lets-encr\",\"store\":{},\"ChallengeStore\":{}}"
     acme: error presenting token: azure: dns.ZonesClient#Get: Invalid input:     autorest/validation: validation failed: parameter=resourceGroupName constraint=Pattern value=\"\\\"sites\\\"\" details: value 

$ docker logs traefik -f
    ...
    time="2020-05-17T22:23:38Z" level=info msg="Starting provider *acme.Provider {\"email\":\"admin@iqonda.com\",\"caServer\":\"https://acme-staging-v02.api.letsencrypt.org/   directory\",\"storage\":\"acme.json\",\"keyType\":\"RSA4096\",\"dnsChallenge\":{\"provider\":\"azure\"},\"ResolverName\":\"lets-encr\",\"store\":{},\"ChallengeStore\":{}}"
    time="2020-05-17T22:23:38Z" level=info msg="Testing certificate renew..." providerName=lets-encr.acme
    time="2020-05-17T22:23:38Z" level=info msg="Starting provider *traefik.Provider {}"
    time="2020-05-17T22:23:38Z" level=info msg="Starting provider *docker.Provider {\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"defaultRule\":\"Host({{  normalize .Name }})\",\"swarmModeRefreshSeconds\":15000000000}"
    time="2020-05-17T22:23:48Z" level=info msg=Register... providerName=lets-encr.acme

In a browser looking at cert this means working but still stage url: CN=Fake LE Intermediate X1

NOTE: In Azure DNS activity log i can see TXT record was created and deleted. Record will be something like this: _acme-challenge.my.domain

Browser still not showing lock. Test with https://www.whynopadlock.com and in my case was just a hardcoded image on the page making it insecure.

admin

Bio Info for Riaan