Terraform and Bastion on AWS ECS Fargate

20 Dec 2021

While discussing our EC2 bastion hosts at work, we decided to reprovision them on ECS Fargate. This makes spinning up a bastion very fast, and every fresh task resets any changes engineers might have made. Another reason is that it gives us access from inside ECS Fargate for troubleshooting networking, credentials, permissions, etc.

We use linuxserver.io’s openssh-server image for this purpose, with custom init scripts which we clone via git in the entryPoint in the task definition.

  • Bastion Features
    • SSH tunnel and CRON
    • Up to date public SSH keys
    • Sudo User
    • Automatic Shutdown
  • Environment Variables
  • ECS Service
    • Task Definition
    • IAM Policy
  • Custom Init Scripts
    • 00-install-packages
    • 05-update-ssh-port
    • 10-restore-ssh-host-keys
    • 15-update-etc-issue
    • 20-update-motd
    • 25-update-shell-env
    • 30-enable-container-cred
    • 35-create-sudo-user
    • 40-sync-bastion-tools
    • 45-clone-bastion-keys
    • 50-setup-sftp-access
    • 80-update-dns
    • 85-start-server-idle-check
    • 90-notify-slack
  • References

Bastion Features

These are some of the requirements for our bastion host.

  • SSH tunnel
  • CRON
  • Access to MySQL/ElasticSearch
  • Up to date user public ssh keys
  • Limited access to AWS API
    • Update DNS record for bastion host
    • Update ECS service to shut down bastion host after x minutes of inactivity
  • Limited user access with option to su - to sudoer user

SSH tunnel and CRON

To enable the SSH tunnel and cron, set the DOCKER_MODS environment variable in the task definition. Multiple mods can be listed with | as the delimiter.

DOCKER_MODS: "linuxserver/mods:openssh-server-ssh-tunnel|linuxserver/mods:universal-cron"
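
To make the delimiter concrete, here is a tiny sketch that splits a DOCKER_MODS value into one mod per line. It is illustrative only: list_docker_mods is a hypothetical helper name, and this is not how the image itself loads its mods.

```shell
# Illustrative only: print one mod per line from a "|"-delimited DOCKER_MODS value.
# list_docker_mods is a hypothetical helper, not part of the linuxserver.io image.
list_docker_mods() {
    printf '%s\n' "$1" | tr '|' '\n'
}

list_docker_mods "linuxserver/mods:openssh-server-ssh-tunnel|linuxserver/mods:universal-cron"
```

Adding a third mod is just a matter of appending another |-separated entry to the value.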

Up to date public SSH keys

We keep our public SSH keys in a GitHub repo, with one subdirectory per AWS account. A cron job runs every 5 minutes to pull the repo and update the authorized_keys file.

Sudo User

By default, the container image has a user abc, which then can be changed via the USER_NAME and SUDO_ACCESS environment variables. We will define the default login user to be ec2-user and disable sudo access.

We then use a custom init script to create an additional user with full sudo access. This user's password comes from our SSM Parameter Store.

Automatic Shutdown

Since we don’t want our bastion to run 24/7, we check for inactivity and set the service's desired count to 0 once it has been idle long enough. Every minute, we check the system for open ptys and for ESTABLISHED network connections on the SSH port. The timeout value is set through an environment variable in the task definition.
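
The decision itself boils down to one comparison of timestamps. Here is a minimal sketch of just that comparison, assuming epoch seconds throughout; idle_expired is a hypothetical helper name, and the full loop appears in 85-start-server-idle-check.

```shell
# Sketch of the idle decision only; idle_expired is a hypothetical helper name.
# $1 = last activity (epoch seconds), $2 = current time (epoch seconds),
# $3 = timeout in seconds (defaults to 1800, like OPENSSH_IDLE_TIMEOUT).
# Returns 0 (true) once the bastion has been idle for at least the timeout.
idle_expired() {
    [ "$2" -ge $(( $1 + ${3:-1800} )) ]
}

# Example: last activity was 31 minutes ago, timeout is 30 minutes.
now="$(date +%s)"
if idle_expired "$((now - 1860))" "$now" 1800; then
    echo "idle: scale the service down to 0"
else
    echo "active: keep running"
fi
```

Keeping the comparison separate from the ps/netstat probing makes it easy to try different timeout values without waiting for a real idle period.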

Environment Variables

These are used by the Docker image. They are optional but good to have. We do want the SSH tunnel and cron on the server, so DOCKER_MODS is always set.

  • DOCKER_MODS: linuxserver/mods:openssh-server-ssh-tunnel|linuxserver/mods:universal-cron
  • PUBLIC_KEY: ssh-rsa ***** cwong@laptop
  • PUBLIC_KEY_DIR: /root/public_keys/keys/*****
  • SUDO_ACCESS: false
  • USER_NAME: ec2-user

These are used by the scripts in custom-cont-init.d. They are optional as well, but they make the init scripts easier to configure.

  • OPENSSH_DNS_ZONE: qa.your-company.com
  • OPENSSH_GIT_PUBLIC_KEY_REPO: github.com/your-company/devops-bastion.git
  • OPENSSH_GIT_TOOLS_DIR: /root/devops-tools/bastion
  • OPENSSH_IDLE_TIMEOUT: 1800
  • OPENSSH_PORT: 22
  • OPENSSH_ENABLE_SFTP: true
  • OPENSSH_SUDO_USER: sudoer

If these environment variables are set (we recommend setting them as secrets in the task definition), the custom init scripts skip the extra SSM Parameter Store query.

  • OPENSSH_GITHUB_TOKEN
  • OPENSSH_SLACK_WEBHOOK
  • OPENSSH_SUDO_PASSWORD
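
The init scripts further down all follow the same "environment variable first, SSM second" pattern. A minimal sketch of that pattern as a reusable function could look like this; get_secret is a hypothetical helper name, while the aws ssm get-parameters flags are the same ones the scripts in this post use.

```shell
# Sketch of the "environment variable first, SSM second" lookup.
# get_secret is a hypothetical helper; AWS_DEFAULT_REGION is expected to be set.
get_secret() {
    env_value="$1"   # value of the OPENSSH_* variable, possibly empty
    ssm_name="$2"    # SSM parameter to fall back to
    if [ -n "$env_value" ]; then
        printf '%s\n' "$env_value"
    else
        aws ssm get-parameters \
            --region "$AWS_DEFAULT_REGION" \
            --names "$ssm_name" \
            --with-decryption \
            --query 'Parameters[].Value' \
            --output text
    fi
}

# Usage (SSM is queried only when OPENSSH_SUDO_PASSWORD is unset or empty):
# sudo_password="$(get_secret "${OPENSSH_SUDO_PASSWORD:-}" /devops/bastion/SUDO_PASSWORD)"
```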

These are company specific. We use the bastion mainly for MySQL and ElasticSearch tunnels, so I set these as variables for easy access.

  • OPENSSH_ELASTICSEARCH_ENDPOINT
  • OPENSSH_RDS_RO
  • OPENSSH_RDS_RW

ECS Service

We can set up the aws_ecs_service resource without a load balancer. The other resources are pretty generic; only the task definition and the IAM policy need some attention.
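
Since the service has no load balancer and scales itself to 0 when idle, bringing the bastion back is a single update-service call. Here is a minimal sketch, assuming the caller has ecs:UpdateService on the service; scale_bastion is a hypothetical helper name, and the cluster and service names in the usage line are the ones from the IAM policy in this post.

```shell
# Sketch: set the desired count of the bastion service.
# scale_bastion is a hypothetical helper; the region default is an assumption.
scale_bastion() {
    cluster="$1"
    service="$2"
    count="${3:-1}"
    aws ecs update-service \
        --region "${AWS_DEFAULT_REGION:-us-west-2}" \
        --cluster "$cluster" \
        --service "$service" \
        --desired-count "$count"
}

# Usage: bring the bastion back after an idle shutdown.
# scale_bastion qa-default-01 qa-bastion-01 1
```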

Task Definition

$ aws ecs describe-task-definition --task-definition qa-bastion-01 | jq -r '.taskDefinition.containerDefinitions[] | select(.image == "linuxserver/openssh-server:latest") | {image: .image, entryPoint: .entryPoint, command: .command, environment: .environment, linuxParameters: .linuxParameters, secrets: .secrets}'
{
  "image": "linuxserver/openssh-server:latest",
  "entryPoint": [
    "sh",
    "-c"
  ],
  "command": [
    "apk add --no-cache aws-cli git && git clone https://`aws ssm get-parameters --region us-west-2 --names /devops/github/PERSONAL_ACCESS_TOKEN --with-decryption --query 'Parameters[].Value' --output text`@github.com/your-company/devops-tools.git /root/devops-tools && cp -R /root/devops-tools/bastion/custom-cont-init.d/ /config/ && chmod 700 /config/custom-cont-init.d && echo Alpine > /etc/issue && sh -x /shared/lacework.sh && /init"
  ],
  "environment": [
    {
      "name": "DOCKER_MODS",
      "value": "linuxserver/mods:openssh-server-ssh-tunnel|linuxserver/mods:universal-cron"
    },
    ...
  ],
  "linuxParameters": {
    "initProcessEnabled": false
  },
  "secrets": [
    ...
  ]
}

IAM Policy

In addition to the essential ECS policies, including arn:aws:iam::aws:policy/ReadOnlyAccess, we also grant the bastion the following.

resource "aws_iam_policy" "bastion_rw" {
    name      = "qa-us-west-2-bastion-rw"
    path      = "/"
    policy    = jsonencode(
        {
            Statement = [
                {
                    Action   = "ecs:UpdateService"
                    Effect   = "Allow"
                    Resource = "arn:aws:ecs:us-west-2:*****:service/qa-default-01/qa-bastion-01"
                },
                {
                    Action   = "route53:ChangeResourceRecordSets"
                    Effect   = "Allow"
                    Resource = "arn:aws:route53:::hostedzone/*****"
                },
            ]
        }
    )
    ...
}

Custom Init Scripts

Per linuxserver.io’s suggestion, I have all the startup scripts copied to /config/custom-cont-init.d/ by the entryPoint and command in the task definition.

00-install-packages

#!/usr/bin/with-contenv bash

echo ">> installing essential packages ..."

apk update
apk add --no-cache \
    aws-cli \
    bind-tools \
    busybox-extras \
    busybox-suid \
    figlet \
    file \
    git \
    mariadb-client \
    ncurses \
    openssl \
    p7zip \
    perl \
    rsync \
    screen \
    tcpdump \
    tree \
    util-linux \
    vim

05-update-ssh-port

#!/usr/bin/with-contenv bash

echo ">> updating openssh-server port to ${OPENSSH_PORT:-2222} ..."

sed \
    -i \
    -e "s/^USER_NAME=.*/USER_NAME=root/" \
    -e "s/2222$/${OPENSSH_PORT:-2222}/" \
    /etc/services.d/openssh-server/run
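
To see what the two sed expressions do without touching a real container, here is a small sketch that runs them against a stand-in for the run file. The two sample lines are my assumption about the shape of that file, not a copy of it, and rewrite_run is a hypothetical helper name.

```shell
# Stand-in for /etc/services.d/openssh-server/run; these two lines are an
# assumption about its shape, not a copy of the real file.
sample_run='USER_NAME=${USER_NAME:-linuxserver.io}
    /usr/sbin/sshd.pam -D -e -p 2222'

# Same two expressions as the init script, applied to stdin instead of in place.
rewrite_run() {
    port="${1:-2222}"
    sed \
        -e "s/^USER_NAME=.*/USER_NAME=root/" \
        -e "s/2222$/${port}/"
}

printf '%s\n' "$sample_run" | rewrite_run 22
```

With the sample input, the first line becomes USER_NAME=root and the trailing 2222 on the second line becomes 22.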

10-restore-ssh-host-keys

#!/usr/bin/with-contenv bash

# files: /config/ssh_host_keys/ssh_host*
# password: /devops/bastion/SUDO_PASSWORD
# 7-zip encryption: 7za a -p -mx=9 -mhe -t7z qa.7z ssh_host*

echo ">> restoring ssh host keys ..."

aws_environment="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name" | split("-") | .[0]')"
ssm="/devops/bastion/SUDO_PASSWORD"
sudo_password="${OPENSSH_SUDO_PASSWORD:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
ssh_host_keys_dir="/config/ssh_host_keys"
archive_file="${OPENSSH_GIT_TOOLS_DIR}/ssh-host-keys/${aws_environment}.7z"

if [[ ! -d $ssh_host_keys_dir ]]; then
    mkdir -p $ssh_host_keys_dir
fi

if [[ -f $archive_file ]]; then
    7za e -o${ssh_host_keys_dir} -p"${sudo_password}" -y $archive_file
fi

chown -R root $ssh_host_keys_dir
chmod 711 $ssh_host_keys_dir
chown 911 $ssh_host_keys_dir/ssh_host*
chmod 600 $ssh_host_keys_dir/ssh_host*

15-update-etc-issue

#!/usr/bin/with-contenv bash

echo ">> adding Alpine version to /etc/issue ..."

os_type="$(uname -o)"
os_version="$(grep alpinelinux.org /etc/apk/repositories | head -n 1 | awk -F'/' -v type="$os_type" '{print toupper(substr($(NF-2),1,1))tolower(substr($(NF-2),2)), type, $(NF-1)}')"

grep -q "^Alpine$" /etc/issue && echo "$os_version" > /etc/issue
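
The awk one-liner is dense, so here is the same idea spelled out as a sketch against a sample repository URL. The URL and the "Linux" value are sample inputs rather than values read from a real system, and format_os_version is a hypothetical helper name.

```shell
# Sketch: derive a string like "Alpine Linux v3.15" from an apk repository URL.
# The URL and os_type passed below are sample inputs.
format_os_version() {
    repo_url="$1"
    os_type="$2"
    printf '%s\n' "$repo_url" | awk -F'/' -v type="$os_type" '{
        distro = $(NF-2)    # third field from the end, e.g. "alpine"
        release = $(NF-1)   # second field from the end, e.g. "v3.15"
        print toupper(substr(distro, 1, 1)) tolower(substr(distro, 2)), type, release
    }'
}

format_os_version "https://dl-cdn.alpinelinux.org/alpine/v3.15/main" "Linux"
```

For the sample URL this prints "Alpine Linux v3.15", which is the kind of line that ends up in /etc/issue.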

20-update-motd

#!/usr/bin/with-contenv bash

echo ">> updating /etc/motd ..."

timeout_in_min="$(echo "${OPENSSH_IDLE_TIMEOUT:-1800} / 60" | bc -q)"
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name" | split("-") | .[1] + "." + .[0]')"

git clone https://github.com/xero/figlet-fonts.git /usr/share/figlet/figlet-fonts
printf "" > /etc/motd
figlet -d /usr/share/figlet/figlet-fonts -f Bloody -w 200 "${aws_service}" >> /etc/motd
printf "" >> /etc/motd

cat >> /etc/motd <<EOF
... is running on ECS Fargate, it WILL terminate after ${timeout_in_min}min of inactivity!!!

MySQL -
    - write: ${OPENSSH_RDS_RW:-undefined}
    - read: ${OPENSSH_RDS_RO:-undefined}
ElasticSearch -
    - ${OPENSSH_ELASTICSEARCH_ENDPOINT%*/}

EOF

25-update-shell-env

#!/usr/bin/with-contenv bash

echo ">> setting up SHELL prompt/environment ..."

aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name"')"

sed -i "/BASH_VERSION/s/h:/\u@${aws_service}:/" /etc/profile

cat >> /etc/profile <<EOF
# added by ${0##*/}
#
alias ls='ls -F --color=auto'
alias rm='rm -i'
export ECS_CONTAINER_METADATA_URI=$ECS_CONTAINER_METADATA_URI
export ECS_CONTAINER_METADATA_URI_V4=$ECS_CONTAINER_METADATA_URI_V4
EOF

30-enable-container-cred

#!/usr/bin/with-contenv bash

echo ">> setting up IAM credential ..."

echo "export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" >> /etc/profile

35-create-sudo-user

#!/usr/bin/with-contenv bash

echo ">> creating user ${OPENSSH_SUDO_USER:-sudoer} with sudo access ..."

ssm="/devops/bastion/SUDO_PASSWORD"
sudo_password="${OPENSSH_SUDO_PASSWORD:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"

useradd -m -g wheel -s /bin/bash "${OPENSSH_SUDO_USER:-sudoer}"
echo "${OPENSSH_SUDO_USER:-sudoer}:${sudo_password:-ChangeMeASAP}" | chpasswd
sed -i '/%wheel/s/^# //' /etc/sudoers

40-sync-bastion-tools

#!/usr/bin/with-contenv bash

echo ">> sync'ing bastion tools to /usr/local/bin ..."

chmod +x ${OPENSSH_GIT_TOOLS_DIR}/tools/*.sh
cp ${OPENSSH_GIT_TOOLS_DIR}/tools/* /usr/local/bin/

45-clone-bastion-keys

#!/usr/bin/with-contenv bash

echo ">> cloning public keys ..."

ssm="/devops/github/PERSONAL_ACCESS_TOKEN"
github_token="${OPENSSH_GITHUB_TOKEN:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
aws_account_id="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.cluster" | split(":") | .[4]')"
git_repo="https://${github_token}@${OPENSSH_GIT_PUBLIC_KEY_REPO:-}"
public_keys_dir="/root/public_keys"
authorized_keys="/config/.ssh/authorized_keys"

if [[ ! -d "$public_keys_dir" ]]; then
    git clone "$git_repo" "$public_keys_dir"
    (crontab -l; echo "*/5 * * * * git -C $public_keys_dir pull > /dev/null && cat ${public_keys_dir}/keys/${aws_account_id}/*.pub > $authorized_keys") | crontab -
fi
cat ${public_keys_dir}/keys/${aws_account_id}/*.pub > $authorized_keys
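
The jq filter that picks the account ID relies on the cluster label being an ARN. Here is a sketch of the same lookup with cut, which makes the field position easier to see. The ARN is a made-up example with a placeholder account ID, and account_id_from_arn is a hypothetical helper name.

```shell
# Sketch: the same lookup as the jq filter, using cut on a sample cluster ARN.
# The ARN below is a made-up example; 123456789012 is a placeholder account ID.
account_id_from_arn() {
    printf '%s\n' "$1" | cut -d: -f5    # cut field 5 is jq index 4
}

cluster_arn="arn:aws:ecs:us-west-2:123456789012:cluster/qa-default-01"
aws_account_id="$(account_id_from_arn "$cluster_arn")"
echo "keys are read from keys/${aws_account_id}/*.pub"
```

This is why each AWS account only ever sees the public keys in its own subdirectory of the repo.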

50-setup-sftp-access

#!/usr/bin/with-contenv bash

echo ">> setup sftp access ..."

if [[ "${OPENSSH_ENABLE_SFTP:-true}" == "true" ]]; then
    chmod go+rw /dev/null
fi

80-update-dns

#!/usr/bin/with-contenv bash

echo ">> updating DNS record ..."

my_ip="$(curl -s checkip.amazonaws.com)"
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name"')"
aws_hosted_zone_id="$(aws route53 list-hosted-zones --region $AWS_DEFAULT_REGION --output text --query 'HostedZones[?Name==`'"${OPENSSH_DNS_ZONE:-}."'`][Id]')"

if [[ "${aws_hosted_zone_id:-}" ]]; then
    aws route53 change-resource-record-sets \
        --hosted-zone-id "${aws_hosted_zone_id##*/}" \
        --change-batch '{
            "Comment": "Updating DNS record to '"$my_ip"'",
            "Changes": [ {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "'"${aws_service}"'.'"${OPENSSH_DNS_ZONE:-}"'",
                    "Type": "A",
                    "TTL": 120,
                    "ResourceRecords": [ {
                        "Value": "'"$my_ip"'"
                    } ]
                }
            } ]
        }' | \
        jq -c
fi
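
The nested quoting in the change batch is easy to get wrong. One alternative, shown here only as a sketch, is to build the JSON with printf in a small function and pass the result to --change-batch. build_change_batch is a hypothetical helper name, and the record name and IP below are sample values.

```shell
# Sketch: build the Route 53 change batch with printf instead of nested quoting.
# build_change_batch is a hypothetical helper; $1 = record name, $2 = IP address.
build_change_batch() {
    record_name="$1"
    ip="$2"
    printf '{"Comment":"Updating DNS record to %s","Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"A","TTL":120,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
        "$ip" "$record_name" "$ip"
}

build_change_batch "qa-bastion-01.qa.your-company.com" "203.0.113.10"

# Usage with the real call:
# aws route53 change-resource-record-sets \
#     --hosted-zone-id "${aws_hosted_zone_id##*/}" \
#     --change-batch "$(build_change_batch "${aws_service}.${OPENSSH_DNS_ZONE}" "$my_ip")"
```

This assumes the record name and IP contain no characters that need JSON escaping, which holds for hostnames and IPv4 addresses.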

85-start-server-idle-check

#!/usr/bin/with-contenv bash

echo ">> starting server idle check ..."

my_ip="$(curl -s checkip.amazonaws.com)"
ssm="/devops/bastion/SLACK_WEBHOOK"
slack_webhook="${OPENSSH_SLACK_WEBHOOK:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
timeout_in_min="$(echo "${OPENSSH_IDLE_TIMEOUT:-1800} / 60" | bc -q)"
timestamp="/root/timestamp.$$"
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name"')"
aws_cluster="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.cluster" | capture("/(?<cluster>[a-z0-9-]+)").cluster')"

if [[ "$timeout_in_min" -eq 0 ]]; then
    exit 0
fi

if [[ ! -f $timestamp ]]; then
    date '+%s' > $timestamp
fi

while sleep 1m; do
    lastcheck="$(cat $timestamp)"
    tty_conn="$(ps -eo user,tty,command | awk '$2 ~ /^pts/ {print $2}')"
    network_conn="$(netstat -nt | awk -v port=":${OPENSSH_PORT:-2222}$" '$4 ~ port && $6 ~ /^ESTABLISHED$/ {print $4}')"
    if [[ -z "${tty_conn:-}" && -z "${network_conn:-}" ]]; then
        if [[ $(date +'%s') -ge $(echo "$lastcheck + ${OPENSSH_IDLE_TIMEOUT:-1800}" | bc -l) ]]; then
            shutdown_message="shutting down instance due to ${timeout_in_min}min of inactivity ..."
            slack_message=":skull: ${aws_service} (${my_ip}) - ${shutdown_message}"
            echo ">> $shutdown_message"
            curl \
                -s \
                -X POST \
                --data-urlencode "payload={\"text\": \"${slack_message}\"}" \
                "$slack_webhook"
            aws ecs update-service \
                --region "$AWS_DEFAULT_REGION" \
                --cluster "$aws_cluster" \
                --service "$aws_service" \
                --desired-count 0 | \
                jq -c
            curl -s "${ECS_CONTAINER_METADATA_URI_V4:-}"
        fi
    else
        date '+%s' > $timestamp
    fi
done &
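
To check the network half of the idle test without a live connection, here is a sketch that feeds canned netstat output through the same awk filter. The addresses are sample data, and established_on_port is a hypothetical helper name.

```shell
# Sketch: the awk filter from the idle check, run against canned netstat output.
# The addresses below are sample data, not real hosts.
established_on_port() {
    port="$1"
    awk -v port=":${port}\$" '$4 ~ port && $6 ~ /^ESTABLISHED$/ {print $4}'
}

sample_netstat='tcp        0      0 10.0.1.5:22       203.0.113.10:51234   ESTABLISHED
tcp        0      0 10.0.1.5:22       203.0.113.11:40022   TIME_WAIT
tcp        0      0 10.0.1.5:45678    10.0.2.9:3306        ESTABLISHED'

printf '%s\n' "$sample_netstat" | established_on_port 22
```

Only the first sample line matches: the second is not ESTABLISHED, and the third is an outbound connection whose local port is not the SSH port. An empty result, together with no open ptys, is what lets the idle timer keep counting.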

90-notify-slack

#!/usr/bin/with-contenv bash

echo ">> notifying slack channel ..."

my_ip="$(curl -s checkip.amazonaws.com)"
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name"')"
ssm="/devops/bastion/SLACK_WEBHOOK"
slack_webhook="${OPENSSH_SLACK_WEBHOOK:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
slack_message=":rotating_light: ${aws_service} (${my_ip}) is up and ready: \`ssh -l ec2-user ${aws_service}.${OPENSSH_DNS_ZONE:-undefined}\`"

if [[ "$slack_webhook" != "false" ]]; then
    curl \
        -s \
        -X POST \
        --data-urlencode "payload={\"text\": \"${slack_message}\"}" \
        "$slack_webhook"
fi

References

  • linuxserver/docker-openssh-server

