While we were talking about our bastion EC2 hosts at work, we decided to reprovision them on ECS Fargate. This makes spinning up a bastion very fast and resets any changes engineers may have made. Another reason is that it gives us access from inside ECS Fargate for troubleshooting networking, credentials, permissions, etc.
We use linuxserver.io’s openssh-server image for this purpose, with custom init scripts that we clone via git in the entryPoint of the task definition.
Bastion Features
These are some of the requirements for our bastion host.
- SSH tunnel
- CRON
- Access to MySQL/ElasticSearch
- Up-to-date user public SSH keys
- Limited access to the AWS API
- Update DNS record for bastion host
- Scale the ECS service to 0 to shut down the bastion host after x minutes of inactivity
- Limited user access, with the option to su - to the sudoer user
SSH tunnel and CRON
To enable SSH tunnelling and cron, set the DOCKER_MODS environment variable in the task definition. Multiple mods are separated by the | delimiter:
DOCKER_MODS: "linuxserver/mods:openssh-server-ssh-tunnel|linuxserver/mods:universal-cron"
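The | is only a list separator. As an illustration (this is not linuxserver.io’s own parsing code), a tiny POSIX sh helper can split a DOCKER_MODS value into one mod per line; the function name list_docker_mods is hypothetical.

```shell
#!/bin/sh
# Illustration only: print each mod in a DOCKER_MODS-style value on its own line.
# Mods are separated by "|", as in the task definition value above.
list_docker_mods() {
  printf '%s\n' "$1" | tr '|' '\n'
}

list_docker_mods "linuxserver/mods:openssh-server-ssh-tunnel|linuxserver/mods:universal-cron"
```

Running it prints linuxserver/mods:openssh-server-ssh-tunnel and linuxserver/mods:universal-cron on separate lines.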
Up-to-date public SSH keys
We keep our public SSH keys in a GitHub repo, with each AWS account as a subdirectory (keys/<account-id>/*.pub). A cron job runs every 5 minutes to pull the repo and update the authorized_keys file.
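A minimal sketch of the key refresh, assuming the keys/<account-id>/*.pub layout used by the 45-clone-bastion-keys script. The function sync_authorized_keys is hypothetical, and it goes slightly beyond the one-line cron job: it refuses to overwrite authorized_keys when no keys match, and it writes a temporary file and renames it so sshd never reads a half-written file.

```shell
#!/bin/sh
# Hypothetical helper: rebuild authorized_keys from <repo>/keys/<account-id>/*.pub.
sync_authorized_keys() {
  repo_dir="$1"      # local clone of the public key repo
  account_id="$2"    # AWS account id, used as the subdirectory name
  out_file="$3"      # e.g. /config/.ssh/authorized_keys

  key_dir="${repo_dir}/keys/${account_id}"
  # Expand the glob into the positional parameters; if nothing matches,
  # the pattern stays literal and the -e test fails.
  set -- "$key_dir"/*.pub
  if [ ! -e "$1" ]; then
    echo "no public keys found in $key_dir" >&2
    return 1
  fi

  # Write next to the target so the rename stays on one file system.
  tmp_file="${out_file}.tmp.$$"
  cat "$@" > "$tmp_file" || return 1
  chmod 600 "$tmp_file"
  # rename(2) replaces the file atomically.
  mv "$tmp_file" "$out_file"
}
```

A cron job would call a small wrapper script around this function right after git pull, since crontab entries cannot call shell functions directly.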
Sudo User
By default, the container image has a user abc, which can be changed via the USER_NAME and SUDO_ACCESS environment variables. We set the default login user to ec2-user and disable its sudo access.
We then use a custom init script to create an additional user with full sudo access. This user’s password comes from our SSM Parameter Store.
Automatic Shutdown
Since we don’t want our bastion to run 24/7, we check for inactivity and set the ECS service’s desired count to 0. Every minute, we check the system for pty sessions and ESTABLISHED network connections on the SSH port. The timeout (in seconds) is set through the OPENSSH_IDLE_TIMEOUT environment variable in the task definition.
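The core of that check can be sketched as two small POSIX sh functions. This is a simplified illustration, not the exact script: it assumes `netstat -nt` style input (local address in column 4, state in column 6) and epoch-second timestamps, leaves out the pts check, and the function names are hypothetical.

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections whose local address ends in :<port>.
# Reads `netstat -nt` output on stdin.
count_ssh_sessions() {
  awk -v port=":$1\$" '$4 ~ port && $6 == "ESTABLISHED" { n++ } END { print n + 0 }'
}

# Succeed (exit 0) once the idle window has elapsed: now >= last_activity + timeout.
idle_expired() {
  last_activity="$1"; now="$2"; timeout="$3"
  [ "$now" -ge $((last_activity + timeout)) ]
}
```

In the real loop, the last-activity timestamp is reset whenever a session is seen, so idle_expired only becomes true after a full quiet window.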
Environment Variables
These are used by the Docker image. They are optional but good to have. We always want ssh-tunnel and cron on the server, so DOCKER_MODS is always set.
DOCKER_MODS: linuxserver/mods:openssh-server-ssh-tunnel|linuxserver/mods:universal-cron
PUBLIC_KEY: ssh-rsa ***** cwong@laptop
PUBLIC_KEY_DIR: /root/public_keys/keys/*****
SUDO_ACCESS: false
USER_NAME: ec2-user
These are used by the scripts in custom-cont-init.d. They are optional as well, but they make the init scripts simpler.
OPENSSH_DNS_ZONE: qa.your-company.com
OPENSSH_GIT_PUBLIC_KEY_REPO: github.com/your-company/devops-bastion.git
OPENSSH_GIT_TOOLS_DIR: /root/devops-tools/bastion
OPENSSH_IDLE_TIMEOUT: 1800 (seconds)
OPENSSH_PORT: 22
OPENSSH_ENABLE_SFTP: true
OPENSSH_SUDO_USER: sudoer
If these environment variables are set (we recommend defining them as secrets in the task definition), the custom init scripts use them directly instead of querying SSM Parameter Store:
OPENSSH_GITHUB_TOKEN
OPENSSH_SLACK_WEBHOOK
OPENSSH_SUDO_PASSWORD
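The scripts do this inline with ${VAR:-$(aws ssm get-parameters ...)}, which only runs the AWS CLI when the variable is unset or empty. The same "environment first, SSM second" lookup can be sketched as a reusable helper. get_secret is a hypothetical name, it uses aws ssm get-parameter (singular) rather than the plural call in the scripts, and the fallback needs the AWS CLI plus ssm:GetParameter on the task role.

```shell
#!/bin/sh
# Hypothetical helper: print $VAR_NAME if set, otherwise fetch the SSM parameter.
# Usage: get_secret VAR_NAME /ssm/parameter/name
get_secret() {
  var_name="$1"
  ssm_name="$2"
  # Indirect lookup of the variable named by $var_name (POSIX-compatible).
  # var_name must be a trusted constant because it goes through eval.
  eval "value=\${$var_name:-}"
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
    return 0
  fi
  # Fallback: decrypt the SecureString parameter from SSM Parameter Store.
  aws ssm get-parameter \
    --region "$AWS_DEFAULT_REGION" \
    --name "$ssm_name" \
    --with-decryption \
    --query 'Parameter.Value' \
    --output text
}
```

For example, sudo_password="$(get_secret OPENSSH_SUDO_PASSWORD /devops/bastion/SUDO_PASSWORD)" behaves like the inline expansion in the 35-create-sudo-user script.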
These are company-specific: we use the bastion mainly for MySQL and ElasticSearch tunnels, so I set these as variables for easy access (they are also printed in the login banner).
OPENSSH_ELASTICSEARCH_ENDPOINT
OPENSSH_RDS_RO
OPENSSH_RDS_RW
ECS Service
We can set up the aws_ecs_service resource without a load balancer. The other resources are fairly generic; mainly the task definition and the IAM policy need some attention.
Task Definition
$ aws ecs describe-task-definition --task-definition qa-bastion-01 | jq -r '.taskDefinition.containerDefinitions[] | select(.image == "linuxserver/openssh-server:latest") | {image: .image, entryPoint: .entryPoint, command: .command, environment: .environment, linuxParameters: .linuxParameters, secrets: .secrets}'
{
  "image": "linuxserver/openssh-server:latest",
  "entryPoint": [
    "sh",
    "-c"
  ],
  "command": [
    "apk add --no-cache aws-cli git && git clone https://`aws ssm get-parameters --region us-west-2 --names /devops/github/PERSONAL_ACCESS_TOKEN --with-decryption --query 'Parameters[].Value' --output text`@github.com/your-company/devops-tools.git /root/devops-tools && cp -R /root/devops-tools/bastion/custom-cont-init.d/ /config/ && chmod 700 /config/custom-cont-init.d && echo Alpine > /etc/issue && sh -x /shared/lacework.sh && /init"
  ],
  "environment": [
    {
      "name": "DOCKER_MODS",
      "value": "linuxserver/mods:openssh-server-ssh-tunnel|linuxserver/mods:universal-cron"
    },
    ...
  ],
  "linuxParameters": {
    "initProcessEnabled": false
  }
}
IAM Policy
In addition to the essential ECS permissions, including arn:aws:iam::aws:policy/ReadOnlyAccess, we grant the bastion the following:
resource "aws_iam_policy" "bastion_rw" {
  name   = "qa-us-west-2-bastion-rw"
  path   = "/"
  policy = jsonencode(
    {
      Version = "2012-10-17"
      Statement = [
        {
          Action   = "ecs:UpdateService"
          Effect   = "Allow"
          Resource = "arn:aws:ecs:us-west-2:*****:service/qa-default-01/qa-bastion-01"
        },
        {
          Action   = "route53:ChangeResourceRecordSets"
          Effect   = "Allow"
          Resource = "arn:aws:route53:::hostedzone/*****"
        },
      ]
    }
  )
  ...
}
Custom Init Scripts
Per linuxserver.io’s suggestion, all the startup scripts are copied to /config/custom-cont-init.d/ by the entryPoint command in the task definition. The numeric prefixes keep them running in order.
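Several of the scripts below derive names from two ECS task metadata labels, com.amazonaws.ecs.container-name and com.amazonaws.ecs.cluster, using jq. The same derivations can be sketched with POSIX parameter expansion and cut; the function names are hypothetical, and the sketch assumes container names shaped like <env>-<service>-<nn> (for example qa-bastion-01) and a cluster ARN label.

```shell
#!/bin/sh
# In the container the inputs would come from:
#   curl -s "$ECS_CONTAINER_METADATA_URI_V4" | jq -r '.Labels."com.amazonaws.ecs.container-name"'
#   curl -s "$ECS_CONTAINER_METADATA_URI_V4" | jq -r '.Labels."com.amazonaws.ecs.cluster"'

# First dash-separated field: "qa-bastion-01" -> "qa" (host key archive name).
env_from_container_name() {
  printf '%s\n' "${1%%-*}"
}

# "<service>.<env>": "qa-bastion-01" -> "bastion.qa" (motd banner).
motd_name() {
  rest="${1#*-}"
  printf '%s.%s\n' "${rest%%-*}" "${1%%-*}"
}

# Fifth colon-separated ARN field is the account id (keys/<account-id>/).
account_from_cluster_arn() {
  printf '%s\n' "$1" | cut -d: -f5
}

# Text after the last "/" is the cluster name (aws ecs update-service --cluster).
cluster_from_cluster_arn() {
  printf '%s\n' "${1##*/}"
}
```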
00-install-packages
#!/usr/bin/with-contenv bash
echo ">> installing essential packages ..."
apk update
apk add --no-cache \
  aws-cli \
  bind-tools \
  busybox-extras \
  busybox-suid \
  figlet \
  file \
  git \
  mariadb-client \
  ncurses \
  openssl \
  p7zip \
  perl \
  rsync \
  screen \
  tcpdump \
  tree \
  util-linux \
  vim
05-update-ssh-port
#!/usr/bin/with-contenv bash
echo ">> updating openssh-server port to ${OPENSSH_PORT:-2222} ..."
sed \
  -i \
  -e "s/^USER_NAME=.*/USER_NAME=root/" \
  -e "s/2222$/${OPENSSH_PORT:-2222}/" \
  /etc/services.d/openssh-server/run
10-restore-ssh-host-keys
#!/usr/bin/with-contenv bash
# files: /config/ssh_host_keys/ssh_host*
# password: /devops/bastion/SUDO_PASSWORD
# 7-zip encryption: 7za a -p -mx=9 -mhe -t7z qa.7z ssh_host*
echo ">> restoring ssh host keys ..."
aws_environment="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name" | split("-") | .[0]')"
ssm="/devops/bastion/SUDO_PASSWORD"
sudo_password="${OPENSSH_SUDO_PASSWORD:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
ssh_host_keys_dir="/config/ssh_host_keys"
archive_file="${OPENSSH_GIT_TOOLS_DIR}/ssh-host-keys/${aws_environment}.7z"
if [[ ! -d "$ssh_host_keys_dir" ]]; then
  mkdir -p "$ssh_host_keys_dir"
fi
if [[ -f "$archive_file" ]]; then
  7za e -o"${ssh_host_keys_dir}" -p"${sudo_password}" -y "$archive_file"
fi
chown -R root "$ssh_host_keys_dir"
chmod 711 "$ssh_host_keys_dir"
chown 911 "$ssh_host_keys_dir"/ssh_host*
chmod 600 "$ssh_host_keys_dir"/ssh_host*
15-update-etc-issue
#!/usr/bin/with-contenv bash
echo ">> adding Alpine version to /etc/issue ..."
os_type="$(uname -o)"
os_version="$(grep alpinelinux.org /etc/apk/repositories | head -n 1 | awk -F'/' -v type="$os_type" '{print toupper(substr($(NF-2),1,1))tolower(substr($(NF-2),2)), type, $(NF-1)}')"
grep -q "^Alpine$" /etc/issue && echo "$os_version" > /etc/issue
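To see what the awk program produces, here is the same transformation applied to a sample repositories line instead of /etc/apk/repositories. The sample URL and the function name issue_from_repo_line are illustrative. awk strings are 1-indexed, so the first character is substr(s, 1, 1).

```shell
#!/bin/sh
# Turn ".../alpine/v3.16/main" plus an OS type into "Alpine Linux v3.16".
issue_from_repo_line() {
  printf '%s\n' "$1" | awk -F'/' -v type="$2" '{
    # $(NF-2) is the distro directory ("alpine"), $(NF-1) the release ("v3.16").
    print toupper(substr($(NF-2), 1, 1)) tolower(substr($(NF-2), 2)), type, $(NF-1)
  }'
}

issue_from_repo_line "https://dl-cdn.alpinelinux.org/alpine/v3.16/main" "Linux"
```

This prints Alpine Linux v3.16, which is what ends up in /etc/issue.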
20-update-motd
#!/usr/bin/with-contenv bash
echo ">> updating /etc/motd ..."
timeout_in_min="$(( ${OPENSSH_IDLE_TIMEOUT:-1800} / 60 ))"
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name" | split("-") | .[1] + "." + .[0]')"
git clone https://github.com/xero/figlet-fonts.git /usr/share/figlet/figlet-fonts
# ANSI color escape sequences (\033 is the ESC character)
blue="$(printf '\033[34m')"
green="$(printf '\033[32m')"
reset="$(printf '\033[0m')"
printf '%s' "$blue" > /etc/motd
figlet -d /usr/share/figlet/figlet-fonts -f Bloody -w 200 "${aws_service}" >> /etc/motd
printf '%s' "$reset" >> /etc/motd
cat >> /etc/motd <<EOF
... is running on ECS Fargate, it WILL terminate after ${green}${timeout_in_min}min${reset} of inactivity!!!
MySQL -
- write: ${OPENSSH_RDS_RW:-undefined}
- read: ${OPENSSH_RDS_RO:-undefined}
ElasticSearch -
- ${OPENSSH_ELASTICSEARCH_ENDPOINT%*/}
EOF
25-update-shell-env
#!/usr/bin/with-contenv bash
echo ">> setting up SHELL prompt/environment ..."
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name"')"
sed -i "/BASH_VERSION/s/h:/\u@${aws_service}:/" /etc/profile
cat >> /etc/profile <<EOF
# added by ${0##*/}
#
alias ls='ls -F --color=auto'
alias rm='rm -i'
export ECS_CONTAINER_METADATA_URI=$ECS_CONTAINER_METADATA_URI
export ECS_CONTAINER_METADATA_URI_V4=$ECS_CONTAINER_METADATA_URI_V4
EOF
30-enable-container-cred
#!/usr/bin/with-contenv bash
echo ">> setting up IAM credential ..."
echo "export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" >> /etc/profile
35-create-sudo-user
#!/usr/bin/with-contenv bash
echo ">> creating user ${OPENSSH_SUDO_USER:-sudoer} with sudo access ..."
ssm="/devops/bastion/SUDO_PASSWORD"
sudo_password="${OPENSSH_SUDO_PASSWORD:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
useradd -m -g wheel -s /bin/bash "${OPENSSH_SUDO_USER:-sudoer}"
echo "${OPENSSH_SUDO_USER:-sudoer}:${sudo_password:-ChangeMeASAP}" | chpasswd
sed -i '/%wheel/s/^# //' /etc/sudoers
40-sync-bastion-tools
#!/usr/bin/with-contenv bash
echo ">> sync'ing bastion tools to /usr/local/bin ..."
chmod +x ${OPENSSH_GIT_TOOLS_DIR}/tools/*.sh
cp ${OPENSSH_GIT_TOOLS_DIR}/tools/* /usr/local/bin/
45-clone-bastion-keys
#!/usr/bin/with-contenv bash
echo ">> cloning public keys ..."
ssm="/devops/github/PERSONAL_ACCESS_TOKEN"
github_token="${OPENSSH_GITHUB_TOKEN:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
aws_account_id="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.cluster" | split(":") | .[4]')"
git_repo="https://${github_token}@${OPENSSH_GIT_PUBLIC_KEY_REPO:-}"
public_keys_dir="/root/public_keys"
authorized_keys="/config/.ssh/authorized_keys"
if [[ ! -d "$public_keys_dir" ]]; then
  git clone "$git_repo" "$public_keys_dir"
  (crontab -l; echo "*/5 * * * * git -C $public_keys_dir pull > /dev/null && cat ${public_keys_dir}/keys/${aws_account_id}/*.pub > $authorized_keys") | crontab -
fi
cat ${public_keys_dir}/keys/${aws_account_id}/*.pub > $authorized_keys
50-setup-sftp-access
#!/usr/bin/with-contenv bash
echo ">> setup sftp access ..."
if [[ "${OPENSSH_ENABLE_SFTP:-true}" == "true" ]]; then
  chmod go+rw /dev/null
fi
80-update-dns
#!/usr/bin/with-contenv bash
echo ">> updating DNS record ..."
my_ip="$(curl -s checkip.amazonaws.com)"
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name"')"
aws_hosted_zone_id="$(aws route53 list-hosted-zones --region $AWS_DEFAULT_REGION --output text --query 'HostedZones[?Name==`'"${OPENSSH_DNS_ZONE:-}."'`][Id]')"
if [[ "${aws_hosted_zone_id:-}" ]]; then
  aws route53 change-resource-record-sets \
    --hosted-zone-id "${aws_hosted_zone_id##*/}" \
    --change-batch '{
      "Comment": "Updating DNS record to '"$my_ip"'",
      "Changes": [ {
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "'"${aws_service}"'.'"${OPENSSH_DNS_ZONE:-}"'",
          "Type": "A",
          "TTL": 120,
          "ResourceRecords": [ {
            "Value": "'"$my_ip"'"
          } ]
        }
      } ]
    }' | \
    jq -c
fi
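The nested quoting in the --change-batch argument is easy to get wrong when editing. A sketch that builds the same UPSERT document with printf instead; route53_upsert_batch and zone_id_suffix are hypothetical names, and the sketch assumes plain hostnames and IPv4 addresses that need no JSON escaping.

```shell
#!/bin/sh
# Build the Route 53 UPSERT change batch for a single A record.
route53_upsert_batch() {
  record_name="$1"; ip="$2"; ttl="${3:-120}"
  printf '{"Comment":"Updating DNS record to %s","Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"A","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$ip" "$record_name" "$ttl" "$ip"
}

# list-hosted-zones returns ids such as /hostedzone/Z0123456789; keep the last segment.
zone_id_suffix() {
  printf '%s\n' "${1##*/}"
}

# Real call (needs route53:ChangeResourceRecordSets):
# aws route53 change-resource-record-sets \
#   --hosted-zone-id "$(zone_id_suffix "$aws_hosted_zone_id")" \
#   --change-batch "$(route53_upsert_batch "${aws_service}.${OPENSSH_DNS_ZONE}" "$my_ip")"
```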
85-start-server-idle-check
#!/usr/bin/with-contenv bash
echo ">> starting server idle check ..."
my_ip="$(curl -s checkip.amazonaws.com)"
ssm="/devops/bastion/SLACK_WEBHOOK"
slack_webhook="${OPENSSH_SLACK_WEBHOOK:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
timeout_in_min="$(( ${OPENSSH_IDLE_TIMEOUT:-1800} / 60 ))"
timestamp="/root/timestamp.$$"
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name"')"
aws_cluster="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.cluster" | capture("/(?<cluster>[a-z0-9-]+)").cluster')"
if [[ "$timeout_in_min" -eq 0 ]]; then
  exit 0
fi
if [[ ! -f $timestamp ]]; then
  date '+%s' > $timestamp
fi
while sleep 1m; do
  lastcheck="$(cat $timestamp)"
  tty_conn="$(ps -eo user,tty,command | awk '$2 ~ /^pts/ {print $2}')"
  network_conn="$(netstat -nt | awk -v port=":${OPENSSH_PORT:-2222}$" '$4 ~ port && $6 ~ /^ESTABLISHED$/ {print $4}')"
  if [[ -z "${tty_conn:-}" && -z "${network_conn:-}" ]]; then
    if [[ $(date +'%s') -ge $(( lastcheck + ${OPENSSH_IDLE_TIMEOUT:-1800} )) ]]; then
      shutdown_message="shutting down instance due to ${timeout_in_min}min of inactivity ..."
      slack_message=":skull: ${aws_service} (${my_ip}) - ${shutdown_message}"
      echo ">> $shutdown_message"
      curl \
        -s \
        -X POST \
        --data-urlencode "payload={\"text\": \"${slack_message}\"}" \
        "$slack_webhook"
      aws ecs update-service \
        --region "$AWS_DEFAULT_REGION" \
        --cluster "$aws_cluster" \
        --service "$aws_service" \
        --desired-count 0 | \
        jq -c
      curl -s "${ECS_CONTAINER_METADATA_URI_V4:-}"
    fi
  else
    # activity seen: restart the idle window
    date '+%s' > $timestamp
  fi
done &
90-notify-slack
#!/usr/bin/with-contenv bash
echo ">> notifying slack channel ..."
my_ip="$(curl -s checkip.amazonaws.com)"
aws_service="$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Labels."com.amazonaws.ecs.container-name"')"
ssm="/devops/bastion/SLACK_WEBHOOK"
slack_webhook="${OPENSSH_SLACK_WEBHOOK:-$(aws ssm get-parameters --region $AWS_DEFAULT_REGION --output text --names $ssm --with-decryption --query 'Parameters[].Value')}"
slack_message=":rotating_light: ${aws_service} (${my_ip}) is up and ready: \`ssh -l ec2-user ${aws_service}.${OPENSSH_DNS_ZONE:-undefined}\`"
if [[ "$slack_webhook" != "false" ]]; then
  curl \
    -s \
    -X POST \
    --data-urlencode "payload={\"text\": \"${slack_message}\"}" \
    "$slack_webhook"
fi
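Both Slack scripts interpolate the message straight into the JSON payload, so a double quote or backslash in a future message would produce invalid JSON. A minimal escaping helper, assuming single-line messages (newlines and other control characters are not handled); slack_payload is a hypothetical name.

```shell
#!/bin/sh
# Escape backslashes first, then double quotes, and wrap the result as {"text": "..."}.
slack_payload() {
  escaped="$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
  printf '{"text": "%s"}\n' "$escaped"
}

# Usage, mirroring the curl call above:
# curl -s -X POST --data-urlencode "payload=$(slack_payload "$slack_message")" "$slack_webhook"
```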