I develop well-architected cloud-native platforms & build SRE teams

Hello, my name is Alaa. I studied Computer Science at the University of Greenwich and have over 12 years of experience in Site Reliability Engineering, Cloud Systems, and Distributed Systems. I have worked with startups across Europe, the United States, and Japan, as well as in various industries such as Telecom, Automotive, Energy Transmission, Gaming, AR & Generative AI. I offer hands-on consulting, training, and team building services.

Pod Readiness Gates

08 July 2022 – 1 min read

Your application can inject extra feedback or signals into PodStatus to influence Pod readiness. To use this, set readinessGates in the Pod's spec to specify a list of additional conditions that the kubelet evaluates for Pod readiness.
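As a rough illustration of the idea, the sketch below models a Pod manifest as a plain Python dictionary and mimics the readiness rule: a Pod counts as ready only when its containers are ready and every condition listed under readinessGates is reported as "True" in the Pod's status. The condition type `example.com/load-balancer-attached`, the Pod name, and the helper `pod_is_ready` are hypothetical names chosen for this example; this is not the kubelet's actual implementation.

```python
from typing import Any

# Hypothetical custom condition type, used only for illustration.
GATE = "example.com/load-balancer-attached"

# A minimal Pod manifest expressed as a dictionary (names are assumptions).
pod: dict[str, Any] = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "readinessGates": [{"conditionType": GATE}],
        "containers": [{"name": "app", "image": "nginx"}],
    },
    "status": {
        # Containers are ready, but the custom condition is not yet reported.
        "conditions": [{"type": "ContainersReady", "status": "True"}],
    },
}


def pod_is_ready(pod: dict[str, Any]) -> bool:
    """Simplified readiness check: containers ready AND all gates are "True"."""
    conditions = {
        c["type"]: c["status"]
        for c in pod.get("status", {}).get("conditions", [])
    }
    if conditions.get("ContainersReady") != "True":
        return False
    gates = pod.get("spec", {}).get("readinessGates", [])
    # A gate whose condition is missing from the status counts as not met.
    return all(conditions.get(g["conditionType"]) == "True" for g in gates)


ready_before = pod_is_ready(pod)

# An external controller would normally patch this condition into the status.
pod["status"]["conditions"].append({"type": GATE, "status": "True"})
ready_after = pod_is_ready(pod)
```

In this sketch the Pod is not ready until the custom condition appears in its status, which is the behaviour readiness gates are meant to provide.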

SRE Readiness Kubernetes LifeCycle External

6 Important things you need to run Kubernetes in production

23 March 2022 – 1 min read

Setting up a Kubernetes stack according to best practices requires expertise, and it is necessary for a stable, future-proof cluster. Simply running a managed cluster and deploying your application is not enough. Some additional things are needed to run a production-ready Kubernetes cluster. A good Kubernetes setup makes the life of developers a lot easier and gives them time to focus on delivering business value.

SRE Production Kubernetes GitOps IaC External

Scaling Kubernetes to Over 4k Nodes and 200k Pods

13 February 2022 – 1 min read

Unlike Apache Mesos, which can scale up to 10,000 nodes out of the box, scaling Kubernetes is challenging. Kubernetes’ scalability is not limited to the number of nodes and pods; it also depends on several other aspects, such as the number of resources created, the number of containers per pod, the total number of services, and the pod deployment throughput. This post describes some challenges we faced when scaling and how we solved them.

SRE Scaling Kubernetes etcd Workloads External

How to Work Asynchronously as a Remote-First SRE

06 December 2021 – 1 min read

The core practices for remote work at Netlify are prioritising asynchronous communication, being intentional about our remote community building, and encouraging colleagues to protect their work-life balance. Sustainable remote work starts with sustainable working hours, which includes making yourself "almost" unreachable, with clear boundaries and protocols for out-of-hours contact.

SRE Culture Remote Communication Teams External

Introducing Karpenter Kubernetes Cluster Autoscaler

01 December 2021 – 1 min read

Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler built with AWS. It helps improve your application availability and cluster efficiency by rapidly launching right-sized compute resources in response to changing application load. Karpenter also provides just-in-time compute resources to meet your application’s needs and will soon automatically optimize a cluster’s compute resource footprint to reduce costs and improve performance.

EKS Autoscaler Events Capacity Compute External