Vetted Talent

Fateh Khan

Fateh Khan demonstrates proficiency in a wide array of technologies including Terraform, Git, Helm charts, and Kubernetes in his role as DevOps and SRE Engineer Manager at V4You Technologies. He has shown strong performance as an infrastructure SRE, delivering reliable 24x7 infrastructure and application operations, meeting business expectations, and serving as a management escalation point during major incidents. His expertise covers automating infrastructure with Terraform, implementing CI/CD pipelines with Git, GitHub Actions, and Jenkins, and maintaining Helm charts for application deployment. He implemented Kubernetes Tanzu to optimize container orchestration, ensuring the security and availability of microservices. He is experienced in Infrastructure as Code (IaC) using Ansible, has hands-on experience with AWS, GCP, and Azure, and has a strong background in server hardening, networking, and troubleshooting. He is skilled in disaster recovery planning, system administration, automation, and performance tuning in Unix environments, and has designed and implemented disaster recovery plans that ensure business continuity and data integrity in high-pressure environments. He has led a diverse global team of application reliability, infrastructure, and operations engineers, delivering effective talent management practices and fostering a continuous learning culture.

  • Role

    Senior DevOps Engineer

  • Years of Experience

    7 years

Skillsets

  • DevOps - 6 Years
  • Kubernetes - 6 Years
  • Docker - 6 Years
  • AWS - 4 Years
  • CI/CD - 5 Years
  • GKE - 3 Years
  • Ansible
  • Terraform
  • Google Cloud
  • Monitoring
  • Git
  • Jenkins
  • Linux

Vetted For

14 Skills
  • Roles & Skills
  • Results
  • Details
  • Senior Kubernetes Support Engineer (Remote) - AI Screening: 50%
  • Skills assessed: CI/CD Pipelines, Excellent problem-solving skills, Kubernetes architecture, Strong communication skills, Ansible, Azure Kubernetes Service, Grafana, Prometheus, Tanzu, Tanzu Kubernetes Grid, Terraform, Azure, Docker, Kubernetes
  • Score: 45/90

Professional Summary

7 Years
  • Mar, 2024 - Present (1 yr 7 months)

    DevOps and SRE Engineer Manager

    V4YOU Technologies
  • Aug, 2021 - Feb, 2022 (6 months)

    Sr. DevOps Engineer & Release Engineer

    Intelly Labs Private Limited
  • Jan, 2020 - Jul, 2021 (1 yr 6 months)

    Server Administrator & DevOps Engineer

    IDS Logic Pvt. Ltd.
  • Aug, 2017 - Sep, 2018 (1 yr 1 month)

    IT Executive

    Ryddx Pharmetry (P) Ltd
  • Sep, 2018 - Jan, 2020 (1 yr 4 months)

    System Administrator

    Mindz Technology

Applications & Tools Known

  • Terraform
  • Git
  • Helm Charts
  • Kubernetes
  • Ansible
  • AWS
  • GCP
  • Azure
  • GitHub Actions
  • Docker
  • Docker Compose
  • Helm
  • Prometheus
  • Grafana
  • Loki
  • Zabbix
  • CloudWatch
  • Vercel
  • ArgoCD
  • Nginx
  • HAProxy
  • IIS
  • SQL
  • NoSQL
  • SonarQube
  • ElasticSearch
  • Varnish
  • VPN
  • Proxmox
  • VMware
  • Hyper-V
  • Vagrant
  • VirtualBox

Work History

7 Years

DevOps and SRE Engineer Manager

V4YOU Technologies
Mar, 2024 - Present (1 yr 7 months)
    • Designing, implementing & maintaining systems and infrastructure to ensure high reliability, availability, and performance.
    • Developing & deploying automation tools and frameworks to automate repetitive tasks and streamline operations.
    • Setting up and managing monitoring systems to track the health and performance of services; configuring alerts and escalations to quickly respond to incidents and minimize downtime.
    • Leading incident response and post-mortem analysis to identify root causes of outages and implement preventive measures.
    • Conducting capacity planning and performance tuning to ensure systems can handle current and future loads.
    • Scaling infrastructure as needed to accommodate growth.
    • Implementing infrastructure as code (IaC) practices using tools like Terraform, Ansible, or Chef to provision and manage infrastructure in a consistent and repeatable manner.
    • Collaborating with development teams to automate deployment processes and implement continuous integration and continuous deployment (CI/CD) pipelines.
    • Delivering day-to-day backup and recovery support activities including server availability & administrative processes.
    • Implementing security best practices to protect systems and data; performing security audits and vulnerability assessments.
    • Working closely with cross-functional teams including developers, system administrators, and product managers to understand requirements, prioritize tasks, and deliver solutions.
    • Building, managing, and improving the build infrastructure for global software development engineering teams including implementation of build scripts, continuous integration infrastructure and deployment tools.
    • Managing Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins.
    • Leading and mentoring a team of 9 members; guiding them to perform better.

Sr. DevOps Engineer & Release Engineer

Intelly Labs Private Limited
Aug, 2021 - Feb, 2022 (6 months)
    • Administered monitoring and alerting systems for smooth maintenance of engineering environments.
    • Ensured quick restoration of services during outage through strong troubleshooting skills.
    • Responded to alerts and performed root cause analysis; validated changes within specified maintenance windows.

Server Administrator & DevOps Engineer

IDS Logic Pvt. Ltd.
Jan, 2020 - Jul, 2021 (1 yr 6 months)
    • Created freestyle & pipeline CI Jenkins projects to deploy Node.js and PHP applications.
    • Configured LAMP and LEMP stacks on Ubuntu & CentOS servers using Ansible.
    • Designed Bash scripts to back up production and staging servers.
    • Configured Ubuntu systems using Bash scripts; hosted web applications on testing and production servers.
    • Used Docker to provide ready-to-build environments for the dev team.
    • Administered Windows 10 & Server 2012, IIS, DHCP, DNS, and Active Directory; resolved errors in the Windows environment.
    • Deployed ASP.NET code to staging and production servers.
    • Delivered server support to web hosting clients (both foreign and domestic); configured Linux testing servers.
    • Worked with Apache, Nginx, MySQL, the UFW firewall, and server panels like Plesk and cPanel.
    • Managed the company's internal SonicWall firewall; configured VPN, mapped ports, and applied network policies; resolved network-related issues such as duplicate IP addresses, IP address exhaustion, and DNS problems.

System Administrator

Mindz Technology
Sep, 2018 - Jan, 2020 (1 yr 4 months)
    • Worked with Git, Docker, Jenkins, and GitLab; maintained version control; deployed Jenkins freestyle projects.
    • Configured Windows 7/8/10 and Server 2012; set up Windows Server for SQL, DHCP, and AD.
    • Applied policies on domain users; managed Active Directory users; reset passwords and assigned shared drives to user accounts.
    • Installed & configured user environment variables on Windows and Linux, web servers such as LAMP and XAMPP, and application software.
    • Hosted web applications on testing and production servers; backed up, restored, and created SQL Server databases.
    • Supported a FortiGate 60E firewall; assigned IP addresses to users through IP policies; created firewall policies.

IT Executive

Ryddx Pharmetry (P) Ltd
Aug, 2017 - Sep, 2018 (1 yr 1 month)
    • Kept all systems up to date; installed and upgraded software; resolved computer and user issues.
    • Assembled & disassembled desktops and laptops; configured mail in Thunderbird, Outlook, and Apple Mail; set up eSSL biometric devices.

Achievements

  • Successfully delivered projects for Oyo Japan, Dunzo and Cars24.
  • Awarded Best Employee for 3 months.
  • Implemented cost optimization strategies that led to annual savings exceeding $20,000 for two clients.
  • Led three of the company's major projects at once with best practices and a sound approach.
  • Implemented comprehensive monitoring solutions using Grafana, Prometheus, and Datadog for V4You Technologies clients.
  • Optimized CI/CD pipelines, reducing deployment time and increasing overall efficiency.
  • Introduced automated server provisioning, implemented robust DR plans and conducted security audits.

Major Projects

5 Projects

OYO Japan Tabist

Mar, 2024 - Present (1 yr 7 months)
    Utilized Terraform for managing and provisioning AWS infrastructure resources. Created and maintained infrastructure components such as Lambda functions, DNS configurations, EKS clusters, RDS databases, AWS Secrets Manager, Parameter Store, ECR repositories, EC2 instances, and OIDC configurations. Implemented CI/CD pipelines using GitHub Actions to automate build, test, and deployment processes, and GitOps practices using ArgoCD across Kubernetes and Git repositories.

Digiboxx

Jan, 2023 - Jan, 2024 (1 yr)
    • Managed data center operations to ensure continuous system availability.
    • Migrated the entire application from Bare Metal VM to VMware Tanzu.
    • Implemented Kubernetes, Helm, and Argo CD from scratch to streamline application management.
    • Set up Grafana, Prometheus, and Loki for comprehensive monitoring of the entire application.
    • Converted freestyle Jenkins jobs into Pipelines, enabling seamless deployment without reliance on the DevOps or SRE team.
    • Set up a new MinIO cluster on bare-metal infrastructure to expand current storage capabilities.
    • Implemented diverse backup solutions; deployed backup systems using Proxmox Backup and Acronis.
    • Collaborated with developers and the SecOps team and kept the company's other products updated with all possible best practices.

Cars24

Jan, 2022 - Jan, 2023 (1 yr)
    • Collaborated with Google Cloud and migrated services from AWS to GCP.
    • Created pipelines for GKE & ECS and deployed EKS using Jenkins & TeamCity.
    • Optimized costs on GCP and AWS.
    • Set up services: RDS, Cloud Run, Cloud Functions, Cloud Build, Vault, load balancers with templates, Cloud Scheduler, Pub/Sub, and EventArc.
    • Upgraded the current cluster to the latest Kubernetes version.
    • Created dashboards and monitoring on Datadog.

Go Empyrean

Jan, 2022 - Dec, 2022 (11 months)
    • Managed the company's on-prem infrastructure using a Proxmox cluster.
    • Added nodes to the cluster; created VMs and managed applications running on provisioned VMs.
    • Configured a MySQL master-slave setup for the application; deployed PHP and Java applications using Jenkins.
    • Used Nginx and HAProxy as load balancer and web server.
    • Patched and upgraded Ubuntu servers from 14.04 to 22.04.

MSK (Memorial Sloan Kettering Cancer Center)

Jan, 2021 - Jan, 2022 (1 yr)
    • Used Kubernetes Tanzu for container orchestration and management.
    • Developed and deployed microservices using Helm charts and ArgoCD on VMware Tanzu infrastructure.
    • Scaled up and managed Kubernetes clusters in a production environment.
    • Maintained high availability, scalability, and security of microservices deployed on Kubernetes.
    • Analyzed application requirements and designed efficient deployment strategies.
    • Automated infrastructure provisioning and configuration using Infrastructure as Code (IaC) principles.
    • Monitored and optimized resource utilization within Kubernetes clusters.
    • Troubleshot issues related to containerized applications and infrastructure.
    • Managed OpenVPN server.

Education

  • Master of Computer Applications

    Vivekanand Global University (2024)
  • Bachelor of Computer Applications (BCA)

    JECRC University (2023)

AI-interview Questions & Answers

Hi, my name is Fateh Khan, and I have been working for 7 years as a DevOps engineer and SRE manager. Along with that, I have worked with multiple organizations as a senior server administrator and monitoring engineer. I have experience in Kubernetes, where I have worked closely with GKE and EKS. I have expertise in monitoring and in deploying applications in both containerized and non-containerized ways. I am proficient in scaling up infrastructure using Terraform and other tools such as Pulumi with Python. Other than that, I have experience in GitOps, where I provided administration on Git and managed DevOps practices along with monitoring. I also have database experience, where I provided administration on MySQL and on SQL and NoSQL databases. Thank you.

So Helm, basically, is a package manager which helps you manage your Kubernetes applications: you populate the configuration via YAML values and can install the application on any environment you want; we just have to change the input values using the helm command. The best benefit is that we do not have to work with the manifests again and again, and Helm takes care of releases and rollback if there is any issue. Other than that, we can also utilize Helm in GitOps practices using ArgoCD, where every single component of the Helm release is managed by ArgoCD itself.
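
For illustration only, here is a minimal sketch of the install-and-rollback flow the answer describes, written as a thin Python wrapper around the helm CLI; the release, chart, environment, and values-file names are hypothetical.

    # Minimal sketch: drive "helm upgrade --install" and "helm rollback" from Python.
    # Assumes the helm CLI is on PATH; release/chart/values names are hypothetical.
    import subprocess

    def helm(*args: str) -> None:
        """Run a helm command and fail loudly if it returns non-zero."""
        subprocess.run(["helm", *args], check=True)

    def deploy(release: str, chart: str, env: str, image_tag: str) -> None:
        # Same chart for every environment; only the input values change.
        helm("upgrade", "--install", release, chart,
             "--namespace", env, "--create-namespace",
             "--set", f"image.tag={image_tag}",
             "-f", f"values-{env}.yaml")

    def rollback(release: str, env: str, revision: str = "0") -> None:
        # Revision "0" rolls back to the previous release.
        helm("rollback", release, revision, "--namespace", env)

    if __name__ == "__main__":
        deploy("webapp", "./charts/webapp", "staging", "v1.4.2")
        # rollback("webapp", "staging")   # if the new revision misbehaves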

So if you want to attach a storage class to a stateless application, and you are using GKE or EKS, we get the option to utilize services like persistent disks in GCP and EFS and EBS in EKS, where you attach the volumes using a PersistentVolumeClaim and mount the storage as a disk in the pod itself. The moment any pod dies and a new pod spins up, the data remains in place and gets attached to the newer pod, which becomes available to the deployment. Other than that, it is also possible to attach persistent storage at runtime in the deployment itself: first we have to claim the storage, and then we attach it as a PVC.
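
As a concrete illustration of the claim-then-attach flow described above, here is a minimal sketch using the official Kubernetes Python client; the storage class (gp2, typical for EBS on EKS), namespace, and object names are assumptions.

    # Minimal sketch: claim persistent storage and mount it into a pod.
    from kubernetes import client, config

    config.load_kube_config()   # or config.load_incluster_config() inside a cluster
    core = client.CoreV1Api()

    # 1) Claim the storage.
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="app-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            storage_class_name="gp2",   # EBS-backed class on EKS (assumption)
            resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
        ),
    )
    core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)

    # 2) Attach the claim to a pod as a mounted disk.
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="app"),
        spec=client.V1PodSpec(
            containers=[client.V1Container(
                name="app",
                image="nginx:1.25",
                volume_mounts=[client.V1VolumeMount(name="data", mount_path="/data")],
            )],
            volumes=[client.V1Volume(
                name="data",
                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                    claim_name="app-data"),
            )],
        ),
    )
    core.create_namespaced_pod(namespace="default", body=pod)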

So we can use the Horizontal Pod Autoscaler (HPA) to scale up the environment if traffic crosses the defined threshold. We can use metrics there, and we have to define a manifest for the deployment which works at the level of selectors. Let's say the deployment has the selector label application-1; we will define the HPA with the API version, the kind HorizontalPodAutoscaler, the name, the target reference, and then the metrics. We can define metrics per CPU level and per RAM level. Other than that, we can also define the capacity we want, i.e. how far the deployment should scale up in terms of the number of replicas. So any time the defined threshold gets crossed, it will scale up the deployment itself.
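
A minimal sketch of the HPA the answer outlines, using the autoscaling/v1 API via the official Kubernetes Python client; the deployment name, namespace, and thresholds are hypothetical. (Memory-based targets require the autoscaling/v2 API; v1 only supports average CPU utilization.)

    # Minimal sketch: HPA targeting a Deployment, scaling on average CPU.
    from kubernetes import client, config

    config.load_kube_config()
    autoscaling = client.AutoscalingV1Api()

    hpa = client.V1HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name="application-1-hpa"),
        spec=client.V1HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V1CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name="application-1"),
            min_replicas=2,
            max_replicas=10,
            target_cpu_utilization_percentage=70,   # scale out past 70% average CPU
        ),
    )
    autoscaling.create_namespaced_horizontal_pod_autoscaler(
        namespace="default", body=hpa)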

So we can use blue-green deployment: every time a new deployment takes place, we deploy it alongside the running version, and we have to update the DNS to make the switch happen. First we deploy the new version of the application, and once that deployment is successfully done and we have tested that everything is running fine, we update the DNS to point traffic at it.
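
The answer describes a DNS-level switch; for completeness, here is a minimal sketch of the in-cluster variant of blue-green, where two deployments share one Service and the Service selector is flipped once the new colour passes its checks. All names are hypothetical; this uses the official Kubernetes Python client.

    # Minimal sketch: flip a Service between "blue" and "green" deployments.
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    def cut_over(service: str, namespace: str, new_color: str) -> None:
        """Repoint the Service at the deployment labelled with the new colour."""
        patch = {"spec": {"selector": {"app": "myapp", "color": new_color}}}
        core.patch_namespaced_service(name=service, namespace=namespace, body=patch)

    # After the green stack is deployed and its smoke tests pass:
    cut_over("myapp", "production", "green")
    # Flipping back to "blue" is the rollback if something looks wrong.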

So what are the strategies you would employ to ensure zero downtime during the transition? Zero downtime is essentially a practice where we deploy the application, transfer the data, and transfer the traffic to the newer version. What we can do is create the same deployment set and the same application deployment on the AKS side and point the DNS entries there; once the DNS is pointed, the application will be running from AKS itself. And we can keep the Tanzu application running until we verify that everything is fine.
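
A minimal sketch of a gradual DNS cutover like the one described, assuming the zone happens to be hosted in Route 53 (boto3); the zone ID, record names, and ingress hostnames are hypothetical, and the same weighted-record idea applies to other DNS providers.

    # Minimal sketch: shift traffic by adjusting weighted DNS records.
    import boto3

    route53 = boto3.client("route53")
    ZONE_ID = "Z0000000EXAMPLE"   # hypothetical hosted zone

    def set_weight(set_id: str, target: str, weight: int) -> None:
        """Upsert one weighted CNAME record for app.example.com."""
        route53.change_resource_record_sets(
            HostedZoneId=ZONE_ID,
            ChangeBatch={"Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "CNAME",
                    "SetIdentifier": set_id,
                    "Weight": weight,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": target}],
                },
            }]},
        )

    # Shift traffic from the old Tanzu ingress to the new AKS ingress step by step.
    set_weight("tanzu", "ingress.tanzu.example.com", 90)
    set_weight("aks", "ingress.aks.example.com", 10)
    # ...watch error rates and latency, then move to 50/50, then 0/100.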

So when we are setting up a Kubernetes pipeline, we have to be sure which application we are deploying for, whether it will be Helm or raw manifests, and whether any GitOps operations are involved or not. If GitOps operations are involved, which tool are we going to use: Spinnaker, Argo CD, GitHub Actions itself, or Jenkins? Other than that, we also have to look at the deployment, the replica set, and the storage. And if the application is getting deployed and some API deprecations or removals have happened on the Kubernetes upgrade side, we also need to add conditions: if the cluster version is this, install this version of the API, and if the cluster version is that, install that version of the API. So while deploying the application, we also have to make sure that the current stable version is running absolutely fine after running smoke tests, and we have to make sure the Helm charts are rendering properly. Only then can we proceed with the deployment.
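
A minimal sketch of the version-conditional logic mentioned above: query the cluster's Kubernetes minor version and choose the Ingress API group accordingly (the v1beta1 Ingress API was removed in 1.22). The threshold and the way the value would be fed into Helm are illustrative assumptions.

    # Minimal sketch: pick an apiVersion based on the cluster's reported version.
    from kubernetes import client, config

    config.load_kube_config()
    info = client.VersionApi().get_code()          # e.g. major="1", minor="27"
    minor = int("".join(ch for ch in info.minor if ch.isdigit()))  # strips "+" on managed clusters

    if minor >= 22:
        ingress_api_version = "networking.k8s.io/v1"
    else:
        ingress_api_version = "networking.k8s.io/v1beta1"

    print(f"Cluster v{info.major}.{minor}: rendering Ingress as {ingress_api_version}")
    # A pipeline would pass this value into the chart, e.g. --set ingress.apiVersion=...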

Consideration I'm not sure about this.

So Kubernetes relies on a container runtime environment. In earlier versions it was running on Docker, and after that Kubernetes started removing the Docker mechanism (dockershim) from the cluster itself; now containerd runs as the default runtime. The deployments are managed through the kube-apiserver; the kube-proxy, basically, sends all the inputs and outputs to the API server. The scheduler is responsible for placing the application onto a node, and the API server is responsible for managing and replacing the current deployments. And etcd is there to store the key-value data of every deployment that has taken place within the cluster itself.
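
As a quick way to see the runtime point above in practice, here is a minimal sketch that lists each node's reported container runtime (containerd vs. Docker) with the official Kubernetes Python client.

    # Minimal sketch: print the container runtime each node reports.
    from kubernetes import client, config

    config.load_kube_config()
    for node in client.CoreV1Api().list_node().items:
        info = node.status.node_info
        print(f"{node.metadata.name}: "
              f"runtime={info.container_runtime_version}, "   # e.g. containerd://1.7.2
              f"kubelet={info.kubelet_version}")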

So, implementing a service mesh: it gives you more control on the service side, where you can control the entire traffic flow and the network, basically deciding where a request is allowed to go. It also gives you an entire network diagram, for example Kiali as a dashboard, if we are using Istio. Other than that, it basically works with service discovery: as long as service discovery is working, Istio will keep running, and the sidecar will send traffic to the pod only after the service has been successfully initialized. So the best advantage of using a service mesh technology in a Kubernetes cluster is that it allows you to fully control the network: you can specify that if a request is coming from a particular source, you can block it or allow it for particular services. These are the main features that Istio can provide as a service mesh.
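
A minimal sketch of the allow/block behaviour described above: an Istio AuthorizationPolicy that only lets the frontend service account call the payments workload, applied through the Kubernetes CustomObjectsApi since Istio resources are CRDs. The namespace, workload labels, and service-account principal are hypothetical.

    # Minimal sketch: Istio AuthorizationPolicy restricting who may call "payments".
    from kubernetes import client, config

    config.load_kube_config()
    custom = client.CustomObjectsApi()

    policy = {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": "payments-allow-frontend", "namespace": "prod"},
        "spec": {
            "selector": {"matchLabels": {"app": "payments"}},
            "action": "ALLOW",
            "rules": [{
                "from": [{"source": {
                    "principals": ["cluster.local/ns/prod/sa/frontend"]}}],
            }],
        },
    }

    custom.create_namespaced_custom_object(
        group="security.istio.io", version="v1beta1",
        namespace="prod", plural="authorizationpolicies", body=policy,
    )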