Vetted Talent

Bibek Rauniyar

Vetted Talent

As a Senior DevOps Engineer at Honest, I manage the production Kubernetes cluster that runs over 100 microservices, ensuring optimal performance through scaling and resource allocation. I also successfully optimized costs on Confluent Cloud, achieving a notable 48% reduction in total billing.

With a Bachelor of Technology in Mechanical Engineering from GLA University and multiple certifications in AWS, Python, and Microsoft DevOps, I have 6+ years of experience in pioneering technological advancements. I have proven expertise in cloud computing, CI/CD, monitoring and analytics, data governance, security, and cost optimization. I have designed and implemented HA infrastructure for mobile and web applications, and contributed to the organization's technological excellence through innovative solutions.

Role
Senior DevOps Engineer
Years of Experience
7.6 years
Professional Portfolio
View here

Skillsets

Grafana
Terraform
SOC 2
Serverless
Python
Prometheus
PCI-DSS
Multi-cloud architecture
Microservices Architecture
Istio service mesh
Kubernetes - 6 Years
Go
GitHub Actions
GCP
Bash
Azure
ArgoCD
Ansible
AWS - 7 Years

Vetted For

14Skills

Roles & Skills
Results
Details

Senior Kubernetes Support Engineer (Remote)AI Screening
61%

Skills assessed :Ci/Cd Pipelines, Excellent problem-solving skills, Kubernetes architecture, Strong communication skills, Ansible, Azure Kubernetes Service, Grafana, Prometheus, Tanzu, Tanzu Kubernetes Grid, Terraform, Azure, Docker, Kubernetes
Score: 55/90

Professional Summary

7.6Years

Feb, 2023 - Present2 yr 8 months
Senior DevOps Engineer
Honest Technologies
Jul, 2021 - Feb, 20231 yr 7 months
Site Reliability Engineer
Dkatalis Labs
Feb, 2021 - Jul, 2021 5 months
DevSecOps Consultant
FPL Technologies
Dec, 2017 - Feb, 20202 yr 2 months
Senior System Engineer
Infosys Ltd
Feb, 2020 - Feb, 20211 yr
DevOps Consultant
CloudCover Consultancy

Applications & Tools Known

GCP
AWS
Azure
Kubernetes
Terraform
Docker
GitHub Actions
ArgoCD
Jenkins
GitLab
Harness
Spinnaker
Ansible
Terragrunt
Kyverno
Falco
DefectDojo
Istio
IAM
Prometheus
Grafana
Loki
ELK Stack
Datadog
Dynatrace
Bash
Python
GoLang
Cloudflare
Promtail
Fortinet
Wazuh
Superset
Lambda
SIEM
GCP
AWS
GKE
EKS
AKS
Helm
Terraform
GitHub Actions
Prometheus
Loki
Logstash
Elasticsearch
Kibana
Looker Studio
Vault
IAM
Snyk
SonarQube
Trivy
GoLang
TypeScript
Confluent Kafka
RabbitMQ
Git
Bitbucket
PagerDuty
Linux Administration
Networking
SRE
Microservices
API Development

Work History

7.6Years

Senior DevOps Engineer

Honest Technologies

Feb, 2023 - Present2 yr 8 months

Designed and optimized Kubernetes platform managing 200+ microservices with strategic scaling and resource tuning. Automated CI/CD pipelines increasing deployment frequency. Enhanced security posture achieving PCI-DSS/SOC2 compliance audit pass rate.

Site Reliability Engineer

Dkatalis Labs

Jul, 2021 - Feb, 20231 yr 7 months

Led cross-functional team to design secure cloud infrastructure supporting 10M+ users. Extended Istio service mesh across the organization enabling Zero Trust Security.

DevSecOps Consultant

FPL Technologies

Feb, 2021 - Jul, 2021 5 months

Deployed enterprise SIEM using Amazon OpenSearch. Conducted PCI-DSS Level 1 compliance audits and orchestrated application containerization.

DevOps Consultant

CloudCover Consultancy

Feb, 2020 - Feb, 20211 yr

Executed multi-cloud migration of 22 microservices and developed CI/CD pipeline for payment gateway processing. Established disaster recovery strategies achieving 99.9% data recovery success.

Senior System Engineer

Infosys Ltd

Dec, 2017 - Feb, 20202 yr 2 months

Contributed to cloud migration of PostgreSQL database infrastructure. Enhanced database performance through optimization and indexing improvements.

Achievements

Led successful cloud migrations
Achieved a 48% reduction in cloud costs
Tenfold improvement in deployment frequency
Reduced Prometheus RAM usage by 80%
Zero Downtime Migration from Ali cloud to Google Cloud
Implemented centralized SIEM for a USA-based enterprise across 80+ AWS accounts
Awarded the Rising Star award for exceptional collaboration on projects with Infosys and MBRDI, recognizing outstanding performance and contributions.
Demonstrated superior problem-solving skills by effective query tuning and indexing leading to a significant database performance boost

Major Projects

3Projects

Mastering Cloudflare Ruleset Engine with Terraform

Published article with 2K+ views on Dev.to platform.

AI-powered job matching platform

Built platform using GenAI, vector search, and LLM technologies.

Reusable Terraform modules

Created modules enabling rapid deployment of AWS Glue pipelines.

Education

B.Tech: Mechanical Engineering
GLA University, Mathura (2017)

Certifications

Aws certified solutions architect associate saa-c02
Az-400: designing and implementing microsoft devops solutions

AI-interview Questions & Answers

Uh, could you help me understand more about your background by giving a brief introduction of yourself? Yeah. Sure. So, uh, I am a DevOps and cloud professional, uh, having more than a 6.5 years of experience working in the same field. I have worked for MNC and a lot of start ups. I help the start up to grow. I have, like, work on almost all the tools and technologies, cutting edge technologies, which are used in the most field and the cloud technology. So I have worked with almost all the big providers of cloud like AWS, GCP, and Azure. I'm also certified in AWS and Azure. Uh, regarding the cutting edge technologies, I have worked I have a lot of experience with Terraform as a iX. I have, uh, good experience in the core NetEase. I have also very good experience in the CHCD end to end pipeline design. Uh, apart from that, I'm also, uh, supporting as a team. Uh, I'm I I I can lead a project alone. And also, I cannot I can prove myself as a good team player because, uh, currently, I'm working in a team that is very versatile. Uh, I'm also, um, uh, working on the on call, like, calling the on call, maintaining the gate repository more than 4 50 repositories, uh, doing weekly on call rotations, uh, maintaining, designing, and supporting the code. So pretty much that's that's all about myself.

How do you go about it? How would you go about exposing a Kubernetes service to the Internet? So, yeah, to expose the service to the, uh, and, uh, to expose the service to the Internet, uh, I can use the ingress or any load balancer provided by any of the cloud providers. Also, I can host my, uh, uh, my also, I can host the self hosted ingress, or I can use, uh, the load balances provided by the all the clouds. So, basically, what we need to do, uh, we need to first think whether whether we're gonna do as a as a domain based routing or a path based routing. Uh, once we confirm and once, uh, we confirm about our approach, uh, we need to finalize then we need to finalize, uh, are we going to use the load balancer? Are we going to expose the the load balancer and control, or we can just, uh, use a proxy and then control. So, basically, we can use the NGINX ingress in between the applications in between the service and the Internet, and then, uh, route the traffic according to, uh, according to audio stress. So, uh, in simple word, we can use nginx ingress or a load balancer to expose the service to the Internet.

What are the main components of Tanju Kubernetes grid, and how did it I'm really sorry. I have never interacted with Tanju, so I have no idea on this. I would like to skip this question.

So coming to the life cycle of PARD, PARD can be either running or it can be a terminated status. It can be in a running status. It can be in a terminated status. It can be in a backlog status. So running is basically when the party is up and running, everything is good. We are serving a topic. Uh, terminated, it's basically when the, uh, parties totally got wiped off, uh, like, deleted. We have also called a back, uh, back off, uh, back loop crash loop. Sorry. A crash loop where port is not able to get the correct configuration or not able to point to the basically, a current configuration, what it is, uh, allocated for or a config map. And we can also get a part with evicted status. So it means, basically, when, uh, when the pod doesn't have, uh, uh, evicted status is basically when the port. Uh, how can I explain? Yeah. When the port is not getting enough, uh, resource, uh, on the node, then it will be evicted. So, basically, when there is a resource constraint on the node and there is not any node available to schedule the port, then every time it will be evicted. So, yeah, that is a life cycle of life cycle of the Kubernetes port, uh, port life cycle, basically.

In, uh, Kubernetes think Kubernetes like a c. It is a very deep Kubernetes is very fast, and it's a whole architecture where we deploy the life cycle or dependencies of the application. So think like this. If if a Kubernetes is a big, big, big chunk of room, So we are dividing it in a partition to manage it. So it's a a virtual virtual uh, we can define the namespace as a virtual clustering of the diff, uh, virtual clustering of the GK. So we will create, uh, different different namespace where we define, uh, the life cycle of a port, how it behaves, and how it works. So it's a, uh, you can basically call it some virtual clustering in GKE. So it will allow you to ease up the management of applications. It will allow us to control numerous factor. Uh, we can control numerous factor, numerous Kubernetes configuration at the namespace level. We can also define, uh, how you want to how you want to control entire life cycle of a port and the deployment into that namespace. So, yeah, this is the namespace, and it helps it make a life easy to manage the, uh, manage the applications over the, uh, over the g k by virtually virtually virtually clustering it.

While setting up a CICD pipelines for Kubernetes application, first, we need to figure out how we're gonna roll out the deployment. Like, what will be the controller for the deployment, uh, whether we are going to manage it via some of the third party applications like Argo City where it will do the job. Once that no. We need to use the package manager to package all the helm, uh, sorry. Package all the manifest of Kubernetes. Like, to deploy a, uh, to deploy a part, basically, we need all the config maps, secrets, whatever the secrets, code, database. A lot of things are there. Right? And to make it work, we need to package all the service template, all the manifest of a Kubernetes into a one package manager. So we probably have to use Helm to package all your Kubernetes manifest. And then, uh, then then you need to plan it out how you're gonna roll it. Like, what will be the controller for the, uh, deployment. Once this is done, uh, then, yeah, you can, uh, create a CICD, uh, basically, uh, from if you're coming from this scratch, then you can kill the CI by creating a email, then publishing the image, tagging the image correctly with the stages, whatever the stages we have. Make sure, uh, the the published image, uh, the push image that we are pushing to our registry that are able to put, uh, in the deployment, uh, file that are able to pull. So and then and then and then once the CI is good, uh, make sure we have a CD for the same where we will deploy the workload on the Kubernetes by using either by Argo CD or any third party controller or directly by using help. Uh, Yeah. Oh, uh, wide setting up. Uh, Uh, what can be the key component there? Basically, I think about this key component. What's the thing is? Uh, what key component is legitimate? What do you mean with this question? Like, am I on the right track? Uh, this does this question makes, uh, does this question seek another round another kind of answers? I'm not able to fully, uh, fully give you the proper answer because this question is little bit vast. So, uh, if you're talking about the manifest, like, what are the manifest that we need? Contain maps, secrets, uh, we need service, deployment. Right? Uh, rule, Prometheus rule, PDB, part reception budget, uh, replica set, all this stuff. And if you want to design the CICD pipeline for this, then you can write a GitHub workflow where we pull the images from the CI, uh, configure the credentials for the Argo CD or whatever Helm or Kubernetes, yeah, waiting up and deploy. If you want to use the Helm, then just use the Helm command, like up deploy Helm upgrade, Helm install, first install it, then upgrade it depending upon the revision version. Or if you want to go for the Argo CD, then just create a new repository where you can push the state of the file and just point that state to the, uh, uh, to the algo city. Basically, it's a gate of city. So it depends, like, how you wanna design the CI part and the CD part. Uh, I hope this answered your questions.

Uh, while setting the network policies, first of all, we need to think how the port gonna communicate to the another port. Uh, how we're gonna maintain the authentication if needed? How we're gonna authorize, like, particular part? Can one leak access to the particular workload or reports? So, uh, you can design like that. You you we can do, uh, network policies in a multiple level. So first, we need to think, like, which level we are targeting to set up the network policy, whether it's a global level or a namespace level. And then whether we are planning to use for the authentication or for the authorization, We need to check this factor. Like, what is the purpose of this network policies? And then, uh, we can check, uh, we can also check, like, what are the, uh, uh, what are the card operations or, uh, or what are the services or endpoint? What are the, uh, methods you want to give access to the particular part. So these are all the network, uh, policies. And coming to a global part, you can check, uh, consider the factor, uh, what what access should I allow in my Kubernetes cluster you want to do at the global level. So if you want to suppose if you want to block any traffic from the particular, uh, source, uh, source IP, then you can allow a network you can set up the network policies at a global level that, uh, from this IP, you can just exclude the exclude the you just block the traffic from the source IP. So yeah. That's it. Yeah. Uh, right now, I can think whatever I ask.

What are the benefits of using Helm charts in Kubernetes and how we do manage the dependencies? Correct. So help benefits of help is that it will allow you to package or bundle all the Kubernetes manifest in one single in one single file. Basically, one single template, we can call it as a file or a template, whatever. So it will allow you to bundle all the different different manifest into a one single bundle. It will allow you to deploy that one single bundle bundle using some help command. So you can install it in the, uh, you can install it. You can also make it item potent because of values file. You can give, uh, you can just create a generic chart and give different values depending upon the application requirement, how to manage their dependences. Basically, what kind of dependences you want to manage. Uh, it depends on that. Like, if you want to down suppose your application, uh, just need, uh, need it, uh, just, uh, your application need database they need a database or they need Nginx, then in the chat.chat.yml, uh, file configuration, you can define the differences. Like, this chart is dependent on this, and this chart is dependent on this. So if you define the chart dot and the dependencies of, uh, of that, uh, uh, of that chart, then it will download that chart and, uh, spin up. Like, when you install it or when you upgrade it, then it will install it and make sure all the dependencies are installed, uh, while installing the applications or while managing the application using the help. So there are a lot of benefits of, uh, using help to manage the applications. I can think of I can think of bundling it. It is very idempotent. Right? You can, uh, have a multiple version of a chat running. It is very, uh, easy for the rollback. You can rollback to any of the versions if you need. So, yeah, these are the perks of using help.

And then as a container or or case stressor system, Kubernetes relies on what underlying technologies to isolate and manage containers. So, yeah, this is quite good. So this, uh, if you go to the history, then this Kubernetes application first developed by the Google. Uh, they use at a bar bar code application they use for to they they use to manage their use in prime the containers. Uh, if you will talk about the architect, like, if we're not talking about architecture and if we want to go for the inside, like, what it inside use, then they are following the slave architecture. Kubernetes just follow the master slave architecture, where master plane is the control cleaner where there were a lot of components managing the managing the life cycle of that. Look. It's an inside the Kubernetes cluster. So there is a component called cellular. There is a node controller. There is a, uh, ETCD database. The, uh, the job of a cellular is to just, uh, cellular parts in a particular node. Okay? The the job of a node controller is to is to, uh, provide the node and and may maintain the communication between the nodes. Okay? The, uh, ETCB database is their database where it store all the cluster information. Like, what is the current state of the cluster, it is stored in a ETCB database. And from the cube CTL client, we just query it with the information, uh, from the database. So there is a client called cube CTL. There are a lot of flowing component, uh, in the g t to orchestrate your applications. There is also we can call up a lot of controllers. Like, it's a admission controller. There is a replication controller. There is a admission hook controller. There is a lot of single single component that is, like, combining as a unit and providing the functionality to orchestrate the system. And what I can think of, um, what I can think yeah. There is a there is a few components of the node instead of, uh, uh, talking about the master plan or a control plan. There is a few components on the node that will help us help us route the correct traffic from the node that like q proxy. It will maintain it will, uh, make sure the communication is happening between the notes and the ports. Right? So, yeah, there are a lot of, uh, moving components that is working as a unit to support this ark artist's presence system.

So in Kubernetes, there is call, uh, there is 1, uh, one Kubernetes resource called secrets. We can manage the secrets using that, uh, Kubernetes secrets, uh, secret resource around SIP. There is a Kubernetes object called secret. We can manage the secret. But it just store the date, uh, the secret, and only base 64 format. Uh, base 64 encoded. Like, if even unfortunately, if any attackers got the, uh, got ins got the access to our Kubernetes or just dig inside or just hack the Kubernetes cluster, then they will got the access to all the secrets. Right? And that is a simple base 64 encoded. Like, they can, uh, decode it right away by using the base 64 default command. So I don't, uh, think this is a better way to manage the secrets. To better manage the secret, we have to rely on the 3rd party, third party applications. Uh, recently recently, we are using a VoIP. So VoIP will provide a VoIP is a very good mechanism to handle the secrets. It will handle the static redincers or dynamic redincers. So we can use, uh, uh, we can leverage the functionality of VoIP to manage the secrets secrets. Uh, VoIP have a lot of more good functionalities, just like dynamic secret quotations. They will do the secret quotation. It will provide the functionality to not store the secrets in not expose the secrets on the GK level, but it will store in its own void, uh, in encrypted format wherever you want to store when whatever the back end is. And then you can just, uh, inject a sidecar container to get the value from the from the vault. That's how it works, uh, for the dynamic banking vault. We can call it a dynamic vault. So, uh, to manage the to manage the secrets efficiently, we have to rely on the third party applications. I am much expensive in the Hasikov board, but I I think there are a lot more services, uh, lot more, uh, managed services provided by big providers of our Google uh, sorry, cloud. We can leverage the functionality using a w we can use AWS Secret Manager also, and we can use, um, uh, in GCP, we can use, uh, there is also a services. I might be not remembering the name. But, yeah, definitely, we can we have to leave this, uh, the 3rd party uh, 3rd party application rely on the 3rd party application to perfectly manage the secrets inside the COVID disclosure.

How do you approach a performance testing for deployment in Kubernetes? How did you keep on capacity planning? Yeah. Uh, so this is a very good question. How do you approach performance testing? So to to do the performance testing of any of the applications to generate the load, we had to have to, like, also rely on the open source tool. Like, we can use testing to generate the load and to do the performance testing on the application level. Uh, there is a tool called k nine which will eventually help you test the application running inside the Kubernetes cluster. This approach, like testing performance testing will eventually give you a depth inside how you how you manage the resource inside the Kubernetes cluster. It is very important to, uh, know about the know about your environment, like, know about the environment, know about how the application is behaving. We have to know, like, how the application is behaving. Right? What are the accurate amount of, uh, memory or CPU that we need to allocate? What are the men accurate amount of resource that we need to allocate for a particular part? If we are overprovisioning, then we are losing. If we are over, uh, if we are downprovisioning, then also we are losing because, eventually, it will impact the performance of the application or even, uh, or if there is a memory class, then it will just kill your application. So I believe performance, uh, I believe canine canine tool is there, which we need, which will help to generate the load and do the load testing on the any of the applications. And this will predict and also help in proper planning of the resource that we're gonna that we're gonna allocate to the parts or any of the application that is running inside the Kubernetes. So, uh, I I think this is very important part to, uh, to do the performance testing to get to get know about applications running in the GT cluster, like, running in the Kubernetes cluster. And to set the appropriate amount of request, uh, amount of resource on the application is very, very, very much, uh, likely very much important because it will it will save the cost, uh, for you. It will, uh, not allow to over provision the resource. Uh, it will give you the high, like, uh, good, uh, good example. It will it will give you the very good, uh, good amount of debugging knowledge. Like, if there is any memory leak in the applications And if you have fine tuned the configuration, if you have done the, uh, performance testing, if you have already fine tuned the configuration, then you can basically know if it is the issue of a application where that is leaking, when there is a memory leak, when there is a CPU leak, or it's a bug in the code that we have to fix. So eventually, yeah, uh, there are a lot, uh, a lot of, uh, lot of, uh, what do you how do you approach both So it will yeah. Uh, if we do a proper performance testing, they'll, uh, then it will help us understand better about the

Bibek Rauniyar

Senior DevOps Engineer

7.6 years

View here

Skillsets

Vetted For

Professional Summary

Applications & Tools Known

Work History

Senior DevOps Engineer

Site Reliability Engineer

DevSecOps Consultant

DevOps Consultant

Senior System Engineer

Achievements

Major Projects

Mastering Cloudflare Ruleset Engine with Terraform

AI-powered job matching platform

Reusable Terraform modules

Education

B.Tech: Mechanical Engineering

Certifications

Aws certified solutions architect associate saa-c02

Az-400: designing and implementing microsoft devops solutions

AI-interview Questions & Answers