
Sr AI Engineer, Logility
Technical Lead ML, Encora
Senior ML Engineer, Acquia
Software Developer, Tech Mahindra
ASMTS & Team Member, Tibco
Senior Consultant, Capgemini
Airflow, MLflow, Kubernetes, Snowflake, GitHub Actions, SonarQube, Checkmarx, Git, Confluence, Rally, Docker, ArgoCD, Helm Charts, Flask, Django, FastAPI, Elasticsearch
Airflow, MLflow, AWS, Azure, ArgoCD, Checkmarx, Rally, Datadog
Sure. I am Sashank Delwal. I have 11 years of experience in the IT industry, including 8 years with Python and 5 years in machine learning and machine learning operations (MLOps). I work on classical ML algorithms and deep learning, and I am now focusing more on generative AI, LLMs, prompt engineering, LangChain, and related areas. That was a short introduction about me. Thank you.
How can you optimize resource allocation in a Kubernetes cluster running heavy Python-based machine learning workloads without overprovisioning? First, right-size resource requests and limits: set appropriate CPU and memory requests and limits for the pods, and keep the number of nodes within sensible bounds. Second, use node affinity and taints/tolerations: node affinity schedules ML workloads on nodes with specific characteristics, for example GPU nodes or high-memory nodes, while taints and tolerations control which pods can be scheduled on certain nodes, helping to isolate the ML workloads from other, less critical workloads. Third, use autoscaling: the Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas based on CPU or memory usage, so the application can handle varying loads without manual intervention, and the Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster based on resource requests, scaling up to accommodate increased workloads and scaling down to save cost when demand decreases. Next, set resource quotas at the namespace level to control the aggregate resource consumption of all pods within a namespace; this prevents resource starvation and ensures fair resource distribution. For efficient resource utilization, use spot instances for non-critical or batch ML workloads, and make sure GPU jobs are actually utilizing the GPU resources. On top of all this, add monitoring and logging so we can continuously watch usage and raise alerts if we see any hiccups.
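As an illustration (not part of the original answer), here is a minimal sketch of the pod-level settings using the official `kubernetes` Python client. The image name, taint key, node label, and resource figures are placeholder assumptions.

```python
from kubernetes import client

# Container with explicit requests/limits so the scheduler can place it without overprovisioning.
container = client.V1Container(
    name="ml-inference",
    image="ml-inference:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "500m", "memory": "2Gi", "nvidia.com/gpu": "1"},
        limits={"cpu": "2", "memory": "4Gi", "nvidia.com/gpu": "1"},
    ),
)

# Tolerate the taint on a dedicated ML node pool and pin pods to GPU nodes via a node label.
pod_spec = client.V1PodSpec(
    containers=[container],
    tolerations=[
        client.V1Toleration(key="workload", operator="Equal", value="ml", effect="NoSchedule")
    ],
    node_selector={"accelerator": "gpu"},  # assumed node label
)
```

This spec would then be wrapped in a Deployment and applied with the AppsV1 API, with an HPA targeting that Deployment for replica scaling.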
Propose a logging strategy for a Python machine learning application on Kubernetes that balances visibility against storage considerations. That is actually a very good question. First, log levels: we have DEBUG, INFO, WARNING, ERROR, and CRITICAL, and we have to define these levels and use them wisely so we do not pay to store noise. Second, logging configuration: use Python's built-in logging module to configure the log levels, formats, and handlers. Third, structured logging: use structured logging to make the logs more readable and easier to parse; libraries like python-json-logger can be used to format the logs as JSON. Then we need a centralized place to store the logs. One option is to create a PersistentVolumeClaim, so that even if the cluster scales up and down and nodes come and go, the logs persist. The other option is to ship the logs to a monitoring tool like Datadog; then we do not even have to run an ELK stack ourselves, and log aggregation works well out of the box.
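A minimal structured-logging sketch, assuming the python-json-logger package is installed; the logger name and extra fields are illustrative.

```python
import logging
import sys

from pythonjsonlogger import jsonlogger

# Emit JSON logs to stdout so the container runtime / log shipper can collect them.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(jsonlogger.JsonFormatter("%(asctime)s %(name)s %(levelname)s %(message)s"))

logger = logging.getLogger("inference-service")  # illustrative name
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # keep DEBUG off in production to control log volume

logger.info("prediction served", extra={"model_version": "v3", "latency_ms": 42})
```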
What are the steps to containerize a Python-based machine learning inference service using Docker? First, we should have four things: the source code folder, which also contains the model file; the unit test suite folder; a requirements.txt file; and a Dockerfile. Why the test suite? Because before every Docker build we should run the unit tests to catch problems in the code; this reduces wasted dev cycles. Then we create the Dockerfile: start FROM a base image such as python:3.10-slim, or whatever the requirement is; set a WORKDIR; COPY all the code into it; RUN pip install -r requirements.txt; EXPOSE whatever port we want to expose for the app; set any environment variables we need; and finally put the command that runs the application in CMD, inside brackets with each word in quotes. Then we build the image with docker build -t followed by whatever repository name and tag we want to give. Then we run a container from that build locally and test the inference service; the first check was the unit tests, the second is this local run. Once everything is working, we push the Docker image to whichever registry we use, and then we can have Helm charts to deploy it; in the Helm chart we mention the registry URL and the tag of the image. Those would be the steps.
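For context, a minimal sketch of the inference service that would be containerized, assuming Flask as the web framework, a hypothetical pickled scikit-learn-style model at model/model.pkl, and port 8080 as the exposed port.

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical path; the model file ships inside the image alongside the source code.
with open("model/model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]        # e.g. a list of numeric features
    prediction = model.predict([features]).tolist()  # assumes a scikit-learn-style model
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # port should match the EXPOSE in the Dockerfile
```

The Dockerfile's CMD would then start this module, for example with gunicorn or plain `python app.py`.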
What approach would you take to troubleshoot performance bottlenecks in a Python-based machine learning API running on Kubernetes? Again, a very interesting question, and one where real experience helps; let me try. The first thing is to collect metrics and locate the bottleneck; we cannot go and fix things blindly. For that we set up monitoring, for example Prometheus with Grafana, or Datadog, and we can also enable the Kubernetes metrics-server so we have resource-usage stats. What should we monitor? Pod metrics such as CPU and memory usage, number of restarts, and resource requests versus limits; node metrics, that is, overall resource usage across the cluster; and custom application metrics such as model inference time, number of concurrent requests, and request latency. Then we analyze resource utilization at the pod and node level, and we investigate the logs as well: ship them to Datadog, or have an ELK setup where we can aggregate and analyze them, try to find patterns in the logs and in the metrics, and build a story out of it. We should also profile the application. Once all of this is in place, we run load testing; if we find an issue, we try to reproduce it, and once it is reproducible we pinpoint the bottleneck using everything described above. Based on that we optimize the code and the configuration, whichever is required, adjust resource requests and limits if we are under- or over-provisioned, and tune the network and storage configuration as needed.
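A small sketch of the custom-metrics piece, assuming the prometheus_client package; the metric names, port, and the dummy predict function are illustrative.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; scraped by Prometheus and graphed in Grafana or shipped to Datadog.
INFERENCE_LATENCY = Histogram("model_inference_seconds", "Time spent in model prediction")
REQUESTS_TOTAL = Counter("inference_requests_total", "Number of inference requests")

@INFERENCE_LATENCY.time()
def predict(features):
    REQUESTS_TOTAL.inc()
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for the real model call
    return [0]

if __name__ == "__main__":
    start_http_server(9100)  # exposes /metrics for the Prometheus scraper
    while True:
        predict([1.0, 2.0, 3.0])  # keeps generating data so /metrics has something to show
```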
Outline the process for converting stateless machine learning APIs in Python into stateful services on Kubernetes for complex processing needs. First, define the state requirements: determine what state information has to be maintained across requests, for example user sessions, intermediate features shared between API calls, or cached models. Then decide on state management: how and where the state will be stored, whether in a database, locally on a volume, or in an in-memory store. Accordingly, modify the API to handle state: integrate a state-storage mechanism such as Redis or MongoDB into the API to store and retrieve this state information. Then update the Docker configuration if we have added any new dependencies. We also have to implement the stateful logic in the application itself, and finally deploy the stateful service to Kubernetes, for example as a StatefulSet backed by persistent volumes. I am going a bit fast because my previous answer got submitted before I finished, but that is pretty much it.
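An illustrative sketch of the state-handling change, assuming Redis as the state store; the host, port, key scheme, and session_id parameter are placeholders.

```python
import json

import redis

# Placeholder connection details; in Kubernetes this would point at a Redis Service.
state_store = redis.Redis(host="redis", port=6379, decode_responses=True)

def save_intermediate_features(session_id: str, features: dict, ttl_seconds: int = 3600) -> None:
    # Keep per-session state outside the pod so any replica can serve the next request.
    state_store.setex(f"session:{session_id}:features", ttl_seconds, json.dumps(features))

def load_intermediate_features(session_id: str) -> dict | None:
    raw = state_store.get(f"session:{session_id}:features")
    return json.loads(raw) if raw else None
```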
Please identify the problems in the Dockerfile shared that could potentially break builds when leveraging the cache layers. The first issue I can point to is the order of the instructions. We are first initializing from the base image, which is fine, but the RUN pip install is coming too early; it cannot be the first thing after that. What we have to do is: initialize from the base image, then create the working directory (WORKDIR /app), then COPY the requirements file and other needed files into /app, and only then run the install, so the dependency layer stays cached and is only invalidated when the requirements change. Also, I do not think we should use ADD here; we should use the COPY command instead. Those are the issues.
For this, we first have to ensure the environment is properly set up with the necessary resources and tools, and define a testing mechanism for it. We also have to decide what to cache, that is, which part of the workload actually benefits from caching. Then we choose the caching technology: we can use Redis or we can use Memcached. If we need to store rich in-memory data structures, Redis is the widely used choice; if we just want distributed in-memory object caching, we would go with Memcached. Because this is Kubernetes and the need is plain object caching, one could lean towards Memcached, but for Airflow-style setups, for example, we use Redis, and we can deploy it on the cluster as well. It really depends on the use case; you cannot give a blanket answer that you should always use one or the other.
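A small sketch of the Memcached option, assuming the pymemcache client; the service name, port, cache-key scheme, and compute_features function are hypothetical.

```python
import hashlib
import json

from pymemcache.client.base import Client

cache = Client(("memcached", 11211))  # placeholder Service name and port

def compute_features(raw: dict) -> dict:
    # Stand-in for an expensive feature-engineering step.
    return {"sum": sum(raw["values"])}

def cached_features(raw: dict, ttl_seconds: int = 600) -> dict:
    # Key the cache on a hash of the input so identical requests reuse the stored result.
    key = "features:" + hashlib.sha256(json.dumps(raw, sort_keys=True).encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    features = compute_features(raw)
    cache.set(key, json.dumps(features), expire=ttl_seconds)
    return features
```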
Design a fault-tolerant and highly available solution for a Kubernetes cluster that serves Python-based machine learning models for critical real-time applications. So it is a Kubernetes cluster serving Python-based ML models. First, set up the cluster itself for availability: use a multi-zone cluster, deploying it across multiple availability zones to ensure resiliency and high availability, so that if one AZ goes down the others can continue to serve. Use the Cluster Autoscaler to automatically adjust the size of the cluster based on resource usage and workload demand. Second, deployment and service configuration: deploy with multiple replicas of the ML model-serving pods to ensure redundancy, use the HPA, and configure a Service that load-balances traffic across the replicas. Next, storage and data management: use PVCs and persistent storage for stateful components such as databases or artifacts, and distributed storage solutions such as Amazon EFS, Google Cloud Filestore, or Azure Files for high availability. Then monitoring and logging. We should also think about disaster recovery: create backups of the critical data regularly, including model artifacts, configuration files, and databases, with an automated setup that takes care of this on a regular basis. Then load balancing and traffic management: use an ingress controller such as NGINX or Traefik to manage external access to the services and provide load balancing. Finally, security and compliance: define role-based access control and network policies to control access to resources and ensure secure communication between the pods.
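One small piece of this lives in the Python service itself: health endpoints for the liveness and readiness probes, so Kubernetes only routes traffic to replicas that can actually serve predictions. A sketch assuming Flask and a hypothetical model_loaded flag.

```python
from flask import Flask, jsonify

app = Flask(__name__)
model_loaded = False  # hypothetical flag, flipped to True once the model is in memory

@app.route("/healthz")
def healthz():
    # Liveness probe: the process is up and able to answer HTTP.
    return jsonify({"status": "ok"})

@app.route("/ready")
def ready():
    # Readiness probe: only accept traffic once the model is loaded, so a new replica
    # is not put behind the Service before it can serve real predictions.
    if model_loaded:
        return jsonify({"status": "ready"})
    return jsonify({"status": "loading"}), 503
```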
Prioritize the key considerations when selecting AWS cloud services for deploying a scalable Python machine learning application. When selecting AWS services for a scalable ML application, we should consider the following. First, latency: it should be low. Second, scalability: we can use Amazon EC2 Auto Scaling, or AWS Lambda for a serverless, event-driven architecture, or EKS, where the cluster and pods scale up and down. Then high availability and fault tolerance: for a database we can use Amazon RDS Multi-AZ, or Amazon S3 for object storage; we should also use Amazon Route 53 for a highly available, scalable DNS service to route traffic, and ELBs to distribute the incoming traffic across multiple targets. Then performance optimization: choose EC2 instance types according to the workload, whether compute-optimized or memory-optimized; use Amazon FSx for Lustre if we need a high-performance file system for high-speed processing of large datasets; and if we want to improve the global availability and performance of the application, introduce AWS Global Accelerator. Then come data management and storage, and then security and compliance, the same as in the previous answer: define IAM roles, use KMS, AWS Shield, and WAF, and for monitoring all of this introduce AWS CloudTrail, which records all the API calls so we can audit them. And the important thing is cost efficiency: whatever we are doing, check whether we are over-provisioning or under-utilizing, and for batch workloads we can use Spot Instances.
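As an illustration of the data-management piece, a minimal boto3 sketch that stores and fetches a model artifact in S3; the bucket name, key scheme, and model path are placeholders.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "ml-model-artifacts-example"  # placeholder bucket name

def upload_model(local_path: str, version: str) -> str:
    key = f"models/churn/{version}/model.pkl"  # illustrative key scheme
    s3.upload_file(local_path, BUCKET, key)
    return key

def download_model(key: str, local_path: str) -> None:
    # Pulled at pod start-up so every replica serves the same versioned artifact.
    s3.download_file(BUCKET, key, local_path)
```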
Elaborate on the role of MLflow in simplifying the management of Python machine learning model life cycles on a Kubernetes-based platform. We use MLflow for tracking our models. Whenever we train a model, we track the run with MLflow: we log the model itself, the metrics, and all the artifacts around it, such as the plots we generate. Once we have done multiple experiments, we can go to MLflow and compare these runs side by side, with all the graphs and metrics we have stored, and whatever model we choose, we use the API to register it in the model registry. Once a model is registered, we can create an endpoint: we host that model on the Kubernetes-based platform and expose an inference endpoint. Then, while the endpoint is in use, we generate stats around it for monitoring, push those stats to a monitoring tool like Datadog, and compare against the training baseline to test for data drift or concept drift. If there is drift, we go for another training round and the whole process repeats. So the whole life cycle can be managed using MLflow.
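A minimal MLflow tracking sketch, assuming a scikit-learn model and an MLflow tracking server reachable at a placeholder URI; the experiment and registered-model names are illustrative.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://mlflow.example.internal:5000")  # placeholder URI
mlflow.set_experiment("churn-model")  # illustrative experiment name

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("accuracy", acc)
    # Log the model and register it so the chosen run can be promoted and then served
    # from the Kubernetes-based platform.
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-classifier")
```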