
Director Product Development, Keenai Global
Principal Technical Product Manager, Juspay Technologies
Practice Lead, FICO INDIA PVT LTD
Technical Manager, Indecomm Technology Pvt Ltd
Data Architect, TEK SYSTEMS INDIA (CISCO INDIA)
Apache Spark

Pyspark

Kubernetes

AWS

Redis
Informatica

Prometheus

Clickhouse

Puppet
Jenkins

Kibana
Hi. I have 19 years of experience in data architecture, building big data pipelines using Spark and Hadoop technologies. I have primarily worked in a technical leadership capacity, building teams from scratch, and I have also worked across SaaS platforms, payment systems, full-stack applications, and big data technologies.
Hi. I have 19 years of experience, primarily in technical leadership roles, managing medium to large teams. I have worked across various areas: managing end-to-end SaaS applications on full-stack technologies, and managing big data platforms using Spark and Hadoop technologies. I have also worked across multiple domains, including payment systems, the BFSI domain, and compliance projects.
You can collect all the logs generated from the multiple Talend jobs into a Kafka system, a Kafka queue. From Kafka, you can have different consumer services that read those logs and index them into Elasticsearch, and you can build Kibana on top of Elasticsearch to view all the log metrics.
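For illustration, a minimal sketch of that consumer step, assuming a "job-logs" Kafka topic and a local Elasticsearch cluster (both names are illustrative), using the kafka-python and elasticsearch client libraries:

```python
import json

from kafka import KafkaConsumer
from elasticsearch import Elasticsearch

consumer = KafkaConsumer(
    "job-logs",                                   # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
es = Elasticsearch("http://localhost:9200")

for message in consumer:
    # Each message is one log record produced by a job; indexing it makes it
    # searchable and lets a Kibana dashboard sit on top of the index.
    es.index(index="job-logs", document=message.value)
```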
Well, if you have a requirement where you want to process both batch and streaming data, that is a case where you should go with a Lambda-style architecture, which can handle both batch data processing and streaming applications.
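A minimal PySpark sketch of the two Lambda layers, assuming events land both in an object-store path (batch layer) and on a Kafka topic (speed layer); the paths and topic name are illustrative, and the Kafka source needs the spark-sql-kafka connector package on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lambda-demo").getOrCreate()

# Batch layer: periodic, complete recomputation over the historical data.
batch_counts = (
    spark.read.json("s3a://events/history/")          # hypothetical path
    .groupBy("event_type")
    .count()
)
batch_counts.write.mode("overwrite").parquet("s3a://events/batch_views/")

# Speed layer: low-latency incremental view over the same events from Kafka.
stream_counts = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")                     # hypothetical topic
    .load()
    .select(F.get_json_object(F.col("value").cast("string"), "$.event_type").alias("event_type"))
    .groupBy("event_type")
    .count()
)
query = (
    stream_counts.writeStream.outputMode("complete")
    .format("memory")
    .queryName("realtime_counts")                      # serving-layer stand-in
    .start()
)
# In a real job you would call query.awaitTermination() and merge the batch
# and real-time views at query time.
```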
Well, Spark comes with its own machine learning library, MLlib, which can be integrated into a Spark pipeline, including one running on AWS.
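A small illustrative example of MLlib used from PySpark, with a tiny in-memory DataFrame and made-up column names:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

train = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.5, 0.5, 1.0), (0.1, 0.9, 0.0)],
    ["f1", "f2", "label"],
)

# Assemble the raw columns into a feature vector, then fit a classifier.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(train)

model.transform(train).select("features", "label", "prediction").show()
```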
Well, you can have a job that compares the data in RDS against the data you have in S3. So you can have a CDC job and a threshold: you collect record counts on both sides and compare them. If the count of records in RDS is higher than what you have in S3, that means the data is not in sync between RDS and S3, so you have to run the CDC job to sync up the data. And while this happens, you also need to generate logs and metrics through which you can monitor the entire system.
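A rough PySpark sketch of that count reconciliation, assuming a Postgres RDS instance reachable over JDBC (driver jar on the classpath) and a Parquet copy of the same table in S3; the connection details, table, and path are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rds-s3-reconcile").getOrCreate()

rds_count = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://my-rds-host:5432/mydb")  # hypothetical
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
    .count()
)
s3_count = spark.read.parquet("s3a://my-bucket/orders/").count()  # hypothetical

# Emit a simple metric and decide whether a CDC sync is needed.
lag = rds_count - s3_count
print(f"rds_count={rds_count} s3_count={s3_count} lag={lag}")
if lag > 0:
    print("RDS is ahead of S3; trigger the CDC sync job.")
```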
You can do a completeness check to ensure that the CSV file sitting in the source system is complete before you write the data into the HDFS system.
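One possible sketch of such a completeness check, assuming the source system drops a small manifest file with the expected row count next to the CSV; that manifest convention and the file names are assumptions for illustration:

```python
import csv
import json

def csv_is_complete(csv_path: str, manifest_path: str) -> bool:
    # The manifest records how many rows the source system intended to send.
    with open(manifest_path) as f:
        expected_rows = json.load(f)["row_count"]
    with open(csv_path, newline="") as f:
        actual_rows = sum(1 for _ in csv.reader(f)) - 1  # minus header row
    return actual_rows == expected_rows

if csv_is_complete("orders.csv", "orders.manifest.json"):
    print("File is complete; safe to load into HDFS.")
else:
    print("File is incomplete; skip the load and alert.")
```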
You can design the CI/CD pipeline like this: you can have a Jenkins pipeline on a Docker-based system. After a change is committed to Git, the source code repository, you trigger the pipeline to build, run unit tests, and create a Docker image. That Docker image you can upload to a Docker registry, from which it can be deployed onto the target system.
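As a stand-in for those Jenkins stages, here is a hedged Python sketch that runs the same steps (tests, image build, push) through the Docker CLI; the image name, registry, and test command are illustrative, and a real pipeline would express these as Jenkins stages instead:

```python
import subprocess

IMAGE = "registry.example.com/my-app:latest"   # hypothetical registry/image

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)             # fail the pipeline on any error

run(["python", "-m", "pytest", "tests/"])       # stage 1: unit tests
run(["docker", "build", "-t", IMAGE, "."])      # stage 2: build the image
run(["docker", "push", IMAGE])                  # stage 3: publish to registry
# stage 4: deployment would pull IMAGE on the target system, e.g. via
# `docker run` or a Kubernetes rollout.
```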
You can use partition pruning so that you are not scanning the entire table; you read only the partitions that hold the data you are interested in.
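A short PySpark illustration of partition pruning, assuming the table was written partitioned by a dt date column; the path and column name are made up:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pruning-demo").getOrCreate()

events = spark.read.parquet("s3a://my-bucket/events/")  # partitioned by dt=...

# Filtering on the partition column lets Spark read only the matching
# dt=2024-01-15 directory instead of scanning every partition.
one_day = events.filter(events.dt == "2024-01-15")
one_day.explain()   # the physical plan shows PartitionFilters on dt
one_day.show()
```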