
Bharath Shroff

Vetted Talent
Results-driven professional with 6 years of experience in AI, data science, and software engineering, consistently leveraging cutting-edge technologies to drive innovation. Proven expertise in automating financial and data processes, building scalable solutions, and delivering actionable insights for global stakeholders. Skilled in AI/ML, Python, RAG, Next.js, and cloud platforms such as Databricks and Azure. Adept at enhancing decision-making through advanced analytics, end-to-end application development, and agile methodologies, with a strong foundation in project management and client-focused solutions.
  • Role

    Data Scientist

  • Years of Experience

    6 years

  • Professional Portfolio

    View here

Skillsets

  • RESTful APIs - 5 Years
  • React JS - 2 Years
  • Scala - 1 Year
  • Next JS - 1 Year
  • Selenium - 2 Years
  • MLOps - 1 Year
  • LLMs - 1 Year
  • K-Means - 1 Year
  • Backend - 2 Years
  • Financial Reports - 1 Year
  • Node JS - 1 Year
  • Power BI - 2 Years
  • MySQL - 5 Years
  • Git - 4 Years
  • RAG
  • Data Engineering - 5 Years
  • Tableau - 1 Year
  • Reporting - 3 Years
  • Relational Databases - 5 Years
  • PyTorch - 1 Year
  • Python - 6 Years
  • SQL - 5 Years
  • PySpark - 5 Years
  • Cloud - 1 Year
  • Databricks - 5 Years
  • Odoo
  • Big Data - 5 Years
  • MLflow - 1 Year
  • JavaScript - 4 Years
  • React Native - 1 Year
  • Finance - 1 Year
  • AI - 3 Years
  • Data Warehousing - 5 Years
  • Azure - 2 Years

Vetted For

10 Skills
  • Python Developer (AI/ML & Cloud Services) - Remote (AI Screening)
  • Result: 66%
  • Skills assessed: GCP/Azure, Microservices, Django/Flask, Neo4j, RESTful APIs, AWS, Docker, Kubernetes, Machine Learning, Python
  • Score: 59/90

Professional Summary

6 Years
  • Aug, 2024 - Present 1 yr 2 months

    Contract Data Scientist

    MCSquared AI
  • Aug, 2024 - Oct, 2024 2 months

    AI Innovation Specialist - Finance

    Trilogy
  • May, 2022 - Jul, 2024 2 yr 2 months

    Full Time Data Scientist

    MCSquared AI
  • Aug, 2021 - Apr, 2022 8 months

    Full Stack Developer Volunteer

    Isha Foundation
  • Jun, 2019 - Jul, 2021 2 yr 1 month

    Associate IT Consultant

    ITC Infotech
  • May, 2018 - Jul, 2018 2 months

    RnD Intern

    DELL EMC
  • May, 2016 - Jul, 2016 2 months

    RnD Intern

    Computer Institute of Japan

Applications & Tools Known

  • Odoo
  • Apache
  • NumPy
  • WordPress
  • Palantir Foundry
  • Databricks
  • Azure Data Factory
  • Power BI
  • Next JS
  • LangChain
  • React Native
  • Git
  • DevOps
  • Selenium
  • PowerShell
  • Scala
  • Kaggle
  • Scrapy
  • SVM
  • Naive Bayes
  • Tkinter

Work History

6 Years

Contract Data Scientist

MCSquared AI
Aug, 2024 - Present 1 yr 2 months
    Led the team in building a Databricks pipeline feeding a map-view dashboard of lead-proximity hotspots around business-provided site locations, leveraging the Bing Maps API and third-party real-world data sources such as Citeline, HealthVerity, and IQVIA.

AI Innovation Specialist - Finance

Trilogy
Aug, 2024 - Oct, 2024 2 months
    Derived financial insights via an LLM chatbot built with a React frontend and an Express JS backend; the chatbot updated the RAG vector DB on new file uploads, reducing manual analysis time by an hour.

Full Time Data Scientist

MCSquared AI
May, 2022 - Jul, 2024 2 yr 2 months
    Deployed a machine learning survival model to production on Databricks, replacing the previous XGBoost model. Built on the medallion architecture, the pipeline retrains itself monthly on new data and automatically archives models or promotes them to production via champion-model comparison, using MLflow for model versioning and the C-score to evaluate model performance.

Full Stack Developer Volunteer

Isha Foundation
Aug, 2021 - Apr, 2022 8 months
    Developed a web application using the open-source, Python-based Odoo framework, streamlining processes and digitizing multiple forms that hundreds of visitors previously had to fill in by hand, saving hours of work for both visitors and staff.

Associate IT Consultant

ITC Infotech
Jun, 2019 - Jul, 2021 2 yr 1 month
    Deployed end-to-end modules using Git-based DevOps for continuous deployment across the four stages (DEV -> QA -> UAT -> PROD), ensuring seamless transitions and operational efficiency for MLOps.

RnD Intern

DELL EMC
May, 2018 - Jul, 2018 2 months
    Developed Python scripts for automated reporting, flagging approximately 100 high-priority reports daily and improving the efficiency of report management.

RnD Intern

Computer Institute of Japan
May, 2016 - Jul, 2016 2 months
    Helped improve the accuracy of multi-class email classification, achieving 70%+ accuracy.

Achievements

  • Football Secretary (IIT Hyderabad)
  • Inter IIT Football Captain
  • Participated in Table Tennis Inter-Departmental / Inter-Year Tournaments

Major Projects

6 Projects

Melanoma Classification

    Achieved an 85% AUC in identifying melanoma using Convolutional Neural Network (CNN) models.

Network Traffic Analysis (ITC Infotech)

Oct, 2020 - Oct, 2020
    Extracted insights by transforming 6 million+ rows of open-source Apache server access logs and visualizing them through plots showing traffic originating from 10 different countries. Done as part of PySpark training.
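
A minimal PySpark sketch in the spirit of this project, assuming logs in the Apache Common Log Format; the file path and regular expressions are illustrative:

```python
# Parse Apache access logs with PySpark and aggregate requests per host.
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = SparkSession.builder.appName("access-logs").getOrCreate()
logs = spark.read.text("apache_logs/*.log")  # hypothetical path

parsed = logs.select(
    regexp_extract("value", r"^(\S+)", 1).alias("host"),
    regexp_extract("value", r"\[([^\]]+)\]", 1).alias("timestamp"),
    regexp_extract("value", r'"\S+\s+(\S+)', 1).alias("endpoint"),
    regexp_extract("value", r"\s(\d{3})\s", 1).alias("status"),
)

# Requests per host: the kind of aggregate behind the country-level plots.
parsed.groupBy("host").count().orderBy("count", ascending=False).show(10)
```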

Machine Learning Library from scratch

Aug, 2020 - Aug, 2020
    Implemented several ML algorithms using only NumPy (plus Python's math library), with the intention of developing a deep understanding of how they work: 3 regression models and 3 classification models, with no other existing libraries, plus 9 normalization algorithms for data standardization.
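
As an illustration of the approach, a minimal NumPy-only linear regression trained by gradient descent (a sketch in the project's spirit, not its actual code):

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.01, epochs=1000):
    """Learn weights w and bias b by minimizing mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        error = X @ w + b - y
        w -= lr * (2 / n) * (X.T @ error)  # gradient w.r.t. weights
        b -= lr * (2 / n) * error.sum()    # gradient w.r.t. bias
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -1.5]) + 0.5 + rng.normal(scale=0.1, size=200)
w, b = fit_linear_regression(X, y)
print(w, b)  # approximately [3.0, -1.5] and 0.5
```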

Image classification of fruits

May, 2020 - Jul, 2020 2 months
    Multi-class classification of fruits from images, using a Kaggle dataset of 90,380 annotated images and pretrained models such as VGG, ResNet, AlexNet, and MobileNet to produce a mobile-deployable model.
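
A hedged sketch of the transfer-learning setup such a project might use, with a pretrained MobileNetV2 backbone; the class count and training details are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 131  # hypothetical number of fruit classes

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
for param in model.features.parameters():
    param.requires_grad = False  # freeze the pretrained backbone

# Replace the classifier head so it predicts the fruit classes.
model.classifier[1] = nn.Linear(model.last_channel, NUM_CLASSES)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```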

Tic-tac-toe Extended 2player

Apr, 2019 - Apr, 2019
    Implementation of an advanced, two-layered version of Tic-Tac-Toe in Python, currently for two human players. I learnt this variant from a friend, with whom I used to play it on the backs of our notebooks, and implemented it as a side project during college; adding an ML opponent is an ambitious future goal.

IITH Main Website

Jan, 2019 - Mar, 2019 2 months
    Built our college website from scratch using WordPress templating, which included integrating content from over 10 departments.

Education

  • Bachelor of Technology, Mechanical Engineering

    Indian Institute of Technology (IIT) Hyderabad (2019)

Certifications

  • Microsoft Certified: Azure Data Engineer Associate (DP-200, DP-201) | Microsoft | 2021

AI-interview Questions & Answers

Hi, my name is Bharath Shroff and I'm from Bangalore, Karnataka. I started my career as an Associate IT Consultant, where my responsibilities were essentially those of a data engineering role, and I worked with two clients. For the first client, I helped build an event-driven Azure Data Factory pipeline: every day a file upload would trigger a sequence of notebooks that took the raw data, applied transformations, generated analytics, and pushed the results to Power BI and Synapse Analytics for consumption by further stakeholders. The second engagement was mainly on Azure Databricks, creating a similar data pipeline.

After that, I worked at Isha Foundation for a considerable time, where I built a website that digitized a very manual process: every visitor to the Isha Yoga Center had to fill in a handwritten form, which used to take hours of work from the team and from the participants. We created a digital profile storing that information, integrated the different activities, such as accommodation and other programs, through their APIs, and built a common website where a visitor can simply book online. For this I used Python and Odoo, an open-source framework, which gave me a lot of full-stack exposure since I developed both the backend and the frontend.

Then I switched to MCSquared, where I worked as a data scientist, again with two clients. The first client had their own data platform, Palantir, where I worked on preparing Contour visualizations, essentially a POC on visualizations the stakeholders would be interested in; that also involved health checks on the data: data monitoring, data-drift monitoring, and similar KPIs. The second client was again on Databricks, and the work involved identifying data vendors to buy data from and, combined with the client's proprietary data, running competitive and other analyses to help grow their business.

In my latest project, the one I'm currently working on, we have built an LLM agent that you can ask questions; it generates SQL queries and fetches the answers from the required database. So yes, it's been a good journey with very varied experiences and tech stacks. Thank you.

How do you instrument and improve the reliability of a distributed task? On AWS, I'm not sure which service is the exact equivalent of Azure Data Factory, but I'm assuming AWS SageMaker would be a close match: it can orchestrate pipelines of notebooks, with AWS Glue jobs containing the actual Python machine learning and data-processing logic, which would help orchestrate and automate the whole pipeline.
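
A platform-agnostic sketch of the instrumentation side: structured logging plus retries with exponential backoff. The task function and its parameters are illustrative, not tied to any AWS service:

```python
import logging
import random
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_with_retries(task, *args, max_attempts: int = 4, base_delay: float = 1.0):
    """Run a flaky distributed task, logging each attempt and backing off on failure."""
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            result = task(*args)
            log.info("task=%s attempt=%d ok duration=%.2fs",
                     task.__name__, attempt, time.monotonic() - start)
            return result
        except Exception:
            log.exception("task=%s attempt=%d failed", task.__name__, attempt)
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter to avoid thundering herds.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.random())
```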

Redis cache is one of the industry-leading standards here, and it would help us drastically optimize the performance of any cloud platform. Edge caching goes further by storing certain relevant data on edge devices, giving near-real-time retrieval speed. And if the AI model itself is small enough to be hosted on the edge device, then the round trip, where each query goes back to the server, the server uses the AI model to generate the response, and serves it back, is eliminated entirely; minimizing the model size so it can be hosted on an edge device greatly reduces both that latency and the server load.
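
A minimal cache-aside sketch with redis-py; the local Redis instance and the compute_prediction model call are hypothetical:

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def compute_prediction(user_id: str) -> dict:
    # Hypothetical expensive model call.
    return {"user": user_id, "score": 0.87}

def get_prediction(user_id: str) -> dict:
    key = f"prediction:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)               # cache hit: skip the model
    result = compute_prediction(user_id)
    cache.setex(key, 300, json.dumps(result))   # expire after 5 minutes
    return result
```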

When designing a low-latency API serving machine learning predictions, the user-interface and user-experience side matters too: streaming has been shown to improve perceived latency. As the response is being generated, start showing each word as it arrives, and once the whole response is complete, apply the final formatting; that is what the major chat UIs do. Using vector databases also helps speed up retrieval.
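
A minimal FastAPI streaming sketch; token_stream is a hypothetical stand-in for a real model's streaming generator:

```python
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_stream(prompt: str):
    # Hypothetical stand-in for a model's token-by-token output.
    for token in ["Streaming", " reduces", " perceived", " latency."]:
        yield token
        await asyncio.sleep(0.05)  # simulate per-token generation delay

@app.get("/generate")
async def generate(prompt: str):
    # Tokens are flushed to the client as they are produced, so the user
    # sees output immediately instead of waiting for the full completion.
    return StreamingResponse(token_stream(prompt), media_type="text/plain")
```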

Now, structuring a Python codebase with SOLID principles in mind: in an ML project it is important to accommodate flexibility in the data, in the training of the model, and in retraining as the data updates. What I have used is the medallion architecture of bronze, silver, and gold layers: the bronze layer contains the raw data; the silver layer contains the feature engineering and extraction, essentially all the features we want to feed into the machine learning model; and the gold layer holds the filtered, model-ready data and is where the predictions are created. Beyond that, we obviously want a retraining process. I may be a bit biased toward MLflow, but Apache Airflow or similar orchestration would also work: retrain the model on new data, then use a champion-model comparison on metrics relevant to the particular use case to either archive the previous model or continue with the champion, whichever performs better. All of this builds a self-sustaining pipeline that maintains the data as well as the quality of predictions, and the accuracy improves because the more data an ML model has, the better it gets.
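
A hedged sketch of the champion/challenger promotion step using MLflow's model registry; the model name, scores, and stage-based workflow are illustrative (newer MLflow versions favor aliases over stages):

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "survival-model"  # hypothetical registry name

def promote_if_better(challenger_version: str, challenger_score: float,
                      champion_score: float) -> None:
    """Promote the challenger to Production only if it beats the champion."""
    if challenger_score > champion_score:
        client.transition_model_version_stage(
            name=MODEL_NAME,
            version=challenger_version,
            stage="Production",
            archive_existing_versions=True,  # auto-archive the old champion
        )
    else:
        client.transition_model_version_stage(
            name=MODEL_NAME,
            version=challenger_version,
            stage="Archived",
        )
```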

What strategy would you employ to optimise a Python application's interaction with S3? Parallel processing is the major one: it handles the computation without causing blockages, which is essential for user experience. A Python application is sequential by default, so use multiprocessing or multithreading within the app itself so that each prediction (or each user) accesses S3 on its own thread, independently and in parallel. S3 natively supports parallel reads and writes, so parallelising the application would significantly optimise its interaction with S3.
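
A minimal sketch of parallel S3 reads with boto3 and a thread pool; the bucket and key names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # hypothetical
keys = [f"predictions/part-{i}.json" for i in range(32)]

def fetch(key: str) -> bytes:
    # Each call runs on its own thread; S3 handles parallel reads natively.
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

with ThreadPoolExecutor(max_workers=16) as pool:
    bodies = list(pool.map(fetch, keys))
```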

Find any incorrect parts of the following query for a knowledge graph (SELECT ?property ?value ...). I have worked mostly with SQL, so I don't know graph query languages well, but `?property ?value` is not valid SQL, at least. The backslash-escaped quotes don't make sense either; that is not correct Python syntax, and plain triple quotes would do. As for the query itself, I'm not sure the commas belong there, and the WHERE clause doesn't state what the actual condition should be. So the query doesn't look right to me.
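
For contrast, the `?property ?value` pattern belongs to SPARQL rather than SQL; a minimal valid version, sketched with rdflib under an assumed graph file and entity URI:

```python
from rdflib import Graph

g = Graph()
g.parse("knowledge_graph.ttl", format="turtle")  # hypothetical file

# ?property and ?value are SPARQL variables; no commas in the SELECT
# clause, and the triple pattern in WHERE supplies the condition.
query = """
SELECT ?property ?value
WHERE {
    <http://example.org/entity/42> ?property ?value .
}
"""
for prop, value in g.query(query):
    print(prop, value)
```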

Neo4j is basically a graph-based database framework, so it suits any use case that involves maintaining relationships in a node-and-edge representation, like a social media network where you have friends who are friends of friends, and so on: one node is connected to another node the way your friend is connected to another friend. Machine learning on this setup inherently knows about these relationships, so it can leverage similar nodes not only through individual node attributes but through the relationships as well. A usual table structure would require additional work to integrate the relationship aspect, since explaining how one row is related to another row is not straightforward to teach an ML model using a tabular or columnar structure.
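
A minimal friend-of-friend query sketch with the official neo4j Python driver; the URI, credentials, labels, and relationship type are illustrative:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

CYPHER = """
MATCH (p:Person {name: $name})-[:FRIEND]->()-[:FRIEND]->(fof:Person)
WHERE fof.name <> $name
RETURN DISTINCT fof.name AS friend_of_friend
"""

with driver.session() as session:
    # Traverses two FRIEND hops to find friends of friends of one person.
    for record in session.run(CYPHER, name="Alice"):
        print(record["friend_of_friend"])
driver.close()
```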

...that can enhance ML prediction capabilities for a system designed with this strategy: Neo4j, like I said before, is a graph-based database, so implementing a knowledge graph on it is very straightforward. Assuming the use case is genuinely suited to a graph, Neo4j natively supports nodes and relationships, so the structure is easily captured by a machine learning model, which can immediately learn how the knowledge graph is organized and leverage it for predictions.

Scikit-learn: in the project I worked on, we initially used XGBoost alongside scikit-learn, but based on the use case a survival model was a much better fit. There is a companion library, scikit-survival, which we adopted to tailor-fit our use case; it just made sense instead of the traditional ML models, which are mostly suited to classification problems, or regression of course.
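
A minimal scikit-survival sketch: fitting a Cox proportional hazards model and scoring it by concordance index (the C-score mentioned elsewhere); the toy data is illustrative:

```python
import numpy as np
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_censored

# Survival targets are structured arrays of (event_observed, time).
y = np.array(
    [(True, 12.0), (False, 30.0), (True, 5.0),
     (False, 18.0), (True, 24.0), (False, 9.0)],
    dtype=[("event", bool), ("time", float)],
)
X = np.array([[1.0, 0.2], [0.3, 1.1], [2.2, 0.5],
              [0.1, 0.9], [1.5, 0.4], [0.7, 0.8]])

model = CoxPHSurvivalAnalysis()
model.fit(X, y)

risk_scores = model.predict(X)
c_index = concordance_index_censored(y["event"], y["time"], risk_scores)[0]
print(f"C-index: {c_index:.3f}")
```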

FastAPI, since I've worked with it, natively supports asynchronous execution, although there is a tricky part: if you manually declare an endpoint as an async function and it contains blocking code, it effectively becomes sequential, because it blocks the event loop. That was a major topic of confusion, which I recall being clarified in a talk at a PyCon in Ireland or somewhere, on how exactly to use this for asynchronous work. So basically you just define such functions as they are, without manually specifying async, and because FastAPI natively supports async it will automatically run them in a way that stays asynchronous. It is important to keep any API asynchronous so that one user's query does not block another user's query, and to optimize the server load and compute so there is no idle time for the CPU.
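
A small sketch of the distinction described above; fetch_blocking stands in for any blocking call, such as a synchronous DB client:

```python
import time
from fastapi import FastAPI

app = FastAPI()

def fetch_blocking() -> str:
    time.sleep(2)  # stands in for blocking I/O
    return "done"

@app.get("/good")
def good():
    # Plain `def`: FastAPI runs this in a threadpool, so the blocking
    # call does not stall other requests.
    return {"status": fetch_blocking()}

@app.get("/bad")
async def bad():
    # `async def` runs on the event loop; the blocking call here
    # stalls every other request for 2 seconds.
    return {"status": fetch_blocking()}
```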