
Jagannath Das

Vetted Talent

Proficient in Python, SQL, TensorFlow, and PyTorch, with a passion for communicating intricate data effectively. Actively pursuing further education in these technologies to remain at the forefront of the field. Hold a B.Tech degree in Electrical and Electronics Engineering from NIST and have completed multiple Google-certified courses in data analysis and engineering. Motivated to apply my technical expertise in a data-driven organization, generating significant outcomes through strategic use of data.

  • Role

    Data Analyst

  • Years of Experience

    3 years

Skillsets

  • MySQL - 3.5 Years
  • Git - 2 Years

Vetted For

10 Skills
  • Role: Machine Learning Engineer (Remote), AI Screening
  • Result: 66%
  • Skills assessed: Algorithms, Artificial Intelligence, CNN, Generative AI, LLM, Mathematics, Data Science, Machine Learning, Python, Statistics
  • Score: 59/90

Professional Summary

3 Years
  • Jun 2022 - Present (3 yr 4 months)

    DATA ANALYST

    DHRUV TECHNOLOGY AND SYSTEM SOLUTION
  • Jan 2020 - Present (5 yr 9 months)

    Data Analyst

    kugvytvhb
  • Nov 2021 - May 2022 (6 months)

    DATA SCIENCE INTERN

    FUTURE GENERALI

Applications & Tools Known

  • XAMPP

Work History

3 Years

DATA ANALYST

DHRUV TECHNOLOGY AND SYSTEM SOLUTION
Jun 2022 - Present (3 yr 4 months)
    • Collaborated with senior Data Scientist and Project Manager to identify performance improvement opportunities and seamlessly integrate disciplined models, contributing to enhanced project outcomes and efficiency.
    • Applied data mining and machine learning solutions to business challenges, including predictive modeling for fraud detection and loan prediction, delivering actionable insights and improved decision-making, and deployed the models on AWS.
    • Built an ML pipeline and trained models with supervised and unsupervised algorithms, attaining a 60% reduction in fraud and demonstrating effective fraud detection capabilities.

Data Analyst

kugvytvhb
Jan 2020 - Present (5 yr 9 months)
    • In this role, I focus on delivering high-quality data solutions and leveraging my technical expertise to support organizational goals. By employing cutting-edge techniques and fostering collaboration, I contribute to continuous improvement and innovation.

DATA SCIENCE INTERN

FUTURE GENERALI
Nov 2021 - May 2022 (6 months)
    • Engineered an impactful model, leveraging Python for data cleaning, preprocessing, and visualization.
    • Managed MySQL databases proficiently for efficient data storage and retrieval.
    • Engaged in training sessions to strengthen skills in machine learning and deep learning.

Achievements

  • Proficiently managed MySQL databases for efficient data storage and retrieval, and engaged in training sessions to strengthen skills in machine learning and deep learning.
  • Applied data mining and machine learning solutions to business challenges, including predictive modeling for fraud detection and loan prediction, delivering actionable insights and improved decision-making, with deployment on AWS.
  • Developed an ML pipeline and trained models with supervised and unsupervised algorithms, attaining a 60% reduction in fraud and demonstrating effective fraud detection capabilities.

Major Projects

1 Project

Sentiment Analysis

    Conducted sentiment analysis on multiple text files simultaneously, achieving accurate results without applying transformer models. Built a custom solution for sentiment analysis, demonstrating proficiency in NLP techniques, and used efficient text-processing methods to analyze sentiment across large datasets, as illustrated in the sketch below.
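
    A minimal illustration of a non-transformer approach of the kind described, using NLTK's VADER lexicon over a folder of text files; the folder name is hypothetical and this is not the project's actual custom solution.

```python
# Non-transformer sentiment sketch: score each text file with the VADER lexicon.
# The "reviews" folder and *.txt files are hypothetical placeholders.
from pathlib import Path

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

for path in Path("reviews").glob("*.txt"):
    text = path.read_text(encoding="utf-8")
    # The compound score ranges from -1 (negative) to +1 (positive)
    score = analyzer.polarity_scores(text)["compound"]
    label = "positive" if score >= 0.05 else "negative" if score <= -0.05 else "neutral"
    print(path.name, round(score, 3), label)
```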

Education

  • Bachelor of Technology in Electrical and Electronics Engineering

    National Institute of Science and Technology

Interests

  • Singing
  • Cooking
  • Cricket

AI-interview Questions & Answers

    Hello. My name is Jagannath Das, and I am a data analyst with 2+ years of experience at Dhruv Technology and System Solution, building machine learning models in collaboration with senior data scientists. As a data analyst I also build dashboards in Tableau, but I work mainly on the machine learning side. I completed my B.Tech in August 2019, securing a 67.8 CGPA across my semesters.

    Can we discuss the Python techniques you have used for feature selection in your dataset and the impact they had? Some Python techniques I have used for feature selection include p-values: we use p-values to judge how useful a feature is, and if the p-value of a feature is greater than 0.05, we reject that feature from consideration in model building. We also do recursive feature elimination, removing the least important features in the same way, and we check the VIF of the model. This has a significant impact on the model. Some models do not need p-value-based selection, such as the ensemble methods Random Forest and XGBoost, while models such as logistic regression and linear regression do require it: if a feature's p-value is greater than 0.05, we omit the feature.
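
    A minimal, hedged sketch of the three techniques mentioned above (p-values, recursive feature elimination, and VIF); the dataset and column names are made up for illustration.

```python
# Feature-selection sketch: p-values, RFE, and VIF on a synthetic dataset.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = pd.DataFrame(np.random.rand(200, 4), columns=["f1", "f2", "f3", "f4"])
y = (X["f1"] + 0.5 * X["f2"] + np.random.rand(200) > 1.2).astype(int)

# 1. p-values from a statsmodels logit fit: drop features with p > 0.05
logit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
pvals = logit.pvalues.drop("const")
keep_by_pvalue = pvals[pvals <= 0.05].index.tolist()

# 2. Recursive feature elimination with a simple estimator
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
keep_by_rfe = X.columns[rfe.support_].tolist()

# 3. Variance inflation factor to flag multicollinearity (VIF > 10 is a common cutoff)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(keep_by_pvalue, keep_by_rfe, vif.round(2).to_dict())
```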

    How would you deploy a Python-based machine learning model to a production environment? To deploy a machine learning model, we start from the business objective: we talk with the client, understand the objective, and decide how to proceed in a meeting with the senior data scientist, project manager, and team manager. Then we prepare the data: we get it from the data engineer or the client's server, clean it, and do exploratory data analysis to see which features will have the most impact on the model. Next comes feature engineering: for categorical and discrete variables we use one-hot encoding or pd.get_dummies, and for continuous variables we scale with MinMaxScaler or StandardScaler. We then train, evaluate, and test the model. For deployment, we wrap the model in a web framework such as Flask or FastAPI and test it on a local server. The trained model is saved as a .joblib or .pkl file, and we containerize the application with Docker. I prefer to deploy on an AWS EC2 instance, where we run the Docker image, and we use a CI/CD pipeline with Jenkins so that continuous integration and deployment keep running.
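
    A minimal serving sketch of the Flask plus joblib flow described above; the model.pkl file name and the feature payload format are assumptions, and a production setup would sit behind a WSGI server inside the Docker image.

```python
# Serve a saved model behind a small Flask endpoint.
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.pkl")  # model saved earlier with joblib.dump

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload with already-engineered feature values
    payload = request.get_json()
    features = pd.DataFrame([payload])
    prediction = model.predict(features)[0]
    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    # Local test run; in production this would run inside a Docker image on EC2
    app.run(host="0.0.0.0", port=5000)
```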

    What approach would you take in Python to mitigate overfitting in a supervised learning model? To overcome overfitting in a supervised learning model we have various options. If the dataset is very small, we can ask the client or data engineer for more data; training on more data can reduce overfitting. We also have Ridge and Lasso for supervised learning problems, also called L2 and L1 regularization. Lasso penalizes large coefficients and can shrink the least important coefficients to zero, while Ridge does not make coefficients fully zero but minimizes the effect of those features by lowering their coefficient values, i.e. the weights of the features. We can also prune a decision tree: decreasing the number of leaf nodes helps prevent overfitting. And we can use ensemble methods such as Random Forest and XGBoost. For supervised learning, these are the main options.
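
    A minimal sketch of the Lasso (L1) and Ridge (L2) regularization idea described above, on a synthetic regression dataset; the alpha values are illustrative.

```python
# L1/L2 regularization sketch: Lasso zeroes weak coefficients, Ridge shrinks them.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Lasso (L1) can shrink uninformative coefficients all the way to zero
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
# Ridge (L2) keeps all coefficients but pulls them toward zero
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

print("Lasso zero coefficients:", (lasso.coef_ == 0).sum())
print("Lasso R^2:", round(lasso.score(X_test, y_test), 3))
print("Ridge R^2:", round(ridge.score(X_test, y_test), 3))
```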

    Explain how you would use Python's multiprocessing or threading APIs to execute tasks in parallel. Python has both multiprocessing and threading libraries, and the choice depends on the GIL (Global Interpreter Lock). The threading library is more suitable in an input/output-bound context, while for CPU-bound tasks such as model training, multiprocessing is typically preferred: it creates separate processes, each with its own Python interpreter and memory space, effectively bypassing the GIL. Because of the GIL, threading may not provide a significant speed-up for CPU-bound tasks.
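
    A minimal sketch contrasting multiprocessing for CPU-bound work with threading for I/O-bound work, as described above; the workloads are hypothetical stand-ins.

```python
# CPU-bound work in processes (bypasses the GIL); I/O-bound work in threads.
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

def cpu_bound(n):
    # CPU-heavy work: benefits from separate processes, each with its own interpreter
    return sum(i * i for i in range(n))

def io_bound(delay):
    # I/O-style work (simulated with sleep): threads suffice, since waiting releases the GIL
    time.sleep(delay)
    return delay

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        cpu_results = pool.map(cpu_bound, [2_000_000] * 4)

    with ThreadPoolExecutor(max_workers=4) as executor:
        io_results = list(executor.map(io_bound, [0.5] * 4))

    print(len(cpu_results), len(io_results))
```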

    Can we discuss a machine learning project where you had to optimize code to meet memory and computational budget constraints? Yes. One of my recent projects was fraud detection in service claims. There was a huge amount of data, roughly 6,300,000 rows, and it was also imbalanced, so instead of working on the full dataset as we did at first, we decided to take a balanced sample. We used stratified sampling, which maintains the class ratio (it was a classification problem), and kept only about 10% of the data. We first tried stratified sampling in pandas with groupby and a lambda, but it took much more time, so we switched to scikit-learn's stratified sampling. This reduced memory usage and improved the computational efficiency of the steps before model building, such as EDA and feature engineering, and the data no longer took long to load. Despite the optimization, we made sure the model's predictive performance stayed robust, with the AUC meeting the project requirements.
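
    A minimal sketch of the stratified-sampling comparison described above (pandas groupby versus scikit-learn); the data, column names, and the 10% fraction are illustrative.

```python
# Stratified 10% sample of an imbalanced dataset, two ways.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "amount": np.random.rand(100_000),
    "is_fraud": np.random.choice([0, 1], size=100_000, p=[0.97, 0.03]),
})

# pandas route (slower on very large data): sample 10% within each class
pandas_sample = df.groupby("is_fraud", group_keys=False).apply(
    lambda g: g.sample(frac=0.10, random_state=0)
)

# scikit-learn route: train_test_split with stratify keeps the class ratio
sklearn_sample, _ = train_test_split(
    df, train_size=0.10, stratify=df["is_fraud"], random_state=0
)

print(pandas_sample["is_fraud"].mean(), sklearn_sample["is_fraud"].mean())
```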

    In the following Python function, designed to calculate machine learning accuracy, locate and explain the bug that could cause incorrect results. The original code snippet lacks indentation, which is necessary in Python, so there is an indentation error. I looked at the logic itself and it is correct: the function is called as accuracy = calculate_accuracy(predictions, labels) and the accuracy is printed. Apart from the indentation, I don't find any other bug in the code; I think it is otherwise correct.
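
    The original snippet is not reproduced here, so the following is a hypothetical, properly indented version of the accuracy function the question refers to; the names are assumptions.

```python
# Properly indented accuracy helper: count matching prediction/label pairs.
def calculate_accuracy(predictions, labels):
    correct = 0
    for pred, label in zip(predictions, labels):
        if pred == label:
            correct += 1
    return correct / len(labels)

accuracy = calculate_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(accuracy)  # 0.75
```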

    Given this Python snippet for loading a machine learning model, discuss what could go wrong and how to fix it. In the last line, loaded_model = load_model(...), there is an error with the quotes: there is a stray backslash right at the beginning of the path and another backslash after model.pkl, which is not correct. It should be load_model() with the path to the model written inside the quotes and the quotes properly closed, without that trailing backslash; the stray backslash before the model name also has to be removed.
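
    The original snippet is not reproduced here, so the following is a hypothetical before/after for the stray-backslash path issue described above, using joblib.load for illustration in place of the snippet's load_model helper.

```python
# Fix a model-loading path: remove stray backslashes and close the quotes cleanly.
import joblib

# Problematic (illustrative): stray backslashes around the file name break the path
# loaded_model = joblib.load("\models\model.pkl\")

# Fixed: a clean path with no trailing backslash; a raw string or forward slashes
# avoids escape-sequence surprises on Windows
loaded_model = joblib.load(r"models/model.pkl")
print(type(loaded_model))
```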

    How would you implement distributed training of a machine learning model using Python? Implementing distributed training in Python involves several steps and considerations. We choose a framework such as TensorFlow or PyTorch, for example TensorFlow's MultiWorkerMirroredStrategy, and set up the environment by creating a virtual environment on each machine. We can also initialize Horovod; integrating Horovod with TensorFlow is straightforward because it provides a simple API for initialization and operation. Potential challenges include data handling (efficiently distributing data across multiple nodes or GPUs can be complex), communication overhead (synchronizing gradients and model weights across devices can introduce latency), and hardware availability (we have to work within the hardware available to us). To make better use of the hardware we can use containerization with Docker. We also check for fault tolerance by implementing checkpoints to save the model, handle scalability by monitoring the performance and efficiency of the training process, and optimize the workload distribution. Debugging is more complex in a distributed setting, so we use logging and monitoring tools such as Grafana. That is how we approach distributed training of models.
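
    A minimal single-machine sketch of TensorFlow's MultiWorkerMirroredStrategy mentioned above; in a real multi-node run each worker would also set TF_CONFIG with the cluster layout, and the model, data, and checkpoint path here are placeholders.

```python
# Distributed-training sketch: create the model inside the strategy scope,
# then train with a checkpoint callback for basic fault tolerance.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(1024, 10).astype("float32")
y = np.random.randint(0, 2, size=(1024, 1))

callbacks = [tf.keras.callbacks.ModelCheckpoint("ckpt.weights.h5", save_weights_only=True)]
model.fit(X, y, epochs=2, batch_size=64, callbacks=callbacks)
```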

    We can use Python generators for memory efficiency, improved performance, and simplified data pipelines. Generators are used essentially for lazy evaluation; they can handle infinite data streams, enhance readability and maintainability, and support parallelism and concurrency. We can also do pipeline integration using Python generators.
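
    A minimal sketch of the lazy-evaluation, pipeline style of generator use described above; the file name and CSV layout are hypothetical.

```python
# Generator pipeline: stream rows lazily instead of loading the whole file.
def read_rows(path):
    # Yields one line at a time, keeping memory usage flat
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def parse_amounts(rows):
    # A second generator stage: pipeline stages compose naturally
    for row in rows:
        try:
            yield float(row.split(",")[-1])
        except ValueError:
            continue  # skip header or malformed rows

if __name__ == "__main__":
    amounts = parse_amounts(read_rows("transactions.csv"))
    total = sum(amounts)  # rows are consumed lazily, one at a time
    print(total)
```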

    In Python, how do you use a library like NumPy or pandas to streamline complex calculations in data preprocessing? NumPy is known as a scientific computing library. It is very efficient for reshaping data and performing element-wise calculations, while pandas is very efficient at handling missing data. pandas also provides aggregation functions such as groupby, and NumPy offers efficient vectorized mathematical operations over arrays. We use pandas for data loading, data exploration, data cleaning, data transformation, grouping and aggregation, and merging and joining columns across datasets.
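
    A minimal sketch of the NumPy and pandas preprocessing steps described above (missing-data handling, vectorized math, grouping); the DataFrame contents are made up.

```python
# Preprocessing sketch: fill missing values, apply vectorized math, aggregate.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "sales": [100.0, np.nan, 250.0, 80.0],
})

# Handle missing data with pandas
df["sales"] = df["sales"].fillna(df["sales"].median())

# Vectorized math with NumPy instead of a Python loop
df["log_sales"] = np.log1p(df["sales"].to_numpy())

# Grouping and aggregation with pandas
summary = df.groupby("region")["sales"].agg(["mean", "sum"])
print(summary)
```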