Shivam Nitin Vazare

Vetted Talent
A challenging career as a Data Scientist / Web Developer where my Python, Machine Learning, Data Intelligence, and Django REST skills can be effectively used and upgraded. Data Scientist with a strong statistics and mathematics background and overall 3 years of experience using predictive modeling, data processing, and data mining algorithms to solve challenging business problems. Involved in the Python open-source community and passionate about Deep Reinforcement Learning. Looking for a challenging career in the IT/software industry, especially in roles such as Django REST / Data Scientist / ML / AI with Python programming, where my strong SQL and UNIX knowledge and my experience with programming concepts and methodologies in software development can be applied and my all-round development is encouraged.
  • Role

    Data Scientist

  • Years of Experience

    3 years

  • Professional Portfolio

    View here

Skillsets

  • XML
  • Python - 3.1 Years
  • SciPy
  • Seaborn
  • SOAP
  • SQL
  • Tableau
  • TCP/IP
  • TensorFlow - 3.1 Years
  • Unix
  • Websphere
  • PySpark
  • Kml
  • Beautiful Soup
  • MapReduce
  • WebLogic
  • Computer Vision - 3.1 Years
  • NO SQL - 3.1 Years
  • Deep Learning - 3.1 Years
  • PyTorch - 3.1 Years
  • NLP - 3.1 Years
  • HTML
  • AWS Cloud Computing
  • Bootstrap
  • CSS
  • DHTML
  • Django REST
  • ETL
  • Hadoop
  • HDFS
  • Hive
  • Apache Tomcat
  • HTTP/HTTPs
  • JavaScript
  • JBoss
  • jQuery
  • JSON
  • Matplotlib
  • MongoDB
  • NumPy
  • pandas

Vetted For

12 Skills
  • Data Scientist (Remote) - AI Screening
  • 57%
  • Skills assessed: Communication Skills, Jira, Retrieval-Augmented Generation, Computer Vision, Deep Learning, PyTorch, TensorFlow, GitLab, Machine Learning, NLP, NO SQL, Python
  • Score: 51/90

Professional Summary

3 Years
  • Jun, 2023 - Present (2 yr 4 months)

    Data Scientist

    INCIF Technologies Pvt. Ltd
  • Jul, 2021 - May, 2023 (1 yr 10 months)

    Associate Consultant (DS)

    Capgemini Technologies

Applications & Tools Known

  • AWS
  • DevOps
  • Tableau
  • ETL
  • Linux
  • HTML
  • CSS
  • Bootstrap
  • Git
  • Docker
  • PySpark
  • Airflow
  • Heroku
  • Jenkins
  • EC2
  • VPC
  • EBS
  • S3
  • Postman
  • Microsoft Azure

Work History

3 Years

Data Scientist

INCIF Technologies Pvt. Ltd
Jun, 2023 - Present (2 yr 4 months)
    Highly efficient Data Scientist/Data Analyst with 3+ years of experience in data analysis, machine learning, and data mining on large sets of structured and unstructured data, covering data acquisition, data validation, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, dimensionality reduction, testing and validation, and data visualization.

Associate Consultant (DS)

Capgemini Technologies
Jul, 2021 - May, 2023 (1 yr 10 months)
    Experience in the design, development, testing, and implementation of various stand-alone and client-server, architecture-based enterprise application software in Python across different domains. Managed the entire data science project life cycle, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, dimensionality reduction, testing and validation, and data visualization.

Achievements

  • Received Star Performer Award for good performance.
  • Received appreciation for end-to-end (E2E) delivery from the client Mercado Libre, Mexico.

Major Projects

2 Projects

Omni-Channel Merchandising

    This E-Commerce analytics solution is a key component of the digital transformation of businesses. It makes it possible to track the customer journey across Omni-channel touch points and build a comprehensive view of what drives revenue. This insight informs better business decision-making.

Cloud Networking for Industrial Ethernet Switches - [ CNofIES ]

Jun, 2023 - Present (2 yr 4 months)
    Cisco's Catalyst IE3400 Rugged Series switches combine full Gigabit Ethernet switch solutions with advanced features in a modular, future-proof design. Expandable up to 26 ports in a compact form factor, these rugged switches are optimized for size and power, bring Cisco intent-based networking to Industrial Ethernet applications, and provide secure access for new high-speed applications in the industrial space.

Education

  • PG in Artificial Intelligence and Machine Learning

    Pravara College Of Engineering, Ahmednagar (2023)
  • Bachelor of Engineering (B.E.)

    Pravara College Of Engineering, Ahmednagar (2019)

Certifications

  • Google Cloud Data and Machine Learning Fundamentals

  • Domain Foundation by Automation Academy

  • Power BI by Automation Academy (Foundation Level Certification)

  • Python Programming by Great Learning, 2023

  • Introduction to Cyber Security

  • AWS Cloud Practitioner Essentials

  • ISTQB Foundation Level 1 Certificate

Interests

  • Driving
  • Bike Rides
  • Technology Research
  • YouTube Learning
  • Travelling

AI-Interview Questions & Answers

    Could you help me understand more about your background by giving a brief introduction of yourself? Sure, and thank you for the opportunity to introduce myself. I'm Shivam Nitin Vazare, and I'm delighted to introduce myself as a Data Scientist with 3 years of experience in the IT industry. I completed my B.E. from Pune University in 2019 and recently completed a postgraduate diploma through the University of Texas at Austin. My passion for data analysis and problem solving led me to pursue a career in this ever-evolving and dynamic field. I have worked on diverse projects, from predictive modeling to data-driven business strategies, and I excel at extracting valuable insights from complex data using various tools and technologies, including Python, TensorFlow, PyTorch, PySpark, GitHub, Django, Docker, Computer Vision, NoSQL, SQL, deep learning, machine learning, and MLOps. I'm known for my ability to communicate technical findings to non-technical stakeholders and to support data-driven decision-making within the organization, and I'm excited about opportunities where I can continue contributing my expertise and driving data-driven innovation. I have worked across project domains such as telecom, e-commerce, and payments, where my contribution was to leverage my expertise in Python, statistical analysis, and data manipulation techniques to optimize analysis workflows, to collaborate with cross-functional teams to keep data science initiatives aligned with project objectives, and to employ ETL techniques.

    Providing an example, how would you implement a sequence-to-sequence model in TensorFlow for a machine translation task? There are several steps. First, data preparation: tokenize and preprocess the source and target text, using TensorFlow's tokenizer and pad_sequences to prepare the input and target sequences. Second, model architecture: the encoder uses an LSTM or GRU layer to encode the input sequence into a fixed-size context vector, and multiple layers can be stacked for better performance; the decoder uses another LSTM or GRU layer, typically with an attention mechanism, to generate the output sequence. Third, define the model in code. Finally, training and evaluation: train the model on pairs of source and target sequences and evaluate translation quality with metrics such as the BLEU score.
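
Below is a minimal sketch of such an encoder-decoder model in TensorFlow/Keras, without attention for brevity; the vocabulary sizes, embedding size, and LSTM width are illustrative assumptions.

```python
from tensorflow.keras import Model, layers

src_vocab, tgt_vocab = 8000, 8000   # assumed vocabulary sizes
embed_dim, units = 256, 512         # assumed embedding size and LSTM width

# Encoder: embed the source tokens and compress them into a context (the LSTM state).
encoder_inputs = layers.Input(shape=(None,), name="source_tokens")
enc_emb = layers.Embedding(src_vocab, embed_dim)(encoder_inputs)
_, state_h, state_c = layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: consume the target tokens (teacher forcing) conditioned on the encoder state.
decoder_inputs = layers.Input(shape=(None,), name="target_tokens")
dec_emb = layers.Embedding(tgt_vocab, embed_dim)(decoder_inputs)
dec_out, _, _ = layers.LSTM(units, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = Model([encoder_inputs, decoder_inputs], logits)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit([src_ids, tgt_in_ids], tgt_out_ids, ...)  # padded integer sequences
```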

    How would you benchmark the performance of a NoSQL database against a SQL database when dealing with large unstructured datasets using Python? I would follow a step-by-step approach. First, setup and configuration: choose a NoSQL database (for example, MongoDB or Cassandra) and a SQL database (for example, PostgreSQL or MySQL) and set both up. Second, data preparation: generate a large unstructured dataset to use as the benchmark; this could be a collection of documents with varied fields for NoSQL and comparable tables with large volumes of rows for SQL. Third, benchmarking tasks: measure insertion performance (the time taken to insert a large number of records or documents into both databases), query performance (execute various queries, from simple retrievals to complex aggregations, and measure the response times for both databases), and also update and deletion performance, driving everything from Python code. Finally, analyze the results: compare the performance metrics, in terms of insertion time and query response time, to determine which database performs better under the given conditions, and consider factors like scalability, ease of use, and the specific use-case requirements in addition to raw performance. Also ensure the environment (hardware and network) is consistent when running benchmarks, and test with a variety of operations and data sizes to get a comprehensive view of performance.
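
A rough Python sketch of that kind of benchmark, timing bulk inserts and a simple query against MongoDB (via pymongo) and, as a self-contained SQL stand-in, SQLite from the standard library; the connection string, dataset size, and queries are assumptions.

```python
import sqlite3
import time

from pymongo import MongoClient

N = 10_000
records = [{"user_id": i, "payload": f"event-{i}"} for i in range(N)]

def timed(fn):
    """Return the wall-clock time taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

# --- NoSQL (MongoDB, assumed to be running locally) ---
coll = MongoClient("mongodb://localhost:27017")["benchmark"]["events"]
coll.drop()
nosql_insert = timed(lambda: coll.insert_many(records))
nosql_query = timed(lambda: list(coll.find({"user_id": {"$lt": 100}})))

# --- SQL (in-memory SQLite for illustration) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, payload TEXT)")
sql_insert = timed(lambda: conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(r["user_id"], r["payload"]) for r in records]))
sql_query = timed(lambda: conn.execute(
    "SELECT * FROM events WHERE user_id < 100").fetchall())

print(f"insert: NoSQL={nosql_insert:.3f}s  SQL={sql_insert:.3f}s")
print(f"query:  NoSQL={nosql_query:.3f}s  SQL={sql_query:.3f}s")
```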

    What factors would you consider when choosing between convolutional neural networks and recurrent neural networks in a computer vision task? I'm sorry, I'm unable to recall the answer to this one at the moment.

    Which Python classes or frameworks will assist you in developing an anomaly detection system with PyTorch, and what will be your validation strategy? We can follow several steps: import the necessary libraries, generate synthetic data, create sequences, define the autoencoder model, and convert the sequences into PyTorch tensors. Those are the steps we can follow.
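
A minimal sketch of such an autoencoder-based detector in PyTorch; the layer sizes, synthetic data, training loop, and the percentile threshold used as the validation idea are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Small fully connected autoencoder over fixed-size feature vectors."""
    def __init__(self, n_features: int = 20, latent: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 8), nn.ReLU(),
                                     nn.Linear(8, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 8), nn.ReLU(),
                                     nn.Linear(8, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

x_train = torch.randn(1000, 20)          # synthetic "normal" data

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(50):                       # train to reconstruct normal data
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), x_train)
    loss.backward()
    optimizer.step()

# Validation idea: score samples by reconstruction error and flag anything
# above a high percentile of errors observed on held-out normal data.
with torch.no_grad():
    errors = ((model(x_train) - x_train) ** 2).mean(dim=1)
threshold = torch.quantile(errors, 0.99)
is_anomaly = errors > threshold
```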

    Which Python tools would you use for text tokenization and sentiment analysis in an NLP pipeline, and why would you choose them? In that case I would use TextBlob. TextBlob is a must for developers who are starting NLP in Python and want to make the most of their first encounter with NLTK: it provides beginners with an easy interface for the most basic NLP tasks, like sentiment analysis, POS tagging, and noun-phrase extraction.
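
A short sketch of what that looks like in practice with TextBlob; the sample sentence is just an illustration, and the underlying NLTK corpora may need a one-time download via python -m textblob.download_corpora.

```python
from textblob import TextBlob

text = "The delivery was late, but the support team resolved the issue quickly."
blob = TextBlob(text)

print(blob.words)                   # word-level tokenization
print(blob.sentences)               # sentence-level tokenization
print(blob.tags)                    # POS tagging
print(blob.noun_phrases)            # noun-phrase extraction
print(blob.sentiment.polarity)      # sentiment polarity in [-1, 1]
print(blob.sentiment.subjectivity)  # subjectivity in [0, 1]
```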

    Given the following Python code snippet, what is the issue that will prevent it from correctly creating a machine learning model pipeline? There are several issues in the original code. First, an improper import with a typo: from sklearn.svm import SVC is correct, but in the pipeline definition the lowercase svc should be replaced with the class name SVC, written in uppercase. Second, a syntax error in the pipeline steps: the steps are incorrectly formatted, with the wrong separators and bracket usage; they should be a list of tuples, each tuple containing the name of the step and the corresponding estimator or transformer. Finally, assuming X_train and y_train are defined, make sure they are properly defined and contain the data you intend to use for fitting the model.
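
Since the original snippet is not reproduced here, the following is a hypothetical corrected version illustrating those points: the uppercase SVC class and steps given as a list of (name, estimator) tuples.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Steps are a list of (name, estimator) tuples; the class name is SVC, not svc.
pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", SVC()),
])

# pipeline.fit(X_train, y_train)  # assumes X_train and y_train are defined
```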

    The import statements need to be corrected: import sqlite3 should be on a separate line, and the corrected import is from flask import Flask, jsonify. Next, the Flask app initialization should be app = Flask(__name__), using the assignment operator to bind the Flask instance to the variable app. The methods list needs proper formatting, for example methods=["GET"], using normal quotes and a properly formed list. For data fetching and the return value, data = cursor.fetchall() should assign the query result, and jsonify(data) assumes the data is in a format that can be serialized directly to JSON; a SQLite fetch returns a list of tuples, which may need conversion to a JSON-serializable format, for example a list of dictionaries. As additional considerations, to convert the tuples into dictionaries with column names you might need to retrieve the column names from the cursor description, and for production use it is worth adding error handling to manage exceptions during database operations.
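
A sketch of a corrected route along those lines; the original snippet is not shown, so the database file, table, and endpoint names are assumptions.

```python
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/data", methods=["GET"])
def get_data():
    conn = sqlite3.connect("users.db")                 # assumed database file
    cursor = conn.execute("SELECT id, name FROM users")
    rows = cursor.fetchall()                           # list of tuples
    conn.close()
    # Convert tuples to dictionaries so the payload is JSON-serializable and readable.
    data = [{"id": row[0], "name": row[1]} for row in rows]
    return jsonify(data)

if __name__ == "__main__":
    app.run(debug=True)
```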

    Can you devise a Python workflow that applies both deep learning and NLP techniques to extract insights from visual and textual data simultaneously? Yes, we can. We can use fastai, a Python-based open-source machine learning framework that offers high-level abstractions for deep learning model training, and several other frameworks and libraries support such a workflow as well: PyTorch, TensorFlow with Keras, and OpenCV.
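
One possible sketch of such a workflow, pairing a pretrained torchvision CNN for image embeddings with TextBlob for caption sentiment; the model choice, preprocessing, and file names are assumptions (and the weights API shown requires torchvision 0.13 or later).

```python
import torch
from PIL import Image
from textblob import TextBlob
from torchvision import models, transforms

# Visual branch: pretrained ResNet-18 used as a feature extractor.
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()   # drop the classification head, keep 512-d embeddings
cnn.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def analyze(image_path: str, caption: str):
    """Return an image embedding and the caption's sentiment polarity."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        visual_features = cnn(img)                    # shape: (1, 512)
    polarity = TextBlob(caption).sentiment.polarity   # NLP branch
    return visual_features, polarity

# features, polarity = analyze("product.jpg", "Great build quality but slow shipping.")
```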

    Can you illustrate how version control with Jira would aid collaboration for a remote data science team deploying a TensorFlow model? Git is a version control system that tracks file changes, while GitHub is a platform that allows developers to collaborate and store their code in the cloud. Think of it this way: Git is responsible for everything that happens locally on your computer, and GitHub hosts the shared repository. That is the main way we can illustrate version control: using a hosted platform such as GitHub or GitLab, developers working on different local machines can work together and see the status of tasks, the completed and pending work, the requirements, and whatever changes have happened, and with GitLab we can track all of those records.

    Discuss a technique in Python to automatically handle missing or corrupted data in a large dataset that might affect machine learning model performance. One option is deleting columns with missing data: in this case, drop the age column, then fit the model and check the accuracy. Another option is imputation, filling in the missing values, for example with KNN imputation. We can also delete rows with missing data, and if a column has more than half of its values missing or null, we can simply drop the whole column from the dataset. Beyond that, we can replace missing values with the column's mean or median, depending on the type of data we have, and choose the replacement value on that basis.
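
A short sketch of those options with pandas and scikit-learn; the toy DataFrame and thresholds are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, np.nan, 31],
                   "income": [50_000, 42_000, np.nan, 61_000, 58_000]})

# 1. Drop any column where more than half the values are missing.
df_dropped = df.dropna(axis=1, thresh=int(0.5 * len(df)))

# 2. Simple imputation: fill missing values with the column median (or mean).
median_imputed = pd.DataFrame(
    SimpleImputer(strategy="median").fit_transform(df), columns=df.columns)

# 3. KNN imputation: fill each missing value from its nearest-neighbour rows.
knn_imputed = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)
```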