
Abhishek Srivastava

Vetted Talent
Data Engineer with 2.5+ years of experience in the tech industry. Proven ability to use cloud computing platforms to store, process, and analyze data. Expertise in data migration and data warehousing. Strong problem-solving and analytical skills.
  • Role

    Data Engineer

  • Years of Experience

    3 years

Skillsets

  • AWS
  • Data Analysis
  • Python
  • SQL
  • NumPy
  • Team management
  • pandas
  • ETL
  • Problem Solving

Vetted For

9 Skills
  • Senior Data Engineer With Snowflake (Remote) (AI Screening)
  • Result: 52%
  • Skills assessed: Azure Synapse, Communication Skills, DevOps, CI/CD, ELT, Snowflake, Snowflake SQL, Azure Data Factory, Data Modelling
  • Score: 47/90

Professional Summary

3 Years
  • Data Engineer

    Lagozon Technologies Private Limited
  • Data Analyst

    Intrics Solution Private Limited

Applications & Tools Known

  • Snowflake
  • SQL Server
  • AWS CloudWatch
  • AWS Glue
  • AWS EC2
  • Excel
  • S3
  • SSMS
  • MySQL

Work History

3 Years

Data Engineer

Lagozon Technologies Private Limited
    Data Engineer responsible for migration of data, task management, and system optimization.

Data Analyst

Intrics Solution Private Limited
    Data Analyst focused on data validation and translation of data into actionable outcomes.

Achievements

  • Successfully migrated data from SQL Server to Snowflake using a variety of techniques.
  • Reduced data loss by 80% in reports by developing task alerts using AWS CloudWatch for multiple scenarios.
  • Accomplished data transformation from JSON to structured database format using advanced data flattening techniques.
  • Designed and deployed optimized stored procedures at the pipeline level, driving streamlined and efficient data processing (a sketch follows this list).
  • Proficiently managed tasks, streams, and Snowpipe to achieve seamless data ingestion and processing in Snowflake.
  • Implemented robust alert and monitoring systems ensuring seamless data ingestion, resulting in significant time and cost savings.
  • Expertly executed Python-based Glue jobs to automate and trigger data ingestion processes.
  • Developed intricate queries to extract essential business data, ensuring accurate and comprehensive information for impactful business reporting.
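
A minimal Snowflake SQL sketch of the kind of pipeline-level stored procedure described above. The procedure, table, and column names (sp_load_orders, raw_orders, ods_orders) are illustrative assumptions, not objects from the actual project.

    -- Hypothetical pipeline-level load procedure: promotes new and changed
    -- rows from a staging table into an ODS table in a single MERGE.
    CREATE OR REPLACE PROCEDURE sp_load_orders()
    RETURNS STRING
    LANGUAGE SQL
    AS
    $$
    BEGIN
        MERGE INTO ods_orders AS tgt
        USING raw_orders AS src
            ON tgt.order_id = src.order_id
        WHEN MATCHED THEN UPDATE SET
            amount     = src.amount,
            updated_at = src.updated_at
        WHEN NOT MATCHED THEN INSERT (order_id, amount, updated_at)
            VALUES (src.order_id, src.amount, src.updated_at);
        RETURN 'load complete';
    END;
    $$;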

Major Projects

3 Projects

Online Retail Data Ingestion

    Facilitated data migration and developed AWS Glue ETL jobs for data ingestion from S3 to Snowflake.
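
A hedged Snowflake SQL sketch of the S3-to-Snowflake load step such an ingestion job ends in; the stage, file format, bucket path, and table names are assumptions, and the AWS Glue orchestration around the load is not shown.

    -- Illustrative only: external stage over the S3 bucket and a bulk load.
    CREATE OR REPLACE FILE FORMAT retail_csv_fmt TYPE = CSV SKIP_HEADER = 1;

    CREATE OR REPLACE STAGE retail_stage
        URL = 's3://example-retail-bucket/orders/'
        FILE_FORMAT = (FORMAT_NAME = 'retail_csv_fmt');  -- credentials / storage integration omitted

    COPY INTO retail_orders
    FROM @retail_stage
    ON_ERROR = 'CONTINUE';  -- skip bad records rather than failing the whole load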

Loyalty Audit Report Generation

    Created comprehensive data strategies and extracted data for multi-level reporting.
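
A small sketch of the kind of multi-level aggregation such reporting typically relies on, using GROUP BY ROLLUP; the loyalty_transactions table and its columns are hypothetical, not the actual report definition.

    -- Hypothetical multi-level loyalty report: points per tier per month,
    -- per-tier subtotals, and a grand total produced in a single pass.
    SELECT
        member_tier,
        DATE_TRUNC('month', txn_date) AS txn_month,
        SUM(points_earned)            AS total_points
    FROM loyalty_transactions
    GROUP BY ROLLUP (member_tier, DATE_TRUNC('month', txn_date))
    ORDER BY member_tier, txn_month;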

Online Shoppers Purchasing Intention Business

    Identified revenue-generating factors, pre-processed the data, and selected features for models predicting shoppers' purchasing intentions.

Education

  • Post Graduate Programme In Data Science And Engineering

    Great Lakes Institute of Management
  • Bachelor of Technology

    IEC College of Engineering And Technology

Certifications

  • AWS Certified Cloud Practitioner (10/2022 - 10/2025)

AI-interview Questions & Answers

Hi. I have about 3 years of data science and data engineering experience, with a strong background in Python and SQL and in AWS services such as S3, AWS Glue, and AWS Redshift, along with relevant experience in Azure. I have used Snowflake extensively for the last 2 years together with AWS, and I have also worked on Azure for multiple in-house projects. Prior to that, I was working at Intrics Solution Private Limited as an associate data analyst, working with technologies like MongoDB, SQL, Python, and Excel. I completed a PGP in Data Science and Engineering in 2020-21, where I learned data engineering techniques and worked on data science projects, and I have learned pandas, NumPy, SQL, Excel, and other useful business tools to model data in a business setting and bring the business…

So for this transactional control, I would monitor the development and test environments to make sure all transactions go one by one across the concurrent processes, and I would cover each case with automated testing. I would use streams for all the transactions happening on the table: streams capture inserts, updates, and deletes, that is, all the DML performed on the table, so every transaction gets recorded in Snowflake and can be used whenever the data in the table changes. For CDC (change data capture) purposes we use streams; they let us look at the transaction history and see which data was inserted or updated in the last few days or a month back, so we can maintain the data consistency of the…
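
A minimal Snowflake SQL sketch of the stream-based change tracking described in this answer; orders and orders_stream are hypothetical names.

    -- Record every insert, update, and delete performed on the table.
    CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

    -- Inspect what has changed since the stream was last consumed.
    SELECT
        order_id,
        METADATA$ACTION,       -- INSERT or DELETE (an update shows up as both)
        METADATA$ISUPDATE      -- TRUE when the row change is part of an update
    FROM orders_stream;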

I would use a three-stage setup: one development environment, one testing environment, and one production environment. Once the ETL pipeline is developed, we test it in an automated session that debugs the errors produced in the data pipelines for the high-volume data, and then we move the pipeline to production. Within the data platform we follow a two-layer architecture, a stage layer and an ODS layer: raw data is inserted into the staging layer, transformed in Snowflake, and then moved to the ODS layer. These are the architectural considerations I would use for high-volume data processing, so that no data gets lost and no redundant data is inserted…
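
A sketch of the stage-to-ODS promotion described above, assuming hypothetical stage.orders_raw and ods.orders tables; deduplicating on the key keeps redundant rows out of the ODS layer.

    -- Move raw rows from the staging layer into the ODS layer, keeping only
    -- the latest version of each key so no redundant data is inserted.
    INSERT INTO ods.orders (order_id, amount, updated_at)
    SELECT order_id, amount, updated_at
    FROM stage.orders_raw
    QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) = 1;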

We can implement idempotency in the data ingestion by using Snowpipe: once new data is loaded at the source, it will automatically be inserted into the…
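
A minimal Snowpipe definition matching this answer; the pipe, stage, and table names are assumptions. Snowpipe keeps a per-file load history, which is what makes the ingestion effectively idempotent: a file that has already been loaded is not loaded again.

    -- Auto-ingest new files from an external stage into the raw table.
    -- The S3 event notification that triggers the pipe is configured separately.
    CREATE OR REPLACE PIPE raw_orders_pipe
        AUTO_INGEST = TRUE
    AS
    COPY INTO raw_orders
    FROM @retail_stage
    FILE_FORMAT = (TYPE = JSON);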

For a CDC solution in Azure Data Factory, we can use a particular column that records modifications to the data. Whenever any row gets modified, that column is updated with the date or time at which the change happened, and we can then incrementally capture which data has been updated.
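
A sketch of the watermark-style incremental query that an Azure Data Factory copy activity would typically issue against the source under this approach; the table, the last_modified column, and the literal watermark value are assumptions.

    -- Incremental (CDC-style) extract: only rows modified since the last run.
    -- The previous high-water mark would normally come from a control table
    -- or an ADF pipeline parameter; the literal below is illustrative.
    SELECT customer_id, customer_name, last_modified
    FROM dbo.customers
    WHERE last_modified > '2024-01-01 00:00:00'
    ORDER BY last_modified;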

So, for semi-structured data like JSON or XML, in the ELT process we can load the JSON or other semi-structured data directly into a Snowflake table using the VARIANT data type, by defining the landing column as VARIANT. From there we can run a procedure that extracts the data from the JSON and loads it into another table incrementally. We can use streams so that every time a fresh row is inserted or updated in the staging-layer table, we run the procedure on that stream data and load only the incremental data into the final table. That is how we can smooth the performance of the ELT process in Snowflake when semi-structured data is involved, and with the LATERAL FLATTEN technique we can flatten the data cleanly in the Snowflake environment.
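
A minimal sketch of the VARIANT landing table and the LATERAL FLATTEN step described in this answer; raw_json and the shape of its payload are hypothetical.

    -- Land the semi-structured document as-is in a VARIANT column ...
    CREATE OR REPLACE TABLE raw_json (payload VARIANT);

    -- ... then flatten a nested array into rows for the structured table.
    SELECT
        payload:order_id::STRING    AS order_id,
        item.value:sku::STRING      AS sku,
        item.value:quantity::NUMBER AS quantity
    FROM raw_json,
         LATERAL FLATTEN(input => payload:items) AS item;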

It's a matter that affects the model

There is a build stage and a deploy stage, so it is a two-stage setup: we build, we test, and then we deploy to prod. The steps involved are building and testing the job, and then deploying it. In the CI/CD pipeline, the first stage is build, where we build and test the pipeline; the second stage is deploy, where the pipeline is deployed to production. Because the pipeline has been built and tested before the deploy step, the release to production goes smoothly.

So, for machine learning pipelines we need accurate data as soon as it is updated. We have built a Snowpipe on the source where we get the data: once data is uploaded or updated at the source, it is smoothly updated in the staging layer. From there we run a task, and within that task a procedure updates the new records in the ODS layer, so the ODS layer has the updated data as soon as the source is updated. For a quick, streamlined process we use streams, so we only pick up the incremental data and the load stays optimized. On that ODS-layer table we have built a machine learning model that gets updated as soon as the source data is updated; this is how new data becomes available as soon as the source data changes. In the task we can also add a WHEN condition, such as WHEN SYSTEM$STREAM_HAS_DATA('<stream name>'): once data reaches the stream, it is loaded directly into the final table, which smooths the pipeline for our machine learning data.
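
A sketch of the stream-gated task described in this answer; the warehouse, stream, task, and table names, and the feature columns, are assumptions.

    -- Run only when the stream has captured new rows, then push the
    -- increment from the staging stream into the ODS table the model reads.
    CREATE OR REPLACE TASK load_ods_features
        WAREHOUSE = etl_wh
        SCHEDULE  = '5 MINUTE'
        WHEN SYSTEM$STREAM_HAS_DATA('STG_FEATURES_STREAM')
    AS
    INSERT INTO ods.features (customer_id, feature_1, feature_2, event_ts)
    SELECT customer_id, feature_1, feature_2, event_ts
    FROM stg_features_stream
    WHERE METADATA$ACTION = 'INSERT';

    ALTER TASK load_ods_features RESUME;   -- tasks are created in a suspended state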

Let's design a CI/CD pipeline in which there is a development stage, a testing stage, and a production stage. In the development stage we build a pipeline for Snowflake; then we use automated testing to debug the errors; then we move it to production in a way that causes minimum interruption to the data services. For that, we can…