
Pallavi S

Vetted Talent

Snowflake Developer with a passion for leveraging data to drive business decisions. Seeking to utilize my 3 years of experience in Snowflake development, advanced SQL proficiency, and expertise in ETL processes to contribute to the success of a dynamic organization. Dedicated to delivering scalable and efficient data solutions that meet and exceed business objectives.

  • Role

    Data Engineer

  • Years of Experience

    3 years

Skillsets

  • Data Warehousing
  • ETL processes
  • Snowflake development

Vetted For

9 Skills
  • Roles & Skills
  • Results
  • Details
  • Senior Data Engineer With Snowflake (Remote) · AI Screening
  • 64%
  • Skills assessed: Azure Synapse, Communication Skills, DevOps, CI/CD, ELT, Snowflake, Snowflake SQL, Azure Data Factory, Data Modelling
  • Score: 58/90

Professional Summary

3 Years
  • Aug, 2021 - Present · 4 yr 2 months

    Snowflake Developer

    Tata Consultancy Services
  • Snowflake Developer

    TCS
  • Junior ETL Developer

    TCS

Applications & Tools Known

  • Snowflake
  • Advanced SQL
  • Talend
  • Python
  • Azure Data Factory
  • Unix
  • Jenkins
  • GitHub
  • SQL Server
  • Tableau
  • PuTTY

Work History

3 Years

Snowflake Developer

Tata Consultancy Services
Aug, 2021 - Present · 4 yr 2 months

    Experienced in enhancement and development work using Snowflake. Proficient in designing, optimizing, and maintaining Snowflake data warehouses. Experienced in Snowflake architecture, including schema design and data-loading optimization. Developed SQL scripts to clean, load, and run data tasks. Capable of leveraging SQL for data analysis, modeling, and database administration tasks. Expertise in writing complex SQL queries, including joins, subqueries, and window functions. Experienced in designing, developing, and deploying ETL processes using Talend Data Integration or IBM DataStage. Capable of extracting, transforming, and loading data from various sources to target systems. Skilled in implementing data quality checks and error-handling mechanisms within ETL workflows.
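
    A small Snowflake SQL illustration of the kind of query described above, combining a join with a window function; the table and column names (orders, customers) are hypothetical, not taken from this profile.

        -- Rank each customer's orders by amount, joining the customer
        -- dimension for reporting attributes.
        SELECT
            c.customer_name,
            o.order_id,
            o.amount,
            ROW_NUMBER() OVER (
                PARTITION BY o.customer_id
                ORDER BY o.amount DESC
            ) AS amount_rank
        FROM orders o
        JOIN customers c
          ON c.customer_id = o.customer_id;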

Snowflake Developer

TCS
    Create, test, and implement enterprise-level applications with Snowflake. Resolve performance and scalability issues in the system. Build, monitor, and optimize ETL and ELT processes with data models. Engineered efficient ETL processes using Snowflake tasks and streams, automating data ingestion and transformation workflows.

Junior ETL Developer

TCS
    Experienced in designing, developing, testing, and documenting ETL jobs, and in monitoring the application that processes data using SQL and Unix. Proficient in writing and debugging complex SQL queries. Engaged in patching and testing of data in production on monthly and quarterly cycles. Reports are viewed and processed in Tableau.

Major Projects

6 Projects

ECOLAB, Manufacturing Domain

TCS
Mar, 2023 - Present · 2 yr 7 months

    Designed, implemented, and orchestrated scalable, robust Snowflake data warehouses. Crafted schema designs and data-loading strategies, ensuring optimal performance and scalability of Snowflake environments. Loaded data from Azure Data Factory into Snowflake using pipelines. Developed SQL scripts.

ECOLAB

    Designed, implemented, and orchestrated scalable, robust Snowflake data warehouses. Crafted schema designs and data-loading strategies, ensuring optimal performance and scalability of Snowflake environments.

ABB

    Create, test, and implement enterprise-level apps with Snowflake.

Fifth Third Bank

    Experienced in designing, developing, testing, and documenting ETL jobs.

ABB, Manufacturing Domain

TCS
Aug, 2022 - Mar, 2023 · 7 months

    Role: Snowflake Developer | Client: US | Language: SQL Server | Tools: Snowflake, Qlik Compose, SQL Server
    Create, test, and implement enterprise-level applications with Snowflake. Resolve performance and scalability issues in the system. Build, monitor, and optimize ETL and ELT processes with data models. Engineered efficient ETL processes using Snowflake tasks and streams, automating data ingestion and transformation workflows.

FTB, Banking Domain

TCS
Aug, 2021 - Aug, 2022 · 1 yr

    Role: Junior ETL Developer | Client: US | Language: SQL Server | Tools: IBM DataStage, Talend, AQT, Unix, ServiceNow
    Experienced in designing, developing, testing, and documenting ETL jobs, and in monitoring the application that processes data using SQL and Unix. Proficient in writing and debugging complex SQL queries. Engaged in patching and testing of data in production on monthly and quarterly cycles. Reports are viewed and processed in Tableau. Worked on the extract, transform, load process and technological infrastructure implementation, debugging and fixing issues during development and implementation of given requirements.

Education

  • Bachelor of Engineering

    K S Institute of Technology (KSIT), 2021

Interests

  • Gyming
  • Travelling

AI-interview Questions & Answers

    Hi, I'm Pallavi, based out of Bangalore. I have been working with TCS for the past 2.8 years, where I started my journey as a Junior ETL Developer in a BFSI unit. As an ETL developer, I worked with several ETL tools like Talend and IBM DataStage, using technologies like SQL and PostgreSQL. My daily task was to create, develop, design, and make small implementations on jobs dealing with real-time transactional data coming in every day; I also handled header handling and managed the tasks that ran on a daily routine. That was my role as a Junior ETL Developer. For the past 1.6 years I have been working as a Snowflake Developer, improving and developing Snowflake data warehouses and doing advanced SQL scripting: writing complex queries and working with window functions, CTEs, joins, and sorting and filtering data. As a Snowflake developer I have also created pipelines and handled the ETL and ELT processes, that is, loading and unloading data in and out of data warehouses. We have been using a cloud-based data warehouse, Snowflake, along with Azure Data Factory and Azure Data Lake to load the data permanently. My daily tasks are to do implementations or develop SQL scripts, load data from source to target, and design new pipelines and workflows for the same, using tools like Azure Data Factory, Azure Data Lake, SQL, advanced SQL, and Snowflake. That is my current project. I also worked as a Junior Snowflake Developer for six months on my previous project, where Qlik Compose was my main tool. Overall, this journey as a Snowflake developer has made me confident to do things and debug things by myself. I have also worked with the testing team, where I learned how to work on my errors and go back and recheck my faults, which has helped me grow and learn a lot on this journey. I am looking forward to using my capabilities to work in your organization. Thank you.

    Methods for implementing idempotency in Snowflake data ingestion. For data ingestion we have primarily been using Azure Data Factory, where we load our data into Azure Data Lake. While doing this we have to create pipelines to make the loading seamless and low-code. I have understood the steps to take: I look at the quality of the data, the volume of the data, and the size and frequency at which it is coming in, and based on that we decide whether the ingestion is a bulk load, a continuous data load, or real-time streaming, and create the pipelines accordingly. If we are manually running the pipelines, we do not have to add any scheduling; but if we are triggering the pipelines based on streaming data, we have to supply the metadata entities based on the data we have been receiving. For data ingestion we primarily have to focus on the type of data coming in and how the ELT process runs, and on the source from which we have been receiving it: is it coming from databases, APIs, or files. The blockers for data ingestion could be that the proper target where the data should land has not been given, or, if the target is overloaded, we have to truncate and then load the data. These are the challenges we face when doing data ingestion.
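
    A minimal Snowflake SQL sketch of one common idempotency pattern for ingestion (an illustration, not necessarily how this project was built): COPY INTO skips files it has already loaded, and a keyed MERGE lets the same batch be replayed without creating duplicates. The stage, table, and column names (my_adls_stage, raw_orders, orders, order_id) are hypothetical.

        -- Load files from an external stage; COPY INTO tracks load metadata
        -- and skips files it has already loaded, so reruns are safe.
        COPY INTO raw_orders
          FROM @my_adls_stage/orders/
          FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
          ON_ERROR = 'ABORT_STATEMENT';

        -- Apply the staged rows to the target with a keyed MERGE, so replaying
        -- the same batch updates existing rows instead of inserting duplicates.
        MERGE INTO orders AS t
        USING raw_orders AS s
          ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET
          t.amount     = s.amount,
          t.updated_at = s.updated_at
        WHEN NOT MATCHED THEN INSERT (order_id, amount, updated_at)
          VALUES (s.order_id, s.amount, s.updated_at);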

    What ways would you leverage data modeling techniques to enhance query performance in Snowflake?

    How would you design a resilient data pipeline in Azure Data Factory to handle intermittent source data availability? To design a data pipeline, the first step, data ingestion, plays a very major role. To do data ingestion, I should first understand the data and how it is coming in: am I receiving it through databases, through APIs, as raw data from SAP, or from files? Once I clarify that, I understand how the data is delivered and choose the method of ingestion: bulk loading, continuous data load, or real-time streaming. I also consider the format of the data, the size of the data, and the frequency at which it arrives; those are the three factors I weigh before creating a pipeline. When I create a pipeline for the incoming data from the source, and we are working with Snowflake and Azure Data Factory, we load the data in batches into Azure Data Lake through the pipelines. I have worked with both approaches: manually triggering the pipelines I have created, and using the automated method to trigger them. For automated triggering I specify the schedule, for example every 5 or 10 minutes, and supply the required metadata details. Once the pipeline is created, I make sure it runs fine. Before running it against real data, I test it: I follow the debugging process so I understand where it is failing and what issues it is going through, and I fix them. I do the testing first, and only then bring in the real-time, bulk, or continuous data and load it into the required target. So to build a resilient data pipeline that handles intermittent source availability, I test after creating the pipeline; testing has saved a lot of errors and caught the issues I faced before, and the pipeline has been working fine afterwards.
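
    The answer stresses testing the pipeline before running it against real data. On the Snowflake side of such a pipeline, one hedged way to do that kind of dry run is COPY INTO with VALIDATION_MODE, which checks staged files without loading them; the stage and table names are hypothetical.

        -- Validate staged files without loading them; surfaces errors from a
        -- bad or partially delivered extract before the real load runs.
        COPY INTO raw_orders
          FROM @my_adls_stage/orders/
          FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
          VALIDATION_MODE = RETURN_ERRORS;

        -- For the real load, tolerate an intermittently flaky source by
        -- skipping bad files instead of failing the whole batch.
        COPY INTO raw_orders
          FROM @my_adls_stage/orders/
          FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
          ON_ERROR = 'SKIP_FILE';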

    What architectural considerations would you keep in mind when designing a high-volume data processing pipeline using Snowflake and Azure Data Factory? Since I have been dealing with high-volume data, as a data engineer I have understood that Snowflake has its own way of handling huge volumes of data: there is the data storage layer, the virtual warehouses for compute, and the cloud services layer, which together make processing high volumes of data hassle-free. If we are loading high volumes of data from Azure Data Factory, the storage layer compresses and encrypts the data and reduces data redundancy, so when I am storing high volumes of data I do not have to worry about redundancy, and for storage I pay only for what I use. I can also scale the virtual warehouses up and down based on the data coming in, so processing stays fast even with large volumes: if the data grows from small to medium to large, I can increase the warehouse size to speed up processing. I also do not have to worry about data loss, because we have Time Travel, and the metadata (micro-partition details, row counts, min/max values, and the format and size of the data) is stored in the cloud services layer. With Time Travel I can set the data retention period accordingly; on the Enterprise edition it can go up to 90 days, so even if data is lost I can go back and bring it back. These are the Snowflake architectural features I would keep in mind when designing a high-volume data processing pipeline.
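
    A minimal Snowflake SQL sketch of two of the features mentioned, warehouse resizing and Time Travel; the warehouse and table names are hypothetical, and the 90-day retention assumes an edition that allows it.

        -- Scale compute up for a heavy batch window, then back down afterwards.
        ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';
        -- ... run the high-volume load ...
        ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XSMALL';

        -- Keep 90 days of Time Travel history on a critical table.
        ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;

        -- Query the table as it was one hour ago, e.g. to recover from a bad load.
        SELECT COUNT(*) FROM orders AT (OFFSET => -3600);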

    Describe how you would configure Azure Data Factory to handle dynamic scaling based on workload demands. Azure Data Factory is mainly used to load data into our Azure storage; I have been using Azure Data Lake, which can store structured or unstructured data. For dynamic scaling based on workload demands: if I am getting huge volumes of data at a time, I can load that bulk data through pipelines. One pipeline in Azure Data Factory can take roughly 40 activities at a time; if that is not enough, I can bring in another pipeline, invoked as a child pipeline, and run the two together so the bulk data is brought in. If I am dealing with a continuous data load, say a million records every 10 minutes, I first understand the frequency at which the data arrives, and for that I create a pipeline with a scheduled trigger: I create a stream and a task and schedule it for every 10 minutes, so every 10 minutes the task triggers and the pipeline brings the data into my external stage. That is how I handle continuous data loads. If I have real-time streaming data arriving every one or two minutes, I use the same technique of tasks, schedules, and streams; I create a stream because I have to capture the changes, that is, what kind of data is coming in and what changes are happening to my table as I load it, and the same trigger-based method brings all the data in batch-wise. Either way, whether the data volume is huge or moderate, we can handle it using Azure Data Factory. This is my experience of loading data using Azure Data Factory.
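
    The scheduled stream-and-task pattern described here can be sketched in Snowflake SQL as follows; the warehouse, table, and column names are hypothetical, and the MERGE body is only a placeholder for the real transformation.

        -- Capture changes landing in the staging table.
        CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

        -- Every 10 minutes, merge only the changed rows into the target; the
        -- task is skipped when the stream has no new data.
        CREATE OR REPLACE TASK load_orders_task
          WAREHOUSE = etl_wh
          SCHEDULE  = '10 MINUTE'
          WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
        AS
          MERGE INTO orders AS t
          USING orders_stream AS s
            ON t.order_id = s.order_id
          WHEN MATCHED THEN UPDATE SET t.amount = s.amount
          WHEN NOT MATCHED THEN INSERT (order_id, amount)
            VALUES (s.order_id, s.amount);

        -- Tasks are created suspended; resume to start the schedule.
        ALTER TASK load_orders_task RESUME;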

    In this code snippet using dbt (data build tool), there is an error that affects the model deployment; can you spot and explain the error in the configuration of this model? According to the code, I think creating the index should be done in the pre-hook, not in the post-hook, because if you are creating a model you would want the indexing to be done in the pre step rather than the post-hook step. Doing the indexing in the pre step helps improve performance and increases speed; indexing is not needed in the post-hook, so that is the error: the create-index statement, along with the drop index if exists, should be given in the pre-hook. Also, when you are granting any kind of access, you have to give the table details; here the grant is only on the role, but not on the table or any other object, entity, or attribute, and I do not see that. So that is another error I have seen here.
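
    The snippet being discussed is not included in this profile; as a hedged illustration only, this is how pre_hook and post_hook are typically declared in a dbt model's SQL file, with a grant that names the built relation via {{ this }}. The model, index, and role names are hypothetical.

        -- models/orders_summary.sql (dbt model; hooks run raw SQL before/after the build)
        {{ config(
            materialized = 'table',
            pre_hook  = "drop index if exists idx_orders_summary",
            post_hook = "grant select on {{ this }} to reporting_role"
        ) }}

        select
            order_id,
            sum(amount) as total_amount
        from {{ ref('stg_orders') }}
        group by order_id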

    Suppose you come across this section within a CI/CD pipeline configuration using Azure Data Factory or Azure DevOps; identify the possible oversight and explain. The deployment process that should be followed here is: when you are building something, you would first want to test it in the QA or test environment, or UAT. Only once the testing has been successful there should it be moved to production. If you move it to production directly, there is a chance of seeing a lot of errors. So the deployment process should go first into the test environment, then into UAT, and then into production, because that is the safer way: you will catch a lot of bugs or errors while testing in your test environment over one or two iterations, then again in UAT, and only then move to production, which is the final place where you deploy things. So prior testing should be done before deploying to production, and that is what I think is missing here: the pipeline builds and then deploys directly to production, so that step is missing.
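
    The pipeline configuration in question is not shown here. As one hedged, Snowflake-side illustration of giving a change a test pass before production, a zero-copy clone can provide a disposable test database to validate against; the database names are hypothetical.

        -- Clone production into a disposable test database
        -- (zero-copy, so it is cheap to create and drop).
        CREATE OR REPLACE DATABASE analytics_test CLONE analytics_prod;

        -- ... deploy and test the change against analytics_test ...

        -- Tear the test environment down once validation passes.
        DROP DATABASE analytics_test;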

    How would you build a machine learning data pipeline in Snowflake and ensure it is updatable as new data becomes available? If I am building a data pipeline using machine learning, I would first build only on data I have been continuously dealing with, that is, the same routine data coming in, so I know how the data behaves, I know the metadata, and I know where the data is being loaded. I will understand the dataset properly and then go about creating the data pipeline for Snowflake using machine learning. Once the model is trained on the data, it starts giving output accordingly. So, after understanding the frequency of the data, the data format, the data size, and the attributes, entities, and constraints present in my data, I will list them out, create a model, and then build the data pipeline using those entities; with this I can create the pipeline using data modeling techniques and run it the same way. I will use an automation technique so that, as the routine data comes in, the pipeline also runs routinely. To make new data available in Snowflake, once the pipeline runs, the data lands in the data lake, and I will bring in a new database or new external tables so that I can pull it in and use it as completely new data. That is one way. Or, if I am doing it manually, I can trigger the pipeline I have created, using the entities learned from the patterns of the previous routine data: I will create a data model from that, test the pipeline first by debugging it and understanding the errors, fix anything found, then run the pipeline, land the data in the data lake, and pull the same data through an external table by giving the path (the URL) of the particular storage, be it Azure Blob Storage, AWS storage, or GCP storage. I will point directly at the external storage, and then it is completely new data that I can start using for further querying.
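
    A hedged Snowflake SQL sketch of the external-table approach mentioned for surfacing newly landed data; the stage URL, credentials, and column definitions are placeholders, not values from this project.

        -- External stage pointing at the storage path where the pipeline lands files.
        CREATE OR REPLACE STAGE ml_landing_stage
          URL = 'azure://myaccount.blob.core.windows.net/ml-landing/'
          CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>');

        -- External table exposing newly landed Parquet files for querying
        -- without copying them into Snowflake first.
        CREATE OR REPLACE EXTERNAL TABLE ml_features_ext (
          customer_id VARCHAR AS (VALUE:customer_id::VARCHAR),
          feature_1   FLOAT   AS (VALUE:feature_1::FLOAT)
        )
        LOCATION = @ml_landing_stage/features/
        FILE_FORMAT = (TYPE = PARQUET)
        AUTO_REFRESH = FALSE;

        -- Pick up files added since the last refresh.
        ALTER EXTERNAL TABLE ml_features_ext REFRESH;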

    Apply DevOps practices to improve collaboration and reduce lead time in Snowflake data operations. We have been using Azure DevOps for our day-to-day planning: to plan the project, for code management, for building and testing the code, and for deploying or releasing the code into production and monitoring it. As for how it has enabled collaboration between developers and testers: the designing, developing, and planning of the project happens in Azure DevOps on the dashboards, and we have scrums and reporting tools available there as well. Building and testing go on alongside code management: we bring our code into Azure Repos, where we have version control, so each developer can push a new version, and anyone who wants to refer to the code can go back to an earlier version. After the code is uploaded, the testing team can run their test plans, and if they find any problems they can raise bugs and attach the responsible developer's name to the bug; if there is a count mismatch or any other defect in testing, the developer is added there, and bugs are tracked sprint-wise so they can be fixed within that sprint. In this way collaboration has gone smoothly. Continuous integration and continuous deployment also run through Azure DevOps, where we have Jenkins for deployment, along with GitHub and other tools, so we can deploy to the different environments: test, UAT, and production. Unit tests uploaded by developers also run on Azure DevOps. This has helped us interact with other developers, including live coding between two developers. Overall it has helped with planning, managing the code, building and testing it, deploying and releasing to production, and monitoring. Azure DevOps gives us dashboards, reporting tools, scrums, sprints, and boards where we can see and plan the projects, view the user stories for each sprint, upload our code, and verify things.

    Optimize data retrieval time in Snowflake while dealing with large semi-structured data sets using Snowflake SQL. If we are dealing with large semi-structured data such as JSON, we can load the JSON into a VARIANT column and then convert it into a relational form in Snowflake. This is not a very large task: once the VARIANT is converted into the required format it is easy to structure, because when I convert the JSON file into a VARIANT and then convert that VARIANT into rows and columns, it becomes very easy to process those rows and columns using SQL. Once the data is in the form of rows and columns, in a database table or schema format, it is easy to process: clean it, transform it, or perform data manipulations on the table produced from the JSON. Once it is in table format, it is easy for developers to process the data based on the client requirement. So optimizing data retrieval time in Snowflake can be done in this way: convert the semi-structured JSON file into the VARIANT type, then convert that VARIANT into a rows-and-columns format, and use that format for processing or cleaning the data. This helps reduce retrieval time for the large semi-structured JSON file, and processing the resulting rows and columns is much faster, which improves performance.
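
    A minimal Snowflake SQL sketch of the VARIANT-to-columns pattern described above; the table, column, and JSON field names are hypothetical.

        -- Raw JSON landed into a VARIANT column.
        CREATE OR REPLACE TABLE raw_events (payload VARIANT);

        -- Flatten the nested JSON array into rows and cast fields into typed
        -- columns, so downstream queries read relational columns instead of
        -- re-parsing the JSON each time.
        CREATE OR REPLACE TABLE events_flat AS
        SELECT
            payload:event_id::STRING      AS event_id,
            payload:created_at::TIMESTAMP AS created_at,
            item.value:sku::STRING        AS sku,
            item.value:qty::NUMBER        AS qty
        FROM raw_events,
             LATERAL FLATTEN(INPUT => payload:items) AS item;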