
Data Engineer
Lagozon Technologies Private Limited
Data Analyst
Intrics Solution Private Limited
Snowflake
SQL Server
AWS CloudWatch
AWS Glue
AWS EC2
Excel
S3
SSMS
MySQL
Hi. I have 3 years of data science and data engineering experience, with a strong background in Python and SQL, and in AWS services such as S3, AWS Glue, and Redshift, along with relevant experience in Azure. I have used Snowflake extensively for the last 2 years together with AWS, and I have also worked on Azure for multiple in-house projects. Prior to that, I was working at Intrics Solution Private Limited as an associate data analyst, working with data using technologies like MongoDB, SQL, Python, and Excel. I completed a PGP in Data Science and Engineering in 2020-21, where I learned data engineering techniques and worked on data science projects. There I learned pandas, NumPy, SQL, Excel, and other useful business techniques to model data in a business context and deliver business value.
Hi. So for transactional control, I would be monitoring the development and test environments to make sure that all transactions go through one by one in the concurrent processes, and I would verify each of them with automated testing. I would use streams for all the transactions happening on the table. We can use streams for insert, update, and delete, that is, all the DML transactions performed on the table, so that every transaction gets recorded in Snowflake and can be referred to whenever the data in the table changes. So for CDC, change data capture, we use streams. We use streams to track transaction control so that we can see which data was inserted or updated in the last few days or a month back, and in that way we can maintain the data consistency of the table.
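A minimal Snowflake SQL sketch of stream-based CDC along these lines; the orders table, its columns, and the stream name are hypothetical:

```sql
-- Hypothetical source table whose DML we want to track
CREATE OR REPLACE TABLE orders (
    order_id   NUMBER,
    amount     NUMBER(10,2),
    updated_at TIMESTAMP_NTZ
);

-- A standard stream records every insert, update, and delete on the table
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

-- Reading the stream shows what changed since it was last consumed;
-- METADATA$ACTION and METADATA$ISUPDATE identify the kind of DML
SELECT order_id, amount, METADATA$ACTION, METADATA$ISUPDATE
FROM orders_stream;
```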
I would use a three-stage setup in which one is the development environment, one is the testing environment, and the other is the production environment. Once the ETL pipeline is developed, we can test it in an automated session that debugs the errors produced in the data pipelines for high-volume data. Then we can move the pipeline to production, where we follow a two-layer architecture, that is, the stage and ODS layers. In the staging layer the raw data is inserted, then it is transformed in Snowflake, and then it is moved to the ODS layer. This is the architectural consideration we would use for high-volume data processing, so that no data gets lost and no redundant data is inserted in the process.
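As a sketch of that stage-to-ODS hand-off, assuming hypothetical staging and ods schemas keyed on order_id, a MERGE keeps re-runs from inserting redundant rows:

```sql
-- Hypothetical two-layer load: raw rows land in STAGING, de-duplicated rows move to ODS
CREATE SCHEMA IF NOT EXISTS staging;
CREATE SCHEMA IF NOT EXISTS ods;

CREATE TABLE IF NOT EXISTS staging.orders_raw (order_id NUMBER, amount NUMBER(10,2), load_ts TIMESTAMP_NTZ);
CREATE TABLE IF NOT EXISTS ods.orders         (order_id NUMBER, amount NUMBER(10,2), load_ts TIMESTAMP_NTZ);

-- MERGE avoids inserting redundant data when the pipeline is re-executed
MERGE INTO ods.orders AS tgt
USING staging.orders_raw AS src
    ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET amount = src.amount, load_ts = src.load_ts
WHEN NOT MATCHED THEN INSERT (order_id, amount, load_ts)
                      VALUES (src.order_id, src.amount, src.load_ts);
```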
We can implement idempotency in the data ingestion by using Snowpipe. Once new data is loaded at the source, it will automatically be inserted into the target table in Snowflake.
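A hedged sketch of such a Snowpipe, assuming a hypothetical S3 path and the staging.orders_raw table from the sketch above; because Snowpipe tracks load history per file, re-delivered files are not ingested twice, which is what keeps the ingestion idempotent:

```sql
-- Hypothetical external stage pointing at the source files
CREATE OR REPLACE STAGE raw_orders_stage
    URL = 's3://example-bucket/orders/'              -- assumed bucket path
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Pipe that auto-ingests new files into the staging table as they arrive
CREATE OR REPLACE PIPE orders_pipe
    AUTO_INGEST = TRUE
AS
    COPY INTO staging.orders_raw
    FROM @raw_orders_stage;
```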
For a CDC solution in Azure Data Factory, we can use a particular column that tracks the modification of the data. Once any row gets modified, it is stamped with the date or time at which it was updated, and then we can capture incrementally only the data that has been updated.
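A sketch of the watermark pattern behind that incremental capture, with hypothetical table, column, and control-table names; in Azure Data Factory these queries would typically sit in Lookup and Copy activities:

```sql
-- Pull only the rows modified since the last recorded watermark
SELECT *
FROM dbo.orders
WHERE last_modified_at >  '2024-01-01 00:00:00'   -- old watermark read from the control table
  AND last_modified_at <= '2024-01-02 00:00:00';  -- new watermark captured at run time

-- After a successful copy, advance the control table to the new watermark
UPDATE dbo.watermark_control
SET last_watermark = '2024-01-02 00:00:00'
WHERE table_name = 'dbo.orders';
```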
So for semi-structured data like JSON or XML, in the ELT process we can directly load the JSON or other semi-structured data into a table in the Snowflake environment by defining a column with the VARIANT data type. From that table we can run a procedure to extract the data from the JSON and load it into another table incrementally. We can use streams so that every time new data arrives, that is, every time a row is inserted or updated in the staging-layer table, the standard stream captures it and we run the procedure on that stream data. In this way we get only the incremental data and load it into the final table. That is how we can improve the performance of the ELT process in Snowflake when it involves semi-structured data. And with the LATERAL FLATTEN technique, we can flatten the nested data smoothly in the Snowflake environment.
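A minimal sketch of that pattern, assuming a hypothetical JSON payload with an items array and the staging and ods schemas from the earlier sketch:

```sql
-- Staging table that receives the raw JSON into a VARIANT column
CREATE OR REPLACE TABLE staging.events_raw (
    payload VARIANT,
    load_ts TIMESTAMP_NTZ
);

-- Stream so that only newly arrived rows are processed on each run
CREATE OR REPLACE STREAM events_raw_stream ON TABLE staging.events_raw;

-- Flattened target table
CREATE TABLE IF NOT EXISTS ods.events_flat (event_id STRING, item_sku STRING, item_qty NUMBER);

-- LATERAL FLATTEN explodes the nested "items" array into one row per element
INSERT INTO ods.events_flat (event_id, item_sku, item_qty)
SELECT
    s.payload:event_id::STRING,
    f.value:sku::STRING,
    f.value:qty::NUMBER
FROM events_raw_stream AS s,
     LATERAL FLATTEN(input => s.payload:items) AS f;
```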
It's a matter that affects the model
So there is a stage for build and one for deploy. It is a two-layer architecture in which we develop, build, and test, and then we deploy to prod. The steps involved are building the job and then building and testing it. In this CI/CD pipeline, the first stage is development, and inside that development stage we have build and test. So the two-layer architecture is: first build, then deploy. In the build stage we build and test the pipeline, and in the deploy stage we deploy it to prod, which is the deploying-to-production step. The impact is that, because we have already built and tested the pipeline, the deployment will go smoothly.
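As one possible shape for the automated checks in the build-and-test stage (my assumption, not something the answer specifies), the CI job could run validation queries like these against the test environment and fail the build on unexpected results:

```sql
-- Build-and-test stage: the load must have produced rows
SELECT COUNT(*) AS row_count
FROM staging.orders_raw;

-- Build-and-test stage: the key must be unique; any row returned here fails the build
SELECT order_id, COUNT(*) AS dup_count
FROM staging.orders_raw
GROUP BY order_id
HAVING COUNT(*) > 1;
```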
So for machine learning pipelines, we need accurate data that is available as soon as it gets updated. For this machine learning pipeline, we have built a Snowpipe on the source where we get the data. Once the data is updated at the source, it is loaded smoothly into the staging layer. From there we run a task, and in that task we run a procedure to upsert the new records into the ODS layer, so the ODS layer holds the updated data as soon as it changes at the source. To streamline the process we use streams, so we only work with the incremental data, which keeps the processing in Snowflake optimized. On that ODS-layer table we have built a machine learning model that gets refreshed as soon as the source data is updated. This is how the new data becomes available as soon as the source data changes. And on that task we can add a WHEN condition, WHEN the stream has data for that stream name, so that once data arrives in the stream, it is loaded directly into the final table. That smooths the pipeline for our machine learning data.
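A hedged sketch of such a task, reusing the hypothetical staging and ODS tables from the earlier sketches; transform_wh is an assumed warehouse name:

```sql
-- Stream on the staging table feeds only the incremental rows to the task
CREATE OR REPLACE STREAM orders_raw_stream ON TABLE staging.orders_raw;

-- Task runs only when the stream actually has new data, then upserts into ODS
CREATE OR REPLACE TASK refresh_ods_orders
    WAREHOUSE = transform_wh                          -- assumed warehouse name
    SCHEDULE  = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_RAW_STREAM')
AS
    MERGE INTO ods.orders AS tgt
    USING orders_raw_stream AS src
        ON tgt.order_id = src.order_id
    WHEN MATCHED THEN UPDATE SET amount = src.amount, load_ts = src.load_ts
    WHEN NOT MATCHED THEN INSERT (order_id, amount, load_ts)
                          VALUES (src.order_id, src.amount, src.load_ts);

-- Tasks are created suspended; resume it so it starts running
ALTER TASK refresh_ods_orders RESUME;
```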
Let's design a CI/CD pipeline in which there is a development stage, a testing stage, and a production stage. In the development stage we build the pipeline for Snowflake. Then we use automated testing to debug the errors. Then we move it to production in a way that causes minimum interruption to the data services.
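One way to back those three stages with separate Snowflake environments is zero-copy cloning, which is my assumption rather than something stated in the answer; clones give dev and test production-shaped data with minimal interruption:

```sql
-- Hypothetical per-environment databases targeted by the CI/CD stages
CREATE DATABASE IF NOT EXISTS analytics_prod;

-- Zero-copy clones give dev and test realistic data without disturbing production
CREATE OR REPLACE DATABASE analytics_dev  CLONE analytics_prod;
CREATE OR REPLACE DATABASE analytics_test CLONE analytics_prod;
```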