
Data Scientist
INCIF Technologies Pvt. Ltd

Associate Consultant (DS)
Capgemini Technologies
AWS

DevOps

Tableau

ETL

Linux

HTML

CSS

Bootstrap

Git
Docker

PySpark

Airflow
Heroku
Jenkins

EC2

VPC

EBS

S3

Postman

Microsoft Azure
Could you help me understand more about your background by giving a brief introduction of yourself? Yes, sure. Hello, first of all, thanks for giving me this opportunity to introduce myself. I'm Shivniti Nozary. There was a brief interruption, so let me continue from the start. I'm delighted to introduce myself as a data scientist with 3 years of experience in the IT industry. I completed my bachelor's degree from Pune University in 2019, and recently I completed a postgraduate diploma from the University of Texas at Austin. My passion for data analysis and problem solving led me to pursue a career in this ever-evolving and dynamic field. I have worked on diverse projects, from predictive modeling to data-driven business strategies, and I excel at extracting valuable insights from complex data with various tools and technologies, including Python, TensorFlow, PyTorch, PySpark, GitHub, Django, Docker, computer vision, NoSQL, SQL, deep learning, machine learning, and MLOps. I'm also known for my ability to communicate technical findings to non-technical stakeholders and to support data-driven decision making within the organization. I'm excited about opportunities where I can continue contributing my expertise and driving data-driven innovation. I have worked across project domains like telecom, e-commerce, and payments, where my contribution has been to leverage my expertise in Python, statistical analysis, and data manipulation techniques to optimize analysis workflows, to collaborate with cross-functional teams to ensure alignment of data science initiatives with project objectives, and to employ ETL techniques.
Providing an example, how would you implement a sequence-to-sequence model in TensorFlow for a machine translation task? Okay. There are several steps. To implement a sequence-to-sequence model in TensorFlow for machine translation, follow these steps: data preparation, model architecture, defining the model, and then training and evaluation. For data preparation, tokenize and preprocess the source and target text; use TensorFlow's tokenizer and pad_sequences to prepare the input sequences and target sequences. For model architecture, the sub-steps are the encoder and the decoder. The encoder uses an LSTM or GRU layer to encode the input sequence into a fixed-size context vector, and we can stack multiple layers for better performance. The decoder uses another LSTM or GRU layer, with an attention mechanism, to generate the output sequence. After that, we define the model in code. The last part, as I said, is training and evaluation: train the model using pairs of source and target sequences, and evaluate performance using metrics like the BLEU score for translation quality.
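A minimal sketch of that encoder-decoder setup in TensorFlow/Keras follows; the vocabulary sizes, dimensions, and random toy data are placeholder assumptions, and attention plus BLEU evaluation would be layered on top.

```python
# Minimal seq2seq sketch in TensorFlow/Keras; sizes and toy data are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab, embed_dim, units = 5000, 6000, 128, 256

# Encoder: embeds the source tokens and returns the final LSTM states
enc_inputs = layers.Input(shape=(None,), name="encoder_tokens")
enc_emb = layers.Embedding(src_vocab, embed_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: embeds target tokens, initialised with the encoder's context states
dec_inputs = layers.Input(shape=(None,), name="decoder_tokens")
dec_emb = layers.Embedding(tgt_vocab, embed_dim)(dec_inputs)
dec_out, _, _ = layers.LSTM(units, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = Model([enc_inputs, dec_inputs], logits)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Teacher forcing with random toy data, just to show the expected shapes
src = np.random.randint(1, src_vocab, size=(64, 20))
tgt_in = np.random.randint(1, tgt_vocab, size=(64, 22))
tgt_out = np.random.randint(1, tgt_vocab, size=(64, 22))
model.fit([src, tgt_in], tgt_out, epochs=1, verbose=0)
```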
So how would you benchmark the performance of a NoSQL database against SQL when dealing with large unstructured datasets using Python? Okay. Here too we can follow a step-by-step approach. First, setup and configuration: for the NoSQL side, choose a NoSQL database, for example MongoDB or Cassandra, and set it up; for the SQL side, choose an SQL database, for example PostgreSQL or MySQL, and set it up. The second part is data preparation: generate a large unstructured dataset to use as the benchmark. This could be a collection of documents with varied fields for NoSQL and similar tables with large volumes of rows for SQL. Then run the benchmarking tasks. For insertion performance, measure the time taken to insert a large number of records or documents into both databases. For query performance, execute various queries, from simple retrievals to complex aggregations, and measure the response times for both databases. We can also measure update performance and deletion performance. A Python script can drive these measurements. Then analyze the results: compare the performance metrics, such as insertion time and query response time, to determine which database performs better under the given conditions, and consider factors like scalability, ease of use, and the specific use-case requirements in addition to the raw performance metrics. As further considerations, ensure the environment (hardware and network) is consistent when running benchmarks, and test with a variety of operations and dataset sizes to get a comprehensive view of performance. So that's the approach.
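A rough insertion-timing sketch along those lines is below; it assumes a MongoDB instance running locally and uses SQLite as a stand-in for the SQL side, so PostgreSQL or MySQL drivers would be swapped in for a real comparison.

```python
# Insertion benchmark sketch: MongoDB (NoSQL) vs SQLite (SQL stand-in).
import sqlite3
import time
from pymongo import MongoClient

records = [{"user_id": i, "event": "click", "payload": {"page": i % 7}}
           for i in range(100_000)]

# --- NoSQL insertion timing (requires a local MongoDB server) ---
mongo = MongoClient("mongodb://localhost:27017")
coll = mongo.benchmark_db.events
coll.drop()
start = time.perf_counter()
coll.insert_many(records)
print(f"MongoDB insert: {time.perf_counter() - start:.2f}s")

# --- SQL insertion timing (SQLite in memory as a stand-in engine) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event TEXT, page INTEGER)")
start = time.perf_counter()
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(r["user_id"], r["event"], r["payload"]["page"]) for r in records],
)
conn.commit()
print(f"SQLite insert: {time.perf_counter() - start:.2f}s")

# Query timing would follow the same pattern: time a find()/aggregate() call
# against an equivalent SELECT ... GROUP BY, averaged over several runs.
```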
So what factors would you consider when choosing between convolutional neural networks and recurrent neural networks in a computer vision task? Wait, I'm unable to recall it. What factors would you consider when choosing between convolutional neural networks and recurrent neural networks in a computer vision task? Something is off here, what is going on?
Okay. Which Python classes or frameworks will assist you in developing an anomaly detection system with PyTorch, and what will be your validation strategy? Okay. We can follow several steps. The first step is to import the necessary libraries; after that, generate synthetic data, create sequences, define the autoencoder model, and convert the sequences into PyTorch tensors. Those are the steps we can follow.
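A compact sketch of that autoencoder approach in PyTorch is below; the layer sizes, synthetic data, and the 95th-percentile reconstruction-error threshold are illustrative assumptions, with the threshold standing in for the validation strategy on held-out normal data.

```python
# Autoencoder-based anomaly detection sketch in PyTorch; sizes and data are toy assumptions.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 8), nn.ReLU(), nn.Linear(8, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Synthetic "normal" data; anomalies are points the model reconstructs poorly.
normal = torch.randn(1000, 16)
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(50):                       # short training loop on normal data
    optimizer.zero_grad()
    loss = loss_fn(model(normal), normal)
    loss.backward()
    optimizer.step()

# Validation idea: reconstruction error on normal data sets a threshold;
# unseen points above it are flagged as anomalies.
with torch.no_grad():
    errors = ((model(normal) - normal) ** 2).mean(dim=1)
    threshold = torch.quantile(errors, 0.95)
    test_point = torch.randn(1, 16) * 5        # exaggerated outlier
    is_anomaly = ((model(test_point) - test_point) ** 2).mean() > threshold
    print("anomaly detected:", bool(is_anomaly))
```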
Which Python tools would you use for text tokenization and sentiment analysis in an NLP pipeline, and why would you choose them? According to Sunscrapers, TextBlob is a good choice. Okay, so in that case we use TextBlob. TextBlob is a must for developers who are starting NLP in Python and want to make the most of their first encounter with NLTK. It provides beginners with an easy interface to help them learn the most basic NLP tasks like sentiment analysis, POS tagging, or noun phrase extraction.
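A short TextBlob sketch for tokenization and sentiment is shown below, assuming the textblob package and its corpora are installed (pip install textblob; python -m textblob.download_corpora); the sample sentence is just an illustration.

```python
# Tokenization, POS tagging, noun phrases, and sentiment with TextBlob.
from textblob import TextBlob

text = "The new dashboard is fast, but the export feature keeps crashing."
blob = TextBlob(text)

print(blob.words)         # word tokenization
print(blob.sentences)     # sentence tokenization
print(blob.tags)          # part-of-speech tags
print(blob.noun_phrases)  # noun phrase extraction
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
```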
Okay, given the following Python code snippet, what is the issue that will prevent it from correctly creating a machine learning model pipeline? There are several issues in the original code. First, there is an improper import with a typo: `from sklearn.svm import SVC` is correct, but in the pipeline definition `svc` should be replaced with `SVC`; the correct class name is uppercase SVC, not lowercase. Second, there is a syntax error in the pipeline steps: in the original code the pipeline steps are incorrectly formatted, with the wrong separators instead of proper commas and incorrect bracket usage. It should be a list of tuples, with each tuple containing the name of the step and the corresponding estimator or transformer. Finally, assuming X_train and y_train are defined (which is not a syntax error as such), ensure that X_train and y_train are properly defined and contain the data you intend to use for fitting the model. So yes, there are several issues.
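Since the original snippet is not shown here, the following is a plausible corrected version of the pipeline being described; the scaler step and the toy training data are assumptions.

```python
# Corrected pipeline sketch: uppercase SVC, proper (name, estimator) tuples.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC              # uppercase class name, not "svc"

# Steps must be a comma-separated list of (name, estimator) tuples.
pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", SVC(kernel="rbf")),
])

# X_train / y_train must actually exist before fitting; toy data stands in here.
X_train = np.random.rand(100, 4)
y_train = np.random.randint(0, 2, size=100)
pipeline.fit(X_train, y_train)
print(pipeline.score(X_train, y_train))
```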
The import statement has to be corrected: `import sqlite3` should be on a separate line, and the corrected line is `from flask import Flask, jsonify`. The second issue is the Flask app initialization: it should be `app = Flask(__name__)`, using the `=` operator to assign the Flask instance to the variable `app`. Then there is the methods list formatting: in `methods=["GET"]`, use normal quotes and make sure the list is properly formatted. For data fetching and return, it should be `data = curs.fetchall()`, assigning the result of the fetch, and `jsonify(data)` works as long as the data is in a format that can be directly serialized to JSON; an SQLite fetch returns a list of tuples, which may need conversion to a JSON-serializable format, for example a list of dictionaries if required. We can also add some additional considerations: to convert the tuples into dictionaries with column names, you might need to retrieve the column names from the cursor description, and for production, consider adding error handling to manage exceptions during database operations.
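Since the original snippet is not shown here, the following is a plausible corrected version of the Flask + sqlite3 endpoint being described; the database path, table name, and column names are assumptions.

```python
# Corrected Flask endpoint sketch: proper imports, app assignment, quoted methods,
# fetchall() into a JSON-serializable structure, and basic error handling.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)          # assign the Flask instance with "="
DB_PATH = "example.db"         # hypothetical database file

@app.route("/items", methods=["GET"])      # properly quoted methods list
def get_items():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row         # rows convert cleanly to dicts
    try:
        curs = conn.execute("SELECT id, name FROM items")
        data = [dict(row) for row in curs.fetchall()]   # JSON-serializable
        return jsonify(data)
    except sqlite3.Error as exc:           # basic error handling for production
        return jsonify({"error": str(exc)}), 500
    finally:
        conn.close()

if __name__ == "__main__":
    app.run(debug=True)
```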
Can you devise a Python workflow that applies both deep learning and NLP techniques to extract insights from visual and textual data simultaneously? Yes, we can. We can use fastai, a Python-based open-source machine learning framework that offers high-level abstractions for deep learning model training. So yes, we can devise a Python workflow that applies both deep learning and NLP techniques; there are various frameworks and tools for this, such as PyTorch, TensorFlow with Keras, and OpenCV.
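One way to sketch such a combined workflow is below: a pretrained torchvision CNN extracts image features while TextBlob scores the accompanying caption. Pairing an image with a caption per record, and skipping input normalization, are simplifying assumptions here.

```python
# Combined visual + textual analysis sketch: CNN features plus caption sentiment.
import torch
from torchvision import models, transforms
from PIL import Image
from textblob import TextBlob

# Visual branch: pretrained ResNet-18 as a feature extractor
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()          # drop the classifier head, keep 512-d features
cnn.eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])

def analyze(image_path: str, caption: str) -> dict:
    """Return CNN features for the image plus sentiment for its caption."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        features = cnn(img).squeeze(0)           # 512-dim visual embedding
    sentiment = TextBlob(caption).sentiment      # polarity / subjectivity
    return {"visual_features": features, "caption_polarity": sentiment.polarity}

# Hypothetical usage on one product photo and its review text:
# result = analyze("product.jpg", "Looks great but arrived scratched.")
```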
Can you illustrate how version control with Jira would aid collaboration for a remote data science team deploying TensorFlow models? Actually, Git is a version control system that tracks file changes, while GitHub is a platform that allows developers to collaborate and store their code in the cloud. Think of it this way: Git is responsible for everything GitHub-related that happens locally on your computer. That's the main way we can illustrate version control, with the help of GitHub. Using GitHub as the version control service, developers on different local machines can integrate their work and collaborate together, and the team gets visibility into the status of tasks: what is completed, what is pending, the requirements, and whatever changes have happened. With Git or GitLab, we can track those records in the same way.
Discuss a technique in Python to automatically handle missing or corrupted data in a large dataset that might affect machine learning model performance. One approach is deleting columns with missing data: in this case, let's delete the 'Age' column, then fit the model and check the accuracy. That is one method. After that, there are imputation methods: filling in the missing values, and KNN imputation is one of them. There is also deleting rows with missing data. Suppose a column has more than half of its values missing or null; then we can simply delete or drop the whole column from the dataset. On that basis we deal with the missing values and the corrupted data. With KNN imputation, or alternatively, we can replace missing values with the mean or median, depending on what type of data we have; on that basis we fill in the mean or median values.
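A short sketch of those options using pandas and scikit-learn follows; the toy DataFrame and the 50%-missing drop threshold are illustrative assumptions.

```python
# Handling missing data: drop sparse columns, then mean or KNN imputation.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

df = pd.DataFrame({
    "age": [25, np.nan, 31, 47, np.nan],
    "income": [50_000, 62_000, np.nan, 81_000, 58_000],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0, np.nan],
})

# 1) Drop columns where more than half the values are missing
df = df.loc[:, df.isna().mean() <= 0.5]

# 2) Simple imputation: fill remaining gaps with the column mean (or median)
mean_filled = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns)

# 3) KNN imputation: estimate each missing value from its nearest neighbours
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)

print(mean_filled)
print(knn_filled)
```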