
9 years of experience in data mining, cleaning, validation, modelling, analysis, data warehousing, visualization, and statistical modelling with large structured and unstructured datasets, using technologies such as Power BI, Tableau, Azure SQL, Azure Analysis Services, SSAS, Dataflow, Datamart, Datahub, and Microsoft Fabric.
Senior Power BI Consultant, Y&L Consultancy - US Contract
Sr. Power BI Data Analyst, Indium Software Pvt Ltd
Sr. Power BI Developer, Crafsol Technology Solutions Pvt Ltd
BI Developer, KPIT Technologies Ltd
BI Developer, Cap Gemini Software Pvt Ltd
Microsoft Power BI
Power BI Service
DAX
SSAS
SSIS
Dataflow
DataMart
Azure Data Factory
Azure Data Lake Storage Gen2 (ADLS)
T-SQL
Microsoft Azure SQL Database
MySQL
SQL Server Reporting Services
Python
R Programming Language
Azure Analysis Services
MDX
Git
Azure DevOps Server
Data Warehousing
Data Analysis
Data Modelling
Power BI
SQL Server
Hadoop
Hive
Sqoop
PowerShell
Power BI Mobile
Tableau
QlikView
Power BI Desktop

SSAS
Tableau
Azure Data Lake
Azure Synapse
DAX
SQL Server
MySQL
Hive
Oracle
MongoDB
Teradata
Salesforce
SSIS
Recognizing Exceptional Performance
At MGAE Entertainment, we believe in recognizing and celebrating exceptional performance, and Mohan Vankudoth has exemplified these qualities in every aspect of their work. As Sr Power BI Consultant, Mohan Vankudoth has consistently demonstrated a level of dedication, expertise, and professionalism that has truly set them apart.
Mohan Vankudoth has been instrumental in the success of our Power BI project, playing a pivotal role in its conception, implementation, and ongoing optimization. Their expertise in data analytics, coupled with their strategic mindset, has been invaluable in unlocking insights and driving impactful outcomes for our organization.
What truly sets Mohan Vankudoth apart is their exceptional ability to translate complex data into actionable insights that inform decision-making at all levels of the organization. Whether it's analyzing sales performance, identifying growth opportunities, or optimizing inventory management, Mohan Vankudoth approaches every challenge with a meticulous attention to detail and a commitment to excellence.
Moreover, Mohan Vankudoth has been a catalyst for driving innovation and fostering a culture of continuous improvement within our team. Their proactive approach, willingness to explore new ideas, and collaborative spirit have inspired their colleagues to raise the bar and push the boundaries of what's possible.
Beyond their technical expertise, Mohan Vankudoth embodies the core values of MGAE Entertainment: creativity, passion, and integrity. Their unwavering dedication to delivering exceptional results, coupled with their humility and team-first attitude, makes them a true asset to our organization.
In recognition of Mohan Vankudoth's exceptional performance and contributions, we extend our sincerest gratitude and appreciation. Their leadership, dedication, and commitment to excellence serve as an inspiration to us all, and we look forward to continuing our journey of success together.
Chris Martin
Business Intelligence Manager
MGA Entertainment, Chatsworth
MGAE Entertainment aims to leverage Power BI to gain actionable insights into its sales data across regions, product lines, and distribution channels. The project's primary objectives are to enhance decision-making, identify growth opportunities, optimize inventory management, and ultimately increase profitability.
Key Components:
Project Overview:
LAM Research, a leader in semiconductor manufacturing equipment, is embarking on a Power BI project to optimize its supply chain operations. The project aims to integrate data from various sources across the supply chain, analyze key performance indicators, and visualize insights to improve decision-making and efficiency.
Key Components:
Project Overview:
Inteva Products, a global automotive supplier, is initiating a project to migrate its existing business intelligence (BI) platforms from QlikView, Tableau, and Cognos to Microsoft Power BI. The migration aims to consolidate BI tools, standardize reporting processes, and leverage Power BI's advanced analytics capabilities to drive better decision-making across the organization.
Key Components:
Project Overview:
PepsiCo is embarking on a Power BI project aimed at enhancing its supply chain management processes. The project focuses on leveraging data analytics to optimize inventory management, streamline production, and improve distribution efficiency.
Key Components:
Hi, I'm Mohan, from Hyderabad. I have about 8.6 years of overall experience as a Power BI consultant, across four organizations and with different domain knowledge. I have worked with skills such as Power BI, Power BI Report Server, and Power BI Desktop; writing DAX queries and SQL queries against MS SQL, MySQL, and Oracle; and connecting to different data sources such as SharePoint, Azure SQL, and Azure Analysis Services. I also have an understanding of the Azure environment, including Azure Data Factory, Azure data pipelines, Microsoft Fabric, and Synapse. Over these 8.6 years I have worked across different domains, such as healthcare, automobiles, insurance, retail, and banking. Currently I am not working and am looking for an opportunity where I can join immediately. My previous assignment was with my consultancy's client, Farm America, where my roles and responsibilities were to connect with the clients, understand their requirements and their data, and sit with the data engineering team to move the data from one source to another for report implementation and analysis in Power BI Desktop. There we migrated published QlikView, Tableau, and Qlik Sense reports to Power BI Desktop by understanding their data modelling, data transformations, and the features they had implemented, end to end.
How would you integrate a Python-based machine learning model with a big data pipeline on AWS? First, prepare the machine learning model: develop and train your model using Python libraries such as scikit-learn, TensorFlow, or PyTorch, then serialize and save the trained model to a file using a format such as pickle or joblib. Second, set up the AWS big data pipeline: choose AWS services for building it, such as Amazon S3 for storage, Amazon EMR for data processing, and AWS Glue for ETL (extract, transform, and load), and design and implement the data workflow for ingestion, processing, and storage according to your requirements. Third, deploy the machine learning model: set up an environment on AWS where you can deploy it, such as an EC2 instance, AWS Lambda, or an AWS SageMaker endpoint, depending on your requirements for scalability and latency, and install the necessary dependencies and libraries for running the model. Fourth, integration: implement a mechanism to trigger the model, for example with AWS Step Functions or custom scripts; define the input and output data formats for the model; ensure that the data processed by your big data pipeline can be efficiently fed into the model for inference; retrieve the input from your pipeline, preprocess it if necessary, and pass it to the deployed model for prediction; and capture the output from the model and store it in a suitable location such as an S3 bucket or a database as part of your pipeline output. Fifth, testing and monitoring: test the end-to-end integration of the model with your pipeline to ensure it behaves as expected, and implement monitoring and logging to track the performance of the inference process and identify issues. Finally, deployment and scaling: deploy the integrated solution into your production environment, configure auto-scaling mechanisms if necessary to handle variations in workload and data volume, monitor the performance and scalability of the integrated system, and make adjustments as needed to optimize resource utilization and maintain reliability.
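A minimal sketch of the serialization and deployment steps described above, assuming scikit-learn and boto3 are available; the bucket name, object key, and Lambda handler are hypothetical placeholders, not a definitive implementation.

```python
# Sketch: train and serialize a model, stage it on S3, and load it for inference
# from a Lambda-style handler. Bucket and key names are hypothetical.
import boto3
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "model.joblib")                                   # serialize the trained model
s3 = boto3.client("s3")
s3.upload_file("model.joblib", "my-ml-artifacts", "models/v1/model.joblib")  # hypothetical bucket/key

def lambda_handler(event, context):
    """Hypothetical Lambda entry point: fetch the model and score a batch of rows."""
    s3.download_file("my-ml-artifacts", "models/v1/model.joblib", "/tmp/model.joblib")
    clf = joblib.load("/tmp/model.joblib")
    rows = event["records"]                                          # feature rows produced by the pipeline
    return {"predictions": clf.predict(rows).tolist()}
```

In practice the model would be loaded once per container rather than per invocation, but the flow (serialize, stage on S3, load, predict, return) is the same.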
What would be your approach to designing a resilient stream processing system using Kafka and Spark on AWS? First, architecture design: define the overall architecture of your stream processing system, including the components, data flow, and interactions between different services, and choose AWS services such as Amazon Kinesis, Amazon EMR, and Amazon EC2 to build the core components. Second, data ingestion: set up processes to ingest streaming data into Kafka topics, which could involve AWS Lambda functions, Kinesis data streams, or custom applications, and configure Kafka Connect to stream data from Kafka topics to Spark for processing. Third, stream processing with Spark: deploy a Spark cluster on Amazon EMR to process the streaming data from Kafka topics, and use the Spark Streaming or Structured Streaming APIs to read data from Kafka, perform real-time processing, and write the results to downstream systems for storage. Fourth, fault tolerance and resilience: implement fault-tolerance mechanisms in your Spark streaming application to handle failures gracefully, which could involve checkpointing, write-ahead logs, and handling transient errors, and configure Spark to automatically recover from failures and restart processing from the last checkpoint. Fifth, monitoring and alerting, followed by scaling and performance optimization. For data processing and storage, choose appropriate storage solutions for both raw and processed streaming data, such as Amazon S3, Amazon DynamoDB, or Amazon Redshift. Finally, security and access control: implement measures such as encryption, authentication, and authorization to protect sensitive data and ensure compliance with security standards.
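A minimal PySpark Structured Streaming sketch of steps three and four above: read from Kafka, process, and write with a checkpoint location so the job can restart from where it left off. Broker addresses, topic name, and S3 paths are hypothetical, and the Kafka connector package is assumed to be available on the cluster (typically EMR).

```python
# Sketch: resilient Kafka-to-S3 streaming with checkpoint-based recovery.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("resilient-stream")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # hypothetical brokers
          .option("subscribe", "orders")                                   # hypothetical topic
          .option("startingOffsets", "latest")
          .load())

parsed = events.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

query = (parsed.writeStream
         .format("parquet")
         .option("path", "s3a://my-bucket/streams/orders/")                # hypothetical sink
         .option("checkpointLocation", "s3a://my-bucket/checkpoints/orders/")  # enables restart from last checkpoint
         .outputMode("append")
         .start())

query.awaitTermination()
```

Keeping the checkpoint location on durable storage (S3 here) is what lets a replacement driver resume processing after a failure instead of reprocessing or losing data.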
How would you handle schema evolution in Hive tables being populated by ongoing ETL jobs? The first approach is schema-on-read: design your data pipeline so that data is ingested into Hive tables without enforcing a strict schema upfront, which allows flexibility in handling changes to the data schema over time. Second, partitioning: partition your Hive tables based on time or other relevant attributes to segregate data and facilitate efficient querying; when schema changes occur, you can create new partitions with the updated schema without affecting existing partitions. Third, external tables: use external tables in Hive to decouple the data storage from the schema definition, which allows you to alter the schema without modifying the underlying data files. Fourth, schema evolution support: leverage Hive's built-in support for schema evolution, which allows you to add new columns to existing tables or modify column data types without requiring the table to be rebuilt; this helps minimize downtime and disruption to ongoing ETL jobs. Fifth, Avro or Parquet formats: store data in Hive tables using the Avro or Parquet file formats, which support schema evolution by design; these formats store schema information along with the data, making it easier to handle schema changes without impacting existing data. Sixth, versioning: implement versioning mechanisms for your Hive tables to track schema changes over time, so you can maintain a history and roll back to previous versions if needed. Finally, version the ETL jobs themselves, add data validation, and coordinate schema changes with the data engineers, data scientists, and other stakeholders involved in the schema evolution.
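A small sketch of Hive's in-place column addition issued through Spark SQL, assuming a Spark session with Hive support; the database, table, and column names are hypothetical.

```python
# Sketch: evolve a Hive table schema without rebuilding it.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-schema-evolution")
         .enableHiveSupport()
         .getOrCreate())

# Add a column without rewriting existing data; older partitions simply
# return NULL for the new column when queried.
spark.sql("ALTER TABLE sales.orders ADD COLUMNS (discount_pct DOUBLE)")

# Confirm the extended schema that ongoing ETL jobs will now see.
spark.sql("DESCRIBE sales.orders").show(truncate=False)
```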
How would you design a Talend job to not only process data transformations but also handle error logging and recover gracefully? For the data transformations, use Talend's built-in components such as tMap, tJoin, and tFilterRow to perform the necessary transformations, and ensure data quality by using data quality components to validate and standardize the data. For error logging, implement tLogCatcher to capture any Java exceptions or Talend errors, use tStatCatcher to gather statistics about job performance and tFlowMeter for real-time monitoring, and direct error logs to a file or database using tFileOutputDelimited or a database output component. Graceful recovery can be handled by using tDie and tWarn to manage errors and warnings, setting up checkpoints to save the state of the job at specific points, and using tRunJob to modularize the job and allow easier reruns of specific subjobs in case of failure. For job orchestration, employ tPrejob and tPostjob for initializing resources and cleanup tasks, and use context variables and tContextLoad for dynamic job configuration. For transaction management on database operations, use Talend's connection, commit, and rollback components to manage transactions and ensure data integrity. A retry mechanism can be built with tLoop or tWaitForFile to handle transient issues.
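A plain-Python analogue of the error-logging and checkpoint/restart pattern described above (in Talend this maps to tLogCatcher, tDie/tWarn, and modular subjobs); the file names and the transform rule are hypothetical, and this is only a sketch of the pattern, not a Talend implementation.

```python
# Sketch: log bad rows instead of aborting, and checkpoint progress so a rerun resumes.
import json
import logging
import os

logging.basicConfig(filename="etl_errors.log", level=logging.INFO)
CHECKPOINT = "checkpoint.json"   # hypothetical checkpoint file

def load_checkpoint():
    """Resume from the last successfully processed row, if a checkpoint exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["last_row"]
    return 0

def save_checkpoint(row_index):
    with open(CHECKPOINT, "w") as f:
        json.dump({"last_row": row_index}, f)

def transform(row):
    # Hypothetical transformation rule: enforce types on two fields.
    return {"id": int(row["id"]), "amount": float(row["amount"])}

def run(rows):
    start = load_checkpoint()
    for i, row in enumerate(rows):
        if i < start:
            continue                                    # skip rows already processed in a previous run
        try:
            transform(row)
            save_checkpoint(i + 1)
        except (KeyError, ValueError) as exc:
            logging.error("row %d rejected: %s", i, exc)  # log and continue instead of failing the job

if __name__ == "__main__":
    run([{"id": "1", "amount": "9.50"}, {"id": "bad"}, {"id": "2", "amount": "3.10"}])
```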
What strategy would you apply for ETL testing to ensure data integrity across different storage systems like S3 and HDFS? Start with data validation: perform source-to-target count checks to ensure that the number of records loaded into the target system matches the source, and use data profiling to understand the data and identify anomalies or patterns that need attention. Test the transformation rules to validate that business rules are correctly applied during the transformation process, and check data type consistency and format correctness to ensure the data conforms to the target schema requirements. Conduct end-to-end tests that cover the entire ETL process, from extracting data from the source to loading it into the target system, and use automated regression testing to quickly identify issues after changes to the ETL process. Test error handling and recovery: exercise the error-handling mechanisms to ensure they capture and log errors accurately, and the recovery processes to confirm the system can recover from failures without data loss or corruption. Performance-test and tune the ETL process to ensure it can handle the expected data volumes within the required timeframes. For data integrity, use checksums or hash totals to verify that the data has not been altered during the ETL process, and run comparative analysis between the source and target systems to ensure consistency. Include security and compliance testing to confirm that data security measures are effective throughout the ETL process and that it complies with the relevant data protection regulations and standards. Finally, automate: utilize ETL testing tools to automate as much of the testing process as possible, reducing manual effort and the potential for human error.
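A small PySpark reconciliation sketch of the count and aggregate checks mentioned above, assuming both the S3 landing path and the HDFS target hold Parquet data; the paths and the "amount" column are hypothetical.

```python
# Sketch: source-to-target reconciliation between S3 and HDFS copies of a dataset.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-reconciliation").getOrCreate()

source = spark.read.parquet("s3a://landing-bucket/orders/")   # hypothetical source on S3
target = spark.read.parquet("hdfs:///warehouse/orders/")      # hypothetical target on HDFS

# Source-to-target count check
src_count, tgt_count = source.count(), target.count()
assert src_count == tgt_count, f"row counts diverge: {src_count} vs {tgt_count}"

# Simple integrity check on a numeric measure
src_sum = source.agg(F.sum("amount")).collect()[0][0]
tgt_sum = target.agg(F.sum("amount")).collect()[0][0]
assert src_sum == tgt_sum, "aggregate totals differ between source and target"

print("reconciliation passed")
```

Checksum or hash-total comparisons follow the same shape, just with a hashed concatenation of columns instead of a single numeric sum.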
In a Talend job designed to process large datasets using a tMap component, you observe that the job is failing due to an out-of-memory error. What could be the cause of this error, and how would you debug the issue to identify the root cause and propose a solution? The first possible cause is insufficient heap space: the Java virtual machine might not have enough heap allocated for processing large volumes of data. The data volume being processed might also exceed the available memory, especially if tMap is performing complex transformations or lookups. Another cause is inefficient job design: the job might not be optimized for memory usage, leading to inefficient processing and memory overflow. To debug the issue and identify the root cause, you can increase the JVM memory by adjusting the -Xmx parameter to raise the maximum heap size available to the job; optimize the tMap settings by enabling the option to store temp data to disk, offloading some of the data processing to disk and reducing memory consumption; review the job design and use ELT components to push processing down to the database level, reducing the memory load on Talend; monitor memory usage during job execution to identify when and where the memory issue occurs; and test with sample data. The solution to the out-of-memory issue combines optimizing tMap, refactoring the job, filtering on the database side by applying filters directly in the database query to reduce the amount of data retrieved and processed, and, if database input components are used, enabling streaming options so rows are processed one by one instead of loading all the data into memory.
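Outside Talend, the "process in chunks instead of loading everything at once" idea looks roughly like this pandas sketch; the file name, chunk size, and column names are hypothetical, and in Talend the equivalent levers are the -Xmx setting, the tMap store-temp-data option, and streaming database input.

```python
# Sketch: chunked aggregation so memory use stays bounded regardless of file size.
import pandas as pd

totals = {}
# Stream the file 100,000 rows at a time instead of reading it all into memory.
for chunk in pd.read_csv("large_input.csv", chunksize=100_000):
    partial = chunk.groupby("region")["amount"].sum()
    for region, amount in partial.items():
        totals[region] = totals.get(region, 0.0) + amount  # fold partial aggregates

print(totals)
```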
In a Talend big data job, the developer has used a tFileInputDelimited component to read a CSV file and a tHDFSOutput component to write the processed data to HDFS. However, the job is producing incomplete files on HDFS without any errors. What can you infer from this, and how would you debug and resolve the issue? The first possibility is buffered data not being flushed: data might still be in the buffer and not flushed to HDFS when the job ends. Another is incorrect file handling: the job might not be handling file writing correctly, especially if it is being stopped or paused unexpectedly. There could also be configuration issues, such as a misconfiguration in the HDFS connection setup, for example an incorrect replication factor or block size. To debug and resolve the issue, check the job configuration and verify that the HDFS connection is correctly configured with the right host, port, and user credentials; review the tHDFSOutput component settings and ensure that the merge-result-to-single-file option is not causing conflicts; examine the job logs; validate the data flow using tLogRow; test with a smaller dataset; and ensure proper job termination, making sure the job is allowed to finish completely and is not terminated prematurely. If you are using Talend for Big Data, consider the tHDFSPut component as an alternative way to write to HDFS. Finally, check the Hadoop cluster to ensure HDFS is running correctly and that there are no underlying issues, and monitor HDFS directly while the job is running to check whether the files are being written and updated correctly.
Design a method to orchestrate the deployment of a new version of an ETL pipeline, minimizing downtime and data inconsistency. First, version control: use a version control system to manage changes to the ETL pipeline. Set up a separate testing environment that mirrors production to test the new version. Implement continuous integration and continuous deployment (CI/CD) practices to automate the testing and deployment process. Use a blue-green deployment strategy to switch between two identical production environments, and implement feature toggles to enable or disable features without deploying new code. Prepare data migration scripts to handle changes in data structures or schemas. Set up monitoring and alerts to quickly identify issues during deployment, and have a rollback plan in case the new version introduces unexpected issues. Perform data validation checks to ensure consistency between the old and new versions, and gradually roll out the new version to a subset of users before a full deployment. Update the documentation to reflect the changes in the new version, train the team on the new features and the changes in the pipeline, and utilize orchestration tools such as Apache Airflow or Azure Data Factory to manage the workflow.
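A minimal Apache Airflow sketch of the orchestration piece, wiring a validation gate between running the new version and the cut-over, in the spirit of the blue-green rollout described above; the DAG id and the task callables are hypothetical placeholders.

```python
# Sketch: run the new version against the standby environment, validate, then cut over.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def run_new_version():
    print("run ETL v2 against the standby (green) environment")

def validate_outputs():
    print("compare row counts and key metrics between blue and green outputs")

def switch_traffic():
    print("repoint consumers to green; keep blue available for rollback")

with DAG(
    dag_id="etl_blue_green_deploy",      # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,              # triggered manually as part of a release
    catchup=False,
) as dag:
    run_v2 = PythonOperator(task_id="run_new_version", python_callable=run_new_version)
    validate = PythonOperator(task_id="validate_outputs", python_callable=validate_outputs)
    cutover = PythonOperator(task_id="switch_traffic", python_callable=switch_traffic)

    run_v2 >> validate >> cutover
```

If the validation task fails, the downstream cut-over task never runs, so the old environment keeps serving users, which is the downtime-minimizing behaviour the strategy aims for.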
How would you integrate ETL processes with S3 event notifications for real-time alerting on data ingestion issues? There are several steps. First, configure S3 event notifications: set up Amazon S3 to publish events such as s3:ObjectCreated:* to capture when new data is ingested. Then create an AWS Lambda function that is triggered by the S3 event notification; this function processes the event and performs initial checks on the ingested data. Implement error-detection logic within the Lambda function to detect any issues with the data, such as format errors or incomplete files, and use Amazon Simple Notification Service (SNS) or Amazon Simple Queue Service (SQS) to send alerts if the Lambda function detects any issues. Integrate with your ETL tool or platform so that it can receive these alerts and respond accordingly, either by triggering a corrective workflow or by notifying an administrator. Set up CloudWatch to monitor the ETL process and log any events or errors for further analysis. Finally, depending on the ETL tool you are using, you may be able to set up automated responses to certain types of errors, such as retrying the ingestion or running a cleanup job.
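A hedged sketch of the Lambda piece described above: the handler is triggered by an s3:ObjectCreated:* notification, runs a basic sanity check, and publishes an SNS alert when the check fails; the topic ARN and the validation rule are hypothetical.

```python
# Sketch: Lambda handler that alerts on suspicious S3 ingestion events via SNS.
import json
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:etl-ingestion-alerts"  # hypothetical topic

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"]["size"]

        # Minimal checks: non-empty object with the expected extension (hypothetical rule).
        if size == 0 or not key.endswith(".csv"):
            sns.publish(
                TopicArn=ALERT_TOPIC_ARN,
                Subject="Data ingestion issue",
                Message=json.dumps({"bucket": bucket, "key": key, "size": size}),
            )
    return {"status": "checked"}
```

The same SNS topic can fan out to email, an SQS queue consumed by the ETL platform, or a chat webhook, which is how the alert reaches both people and the corrective workflow.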
What strategies can you utilize for reducing the time it takes to visualize complex datasets with Power BI? To reduce that time, we can follow certain strategies. Use Performance Analyzer to identify and locate any bottlenecks in the report: it gives three timings for each visual (DAX query, visual display, and other), and based on those we can understand whether the issue lies in the DAX query or in visual rendering because of the number of data points. Optimize the data model, which is often the main source of performance issues: this includes minimizing the use of calculated columns and avoiding complex relationships such as many-to-many relationships and bidirectional cross-filter directions, which can significantly hurt report performance. Simplify the visuals: limit the number of visuals on a report page and evaluate the performance of custom visuals, so that a single page is not overburdened with too many visuals loading too much data under the filters being applied. Apply data-handling techniques such as proper data ingestion and summarization to avoid loading unnecessary data into Power BI's in-memory store. For large datasets, consider using DirectQuery mode or incremental refresh to improve responsiveness and reduce data loads. Finally, filter and aggregate data early: during the model creation process, use Power Query or SQL queries to filter and aggregate the data so that less data is stored in memory.