Anisha Boopathi

  • Almost 2 years of experience using Hadoop ecosystem technologies (HDFS, Hive, Sqoop, Apache Spark) and AWS to design, develop, and maintain Big Data applications.
  • Real-world experience in technologies relevant to Hadoop and Big Data, including data storage, querying, processing, and analysis
  • Understands the big data requirements of complex data processing and has experience writing code and modules to meet them.
  • Knowledge of the HDFS, Hive, Sqoop, and Spark components of the Hadoop ecosystem, including how to install, configure, and use them.
  • Worked with Amazon Web Services (AWS) such as S3 and EMR.
  • Strong experience in configuring Sqoop to handle complex data structures such as nested and hierarchical data.
  • Experienced in setting up and configuring Sqoop-based data import and export solutions for large-scale data warehousing projects.
  • Used several file formats, including text files, Avro data files, JSON, and XML.
  • In-depth understanding of optimization in Hive and Spark SQL, including partitioning, bucketing, and incremental import techniques.
  • Knowledge of using Flume to gather log data from various sources and stage it in HDFS for further analysis.
  • Able to process massive amounts of structured and semi-structured data.
  • Knowledge of data architecture, including pipeline design and data ingestion.
  • Skilled in setting up secure connections between Hadoop and databases using Sqoop's security features.
  • Expertise in moving data between RDBMS and HDFS using Sqoop.
  • Implements initiatives for process simplification and optimisation to increase application efficiency.
  • Able to troubleshoot common issues with Hive tables, such as data skew, table corruption, and poorly optimized queries.
  • Capable of conducting source-to-target data mapping, design, and review; evaluating business rules; working with stakeholders.
  • Role

    Data Engineer

  • Years of Experience

    3 years

Skillsets

  • Spark
  • AWS
  • Sqoop
  • Hive
  • Cloudera
  • Windows
  • Hadoop
  • Apache Spark
  • SQL Server
  • MySQL
  • Scala
  • Python
  • SQL

Professional Summary

3 Years
  • Sep, 2022 - Present 3 yr 1 month

    Bigdata Developer

    Blue Dart
  • Apr, 2021 - Feb, 2022 10 months

    Bigdata Developer

    Cars 24

Work History

3 Years

Bigdata Developer

Blue Dart
Sep, 2022 - Present 3 yr 1 month

    Developed Spark applications for data validation, cleansing, transformation, and custom aggregation.


    Created EC2 instances and EMR clusters for development and testing.


    Used Sqoop to import data from and export data to cloud-based storage services such as Amazon S3.


    Proficient in creating and managing Hive tables, including managed, external, and partitioned tables.


    Familiarity with Hive query optimization techniques, such as subquery unnesting, predicate pushdown, and vectorization, and their impact on query performance and resource utilization.


    Loaded and transformed large sets of semi-structured data like XML, JSON, Avro, and Parquet.


    Proficient in developing and implementing Spark RDD-based and DataFrame-based data processing workflows using Scala, Java, or Python programming languages.


    Handled Hadoop MapReduce jobs to process large data sets.


    Processed web URL data using Scala and converted it to DataFrames for further transformations.


    Generated complex JSON data after all the transformations for easy storage and access as per client requirements.


    Experienced in optimizing Spark RDD and DataFrame performance by tuning various configuration settings, such as memory allocation, caching, and serialization.


    Optimized Spark jobs and data processing workflows for scalability, performance, and cost efficiency using techniques such as partitioning, compression, and caching.


    Developed reusable transformations to load data from flat files and other data sources to the data warehouse.


    Developed Hive SQL queries, mappings, tables, and external tables for analysis across different banners and worked on partitioning, optimisation, compilation, and execution.


    Responsible for the design and development of analytic models, applications, and supporting tools that enable developers to create algorithms in a big data ecosystem.


    Implemented Spark using Scala and Spark SQL for faster testing and processing of data.


    Proficient in designing Avro schema for Hive tables and managing schema evolution to accommodate changes in data structure and format.


    Strong understanding of performance optimization techniques for serialized data processing in Hive, such as columnar storage, data partitioning, and indexing, and their trade-offs in terms of query performance and resource utilization.


    Line management of team members and their professional development.

Bigdata Developer

Cars 24
Apr, 2021 - Feb, 2022 10 months

    Experienced in using Spark RDD, DataFrame and SQL transformations and actions to process large-scale structured and semi-structured data sets, including filtering, mapping, reducing, grouping, and aggregating data.


    Created Hive schemas using performance techniques like partitioning and bucketing.


    Responsible for continuously monitoring and managing the Amazon EMR (Elastic MapReduce) cluster through the AWS console.


    Used Hadoop to accelerate the extraction, transformation, and loading of massive structured and unstructured data.


    Deployed the application jar files into AWS instances.


    Adept in scheduling and automating Sqoop jobs for incremental runs.


    Designed and developed batch processing data pipelines on Amazon EMR using Apache Spark, Python, and Scala to process terabytes of data in a cost-effective and scalable manner.


    Developed Spark scripts to import large files from Amazon S3 buckets.


    Developed MapReduce programs to filter out unstructured data and multiple MapReduce jobs to perform data cleaning and pre-processing.


    Strong experience in configuring Sqoop to handle complex data structures such as nested and hierarchical data.


    Knowledge of Spark RDD optimization techniques, such as data partitioning, shuffle tuning, and pipelining, and their impact on query performance and resource utilization.


    Involved in writing incremental data to Snowflake.


    Developed code, peer-reviewed assigned tasks, and fixed bugs.


    Skilled in working with textual data formats in Spark, such as CSV, JSON, and XML, and their serialization and deserialization using Spark DataFrames and RDDs.


    Understood and executed change and incident management processes.


    Ability to troubleshoot common issues with Hive performance, such as out-of-memory errors, query hangs, and slow query execution times.


    Involved in requirement gathering, design, and deployment of the application using Scrum (Agile) as the development methodology.


    Designed and developed Spark applications to implement complex data transformations and aggregations for batch processing jobs, leveraging Spark SQL and DataFrames.


    Used JIRA for bug tracking and Bitbucket to check in and check out code changes.


    Exported data from HDFS to RDBMS via Sqoop for business intelligence, visualisation, and user report generation.

Major Projects

2 Projects

Blue Dart

Sep, 2022 - Present 3 yr 1 month
    Blue Dart Express Ltd., South Asia's premier express air and integrated transportation and distribution company, offers secure and reliable delivery of consignments. Its services are designed to enhance the reliability of operations and process efficiency, and to add value for the customer through time and cost savings. It uses indigenously developed, state-of-the-art technology for Track and Trace, MIS, ERP, Customer Service, Space Control and Reservations.

Cars 24

Apr, 2021 - Sep, 2022 1 yr 5 months
    CARS24 is an ecommerce platform for pre-owned cars. It combines cutting-edge technology (technological devices, techniques or achievements that employ the most current and high-level IT developments) with country-wide partners.

Education

  • BCA

    PSG Arts & Science college (2021)
  • Kongu Arts & Science college

    Kongu Arts & Science college (2019)
  • Kongu Matriculation Higher Secondary School

    Kongu Matriculation Higher Secondary School (2016)