Antrixsh Gupta

Blockchain Engineer, Data Scientist, ML/AI Expert

Danalitic India Pvt. Ltd

Java 10 years
Python 6 years
Node.js 6 years
Go 6 years
Solidity 4 years
Hyperledger 2 years

11 years of total IT experience across all phases of Hadoop and Java development, along with experience in application development, data modeling, data mining, data science, machine learning, deep learning and NLP; also a blockchain revolution enthusiast.
-Good experience with Big Data ecosystems and ETL.
-Expertise in Java, Python and Scala
-Experience in data architecture, including data ingestion pipeline design, data analysis and analytics, and advanced data processing; experience optimizing ETL workflows.
-Experience in Hadoop (Cloudera, Hortonworks, MapR, IBM BigInsights): architecture, deployment and development.
-Experience extracting source data from sequential files, XML files and Excel files, transforming it, and loading it into the target data warehouse.
-Experience with SQL and NoSQL databases (MongoDB, Cassandra).
-Hands-on experience with Hadoop core components (HDFS, MapReduce) and the Hadoop ecosystem (Sqoop, Flume, Hive, Pig, Impala, Oozie, HBase).
-Experience ingesting real-time/near-real-time data using Flume, Kafka and Storm.
-Experience importing data from relational databases into HDFS, and exporting it back, using Sqoop.
-Hands-on experience with Linux systems.
-Experience with the SequenceFile, Avro and Parquet file formats; managing and reviewing Hadoop log files.
-Good knowledge of writing Spark applications in Python, Scala and Java.
-Experience writing MapReduce jobs.
-Efficient in analyzing data with HiveQL and Pig Latin, partitioning existing data sets with static and dynamic partitions, and tuning data for optimal query performance.
-Good experience with data transformation and storage: HDFS, MapReduce, Spark.
-Good understanding of HDFS architecture.
-Experienced in Database development, ETL, OLAP, OLTP
-Knowledge of extracting an Avro schema using avro-tools and evolving an Avro schema by editing its JSON definition.
-Experience using HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
-Experience in UNIX Shell scripting.
-Experience developing and maintaining applications on the AWS platform, including Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), Amazon Simple Workflow Service (SWF), AWS Elastic Beanstalk and AWS CloudFormation.
-Experience picking the right AWS services for an application.
-Proven expertise in supervised and unsupervised learning techniques (clustering, classification, PCA, decision trees, KNN, SVM), predictive analytics, optimization methods, natural language processing (NLP) and time series analysis.
-Experienced in machine learning regression algorithms such as simple, multiple and polynomial regression, support vector regression (SVR), decision tree regression and random forest regression.
-Experienced in advanced statistical analysis and predictive modeling in structured and unstructured data environments.
-Strong expertise in Business and Data Analysis, Data Profiling, Data Migration, Data Conversion, Data Quality, Data Governance, Data Lineage, Data Integration, Master Data Management(MDM), Metadata Management Services, Reference Data Management (RDM).
-Hands-on experience with Python data science libraries such as pandas, NumPy, SciPy, scikit-learn, Matplotlib, Seaborn, BeautifulSoup, Orange, rpy2, LibSVM, neurolab and NLTK.
-Good understanding of artificial neural networks and deep learning models built with the Theano and TensorFlow packages in Python.
-Experienced in machine learning classification algorithms such as logistic regression, k-NN, SVM, kernel SVM, naive Bayes, decision tree and random forest classification.
-Hands-on experience with R packages and libraries such as ggplot2, Shiny, h2o, dplyr, reshape2, plotly, R Markdown, ElemStatLearn and caTools.
-Efficiently accessed data via multiple vectors (e.g. NFS, FTP, SSH, SQL, Sqoop, Flume, Spark).
-Experience in various phases of the software development life cycle (analysis, requirements gathering, design), with expertise in writing and documenting Technical Design Documents (TDD), Functional Specification Documents (FSD), test plans, gap analyses and source-to-target mapping documents.
-Excellent understanding of Hadoop architecture, MapReduce concepts and the HDFS framework.
-Strong understanding of the project life cycle and SDLC methodologies, including RUP, RAD, Waterfall and Agile.
-Very good knowledge and understanding of Microsoft SQL Server, Oracle, Teradata and Hadoop/Hive.
-Strong expertise in ETL, Data warehousing, Operational Data Store (ODS), Data Marts, OLAP and OLTP technologies.
-Experience working on BI visualization tools (Tableau, Shiny & QlikView).
-Analytical, performance-focused and detail-oriented professional with in-depth knowledge of data analysis and statistics; uses complex SQL queries for data manipulation.
-Experienced in statistical techniques including correlation, hypothesis modeling and inferential statistics, as well as data mining and modeling techniques such as linear and logistic regression, clustering, decision trees and k-means clustering.
-Expertise in building supervised and unsupervised machine learning experiments in Microsoft Azure, using multiple algorithms to perform detailed predictive analytics, and in building web service models for all types of data: continuous, nominal and ordinal.
-Expertise in linear and logistic regression, classification modeling, decision trees, principal component analysis (PCA), and cluster and segmentation analyses; has authored and coauthored several scholarly articles applying these techniques.
-Mitigated risk factors through careful analysis of financial and statistical data; transformed and processed raw data for further analysis, visualization and modeling.
-Proficient in researching current processes and emerging technologies that require analytic models, data inputs and outputs, analytic metrics and user interface needs.
-Assist in determining the full domain of the MVP, create and implement its data model for the app, and work with app developers to integrate the MVP into the app and any backend domains.
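A minimal sketch of simple linear regression fitted by ordinary least squares, one of the regression techniques listed above. It is plain Python for illustration only; the function names `fit_simple_linear_regression` and `predict` and the sample data are hypothetical and not taken from any project described here.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # OLS slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Illustrative points lying exactly on y = 2x + 1.
    model = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(model)               # (2.0, 1.0)
    print(predict(model, 10))  # 21.0
```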


Positive Attitude
  4.6 / 5
Team work
  4.7 / 5
Problem Solving
  4.8 / 5


English communication
  4.7 / 5
Past work clarity
  4.7 / 5
Client interaction experience
  4.8 / 5
Open to learning
  4.6 / 5
Open source contribution
  4.6 / 5




  • BlockChain Startup

    Blockchain Engineer

    Node.js, React.js, Koa.js, Git, GitHub, React, Redux, DevOps (Chef/Puppet/Docker), Smart Contract, Solidity, Chain Codes, GoLang, Hyperledger, Solidity Security, Private Network, Rover Network

    Node.js development working with various blockchain daemons (Bitcoin, Ethereum, etc.).
    API development using Express.js, Koa.js and the asyncawait library for readable, intuitive code.
    Server management of various blockchain node types: Bitcoin, Litecoin, Ethereum, Omni, etc.
    React/Redux UI development, including ECharts implementations of candlestick and market-depth charts.
    Led the charge to integrate company-wide unit testing and trained the team on unit testing best practices.
    Ethereum-specific knowledge: Go, Solidity (Ethereum), and experience with Mist wallet, Mix IDE and open-source Ethereum clients.
    Cryptographic techniques such as hashing, symmetric-key encryption and public-key encryption.
    RDBMS concepts, SQL and NoSQL databases.
    GitHub and IDE environments (Eclipse, IntelliJ IDEA and others).
    Strong knowledge of Unix and Linux environments.
    Cloud infrastructure management on MS Azure, Amazon AWS, IBM Bluemix, etc.
    General system and network administration skills.
    Networking resources within both the tech and international development realms.
    Experience working with technology corporations, start-ups and technical consortiums.
    Familiarity with the UN's mission and work, including humanitarian response.
    Experience as a lead consultant on similar assignments.
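To illustrate the hashing skill listed above, here is a minimal sketch of a hash-linked chain in Python, assuming SHA-256 over a sorted-key JSON encoding of each block. It is not the block format of Bitcoin, Ethereum or Hyperledger and omits consensus, signatures and proof-of-work; all names and records are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 of the block's canonical JSON encoding (sorted keys, UTF-8)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(records):
    """Create one block per record, each storing the hash of the previous block."""
    chain = []
    prev_hash = "0" * 64  # placeholder predecessor for the first (genesis) block
    for index, data in enumerate(records):
        block = {"index": index, "data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def is_valid(chain):
    """True if every block stores the correct hash of the block before it."""
    return all(
        block["prev_hash"] == block_hash(prev_block)
        for prev_block, block in zip(chain, chain[1:])
    )


if __name__ == "__main__":
    ledger = build_chain(["alice->bob:5", "bob->carol:2", "carol->alice:1"])
    print(is_valid(ledger))  # True
    ledger[0]["data"] = "alice->bob:500"  # tamper with an early block
    print(is_valid(ledger))  # False
```

Tampering with any block breaks the `prev_hash` link stored in the block after it. The newest block is only covered once another block references it, or when its hash is recorded somewhere else.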

  • Hadoop Cluster

    Hadoop Developer & Machine Learning algorithmic

    Linux, Python, MapReduce, HDFS, DB2, Cassandra, Hive, Pig, Sqoop, FTP

    Exported data from HDFS to MySQL using Sqoop and an NFS mount approach.
    Developed Scala programs for applying business rules to the data.
    Developed and executed Hive queries for denormalizing the data.
    Worked on ETL workflows and big data analysis, and loaded the data into the Hadoop cluster.
    Installed and configured a Hadoop cluster for development and testing environments.
    Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
    Automated the workflow using shell scripts.
    Performance-tuned Hive queries written by other developers.
    Mastered the major Hadoop distributions (HDP/CDH) and numerous open-source projects.
    Prototyped various applications that use modern Big Data tools.
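The MapReduce jobs above follow the map, shuffle and reduce model. Below is a minimal single-process sketch of that model in Python, using word count as the assumed example; it does not use the Hadoop Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Reduce each key group; sorting mimics the key-ordered input a reducer sees.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog"]))
    # {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```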

  • John Wiley & Sons

    Hadoop Developer

    Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse, Cloudera

    • Helped the team grow the cluster from 55 nodes to 145+ nodes; configuration for the additional data nodes was managed with Puppet.
    • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs.
    • Integrated Apache Spark with Hadoop components.
    • Used Python for data cleaning and preprocessing.
    • Extensive experience writing HDFS commands and Pig Latin scripts.
    • Developed complex queries using Hive and Impala.
    • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
    • Transferred data between HDFS and a MySQL database in both directions using Sqoop.
    • Implemented MapReduce jobs in Hive by querying the available data.
    • Configured Hive metastore with MySQL, which stores the metadata for Hive tables.
    • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
    • Wrote Hive and Pig scripts as per requirements.
    • Developed Spark applications using Scala.
    • Implemented an Apache Spark data processing project to handle data from RDBMS and streaming sources.
    • Designed batch processing jobs in Apache Spark that ran ten times faster than the equivalent MapReduce jobs.
    • Used Spark SQL to load tables into HDFS and run SELECT queries on top of them.
    • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
    • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
    • Used Spark DataFrames, Spark SQL and Spark MLlib extensively.
    • Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive through the Storm integration.
    • Designed the ETL process and created the high-level design document, covering the logical data flows, source data extraction process, database staging and extract creation, source archival, job scheduling and error handling.
    • Worked on the Talend ETL tool, using features such as context variables, database components (tOracleInput, tOracleOutput, tOracleClose) and file components (tFileCompare, tFileCopy).
    • Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations and load data into the target database.
    • Developed ETL mappings using mapplets and reusable transformations, and various transformations such as source qualifier, expression, connected and unconnected lookup, router, aggregator, filter, sequence generator, update strategy, normalizer, joiner and rank transformations in PowerCenter Designer.
    • Created, altered and deleted topics (Kafka Queues) when required with varying
    • Performance-tuned Impala tables using partitioning and bucketing.
    • Experience in NoSQL databases such as
    • Involved in cluster maintenance and monitoring.
    • Loaded and transformed large sets of structured, semi-structured and unstructured data.
    • Involved in loading data from the UNIX file system to HDFS.
    • Created an e-mail notification service that alerts the team that requested the data when a job completes.
    • Worked on NoSQL databases, which differ from classic relational databases.
    • Conducted requirements-gathering sessions with various stakeholders.
    • Involved in knowledge transfer activities for team members.
    • Successful in creating and implementing complex code changes.
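The Spark Streaming bullet above describes dividing a stream into batches. Below is a minimal pure-Python sketch of that micro-batching idea, assuming (timestamp, value) events and a fixed batch interval; it is not the Spark API, and the function names and the `batch_seconds` parameter are hypothetical.

```python
from collections import defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into fixed-width, time-ordered batches.

    Each batch covers [start, start + batch_seconds); intervals with no events
    are omitted in this sketch.
    """
    if batch_seconds <= 0:
        raise ValueError("batch_seconds must be positive")
    batches = defaultdict(list)
    for timestamp, value in events:
        start = (timestamp // batch_seconds) * batch_seconds
        batches[start].append(value)
    return sorted(batches.items())


def process_batches(batches):
    """Run the same batch computation (here, a simple count) on every micro-batch."""
    return [(start, len(values)) for start, values in batches]


if __name__ == "__main__":
    stream = [(0, "click"), (1, "view"), (3, "click"), (7, "buy")]
    batches = micro_batches(stream, batch_seconds=2)
    print(batches)                   # [(0, ['click', 'view']), (2, ['click']), (6, ['buy'])]
    print(process_batches(batches))  # [(0, 2), (2, 1), (6, 1)]
```

The design choice is to keep the per-batch logic (`process_batches`) independent of how batches are cut, which mirrors the idea of feeding each interval's data to an ordinary batch engine.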

  • Hertz

    Hadoop Developer

    Hadoop, AWS, Google Cloud, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse, Cloudera

    • Experienced in development on the Cloudera distribution.
    • As a Hadoop developer, responsible for managing the data pipelines and data lake.
    • Worked on data ingestion and aggregation for the data lake Hadoop cluster from 150+ hospital facilities.
    • The data in the data lake is at petabyte scale.
    • Performed Hadoop ETL using Hive on data at different stages of the pipeline.
    • Worked in an Agile environment using Scrum.
    • Imported data from different source systems with Sqoop and automated the imports with Oozie workflows.
    • Generated business reports from the data lake using Hadoop SQL (Impala) as per business needs.
    • Automated business reports with Bash scripts on Unix that run against the data lake and send the reports to business owners.
    • Developed Spark Scala code to cleanse data and perform ETL at different stages of the data pipeline.
    • Worked in different environments, such as DEV, QA, Data Lake and Analytics Cluster, as part of Hadoop development.
    • Snapped the cleansed data to the Analytics Cluster for business reporting.
    • Developed Pig scripts and Python code to perform streaming, and created Hive tables on top of the output.
    • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
    • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
    • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
    • Developed Oozie workflows to run multiple Hive, Pig, Sqoop and Spark jobs.
    • Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded data into HDFS.
    • Developed Pig, Hive, Sqoop, Hadoop streaming and Spark actions in Oozie for workflow management.
    • Supported MapReduce programs running on the cluster.
    • Experienced in collecting, aggregating and moving large amounts of streaming data into HDFS using Flume and Kafka.
    • Responsible for developing multiple Kafka producers and consumers from scratch according to the software requirement specifications.
    • Good understanding of the workflow management process and its implementation.
    • Involved in developing frameworks used in data pipelines and coordinated with the Cloudera consultant.
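The Oozie bullets above chain Hive, Pig, Sqoop and Spark actions by dependency. Below is a minimal sketch of that dependency ordering in Python using the standard-library `graphlib.TopologicalSorter` (Python 3.9+). It does not read Oozie workflow XML, and the pipeline and action names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each action maps to the set of actions it depends on.
PIPELINE = {
    "sqoop_import": set(),
    "hive_cleanse": {"sqoop_import"},
    "pig_transform": {"sqoop_import"},
    "spark_aggregate": {"hive_cleanse", "pig_transform"},
    "impala_report": {"spark_aggregate"},
}


def execution_order(dag):
    """Return one valid run order; raises graphlib.CycleError on circular dependencies.

    Independent actions (here hive_cleanse and pig_transform) may appear in
    either order, since neither depends on the other.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, action in enumerate(execution_order(PIPELINE), start=1):
        print(step, action)
    try:
        execution_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected")
```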