· Deploy and maintain Hadoop clusters, add and remove nodes using cluster monitoring tools such as Ganglia, Nagios, or Cloudera Manager, configure NameNode high availability, and keep track of all running Hadoop jobs.
· Implement, manage, and administer the overall Hadoop infrastructure across various distributions (Cloudera, Hortonworks, or IBM BigInsights)
· Take care of the day-to-day running of Hadoop clusters
· Participate in architectural discussions and perform system analysis, including a review of existing systems and operating methodologies.
· Participate in the evaluation of the latest technologies and suggest optimal solutions that best satisfy current requirements and simplify future modifications
· Design and build the infrastructure needed for optimal ETL from a variety of data sources
· Collaborate with the business to scope requirements.
Key Skills/Experience
· 5+ years of Data Engineering experience working with distributed architectures, ETL, EDW, and Big Data technologies
· Extensive experience working with SQL across a variety of databases
· Experience working with both structured and unstructured data sources.
· Experience with NoSQL databases, such as HBase, Cassandra, MongoDB or similar
· Experience with Big Data tools such as Pig, Hive, Impala, Sqoop, Kafka, Flume, Jupyter
· Experience with Hadoop, HDFS, Spark, Informatica BDM
· Experience in Linux administration, Unix Shell Scripts
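As an illustration of the Unix shell scripting called for above, here is a minimal sketch that summarizes log severities with awk. The log lines are made-up sample data for the sketch, not output from any real Hadoop daemon:

```shell
#!/bin/sh
# Summarize how many log lines occur at each severity level.
# The log content below is invented sample data for illustration only.
log='2024-01-01 INFO  NameNode started
2024-01-01 WARN  Low disk space on /data
2024-01-01 ERROR Block replication failed
2024-01-01 INFO  Checkpoint complete'

# Field 2 is the severity; count occurrences per level, then sort for stable output.
printf '%s\n' "$log" \
  | awk '{ count[$2]++ } END { for (lvl in count) print lvl, count[lvl] }' \
  | sort
```

In day-to-day cluster administration the same pattern would typically be pointed at real daemon logs (e.g. files under the Hadoop log directory) rather than an inline string.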
· Perform Hadoop administration on Production/DR Hadoop clusters.
· Perform tuning and increase operational efficiency on a continuous basis
· Monitor platform health, generate performance reports, and drive continuous improvements
· Work closely with development, engineering, and operations teams on key deliverables, ensuring production scalability and stability
· Develop and enhance platform best practices
· Ensure the Hadoop platform can effectively meet performance & SLA requirements
· Support the Hadoop production environment, which includes Hive, Ranger, Kerberos, YARN, Spark, SAS, Kafka, HBase, etc.
· Perform optimization and capacity planning of a large multi-tenant cluster.
· Identify, implement, and continuously enhance data automation processes
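A hedged sketch of the platform health monitoring described above: in practice `hdfs dfsadmin -report` is a standard source for cluster capacity figures, and a script can flag high utilization. The report excerpt and the 80% threshold below are assumptions chosen for illustration, not live cluster output:

```shell
#!/bin/sh
# Hypothetical excerpt of `hdfs dfsadmin -report` output (sample data, not
# from a live cluster); a real script would capture the command's output.
report='Configured Capacity: 1000000 (976.56 KB)
DFS Used%: 87.50%
Live datanodes (3):'

# Extract the utilization figure and strip the trailing percent sign.
used=$(printf '%s\n' "$report" \
  | awk -F': ' '/DFS Used%/ {gsub(/%/, "", $2); print $2}')

# Warn when utilization crosses a threshold (80% is an arbitrary example value).
if awk -v u="$used" 'BEGIN { exit !(u > 80) }'; then
    echo "WARN: DFS used ${used}%, exceeds 80% threshold"
fi
```

A check like this is typically scheduled via cron or wired into the alerting side of tools the posting already names, such as Nagios or Cloudera Manager.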
If you're interested in this role, please send your resume to #removed#