With the increasing velocity, variety, and complexity of data at PI, the PI Data Engineering Team plays an important role in supporting existing and new projects, using big data techniques to process large streaming and batch datasets in various formats efficiently.
● Design and implement data processing pipelines for ETL/ELT jobs on the Hadoop and Spark ecosystems.
● Ingest data from different sources into the Kafka streaming platform and HDFS/Hive.
● Design appropriate data schemas on NoSQL databases for low-latency queries on large-scale datasets.
● Provide data and web services in Java, Python, and Scala.
● 3+ years of working experience in programming. Good understanding of object-oriented programming (Java) and functional programming (Scala).
● Experience with streaming system design on Kafka, Spark, and Hadoop.
● Understanding of schema design on key-value NoSQL databases such as HBase and Cassandra.
● Linux/Unix experience and basic shell scripting skills.
● Experience with Cloudera or Hortonworks Hadoop distributions.