<About the job>
As a Senior Data Engineer or Data Architect, you will join various advanced Data Science & AI projects at the corporate headquarters. In addition to developing intelligent applications with AI and Big Data analytics technology for digital transformation, you will have plenty of opportunities to build emerging applications for different use cases and expand your technical skill set at this world-class company (Fortune Global 500, ranked 20th).
<Job Description>
*Type 1: Senior Data Engineer
Responsible for acquiring data via APIs, web scraping, or other data-access protocols and scripting, and for developing ETL data pipelines and data aggregation systems. You will use your software development experience to design and build high-performance automated systems. Responsibilities include:
1. Data Processing
(1) Engineer data ETL (extract/transform/load) processes and query relational database management systems (e.g., via SQL scripts).
(2) Build systematic data quality processes and checks to ensure data quality and accuracy.
(3) Solid coding experience in Python or Java.
2. Data Pipeline Development
(1) Develop data integration processes, including creating scalable data pipelines and building out data services/data APIs.
(2) Create data processing automation and monitoring mechanisms by optimizing data pipeline processes.
(3) Experience with dataflow/workflow management tools such as Apache NiFi, Apache Airflow, or Azkaban is preferred.
3. Data Crawling
(1) Build scalable tools that automate web crawling, scraping, and data aggregation from various web pages using frameworks such as Scrapy.
(2) Access data from REST APIs, parse data in disparate formats such as JSON and XML, and automate these workflows.
(3) (Nice to have) Knowledge of front-end/UI technologies, including Vue/React and HTML/CSS.
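The API-access responsibility above can be sketched with Python's standard library: a minimal, hypothetical example of parsing the same records delivered as JSON and as XML into one normalized shape. The payloads, field names, and function names are illustrative assumptions, not part of this role's actual systems.

```python
import json
import xml.etree.ElementTree as ET

def parse_json_records(payload: str) -> list[dict]:
    """Parse a JSON array of records into a list of dicts."""
    return json.loads(payload)

def parse_xml_records(payload: str) -> list[dict]:
    """Parse <record> elements into the same dict shape as the JSON path."""
    root = ET.fromstring(payload)
    return [
        {"id": int(rec.findtext("id")), "name": rec.findtext("name")}
        for rec in root.iter("record")
    ]

# Two disparate source formats carrying the same records.
json_payload = '[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]'
xml_payload = """<records>
  <record><id>1</id><name>alpha</name></record>
  <record><id>2</id><name>beta</name></record>
</records>"""

# Both sources converge on one normalized representation.
assert parse_json_records(json_payload) == parse_xml_records(xml_payload)
```

In a real pipeline, an automated job would fetch these payloads from REST endpoints on a schedule and feed the normalized records downstream.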
*Type 2: Data Architect
Responsible for designing, implementing, and maintaining scalable, reusable system and data architectures for complex data structures and large data volumes in data science and AI projects. Responsibilities include:
1. Data Schema Design
(1) Collaborate with the team to design DB/table schemas and data schemas.
(2) Consolidate requirements and apply data engineering techniques to design and implement a robust data mart.
(3) Experience handling structured, semi-structured, unstructured, and streaming data is preferred.
(4) Hands-on experience with dimensional data modeling (column-based data warehouses) or NoSQL schema design.
2. Data Platform Architecture
(1) Design and build data infrastructure/platform components that support complex data pipelines for ingesting and processing data from multiple internal and external sources.
(2) Familiarity with Big Data frameworks and processing technologies, e.g., Hadoop, Apache Spark, and NoSQL stores.
(3) Familiarity with Azure or AWS cloud data services (hands-on experience with cloud infrastructure is a plus).
(4) Familiarity with Linux environments, shell scripting, and general infrastructure.
(5) (Nice to have) Experience with declarative infrastructure/container technologies, such as Docker and Kubernetes (k8s/k3s).
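The dimensional modeling requirement above can be illustrated with a minimal star-schema sketch in plain Python: a fact table whose rows reference a dimension table by surrogate key, then a join-and-aggregate over them. All table, column, and function names here are illustrative assumptions.

```python
# Dimension table: surrogate key -> descriptive attributes.
dim_product = {
    1: {"sku": "P-100", "category": "widgets"},
    2: {"sku": "P-200", "category": "gadgets"},
}

# Fact table: each row stores measures plus a foreign key into the dimension.
fact_sales = [
    {"product_key": 1, "quantity": 3, "amount": 30.0},
    {"product_key": 2, "quantity": 1, "amount": 15.0},
    {"product_key": 1, "quantity": 2, "amount": 20.0},
]

def sales_by_category(facts: list[dict], products: dict) -> dict[str, float]:
    """Join facts to the product dimension and aggregate a measure."""
    totals: dict[str, float] = {}
    for row in facts:
        category = products[row["product_key"]]["category"]
        totals[category] = totals.get(category, 0.0) + row["amount"]
    return totals

print(sales_by_category(fact_sales, dim_product))  # {'widgets': 50.0, 'gadgets': 15.0}
```

In a column-based warehouse, the same design would be expressed as DDL for fact and dimension tables, with the aggregation written as a SQL join and GROUP BY.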