The Complaints Handling Analysis & Reporting Tool recorded information about tax-related complaints, and there was a requirement to store this data, along with the category each complaint fell under, in a centralised data warehouse.
My role as the senior data engineer was to move this data from cloud storage into a local repository, cleanse it, and apply the correct encryption before sending it downstream via the cloud and a notification API.
This project was delivered using the Scrum agile methodology, and documentation was maintained in Confluence (Atlassian).
The team consisted of two data developers, whom I led, a project delivery manager, and an architect. The infrastructure was outsourced, as our focus was the data cleansing solution.
The development and test environments were built on Amazon EC2 instances running a Unix operating system, with bash as the shell.
AWS S3 buckets were used for transfers into our local environment.
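As an illustration of this transfer step, this kind of pull is typically scripted with the AWS CLI; the bucket name, prefix, and local paths below are placeholder assumptions rather than the project's actual values.

    #!/usr/bin/env bash
    # Illustrative only: pull the day's complaint extracts from S3 into the local
    # staging directory. Bucket, prefix and paths are placeholder values.
    set -euo pipefail

    BUCKET="s3://example-complaints-landing"
    RUN_DATE="$(date +%Y%m%d)"
    LOCAL_STAGE="/data/staging/${RUN_DATE}"

    mkdir -p "${LOCAL_STAGE}"
    aws s3 cp "${BUCKET}/incoming/${RUN_DATE}/" "${LOCAL_STAGE}/" --recursive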
Data cleansing was carried out using Pentaho Data Integration, with PGP used for file encryption.
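As a sketch of how this step might be invoked, Pentaho Data Integration jobs can be run headlessly from the shell with kitchen.sh, with GnuPG providing the PGP encryption; the job file, parameters, output format, and recipient key below are placeholder assumptions, not the project's actual artefacts.

    #!/usr/bin/env bash
    # Illustrative only: run the cleansing job headlessly, then PGP-encrypt the
    # cleansed output before it is sent downstream. Job, paths and recipient
    # key are placeholder values.
    set -euo pipefail

    STAGE="/data/staging/$(date +%Y%m%d)"
    OUTBOUND="/data/outbound/$(date +%Y%m%d)"
    mkdir -p "${OUTBOUND}"

    # Pentaho's command-line runner for jobs is kitchen.sh (pan.sh runs transformations).
    /opt/pentaho/data-integration/kitchen.sh \
        -file=/opt/etl/jobs/cleanse_complaints.kjb \
        -param:INPUT_DIR="${STAGE}" \
        -param:OUTPUT_DIR="${OUTBOUND}" \
        -level=Basic

    # Encrypt each cleansed file against the downstream recipient's public key.
    for f in "${OUTBOUND}"/*.csv; do
        gpg --batch --yes --recipient "downstream-recipient@example.com" \
            --output "${f}.pgp" --encrypt "${f}"
    done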
HashiCorp Vault, hosted on AWS, was used as part of the daily routine for secrets management.
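For illustration, the daily routine's interaction with Vault might look like the sketch below; the Vault address, authentication method, secret path, and field name are assumptions rather than the project's actual configuration.

    #!/usr/bin/env bash
    # Illustrative only: fetch credentials from HashiCorp Vault at the start of
    # the daily run. Address, auth method, path and field names are placeholders;
    # ROLE_ID and SECRET_ID are assumed to be supplied via the environment.
    set -euo pipefail

    export VAULT_ADDR="https://vault.example.internal:8200"

    # Authenticate (AppRole shown here as one possible method).
    VAULT_TOKEN="$(vault write -field=token auth/approle/login \
        role_id="${ROLE_ID}" secret_id="${SECRET_ID}")"
    export VAULT_TOKEN

    # Read a secret from the KV engine for use by the downstream transfer step.
    SFTP_PASSWORD="$(vault kv get -field=password secret/etl/complaints)"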
The scheduling tool used to process these files daily was the SOS Berlin JobScheduler; its jobs and orders were configured using curl commands.
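As a rough sketch, assuming the classic JobScheduler XML command interface, submitting an order from the shell looks something like the following; the host, port, and job chain path are placeholders, and the exact payload depends on the JobScheduler version in use.

    #!/usr/bin/env bash
    # Illustrative only: submit an order to a JobScheduler job chain over its
    # HTTP interface. Host, port and job chain path are placeholder values.
    curl --silent --show-error \
         --data '<add_order job_chain="/complaints/daily_cleanse" at="now"/>' \
         "http://scheduler.example.internal:4444"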
Deployments were carried out using GitLab's continuous integration and continuous deployment (CI/CD) functionality, running against GitLab runners provisioned for each environment. This was achieved with a YAML-based pipeline definition combined with Ansible playbooks.
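To give a flavour of the deploy step, each pipeline stage ultimately runs a shell command on the environment's runner; the playbook, inventory, and variable names below are illustrative assumptions, while CI_COMMIT_SHORT_SHA is a standard variable GitLab injects into pipeline jobs.

    #!/usr/bin/env bash
    # Illustrative only: the kind of command a GitLab CI deploy stage would
    # execute on its runner. Playbook, inventory and variable names are placeholders.
    ansible-playbook deploy_etl.yml \
        -i inventories/test/hosts \
        --extra-vars "release=${CI_COMMIT_SHORT_SHA}"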
My role as the data engineer meant I had responsibility for all of the above throughout the project lifecycle, with the exception of provisioning the GitLab runners.
The project reached pre-production testing, which was outsourced to another supplier before I rolled off the engagement. The development of this product was a success.