The Data Loading Service ingests tax return data in any format for management, manipulation, storage, and analytics. The solution was designed to be metadata-driven, so that any tax return dataset can be loaded into a MySQL database and then passed downstream to other systems.
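To make the metadata-driven idea concrete, the sketch below shows one common way such a loader can be configured; the feed_metadata table, its columns, and the feed name are hypothetical illustrations of the pattern, not the actual schema used on the project.

```sql
-- Hypothetical MySQL metadata table describing how each feed maps onto the
-- warehouse; a Pentaho transformation can read these rows at runtime to
-- decide how to parse and load a file.
CREATE TABLE feed_metadata (
    mapping_id     INT          NOT NULL AUTO_INCREMENT,
    feed_name      VARCHAR(64)  NOT NULL,   -- e.g. 'self_assessment'
    source_format  VARCHAR(16)  NOT NULL,   -- 'csv', 'xml', 'fixed_width', ...
    source_field   VARCHAR(128) NOT NULL,   -- field name or position in the source
    target_table   VARCHAR(64)  NOT NULL,   -- MySQL table to load into
    target_column  VARCHAR(64)  NOT NULL,
    data_type      VARCHAR(32)  NOT NULL,   -- cast applied during the load
    PRIMARY KEY (mapping_id)
);

-- The loader resolves a feed's mapping before ingesting each file:
SELECT source_field, target_table, target_column, data_type
FROM   feed_metadata
WHERE  feed_name = 'self_assessment'
ORDER  BY mapping_id;
```

With an arrangement like this, onboarding a new tax return format is largely a matter of inserting mapping rows rather than writing a new pipeline.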
The project was delivered using the Scrum agile methodology, with Jira as the main requirements-tracking tool and documentation maintained in Confluence (Atlassian).
The team I worked within also managed multiple other downstream systems, so the product backlog included changes relating to these as well.
My role as the senior data engineer was to deliver the enhancements prioritised by the product owner. These included, but were not limited to: changes to the front-end dashboards through which users process tax returns; optimisation of the Pentaho jobs and transformations; ingestion of new types of tax return; fixes to the existing solution; filtering of sole traders and record-level encryption of data (a filter of this kind is sketched below); and auditing reports capturing the data flowing through all of these systems.
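As an illustration of the record-level filtering, a sole-trader split in a staging database might be expressed as SQL along these lines; the tax_return_staging table and trader_type flag are hypothetical stand-ins, since the real rule set isn't described here.

```sql
-- Hypothetical staging step: route sole traders to their own table so that
-- record-level encryption can be applied to them downstream.
INSERT INTO tax_return_sole_trader
SELECT *
FROM   tax_return_staging
WHERE  trader_type = 'SOLE_TRADER';

DELETE FROM tax_return_staging
WHERE  trader_type = 'SOLE_TRADER';
```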
The team consisted of 5-10 data developers, a project delivery manager, a QA engineer, and a business analyst.
The development environment for this project was built within the AWS workspace, with Pentaho DI, Pentaho BA, and MySQL as the main tools used in this space, alongside substantial shell script and SQL script development.
All of the code was stored in multiple GitLab repositories.
When deploying to testing environments, we switched to a Unix-based operating system.
A downstream system that managed the standardisation and encryption of data used an Apache Hive database built on HDFS. Many changes were required to ensure data was standardised and tokenised correctly within this data warehouse.
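The exact standardisation and tokenisation rules aren't described here, so the HiveQL below is only a sketch of the pattern: normalise a field, replace the sensitive value with a token, and write the result to a curated table. The databases, columns, and the udf_tokenise function are hypothetical.

```sql
-- Hypothetical HiveQL: write a standardised, tokenised copy of the raw feed.
-- udf_tokenise stands in for whatever tokenisation UDF the platform provides.
INSERT OVERWRITE TABLE curated.tax_return
SELECT
    upper(trim(taxpayer_name))           AS taxpayer_name,  -- standardisation
    udf_tokenise(national_insurance_no)  AS ni_token,       -- tokenisation
    cast(return_amount AS DECIMAL(12,2)) AS return_amount
FROM raw.tax_return;
```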
Another downstream system allowed views over this data to be created automatically for exploitation within BI systems; the views were built using HQL files.
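An HQL file of the kind described would contain statements roughly like the following; the view and table names are hypothetical.

```sql
-- Hypothetical HQL view exposing curated data to BI tools.
CREATE VIEW IF NOT EXISTS reporting.v_tax_return_summary AS
SELECT feed_name,
       tax_year,
       count(*)           AS record_count,
       sum(return_amount) AS total_amount
FROM   curated.tax_return
GROUP  BY feed_name, tax_year;
```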
The auditing reports I created for specific feeds were built using a Mondrian OLAP analysis cube (defined as an XML schema), which allowed ad hoc reporting of record counts and other measures within the Analysis view in Pentaho BA.
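Since the cube itself is an XML schema, a record-count cube of this kind looks roughly like the sketch below; the cube, table, dimension, and column names are hypothetical rather than the actual audit schema.

```xml
<!-- Hypothetical Mondrian schema: one cube over an audit fact table,
     with a feed dimension and a simple record-count measure. -->
<Schema name="FeedAudit">
  <Cube name="LoadAudit">
    <Table name="audit_fact"/>
    <Dimension name="Feed" foreignKey="feed_id">
      <Hierarchy hasAll="true" primaryKey="feed_id">
        <Table name="feed_dim"/>
        <Level name="Feed Name" column="feed_name" uniqueMembers="true"/>
      </Hierarchy>
    </Dimension>
    <Measure name="Record Count" column="record_id" aggregator="count" formatString="#,###"/>
  </Cube>
</Schema>
```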
The backlog was still in progress when I rolled off; as a team, we successfully delivered 28 sprints of enhancements, equating to 19 production releases over 18 months.