The Message Handling Service was built around the extraction of data from XML prescription messages stored on a Cassandra Data Lake. By this point, prescription messages had changed from green paper slips to a semi-structured data source, which opened a new avenue of analytics in the pharma industry.
The Cassandra Data Lake was queried using Spark SQL to find the prescription deltas.
The XML prescription messages could then be cleansed and stored within a columnar data warehouse for analytics. The XML messages were generated at every stage of the prescription lifecycle, allowing comprehensive reporting from generation right through to cost after dispensing.
My role on this project was Data Warehouse Developer/Senior Data Engineer: designing and developing ETL processing within Pentaho Data Integration v9.1 to store the data extracted from the XML in a Postgres data warehouse.
This project was delivered using the Scrum Agile methodology, with Jira as the main requirements-tracking tool and documentation delivered on Confluence (Atlassian).
Data extracts were required to be uploaded to S3-compatible storage using the Duck CLI.
XML Deltas were obtained from a Cassandra data lake queried using Spark SQL.
ETL flows were designed and built within Pentaho Data Integration v9.1; XML messages were read and extracted into a columnar format.
Data was eventually stored within a Postgres data warehouse.
The code repository was stored within Bitbucket (Atlassian).
Data extracts were sent to the Data Analytics team through S3-compatible storage; this was achieved using the Duck CLI within Pentaho to SFTP the extracts to the cloud storage location.
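To illustrate what reading XML messages and extracting them into a columnar format involves, here is a minimal Python sketch. It is not the Pentaho transformation used on the project. The message layout (a `prescription` element with `item` children) and all field names are hypothetical, since the real prescription schema is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical message layout; the real prescription schema is an assumption.
SAMPLE_MESSAGE = """\
<prescription id="RX-001" stage="dispensed">
  <item><drug>Amoxicillin</drug><quantity>21</quantity><cost>4.50</cost></item>
  <item><drug>Ibuprofen</drug><quantity>16</quantity><cost>1.20</cost></item>
</prescription>
"""


def flatten_prescription(xml_text):
    """Turn one XML message into one flat row (dict) per prescribed item."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        rows.append({
            # Header attributes are repeated on every row.
            "prescription_id": root.get("id"),
            "stage": root.get("stage"),
            "drug": item.findtext("drug"),
            "quantity": int(item.findtext("quantity")),
            "cost": float(item.findtext("cost")),
        })
    return rows


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


if __name__ == "__main__":
    columns = to_columns(flatten_prescription(SAMPLE_MESSAGE))
    for name, values in columns.items():
        print(name, values)
```

Repeating the header attributes on every item row keeps each row self-describing, which suits loading into a warehouse table without a separate join back to the message header.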
As a team, we successfully implemented a self-managing solution that collected prescription deltas from a data lake, then cleansed, transformed and reloaded the data into a Postgres data warehouse on a daily basis, ready for analytics and scientific algorithms to be applied to it.