
EDF: Humanised service through big data processing
At a glance
We partnered with the UK’s largest producer of low-carbon electricity, an organisation with stringent requirements for data handling, data processing, and identity management, to co-create a secure, scalable customer analytics platform on AWS that unites and protects the data of millions of customers and unlocks its potential. The result? A loosely coupled, cellular architecture that minimises the data and user blast radius, keeps EDF employees and other trusted third parties away from raw data, and ensures that data can be anonymised and made accessible to the right business units.
What we did:
- Customer analytics platform using cellular architecture
- Data lake on AWS
- Amazon EMR big data processing
- Fully automated infrastructure pipelines with CIS-compliant static security analysis at build time and at runtime
- Centralised identity management across the platform, integrating Hadoop and other EMR applications with on-premises Active Directory
- Fully private network and infrastructure on AWS, accessible through encrypted VPC endpoints and via on-premises networks.
Servicing customers wherever they go
EDF is the United Kingdom’s largest producer of low-carbon electricity, with a vision of “helping Britain achieve net zero” carbon emissions. It was recently rated number one for customer service out of 40 energy suppliers in the country. That’s no small feat when you have three million customers and your market’s awash in energy start-ups intent on disrupting the industry.
"Why" versus "What"
For years, EDF has been working to build a robust data strategy to improve internal reporting and get more value from its data. It has moved much of that data to the cloud but wanted to do more with it: prepare it, engineer it, and open it up for different teams to use.
The first true use case for the platform became debt recovery, specifically by matching lapsed customer accounts to active ones for the same customers. This called for data processing on a massive scale – especially if EDF ever wanted to integrate other use cases in the future – so Slalom helped EDF move from data warehouses to a cloud-based data lake architecture on AWS with three analytics zones: a data landing zone, a technical zone, and a democratisation zone. The result is a secure, scalable analytics platform that separates storage from compute and is designed to be used by data engineers, data scientists, and product teams alike.
Security-first approach to AWS services
To build the platform, Slalom leveraged Amazon EMR, AWS’s big data processing service. In the technical zone, EMR ingests raw data from the fully protected data landing zone, processes it, and releases it in a structured format. Data can then be prepared for different use cases in the democratisation zone, which displays only what users need to see based on their credentials.
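As a flavour of what such a zone-to-zone job might look like, here is a minimal PySpark sketch that reads raw extracts from a landing-zone bucket and writes structured, columnar output to a technical-zone bucket. The bucket names, paths, and columns are illustrative assumptions, not EDF’s actual layout or jobs.

```python
# Hypothetical PySpark job, run on an EMR cluster in the technical zone.
# Bucket names, paths, and columns are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("landing-to-technical").getOrCreate()

# Ingest raw CSV extracts from the fully protected landing zone.
raw = (spark.read
       .option("header", "true")
       .csv("s3://example-landing-zone/accounts/raw/"))

# Basic structuring: normalise types, stamp the ingest date,
# and drop records with no usable account identifier.
structured = (raw
              .withColumn("account_id", F.col("account_id").cast("long"))
              .withColumn("balance_gbp", F.col("balance_gbp").cast("decimal(12,2)"))
              .withColumn("ingest_date", F.current_date())
              .dropna(subset=["account_id"]))

# Release structured, columnar data into the technical zone, ready to be
# prepared for specific use cases in the democratisation zone.
(structured.write
 .mode("overwrite")
 .partitionBy("ingest_date")
 .parquet("s3://example-technical-zone/accounts/structured/"))
```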
Slalom worked closely with AWS to ensure the stringent security requirements were successfully implemented for EDF. Several security layers needed to be baked in, including:
- End-to-end encryption of the data: Fully encrypting data in flight and at rest ensures that it cannot be tampered with, and that its confidentiality and integrity are maintained throughout the platform, all the way to the democratisation zone.
- Automated creation and use of separate KMS keys: Operators and processes must have the necessary permissions for each data type, so that data with unique characteristics is clearly demarcated. Segregating keys by data type means that if a data source, or the EDF team working on that data source, has its encryption keys compromised, whether accidentally or maliciously, the rest of the organisation’s data and other teams’ projects are unaffected (a minimal key-provisioning sketch follows this list).
- Encrypted applications and Hadoop clusters in the technical zone: Full encryption ensures that the EMR clusters’ applications cannot access data through misconfiguration while it is being processed on the cluster, and that EDF’s users cannot deliberately or maliciously point applications on the EMR cluster at data they should not be able to access.
- EMR clusters configured with a cross-realm trust to an Active Directory domain: To achieve this for EDF, the team used automation and AWS services to issue encryption certificates, guaranteeing that they came from EDF’s internal PKI (a combined security-configuration sketch covering this and the previous item follows this list).
- Keeping network traffic private: By provisioning VPC endpoints as part of cluster bootstrapping, all network traffic was kept entirely off the public internet (a VPC endpoint sketch follows this list). Additionally, the Spark UI and JupyterHub were configured to require Active Directory credentials, used from on-premises networks via AWS Direct Connect. Traffic therefore remained completely private, and users were authenticated against a trusted identity store without introducing new user or password stores.
- Infrastructure as code that is fully CIS-compliant and checked end to end: Checkov, the popular open-source static analysis tool from Bridgecrew.io, was integrated with IDEs and CI/CD pipelines to verify that CIS-compliant rules for infrastructure resources were met when checking in code and when running Terraform and CloudFormation as part of automation pipelines. This guaranteed the quality of the code at build time as well as at runtime, providing auditable evidence of security best practices against CIS and AWS architecture standards (a pipeline sketch follows this list).
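To illustrate the key-segregation pattern, here is a minimal, hypothetical boto3 sketch that provisions one KMS key per data domain with a stable alias. The domain names, region, and alias scheme are illustrative assumptions, not EDF’s actual setup.

```python
# Hypothetical sketch: one KMS key per data domain, so a compromised key
# affects only that domain. Domains, region, and aliases are illustrative.
import boto3

kms = boto3.client("kms", region_name="eu-west-2")

DATA_DOMAINS = ["billing", "meter-readings", "customer-contact"]

for domain in DATA_DOMAINS:
    key = kms.create_key(
        Description=f"Encryption key for the {domain} data domain",
        KeyUsage="ENCRYPT_DECRYPT",
        Tags=[{"TagKey": "data-domain", "TagValue": domain}],
    )
    # A stable alias lets pipelines reference the key without hard-coding IDs.
    kms.create_alias(
        AliasName=f"alias/data-lake/{domain}",
        TargetKeyId=key["KeyMetadata"]["KeyId"],
    )
```

Per-team access would then be granted through each key’s policy, which is what keeps the blast radius of any one compromise contained to a single domain.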
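The cluster-encryption and cross-realm-trust items can both be expressed in a single EMR security configuration. The following boto3 sketch shows the shape of such a configuration; the bucket names, realm, key alias, and configuration name are illustrative assumptions, not EDF’s actual values.

```python
# Hypothetical EMR security configuration enforcing at-rest and in-transit
# encryption, plus a Kerberos cross-realm trust with Active Directory.
# All names, paths, and the realm are illustrative.
import json
import boto3

emr = boto3.client("emr", region_name="eu-west-2")

security_config = {
    "EncryptionConfiguration": {
        "EnableAtRestEncryption": True,
        "EnableInTransitEncryption": True,
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-KMS",
                "AwsKmsKey": "alias/data-lake/billing",  # illustrative alias
            },
            "LocalDiskEncryptionConfiguration": {
                "EncryptionKeyProviderType": "AwsKms",
                "AwsKmsKey": "alias/data-lake/billing",
            },
        },
        "InTransitEncryptionConfiguration": {
            # A zip of certificates issued by the organisation's internal PKI.
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": "s3://example-secure-config/node-certs.zip",
            }
        },
    },
    "AuthenticationConfiguration": {
        "KerberosConfiguration": {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {
                "TicketLifetimeInHours": 24,
                # Cross-realm trust with the on-premises AD domain.
                "CrossRealmTrustConfiguration": {
                    "Realm": "AD.EXAMPLE.COM",
                    "Domain": "ad.example.com",
                    "AdminServer": "ad.example.com",
                    "KdcServer": "ad.example.com",
                },
            },
        }
    },
}

emr.create_security_configuration(
    Name="data-lake-secure-config",
    SecurityConfiguration=json.dumps(security_config),
)
```

Clusters would then reference this configuration by name at launch, with the trust’s secrets supplied through the cluster’s Kerberos attributes rather than stored in code.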
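For the private-networking item, a gateway VPC endpoint keeps S3 traffic on the AWS network rather than the public internet. A hypothetical boto3 sketch, with illustrative VPC and route-table IDs:

```python
# Hypothetical sketch: a gateway VPC endpoint for S3, created as part of
# cluster bootstrapping so data traffic never crosses the public internet.
# The VPC and route-table IDs are illustrative placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0example",
    ServiceName="com.amazonaws.eu-west-2.s3",
    RouteTableIds=["rtb-0example"],
)
```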
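And for the infrastructure-as-code checks, a CI step might run Checkov against the Terraform directory and fail the build on any violation. This sketch assumes an illustrative directory layout; `checkov --directory <dir>` is the tool’s standard invocation, and it exits non-zero when checks fail.

```python
# Hypothetical CI step: run Checkov over the Terraform code and block the
# build if any CIS policy checks fail. The directory name is illustrative.
import subprocess
import sys

result = subprocess.run(
    ["checkov", "--directory", "infrastructure/", "--framework", "terraform"],
    capture_output=True,
    text=True,
)

# Keep the full report in the pipeline logs as auditable evidence.
print(result.stdout)

if result.returncode != 0:
    sys.exit("Checkov found failing checks; blocking the build.")
```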
To accommodate EDF’s security requirements, Slalom also collaborated with Amazon to enhance EMR’s encryption key provisioning.
In its first quarter of use, the platform helped EDF match hundreds of thousands of pounds in outstanding balances to the right customers. On an annual basis, it’s projected to match multiple millions of pounds. But the financial benefits of the platform go beyond debt recovery. Hutchins considers the decoupling of storage and compute to be an essential feature of the platform for its ability to optimise costs. The platform’s modular architecture and EMR enable EDF to quickly isolate what it needs from billions of rows of data. EDF only pays for what it uses, and the number of compute instances can be increased or decreased automatically. “We minimise the amount of compute that is running all the time,” says Hutchins. “There are companies that don’t separate [storage and compute] with services like EMR and if we were one of them, we would be paying a lot of money to host those tables.”
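One way the elastic-compute behaviour described above can be achieved is with EMR managed scaling, which grows and shrinks a cluster within set limits so steady-state costs stay small. A hypothetical boto3 sketch, with an illustrative cluster ID and capacity limits (not necessarily the mechanism EDF uses):

```python
# Hypothetical sketch: attach a managed scaling policy so EMR compute
# scales with demand, independently of S3 storage. The cluster ID and
# capacity limits are illustrative.
import boto3

emr = boto3.client("emr", region_name="eu-west-2")

emr.put_managed_scaling_policy(
    ClusterId="j-EXAMPLE12345",
    ManagedScalingPolicy={
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": 2,   # small steady-state footprint
            "MaximumCapacityUnits": 20,  # burst capacity for heavy matching jobs
        }
    },
)
```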
A connected customer experience
EDF is already thinking of new uses for the platform. One involves machine learning to help service agents proactively offer payment plans or bundled tariffs to customers who might want them. Agents could make offers in near-real time as customers’ usage and payment history ran through the model. Another involves data mapping and visualisation to show customers how their energy usage compares to that of their neighbours. If usage is comparatively high, EDF could then provide recommendations for reducing it, such as adding insulation – all in service of EDF’s vision to help Britain achieve net-zero emissions.
It’s no coincidence that both use cases play a role in improving customer experience. They put more power in the hands of customers to manage their accounts – just like the original use case did. This marks the beginning of what’s possible when you unite customer data and unlock its potential. Hutchins hopes that the platform will lead to “a better, more seamless customer experience and relationship.”
As he says, “It’s already helped us improve our offboarding and onboarding process so that it has some great impact. Now we can say, ‘Hey, Mrs. Smith. Thanks for coming back to EDF!’”