Wendy White

Living on the edge? Your data might be there already.

There are lots of definitions of “the edge”. Ask four different people, and you’ll likely get four different answers.


In ancient times – say, oh, ten years ago – people commonly referred to the edge as remote offices, branch offices, or systems located out in inhospitable places. Those definitions capture some of the challenges associated with edge computing, but they're far from the whole picture. For one thing, they don't really convey a true sense of scale, which matters more than ever as data is increasingly generated, processed, and consumed outside of traditional data centres.


As the edge has evolved, so too have the available opportunities. Now we need to take this evolution to the data centre itself.


Cities are getting smarter. Last month, the Philippine government committed to the completion of six separate ‘smart city’ projects, the latest in a chain of related projects occurring all over the world. Continued adoption of Internet of Things (IoT) technology, frequently an essential element of a smart city, along with improved connectivity via 5G networks, continues to accelerate the push towards the ‘edge’. So have consumer expectations, which have risen to match our newfound ability to meet all kinds of needs digitally on the spot, from checking maps on the fly to telling the oven to start preheating before we head home from the office.


Data centres are designed as “task-specific” containers for housing IT equipment, a design that comes bundled with key requirements around size, power, temperature control, access control, environmental regulation, and accessibility.


The edge, on the other hand, has different characteristics to that traditional data centre model. 


The edge creates strong incentives to conserve power.

Either because there isn’t much available and getting more is nearly impossible, or because power is sold at a premium (think colocation facilities). And of course, excess power turns into unwanted heat, and the chances of an optimally cooled environment (and the additional power being available to run that cooling) grow increasingly slim the further towards the edge you get. Designing for optimal efficiency becomes essential.


The edge creates strong incentives to conserve bandwidth.

With data regularly piped back from the edge to somewhere else across a finite physical resource (cables in the ground, 5G radio towers, satellite links, etc.), bandwidth must be conserved.
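One common tactic is to batch many small readings into a single compressed payload before sending anything upstream. Here's a minimal Python sketch of that idea; the reading format and function name are hypothetical, not from any particular edge stack:

```python
import gzip
import json

def pack_readings(readings: list[dict]) -> bytes:
    """Batch small sensor readings into one gzip-compressed payload.

    Sending one compressed batch avoids paying per-message protocol
    overhead for every tiny reading crossing a scarce uplink.
    """
    payload = json.dumps(readings).encode("utf-8")
    return gzip.compress(payload)

# Hypothetical usage: a thousand small readings, one upload.
batch = [{"sensor": i, "temp_c": 20.0 + i % 5} for i in range(1000)]
blob = pack_readings(batch)
print(f"{len(blob)} bytes to send upstream")
```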


There’s not always a pair of ‘intelligent hands’ at the edge.

The highly distributed, workload-specific nature of edge computing can allow for a far broader range of circumstances and installations than a typical data centre, and there may not always be a human onsite to perform hands-on tasks. 


You may also need to deal with these additional sources of pain, depending on your edge infrastructure use case:


  • wide variations from site to site in security and access control infrastructure,

  • degraded environmental controls, particularly around heat, particulates, and vibration,

  • unreliable connectivity to the rest of the world, and

  • installations that might literally be in motion at times!


Are you ready for the edge?

If you aren’t managing data on the edge yet, chances are that will change very soon. It might start with off-site backups to a service provider or colocation facility, with a campus-wide Microsoft Office 365 rollout, or with regulatory requirements mandating that citizen data processed by your organisation be physically located within its country of origin.


Is Ceph a good fit for the edge? It depends.

So, I'm all about Ceph, but I'm also never going to recommend a single tool to meet every need. Ceph can be a good choice for edge deployments, but it depends on your specific requirements and constraints. Here are some things to think about:


Scalability

Ceph is highly scalable and can spread large amounts of data across numerous nodes, which is handy if your edge deployments need to store and manage significant volumes of data.


Data redundancy and high availability

Ceph provides robust data redundancy and high availability through its replication and erasure coding features.
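To make the trade-off between those two features concrete, here's a back-of-the-envelope sketch in Python. The 3x replication and 4+2 erasure coding profiles are common examples, not recommendations for your cluster:

```python
def storage_overhead(data_chunks: int, coding_chunks: int) -> float:
    """Raw capacity consumed per unit of usable data.

    Replication is the special case data_chunks=1,
    coding_chunks=(replicas - 1).
    """
    return (data_chunks + coding_chunks) / data_chunks

# 3x replication: tolerates losing 2 copies, costs 3x the raw space.
print(storage_overhead(1, 2))  # 3.0

# 4+2 erasure coding: also tolerates losing any 2 chunks,
# but costs only 1.5x the raw space (at some CPU expense).
print(storage_overhead(4, 2))  # 1.5
```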


Flexibility

Ceph supports block, object, and file storage, so it's very flexible around how data is stored and accessed.
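As a taste of the object side, here's a minimal sketch using Ceph's librados Python bindings. The config file path and the pool name 'edge-data' are assumptions about your environment, not fixed values:

```python
import rados

# Connect using the local cluster config (path is an assumption).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Open an I/O context on a hypothetical pool named 'edge-data'.
ioctx = cluster.open_ioctx('edge-data')
try:
    # Write an object, then read it straight back.
    ioctx.write_full('sensor-reading-001', b'{"temp_c": 41.3}')
    print(ioctx.read('sensor-reading-001'))
finally:
    ioctx.close()
    cluster.shutdown()
```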


Open source

Being open-source software with a thriving community, Ceph offers adaptability, security, and cost advantages.


Resource requirements and complexity

On the negative side, depending on your replication rules, skill level, and so on, Ceph can become resource-intensive. Speaking of skill levels, setting up and managing a Ceph cluster for a complex solution can be daunting. I work for a company that directly addresses these two issues with our full-stack Ceph-based private cloud solution, but not everyone is looking for that, of course - some people just want the SDS! In that case, you might want to think about how you'll minimise support and maintenance at the edge.


Alternatives

If you want something lightweight, other options could be GlusterFS (filesystem only - it's kinda in the name), OpenEBS (container-native storage), or MinIO (an S3-compatible object store). Or if managing storage infrastructure is a concern, you could consider managed edge storage services from cloud providers (e.g. AWS Outposts, Azure Stack Edge, Google Anthos).
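Because MinIO speaks the S3 API, standard S3 tooling works against it. Here's a minimal sketch using boto3; the endpoint URL, credentials, and bucket name are all placeholders for your own deployment:

```python
import boto3

# Point a standard S3 client at a MinIO endpoint (all values are placeholders).
s3 = boto3.client(
    's3',
    endpoint_url='http://minio.example.local:9000',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY',
)

# Create a bucket and store one object, exactly as you would against AWS S3.
s3.create_bucket(Bucket='edge-backups')
s3.put_object(Bucket='edge-backups', Key='report.txt',
              Body=b'hello from the edge')
```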


Or, if your needs are really simple, basic local storage with periodic synchronisation to a central data centre or cloud might be sufficient and more efficient for your edge deployments.
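For illustration, here's a bare-bones one-way version of that pattern in Python. In practice you'd likely reach for rsync or a proper sync tool instead, and the directory layout here is hypothetical:

```python
import shutil
from pathlib import Path

def sync_new_files(local_dir: Path, central_mount: Path) -> None:
    """Copy files that don't yet exist at the destination (one-way, additive).

    A real tool would also handle changed files, deletions, and retries;
    this sketch only shows the basic shape of periodic synchronisation.
    """
    for src in local_dir.rglob('*'):
        if src.is_file():
            dest = central_mount / src.relative_to(local_dir)
            if not dest.exists():
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)

# Hypothetical usage, run on a timer or cron-style schedule:
sync_new_files(Path('/var/edge/data'), Path('/mnt/central'))
```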


Either way, hope this helps someone out!


