As part of the hpcDIRECT team, based in the London offices of Verne Global, we continue to develop and refine our "high performance computing as a service" (HPCaaS) product that was launched in December last year.
We are taking an agile approach to its evolution - made possible by the solid technical foundations on which hpcDIRECT was conceived and designed. This approach lets us fine-tune specifications for core industrial use cases in direct response to the needs expressed by our customers, and to the feedback we receive as the product continues to grow. Here at Verne Global, we are proud to have a culture of innovation, and the development of hpcDIRECT is iterated on early and often. As a technical team, and as a company, we are not afraid to adapt our plans as we find out what works well and what does not. The customer experience of the product is everything.
For those unfamiliar with our bare metal as a service cloud solution, hpcDIRECT provides our customers with dedicated clusters of high performance servers that can be delivered automatically, preconfigured with a range of scientific, engineering, mathematical and economics packages.
You may be modelling the climate for weather forecasting, analysing genetic data, or running finite element analysis for automotive crash simulations. Maybe you need to run pre-trade analytics, or some cheminformatics experiments? Whatever your HPC requirements, we cluster your software package across the desired number of bare metal nodes and tailor each deployment to your needs, wrapping all the configuration into one of our customised templates that we call "blueprints".
A blueprint provides two distinct functions. First, the blueprint acts as infrastructure as code, enabling you to summon a cluster on-demand in a reproducible manner. To enable this, we use Heat Orchestration Templates to describe how a cluster should be built on OpenStack. However, rather than the virtual machines that OpenStack would typically deploy, we are using Ironic, which allows us to provision bare metal machines instead. The second part of the blueprint specifies the configuration of the software package and any further set-up required, such as adding users, keys, monitoring, and so on. We've packaged each of these one-time post-deployment tasks into a layered, reversible and modular software deployment that we call an "Element". In fact, these Elements are usually just Ansible roles that are integrated into the blueprint to perform a particular action.
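To give a flavour of the infrastructure-as-code half of a blueprint, here is a minimal Heat Orchestration Template sketch. The resource names, flavor, image and network are illustrative assumptions, not the contents of an actual hpcDIRECT blueprint:

```yaml
# Illustrative Heat template fragment - names and parameters are
# hypothetical, not taken from a real hpcDIRECT blueprint.
heat_template_version: 2018-08-31

parameters:
  node_count:
    type: number
    default: 4

resources:
  compute_nodes:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: node_count }
      resource_def:
        type: OS::Nova::Server
        properties:
          # With Ironic behind Nova, this flavor maps onto a
          # bare metal node rather than a virtual machine
          flavor: baremetal.hpc
          image: centos-hpc
          networks:
            - network: cluster-net
```

Because the template is declarative, deploying the same blueprint twice yields the same cluster - which is what makes tearing clusters down and recreating them later a safe operation.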
To our customers, the crucial aspect of the blueprint and Element concepts is that they are all based on existing open-source technologies, so there is no need to rewrite your deployment tools when changing environments. The blueprint idea is also simple and flexible enough to adapt to customer requirements and to incorporate promising new technologies as they arise, such as Terraform, if the demand is there. They offer convenience without the lock-in. We believe that the merits of running your HPC workloads with Verne Global - utilising Iceland's low-cost, reliable and 100% renewable energy - are so compelling that there is no need for us to place artificial barriers in the way to make it difficult for you to leave, as seems to be increasingly common in the industry.
The software deployed via a blueprint could be an off-the-shelf solution, or it could just as easily represent your own in-house HPC application. Either way, it will be ready to be deployed at the click of a button.
The cloudpoint portal and hpcDIRECT API
Of course, there's no point in having these blueprints ready to be deployed at the click of a button unless we have a button we can actually click, so in tandem with developing the service, we have also been building a web-based user interface to it that we call "cloudpoint".
The cloudpoint portal (shown below) is designed to make managing your clusters easier, presenting information in a user-friendly, clear and concise manner. Its features include cluster deployment from a blueprint, monitoring of the clusters you currently have running, and user management. When you've finished, you can simply tear down clusters, safe in the knowledge that recreating the cluster is now a trivial task. Of course there is more on our roadmap, including the ability to create your own blueprints.
A core tenet of the design of the portal is that it operates through our hpcDIRECT API - the same public-facing API that is also available to our customers. This means that everything that can be done through the graphical user interface is also available for our power-users to access programmatically. And of course, we still offer the option to use the OpenStack APIs directly, should you so require.
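As a sketch of what programmatic access might look like, the snippet below assembles and sends a cluster-deployment request. The endpoint path, payload fields and token handling are all hypothetical illustrations of a REST workflow, not the real hpcDIRECT API:

```python
# Hypothetical sketch of driving a cluster deployment through a
# REST API like hpcDIRECT's. Endpoint, fields and auth scheme are
# illustrative assumptions, not the real API.
import json


def build_deploy_request(blueprint_id: str, node_count: int) -> dict:
    """Assemble the JSON body for a (hypothetical) deploy call."""
    return {
        "blueprint": blueprint_id,
        "nodes": node_count,
    }


def deploy_cluster(session, base_url: str, token: str,
                   blueprint_id: str, node_count: int):
    """POST the request; `session` is any requests-like HTTP client."""
    body = build_deploy_request(blueprint_id, node_count)
    return session.post(
        f"{base_url}/clusters",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        data=json.dumps(body),
    )
```

Anything the portal can do would go through calls of this shape, which is what keeps the GUI and the power-user workflows in lockstep.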
Behind the scenes, the portal is actually one of a collection of containerised microservices that make up cloudpoint. A microservices architecture separates what would otherwise be a complex monolithic system into many individual services, each a discrete, conceptually understandable unit. Designed to follow the UNIX philosophy of doing one thing, and doing that one thing well, each microservice is small, lightweight and modular. Their isolation and reduced functional scope mean that the impact of changes is limited and the ongoing cost of maintenance is reduced, resulting in a far more manageable and reliable system. Each one is encapsulated behind a REST API, with this API being the sole medium of interaction between the various microservices.
In addition to the portal, we have built several other microservices, amongst which we have a dedicated microservice for presenting the customer-facing hpcDIRECT API, one for handling blueprints, and another for managing clusters.
To keep on top of these services, we use Kubernetes to deploy, scale and manage them all. A plethora of APIs does have the potential to present a large attack surface, and, as such, security is at the forefront of our minds.
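For readers unfamiliar with how Kubernetes manages a service like these, here is a minimal Deployment manifest sketch. The service name, image and replica count are hypothetical, purely to illustrate the mechanism:

```yaml
# Illustrative Kubernetes Deployment for one microservice - the
# name, image and replica count are invented for this example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blueprint-service
spec:
  replicas: 2          # Kubernetes keeps this many copies running
  selector:
    matchLabels:
      app: blueprint-service
  template:
    metadata:
      labels:
        app: blueprint-service
    spec:
      containers:
        - name: blueprint-service
          image: registry.example.com/blueprint-service:latest
          ports:
            - containerPort: 8080
```

Kubernetes continuously reconciles the cluster towards this declared state, restarting or rescheduling containers as needed, which is what lets us scale and update each microservice independently.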
Striving to match the physical security of our data center - located on a secure, former NATO base on the west coast of Iceland, uniquely positioned between the USA and Europe, I might add - we expose only the hpcDIRECT API to the outside world and keep the others strictly internal, with checks at every step to make sure any request comes from within the system itself. This is handled by yet another microservice that oversees and controls the interactions between the others, guided by a fine-grained permission system, so that nothing is allowed to access anything else unless the user, the requesting microservice and the operation all check out.
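The three-way check described above can be sketched as a simple lookup: a request is allowed only when the calling service and operation are known, and the user's role is on that operation's permit list. The rule table, role names and operations here are invented for illustration, not our actual policy:

```python
# Hypothetical sketch of a fine-grained permission check: a request
# passes only if the user, the requesting microservice and the
# operation all check out. All rule data here is invented.
ALLOWED = {
    # (requesting service, operation) -> roles permitted to invoke it
    ("portal", "deploy_cluster"): {"admin", "operator"},
    ("portal", "list_clusters"): {"admin", "operator", "viewer"},
}


def is_allowed(user_role: str, service: str, operation: str) -> bool:
    """Deny by default: unknown service/operation pairs and
    unlisted roles are both rejected."""
    permitted = ALLOWED.get((service, operation))
    return permitted is not None and user_role in permitted
```

Denying by default means that adding a new microservice or operation grants nothing until a rule is written for it, which keeps the attack surface under explicit control.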
The final essential tool in our development process is GitLab. Independent of our choice of GitLab as our main code repository (although we do still share some public repositories on GitHub), we had already opted to use the GitLab Flow branching strategy. Now, we also take full advantage of its built-in CI/CD system.
As we are implementing agile software development practices, such as test driven development (TDD) - where we write tests in advance based on requirements, and then write or improve our code-base to pass the new tests - we are ideally placed to implement Continuous Integration (CI). CI is the practice of merging our code into a shared master branch, early and often, and running automated unit tests with every merge request. This process keeps our codebase clean and makes sure we find and fix problems before they land in production.
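As a sketch of what wiring tests into every merge request looks like in GitLab CI, here is a minimal `.gitlab-ci.yml` fragment. The job name, image and commands are illustrative assumptions, not our actual pipeline:

```yaml
# Illustrative .gitlab-ci.yml sketch: run the unit-test suite on
# every merge request (job name and commands are hypothetical).
stages:
  - test

unit-tests:
  stage: test
  image: python:3
  script:
    - pip install -r requirements.txt
    - pytest tests/
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
```

With a configuration along these lines, a failing test blocks the merge, so the master branch only ever contains code that has passed the suite.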
GitLab's pipelines also lend themselves well to Continuous Delivery (CD). We already ensure that we are able to deploy to production every time a feature branch is merged, but due to the complexities of testing each and every microservice, our manual and automated testing regime is not yet rigorous enough to deploy automatically on every merge to any one microservice. That is our next goal: soon, we should be able to confidently deploy microservices straight to production one at a time, as and when they are ready.
Together, all of these technologies and processes help us to be adaptive to customer feedback and feature requests, without a long lead time between making the code changes required for a feature and merging those changes into the production code-base.
Our philosophy has always been one of adaptability, led by customer demand. Our architecture grants us the flexibility to incorporate other technologies into the mix in the future: we could add alternative shiny new backend technologies alongside our existing OpenStack-based HPC as a service solution, or theoretically even replace it completely with a different backend altogether - all with minimal disruption - depending on the needs of our customers in the months and years ahead.
So far, the team have built a solid foundation for hpcDIRECT, having constructed a scalable system that will grow along with us as we expand hpcDIRECT, and which will also allow us to move fast and evolve quickly. The system is already functional - so much so that our ops team are “dogfooding” the cloudpoint platform to perform our own operations - and it is only going to get better.
Best of all, the system is built entirely on open-source technologies, with our blueprints giving you the convenience and support that you would expect from a cloud provider, the raw power of bare metal and all without the lock-in.