Cloud Track
10:30 - 11:15, Building Clouds with Apache CloudStack
Giles Sirett, CEO & Founder, ShapeBlue
Apache CloudStack is open source software designed to deploy and manage large networks of virtual machines as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. This talk will give an introduction to the technology, its history and its architecture. It will look at common use-cases (and some real production deployments) seen across both public and private cloud infrastructures, and at where CloudStack can be complemented by other open source technologies.
The talk will also compare and contrast Apache CloudStack with other IaaS platforms and explain why Giles thinks that the technology, combined with the Apache governance model, will see CloudStack become the de-facto open source cloud platform. He will run a live demo of the software and talk about ways that people can get involved in the Apache CloudStack project.
11:25 - 12:10, What's New in Apache CloudStack 4.11
Paul Angus, VP Technology & Cloud Architect, ShapeBlue
Apache CloudStack 4.11 recently shipped with 1,000 updates, including over 50 new and enhanced features. This presentation will lift the covers on some of the exciting new features that users and cloud operators can enjoy in the 4.11 release.
12:20 - 13:05, Successfully Running CloudStack with High-Performance Workloads Using Managed Primary Storage
Andrija Panic, Senior Cloud Systems Engineer, HIAG DATA AG & Mike Tutkowski, Senior CloudStack Developer, SolidFire
This presentation will focus on our real-world experiences transitioning from a smaller CloudStack IaaS provider with light workloads to a more sophisticated configuration required to run enterprise workloads. We will cover specific differences in running KVM-based CloudStack on shared storage.
In particular, we will focus on discussing light workloads vs. those having more serious CPU and IO demands and how to solve different stability, performance and usability challenges by leveraging Managed Storage in CloudStack. We will delve into the critical topics of implementing proper storage QoS and volume snapshots with Managed Storage. Finally, we will cover migrating VMs from traditional shared storage to Managed Storage with no downtime, and discuss the specific challenges involved and how they can be addressed. The presentation is intended for CloudStack system administrators and technical people operating CloudStack on a daily basis.
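To give a flavour of the kind of operation the talk covers, here is a minimal sketch of moving a volume to a managed primary storage pool through the CloudStack API. It assumes the community `cs` Python client and uses placeholder credentials, names and IDs; whether live migration is available depends on the hypervisor and CloudStack version.

```python
# Sketch: live-migrate a data volume to a managed primary storage pool.
# Assumes the community "cs" Python client (pip install cs) and API credentials;
# endpoint, keys, pool name and VM id below are placeholders.
from cs import CloudStack

api = CloudStack(endpoint="https://cloud.example.com/client/api",
                 key="API_KEY", secret="SECRET_KEY")

# Look up the target managed primary storage pool by name (placeholder name).
pool = api.listStoragePools(name="managed-pool-01")["storagepool"][0]

# Find a volume attached to the VM we want to move (placeholder VM id).
volume = api.listVolumes(virtualmachineid="VM_UUID")["volume"][0]

# Ask CloudStack to migrate the volume while the VM keeps running.
job = api.migrateVolume(volumeid=volume["id"],
                        storageid=pool["id"],
                        livemigrate=True)
print("async job started:", job["jobid"])
```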
14:20 - 15:05, From Docker to Kubernetes: Running Apache Hadoop in a Cloud Native Way
Marton Elek, Lead Developer Engineer, Hortonworks
Kubernetes has slowly become one of the most popular container runtime environments, while Hadoop has long been a widely used open source big data platform. The question is: how can we use the new cloud-native toolset to administer and manage Hadoop-based clusters? Is there any benefit to running Hadoop and other big data applications on top of Kubernetes?
In this presentation I will show that Hadoop is not a legacy application: it can run quite easily in a cloud-native environment thanks to its generic, distributed-by-design architecture. The first step to running an application in a Kubernetes cluster is containerization. Creating containers for an application is easy (even for a good old distributed application like Apache Hadoop); it's just a few steps of packaging. The hard part isn't packaging: it's deploying. How can we run the containers together? How do we configure them? How do the services in the containers find and talk to each other? How do you deploy and manage clusters with hundreds of nodes?
Modern cloud-native tools like Kubernetes or Consul/Nomad can help a lot, but they can be used in different ways. In this presentation I will demonstrate multiple solutions for managing containerized clusters with different cloud-native tools, including Kubernetes and Docker Swarm/Compose. No matter which tools you use, the same questions of service discovery and configuration management arise. This talk will show the key elements needed to make such a containerized cluster work.
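To make the service-discovery question concrete, here is a minimal sketch of how a containerized Hadoop process might find its peers inside Kubernetes, assuming a hypothetical headless Service named `datanode` in a `hadoop` namespace; the cluster DNS (kube-dns/CoreDNS) does the actual work.

```python
# Sketch: DNS-based peer discovery inside a Kubernetes cluster.
# Assumes a headless Service "datanode" in namespace "hadoop" (hypothetical names);
# cluster DNS returns one A record per ready pod behind a headless Service.
import socket

SERVICE_DNS = "datanode.hadoop.svc.cluster.local"

def discover_peers(service_dns):
    """Resolve the headless Service to the pod IPs currently backing it."""
    infos = socket.getaddrinfo(service_dns, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    for ip in discover_peers(SERVICE_DNS):
        print("datanode peer:", ip)
```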
15:15 - 16:00, Apache Mesos: Orchestrating Containers and Big Data
Benjamin Bannier, Senior Software Engineer, Mesosphere
Processing Big Data and managing large-scale container deployments both necessitate large compute clusters. And large clusters, especially when running multiple Big Data systems, require some kind of cluster manager and cluster scheduler. In this talk, we will give an overview of how Apache Mesos helps solve the problems of large-scale clusters and then take a look at the current state of the container orchestration and Big Data ecosystem built on top of this foundation. For example, Mesos enables users to easily deploy and manage containers using different frameworks (e.g., Kubernetes, Marathon, or Apache Aurora).
Furthermore, we will look at the growing Big Data ecosystem on top of Apache Mesos and DC/OS, including, for example, Apache Spark, Apache Cassandra, and Apache Kafka. Finally, we will also provide some insights into future developments, both for the foundation (i.e., Apache Mesos) and for the ecosystem on top.
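As an illustration of the framework-based deployment model the talk describes, the sketch below posts a minimal Docker app definition to Marathon's REST API; the Marathon URL and the app fields are placeholders, and the exact schema varies between Marathon versions.

```python
# Sketch: deploy a containerized app on Mesos via the Marathon framework's REST API.
# The endpoint and the app definition are illustrative placeholders.
import json
import urllib.request

MARATHON_URL = "http://marathon.example.com:8080"  # placeholder endpoint

app = {
    "id": "/demo/nginx",
    "container": {
        "type": "DOCKER",
        "docker": {"image": "nginx:stable"},
    },
    "cpus": 0.1,
    "mem": 64,
    "instances": 2,
}

req = urllib.request.Request(
    MARATHON_URL + "/v2/apps",
    data=json.dumps(app).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print("deployment accepted:", resp.status)
```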
16:35 - 17:20, Serverless Computing with Apache OpenWhisk
Lorna Jane Mitchell, Developer Advocate, IBM Watson Data Platform
Serverless technology is a way of deploying individual functions to the cloud and running them on demand. This makes them very easy to work with, isolated from one another, and individually scalable. Come and learn how to begin using Apache OpenWhisk for your own applications, such as APIs or chatbot integrations.
This session is recommended for tech leads and developers of all levels.
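For a taste of what "deploying individual functions" looks like in practice, here is a minimal OpenWhisk action in Python, with the `wsk` CLI commands to create and invoke it shown as comments; the action name and parameters are placeholders.

```python
# hello.py -- a minimal Apache OpenWhisk action.
# Deploy and invoke with the wsk CLI (action name is a placeholder):
#   wsk action create hello hello.py --kind python:3
#   wsk action invoke hello --param name Lorna --result
def main(params):
    """OpenWhisk calls main() with the invocation parameters as a dict."""
    name = params.get("name", "world")
    return {"greeting": "Hello, " + name + "!"}
```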
17:30 - 18:15, Cross Data Center Replication in Apache Solr
Amrit Sarkar, Search Engineer, Lucidworks & Anvi Jain, Senior Software Engineer, Progress Software Pvt Ltd
High availability of data across geographic regions for search and analytical applications is a challenging task. Mission-critical applications need effective failover strategies across data centers. Apache Solr has offered Cross Data Center Replication (CDCR) as a feature since version 6.0 and has added more capabilities in subsequent releases.
The first part of the session will center on an active-passive design model with one data center as the primary and the other data centers as secondary clusters. The second part centers on designing an active-active bidirectional setup such that both querying and indexing traffic can gracefully be redirected to the failover cluster. The third part of the session will center on an actual use case: an analytics application with high availability. We will discuss the improvements observed in terms of maintenance, performance, and throughput. The session concludes with the challenges and limitations of the current design and the improvements forthcoming for Cross Data Center Replication in Apache Solr.
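For readers who want to poke at CDCR before the session, the sketch below drives Solr's CDCR request handler over HTTP using its documented actions (START, STATUS, QUEUES); the Solr base URL and collection name are placeholders.

```python
# Sketch: controlling Solr Cross Data Center Replication via the /cdcr handler.
# Base URL and collection name are placeholders; requires the requests package.
import requests

SOLR = "http://solr-primary.example.com:8983/solr"   # source-cluster node (placeholder)
COLLECTION = "products"                              # placeholder collection

def cdcr(action):
    """Call the CDCR request handler with one of its documented actions."""
    resp = requests.get(SOLR + "/" + COLLECTION + "/cdcr",
                        params={"action": action, "wt": "json"})
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    cdcr("START")            # begin forwarding updates to the target data center
    print(cdcr("STATUS"))    # replication process and buffer state
    print(cdcr("QUEUES"))    # per-target queue sizes and last processed timestamps
```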