Testability
Each Ops Component can be run and tested in isolation, surrounded by mocks, or in partial or full integration (a minimal sketch of such an isolated test follows below). Start‐up times and test round trips are reduced, leading to higher productivity.
Parallel development
Each Ops Component can be assigned to a single team. Teams can work in parallel, following different release cycles. Suitable interfaces decouple Ops Components as well as teams. Teams agree on interface definitions, provide mocks early and manage interface changes or incompatibilities, thus minimising mutual dependencies.
Independent deployment
Each team is free to release and deploy their Ops Components whenever they see fit, as long as all deliverables are thoroughly tested and all interface contracts are fulfilled, thus reducing time to market.
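To make the testability benefit concrete, here is a minimal, self-contained Java sketch of testing one Ops Component's logic in isolation. The CurrencyRates interface, the priceIn function and the fixed exchange rate are hypothetical and merely stand in for a neighbouring component's remote interface.

```java
public class PriceServiceTest {

    // The interface to a neighbouring Ops Component, as seen from this one (hypothetical).
    interface CurrencyRates {
        double rate(String from, String to);
    }

    // The unit under test, kept independent of the real rates component.
    static double priceIn(String currency, double euros, CurrencyRates rates) {
        return euros * rates.rate("EUR", currency);
    }

    public static void main(String[] args) {
        // A hand-written mock replaces the remote component for this test.
        CurrencyRates mockRates = (from, to) -> 1.10;

        double result = priceIn("USD", 100.0, mockRates);
        if (Math.abs(result - 110.0) > 1e-9) {
            throw new AssertionError("expected 110.0 but got " + result);
        }
        System.out.println("test passed: " + result);
    }
}
```

Because the collaborating component is replaced by a mock, the test needs no network, no deployment and no start-up of any other Ops Component.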
And why should Ops Components be large?
Reduced complexity
Few Ops Components are easier to handle than tens, hundreds or thousands of them.
No distribution debt
Every software distribution creates a distribution debt: cross‐component calls are more complex and have a higher latency than local ones; cross‐component refactoring gets more complicated. A monolith has no distribution debt.
Improved diagnosability
Failures within a monolith are confined to the context of that single process. So it is easy to analyse the logs, metrics and traces and correlate them to user actions. In a heavily distributed system it is a challenge to correlate all logs, metrics and traces across all nodes involved.
Improved scalability
Scaling a monolith is easy: just clone it. But with many Ops Components working in parallel it’s hard to figure out what component to scale when performance deteriorates. This is particularly true in the frequent case of non‐linear distributed execution traces.
Easy integration
Integration of monoliths is well‐understood: a continuous integration system takes all the code, compiles it, tests it and thereby figures out how well‐integrated the current code baseline is.
Reduced resource consumption
Each Ops Component produces overhead: memory required for a virtual machine (like the JVM), disk space required for libraries. A monolith produces overhead exactly once per instance. Systems with lots of different Ops Components induce overhead for each and every instance.
And now? It depends.
The following opinionated approach is proven in practice: start with a monolith and leave it at that if it is good enough. Decompose it whenever the demand arises [4]. This is called the Monolith First approach. If a monolithic Ops Component doesn’t work satisfactorily, examine every Dev Component and decide whether it is worth deriving a separate Ops Component from it or whether several Dev Components can be clustered into a single one, forming a unit of runtime, scaling, release and deployment. An Ops Component should not be larger than a team of three to five engineers can handle. Do not cut down to the nanoservice level unless you know exactly what you are doing. At the nanoservice level, each Dev Component is divided into several Ops Components, thus dissolving its logical boundaries.
To summarise, we would like to stress that all desirable software properties discussed above are achieved at two levels:
1. A good software architecture is paramount, regardless of how Ops Components are cut.
2. Suitably designed Ops Components at the level of subsystems or components make sure that the potential of a given architecture is fully exploited.
How to Handle State?
Regrettably, the cloud community mostly ignores the issue of state handling across Ops Components, although state handling is paramount for performance and robustness. Within a distributed system, handling shared state between different Ops Components, or even between different instances of the same Ops Component, is hard. As partition tolerance is a must‐have for distributed applications, the CAP theorem [7] leaves you only the choice between consistency and availability (with eventual consistency). You can use state synchronization mechanisms based on gossip protocols if you want availability, or based on consensus protocols if consistency is required.
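As a small illustration of the availability-oriented branch, the following Java sketch shows a last-write-wins merge rule of the kind gossip-based replication converges on. The Entry record, keys and timestamps are hypothetical; the sketch illustrates eventual consistency in general, not any particular product's replication protocol.

```java
import java.util.HashMap;
import java.util.Map;

public class LastWriteWinsMerge {

    // One replicated entry: a value plus a logical timestamp (hypothetical structure).
    record Entry(String value, long timestamp) {}

    // Merge two replicas by keeping, per key, the entry with the newer timestamp.
    // Availability-oriented, gossip-based stores use convergence rules of this kind;
    // the result is eventual consistency, not strong consistency.
    static Map<String, Entry> merge(Map<String, Entry> a, Map<String, Entry> b) {
        Map<String, Entry> result = new HashMap<>(a);
        b.forEach((key, entry) -> result.merge(key, entry,
                (existing, incoming) ->
                        existing.timestamp() >= incoming.timestamp() ? existing : incoming));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Entry> replica1 = Map.of("cart:42", new Entry("2 items", 10));
        Map<String, Entry> replica2 = Map.of("cart:42", new Entry("3 items", 12));
        System.out.println(merge(replica1, replica2)); // keeps the newer "3 items" entry
    }
}
```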
How to Handle Transactions?
Conventional distributed transaction handling mechanisms like two‐phase commit are not partition tolerant and thus unsuitable for clusters. Distributed technical transactions are nowadays considered bad architectural style anyway. The best practice for distributed transactions has long been the same, regardless of mode 1 or 2: confine technical transactions to single components and provide business‐specific undo workflows as distributed rollbacks.
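The following minimal Java sketch illustrates that best practice: each step commits its own local technical transaction, and previously registered business-specific undo actions are executed when a later step fails. The reserveSeat/chargeCard/cancelSeat steps are hypothetical placeholders for calls into separate components.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class UndoWorkflow {

    // Hypothetical local steps; each would commit its own technical transaction.
    static void reserveSeat() { System.out.println("seat reserved"); }
    static void chargeCard()  { throw new RuntimeException("payment declined"); }
    static void cancelSeat()  { System.out.println("seat reservation cancelled"); }

    public static void main(String[] args) {
        // Undo actions are recorded as each step succeeds and replayed in reverse order.
        Deque<Runnable> compensations = new ArrayDeque<>();
        try {
            reserveSeat();
            compensations.push(UndoWorkflow::cancelSeat);

            chargeCard(); // fails in this example; further steps would follow here
        } catch (RuntimeException e) {
            System.out.println("step failed: " + e.getMessage() + " -> running undo workflow");
            // Business-specific "distributed rollback": compensate completed steps.
            compensations.forEach(Runnable::run);
        }
    }
}
```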
How to Communicate with Other Components?
Ops Components communicate over remote interfaces. Asynchronous communication is the preferred way as it improves responsiveness. For synchronous communication the REST protocol prevails; more efficient binary protocols like gRPC are available. Asynchronous communication works on REST as well as on gRPC. Synchronous calls may be performed asynchronously on the client side. Messaging protocols like AMQP, JMS or Kafka are asynchronous by design, thus suitable for async‐only communication.
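As an illustration of performing a synchronous REST call asynchronously on the client side, here is a minimal Java sketch using the standard java.net.http client available since Java 11. The target URL is only a placeholder; in practice it would be another Ops Component's REST endpoint.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class AsyncRestCall {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        // Placeholder URL; stands in for another Ops Component's REST interface.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.org/"))
                .GET()
                .build();

        // The request is sent asynchronously: the calling thread is not blocked
        // while the response is in flight.
        CompletableFuture<Void> done = client
                .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenAccept(response ->
                        System.out.println("Status: " + response.statusCode()));

        // The client could do other work here; we wait only so the example terminates.
        done.join();
    }
}
```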
How to Provide a User Interface?
There are multiple ways to tackle the user interface of cloud native applications:
1. A standalone user interface is provided by each Ops Component; all of them are linked together with hyperlinks. This is the so‐called self‐contained system approach.
2. A standalone user interface is provided by a dedicated UI Ops Component.
3. A user interface frame is provided by a UI Ops Component which integrates partial UIs from other Ops Components.
The suitable approach depends on how modular the UI should be. If it is completely modular, (1) or (3) should be chosen; if the UI is rather integrative, (2) might be the best option.
63.2.3 The Anatomy of a Cloud Native Stack
The Ops Components need a stack to run upon. This stack is called the cloud native stack. Its anatomy converges to what is shown in Fig. 63.3, with many new technologies continuously emerging. As described in Sect. 63.1.3, there is basically a cluster operating system (COS) to execute applications on a cluster and a platform atop the COS providing all required infrastructure for cloud native applications (Cloud Native Application Platform).
Fig. 63.3 The Anatomy of a Cloud Native Stack
The Cluster Resource Manager represents the bottom of the COS. It provides a uniform interface for allocating and releasing cluster resources (computing, networking, storage, memory). To unify resources regardless of their provenance (IaaS cloud, virtualized or bare‐metal resources), the cluster resource manager uses overlay techniques like operating system virtualization for computing resources, software‐defined networks for networking, distributed file systems for storage and in‐memory data grids for memory. The analogy of a cluster resource manager on a single node is the driver subsystem of an operating system.
The Cluster Scheduler’s task is to execute Ops Components packaged in containers. The Cluster Scheduler uses the cluster resource manager to acquire and allocate resources, applies a scheduling algorithm to determine where and when to execute containers and finally monitors the container execution throughout the container lifespan. Scheduling is a multi‐objective optimization, e. g. aiming at high utilization, high throughput, short makespan and fairness [5, 8, 9]. The analogy of a cluster scheduler on a single node operating system is the process scheduler.
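As a deliberately simplified illustration of such a placement decision (not the algorithm of any actual cluster scheduler), the following Java sketch picks, for a single container, the feasible node with the most remaining capacity; the node names and resource figures are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class GreedyScheduler {

    record Node(String name, double freeCpu, double freeMemGb) {}
    record Container(String name, double cpu, double memGb) {}

    // Pick the feasible node that keeps the most free capacity after placement.
    // Real cluster schedulers optimise several objectives at once (utilization,
    // throughput, makespan, fairness); this greedy rule is only an illustration.
    static Optional<Node> place(Container c, List<Node> nodes) {
        return nodes.stream()
                .filter(n -> n.freeCpu() >= c.cpu() && n.freeMemGb() >= c.memGb())
                .max(Comparator.comparingDouble(
                        (Node n) -> (n.freeCpu() - c.cpu()) + (n.freeMemGb() - c.memGb())));
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("node-a", 2.0, 4.0),
                new Node("node-b", 8.0, 16.0));
        Container web = new Container("web", 1.0, 2.0);

        place(web, nodes).ifPresentOrElse(
                n -> System.out.println(web.name() + " -> " + n.name()),
                () -> System.out.println("no node has enough free resources"));
    }
}
```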
The Cluster Orchestrator runs an application on a cluster. It uses the cluster scheduler to execute and monitor all application containers (Ops Components) and automates many standard operations procedures such as application deployment, rollback, scaling and configuration changes (DevOps Interface). Complex deployment scenarios like canary releases and blue‐green deployments are usually supported as well. The cluster orchestrator also detects and handles failures by performing rollbacks or by re‐scheduling to other resources. The analogy of a cluster orchestrator on a single node operating system is the init daemon. Examples of a COS are Kubernetes, DC/OS, and Docker Datacenter.
The Cloud Native Application Platform provides several infrastructure components for implementing cloud native applications on top. Its features include:
The Microservice Chassis (syn. Microservice Fabric, Microservice Container) is a container for microservices, or Ops Components in general, which handles the microservice lifecycle and exposes its interfaces. Examples are Spring Boot and JEE micro containers like WildFly Swarm or KumuluzEE.
The Service Client calls other Ops Components. It performs service lookups, client‐side load balancing and failure handling using the circuit breaker pattern (a minimal sketch of this pattern follows after this list). An example is Netflix Feign with Ribbon and Hystrix.
The Service Discovery is used by Ops Components to register their own services and look up others. It may also perform service health checks. Examples are Consul or Eureka, as well as a DNS provided by the COS like Mesos‐DNS.
The API Gateway (syn. Edge Server) exposes services to the outer world (Edge Interface). The API gateway uses the service discovery to look up the appropriate services. It performs actions like authentication and authorization, load shedding, load balancing, rate limiting and request validation. Examples are Traefik, Zuul, marathon‐lb, or Kubernetes Ingress.
The Diagnosability & Monitoring Service provides cluster‐wide collection, storage and analysis of metrics, logs and traces. Examples are Prometheus, Zipkin, and the ELK stack.
The Configuration & Coordination Service consistently stores cluster‐wide configuration state and provides coordination services such as locks, messaging and leader election. It uses consensus protocols such as Raft or Paxos. Examples are ZooKeeper, etcd, or Consul.
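To illustrate the circuit breaker pattern mentioned for the Service Client above, here is a minimal, single-threaded Java sketch. It is a didactic stand-in, not the API of Hystrix or any other library, and the threshold, open interval and fallback value are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class CircuitBreaker {

    private final int failureThreshold;
    private final Duration openInterval;
    private int consecutiveFailures = 0;
    private Instant openedAt = null;

    CircuitBreaker(int failureThreshold, Duration openInterval) {
        this.failureThreshold = failureThreshold;
        this.openInterval = openInterval;
    }

    // Run the remote call; while the circuit is open, fail fast with the fallback
    // instead of waiting on an unhealthy Ops Component.
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (openedAt != null && Instant.now().isBefore(openedAt.plus(openInterval))) {
            return fallback.get();                   // circuit open: fail fast
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0;                 // success closes the circuit again
            openedAt = null;
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = Instant.now();            // too many failures: trip the circuit
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3, Duration.ofSeconds(30));
        String answer = breaker.call(
                () -> { throw new RuntimeException("service unavailable"); },
                () -> "cached fallback response");
        System.out.println(answer);
    }
}
```

After the configured number of consecutive failures the breaker opens and callers get the fallback immediately, so they do not pile up on the failing component; a production implementation would additionally be thread-safe and support a half-open probing state.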
63.3 Summary
The cloud native stack allows everyone to build applications that hyperscale, are antifragile and enable continuous feature delivery. It abstracts away the complexity of a cluster by making it look like one single, huge machine. Applications are operated as one or many Ops Components. Ops Components transfer the idea of component‐based software into the realm of operations and are stand‐alone units of testing, releasing, deploying, scaling and transporting software.
Building applications like Google does is not only about technology – organizational and methodological changes are required as well. The benefits are clear: improved scalability in terms of traffic, data and features. The risks arise from barriers to change within an organization, less than mature technology, additional complexity and the lack of wide‐spread know‐how. But all of them can be mitigated by starting small: if you don’t know what you are doing, don’t do it on a large scale.
References
1. J. Humble and D. Farley, Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation, Addison-Wesley, 2010.
2. N. N. Taleb, Antifragile: Things that Gain from Disorder, Penguin Books, 2012.
3. L. Barroso, J. Clidaras and U. Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan and Claypool Publishers, 2009.
4. M. Fowler, “MonolithFirst,” martinfowler.com, June 2015. [Online]. Available: http://martinfowler.com/bliki/MonolithFirst.html. [Accessed 25 08 2016].
5. B. Hindman et al., “Mesos: a platform for fine-grained resource sharing in the data center,” 2011.
6. E. Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software, Addison-Wesley, 2003.
7. S. Gilbert and N. Lynch, “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services,” 2002.
8. M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek and J. Wilkes, “Omega: flexible, scalable schedulers for large compute clusters,” in SIGOPS European Conference on Computer Systems (EuroSys), Prague, 2013.
9. A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune and J. Wilkes, “Large-scale cluster management at Google with Borg,” in Proceedings of the European Conference on Computer Systems (EuroSys), Bordeaux, 2015.
Further Reading
10. S. Newman, Building Microservices, O’Reilly, 2015.
Footnotes
1 http://whatis.techtarget.com/definition/hyperscale-computing (retrieved 10/18/2016).
2 https://www.theguardian.com/technology/2007/jul/25/media.newmedia (retrieved 08/25/2016).
3 http://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide (retrieved 08/25/2016).
4 http://oneops.com.
5 Werner Vogels, CTO of Amazon.
6 http://business.time.com/2013/11/06/how-twitter-slayed-the-fail-whale.
7 http://www.gartner.com/it-glossary/bimodal (retrieved 10/18/2016).
8 http://scs-architecture.org.
9 https://12factor.net.
64. The Forecast Is Cloud – Aspects of Cloud Computing in the Broadcast Industry
Klaus Illgner-Fehns1 , Rainer Schäfer1 , Madeleine Keltsch1 , Peter Altendorf1 , Gordana Polanec-Kutija1 and Aylin Vogl1
(1)Institut für Rundfunktechnik, Munich, Germany
64.1 Introduction
While cloud‐based applications such as e‐mail, file sharing and streaming services are already established in everyday life, cloud solutions for professional broadcast and media production are only slowly gaining ground. Perhaps this is more a matter of perception due to the complexity and opacity of the term “cloud computing”, which – depending on who you are talking to – can have very different meanings. The same is true for the term “broadcast”, where the extent to which elements of the value chain are associated with broadcasting varies from market to market. It is important to note that “broadcast production” covers a very broad range of genres, from news to features, fiction, sports, and shows. Besides all technical and operational aspects, cloud computing has a substantial economic impact, as it changes business models and market players.
64.2 Today’s Common Understanding of “Cloud” Is IT‐Centric
64.2.1 Setting Cloud in a Structural Perspective
Today the cloud is omnipresent and the terms cloud or cloud computing are used for a multitude of different developments and solutions. It seems to be “in vogue” to offer some kind of cloud service and everybody wants to have a piece of the cake. Among experts, however, the definition from the National Institute of Standards and Technology (NIST) of the United States of America has become the established standard (see Fig. 64.1). The German Federal Office for Information Security (BSI) relies on the cloud computing definition by NIST as well, which states:
Cloud computing is a model for enabling ubiquitous, convenient, on‐demand network access to a shared pool of configurable computing resources (e. g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models [1].
Fig. 64.1 Cloud computing model according to NIST
According to NIST and BSI cloud computing is also defined by the following five characteristics [2]:
(1) On‐demand self‐service – Automatic provision of resources/services by the user
(2) Broad network access – Availability of resources/services via the Internet/network
(3) Resource pooling – Provider resources are (virtually) pooled to serve multiple consumers
(4) Rapid elasticity – Resources can be rapidly and elastically provisioned
(5) Measured services – Use of resources can be measured and monitored
Cloud computing, therefore, also includes a comprehensive control and management authority (orchestration layer) to achieve the stated automatic scalability and resource pooling. This, above all, is what distinguishes it from simple virtualization or even outsourcing solutions.