Why do we use containers?
Why do we use virtual machines (cloud hosts)? Why use a physical machine? There is no single standard answer to this series of questions, because each of these technology stacks has scenarios where it fits best, and under best practices none of them can be replaced. Before virtual machines existed, all kinds of business applications ran directly on physical hosts. Computing and storage resources were hard to scale up or down: they were either insufficient or sitting idle and wasted. So virtual machines (or cloud hosts) came into wider and wider use, and the scenarios for physical machines were greatly compressed, shrinking mostly to special workloads such as database systems.
Before containers, we ran most business applications on virtual machines (or cloud hosts) and a few special workloads on physical hosts. But no virtual machine technology can avoid two main problems: first, the resource consumption of the hypervisor itself and the reduced disk I/O performance; second, a virtual machine is still a full, independent operating system, which is too heavy for many kinds of business applications and makes scaling and configuration management inefficient. This is where containers come in. Their advantage is that all business applications run directly on the physical host's operating system and can read and write disks directly, while compute, storage, and network resources are separated by namespaces, giving each application a logically independent "container operating system". In addition, container technology offers the following advantages: simplified deployment, multi-environment support, rapid start-up, service orchestration, and easy migration.
Container technology also has shortcomings: it still cannot achieve complete security isolation, and the complexity of the technology stack soars, especially once container cluster technology is adopted. Small-scale use for experiments or tests is fine, but think twice before taking it into a production environment.
Evolution of Container and Container Cluster Technology
Evolution Route of Container Technology
Note: The picture above is taken from the White Paper on Container Technology and Its Application [v1.0].
Evolution of Container Cluster Technology
The figure above describes the evolution of container technology. Over the past three years, development has centered mainly on container cluster technology, as shown below.
Operating Principle and Basic Components of Containers
Docker containers are mainly based on the following three key technologies:
- Namespaces
- Cgroups
- Images
The container engine (Container Engine, or runtime) is the core of a container system; it is also what many people actually mean when they say "container". Container engines can create and run containers, and container definitions are typically stored as text, such as a Dockerfile.
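As a concrete illustration, a container definition might look like the following minimal Dockerfile. This is only a sketch: the application, its files, and the port are hypothetical.

```dockerfile
# Small base image (Alpine-based Python)
FROM python:3.9-alpine
WORKDIR /app
# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the (hypothetical) application code
COPY . .
EXPOSE 8000
CMD ["python", "app.py"]
```

The engine reads this text definition, builds an image layer by layer, and can then run any number of containers from it.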
- Docker Engine: currently the most popular container engine and the industry's de facto standard.
- rkt: the container engine launched by the CoreOS team. With a simpler architecture, it has always been a direct competitor to Docker, and it is one of the container engines supported by the Kubernetes scheduling system.
- containerd: a new daemon produced by refactoring Docker's internal components to support the OCI specification. containerd's main responsibilities are image management (images, metadata, and so on) and container execution (calling the final runtime component to execute). Upward it provides a gRPC interface to the Docker daemon; downward it drives runC via containerd-shim, enabling the engine to be upgraded independently.
- containerd-shim: the shim starts a Docker container by calling containerd, and each container gets its own shim process. The shim takes three parameters — the container id, the bundle directory, and the runtime binary (runC by default) — and calls runC's API to create the container.
- runC: Docker's concrete implementation of the Open Container Format (OCF), which implements container start/stop, resource isolation, and related functions. A container can therefore be run with runC directly, without the Docker engine, and by changing its parameter configuration it can also run other kinds of containers. runC can be said to be the result of cooperation and compromise among the major CaaS vendors.
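To run a container with runC directly, one prepares an OCI bundle: a root filesystem plus a config.json (which `runc spec` generates with defaults). An abbreviated excerpt of such a config.json is shown below; the values follow the spec's defaults, and the namespace list is truncated for brevity.

```json
{
  "ociVersion": "1.0.0",
  "process": {
    "terminal": true,
    "user": { "uid": 0, "gid": 0 },
    "args": [ "sh" ],
    "cwd": "/"
  },
  "root": { "path": "rootfs", "readonly": true },
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "network" },
      { "type": "mount" }
    ]
  }
}
```

runC reads this file, sets up the listed namespaces and the root filesystem, and then executes the `args` process — no daemon involved.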
Note: runC is widely used in production, driven by the various CaaS vendors. Kubernetes currently supports only runC containers and does not support Docker features beyond its container abstraction layer. Similarly, Mesos supports only runC containers through its Unified Containerizer; Docker is supported for now, but the plan is to support only the Unified Containerizer in the future. CF likewise supports only runC through Garden, not Docker features beyond runC.
Why do we need a docker-containerd-shim process when starting or running a container? Its purposes are:
- It allows the container runtime (runC) to exit after starting the container; simply put, there is no need to keep a runC process running for each container.
- Even if containerd and dockerd both crash, the container's standard I/O and other file descriptors remain available.
- It reports the container's exit status back to containerd.
What is the difference between rkt and containerd? One major difference is that rkt, as a daemonless tool, can be used in production to integrate and execute special, critical containers. For example, CoreOS Container Linux uses rkt to execute the Kubernetes agent, kubelet, from a container image. Another example from the Kubernetes ecosystem is using rkt to mount volumes in a containerized way. It also means rkt can be integrated with Linux's init system, because rkt itself is not an init system. Kubernetes supports container deployments beyond just Docker: CoreOS's rkt is also a player, and although it remains at a clear disadvantage compared with Docker, competition is always better than none.
Container Orchestration and Management Systems
Containers are a lightweight technology, which means that with equal resources far more container instances can be created than with physical machines or virtual machines. But once you face a large-scale application distributed across multiple hosts with hundreds of containers, traditional or single-host container management solutions are out of their depth. On the other hand, because containers provide ever better native support for microservices, the size of each container in a cluster keeps shrinking while the number of containers keeps growing. In this situation, containers or microservices need to be managed and connected to the outside world in an orderly way, with scheduling, load balancing, allocation, and other tasks taken care of. Managing rapidly growing container instances simply and efficiently naturally becomes the main task of a container orchestration system.
Container cluster management tools can manage applications composed of multiple containers across a set of servers. Each application cluster appears as a single deployment or management entity in the orchestration tool. Container cluster management tools automate all aspects of an application cluster, including instance deployment, application updates, health checks, elastic scaling, automatic fault tolerance, and so on.

Layered structure diagram of a container orchestration and management system
Major players in container orchestration and management systems:
- Kubernetes: Google's open-source container management system, which originated from the Borg system and so has a long pedigree. Its rich functionality is used by many companies; its development route focuses on standardization and vendor neutrality, supporting different container runtimes and engines (such as rkt) at the bottom and gradually removing the dependence on Docker. The core of Kubernetes is automatically deploying, scaling, and managing containerized applications. The project currently has about 43k stars on GitHub.
- Docker Swarm: since Docker 1.12, Swarm has been integrated into the Docker engine. Users can easily and quickly build a Docker container cluster that is almost fully compatible with the Docker API. The project currently has about 5.3k stars on GitHub.
- Mesosphere Marathon: a scheduling framework on Apache Mesos, which aims to become the operating system of the data center and take over its management completely. Mesos's idea is the Data Center Operating System (DC/OS), intended to solve the network, computing, and storage problems of the IaaS layer, so the core of Mesos is the physical resource layer. Marathon is a container orchestration platform designed for Mesosphere DC/OS and Apache Mesos. The project currently has about 3.7k stars on GitHub.
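As a sketch of the declarative model these orchestrators (Kubernetes in particular) use for "deploy, scale, and manage", a minimal Deployment manifest might look like the following. All names and the image tag here are made up for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app            # hypothetical application name
spec:
  replicas: 3               # desired number of Pod instances; the orchestrator keeps this count
  selector:
    matchLabels:
      app: demo-app
  template:                 # Pod template stamped out for each replica
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: demo-app:1.0   # hypothetical image
        ports:
        - containerPort: 8080
```

The operator declares the desired state (three replicas of this Pod); the cluster continuously reconciles reality toward it, which is what makes updates, scaling, and fault tolerance automatic.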
Note: Many companies at home and abroad are innovating and building businesses on top of these three basic technology platforms, providing value-added services for enterprises. Rancher is a good example: its product can work with Kubernetes, Mesos, and Swarm clusters at the same time. There are also many commercial solutions, such as OpenShift.
Performance in the Chinese market: in June 2017, K8SMeetup, the Chinese Kubernetes community, organized the first survey of container developers and enterprise users in China. Nearly 100 respondents and businesses gave us first-hand data on how Kubernetes is landing in China:
- Kubernetes holds about 70% of the market share of container orchestration tools, with Mesos at about 11% and Swarm under 7%;
- Among the Chinese enterprise users interviewed, the application types running on the Kubernetes platform are very broad, covering almost every kind of application except the Hadoop big-data stack;
- As for the underlying environments in which these enterprises run Kubernetes, 29% of customers run container clusters directly on bare metal, while about 60% run container cluster services on pan-cloud platforms, including OpenStack, VMware, Aliyun, and Tencent Cloud.
About the CNCF
The major container technology vendors (including Docker, CoreOS, Google, Mesosphere, Red Hat, and others) established the Cloud Native Computing Foundation (CNCF). CNCF's definition of cloud native is: cloud native technologies help companies and organizations build and run flexible, scalable applications in new dynamic environments such as public, private, and hybrid clouds. Representative cloud native technologies include containers, service meshes, microservices, immutable infrastructure, and declarative APIs. These technologies build loosely coupled systems that are fault-tolerant, easy to manage, and easy to observe. Combined with reliable automation tooling, cloud native technology lets developers make frequent, predictable, high-impact changes to a system with ease. CNCF is committed to fostering and maintaining a vendor-neutral open-source ecosystem to promote cloud native technology. We make these innovations available to everyone by making the most cutting-edge patterns universal.
The following is a list of CNCF cloud native projects as of November 2018:
- Containers are the core technology of cloud native and are divided into two layers: Runtime and Orchestration. The Runtime layer is responsible for container compute, storage, and networking; the Orchestration layer is responsible for container cluster scheduling, service discovery, and resource management.
Note: The above figure only intercepts the core components of the original figure. The complete chart is detailed at https://landscape.cncf.io/images/landscape.png.
Schematic diagram of Kubernetes core components
- etcd: the distributed database in which Kubernetes stores cluster state, using the Raft protocol as its consensus algorithm (for the principles of Raft, see the animated demonstration at http://thesecretlivesofdata.com/raft/).
- API Server: provides authentication and authorization, runs a set of admission controllers, and manages API versions. It serves the outside world through a REST API, allowing the various components to create, read, write, update, and watch resources (Pods, Deployments, Services, and so on).
- Scheduler: selects an appropriate node on which to create each Pod, based on cluster resources and state.
- Controller Manager: runs the controllers that implement behaviors such as those of a ReplicaSet, driving the cluster's actual state toward the desired state.
- Kubelet: responsible for monitoring the set of Pods bound to its node and reporting the running status of these Pods in real time.
Sequence diagram of the entire Pod creation process
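The object driven through this flow can be expressed as a minimal Pod manifest, for example the following (the image tag is illustrative). Submitting it to the API Server triggers the sequence above: etcd stores it, the Scheduler binds it to a node, and that node's Kubelet starts the container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-demo
spec:
  containers:
  - name: nginx
    image: nginx:1.15    # version illustrative
    ports:
    - containerPort: 80
```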
Large-scale use of containers also places higher demands on the network, and network inflexibility is a weak point for many enterprises. Many companies and projects are currently trying to solve these problems, hoping to propose a networking solution for the container era. Docker adopts a plug-in network model and provides several network modes by default: bridge, host, none, overlay, macvlan, and third-party network plugins. When running a container, you can choose the mode with the network parameter.
– bridge: Docker's default network driver, which allocates a network namespace and IP settings for each container and connects the container to a virtual bridge. If no network driver is specified, this driver is used.
– host: this driver connects the container directly to the host network.
– none: this driver constructs no network environment. With the none driver only the loopback device is available, so the container can only use the 127.0.0.1 local network.
– overlay：This network driver enables multiple Docker daemons to connect together and communicate with each other using swarm services. The overlay network can also be used to communicate between swarm services and containers and between containers.
– macvlan: this driver assigns the container a MAC address so that it appears as a physical device on the network, letting the Docker daemon route to it by MAC address. For legacy applications that want to connect directly to the physical network, this driver may be the best choice.
– Network plugins: third-party network plugins can be installed and used; they can be obtained from Docker Store or from third-party vendors.
By default, Docker uses bridge network mode.
Container Network Model (CNM)
CNM was introduced by Docker in 2015. CNM has IP Address Management (IPAM) and network plugin capabilities. IPAM plugins can create IP address pools and allocate, delete, and release container IPs. The network plugin API is used to create/delete networks and to add/remove containers from networks.
Container Network Interface (CNI)
CNI was born in April 2015, launched by CoreOS. CNI is a plugin-based container networking system that makes it easier for management platforms like Kubernetes to support IPAM, SDN, or other network solutions. The basic idea of a CNI implementation is: when creating a container, the container runtime first creates the network namespace (in actual operation, the first container created is the pause container), then calls the CNI plugin to configure the network for that netns, and finally starts the container process.
A CNI plugin is responsible for configuring the network for a container and includes two basic interfaces:
– configuring the network: AddNetwork(net *NetworkConfig, rt *RuntimeConf) (types.Result, error)
– cleaning up the network: DelNetwork(net *NetworkConfig, rt *RuntimeConf) error
Each CNI plugin needs only two basic operations: ADD, to attach a container to the network, and DEL, to detach it (plus an optional VERSION operation to report supported versions). So the CNI side itself is really very simple; the complex logic is left to the specific network plugin to implement.
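The division of labor between the two operations can be sketched with a toy, in-process IPAM allocator in Python. This illustrates only the ADD/DEL contract and the shape of a CNI-style result; it is not the real CNI wire protocol, which exchanges JSON over stdin/stdout, and for simplicity released addresses are not returned to the pool.

```python
import ipaddress

class ToyIpam:
    """Toy IPAM allocator mimicking a CNI plugin's ADD/DEL contract."""

    def __init__(self, subnet):
        # Generator over usable host addresses in the subnet (skips network/broadcast)
        self.hosts = ipaddress.ip_network(subnet).hosts()
        self.allocated = {}  # container id -> assigned IP

    def add(self, container_id):
        """ADD: allocate an IP for a new container, return a CNI-style result dict."""
        ip = next(self.hosts)
        self.allocated[container_id] = ip
        return {"cniVersion": "0.3.1", "ips": [{"address": str(ip)}]}

    def delete(self, container_id):
        """DEL: release the container's IP (idempotent, as CNI requires)."""
        self.allocated.pop(container_id, None)

ipam = ToyIpam("10.244.1.0/24")
print(ipam.add("ctr-1"))  # hands out the first usable address in the subnet
```

A real plugin (Flannel, Calico, and so on) does the same bookkeeping, but also creates veth pairs, programs routes or tunnels, and persists allocations across restarts.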
Kubernetes CNI plugins
- Flannel: CoreOS's open-source network scheme, designed for Kubernetes. Its job is to give the Docker containers created on different node hosts a virtual IP address that is unique across the whole cluster. Flannel's underlying communication protocol has many options, such as UDP, VxLAN, and AWS VPC, and network communication efficiency varies considerably between them; the default is the UDP protocol. Flannel is simple to deploy and manage, but so far it does not support the Kubernetes Network Policy.
- Calico: a pure layer-3 network solution that uses the BGP protocol for routing and can be integrated into OpenStack and Docker. A Calico node network can directly use the data center's network structure (whether L2 or L3); no extra NAT, tunnels, or overlay network are required, so network communication performance is good. Calico also provides rich and flexible network policy based on iptables, using ACLs on each node to deliver multi-tenant isolation, security groups, workload reachability restrictions, and other functions. If the BGP protocol can be enabled in your production network, the Calico BGP scheme is worth considering. In reality, however, the network does not always support BGP routing, so Calico also designed an IPIP mode that transmits data in overlay fashion.
- Weave Net: the network scheme from Weaveworks, implemented as an overlay network using VxLAN technology. It supports network isolation and security and is relatively simple to install and use.
- Contiv: open-sourced by Cisco, compatible with both the CNI and CNM models. It supports VXLAN and VLAN solutions, at the cost of complex configuration. It supports tenants and tenant isolation as well as multiple network modes (L2, L3, overlay, and Cisco SDN integration). Contiv makes it easy for users to reach a container instance's IP directly.
- Canal: a project based on Flannel and Calico that provides a network firewall between Kubernetes Pods.
- Cilium: a network solution using native Linux kernel technology (BPF) that supports L3/L4 as well as L7 access policies.
- Romana: an open-source network scheme from Panic Networks, based on L3 network connectivity, so there is no performance loss from an overlay network; tenant separation is achieved purely through IP segment planning.
In theory, the network speed of these CNI tools falls into three tiers:
- The fastest: Romana, Flannel in host-gateway mode, and Calico in BGP mode.
- The next tier: Calico in IPIP mode, Swarm's overlay network, Flannel in VxLAN mode, and Weave in fastpath mode.
- The slowest: Flannel in UDP mode and Weave in sleeve mode.
- UDP packets use Flannel's custom packet header protocol, and data is packed and unpacked in Linux user space. So when data enters the host it must undergo two transitions between kernel mode and user mode; network communication efficiency is low and there are reliability concerns.
- VxLAN packets use a standard protocol built into the Linux kernel, so although the packet structure is more complex than in UDP mode, all packing and unpacking is completed in the kernel and actual transmission is much faster than in UDP mode. The drawback is that in large-scale deployments the complexity of a VxLAN solution rises and fault localization becomes difficult.
- Flannel's host-gateway mode is comparable to Calico, and in theory even faster. In host-gateway mode, Flannel uses an agent process on each node to push routing information for the container network into the host's routing table, so every host holds the routes for the entire container network. Host-gateway introduces no extra packing and unpacking the way an overlay does; it is purely the ordinary network routing mechanism, with efficiency almost the same as direct communication between virtual machines. However, host-gateway mode can only be used on networks where all hosts are directly reachable at layer 2, and such networks are usually kept relatively small because of broadcast storms. A routed network also puts heavy demands on existing network equipment: a router's routing table has limited space (typically on the order of 230,000 entries), while the common container scenario is running microservices, where the number of instances is very large.
Schematic diagram of the Flannel network communication principle
Functional comparison of CNI network plug-ins:
Because containers are short-lived, a container's state (stored data) must be independent of the container's life cycle, which makes container storage very important.
- Ceph: a distributed storage system supporting block, file, and object storage. It has a long history, is stable, and has been well verified. It was previously widely used in the OpenStack community and is now a good option in the container community.
- GlusterFS: a Red Hat product, easy to deploy and scalable.
- Commercial storage: DELL EMC, NetApp, and others.
- CSI (Container Storage Interface): a project that defines the interfaces between cloud application scheduling platforms and various storage services. Its core goal is that a storage provider writes a single driver and can then integrate with any container platform.
- Rook: a container storage project deeply integrated with the Kubernetes platform, using Ceph as the backing storage technology — chosen because Ceph and Kubernetes are both popular. It provides automated deployment, management, expansion, upgrade, migration, disaster recovery, and monitoring.
Storage types supported by Kubernetes
- fc (fibre channel)
- gitRepo (deprecated)
Kubernetes connects to different storage systems through in-tree plugins so that users can consume these storage services from containers; it is also compatible with user-customized plugins via FlexVolume and CSI.
Generally speaking, a Pod in Kubernetes accesses storage resources in three ways: direct access, static provisioning, and dynamic provisioning (creating PVs dynamically via a StorageClass).
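A sketch of the dynamic provisioning path: a StorageClass names a provisioner, and a PVC referencing that class triggers on-demand PV creation. The class name, provisioner, and sizes below are illustrative; the provisioner in particular depends entirely on your environment.

```yaml
# StorageClass that provisions volumes on demand
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs   # example in-tree provisioner; environment-specific
---
# PVC that triggers dynamic provisioning by referencing the StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
```

A Pod then mounts the claim by name; the PV behind it is created, bound, and (per the class's reclaim policy) cleaned up without the user ever handling PV objects directly.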
The combination of containers and microservices has created another wave of interest and made service discovery prominent again. Microservices are easy to scale out, but tools are needed so that services can discover one another. A DNS server watches the Kubernetes API for newly created Services and creates a set of DNS records for each one. If DNS is enabled cluster-wide, all Pods can automatically resolve service names. Technically this is implemented by watching Service resource changes through the Kubernetes API, generating DNS records from the Service information, and writing them to etcd; the DNS service then answers Pods' DNS queries by reading those records from etcd.
- kube-dns: kube-dns is Kubernetes's built-in plugin, currently maintained as an independent open-source project. The Kubernetes DNS pod includes three containers: kube-dns, sidecar, and dnsmasq.
- CoreDNS: CoreDNS is a flexible, extensible authoritative DNS server. As a project hosted by the CNCF, it has been an official option for cluster DNS since Kubernetes 1.11 and is the default when users install with kubeadm. For better reliability, flexibility, and security, you can replace the kube-dns plugin with CoreDNS.
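As a sketch of what DNS-based discovery looks like in practice: a Service such as the following (the name, namespace, and labels are hypothetical) becomes resolvable inside the cluster as backend.prod.svc.cluster.local, and Pods in the same namespace can reach it simply as backend.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend        # hypothetical service name
  namespace: prod      # hypothetical namespace
spec:
  selector:
    app: backend       # Pods carrying this label back the service
  ports:
  - port: 80
```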
State data storage
At present there are three main tools, and most container management systems support all three at the same time.
- etcd: a distributed key-value store open-sourced by CoreOS that serves requests over HTTP/HTTPS. etcd is just a key-value store and does not support service discovery by default, so third-party tools are needed for that. Kubernetes uses etcd as its storage by default.
- ZooKeeper: a subproject of Hadoop, originally the data store for Hadoop cluster management, which has also been applied in the container field; its development language is Java.
- Consul: a distributed service discovery and configuration management tool developed by HashiCorp.
The main function of these tools is to ensure that the dynamic information of the cluster can be stored uniformly and consistently, so that each node and container can get the current information of the cluster correctly.
Kubernetes provides two types of health checks, each supporting three detection methods: HTTP, command, and TCP.
- The readiness probe lets Kubernetes know when your application is ready to serve traffic. Kubernetes only lets the Service send traffic to a Pod after its readiness probe passes; if the probe starts to fail, Kubernetes stops sending traffic to the container until it passes again.
- The liveness probe lets Kubernetes know whether your application is alive or dead. If it is alive, Kubernetes leaves it alone; if it is dead, Kubernetes deletes the Pod and starts a replacement.
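The two probes are declared per container in the Pod spec. The excerpt below is a sketch: the container name, image, health path, ports, and timings are all illustrative.

```yaml
containers:
- name: web                 # hypothetical container
  image: web:1.0            # hypothetical image
  readinessProbe:           # HTTP check: is the app ready to receive traffic?
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
  livenessProbe:            # TCP check: is the app still alive? restart on failure
    tcpSocket:
      port: 8080
    periodSeconds: 15
```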
We are used to monitoring at two levels: the applications and the hosts that run them. Now, with containers in the middle layer and Kubernetes itself needing to be monitored, there are four different layers to monitor and gather metrics from.
1) cAdvisor + InfluxDB + Grafana: a simple multi-host monitoring system. cAdvisor collects metrics and writes the data into InfluxDB; InfluxDB is a time-series database that stores the data in a specified directory; Grafana provides a web console for custom metric queries, pulling data from InfluxDB and displaying it.
2) Heapster + InfluxDB + Grafana: Heapster is a collector that aggregates the cAdvisor data from every node and imports it into InfluxDB, supporting detailed resource usage at the cluster, node, and Pod levels. Heapster obtains metrics and event data from the Kubernetes cluster and writes them to InfluxDB; it collects more data than cAdvisor alone while storing less in InfluxDB. InfluxDB and Grafana play the same roles as above.
3) Prometheus + Grafana: Prometheus is a monitoring tool that integrates a database, graphing, statistics, and alerting. It offers a multidimensional data model (time series identified by a metric name and a set of key/value labels) and the PromQL query language, with high write and query performance. It occupies a large amount of memory, does not depend on distributed storage, and works on a single master node, so it does not itself provide high availability. It supports both pull and push methods of time-series data collection.
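For illustration, a minimal Prometheus configuration for the pull model might look like the following. The job name and target are placeholders; real Kubernetes setups usually rely on kubernetes_sd_configs for automatic target discovery rather than static targets.

```yaml
# Minimal prometheus.yml sketch (values illustrative)
global:
  scrape_interval: 15s          # pull metrics from every target each 15 seconds
scrape_configs:
  - job_name: "kubernetes-cadvisor"
    static_configs:
      - targets: ["localhost:8080"]   # placeholder; discovery is the norm in clusters
```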
Given Prometheus's shortcomings in scalability and high availability, for very large-scale deployments you can consider an open-source project such as Thanos, which provides long-term data storage and high availability for Prometheus: https://github.com/improbable-eng/thanos
Four Monitoring Levels of Container Cluster
An image registry is where images are stored. It makes it convenient to share container images within a team, a company, or the whole world, and it is also part of the basic infrastructure for running containers.
- Docker Registry: Docker's open-source image server, also the most popular self-hosted registry solution at present.
- Harbor: an enterprise-grade image registry that provides access control and a graphical interface.
Almost every mainstream technology has its own base image, for example:
– https://hub.docker.com/_/java/ (still exists but has not been updated for a long time)
– https://hub.docker.com/_/python/
– https://hub.docker.com/_/nginx/
– https://hub.docker.com/_/alpine/ — a commonly used base image, Alpine Linux (less than 5 MB).
- China Open Source Cloud Alliance Container Working Group — Container Technology and Its Application
- Kubernetes Networking in Plain Language
- Monitoring Schemes for Hosts and Containers
- A Summary of the Cloud Native Container Ecosystem
- An Introduction to Storage Systems and Their Mechanisms
- An Introduction to How the Internal Components Work
- Docker, containerd, runC…: everything you should know
- From Docker to runC