Kubernetes is the de facto leader in container orchestration on the market, and it is an amazingly configurable and powerful orchestration tool. As with many powerful tools, it can be somewhat confusing at first. This walk-through will cover the basics of creating multiple pods, configuring them with secret credentials and configuration files, and exposing the services to the world by creating an InfluxDB and Grafana deployment and a Kubernetes cron job to gather statistics about your Twitter account from the Twitter developer API, all deployed on Kubernetes or OKD (formerly OpenShift Origin).
Requirements
- A Twitter account to monitor
- A Twitter developer API account for gathering stats
- A Kubernetes or OKD cluster (or MiniKube or MiniShift)
- The kubectl or oc command-line interface (CLI) tools installed
What you will learn
This walk-through will introduce you to a variety of Kubernetes concepts. You will learn about Kubernetes cron jobs, ConfigMaps, Secrets, Deployments, Services, and Ingress.
If you choose to dive in further, the included files can serve as an introduction to Tweepy, an "easy-to-use Python module for accessing the Twitter API," InfluxDB configuration, and automated Grafana dashboard providers.
This app consists of a Python script that polls the Twitter developer API on a schedule for stats about your Twitter account and stores them in InfluxDB as time-series data. Grafana displays the data in human-friendly formats (counts and graphs) on customizable dashboards.
All of these components run in Kubernetes- or OKD-managed containers.
Get a Twitter developer API account
Follow the instructions to sign up for a Twitter developer account, which allows access to the Twitter API. Record your API_KEY, API_SECRET, ACCESS_TOKEN, and ACCESS_SECRET to use later.
Clone the TwitterGraph repo
The TwitterGraph GitHub repo contains all the files needed for this project, as well as a few to make life easier if you want to do it all over again.
Set up InfluxDB
InfluxDB is an open source data store designed specifically for time-series data. Since this project will poll Twitter on a schedule using a Kubernetes cron job, InfluxDB is well suited to holding the data.
The Docker-maintained InfluxDB image on DockerHub will work fine for this project. It works out-of-the-box with both Kubernetes and OKD.
Create a deployment
A Kubernetes deployment describes the desired state of a resource. For InfluxDB, this is a single container in a pod running an instance of the InfluxDB image.
A bare-bones InfluxDB deployment can be created with the kubectl create deployment command:
kubectl create deployment influxdb --image=docker.io/influxdb:1.6.4
The newly created deployment can be seen with the kubectl get deployment command:
kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
influxdb 1 1 1 1 7m40s
Specific details of the deployment can be viewed with the kubectl describe deployment command:
kubectl describe deployment influxdb
CreationTimestamp: Mon, 14 Jan 2019 11:31:12 -0500
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Host Port: <none>
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
NewReplicaSet: influxdb-85f7b44c44 (1/1 replicas created)
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 8m deployment-controller Scaled up replica set influxdb-85f7b44c44 to 1
Configure InfluxDB credentials using secrets
Currently, Kubernetes is running an InfluxDB container with the default configuration from the docker.io/influxdb:1.6.4 image, but that is not necessarily very helpful for a database server. The database needs to be configured to use a specific set of credentials and to store the database data between restarts.
Kubernetes secrets are a way to store sensitive information (such as passwords) and inject it into running containers as either environment variables or mounted volumes. This is perfect for storing database credentials and connection information, both to configure InfluxDB and to tell Grafana and the Python cron job how to connect to it.
You need four bits of information to accomplish both tasks:
- INFLUXDB_DATABASE: the name of the database to use
- INFLUXDB_HOST: the hostname where the database server is running
- INFLUXDB_USERNAME: the username to log in with
- INFLUXDB_PASSWORD: the password to log in with
Create a secret using the kubectl create secret command and some basic credentials:
kubectl create secret generic influxdb-creds
This command creates a "generic-type" secret (as opposed to "tls-" or "docker-registry-type" secrets) named influxdb-creds populated with some default credentials. Secrets use key/value pairs to store data, which is perfect for use as environment variables within a container.
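The same key/value pairs can also be written declaratively. The sketch below is an illustration, not the command used in this walk-through: it writes a Secret manifest to a hypothetical file named influxdb-creds.yaml. All four values are placeholder assumptions, chosen only so that their lengths agree with the byte counts reported by kubectl describe secret in this section; substitute your own.

```shell
# Write a Secret manifest holding the four INFLUXDB_* keys.
# Every value here is a placeholder (assumption), not a credential from the article.
cat > influxdb-creds.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: influxdb-creds
type: Opaque
stringData:                          # plain-text input; Kubernetes stores it base64-encoded
  INFLUXDB_DATABASE: twittergraph    # placeholder database name
  INFLUXDB_HOST: influxdb            # assumed to match the name of the InfluxDB service
  INFLUXDB_USERNAME: root            # placeholder
  INFLUXDB_PASSWORD: root            # placeholder; pick a real password
EOF
```

A file like this would be loaded with kubectl create -f influxdb-creds.yaml in place of the imperative command, which keeps the secret's shape reviewable (though the file itself then needs to be protected).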
As with the examples above, the secret can be seen with the kubectl get secret command:
kubectl get secret influxdb-creds
NAME TYPE DATA AGE
influxdb-creds Opaque 4 11s
The keys contained within the secret (but not the values) can be seen using the kubectl describe secret command. In this case, the INFLUXDB_* keys are listed in the influxdb-creds secret:
kubectl describe secret influxdb-creds
INFLUXDB_DATABASE: 12 bytes
INFLUXDB_HOST: 8 bytes
INFLUXDB_PASSWORD: 4 bytes
INFLUXDB_USERNAME: 4 bytes
Now that the secret has been created, it can be shared with the InfluxDB pod running the database as environment variables.
To share the secret with the InfluxDB pod, it needs to be referenced in the deployment created earlier. The existing deployment can be edited with the kubectl edit deployment command, which will open the deployment object in your system's default editor. When the file is saved, Kubernetes will apply the changes to the deployment.
To add environment variables for each of the secrets, the pod spec contained in the deployment needs to be modified. Specifically, the .spec.template.spec.containers array needs to be modified to include an envFrom section.
Using the command kubectl edit deployment influxdb, find that section in the deployment (this example is truncated):
- image: docker.io/influxdb:1.6.4
This section describes a very basic InfluxDB container. Secrets can be added to the container with an env array, with one entry for each key/value to be mapped in. Alternatively, envFrom can be used to map all the key/value pairs into the container, using the key names as the variable names.
For the values in the influxdb-creds secret, the container spec would look like this:
- name: influxdb
After editing the deployment, Kubernetes will destroy the running pod and create a new one with the mapped environment variables. Remember, the deployment describes the desired state, so Kubernetes replaces the old pod with a new one matching that state.
You can validate that the environment variables are included in your deployment with kubectl describe deployment influxdb:
Environment Variables from:
influxdb-creds Secret Optional: false
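As a hedged sketch of the envFrom change described in this section, the fragment below writes only the fields that need to be added, in the form of a strategic-merge patch. The file name influxdb-envfrom-patch.yaml is hypothetical, and the container name influxdb is assumed to be the one kubectl create deployment derived from the image name.

```shell
# Patch fragment: map every key in the influxdb-creds secret into the
# container's environment. Only the fields being added are listed.
cat > influxdb-envfrom-patch.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: influxdb            # must match the existing container's name
        envFrom:
        - secretRef:
            name: influxdb-creds  # the secret created earlier
EOF
```

The same lines can be typed in by hand during kubectl edit deployment influxdb, or applied non-interactively with kubectl patch deployment influxdb --patch "$(cat influxdb-envfrom-patch.yaml)".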
Configure persistent storage for InfluxDB
A database is not very useful if all of its data is destroyed each time the service is restarted. In the current InfluxDB deployment, all of the data is stored in the container and lost when Kubernetes destroys and recreates pods. A PersistentVolume is needed to store data permanently.
To get persistent storage in a Kubernetes cluster, a PersistentVolumeClaim (PVC) is created that describes the type and details of the volume needed, and Kubernetes will find a previously created volume that fits the request (or create one with a dynamic volume provisioner, if there is one).
Unfortunately, the kubectl CLI tool does not have the ability to create PVCs directly, but a PVC can be specified as a YAML file and created with kubectl create -f <filename>:
Create a file named pvc.yaml with a generic 2G declare:
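A minimal sketch of what pvc.yaml could contain, assuming a claim named influxdb, the ReadWriteOnce access mode, and no explicit storage class (so the cluster default is used). These choices are assumptions that agree with the kubectl get pvc output in this section; they are not necessarily identical to the file in the repo.

```shell
# Write a generic 2Gi PersistentVolumeClaim to pvc.yaml.
cat > pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb
spec:
  accessModes:
  - ReadWriteOnce        # shown as RWO by kubectl get pvc
  resources:
    requests:
      storage: 2Gi       # the size of the volume being requested
EOF
```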
Then, create the PVC:
kubectl create -f pvc.yaml
You can validate that the PVC was created and bound to a PersistentVolume with kubectl get pvc:
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
influxdb Bound pvc-27c7b0a7-1828-11e9-831a-0800277ca5a7 2Gi RWO standard 173m
From the output above, you can see the PVC influxdb was matched to a PV (or Volume) named pvc-27c7b0a7-1828-11e9-831a-0800277ca5a7 (your name will vary) and bound (STATUS: Bound).
If your PVC does not have a volume, or the status is something other than Bound, you may need to talk to your cluster administrator. (This process should work fine with MiniKube, MiniShift, or any cluster with dynamically provisioned volumes.)
Once a PersistentVolume has been assigned to the PVC, the volume can be mounted into the container to provide persistent storage. Once again, this involves editing the deployment, first to add a volume object and second to reference that volume within the container spec as a volumeMount.
Edit the deployment with kubectl edit deployment influxdb and add a .spec.template.spec.volumes section below the containers section (example truncated for brevity):
- name: var-lib-influxdb
In this example, a volume named var-lib-influxdb is added to the deployment, which references the PVC influxdb created earlier.
Now, add a volumeMount to the container spec. The volume mount references the volume added earlier (name: var-lib-influxdb) and mounts the volume to the InfluxDB data directory, /var/lib/influxdb:
- mountPath: /var/lib/influxdb
The InfluxDB deployment
After the above, you should have a deployment for InfluxDB that looks something like this:
- mountPath: /var/lib/influxdb
- name: var-lib-influxdb
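Pulling the pieces of this section together, a complete manifest for the deployment might look like the sketch below. It assumes the app=influxdb label that kubectl create deployment applies, and the file name influxdb-deployment.yaml is hypothetical. The live object shown by kubectl edit will also carry status fields and defaults that are omitted here.

```shell
# Write a declarative version of the InfluxDB deployment:
# one container, credentials from the secret, data on the PVC.
cat > influxdb-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: influxdb
  labels:
    app: influxdb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: influxdb
  template:
    metadata:
      labels:
        app: influxdb
    spec:
      containers:
      - name: influxdb
        image: docker.io/influxdb:1.6.4
        envFrom:
        - secretRef:
            name: influxdb-creds       # credentials as environment variables
        volumeMounts:
        - mountPath: /var/lib/influxdb # InfluxDB data directory
          name: var-lib-influxdb
      volumes:
      - name: var-lib-influxdb
        persistentVolumeClaim:
          claimName: influxdb          # the PVC created from pvc.yaml
EOF
```

Keeping a file like this under version control makes it possible to recreate the deployment on a fresh cluster with kubectl apply -f influxdb-deployment.yaml rather than repeating the interactive edits.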
Expose InfluxDB (to the cluster only) with a Service
By default, pods in this project are unable to talk to one another. A Kubernetes Service is required to "expose" the pod to the cluster or to the public. In the case of InfluxDB, the pod needs to be able to accept traffic on TCP port 8086 from the Grafana and cron job pods (which will be created later). To do this, expose (i.e., create a service for) the pod using a Cluster IP. Cluster IPs are available only to other pods in the cluster. Do this with the kubectl expose command:
kubectl expose deployment influxdb --port=8086 --target-port=8086 --protocol=TCP --type=ClusterIP
The newly created service can be verified with the kubectl describe service command:
kubectl describe service influxdb
Port: <unset> 8086/TCP
Session Affinity: None
Some of the details (specifically the IP addresses) will vary from the example. The "IP" is an IP address internal to your cluster that has been assigned to the service, through which other pods can communicate with InfluxDB. The "Endpoints" are the container's IP and port that is listening for connections. The service will route traffic sent to the internal cluster IP to the container itself.
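For comparison, the service that kubectl expose creates can be sketched as a manifest. This is an approximation under two assumptions: the selector relies on the app=influxdb label that kubectl create deployment applies, and the file name influxdb-service.yaml is hypothetical.

```shell
# Write a cluster-internal Service that forwards port 8086 to the InfluxDB pod.
cat > influxdb-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: influxdb
spec:
  type: ClusterIP        # reachable only from inside the cluster
  selector:
    app: influxdb        # assumed label on the InfluxDB pod
  ports:
  - protocol: TCP
    port: 8086           # port the service listens on
    targetPort: 8086     # port the container listens on
EOF
```

With cluster DNS enabled, other pods in the same namespace can then reach the database by the service name, influxdb, on port 8086.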
Now that InfluxDB is set up, move on to Grafana.
Set up Grafana
Grafana is an open source project for visualizing time-series data (think: pretty, pretty graphs).
As with InfluxDB, the official Grafana image on DockerHub works out-of-the-box for this project, both with Kubernetes and OKD.
Create a deployment
Just as before, create a deployment based on the official Grafana image:
kubectl create deployment grafana --image=docker.io/grafana/grafana:5.3.2
There should now be a grafana deployment alongside the influxdb deployment:
kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
grafana 1 1 1 1 7s
influxdb 1 1 1 1 5h12m
Set up Grafana credentials and config files with secrets and ConfigMaps
Building on what you have already learned, configuring Grafana should be both similar and easier. Grafana does not require persistent storage, since it reads its data out of the InfluxDB database. It does, however, need three configuration files (one to set up a dashboard provider that loads dashboards dynamically from files, the dashboard file itself, and one to connect the dashboard to InfluxDB as a data source) and a secret to store default login credentials.
The credentials secret works the same as the influxdb-creds secret already created. By default, the Grafana image looks for environment variables named GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD to set the admin username and password on startup. These can be whatever you like, but remember them so you can use them to log into Grafana once it is configured.
Create a secret named grafana-creds for the Grafana credentials with the kubectl create secret command:
kubectl create secret generic grafana-creds
Share this secret as environment variables using envFrom, this time in the Grafana deployment. Edit the deployment with kubectl edit deployment grafana and add the environment variables to the container spec:
- name: grafana
Validate that the environment variables have been added to the deployment with kubectl describe deployment grafana:
Environment Variables from:
grafana-creds Secret Optional: false
That is all that is required to start using Grafana. The rest of the configuration can be done in the web interface if desired, but with just a few config files, Grafana can be fully configured when it starts.
Kubernetes ConfigMaps are similar to secrets and can be consumed the same way by a pod, but they do not store the information obfuscated within Kubernetes. Config maps are useful for adding configuration files or variables into the containers in a pod.
The Grafana instance in this project has three config files that need to be written into the running container:
- influxdb-datasource.yml: tells Grafana how to talk to the InfluxDB database
- grafana-dashboard-provider.yml: tells Grafana where to look for JSON files describing dashboards
- twittergraph-dashboard.json: describes the dashboard for displaying the Twitter data collected
Kubernetes makes it easy to add these files: they can all be added to the same config map at once, and they can be mounted to different locations on the filesystem despite being in the same config map.
If you have not done so already, clone the TwitterGraph GitHub repo. These files are really specific to this project, so the easiest way to consume them is directly from the repo (although they could certainly be written manually).
From the directory with the contents of the repo, create a config map named grafana-config using the kubectl create configmap command:
kubectl create configmap grafana-config
The kubectl create configmap command creates a config map named grafana-config and stores the contents of each file as the value for the key specified. The --from-file argument follows the form --from-file=<keyname>=<pathToFile>, so in this case, the filename is being used as the key for future clarity.
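To make the --from-file=<keyname>=<pathToFile> form concrete, the sketch below assembles one argument per config file and saves the resulting command to a small script for review rather than running it. It assumes the three files sit in the current directory under the names listed in this section; the script name create-grafana-config.sh is hypothetical.

```shell
# Build one --from-file=<keyname>=<pathToFile> argument per config file.
# Assumption: the files are in the current directory under these names.
args=""
for f in influxdb-datasource.yml grafana-dashboard-provider.yml twittergraph-dashboard.json; do
  args="$args --from-file=$f=$f"   # key name and path are the same string here
done

# Save the assembled command so it can be inspected before it is run.
echo "kubectl create configmap grafana-config$args" > create-grafana-config.sh
cat create-grafana-config.sh
```

Using the filename as the key means each key in the config map later lines up with the subPath used to mount it.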
Like secrets, details of a config map can be seen with kubectl describe configmap. Unlike secrets, the contents of the config map are visible in the output. Use kubectl describe configmap grafana-config to see the three files stored as keys in the config map (results are truncated because they are looooooong):
kubectl describe configmap grafana-config
kubectl describe cm grafana-config
- name: 'default'
Each of the filenames should be stored as keys and their contents as the values (such as the grafana-dashboard-provider.yml above).
While config maps can be shared as environment variables (as the credential secrets were above), the contents of this config map need to be mounted into the container as files. To do this, a volume can be created from the config map in the grafana deployment. Similar to the persistent volume, use kubectl edit deployment grafana to add the volume under .spec.template.spec.volumes:
Then edit the container spec to mount each of the keys stored in the config map as files in their respective locations in the Grafana container. Under .spec.template.spec.containers, add a volumeMounts section for the volumes:
- name: grafana
- mountPath: /etc/grafana/provisioning/datasources/influxdb-datasource.yml
- mountPath: /etc/grafana/provisioning/dashboards/grafana-dashboard-provider.yml
- mountPath: /var/lib/grafana/dashboards/twittergraph-dashboard.json
The name section references the name of the config map volume, and adding the subPath items allows Kubernetes to mount each file without overwriting the rest of the contents of that directory. Without it, /etc/grafana/provisioning/datasources/influxdb-datasource.yml, for example, would be the only file in /etc/grafana/provisioning/datasources.
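The volume and its three subPath mounts can be sketched as a single strategic-merge patch. The volume name grafana-config and the file name grafana-config-patch.yaml are hypothetical choices for illustration; the mount paths and key names are the ones listed in this section.

```shell
# Patch fragment: one config map volume, mounted three times with subPath so
# each key lands as a single file without hiding its directory's other contents.
cat > grafana-config-patch.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: grafana
        volumeMounts:
        - name: grafana-config     # hypothetical volume name; must match the volume below
          mountPath: /etc/grafana/provisioning/datasources/influxdb-datasource.yml
          subPath: influxdb-datasource.yml
        - name: grafana-config
          mountPath: /etc/grafana/provisioning/dashboards/grafana-dashboard-provider.yml
          subPath: grafana-dashboard-provider.yml
        - name: grafana-config
          mountPath: /var/lib/grafana/dashboards/twittergraph-dashboard.json
          subPath: twittergraph-dashboard.json
      volumes:
      - name: grafana-config
        configMap:
          name: grafana-config     # the config map created earlier
EOF
```

These fields can be entered by hand during kubectl edit deployment grafana or applied with kubectl patch deployment grafana --patch "$(cat grafana-config-patch.yaml)".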
Each of the files can be verified by looking at it within the running container using the kubectl exec command. First, find the Grafana pod's current name. The pod will have a randomized name similar to grafana-586775fcc4-s7r2z and should be visible when running the command kubectl get pods:
kubectl get pods
NAME READY STATUS RESTARTS AGE
grafana-586775fcc4-s7r2z 1/1 Running 0 93s
influxdb-595487b7f9-zgtvx 1/1 Running 0 18h
Substituting the name of your Grafana pod, you can verify the contents of the influxdb-datasource.yml file, for example (truncated for brevity):
kubectl exec -it grafana-586775fcc4-s7r2z -- cat /etc/grafana/provisioning/datasources/influxdb-datasource.yml
# config file version
# list of datasources to insert/update depending
# what's available in the database
# <string, required> name of the datasource. Required
- name: influxdb
Expose the Grafana service
Now that it is configured, expose the Grafana service so it can be viewed in a browser. Because Grafana should be visible from outside the cluster, the LoadBalancer service type will be used rather than the internal-only ClusterIP type.
For production clusters or cloud environments that support LoadBalancer services, an external IP is dynamically provisioned when the service is created. For MiniKube or MiniShift, LoadBalancer services are available via the minikube service command, which opens your default browser to a URL and port where the service is available on your host VM.
The Grafana deployment is listening on port 3000 for HTTP traffic. Expose it as a LoadBalancer-type service with the kubectl expose command:
kubectl expose deployment grafana --type=LoadBalancer --port=80 --target-port=3000 --protocol=TCP
After the service is exposed, you can validate the configuration with kubectl get service grafana:
kubectl get service grafana
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana LoadBalancer 10.101.113.249 <pending> 80:31235/TCP 9m35s
As mentioned above, MiniKube and MiniShift deployments will not automatically assign an EXTERNAL-IP, so it will be listed as <pending>. Running minikube service grafana (or minikube service grafana --namespace <namespace> if you created your deployments in a namespace other than default) will open your default browser to the IP and port combo where Grafana is exposed on your host VM.
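The port mapping in the kubectl expose command for Grafana can be easier to read as a manifest. The sketch below is an approximation that assumes the app=grafana label applied by kubectl create deployment; the file name grafana-service.yaml is hypothetical.

```shell
# Write a LoadBalancer Service that accepts traffic on port 80 and
# forwards it to Grafana's HTTP port, 3000, inside the container.
cat > grafana-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  type: LoadBalancer     # requests an externally reachable address
  selector:
    app: grafana         # assumed label on the Grafana pod
  ports:
  - protocol: TCP
    port: 80             # port exposed by the service
    targetPort: 3000     # port Grafana listens on
EOF
```

The second number in the PORT(S) column of kubectl get service (31235 in the example output) is a node port that the cluster assigns on its own, which is why it does not appear in the manifest.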
At this point, Grafana is configured to talk to InfluxDB and has automatically provisioned a dashboard to display the Twitter stats. Now it is time to get some actual stats and put them into the database.
Create the cron job
A Kubernetes cron job, like its namesake cron, is a way to run a job on a particular schedule. In the case of Kubernetes, the job is a task running in a container: a Kubernetes job scheduled and tracked by Kubernetes to ensure its completion.
For this project, the cron job is a single container running a Python script to collect Twitter stats.
Create a secret for the Twitter API credentials
The cron job reads your Twitter API credentials from environment variables inside the container and uses them to connect to the API and pull the stats. Create a secret to store the Twitter API credentials and the name of the account to gather the stats from (substitute your own credentials and account name below):
kubectl create secret generic twitter-creds \
  --from-literal=TWITTER_ACCESS_SECRET=<your twitter access secret> \
  --from-literal=TWITTER_ACCESS_TOKEN=<your twitter access token> \
  --from-literal=TWITTER_API_KEY=<your twitter api key> \
  --from-literal=TWITTER_API_SECRET=<your twitter api secret> \
  --from-literal=TWITTER_USER=<your twitter username>
Create a cron job
Finally, it is time to create the cron job to gather statistics. Because this cron job needs settings such as envFrom and a concurrency policy, once again the object is best described in a YAML file and loaded with kubectl create -f <filename>.
Create a file named cronjob.yml describing the job to run:
schedule: '*/3 * * * *'
Looking over this file, the key pieces of a Kubernetes cron job are evident. The cron job spec contains a jobTemplate describing the Kubernetes job to run. In this case, the job consists of a single container with the Twitter and InfluxDB credentials secrets shared as environment variables using the same envFrom that was used in the deployments.
This job uses a custom image from Docker Hub, clcollins/twittergraph:1.0. The image is just Python 3.6 and contains the app.py Python script for TwitterGraph. (If you would rather build the image yourself, you can follow the instructions in BUILDING.md in the GitHub repo to build the image with Source-To-Image.)
Wrapping the job template spec are the cron job spec options. The most important part, outside of the job itself, is arguably the schedule, set here to run every three minutes forever. The other important bit is the concurrencyPolicy, which is set to Replace, so if the previous job is still running when it is time to start a new one, the pod running the old job is destroyed and replaced with a new pod.
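A hedged sketch of a cronjob.yml with the pieces just described is shown below. The resource name, schedule, concurrency policy, image, and secret names come from this walk-through; the container name, the restartPolicy, and the batch/v1 API version are assumptions, and the file in the repo may differ.

```shell
# Write a CronJob that runs the TwitterGraph container every three minutes.
cat > cronjob.yml <<'EOF'
apiVersion: batch/v1             # clusters older than Kubernetes 1.21 used batch/v1beta1
kind: CronJob
metadata:
  name: twittergraph
spec:
  schedule: '*/3 * * * *'        # every three minutes
  concurrencyPolicy: Replace     # a still-running job is replaced by the new one
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never   # assumption; job pods must use Never or OnFailure
          containers:
          - name: twittergraph   # assumed container name
            image: docker.io/clcollins/twittergraph:1.0
            envFrom:
            - secretRef:
                name: twitter-creds    # Twitter API credentials
            - secretRef:
                name: influxdb-creds   # database connection details
EOF
```

Listing both secrets under envFrom gives the script the Twitter credentials and the InfluxDB connection details through the same mechanism the deployments use.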
Use the kubectl create -f cronjob.yml command to create the cron job:
kubectl create -f cronjob.yml
The cron job can then be validated with kubectl describe cronjob twittergraph (instance truncated for brevity):
kubectl describe cronjob twittergraph
Schedule: */3 * * * *
Concurrency Policy: Replace
Starting Deadline Seconds: <unset>
Note: With a schedule set to */3 * * * *, Kubernetes will not immediately start the new job. It will wait up to three minutes for the next scheduled interval. If you would like to see immediate results, you can edit the cron job with kubectl edit cronjob twittergraph and (temporarily) change the schedule to * * * * * to run every minute. Just do not forget to change it back when you are done.
That should be it. If you have followed all the steps correctly, you will have an InfluxDB database, a cron job collecting stats from your Twitter account, and a Grafana deployment to view the data. For production clusters or cloud deployments of Kubernetes or OpenShift, visit the LoadBalancer IP to log into Grafana using the credentials you set earlier with GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD. After logging in, select the TwitterGraph dashboard from the Home dropdown at the top-left of the screen. You should see something like the image below, with current counts for your followers, folks you are following, status updates, likes, and lists. It is probably a bit boring at first, but if you leave it running, over time and with more data collection, the graphs will start to look more interesting and provide more useful information!
Where to go from here
The data collected by the TwitterGraph script is relatively simplistic. The stats that are collected are described in the data_points dictionary in the app.py script, but there is a ton of information available. Adding a new cron job that runs daily to collect the day's activity (number of posts, follows, etc.) would be a natural extension of the data. More interesting, probably, would be correlating the daily data collection, e.g., how many followers were gained or lost based on the number of posts that day.