dcos-metrics

Note: This project is a work-in-progress. We're currently aiming to ship a completed service, with integration hooks, as part of DC/OS 1.10. Community help is welcome and appreciated!

  1. Overview
  2. How this repo is organized
  3. Getting Started
  4. Documentation
  5. Community
  6. Contributing
  7. License
  8. Acknowledgements

Overview

I want to...

  • emit metrics from a Mesos container: Check for STATSD_UDP_HOST and STATSD_UDP_PORT in your application's environment, then send StatsD-formatted metrics to that endpoint when it's available. You may emit your own tags using the dogstatsd tag format, and they'll automatically be translated into Avro-formatted tags. (see also: example code)
  • emit metrics from a system process on the agents: Send Avro-formatted metrics to the Collector process at 127.0.0.1:8124. (see also: avro schema, example code)
  • collect and process emitted metrics: See Getting Started below. Take a look at the available Kafka Consumers and see if your format is already supported. If it isn't, writing a new Consumer is straightforward. (see also: avro schema)
  • develop parts of the metrics stack: You can run the whole stack on your local system, no Mesos Agent required! To get started, take a look at the local stack launcher scripts.
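
As a rough illustration of the first case above (emitting from a container), here is a minimal Python sketch. It assumes only what the text states: the endpoint is advertised through STATSD_UDP_HOST and STATSD_UDP_PORT, and tags follow the dogstatsd convention of a trailing `|#key:value,...` section. The function names and the metric name `example.queue_depth` are hypothetical and not part of this repository; the reference implementation is the statsd-emitter example.

```python
import os
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Build one StatsD line, with optional dogstatsd-style tags (|#k:v,k2:v2)."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Sort tags so the output is deterministic.
        tag_text = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        line += f"|#{tag_text}"
    return line


def statsd_endpoint(environ=os.environ):
    """Return (host, port) if an endpoint was advertised, else None."""
    host = environ.get("STATSD_UDP_HOST")
    port = environ.get("STATSD_UDP_PORT")
    if not host or not port:
        return None
    return host, int(port)


def emit(line, environ=os.environ):
    """Send one metric line over UDP; return False when no endpoint is available."""
    endpoint = statsd_endpoint(environ)
    if endpoint is None:
        return False
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), endpoint)
    return True


if __name__ == "__main__":
    # Hypothetical gauge with one custom tag.
    emit(format_metric("example.queue_depth", 12, "g", {"env": "dev"}))
```

Returning False when the variables are missing lets the same application run unchanged outside a Mesos container, where no endpoint is advertised.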

How this repo is organized

  • module: C++ code for the mesos-agent module. This module is installed by default on DC/OS EE 1.7+, with updated output support added as of EE 1.8+.
    • Input: Accepts data produced by Mesos containers on the agent. All Mesos containers are given a unique StatsD endpoint, advertised via STATSD_UDP_HOST/STATSD_UDP_PORT environment variables. The module then tags and forwards upstream any metrics sent to that endpoint. (EE 1.7+)
    • Output formats:
      • Avro metrics sent to a local Collector process on TCP port 8124 (EE 1.8+)
      • StatsD to metrics.marathon.mesos with tags added via key prefixes or Datadog tags (EE 1.7 only, disabled in EE 1.8)
  • collector: A Marathon process which runs on every agent node.
    • Inputs:
      • Listens on TCP port 8124 for Avro-formatted metrics from the mesos-agent module, as well as any other processes on the system.
      • Polls the local Mesos agent for additional information:
        • /containers is polled to retrieve per-container resource usage stats (this was briefly done in the Mesos module via the Oversubscription module interface). Similarly, /metrics/snapshot is polled for system-level information.
        • /state is polled to determine the local agent_id and to get a mapping of framework_id to framework_name. These are then used to populate agent_id on all outgoing metrics, and framework_name for metrics that have a framework_id (i.e. all metrics emitted by containers).
    • Output: Data is collated into topics and forwarded to a configured Kafka instance (default kafka).
  • consumer: Kafka Consumer implementations which fetch Avro-formatted metrics and do something with them (print to stdout, write to a database, etc). By default, the Consumers consume from all topics which match the regex pattern metrics-.*. This expression can be customized, or alternatively a single specific topic can be specified for consumption.
  • examples: Reference implementations of programs which integrate with the metrics stack:
    • collector-emitter: A reference for DC/OS system processes which emit metrics. Sends some Avro metrics data to a local Collector process.
    • local-stack: Helper scripts for running a full metrics stack on a dev machine. Feeds stats into itself and prints them at the end. Requires a running copy of ZooKeeper (required by Kafka).
    • statsd-emitter: A reference for mesos tasks which emit metrics. Sends some StatsD metrics to the STATSD_UDP_HOST/STATSD_UDP_PORT endpoint advertised by the mesos-agent module.
  • schema: Avro schemas shared by nearly everything that processes metrics (agent module, collector, collector clients, Kafka consumers). The exception is containerized processes, which only need to know how to emit StatsD data.
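
The consumer's topic selection described above can be sketched as follows. This is only an illustration in Python, not the consumers' actual code: it assumes the pattern must match the whole topic name, and the function name and topic names in the usage note are hypothetical.

```python
import re

# Default pattern quoted in the consumer description above.
DEFAULT_TOPIC_PATTERN = "metrics-.*"


def select_topics(available, pattern=DEFAULT_TOPIC_PATTERN, exact_topic=None):
    """Pick the topics to consume from.

    If exact_topic is given, only that single topic is selected.
    Otherwise every topic whose full name matches the regex is selected.
    """
    if exact_topic is not None:
        return [t for t in available if t == exact_topic]
    compiled = re.compile(pattern)
    # fullmatch: assumed semantics, the whole topic name must match.
    return [t for t in available if compiled.fullmatch(t)]
```

For example, given the hypothetical topics `metrics-agent`, `metrics-framework_a` and `other-metrics-x`, the default pattern selects the first two, while `exact_topic="metrics-agent"` selects only that one.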

Getting Started

First, get a DC/OS EE 1.8 cluster with at least 3 private nodes (the minimum for the default Kafka configuration), then do the following:

  1. Install Kafka: run dcos package install kafka, or install via the Universe UI. Note: the stock settings are plenty to start with, but for production use consider increasing the default number of partitions (num.partitions) and the replication factor (default.replication.factor).
  2. Run a Metrics Collector on every node: use the provided Marathon JSON files.
  3. Launch one or more Metrics Consumers: see the example Marathon JSON files for each consumer type, and edit the output settings as needed before launching.

Documentation

architecture diagram

Community

This project is one component of the larger DC/OS community.

Contributing

We love contributions! There's more than one way to give back, from code to documentation and examples. To ensure we have a chance to keep up with community contributions, please follow the guidelines in CONTRIBUTING.md.

License

DC/OS and this project are both open source software released under the Apache License, Version 2.0.

Acknowledgements
