Skip to content

Latest commit

 

History

History
238 lines (178 loc) · 6.32 KB

README.md

File metadata and controls

238 lines (178 loc) · 6.32 KB

Overview

Port Twitter's Ostrich library from Scala to .NET

License

Apache 2.0

Goals

  • Collect runtime performance statistics on an individual machine
  • View the data in real-time
  • Simple and straightforward
  • Low performance overhead
  • Readable via HTTP

Non-goals

  • Persistence
  • Cluster wide graphing (although Cacti/Munin/Ganglia/Carbon can consume the output - there is a sample CarbonWriter included)

Dependencies

  • log4net
  • JSON.NET
  • nunit

For ease of deployment, two Apache 2.0 licensed libraries have files embedded in the assembly under a different namespace - MiscUtil and Kayak HTTP Server.

Running

To enable the diagnostics library within an application, you need to add the following lines at startup:

var diagnosticsService = new HttpDiagnosticsService();
service.Start();

This will start a socket on the specified port that listens for HTTP requests. The performance library contains a small embedded web server to listen for requests. If something goes wrong in the startup of the service, it will not thrown an exception that will take your application down. Startup exceptions are logged, but not thrown. You should also dispose the diagnostics service at shutdown using Dispose().

If you are wanting to run the service in a web application, then integration can we setup through web.config. You can add the following sections to web.config:

Integrated Pipeline:

<appSettings>
     ... <!-- Some more information at the bottom of this page about which port number to use -->
     <add key="OstrichNet.Sercice.HttpDiagnosticsService.port" value="7006" />
 </appSettings>
 
 <system.webServer> 
    ... 
   <modules>
     ...
     <add name="DiagnosticsModule preCondition="managedHandler" type="OstrichNet.Service.DiagnosticsHttpModule, OstrichNet" />
   </modules>
 </system.webserver>

Classic Pipeline:

<appSettings>
    ... <!-- Some more information at the bottom of this page about which port number to use -->
    <add key="OstrichNet.Service.HttpDiagnosticsService.port" value="7006" />
</appSettings>

<system.web>
    ...
    <httpModules>
        ...
        <add name="DiagnosticsModule" type="OstrichNet.Service.DiagnosticsHttpModule, OstrichNet" />
    </httpModules>
</system.web>

Counters, Metrics, and Gauges

Reference the Ostrich docs, but here are some code examples in C#:

Counters

Stats.Incr("counter_name");

Metrics

Stopwatch stopwatch = new Stopwatch();
stopwatch.start();
...code to time...
Stats.Time("db_call", stopwatch);
Stats.Time("db_call"), () => { ... });  
var dbRows = Stats.Time("db_call"), () => { ... });  

Gauges

Stats.Gauge(current_temperature, () => tempMonitor.CalculateTemp());

Viewing

Server Info

curl http://127.0.0.1:9999/server_info
{"machine":"SERVER_NAME","process":"ProcessName","app_name":"ProcessName, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null","start_time":"2011-08-19T02:32:59Z","uptime":"92878085.125"}

We can make this easier on the eyes by adding the pretty parameter:

curl http://127.0.0.1:9999/server_info?pretty=true
{
  "machine": "SERVER_NAME",
  "process": "ProcessName",
  "app_name": "ProcessName, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null",
  "start_time": "2011-03-24T12:31:01Z",
  "uptime": "319452.0508"
}

Or by adding .txt to the end of the URL:

curl http://127.0.0.1:9999/server_info.txt
server_info:
  machine: SERVER_NAME
  process: ProcessName
  app_name: ProcessName, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
  start_time: 2011-03-24T12:31:01Z
  uptime: 409848.1

Stats

Stats allow you view the output of counters, metrics, and gauges.

curl http://127.0.0.1:9999/stats/
{
  "metrics": {
    "process_data": {
      "average": 0.852127284825243,
      "count": 25549,
      "max": 162.0,
      "min": 0.0,
      "p0": 0,
      "p25": 0,
      "p50": 0,
      "p75": 1,
      "p9": 2,
      "p99": 6,
      "p999": 30,
      "p9999": 30,
      "standard_deviation": 0.0053312127955066554
    },
    "process_data2": {
      "average": 0.84910756223578809,
      "count": 25548,
      "max": 162.0,
      "min": 0.0,
      "p0": 0,
      "p25": 0,
      "p50": 0,
      "p75": 1,
      "p9": 2,
      "p99": 6,
      "p999": 30,
      "p9999": 30,
      "standard_deviation": 0.0053124243024031473
    }
  },
  "counters": {
    "process_data": 25549
  },
  "gauges": {
    "cpu_user_time": 2.2254095077514648,
    "clr_time_in_gc": 0.16423256695270538,
    "clr_heap_bytes": 291951040.0
  }
}

Counters and gauges should be self-explanatory. For metrics, the stats contain the average, min, max, count, standard deviation, and percentiles for an individual metric.

Reports

Reports are a simple HTML view of the raw stats and metrics.

curl http://127.0.0.1:9999/report/

Graphs

http://127.0.0.1:9999/graph/

Graphs allow you to view the last hour of counters, measures, and gauges. Visiting the graph page will return an a list of available items that are being graphed.

FAQ

What's the performance overhead?

Collector output writes asynchronously in a background thread every minute. dotTrace showed that adding metrics to resource intensive application added about .2% overhead which was much faster than the performance counters they replaced.

What shouldn't you do?

Each counter/metric/gauge uses O(1) memory no matter how long the code is running or how many times they are mutated. The default collector process is an O(N) operation, so exploding the total number of metrics/counters by doing something like appending customer key to every counter name would be a bad idea. Sampling and tracing would be a more appropriate use case for trying to monitor performance at that level. See Google's Dapper (http://research.google.com/pubs/pub36356.html).

Is this safe to run inside of IIS?

The service embeds inside of IIS on a daemon thread. It has succesfully been used with IIS 6.0 and IIS 7.0 without any issues.

If you are hosting multiple worker processes on the same machine, you'll find out that the library will have trouble binding to the same port twice from different processes. It will attempt to bind to one port number higher than the configured value in that instance.