9 - Visualizing Aggregates#

This tutorial brings together concepts from previous posts on data visualization, StatPoints, and windows queries. The visualizations below make it possible to view arbitrarily long time series at a glance while highlighting specific regions of the data that may be of interest.

We also provide two helper functions that may be useful in other user-developed code. The first (points_to_dataframe) takes a list of StatPoint objects returned from a btrdb windows query and converts it into a Pandas dataframe whose columns are StatPoint attributes (time, min, max, mean, standard deviation, and count). The second (plot_aggregates) uses this dataframe to plot the range and distribution of values over time.

[1]:
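import time
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

import btrdb
from btrdb.utils import timez

# With no arguments, connect() reads its settings from the environment
# (BTRDB_ENDPOINTS and BTRDB_API_KEY)
db = btrdb.connect()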

Select data#

[2]:
streams = db.streams_in_collection('sunshine/PMU1', tags={'unit': 'volts'})

pd.DataFrame([[s.name, s.unit, s.collection] for s in streams],
            columns=['name','unit','collection'])
[2]:
name unit collection
0 L3MAG volts sunshine/PMU1
1 L1MAG volts sunshine/PMU1
2 L2MAG volts sunshine/PMU1

Determine time interval#

Below, we use stream.earliest() and stream.latest() to determine the time interval spanned by the data.

[3]:
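# `stream` (L1MAG) is the stream queried in the rest of this tutorial
stream = db.stream_from_uuid(streams[1].uuid)

def get_time(stream, func):
    # func is 'earliest' or 'latest'; both return a (point, version) tuple
    return timez.ns_to_datetime(getattr(stream, func)()[0].time)

# note: the time span printed here is read from streams[0], not from `stream`
print('start:', get_time(streams[0], 'earliest'))
print('end:', get_time(streams[0], 'latest'))
print(str(get_time(streams[0], 'latest') - get_time(streams[0], 'earliest')))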
start: 2015-10-01 16:08:24.008333+00:00
end: 2017-04-15 01:41:35.999999+00:00
561 days, 9:33:11.991666

See it in the plotter: https://plot.ni4ai.org/permalink/VKne4LTTl

Choose time interval#

Here, we select a time interval of one year.

[4]:
start_time = datetime(2016,4,1)
end_time = datetime(2017,4,1)

start_ns = timez.datetime_to_ns(start_time)
end_ns = timez.datetime_to_ns(end_time)

See it in the plotter: https://plot.ni4ai.org/permalink/2KN5iCXw5

[5]:
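# aligned_windows takes a pointwidth: each window spans 2**pointwidth nanoseconds.
# Taking the floor of log2 rounds the 30-day target down to 2**51 ns (about 26 days).
window = timez.ns_delta(days=30)
pw = int(np.log2(window))

# each result is a (StatPoint, version) tuple; keep only the StatPoints
points, _ = zip(*stream.aligned_windows(start_ns, end_ns, pointwidth=pw))
points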
[5]:
(StatPoint(1459166279268040704, 6825.37109375, 7157.580867829119, 7301.88525390625, 269591739, 36.51418677591111),
 StatPoint(1461418079081725952, 6580.9541015625, 7161.23304578515, 7300.8623046875, 269692154, 35.61453444076186),
 StatPoint(1463669878895411200, 6796.833984375, 7160.458736401789, 7286.55126953125, 147383982, 34.1646855496053),
 StatPoint(1465921678709096448, 6964.91455078125, 7159.378036144785, 7325.37158203125, 206497053, 40.14920297145494),
 StatPoint(1468173478522781696, 5558.57421875, 7167.452925690963, 7294.76318359375, 269810992, 39.238088874956624),
 StatPoint(1470425278336466944, 5780.056640625, 7161.174368865373, 7307.212890625, 268549058, 41.321696619794494),
 StatPoint(1472677078150152192, 6392.52099609375, 7161.894405845441, 7318.228515625, 268988146, 39.96141042153979),
 StatPoint(1474928877963837440, 5955.3212890625, 7158.670968887105, 7300.72509765625, 255962541, 39.97700500308349),
 StatPoint(1477180677777522688, 5097.31298828125, 7156.974233908252, 7283.45361328125, 268787149, 36.33054088798969),
 StatPoint(1479432477591207936, 6591.23486328125, 7158.855179375172, 7281.74755859375, 270215978, 33.48565115468457),
 StatPoint(1481684277404893184, 6501.1142578125, 7162.591062269605, 7282.77734375, 270216211, 34.53212742881303),
 StatPoint(1483936077218578432, 6076.43017578125, 7162.1296525714415, 7262.462890625, 270089894, 35.17972671790351),
 StatPoint(1486187877032263680, 6127.64404296875, 7153.473267831473, 7292.42724609375, 234816262, 37.10013475124324),
 StatPoint(1488439676845948928, 5907.42041015625, 7160.925875368738, 7297.984375, 270196174, 34.90643606628646))
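
The first timestamp in the output above (2016-03-28) falls before the requested start of 2016-04-01, and consecutive timestamps differ by exactly 2**51 ns. Both observations are consistent with aligned windows snapping to multiples of 2**pointwidth. The sketch below reproduces that arithmetic using only the standard library. The helper name align_down is hypothetical, and the query bounds are assumed to be midnight UTC.

```python
import math
from datetime import datetime, timezone

NS_PER_DAY = 86_400 * 10**9

def align_down(t_ns, pointwidth):
    """Round a nanosecond timestamp down to a multiple of 2**pointwidth (hypothetical helper)."""
    return t_ns - t_ns % (1 << pointwidth)

# Assumption: the query bounds are interpreted as midnight UTC.
start_ns = int(datetime(2016, 4, 1, tzinfo=timezone.utc).timestamp()) * 10**9
end_ns = int(datetime(2017, 4, 1, tzinfo=timezone.utc).timestamp()) * 10**9

pw = int(math.log2(30 * NS_PER_DAY))   # floor of log2, as in the cell above
width_ns = 1 << pw                     # actual window width in nanoseconds
first_window = align_down(start_ns, pw)
n_windows = (align_down(end_ns, pw) - first_window) // width_ns

print(pw, width_ns / NS_PER_DAY)       # exponent and window width in days
print(first_window, n_windows)         # first window start (ns) and window count
```

Under these assumptions the first aligned window starts at 1459166279268040704 and the year contains 14 windows of roughly 26 days each, which matches the StatPoints listed above.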

Convert to pandas dataframe#

[6]:
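def points_to_dataframe(points,
                        aggregates=('time','min','max','mean','stddev','count'),
                        use_datetime_index=True):
    """Convert StatPoint objects into a dataframe with one column per aggregate."""
    # a tuple default avoids sharing a mutable list between calls
    aggregates = list(aggregates)
    df = pd.DataFrame([[getattr(p, agg) for agg in aggregates] for p in points],
                      columns=aggregates)
    if use_datetime_index:
        # index rows by the window start time as a timezone-aware datetime
        df['datetime'] = [timez.ns_to_datetime(t) for t in df.time]
        df = df.set_index('datetime')
    return df

df = points_to_dataframe(points)
df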
[6]:
time min max mean stddev count
datetime
2016-03-28 11:57:59.268041+00:00 1459166279268040704 6825.371094 7301.885254 7157.580868 36.514187 269591739
2016-04-23 13:27:59.081726+00:00 1461418079081725952 6580.954102 7300.862305 7161.233046 35.614534 269692154
2016-05-19 14:57:58.895411+00:00 1463669878895411200 6796.833984 7286.551270 7160.458736 34.164686 147383982
2016-06-14 16:27:58.709096+00:00 1465921678709096448 6964.914551 7325.371582 7159.378036 40.149203 206497053
2016-07-10 17:57:58.522782+00:00 1468173478522781696 5558.574219 7294.763184 7167.452926 39.238089 269810992
2016-08-05 19:27:58.336467+00:00 1470425278336466944 5780.056641 7307.212891 7161.174369 41.321697 268549058
2016-08-31 20:57:58.150152+00:00 1472677078150152192 6392.520996 7318.228516 7161.894406 39.961410 268988146
2016-09-26 22:27:57.963837+00:00 1474928877963837440 5955.321289 7300.725098 7158.670969 39.977005 255962541
2016-10-22 23:57:57.777523+00:00 1477180677777522688 5097.312988 7283.453613 7156.974234 36.330541 268787149
2016-11-18 01:27:57.591208+00:00 1479432477591207936 6591.234863 7281.747559 7158.855179 33.485651 270215978
2016-12-14 02:57:57.404893+00:00 1481684277404893184 6501.114258 7282.777344 7162.591062 34.532127 270216211
2017-01-09 04:27:57.218578+00:00 1483936077218578432 6076.430176 7262.462891 7162.129653 35.179727 270089894
2017-02-04 05:57:57.032264+00:00 1486187877032263680 6127.644043 7292.427246 7153.473268 37.100135 234816262
2017-03-02 07:27:56.845949+00:00 1488439676845948928 5907.420410 7297.984375 7160.925875 34.906436 270196174
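
The count column offers a quick check on data completeness, since every window here spans the same length of time. The sketch below is a rough illustration using counts copied from the table above. It assumes that the fullest window is essentially gap-free, which the source does not confirm, and the 0.9 cutoff is an arbitrary choice.

```python
# Window start dates and point counts copied from the dataframe above.
counts = {
    "2016-03-28": 269591739,
    "2016-04-23": 269692154,
    "2016-05-19": 147383982,
    "2016-06-14": 206497053,
    "2016-07-10": 269810992,
    "2016-08-05": 268549058,
    "2016-08-31": 268988146,
    "2016-09-26": 255962541,
    "2016-10-22": 268787149,
    "2016-11-18": 270215978,
    "2016-12-14": 270216211,
    "2017-01-09": 270089894,
    "2017-02-04": 234816262,
    "2017-03-02": 270196174,
}

# Assumption: the window with the most points has no gaps.
reference = max(counts.values())
completeness = {day: n / reference for day, n in counts.items()}

# Windows holding less than 90% of the reference count (arbitrary cutoff).
sparse = [day for day, frac in completeness.items() if frac < 0.9]
print(sparse)
```

With the dataframe in scope, the same ratio could be computed as `df['count'] / df['count'].max()`.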

Define data viz helper function#

[7]:
def plot_aggregates(df, vlines=(), hlines=()):
    """Plot min/max markers, the mean, and a +/- 1 stddev band for each window."""
    fig, ax = plt.subplots(figsize=(15,3))
    df['min'].plot(ax=ax, ls=' ', marker='_', color='black', markersize=5, label='minimum')
    df['max'].plot(ax=ax, ls=' ', marker='_', color='black', markersize=5, label='maximum')
    df['mean'].plot(ax=ax, label='average', ls=' ', marker='.')
    ax.fill_between(df.index, df['mean'] - df['stddev'], df['mean'] + df['stddev'], alpha=0.5, label=r'$+/- 1\times\sigma$')

    # draw optional markers before building the legend so their labels appear in it
    if len(vlines):
        ax.vlines(vlines, *ax.get_ylim(), color='0.5', alpha=0.5, zorder=10, lw=3, label='events')
    if len(hlines):
        ax.hlines(hlines, *ax.get_xlim(), color='0.5', zorder=10, lw=1, ls='--', label='threshold')
    ax.legend()
    return fig

plot_aggregates(df)
plt.show()
[Figure: minimum, maximum, mean, and +/- 1 sigma band per window for the 30-day request (../_images/tutorials_plotaggregates_14_0.png)]
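
plot_aggregates accepts vlines and hlines arguments that the call above leaves empty. One possible use is to mark the windows whose minimum dips below a threshold. The sketch below selects those windows from values copied out of the dataframe above. The 6000 V threshold is purely illustrative, and the helper ns_to_utc is a hypothetical stand-in for timez.ns_to_datetime so that the example runs without btrdb.

```python
from datetime import datetime, timedelta, timezone

# (window start in ns, window minimum) pairs copied from the dataframe above.
window_minima = [
    (1459166279268040704, 6825.371094),
    (1461418079081725952, 6580.954102),
    (1463669878895411200, 6796.833984),
    (1465921678709096448, 6964.914551),
    (1468173478522781696, 5558.574219),
    (1470425278336466944, 5780.056641),
    (1472677078150152192, 6392.520996),
    (1474928877963837440, 5955.321289),
    (1477180677777522688, 5097.312988),
    (1479432477591207936, 6591.234863),
    (1481684277404893184, 6501.114258),
    (1483936077218578432, 6076.430176),
    (1486187877032263680, 6127.644043),
    (1488439676845948928, 5907.420410),
]

threshold = 6000.0  # hypothetical threshold, chosen only for illustration

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ns_to_utc(t_ns):
    """Convert nanoseconds since the epoch to a UTC datetime (truncated to microseconds)."""
    return EPOCH + timedelta(microseconds=t_ns // 1000)

# Start times of windows whose minimum falls below the threshold.
events = [ns_to_utc(t) for t, vmin in window_minima if vmin < threshold]
print([e.date().isoformat() for e in events])
```

With the dataframe in scope, the same selection could be written as `df.index[df['min'] < threshold]`, and the result passed on with `plot_aggregates(df, vlines=events, hlines=[threshold])`.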

Visualize aggregates at different time-resolutions#

Weekly#

[8]:
window = timez.ns_delta(days=7)
pw = int(np.log2(window))

points, _ = zip(*stream.aligned_windows(start_ns, end_ns, pointwidth=pw))
df = points_to_dataframe(points)
fig = plot_aggregates(df)
[Figure: minimum, maximum, mean, and +/- 1 sigma band per window for the 7-day request (../_images/tutorials_plotaggregates_17_0.png)]

Daily#

[9]:
window = timez.ns_delta(days=1)
pw = int(np.log2(window))

points, _ = zip(*stream.aligned_windows(start_ns, end_ns, pointwidth=pw))
df = points_to_dataframe(points)
fig = plot_aggregates(df)
[Figure: minimum, maximum, mean, and +/- 1 sigma band per window for the 1-day request (../_images/tutorials_plotaggregates_19_0.png)]
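
Because the pointwidth is the floor of log2 of the requested window, the windows actually plotted are somewhat shorter than the "weekly" and "daily" labels suggest. The sketch below works out the effective width for each resolution used in this tutorial. The helper name effective_width is hypothetical, and the standard library's math.log2 stands in for np.log2.

```python
import math

NS_PER_HOUR = 3600 * 10**9

def effective_width(days):
    """Return (pointwidth, actual window width in hours) for a requested width in days."""
    requested_ns = days * 24 * NS_PER_HOUR
    pw = int(math.log2(requested_ns))   # same rounding as int(np.log2(window))
    return pw, (1 << pw) / NS_PER_HOUR

for label, days in [("30-day", 30), ("weekly", 7), ("daily", 1)]:
    pw, hours = effective_width(days)
    print(f"{label}: pointwidth={pw}, width={hours:.1f} h ({hours / 24:.2f} days)")
```

The weekly request resolves to windows of about 6.5 days and the daily request to about 19.5 hours. Where exact calendar-length windows matter, btrdb's windows query, which takes a width in nanoseconds rather than a power-of-two exponent, may be a better fit.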