7 - Working with Metadata#

This tutorial demonstrates various options for querying and interacting with metadata about streams.

What is Metadata?#

Metadata reports high-level information about data that is in a stream, such as where it was collected or how it relates to other streams in the database. This tutorial illustrates various functions for interacting with metadata in the database.

Functions covered:#

[1]:
import btrdb
import pandas as pd
from matplotlib import pyplot as plt
[2]:
db = btrdb.connect()

About the data#

This tutorial uses a data collection POW/signatures which includes time series data from upwards of 200 events captured by digital fault recorders (DFRs) on the grid. The raw data set includes a 2-second snapshot of the raw waveform data recorded during each fault, as well as information collected after the fact to document what had caused the fault.

[3]:
collection = 'POW/signatures'
streams = db.streams_in_collection(collection)

Accessing metadata fields#

Each stream includes two types of metadata: - Tags describe required fields, such as the stream name and units - Annotations: describe optional fields which are specified by the user.

Different types of metadata can be retrieved using `stream.tags() <https://btrdb.readthedocs.io/en/latest/api/streams.html#btrdb.stream.Stream.tags>`__ and `stream.annotations() <https://btrdb.readthedocs.io/en/latest/api/streams.html#btrdb.stream.Stream.annotations>`__.

[4]:
stream = streams[1]

print('COLLECTION:', stream.collection)

print('\nTAGS:\n', stream.tags().keys())
print('\nANNOTATIONS:\n', stream.annotations()[0].keys(),)
COLLECTION: POW/signatures/event2907

TAGS:
 dict_keys(['unit', 'ingress', 'distiller', 'name'])

ANNOTATIONS:
 dict_keys(['Weather', 'SiteName', 'EventTime', 'Phase', 'DataSource', 'FailedEquipmentCode', 'IsolationEquipmentCode', 'EventId', 'Details', 'FeederId', 'Cause'])

You can display tags and annotations for a streamsest using the function streamset.describe(). Passing additional metadata fields as arguments to the function will add them to the table.

[5]:
streams = db.streams_in_collection(collection, tags={'unit': 'volts'})
streamset = btrdb.stream.StreamSet(streams[:10])
print(streamset.describe('Cause','Weather'))
Collection                Name    Unit    UUID           Version  Cause      Weather
------------------------  ------  ------  -----------  ---------  ---------  -------------
POW/signatures/event2907  Vb      volts   9b4dda9f...         10  Lightning  Major Storm
POW/signatures/event0550  Vb      volts   9ff2f527...         10  Equipment  Clear Weather
POW/signatures/event0287  Va      volts   684fb803...         10  Vehicle    Snow/Ice
POW/signatures/event3067  Va      volts   dafb42ba...         10  Weather    Wind
POW/signatures/event3067  Vb      volts   65fd8f4e...         10  Weather    Wind
POW/signatures/event3042  Vb      volts   7f0aed61...         10  Equipment  Unknown
POW/signatures/event3042  Vc      volts   e009c009...         10  Equipment  Unknown
POW/signatures/event2999  Vc      volts   03d8bd43...         10  Tree       Unknown
POW/signatures/event3152  Vb      volts   0d094ffd...         10  Equipment  Unknown
POW/signatures/event3152  Vc      volts   47d35cde...         10  Equipment  Unknown

Getting a birds eye view of metadata contents#

One can get a high-level overview of metadata fields available for streams in a given collection using `db.collection_metadata <https://btrdb.readthedocs.io/en/latest/api/conn.html#btrdb.conn.BTrDB.collection_metadata>`__.

[6]:
metadata = db.collection_metadata(collection)
metadata
[6]:
({'name': 0, 'unit': 0, 'ingress': 0, 'distiller': 0},
 {'Cause': 1766,
  'DataSource': 1766,
  'Details': 1766,
  'EventId': 1766,
  'EventTime': 1766,
  'FailedEquipmentCode': 1766,
  'FeederId': 1766,
  'IsolationEquipmentCode': 1766,
  'Phase': 1766,
  'SiteName': 1766,
  'Weather': 1766})

Filtering on metadata contents#

Metadata can provide a useful mechanism for determining what streams are relevant to a particular analysis. Two examples of filtering streams based on metadata contents are provided below.

[7]:
# Find all voltage streams for a given event
event_name = stream.collection
print(event_name)
streams = db.streams_in_collection(event_name,
                                   tags={'unit':'volts'})
streams
POW/signatures/event2907
[7]:
[<Stream collection=POW/signatures/event2907 name=Vb>,
 <Stream collection=POW/signatures/event2907 name=Va>,
 <Stream collection=POW/signatures/event2907 name=Vc>]
[8]:
# Find all fault events caused by lightning during a major storm
annotations = {'Cause': 'Lightning',
              'Weather': 'Major Storm'}
streams = db.streams_in_collection(collection,
                                   annotations=annotations)

print("%i streams selected from across %i collections"
      %(len(streams), len(list(set([s.collection for s in streams])))))
63 streams selected from across 9 collections

Using SQL to query metadata#

Above, we used dictionaries to specify filtering criteria in terms of tags and annotations we wished to filter upon. You can achieve the same results using the db.query() function which allows you to pass any arbitrary SQL query. That query is applied to the database table that stores metadata, and offers much greater flexibility for specifying complex search criteria.

[9]:
##################
# Specify query
query = """SELECT uuid, name, unit from streams
            WHERE collection LIKE $1 and unit='volts'"""

params = ['%%event%s'%(stream.annotations()[0]['EventId'])]

##################
# Find Streams
streams_dict = db.query(query, params)
streams = db.streams(*[s['uuid'] for s in streams_dict])


##################
# Get data!
start = min([s.time for s in streams.earliest()])
end = max([s.time for s in streams.latest()])
df = streams.filter(start=start, end=end).to_dataframe()
[10]:
df.head()
[10]:
POW/signatures/event2907/Vb POW/signatures/event2907/Va POW/signatures/event2907/Vc
time
1154455676608999936 -10199.030273 183.957260 10191.901367
1154455676609130240 -10495.413086 718.309326 9938.632812
1154455676609260288 -10756.926758 1270.181030 9659.163086
1154455676609390592 -10974.854492 1857.092407 9370.959961
1154455676609520896 -11123.044922 2461.523438 9082.756836