7 - Working with Metadata#
This tutorial demonstrates various options for querying and interacting with metadata about streams.
What is Metadata?#
Metadata reports high-level information about data that is in a stream, such as where it was collected or how it relates to other streams in the database. This tutorial illustrates various functions for interacting with metadata in the database.
Functions covered:#
`db.streams_in_collection()<https://btrdb.readthedocs.io/en/latest/api/conn.html#btrdb.conn.BTrDB.streams_in_collection>`__`db.collection_metadata()<https://btrdb.readthedocs.io/en/latest/api/conn.html#btrdb.conn.BTrDB.collection_metadata>`__`db.query()<https://btrdb.readthedocs.io/en/latest/api/conn.html#btrdb.conn.BTrDB.query>`__
[1]:
import btrdb
import pandas as pd
from matplotlib import pyplot as plt
[2]:
db = btrdb.connect()
About the data#
This tutorial uses a data collection POW/signatures which includes time series data from upwards of 200 events captured by digital fault recorders (DFRs) on the grid. The raw data set includes a 2-second snapshot of the raw waveform data recorded during each fault, as well as information collected after the fact to document what had caused the fault.
[3]:
collection = 'POW/signatures'
streams = db.streams_in_collection(collection)
Accessing metadata fields#
Each stream includes two types of metadata: - Tags describe required fields, such as the stream name and units - Annotations: describe optional fields which are specified by the user.
Different types of metadata can be retrieved using `stream.tags() <https://btrdb.readthedocs.io/en/latest/api/streams.html#btrdb.stream.Stream.tags>`__ and `stream.annotations() <https://btrdb.readthedocs.io/en/latest/api/streams.html#btrdb.stream.Stream.annotations>`__.
[4]:
stream = streams[1]
print('COLLECTION:', stream.collection)
print('\nTAGS:\n', stream.tags().keys())
print('\nANNOTATIONS:\n', stream.annotations()[0].keys(),)
COLLECTION: POW/signatures/event2907
TAGS:
dict_keys(['unit', 'ingress', 'distiller', 'name'])
ANNOTATIONS:
dict_keys(['Weather', 'SiteName', 'EventTime', 'Phase', 'DataSource', 'FailedEquipmentCode', 'IsolationEquipmentCode', 'EventId', 'Details', 'FeederId', 'Cause'])
You can display tags and annotations for a streamsest using the function streamset.describe(). Passing additional metadata fields as arguments to the function will add them to the table.
[5]:
streams = db.streams_in_collection(collection, tags={'unit': 'volts'})
streamset = btrdb.stream.StreamSet(streams[:10])
print(streamset.describe('Cause','Weather'))
Collection Name Unit UUID Version Cause Weather
------------------------ ------ ------ ----------- --------- --------- -------------
POW/signatures/event2907 Vb volts 9b4dda9f... 10 Lightning Major Storm
POW/signatures/event0550 Vb volts 9ff2f527... 10 Equipment Clear Weather
POW/signatures/event0287 Va volts 684fb803... 10 Vehicle Snow/Ice
POW/signatures/event3067 Va volts dafb42ba... 10 Weather Wind
POW/signatures/event3067 Vb volts 65fd8f4e... 10 Weather Wind
POW/signatures/event3042 Vb volts 7f0aed61... 10 Equipment Unknown
POW/signatures/event3042 Vc volts e009c009... 10 Equipment Unknown
POW/signatures/event2999 Vc volts 03d8bd43... 10 Tree Unknown
POW/signatures/event3152 Vb volts 0d094ffd... 10 Equipment Unknown
POW/signatures/event3152 Vc volts 47d35cde... 10 Equipment Unknown
Getting a birds eye view of metadata contents#
One can get a high-level overview of metadata fields available for streams in a given collection using `db.collection_metadata <https://btrdb.readthedocs.io/en/latest/api/conn.html#btrdb.conn.BTrDB.collection_metadata>`__.
[6]:
metadata = db.collection_metadata(collection)
metadata
[6]:
({'name': 0, 'unit': 0, 'ingress': 0, 'distiller': 0},
{'Cause': 1766,
'DataSource': 1766,
'Details': 1766,
'EventId': 1766,
'EventTime': 1766,
'FailedEquipmentCode': 1766,
'FeederId': 1766,
'IsolationEquipmentCode': 1766,
'Phase': 1766,
'SiteName': 1766,
'Weather': 1766})
Filtering on metadata contents#
Metadata can provide a useful mechanism for determining what streams are relevant to a particular analysis. Two examples of filtering streams based on metadata contents are provided below.
[7]:
# Find all voltage streams for a given event
event_name = stream.collection
print(event_name)
streams = db.streams_in_collection(event_name,
tags={'unit':'volts'})
streams
POW/signatures/event2907
[7]:
[<Stream collection=POW/signatures/event2907 name=Vb>,
<Stream collection=POW/signatures/event2907 name=Va>,
<Stream collection=POW/signatures/event2907 name=Vc>]
[8]:
# Find all fault events caused by lightning during a major storm
annotations = {'Cause': 'Lightning',
'Weather': 'Major Storm'}
streams = db.streams_in_collection(collection,
annotations=annotations)
print("%i streams selected from across %i collections"
%(len(streams), len(list(set([s.collection for s in streams])))))
63 streams selected from across 9 collections
Using SQL to query metadata#
Above, we used dictionaries to specify filtering criteria in terms of tags and annotations we wished to filter upon. You can achieve the same results using the db.query() function which allows you to pass any arbitrary SQL query. That query is applied to the database table that stores metadata, and offers much greater flexibility for specifying complex search criteria.
[9]:
##################
# Specify query
query = """SELECT uuid, name, unit from streams
WHERE collection LIKE $1 and unit='volts'"""
params = ['%%event%s'%(stream.annotations()[0]['EventId'])]
##################
# Find Streams
streams_dict = db.query(query, params)
streams = db.streams(*[s['uuid'] for s in streams_dict])
##################
# Get data!
start = min([s.time for s in streams.earliest()])
end = max([s.time for s in streams.latest()])
df = streams.filter(start=start, end=end).to_dataframe()
[10]:
df.head()
[10]:
| POW/signatures/event2907/Vb | POW/signatures/event2907/Va | POW/signatures/event2907/Vc | |
|---|---|---|---|
| time | |||
| 1154455676608999936 | -10199.030273 | 183.957260 | 10191.901367 |
| 1154455676609130240 | -10495.413086 | 718.309326 | 9938.632812 |
| 1154455676609260288 | -10756.926758 | 1270.181030 | 9659.163086 |
| 1154455676609390592 | -10974.854492 | 1857.092407 | 9370.959961 |
| 1154455676609520896 | -11123.044922 | 2461.523438 | 9082.756836 |