DataTables
DataTables are one of the central features of the ursa library. They provide a simple interface for constructing live-updating datasets that can be loaded directly from a remote (or local) Redis cluster in milliseconds.
What is a Data Table?
A DataTable is, in essence, an unordered collection of TimeSeries. In addition, DataTable objects carry attributes that determine how frequently, and how far back, data from the table is cached.
Data Table Caching
DataTables are automatically updated and cached either in the remote Redis cluster associated with your instance or on your local machine as a replica of that cluster. This allows a DataTable to be loaded into Python at near-instantaneous speeds for analytics, visualization, machine-learning inference, and much more.
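The idea of keeping only a fixed number of recent rows in memory can be illustrated with a small, purely conceptual sketch. This is not how ursa implements its cache; the `RollingCache` class and its methods are hypothetical names introduced here only to show what a bounded, most-recent-rows window looks like.

```python
from collections import deque


class RollingCache:
    """Hypothetical bounded cache that keeps only the newest `max_rows` rows."""

    def __init__(self, max_rows: int) -> None:
        # A deque with maxlen silently discards the oldest item when full.
        self._rows = deque(maxlen=max_rows)

    def append(self, timestamp_ms: int, values: dict) -> None:
        """Store one row keyed by its Unix millisecond timestamp."""
        self._rows.append((timestamp_ms, values))

    def snapshot(self) -> list:
        """Return the cached rows, oldest first."""
        return list(self._rows)


# Feed five one-minute rows into a cache that only holds three.
cache = RollingCache(max_rows=3)
for minute in range(5):
    cache.append(1733282820000 + minute * 60_000, {"1": 95680.0 + minute})

oldest_ts, _ = cache.snapshot()[0]
newest_ts, _ = cache.snapshot()[-1]
print(len(cache.snapshot()), oldest_ts, newest_ts)
```

After the loop, only the three most recent rows remain, which is the general behaviour a row-limited cache is meant to provide.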
Dataset
Once created, a DataTable is continuously updated by the Source-specific data interfaces associated with each TimeSeries object in DataTable.timeseries. To access the data, call DataTable.dataset().
Data can be served as a polars or pandas dataframe:
# Import required objects
# (DataTable is assumed to be exported alongside TimeSeries)
from ursa_sync.backend import DataTable, TimeSeries
from ursa_sync import session
# Open a session with the database
db = session()
# Create a new DataTable object
DT = DataTable.create(
    session=db,
    name="Test DataTable",  # The name of the object
    active=True,  # Whether or not to actively update the table in memory
    cache_n_rows_local=1000,  # How many rows to actively maintain in memory
    sync_frequency_local=60,  # How often to synchronize
    timeseries=TimeSeries.load_by(
        ["frequency"],
        ["1m"],
        db,
    ),  # Attaches all TimeSeries objects with a frequency of 1m
)
DF1 = DT.dataset() # Defaults to return a pandas dataframe
DF2 = DT.dataset(format='polars')
print(DF1)
Out:
1 10 11 ... 7 8 9
1733282820000 95680.00 3666.98 3673.36 ... 2506.0 3666.98 3673.77
1733282880000 95764.29 3672.42 3674.57 ... 1989.0 3672.51 3674.57
1733282940000 95772.00 3672.47 3672.48 ... 1680.0 3674.56 3675.80
1733283000000 95759.87 3672.47 3674.99 ... 1407.0 3672.48 3675.35
1733283060000 95794.99 3670.85 3670.86 ... 2440.0 3675.00 3675.71
... ... ... ... ... ... ... ...
1733342580000 97319.00 3834.40 3835.27 ... 6053.0 3840.59 3841.60
1733342640000 97260.00 3830.01 3833.64 ... 8611.0 3836.20 3836.92
1733342700000 97132.01 3831.24 3837.58 ... 5010.0 3833.64 3837.58
1733342760000 97253.61 3836.41 3840.59 ... 3627.0 3837.57 3842.17
1733342820000 97273.44 3840.59 3849.30 ... 14018.0 3840.59 3849.80
[1001 rows x 21 columns]
# We can also load it in as a polars dataframe:
print(DF2)
Out:
shape: (1_001, 22)
┌──────────┬─────────┬─────────┬───────────┬───┬─────────┬─────────┬─────────┬───────────┐
│ 1 ┆ 10 ┆ 11 ┆ 12 ┆ … ┆ 7 ┆ 8 ┆ 9 ┆ timestamp │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ i64 │
╞══════════╪═════════╪═════════╪═══════════╪═══╪═════════╪═════════╪═════════╪═══════════╡
│ 95772.0 ┆ 3672.47 ┆ 3672.48 ┆ 1.5388e6 ┆ … ┆ 1680.0 ┆ 3674.56 ┆ 3675.8 ┆ 173328294 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 0000 │
│ 95759.87 ┆ 3672.47 ┆ 3674.99 ┆ 995444.72 ┆ … ┆ 1407.0 ┆ 3672.48 ┆ 3675.35 ┆ 173328300 │
│ ┆ ┆ ┆ 8079 ┆ ┆ ┆ ┆ ┆ 0000 │
│ 95794.99 ┆ 3670.85 ┆ 3670.86 ┆ 1.1683e6 ┆ … ┆ 2440.0 ┆ 3675.0 ┆ 3675.71 ┆ 173328306 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 0000 │
│ 95780.6 ┆ 3669.79 ┆ 3670.5 ┆ 882341.47 ┆ … ┆ 1696.0 ┆ 3670.56 ┆ 3672.44 ┆ 173328312 │
│ ┆ ┆ ┆ 1883 ┆ ┆ ┆ ┆ ┆ 0000 │
│ 95800.0 ┆ 3670.8 ┆ 3670.8 ┆ 598944.22 ┆ … ┆ 2011.0 ┆ 3672.19 ┆ 3673.49 ┆ 173328318 │
│ ┆ ┆ ┆ 3832 ┆ ┆ ┆ ┆ ┆ 0000 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 97132.01 ┆ 3831.24 ┆ 3837.58 ┆ 2.7786e6 ┆ … ┆ 5010.0 ┆ 3833.64 ┆ 3837.58 ┆ 173334270 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 0000 │
│ 97253.61 ┆ 3836.41 ┆ 3840.59 ┆ 4.4486e6 ┆ … ┆ 3627.0 ┆ 3837.57 ┆ 3842.17 ┆ 173334276 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 0000 │
│ 97273.44 ┆ 3840.59 ┆ 3848.61 ┆ 3.4978e6 ┆ … ┆ 18532.0 ┆ 3840.59 ┆ 3849.99 ┆ 173334282 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 0000 │
│ 97625.96 ┆ 3847.31 ┆ 3848.26 ┆ 3.2079e6 ┆ … ┆ 12344.0 ┆ 3848.99 ┆ 3849.2 ┆ 173334288 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 0000 │
│ 97625.96 ┆ 3847.31 ┆ 3848.22 ┆ 3.2191e6 ┆ … ┆ 12677.0 ┆ 3848.99 ┆ 3849.2 ┆ 173334294 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 0000 │
└──────────┴─────────┴─────────┴───────────┴───┴─────────┴─────────┴─────────┴───────────┘
Both representations of the dataset are served live: the most recent timestamp in the dataset is the latest entry received and cached from the stream.
pandas datasets are returned as DataFrames with a Unix millisecond timestamp (as integer) index.
polars dataframes do not have an index column, and are therefore returned with an additional column called “timestamp” that holds the same Unix millisecond timestamp as an integer.
Each column name is the ID of the TimeSeries that supplies that column’s data.
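Because timestamps arrive as integer Unix milliseconds and column names are TimeSeries IDs rendered as strings (which is why the output above lists them as 1, 10, 11, ..., 7, 8, 9), two small post-processing steps are often convenient. The following is a minimal standard-library sketch; the helper names `ms_to_utc` and `sort_ids_numerically` are hypothetical and not part of ursa, and the sort assumes every ID is an integer-like string.

```python
from datetime import datetime, timezone


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert a Unix millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def sort_ids_numerically(column_names):
    """Order ID-like column names by numeric value instead of as strings.

    Assumes every name parses as an integer; raises ValueError otherwise.
    """
    return sorted(column_names, key=int)


# First and last index values from the pandas output shown above.
first = ms_to_utc(1733282820000)
last = ms_to_utc(1733342820000)
print(first.isoformat())
print(last - first)

# Column names as they appear in the output, in string-sorted order.
columns = ["1", "10", "11", "7", "8", "9"]
ordered = sort_ids_numerically(columns)
print(ordered)
```

On a real dataframe the same conversion is available natively: `pd.to_datetime(df.index, unit="ms", utc=True)` in pandas, or `pl.from_epoch("timestamp", time_unit="ms")` in polars. When reordering polars columns, exclude the “timestamp” column before sorting, since it is not an integer-like name.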
For pandas- and polars-specific details, refer to each library’s official documentation.