DataTables

DataTables are one of the central features of the ursa library. They provide a simple interface for constructing live-updating datasets that can be loaded directly from a remote (or local) Redis cluster in milliseconds.

What is a DataTable?

A DataTable is, in essence, an unordered collection of TimeSeries objects. DataTable objects also have attributes that determine how frequently, and how far back, their data is cached.
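
To make the "unordered collection" idea concrete, here is a minimal sketch of a hypothetical, simplified model. `DataTableSpec`, `with_timeseries` and the integer TimeSeries IDs are illustrative assumptions, not ursa's class or API; the two caching fields only mirror the `cache_n_rows_local` and `sync_frequency_local` arguments used in the example on this page.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class DataTableSpec:
    """Hypothetical model of a DataTable's shape (NOT ursa's DataTable class)."""

    name: str
    cache_n_rows_local: int    # how far back to cache, measured in rows
    sync_frequency_local: int  # how often to sync (units not specified here)
    # A frozenset captures "unordered collection": order and duplicates are ignored.
    timeseries_ids: FrozenSet[int] = field(default_factory=frozenset)

    def with_timeseries(self, ts_id: int) -> "DataTableSpec":
        # Return a new spec with one more TimeSeries ID attached
        return DataTableSpec(
            self.name,
            self.cache_n_rows_local,
            self.sync_frequency_local,
            self.timeseries_ids | {ts_id},
        )


a = DataTableSpec("Test DataTable", 1000, 60, frozenset({1, 7, 10}))
b = DataTableSpec("Test DataTable", 1000, 60, frozenset({10, 7, 1}))
print(a == b)                                    # True: order does not matter
print(len(a.with_timeseries(7).timeseries_ids))  # 3: duplicate ID is ignored
```

Using a set rather than a list is the design choice that encodes "unordered": two tables built from the same TimeSeries in a different order compare equal.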

DataTable Caching

DataTables are automatically updated and cached, either in the remote Redis cluster associated with your instance or in a local replica of that cluster on your own machine. This lets a DataTable be loaded into Python almost instantly for analytics, visualization, machine-learning inference, and more.
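
One way to picture the local cache is as a fixed-size window that keeps only the newest rows, as the `cache_n_rows_local` setting suggests. The sketch below assumes that behaviour and models it with `collections.deque(maxlen=...)`; `RollingCache` is a hypothetical name, and ursa's real cache (backed by Redis) may work quite differently.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Row = Tuple[int, Dict[str, float]]  # (Unix timestamp in ms, {TimeSeries ID: value})


class RollingCache:
    """Hypothetical fixed-size cache: keeps the newest `max_rows` rows only."""

    def __init__(self, max_rows: int) -> None:
        # A deque with maxlen silently drops the oldest row once it is full
        self._rows: Deque[Row] = deque(maxlen=max_rows)

    def append(self, timestamp_ms: int, values: Dict[str, float]) -> None:
        self._rows.append((timestamp_ms, values))

    def snapshot(self) -> List[Row]:
        # Copy so that later appends do not change what the caller holds
        return list(self._rows)


cache = RollingCache(max_rows=3)
for i in range(5):  # five one-minute rows arrive; only the last three survive
    cache.append(1733282820000 + i * 60_000, {"1": 95680.0 + i})
print([ts for ts, _ in cache.snapshot()])
# [1733282940000, 1733283000000, 1733283060000]
```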

Dataset

Once created, a DataTable is continuously updated by the Source-specific data interfaces associated with each TimeSeries object in DataTable.timeseries. To access a DataTable's data, call DataTable.dataset().

Data can be served as a polars or pandas dataframe:


# Import required objects
from ursa_sync.backend import DataTable, TimeSeries  # DataTable import path assumed to match TimeSeries
from ursa_sync import session

# Open a session with the database
db = session()

# Create a new DataTable object
DT = DataTable.create(
    session=db,
    name="Test DataTable",      # The name of the object
    active=True,                # Whether to actively update the table in memory
    cache_n_rows_local=1000,    # How many rows to actively maintain in memory
    sync_frequency_local=60,    # How often to synchronize the local cache
    timeseries=TimeSeries.load_by(
        ['frequency'],
        ['1m'],
        db
    )                           # Attaches all TimeSeries objects with a frequency of 1m
)

DF1 = DT.dataset()                  # Returns a pandas DataFrame by default
DF2 = DT.dataset(format='polars')   # Returns a polars DataFrame

print(DF1)


Out: 
                      1       10       11  ...        7        8        9
1733282820000  95680.00  3666.98  3673.36  ...   2506.0  3666.98  3673.77
1733282880000  95764.29  3672.42  3674.57  ...   1989.0  3672.51  3674.57
1733282940000  95772.00  3672.47  3672.48  ...   1680.0  3674.56  3675.80
1733283000000  95759.87  3672.47  3674.99  ...   1407.0  3672.48  3675.35
1733283060000  95794.99  3670.85  3670.86  ...   2440.0  3675.00  3675.71
...                 ...      ...      ...  ...      ...      ...      ...
1733342580000  97319.00  3834.40  3835.27  ...   6053.0  3840.59  3841.60
1733342640000  97260.00  3830.01  3833.64  ...   8611.0  3836.20  3836.92
1733342700000  97132.01  3831.24  3837.58  ...   5010.0  3833.64  3837.58
1733342760000  97253.61  3836.41  3840.59  ...   3627.0  3837.57  3842.17
1733342820000  97273.44  3840.59  3849.30  ...  14018.0  3840.59  3849.80

[1001 rows x 21 columns]


# The same data can also be loaded as a polars DataFrame:

print(DF2)

Out:
shape: (1_001, 22)
┌──────────┬─────────┬─────────┬───────────────┬───┬─────────┬─────────┬─────────┬───────────────┐
│ 1        ┆ 10      ┆ 11      ┆ 12            ┆ … ┆ 7       ┆ 8       ┆ 9       ┆ timestamp     │
│ ---      ┆ ---     ┆ ---     ┆ ---           ┆   ┆ ---     ┆ ---     ┆ ---     ┆ ---           │
│ f64      ┆ f64     ┆ f64     ┆ f64           ┆   ┆ f64     ┆ f64     ┆ f64     ┆ i64           │
╞══════════╪═════════╪═════════╪═══════════════╪═══╪═════════╪═════════╪═════════╪═══════════════╡
│ 95772.0  ┆ 3672.47 ┆ 3672.48 ┆ 1.5388e6      ┆ … ┆ 1680.0  ┆ 3674.56 ┆ 3675.8  ┆ 1733282940000 │
│ 95759.87 ┆ 3672.47 ┆ 3674.99 ┆ 995444.728079 ┆ … ┆ 1407.0  ┆ 3672.48 ┆ 3675.35 ┆ 1733283000000 │
│ 95794.99 ┆ 3670.85 ┆ 3670.86 ┆ 1.1683e6      ┆ … ┆ 2440.0  ┆ 3675.0  ┆ 3675.71 ┆ 1733283060000 │
│ 95780.6  ┆ 3669.79 ┆ 3670.5  ┆ 882341.471883 ┆ … ┆ 1696.0  ┆ 3670.56 ┆ 3672.44 ┆ 1733283120000 │
│ 95800.0  ┆ 3670.8  ┆ 3670.8  ┆ 598944.223832 ┆ … ┆ 2011.0  ┆ 3672.19 ┆ 3673.49 ┆ 1733283180000 │
│ …        ┆ …       ┆ …       ┆ …             ┆ … ┆ …       ┆ …       ┆ …       ┆ …             │
│ 97132.01 ┆ 3831.24 ┆ 3837.58 ┆ 2.7786e6      ┆ … ┆ 5010.0  ┆ 3833.64 ┆ 3837.58 ┆ 1733342700000 │
│ 97253.61 ┆ 3836.41 ┆ 3840.59 ┆ 4.4486e6      ┆ … ┆ 3627.0  ┆ 3837.57 ┆ 3842.17 ┆ 1733342760000 │
│ 97273.44 ┆ 3840.59 ┆ 3848.61 ┆ 3.4978e6      ┆ … ┆ 18532.0 ┆ 3840.59 ┆ 3849.99 ┆ 1733342820000 │
│ 97625.96 ┆ 3847.31 ┆ 3848.26 ┆ 3.2079e6      ┆ … ┆ 12344.0 ┆ 3848.99 ┆ 3849.2  ┆ 1733342880000 │
│ 97625.96 ┆ 3847.31 ┆ 3848.22 ┆ 3.2191e6      ┆ … ┆ 12677.0 ┆ 3848.99 ┆ 3849.2  ┆ 1733342940000 │
└──────────┴─────────┴─────────┴───────────────┴───┴─────────┴─────────┴─────────┴───────────────┘

Both representations of the dataset are served live: the most recent timestamp in the dataset always corresponds to the latest entry received from the stream and cached.
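
Because the newest timestamp marks the latest cached entry, you can estimate how fresh a dataset is by comparing that timestamp with the current time. A minimal sketch, assuming the local wall clock is a fair reference; `staleness_ms` is a hypothetical helper, not part of ursa.

```python
import time
from typing import Optional


def staleness_ms(latest_timestamp_ms: int, now_ms: Optional[int] = None) -> int:
    """Return how many milliseconds old the newest row of a dataset is."""
    if now_ms is None:
        # Current Unix time as integer milliseconds, the same unit the dataset uses
        now_ms = time.time_ns() // 1_000_000
    return now_ms - latest_timestamp_ms


# Assuming DF1 / DF2 from the example above:
#   staleness_ms(int(DF1.index.max()))           # pandas: timestamps are the index
#   staleness_ms(int(DF2["timestamp"].max()))    # polars: timestamps are a column
print(staleness_ms(1733342820000, now_ms=1733342850000))  # 30000
```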

pandas datasets are returned as DataFrames indexed by the Unix timestamp in milliseconds, stored as an integer.

polars DataFrames have no index, so they are returned with an additional column called “timestamp” that holds the same information as an integer Unix timestamp in milliseconds.
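
The integer timestamps can be turned into readable datetimes with the standard library. This sketch assumes the values are UTC Unix milliseconds, as described above; `ms_to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timezone


def ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert an integer Unix millisecond timestamp to an aware UTC datetime."""
    # Split into whole seconds and leftover milliseconds so no float rounding occurs
    seconds, millis = divmod(timestamp_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=millis * 1000
    )


print(ms_to_datetime(1733282820000))  # 2024-12-04 03:27:00+00:00
```

To convert a whole column at once, pandas provides `pd.to_datetime(DF1.index, unit="ms", utc=True)` and polars provides `pl.from_epoch(DF2["timestamp"], time_unit="ms")`.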

Each column is named after the ID of the TimeSeries whose data it contains.
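
The sketch below illustrates the resulting shape: one row per timestamp and one column per TimeSeries ID, with gaps where a series has no value. It assumes a hypothetical raw input of `{TimeSeries ID: {timestamp_ms: value}}` and an outer join on timestamp; ursa performs this alignment for you and may do it differently. The sample output above appears to order columns as strings (`1`, `10`, `11`, …, `7`, `8`, `9`), which lexicographic sorting reproduces.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical raw input: TimeSeries ID -> {Unix timestamp in ms: value}
Series = Dict[str, Dict[int, float]]
WideRow = Tuple[int, List[Optional[float]]]


def to_wide_rows(series: Series) -> Tuple[List[str], List[WideRow]]:
    """Outer-join all series on timestamp, one column per TimeSeries ID."""
    columns = sorted(series)  # string IDs sort lexicographically: "1", "10", "7"
    timestamps = sorted({ts for values in series.values() for ts in values})
    rows = [
        (ts, [series[col].get(ts) for col in columns])  # None marks a missing value
        for ts in timestamps
    ]
    return columns, rows


# Illustrative values taken from the sample output, with gaps added on purpose
cols, rows = to_wide_rows({
    "7": {1733282820000: 2506.0},
    "1": {1733282820000: 95680.0, 1733282880000: 95764.29},
    "10": {1733282880000: 3672.42},
})
print(cols)  # ['1', '10', '7']
for ts, values in rows:
    print(ts, values)
# 1733282820000 [95680.0, None, 2506.0]
# 1733282880000 [95764.29, 3672.42, None]
```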

For details on working with the returned DataFrames, see the pandas and Polars documentation.