
Helpful Notes on Tutela Documentation

This file collects notes and links we found helpful when developing and deploying Tutela on EC2. Tutela stores clusters in a PostgreSQL database.

Usage

Add the code to your Python path:

source init_env.sh
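If you need to set the path from inside Python instead (for example, in a notebook), here is a minimal sketch. It assumes that init_env.sh amounts to putting the repository root on the Python path. `add_repo_to_path` is a hypothetical helper and is not part of Tutela. Unlike sourcing a shell script, it only affects the current interpreter.

```python
import os
import sys


def add_repo_to_path(repo_root: str) -> str:
    """Hypothetical helper: prepend repo_root to sys.path for this process only."""
    root = os.path.abspath(repo_root)
    # Avoid duplicate entries if the helper is called more than once.
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Example: call from a script or notebook located at the repository root.
# add_repo_to_path(os.path.dirname(os.path.abspath(__file__)))
```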

Logs

[10.28.21] Added Redis to cache queries so we don't need to query Firebase repeatedly. However, this is only a band-aid, as it does not support fast querying.
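The Redis setup in the 10.28.21 entry follows the cache-aside pattern. The code checks the cache first and queries the backing store (here, Firebase) only on a miss, then saves the result. This also shows why it is only a band-aid: repeated identical queries become cheap, but every new query still pays the full backend cost.

The sketch below uses a small in-memory TTL dictionary as a hypothetical stand-in for Redis, so it runs without a server. `TTLCache`, `cached_query` and the fetch callable are illustrative names. With redis-py, the same flow would use `get` and `set(key, value, ex=ttl)`.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis: key -> (expiry time, value)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        # Expired entries behave like misses and are evicted lazily.
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def cached_query(cache: TTLCache, key: str, fetch: Callable[[str], Any]) -> Any:
    """Cache-aside: return the cached value, or fetch from the backend and cache it."""
    value = cache.get(key)
    if value is None:  # note: a backend result of None is never cached
        value = fetch(key)
        cache.set(key, value)
    return value
```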

[10.29.21] Switching to an SQL model for fast queries. This removes the need for Firebase and Redis.
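A minimal sketch of why an SQL model makes cluster lookups fast. With indexes on the address and the cluster id, finding every address in a given address's cluster is one indexed self-join instead of a scan.

The sketch uses Python's built-in sqlite3 so it runs anywhere. Tutela itself uses PostgreSQL. The `clusters` table and its columns are hypothetical and are not Tutela's actual schema.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_db(rows: Iterable[Tuple[str, int]]) -> sqlite3.Connection:
    """Create a hypothetical clusters table mapping each address to a cluster id."""
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY gives an index on address; a second index covers cluster_id.
    conn.execute("CREATE TABLE clusters (address TEXT PRIMARY KEY, cluster_id INTEGER NOT NULL)")
    conn.execute("CREATE INDEX idx_clusters_cluster_id ON clusters (cluster_id)")
    conn.executemany("INSERT INTO clusters (address, cluster_id) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def cluster_members(conn: sqlite3.Connection, address: str) -> List[str]:
    """Return all addresses sharing a cluster with `address` (empty if unknown)."""
    cur = conn.execute(
        "SELECT c2.address FROM clusters AS c1 "
        "JOIN clusters AS c2 ON c1.cluster_id = c2.cluster_id "
        "WHERE c1.address = ? ORDER BY c2.address",
        (address,),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_db([("0xa", 1), ("0xb", 1), ("0xc", 2)])
print(cluster_members(conn, "0xa"))  # ['0xa', '0xb']
```

With a PostgreSQL driver such as psycopg2, the `?` placeholders become `%s`.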

[11.9.21] It is important to keep uploading to the database separate from creating the cluster files. We also assume that the graph is small enough to fit in memory, though we have some doubts about this.

Computing clusters

The steps below compute "deposit address reuse" clusters. They process CSVs larger than 1 TB, so take care not to exceed the available storage and RAM.

  1. Download raw block and transaction data using scripts/dl_bucket.py.
  2. Sort the data with an external merge sort (see scripts/sort_big_csv.py).
  3. Run scripts/run_deposit.py to generate data.csv and metadata.csv.
  4. Prune data.csv -> data-pruned.csv using scripts/prune_data.py.
  5. Prune metadata.csv -> metadata-pruned.csv using scripts/metadata-pruned.csv.
  6. Run scripts/run_nx.py to generate user_clusters.json and exchange_clusters.json.
  7. Run combine_metadata.py to generate metadata-final.csv.
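Step 2 relies on an external merge sort because the CSVs are far larger than RAM. The sketch below illustrates the technique and is not necessarily what scripts/sort_big_csv.py does. It works in two phases:

1. Sort fixed-size chunks in memory and write each one to a temporary file.
2. Stream a k-way merge over those files with `heapq.merge`.

The function name, the single sort column, the header-row handling and the chunk size are all assumptions.

```python
import csv
import heapq
import os
import tempfile
from typing import List, Optional


def _write_sorted_chunk(rows: List[List[str]], key_col: int, tmp_dir: Optional[str]) -> str:
    """Sort one in-memory chunk and spill it to a temporary CSV file."""
    rows.sort(key=lambda r: r[key_col])
    fd, path = tempfile.mkstemp(suffix=".csv", dir=tmp_dir)
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def external_sort_csv(in_path: str, out_path: str, key_col: int = 0,
                      chunk_rows: int = 100_000, tmp_dir: Optional[str] = None) -> None:
    """Hypothetical external merge sort; assumes the first row is a header."""
    chunk_paths: List[str] = []
    # Phase 1: read at most chunk_rows rows at a time, sort, and spill to disk.
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        buf: List[List[str]] = []
        for row in reader:
            buf.append(row)
            if len(buf) >= chunk_rows:
                chunk_paths.append(_write_sorted_chunk(buf, key_col, tmp_dir))
                buf = []
        if buf:
            chunk_paths.append(_write_sorted_chunk(buf, key_col, tmp_dir))

    # Phase 2: k-way merge; only one row per chunk is held in memory at a time.
    files = [open(p, newline="") for p in chunk_paths]
    try:
        readers = [csv.reader(fh) for fh in files]
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            if header is not None:
                writer.writerow(header)
            writer.writerows(heapq.merge(*readers, key=lambda r: r[key_col]))
    finally:
        for fh in files:
            fh.close()
        for p in chunk_paths:
            os.remove(p)
```

Keys compare as strings here. For numeric columns such as block numbers, change both key functions to something like `int(r[key_col])`. The temporary chunks need roughly as much free disk as the input. Very small chunk sizes can also exhaust the open-file limit during the merge.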
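Steps 3 and 6 together apply the deposit address reuse heuristic. Here is a simplified, hypothetical sketch:

- An address that forwards funds to a known exchange address is treated as a deposit address.
- All users who send to the same deposit address are grouped together.
- Grouping is transitive, so the result is the set of connected components.

run_nx.py presumably computes these components with NetworkX. This sketch uses a stdlib union-find instead, so it runs without dependencies. A practical version would likely add further filters (for example, on amounts or timing) before labeling an address as a deposit. `deposit_reuse_clusters` and the (sender, receiver) input format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def deposit_reuse_clusters(transfers: Iterable[Tuple[str, str]],
                           exchanges: Set[str]) -> List[List[str]]:
    """Group user addresses that send to a shared deposit address (simplified)."""
    transfers = list(transfers)
    # 1. Deposit address: a non-exchange address that forwards funds to an exchange.
    deposits = {s for s, r in transfers if r in exchanges and s not in exchanges}
    # 2. Link each user to every deposit address it sends to.
    uf = UnionFind()
    for s, r in transfers:
        if r in deposits and s not in exchanges and s not in deposits:
            uf.union(s, r)
    # 3. Collect components, keeping only user addresses.
    groups: Dict[Hashable, Set[str]] = defaultdict(set)
    for addr in list(uf.parent):
        if addr not in deposits:
            groups[uf.find(addr)].add(addr)
    return sorted(sorted(g) for g in groups.values())
```

For example, if u1 and u2 both send to d1, and d1 forwards to exchange EX, then u1 and u2 end up in one cluster.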