The repo contains open-source code for Tutela, an anonymity tool for Ethereum and Tornado Cash users. For a more technical description, please refer to the public whitepaper.
In response to the Tornado Cash (TC) Anonymity Research Tools Grant, we have built Tutela, an Ethereum wallet anonymity detection tool, to tell you if your blockchain transactions have revealed anything about your identity. What does this mean? Well, for example, if you have used multiple Ethereum wallets to send tokens to a single centralized exchange deposit address, you may have revealed that your wallets are owned by the same entity.
We'd love to get user feedback! Tell us what you like, what you don’t and what you think is missing! Please leave your feedback in the Tutela-Product-Feedback channel of the Tornado Cash Discord.
Tornado Cash users have multiple addresses and use Tornado Cash to hide this fact. We believe the most important need for this user base is to know whether their addresses can already be connected by third parties. Conversely, for Tornado Cash, compromised transactions could reduce the size of the anonymity set for each token pool.
In response, Tutela has focused on informing users which of their Ethereum addresses are "affiliated" (a non-blockchain analogy would be haveibeenpwned.com) by parsing the Ethereum transaction graph. This involves two Ethereum-wide heuristics and five Tornado Cash-specific heuristics (i.e. reveals) that investigate transactions in and out of Tornado Cash pools.
Across all of Ethereum, we would like to cluster together addresses that likely belong to the same entity.
When you send tokens from an Ethereum wallet to your account at a centralized exchange, the exchange creates a unique deposit address for each customer. If you reuse the same deposit address by sending tokens to it from multiple Ethereum wallets, those wallets can be linked. Even if you send tokens from multiple wallets to multiple deposit addresses, all of these addresses can be linked. In this way, it is possible to build a complex graph of address relationships.
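The linking step described above can be sketched with a union-find (disjoint-set) structure. This is a minimal illustration, not the implementation in src/: it assumes the exchange deposit addresses have already been identified, and the names `UnionFind`, `cluster_by_deposit_reuse`, and the example addresses are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        # Walk up to the root, shortening the path along the way.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_deposit_reuse(transfers):
    """Group addresses linked through shared deposit addresses.

    transfers: iterable of (sender_wallet, exchange_deposit_address) pairs.
    Assumption: the second element is already known to be a deposit address.
    """
    uf = UnionFind()
    for sender, deposit in transfers:
        uf.union(sender, deposit)

    clusters = defaultdict(set)
    for address in list(uf.parent):
        clusters[uf.find(address)].add(address)
    return sorted(sorted(members) for members in clusters.values())


transfers = [
    ("wallet_1", "deposit_A"),
    ("wallet_2", "deposit_A"),  # wallet_1 and wallet_2 reuse deposit_A
    ("wallet_2", "deposit_B"),  # deposit_B joins the same cluster via wallet_2
    ("wallet_3", "deposit_C"),
]
clusters = cluster_by_deposit_reuse(transfers)
print(clusters)
```

Here `wallet_1`, `wallet_2`, `deposit_A`, and `deposit_B` end up in one cluster, while `wallet_3` and `deposit_C` form another. In practice the hard part is deciding which addresses are exchange deposit addresses, which this sketch takes as given.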
Every Ethereum address is mapped to a point in a high-dimensional vector space using a machine learning algorithm. For every Ethereum entity, the goal of the vector embedding is to summarize which addresses this entity interacts with the most. This is done by creating a large Ethereum graph where nodes represent addresses and edges represent transactions. For every node, a local subgraph is created through random walks, which is then featurized and put through Word2Vec.
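The random-walk step can be illustrated with a small standard-library sketch. It only shows how walks over a transaction graph might be generated. The featurization and Word2Vec training are left out, and treating the graph as undirected, the walk parameters, and the function names `build_graph` and `random_walks` are all assumptions rather than details from the whitepaper.

```python
import random
from collections import defaultdict


def build_graph(transactions):
    """Build an undirected adjacency map from (sender, receiver) pairs."""
    graph = defaultdict(set)
    for sender, receiver in transactions:
        graph[sender].add(receiver)
        graph[receiver].add(sender)
    return graph


def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Generate fixed-length random walks starting from every node."""
    rng = random.Random(seed)
    # Sorted neighbor lists keep the walks reproducible for a given seed.
    neighbors = {node: sorted(adjacent) for node, adjacent in graph.items()}
    walks = []
    for start in sorted(neighbors):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(neighbors[walk[-1]]))
            walks.append(walk)
    return walks


transactions = [
    ("0xA", "0xB"),
    ("0xB", "0xC"),
    ("0xC", "0xA"),
    ("0xC", "0xD"),
]
graph = build_graph(transactions)
walks = random_walks(graph)
```

Each walk is a sequence of addresses that can be treated like a sentence of words, so a Word2Vec implementation (for example the one in gensim) could then learn one vector per address from these sequences.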
If we focus on users interacting with Tornado Cash pools, we can apply different heuristics to link together deposit and withdraw transactions.
If a deposit transaction and a withdrawal transaction to a specific Tornado Cash pool share the same wallet address, then we assume the address is compromised (e.g. they may be a yield miner who does not care about anonymity), and should not add to the anonymity of future Tornado Cash transactions for that pool.
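As a rough sketch of this check, the address match reduces to a set intersection over (address, pool) pairs. The record layout (dicts with `address` and `pool` keys) and the function name `address_match` are assumptions made for illustration, not the format used in scripts/tornadocash.

```python
def address_match(deposits, withdrawals):
    """Return (address, pool) pairs that appear on both sides of a pool.

    deposits / withdrawals: iterables of dicts with 'address' and 'pool' keys,
    where 'address' is the depositor or the withdrawal recipient.
    """
    deposit_keys = {(d["address"], d["pool"]) for d in deposits}
    withdraw_keys = {(w["address"], w["pool"]) for w in withdrawals}
    return deposit_keys & withdraw_keys


deposits = [
    {"address": "0xA", "pool": "1 ETH"},
    {"address": "0xB", "pool": "1 ETH"},
]
withdrawals = [
    {"address": "0xA", "pool": "1 ETH"},    # same address, same pool: a match
    {"address": "0xB", "pool": "0.1 ETH"},  # same address, different pool: no match
]
compromised = address_match(deposits, withdrawals)
print(compromised)  # {('0xA', '1 ETH')}
```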
Pre-EIP-1559 Ethereum transactions contained a gas price. Users can set their wallet gas fee and pay a very specific gas price (e.g. 147.4535436 Gwei) when they deposit into a Tornado Cash pool. If they also withdraw from that same Tornado Cash pool using the same wallet application (e.g. MetaMask) but a different wallet address, and haven’t changed the gas price, it could reveal that the two addresses are connected.
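One way to picture this heuristic is to index deposits by (pool, gas price) and look up each withdrawal under the same key. The sketch below is illustrative only: the record layout and function names are hypothetical, and treating any gas price that is not a whole number of Gwei as "very specific" is an assumption of this example, not a rule stated by the authors.

```python
from collections import defaultdict

WEI_PER_GWEI = 10**9


def is_specific(gas_price_wei):
    # Assumption: whole-Gwei prices are too common to identify anyone.
    return gas_price_wei % WEI_PER_GWEI != 0


def gas_price_links(deposits, withdrawals):
    """Link depositors and withdrawers that share a pool and a specific gas price."""
    depositors_by_key = defaultdict(set)
    for d in deposits:
        if is_specific(d["gas_price_wei"]):
            depositors_by_key[(d["pool"], d["gas_price_wei"])].add(d["address"])

    links = set()
    for w in withdrawals:
        key = (w["pool"], w["gas_price_wei"])
        for depositor in depositors_by_key.get(key, ()):
            if depositor != w["address"]:
                links.add((depositor, w["address"]))
    return links


deposits = [
    {"address": "0xA", "pool": "1 ETH", "gas_price_wei": 147_453_543_600},
    {"address": "0xC", "pool": "1 ETH", "gas_price_wei": 100 * WEI_PER_GWEI},
]
withdrawals = [
    {"address": "0xB", "pool": "1 ETH", "gas_price_wei": 147_453_543_600},
    {"address": "0xD", "pool": "1 ETH", "gas_price_wei": 100 * WEI_PER_GWEI},
]
links = gas_price_links(deposits, withdrawals)
print(links)  # {('0xA', '0xB')}
```

The round 100 Gwei price is ignored, so only the pair sharing 147.4535436 Gwei is linked.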
If we observe a single address depositing multiple times to several pools and then a second address withdrawing the exact same amount from the same pools, it is likely that the two addresses belong to the same entity. For example, if Alice deposits 5 times to the 1 ETH pool, 3 times to the 100 DAI pool, and 4 times to the 0.1 ETH pool, and Bob withdraws identically, then Alice and Bob potentially are the same individual.
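The Alice and Bob example can be expressed by comparing per-address counts of pool interactions. This is a simplified sketch under stated assumptions: the record layout and function names are hypothetical, the `min_pools` threshold is an illustrative choice, and transaction timing (for instance, requiring withdrawals to follow deposits) is ignored.

```python
from collections import Counter, defaultdict


def pool_counts(records):
    """Map each address to a Counter of how often it used each pool."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["address"]][record["pool"]] += 1
    return counts


def multi_denomination_links(deposits, withdrawals, min_pools=2):
    """Link a depositor and a withdrawer whose per-pool counts are identical."""
    deposit_counts = pool_counts(deposits)
    withdraw_counts = pool_counts(withdrawals)
    links = set()
    for depositor, d_counts in deposit_counts.items():
        if len(d_counts) < min_pools:
            continue  # a single pool is too weak a signal (assumed threshold)
        for withdrawer, w_counts in withdraw_counts.items():
            if depositor != withdrawer and d_counts == w_counts:
                links.add((depositor, withdrawer))
    return links


pattern = {"1 ETH": 5, "100 DAI": 3, "0.1 ETH": 4}
deposits = [
    {"address": "alice", "pool": pool}
    for pool, n in pattern.items()
    for _ in range(n)
]
withdrawals = [
    {"address": "bob", "pool": pool}
    for pool, n in pattern.items()
    for _ in range(n)
]
withdrawals.append({"address": "carol", "pool": "1 ETH"})
links = multi_denomination_links(deposits, withdrawals)
print(links)  # {('alice', 'bob')}
```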
If address A deposits to a Tornado Cash pool and address B withdraws from the same pool, but we observe frequent interactions between addresses A and B outside of Tornado Cash (i.e. in general Ethereum transactions), it potentially indicates that addresses A and B are owned by the same entity.
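A minimal sketch of this idea counts direct transfers between each depositor and withdrawer of the same pool. The record layout, the function name `linked_address_pairs`, and the `min_interactions` threshold standing in for "frequent" are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def linked_address_pairs(deposits, withdrawals, external_transfers, min_interactions=2):
    """Link depositors and withdrawers of one pool that also transact directly.

    external_transfers: (sender, receiver) pairs for transactions outside Tornado Cash.
    """
    interactions = Counter()
    for sender, receiver in external_transfers:
        if sender != receiver:
            # frozenset makes the count independent of transfer direction.
            interactions[frozenset((sender, receiver))] += 1

    depositors_by_pool = defaultdict(set)
    for d in deposits:
        depositors_by_pool[d["pool"]].add(d["address"])

    links = set()
    for w in withdrawals:
        for depositor in depositors_by_pool.get(w["pool"], ()):
            pair = frozenset((depositor, w["address"]))
            if depositor != w["address"] and interactions[pair] >= min_interactions:
                links.add((depositor, w["address"]))
    return links


deposits = [{"address": "0xA", "pool": "1 ETH"}]
withdrawals = [
    {"address": "0xB", "pool": "1 ETH"},
    {"address": "0xC", "pool": "1 ETH"},
]
external_transfers = [("0xA", "0xB"), ("0xB", "0xA"), ("0xA", "0xC")]
links = linked_address_pairs(deposits, withdrawals, external_transfers)
print(links)  # {('0xA', '0xB')}
```

With the assumed threshold of two interactions, `0xA` and `0xB` are linked, while the single transfer between `0xA` and `0xC` is not enough.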
Anonymity mining is a clever way to incentivize users to participate in mixing. However, if users carelessly claim their Anonymity Points (AP) or TORN tokens, then they can reduce their anonymity set. For instance, if a user withdraws their earned AP tokens to a deposit address, then we can approximate the maximum time the user has left their funds in the mixing pool. This is because users can only claim AP and TORN tokens for deposits that have already been withdrawn.
Tutela is still in its very early stages and we are looking for feedback at all levels. Let us know your thoughts, critiques, and suggestions in the Tutela-Product-Feedback channel of the Tornado Cash Discord. How can we make Tutela something useful for you? What features or heuristics are we missing?
Ethereum and Tornado Cash transactions are downloaded using BigQuery. The deposit address reuse algorithm was adapted from the existing implementation in etherclust. Our Python implementation can be found in src/; it is written to scale to the more than 1 TB of Ethereum data. The Tornado-specific heuristics can be found in scripts/tornadocash, again written in Python. The Tutela web application lives in webapp/ and is written in Flask with a PostgreSQL database for storing clusters. The frontend is written in JavaScript, HTML, and CSS.
We aim to provide consistent updates over time as we improve Tutela.
Development of the web application and clustering was done by mhw32, kkailiwang, Tiggy560, and nickbax, with support from Convex Labs. Development of TCash heuristics was done by seresistvanandras, unbalancedparentheses, tomasdema, entropidelic, HermanObst, and pefontana.