  1. title: Possible routes for distributed anti-abuse systems
  2. date: 2017-04-04 18:00
  3. author: Christine Lemmer-Webber
  4. tags: federation, anti-abuse
  5. slug: possible-distributed-anti-abuse
  6. ---
  7. <p>
  8. I work on federated standards and systems, particularly
  9. <a href="https://www.w3.org/TR/activitypub/">ActivityPub</a>.
  10. Of course, if you work on this stuff, every now and then the question
  11. of "how do you deal with abuse?" very rightly comes up.
  12. Most recently <a href="https://mastodon.social/">Mastodon</a> has gotten
  13. some attention, which is great!
  14. But of course, people are raising the question,
  15. <a href="https://brian.mastenbrook.net/blog/2017/federating-failure.html">can federation systems really protect people from abuse</a>?
  16. (It's not the first time this has come up either; at LibrePlanet in 2015 a
  17. number of us held a "social justice for federated free software systems"
  18. dinner and were discussing things then.)
  19. It's an important question to ask, and I'm afraid the answer is,
  20. "not reliably yet".
  21. But in this blogpost I hope to show that there may be some hope for
  22. the future.
  23. </p>
  24. <p>A few things I think you want out of such a system:</p>
  25. <ul>
  26. <li>
  27. It should actually be decentralized.
  28. It's possible to run a mega-node that everyone screens their content
  29. against, but then what's the point?
  30. </li>
  31. <li>
  32. The most important thing is for the system to prevent attackers from
  33. being able to deliver hateful content.
  34. An attack in a social system means getting your message across, so
  35. that's what we don't want to happen.
  36. </li>
  37. <li>
  38. But who are we protecting, and against what?
  39. It's difficult to know, because even very progressive groups often don't
  40. anticipate who they need to protect; "social justice" groups of the past
  41. are often exclusionary against other groups until they find out they need
  42. to be otherwise (e.g., in each of these important social movements, some
  43. prominent members have had problems including other social justice groups:
  44. racist suffragists, civil rights activists exclusionary against gay and
  45. lesbian groups, gay and lesbian groups exclusionary against transgender
  46. individuals...).
  47. The point is: if we haven't gotten it all right in the past, we might not
  48. get it all right in the present, so the most important thing is to allow
  49. communities to protect themselves from hate.
  50. </li>
  51. </ul>
  52. <p>
  53. Of course, keep in mind that <em>no</em> technology system is going
  54. to be perfect; these are all imperfect tools for mitigation.
  55. But what technical decisions you make do also affect who is
  56. empowered in a system, so it's also still important to work on
  57. these, though none of them are panaceas.
  58. </p>
  59. <p>
  60. With those core bits down, what strategies are available?
  61. There are a few I've been paying close attention to
  62. (keep in mind that I am an expert in zero of these routes at present):
  63. </p>
  64. <ul>
  65. <li>
  66. <strong>Federated Blocklists:</strong> The easiest "starter" route.
  67. And good news!
  68. If you're using the
  69. <a href="https://www.w3.org/TR/activitypub/">ActivityPub standard</a>,
  70. there's
  71. <a href="https://www.w3.org/TR/activitypub/">already a Block activity</a>,
  72. and you could build up group-moderated collections of people to block.
  73. A decent first step, but I don't think it gets you very far; for one thing,
  74. being the maintainer of a public blocklist is a risky activity;
  75. trolls might use that information to attack you.
  76. That and merging/squashing blocklists might be awkward in this system.
  77. </li>
  78. <li>
  79. <strong>Federated reputation systems:</strong>
  80. You could also take it a step further by using something like the
  81. <a href="https://www.stellar.org/how-it-works/stellar-basics/explainers/">Stellar consensus protocol</a>
  82. (more info in <a href="https://www.stellar.org/papers/stellar-consensus-protocol.pdf">paper form</a>
  83. or even
  84. <a href="https://www.stellar.org/stories/adventures-in-galactic-consensus-chapter-1/">a graphic novel</a>).
  85. Stellar is a cryptographically signed ledger. Okay, yes, that makes it a
  86. kind of blockchain (which will make some people's eyes glaze over, but
  87. technically a signed git repository is also a blockchain), but it's not
  88. necessarily restricted to use of cryptocurrencies... you can track any
  89. kinds of transactions with it.
  90. Which means we could also track blocklists, or even less binary
  91. reputation systems! But what's most interesting about Stellar is that
  92. it's also federated... and in this case, federation means you can
  93. <em>choose</em> what groups you trust... but due to math'y concepts that
  94. I occasionally totally get upon being explained to me and then forget the
  95. moment someone asks me to explain to someone else, consensus is still
  96. enforced within the "slices" of groups you are following.
  97. You can imagine maybe the needs of an LGBT community and a Furry
  98. community might overlap, but they might not be the same, and maybe you'd
  99. be subscribed to just one or both, or neither.
  100. Or pick your other social groups, go wild.
  101. That said, I'm not sure how to make these "transactions" not public in
  102. this system, so it's very out there in the open, but since there's a
  103. voting system built-in maybe particular individuals won't be as liable
  104. for being attacked as individuals maintaining a blocklist are.
  105. Introducing a sliding-scale "social reputation system" may also introduce
  106. other dangerous problems, though I think Stellar's design is probably the
  107. least dangerous of all of these since it probably will still keep abusers
  108. out of a particular targeted group, but will still give groups that are
  109. marginalized but not recognized by larger groups avenues to set up
  110. their own slices as well.
  111. </li>
  112. <li>
  113. <strong>"Charging" for distributing messages:</strong>
  114. Hoo boy, this one's going to be controversial!
  115. This was suggested to me by someone smart in the whole distributed
  116. technology space.
  117. It's not necessarily what we would normally consider real money that
  118. would be charged to distribute things... it could be a kind of "whuffie"
  119. cryptocurrency that you have to pay.
  120. Well, the upside to this is that it would keep low-funded abusers out of a
  121. system... the downside is that you've now basically powered your
  122. decentralized social network through pay-to-play capitalism.
  123. Unfortunately, even if the cryptocurrency is just some "social media fun
  124. money", imaginary currencies have a way of turning into real currencies;
  125. see paying for in-game currency in any massively multiplayer game ever.
  126. I don't think this gives us the power dynamics we want in our system, but
  127. it's worth noting that "it's one way to do it"... with serious side
  128. effects.
  129. </li>
  130. <li>
  131. <strong>Web of trust / Friend of a Friend networks:</strong>
  132. Well researched in crypto systems, though nobody's built really good
  133. UIs for them.
  134. Still, a lot of potential if the system were somehow made friendly and
  135. didn't require showing up to a nerd-heavy "key-signing party"...
  136. if the system could have marking who you trust and who you don't (and not
  137. just in terms of verifying keys) built as an elegant part of the UI,
  138. then yes I think this could be a good component for recognizing who you
  139. might allow to send you messages.
  140. There are also risks in having these associations be completely public,
  141. though I think web of trust systems don't necessarily have to be
  142. public... you can recurse outward from the individuals you do already
  143. know.
  144. (<i>Edit:</i> My friend <a href="http://www.draketo.de/">ArneBab</a>
  145. suggests that looking at how Freenet handles its web of trust
  146. would be a good starting point for someone wishing to research
  147. this.
  148. I have 0 experience with Freenet, but
  149. <a href="https://github.com/freenet/wiki/wiki/Web-Of-Trust">here</a> are
  150. <a href="https://emu.freenetproject.org/pipermail/devl/2016-April/038916.html">some</a>
  151. <a href="http://freesocial.draketo.de/fms_en.html">resources</a>.)
  152. </li>
  153. <li>
  154. <strong>Distributed recommendation systems:</strong>
  155. Think of recommender systems in
  156. (sorry for the centralized system references)
  157. Amazon, Netflix, or any of the major social networks
  158. (Twitter, Facebook, etc).
  159. Is there a way to tell if someone or some message may be relevant to you,
  160. depending on who else you follow? Almost nobody seems to be doing research
  161. here, but not quite nobody; here's one paper:
  162. <a href="https://people.eecs.berkeley.edu/%7Ejfc/'mender/IEEESP02.pdf">Collaborative Filtering with Privacy</a>.
  163. Would it work?
  164. I have no idea, but the paper's title sure sounds compelling.
  165. (<i>Edit:</i>
  166. ArneBab also points out that
  167. <a href="http://credence-p2p.org/">credence-p2p</a> might also be useful
  168. to look at.
  169. <a href="http://credence-p2p.org/paper.html">Relevant papers here.</a>)
  170. </li>
  171. <li>
  172. <strong>Good ol' Bayesian filtering:</strong>
  173. Unfortunately, I think that there are too many alternate routes of attack
  174. for just processing a message's statistical contents to be good enough,
  175. though I think it's probably a good component of an anti-abuse system.
  176. In fact, maybe we should be talking about solutions that can use multiple
  177. components, and be very adaptive...
  178. </li>
  179. <li>
  180. <strong>Distributed machine learning sets:</strong>
  181. Probably way too computationally expensive to run in a decentralized
  182. network, but maybe I'm wrong.
  183. Maybe this can be done in the right way, but I get the impression
  184. that without the training dataset it's probably not useful?
  185. Prove me wrong!
  186. But I also just don't know enough about machine learning.
  187. Has the right property of being adaptive, though.
  188. </li>
  189. <li>
  190. <strong>Genetic programs:</strong>
  191. Okay, I hear you saying,
  192. "what?? genetic programming?? as in programs that evolve?"
  193. It's a field of study that has quite a bit of research behind it,
  194. but very little application in the real world... but it might be a good
  195. basis for filtering systems in a federated network
  196. (I'm beginning to explore this but I have no idea if it will bear fruit).
  197. Programs might evolve on your machine and mine which adapt to the changing
  198. nature of social attacks.
  199. And best of all, in a distributed network, we might be able to send our
  200. genetic anti-abuse programs to each other... and they could breed and make
  201. new anti-abuse baby programs!
  202. However, for this to work the programs would have to carry part of the
  203. information of their "experiences" from parent to child.
  204. After all, a program is not very likely to randomly stumble onto the fact
  205. that a hateful group has started using "cuck" as a slur.
  206. But programs keep information around while they run, and it's possible that
  207. parent programs could teach wordlists and other information to their
  208. children, or to other programs.
  209. And if you already have a trust network, your programs could propagate their
  210. techniques and information with each other.
  211. (There's a risk of a side channel attack though: you might be able to find
  212. some of the content of information sent/received by checking the wordlists,
  213. etc. being passed around by these programs.)
  214. (You'd definitely want your programs sandboxed if you took this route,
  215. and I think it would be good for filtering only... if you expose output
  216. methods, your programs might start talking on the network, and who knows
  217. what would happen!)
  218. One big upside to this is that if it worked, it <em>should</em> work in a
  219. distributed system... you're effectively bringing the
  220. anti-abuse hamster cages together now and then.
  221. However, you do get into an ontology problem... if these programs are
  222. making up wordlists and binding them to generated symbols, you're
  223. effectively generating a new language.
  224. That's not too far from human-generated language, and so at that point
  225. you're talking about a computer-generated natural language... but I think
  226. there may be evolutionary incentive to agree upon terms.
  227. Setting up the "fitness" of the program (same with the machine learning
  228. route) would also have to involve determining what filtering is useful /
  229. isn't useful to the user of the program, and that's a whole challenging
  230. problem domain of its own (though you could start with just manually
  231. marking correct/incorrect the way people train their spam filters with
  232. spam/ham).
  233. But... okay by now this sounds pretty far-fetched, I know, but I think it
  234. has some promise... I'm beginning to explore it with a derivative of some
  235. of the ideas from
  236. <a href="http://faculty.hampshire.edu/lspector/push.html">PushGP</a>.
  237. I'm not sure if any of these ideas will work, but I think this route is the
  238. most entertainingly exciting and crazy at the same time.
  239. (On another side, I also think there's an untapped potential for
  240. roguelike AI that's driven by genetic algorithms...)
  241. There's definitely one huge downside to this though, even if it
  242. <em>was</em> effective (the same problem machine learning groups have)...
  243. the programs would be nearly unreadable to humans!
  244. Would this really be the only source of information you'd want to trust?
  245. </li>
  246. <li>
  247. <strong>Expert / constraint based systems:</strong>
  248. Everyone's super into "machine learning" based systems right now, but
  249. it's hard to tell what on earth those systems are <em>doing</em>, even
  250. when their results are impressive (not far off from genetic algorithms,
  251. as above! but genetic algorithms may not require the same crazy large
  252. centralized datasets that machine learning systems tend to).
  253. Luckily there's a whole other branch of AI involving "expert systems" and
  254. "symbolic reasoning", etc.
  255. The most promising of these I think is the
  256. <a href="https://groups.csail.mit.edu/mac/users/gjs/propagators/">
  257. propagator model</a> by Sussman / Radul / and many others
  258. (if you've seen the constraint system in SICP, this is a grandchild of
  259. that design).
  260. One interesting thing about the propagator model is that it can come to
  261. conclusions from exploring many different sources, and it can <em>tell
  262. you how</em> it came to those conclusions.
  263. These systems are incredible and under-explored, though there's a catch:
  264. usually they're hand-wired, or the rules are added manually (which is
  265. partly how you can tell where the conclusions came from, since the
  266. symbols for those sources may be labeled by a human... but who knows,
  267. maybe there's a way to map a machine's concept of some term to a human's
  268. anyway).
  269. I think this probably won't be adaptive enough for the fast-changing
  270. world of different attack structures... but! but! we've explored a lot of
  271. other ideas above, and maybe you have some <em>combination</em> of a
  272. reputation system, and a genetic programming system, etc., and this
  273. branch of study could be a great route to glue those very differing
  274. systems together and get a sense of what may be safe / unsafe from
  275. different sources... and at least understand how each source, on its
  276. macro level, contributed to a conclusion about whether or not to trust a
  277. message or individual.
  278. </li>
  279. </ul>
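<p>
To make the federated blocklist route above a little more concrete, here is a
minimal sketch of how group-moderated Block activities might be merged.
The <code>Block</code> type with <code>actor</code> and <code>object</code>
fields comes from the ActivityStreams vocabulary that ActivityPub uses; the
helper names, the <code>min_votes</code> threshold, and the example URLs are
all assumptions made up for illustration, not part of any standard.
</p>

```python
from collections import Counter


def block_activity(actor, target):
    """Build a minimal ActivityStreams-style Block activity as a dict."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Block",
        "actor": actor,
        "object": target,
    }


def merge_blocklists(blocklists, min_votes=2):
    """Return targets blocked by at least `min_votes` distinct actors.

    `blocklists` is a list of lists of Block activities (hypothetical
    layout: one inner list per moderator or per server).
    """
    votes = Counter()
    seen = set()
    for activities in blocklists:
        for activity in activities:
            if activity.get("type") != "Block":
                continue  # ignore anything that is not a Block
            pair = (activity["actor"], activity["object"])
            if pair in seen:
                continue  # count each actor's block of a target only once
            seen.add(pair)
            votes[activity["object"]] += 1
    return sorted(target for target, n in votes.items() if n >= min_votes)
```

<p>
Requiring more than one moderator to agree is only one possible merge policy;
it spreads responsibility a little, but it does not address the risk that
publishing such lists exposes their maintainers.
</p>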
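<p>
The web of trust idea of recursing outward from the individuals you already
know could be sketched roughly like this.
This is a toy model under loud assumptions: trust is a number between 0 and 1,
derived trust is the product of the direct trust values along the best path,
and the whole graph is available locally.
None of that is how any particular existing system (Freenet included) is
known to work; the function names and the threshold are hypothetical.
</p>

```python
def propagate_trust(graph, me, max_depth=3):
    """Derive trust scores by recursing outward from `me`.

    `graph` maps a person to {peer: direct_trust}, with trust in [0, 1].
    A peer's derived score is the best product of direct trust values
    along any path of at most `max_depth` hops starting at `me`.
    """
    scores = {me: 1.0}
    frontier = [me]
    for _ in range(max_depth):
        updates = {}
        for person in frontier:
            for peer, direct in graph.get(person, {}).items():
                candidate = scores[person] * direct
                best_so_far = max(scores.get(peer, 0.0), updates.get(peer, 0.0))
                if candidate > best_so_far:
                    updates[peer] = candidate
        # Apply updates only between rounds so each round adds exactly one hop.
        scores.update(updates)
        frontier = list(updates)
    del scores[me]
    return scores


def may_message(scores, sender, threshold=0.3):
    """Decide whether `sender` is trusted enough to deliver a message."""
    return scores.get(sender, 0.0) >= threshold
```

<p>
Because the walk starts from your own contacts, nothing here requires the
full graph to be public; in a real network each hop would presumably be a
query to a peer rather than a dictionary lookup.
</p>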
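<p>
For the Bayesian filtering route, and the spam/ham style of manual marking
mentioned under genetic programs, a bare-bones naive Bayes classifier might
look like the following.
It is a sketch only: the class name, the two labels, whitespace tokenizing,
and add-one smoothing are all choices made for illustration, and as noted
above, word statistics alone are unlikely to be enough.
</p>

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny two-class naive Bayes text filter (labels: "abuse" and "ok")."""

    LABELS = ("abuse", "ok")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one manually marked message."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def abuse_probability(self, text):
        """Estimate the probability that `text` belongs to the "abuse" class."""
        vocab = set(self.word_counts["abuse"]) | set(self.word_counts["ok"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior, so a label with no training data is not impossible.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            score = math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing, with one extra slot for unseen words.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into a normalized probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["abuse"] / (weights["abuse"] + weights["ok"])
```

<p>
Each user (or community) would train their own instance, which fits the goal
of letting communities decide for themselves what they need protection from.
</p>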
  280. <p>
  281. Okay, well that's it I think!
  282. Those are all the routes I've been thinking about.
  283. None of these routes are proven, but I hope that gives some evidence that
  284. there are avenues worth exploring... and that there is likely hope for
  285. the federated web to protect people... and maybe we could even do it
  286. better than the silos.
  287. After all, if we <em>could</em> do filtering as well as the big orgs,
  288. even if it were just at or nearly at the same level (which isn't as good
  289. as I'd like), that's already a win: it would mean we could protect
  290. people, and also preserve the autonomy of marginalized groups... who
  291. aren't very likely to be well protected by centralized regimes if push
  292. really does come to shove.
  293. </p>
  294. <p>
  295. I hope that inspires some people!
  296. If you have other routes that should be added to this list or you're
  297. exploring or would like to explore one of these directions, please
  298. <a href="https://dustycloud.org/contact/">contact me</a>.
  299. Once the <a href="https://www.w3.org/wiki/Socialwg/">W3C Social Working
  300. Group</a> wraps up, I'm to be co-chair of the following Social Community
  301. Group, and this is something we want to explore there.
  302. </p>
  303. <p>
  304. <i>Update:</i> I'm happy to see that
  305. <a href="https://news.ycombinator.com/item?id=14038700">the Matrix folks
  306. also see this</a> as "the single biggest existential threat" and
  307. "a problem that the whole decentralised web community has in common"...
  308. apparently they already have been looking at the Stellar approach.
  309. More from their
  310. <a href="https://matrix.org/blog/wp-content/uploads/2017/02/2017-02-04-FOSDEM-Future.pdf">
  311. FOSDEM talk slides</a>.
  312. I agree that this is a problem facing the whole decentralized web, and
  313. I'm glad / hopeful that there's interest in working together.
  314. Now's a good time to be implementing and experimenting!
  315. </p>