outages.tmpl

{{template "base/head" .}}
<div id="outages-div">
<div class="ui middle very relaxed page grid">
<div id="outages-links-div">
<h2> Outages </h2>
<ul>
<li><a href="{{AppSubUrl}}/outages#about">About this page</a></li>
<li><a href="{{AppSubUrl}}/outages#2016-08-31">2016-08-31</a></li>
<li><a href="{{AppSubUrl}}/outages#2016-01-31">2016-01-31</a></li>
<li><a href="{{AppSubUrl}}/outages#2015-07-18">2015-07-18</a></li>
<li><a href="{{AppSubUrl}}/outages#2015-04-30">2015-04-30</a></li>
<li><a href="{{AppSubUrl}}/outages#2015-04-20">2015-04-20</a></li>
<li><a href="{{AppSubUrl}}/outages#2015-04-17">2015-04-17</a></li>
<li><a href="{{AppSubUrl}}/outages#2015-03-12">2015-03-12</a></li>
</ul>
</div>
</div>
<div class="ui middle very relaxed page grid">
<div id="outages-list-div">
<a name="about"></a>
<h1>About this page</h1>
<p>Occasionally you'll see <a href="{{AppSubUrl}}/error/error.html">this</a> fellow on your visit to our site. Whenever this happens, we've made a mistake. This page is here to help you understand what went wrong and (hopefully) give you a sense that we're constantly improving. If you find any problems using the site, please report a bug <a href="{{AppSubUrl}}/hp/gogs/issues">here</a>.</p>
<p>We hope you won't have to refer to this page too often. :)</p>
<br>
<a name="2016-08-31"></a>
<h1>2016-08-31</h1>
<h2> What happened </h2>
<p> NotABug.org was completely unreachable for most of the 31st of August and the 1st of September. The frontend was not reachable, and git pulls and pushes were not possible.</p>
<br>
<h2> What caused it </h2>
<p> Due to a bad DIMM and non-ECC RAM in the server, the filesystem of the main Gogs repository became corrupted and had to be restored from backup. No data was lost, however.</p>
<br>
<h2> How has this been addressed </h2>
<p>
<ul>
<li> A new server with ECC RAM was purchased. </li>
<li> All data was successfully restored from backups. </li>
</ul>
</p>
<br>
<a name="2016-01-31"></a>
<h1>2016-01-31</h1>
<h2> What happened </h2>
<p> NotABug.org was completely unreachable for most of the 31st of January and the 1st of February. The frontend was not reachable, and git pulls and pushes were not possible.</p>
<br>
<h2> What caused it </h2>
<p> Due to a drive failure in the array, the filesystem housing the Git repositories went into read-only mode. During recovery, an administrator error accidentally deleted the root filesystems of both the hypervisor and the NotABug.org VM. Because of this error, the server's SSH keys were changed. The new fingerprints are:
<ul>
<li> 1024 SHA256:3Gfd/hCr2FAnV34U/LKO93m14yD7JnJOfZPub2sK7No (DSA) </li>
<li> 256 SHA256:tv7X1j2jOYPgztNC+3KYh2E751I01Y7vHWS83uax8jQ (ECDSA) </li>
<li> 256 SHA256:BtR8pJSuIPDCUT5DJ0kk/Sxspp9dgWf5p3EE0OUQe3g (ED25519) </li>
<li> 2048 SHA256:C3P4FLq03YMBgA0ipZrw8AvYLQsq54I1lYUPhPKI5iE (RSA) </li>
<li> 1024 MD5:c3:27:ea:d8:08:52:f7:c5:5a:68:4d:37:c3:50:32:35 (DSA) </li>
<li> 256 MD5:44:1a:ae:46:fe:c9:a7:f9:99:77:21:56:bc:02:47:de (ECDSA) </li>
<li> 256 MD5:fe:ec:8b:51:90:05:bc:3f:dd:21:94:31:46:12:72:d6 (ED25519) </li>
<li> 2048 MD5:b2:de:dc:04:31:c4:18:85:10:ee:c9:d7:e4:60:57:7d (RSA) </li>
</ul>Additionally, all the avatars were lost. No repository data was lost, however.</p>
<br>
<h2> How has this been addressed </h2>
<p>
<ul>
<li> The systems were reinstalled and restored from backup where possible. </li>
</ul>
</p>
<br>
<a name="2015-07-18"></a>
<h1>2015-07-18</h1>
<h2> What happened </h2>
<p> The NotABug.org web front-end was throwing 500 errors for most of the day. It seems git pushes also did not work.</p>
<br>
<h2> What caused it </h2>
<p> Someone flooded the Gogs install with thousands of new user registrations. We were effectively suffering a denial-of-service attack.</p>
<br>
<h2> How has this been addressed </h2>
<p>
<ul>
<li> The Gogs VM was restarted. </li>
<li> The new users were deleted. </li>
</ul> More work still needs to be done to make this type of attack less likely to succeed in the future. No concrete steps have been taken in this direction yet.
</p>
<br>
<br>
<a name="2015-04-20"></a>
<h1>2015-04-20</h1>
<h2> What happened </h2>
<p> The NotABug.org web front-end was down between 19:00 and 19:30 CEST. It was impossible to view any of the NotABug.org web content. Git pulls and pushes were unaffected.</p>
<br>
<h2> What caused it </h2>
<p> The Gogs software crashed, and the monitoring software responsible for restarting it automatically was still faulty. We now know that the problem was that the Gogs daemon expects to be able to read the current user ID from the $USER environment variable. This only works if the init script is run by a shell running in bash mode. The init script was called with /bin/sh when running non-interactively (during startup and during the monitor cron job). </p>
<br>
<h2> How has this been addressed </h2>
<p>
<ul>
<li> The init script has been adjusted so that it explicitly sets the $USER variable. </li>
</ul>
</p>
<br>
<a name="2015-04-17"></a>
<h1>2015-04-17</h1>
<h2> What happened </h2>
<p> The NotABug.org web front-end was down between 17:00 and 19:00 CEST. It was impossible to view any of the NotABug.org web content. Git pulls and pushes were unaffected.</p>
<br>
<h2> What caused it </h2>
<p> The Gogs software crashed, and the monitoring software responsible for restarting it automatically was faulty, causing a restart loop.</p>
<br>
<h2> How has this been addressed </h2>
<p>
<ul>
<li> The monitoring software has been adjusted so that a restart loop should be less likely. </li>
<li> Investigations are ongoing to find the root cause of the crash. </li>
</ul>
</p>
<br>
<a name="2015-03-12"></a>
<h1>2015-03-12</h1>
<h2> What happened </h2>
<p> The NotABug.org web front-end was down between 17:00 and 18:00 CEST. It was impossible to view any of the NotABug.org web content. Git pulls and pushes were unaffected.</p>
<br>
<h2> What caused it </h2>
<p> We currently believe that a bad merge of one of Gogs' dependencies caused a problem with the application. At the time, the only person who could fix the problem was off at a party. There was no monitoring. Additionally, the error page offered no recourse, making it seem like we had just gone away.</p>
<br>
<h2> How has this been addressed </h2>
<p>
<ul>
<li> The dependencies of Gogs were reverted to known-good versions. </li>
<li> More people were given access to troubleshoot the Gogs instance for NotABug.org. </li>
<li> Basic monitoring was put in place, which will be improved in the future. </li>
<li> A <a href="{{AppSubUrl}}/error/error.html">better error page</a> was created with contact information. </li>
</ul>
</p>
<br>
</div>
</div>
</div>
{{template "base/footer" .}}