1
yangqi 2017-02-01 10:04:14 +08:00 1
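This is hilarious: 5 backup methods and not one of them backed up correctly. Nobody checks these day to day, and out of a whole team nobody realized how important backups are.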
|
2
dzxx36gyy 2017-02-01 10:04:32 +08:00 via Android 1
Pfft, hahaha
|
3
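ZE3kr OP
I was also surprised when I read "5 backup/replication techniques deployed none are working reliably or set up in the first place". At the very least, I regularly check whether my backups have been updated and whether they have any size (a misconfiguration often leaves a backup empty).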
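A minimal sketch of that kind of routine check, for illustration only. The function name `check_backup` and both thresholds (24 hours, 1 KiB) are assumptions made up for this example, not anything from GitLab's setup; it only looks at modification time and file size, so it would catch a stale or near-empty backup but says nothing about whether the backup can actually be restored.

```python
import time
from pathlib import Path


def check_backup(path, max_age_hours=24.0, min_size_bytes=1024):
    """Return a list of problems found with the backup file at `path`.

    Both thresholds are hypothetical defaults; tune them to your schedule.
    """
    p = Path(path)
    if not p.is_file():
        return [f"{p}: backup file is missing"]

    problems = []
    stat = p.stat()

    # Has the backup been updated recently enough?
    age_hours = (time.time() - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(
            f"{p}: last modified {age_hours:.1f} h ago, older than {max_age_hours} h"
        )

    # Does the backup have a plausible size (not empty, not a few bytes)?
    if stat.st_size < min_size_bytes:
        problems.append(
            f"{p}: only {stat.st_size} bytes, below the {min_size_bytes}-byte minimum"
        )

    return problems
```

Run from cron or a monitoring job, an empty list means both checks passed; anything else should raise an alert rather than just be logged.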
|
4
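ZE3kr OP
I started using the public GitLab.com service precisely because maintaining GitLab myself was too much trouble. I never expected their public service to be the one that broke...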
|
5
EPr2hh6LADQWqRVH 2017-02-01 10:27:40 +08:00 via Android
Worrying times, Ruby folks
|
6
AstroProfundis 2017-02-01 10:30:31 +08:00
That final conclusion is really brutal to read...
|
7
deleted 2017-02-01 10:36:54 +08:00 via Android
gitlab is in trouble…
|
8
wzxjohn 2017-02-01 10:46:35 +08:00
Shocked... Good thing I run a self-hosted instance...
|
9
Havee 2017-02-01 11:02:50 +08:00
Yikes...
|
10
Unknwon 2017-02-01 11:07:24 +08:00 2
|
11
DoraJDJ 2017-02-01 11:13:32 +08:00 via Android
rm -rf / strikes again
Good thing I moved to Coding Pages ahead of time
|
13
neilp 2017-02-01 11:29:57 +08:00
All 5 approaches actually failed.
That really takes some doing
|
14
irainsoft 2017-02-01 11:34:06 +08:00
All 5 not working is quite something, and nobody noticed for this long....
|
15
Sharuru 2017-02-01 11:55:29 +08:00
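Of the five backups, apart from the last one (S3 storage, which was not set up correctly), the rest all fell short because the backup interval was too long (24 hours).
Backups are the last resort for getting a system back to a usable state after a failure; if what you want is fast recovery after a failure, that is the job of HA.
|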
16
ZE3kr OP |
17
yangqi 2017-02-01 12:02:05 +08:00
@Sharuru The only usable 24-hour backup is the LVM snapshot; none of the others work.
"2. Regular backups seem to also only be taken once per 24 hours, though YP has not yet been able to figure out where they are stored. According to JN these don't appear to be working, producing files only a few bytes in size."
"3. Disk snapshots in Azure are enabled for the NFS server, but not for the DB servers."
"4. The synchronisation process removes webhooks once it has synchronised data to staging."
"5. The replication procedure is super fragile, prone to error, relies on a handful of random shell scripts, and is badly documented"
|
18
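ZE3kr OP
Speaking of which, I recently lost a very important code file myself. Just as I was panicking and wondering whether I would have to rewrite it all, I found I had a snapshot from one day earlier. Instant relief.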
|
19
DaCong 2017-02-01 12:07:00 +08:00 via Android
No wonder downloading the source code from gitlab kept failing when I was updating a package on the AUR today.
|
20
matrix67 2017-02-01 12:09:41 +08:00
The good news: This incident affected the database (including issues and merge requests) but not the git repo's (repositories and wikis).
|
21
matrix67 2017-02-01 12:12:22 +08:00
The main cause was quick fingers.
YP thinks that perhaps pg_basebackup is being super pedantic about there being an empty data directory, decides to remove the directory. After a second or two he notices he ran it on db1.cluster.gitlab.com, instead of db2.cluster.gitlab.com
2017/01/31 23:27 YP - terminates the removal, but it's too late. Of around 310 GB only about 4.5 GB is left - Slack
And then they wanted to try recovering with the method below. Were they googling it on the spot? Surely not.
JEJ: Probably too late, but isn't it sometimes possible if you make the disk read-only quickly enough? Also might still have file descriptor if the file was in use by a running process according to http://unix.stackexchange.com/a/101247/213510
Azure sure deletes fast. It reminds me of that Adobe meme. Better put some Adobe on the Mac to calm the nerves.
Also, Azure is apparently also really good in removing data quickly, but not at sending it over to replicas.
|
22
wanghanlin 2017-02-01 13:07:11 +08:00
Is nobody paying attention to this bit?
Removed a user for using a repository as some form of CDN, resulting in 47 000 IPs signing in using the same account
|
23
nomaka 2017-02-01 14:49:06 +08:00
The incident affected the database (including issues and merge requests) but not the git repo's (repositories and wikis).
Everything kept in file storage is still there.
|
24
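ZE3kr OP
@nomaka That doesn't feel very useful. People usually have the git data locally anyway, and it is exactly the things they don't have locally that got lost, so this looks doomed (💊). The database also holds user accounts, all kinds of authentication data, and so on.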
|
26
syhsyh9696 2017-02-01 20:43:21 +08:00
@avastms That makes no sense. This is a database problem, so why say Ruby is in trouble? No need to bash it like that.
|
27
EPr2hh6LADQWqRVH 2017-02-01 22:20:08 +08:00
@syhsyh9696 They are too restless. A solid, down-to-earth company would not run into this problem.
|
28
okampfer 2017-02-02 09:08:05 +08:00
|
29
ZE3kr OP
@okampfer gitlab.com, of course
|
30
stevenkang 2017-02-03 10:28:27 +08:00
@okampfer code.aliyun.com. Mainly because it is fast, stable, and reliable (at least compared with CSDN).
|