{"id":1715,"date":"2014-12-06T16:07:42","date_gmt":"2014-12-06T15:07:42","guid":{"rendered":"http:\/\/denbeke.be\/blog\/?p=1715"},"modified":"2015-07-31T14:13:46","modified_gmt":"2015-07-31T13:13:46","slug":"backup-your-databases-in-git","status":"publish","type":"post","link":"https:\/\/denbeke.be\/blog\/software\/backup-your-databases-in-git\/","title":{"rendered":"Backup your databases in Git"},"content":{"rendered":"<p>Storing backups of your databases is important for any service on the internet. Git can be the right\u00a0tool to back up databases.<\/p>\n<p>Like other version control systems, Git tracks changes, and will only push the changes in files to the remote. So if one line in a 1\u00a0million line database dump is changed, we don&#8217;t need to transfer the whole dump to our backup server. This saving is achieved by Git&#8217;s delta compression mechanism.<sup>1<\/sup><\/p>\n<h3>Configuring Git<\/h3>\n<p>Generating SSH keys:<\/p>\n<pre><code>$ ssh-keygen -t rsa -C \"your_email@example.com\"\r\nGenerating public\/private rsa key pair.\r\nEnter file in which to save the key (\/Users\/you\/.ssh\/id_rsa): [Press enter]\r\nEnter passphrase (empty for no passphrase): [Type a passphrase]\r\nEnter same passphrase again: [Type passphrase again]\r\nYour identification has been saved in \/Users\/you\/.ssh\/id_rsa.\r\nYour public key has been saved in \/Users\/you\/.ssh\/id_rsa.pub.\r\nThe key fingerprint is:\r\n01:0f:f4:3b:ca:85:d6:17:a1:7d:f0:68:9d:f0:a2:db your_email@example.com\r\n<\/code><\/pre>\n<p>If you want to execute the backup script automatically, leave the passphrase blank. This way SSH won&#8217;t ask for it. Note that an unprotected key is less secure!<\/p>\n<p>Now create a remote Git repository, and add the public key to this\u00a0Git service, e.g. GitHub, Gogs, &#8230;<\/p>\n<p>Initialize a new local repo with the SSH remote address, and commit\/push an initial commit.<\/p>\n<h3>Backup script<\/h3>\n<pre><code>#! 
\/bin\/sh\r\n\r\nTIMESTAMP=$(date +\"%F\")\r\nBACKUP_DIR=\"\/home\/mathias\/backup\/mysql_git\"\r\nMYSQL_USER=\"Mathias\"\r\nMYSQL=\/usr\/bin\/mysql\r\nMYSQL_PASSWORD=\"yourpassword\"\r\nMYSQLDUMP=\/usr\/bin\/mysqldump\r\n\r\ncd \"$BACKUP_DIR\"\r\n\r\necho \"Backing up databases\"\r\n\r\ndatabases=`$MYSQL --user=$MYSQL_USER -p$MYSQL_PASSWORD -e \"SHOW DATABASES;\" | grep -Ev \"(Database|information_schema|performance_schema)\"`\r\n\r\nfor db in $databases; do\r\n\r\n        echo \"  &gt; Dumping $db to disk\"\r\n        $MYSQLDUMP --force --opt --user=$MYSQL_USER -p$MYSQL_PASSWORD --skip-extended-insert --databases $db &gt; \"$BACKUP_DIR\/$db.sql\"\r\n\r\n        echo \"  &gt; Saving $db changes on Git server\"\r\n        git add \"$BACKUP_DIR\/$db.sql\"\r\n        git commit -m \"$db $TIMESTAMP\"\r\n        git push\r\n\r\ndone\r\n\r\necho \"Done\"<\/code><\/pre>\n<p>The script loops through all MySQL databases and dumps each of them to a <code>.sql<\/code> file named after the database. After a database is dumped, the file is added to the local Git repo and committed.<\/p>\n<p>After each commit, the changes are pushed to the remote repo. This avoids very big pushes when working with large databases. If you want to push only once, just place the push at the end of the script.<\/p>\n<h3>Running the backup script<\/h3>\n<p>Running this script manually isn&#8217;t the best solution. Making an automated backup service out of this is straightforward: just create a cronjob that executes the script every day (or at any interval you want).<\/p>\n<p>Type <code>crontab -e<\/code> in the console; this will open your personal cron configuration in your favorite editor. 
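Before scheduling the script, make sure it is executable (a sketch, assuming it was saved as <code>backup.sh<\/code> in the backup directory used in the script above):<\/p>\n<pre><code>$ chmod +x \/home\/mathias\/backup\/mysql_git\/backup.sh<\/code><\/pre>\n<p>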
Now add the cronjob to the crontab:<\/p>\n<pre><code>30 2 * * * \/home\/mathias\/backup\/mysql_git\/backup.sh &gt;&gt; \/home\/mathias\/backup_git_cron.log<\/code><\/pre>\n<p>This particular example runs the backup script every day at 2h30, and appends the output of the script to a <code>backup_git_cron.log<\/code> file in my home directory. (Of course you are free to create any cronjob that runs the backup script at your desired moment.)<\/p>\n<h3>Big data and low-end hardware<\/h3>\n<p>Git works very well for small source files and small text files. Those database dumps, however, aren&#8217;t always that small. On my VPS I have 200MB of database dumps, which each have to be compressed and packed for every commit. This takes a lot of time on a machine with 512MB of RAM, and sometimes even crashes on the largest files. While pushing I&#8217;ve seen this error way too often: <code>error: pack-objects died of signal 9<\/code>.<\/p>\n<p>Other Git users with larger files have reduced the limits regarding packing, which resulted in fewer problems packing those files:<\/p>\n<pre><code>git config --global pack.windowMemory \"100m\"\r\ngit config --global pack.packSizeLimit \"100m\"\r\ngit config --global pack.threads \"1\"\r\n<\/code><\/pre>\n<p>On my server that didn&#8217;t really seem to work (apparently I still don&#8217;t have enough free RAM): I noticed fewer problems while compressing the objects, but it still took quite some time (and sometimes crashed) for the large dumps.<br \/>\nThe solution for me was turning off the delta compression.<sup>2<\/sup><\/p>\n<pre><code>echo '*.sql -delta' &gt; .gitattributes<\/code><\/pre>\n<p>This command writes the setting to the <code>.gitattributes<\/code> file. If you commit this file, delta compression is turned off for <code>.sql<\/code> files in any clone of the repo.<\/p>\n<p>Another solution would be to migrate from Git to Mercurial. 
From what I&#8217;ve <a title=\"What is the Difference Between Mercurial and Git?\" href=\"http:\/\/stackoverflow.com\/questions\/35837\/what-is-the-difference-between-mercurial-and-git\" target=\"_blank\">read<\/a>, Mercurial stores diffs instead of object packs.<\/p>\n<blockquote><p><em>There&#8217;s one huge difference between git and mercurial; the way they represent each commit. git represents commits as snapshots, while mercurial represents them as diffs.<\/em><\/p><\/blockquote>\n<blockquote><p><em>Changesets (diffs) advantage is in taking up less space. Git recovers the space used for commits by using compression, but this requires an occasional explicit recompress step (&#8220;git pack&#8221;).<\/em><\/p><\/blockquote>\n<p>When the history of the repository becomes too large, it is useful to do a shallow clone of the remote: <code>git clone --depth &lt;depth&gt; &lt;remote-url&gt;<\/code>. This way you don&#8217;t keep a large local history, but let the remote keep it.<\/p>\n<h3>Conclusion<\/h3>\n<p>Git might not be the perfect system for backups, but used with care (and on good hardware) it can provide a decent backup system.<\/p>\n<p>And it&#8217;s always better than having no backup at all!<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><sup>1<\/sup> Later in this blog post I point out that disabling delta compression is better on low-memory machines. Note that without delta compression Git needs to send a lot more data to the remote.<sup>2<\/sup><\/p>\n<p><sup>2<\/sup> Note that disabling delta compression means that Git needs to push the full packs of large files, since it can no longer rely on deltas. So if you change one line of a 500MB file that is packed in a 100MB pack, you will always need to send that 100MB pack to the remote (instead of just 15KB) when delta compression is turned off. Without delta compression Git also needs to store all the objects of the files. 
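Those accumulated objects can be compacted from time to time by running Git&#8217;s garbage collection (illustrative; as noted below, this step itself needs a fair amount of memory):<\/p>\n<pre><code>$ git gc<\/code><\/pre>\n<p>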
After 65 commits (of some large databases), I had a repo of almost 1GB. Running <code>git gc<\/code> shrank\u00a0that repo to less than 100MB. (Unfortunately, running <code>git gc<\/code> on my 512MB ram server results in those well-known issues: \u00a0<code>error: pack-objects died of signal 9<\/code>, <code>warning: suboptimal pack - out of memory<\/code> and <code>fatal: inflateInit: out of memory<\/code>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Storing backups of the database is import for any possible service on the internet. Git can be the right\u00a0tool to backup databases. Like other version control systems, Git tracks the changes, and will only push the changes in files to the remote. So if one line in a 1\u00a0million database dump is changed, we don&#8217;t [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[227,161],"tags":[239,210,241,240,212],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v15.6.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Backup your databases in Git &ndash; DenBeke<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/denbeke.be\/blog\/software\/backup-your-databases-in-git\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Backup your databases in Git &ndash; DenBeke\" \/>\n<meta property=\"og:description\" content=\"Storing backups of the database is import for any possible service on the internet. Git can be the right\u00a0tool to backup databases. Like other version control systems, Git tracks the changes, and will only push the changes in files to the remote. 
So if one line in a 1\u00a0million database dump is changed, we don&#8217;t [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/denbeke.be\/blog\/software\/backup-your-databases-in-git\/\" \/>\n<meta property=\"og:site_name\" content=\"DenBeke\" \/>\n<meta property=\"article:published_time\" content=\"2014-12-06T15:07:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2015-07-31T13:13:46+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary\" \/>\n<meta name=\"twitter:creator\" content=\"@MthsBk\" \/>\n<meta name=\"twitter:site\" content=\"@MthsBk\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\">\n\t<meta name=\"twitter:data1\" content=\"5 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/denbeke.be\/blog\/#website\",\"url\":\"https:\/\/denbeke.be\/blog\/\",\"name\":\"DenBeke\",\"description\":\"Mathias Beke\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/denbeke.be\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/denbeke.be\/blog\/software\/backup-your-databases-in-git\/#webpage\",\"url\":\"https:\/\/denbeke.be\/blog\/software\/backup-your-databases-in-git\/\",\"name\":\"Backup your databases in Git &ndash; 
DenBeke\",\"isPartOf\":{\"@id\":\"https:\/\/denbeke.be\/blog\/#website\"},\"datePublished\":\"2014-12-06T15:07:42+00:00\",\"dateModified\":\"2015-07-31T13:13:46+00:00\",\"author\":{\"@id\":\"https:\/\/denbeke.be\/blog\/#\/schema\/person\/386878f712fe3fe22227216f087772dc\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/denbeke.be\/blog\/software\/backup-your-databases-in-git\/\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/denbeke.be\/blog\/#\/schema\/person\/386878f712fe3fe22227216f087772dc\",\"name\":\"Mathias Beke\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/denbeke.be\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/015ba35e6ce4f5859e3888ca99807575?s=96&d=mm&r=g\",\"caption\":\"Mathias Beke\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/posts\/1715"}],"collection":[{"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/comments?post=1715"}],"version-history":[{"count":19,"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/posts\/1715\/revisions"}],"predecessor-version":[{"id":1884,"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/posts\/1715\/revisions\/1884"}],"wp:attachment":[{"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/media?parent=1715"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/categories?post=1715"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/denbeke.be\/blog\/wp-json\/wp\/v2\/tags?post=1715"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}