Avoiding Personal Information Leaks in Blogs

This article introduces practical tips and best practices for protecting personal privacy and avoiding sensitive information leaks in blog writing.

GitHub Pages is a popular free hosting service, and many blogs are published through it.
On the free plan, however, a site can only be published from a public repository. Once the repository is public, everything in it is readable, including articles marked as drafts. Published articles rarely contain sensitive information, but the source repository behind the blog may still leak personal information. Below are some keywords commonly associated with such leaks; additions are welcome in the comments.

Sensitive Keywords

Chinese keyword (translated)    English keyword
Password                        password
Account                         account
ID Card                         id
Bank Card                       card
Alipay                          alipay
WeChat                          wechat
Phone Number                    phone
Home Address                    address
Workplace                       company
Social Security Card            card
Driver’s License                driver
Passport                        passport
Credit Card                     credit
Secret Key                      key
Configuration File              ini
Credentials                     credential
Username                        username

Regular expression search:

(密码|账号|身份证|银行卡|支付宝|微信|手机号|家庭住址|工作单位|社保卡|驾驶证|护照|信用卡|username|password|passwd|account|key\s*:|\.ini|credential|card|bank|alipay|wechat|passport|id\s*:|phone|address|company)

If you use VS Code as your blog editor, you can enable regular expression mode in the search panel and run this expression across the whole site to check for potential information leaks.
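Outside an editor, the same search can be run with a short script. The following is a minimal sketch, not a finished tool: the function names `scan_text` and `scan_tree`, the list of file extensions, and the case-insensitive matching are all assumptions made for illustration; only the expression itself comes from this article.

```python
import re
from pathlib import Path

# Expression copied from the article. IGNORECASE is an assumption, chosen so
# that "Password" and "password" are both reported.
SENSITIVE = re.compile(
    r"(密码|账号|身份证|银行卡|支付宝|微信|手机号|家庭住址|工作单位|社保卡|驾驶证|护照|信用卡"
    r"|username|password|passwd|account|key\s*:|\.ini|credential|card|bank"
    r"|alipay|wechat|passport|id\s*:|phone|address|company)",
    re.IGNORECASE,
)


def scan_text(text):
    """Return (line_number, line) pairs for every line that matches."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SENSITIVE.search(line)
    ]


def scan_tree(root, suffixes=(".md", ".markdown", ".yml", ".yaml", ".toml")):
    """Scan every file under root whose extension is listed in suffixes."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            findings += [(str(path), n, line) for n, line in scan_text(text)]
    return findings
```

Calling `scan_tree(".")` from the blog's root directory returns a list of `(path, line_number, line)` tuples to review by hand. Broad keywords such as `card` or `address` will produce false positives, so the output is a list of places to look, not a verdict.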

Git History

Git history may also contain leaked information: content deleted in a later commit is still present in earlier ones. The history of a public blog repository can be scanned with a simple script.
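One possible shape for such a script is sketched below. It assumes `git` is on the PATH; the function names `scan_log` and `scan_history` and the shortened keyword list are hypothetical and only serve the illustration.

```python
import re
import subprocess

# Shortened, hypothetical keyword list; substitute the full expression from
# the "Regular expression search" section for real use.
SENSITIVE = re.compile(r"(密码|password|passwd|credential|key\s*:)", re.IGNORECASE)


def scan_log(log_text):
    """Return (commit, line) pairs for added lines that match SENSITIVE.

    log_text is expected to be the output of
    `git log --all -p --format="commit %H"`.
    """
    findings = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]  # remember which commit the patch belongs to
        elif line.startswith("+") and not line.startswith("+++"):
            # "+" marks an added line; "+++" is the file header of the patch
            if SENSITIVE.search(line):
                findings.append((commit, line[1:].strip()))
    return findings


def scan_history(repo="."):
    """Run git in repo and scan the patches of all commits reachable from any ref."""
    log_text = subprocess.run(
        ["git", "-C", repo, "log", "--all", "-p", "--no-color", "--format=commit %H"],
        check=True, capture_output=True, encoding="utf-8", errors="replace",
    ).stdout
    return scan_log(log_text)
```

Only added lines are inspected, because every removed line was added by some earlier commit and is reported there.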

If the repository is your own, you can clear the history with the commands below. If you need to preserve the history, do not clear it.

Make sure you understand what the commands do: they rewrite the history and force-push the result, so proceed with caution and back up important data first.

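git reset --soft <first-commit-hash>
# Record the staged changes; without this step the push would publish only the first commit
git commit --amend -m "Squash history"
git push --force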

Other Repository Scanning Methods

https://github.com/trufflesecurity/trufflehog

  • Find, verify, and analyze leaked credentials
  • 17.2k stars
  • 1.7k forks

Other Blog Publishing Methods

  • GitHub Pro supports publishing Pages from private repositories; Pro costs four dollars per month
  • Keep the repository private and publish through Cloudflare Pages
  • Separate repositories: one private repository for articles still being edited, one public repository for articles ready for publication

If your blog uses a comment system like giscus that relies on GitHub, you will still need a public repository.

Good Habits vs. Good Mechanisms

When discussing personal information leaks in open-source blogs, many people believe that as long as you avoid uploading sensitive information to the repository, there won’t be any problems.

This is a platitude, much like telling programmers not to write bugs: correct but useless. Relying on habits to protect personal information is unreliable, because anyone can slip at any time.

Drafts often contain throwaway content, especially in technical blogs written by programmers. A short script may be written in a hurry, with a token or password pasted directly into it instead of read from an environment variable, which leaves room for sensitive information to be exposed.

Most people understand what good habits are, so we won’t discuss them here. Instead, we’ll focus on how to avoid personal information leaks through mechanisms.

First, separate the repositories: keep drafts in one repository and published articles in another. Every article that reaches the GitHub Pages repository has then been reviewed, and drafts cannot leak from it.
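The copy step between the two repositories can itself be a mechanism. The sketch below assumes that posts are Markdown files whose front matter is delimited by `---` and carries a `draft` field, as Hugo's does; the function names `is_publishable` and `publish` are hypothetical. It fails closed: a file is copied only when it says `draft: false` explicitly.

```python
import shutil
from pathlib import Path


def is_publishable(text):
    """Return True only when the front matter explicitly says `draft: false`."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no front matter: treat the file as a draft
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, _, value = line.partition(":")
        if key.strip() == "draft":
            return value.strip().lower() == "false"
    return False  # no draft field: treat the file as a draft


def publish(private_dir, public_dir):
    """Copy every publishable Markdown file into the public working tree."""
    copied = []
    for source in sorted(Path(private_dir).rglob("*.md")):
        if not is_publishable(source.read_text(encoding="utf-8")):
            continue
        target = Path(public_dir) / source.relative_to(private_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(str(target))
    return copied
```

Requiring an explicit `draft: false` means that a forgotten field keeps an article private instead of publishing it, so the safe outcome does not depend on the author remembering anything.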

You can also use GitHub Actions to scan for sensitive information on every push. If sensitive information is found, the workflow fails, which can be used to block the merge or the deployment. See trufflehog.

The regular expression search shared in this article is just a simple example and isn’t integrated into any workflow. You can customize it further based on your needs and integrate it into your processes.
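As one example of such an integration, the check can be turned into a gate that returns a non-zero status when a keyword is found. This is a sketch under stated assumptions: the names `check_lines` and `main`, the shortened keyword list, and the `leak-check: ignore` marker for lines that were reviewed and accepted are all hypothetical.

```python
import re
import sys

# Shortened, hypothetical keyword list; substitute the full expression from
# the "Regular expression search" section for real use.
SENSITIVE = re.compile(r"(密码|password|passwd|credential|key\s*:)", re.IGNORECASE)

# Hypothetical marker: a line containing it counts as reviewed and is skipped.
ALLOW_MARKER = "leak-check: ignore"


def check_lines(lines):
    """Return the 1-based numbers of lines that match and are not allowlisted."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if SENSITIVE.search(line) and ALLOW_MARKER not in line
    ]


def main(paths):
    """Return 1 (block) if any file has a finding, else 0 (allow)."""
    blocked = False
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number in check_lines(handle):
                print(f"{path}:{number}: possible sensitive keyword", file=sys.stderr)
                blocked = True
    return 1 if blocked else 0
```

To use it as a Git pre-commit hook or as a CI step, end the file with `sys.exit(main(sys.argv[1:]))` and pass it the files to check, for example the output of `git diff --cached --name-only --diff-filter=ACM`. Note that this reads the files in the working tree, which may differ from the staged content if a file is only partly staged.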
