Tech World With Milan Newsletter


Why Google Stores Billions of Lines of Code In a Single Repository

Dr. Milan Milanović
Aug 3, 2023

A paper by Rachel Potvin and Josh Levenberg in Communications of the ACM describes how Google uses a monorepo approach.

Google’s monolithic repository provides a common source of truth for tens of thousands of developers worldwide. Google uses it for 95% of its source code, while Google Chrome and Android live in their own repositories.

Google first used CVS, then migrated to Perforce, and later replaced it with Piper, its own in-house system. The 2016 paper reports over 2 billion lines of code and 40,000 daily commits by more than 10,000 engineers. We can only assume that these numbers are much bigger today.

Engineers mostly practice trunk-based development, which has worked well at this scale. Instead of working on long-lived branches, they send each change for code review and commit it directly to the main branch (the trunk). This avoids the "merge hell" that comes with reconciling long-lived branches.

When developers submit new code, Tricorder runs the first automatic checks and gives them initial automated feedback. The change is then reviewed in Critique, Google's code review tool.
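To make the idea of automated pre-review checks concrete, here is a minimal sketch of a presubmit runner. It is not Tricorder's actual design or API; the `Finding` type, the two toy analyzers, and `run_presubmit` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    """One piece of automated feedback attached to a file and line."""
    path: str
    line: int
    message: str


# Hypothetical analyzer signature: (file path, new file contents) -> findings.
Analyzer = Callable[[str, str], List[Finding]]


def long_line_check(path: str, text: str, limit: int = 100) -> List[Finding]:
    """Flag every line longer than `limit` characters."""
    return [
        Finding(path, number, f"line exceeds {limit} characters")
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


def todo_check(path: str, text: str) -> List[Finding]:
    """Flag TODO comments that do not name an owner, e.g. TODO(alice)."""
    return [
        Finding(path, number, "TODO without an owner")
        for number, line in enumerate(text.splitlines(), start=1)
        if "TODO" in line and "TODO(" not in line
    ]


def run_presubmit(changed_files: Dict[str, str],
                  analyzers: List[Analyzer]) -> List[Finding]:
    """Run every analyzer over every changed file and collect the findings."""
    findings: List[Finding] = []
    for path, text in sorted(changed_files.items()):
        for analyzer in analyzers:
            findings.extend(analyzer(path, text))
    return findings
```

In a setup like this, the findings would be shown to the author next to the diff before a human reviewer looks at the change.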

Engineers use Rosie, a tool for large-scale refactorings, optimizations, and code cleanups. It splits a large change into smaller ones, each of which is reviewed by the owners of the affected code before being committed.
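The splitting step can be sketched as grouping the changed files by the team that owns them. This is only an illustration under simple assumptions, not how Rosie is implemented: the `owners` mapping from directory to team and the helper names `find_owner` and `split_by_owner` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List


def find_owner(path: str, owners: Dict[str, str]) -> str:
    """Return the owner of the nearest ancestor directory listed in `owners`."""
    for parent in PurePosixPath(path).parents:
        key = str(parent)
        if key in owners:
            return owners[key]
    return "unowned"


def split_by_owner(changed_files: List[str],
                   owners: Dict[str, str]) -> Dict[str, List[str]]:
    """Split one large change into smaller per-owner changes."""
    shards: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(changed_files):
        shards[find_owner(path, owners)].append(path)
    return dict(shards)
```

Each resulting shard could then be tested and sent to its owners as a separate, small review instead of one repository-wide change.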

Some advantages of this approach are:

  • Unified versioning

  • Extensive code sharing

  • Simplified dependency management

  • Atomic changes

  • Large-scale refactoring

  • Collaboration across teams

  • Flexible code ownership

  • Code visibility

The disadvantages include the need to build and scale tooling for development and execution, the ongoing effort of maintaining code health, and the potential for codebase complexity (such as unnecessary dependencies).

The paper

Check out the entire paper and the presentation by Rachel Potvin.


Developer Tools in Meta

The article from Meta's engineering blog discusses several open-source developer tools used at Meta.

Here are the tools they use:

  • Sapling: This is a version control system designed for scalability and usability. It consists of a server, a client, and a virtual file system. The server stores all the data and is implemented primarily in Rust. The client communicates with the server and provides familiar operations like checkout, rebase, commit, amend, etc. The virtual file system, EdenFS, checks out everything in a few seconds but only downloads a file from the server when it is first accessed.

  • Buck2: This is a build system used by many developers at Meta to compile and test their changes. Buck2 is designed to work at a large scale, supporting remote caching and execution. It also supports multiple programming languages simultaneously.

  • Infer, RacerD, and Jest are tools used for testing and static analysis. Infer is used for general static analysis and supports multiple languages, including Java and C++. RacerD is used to detect Java concurrency bugs. Jest is a JavaScript testing framework transferred to the OpenJS Foundation in 2022.

  • Sapienz: This tool automatically tests mobile apps by simulating the user experience to discover crashes and other potential issues.
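The lazy checkout behavior described for EdenFS above can be illustrated with a small sketch. This is not EdenFS code or its API; the `LazyCheckout` class and the `fetch` callback standing in for a server round trip are hypothetical.

```python
from typing import Callable, Dict, List


class LazyCheckout:
    """A checkout that knows every path up front but fetches contents on demand."""

    def __init__(self, manifest: List[str], fetch: Callable[[str], bytes]):
        self._manifest = set(manifest)      # cheap: file names only
        self._fetch = fetch                 # expensive: downloads file contents
        self._cache: Dict[str, bytes] = {}  # contents fetched so far

    def list_files(self) -> List[str]:
        """Listing files never triggers a download."""
        return sorted(self._manifest)

    def read(self, path: str) -> bytes:
        """Download a file on first access and serve it from the cache afterwards."""
        if path not in self._manifest:
            raise FileNotFoundError(path)
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
        return self._cache[path]
```

The point of the design is that the cost of a checkout depends on the files a developer actually touches, not on the size of the repository.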

In addition to these open-source tools, Meta developers also use proprietary tools like Phabricator for CI and code review.

Meta's developer workflow
Meta developer tools (credits: Meta)

You can read the full article and learn more about these tools here. Also, you can find more about these tools (along with the ones covered above) in the article on Meta developer’s workflow.


Bonus: Free Programming Books Curated By Stack Overflow Docs

Here is a list of free books titled "Notes for Professionals," curated by the people at Stack Overflow.

It contains nearly 50 books covering different technologies, such as:

🔹 Python
🔹 Node.js
🔹 Java
🔹 Kotlin
🔹 Git
🔹 C++
🔹 C#
🔹 JavaScript
🔹 SQL
🔹 Swift
🔹 Algorithms
🔹 and more

Some contain more than 700 pages.

Check them out here.

Free books by Stack Overflow Docs




© 2023 Dr. Milan Milanović