Fergus Henderson - Software Engineering at Google - Summary

This is my summary of the full document, quoting and paraphrasing the content.

Introduction

The aim is to catalogue and briefly describe Google’s key software engineering practices.

Software Development

The Source Repository

Most of the code is stored in a single unified source-code repository and is accessible to all software engineers. The change rate is 40 thousand commits per work day.

Write access to the repository is controlled: only the listed owners of each sub-tree of the repository can approve changes to that sub-tree.
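The ownership rule can be pictured as a lookup from each changed file to the owners of the sub-trees that contain it. This is a minimal sketch, not Google's implementation: the `OWNERS` table, the paths and the user names are hypothetical, and it assumes that owners of a parent sub-tree may also approve changes further down.

```python
from pathlib import PurePosixPath

# Hypothetical ownership table: sub-tree -> listed owners.
OWNERS = {
    "search": {"alice"},
    "search/indexer": {"bob"},
    "maps": {"carol"},
}

def owners_for(path):
    """Collect the owners of every listed sub-tree that contains `path`."""
    found = set()
    for parent in PurePosixPath(path).parents:
        # Assumption: owners of an enclosing sub-tree also count as owners here.
        found |= OWNERS.get(str(parent), set())
    return found

def can_approve(user, changed_files):
    """A user can approve a change only if they own every file it touches."""
    return all(user in owners_for(f) for f in changed_files)

print(can_approve("alice", ["search/indexer/crawl.cc"]))
print(can_approve("bob", ["search/query.cc"]))
```

Note that this only models approval; anyone can still read the code and propose a change.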

Culturally, engineers are encouraged to fix anything that they see is broken and know how to fix, regardless of project boundaries.

Almost all development occurs at the “head” of the repository.

Automated systems run tests frequently, often after every change to any file in the transitive dependencies of the test.
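Selecting which tests to run after a change amounts to a reachability query on the dependency graph. The sketch below is illustrative only: the `DEPS` graph and target names are made up, it works on build targets rather than individual files, and it assumes tests can be recognised by a `_test` suffix.

```python
# Hypothetical dependency graph: target -> direct dependencies.
DEPS = {
    "//mail:server_test": ["//mail:server"],
    "//mail:server": ["//base:strings", "//net:rpc"],
    "//net:rpc_test": ["//net:rpc"],
    "//net:rpc": ["//base:strings"],
    "//base:strings": [],
}

def transitive_deps(target, deps=DEPS):
    """Return every target reachable from `target`, including itself."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, []))
    return seen

def affected_tests(changed, deps=DEPS):
    """Tests whose transitive dependencies include any changed target."""
    tests = [t for t in deps if t.endswith("_test")]
    return sorted(t for t in tests if transitive_deps(t, deps) & set(changed))

print(affected_tests(["//net:rpc"]))
print(affected_tests(["//mail:server"]))
```

A change to a low-level library such as `//base:strings` triggers every test above it, which is why test runs are scoped by dependencies rather than by directory.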

Most larger teams also have a “build cop” who is responsible for ensuring that the tests continue to pass at head, by working with the authors of the offending changes to quickly fix any problems or to roll back the offending change.

The Build System

There is a distributed build system that provides standard commands for building and testing software that work across the whole repository.

Build specifications consist of declarations called “build rules” that each specify high-level concepts like “here is a C++ library with these source files which depends on these other libraries”. Individual build steps should depend only on their declared inputs and should be deterministic.

The work of each build is typically distributed across hundreds or even thousands of machines. Build results are cached “in the cloud”. There is never any need to run the equivalent of “make clean”.
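Caching works because build steps are deterministic and depend only on declared inputs: the result can be stored under a key derived from those inputs. The sketch below is a toy, single-process stand-in for that idea, not the real build system; `BuildCache`, `compile_step` and the key format are assumptions made for illustration.

```python
import hashlib

class BuildCache:
    """Toy in-memory stand-in for a shared build cache (an assumption)."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # counts the steps that actually ran

    @staticmethod
    def key(command, inputs):
        """Hash the command plus the name and content of each declared input."""
        digest = hashlib.sha256(command.encode())
        for name in sorted(inputs):
            digest.update(b"\0" + name.encode() + b"\0" + inputs[name])
        return digest.hexdigest()

    def run(self, command, inputs, step):
        """Run `step` only if no result is cached for these exact inputs."""
        k = self.key(command, inputs)
        if k not in self._results:
            self.executions += 1
            self._results[k] = step(inputs)
        return self._results[k]

def compile_step(inputs):
    """Hypothetical deterministic step: 'compiles' by joining the inputs."""
    return b"".join(inputs[name] for name in sorted(inputs))

cache = BuildCache()
sources = {"a.cc": b"int a;", "b.cc": b"int b;"}
cache.run("compile", sources, compile_step)        # executes the step
cache.run("compile", dict(sources), compile_step)  # served from the cache
print(cache.executions)
```

Any edit to an input changes the key, so a stale result can never be returned. That is the property that removes the need for a “make clean”.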

Presubmit checks automatically run when initiating a code review and / or preparing to commit a change to the repository.

Code Review

There are web-based code review tools, which can automatically suggest reviewer(s) for a given change.

All changes to the main source code repository must be reviewed by at least one other engineer.

Code review discussions for each project are automatically copied to a mailing list designated by the project maintainers.

There is an “experimental” section of the repository where the normal code review requirements are not enforced.

Engineers are encouraged to keep each individual change small.

Testing

Unit testing is strongly encouraged and widely practiced. Integration testing and regression testing are also widely practiced, as is load testing prior to deployment.

Bug Tracking

There is a system for tracking issues (bugs, feature requests, customer issues and processes such as releases or clean-up efforts). Issues are categorized into hierarchical components and each component can have a default assignee and default email list to CC.
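Routing a new issue by component might look like the sketch below. Everything here is hypothetical: the `COMPONENTS` table, the names and addresses, and in particular the fallback to a parent component when a component has no default assignee, which the source does not describe.

```python
# Hypothetical component tree, keyed by slash-separated path.
COMPONENTS = {
    "Search": {"assignee": "search-triage", "cc": ["search-bugs@example.com"]},
    "Search/Indexer": {"assignee": "bob", "cc": []},
    "Search/Frontend": {},  # no defaults of its own
}

def routing_for(component):
    """Return (assignee, cc list) from the nearest component that has a default."""
    parts = component.split("/")
    while parts:
        entry = COMPONENTS.get("/".join(parts), {})
        if entry.get("assignee"):
            return entry["assignee"], entry.get("cc", [])
        parts.pop()  # assumed fallback: try the parent component
    return None, []

print(routing_for("Search/Indexer"))
print(routing_for("Search/Frontend"))
```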

Programming Languages

There are five officially-approved programming languages: C++, Java, Python, Go and JavaScript.

There are also style guides for each language, to ensure that code all across the company is written with similar style, layout, naming conventions, etc.

Interoperation between these different programming languages is done mainly using Protocol Buffers.

Debugging and Profiling Tools

All servers are linked with libraries that provide a number of tools for debugging running servers.

Release Engineering

For most teams the release engineering work is done by regular software engineers.

Releases are done frequently for most software; weekly or fortnightly releases are a common goal and some teams even release daily. This is made possible by automating most of the normal release engineering tasks.

A release typically starts by syncing to the change number of the latest “green” build (i.e. the last change for which all the automatic tests passed) and building a candidate.
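Picking the starting point for a release can be sketched as a search for the newest change whose tests all passed. The data shape (`results` mapping change number to per-test outcomes) and the function name are assumptions for illustration.

```python
def latest_green_change(results):
    """Return the highest change number whose tests all passed, or None.

    `results` maps change number -> {test name: passed?}.
    A change with no recorded test results is not treated as green.
    """
    green = [cl for cl, tests in results.items() if tests and all(tests.values())]
    return max(green, default=None)

results = {
    101: {"unit": True, "integration": True},
    102: {"unit": True, "integration": False},
    103: {"unit": True, "integration": True},
    104: {"unit": False, "integration": True},
}
print(latest_green_change(results))
```

Here change 104 is newer but not green, so the release candidate would be built at change 103.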

Then it is typically loaded onto a “staging” server for further integration testing by a small set of users (sometimes just the development team).

The next step is usually to roll out to one or more “canary” servers that are processing a subset of the live production traffic.

Finally the release can be gradually rolled out to all servers in all data centers.
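One way to picture the canary and gradual rollout steps is to give each server a stable bucket and raise a rollout percentage in stages. This is a sketch under stated assumptions, not the actual rollout tooling: the hashing scheme, the stage percentages and the server names are all hypothetical.

```python
import hashlib

def bucket(server_id):
    """Map a server name to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(server_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

def runs_new_release(server_id, rollout_percent):
    """True if this server falls inside the current rollout percentage."""
    return bucket(server_id) < rollout_percent

def rollout_plan(servers, stages=(1, 10, 50, 100)):
    """Yield (percent, servers on the new release) for each hypothetical stage."""
    for percent in stages:
        yield percent, [s for s in servers if runs_new_release(s, percent)]

servers = [f"server-{i}" for i in range(50)]
for percent, updated in rollout_plan(servers):
    print(percent, len(updated))
```

Because buckets are stable, servers updated in an early stage stay updated in later ones, and lowering the percentage back to 0 acts as a rollback.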

Launch Approval

The launch of any user-visible change or significant design change requires approvals from a number of people outside of the core engineering team that implements the change.

There is an internal launch approval tool that is used to track the required reviews and approvals and ensure compliance with the defined launch processes for each product.

Post-Mortems

Whenever there is a significant outage of any production system, the people involved are required to write a post-mortem document. This document describes the incident, including title, summary, impact, timeline, root cause(s), what worked and what did not, and action items. The focus is on the problems and how to avoid them in future, not on the people or on apportioning blame.

Frequent Rewrites

Most software gets rewritten every few years. Software that is a few years old was designed around an older set of requirements and is typically not designed in a way that is optimal for current requirements.

Project Management

20% Time

Engineers are permitted to spend up to 20% of their time working on any project of their choice, without needing approval from their manager or anyone else.

Objectives and Key Results (OKRs)

Individuals and teams are required to explicitly document their goals and to assess their progress towards these goals. Teams set quarterly and annual objectives, with measurable key results that show progress towards these objectives.

Project Approval

There is no well-defined process for project approval or cancellation. Managers at every level are responsible and accountable for what projects their teams work on.

Corporate Reorganizations

Occasionally an executive decision is made to cancel a large project and then the engineers who had been working on that project may have to find new projects on new teams.

People Management

Roles

Engineering Manager: responsible for selecting Tech Leads and for the performance of their teams.

Software Engineer (SWE): Individual Contributors (ICs) and / or Tech Leads (TLs). ICs design, implement, test and release features. TLs are responsible for technical decisions about the project.

Research Scientist: requires exceptional research ability, evidenced by a great publication record, and the ability to write code.

Site Reliability Engineer (SRE).

Product Manager: works with Software Engineers to ensure that the right features get implemented.

Program Manager / Technical Program Manager: manages projects, processes or operations.

Facilities

The facilities are designed to be fun, with features like slides, ball pits, games rooms, free cafes, micro-kitchens and gyms.

Training

There is a mandatory initial training course and there are a variety of online or in-person training courses.

In addition, each new employee is usually appointed an official “Mentor” and a separate “Buddy” to help get them up to speed. Unofficial mentoring also occurs via regular meetings with their manager, team meetings, code reviews, design reviews and informal processes.

Transfers

Transfers between different parts of the company are encouraged once an engineer has spent at least 12 months in a position.

Performance

Employees get annual performance bonuses and equity awards based on their performance.

There is a very careful and detailed promotion process that involves nomination by self or manager, self-review, peer reviews, manager appraisals, promotion committees and, potentially, an appeals committee.

Poor performance is handled with manager feedback and if necessary with performance improvement plans that involve setting very explicit concrete performance targets and assessing progress towards those targets.

Manager performance is assessed with feedback surveys filled in by every report twice a year.