How to review massive refactoring pull requests

Occasionally your team may need to make a large-scale refactor which touches thousands of files. The changes are mostly boilerplate generated by refactoring tools, but there is always a risk of manual error. If you are asked to review a pull request like this, how should you approach it?

There is no silver bullet to solve this problem. It's not practical to manually review such a large set of changes, so there will always be some risk of unreviewed code causing problems, but I do have some advice for minimising the risk.

  • Consider the business value of your review. Automated tools make it quick and cheap to change thousands of files, so the change does not need to be especially valuable to be worthwhile for the business. The review, however, can't be automated and is expensive by comparison, so weigh how much value it will actually add against the time it will take.
  • Having said that, while code review can't be automated, some basic verification can be. Make sure you lint the code and run all your automated tests. (Hopefully this happens automatically for every PR anyway, but if for some reason it doesn't, run them manually.)
  • Before you start reviewing the code, get as much context as possible about the changes. For example:
    • Speak to the author of the changes and ask for a high level explanation of what they have done - and why.
    • Ask the author if there are any changes they're worried about. Was there anything which went wrong, or areas where the automated tooling let them down and they had to manually intervene?
    • Read all the commit messages. It is not possible to thoroughly read thousands of files, but it is reasonable to read dozens of commit messages.
  • Look for especially significant files where you can focus your review. For example:
    • Filter to file extensions which are uncommon within the codebase - .csproj, .json, etc. These are often very significant files which affect the entire system, so are high risk to change.
    • Use your knowledge of the system to identify which areas are of particular importance. In many systems, authentication & authorisation are a good example of this. Files which relate to these areas should be given a more thorough review.
    • Consider whether any areas of the system are known to be brittle. Changes in those areas deserve a closer look.
    • If your view of the Pull Request shows the number of changes in a file, focus on files which contain many changed lines. For example, imagine someone has rearranged the namespaces in a project. Most files will contain a single changed line. If you see a file containing 20 changed lines, you know something different has happened in that file, so it's a good file to review properly.
    • Look for changes that affect interaction with other components of the system, e.g. the database, third-party integrations, imports & exports, etc.
  • For the remaining files, do spot checks on a wide variety of files. You can spot check at multiple levels. For example, if there are too many projects to review, pick some to check. Within each project being checked, pick some folders to check. Within each of them, pick some files to check.
  • Review the list of changed files - what's been added, renamed, moved, deleted, edited? Is there anything unexpected that you can see?
  • Keep an eye open for changes which look different from the ones around them. For example, if a class has been renamed you will see lots of isolated single-line changes. If you then see a whole block of consecutive lines which have changed, pay attention to it.
  • Work from the outside in. Start with the highest level changes. What projects/folders have changed? What sort of changes are you expecting in each? Move in towards the detail or specific files. Within each project, what are the groups of changes? What types of files have been modified? Keep moving in until you get to individual files.
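
Several of the checks above (uncommon file extensions, files with unusually many changed lines, and spot checks spread across folders) can be partly scripted. The following is a minimal Python sketch rather than a definitive tool. It assumes you have the text output of git diff --numstat for the pull request; the function names, the 5% "rare extension" threshold and the "five times the median" rule are illustrative assumptions you would want to tune for your own codebase.

```python
import random
import statistics
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def parse_numstat(text):
    """Parse `git diff --numstat` output into (path, changed_lines) pairs."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        added, deleted, path = parts
        # Binary files are reported as "-" for both counts; treat them as 0.
        changed = sum(int(n) for n in (added, deleted) if n.isdigit())
        rows.append((path, changed))
    return rows


def rare_extension_files(rows, max_share=0.05):
    """Files whose extension makes up at most max_share of all changed files."""
    counts = Counter(PurePosixPath(p).suffix.lower() for p, _ in rows)
    total = len(rows)
    return sorted(
        p for p, _ in rows
        if counts[PurePosixPath(p).suffix.lower()] / total <= max_share
    )


def heavy_files(rows, factor=5):
    """Files with more than factor times the median number of changed lines."""
    if not rows:
        return []
    median = statistics.median([n for _, n in rows])
    threshold = max(1, median) * factor
    return sorted(p for p, n in rows if n > threshold)


def spot_check_sample(rows, per_folder=2, seed=0):
    """Pick up to per_folder random files from each folder for a spot check."""
    by_folder = defaultdict(list)
    for path, _ in rows:
        by_folder[str(PurePosixPath(path).parent)].append(path)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return {
        folder: sorted(rng.sample(sorted(paths), min(per_folder, len(paths))))
        for folder, paths in sorted(by_folder.items())
    }
```

One way to use it would be to run something like git diff --numstat main...my-branch (the branch names here are placeholders), pass the output to parse_numstat, and then review the files returned by rare_extension_files and heavy_files first before working through the spot_check_sample selection. Renamed files appear in numstat output with an "old => new" path, which this sketch does not treat specially. A script like this only narrows down where to look; it does not replace reading the flagged files.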

There is no easy way to review a large boilerplate pull request, but these tips provide a framework for approaching the task. Ultimately, you will still be relying on your automated tooling to catch whatever problems the review misses.