Methodology

Collab.dev uses an event-driven approach to analyze pull request (PR) activity and generate actionable insights. This methodology outlines how data is collected, processed, and visualized to track code review efficiency, approval trends, and overall project health.

We analyze the 100 most recently merged PRs per repository, capturing key events such as commits, reviews, and merge activities. These events are then used to compute metrics including review turnaround time, approval time, and review coverage.

Why Focus on Merged PRs?

By examining only merged pull requests, we focus on the complete lifecycle of successful code contributions, that is, the changes that ultimately become part of the codebase. This approach allows us to identify patterns, bottlenecks, and efficiencies in the review process for the code that matters most.

Understanding how successfully merged code moves through your review workflow provides actionable insights for optimization. For example, discovering that a significant percentage of merged PRs experience delays at specific stages (like waiting for reviewer response) can help teams refine their collaboration processes and reduce time-to-merge without compromising code quality.

The following sections detail the types of events recorded, how data is structured, and the calculations behind each metric.

Data Collection Approach

PR Events Captured

Each PR in the dataset consists of multiple events, including:

- PR creation
- Commits pushed to the PR
- Review requests
- Reviews (approvals, change requests, and review comments)
- Comments
- The merge itself

Data Captured for Analysis

Each event provides specific data points that enable further metric calculations:

- The event type (e.g., review requested, approved, merged)
- The timestamp at which the event occurred
- The actor who performed it (core team member, external contributor, or bot)
- PR attributes such as size (lines changed)

Using this structured dataset, time-based metrics (e.g., approval time, review turnaround time) are derived by calculating the difference between key events. Aggregations, such as medians, trends, and distributions, provide insights into PR workflows. PRs are also grouped based on attributes like size or contributor activity to help teams understand the factors influencing review and merge efficiency.
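As a concrete illustration, the sketch below derives one time-based metric from an event log. The event schema (an `event` type plus an ISO timestamp) is a simplified assumption for illustration, not collab.dev's actual data model.

```python
from datetime import datetime

# Hypothetical event log for a single PR; field names are illustrative.
pr_events = [
    {"event": "created",          "timestamp": "2024-05-01T09:00:00"},
    {"event": "review_requested", "timestamp": "2024-05-01T09:30:00"},
    {"event": "approved",         "timestamp": "2024-05-01T14:45:00"},
    {"event": "merged",           "timestamp": "2024-05-01T15:00:00"},
]

def event_time(events, name):
    """Return the timestamp of the first event of the given type, or None."""
    for e in events:
        if e["event"] == name:
            return datetime.fromisoformat(e["timestamp"])
    return None

# A time-based metric is the difference between two key events;
# here, approval time (see Request Approval Time below).
approval_time = event_time(pr_events, "approved") - event_time(pr_events, "review_requested")
print(approval_time)  # 5:15:00
```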

Review Actions

In collab.dev, a review is any of the following actions by a reviewer:

- Approving the PR
- Requesting changes
- Submitting a review comment

This ensures all forms of review activity are captured in collaboration metrics.

Metrics

Contributor Distribution

Overview

This chart categorizes all Pull Requests (PRs) by their origin—Core Team, Community, or Bots—and calculates each group's percentage of total contributions. This helps measure how much development work comes from internal vs. external sources and the role of automation in PR creation.

Significance

Segmenting PRs by contributor type provides valuable insights into community engagement, core team workload, and automation's role in the development process. A higher proportion of community contributions suggests strong external participation and supports open-source sustainability. Understanding the balance between core team and external contributions helps distribute engineering efforts effectively. Additionally, bot-generated PRs can indicate efficiency gains through automation but may require monitoring to maintain code quality.

Calculation

Each contributor type is calculated as a percentage of total PRs:

\[ \text{Core Team PRs } \% = \frac{\text{PRs from Core Team}}{\text{Total PRs}} \times 100 \]
\[ \text{Community PRs } \% = \frac{\text{PRs from External Contributors}}{\text{Total PRs}} \times 100 \]

where external contributors exclude both core team members and bots.

\[ \text{Bot PRs } \% = \frac{\text{PRs from Bots}}{\text{Total PRs}} \times 100 \]
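To make the arithmetic concrete, here is a minimal sketch of these percentages; the origin labels are an assumed tagging, not collab.dev's schema.

```python
# One origin label per analyzed PR; labels are hypothetical.
pr_origins = ["core", "core", "community", "bot", "community", "core"]

total = len(pr_origins)
for group in ("core", "community", "bot"):
    share = pr_origins.count(group) / total * 100
    print(f"{group}: {share:.1f}%")  # e.g. core: 50.0%
```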

Bot Pull Requests

Overview

This chart breaks down the proportion of PRs authored by humans vs. bots, providing insights into automation levels within the repository. The Bot Pull Requests Breakdown further details which bots are contributing PRs, allowing maintainers to assess their role in the development process.

Significance

Understanding the balance between bot and human contributions helps teams assess automation efficiency, human engagement, and bot performance. A high proportion of bot-generated PRs may indicate effective CI/CD automation but could also introduce unnecessary overhead if not well managed. If bot PRs dominate, it may signal a lack of community or core team contributions. Tracking PRs by bot allows teams to monitor automation trends and ensure bots are contributing meaningfully rather than generating excess noise.

Calculation

Proportion of PRs by type:

\[ \text{Human PRs } \% = \frac{\text{PRs from Humans}}{\text{Total PRs}} \times 100 \]
\[ \text{Bot PRs } \% = \frac{\text{PRs from Bots}}{\text{Total PRs}} \times 100 \]

Bot Pull Requests breakdown:

\[ \text{PRs per Bot} = \text{Count of PRs created by each unique bot} \]
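The per-bot count is a simple tally; the sketch below assumes each bot-created PR is tagged with its bot's name.

```python
from collections import Counter

# Hypothetical list of bot authors, one entry per bot-created PR.
bot_pr_authors = ["dependabot", "renovate", "dependabot", "github-actions"]

prs_per_bot = Counter(bot_pr_authors)
for bot, count in prs_per_bot.most_common():
    print(f"{bot}: {count} PRs")
```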

Bot Activity

Overview

This chart analyzes automated activity within a repository, showing the breakdown of different actions performed by bots. It displays the percentage of repository events performed by bots, the number of unique bots active in the repository, and a detailed breakdown of the types of activities these bots engage in (such as creating PRs, adding comments, approving reviews, etc.).

Significance

Understanding bot activity helps teams assess the impact of automation on their development workflow. By tracking the percentage of events performed by bots versus humans, teams can evaluate whether they have the right balance of automated and manual processes. The breakdown of bot activity types reveals which aspects of the development lifecycle are most automated, helping teams identify potential opportunities for additional automation or areas where existing automation may be excessive. Tracking unique bot counts also provides visibility into the diversity of automation tools integrated into the workflow.

Calculation

Bot activity metrics are calculated through several key measurements:

\[ \text{Bot Activity } \% = \left( \frac{\text{Total Bot Events}}{\text{Total Repository Events}} \right) \times 100 \]

Where:

- Total Bot Events is the number of repository events performed by bot accounts
- Total Repository Events is the number of all recorded events, from humans and bots alike

For the event type breakdown, each category's percentage is calculated as:

\[ \text{Event Type } \% = \left( \frac{\text{Count of Specific Bot Event Type}}{\text{Total Bot Events}} \right) \times 100 \]

The analysis also tracks:

- The number of unique bots active in the repository
- The number of events attributed to each individual bot

This comprehensive view of bot activity enables teams to better understand how automation is integrated into their workflow and identify potential areas for optimization.
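The sketch below pulls these measurements together; the event-log fields (`actor`, `is_bot`, `event`) are assumptions for illustration.

```python
from collections import Counter

# Hypothetical repository event log.
events = [
    {"actor": "alice",      "is_bot": False, "event": "commented"},
    {"actor": "dependabot", "is_bot": True,  "event": "pr_created"},
    {"actor": "renovate",   "is_bot": True,  "event": "pr_created"},
    {"actor": "dependabot", "is_bot": True,  "event": "commented"},
    {"actor": "bob",        "is_bot": False, "event": "approved"},
]

bot_events = [e for e in events if e["is_bot"]]
bot_activity_pct = len(bot_events) / len(events) * 100
unique_bots = {e["actor"] for e in bot_events}

# Event-type breakdown as a share of all bot events.
breakdown = Counter(e["event"] for e in bot_events)
print(f"Bot activity: {bot_activity_pct:.0f}% across {len(unique_bots)} unique bots")
for event_type, count in breakdown.items():
    print(f"  {event_type}: {count / len(bot_events) * 100:.0f}%")
```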

Review Funnel

Overview

This chart provides a breakdown of the review process, showing the proportion of PRs that receive a review and how many are ultimately approved. The Review Rate highlights the percentage of PRs that received at least one review, while the Approval Rate indicates how many reviewed PRs were approved. The Review Funnel visually represents the flow of PRs from creation to review and approval, helping teams assess review coverage and approval trends.

Significance

Tracking the review funnel helps teams assess their review coverage and approval trends. A high review rate indicates strong adherence to code review practices, while a lower rate may highlight gaps in collaboration or review policies. The approval rate provides insight into how many reviewed PRs meet the team's quality standards and get approved. The review funnel further contextualizes this by showing how PRs move from creation to review and approval, helping teams identify inefficiencies or bottlenecks in the review workflow.

Calculation

The Review Rate is calculated as:

\[ \text{Review Rate} = \left( \frac{\text{Reviewed PRs}}{\text{Total PRs}} \right) \times 100 \]

Where:

- Reviewed PRs is the number of PRs that received at least one review action
- Total PRs is the number of all PRs in the dataset

The Approval Rate is calculated as:

\[ \text{Approval Rate} = \left( \frac{\text{Approved PRs}}{\text{Reviewed PRs}} \right) \times 100 \]

Where:

- Approved PRs is the number of reviewed PRs that received at least one approval
- Reviewed PRs is the same count used in the Review Rate above

The Review Funnel visualizes these numbers, illustrating how PRs progress from creation to review and ultimately to approval.
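A minimal sketch of both rates, assuming each PR record carries simple `reviewed` and `approved` flags:

```python
# Hypothetical PR records with review outcomes.
prs = [
    {"reviewed": True,  "approved": True},
    {"reviewed": True,  "approved": False},
    {"reviewed": False, "approved": False},
    {"reviewed": True,  "approved": True},
]

reviewed = [p for p in prs if p["reviewed"]]
approved = [p for p in reviewed if p["approved"]]

review_rate = len(reviewed) / len(prs) * 100         # reviewed / total
approval_rate = len(approved) / len(reviewed) * 100  # approved / reviewed
print(f"Review rate: {review_rate:.0f}%, approval rate: {approval_rate:.0f}%")
```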

Review-Merge Coverage

Overview

This chart provides a breakdown of the Review-Merge Rate, showing the proportion of PRs that were merged with a review. The Review-Merge Rate highlights the percentage of merged PRs that received at least one review, while the Review Coverage Breakdown visually compares PRs merged with vs. without a review.

Significance

Tracking review-merge coverage helps assess adherence to code review processes and maintain code quality. A higher percentage of PRs merged with reviews indicates that the team follows structured review practices, reducing the risk of merging unverified changes. Conversely, a significant number of PRs merged without review might highlight workflow gaps, potential risks, or situations where reviews are bypassed. Understanding this ratio allows teams to reinforce review policies, improve collaboration, and ensure that code changes meet quality standards before merging.

Calculation

The Review-Merge Rate is calculated as the proportion of merged PRs that had at least one recorded review action (approval or changes requested):

\[ \text{Review-Merge Rate} = \left( \frac{\text{Merged PRs with Review}}{\text{Total Merged PRs}} \right) \times 100 \]

Where:

- Merged PRs with Review is the number of merged PRs with at least one recorded review action
- Total Merged PRs is the number of all merged PRs analyzed

The percentage of PRs merged without review is derived as:

\[ \text{PRs Merged Without Review } \% = 100\% - \text{Review-Merge Rate} \]

This metric provides insights into how often code reviews are incorporated into the development workflow before merging changes into the main branch.
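For example, a sketch under the assumption that each merged-PR record notes whether it had a review:

```python
# Hypothetical merged-PR records.
merged_prs = [
    {"number": 101, "had_review": True},
    {"number": 102, "had_review": False},
    {"number": 103, "had_review": True},
]

with_review = sum(1 for p in merged_prs if p["had_review"])
review_merge_rate = with_review / len(merged_prs) * 100
print(f"Merged with review: {review_merge_rate:.0f}%")
print(f"Merged without review: {100 - review_merge_rate:.0f}%")
```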

Review Turnaround

Overview

This chart provides a breakdown of Review Turnaround, showing how long it takes for PRs to receive their first review. Review Turnaround Time represents the median time until a PR gets its first review. PRs Reviewed Within 1 Hour highlights the percentage of PRs that received a review within the first hour. The Review Turnaround Distribution visualizes how review times are spread across different intervals.

Significance

Measuring review turnaround time provides insight into how quickly PRs receive feedback, helping teams assess the efficiency of their review process. Faster review times can indicate smooth collaboration and quick feedback loops, while longer times may highlight delays or bottlenecks. Segmenting PRs by review time distribution allows teams to analyze how many PRs are reviewed promptly versus those that experience prolonged review delays. This helps teams optimize their review practices, balance workload distribution, and ensure timely code reviews.

Calculation

The review turnaround time for each PR is calculated as the difference between the time of the first review action (approval or changes requested) and the time the review was requested. If no review request exists, PR creation time is used as the starting point:

\[ T_{\text{review}} = T_{\text{first\_review}} - T_{\text{review\_requested}} \]

Where:

- $T_{\text{first\_review}}$ is the timestamp of the PR's first review action (approval or changes requested)
- $T_{\text{review\_requested}}$ is the timestamp at which a review was requested

If no review request event exists for a PR, $T_{\text{review\_requested}}$ is replaced by $T_{\text{created}}$ (the PR creation timestamp).

To analyze review turnaround trends, PRs are grouped into time intervals, such as those reviewed within the first hour, and the distribution across these intervals is visualized.

For the overall median review turnaround time, the calculation is:

\[ \tilde{T}_{\text{review}} = \text{median}(\{T_{\text{review},i} \mid i \in \text{Dataset} \}) \]

Where:

- $T_{\text{review},i}$ is the review turnaround time of the $i$-th PR in the dataset
- $\tilde{T}_{\text{review}}$ is the median turnaround time across all analyzed PRs

This approach ensures that review efficiency is measured consistently while accounting for cases where PRs do not have an explicit review request.
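A sketch of the turnaround calculation, including the creation-time fallback; the PR fields are illustrative:

```python
from datetime import datetime
from statistics import median

# Hypothetical PRs; a missing review_requested falls back to creation time.
prs = [
    {"created": "2024-05-01T09:00:00", "review_requested": "2024-05-01T09:30:00",
     "first_review": "2024-05-01T11:00:00"},
    {"created": "2024-05-02T10:00:00", "review_requested": None,
     "first_review": "2024-05-02T10:40:00"},
]

def turnaround(pr):
    start = pr["review_requested"] or pr["created"]  # fallback to PR creation
    return datetime.fromisoformat(pr["first_review"]) - datetime.fromisoformat(start)

print(median(turnaround(p) for p in prs))  # median review turnaround
```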

Request Approval Time

Overview

This chart measures the median approval time for pull requests, showing how long it takes for PRs to receive approval after a review is requested. It includes the Overall Median Approval Time, which represents the median time for PRs to be approved, regardless of size. It also breaks down PR Approval Time by Size, showing the median approval time for PRs grouped by the number of lines changed.

Significance

Measuring PR approval time provides insight into how quickly pull requests move from review request to approval. By segmenting approval times based on PR size, this analysis highlights patterns in review timelines, offering data on how different PR sizes progress through the approval process. These insights can help teams evaluate whether review time aligns with expectations, assess the impact of PR size on approval speed, and make informed decisions about optimizing review workflows.

Calculation

The approval time for each pull request (PR) is calculated as the difference between the time it received its first approval from the requested reviewer and the time the review was requested:

\[ T_{\text{approval}} = T_{\text{approved}} - T_{\text{review\_requested}} \]

Where:

- $T_{\text{approved}}$ is the timestamp of the PR's first approval
- $T_{\text{review\_requested}}$ is the timestamp at which the review was requested

To analyze the impact of PR size on approval time, PRs are grouped into buckets based on the number of lines changed.

For each group, the median approval time is computed:

\[ \tilde{T}_{\text{approval}} = \text{median}(\{T_{\text{approval},i} \mid i \in \text{Group} \}) \]

Where:

- $T_{\text{approval},i}$ is the approval time of the $i$-th PR in the size group
- $\tilde{T}_{\text{approval}}$ is the median approval time for that group

This calculation ensures that the approval time metric reflects central trends rather than being skewed by outliers, providing a clearer picture of how quickly PRs of different sizes get approved after reviews are requested.
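The grouping step might look like the following sketch; the size thresholds here are illustrative, not collab.dev's exact buckets.

```python
from statistics import median

# Hypothetical PRs with approval times in hours.
prs = [
    {"lines_changed": 12,  "approval_hours": 1.5},
    {"lines_changed": 250, "approval_hours": 8.0},
    {"lines_changed": 40,  "approval_hours": 2.0},
    {"lines_changed": 900, "approval_hours": 30.0},
]

def size_bucket(lines):
    # Assumed thresholds, for illustration only.
    if lines < 50:
        return "small"
    if lines < 500:
        return "medium"
    return "large"

groups = {}
for pr in prs:
    groups.setdefault(size_bucket(pr["lines_changed"]), []).append(pr["approval_hours"])

for bucket, hours in groups.items():
    print(f"{bucket}: median approval time {median(hours):.1f}h")
```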

Merge Time

Overview

This chart provides a cumulative breakdown of how quickly PRs get merged, showing the proportion of PRs merged within a given timeframe. Key percentiles (25%, 50%, 75%) highlight how long it takes for different portions of PRs to be merged, giving teams insight into both typical and slower merge times.

Significance

Tracking merge time helps teams understand typical merge speeds, set realistic expectations, and identify process bottlenecks. Analyzing slow merges can uncover inefficiencies in the review process, while percentile-based benchmarks highlight trends—if the 75th percentile is significantly higher than the median, it may indicate a subset of PRs experiencing major delays that need attention.

Calculation

Merge time for each PR:

\[ T_{\text{merge}} = T_{\text{merged}} - T_{\text{created}} \]

Cumulative proportion of merged PRs:

\[ \text{Cumulative Proportion} = \left( \frac{\text{PRs merged within X hours}}{\text{Total PRs}} \right) \times 100 \]

Key percentiles provide insights into PR merge speed:

- 25th percentile: the time within which the fastest quarter of PRs are merged
- 50th percentile (median): the typical merge time
- 75th percentile: the time within which three quarters of PRs are merged; values well above the median point to a subset of PRs with major delays
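These calculations can be sketched quickly; the merge times below are made up, and the percentile method is Python's default rather than necessarily collab.dev's.

```python
from statistics import quantiles

# Hypothetical merge times in hours for ten PRs.
merge_hours = [0.5, 2, 3, 5, 8, 12, 24, 48, 72, 120]

p25, p50, p75 = quantiles(merge_hours, n=4)  # 25th, 50th, and 75th percentiles
print(f"25%: {p25}h, median: {p50}h, 75%: {p75}h")

# Cumulative proportion of PRs merged within X hours.
for x in (4, 24, 72):
    pct = sum(1 for t in merge_hours if t <= x) / len(merge_hours) * 100
    print(f"merged within {x}h: {pct:.0f}%")
```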

Wait Time Analysis

This analysis tracks how long PRs spend waiting in different stages of the review process, helping teams identify bottlenecks and optimize workflow efficiency. The four key wait time phases are:

- Initial Wait: the time from PR creation until review activity begins
- Reviewer Response Wait: the time spent waiting for a reviewer to respond with feedback
- Author Response Wait: the time spent waiting for the author to respond to reviewer feedback
- Merge Decision Wait: the time from final approval until the PR is merged

The following charts provide both a repository-wide view (weighted median distribution) and a detailed breakdown per PR to uncover patterns and outliers.

Wait Time Distribution

Overview

This chart shows how PR wait times are distributed across the four tracked phases. Each percentage represents a phase's weighted contribution to the total median wait time across the repository.

Significance

Understanding this distribution helps teams pinpoint the biggest sources of delay. If Merge Decision Wait dominates, it may indicate slow merging habits. A high Reviewer Response Wait suggests that PRs are waiting too long for feedback.

Calculation

Each phase's percentage is calculated as:

\[ \text{Wait Time } \% = \left( \frac{\text{Weighted Median Wait Time in Phase}}{\text{Total Weighted Median Wait Time}} \right) \times 100 \]

Where:

\[ \text{Weighted Median Wait Time} = (\text{Median Wait Time in Phase}) \times (\text{Number of PRs in Phase}) \]

And:

\[ \text{Median Wait Time in Phase} = \text{median}(\{ \text{Wait Time for each PR in the phase} \}) \]

This ensures that frequently occurring delays carry greater weight in the final distribution.
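The weighting can be sketched as follows; the phase names match the four phases above, and the per-PR wait times (in hours) are made up.

```python
from statistics import median

# Hypothetical per-PR wait times (hours) recorded in each phase.
phase_waits = {
    "initial":  [1, 2, 3],
    "reviewer": [4, 6, 10, 12],
    "author":   [2, 3],
    "merge":    [1, 1, 2, 5],
}

# Weighted median = median wait in the phase x number of PRs in that phase.
weighted = {phase: median(w) * len(w) for phase, w in phase_waits.items()}
total = sum(weighted.values())

for phase, w in weighted.items():
    print(f"{phase}: {w / total * 100:.1f}%")
```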

Wait Time Breakdown by PR

Overview

This chart provides a detailed breakdown of how long individual PRs spend in each wait time phase, helping teams analyze where delays occur and how PRs move through the review process. Users can filter by longest or shortest wait times to identify patterns in review efficiency, potential blockers, or inconsistencies in workflow.

Significance

Understanding PR wait times helps teams determine whether delays are caused by reviewers, authors, or process inefficiencies. While longer wait times may indicate bottlenecks, shorter wait times are not always a sign of best practices—they could highlight efficient workflows, but also cases where key review steps are being skipped. By analyzing both ends of the spectrum, teams can identify trends, ensure proper review processes are followed, and refine their collaboration practices.

Calculation

For each PR:

\[ T_{\text{total\_wait}} = T_{\text{initial}} + T_{\text{reviewer}} + T_{\text{author}} + T_{\text{merge}} \]

Each phase's percentage within a single PR is calculated as:

\[ \text{Phase } \% \text{ of PR Wait Time} = \left( \frac{\text{Phase Wait Time}}{T_{\text{total\_wait}}} \right) \times 100 \]

This breakdown helps teams spot inefficiencies in specific PRs and take corrective action.
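A per-PR breakdown reduces to simple proportions, as in this sketch with illustrative phase durations:

```python
# Hypothetical wait times (hours) for a single PR, one entry per phase.
pr_wait = {"initial": 2.0, "reviewer": 10.0, "author": 3.0, "merge": 1.0}

total_wait = sum(pr_wait.values())  # T_total_wait
for phase, hours in pr_wait.items():
    print(f"{phase}: {hours / total_wait * 100:.0f}% of total wait")
```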

Code Review Workflow

Overview

This metric tracks the movement of Pull Requests (PRs) through different stages in a repository's workflow. It visualizes how PRs progress from creation to review, approval, and merging, highlighting the proportion of PRs that follow each possible path.

Significance

Understanding PR flow helps teams analyze their code review process, identify bottlenecks, and optimize collaboration efficiency. It reveals the proportion of PRs that go directly into review vs. those requiring a request, the balance between approved PRs and those needing changes, and where delays or inefficiencies occur. Additionally, it highlights whether the review workflow is streamlined or overly complex, helping teams refine their processes.

Calculation

To calculate the proportion of PRs progressing through each stage, we define transition phases and compute percentages based on PRs moving from one stage to the next.

For each transition:

\[ \text{Transition Percentage} = \left( \frac{\text{PRs in End State}}{\text{PRs in Start State}} \right) \times 100 \]

Where:

- PRs in Start State is the number of PRs that reached the earlier stage of the transition
- PRs in End State is the number of those PRs that progressed to the following stage

This calculation applies across all PR flow transitions, providing insight into how PRs move through the review pipeline and where inefficiencies may occur.
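A sketch of the transition percentages over assumed stage counts:

```python
# Hypothetical number of PRs that reached each workflow stage.
stage_counts = {"created": 100, "reviewed": 80, "approved": 70, "merged": 68}

transitions = [("created", "reviewed"), ("reviewed", "approved"), ("approved", "merged")]
for start, end in transitions:
    pct = stage_counts[end] / stage_counts[start] * 100
    print(f"{start} -> {end}: {pct:.0f}%")
```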

Conclusion

These metrics provide visibility into a repository's collaboration patterns and review processes. The combination of time-based metrics (review turnaround, wait times) and process-based metrics (review coverage, PR flow) offers both detailed and high-level views of PR activity. By surfacing both individual PR data and repository-wide patterns, these metrics create transparency around how code changes flow through the review process.