Welcome to issue #22 of Indiscrete Musings
I write about the world of Cloud Computing and Venture Capital and will most likely fall off the path from time to time. You can expect a bi-weekly to monthly update on specific sectors within Cloud Computing or uncuffed thoughts on the somewhat opaque world that is Venture Capital. I’ll be mostly wrong and sometimes right. Views my own.
Please feel free to subscribe, forward, and share. For more random musings, follow @MrRazzi17
Over the last year, in speaking with engineering leaders, one problem came up consistently: frustration with GitHub Actions (GHA). The cost tends to be astronomical, wait times for execution often run 20 minutes or so, and these workflows are running daily, if not every second.
A few times last year, after speaking with buyers and new start-ups in the space, I asked myself: how is this still happening in 2025? It felt like dial-up internet in the world of Starlink. I decided to spend some time in the space, and I’m happy to share some condensed learnings and select excerpts from a year of conversations with experts. As an infra VC, I am incredibly excited about the momentum of start-ups tackling this problem.
Simplicity That Doesn’t Scale
GitHub Actions (GHA) has become a major player in continuous integration and continuous deployment (CI/CD) workflows. However, despite its widespread use and integration with GitHub, it has significant limitations that frustrate developers, enterprises, and companies alike.
GitHub Actions stands out because of its integration within the GitHub ecosystem. In speaking with one of the founders in the space, he mentioned, “We use GitHub Actions because it’s integrated with GitHub, and there’s a price elasticity to pay more to not think about it.” For small teams, this simplicity is a significant draw. At the same time, as companies grow (and they naturally do), the lack of centralized tools for managing complex workflows can create inefficiencies. One engineering leader I spoke with while doing research highlighted this by saying, “The development cycle with Actions is slow—opening the YAML, writing the script—it kills momentum.”
This issue becomes more pronounced when workflows span multiple repositories or require advanced configurations. Without built-in support for workflow templates or shared configurations, teams often waste time duplicating and maintaining scripts. Additionally, GHA’s limited support for secrets management and dependency tracking adds to the operational burden, especially in security-conscious environments. Companies working on sensitive data pipelines often find themselves building custom solutions to plug the gaps in GHA, diverting valuable engineering time away from core objectives.
When companies start to use runners with 8 vCPUs and above on GitHub, provisioning a runner can take 5-10 minutes for some customers. That is time lost for the developer and, once it compounds, for the company! The most illustrative example is a hotfix that requires deploying 10 or so services concurrently on larger runners: not only is this a huge pain, but it keeps companies from deploying again while these problems persist, and the waste runs to thousands if not millions of dollars.
The High Cost of Adoption
Many companies start with GitHub’s cloud runners, only to find them too expensive as usage grows. Why? As a company grows, so does the code base, and the number of tests keeps increasing; each new engineer’s commits run tests written by every other engineer, so CI spending grows faster than headcount, and it can be challenging to run only the relevant tests without building on a system like Bazel. “We moved to self-hosted runners because GitHub’s cloud runners were becoming very expensive,” explained one senior engineer. Even self-hosted solutions come with their challenges. Beyond maintenance, you need a team of engineers running the platform, and as we know, the best engineering teams typically want to work on providing value rather than maintaining infrastructure. As an eng director explained, “We expected 20-30% cost savings after switching to self-hosted Kubernetes runners but achieved only 10%.” For many organizations, the costs outweigh the benefits, particularly at scale.
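To make the scaling argument concrete, here is a back-of-the-envelope sketch. It assumes every commit runs the full test suite, and the default figures (5 commits per engineer per day, 200 tests contributed per engineer) are hypothetical numbers chosen for illustration, not data from any of the companies I spoke with.

```python
def daily_test_runs(engineers, commits_per_engineer=5, tests_per_engineer=200):
    """Test executions per day if every commit runs the whole suite.

    Both defaults are illustrative assumptions, not measured values.
    """
    total_commits = engineers * commits_per_engineer  # grows with headcount
    suite_size = engineers * tests_per_engineer       # also grows with headcount
    return total_commits * suite_size                 # so the product grows with headcount squared


def growth_factor(team_before, team_after):
    """How much CI work multiplies when the team grows."""
    return daily_test_runs(team_after) / daily_test_runs(team_before)


if __name__ == "__main__":
    for team in (10, 20, 40):
        print(team, "engineers ->", daily_test_runs(team), "test runs per day")
    print("Doubling the team multiplies CI work by", growth_factor(10, 20))
```

Under these assumptions, doubling the team quadruples the CI work. Running only the tests affected by a change, which is what build systems like Bazel aim for, is the main lever for breaking that curve.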
GHA’s pricing structure does not account for varying usage patterns, leaving teams with unexpected costs during peak development periods. Startups and smaller organizations, which often operate under tight budgets, find it particularly difficult to justify these expenses. As one engineering manager pointed out, “If you’re a small company, GHA’s costs can quickly spiral out of control, even with minimal workloads.”
Compounding this issue is the lack of predictability in GHA’s billing model. As teams grow or integrate more workflows, costs often increase in unforeseen ways. “The billing surprises were the last straw,” one company leader shared. “We had to reconsider whether GHA was sustainable for us long-term.” These concerns are driving many organizations to explore alternatives that offer more transparent and predictable pricing.
The Observability Problem
One of the most common criticisms of GitHub Actions is its lack of observability features. As a senior VP remarked, “There’s no way of understanding trends, where things are failing, or identifying flaky tests.” Without built-in analytics, developers are forced to rely on third-party tools like Datadog to monitor their workflows, which adds extra cost and complexity. This lack of visibility becomes a significant bottleneck for teams managing intricate CI/CD pipelines.
The absence of centralized dashboards and historical data also hampers long-term optimization efforts. Teams are unable to identify patterns in build failures or performance degradation, leading to reactive rather than proactive problem-solving. “Observability is critical,” noted a VP of engineering. “Without it, you’re essentially flying blind, especially when managing large-scale CI/CD workflows.”
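As a rough illustration of the kind of analysis teams end up building themselves, here is a minimal sketch of flaky-test detection. It assumes you have already exported test results as (test name, commit SHA, passed) records; that record shape and the function name are my own hypothetical choices, not something GHA provides. A test counts as flaky here if it both passed and failed on the same commit.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the names of tests that both passed and failed on one commit.

    `runs` is an iterable of (test_name, commit_sha, passed) tuples,
    a hypothetical export format assumed for this sketch.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(bool(passed))
    # Same code, different result: the test is nondeterministic.
    flaky = {name for (name, _sha), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


if __name__ == "__main__":
    history = [
        ("test_login", "abc123", True),
        ("test_login", "abc123", False),  # retried on the same commit, different result
        ("test_payment", "abc123", False),
        ("test_payment", "abc123", False),  # consistently failing, not flaky
        ("test_cart", "abc123", True),
        ("test_cart", "def456", False),     # different commits, could be a real regression
    ]
    print(find_flaky_tests(history))
```

A consistently failing test and a test that broke on a later commit are deliberately excluded, since those point to real bugs rather than flakiness.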
Debugging errors within GitHub Actions can be time-consuming and labor-intensive. Many teams find themselves sifting through logs manually, with little support for tracing errors back to their root causes. As one developer explained, “It’s frustrating to spend hours trying to figure out why a workflow failed, especially when we’re working on tight deadlines.” It’s simple: build better tooling to address this yourself (e.g., Bazel) or be at the mercy of the unknown in GHA.
The Engineering Problem
While self-hosting GitHub Actions runners can be an option for many organizations, it is fraught with hidden costs and complexities that often diminish the anticipated savings. One major issue is dealing with highly variable workloads. During peak working hours, CI/CD pipelines can experience a surge in activity, only to see little to no usage at night. This inconsistency makes inelastic compute resources highly inefficient, while on-demand compute—though more flexible—tends to be significantly more expensive.
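The tradeoff above can be sketched with some simple arithmetic. Everything here is an assumption for illustration: the demand profile (40 runners busy for 8 working hours, 2 for the other 16), the price of 1.0 per runner-hour, and the 2x on-demand premium are all hypothetical.

```python
def fixed_fleet_cost(hourly_demand, price_per_runner_hour):
    """Inelastic fleet: sized for the peak hour and paid for around the clock."""
    return max(hourly_demand) * len(hourly_demand) * price_per_runner_hour


def on_demand_cost(hourly_demand, price_per_runner_hour, premium=2.0):
    """Elastic fleet: pay only for runner-hours used, at a marked-up rate."""
    return sum(hourly_demand) * price_per_runner_hour * premium


def utilization(hourly_demand):
    """Share of a peak-sized fleet that is actually busy over the period."""
    return sum(hourly_demand) / (max(hourly_demand) * len(hourly_demand))


if __name__ == "__main__":
    # Hypothetical day: busy during working hours, nearly idle overnight.
    demand = [40] * 8 + [2] * 16
    print("fixed fleet:", fixed_fleet_cost(demand, 1.0))
    print("on-demand:  ", on_demand_cost(demand, 1.0))
    print("utilization:", round(utilization(demand), 2))
```

With this profile, the peak-sized fleet sits idle nearly two thirds of the time, so on-demand comes out cheaper even at double the unit price. Flatten the demand curve or raise the premium and the answer flips, which is why the savings are so hard to predict in advance.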
Another challenge is accurately attributing costs. CI workloads are unique in their resource usage patterns: the beginning and end of jobs often involve heavy I/O operations, while the intermediate steps are predominantly CPU-bound. This uneven distribution results in secondary costs, such as network data transfer fees, becoming a substantial part of the overall expenditure. “We underestimated how much our network costs would add up,” admitted one engineering lead, reflecting on their transition to self-hosted runners.
Optimizing performance in CI/CD environments adds another layer of complexity. Achieving fast execution times requires careful balancing of CPU, memory, disk, network, and cost. However, improving one aspect often comes at the expense of another. For instance, increasing CPU availability might reduce delays but significantly drive up costs. “There’s always a tradeoff,” noted an engineering manager. “You can lower costs or reduce delays, but rarely both.”
For larger teams, these problems are magnified. Organizations running tens or even hundreds of thousands of jobs daily must grapple with the challenge of scaling their infrastructure effectively. Each code commit can trigger hundreds of parallel jobs, creating highly spiky workloads. Without proper planning, these fluctuations can lead to inefficiencies and escalating costs, making self-hosting a daunting proposition for growing companies. Tied to this is the euphoria around AI Code-Gen (e.g., Cursor, Copilot, Codeium); all of these tools make writing and running tests easier. As a result, people are writing 5x more tests, amplifying both the engineering burden and the costs touched on earlier.
Cue: Start-ups
To address these issues, startups like Blacksmith, WarpBuild, Namespace, Dime.run, BuildJet, and Depot have emerged to fill this dire need. These platforms provide better observability, improved caching, and flexible runner management. One founder highlighted the value of these tools: “The caching features provided by these services are useful, but they feel constrained by GitHub Actions’ customization limitations.”
Furthermore, startups are introducing advanced debugging capabilities that GHA currently lacks. For instance, some tools enable developers to freeze builds mid-execution for closer inspection, a feature invaluable for diagnosing complex issues. “Debugging CI/CD pipelines shouldn’t feel like guesswork,” said an engineering leader. “These new tools bring much-needed transparency to the process.”
Startups are also innovating in areas like CI/CD pipeline health monitoring and performance optimization. By providing actionable insights into bottlenecks, these tools help teams reduce build times and improve reliability. “With x vendor, we identified and fixed several flaky tests within days,” shared a technical lead. “It saved us weeks of trial and error.”
The Shift Toward Abstraction and Low-Code Solutions
Another trend reshaping CI/CD workflows is the rise of abstraction and low-code solutions. “Companies like Proatron and Human Tech are abstracting away infrastructure complexities from developers,” said a platform engineering director. These tools reduce the need for developers to manually write and debug YAML scripts, making workflows faster and more efficient. Unfortunately, GitHub Actions has been slow to adopt these trends, leaving room for competitors to fill the void.
Low-code CI/CD platforms also offer significant time savings by automating repetitive tasks and providing pre-built templates for common workflows. For organizations with limited DevOps expertise, these solutions lower the barrier to entry, enabling teams to focus on building features rather than maintaining infrastructure. “The less time my team spends on YAML, the more time they can spend delivering value to our customers,” explained a product manager.
Additionally, low-code platforms are addressing gaps in scalability by offering pre-configured integrations with cloud providers and other developer tools. These integrations make it easier for teams to deploy complex workflows without needing deep expertise in infrastructure management. “We deployed our entire pipeline to the cloud in under an hour using a low-code platform,” said one satisfied user.
GitHub Actions: Falling Short of Its Potential
As I reflect on the past year of conversations and observations in the CI/CD space, one thing is clear: the frustration with GitHub Actions is both widespread and justified. From ballooning costs to inefficiencies in execution and the glaring lack of observability, these shortcomings hinder its potential as a truly transformative tool. Yet, in this challenge lies opportunity. Startups are rising to meet the demand for better, faster, and more efficient solutions. They are not only bridging the gaps but also redefining what developers should expect from CI/CD platforms. I am excited to be on the hunt for one company tackling these challenges head-on.
Big thanks to Aditya and Surya for the edits, pushback, and overall thoughts.