What percentage of code is copied and pasted?
Here's a fun fact: in an analysis of 58,996 programmers who coded between August 1, 2022 and August 31, 2022, our team found that 13.2% of code was copied and pasted. Relative to previous months, this percentage had little variation (July: 13.3%, June: 13.7%, May: 13.6%).
We collected this data using our Code Time plugin for code editors such as Visual Studio Code and IntelliJ. In our analysis, a copy/paste action was defined as any event in which more than 8 characters were added to a file at once; this allowed us to distinguish multi-character additions from single-character additions and indentations. Auto-completions and replacements were tracked separately, which improved the accuracy of our copy/paste algorithm. To get the percentage of code additions that were copy/paste, we calculated the ratio of multi-character additions to total character additions.
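As a rough illustration of the calculation described above, here is a minimal Python sketch. It is not the Code Time plugin's actual implementation: the `EditEvent` type, the `kind` labels, and the function names are hypothetical, and we assume here that auto-completions and replacements count toward total characters added but never toward copy/paste. Only the 8-character threshold comes from the definition above.

```python
from dataclasses import dataclass
from typing import Iterable

# From the definition above: more than 8 characters added at once.
PASTE_THRESHOLD = 8


@dataclass
class EditEvent:
    """One addition to a file (hypothetical shape, for illustration only)."""
    chars_added: int
    # Hypothetical labels: "insert", "autocomplete", or "replacement".
    kind: str = "insert"


def is_copy_paste(event: EditEvent) -> bool:
    """Treat a plain insert of more than 8 characters as copy/paste."""
    return event.kind == "insert" and event.chars_added > PASTE_THRESHOLD


def copy_paste_share(events: Iterable[EditEvent]) -> float:
    """Ratio of copy/paste characters to all characters added."""
    events = list(events)
    total = sum(e.chars_added for e in events)
    if total == 0:
        return 0.0
    pasted = sum(e.chars_added for e in events if is_copy_paste(e))
    return pasted / total


# Made-up sample data: 50 single keystrokes, one 20-character paste,
# one 12-character auto-completion, and one 18-character replacement.
sample = [EditEvent(1) for _ in range(50)]
sample += [
    EditEvent(20),
    EditEvent(12, kind="autocomplete"),
    EditEvent(18, kind="replacement"),
]

print(f"{copy_paste_share(sample):.1%}")
```

In this made-up sample, only the 20-character insert counts as copy/paste, so the share is 20 of 100 characters added. Whether the real denominator includes auto-completed and replaced characters is an assumption on our part in this sketch.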
The team at Stack Overflow reported that "depending on who you ask, as little as 5-10% or as much as 7-23% of code is cloned from somewhere else." Our analysis says 13.2%.
If you're wondering why any of this matters, it may very well not. But it makes you wonder: how much code is being re-used as boilerplate across projects? Moved between files? Copied from Stack Overflow? We may never know.
None of this is to say that copying code is a good thing or a bad thing. If anything, it's an indicator of how we build apps today: by integrating code from third-party libraries, online documentation, and other sources.
It will be interesting to see how this metric changes over time.
About Software.com
Our mission is to transform software development by empowering engineering teams with the world’s most powerful data platform. We collect data across the stack to help engineering teams ship software faster with better visibility and automation.
For more information, visit www.software.com or follow @software_hq on Twitter.