The Pitfalls of Copy-Paste Coding

There is a saying:

The best code is the code that is never written; the second best is the code that someone else has written.

The idea is that once you have written the code, you have to maintain it going forward. If it is someone else’s code, you may get lucky and they will maintain it. The biggest advantage of using someone else’s code is that you don’t need to write it, which saves significant time and gets your software out the door sooner.

However, when you copy and paste someone else’s code, and it is not imported via a dependency manager, you are now taking on the responsibility of maintaining it and ensuring it meets the standards expected for the rest of the codebase.

Impact on Project Timelines and Costs

For non-technical stakeholders, understanding the broader impact of copy-paste coding is essential:

  1. Project Delays: Unvetted code can introduce bugs and security issues, leading to unexpected delays as the development team works to resolve them.
  2. Increased Costs: Fixing issues caused by poorly integrated code can be costly, both in terms of time and resources. Additionally, legal liabilities can arise if the code’s licensing terms are not properly adhered to.
  3. Reputation and Trust: Security breaches or failures due to copied code can damage the organisation’s reputation and erode trust with clients and customers.

Copy-paste coding may seem like a great productivity boost, but it comes with significant risks. By understanding these risks and implementing best practices, organisations can ensure their codebase remains robust, secure, and compliant.

Risk Identification and Management

Security of the code

The code that is copied may have any number of security issues, either intentionally placed by a malicious developer or left in by the original author and missed because the developer who copied it did not review it before use. Examples I have seen include:

  • During the initialisation of the database (and identity management), the copied code seeded a Super Admin user account. The intention was that whoever was copying the code would have an account right away, but it was not identified, and years later, the Super Admin account was active with the original well-known credentials.
  • When emails were sent out, they BCC’d the original developer’s email account. The intention was that the address would be updated and used only for debugging purposes. Thankfully, that code path could not be executed at the time of discovery, but it was unknown whether it had ever been executed.
  • Events and logs were being sent to the original developer’s IP address. The class contained a hard-coded public IP address, apparently as a “fallback value”, and all events and logs handled by that class were sent to it. At the time, it was unknown how long the logs had been emitting.

To mitigate these issues, a thorough code review of the incoming code is mandatory. Ideally, this should be done by someone other than the person responsible for bringing the code into the codebase.
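
Part of that review can be assisted by a quick scan for hard-coded contact points like those in the examples above. The Python sketch below flags literal IPv4 addresses and email addresses in a piece of source text. The function name `find_hardcoded_endpoints`, the regular expressions, and the sample snippet are illustrative assumptions, not a complete security check: the scan will miss obfuscated values and seeded credentials, and it will also flag harmless matches such as four-part version numbers, so it supplements a human review rather than replacing it.

```python
import re

# Illustrative patterns only: deliberately simple, they will miss obfuscated values.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_hardcoded_endpoints(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, value) for each literal IP or email found."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in IPV4_PATTERN.finditer(line):
            findings.append((line_number, "ipv4", match.group()))
        for match in EMAIL_PATTERN.finditer(line):
            findings.append((line_number, "email", match.group()))
    return findings


# Hypothetical copied snippet, using reserved documentation values.
copied_code = '''
FALLBACK_LOG_HOST = "203.0.113.10"
message.bcc = "original.dev@example.com"
timeout_seconds = 30
'''

for finding in find_hardcoded_endpoints(copied_code):
    print(finding)
```

Each finding is a prompt for the reviewer to ask why the value is there and where the data goes, not proof of a problem by itself.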

Legality of the Code

Something that is rarely covered in developer training, and is often overlooked, is the license that comes with the code. In general, if the code has no specific license, you cannot copy it. Some sites publish code under terms that may be incompatible with your commercial intentions, while others use quite friendly licenses (e.g., Apache, MIT). Prominent examples I have seen include:

  • Including GPL-licensed [1] code throughout a codebase that was never intended to be shared, with the GPL code having been modified.
  • Including code copied verbatim from a blog post, with no express license for the code, just the generic “Copyright <Author> All Rights Reserved” in the footer of all web pages.

Mitigation can be as simple as checking your organisation’s policy to see what licenses may be approved, and when in doubt, seeking specific legal advice.
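
A policy check like this can be partly automated. The Python sketch below assumes the incoming files declare their license with an `SPDX-License-Identifier` header comment and that your organisation keeps an approved list. The `APPROVED_LICENSES` set and the `check_license` function are hypothetical, the sketch handles a single identifier only (not compound SPDX expressions), and a missing identifier is treated as not approved, in line with the point that code without a license cannot be copied.

```python
import re
from typing import Optional

# Hypothetical allowlist; the real one comes from your organisation's policy.
APPROVED_LICENSES = {"MIT", "Apache-2.0"}

SPDX_PATTERN = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def check_license(source: str) -> tuple[Optional[str], bool]:
    """Return (declared_license, is_approved) for a source file's text."""
    match = SPDX_PATTERN.search(source)
    if match is None:
        # No declared license: treat as not approved and escalate.
        return None, False
    license_id = match.group(1)
    return license_id, license_id in APPROVED_LICENSES


print(check_license("# SPDX-License-Identifier: MIT\nprint('hi')"))
print(check_license("// SPDX-License-Identifier: GPL-3.0-only\nint x;"))
print(check_license("print('copied from a blog post')"))
```

A result that is not approved is a signal to stop and ask, not a legal conclusion; the check cannot see a license stated only on a web page or in a separate file.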

Style of the Code

The overall style may not seem like much of an issue, but it is a contributing factor to the maintainability of the codebase. If you grab three different files and get the sense that they were written by three different developers, you will have issues in the future. While IntelliSense and other tools are great aids, copied code with problems like the following is more difficult to integrate with your own:

  • Inconsistent naming conventions or unclear naming conventions for variables, methods, and classes.
  • Magic numbers and strings, that is, hard-coded values used instead of constants or configuration settings.
  • Poor code comments and documentation, with insufficient comments explaining complex logic or decisions.

Style can be argued about until the cows come home. However, automatic formatters and linters that highlight when something does not conform to the agreed style are of great assistance. Beyond the security review, it is also beneficial to review the code to understand it and to add comments or documentation that will help you re-understand it in the future.
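
To make the magic-number point concrete, here is a small before-and-after sketch in Python. The function names, the retry limit of 3, and the status code "P" are all invented for illustration.

```python
# Before: the copied code leaves the reader guessing what 3 and "P" mean.
def should_retry_copied(attempts: int, status: str) -> bool:
    return attempts < 3 and status == "P"


# After: the same logic with named constants that document intent.
MAX_RETRY_ATTEMPTS = 3
STATUS_PENDING = "P"


def should_retry(attempts: int, status: str) -> bool:
    """Retry only while the job is still pending and attempts remain."""
    return attempts < MAX_RETRY_ATTEMPTS and status == STATUS_PENDING
```

The behaviour is unchanged; the difference is that the next reader can see what the values mean and change them in one place.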

Quality of the Code

Quality of the code can be difficult to judge when you are in a rush to get something working. However, poor-quality code can come back to bite you. Some obvious issues I have seen in many codebases that copy and paste code include:

  • No use of dependency injection in modern .NET.
  • Methods that have nullable string parameters but check only for null, not whitespace.
  • Limited to no error checking.
  • Magic strings with no comments to explain where they came from, what they mean, or how they actually work.
  • Comments that just state what the code does, not why or its expected usage.
  • Testing of the code is incredibly difficult.

One mitigation is to place the inbound code behind an interface that declares only the public methods you are going to use. This gives you the opportunity to reshape and refactor the incoming code to the quality that you expect in your codebase, and it allows for easier migration to self-authored/owned code in the future.
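
As a rough illustration of this approach, the Python sketch below hides a copied class behind an interface (a `typing.Protocol`), checks input for both null and whitespace, and passes the dependency in through the constructor. All names here (`CopiedSlugger`, `SlugGenerator`, `CopiedSluggerAdapter`, `ArticleService`) are hypothetical stand-ins and are not taken from any real library.

```python
from typing import Optional, Protocol


class CopiedSlugger:
    """Stand-in for copied code: no validation, awkward method name."""

    def do_slug(self, text):
        return text.lower().replace(" ", "-")


class SlugGenerator(Protocol):
    """The interface the rest of the codebase depends on."""

    def generate(self, title: Optional[str]) -> str: ...


class CopiedSluggerAdapter:
    """Wraps the copied code and enforces our own input checks."""

    def __init__(self, inner: CopiedSlugger) -> None:
        self._inner = inner

    def generate(self, title: Optional[str]) -> str:
        # Reject None as well as empty or whitespace-only strings.
        if title is None or not title.strip():
            raise ValueError("title must contain non-whitespace characters")
        return self._inner.do_slug(title.strip())


class ArticleService:
    """Receives its dependency via the constructor (dependency injection)."""

    def __init__(self, slugs: SlugGenerator) -> None:
        self._slugs = slugs

    def url_for(self, title: Optional[str]) -> str:
        return "/articles/" + self._slugs.generate(title)


service = ArticleService(CopiedSluggerAdapter(CopiedSlugger()))
print(service.url_for("  Copy Paste Coding "))
```

Because `ArticleService` depends only on `SlugGenerator`, a test can pass in a fake implementation, and the copied class can later be swapped for self-authored code without touching the callers.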

Maintainability of the Code

The maintainability of the code may be one of the hardest aspects to quickly identify. Poor error handling, code duplication, and improper use of asynchronous programming are common issues. Examples include:

  • Poor error handling or using generic exception handling.
  • Code duplication rather than abstracting common logic into reusable methods or classes. This may also include methods or classes with names like “Original_DO_NOT_USE.”
  • Poor use of asynchronous programming, such as not using async and await correctly. This goes in both directions, for example purely synchronous code in a method whose name ends in “Async.”

As with code quality, building an interface for the imported code is a great way to manage this while giving you a chance to reshape and refactor the code to the quality that you expect in your codebase.
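
The asynchronous point is easiest to see side by side. In this Python sketch using `asyncio`, the first coroutine is asynchronous in name only because it blocks with `time.sleep`, while the second awaits `asyncio.sleep` and lets other work proceed. The function names and the 0.1-second delay are illustrative assumptions standing in for real I/O.

```python
import asyncio
import time


async def fetch_blocking_async(delay: float) -> str:
    """Async in name only: time.sleep blocks the whole event loop."""
    time.sleep(delay)
    return "done"


async def fetch_async(delay: float) -> str:
    """Awaits a non-blocking sleep, so other coroutines can run meanwhile."""
    await asyncio.sleep(delay)
    return "done"


async def time_three(fetch) -> float:
    """Run three calls concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(fetch(0.1), fetch(0.1), fetch(0.1))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(time_three(fetch_blocking_async))
concurrent_elapsed = asyncio.run(time_three(fetch_async))
print(f"blocking:   {blocking_elapsed:.2f}s")    # the three delays run one after another
print(f"concurrent: {concurrent_elapsed:.2f}s")  # the three delays overlap
```

The blocking version takes about as long as the three delays added together, because each call holds the event loop until it finishes. Copied code with this pattern looks correct at the call site, which is why it tends to surface only under load.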

References

  1. Frequently Asked Questions about the GNU Licenses – GNU Project – Free Software Foundation