System Architecture (part 1) - Strategies / by Andrew Wyllie

Overview

Technology startups face a number of challenges as they grow, many of which should be expected and in some cases even welcomed, as they are indicators that the business is succeeding.  Sometimes the challenges are easy to solve, e.g., hire more people, buy more capacity and faster servers. Unfortunately, there are often other challenges that are not as easy to resolve, like inadequate processes, flaws in architecture or not having the required expertise in house.  These types of challenges can be a huge burden if they are not addressed early, as workarounds (and workarounds of the workarounds) usually cause confusion among the technical staff, cause performance issues, reduce scalability and potentially expose security issues. The longer these problems are swept under the rug, the more costly it will be to fix them in the future.  That said, the main goal of a systems architecture team is to analyze, design and build systems that are scalable, flexible, secure, compliant and forward looking enough to handle future needs.

One of the keys to good architecture is understanding the problem before rushing to solutions. Don’t let the latest technology or concepts guide your decisions until you have defined the problem you are trying to solve and understand, at least at a high level, what the benefits and costs of the various technologies will be. It may sound like more work, but a clear understanding of the various technology solutions not only lets you make good decisions against your current requirements, it also lets you quickly assess new solutions in the future. I know, this seems like a very obvious approach, but I have seen evidence of rushing to the latest and greatest “cool tech” far too many times. How many times have you interviewed with or started at a new company and found that the responses you get to questions like “So how did you decide to use database XXX for this project?” are something along the lines of “Well, it was the cool thing at the time”, “Everyone else was doing it” or, my personal favorite, “The person that made that decision has left”? I’m not saying that you should not invest time in learning new tech, but spend that time understanding how it will fix a problem you have, as opposed to learning new tech to create a problem you don’t have.

This document will go through the system design process and illustrate some of the ways fast, redundant, highly scalable systems can be built that handle current and future needs of the company.

Economics of Software Development

Obviously, the number one goal for building software is to make money.  To that end, we need to understand where the costs lie in designing and building technology solutions while staying within the budget and resources provided by the company.  Every decision we make has to be in line with the overall organization's goals. Having clearly stated goals for current products and ideas about future directions helps create testable scenarios for design and new feature decisions.  For example, if a future goal is having fifty very large enterprise customers, you will make very different decisions early on compared to a goal of having 50 million small customers. If a future goal is to expand into another country, designers and architects can ensure that the solutions being developed can scale across different regions, handle internationalization issues (compliance issues, languages, currencies) and support business logic that applies different rules based on the country of origin.  None of these things need to be built today, but simple design decisions can be made that will make future development much easier. The hard part is knowing what questions to ask.
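
As one illustration of a "simple design decision" of this kind, the sketch below keeps country-specific rules in a lookup table instead of hard-coding them into the pricing logic. Everything in it is hypothetical: the `RegionRules` fields, the `order_total` function and the tax rate (a placeholder, not a real rate) are assumptions made for the example, not something prescribed by this article.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


@dataclass(frozen=True)
class RegionRules:
    """Hypothetical per-country settings; field names are illustrative."""
    currency: str
    language: str
    tax_rate: Decimal  # placeholder value below, not a real tax rate


# Only one region exists today. Adding another later is a data change,
# not a rewrite of the pricing logic.
REGION_RULES = {
    "US": RegionRules(currency="USD", language="en", tax_rate=Decimal("0.10")),
}


def order_total(subtotal: Decimal, country: str) -> Tuple[Decimal, str]:
    """Return (total, currency) using the rules registered for the country."""
    try:
        rules = REGION_RULES[country]
    except KeyError:
        raise ValueError(f"no rules configured for country {country!r}") from None
    total = (subtotal * (1 + rules.tax_rate)).quantize(Decimal("0.01"))
    return total, rules.currency
```

Nothing international has been built here; the only cost paid today is routing the calculation through a lookup, which is the kind of cheap decision that keeps a later expansion from becoming a rewrite.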

The study of economics is all about balancing and optimizing. When applied to system architecture problems, this means finding ways to optimize solutions that meet or exceed requirements using available resources, whether that means money, capabilities of technical staff, existing products/solutions that can be leveraged and/or access to physical resources. None of these things are unlimited, and finding the correct balance between them is one of the keys to good design.

Requirements and Goals

Similar to a traditional architect that designs buildings, the process of software architecture starts with understanding the requirements and goals of the business.  Without this understanding, we have no way to make design decisions or to measure whether the implemented design meets the needs of the company.  In essence, every design decision needs to be justified by current and expected business drivers. In many cases the business drivers are as yet unknown, especially in young companies that are just starting up. The founders/c-suite are more interested in talking about selling their idea and how much money they are going to make than in getting into nitty-gritty details about what is actually required, which is understandable. Unfortunately, the more unknowns there are, the less control you have over what direction to take (and as the systems architect, you are in the unenviable position of getting to tell the company brass that while AWS is really awesome, it’s also not free). From a technical design perspective this is not necessarily the end of the world, but a strategy should be adopted to build a foundation that allows easy pivoting and pushes off expensive decisions until a better understanding of requirements and goals is established. The idea is to avoid rewriting the whole thing from scratch every six months and then claiming to have learned from your mistakes, unless you are smart enough to realize that the mistake you made was not spending enough time gathering and prioritizing your requirements and goals.

Below is a list of common requirements and considerations; there may be others. The idea is to spend a little bit of time getting a high level view of what will be required of your architecture and hopefully avoid future surprises. Try to establish “must haves” (requirements) vs “nice to haves” (goals) so that you have a good framework for making decisions and trade-offs. These apply mostly to cloud architectures, although many of them can be adapted to building standalone software products as well.

Business Driver - Description (questions to ask are listed under each driver)
Cost - Infrastructure, human resources
  • how much infrastructure is required
  • what resources will be required to build/test/maintain it
Risk - Changes to code and infrastructure
  • deployments create risk - what is the deployment process and how can we mitigate risk?
  • database changes/data migrations
  • security issues
SLA - Service level agreements to be offered to a customer
  • what level of service do we guarantee
  • do our service providers offer the same or better
Security - Keeping the system safe from external and internal threats
  • what Personally Identifiable Information (PII) do we need for a user? How do we minimize and secure this?
  • what type of encryption do we need
  • how do we identify and track attack vectors
  • network and firewall configurations
  • structural testing/whitebox/fuzzing
  • how do we track bugs and security issues in software packages we are using?
Data - A data driven company uses internal AND external data to make business decisions
  • how do we identify data to collect
  • data integrity and versioning
  • durability requirements
  • security requirements
  • data advocacy
Scalability - The ability to provide enough capacity to our customers
  • 'Just In Time' delivery - provide just enough resources without maintaining unused capacity
Optimization - Refactoring code and design for better performance
  • how easy is it to target and optimize specific parts of the system
Speed (of system) - How quickly we respond to customer interactions. How does the architecture handle:
  • auto scaling
  • caching
  • infrastructure latency (network, REST, hardware, software)
  • database connections, query speed
  • data transfer
Speed (of business) - The speed of development: how quickly can new features be released, how long do bug fixes take
  • continuous integration cadence
  • how long does QA take - unit/automated testing?
Robustness - What are our requirements
  • Recovery Time Objective (RTO) - amount of time to recover
  • Recovery Point Objective (RPO) - maximum amount of data loss that can be tolerated
Flexibility - The ability to pivot the business and reuse existing pieces
  • does the architecture support new products without major changes
  • does the architecture support all of the needs of the organization (web app, ERP, data teams, etc.)
Service Offerings - The products/features we support
  • does the architecture adequately support the products that are being produced
Quality - Unit test coverage; the use and reusability of the code base
  • test driven development
  • how easy is it for new developers to add new code (speed of development)
  • how easy is it for new developers/QA to get a new dev/test environment on-line
Resources - Human and tech resources that are available
  • do we have the people we need to support the architecture
  • do we have the technology available
  • are there new tech solutions on the horizon that we need to be aware of
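
The SLA question "do our service providers offer the same or better" can be checked with a little arithmetic. The sketch below multiplies the availability of every dependency a request must pass through and converts an availability figure into a yearly downtime budget. It assumes failures are independent, which real systems only approximate, and the provider figures are hypothetical values chosen for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def composite_availability(dependencies):
    """Availability of a chain in which every dependency must be up.

    Assumes failures are independent of each other.
    """
    result = 1.0
    for availability in dependencies:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        result *= availability
    return result


def downtime_hours_per_year(availability):
    """Hours per year a service may be down at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical provider SLAs: compute, database and a third-party API.
provider_slas = [0.999, 0.9995, 0.999]
best_case = composite_availability(provider_slas)
```

With these made-up figures the best case comes out at roughly 99.75%, which is lower than any single provider's number, so a 99.9% guarantee to customers could not be backed by the providers' SLAs alone.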

Other Considerations

Once you have an overall idea of what the requirements are, there are a few other considerations worth thinking about based on how you plan to run internal engineering processes and how the engineering team will interface with the rest of the company. The idea here is to make sure that the engineering processes you choose to implement will be supported by your architecture choices. A company wide Agile sprint schedule may not match up well with microservices architectures, and a CI/CD system may not really be required if you are building software packages that will be downloaded by end users and installed on their computers.

Data Driven Development

While data was mentioned in the table above, it deserves a special callout, as the collection, analysis and distribution of data need special architectural attention.  Data flow design cannot just be bolted on as an afterthought.   Having a full understanding of what data will need to be captured and how that data will flow to the correct organizational units is a critical part of the design process.

The Storage Requirements entries in the table below help define some aspects of data integrity. Dated means that every data entry has a timestamp. Versioned means that we track all of the changes to the data: if someone changes an order, we still want to have the data from the original order on hand, we may also want to track who changed the order, and we obviously want a timestamp on that change. Changes to other items, like an address, may also be versioned if you are concerned about people breaking into an account. Versioning lets you revert the account to an earlier state without restoring from a backup, which can be complicated when you only want to restore one part of the database.
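
A minimal sketch of what "dated" and "versioned" can mean in code is shown below. It is an in-memory, append-only store written for illustration only; the class and method names (`VersionedStore`, `save`, `revert`) are hypothetical, and a real system would persist the history in a database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Version:
    data: dict
    changed_by: str        # who made the change
    changed_at: datetime   # "dated": every entry carries a timestamp


class VersionedStore:
    """Append-only store: updates add a version, nothing is overwritten."""

    def __init__(self):
        self._history: Dict[str, List[Version]] = {}

    def save(self, key, data, changed_by):
        version = Version(dict(data), changed_by, datetime.now(timezone.utc))
        self._history.setdefault(key, []).append(version)
        return version

    def current(self, key):
        return self._history[key][-1]

    def history(self, key):
        return list(self._history[key])

    def revert(self, key, index, changed_by):
        """Restore an earlier state by saving it again as a new version."""
        return self.save(key, self._history[key][index].data, changed_by)
```

Note that `revert` does not delete anything: the bad change stays in the history, so the record of what happened to the account or order is preserved alongside the fix.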

Data Type, with its Users, Description, Accessibility and Storage Requirements
sales and order related
  • Users: sales, marketing, business intelligence, management
  • Description: sales data drive organizational decisions
  • Accessibility: highly available (ERP); very secure
  • Storage Requirements: keep forever; dated; versioned
customer's own data
  • Users: customers, data science
  • Description: customer info and data that they keep on our systems
  • Accessibility: highly available for current data; older data may be able to be archived after some time
  • Storage Requirements: dated; potentially versioned
customer/user data
  • Users: sales/marketing
  • Description: data about a user or customer (demographic data, usage data, billing account)
  • Accessibility: highly available
  • Storage Requirements: dated; versioned
Logging
  • Users: devops, engineering
  • Description: used to monitor and diagnose system issues
  • Accessibility: highly available for current data; older data can be archived - but available for audits; some data can be summarized and the raw data discarded
  • Storage Requirements: dated; versioned

The table above is not all inclusive. You’ll need to think carefully about your data and how it will be used to make sure you are capturing everything you need, as well as ensuring that the data entries being captured are accurate.

Security Driven Development

This is something I’ve been thinking about for a while after working for a healthcare startup for a year. It’s well known that security should never be a secondary concern, especially for applications attached to the internet. In some cases, though, the security regulations will have a large impact on your architecture and what you are able to do. Making sure that all of the relevant security and compliance standards for your industry are being met or exceeded can be a very time consuming process. It’s very important to have a grasp on what will be required before selecting technology products, as the products you select must also adhere to the same standards, and you could be liable for breaches on their systems/products if you cannot show that you researched them thoroughly and that you used their systems according to the terms in your license agreements. It also makes a lot of sense to make your architecture simple to audit, test and monitor. Just as we can use data to help identify future development projects, we have to be aware that security and compliance will drive a lot of the development effort. While security driven development can be time consuming, building security and compliance checkpoints into the development process will help mitigate potential problems and will also keep these issues fresh in the minds of the engineering team as they design and build new features.

A few things to consider when designing for security and compliance standards:

  • encryption strategies - lots of trade-offs here depending on how secure you need to make the system

  • building a system that is easy to audit can save a lot of time

  • cost of audits to demonstrate compliance can be prohibitive - the cost of the fines that may be incurred for not being compliant is even worse

  • privacy issues - how much Personally Identifiable Information (PII) do you really need? How are you protecting it?
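
For the PII question in the last bullet, two common techniques are field minimization (keep only what a feature needs) and pseudonymization (replace an identifier with a keyed hash). The sketch below is an illustration under assumptions, not a compliance recipe: the allow-list and function names are hypothetical, and the secret key would have to be stored and rotated securely in a real system.

```python
import hashlib
import hmac

# Hypothetical allow-list: the only fields this feature actually needs.
ALLOWED_FIELDS = {"user_id", "country", "plan"}


def minimize(record, allowed=ALLOWED_FIELDS):
    """Drop every field that is not explicitly allowed."""
    return {k: v for k, v in record.items() if k in allowed}


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same value and key always produce the same token, so records
    can still be joined, but the original value is not stored.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

Pseudonymized data is not anonymous: anyone holding the key can recompute tokens for known values, so the key needs the same protection as the data it replaces.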

There are lots of other security, privacy and compliance issues that may come up, but if you are building an application that will be used by other enterprises, there is a very good chance that they will expect you to have compliance information ready as well as proof of an audit.

Methodology

The architecture of a system is tightly coupled with the methodology used to design and build it.  While this may not be intuitive at first, the process of building a technology solution is just as important as the solution itself.  The methodology defines the values, principles and practices that will dictate the design and construction of the system.  A well defined methodology is not a set of rules but rather a set of guidelines, which the entire team has input on, that keep the development process focused. This is a massive topic with lots of differing opinions (as are architecture choices), but it is important to plan out your methodology and make sure it jibes with your team and your architecture.

I’ve seen and used a number of different methodologies over the years, but what I have found is that a combination of different ideas works best for me. The first step is to understand the desires and requirements of the team. Let’s face it, software engineers are a weird bunch (in good ways). Some engineers prefer a very rigid environment, some prefer to work on their own (the darker the corner the better), and some prefer to work on teams and enjoy teaching other members of the team. Having all of these different personality types abide by the same set of rules can be a daunting task. My approach has been to lay out the absolute minimum process that meets all of the goals of the organization and meshes well with our architecture choices. Then, as a team, we evaluate our process and make small tweaks to try to improve it. I prefer small teams that work on their own schedules as opposed to a rigid schedule that is the same for the entire engineering department. We rely on tons of unit tests, which all have to pass before something can move to production. We also rely on consistent APIs and SDKs so that the entire environment is consistent and stable (for the most part). I also trust my engineers and allow them to move work into production at a very rapid clip after going through the proper review process. This can mean multiple releases a day, releases on Fridays, weekends - whatever they want. The trust I give them gives them more confidence in what they are doing, and having very short, small release cycles allows us to back something out quickly if we need to, before too much damage is done. Given all that, I need an architecture that can move quickly, is very modular and adaptable, is easy to release to and hopefully easy to manage. Your choice of methodology and your comfort level in trusting developers are no doubt different from mine. This will affect your architecture choices.

System Architecture (part 2) - Models

Thanks for reading! Part 2 of this article will talk about different architectures, how they can be implemented and pros and cons of each one.