System Design Fundamentals
Intro to Interviews
- Asking Questions
- What features are involved, what is the stack, what's bloat, and where do the troubles (if any) lie?
- What sort of scaling needs to be accounted for?
Features
- The feature set
- Make note of the specs carefully. Feel free to annotate a bit.
- Define APIs and Endpoints.
- Knowing what routes will be hit by the public and what sort of auth is being used is essential.
- Availability
- What to do if a host goes down? What if the entire data centre goes down? If the system already exists, enquire about the current plans and ascertain how much availability is actually needed.
- Latency Performance
- Public-facing services require snappy responses. This can be tracked with monitoring tools.
- Scalability
- Durability
- Data must be stored in a DB securely, without loss or compromise. Ask what sort of DBs are in use.
- Class Diagrams
- OOP diagrams, basically. They may ask to design a parking lot or an elevator system.
- Security and Privacy
- TLDR: When users and auth are required, these practices become sacrosanct.
- Cost Effective
- Lean systems are not only cost effective but also easier to maintain. KISS. Check pros and cons for current and alt flows.
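On the Availability point: "how much availability is cared about" is usually stated in nines. A quick sketch of what each target allows in downtime per year (the function name and the targets are just illustrative picks, not from any spec):

```python
# Downtime budget per year for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime in minutes per year."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
```

So each extra nine cuts the budget by a factor of ten, which is worth knowing before asking how much redundancy is needed.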
Concepts
Vertical vs Horizontal Scaling
- Vertical Scaling is adding more juice to the host to handle the extra load
- You can't go beyond a certain point
- Gets expensive
- All eggs in one basket: the host is a single point of failure
- Horizontal Scaling is adding distributed hosts to share the load
- This is a technically more challenging problem since it all needs to sync and have good routing
- The typical distributed-systems problems apply
CAP Theorem (Brewer)
CAP stands for:
- Consistency
- Availability
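- Partition Tolerance (needed in practice, because networks do drop packets and split)
- Only two of the three can be guaranteed at once. Since partitions happen, the real choice during a partition is Consistency vs Availability
- Traditional DBs choose Consistency over Availability
- Many NoSQL systems choose the opposite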
ACID vs BASE
refs:
- https://neo4j.com/blog/acid-vs-base-consistency-models-explained/
- https://medium.com/geekculture/acid-vs-base-in-databases-1bcad774da26
- https://phoenixnap.com/kb/acid-vs-base
- ACID - (RDBMS) Atomic, Consistent, Isolated, and Durable
- BASE - (NoSQL) Basically Available, Soft state, Eventual consistency
Partitioning or Sharding
- When there are trillions of records or more, it is impossible to store all of them on one node. Partitioning is how that gets solved
- Sharding - Every node is responsible for some of the records. HASHING is used a lot (READ)
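A minimal sketch of hash-based sharding, assuming a fixed number of shards and string keys. The names `shard_for` and `NUM_SHARDS` and the sample keys are made up for illustration:

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then fold into the shard range.
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4  # hypothetical cluster size
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in ("alice", "bob", "carol", "dave", "erin"):
    shards[shard_for(user_id, NUM_SHARDS)].append(user_id)

print(shards)
```

Note that plain modulo hashing remaps most keys when the shard count changes. Consistent hashing is the usual answer to that, so it is worth adding to the READ list.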
Locking (DBs); Optimistic vs Pessimistic
refs:
- https://foo.bar
- Optimistic Locking - Take no lock up front. When you are about to commit a transaction, you check that no other transaction has updated the specific "record" you are working on, and retry if one has.
- Pessimistic Locking - Lock the record first, do the work, then commit the transaction and release the lock
- NOTE: Both have pros and cons. Learn when to use which
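One common way to do the optimistic check is a version column. A small sketch using an in-memory SQLite DB; the `account` table and the `update_balance` helper are hypothetical, only there to show the idea:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 0)")
conn.commit()


def update_balance(conn, account_id, new_balance, expected_version):
    """Write only if nobody bumped the version since we read the row."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means a conflicting write got there first


# Two writers both read the row at version 0.
first = update_balance(conn, 1, 150, expected_version=0)  # wins
second = update_balance(conn, 1, 80, expected_version=0)  # stale, rejected
print(first, second)
```

The losing writer has to re-read and retry, which is cheap when conflicts are rare and painful when they are frequent. That is roughly the trade-off against pessimistic locking.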
Strong Consistency vs Eventual Consistency
- Strong Consistency - Reads will always see the latest write (RDBMS)
- Eventual Consistency - Reads may see stale data for a while, but eventually see the latest write (NoSQL)
RDBMS vs NoSQL
- NoSQL is getting really popular nowadays, but don't write off RDBMS.
Types of NoSQL
- key-value
- wide column
- document based
- graph based
- Note the current config
Caching
- Local cache: every node does its own caching; not shared
- Distributed cache: the cache is shared between nodes
Points of Concern:
- Cache lives in memory, so keep it small
- Cannot be accepted as the source of truth
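Both concerns show up in a small sketch: a size-bounded LRU cache sitting in front of a store that stays the source of truth. The `database` dict and `read` helper are stand-ins, not a real setup:

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


database = {"a": 1, "b": 2, "c": 3}  # hypothetical source of truth
cache = LRUCache(capacity=2)


def read(key):
    """Serve from cache when possible, otherwise fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = database[key]
        cache.put(key, value)
    return value
```

A miss always falls through to the database, so losing the cache costs latency but never data.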
Data Centers/Racks/Hosts
Key points of interest may be:
- Latency between hosts or racks
- What are the contingency plans for when racks or even DCs go down?
RAM/CPU/HDD/Internet Bandwidth
- Everything must be designed to fit well within these constraints.
- Aim to improve throughput and latency
Random and/or Sequential Read/Write
refs:
- https://needaref.now
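- Prefer sequential access wherever possible. It is much faster than random access, especially on HDDs where random access means seeks.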
HTTP vs HTTP2 vs WebSockets
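- WebSockets are bidirectional over a single long-lived connection, which suits real-time push. That does not make them better for everything: plain request/response traffic is still a good fit for HTTP.
- HTTP2 tries to cover deficiencies of HTTP such as one request at a time per connection, by multiplexing many concurrent requests (streams) over one connection. The limit is whatever the peer advertises in SETTINGS_MAX_CONCURRENT_STREAMS; the spec recommends it be no smaller than 100.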
TCP/IP Stack
Have understood the basics.
The layers (OSI has 7; the TCP/IP model is usually described with 4) might be worth another look.
IPv4 vs IPv6
- Running out of IPv4 addresses
- IPv4 = 32 bits vs IPv6 = 128 bits (remember go-discord-irc conundrum)
- Some power systems
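The 32 vs 128 bit difference is easy to check with the standard `ipaddress` module. The two addresses below are documentation-range examples, chosen only for illustration:

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")    # IPv4 documentation range
v6 = ipaddress.ip_address("2001:db8::1")  # IPv6 documentation range

print(v4.version, v4.max_prefixlen)  # version 4, 32-bit address
print(v6.version, v6.max_prefixlen)  # version 6, 128-bit address

# Size of each address space.
print(2 ** v4.max_prefixlen)
print(2 ** v6.max_prefixlen)
```

2**32 is only about 4.3 billion addresses, which is why IPv4 is running out.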
TCP vs UDP
- UDP is fast and connectionless, and does not care about some packet loss. Good for audio/video streams
- TCP ensures the data arrives intact and in order (acknowledgements, retransmission), which makes it inherently a bit slower.
DNS Lookup
- Known: we run DNS servers
- Understand DNS cache poisoning (largely mitigated now by source-port randomization and DNSSEC, but not gone)
- Using PowerDNS with MySQL for DB replication
- I still feel like the tcpdump output scares me, so I need to understand, e.g., ARP poisoning
DynDNS
explore
HTTPS and TLS
Note: Can be elaborated upon
- People who use http should be sentenced to staying away from computers.
- I have understood the fundamentals of the TLS handshake, but cryptography is a complex subject. While I understand the Diffie-Hellman key exchange and stream and block ciphers in general, TLS in its most complicated forms is still a bit of a mystery. (REFINE BEFORE TALK)
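As a refresher on the Diffie-Hellman part, here is a toy sketch with the textbook parameters p = 23, g = 5. These numbers are far too small for real use, and this is not how TLS implements it (real deployments use large standardized groups or elliptic curves); the private keys are fixed only to make the run repeatable:

```python
P = 23  # toy prime modulus (public)
G = 5   # toy generator (public)


def public_key(private: int) -> int:
    """Value sent over the wire: g^private mod p."""
    return pow(G, private, P)


def shared_secret(their_public: int, my_private: int) -> int:
    """Both sides arrive at g^(a*b) mod p without ever sending a or b."""
    return pow(their_public, my_private, P)


alice_private, bob_private = 6, 15  # normally random and kept secret
alice_public = public_key(alice_private)
bob_public = public_key(bob_private)

alice_secret = shared_secret(bob_public, alice_private)
bob_secret = shared_secret(alice_public, bob_private)
print(alice_public, bob_public, alice_secret, bob_secret)
```

An eavesdropper sees only P, G and the two public values. The exchange alone does not authenticate either side, which is where certificates come in.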
PKI and CAs
- We know that CAs verify who owns a public key and sign a cert binding the two, so clients can check that the cert is recognized and authorized.
- Prevents MITM
- See Georg for more, if available.
Symmetric vs Asymmetric Encryption
- sym - AES (one shared key, fast)
- asym - RSA, ECC (key pair, computationally expensive); PKI is built on top of this
Load Balancers
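- Operate at L4 (transport) or, more commonly, L7 (application)
- Nginx: fits in as an L7 (HTTP) reverse proxy and load balancer, and can also do L4 (TCP/UDP) balancing via its stream module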
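The simplest balancing strategy is round robin. A minimal sketch, with made-up backend addresses and a hypothetical `RoundRobinBalancer` class; real balancers add health checks, weights and so on:

```python
import itertools


class RoundRobinBalancer:
    """Hands out backends in a fixed rotating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


lb = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
picks = [lb.pick() for _ in range(4)]
print(picks)  # the fourth pick wraps around to the first backend
```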
CDN and Edge
- Let's say you want to stream a movie and I have it in my datacentre halfway around the globe. CDNs place the content/resource closer to you for better performance and lower latency, and the org pays for fewer long-distance, congested links
- Edge builds on this with a dedicated network to further speed the process up (READ MORE)
Bloom Filters and Count-min sketch
- Space-efficient probabilistic data structures.
- BF - Used to decide whether an element is part of a set or not. May have false positives but never false negatives. Very space efficient (READ MORE)
- CMS - Event frequency counter. Uses a fraction of the space to probabilistically arrive at a close estimate; it can overcount but never undercount.
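A rough sketch of both structures, assuming string items and salted SHA-256 as the hash family. The sizes and the `_hashes` helper are arbitrary illustrative choices, not tuned values:

```python
import hashlib


def _hashes(item: str, count: int, modulus: int):
    """Derive `count` bucket indexes from salted SHA-256 digests."""
    for seed in range(count):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % modulus


class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.bits = [False] * size

    def add(self, item):
        for i in _hashes(item, self.num_hashes, self.size):
            self.bits[i] = True

    def might_contain(self, item):
        # True can be a false positive; False is always correct.
        return all(self.bits[i] for i in _hashes(item, self.num_hashes, self.size))


class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, item):
        for row, col in enumerate(_hashes(item, self.depth, self.width)):
            self.rows[row][col] += 1

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(
            self.rows[row][col]
            for row, col in enumerate(_hashes(item, self.depth, self.width))
        )


bf = BloomFilter()
bf.add("apple")
cms = CountMinSketch()
for _ in range(3):
    cms.add("login")
print(bf.might_contain("apple"), cms.estimate("login"))
```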
Paxos
VMs and Containers
- A VM is a full system on a system (guest OS on virtualized hardware); containers are self-contained app bundles that share the host kernel