System Design
https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/
How we scaled Dropbox https://www.youtube.com/watch?v=PE4gwstWhmc https://blogs.dropbox.com/tech/2018/10/dropbox-traffic-infrastructure-edge-network/
A Brief History of High Availability https://www.cockroachlabs.com/blog/brief-history-high-availability/
https://www.hiredintech.com/classrooms/system-design/lesson/52
https://github.com/donnemartin/system-design-primer
https://gist.github.com/vasanthk/485d1c25737e8e72759f
https://github.com/binhnguyennus/awesome-scalability
Load balancers, an in-depth introduction: https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236
Xiaohan Zeng http://blog.gainlo.co
http://www.lecloud.net/tagged/scalability
http://tutorials.jenkov.com/software-architecture/index.html
System Design resources: AA Architecture Lectures https://ruggeri.github.io/architecture_lecture/
System design questions https://www.hiredintech.com/classrooms/system-design/lesson/60
https://github.com/rlee0525/TechnicalConceptsForInterviews/blob/master/SystemSummary.md
WhatsApp https://www.youtube.com/watch?v=5m0L0k8ZtEs
Ask for features first, then derive the architecture:
text / audio / video
one-to-one? group chat?
sent / delivered confirmation, read confirmation
push notifications
load balancing
servers sit between the parties
horizontal scaling
message queues are ephemeral
Sent confirmation: an ACK from the server (TCP), or a special message generated by the servers.
system design: https://www.hiredintech.com/courses/system-design
High Performance Browser Networking: https://hpbn.co/
Cracking the Coding Interview (book) questions
The system design process:
1. Use cases
Constraints (how much traffic? how much data?): think per month for humans, per second for the design.
Example: a URL shortening service. Start from new URLs per month.
80/20 rule: 20 percent of the items generate 80% of the traffic.
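The "per month for humans, per second for the design" conversion is plain arithmetic. A minimal sketch, assuming a 30-day month; the 100 million new URLs per month is a made-up input, not a figure from these notes:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def per_second(per_month: float) -> float:
    """Convert a monthly volume into an average per-second rate."""
    return per_month / SECONDS_PER_MONTH


def hot_set_size(total_items: int, hot_percent: int = 20) -> int:
    """Items in the 'hot' share that the 80/20 rule says draw most traffic."""
    return total_items * hot_percent // 100


# Hypothetical input: 100 million new URLs per month.
new_urls_per_month = 100_000_000
print(f"{per_second(new_urls_per_month):.1f} new URLs per second on average")
print(f"{hot_set_size(new_urls_per_month):,} URLs are the hot set worth caching")
```

Averages hide peaks, so a real estimate would usually add headroom on top of the per-second figure.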
Abstract design:
application layer
database layer
generate a hash: MD5, then convert to base 62
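The MD5-then-base-62 step can be sketched as follows. The alphabet order and the truncation to 6 characters are assumptions made for illustration; truncating means two URLs can collide, so a real service would still check the database before handing out a code.

```python
import hashlib
import string

# 0-9, a-z, A-Z: 62 symbols. The ordering is an arbitrary choice.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using the 62-symbol alphabet."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def short_code(url: str, length: int = 6) -> str:
    """Hash the URL with MD5, convert the digest to base 62, keep a prefix."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return base62_encode(int(digest, 16))[:length]


print(short_code("https://example.com/some/long/path"))
```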
Bottlenecks: perhaps your system needs a load balancer and many machines behind it to handle the user requests. Or maybe the data is so huge that you need to distribute your database on multiple machines. What are some of the downsides that occur from doing that? Is the database too slow and does it need some in-memory caching?
Traffic and Data
Scale your abstract design
Scalability https://www.hiredintech.com/classrooms/system-design/lesson/60
Malan class https://www.youtube.com/watch?time_continue=4&v=-W9F__D3oY4 46:42
Hosting: share a physical machine through several virtual machines, or share the physical machine directly.
For more privacy, make sure only you have access to the physical machine.
SFTP vs. FTP: SFTP is secure, so the username and password are encrypted in transit; plain FTP sends them in the clear.
Vertical scaling: more CPU, cores, L2 cache, RAM, disk space.
Horizontal scaling: multiple servers. How to distribute the load? Load balancer.
1. Round robin via DNS: the DNS server (e.g. BIND) returns the IP address of a different server on each request. Easy to implement. Problems: one server may get slow and still keep receiving new users; and DNS answers are cached until the TTL (time to live) expires, so a client keeps hitting the same server until then.
2. The load balancer has a public IP; all the servers have private IPs. The load balancer distributes requests randomly or by server load. Problem: cookies and sessions, since a user's session lives on one server (hence sticky sessions).
Solution: store session data in one place, such as the load balancer or a shared session server. Problem: that is a weak point in the system: what if the session data server is down?
Solution: RAID (redundant array of independent disks).
RAID0: two identical hard drives, data striped across them; writes are about twice as fast (performance, no redundancy).
RAID1: data mirrored across disks.
RAID10: four drives, combines RAID0 and RAID1.
RAID5: with 5 drives, only one drive's worth of capacity goes to redundancy (parity); any one drive can die.
RAID6: any two drives can die.
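The capacity cost of each RAID level can be worked out with a small helper. This is a sketch using the usual textbook rules for identical drives; the function name and the minimum drive counts are illustrative choices, not something from the lecture.

```python
def usable_drives(level: str, drives: int) -> int:
    """Return how many drives' worth of capacity is left for data."""
    minimum = {"RAID0": 2, "RAID1": 2, "RAID5": 3, "RAID6": 4, "RAID10": 4}
    if level not in minimum:
        raise ValueError(f"unknown RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} drives")
    if level == "RAID0":
        return drives        # striping only, no redundancy
    if level == "RAID1":
        return 1             # every drive holds the same copy
    if level == "RAID5":
        return drives - 1    # one drive's worth of parity
    if level == "RAID6":
        return drives - 2    # two drives' worth of parity
    if drives % 2:
        raise ValueError("RAID10 needs an even number of drives")
    return drives // 2       # mirrored pairs, striped together


for level, count in [("RAID0", 2), ("RAID1", 2), ("RAID10", 4), ("RAID5", 5), ("RAID6", 6)]:
    print(level, count, "drives ->", usable_drives(level, count), "usable")
```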
A load balancer can be software (free options available) or hardware (expensive: hundreds of thousands of dollars).
Sticky sessions: store the server's private IP in the cookie (but what about privacy, or the IP changing?), or store a big random number and let the load balancer remember which server that number points to.
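The "big random number in the cookie" idea, combined with round robin for first-time visitors, can be sketched like this. The class name, the in-memory lookup table, and the private IPs are all hypothetical; a real load balancer would also expire old tokens and handle servers going away.

```python
import secrets
from itertools import cycle


class StickyRoundRobinBalancer:
    """Round robin for new visitors, sticky routing for returning ones."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one backend server")
        self._rotation = cycle(list(servers))  # endless round-robin iterator
        self._token_to_server = {}             # the balancer's private lookup table

    def route(self, cookie_token=None):
        """Return (token, server); the token is what goes into the cookie."""
        if cookie_token in self._token_to_server:
            return cookie_token, self._token_to_server[cookie_token]
        token = secrets.token_hex(16)          # big random number, reveals no IP
        server = next(self._rotation)
        self._token_to_server[token] = server
        return token, server


# Hypothetical private IPs of three backend servers.
balancer = StickyRoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
token, first = balancer.route()    # new visitor: next server in the rotation
_, again = balancer.route(token)   # returning visitor: same server as before
```

Because the cookie holds only a random token, the client never learns a private IP, which addresses the privacy concern noted above.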
PHP accelerator: caches the compiled intermediate code (opcodes), similar to Java bytecode, so scripts are not recompiled on every request.
Caching: Craigslist stores each ad as a static HTML file: very fast to read. Cons: more disk space (no shared template) and hard to change.
MySQL query cache
memcached (stores data in RAM)
Database storage engines:
MEMORY: stores data in RAM
ARCHIVE: automatically compresses data; slower to query, saves space
NDB: network storage engine for clustering
Database replication:
master-slave: for backup purposes, or to load balance reads. Facebook is read heavy: direct reads to the slaves and writes to the master.
master-master: two masters, each with its own slaves.
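Sending reads to slaves and writes to the master comes down to a routing decision per query. A minimal sketch, assuming the only read statement is SELECT and that servers are identified by plain strings (both simplifications made for illustration):

```python
import random


class ReplicatedDatabase:
    """Route reads to a random slave and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def pick_server(self, sql: str):
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self.slaves:
            return random.choice(self.slaves)  # spread the read load
        return self.master                     # writes must go to the master


# Hypothetical host names.
db = ReplicatedDatabase("db-master", ["db-slave-1", "db-slave-2"])
print(db.pick_server("SELECT * FROM users WHERE id = 7"))
print(db.pick_server("UPDATE users SET name = 'x' WHERE id = 7"))
```

Slaves can lag behind the master, so a read issued right after a write may return stale data unless it is also sent to the master.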
Load balancer redundancy:
active-active mode (similar to the idea of master-master)
active-passive mode: the passive one listens for the active one's heartbeat and takes over when it stops.
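The heartbeat check in an active-passive pair can be sketched as a timeout on the last heartbeat seen. The class name, the 3-second timeout, and the injectable clock are assumptions for illustration; real failover software also has to guard against both nodes believing they are active.

```python
import time


class HeartbeatMonitor:
    """Runs on the passive node and tracks the active node's heartbeats."""

    def __init__(self, timeout_seconds: float = 3.0, clock=time.monotonic):
        self.timeout = timeout_seconds
        self._clock = clock
        self._last_beat = clock()

    def beat(self) -> None:
        """Call whenever a heartbeat arrives from the active node."""
        self._last_beat = self._clock()

    def active_is_alive(self) -> bool:
        return self._clock() - self._last_beat <= self.timeout

    def should_promote_passive(self) -> bool:
        """True once heartbeats have been missing for longer than the timeout."""
        return not self.active_is_alive()


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.beat()
print("promote passive?", monitor.should_promote_passive())
```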
Partitioning: e.g. a separate partition per school, as in early Facebook (Harvard fb / MIT fb / …)
High availability (HA): two masters (for the database), two active load balancers.
Internet - load balancer traffic: TCP 443 (default port for SSL, i.e. HTTPS URLs), TCP 80 (HTTP), port 22 (SSH).
Load balancer - server traffic: SSL is offloaded at the load balancer, so just TCP 80.
Web server - database traffic: SQL queries, TCP 3306.
Scalability for Dummies: every server contains exactly the same codebase and does not store any user-related data, like sessions or profile pictures, on local disc or memory.
DB: MySQL to NoSQL, no need for joins.
With “cache” I always mean in-memory caches like Memcached or Redis. Please never do file-based caching.
Two patterns of caching:
1. Cached database queries. Problem: when to expire?
2. Cached objects.
Some ideas of objects to cache:
user sessions (never use the database!)
fully rendered blog articles
activity streams
user<->friend relationships
Caching: Redis, Memcached
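The object-caching pattern, with a time-to-live as one answer to "when to expire?", can be sketched with a small in-process cache. The dictionary here only stands in for Memcached or Redis; `TTLCache`, `load_user`, and the key format are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value) -> None:
        self._store[key] = (value, self._clock() + self.ttl)


def load_user(user_id, cache, fetch_from_db):
    """Return the cached object, or build it from the database and cache it."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = fetch_from_db(user_id)
        cache.set(key, user)
    return user


cache = TTLCache(ttl_seconds=60)
print(load_user(1, cache, lambda uid: {"id": uid, "name": "example"}))
```

A short TTL keeps data fresher at the price of more database reads; the alternative is to delete the cached object explicitly whenever the underlying row changes.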
Asynchronism:
1. Turn dynamic content into static HTML beforehand.
2. For work that cannot be precomputed, run it in the background and just tell the user to come back later.
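The "come back later" approach usually means putting the job on a queue and letting a background worker handle it. A minimal single-process sketch using the standard library; `render_report` is a hypothetical stand-in for the slow work, and a production setup would use a separate queue service and worker processes.

```python
import queue
import threading


def render_report(job_id: str) -> str:
    """Stand-in for slow work, such as rendering a page to static HTML."""
    return f"<html>report {job_id}</html>"


def worker(jobs: queue.Queue, results: dict) -> None:
    """Pull job ids off the queue until a None sentinel arrives."""
    while True:
        job_id = jobs.get()
        try:
            if job_id is None:
                return
            results[job_id] = render_report(job_id)
        finally:
            jobs.task_done()


jobs = queue.Queue()
results = {}
threading.Thread(target=worker, args=(jobs, results), daemon=True).start()

jobs.put("42")   # the web request can return right after this line
jobs.put(None)   # sentinel: tell the worker to stop
jobs.join()      # only for this demo: wait until the queue is drained
print(results)
```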
DB sharding: in this scheme the data for User A is stored on one server and the data for User B is stored on another server. It's a federated model. Groups of 500K users are stored together in what are called shards.
Advantages: high availability, faster queries, more write bandwidth. Cons: rebalancing data; joining data from multiple shards.
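Grouping users into shards of 500K can be sketched as a lookup from user id to shard. This assumes sequential integer ids starting at 0 and range-based assignment, which the notes do not specify; the host names are hypothetical.

```python
SHARD_SIZE = 500_000  # users per shard, the figure from the notes

# Hypothetical database hosts, one per shard.
SHARD_HOSTS = ["db-shard-0.internal", "db-shard-1.internal", "db-shard-2.internal"]


def shard_for_user(user_id: int) -> int:
    """Map a user id to a shard number by id range (an assumed scheme)."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id // SHARD_SIZE


def host_for_user(user_id: int) -> str:
    """Return the database host that stores this user's data."""
    shard = shard_for_user(user_id)
    if shard >= len(SHARD_HOSTS):
        raise LookupError(f"no host configured for shard {shard}")
    return SHARD_HOSTS[shard]


print(host_for_user(499_999))
print(host_for_user(500_000))
```

Range-based assignment keeps the lookup trivial, but it is also where the rebalancing cost shows up: splitting or moving a shard means changing this mapping and migrating the affected users.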
Uber:
everything is a tradeoff
JSON: does not have types
Microservices are a way of replacing human communication with API coordination.
Shortening a URL: MD5, then convert to base62.
https://blog.pragmaticengineer.com/distributed-architecture-concepts-i-have-learned-while-building-payments-systems/
https://github.com/aphyr/distsys-class
https://gist.github.com/macintux/6227368
DDIA Book Review
DDIA Chinese Notes