Hardware

  • Linear array traversal is extremely cache friendly. For big n, big-O analysis rules, but for smaller n, give a plain array a second thought: it may beat an asymptotically better data structure (see the sketch below).

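A minimal sketch of that trade-off, assuming small collections of ints; the container choice, function names, and sizes are illustrative, not from the original note:

```cpp
// Sketch: for small n, an O(n) linear scan of a contiguous array often beats
// an O(log n) lookup in a node-based container, because the whole array fits
// in a few cache lines and is streamed by the prefetcher.
#include <algorithm>
#include <set>
#include <vector>

bool in_vector(const std::vector<int>& v, int x) {
    // O(n), but touches contiguous memory.
    return std::find(v.begin(), v.end(), x) != v.end();
}

bool in_set(const std::set<int>& s, int x) {
    // O(log n), but each step chases a pointer to a separately
    // allocated node, which is likely a cache miss.
    return s.count(x) != 0;
}
```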
The hardware will prefetch data and instructions when it sees a nice predictable access pattern, so sequential code is the fastest; branching and function calls slow things down. But don't sacrifice readability for this: the real bottleneck may well be network latency.

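A hedged illustration of prefetch-friendly access: the same sum over a rows × cols matrix stored in one contiguous vector, traversed in two orders. The function names and layout are assumptions made for the sketch:

```cpp
// Sketch: the row-major loop walks memory sequentially, so the prefetcher
// can keep up; the column-major loop strides by `cols` ints and defeats it.
#include <cstddef>
#include <vector>

long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long s = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += m[r * cols + c];   // consecutive addresses
    return s;
}

long long sum_col_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long s = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += m[r * cols + c];   // jumps `cols` elements each step
    return s;
}
```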
Be careful with heterogeneous arrays: the instructions that process each element may differ, invalidating the instruction cache line each time. Sort sequences by type, and make the "fast paths" branch-free sequences (a rough sketch of the sort-by-type idea follows).

PGO: profile-guided optimization. The compiler first builds the program with instrumentation/logging; you run that build against typical use cases; then you rebuild from source together with the results of that first run. The compiler will then choose to inline the common paths and functions, which may gain 10-20% in run time. WPO: whole-program optimization.

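A rough sketch of the sort-by-type idea, using a hypothetical Shape hierarchy (none of these names come from the note):

```cpp
// Sketch: process a heterogeneous sequence grouped by concrete type, so the
// same virtual target runs in long runs and the instruction cache / branch
// predictor stay warm.
#include <algorithm>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <vector>

struct Shape { virtual ~Shape() = default; virtual void draw() const = 0; };
struct Circle : Shape { void draw() const override { /* ... */ } };
struct Square : Shape { void draw() const override { /* ... */ } };

void draw_all(std::vector<std::unique_ptr<Shape>>& shapes) {
    // Group by dynamic type first; one sort is usually cheaper than
    // taking an icache miss on every element.
    std::sort(shapes.begin(), shapes.end(),
              [](const auto& a, const auto& b) {
                  return std::type_index(typeid(*a)) < std::type_index(typeid(*b));
              });
    for (const auto& s : shapes) s->draw();
}
```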
Cache coherency and multithreading: if you have at least one writer, use a mutex or atomic instructions. But those take time (a minimal comparison of the two follows).

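A minimal comparison of the two options on a shared counter; the thread counts and iteration counts are invented for illustration:

```cpp
// Sketch: two ways to keep a shared counter correct when at least one thread
// writes. Both are safe; both cost more than an unshared local counter.
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<long> atomic_count{0};

long guarded_count = 0;
std::mutex guarded_count_mutex;

void work_atomic() {
    for (int i = 0; i < 1'000'000; ++i)
        atomic_count.fetch_add(1, std::memory_order_relaxed);  // lock-free RMW
}

void work_mutex() {
    for (int i = 0; i < 1'000'000; ++i) {
        std::lock_guard<std::mutex> lock(guarded_count_mutex);  // serializes writers
        ++guarded_count;
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(work_atomic);
    for (int i = 0; i < 4; ++i) threads.emplace_back(work_mutex);
    for (auto& t : threads) t.join();
}
```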
False sharing: independent values/variables fall on the same cache line; different cores access that line concurrently and frequently; and at least one of them is a writer. A sketch of the problem and the usual fix is below.

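A sketch of false sharing and the usual padding fix, assuming 64-byte cache lines (the real line size is hardware dependent); the struct and function names are hypothetical:

```cpp
// Sketch: two per-thread counters. In the `Packed` layout they likely share a
// cache line, so two writing threads keep invalidating each other's copy;
// padding each counter to its own line removes the false sharing without
// changing any logic.
#include <atomic>
#include <thread>

struct Packed {
    std::atomic<long> a{0};
    std::atomic<long> b{0};              // likely on the same cache line as `a`
};

struct Padded {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};  // forced onto its own line
};

template <class Counters>
void hammer(Counters& c) {
    std::thread t1([&] { for (int i = 0; i < 10'000'000; ++i) c.a.fetch_add(1); });
    std::thread t2([&] { for (int i = 0; i < 10'000'000; ++i) c.b.fetch_add(1); });
    t1.join();
    t2.join();
}

int main() {
    Packed p;  hammer(p);   // slower: the shared line ping-pongs between cores
    Padded q;  hammer(q);   // faster: each counter owns its line
}
```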
Data-oriented programming: lay data out in memory in a way that keeps the CPU cache happy. In video game programming, for example, put the isLive flags together in their own array, outside of the objects (sketched below).

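A small sketch of the array-of-structs vs struct-of-arrays layout; the Entity fields are hypothetical, only the isLive flag comes from the note:

```cpp
// Sketch: keeping the isLive flags in their own contiguous array means a
// "count the live entities" loop only touches the bytes it actually needs.
#include <cstddef>
#include <cstdint>
#include <vector>

// AoS: checking isLive drags the whole Entity through the cache.
struct Entity {
    float x, y, z;
    float velocity[3];
    std::uint8_t isLive;
};

// SoA: hot fields live in separate, densely packed arrays.
struct Entities {
    std::vector<float> x, y, z;
    std::vector<std::uint8_t> isLive;   // one tight, cache-friendly array
};

std::size_t count_live(const Entities& e) {
    std::size_t n = 0;
    for (std::uint8_t live : e.isLive) n += live;
    return n;
}
```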
Cache associativity also affects performance.

Scott Meyers: CPU Caches and Why You Care
CPU Cache Knowledge for Programmers (与程序员相关的CPU缓存知识) | CoolShell (酷壳)