openbook

fastcach

软件介绍

fatcache 可以让你在 SSD 上运行 memcached,你可以把它当作是大数据中的缓存,fatcache 是 Twitter 的开源项目。

一些性能数据:

单节点可每秒可处理 10 万 set 操作,

每个数据包是 100 字节

单节点可每秒处理 4.5k get 操作,每个数据 100 字节

8 个 fatcache 实例可处理 32k get 操作/每秒到一个单 600GB 的 SSD 存储

你可以通过将多个 SSD 连接到单个机器来提升 IO 性能

There are two ways to think of SSDs in system design. One is to think of SSD as an extension of disk, where it plays the role of making disks fast and the other is to think of them as an extension of memory, where it plays the role of making memory fat. The latter makes sense when persistence (non-volatility) is unnecessary and data is accessed over the network. Even though memory is thousand times faster than SSD, network connected SSD-backed memory makes sense, if we design the system in a way that network latencies dominate over the SSD latencies by a large factor.

To understand why network connected SSD makes sense, it is important to understand the role distributed memory plays in large-scale web architecture. In recent years, terabyte-scale, distributed, in-memory caches have become a fundamental building block of any web architecture. In-memory indexes, hash tables, key-value stores and caches are increasingly incorporated for scaling throughput and reducing latency of persistent storage systems. However, power consumption, operational complexity and single node DRAM cost make horizontally scaling this architecture challenging. The current cost of DRAM per server increases dramatically beyond approximately 150 GB, and power cost scales similarly as DRAM density increases.

Fatcache extends a volatile, in-memory cache by incorporating SSD-backed storage.

SSD-backed memory presents a viable alternative for applications with large workloads that need to maintain high hit rate for high performance. SSDs have higher capacity per dollar and lower power consumption per byte, without degrading random read latency beyond network latency.

Fatcache achieves performance comparable to an in-memory cache by focusing on two design criteria:

  • Minimize disk reads on cache hit
  • Eliminate small, random disk writes

The latter is important due to SSDs' unique write characteristics. Writes and in-place updates to SSDs degrade performance due to an erase-and-rewrite penalty and garbage collection of dead blocks. Fatcache batches small writes to obtain consistent performance and increased disk lifetime.

SSD reads happen at a page-size granularity, usually 4 KB. Single page read access times are approximately 50 to 70 usec and a single commodity SSD can sustain nearly 40K read IOPS at a 4 KB page size. 70 usec read latency dictates that disk latency will overtake typical network latency after a small number of reads. Fatcache reduces disk reads by maintaining an in-memory index for all on-disk data.

Batched Writes

There have been attempts to use an SSD as a swap layer to implement SSD-backed memory. This method degrades write performance and SSD lifetime with many small, random writes. Similar issues occur when an SSD is simply mmaped.

To minimize the number of small, random writes, fatcache treats the SSD as a log-structured object store. All writes are aggregated in memory and written to the end of the circular log in batches - usually multiples of 1 MB.

By managing an SSD as a log-structured store, no disk updates are in-place and objects won't have a fixed address on disk. To locate an object, fatcache maintains an in-memory index. An on-disk object without an index entry is a candidate for garbage collection, which occurs during capacity-triggered eviction.

In-memory index

Fatcache maintains an in-memory index for all data stored on disk. An in-memory index serves two purposes:

  • Cheap object existence checks
  • On-disk object address storage

An in-memory index is preferable to an on-disk index to minimize disk lookups to locate and read an object. Furthermore, in-place index updates become complicated by an SSD's unique write characteristics. An in-memory index avoids these shortcomings and ensures there are no disk accesses on cache miss and only a single disk access on cache hit.

Maintaining an in-memory index of all on-disk data requires a compact representation of the index. The fatcache index has the following format:

struct itemx {
  STAILQ_ENTRY(itemx) tqe;    /* link in index / free q */
  uint8_t             md[20]; /* sha1 message digest */
  uint32_t            sid;    /* owner slab id */
  uint32_t            offset; /* item offset from owner slab base */
  uint64_t            cas;    /* cas */
} __attribute__ ((__packed__));

Each index entry contains both object-specific information (key name, &c.) and disk-related information (disk address, &c.). The entries are stored in a chained hash table. To avoid long hash bin traversals, the number of hash bins is fixed to the expected number of index entries.

To further reduce the memory consumed by the index, we store the SHA-1 hash of the key in each index entry, instead of the key itself. The SHA-1 hash acts as the unique identifier for each object. The on-disk object format contains the complete object key and value. False positives from SHA-1 hash collisions are detected after object retrieval from the disk by comparison with the requested key. If there are collisions on the write path, new objects with the same hash key simply overwrite previous objects.

The index entry (struct itemx) on a 64-bit system is 44 bytes in size. It is possible to further reduce index entry size to 28 bytes, if CAS is unsupported, MD5 hashing is used, and the next pointer is reduced to 4 bytes.

At this point, it is instructive to consider the relative size of fatcache's index and the on-disk data. With a 44 byte index entry, an index consuming 44 MB of memory can address 1M objects. If the average object size is 1 KB, then a 44 MB index can address 1 GB of on-disk storage - a 23x memory overcommit. If the average object size is 500 bytes, then a 44 MB index can address 500 MB of SSD - a 11x memory overcommit. Index size and object size relate in this way to determine the addressable capacity of the SSD.

授权协议

协议解读

相关文档

性能测试

代码下载

https://github.com/twitter/fatcache

demo演示

更新跟踪

漏洞跟踪