Redis

Very fast (about 110,000 SETs/sec and 81,000 GETs/sec): an in-RAM, highly available key-value data store with persistence and (relaxed) consistency.
Widely used: Twitter, GitHub, Weibo, Pinterest, Snapchat, Craigslist, Digg, StackOverflow, Flickr, and more.
Key: string. Value: can be a string, list, set, sorted set, or hash.
Documentation

Using Redis

In lab
Run through the redis tutorial.
Atomic operations such as INCR, and transactions (MULTI/EXEC), though transactions are limited in clustered environments.
Also has PUB/SUB
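
To see why an atomic INCR matters, here is a minimal Python sketch that does not talk to Redis at all. `TinyStore` is a hypothetical in-process stand-in: its `incr` holds a lock for the whole read-modify-write, loosely mimicking how a Redis server finishes one command before starting the next. The class name, the `hits` key, and the thread counts are assumptions for illustration only.

```python
import threading


class TinyStore:
    """Hypothetical in-process stand-in for a Redis server (not a real client)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # models "one command at a time"

    def incr(self, key):
        # The read, the add, and the write happen under one lock,
        # so concurrent callers can never overwrite each other's update.
        with self._lock:
            value = int(self._data.get(key, 0)) + 1
            self._data[key] = value
            return value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def hammer(store, key, times):
    """Increment the same key many times from one thread."""
    for _ in range(times):
        store.incr(key)


def run_demo(threads=8, times=1000):
    store = TinyStore()
    workers = [threading.Thread(target=hammer, args=(store, "hits", times))
               for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return store.get("hits")


if __name__ == "__main__":
    print(run_demo())  # 8 threads x 1000 increments each -> 8000
```

If each client instead did GET, added one locally, and then SET the result, two clients could read the same value and one increment would be lost; that is the case INCR avoids.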

Reminders...
# Run redis server
docker run --name some-redis -d redis redis-server --appendonly yes # single server
# Run another redis container executing redis-cli in it, connecting to our first container
docker run -it --link some-redis:redis --rm redis redis-cli -h redis -p 6379
docker stop CONTAINER_ID
docker start CONTAINER_ID
docker exec -it CONTAINER_ID /bin/bash # shell inside container
redis-benchmark

PostgreSQL vs Redis

They are for very different purposes!!

PostgreSQL: Stores on disk => slow, cheap, large storage
Redis: Stores in RAM => fast, expensive, small storage
PostgreSQL: Strong properties: ACID; replicate/scale carefully (replication => reduced performance in many cases)
Redis: Weaker properties, very scalable (more hosts => better performance), high availability, some atomic operations
PostgreSQL: Very general-purpose tool, lots of use cases
Redis: More limited use cases, typically a high-speed cache or message queue.

Redis vs other DBs

Big difference: Redis = fast + small data + high availability. A way to put lots of RAM to use.
Main competitors are things like Memcached and MongoDB.

Redis Architecture: Single Server

Client/Server model
Client = redis-cli or a Redis client API (Java, Python, ...)

Redis Persistence

Stop the server => lose everything, so Redis adds persistence
RDB: Occasionally, or on demand (SAVE command), store the current state of memory in a compact binary format on disk.
AOF: Continually log all commands executed by this server.

docker container ls # looking for some-redis
docker exec -it CONTAINER_ID /bin/bash # shell inside container
bash: redis-benchmark # executable (not redis command)
redis-server: SAVE # redis command, then take a look at db file (or BGSAVE)
bash: ls -al

RDB Pros and Cons

+ RDB is a compact single file, good for versioning the state of the system.
+ Disaster recovery: Quickly store a small .rdb offsite.
+ Good performance: Occasionally Redis forks a child process to store the state of memory. The parent does no disk I/O.
+ Disaster recovery: Faster restarts than with AOF.
- Lose all data written since the last snapshot
- fork may be slow on large datasets, impacting performance of the server

AOF Pros and Cons

+ better durability, tunable (via fsync). Note cost of fsync.
+ AOF log is append-only, so potentially lose only the last command (with fsync on each command; up to about one second of writes with fsync every second)
- AOF larger than RDB, but has a rewrite/compress option.
- AOF slower than RDB (depending on fsync policy)
- AOF slower on restarts than RDB

Advice: Use both.

RDB snapshotting algorithm

fork # so takes advantage of Copy On Write
parent: continues to serve requests
child: write RAM to tmp.rdb
child: mv tmp.rdb dump.rdb
child: terminates
# on redis restart, load dump.rdb
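
The write-to-temp-then-rename part of the steps above can be sketched in Python. This is an illustration under loud assumptions, not how Redis is implemented: there is no real fork (a `dict` copy stands in for the child's copy-on-write view of memory), JSON stands in for the compact binary RDB format, and the function names are hypothetical.

```python
import json
import os
import tempfile


def save_snapshot(state, directory, name="dump.rdb"):
    """Write a point-in-time copy of `state` to directory/name."""
    # Assumption: a shallow copy stands in for the forked child's frozen view.
    frozen = dict(state)
    # Write to a temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="tmp-", suffix=".rdb")
    with os.fdopen(fd, "w") as f:
        json.dump(frozen, f)  # assumption: JSON instead of the real binary format
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before renaming
    final_path = os.path.join(directory, name)
    # Rename over the old snapshot: a reader sees the old file or the new one,
    # never a half-written file.
    os.replace(tmp_path, final_path)
    return final_path


def load_snapshot(directory, name="dump.rdb"):
    """On restart, load the last snapshot (empty state if none exists)."""
    path = os.path.join(directory, name)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        memory = {"a": "1", "b": "2"}
        save_snapshot(memory, d)
        memory["c"] = "3"  # written after the snapshot, so it is not on disk
        print(load_snapshot(d))
```

The write to `c` after the snapshot is exactly the data an RDB-only setup would lose on a crash.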

AOF algorithm

write to log whenever a command modifies a key
# on redis restart, run the log
# How durable? Depends on how you configure fsync
# fsync on each command/every second/never
fsync tradeoffs
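
A toy version of the append-and-replay idea, as a Python sketch. The policy names `always`, `everysec`, and `no` follow the three fsync options listed above; the class, the one-line-per-SET text format, and the restriction to keys without spaces are assumptions made for this illustration. The once-per-second timer for `everysec` is deliberately left out.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy AOF: one 'SET key value' line per write (keys must not contain spaces)."""

    def __init__(self, path, fsync_policy="everysec"):
        if fsync_policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown fsync policy: {fsync_policy}")
        self.path = path
        self.fsync_policy = fsync_policy
        self._file = open(path, "a", encoding="utf-8")

    def log_set(self, key, value):
        self._file.write(f"SET {key} {value}\n")
        if self.fsync_policy == "always":
            # Most durable and slowest: force the line to disk before returning.
            self._file.flush()
            os.fsync(self._file.fileno())
        # "everysec" would fsync from a once-per-second timer (omitted here);
        # "no" leaves the timing of the flush to the operating system.

    def close(self):
        self._file.close()


def replay(path):
    """On restart, rebuild the in-memory state by re-running the log."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            op, key, value = line.rstrip("\n").split(" ", 2)
            if op == "SET":
                state[key] = value
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "appendonly.aof")
        log = AppendOnlyLog(path, fsync_policy="always")
        log.log_set("colour", "red")
        log.log_set("colour", "blue")  # later entries win on replay
        log.close()
        print(replay(path))
```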

AOF rewriting

To keep the AOF small, create a shorter version of the log with the same end RAM state.
ex: increment key 1000 times vs increment key by 1000
Algorithm
redis-client: BGREWRITEAOF # redis command
fork rewriter job
parent: continues serving requests, accumulates any new commands in RAM, not current.log
child: rewrites current.log to tmp.aof
child: mv tmp.aof current.aof
child: tell parent "DONE"
child: terminates
parent: logs accumulated backlog to current.aof
parent: continues as in AOF algorithm above
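
The core idea of rewriting, a shorter log with the same end state, fits in a few lines of Python. This is a sketch under assumptions: commands are modelled as tuples, only four integer-oriented commands are supported, and the function names are hypothetical. `background_rewrite` mirrors the steps above by compacting the log as it stood at fork time and then appending whatever the parent buffered in the meantime.

```python
def apply(state, command):
    """Apply one logged command to an in-memory dict of integer values."""
    op, key, *args = command
    if op == "SET":
        state[key] = int(args[0])
    elif op == "INCR":
        state[key] = state.get(key, 0) + 1
    elif op == "INCRBY":
        state[key] = state.get(key, 0) + int(args[0])
    elif op == "DEL":
        state.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {op}")


def replay_commands(log):
    """Run a whole log from an empty state and return the final state."""
    state = {}
    for command in log:
        apply(state, command)
    return state


def rewrite(log):
    """Return a shorter log (one SET per surviving key) with the same end state."""
    return [("SET", key, value) for key, value in replay_commands(log).items()]


def background_rewrite(log_at_fork, arrived_during_rewrite):
    """Compacted log from fork time, followed by the parent's buffered commands."""
    return rewrite(log_at_fork) + list(arrived_during_rewrite)


if __name__ == "__main__":
    old_log = [("INCR", "hits")] * 1000 + [("SET", "tmp", 5), ("DEL", "tmp")]
    new_log = background_rewrite(old_log, [("INCRBY", "hits", 10)])
    print(len(old_log), "commands ->", len(new_log), "commands")
    print(replay_commands(new_log))
```

Deleted keys vanish from the rewritten log entirely, which is one reason the rewritten file can be far smaller than the original.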

Redis Architecture: Replication (Data)

Master(write/read) - slave(read) replication: slaves are eventually consistent.
Async replication, fast since master does not wait for slave to update.
Master is single writer, many slaves readers. Clients can read from any slave.
Possible non-durability: master fails after replying "done" to the client but before the write reaches any slave.
Not strongly consistent in the CAP sense: a client that writes to the master and immediately reads from a slave may see stale data.
There is a WAIT command to tune consistency.
Slaves can have other slaves, cascading replication.
Master can have persistence turned off, slaves can have it turned on.
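
The stale-read window of asynchronous replication can be made concrete with a small in-process model. Nothing here uses a real Redis connection: `Master`, `Replica` (a slave in the terms used above), the explicit `ship` step that stands in for the network link, and `acked` (loosely what the wait command reports) are all hypothetical names chosen for this sketch.

```python
from collections import deque


class Replica:
    """Read-only copy that only changes when the master ships writes to it."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of writes applied so far

    def get(self, key):
        return self.data.get(key)


class Master:
    """Single writer; replies to the client before replicas have the write."""

    def __init__(self, replicas):
        self.data = {}
        self.offset = 0  # number of writes accepted so far
        self.replicas = replicas
        self.pending = {id(r): deque() for r in replicas}

    def set(self, key, value):
        self.data[key] = value
        self.offset += 1
        for replica in self.replicas:
            self.pending[id(replica)].append((key, value))
        return "OK"  # async: no waiting for any replica

    def ship(self, replica):
        """Deliver all queued writes to one replica (models the async link)."""
        queue = self.pending[id(replica)]
        while queue:
            key, value = queue.popleft()
            replica.data[key] = value
            replica.applied += 1

    def acked(self):
        """Count replicas that have applied every write accepted so far."""
        return sum(1 for r in self.replicas if r.applied == self.offset)


if __name__ == "__main__":
    r1, r2 = Replica(), Replica()
    master = Master([r1, r2])
    master.set("k", "v1")
    print(r1.get("k"), master.acked())  # replica has not seen the write yet
    master.ship(r1)
    print(r1.get("k"), master.acked())  # r1 has caught up, r2 has not
```

A client that needs read-your-writes behaviour would either read from the master or block until `acked()` reaches the number of replicas it cares about.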

Healing

Link between master and slave fails

Slave connects to master
Slave asks for partial re-sync
If partial fails, slave asks for complete re-sync.
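
One way to picture why a partial re-sync can fail: the master can only replay what it still remembers. The sketch below assumes the master keeps a bounded backlog of recent writes, each numbered by an offset, and that a reconnecting slave reports the last offset it applied. The class names, the backlog size, and the return values `partial`/`full` are assumptions for illustration, not Redis's actual protocol messages.

```python
class SlaveState:
    def __init__(self):
        self.data = {}
        self.offset = 0  # offset of the last write this slave applied


class MasterLog:
    """Keeps only the last `backlog_size` writes, numbered by a growing offset."""

    def __init__(self, backlog_size):
        self.backlog_size = backlog_size
        self.data = {}
        self.offset = 0    # total writes ever accepted
        self.backlog = []  # (offset, key, value) tuples, newest last

    def set(self, key, value):
        self.offset += 1
        self.data[key] = value
        self.backlog.append((self.offset, key, value))
        if len(self.backlog) > self.backlog_size:
            self.backlog.pop(0)  # forget the oldest write

    def sync(self, slave):
        """Bring the slave up to date; return which kind of sync was needed."""
        missing_from = slave.offset + 1
        oldest = self.backlog[0][0] if self.backlog else self.offset + 1
        if slave.offset <= self.offset and missing_from >= oldest:
            # Everything the slave missed is still in the backlog.
            for offset, key, value in self.backlog:
                if offset >= missing_from:
                    slave.data[key] = value
            kind = "partial"
        else:
            # The gap is older than the backlog: send the whole dataset.
            slave.data = dict(self.data)
            kind = "full"
        slave.offset = self.offset
        return kind


if __name__ == "__main__":
    master, slave = MasterLog(backlog_size=3), SlaveState()
    master.set("a", 1)
    print(master.sync(slave))   # short gap, backlog covers it
    for i in range(5):          # link is down while five writes arrive
        master.set(f"k{i}", i)
    print(master.sync(slave))   # gap is larger than the backlog
```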

Redis Architecture: Hash Partitioning (=Sharding)

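16384 hash slots in a Redis cluster (slot = CRC16(key) mod 16384, i.e. 14 bits of a 16-bit CRC)
Each node is responsible for a range/collection of hash slots.
Can also specify part of a key instead of the full key to use in sharding: with a hash tag, only the text between { and } is hashed, so related keys land in the same slot.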
Redis also supports other partitioning schemes.
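
The key-to-slot mapping can be reproduced in a few lines of Python. This sketch assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0), which is what `binascii.crc_hqx(data, 0)` computes, and that a hash tag is the text between the first `{` and the next `}` when that text is non-empty. The three-node slot ranges and node names are made up for illustration.

```python
import binascii

SLOTS = 16384

# Hypothetical three-node layout; a real cluster decides its own ranges.
EXAMPLE_RANGES = {
    "node-a": (0, 5460),
    "node-b": (5461, 10922),
    "node-c": (10923, 16383),
}


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed (the tag, or the whole key)."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # ignore an empty "{}"
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots."""
    data = hash_tag(key.encode("utf-8"))
    return binascii.crc_hqx(data, 0) % SLOTS


def owner(slot: int, ranges=EXAMPLE_RANGES) -> str:
    """Find which node's slot range contains the given slot."""
    for node, (low, high) in ranges.items():
        if low <= slot <= high:
            return node
    raise ValueError(f"slot {slot} is not assigned to any node")


if __name__ == "__main__":
    for key in ("foo", "{user1000}.following", "{user1000}.followers"):
        slot = key_slot(key)
        print(key, slot, owner(slot))
```

The two `{user1000}` keys print the same slot, which is what lets multi-key operations on them run on a single node.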

Redis Architecture: Cluster

Full Mesh: Each node is connected to all others.
Supports re-sharding on the fly.

Redis Architecture: Sentinel

What if the master becomes unavailable? And who watches the watcher?