Project Info

Multi-accelerator Execution of Autonomous Workloads

Mehmet Belviranli

Project Goals and Description:

Modern mobile and embedded systems-on-chip (SoCs) couple CPUs with specialized processing units, such as graphics processing units (GPUs), deep learning accelerators (DLAs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), and programmable vision accelerators (PVAs), on the same die to serve the demanding needs of autonomous, mobile, and edge computing. In such SoCs, operations in an application can often be accelerated by different specialized processing units with varying performance, energy, and latency characteristics. For example, a convolution operation can be set to run on the CPU, GPU, PVA, or DLA. The processing unit (i.e., accelerator) that provides the optimal execution time and/or energy efficiency for an operation depends both on the processor's capabilities and on properties of the operation, such as matrix size and filter dimensions. Depending on the dynamic requirements of the system (e.g., high throughput, low energy), the runtime parameters of the operation (e.g., number of objects, image size), and the availability of processors, the programmer (or system scheduler) may map different operations to different processors throughout the execution of an application.

The project aims to build a multi-accelerator execution framework so that popular autonomous workloads, such as object detection, pose extraction, motion planning, object tracking, and other neural-network-based algorithms, can exploit hybrid execution to meet varying energy, latency, and throughput requirements. During the project, students will have access to various state-of-the-art SoCs, such as the Snapdragon 865, NVIDIA Xavier, and AGX Orin. Students will have the chance to work on different application scenarios, such as autonomous discovery and tracking drones, cloud-rendered video streaming frameworks, and FPGA-accelerated robot workloads.
Students will build static and dynamic runtime schedulers, analytical performance models, and multi-accelerator execution paradigms to enable resource-efficient execution of autonomous workloads.
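To make the scheduling idea above concrete, here is a minimal sketch of a static scheduler that maps each operation in a pipeline to an accelerator using a per-operation cost table. It is illustrative only: the accelerator names, latency, and energy figures are hypothetical placeholders, not measurements from the SoCs mentioned in this project, and a real scheduler would draw on profiled performance models instead.

```python
# Toy static scheduler sketch (hypothetical numbers, not measured data).
# For each operation, pick the lowest-energy accelerator whose latency
# keeps the pipeline within a latency budget; if nothing fits, fall back
# to the fastest unit available for that operation.

# Hypothetical profiles: operation -> {accelerator: (latency_ms, energy_mJ)}
PROFILES = {
    "convolution":  {"CPU": (42.0, 180.0), "GPU": (6.5, 95.0),
                     "DLA": (9.0, 40.0),   "PVA": (15.0, 55.0)},
    "object_track": {"CPU": (20.0, 80.0),  "GPU": (4.0, 70.0),
                     "DSP": (7.0, 30.0)},
}

def schedule(pipeline, latency_budget_ms):
    """Greedily map each operation to an accelerator.

    Prefers the lowest-energy unit that keeps the cumulative latency
    within the budget; uses the fastest unit when the budget is tight.
    Returns (mapping, total_latency_ms).
    """
    mapping, total_latency = {}, 0.0
    for op in pipeline:
        candidates = PROFILES[op]
        # Units whose latency still fits within the remaining budget.
        feasible = {u: le for u, le in candidates.items()
                    if total_latency + le[0] <= latency_budget_ms}
        if feasible:
            unit = min(feasible, key=lambda u: feasible[u][1])  # min energy
        else:
            unit = min(candidates, key=lambda u: candidates[u][0])  # min latency
        mapping[op] = unit
        total_latency += candidates[unit][0]
    return mapping, total_latency
```

For example, with a 30 ms budget the sketch above assigns the convolution to the DLA and the tracking step to the DSP (the lowest-energy units that fit), whereas a very tight budget pushes every operation onto its fastest unit regardless of energy. A dynamic runtime scheduler would additionally react to processor availability and changing runtime parameters, which is part of what this project explores.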

More Information:

Grand Challenge: Engineer the tools of scientific discovery.

Primary Contacts:

Mehmet Belviranli <> Ismet Dagli <>

Student Preparation


There are no required qualifications. Students who enjoyed computer organization and operating systems courses, and/or who are interested in working with GPUs, FPGAs, and other processors, are strongly encouraged to apply.


5 to 15 hours per week


Depending on the architecture they choose to work on, students will learn how to program and use CUDA, TensorRT, the Hexagon DSP, and the Xilinx Accel platform. They will become familiar with common computer vision, deep learning, and motion planning workloads.


I plan to meet with students once or twice a week. I will ask for weekly plans on Mondays and status reports on Fridays. I will ask students to create tasks and subtasks to plan their work, and to document their work and version their source code as the project proceeds.

