During my secondment at TU Delft, carried out within the ENSURE-6G project, I worked on optimizing inference for hybrid vision models (i.e., models that combine convolutional and transformer blocks) deployed in highly resource-constrained environments. The work targeted scenarios in which multiple IoT devices are each unable to host a complete large model locally, a setting that is increasingly relevant for future 6G and edge intelligence applications.
The host research group investigates model inference on small devices, with particular emphasis on energy efficiency and system-level optimization. During the visit, I had the opportunity to interact closely with the group’s PhD students, exchange technical insights, and study existing system models and optimization formulations developed within the project. This collaboration helped me better understand the challenges associated with model partitioning and device assignment, especially as problem complexity increases with larger models and higher numbers of participating devices.

Specifically, one of the students had already formulated an optimization problem that identifies which blocks of a model can be parallelised and finds the optimal assignment of those blocks to devices. However, solving this problem is computationally expensive: it involves many parameters, and it becomes even more complex as the number of devices increases or as the ML model gets larger. We therefore want to find a heuristic solution that reduces solution complexity and runtime. We started by building a framework that supports split inference and its evaluation, and that can serve as a benchmark for an empirical analysis giving us insight into the system.
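To illustrate why a heuristic is attractive here, the following minimal Python sketch compares exhaustive search with a simple greedy rule on a toy block-assignment problem. It is not the formulation developed in the host group: the cost model (independent blocks, a single compute cost per block, a relative speed per device, no communication cost), the function names, and the numbers are all assumptions made for illustration only.

```python
from itertools import product


def makespan(assignment, block_costs, device_speeds):
    """Finish time of the most loaded device for a block-to-device mapping."""
    load = [0.0] * len(device_speeds)
    for block, device in enumerate(assignment):
        load[device] += block_costs[block] / device_speeds[device]
    return max(load)


def exhaustive_assignment(block_costs, device_speeds):
    """Evaluate every mapping: len(device_speeds) ** len(block_costs) candidates."""
    devices = range(len(device_speeds))
    return min(
        product(devices, repeat=len(block_costs)),
        key=lambda a: makespan(a, block_costs, device_speeds),
    )


def greedy_assignment(block_costs, device_speeds):
    """Place the largest blocks first, each on the device that would finish it earliest."""
    load = [0.0] * len(device_speeds)
    assignment = [0] * len(block_costs)
    order = sorted(range(len(block_costs)), key=lambda b: -block_costs[b])
    for b in order:
        d = min(
            range(len(device_speeds)),
            key=lambda dev: load[dev] + block_costs[b] / device_speeds[dev],
        )
        load[d] += block_costs[b] / device_speeds[d]
        assignment[b] = d
    return tuple(assignment)


# Hypothetical toy instance: five parallelisable blocks, two devices.
block_costs = [4.0, 3.0, 2.0, 2.0, 1.0]
device_speeds = [1.0, 2.0]

best = exhaustive_assignment(block_costs, device_speeds)
fast = greedy_assignment(block_costs, device_speeds)
```

The exhaustive search evaluates a number of candidates that grows exponentially with the number of blocks, whereas the greedy rule needs only one pass over the sorted blocks. The greedy result is not guaranteed to be optimal in general, which is the kind of trade-off an empirical benchmark can quantify.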
A key technical achievement of my visit was the implementation of a model-agnostic split inference framework, a software mechanism that supports the execution of partitioned models across distributed devices. The framework is not tied to a specific neural architecture and can accommodate hybrid vision models, making it suitable as a general experimentation platform for split inference research. It was successfully validated in a prototype setup consisting of one laptop and one NVIDIA Jetson device, demonstrating the feasibility of distributed inference under realistic hardware constraints. We plan to extend the evaluation to larger multi-device configurations.
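The idea of model-agnostic split inference can be sketched in a few lines of Python. This is not the framework's actual code or API: the function names run_head and run_tail, the toy layers, and the use of pickle as a stand-in for the network transfer between devices are all assumptions chosen to keep the example self-contained.

```python
import pickle


def run_head(layers, split_index, x):
    """Run the first split_index layers and serialise the intermediate result."""
    for layer in layers[:split_index]:
        x = layer(x)
    return pickle.dumps(x)  # stand-in for sending bytes to the second device


def run_tail(layers, split_index, payload):
    """Deserialise the intermediate result and run the remaining layers."""
    x = pickle.loads(payload)
    for layer in layers[split_index:]:
        x = layer(x)
    return x


# Toy "model": any ordered list of callables works, which is what makes
# the split logic independent of the architecture.
def scale(v):
    return [2.0 * a for a in v]


def shift(v):
    return [a + 1.0 for a in v]


def relu(v):
    return [max(0.0, a) for a in v]


def total(v):
    return sum(v)


layers = [scale, shift, relu, total]
x = [-2.0, 0.5, 3.0]

# Reference result from running the whole model in one place.
full = x
for layer in layers:
    full = layer(full)

# Same computation, split after the second layer.
split = run_tail(layers, 2, run_head(layers, 2, x))
```

Because the split point is just an index into the layer sequence, the same two functions can be reused to compare different partition points, with the size of the serialised payload indicating how much data would cross the link at each one.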
Despite the limited duration of the secondment, the visit established a solid technical foundation for future work on inference optimization in distributed edge systems. It also broadened my research knowledge: my PhD work has mainly concerned split learning with deep convolutional networks, whereas during this visit I had the chance to study hybrid models and implement split computing from a different perspective.