VUW-Massey Joint Workshop on Advances in Learning-based Visual Computing

Time and Location:

IC2.73 | Innovation Complex Seminar Room, Auckland Campus, Massey University, 27 Feb 2026

Google Map (Innovation Complex, Massey University): https://maps.app.goo.gl/9Nd23wVvT7SF88YYA
Online Live Session: https://vuw.zoom.us/j/95844426329

Hosts:

Dr. Fang-Lue Zhang, Victoria University of Wellington, New Zealand,
Prof. Ruili Wang, Massey University, New Zealand,
Prof. Reinhard Koch, Kiel University, Germany.
Programme (New Zealand Time):
13:30-13:40 Welcome Speech by Prof. Chris Scogings  
13:40-14:10 Prof. Reinhard Koch: Bridging the gap between 2D and 3D - 30 years of research in camera-based 3D scene reconstruction  
14:10-14:40 Prof. Ruili Wang: Multimodal Data Processing with NLP and LLMs for Video Captioning and Image Fusion  
14:40-15:10 Dr. Fang-Lue Zhang: Gaze-Driven Panoramic Image Understanding and Enhancement  
15:10-15:30 Tea Break  
15:30-16:00 A/Prof. Burkhard Wuensche: Applications of Visual Computing in AR/VR  
16:00-16:30 A/Prof. Xing Yan: Learning High-Fidelity 3D Reconstruction from Images  
16:30-17:00 Prof. Thomas Pfeiffer: Large Language Models in collective decision-making and deliberation  
*To watch the video recordings online, the recommended browsers are Safari, Chrome, and Microsoft Edge.

Speakers:

1. Prof. Reinhard Koch, Kiel University
Title: Bridging the gap between 2D and 3D - 30 years of research in camera-based 3D scene reconstruction

Bio: Prof. Dr. Reinhard Koch has been a professor of computer science at Kiel University since 1999 and is an internationally recognized expert in 3D computer vision, visual computing, and 3D scene reconstruction from different image modalities, using stereo and multiview rigs, structure from motion, time-of-flight cameras, and plenoptic imaging systems. He has published over 220 peer-reviewed journal and conference papers in all relevant venues, including top venues such as ICCV, ECCV, CVPR, NIPS, MICCAI, T-PAMI, and IJCV, with over 13,000 citations and an h-index of 51. His work has been honored with numerous awards, including the David Marr Award (the highest award in computer vision), the Olympus Award, and numerous best paper awards. He served the community as chair and co-chair of the German Pattern Recognition Society and as the German delegate to the International Association for Pattern Recognition for nine years, and was elected an IAPR Fellow. From 2023 to 2026, Prof. Koch is the recipient of the Julius von Haast Fellowship (Catalyst: Leaders) of the Royal Society of New Zealand for Project UVW2301: Holistic Volumetric Representation for Reconstructing Immersive Videos.

Abstract: Computing a truthful reconstruction of 3D scenes from camera images has been an eminent topic in computer vision for at least 30 years. Ever since the groundbreaking work of Longuet-Higgins in the 1980s and of Faugeras, Hartley, Zisserman, and others in the 1990s, researchers have investigated solutions to this ill-posed problem. Today, deep learning approaches are taking over, but they still depend on these basic reconstruction tasks. In my talk I will review the approaches of this research area, from calibrated and uncalibrated Structure from Motion to novel learning methods such as 3D Gaussian Splatting.

2. Prof. Ruili Wang, Massey University
Title: Multimodal Data Processing with NLP and LLMs for Video Captioning and Image Fusion

Bio: Professor Ruili Wang received a Ph.D. degree in Computer Science from Dublin City University, Dublin, Ireland. He is currently Professor of Artificial Intelligence and Chair of Research in the School of Mathematical and Computational Sciences at Massey University, Auckland, New Zealand, where he serves as the Director of the Centre for Language and Speech Processing. His current research interests include speech processing, natural language processing, video and image processing, and intelligent systems. Professor Wang serves as an editorial board member and Associate Editor for international journals including IEEE Transactions on Multimedia (TMM), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Computational Intelligence Magazine, IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Knowledge and Information Systems (Springer), Applied Soft Computing (Elsevier), and Neurocomputing (Elsevier).

Abstract: In his presentation, Professor Wang will cover three topics: (i) a brief report on research progress in multimodal data processing conducted in the Speech and Language Processing Centre at Massey Auckland; (ii) a brief introduction to their recent progress in video captioning, in particular the recently published paper in the journal Knowledge-Based Systems, "Knowledge Enhancement and Disentanglement Learning for Video Captioning"; (iii) a brief description of their recent progress in image fusion, in particular the recently accepted paper in IEEE Transactions on Multimedia, "Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion." Note that Topics 2 and 3 both relate to multimodal data processing, specifically the application of Natural Language Processing (NLP) and Large Language Models (LLMs) to video and image processing.

3. Dr. Fang-Lue Zhang, Victoria University of Wellington
Title: Gaze-Driven Panoramic Image Understanding and Enhancement

Bio: Fang-Lue Zhang is a Senior Lecturer at Victoria University of Wellington, New Zealand. He received his Ph.D. in Computer Science from Tsinghua University in 2015. Since 2009, Dr. Zhang has focused on research in computer graphics and intelligent image/video editing methods. He has proposed numerous innovative approaches in the structured representation, analysis, and synthesis of images and videos, as well as in perception-based visual media analysis and editing. He has published over 100 papers in international conferences and journals in the fields of computer graphics and artificial intelligence, including more than 40 papers in top-tier publications such as IEEE TPAMI, ACM SIGGRAPH/SIGGRAPH Asia, ACM TOG, IEEE TVCG, IEEE TIP, and AAAI. Dr. Zhang has received the Victoria University of Wellington Early-Career Research Excellence Award (2019) and the Royal Society of New Zealand Fast-Start Marsden Grant (2020). He has served as the Program Chair of Pacific Graphics 2020 and 2021, and as the Program Chair of CVM 2024. He is currently an IEEE Senior Member and Industrial Coordinator of the IEEE Central New Zealand Section. He serves on the editorial boards of several international journals in computer graphics.

Abstract: Panoramic images and videos can present 360-degree real-world scenes, providing users with a highly immersive experience. Compared to traditional virtual reality (VR) scenes generated through complex modeling and rendering, panoramic images and videos are captured directly from the real world, offering a more intuitive and comprehensive representation of the scene. Visual perception in panoramic environments plays a crucial role in the quality of the user experience, and understanding and analyzing users' visual perception is one of the core challenges in this field. This talk focuses on a key perceptual feature in 360-degree images and videos, the user scanpath, and explores how deep learning techniques can be applied to predict scanpaths and to enhance image quality based on such user gaze trajectories.

4. A/Prof. Burkhard Wuensche, University of Auckland
Title: Applications of Visual Computing in AR/VR

Bio: Assoc.-Prof. Burkhard Wünsche is a leading researcher in visual computing and serious games who focuses on solving real-world problems using visual representations. He received his Vordiplom (similar to a BSc) at the University of Kaiserslautern in Germany, and his MSc and PhD at the University of Auckland in New Zealand. He is the leader of the Graphics Group at the University of Auckland, and a member of the Centre for Brain Research (CBR) and the Centre for Automation, Robotics, and Engineering Sciences (CARES). His research interests include visual computing (computer graphics, computer vision, HCI, AR/VR), serious games, and innovative educational applications and health interventions. He has published more than 300 papers in prestigious journals and conferences such as IEEE TVCG, IEEE VIS, SIGGRAPH Asia, Eurographics, CHI, SIGCSE, and IEEE VR. He has received more than $9 million in research grants, including a Horizon Europe grant and an MBIE grant. He has chaired and/or served on the international program committees of more than 100 conferences, including SIGGRAPH, CHI, SIGCSE, IEEE VR, MICCAI, Eurographics, and PacificVis. He has worked on numerous industry projects and led a team that won the Velocity entrepreneurship competition.

Abstract: Visual computing, the integration of computer graphics, computer vision, and HCI, has become a catalyst for solving complex real-world challenges. This talk explores how advanced multi-modal AR/VR representations are transforming healthcare and education by shifting users from passive observers to active, multisensory participants. We examine the technical foundations of these experiences, from cost-effective 3D environment synthesis and 3D object detection and tracking, to multi-modal sensory integration and multimodal biosensing for user state detection. We give examples of real-world applications by presenting recent research on VR tools for teaching molecular structure and function, AR digital placemaking, VR tinnitus rehabilitation, cybersickness mitigation, exergaming, and personalised adaptive learning. We will discuss promising results and current limitations and challenges. The talk concludes with a roadmap for emerging trends and interdisciplinary collaboration.

5. A/Prof. Xing Yan, Hefei University of Technology
Title: Learning High-Fidelity 3D Reconstruction from Images - via Teacher-Model-Guided Views, Dynamic Positional Encoding, and Decomposed Neural Signed Distance Fields

Bio: Yan Xing received the B.S. degree in Computer Science from Northeast Normal University, China, in 2000, and the Ph.D. degree in Computer Aided Geometric Design and Computer Graphics from Hefei University of Technology, China, in 2009. She was a postdoctoral researcher in the Department of Computer Science at Rice University from 2011 to 2012. She is currently an Associate Professor at Hefei University of Technology and a visiting scholar at Victoria University of Wellington (2025–2026). Her research interests include computer vision, computer graphics, image-based 3D reconstruction, neural implicit representations, spline and subdivision methods, and deep learning.

Abstract: Image-based 3D reconstruction remains a fundamental yet challenging problem in computer vision, particularly under sparse-view and complex-geometry settings. In this talk, I will present three of our works that aim to improve reconstruction accuracy and robustness by combining model guidance, adaptive representations, and neural implicit modeling. First, we address sparse-view 3D reconstruction by introducing a teacher-model-guided strategy that generates pseudo views, significantly improving reconstruction quality when input images are limited. Second, we propose a dynamic positional encoding scheme whose frequency varies across both spatial locations and training stages, enabling neural networks to better capture multi-scale geometric details while avoiding high-frequency artifacts. Third, we present a decomposed neural signed distance field (SDF) representation, where the scene geometry is modeled as a combination of a base function and a displacement function. Two neural networks are jointly trained to learn these components, while a separate MLP models the scene’s radiance field. A coarse-to-fine training strategy is adopted to progressively introduce high-frequency details, leading to more accurate and stable surface reconstruction. Experiments demonstrate that the proposed methods achieve high-fidelity surface reconstruction, particularly for scenes with complex geometry and sparse observations.
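The decomposed SDF described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the speakers' implementation: the network shapes, the names base_net and disp_net, and the untrained random weights are all illustrative, and the jointly trained radiance MLP and coarse-to-fine schedule from the abstract are omitted. It only shows the core idea that the signed distance at a point is the sum of a low-capacity base function and a higher-capacity displacement function.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Random weights for a tiny MLP (illustrative only, untrained)."""
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """Plain feed-forward pass with ReLU on hidden layers."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Decomposed geometry: smooth base shape + high-frequency displacement.
base_net = mlp([3, 32, 1])   # low-capacity net for the coarse surface
disp_net = mlp([3, 64, 1])   # higher-capacity net for fine detail

def sdf(points):
    """Signed distance at each 3D query point: base(x) + displacement(x)."""
    return forward(base_net, points) + forward(disp_net, points)

pts = rng.standard_normal((5, 3))   # five query points in R^3
print(sdf(pts).shape)               # (5, 1): one signed distance per point
```

In the actual method, the two components would be trained jointly against image observations, with the displacement term introduced progressively so that high-frequency detail is added only after the base geometry has stabilized.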

6. Prof. Thomas Pfeiffer, Massey University
Title: Large Language Models in collective decision-making and deliberation

Bio: Thomas Pfeiffer graduated as a biophysicist from Humboldt University Berlin and obtained a PhD from ETH Zurich. After an appointment as a postdoctoral researcher at Harvard University, he accepted a professorship at the New Zealand Institute for Advanced Study at Massey University in Auckland. Thomas’ research interests are metascience, AI behavioral science, and game theory. He is particularly interested in mechanisms of information elicitation and aggregation, how such mechanisms can be used for the benefit of science, and how they can help integrate AI capabilities into research workflows. Thomas was part of several large-scale replication projects, and was the academic lead of the team running large-scale replication forecasting within DARPA’s ‘Systematizing Confidence in Open Research and Evidence (SCORE)’ project. He has authored more than 50 publications, including in prestigious journals such as Science, the Proceedings of the National Academy of Sciences, and Nature Human Behaviour.

Abstract: The capabilities of cutting-edge LLMs have opened up novel research areas at the intersection of AI research and the social and behavioral sciences. Tools from the social and behavioral sciences can be used to assess the behavior of single AI agents and to investigate the dynamics of systems with multiple AI agents, or with both human and AI agents. Moreover, LLMs can be used to generate ‘synthetic participants’ for social and behavioral science studies, allowing researchers to overcome constraints imposed by limited sample sizes. I will present a study investigating collective decision-making and the efficiency of deliberation with LLM agents. My results show that deliberation can lead to an efficient exchange of information, but can be sensitive to features of the communication protocol.