Kang-wook Kim

I recently earned a Bachelor's degree in Electrical and Computer Engineering from Seoul National University and am currently a research intern at the Vision & Learning Lab. My work focuses on AI-driven speech synthesis, integrating behavioral traits like turn-taking and backchanneling for more natural dialogues. I aim to apply these innovations in mental healthcare to create empathetic AI agents that enhance communication and support.

Email / CV / Google Scholar / Semantic Scholar / Twitter / GitHub / LinkedIn

Research

My research focuses on speech synthesis and deep learning, with a particular emphasis on analyzing and manipulating speech. * indicates equal contribution. Some of my notable papers are highlighted.

Behaviorally Aware Spoken Dialogue Generation with Large Language Models
Sehun Lee*, Kang-wook Kim*, Gunhee Kim
NAACL 2025 (Oral Presentation)
project page

We address the challenge of modeling nuanced conversational behaviors (such as backchannels, turn-taking, and filler words) using Behavior-SD, a 100K-dialogue dataset (2,044 hours) annotated for these behaviors. Our BeDLM model generates natural conversations conditioned on behavior cues and narrative context, advancing realistic dialogue systems for applications like mental health support and personalized assistants.

Enhanced X-sepnet with Physics-Informed Unrolling: Towards Accurate MRI Susceptibility Mapping
Kang-wook Kim
Undergraduate Thesis
pdf / poster

I enhanced χ-sepnet with physics-informed unrolling to improve the accuracy of MRI susceptibility mapping. I withheld the ISMRM 2024 submission because the model underestimated susceptibility in patient data and requires further refinement.

FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow
Ki-Ung Song*, Dongseok Shim*, Kang-wook Kim*, Jae-young Lee, Younggeun Kim
CVPR 2022 NTIRE Workshop
arXiv

2nd place on the NTIRE Learning Super-Resolution Space Challenge 4X track and 1st place on the 8X track.

Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song*, Sang Hoon Woo*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim
CVPR Demo Track, 2022
arXiv / Demo

Our team developed a multilingual system that generates lip-synced talking face videos from text in four languages while preserving speaker identity.

Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-wook Kim, Seung-won Park, Junhyeok Lee, Myun-chul Joe
Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
project page / arXiv / github

We propose Assem-VC, a voice conversion system that combines modern techniques for realistic any-to-many conversion while preserving rhythm and intonation.

Controllable and Interpretable Singing Voice Decomposition via Assem-VC
Kang-wook Kim, Junhyeok Lee
NeurIPS Workshop on ML for Creativity and Design, 2021 (Oral Presentation [top 6.2%])
project page / arXiv / github / bibtex

We propose a controllable singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC.