Kang-wook Kim
I am a recent Electrical and Computer Engineering graduate from Seoul National University, currently a research intern at the Vision & Learning Lab. My work focuses on AI-driven speech synthesis, integrating behavioral traits like turn-taking and backchanneling for more natural dialogues. I aim to apply these innovations in mental health care to create empathetic AI agents that enhance communication and support. I am eager to pursue a PhD abroad to further advance AI in healthcare, collaborating with experts to push the boundaries of human-like conversational systems.
Email  / 
CV  / 
Google Scholar  / 
Semantic Scholar  / 
Twitter  / 
GitHub  / 
LinkedIn
Research
My research focuses on speech synthesis and deep learning, with particular emphasis on analyzing and manipulating speech.
* indicates equal contribution. Some of my notable papers are highlighted.
Behaviorally Aware Spoken Dialogue Generation with Large Language Models
Sehun Lee*,
Kang-wook Kim*,
Gunhee Kim
Submitted to NAACL 2025
project page
We address the challenge of modeling nuanced conversational behaviors—such as backchannels, turn-taking, and filler words—using Behavior-SD, a 100K-dialogue dataset (2,044 hours) annotated for these behaviors. Our BeDLM model generates natural conversations conditioned on behavior cues and narrative context, advancing realistic dialogue systems for applications like mental health support and personalized assistants.
Enhanced χ-sepnet with Physics-Informed Unrolling: Towards Accurate MRI Susceptibility Mapping
Kang-wook Kim
Undergraduate Thesis
pdf
/
poster
I enhanced χ-sepnet with physics-informed unrolling to improve the accuracy of MRI susceptibility mapping. I withheld submission to ISMRM 2024 because the model underestimated susceptibility in patient data, and it requires further refinement.
FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow
Ki-Ung Song*, Dongseok Shim*, Kang-wook Kim*, Jae-young Lee, Younggeun Kim
CVPR 2022 NTIRE Workshop
arXiv
2nd place on the NTIRE Learning Super-Resolution Space Challenge 4X track and 1st place on the 8X track.
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song*, Sang Hoon Woo*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim
CVPR Demo Track (Round 1), 2022
arXiv
/
Demo
Our team developed a multilingual system that generates lip-synced talking face videos from text in four languages while preserving speaker identity.
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-wook Kim,
Seung-won Park,
Junhyeok Lee,
Myun-chul Joe
Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
project page
/
arXiv
/
github
We propose Assem-VC, a voice conversion system that combines modern techniques for realistic any-to-many conversion while preserving rhythm and intonation.
Controllable and Interpretable Singing Voice Decomposition via Assem-VC
Kang-wook Kim,
Junhyeok Lee
NeurIPS Workshop on ML for Creativity and Design, 2021   (Oral Presentation [top 6.2%])
project page
/
arXiv
/
github
/
bibtex
We propose a controllable singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC.