Research
* indicates equal contribution. Some of my notable papers are highlighted.
|
|
Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech
Sang Hoon Woo*, Sehun Lee*, Kang-wook Kim, Gunhee Kim
EMNLP 2025
pdf
We introduce an explicit "verbalization" step that translates model thoughts into speech-friendly utterances for spoken dialogue systems, along with ReVerT, an efficient verbalization model.
|
|
Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models
Sehun Lee*, Kang-wook Kim*, Gunhee Kim
NAACL 2025   (Oral)
🏆 Senior Area Chair Award – Top Paper in Speech Processing and Spoken Language Understanding
pdf / project page
We present Behavior-SD and BeDLM, enabling large language models to generate natural, full-duplex spoken dialogues enriched with human conversational behaviors.
|
|
Enhanced X-sepnet with Physics-Informed Unrolling: Towards Accurate MRI Susceptibility Mapping
Kang-wook Kim
Undergraduate Thesis
pdf / poster
I enhanced χ-sepnet with physics-informed unrolling to improve the accuracy of MRI susceptibility mapping. I withheld the ISMRM 2024 submission because the model underestimated susceptibility in patient data and requires further refinement.
|
|
FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow
Ki-Ung Song*, Dongseok Shim*, Kang-wook Kim*, Jae-young Lee, Younggeun Kim
CVPR 2022 NTIRE Workshop
pdf
Placed 2nd on the 4X track and 1st on the 8X track of the NTIRE Learning Super-Resolution Space Challenge.
|
|
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song*, Sang Hoon Woo*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim
CVPR Demo Track, 2022
pdf / demo
Our team developed a multilingual system that generates lip-synced talking face videos from text in four languages while preserving speaker identity.
|
|
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-wook Kim, Seung-won Park, Junhyeok Lee, Myun-chul Joe
Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
pdf / project page / github
We propose Assem-VC, a voice conversion system that combines modern techniques for realistic any-to-many conversion while preserving rhythm and intonation.
|
|
Controllable and Interpretable Singing Voice Decomposition via Assem-VC
Kang-wook Kim, Junhyeok Lee
NeurIPS Workshop on ML for Creativity and Design, 2021   (Oral [top 6.2%])
pdf / project page / github / bibtex
We propose a controllable singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC.