Kang-wook Kim

I am a research intern at the Vision & Learning Lab, having earned my Bachelor's degree from Seoul National University. My research focuses on spoken language models and spoken dialogue systems.

Email  /  CV  /  Google Scholar  /  Semantic Scholar  /  Twitter  /  GitHub  /  LinkedIn

Research

* indicates equal contribution. Some of my notable papers are highlighted.

SubAlign: Explicit Speech–Text Alignment Driven Tokenization for Spoken Language Modeling
Kang-wook Kim, Sehun Lee, Sang Hoon Woo, Gunhee Kim
Submitted to January ARR 2026
pdf

We present SubAlign, the first speech tokenization framework to explicitly segment speech at the subword level corresponding to LLM vocabularies.

Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech
Sang Hoon Woo*, Sehun Lee*, Kang-wook Kim, Gunhee Kim
EMNLP 2025
pdf / project page

We introduce an explicit "verbalization" step that translates model thoughts into speech-friendly utterances for spoken dialogue systems, along with ReVerT, an efficient verbalization model.

Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models
Sehun Lee*, Kang-wook Kim*, Gunhee Kim
NAACL 2025   (Oral)
🏆 Senior Area Chair Award (Top 0.3%) 🏆
– Top Paper in Speech Processing and Spoken Language Understanding

pdf / project page

We present Behavior-SD and BeDLM, enabling large language models to generate natural, full-duplex spoken dialogues enriched with human conversational behaviors.

Enhanced X-sepnet with Physics-Informed Unrolling: Towards Accurate MRI Susceptibility Mapping
Kang-wook Kim
Undergraduate Thesis
pdf / poster

I enhanced χ-sepnet with physics-informed unrolling to improve the accuracy of MRI susceptibility mapping. I withheld the ISMRM 2024 submission because the model underestimated susceptibility in patient data and requires further refinement.

FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow
Ki-Ung Song*, Dongseok Shim*, Kang-wook Kim*, Jae-young Lee, Younggeun Kim
CVPR 2022 NTIRE Workshop
pdf

Placed 2nd in the 4X track and 1st in the 8X track of the NTIRE Learning Super-Resolution Space Challenge.

Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song*, Sang Hoon Woo*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim
CVPR Demo Track, 2022
pdf / Demo

Our team developed a multilingual system that generates lip-synced talking face videos from text in four languages while preserving speaker identity.

Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-wook Kim, Seung-won Park, Junhyeok Lee, Myun-chul Joe
Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
pdf / project page / github

We propose Assem-VC, a voice conversion system that combines modern techniques for realistic any-to-many conversion while preserving rhythm and intonation.

Controllable and Interpretable Singing Voice Decomposition via Assem-VC
Kang-wook Kim, Junhyeok Lee
NeurIPS Workshop on ML for Creativity and Design, 2021   (Oral, top 6.2%)
pdf / project page / github / bibtex

We propose a controllable singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC.