|
Research
* indicates equal contribution. Some of my notable papers are highlighted.
|
|
SubAlign: Explicit Speech–Text Alignment Driven Tokenization for Spoken Language Modeling
Kang-wook Kim,
Sehun Lee,
Sang Hoon Woo,
Gunhee Kim
Submitted to January ARR 2026
pdf
We present SubAlign, the first speech tokenization framework to explicitly segment speech into units that correspond to the subwords of an LLM's vocabulary.
|
|
Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech
Sang Hoon Woo*,
Sehun Lee*,
Kang-wook Kim,
Gunhee Kim
EMNLP 2025
pdf
/
project page
We introduce an explicit "verbalization" step that translates model thoughts into speech-friendly utterances for spoken dialogue systems, along with ReVerT, an efficient verbalization model.
|
|
Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models
Sehun Lee*,
Kang-wook Kim*,
Gunhee Kim
NAACL 2025   (Oral)
🏆 Senior Area Chair Award (Top 0.3%) 🏆
– Top Paper in Speech Processing and Spoken Language Understanding
pdf
/
project page
We present Behavior-SD and BeDLM, enabling large language models to generate natural, full-duplex spoken dialogues enriched with human conversational behaviors.
|
|
Enhanced χ-sepnet with Physics-Informed Unrolling: Towards Accurate MRI Susceptibility Mapping
Kang-wook Kim
Undergraduate Thesis
pdf
/
poster
I enhanced χ-sepnet with physics-informed unrolling to improve the accuracy of MRI susceptibility mapping. I withheld the planned ISMRM 2024 submission because the model underestimated susceptibility in patient data and needs further refinement.
|
|
FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow
Ki-Ung Song*, Dongseok Shim*, Kang-wook Kim*, Jae-young Lee, Younggeun Kim
CVPR 2022 NTIRE Workshop
pdf
Ranked 2nd in the 4X track and 1st in the 8X track of the NTIRE Learning Super-Resolution Space Challenge.
|
|
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song*, Sang Hoon Woo*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim
CVPR Demo Track, 2022
pdf
/
demo
Our team developed a multilingual system that generates lip-synced talking face videos from text in four languages while preserving speaker identity.
|
|
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-wook Kim,
Seung-won Park,
Junhyeok Lee,
Myun-chul Joe
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
pdf
/
project page
/
github
We propose Assem-VC, a voice conversion system that assembles modern speech synthesis techniques to achieve realistic any-to-many conversion while preserving the source speech's rhythm and intonation.
|
|
Controllable and Interpretable Singing Voice Decomposition via Assem-VC
Kang-wook Kim,
Junhyeok Lee
NeurIPS Workshop on ML for Creativity and Design, 2021   (Oral, top 6.2%)
pdf
/
project page
/
github
/
bibtex
We propose a controllable singing voice decomposition system that uses Assem-VC to encode time-aligned linguistic content, pitch, and source speaker identity.
|
|