
Hobbies and interests
Clinical Psychology
Reading
Criticism
Novels
I read books daily
Yoojin Shin
1,635 Bold Points
1x Finalist
Bio
Hello, I'm Yoojin Shin.
I am starting my second Master's degree at Lehigh and plan to finish it within two years so that I can apply to PhD programs right away.
Education
Lehigh University
Master's degree program
Majors:
- Computer Science
University of Utah
Master's degree program
Majors:
- Computer Science
University of Utah
Bachelor's degree program
Majors:
- Psychology, General
Miscellaneous
Desired degree level:
Doctoral degree program (PhD, MD, JD, etc.)
Career
Dream career field:
Information Technology and Services
Dream career goals:
AI Engineer
Deeply, 2024 – 2024
Sports
Surfing
Club, 2017 – 2017
Research
Computer Science
University of Utah — Researcher, 2020 – 2022
Computational Science
Hanyang University — Researcher, 2020 – 2022
Arts
University of Utah
Videography, 2018 – 2018
Public services
Volunteering
AIESEC — Language Translation and Assistance, 2016 – 2016
Elevate Women in Technology Scholarship
A few months ago, I watched my prototype app read a Turkish sign aloud to a visually impaired user. The sign had hand-drawn, curved lettering—difficult even for humans to parse—but our model read it clearly and confidently, converting the text to speech in seconds. In that moment, I felt something shift. Technology wasn’t just doing something clever—it was offering someone freedom.
That moment solidified my belief in the power of multimodal learning: training AI to understand the world through multiple modes like images, text, and sound. As someone who studied psychology and communication in undergrad and later transitioned into computer science, I’ve always been fascinated by how humans integrate sensory information. Multimodal AI doesn’t just simulate this—it brings us closer to machines that communicate and perceive more like us.
In my recent project, I built an OCR + TTS pipeline using a multimodal learning framework. The system identifies text in images with optical character recognition (OCR), processes it linguistically, and reads it aloud with text-to-speech (TTS). I designed it to assist visually impaired users navigating unfamiliar environments, where printed information like signs, labels, or menus might otherwise be inaccessible. Seeing it work—watching someone “hear” what they couldn’t “see”—was incredibly moving.
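Conceptually, the pipeline is a short chain: image in, recognized text, spoken audio out. The sketch below illustrates that flow in Python, with pytesseract and gTTS standing in for the actual OCR and speech components (the library choices, language codes, and the function name read_sign_aloud are illustrative stand-ins, not the project's exact implementation):

# Simplified OCR -> TTS flow (pytesseract and gTTS are stand-ins for the
# OCR and speech components used in the real system).
from PIL import Image          # image loading
import pytesseract             # bindings to the Tesseract OCR engine
from gtts import gTTS          # text-to-speech synthesis

def read_sign_aloud(image_path: str, ocr_lang: str = "tur",
                    tts_lang: str = "tr", out_path: str = "sign.mp3") -> str:
    """Recognize printed text in an image and save it as spoken audio."""
    # 1. OCR: extract the raw text from the photo (e.g. a Turkish sign).
    raw_text = pytesseract.image_to_string(Image.open(image_path), lang=ocr_lang)

    # 2. Light linguistic cleanup: collapse stray line breaks and whitespace.
    text = " ".join(raw_text.split())

    # 3. TTS: synthesize the cleaned text as speech and save it to a file.
    gTTS(text=text, lang=tts_lang).save(out_path)
    return text

# Example: read a photographed Turkish sign aloud.
# print(read_sign_aloud("turkish_sign.jpg"))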
What makes this even more powerful is its potential for linguistic inclusivity. As a Korean native who works across English and has trained models in Turkish, I’m currently expanding the project to support all three languages. I believe that accessibility shouldn’t stop at ability—it should cross language barriers too.
Multimodal learning also inspires me because it opens doors for ethical, human-centered AI. From education to healthcare to disaster response, systems that can combine sight, language, and sound will be able to assist in more nuanced, empathetic ways. I envision future applications where AI not only detects a hazard in a video but narrates it clearly, or where a child can ask a question by voice and get both visual and spoken feedback tailored to their needs.
To me, multimodal learning is more than a research area—it’s a blueprint for equitable technology. It teaches us that the future isn’t just smart—it must also be accessible, multilingual, and kind.