    How to Add AI-Generated Captions to Course Videos

    Use Descript or CapCut to auto-transcribe course videos and add styled captions for accessibility and engagement.

    Abe Crystal, PhD · 8 min read · Updated April 2026

    Both Descript and CapCut can auto-transcribe your course video and generate styled captions in minutes. You upload your video, the AI produces a transcript, you review and correct it, then export the video with captions burned in or as a separate SRT file. The whole process takes less time than manually typing a transcript for a single lesson.

    30–45 min per lesson · Descript or CapCut (free tiers available) · Beginner
    1. Import Video
    2. Auto-Transcribe
    3. Review & Correct
    4. Style Captions
    5. Export

    What you’ll walk away with:

    • Accurately captioned course videos ready to upload
    • SRT files for toggleable subtitles on your course platform
    • A custom vocabulary list for your field’s terminology
    • Accessible content that works for every student

    Why add captions to your course videos

    Captions are not optional decoration. They are an accessibility requirement for a meaningful share of your students — anyone who is deaf or hard of hearing, anyone learning in a second language, anyone watching in a noisy environment or a quiet one where they cannot use speakers. Industry surveys consistently find that over 80% of people who use captions are not deaf or hard of hearing — they use them by choice for comprehension and convenience.

    The learning impact is measurable. Research on multimedia learning consistently shows that adding text alongside audio improves retention and comprehension by 25-40%, particularly when the content includes unfamiliar vocabulary or complex concepts — which describes most course material. If you teach anything where terminology matters, captions give your students a second channel to process what you are saying.

    There is also a practical reality: many of your students watch video without sound. On a commute. During a lunch break at work. While their kids are sleeping. If your course videos have no captions, those students get nothing. Captions turn a silent video into a usable lesson.

    Adding captions with Descript

    Descript is a video and audio editor built around transcription. When you import a video, it automatically generates a transcript and lets you edit the video by editing the text. That same transcript becomes the foundation for your captions.

    1. Import your video and auto-transcribe

    Open Descript, create a new project, and drag your video file in. Descript will automatically transcribe the audio. For a 10-minute video, this usually takes under a minute. The transcript appears in the editor as editable text synchronized to the video timeline.

    2. Review and correct the transcript

    Play through the video and read along with the transcript. Fix any words the AI got wrong — technical terms, proper nouns, and acronyms are the most common errors. Descript highlights low-confidence words, which helps you focus your review on the parts most likely to need correction. This is the most important step. Do not skip it.

    3. Style your captions

    In Descript's captions panel, choose a visual style for how the text appears on screen. You can adjust font, size, color, background opacity, and position. For course videos, readability matters more than aesthetics — a clean sans-serif font at a generous size with a semi-transparent dark background is usually the right call.

    4. Choose burned-in or SRT export

    Descript gives you two options. You can export the video with captions permanently embedded in the image (burned-in), or you can export a separate SRT or VTT subtitle file that platforms can display as a toggleable overlay. For course platforms, the SRT route is usually better — it lets students turn captions on or off.

    5. Export and upload

    Export your video (with or without burned-in captions) and your subtitle file. Upload the video to your course platform and attach the SRT file if your platform supports it. If you are uploading to YouTube or Vimeo and embedding in your course, both platforms accept SRT uploads in their subtitle settings.
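
    Step 4 mentions both SRT and VTT. The two formats are close: in the simple case, WebVTT is the same list of timed cues with a WEBVTT header line and a period instead of a comma before the milliseconds. The sketch below shows that conversion on a two-cue sample, in case your platform asks for VTT and you only exported SRT. The function name srt_to_vtt and the sample captions are illustrative assumptions, not anything Descript provides, and the sketch ignores styling tags and other less common SRT features.

```python
import re

# Matches an SRT timestamp such as 00:01:02,500 (comma before the milliseconds).
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert simple SRT subtitle text to WebVTT (hypothetical helper)."""
    lines = srt_text.replace("\r\n", "\n").strip().split("\n")
    converted = []
    for line in lines:
        # Only touch timing lines, so commas inside caption text are left alone.
        if "-->" in line:
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        converted.append(line)
    # WebVTT files start with a WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


# Invented two-cue sample in SRT format: index, timing line, caption text.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:03,200
Welcome to lesson one.

2
00:00:03,200 --> 00:00:07,000
Today we practice ujjayi breath.
"""

print(srt_to_vtt(SAMPLE_SRT))
```

    The sample also shows what an SRT file looks like inside: a cue number, a start and end time, then the caption text, with a blank line between cues.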

    Adding captions with CapCut

    CapCut is a free video editor from ByteDance (the company behind TikTok) that includes built-in auto-captioning. It is simpler than Descript — fewer editing features, but a fast path from video to captioned export.

    1. Import your video and generate auto-captions

    Open CapCut's desktop app, create a new project, and import your video. In the text panel, select "Auto captions" and choose your language. CapCut transcribes the audio and places caption text blocks on the timeline, synced to your speech.

    2. Review and correct

    Click through the caption blocks and fix errors. CapCut's caption editor shows each text segment alongside its timecode, so you can read sequentially and catch mistakes. As with Descript, pay particular attention to specialized vocabulary — the AI was not trained on your field's jargon.

    3. Choose a style and adjust

    CapCut offers preset caption styles ranging from minimal to animated. For course content, pick something clean and readable. You can customize font, size, color, outline, and shadow. Avoid animated word-by-word highlights unless your course specifically benefits from that style — for most educational content, static block captions are easier to read.

    4. Export

    CapCut exports video with captions burned in. If you need a separate SRT file, CapCut does not natively export one — you would need to use a third-party tool or switch to Descript for that. For videos where burned-in captions are fine (social clips, standalone lessons), CapCut handles the job cleanly and for free.

    The human layer

    Auto-transcription gets you 90% of the way, but the last 10% requires your eyes and ears. AI transcription models are trained on general speech — they have no knowledge of your field's terminology, your students' names, the specific frameworks you teach, or the branded language you use in your course.

    A yoga instructor teaching about "ujjayi breath" will see it transcribed as "OG eye breath" or "you jai breath." A therapist discussing "EMDR" might get "EM dear." A business coach who mentions "Ruzuku" will almost certainly see it mangled. These errors are not edge cases — they happen in every course video that uses specialized language.

    The fix is simple: watch your video with the transcript open and correct what the AI got wrong. Budget about 1.5 times the video length for this review. A 10-minute lesson takes roughly 15 minutes to caption, review, and export. It is not glamorous work, but it is the difference between captions that help your students and captions that confuse them.
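
    As a worked example of that rule of thumb, the sketch below applies the 1.5x factor to a few lesson lengths. The lesson lengths are made up for illustration, and the factor is this article's estimate rather than a measured value, so adjust it once you have timed your own first few lessons.

```python
# Rule of thumb from this article: review time is about 1.5x the video length.
REVIEW_FACTOR = 1.5


def review_budget(lesson_minutes, factor=REVIEW_FACTOR):
    """Return the estimated review time in minutes for each lesson."""
    return [minutes * factor for minutes in lesson_minutes]


# Hypothetical lesson lengths in minutes for a three-lesson module.
lessons = [10, 8, 12]
budget = review_budget(lessons)

for length, estimate in zip(lessons, budget):
    print(f"{length}-minute lesson: about {estimate:g} minutes of review")
print(f"Module total: about {sum(budget):g} minutes")
```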

    Course creator tips

    Build a custom vocabulary list

    Before you start captioning, write down every specialized term, proper noun, and acronym that appears in your course. Keep this list open while reviewing transcripts. Descript lets you add custom vocabulary to improve future transcriptions — if you teach a multi-lesson course, the AI gets better as you go. CapCut does not have this feature, so you will need to correct the same terms manually each time.
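
    If your tool has no vocabulary feature, one option is to keep the list as a small lookup table and run it over the exported transcript or SRT text before your manual review. The sketch below is a minimal version of that idea. The correction pairs reuse examples from this article, and the names VOCABULARY_FIXES and apply_vocabulary are hypothetical; neither Descript nor CapCut provides them.

```python
import re

# Hypothetical correction list: mis-transcription -> correct term.
# These entries reuse examples from this article; build your own per course.
VOCABULARY_FIXES = {
    "you jai breath": "ujjayi breath",
    "OG eye breath": "ujjayi breath",
    "EM dear": "EMDR",
    "poly vagal": "polyvagal",
}


def apply_vocabulary(text, fixes):
    """Replace known mis-transcriptions; return the new text and a count per phrase."""
    counts = {}
    for wrong, right in fixes.items():
        # Whole-phrase, case-insensitive match, so "Poly vagal" is caught too.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement inserts the term literally, with no escape handling.
        text, hits = pattern.subn(lambda _m, term=right: term, text)
        if hits:
            counts[wrong] = hits
    return text, counts


raw = "Start with you jai breath, then we cover poly vagal theory and EM dear."
fixed, counts = apply_vocabulary(raw, VOCABULARY_FIXES)
print(fixed)
print(counts)
```

    A pass like this only catches the misspellings you have already seen, and it does not restore capitalization at the start of a sentence, so it shortens the manual review rather than replacing it.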

    Use a readable font at a generous size

    Your students may be watching on phones, tablets, or small laptop screens. Captions that look fine on your 27-inch monitor can be unreadable on a phone. Test your captioned video on the smallest screen your students are likely to use. A minimum of 24-point equivalent with a contrasting background is a good baseline.

    Know the burned-in vs. SRT tradeoff

    Burned-in captions are permanent — if you find a typo after exporting, you have to re-export the entire video. SRT files can be edited in any text editor without touching the video. If your course content is likely to be updated, or if you want to offer captions in multiple languages later, SRT is the more flexible choice. If you are creating short social clips where universal visibility matters more than editability, burned-in is simpler.
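
    Because an SRT file is plain text, a hand edit can also break it, for example by deleting a cue number or typing a period in a timestamp. The sketch below is one way to sanity-check a file after editing. It assumes the common SRT layout (cue number, timing line, caption text, blank line between cues); check_srt is a hypothetical helper, and some players are more lenient than this check.

```python
import re

# An SRT timing line looks like: 00:00:03,200 --> 00:00:07,000
TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert timestamp parts to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def check_srt(srt_text):
    """Return a list of problems worth a second look (empty list if none found)."""
    problems = []
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    previous_end = 0
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if lines[0].strip() != str(expected):
            problems.append(f"cue {expected}: expected index {expected}, found {lines[0]!r}")
        match = TIMING.match(lines[1].strip()) if len(lines) > 1 else None
        if not match:
            problems.append(f"cue {expected}: missing or malformed timing line")
            continue
        parts = match.groups()
        start, end = to_ms(*parts[:4]), to_ms(*parts[4:])
        if start >= end:
            problems.append(f"cue {expected}: start is not before end")
        if start < previous_end:
            problems.append(f"cue {expected}: overlaps the previous cue")
        previous_end = end
        if not any(line.strip() for line in lines[2:]):
            problems.append(f"cue {expected}: no caption text")
    return problems


# Invented sample; the second cue has a period where SRT expects a comma.
edited = """1
00:00:00,000 --> 00:00:03,200
Welcome to lesson one.

2
00:00:03.200 --> 00:00:07,000
Today we practice ujjayi breath.
"""
print(check_srt(edited))
```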

    What it gets wrong

    Both Descript and CapCut struggle with the same categories of speech. Knowing what to watch for makes your review faster.

    • Technical terms and jargon: any word outside common English vocabulary will likely be wrong on the first pass. "Polyvagal" becomes "poly vagal" or "polyva goal." The yoga term "asana" becomes "Asana" (the project management tool) or just "a sauna."
    • Proper nouns and names: student names, book titles, researcher names, and brand names are frequently garbled. "Vygotsky" becomes "vuh GOT ski." Your own name may not survive intact.
    • Acronyms: "ADHD" usually works. "EMDR," "CBT," "IFS," and niche acronyms often do not. The AI may try to spell them as words instead of letters.
    • Fast speech and overlapping audio: if you tend to speak quickly or if there is background noise, transcription accuracy drops noticeably. Recording in a quiet space at a moderate pace improves both the AI's accuracy and your students' comprehension.
    • Accented English: transcription models still perform better on American and British English than on other accents. If you or your guest speakers have accents that the model handles poorly, plan for extra review time.

    Frequently asked questions

    Should I burn captions into the video or use a separate SRT file?

    It depends on where your video lives. Burned-in captions appear on every platform and device without extra setup, which makes them reliable for social media and downloaded files. SRT files give students the option to toggle captions on or off and let you update text without re-exporting. For course lessons, SRT is usually the better choice. For promotional clips, burned-in is more practical.

    How accurate is AI auto-transcription for course videos?

    Expect roughly 90-95% accuracy on clear, single-speaker English audio. Accuracy drops with technical jargon, proper nouns, fast speech, and background noise. Always review the transcript before publishing: on top of the time spent watching, a 10-minute video typically needs about 3-5 minutes of corrections.
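
    To see where your own videos fall relative to that range, you can compare the raw AI transcript with your corrected version. A common measure is word error rate: the number of word substitutions, insertions, and deletions needed to turn one text into the other, divided by the length of the corrected text. The sketch below is a minimal version. The helper word_error_rate is hypothetical, the sample sentences are invented, and transcription vendors may define accuracy differently.

```python
def word_error_rate(corrected, raw):
    """Word-level edit distance between two transcripts, divided by the corrected length."""
    ref = corrected.lower().split()
    hyp = raw.lower().split()
    if not ref:
        raise ValueError("corrected transcript is empty")
    # previous[j] is the edit distance between the reference words processed
    # so far and the first j words of the raw transcript.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


# Invented example: one specialized term was split into two wrong words.
corrected = "today we practice ujjayi breath for ten slow rounds"
raw = "today we practice you jai breath for ten slow rounds"
rate = word_error_rate(corrected, raw)
print(f"Word error rate: {rate:.0%}, accuracy: {1 - rate:.0%}")
```

    A single mangled term counts as two errors here (one substitution and one insertion), which is why a short sentence full of jargon scores far worse than a full lesson of mostly ordinary speech.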

    Do I need to pay for Descript or CapCut to add captions?

    CapCut's desktop app includes auto-captions for free. Descript's free plan gives you one hour of transcription per month. For a full course, Descript's Hobbyist plan at $24/month adds unlimited transcription. If cost is the deciding factor, start with CapCut.

    Your captioned videos need a home

    Once your videos have clean, accurate captions, the next question is where students will actually watch them. Ruzuku includes built-in video hosting, so you upload your captioned files directly into lessons without managing a separate Vimeo or Wistia account. Your videos live alongside discussion prompts, exercises, and supporting materials — everything in one place.

    If you are just getting started, our step-by-step guide walks you through building your first course from scratch, including how to structure video lessons for maximum impact.
