Powered by Meta's SAM Audio

Remove Any Sound.
Just Describe It.

The ultimate AI audio separation tool, built on SAM Audio technology. Describe, click, or select to isolate sounds with surgical precision.

Used by professionals on

FL Studio, Premiere Pro, Ableton Live, DaVinci Resolve, Pro Tools, Final Cut Pro

Powered by SAM Audio

SAM Audio separates target and residual sounds from any audio or audiovisual source—across general sound, music, and speech.

Text Prompts

Describe any sound in plain English. Type "remove drums" or "isolate vocals" and let SAM Audio handle the rest.

Visual Prompts

Click any object or person in your video. SAM Audio extracts their audio automatically—no description needed.

Span Prompts

Select a moment where your target sound plays. SAM Audio learns and tracks that sound throughout the entire file.

Multi-Modal Prompts

Combine text, visual, and time-based prompts for surgical precision on the most complex audio mixtures.

Workflow Redefined

From raw recording to pristine audio in three steps.

1. Upload Media

Drag and drop any video or audio file. We support all major formats.

2. Describe & Isolate

Type a prompt such as "Remove the air conditioner hum" and SAM Audio separates that sound from the rest of the mix.

3. Export Clean

Download your separated stems or the cleaned master track. WAV and FLAC export starts with the Creator plan; the free Hobbyist plan exports MP3.

Built for Creators

From bedroom producers to Hollywood studios, AudioSam adapts to your workflow. Professional-grade audio separation for every creative field.

Music Producers
Vocal removal, Stem separation, Sample extraction

Stem separation & remix

Extract vocals, drums, bass, and instruments from any track. Create remixes, sample packs, or isolate elements for your productions.

Podcasters
Noise removal, Voice isolation, Audio cleanup

Crystal-clear dialogue

Remove background noise, isolate guest audio, and clean up recordings. Deliver professional-quality episodes every time.

Video Editors
Dialogue extraction, Sound design, ADR prep

Perfect audio post

Separate dialogue from ambient sound, remove unwanted audio, and create clean audio beds for your video projects.

Filmmakers
Location audio, Foley separation, Mix prep

Production & post audio

Isolate on-set dialogue, remove location noise, and prepare stems for scoring. Professional audio separation for cinema.

AI Audio Separation in Action

Watch how AudioSam isolates vocals, removes background noise, and extracts stems from any audio file. Select a demo below.

Complete audio separation—isolate vocals, instruments, and effects from any mixed track in seconds.

Natural Language Processing

Describe any sound in plain English. Request "drums", "vocals", or "background noise" and isolate instantly.

Video-Aware Isolation

Click any person or object in video to extract their audio. Visual and audio separation in one click.

Temporal Tracking

Mark a sound once, track it everywhere. SAM Audio follows sounds as they move and change.

Multi-Modal Control

Combine text, visual, and time-based prompts for surgical precision on complex audio mixtures.

Real-Time Processing

Faster than real time, with a real-time factor (RTF) of 0.7: processing takes roughly 0.7 seconds for every second of audio. Scalable cloud infrastructure handles recordings that run for hours.
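
As a rough illustration of what an RTF of 0.7 implies, the sketch below multiplies audio duration by the real-time factor. It is a single-stream estimate only: the function name is hypothetical, and actual turnaround on the service may differ with queueing, file complexity, and any parallel processing.

```python
def estimate_processing_seconds(audio_seconds: float, rtf: float = 0.7) -> float:
    """Estimate compute time as RTF x audio duration.

    Assumption: one sequential stream with no queueing or parallelism.
    The 0.7 default is the RTF figure quoted on this page.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds * rtf


if __name__ == "__main__":
    # Compare a short clip, a typical song, and a one-hour recording.
    for label, seconds in [("1 min", 60), ("5 min", 300), ("1 hour", 3600)]:
        estimate = estimate_processing_seconds(seconds)
        print(f"{label} of audio: about {estimate / 60:.1f} min of processing")
```

An RTF below 1.0 means the model finishes before the audio would have finished playing; an RTF above 1.0 would mean it falls behind.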

Studio-Grade Quality

Transformer-based AI trained on millions of hours of audio. Best-in-class separation for music, speech, and sound.

Simple Pricing

Flexible credit system. Only pay for what you clean.

Hobbyist

$0/mo
  • 10 Free Credits / mo
  • MP3 Export
  • Text Prompts only

Creator

$19/mo
  • 500 Credits / mo
  • WAV & FLAC Export
  • Text, Visual & Span Prompts

Studio

$49/mo
  • Unlimited Credits
  • API Access
  • Batch Processing

Frequently Asked Questions

Everything you need to know about AudioSam and SAM Audio technology.

How accurate is SAM Audio for sound separation?
SAM Audio uses state-of-the-art transformer models trained on millions of hours of audio. It achieves industry-leading separation quality for vocals, instruments, speech, and environmental sounds. While no AI tool is 100% perfect for every scenario, AudioSam consistently delivers professional-grade results that rival dedicated hardware solutions.
What audio and video formats does AudioSam support?
AudioSam supports all major audio formats including MP3, WAV, FLAC, AAC, OGG, and M4A. For video files, we support MP4, MOV, AVI, MKV, and WebM. The maximum file size depends on your subscription tier, with Studio users enjoying unlimited file sizes and batch processing capabilities.
What is the difference between text, visual, and span prompts?
Text prompts let you describe sounds in natural language (e.g., "remove background traffic noise"). Visual prompts allow you to click on objects in video frames to isolate their associated sounds. Span prompts let you select specific time ranges where the target sound occurs, helping the AI identify and separate it throughout the entire file.
Is my uploaded audio and video data secure?
Absolutely. All files are encrypted with AES-256 during upload and processing. We automatically delete your original and processed files from our servers within 24 hours. Enterprise and Studio users can request immediate deletion or configure custom retention policies. We never use your content to train our models without explicit consent.
Can AudioSam remove vocals from music for karaoke or remixes?
Yes! AudioSam excels at vocal isolation and removal. You can extract clean vocals for remixes, create instrumental versions for karaoke, or separate individual instruments from mixed tracks. Simply use a text prompt like "isolate vocals" or "remove singing voice" to get started. The quality rivals dedicated stem separation tools.
How does the credit system work?
Credits are consumed based on the duration of your audio or video file. One credit equals approximately one minute of processed audio. Complex separations (like isolating multiple sounds simultaneously) may use slightly more credits. Free users receive 10 credits monthly, Creator plans include 500 credits, and Studio plans offer unlimited processing.
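
The credit arithmetic above can be sketched as a small estimator. This is an illustration under stated assumptions, not AudioSam's billing logic: the function name, the round-up rule, and the `complexity_multiplier` parameter are hypothetical, since the exact surcharge for complex separations is not published here.

```python
import math

# From the FAQ: one credit is approximately one minute of processed audio.
MINUTES_PER_CREDIT = 1.0

# Monthly allowances from the pricing table; None stands for "unlimited".
PLAN_CREDITS = {"Hobbyist": 10, "Creator": 500, "Studio": None}


def estimate_credits(duration_seconds: float, complexity_multiplier: float = 1.0) -> int:
    """Estimate the credits a file would consume.

    Assumptions (hypothetical): partial minutes round up to a whole credit,
    and complex separations scale the cost by `complexity_multiplier` >= 1.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    if complexity_multiplier < 1.0:
        raise ValueError("complexity_multiplier must be at least 1.0")
    minutes = duration_seconds / 60.0
    return math.ceil(minutes * complexity_multiplier / MINUTES_PER_CREDIT)


def fits_plan(plan: str, credits_needed: int) -> bool:
    """Return True if one month's allowance on `plan` covers `credits_needed`."""
    allowance = PLAN_CREDITS[plan]
    return allowance is None or credits_needed <= allowance


if __name__ == "__main__":
    needed = estimate_credits(45 * 60)  # a 45-minute podcast episode
    for plan in PLAN_CREDITS:
        print(plan, "covers it" if fits_plan(plan, needed) else "does not cover it")
```

Under these assumptions, a 45-minute episode needs more than the free tier's monthly allowance but fits comfortably within the Creator plan.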
What is SAM Audio and how is it different from other AI audio tools?
SAM Audio (Segment Anything Model for Audio) is Meta's groundbreaking AI technology that brings the versatility of image segmentation to audio. Unlike traditional tools that only separate predefined categories (vocals, drums, bass), SAM Audio can isolate any sound you describe. It's the first model to support multi-modal prompting—combining text descriptions, visual cues from video, and temporal selection.
Can I use AudioSam for podcast editing and noise removal?
AudioSam is perfect for podcast production. Remove background noise like air conditioning hum, traffic sounds, or keyboard clicks while preserving crystal-clear speech. You can also isolate individual speakers from group recordings, remove "ums" and "ahs", or extract specific segments. Many professional podcasters use AudioSam as part of their standard workflow.
Is there an API available for developers?
Yes! Studio plan subscribers get full API access with comprehensive documentation, SDKs for Python, JavaScript, and other popular languages, and webhook support for async processing. The API supports all three prompting modes and includes batch processing endpoints for high-volume workflows. Rate limits and concurrent processing capabilities scale with your needs.
How long does audio processing take?
Processing time depends on file length and complexity. Most audio files under 5 minutes are processed in under 30 seconds. Longer files or complex multi-sound separations may take 1-2 minutes. Studio users benefit from priority processing queues, ensuring faster turnaround even during peak usage times. You'll receive a notification when your file is ready.