Data Scientist for AI VOICE ID, 2+ years of Multimedia AI/ML
Employee • New York, United States • Remote
We are looking for an experienced full-time Data Scientist (Voice/Search: STT, TTS, STS) with over 2 years of AI/ML experience. The role involves working on an AI voice-based B2C Media Wallet app with VoiceID, which provides secure and convenient access to games and entertainment and enables interactions with brands, influencers, content, and payments. Our goal at Babylon Voice AI is to empower individuals and corporations to shape the future of voice-driven experiences while ensuring digital identity protection in a dynamic digital world. We work with data from a wide variety of sources, including video, audio, news, and tweets. The team is advised by several US professors who provide mentorship and have expertise in ML/DS, stochastic control, conversational AI, and voice/dialog systems. Your contributions will drive content discovery and personalization through voice/video interactions across apps, devices (e.g., Alexa, Google Home), and automotive products.
Babylon Voice, a cutting-edge technology company, is at the forefront of revolutionizing the digital identity landscape through AI Voice ID. Our top-10 NYC startup team has earned awards, grants, and recognition in hackathons from Amazon, Spotify, Google, OpenAI (Microsoft), Deloitte, Polygon, and Unstoppable Domains. Our team comprises exceptional AI scientists, cryptography developers, and CS specialists and Ph.D.s from Stanford, MIT, Google, Discord, World of Tanks, and Telegram. Our technology has made its mark on Minecraft, Fortnite, Bloomberg, Disney, Republic, JP Morgan, and Roblox. Guiding our journey is a former partner from Andreessen Horowitz (a16z), who has joined as a founder, having previously sold their gaming startup to Disney. Leading our charge is our CEO, a female founder with a Ph.D. in Mathematics, a background as an MTV host, and a startup sold to Sony Pictures.
Babylon Voice is on a mission to redefine the concept of digital identity using advanced AI technologies. We employ cutting-edge ML/AI technologies such as STT, TTS, STS, NLP, computer vision and pattern recognition, style transfer using CycleGAN and Recycle-GAN, summary search, speech recognition, language translation, and synthetic media generation. Our target markets include B2B, B2C, Entertainment, Customer Service, Advertising, Compliance, Security, and Privacy for non-pornographic deepfakes.
Our vision centers on the convergence of AI and deep tech, welcoming the next billion digital users to an era where AI superpowers augment human intelligence. At the core of our innovation is "VoicePrint," our AI digital identity standard, which combines biometric security with synthetic voice capabilities. Babylon Voice envisions earning royalties for each authenticated human voice, a testament to our transformative impact.
Why Join Us:
- We are a fully distributed team with a New York HQ, offering flexible work and schedules
- You will have the opportunity to turn bleeding-edge research into commercial products, focusing on Voice AI, Digital ID, and Digital IP based on multimedia
- We support a growth mindset, encourage paper publications, and offer mentorship and internships with top researchers
- We welcome candidates internationally and foster a no-micromanagement environment for highly self-sufficient individuals
- Our tech stack includes PyTorch wrapped in Flask and running in a Kubernetes cluster on AWS, plus a range of great libraries and frameworks such as React, NLTK, PyBrain, NumPy, SciPy, Pandas, Keras, Airflow, Docker, FastAPI, Flutter, Node.js, and TypeScript. We leverage 48+ AI/ML networks, including DALL-E and Stable Diffusion AI technology, for 3D avatars in Unity and Unreal Engine 5
Responsibilities and what we are looking for:
- Develop cutting-edge ML for Speech to Text (STT), Text to Speech (TTS), Speech to Speech (STS), real-time speech synthesis (voice cloning, deepfake), and AI discrimination in voice. Convert written text to natural-sounding speech using the latest neural speech synthesis techniques. Design criteria for voice performance evaluation
- Research, design, experiment with, and build ML systems, particularly related to voice products
- Prototype new features. This means rapidly building prototypes end-to-end, including storage, business logic, and user experience
- Conduct R&D in voice recognition. Synthesize voice across languages and in a variety of voices for all supported dialects. Adapt and customize voices for vocabulary and tone, including for medical conditions such as Muscle Tension Dysphonia
- Assemble prototypes and MVPs. Compress models and optimize inference. Define measures of success for podcast-related initiatives. Build dashboards and self-service tools to enable ongoing monitoring of trends. Develop a deep appreciation of the podcast content landscape and how users engage with podcasts, rooms, videos, etc.
Initial work can be done remotely, with daily Zoom standups with the full team and in-person meetings. Preferably, you would be located near and work in our New York, NY office
- Advanced STEM degree: M.S. or PhD with extensive relevant AI experience (Computer Science, Math, Economics, Engineering)
- Extensive experience applying ML/AI methodologies, building data pipelines, and performing exploratory data analysis. Comfortable navigating large datasets (advanced SQL). Work with product teams on product experimentation: find ideal testing opportunities and measure A/B test results
- Experience with ML libraries and frameworks (e.g., PyTorch, Keras, Vowpal Wabbit, scikit-learn)
- Familiarity with tools such as Python, R, Julia, or MATLAB
- Familiarity with AWS or another cloud infrastructure provider (GCP, Azure, etc.). Technologies: Kafka, Airflow, Composer
- Production experience implementing machine learning pipelines and models at scale in Python, Java, Scala, or similar languages
- Proficiency with distributed processing and warehousing frameworks (e.g., Spark, Hadoop, Hive, Tez)
- Experience with the research and development workflow/life-cycle for large-scale batch and streaming ML
- Excellent written and verbal communication skills, and the ability to collaborate effectively with non-technical team members and stakeholders
- Self-motivated, growth-oriented, and driven to pursue solutions to challenging problems. Excellent problem-solving skills
- A big plus: deeply curious and interested in how people interact with content, and podcasts specifically. Though not required, previous experience in media, entertainment, or technology is a plus. You can be located anywhere
Please let Babylon Voice (Manan AI Inc), New York, know that you found this job role on CryptoJobs.gg