Understanding AI in Game Audio Roundtable
1. Working Group Introduction & Wiki Resource
Alex (Moderator):
The work product of the AI Special Interest Group is a wiki — a crowdsourced wiki that we've been contributing to. If you find one of those plastic cards on your desk, you can scan the QR code to look at the wiki. What we've been doing with the wiki is running various different tools and writing down our experiences with them. For instance, I have used a number of different voiceover tools, and I have experience with generative music tools, and I'm sharing those experiences. What we want is for you to join the wiki and contribute your experiences, so that audio developers can go there and learn about what other people have experienced using these AI tools.
One last comment: since we started this in the fall, a lot of the tools have changed. The landscape is changing very, very quickly; some of the voiceover tools that I used in the fall are already out of business. A lot of things will keep changing, but we still want people to contribute and share their experiences so that other audio developers can learn from them.
We did a quick survey about interests. Hands for voiceover: about twenty people. Hands for generative music: a good number. Hands for sound effect design: quite a few. Hands for vibe coding or debugging: also substantial. It's split pretty evenly between the various areas. So maybe what we could do is spend a little time on each of these areas, starting with voiceover.
2. Voiceover & AI: Legal Framework and Industry Practice
Bo Cruncher (Blindlight):
My name is Bo Cruncher. I run a company called Blindlight. For those not familiar with what Blindlight is, it's a SAG-AFTRA interactive signatory — we've been doing casting and production for video games. Some games we've done go way back, like Halo and Skyrim; more recently Destiny 2 and Fallout 76. I was on — and still am on — the Video Game Producers Committee, which was the industry side of the negotiations with SAG-AFTRA in the most recent, unfortunately difficult, strike and negotiation situation. I was part of the crew that helped draft the rules for digital replicas.
I am obviously very interested in voiceover tools. I also, in another life, was an actor — if you've ever seen a show called The Suite Life of Zack and Cody, I played Skippy on that show. I also wrote for a show called Trip Tank. So I came from the film and TV world.
When working with these tools, I have a vested interest, and not just because my business is human-based: I have a real interest in quality, because the performance really, really matters. So I want to make sure that people are following the digital replica rules set forth by the union, or, if they're doing non-union work, that they're aware of the state and federal laws that may apply and, depending on the country in Europe, the laws there. AFTRA has laws similar to SAG-AFTRA's.
My main interest in this subject is just making sure people stay informed. I gave a talk yesterday called "Actionable Intelligence: What Game Devs Need to Know About the SAG-AFTRA Interactive Contract and the Tools for AI" — you can check that out if you want.
I think the biggest risk in general with AI is that it encourages people to try and be overzealous about what some would like to call "efficiencies" — to cut corners and not really understand the impact of what they're doing. They may also be motivated to be overly charitable about the quality of certain AI output. Someone might think "oh, this is so cheap, is it really that bad, do players really care?" — but there's that quote about audio: it's half of everything you experience. So it does matter. Unfortunately, audio is often only 10% of the budget.
Working Group Structure:
The working group runs on whatever anyone is able to contribute, whenever they can and in whatever amount they feel comfortable with. We have a weekly meeting, proposed by Pat, who said "we're not going to get anything done unless we meet at least once a week." We have a Trello board with all our tasks (in progress, to be in progress, and completed), so we treat it like a production. We also have a Discord, so if you're unable to attend the weekly meeting, you can get on Discord and write your thoughts, and you can mark things as off the record. We have a mission statement with a code of conduct, and we want to make sure the environment allows people to express their concerns in a professional way.
Also: the IGDA does not own anything. We are only a place where information is aggregated and put together. What you do with it is totally up to you. We do not own your creative work, we do not own your ideas — you own them. But you're contributing them to the community at large.
Being informed matters because these rules and laws are going to stack up over time. It's not a good idea to just play catch-up or to rely only on in-house or outside legal counsel. The SAG-AFTRA Interactive contract is extremely niche, and lawyers may not be aware of all the developments. If they're not working at the practical production level, like actually contracting actors or integrating implementations into your engine, they may not know everything you need to know. So you want two sources: always talk to a lawyer, but also talk to someone with practical production experience who understands these tools, the union rules, and state and federal laws at the production level.
Dave Hummel (Composer, Sound Designer, Voiceover Director):
I have a fairly large client and we use AI voiceover right now for demos, because changes happen all the time. We use it to demo things out, but we're always going to a real person in the end. Sometimes we create an AI version of a voice we like and use it a lot — we'll use it to demo, then say "hey, we can get that real voice" and give the actor a live session. The speed of changes is something I see affecting the future — people love options and they love having many versions quickly, and that kicks things back to the talent for pickups. But sometimes, once in a while, you can't get them — they're on vacation, or you caught them right before they left town — and you have to use AI to fudge a line or two.
As an artist, I don't want to be replaced. As a composer, I don't use generative music — I'll write a line and maybe the articulation will be AI-generated, but these people are friends. I have a relationship with them, and they need to put food on the table. My concern is that right now AI isn't as easy to coach as a real session, but it's getting smarter every day. I could now theoretically do everything myself — but I choose not to, and I think that should be the theme of what we're concerned about.
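Dave's workflow treats AI voices as temporary demo material that is replaced by a live session, apart from the occasional line that has to be patched when the actor is unavailable. One way to keep that promise checkable is to track where each line's audio currently comes from and flag a build that still contains placeholders. This is a minimal sketch under assumed conditions: the three source labels, the `DialogueLine` record, and the `ship_blockers` check are hypothetical and not part of any tool mentioned here, and whether an AI-patched line is acceptable still depends on the actor's consent and the applicable digital replica rules.

```python
from dataclasses import dataclass
from enum import Enum


class VOSource(Enum):
    """Where a dialogue line's current audio comes from (hypothetical labels)."""
    AI_TEMP = "ai_temp"          # synthetic placeholder used for demos and iteration
    AI_APPROVED = "ai_approved"  # synthetic patch the actor has approved (e.g. one missed pickup)
    RECORDED = "recorded"        # captured in a live session


@dataclass
class DialogueLine:
    line_id: str
    character: str
    source: VOSource


def ship_blockers(lines: list[DialogueLine]) -> list[str]:
    """IDs of lines that still use unapproved AI placeholder audio."""
    return [line.line_id for line in lines if line.source is VOSource.AI_TEMP]


def source_report(lines: list[DialogueLine]) -> dict[str, int]:
    """Count lines per source so producers can see how many still need a session."""
    counts = {source.value: 0 for source in VOSource}
    for line in lines:
        counts[line.source.value] += 1
    return counts


if __name__ == "__main__":
    script = [
        DialogueLine("VO_001", "Captain", VOSource.RECORDED),
        DialogueLine("VO_002", "Captain", VOSource.AI_TEMP),
        DialogueLine("VO_003", "Scout", VOSource.AI_APPROVED),
    ]
    print(ship_blockers(script))   # ['VO_002']
    print(source_report(script))   # {'ai_temp': 1, 'ai_approved': 1, 'recorded': 1}
```

Keeping "approved AI patch" separate from "temporary AI" makes the exception Dave describes visible in reports instead of hiding it among finished lines.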
Matthew Abbove (Senior Director of Music and Sound, Side):
My name is Matthew Abbove, from Montreal. I'm Senior Director of Music and Sound at Side. Some of you might have known me when I was at Behaviour Interactive, which was acquired by Side a little over two years ago. Side is very well known for voice work, so that's how I got even more interested in voiceover these past few years. I'm a composer, and I've seen a lot of things, especially from the perspective of Canada and Montreal where we do both English and French language voice recordings — so localization is a big part of what I deal with.
Over the past year I've seen a lot of unequal agreements across the world. Some countries are accepting AI terms while others aren't — some countries are still resisting. In France right now, for example, there is still a fight going on. The SAG-AFTRA deal solved some things and clarified some things, but this is still not resolved on a worldwide scale.
I'm very scared of AI, which is why I raised my hand for all of those questions. But if you're attending talks on AI today that aren't specific to game audio, you've got to notice that games are incorporating more and more AI-generated content. If there's AI-generated content in a game, you cannot expect audio, voice, and music to be the only parts that remain entirely human, even though that's literally the only way to maintain differentiation. So what's the solution?
One approach, if I look back in history, is the star system. In Hollywood, actors' names are famous: you like them, you follow them. In video games, that's less common. I think the game industry should push actor visibility more. When we put a human voice actor in a game, not just in the original language but in localized versions as well, games should try to promote that. I think that's a way to make consumers want to see humans in games. Players connect with characters much better when the actor is involved in the game's marketing, social media, and so on.
Voice cloning is still inevitable in certain cases — if an actor becomes sick, or if there's more content than an actor can deliver in the available time. That's why those terms are accepted. But you need to push actor visibility, especially for localized languages, because there's so much secrecy around game development that you will rarely know who did the localized version. These are still real recordings that people play and listen to, and we should find ways to recognize those performers more.
Thomas Neumann (Audio Programmer, CPR):
I just want to offer a slightly counterbalancing perspective. There is a discrepancy between composers or voice actors, who get royalties, and other team members, who never get royalties. Breaking open the kind of teamwork that is required to really ship a game, and then making someone stand out even more, might also be quite toxic towards everyone else on the team. So I'm just offering a counter lens on that point.
3. AI Sound Effect Generation
Michael Turner (Technical Sound Designer):
Hi, my name is Michael Turner. I'm a technical sound designer, and I'm concerned about AI sound effects, particularly around training data. Most of the big models out there right now are purposefully cloud-based, and the business model is to build the data center, harvest the data, and then use that data to further improve the model. This has led to models regurgitating public domain text and works word for word.
My concern is what happens when we feed our own audio into AI audio plugins. I'm not against AI audio plugins; there are great open-source ones that do great things locally. But the moment you move into a cloud ecosystem and send your data to the cloud, what's to say they're not just keeping your sound effect and using it for training? There was some controversy over a plugin whose terms said "we can use your audio input to train a model." What would stop them from then simply prompting the model to remake a sound effect you had already made?
Another concern is proprietary information, and not just audio: if you're feeding a proprietary engine code base into an AI coding tool, what's to stop someone from getting the model to regurgitate it? The scenario where you gathered your own field recordings over years, and now some other project could benefit from them, is one of the biggest discussion points in our regular working group meetings. There are datasets described as "fairly trained," but you're hitting the nail on the head: what happens to the stuff I submit, and what happens to the stuff I use?
Scott Williams (Game Audio Educator, IGDA Working Group):
I'm Scott Williams, involved in game audio education for a long time and also part of the IGDA AI Working Group. I've done experiments with ElevenLabs and also with open-source models running locally. That's something that hasn't been brought up much: almost everybody has been talking about AI services with subscription fees, but it is possible — given that you have a reasonably powerful computer — to run a lot of those models completely locally. They don't take enormous resources comparatively speaking.
From a data provenance standpoint, open-source local models may create issues, because open-source projects generally don't verify that everything in their training dataset was fairly sourced. But the tool I've mainly been working with is ElevenLabs. According to most of the information I've seen, its model is fairly trained and its output is considered royalty-free, so there are no copyright issues associated with it. I'm not sure whether it has a cloning capability; it didn't when I was testing it, but that's certainly a possibility and something that will need to be looked at very carefully.
As far as capability at this point: it's reasonably decent at single sound effects, like a laser blast or footsteps. It does depend heavily on how good your prompting is. It is pretty horrible at ambiences — it does not understand complex scene descriptions. I tried to give it a prompt for an 1880s Victorian carnival with a barker and a calliope in the background, and it had no idea what I was talking about. I got dog barks instead.
On variations: it automatically gives you four variations of everything. That said, you'll probably have to generate about four times, so you're talking about sixteen generations to get maybe three reasonable variants. They're not really graded variants; they're basically all just different seeds. You can, however, lock the seed, which gives you similar output with some variance.
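Scott's description of batches of four, seed-driven variants, and seed locking can be modeled with a toy generator. This sketch is hypothetical throughout: `generate_take` stands in for a real model call and returns a short list of numbers instead of audio, and the "variation seed" blend is only one simple way to get outputs that stay close to a locked seed; real tools may implement variation differently. The last function works through his arithmetic: roughly three keepers per sixteen takes means about four rounds of four.

```python
import random
from fractions import Fraction
from math import ceil


def generate_take(prompt: str, seed: int, variation_seed: int = 0,
                  variation_strength: float = 0.0, length: int = 8) -> list[float]:
    """Toy stand-in for one text-to-SFX generation (hypothetical, not a real API).

    `seed` fixes the base noise. `variation_seed` mixes in a second noise source,
    scaled by `variation_strength` (0.0 reproduces the base take exactly).
    """
    base_rng = random.Random(f"{prompt}|{seed}")
    var_rng = random.Random(f"{prompt}|{seed}|{variation_seed}")
    s = variation_strength
    return [(1 - s) * base_rng.uniform(-1, 1) + s * var_rng.uniform(-1, 1)
            for _ in range(length)]


def generate_batch(prompt: str, seed: int, batch_size: int = 4) -> list[list[float]]:
    """Unlocked behaviour: every take in the batch gets its own seed."""
    return [generate_take(prompt, seed * batch_size + i) for i in range(batch_size)]


def generations_needed(usable_wanted: int, usable_rate: Fraction, batch_size: int = 4) -> int:
    """Takes needed to expect `usable_wanted` keepers, rounded up to whole batches."""
    raw = ceil(usable_wanted / usable_rate)
    return ceil(raw / batch_size) * batch_size


if __name__ == "__main__":
    prompt = "single laser blast, sci-fi"
    # Same prompt and locked seed with no variation: the take is reproducible.
    assert generate_take(prompt, seed=7) == generate_take(prompt, seed=7)
    # Locked seed plus a small variation: each value moves by at most 2 * strength.
    a = generate_take(prompt, seed=7, variation_seed=1, variation_strength=0.1)
    b = generate_take(prompt, seed=7, variation_seed=2, variation_strength=0.1)
    print(max(abs(x - y) for x, y in zip(a, b)))
    # About 3 keepers per 16 takes: four rounds of four.
    print(generations_needed(3, Fraction(3, 16)))  # 16
```

The point of the toy is the distinction Scott draws: unlocked seeds give unrelated takes, while a locked seed keeps takes anchored to one result and lets you control how far they drift.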
Danny (Senior Audio Designer):
Something I'm really interested in is: is there a way we can safely use AI and sound generation to make our enormous and stressful workload better? On big games especially, there are times where we use assets that aren't precious — like if I have a bunch of edited tree sounds from one game, I'm going to take them and put them in the next game if they serve the player. I'm curious whether AI can help make those kinds of functional assets faster and better — not just generating variations of a single sound, but the actual implementation work. Can we start using it for things we kind of don't care about that much, or that we only really want to do once? I don't know the answer to that question, but is there a world where it helps us get more sleep at night rather than just destroying what we make?
Scott Williams (Response):
So this is about asset generation, as well as asset triggering and implementation. And yes, there's a lot of churn and grunt work: making sure 9,000 things are connected, or getting 10,000 different ambient sounds hooked up to 10 different things. We make progress on this with procedural systems and scripts, and that's definitely a place where AI could help. But how do we do it safely, so that our work isn't churned out everywhere and all games don't end up sounding the same, the way all food from the same supplier tastes the same?
This is slightly off-topic, but AI coding agents have become a huge problem for Godot. Because Godot is an open-source game engine, anyone can propose pull requests to add features, and people are coding all of these extra features with AI agents, which creates a real problem for maintainers trying to evaluate whether the code is valid.
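As an example of the script-driven hookup Scott mentions, here is a minimal Python sketch that groups ambience files into one random container per zone and reports zones that have a trigger but no assets yet. The `amb_<zone>_<variant>.wav` naming convention and the container structure are assumptions for illustration; a real project would map the result onto its own middleware or engine objects.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: amb_<zone>_<two-digit variant>.wav
AMBIENCE_PATTERN = re.compile(r"^amb_(?P<zone>[a-z0-9]+)_(?P<variant>\d{2})\.wav$")


def build_containers(filenames: list[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Group ambience files into one random container per zone.

    Returns (containers, unmatched) so misnamed files are reported, not silently dropped.
    """
    containers: dict[str, list[str]] = defaultdict(list)
    unmatched: list[str] = []
    for name in sorted(filenames):
        match = AMBIENCE_PATTERN.match(name)
        if match:
            containers[match["zone"]].append(name)
        else:
            unmatched.append(name)
    return dict(containers), unmatched


def missing_zones(containers: dict[str, list[str]], trigger_zones: list[str]) -> list[str]:
    """Zones that have a trigger in the level but no ambience assets yet."""
    return [zone for zone in trigger_zones if zone not in containers]


if __name__ == "__main__":
    files = ["amb_forest_01.wav", "amb_forest_02.wav", "amb_cave_01.wav", "Amb-Swamp-1.wav"]
    containers, unmatched = build_containers(files)
    print(containers)  # {'cave': ['amb_cave_01.wav'], 'forest': ['amb_forest_01.wav', 'amb_forest_02.wav']}
    print(unmatched)   # ['Amb-Swamp-1.wav']
    print(missing_zones(containers, ["forest", "cave", "swamp"]))  # ['swamp']
```

A deterministic script like this also answers part of the "safely" question: it only reorganizes the team's own assets and never sends them anywhere.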
I have used AI for vibe coding in classes at the San Francisco Conservatory, where students created sound behaviors in Unity using AI. One student was really smart — they actually took existing scripts, dumped them into the AI, and said "make me a system that extends this," and it worked pretty well. With great power comes great responsibility.
4. Generative Music
Nathan Greg (Seattle-based Composer):
I'm Nathan Greg, a Seattle-based composer. Where are my tools? Suno is not my tool. I want a limited dataset with my material only. I want to try to figure out how to scale to the adaptive needs of games. I want to write my own music for different modes of play, and I don't want to short-change my transition logic just because I don't have time to write 5,000 transitions. Where are the tools that do this? Where are the tools for professionals?
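The "5,000 transitions" problem comes from the size of a fully authored transition matrix: with N music states there are N × (N − 1) directed state-to-state transitions, so about 71 states already means 71 × 70 = 4,970. A common compromise is to hand-author the transitions that matter and fall back to a rule for the rest. This sketch assumes hypothetical state names, sync labels, and stinger assets; it does not describe any specific middleware.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class Transition:
    sync: str                      # when to switch, e.g. "next_bar" or "next_beat" (hypothetical labels)
    fade_ms: int                   # crossfade length in milliseconds
    stinger: Optional[str] = None  # optional hand-authored bridge asset


def transition_count(num_states: int) -> int:
    """Directed transitions a fully hand-authored matrix would need: N * (N - 1)."""
    return num_states * (num_states - 1)


def resolve(src: str, dst: str, authored: dict[tuple[str, str], Transition],
            default: Transition) -> Transition:
    """Use the composer's authored transition if one exists, else the fallback rule."""
    return authored.get((src, dst), default)


def coverage(states: list[str], authored: dict[tuple[str, str], Transition]) -> tuple[int, int]:
    """(hand-authored pairs, total directed pairs) for the given states."""
    pairs = list(permutations(states, 2))
    return sum(1 for pair in pairs if pair in authored), len(pairs)


if __name__ == "__main__":
    states = ["explore", "tension", "combat", "boss", "victory"]
    authored = {
        ("tension", "combat"): Transition("next_beat", 0, stinger="stg_combat_hit"),
        ("combat", "victory"): Transition("next_bar", 500, stinger="stg_victory"),
    }
    default = Transition("next_bar", 2000)
    print(transition_count(len(states)))  # 20
    print(transition_count(71))           # 4970
    print(resolve("explore", "combat", authored, default) == default)  # True
    print(coverage(states, authored))     # (2, 20)
```

The fallback rule is exactly where Nathan's concern bites: every pair left to `default` is a transition the composer did not shape, which is the gap a professional tool would need to help fill with the composer's own material.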
Alex (Response):
There are a couple of tools we're aware of that we have write-ups for in the wiki. There's a tool called VToken, where you train on a limited set including your own music. And there's another tool I just started playing with called Musa AI, which generates MIDI — I asked it to make a 32-bar Baroque composition and it generated it in MIDI, using Gemini under the hood.
You make an excellent point about Suno. I saw an ad for it this morning in my Facebook feed, and it was people who are supposedly quitting Spotify and Apple Music because Suno is all they need now — which I don't believe, by the way. I think that's corporate propaganda. But Suno is not immune to scrutiny. You need to find tools that are created by people who are enthusiasts about composition. Roland, for example, is a participant in the AI working group — their customers are composers, not people who want easy listening experiences. We would love it if you would help us find more of those professional-oriented tools so we can improve the wiki for people like you.
Seth Sweet (Former Riot Games Composer):
Hey, I'm building them. My name is Seth Sweet. I was a composer and music composition supervisor at Riot Games for the last three years. I left last month. Before Riot, I worked with Spitfire Audio to make some of the industry-standard sample libraries. I realized that smaller companies like Spitfire Audio really cater to professionals, while bigger companies like Native Instruments really don't. So I'm making a sample library now that really caters to professionals.
It uses the bare minimum of AI to do what I believe is machine work: programming key switches, programming MIDI CCs. But it is not writing music for you. It is not extending your music. It is not writing a single note or changing a single note that you make. I'm here at GDC to talk to composers like yourself and anybody else who actually makes a living doing this: what do you want in these tools? Let me build it for you.
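To make the distinction concrete, the "machine work" Seth describes can be sketched as adding key-switch notes and a dynamics controller around a composer's notes without altering any of them. The key-switch map, the lead time, and the use of CC1 for dynamics (a convention many orchestral libraries follow, but not all) are assumptions; this is a hypothetical sketch, not Seth's product, and it uses fixed rules where his tool uses AI.

```python
from dataclasses import dataclass

# Hypothetical key-switch map: articulation -> MIDI note number that selects it.
KEYSWITCHES = {"legato": 24, "staccato": 25, "tremolo": 26}


@dataclass(frozen=True)
class Note:
    start: float       # in beats
    pitch: int         # MIDI note number
    length: float      # in beats
    articulation: str  # chosen by the composer
    dynamic: float     # 0.0 (softest) to 1.0 (loudest), chosen by the composer


@dataclass(frozen=True)
class Event:
    time: float
    kind: str          # "keyswitch", "cc" or "note"
    data: tuple


def program_performance(notes: list[Note], lead_beats: float = 0.05,
                        cc_number: int = 1) -> list[Event]:
    """Add key switches and a dynamics CC around the composer's notes.

    Notes pass through unchanged: only control data is generated.
    """
    events: list[Event] = []
    current = None
    for note in sorted(notes, key=lambda n: n.start):
        if note.articulation != current:
            # Send the key switch slightly before the note so the patch is ready.
            events.append(Event(max(0.0, note.start - lead_beats), "keyswitch",
                                (KEYSWITCHES[note.articulation],)))
            current = note.articulation
        events.append(Event(note.start, "cc", (cc_number, round(note.dynamic * 127))))
        events.append(Event(note.start, "note", (note.pitch, note.length)))
    return events


if __name__ == "__main__":
    phrase = [
        Note(0.0, 60, 1.0, "legato", 0.5),
        Note(1.0, 62, 1.0, "legato", 0.6),
        Note(2.0, 64, 0.5, "staccato", 0.8),
    ]
    for event in program_performance(phrase):
        print(event.time, event.kind, event.data)
```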
Max Y (Composer and Mix Engineer):
My name is Max Y. I'm a composer, and I've been a mix engineer in the music industry for over a decade. I'm still relatively new to game audio. As a musician, I'm constantly on the fence between embracing AI and being a traditionalist. I've been studying piano for over 20 years, and I very much believe that human music is innately human, as with any of the arts. We all know the battle between generative AI and what is innate to human expression. But I also have a deep appreciation for technology, how forward-thinking it is, and how much it does for all of us as a tool.
I've used Suno with artists when we hit a writing block, just to get an idea of something, and then gone back and rewritten it with our own ideas. But now I'm questioning where the ethical boundaries are in incorporating generative AI into something I believe should be natively human. Hearing that people may lose opportunities because a studio decides to replace something entirely with machine-made content irks me. But I also understand the pressures. I'm constantly teetering: how much do we allow AI, and where's the stopping point? It's still going to grow exponentially. I constantly struggle between wanting to learn more and wanting to be proud of the work that I've done. I guess I just have that constant concern.
[Speaker] (On Emotional Value):
We all know audio or music is often only 10% of a project's budget but 40 or 50 percent of its emotional value. So if a producer said "we're just going to use generated music," I would argue that you're going to wind up with a product that has very little emotional engagement and won't make as much money. And in the end, the producer is going to be interested in the money.
I have an experience in the wiki. I won't name the product, but I tested a product that claims to know how to articulate human performances. I gave it Samuel Barber's Adagio for Strings, a very well-known, extremely emotionally engaging piece that makes many people weep. In the wiki you can hear an orchestral performance of it alongside the AI-articulated performance. I informally surveyed about six or seven people, and every single one described to me what I would call the uncanny valley: they did not engage emotionally with the AI-articulated performance. That's really important: the value is in the human component, because that translates to emotional engagement, which translates to customers, which translates to the money the producer wants to make.
Ryan (Response):
The people that I know who have gotten the best results from AI have spent enormous amounts of time prompting and refining. And for me, as someone who started composing on a piano roll on AdLib in 1987, I lean more towards live players now than I used to. I think it's going to be a layering. Some people will opt for AI tools, but at the same time there's going to be people who say "wait, this takes me way too long — I want somebody sitting in my room that I can talk to, and we have a rapport." That may be better. For the composers here: how many of you, even if you use virtual instruments for sketching, want real musicians performing your music in the final product — and why?
5. AI Watermarking & Content Provenance
Rob Hamilton (Audio Technologist and Educator, RPI):
My name is Rob Hamilton. I teach at RPI in upstate New York. I'm an audio technologist and composer, formerly at Stanford's CCRMA and at Smule, a mobile music company here in the Bay Area.
A lot of the fear around using AI in composition and artistic workflows has to do with being replaced, and that's a very real fear. Much of it comes from the fact that there's no recognizable, reliable factor distinguishing AI-generated music from human-generated music unless the AI output is terrible, and human music can be terrible too. So from an education standpoint: I'm tasked with teaching students entry-level music and technology skills, and I'm being confronted with "well, I can just generate this." Most of us in this room have training in sound and music and now look at AI either as an additional toolset or as a threat of replacement. But here's a generation asking: "Do we really need that? Do we need to know the circle of fifths? Do we need to know Monteverdi? Do we need to know Stravinsky?"
So I'm looking at the concept of watermarking — the ability to understand what is AI and what is not, so we can make a choice: do we like this or do we not? Do I want to support AI music or do I want to support human-made music? There are two use cases: in education, how do I know if my student is cheating or not? And more broadly in life, how do I know whether to support a given project based on what AI is involved?
When I say watermarking, I mean more in the sense of: was this human-produced? A badge that says "100% human product." So from the working group standpoint, I could see a rationale where toolmakers would opt into a strategy like this — an opt-in system at first that might evolve. Is that a conversation this group could engage, perhaps with the toolmakers, about flagging content?
Alex (Response):
We have heard about various tools that can identify AI content, as well as the fact that stem separation (splitting a track into its components) can be used to make detection difficult. And within the Game Audio Association, there's a discussion between Adam Billia and Brian Hardgrove about creating something I call "Circle H," like Circle K: a mark to indicate that a piece of music is human-created. The snag right now is that it needs to be percentage-based rather than a binary of 100% human versus 100% AI. Somehow we need to come up with a measure of how much of a particular track is human versus AI, how "spicy" it is in either direction. That's an open question, but it's being actively discussed.
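No such measure has been agreed on. As one hypothetical illustration of a percentage rather than a binary, a track's human share could be weighted by how long each stem plays and how much of each stem is human-made. The weighting, the `Contribution` fields, and the rule that only a 100% track earns the mark are all assumptions for this sketch, not the Game Audio Association's proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    stem: str
    seconds: float         # how long the stem is audible in the track
    human_fraction: float  # 0.0 = fully generated, 1.0 = fully human-written and performed


def human_share(contributions: list[Contribution]) -> float:
    """Duration-weighted percentage of the track that is human-made."""
    total = sum(c.seconds for c in contributions)
    if total <= 0:
        raise ValueError("track has no audible contributions")
    human = sum(c.seconds * c.human_fraction for c in contributions)
    return 100.0 * (human / total)


def label(share: float, threshold: float = 100.0) -> str:
    """Hypothetical badge rule: only a fully human track earns the mark."""
    return "Circle H" if share >= threshold else f"{share:.0f}% human"


if __name__ == "__main__":
    track = [
        Contribution("strings", 120.0, 1.0),     # composed and recorded live
        Contribution("percussion", 120.0, 0.0),  # generated loop
        Contribution("choir pad", 60.0, 0.5),    # human-written, AI-rendered
    ]
    share = human_share(track)
    print(share, label(share))  # 50.0 50% human
```

Duration is only one possible weight; loudness, melodic importance, or authorship of the composition versus the performance would each give a different "spiciness," which is part of why the question is still open.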
Aleana Zebra (Dialogue Editor):
My name is Aleana Zebra. I mainly do dialogue editing in games — I'm working on about a dozen triple-A projects, making the voice actors sound good. My first introduction to music technology was via composition — I double-majored in music composition and audio technology. When I was focusing more on composition and learning to use VSTs, one of the first things I learned when writing music with a piano roll is that if every note happens exactly on time, it sounds like a computer did it. So what I would do is go in and nudge every note just a tiny bit early or a smidgen late — just to add that human element, to give something more human to the virtual performance.
This whole conversation has given me a lot of thoughts on how we're now trying to get AI to do that humanization in a different way — and whether that's really bringing us closer to authenticity or just simulating it.
我叫Aleana Zebra,主要在游戏中做对话编辑——我正在参与大约十几个3A项目,让配音演员听起来更好。我对音乐技术的最初接触是通过作曲——我主修音乐作曲和音频技术双学位。当我更专注于作曲、学习使用VST时,我在钢琴卷窗中写音乐时学到的第一件事是:如果每个音符都精确地在正确时间上,听起来就像是电脑做的。所以我会去手动把每个音符调得稍早一点或稍晚一点——只是为了增加那种人类的感觉,给虚拟演奏带来更多人性化的元素。
整个对话让我对我们现在如何试图让AI以不同的方式来做这种人性化处理有了很多思考——这是否真的让我们更接近真实感,还是只是在模拟它。
[演讲者 / Speaker](回应 / Response):
Systems that apply that kind of timing shift actually exist already: it's called humanization, and it's been around for a long time. Most AI tools that generate MIDI apply humanization, and it's a spectrum: how much humanization to apply can be set anywhere along a continuum.
实际上,已经有做这种时间偏移的系统存在——它叫做"人性化"处理,已经存在很长时间了。大多数生成MIDI的AI工具都会进行人性化处理,这是一个连续的谱系——施加多少人性化是一个可调节的连续量。
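The timing nudge described above, with an adjustable amount of humanization, can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool implements it: the `Note` type, the `humanize` function, its parameters, and the 480-ticks-per-quarter-note resolution in the usage are all hypothetical, and uniform random jitter is just one simple choice (real tools may use other distributions or groove templates).

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    start_ticks: int  # onset position in MIDI ticks
    pitch: int        # MIDI note number, 0-127
    velocity: int     # MIDI velocity, 1-127


def humanize(notes, amount, max_offset_ticks=20, seed=None):
    """Nudge each note's onset slightly early or late.

    `amount` sets how much humanization is applied: 0.0 leaves every note
    exactly on the grid, 1.0 allows offsets up to +/- max_offset_ticks.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seeded so results can be reproduced
    humanized = []
    for note in notes:
        # Uniform random offset in [-amount * max, +amount * max], rounded to whole ticks.
        offset = round(rng.uniform(-1.0, 1.0) * max_offset_ticks * amount)
        # Clamp at 0 so a note cannot move before the start of the clip.
        humanized.append(replace(note, start_ticks=max(0, note.start_ticks + offset)))
    return humanized


if __name__ == "__main__":
    quantized = [Note(i * 480, 60, 90) for i in range(4)]  # quarter notes, assuming 480 PPQ
    for note in humanize(quantized, amount=0.5, seed=42):
        print(note)
```

Exposing `amount` as a single 0.0 to 1.0 control mirrors the point that humanization is a continuum rather than an on/off switch; at 0.0 the output is identical to the quantized input.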
Rossa Pavash(DICE技术音效设计师 / Technical Sound Designer, DICE):
My name is Rossa Pavash, and I'm a technical sound designer at DICE. This is about music, but also about generative content in general. On the inherent value of human performances versus generated ones, I can imagine that gap eventually closing over time. But I don't think we should orient ourselves around whether AI is better or worse, because as I see it, art is not just about the final product; it's about the journey along the way. When you show a song to someone, it's not just "listen to this." It's "listen to this: it was written by this person, for these reasons, at this particular moment in their life." The art needs a story that goes along with it, not just the piece itself. If we keep that in mind, even when we do use AI, and keep a story, a motivation, a path that guides us to the finish line, then we can still make something that is art regardless of the tools used.
我叫Rossa Pavash,是DICE的技术音效设计师。这个话题关于音乐,但也关于一般意义上的生成式内容。关于人类演奏和生成演奏之间固有价值的话题——我可以想象这种价值差距最终会随着时间推移而缩小。但我认为我们不需要以AI是否更好或更差来引导自己,因为在我看来,艺术不仅仅是关于最终产品——它也是关于沿途的旅程。当你向某人展示一首歌时,不仅仅是"听这个",而是"听这个——它是由这个人写的,出于这些原因,在他们生命中的这个特定时刻"。它需要有一个伴随艺术作品的故事,而不仅仅是艺术作品本身。如果我们能记住这一点——即使我们确实使用了AI——如果我们保持一个故事、一个动机、一条引导你到终点的路径,那么无论使用什么工具,我们都能实现仍然是艺术的东西。
六、Vibe Coding 与调试 · Vibe Coding & Debugging
[演讲者 / Speaker]:
I run a small two-person company that makes musical instruments. We have to use these AI tools because the two of us can't get enough work done on our own. We have switched completely to cloud-based vibe coding; I haven't written much code in six months. At the NAMM show, we found bugs in our product. We gave the whole codebase to Claude: Claude instrumented the code, we pointed out what was wrong in the instrumentation log file, and then we said, "Go crank on it for two hours and fix the bug." That's the world we're living in right now as far as audio coding is concerned. If you want new features in the audio engine or new interactive features in the system, that's probably how you're going to get there.
The wiki also has a section on tool assistance and augmentation, covering tools along the lines of Aura and Rudder for Unreal Engine; it's worth checking out.
我经营一家制作乐器的两人小公司。我们必须使用这些AI工具,因为仅凭我们两个人无法完成足够的工作量。我们已经完全切换到基于云的vibe coding,过去六个月里我没写多少代码。我们在NAMM展会上发现了产品中的Bug。我们把整个代码库交给Claude:Claude对代码进行了插桩,我们指出插桩日志文件中哪里出了问题,然后说"去吧,花两个小时把这个Bug修好"。这就是我们目前在音频编程领域所处的世界。如果你想要音频引擎中的新功能、系统中的新交互功能,这可能就是你实现它的方式。
Wiki里还有一个关于工具辅助和增强的章节,涉及面向Unreal引擎的Aura和Rudder——值得去查看。
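To make "Claude instrumented the code" concrete: instrumenting usually means adding logging around the code paths under suspicion, so that a run leaves a trace in a log file that a person (or an AI assistant) can read afterwards. The sketch below is a generic Python illustration of that idea, not the speaker's code or what Claude actually produced; `instrument` and `apply_gain` are hypothetical names. Inside a real-time audio callback you would usually avoid blocking file I/O and instead buffer log entries for another thread to write.

```python
import functools
import logging
import time

log = logging.getLogger("instrumentation")


def instrument(func):
    """Wrap `func` so every call logs its arguments, result, and duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        log.debug("%s args=%r kwargs=%r -> %r (%.3f ms)",
                  func.__name__, args, kwargs, result, elapsed_ms)
        return result
    return wrapper


@instrument
def apply_gain(samples, gain):
    """Hypothetical audio helper: scale each sample by a linear gain factor."""
    return [s * gain for s in samples]


if __name__ == "__main__":
    # Write the trace to a log file that can be inspected after the run.
    logging.basicConfig(filename="instrumentation.log", level=logging.DEBUG)
    apply_gain([0.5, -0.25], 2.0)
```

A decorator keeps the instrumentation separate from the function body, so it can be added to suspect functions and removed once the bug is found.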
Dan Hickory(作曲家与Bird Collection版权公司 / Composer and Bird Collection Music Rights):
My name is Dan Hickory. I'm a composer and music educator, and I work at a music rights company called Bird Collection. In response to my fellow educator's question about why students need to learn music theory: royalties. As of right now, AI-generated music is not eligible for royalties or copyright protection. So I'm wondering: on the voice acting and sound effects side, is there a similar legislative issue that could seriously damage the situation there, or is it just music?
我叫Dan Hickory,是一名作曲家、音乐教育者,在一家叫Bird Collection的音乐版权公司工作。回应我同行教育者关于学生为什么需要学习音乐理论的问题:版税。就目前而言,AI生成的音乐没有版税和版权保护资格。所以我想知道——在配音和音效方面,是否也存在类似的立法问题,可能会真正损害该领域的情况,还是这主要只是音乐面临的问题?
Alex(回应 / Response):
You've hit on a root question we discuss in the group. For voiceover, SAG-AFTRA has guidelines. For script writing, the Writers Guild has guidelines. Composers, though, don't have a guild, and ASCAP, SESAC, and BMI are not going to deal with AI replacement legislation for you. I actually think the composing community needs to discuss creating a Composers Guild, at a minimum to deal with legislation affecting composers. In the meantime, while all of this gets sorted out, I think the people designing tools to help with the mechanical parts of our work, not the creative parts, are doing something really important.
你问到了一个我们在小组中讨论的核心问题。对于配音,SAG-AFTRA有相关指导方针。对于剧本写作,编剧工会有相关指导方针。但作曲家没有工会,而ASCAP、SESAC和BMI不会为你们处理AI替代相关的立法问题。我实际上认为作曲社区需要讨论建立作曲家工会,至少要处理与作曲家相关的立法问题。在这一切尚未厘清之际,我认为那些设计工具来帮助我们完成机械性工作(而不是创意工作)的人,正在做一件非常重要的事情。
Crystal Cooper(罗切斯特理工学院AI研究者与教育者 / AI Researcher and Educator, Rochester Institute of Technology):
Hi, I'm Crystal Cooper. I teach at Rochester Institute of Technology: audio technology and also AI research. This is for anybody interested in AI-assisted work: make sure you give your AI tools (Claude, or whatever you use) context. I think we assume that it has access to all the world's knowledge and knows how to use it. If you don't explain your requests and provide context (music notation, compositional style, all of those things), you're going to end up with something completely different from what you intended. Unless you upload your sheet music along with how you compose, it will fill in the blanks on its own. Just something to think about.
大家好,我是Crystal Cooper。我在罗切斯特理工学院教书,专注于音频技术和AI研究。这是给所有对AI辅助工作感兴趣的人的一个提示:请确保给你的AI工具——Claude,或者你使用的任何工具——提供背景信息。我认为我们往往假设它可以访问世界上所有的知识并知道如何使用它们。如果你不解释你的请求并提供背景信息——乐谱记谱法、作曲风格、所有这些东西——没有任何背景,你最终会得到与你预期完全不同的东西。所以除非你计划上传你的乐谱加上你的作曲方式,否则它会自行填补空白。只是一个值得思考的点。