Civilization VII: Mixing a Sound Strategy

Speakers: Cadet, Dmitri, Joe (Firaxis Games). Special Thanks: Dylan Escola-Sandoval, Senior Sound Designer

Introduction

Welcome to our Civilization VII "Mixing a Sound Strategy" talk. We're here from Firaxis Games. Our senior sound designer Dylan Escola-Sandoval contributed greatly to this talk and we want to give him a shout out.

1. Main Mix Goals

Cadet:

To start, I'm going to discuss what our main mix goals were.

The first goal was mixing a high level of complexity at scale. In Civilization VII, we brought sound design to the forefront of the experience like never before. With so many details added, the greatest challenge as an audio dev team was to maintain clarity within a really dense and adaptive mix — for a game that grows exponentially with every turn as it progresses. It could have easily become chaos. Our focus was to craft a mix that makes each session feel both authentic and alive without becoming sonically overwhelming.

Our second goal was to match the immersive audio to the rich visual fidelity. Compared to past Civilization games, we increased the sound design detail with a focus on making sure each civilization has its own unique identity. Through extensive research, historically accurate sound effects were applied to everything possible: unit weapons and vehicles; building sounds such as specific temple or church bells, wallas, and the natural environments around those buildings; and natural wonders, with their local birds and animal sounds. This additional detail gives players a deeper knowledge of different civilizations and adds a level of immersion to the gameplay.

Our last main mix goal was to create a dynamic, responsive, and seamless experience. The driving force throughout our development process was to make Civ VII sound dynamically responsive to players' choices, leveraging sound to enhance gameplay to a level not reached in past Civ games. Ensuring a seamless mix while doing this was a huge challenge, one that none of us on the audio team had encountered in our careers. Unlike traditional games, Civ VII has everything on screen at once. We needed a new sound strategy with flexible systems to control the mix of this vast number of simultaneous audio sources.

2. Core Mixing Philosophies

Aligning our sound strategy with core mixing philosophies proved to be a really helpful point of reference along the way. As many of you know, there are countless decisions to make when mixing a game, and I personally found myself coming back to these over and over to help guide our strategies and choices.

The first is gameplay focus — I would say this was the most prominent one that guided the challenging question of what key gameplay information we want to hear at all times.

The second is tolerance over time. A player in Civilization repeats several core movements each turn that define a 4X game: scouting and exploration, moving military units, expansion with settlers, city production management, and so on. These actions scale as the game progresses, but the underlying repetitive motions remain constant. The large amount of UI involved is also an integral part of the experience, so repetitive tolerance needed to be taken into great consideration. Full frequency representation was something we built into the designs from the start — it creates a more subconscious listening experience. Randomized parameters along with other mixing strategies all help alleviate ear fatigue, and we'll share some specific examples of that.

The third is that engagement is dynamic and seamless. Ensuring a seamless sound experience that also has dynamic attenuations is highly important for a Civ game, because all these subtle changes help create an organic listening experience. It reflects our reality, but it's also fun and rewarding, and most importantly it feels like a game. Assets attenuate dynamically and gradually over various durations. Many of our listener placement and zoom level mixing decisions continuously referenced this philosophy.

The fourth is that immersion is key: spatial mixing, dynamic EQ, consistency across attenuations, as well as representing the historical and geographical context as clearly as possible. A big part of mixing Civ is emphasizing the contexts that constantly shift with different UX flows while keeping the player immersed in the game world.

Last is prioritization: top critical sounds take priority, and then the challenge is to determine the priorities of all the other sounds around that focal point. At some moments we asked ourselves whether we really needed to hear a sound at all. Working with priorities was a big challenge that also tied into performance and optimization. With thousands of potential audio events triggered in real time, we built scalable systems for dynamic prioritization and real-time mixing, ensuring that the soundscape remains coherent, expressive, and deeply tied to player choice.

Throughout our session we will showcase many of our creative and technical approaches to developing those systems, starting with scale, then perspective, and finally the final mix.

3. Scale: Ambience & Static Emitters

Dmitri:

I'm going to talk about the ambience of Civilization VII. The way it's made up is we have a 2D environmental bed and 3D sounds that we call static emitters. For the scale of our game, when we zoom out we have thousands of emitters playing, so controlling them was quite a challenge — and it's more complex than it may seem.

2D Environmental Bed

The first solution we're going to talk about is how we handle an environment made up of hundreds of different biomes. We have a 2D environmental bed that represents the desert, tundra, grassland, tropical, and so on. Most of the time we do have all of those terrain types on screen, and when you zoom out, that's when we get to the hundreds of biomes. So it's not an option to just stick an emitter on every single hex and have those sounds playing all at once.

Instead, what we did is use a system that counts up all the hexes on the map and decides which ones are more prominent. So if there are 40 desert tiles, 20 grassland, and 10 jungle, it's only going to play the desert and the grassland — not the jungle. The ocean biome is a special exception that's always allowed to play, since when you see an ocean you want to hear those water sounds. That's when we use techniques like side chaining and some RTPCs to actually make it dynamic — getting louder and quieter as you zoom around the map.
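
The counting idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the shipped system: the cap of two ranked beds is inferred from the 40/20/10 example, and the function and constant names are hypothetical.

```python
from collections import Counter

# Assumption: ocean is the only biome exempt from the ranking.
ALWAYS_AUDIBLE = {"ocean"}
# Assumption: two ranked beds, matching the desert/grassland example.
MAX_RANKED_BEDS = 2


def select_biome_beds(visible_hex_biomes, max_ranked=MAX_RANKED_BEDS):
    """Return the set of biome beds allowed to play for the visible hexes."""
    counts = Counter(visible_hex_biomes)
    # Rank the non-exempt biomes by how many visible hexes they cover.
    ranked = [b for b, _ in counts.most_common() if b not in ALWAYS_AUDIBLE]
    selected = set(ranked[:max_ranked])
    # Exempt biomes play whenever at least one of their hexes is visible.
    selected.update(b for b in ALWAYS_AUDIBLE if b in counts)
    return selected


hexes = ["desert"] * 40 + ["grassland"] * 20 + ["jungle"] * 10 + ["ocean"] * 5
print(sorted(select_biome_beds(hexes)))
```

The per-biome hex counts could also feed the RTPCs mentioned above, so that each selected bed gets louder as its share of the screen grows.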

3D Static Emitters

That leads into our other big challenge: 3D static emitters. Static emitters are 3D sounds attached to stationary things like our buildings, wonders, and improvements, and there are a lot of those in Civilization. In Wwise those are usually set up in a blend container with random containers that play one-shots, and sometimes a looping sound on top of that. For the scale of our game and the quantity of these 3D sounds, we really try not to rely on loops, as they are not as performant. But that's ultimately the decision of our sound designers, and sometimes we do have loops.

Some of our 3D sounds might not sound as awesome when isolated on their own, but in combination with the 2D environmental bed and all these static emitters playing together, it becomes very active, very busy, very quickly — and mixing those was a really big challenge.

Categories and Instance Limiting

We started by separating 3D sounds into categories: buildings, improvements, wonders, and so on. They all have custom Actor-Mixer hierarchies and custom buses in Wwise. This gave us the power to create custom behavior for each type of building — custom attenuations, instance limiting, priorities, different routing, tons of RTPCs, states, and side chains.

Looking specifically at how we managed to keep those 3D sounds under control, we focus mostly on instance limiting and priorities. Taking a building as an example, you will notice that we have limits at almost every step. We limit first at the sound level, on each individual sound. Then we limit at the blend level and the random container level. Then we limit at the building level: for instance, the "Ambiance Buildings Arena" allows only three instances to play globally. On top of that, we send them to different buses, the loop and the one-shot FX, and each of those buses has its own limits. And then finally, on top of that, we also have limits set on the ambient bus itself.
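
The stacked limits can be pictured as a chain in which a new voice has to find room at every level. The sketch below is a simplified model of that idea, not how Wwise implements it: it flattens the Actor-Mixer and bus hierarchies into one chain, it refuses a new voice where Wwise could instead steal a lower-priority one, and every name and limit other than the arena's three instances is hypothetical.

```python
class Limiter:
    """One level of the hierarchy with its own instance cap."""

    def __init__(self, name, max_instances, parent=None):
        self.name = name
        self.max_instances = max_instances
        self.parent = parent
        self.active = 0

    def _chain(self):
        # Walk from this level up to the top-most limiter.
        node = self
        while node is not None:
            yield node
            node = node.parent

    def try_acquire(self):
        """Start a voice only if every level up the chain has room."""
        chain = list(self._chain())
        if any(n.active >= n.max_instances for n in chain):
            return False
        for n in chain:
            n.active += 1
        return True

    def release(self):
        """Free the slot at every level when the voice ends."""
        for n in self._chain():
            n.active = max(n.active - 1, 0)


# Only the arena limit of three comes from the talk.
ambient_bus = Limiter("Ambient", 32)
oneshot_bus = Limiter("Buildings_OneShot", 8, parent=ambient_bus)
arena = Limiter("Ambiance Buildings Arena", 3, parent=oneshot_bus)
arena_cheer = Limiter("arena_cheer", 2, parent=arena)
arena_horn = Limiter("arena_horn", 2, parent=arena)
```

With these numbers, two cheers and one horn fill the arena's three slots, so a second horn is refused even though the horn's own limit of two has not been reached.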

These limits kept the number of building sounds playing at any given time under control, while also avoiding too many instances of the same building playing. This let us hear different buildings instead of spending voices on multiple instances of the exact same building. Some of those limits are platform-dependent to make sure we stay performant on each platform, and Wwise lets us do that quite easily.

Age-Based Priorities

On top of the limitations, we also have priorities set up by age. In Civilization VII, one of our unique features is that you transition from Antiquity to Exploration to the Modern Era. What we decided to do is set up priorities on each of these ages: when you build a newer building, that building actually has a higher priority than the previous one. All those priorities are also offset by distance, up to a total value at max distance, so that we always aim to play the most relevant sound for what the player is seeing and always give priority to what is centered on screen. That's an oversimplification of our priorities system, but it's the main idea behind it, and the same logic is applied to almost every element of our game based on what we think is more important or relevant to the player.
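
A minimal sketch of that idea, assuming the offset grows linearly with distance. Only the ordering (newer age, higher priority) comes from the talk; the base values, the offset amount, and the function name are hypothetical.

```python
# Hypothetical base priorities: a newer age outranks an older one.
AGE_BASE_PRIORITY = {"antiquity": 40.0, "exploration": 50.0, "modern": 60.0}
# Hypothetical total offset applied once a sound reaches max distance.
OFFSET_AT_MAX_DISTANCE = -30.0


def effective_priority(age, distance, max_distance):
    """Base priority for the age, offset linearly up to the max distance."""
    t = min(max(distance / max_distance, 0.0), 1.0)
    return AGE_BASE_PRIORITY[age] + OFFSET_AT_MAX_DISTANCE * t


# An Antiquity building at screen center outranks a Modern one at the edge.
print(effective_priority("antiquity", 0.0, 100.0))
print(effective_priority("modern", 100.0, 100.0))
```

With these values a centered older building still wins over a distant newer one, which is the "what is centered on screen" behavior described above.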

4. Scale: The Engine-Side Solution

Joe:

As Dmitri mentioned, it's possible to have thousands of these static emitters trying to play at any given time. And because it's possible to whip the camera all the way across the map in a matter of a second, you really need to have everything ready to play at all times. If you're thinking like an engineer, you might think: how are we going to keep CPU performance manageable with so many static emitters?

We tried a couple of things. First, we tried Wwise Virtual Voices — we implemented it and found that even when using virtual voices, Wwise still needs to run spatialization and attenuation calculations to determine what to virtualize and what not to, what to start and stop. It's a great feature, but on our older consoles like Switch and PS4 this was still using too much CPU.

Next, we thought about Wwise's Kill Voice behavior — instead of virtualizing, just kill it. But this presented a software design challenge: once the voice is killed, something has to start it again, because this is a thing that's looping over time. If it's gone and then the player comes back, the system needs to know it was killed and then restart it. Now all of a sudden our engine would need to retrigger the sounds — but the engine doesn't know what Wwise is doing, and that's on purpose, because it's bad software design to have too many systems talking to each other.

So we came up with a third solution: a priority and culling system entirely on the engine side. If we do it all on the engine, we can start and stop the sounds in the same general system — no need to know what other systems are doing. We can also limit the parameters that affect priority and whether or not something should play. In our case, we used just distance from the listener — actually distance squared, but that's an implementation detail. We're also able to throttle the rate at which priorities are recalculated: we do it once every 250 milliseconds, and in that way we're able to free up some CPU for other processes.

Here's how it works in detail: we get a list of priorities from our sound design team — really similar if not identical to the ones in Wwise. We use those as our initial priority score, then we subtract from that score an amount relative to the distance from the listener, which is positioned somewhere in the middle of the screen. We also do a little extra calculation based on whether it's a repeating sound or a second instance of the same sound. We then wind up with a final list of scores, sort them from highest to lowest, and cut off all but 50 — those are what's allowed to play, and everything else is stopped. On Nintendo Switch we cut all but 10, other consoles maybe 20, but for PC our target is 50.
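
The steps above can be sketched as follows. This is an illustrative sketch, not Firaxis's engine code: the voice budgets and the 250 ms interval come from the talk, while the distance weight, the repeat penalty, the platform keys, and all class and function names are assumptions.

```python
from dataclasses import dataclass

# Voice budgets quoted in the talk; the platform keys are hypothetical.
VOICE_BUDGET = {"pc": 50, "console": 20, "switch": 10}
RECALC_INTERVAL_S = 0.250   # priorities are recalculated every 250 ms
DISTANCE_WEIGHT = 0.01      # hypothetical: score lost per unit of distance squared
REPEAT_PENALTY = 5.0        # hypothetical: score lost per extra instance of a sound


@dataclass
class Emitter:
    sound_id: str
    base_priority: float    # supplied by sound design
    x: float
    y: float
    playing: bool = False


def rank_emitters(emitters, listener_xy):
    """Return (score, emitter) pairs sorted from highest to lowest score."""
    lx, ly = listener_xy
    scored = sorted(
        ((e.base_priority - DISTANCE_WEIGHT * ((e.x - lx) ** 2 + (e.y - ly) ** 2), e)
         for e in emitters),
        key=lambda pair: pair[0],
        reverse=True,
    )
    seen = {}
    ranked = []
    for score, e in scored:
        repeats = seen.get(e.sound_id, 0)   # best-scoring instance pays no penalty
        seen[e.sound_id] = repeats + 1
        ranked.append((score - REPEAT_PENALTY * repeats, e))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


class EmitterCuller:
    def __init__(self, platform):
        self.budget = VOICE_BUDGET[platform]
        self._last_update_s = None

    def update(self, now_s, emitters, listener_xy):
        """Re-rank at most once per interval; return True if a re-rank ran."""
        if (self._last_update_s is not None
                and now_s - self._last_update_s < RECALC_INTERVAL_S):
            return False
        self._last_update_s = now_s
        for index, (_, e) in enumerate(rank_emitters(emitters, listener_xy)):
            # A real engine would post start/stop events here.
            e.playing = index < self.budget
        return True
```

Because starting and stopping both happen inside `update`, nothing outside this system needs to know which voices were culled, which is the design point made above.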

5. Scale: Continuous Combat System

Dmitri:

Compared to older titles, our combat system is more complex and busier than ever. Most units are now composed of eight to fourteen characters, so we are dealing with at least double the quantity of older titles like Civ VI, where units were made up of about three characters. On top of that, previous iterations relied on what we internally call the "slap and run" technique: attack and then go back to their position, and that was it. Simple and efficient, but not super immersive, and it didn't necessarily tell the player where to find those units.

We now have a system called continuous combat — when a melee unit attacks another melee unit, it stays in combat until the very next turn. This is a really cool system, but we can have a massive number of units on screen at once, all doing combat, all staying in combat until the next turn — which causes a lot of issues with our mix.

The two main challenges: first, overlapping sounds. We have a lot of similar characters on screen doing the same actions, and we can try making as many variations as possible, but that's not necessarily going to solve this situation. We were getting volume spikes, phase problems, and boring sound design. If all these units fire the exact same sounds, even 64 variations aren't going to sound incredible. Second, with continuous combat, we had issues with ear fatigue, lack of dynamics, and conflicting feedback: all these units fighting makes it hard for the player to pay attention to what they should focus on.

6. Engineering Solutions for Combat Audio

Joe:

For overlapping sounds, we used Wwise RTPCs and End-of-Event callbacks. Once again we got a list from our sound design friends — this is a list of events that are going to require overlap handling. Not everything is going to be a problem if you hear multiple instances of it. During gameplay, when the engine detects that it is triggering a sound that's in the overlap list, it tallies up an RTPC. It starts at zero; when we get one instance, we tick it to one; another instance, two; another, three; and so on. Then we get End-of-Event callbacks — Wwise telling us that an event has completed playing. When we get one for a handled event, we subtract from that RTPC. So if we're at three and one ends, we go to two, another ends, we go to one, another ends, back to zero. On the Wwise side, this RTPC controls start delay, pitch, EQ, and volume.
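
The tally described above can be sketched as a small counter. This is a sketch under assumptions: the talk does not say whether there is one RTPC per event or a shared one, so one counter per event is assumed here, and `set_rtpc`, the RTPC naming scheme, and the class name are hypothetical stand-ins for the engine's Wwise bindings.

```python
class OverlapTracker:
    """Counts live instances of overlap-handled events and mirrors them to an RTPC."""

    def __init__(self, overlap_events, set_rtpc):
        self.overlap_events = set(overlap_events)   # list supplied by sound design
        self.set_rtpc = set_rtpc                    # hypothetical callable(name, value)
        self.counts = {}

    def on_event_posted(self, event_name):
        if event_name not in self.overlap_events:
            return                                  # not every event needs handling
        count = self.counts.get(event_name, 0) + 1
        self.counts[event_name] = count
        self.set_rtpc(f"overlap_{event_name}", count)

    def on_end_of_event(self, event_name):
        """Called from the end-of-event callback for a handled event."""
        if event_name not in self.overlap_events:
            return
        count = max(self.counts.get(event_name, 0) - 1, 0)
        self.counts[event_name] = count
        self.set_rtpc(f"overlap_{event_name}", count)


rtpc_log = []
tracker = OverlapTracker({"arrow_impact"}, lambda name, value: rtpc_log.append((name, value)))
for _ in range(3):
    tracker.on_event_posted("arrow_impact")
tracker.on_end_of_event("arrow_impact")
print(rtpc_log)
```

Clamping at zero guards against a callback arriving for an event that was never counted, for example after the tracker is reset.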

I want you to focus on just the start delay for a moment. If we have a unit of eight archers and they all fire their arrows at once, and all those arrows land at once, you're going to hear just one transient — they all land simultaneously. But if we use that RTPC to delay each instance of that overlapping impact sound, you go from a single transient to a nice glissando of impacts — it sounds more like a brrrrr — a rolling wave of arrival rather than a single thud.
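
As a worked example of the start-delay mapping, here is a hypothetical curve from overlap count to delay. In the talk this mapping is authored on the Wwise side; the 30 ms step, the 250 ms cap, and the function name below are assumptions chosen only to show the staggering effect.

```python
STEP_S = 0.030        # hypothetical delay added per overlapping instance
MAX_DELAY_S = 0.250   # hypothetical cap so late instances never lag too far


def start_delay_s(overlap_count):
    """Delay for the instance that raised the overlap RTPC to overlap_count."""
    return min(max(overlap_count - 1, 0) * STEP_S, MAX_DELAY_S)


# Eight archers firing at once: each impact lands slightly after the previous one.
delays = [start_delay_s(n) for n in range(1, 9)]
print([round(d, 3) for d in delays])
```

The first impact plays immediately and each later one is pushed back a little further, which turns one stacked transient into a short run of separate hits.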

Next, to solve the ear-fatiguing nature of continuous combat, we used a binary RTPC and Wwise's interpolation feature. A binary RTPC is simply one or zero — on or off. In this case, it represents whether or not a unit is in combat: zero means not in combat, one means in combat. Here's where interpolation comes in: interpolation allows you to not just snap an RTPC from one value to another — it fades it over an amount of time you specify. When we go from not-in-combat to in-combat, we have a 24-second fade time on that RTPC. We use it to inversely control the volume of the game objects, so when units start hitting each other, it's nice and loud — and over time, as you start to get sick of it, it turns down gradually. You barely notice it happening, but your ears thank you.
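
The fade can be modeled with a small interpolating parameter. This is a sketch, not Wwise's implementation: a linear ramp is assumed, the 9 dB reduction is a made-up figure, and the talk does not describe what happens when a unit leaves combat, so the usage note simply snaps back.

```python
FADE_TIME_S = 24.0   # fade time quoted in the talk
DUCK_DB = -9.0       # hypothetical volume offset once the RTPC reaches 1


class InterpolatedRtpc:
    """A parameter that ramps linearly to its target instead of snapping."""

    def __init__(self, value=0.0):
        self.value = value
        self._start = value
        self._target = value
        self._duration_s = 0.0
        self._elapsed_s = 0.0

    def set_target(self, target, fade_s):
        self._start = self.value
        self._target = target
        self._duration_s = fade_s
        self._elapsed_s = 0.0
        if fade_s <= 0.0:
            self.value = target          # no fade requested: snap

    def tick(self, dt_s):
        if self._duration_s > 0.0:
            self._elapsed_s = min(self._elapsed_s + dt_s, self._duration_s)
            t = self._elapsed_s / self._duration_s
            self.value = self._start + (self._target - self._start) * t
        return self.value


def combat_volume_db(rtpc_value):
    """Inverse mapping: the higher the RTPC, the lower the volume."""
    return DUCK_DB * rtpc_value


in_combat = InterpolatedRtpc()
in_combat.set_target(1.0, FADE_TIME_S)   # unit enters combat
in_combat.tick(12.0)                     # halfway through the fade
print(combat_volume_db(in_combat.value))
```

Calling `set_target(0.0, 0.0)` when combat ends would restore full volume immediately, ready for the next loud opening hit.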

7. Perspective: The Listener Placement Journey

Dmitri:

I expect most of us know what a listener is, but just in case: the listener is the point from which Wwise listens to our game. Every sound is played through that point of perspective, which is why finding the best placement possible is crucial for any video game. Civilization VII being a top-down game with the ability to zoom in and out at will makes that really challenging.

We tried multiple approaches to find the right listener placement. Every time we thought we were done, we found a new issue.

Iteration 1: Default, on the Camera

The first listener placement is the default: the listener is placed on the camera. This caused a lot of problems. First, the object in the center of the screen was really far away when zoomed out, so we had to rely on really wide attenuations to make sure we actually heard that object. That worked fine as long as you stayed zoomed out, but as soon as you zoomed in, you suddenly heard everything around you, including a lot of off-screen sounds, because the attenuations were so massive. Also, sounds directly under the camera were louder than sounds from the center of the screen, where the player is focused, because technically they're closer to the camera. Far from ideal.

Iteration 2: Offset to Screen Center

For the second iteration, the listener was offset from the camera to always keep it on top of the center of the screen. This solves the massive problem of sounds directly under the camera being louder than the ones the player is actually focused on. It also allowed us to have slightly smaller and more manageable attenuations. At this point we were using a hybrid system of attenuation and cone listener filtering to deal with what was on screen versus off screen. But it made the mixing process really complicated and hard to maintain — too many moving parts working against each other.

Iteration 3: Map Level Center of Screen

We moved the listener to the center of the screen at the map level. With this change, attenuations got really small and easily manageable since the listener is always on the ground at the map level. This also made on-screen and off-screen behavior feel natural, so we didn't have to use the cone listener anymore, which was really convenient. However, the listener being in a static location and not moving with the camera meant that zooming in and out was not going to work anymore. But we had already set up an RTPC called "camera zoom" for other systems, and this let us use some global filtering values on the buses, making it really easy to mix and maintain over time.

Iteration 4: Distance Probe

At this point we were pretty confident we were done. Then we finally built our internal Atmos setup and noticed things didn't sound quite right. The orientation and placement of the listener caused issues with our Atmos spatial image. To fix it, the listener needed to go back on the camera, which also reintroduced all the previous problems. Luckily, Wwise has a Distance Probe: an object you can place where you want attenuation to be calculated from. We placed it where our listener had been, on the center of the screen at the map level. This combination fixed our Atmos spatial imaging while keeping the listener on the camera.

As a key takeaway from all of this listener iteration: it's always better to dial in and find your listener placement as soon as possible to avoid spending way too much time redialing attenuations again and again. In the end, we are using a listener-distance-probe hybrid attenuation system. The actual positioning with 2D X/Y movement over the map, plus the camera zoom RTPC on the Z axis to filter different buses, is easy to use and very scalable.
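
One way to find "the center of the screen at the map level" is to intersect the camera's forward ray with the map plane. The sketch below assumes a Z-up world with a flat map at a fixed height, which the talk does not specify; the function name is hypothetical.

```python
def screen_center_on_map(cam_pos, cam_forward, map_z=0.0):
    """Point where the camera's forward ray hits the map plane.

    Assumes a Z-up world and a flat map at height map_z.
    """
    px, py, pz = cam_pos
    fx, fy, fz = cam_forward
    if fz == 0.0:
        raise ValueError("camera is looking parallel to the map")
    t = (map_z - pz) / fz
    if t < 0.0:
        raise ValueError("camera is looking away from the map")
    return (px + t * fx, py + t * fy, map_z)


# The listener stays on the camera; the distance probe goes to the hit point.
camera_position = (10.0, 0.0, 20.0)
camera_forward = (0.0, 1.0, -1.0)      # looking forward and down at 45 degrees
probe_position = screen_center_on_map(camera_position, camera_forward)
print(probe_position)
```

The probe then only moves in X/Y as the camera pans, while zoom is handled separately through the camera zoom RTPC.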

8. Perspective: Zoom Level Mixing

Cadet:

At that point, one of the main challenges left was: what do we want the player to hear at every zoom level? We mostly created two main settings. Fully zoomed in, which we consider the most immersive point of view — with all the full character expressions, building ambiences, wallas, and so on. And fully zoomed out, which we consider the strategic point of view. Every other zoom setting is an interpolation between those two.

The biomes consist of three levels — at the furthest zoomed-out point, the high-altitude wind biome takes dominance, but with a bit of the map biomes still present so you feel them as you move around.

The challenging part was making the zoom transitions feel natural and seamless — between the immersive view where you hear all those details, and the zoomed-out strategic view where we try to give you only the most important key gameplay information. We found fairly early on that we needed a lot of different zoom profiles for different types of sounds. A global setting for all was the initial idea and would have been closer to how we experience sound in the real world, but it didn't work well — we needed to give information and feedback to the player no matter how far from the action they are. We also realized that a normal high-pass or low-pass was too intense and masked too much content, so we decided to rely on shelf EQ filtering.

We then had to look into every audio asset and group them by category to be able to mix them as a function of the camera zoom RTPC. Those categories were assigned through buses, which allowed us to create a really complex system that was ultimately more systemic, easier to control, and scalable. We ended up with over 200 buses, most of which were for that camera zoom system.

As an example for unit combat: we decided on the following categories based on order of importance for the player. High priority feedback (start-of-combat notifications and bonus unlocks like commander promotions) gets no changes in filtering when zoomed out, and only a couple of dB of volume attenuation. Medium priority feedback (unit depth and units moving around) gets lowered and filtered a bit with camera zoom. And low priority feedback (continuous combat unit idles and unit vocal expressions) is mostly meant for audio fidelity and the cool factor, so it is the most affected when zooming completely out. This also helps with tolerance for repetition.
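
The three tiers can be written down as per-category zoom profiles. This is a sketch with made-up numbers: the talk only gives "a couple of dB" for the high-priority tier, so every dB value below is hypothetical, as is the linear interpolation and the 0-to-1 zoom range. In the actual project these would be RTPC curves on the buses.

```python
# Hypothetical settings at full zoom-out; at full zoom-in everything is 0 dB.
ZOOM_PROFILES = {
    "high":   {"volume_db": -2.0,  "high_shelf_db": 0.0},
    "medium": {"volume_db": -6.0,  "high_shelf_db": -4.0},
    "low":    {"volume_db": -18.0, "high_shelf_db": -10.0},
}


def bus_settings(category, camera_zoom):
    """Interpolate a category's bus settings for the current zoom.

    camera_zoom: 0.0 is fully zoomed in, 1.0 is fully zoomed out (assumed range).
    """
    z = min(max(camera_zoom, 0.0), 1.0)
    return {name: value * z for name, value in ZOOM_PROFILES[category].items()}


for category in ZOOM_PROFILES:
    print(category, bus_settings(category, 1.0))
```

Keeping one profile per bus category is what lets a single camera zoom value drive very different behavior for critical feedback and for detail sounds.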

9. Perspective: Map Wrapping

Joe:

Map wrapping — what the heck is that? Civ has this unique thing where the camera can be moved left or right forever, but it's not a forever-long map. What happens when you reach the edge of the map is it teleports back to the other side and keeps going. The interesting thing about Civ is that this border is invisible — the other side is seen from the current side. It's almost like you're going around a cylinder. This causes a problem because we're getting coordinates and locations of sounds from the game, and this kind of breaks perspective and distance calculations when we're at the borders.

Here's what it looks like in the game: the big red L is the listener and we are all the way on the eastern edge of the map. There is a sound playing just to the west of that border — so what the player would expect is to hear the sound playing just off to the right. But if we look at what the game is actually telling our audio system: the listener is all the way on the eastern side, and the emitter is all the way to the west, very far away and to the left. Which is wrong.

There are a couple of different ways to solve this — a lot of them involve a lot of math. We are musicians and software engineers, we are lazy, we don't like doing math. So we ignored all of those and tried something else, which was introducing a secondary listener just moved right one map length away, across the other side of the map. If we introduce that secondary listener, the emitter is going to be playing just off to the right, very close to us — just like it looks on the map. Some of you might be wondering what happens if the situation is reversed: when we cross the midpoint of the map, we simply fly the emitter to the other side, still one map length away, and everything still works.
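
The effect of the secondary listener can be illustrated on the X axis alone. This sketch assumes X runs from 0 at the western edge to the map width at the eastern edge, and it picks the nearer of the two listeners explicitly, whereas in the game both listeners would simply exist in the audio engine; the function names are hypothetical.

```python
def secondary_listener_x(listener_x, map_width):
    """Mirror the listener one map width away, toward the opposite side.

    Which side it lands on flips when the listener crosses the map's midpoint.
    """
    if listener_x >= map_width / 2:
        return listener_x - map_width
    return listener_x + map_width


def audible_offset_x(emitter_x, listener_x, map_width):
    """Signed X offset of the emitter from whichever listener is nearer."""
    offsets = (
        emitter_x - listener_x,
        emitter_x - secondary_listener_x(listener_x, map_width),
    )
    return min(offsets, key=abs)


# Listener near the eastern edge, emitter just across the seam.
print(audible_offset_x(emitter_x=2.0, listener_x=98.0, map_width=100.0))
```

From the primary listener the emitter looks 96 units to the left, but from the secondary listener it sits 4 units to the right, which matches what the player sees on screen.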

十、视角：环绕声与Atmos混音 · Perspective: Surround & Atmos Mixing

Cadet:

With Civ VII there were unique challenges to both surround mixing and Atmos mixing, given the zoom levels and aerial perspective of the map. Based on our listener placement, the sounds coming from our 3D static emitters were initially always playing in surround. This worked great when zoomed in, but didn't feel natural at all when zooming out. So we mixed the units in 2D, because hearing them in the back was really distracting for gameplay: the perspective is down and in front. We use the camera zoom RTPC to pan the full surround ambient mix to the front when zooming out, and we apply the same treatment to other sound sources. This fakes the spatialization so that it sits closer to how sound behaves in the real world and to what a player would expect.

对于《文明VII》，考虑到地图的缩放级别和俯视角透视，环绕声混音和Atmos混音都有独特的挑战。基于我们的听者定位，来自我们3D静态发声体的声音最初总是在环绕声中播放。这在缩放进去时效果很好，但缩放到全局时感觉完全不自然。所以我们将单位混音为2D，因为从后方听到它们对游戏来说非常分散注意力——透视方向是向下和向前的。我们使用摄像机缩放RTPC，在缩放出去时将完整的环绕环境混音平移到前方，并将此应用于其他声音源，以伪造空间化，使其更接近现实世界中声音的表达方式和玩家的期望。
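One way to picture the zoom-driven pan is as a crossfade that moves rear-speaker energy to the front as the camera pulls back. The sketch below is only an illustration under stated assumptions: `zoom` is a hypothetical normalised RTPC value (0.0 fully zoomed in, 1.0 fully zoomed out), the mix is reduced to one front and one rear amplitude, and the power-preserving curve is a choice made here, not something described in the talk.

```python
import math


def pan_to_front(front: float, rear: float, zoom: float):
    """Shift rear amplitude to the front as zoom goes from 0.0 to 1.0.

    Total power (front^2 + rear^2) is kept constant so the pan does not
    change the overall loudness of the ambient bed.
    """
    keep = 1.0 - max(0.0, min(1.0, zoom))  # share of rear power that stays in the rear
    rear_out = rear * math.sqrt(keep)
    front_out = math.sqrt(front ** 2 + (rear ** 2) * (1.0 - keep))
    return front_out, rear_out
```

At `zoom = 0.0` the surround image is untouched; at `zoom = 1.0` the rear amplitude is zero and all of its power has been folded into the front.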

For Atmos mixing, we leveraged Civ VII's move to Dolby Atmos in every area possible — such as the high-altitude wind biome, airplanes, natural disasters like hurricanes and thunderstorms, and also moving just a slight amount of music into the height channels to really open up the expansiveness of space.

对于Atmos混音，我们在每个可能的领域都利用了《文明VII》向杜比全景声的转变——比如高空风生物群落、飞机、飓风和雷暴等自然灾害，以及将少量音乐移入高度声道，以真正开阔空间感。

As we discussed earlier, we had a major issue with the Atmos spatial imaging. Because of the orientation of our listener, sounds that should have been heard in the rear speakers were coming from the height channels, and sounds that should have been in the rear height speakers were coming from the front heights. We didn't notice this until we had Atmos objects active, and most of our game doesn't consist of those. Bringing the listener back onto the camera while modifying its orientation to always be parallel to the map fixed our problem, and it made sure moving Atmos-emitting objects like airplanes felt really natural.

正如我们之前所讲，我们在Atmos空间图像方面遇到了一个重大问题。问题在于，由于我们听者的方向，应该从后方扬声器听到的声音从高度声道发出，而应该在后方高度扬声器中的声音则从前方高度扬声器发出。我们在激活Atmos对象之前并不知道这一点——而且我们的大多数游戏内容并不包含这些。将听者放回摄像机上，同时修改其方向使其始终平行于地图，修复了我们的问题，并确保了像飞机这样的移动Atmos发声对象感觉非常自然。
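Keeping the listener on the camera while forcing its orientation parallel to the map amounts to flattening the camera's forward vector onto the map plane and using the map's normal as "up". The following sketch assumes a Z-up world with the map lying in the XY plane; the function names and the fallback for a straight-down camera are illustrative assumptions, not the game's actual code.

```python
import math

MAP_UP = (0.0, 0.0, 1.0)  # assumption: the map lies in the XY plane, Z points up


def _flatten(v):
    """Project a 3D vector onto the map plane and normalise it, or return None."""
    x, y, _ = v
    length = math.hypot(x, y)
    if length < 1e-6:
        return None
    return (x / length, y / length, 0.0)


def listener_orientation(camera_forward, camera_up):
    """Return (forward, up) for a listener that stays parallel to the map.

    If the camera looks straight down, its forward vector has no horizontal
    component, so the camera's up vector (the top of the screen) is used.
    """
    forward = _flatten(camera_forward) or _flatten(camera_up)
    if forward is None:
        raise ValueError("camera vectors have no component in the map plane")
    return forward, MAP_UP
```

With this orientation, a sound behind the camera on the map stays in the rear speakers regardless of how steeply the camera is tilted, and only sources that are actually above the map plane, such as airplanes, reach the height channels.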

十一、最终混音 · The Final Mix

Cadet:

We started in Wwise with a "mix as you go" approach over a couple of years of development. Then came the point where we needed to do a solid pass on each category to dial in our LUFS standardization and get more consistency in preparation for the final mix.

我们在Wwise中从一个"边做边混"的方法开始，历经两年的开发时间。然后到了需要对每个类别进行全面梳理的时候，以调整我们的LUFS标准化，确保更多的一致性，为最终混音做准备。

Given that each age in our game becomes louder and louder due to technological developments and rapid expansion, we decided to take a snapshot of the LUFS across categories in each age as a starting point. It wasn't surprising to see that ambiences in Antiquity were quieter, but it was helpful to see just how much quieter. We also identified inconsistencies we may not want — such as some unit feedback sitting hotter in Antiquity than in the Modern era.

考虑到我们游戏中每个时代由于技术发展和快速扩张而变得越来越响，我们决定在每个时代拍摄各类别的LUFS快照作为起点。看到古代时期的环境音更安静并不令人惊讶，但看到它安静了多少是有帮助的。我们还识别出了一些我们可能不想要的不一致性——比如一些单位反馈在古代比在现代时期更响。

In the end, our integrated LUFS targets landed on exporting ambiences at -35 LUFS, music at -17 LUFS, and VO, Foley, UI, and sound effects at -23 LUFS. Foley sits at around -40 LUFS on average, with bigger actions up to around -28 LUFS. We then use a bus system to set their final target volumes.

最终，我们的集成LUFS目标是：环境音导出为-35 LUFS，音乐为-17 LUFS，配音、Foley、UI和音效为-23 LUFS。Foley平均约-40 LUFS，较大的动作最高约-28 LUFS，然后使用总线系统来设定它们的最终目标音量。
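A loudness snapshot becomes actionable once each measurement is compared against its target. The sketch below uses the export targets quoted above; the category keys, the function names, and any measured values passed in are hypothetical. Foley is left out because the talk gives it its own range.

```python
# Export targets from the talk, in integrated LUFS.
TARGET_LUFS = {
    "ambience": -35.0,
    "music": -17.0,
    "vo": -23.0,
    "ui": -23.0,
    "sfx": -23.0,
}


def gain_offsets_db(snapshot):
    """Gain in dB needed to move each measured category onto its target.

    `snapshot` maps a category name to its measured integrated LUFS.
    A positive result means the category needs to come up.
    """
    return {cat: TARGET_LUFS[cat] - measured for cat, measured in snapshot.items()}


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)
```

For example, a hypothetical music measurement of -20 LUFS yields an offset of +3 dB, and an ambience measurement of -31 LUFS yields -4 dB.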

十二、独特的动态系统 · Unique Dynamic Systems

Dmitri:

For the last topic on our final mix, I'm going to talk about some of our more unusual dynamic systems. During our mixing we used all the tools in our toolbox to keep a clear mix across the ages and all the chaos of war. We created a lot of systems, all reacting to each other, so we could always give the player the best feedback possible and always try to match the visual fidelity of our game.

对于最终混音的最后一个话题，我来讲讲我们一些不寻常的动态系统。在混音过程中，我们使用了工具箱中的所有工具，以在各个时代和所有战争混乱中保持清晰的混音。我们创建了很多系统，所有系统都相互响应，这样我们就能始终给玩家提供最好的反馈，并始终努力最好地匹配我们游戏的视觉保真度。

In the end we used quite a lot of interconnected systems: more than 200 buses, around 500 states, more than 70 attenuation profiles, 60 RTPCs, and so on. Some of the smaller systems are quite interesting, so we're going to touch on them.

最终我们使用了相当多的相互连接的系统——超过200个总线，大约500个状态，超过70个衰减配置文件，60个RTPC等等。一些较小的系统非常有趣，所以我们要简单介绍一下。

First: music side chaining. Music is a really important part of the identity of Civ games, so for the most part we don't lower the music as we normally would in any other game. Instead, we use side chaining to boost the other sounds as a function of the music's volume.

首先：音乐侧链。音乐是《文明》游戏身份认同的一个非常重要的部分，所以大多数情况下我们不会像在其他游戏中那样降低音乐音量。相反，我们使用侧链来根据音乐的音量来提升其他声音。
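The inverted side chain can be thought of as a curve from the music's metered level to a boost on the other buses. The thresholds and the maximum boost below are hypothetical numbers chosen for illustration, and the linear ramp is an assumption; the talk does not specify the curve.

```python
def sidechain_boost_db(
    music_db: float,
    threshold_db: float = -30.0,  # hypothetical: below this, no boost
    ceiling_db: float = -10.0,    # hypothetical: at or above this, full boost
    max_boost_db: float = 4.0,    # hypothetical upper limit of the boost
) -> float:
    """Boost (in dB) applied to non-music buses as the music gets louder."""
    if music_db <= threshold_db:
        return 0.0
    if music_db >= ceiling_db:
        return max_boost_db
    t = (music_db - threshold_db) / (ceiling_db - threshold_db)
    return t * max_boost_db
```

The music bus itself is never attenuated in this scheme; only the sounds competing with it are lifted, and only while the music is loud.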

Second: wonder reveals. Previously we would only know when we entered and exited a cinematic. So we use the music event to create a state indicating which wonder is being constructed, and then use that state to lower or mute most of the other sounds — including other wonders — during the cinematic.

第二：奇迹揭示。以前我们只知道何时进入和退出过场动画。所以我们使用音乐事件来创建一个指示正在建造哪个奇迹的状态，然后使用该状态在过场动画期间降低或静音大多数其他声音——包括其他奇迹的声音。
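The wonder-reveal logic is essentially a state that records which wonder's music is playing and a rule that ducks everything unrelated to it. In this sketch the class, the bus names, and the -96 dB "mute" value are all hypothetical.

```python
from typing import Optional

DUCK_DB = -96.0  # hypothetical value treated as "muted"


class WonderRevealState:
    """Tracks the wonder whose reveal cinematic is currently playing."""

    def __init__(self) -> None:
        self.active: Optional[str] = None

    def on_wonder_music_start(self, wonder: str) -> None:
        # Assumption: the music event carries the wonder's name.
        self.active = wonder

    def on_wonder_music_end(self) -> None:
        self.active = None

    def gain_db(self, bus: str, wonder: Optional[str] = None) -> float:
        """Gain for a sound on `bus`, optionally tagged with its wonder."""
        if self.active is None:
            return 0.0
        if bus == "music" or wonder == self.active:
            return 0.0
        return DUCK_DB
```

Tagging wonder sounds with their wonder's name is what lets the revealed wonder stay audible while other wonders are ducked along with the rest of the mix.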

Third: wildlife routing. Some of our buildings, resources, and wonders include animal sounds — they add a lot of immersion and cultural specificity. We love them, but we didn't want them to become too fatiguing. So we routed them to a different bus, so that as the player zooms out the wildlife gets lowered much faster than other sounds.

第三：野生动物路由。我们的一些建筑、资源和奇迹包含动物声音——它们增加了很多沉浸感和文化特异性。我们喜欢它们，但我们不想让它们变得过于令人疲倦。所以我们将它们路由到一个不同的总线，这样当玩家缩放到全局时，野生动物声音比其他声音降低得更快。
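Routing wildlife to its own bus lets it follow a steeper zoom curve than everything else. The two curves below are illustrative assumptions: `zoom` is a hypothetical normalised value (0.0 zoomed in, 1.0 zoomed out), and the cut depths and curve shapes are made up for the example.

```python
def zoom_cut_db(zoom: float, full_cut_db: float, exponent: float) -> float:
    """Attenuation in dB for a bus at a given zoom level.

    An exponent below 1.0 makes the cut arrive early in the zoom range.
    """
    z = max(0.0, min(1.0, zoom))
    return full_cut_db * (z ** exponent)


def general_bus_db(zoom: float) -> float:
    # Hypothetical curve for ordinary world sounds: gentle and linear.
    return zoom_cut_db(zoom, full_cut_db=-12.0, exponent=1.0)


def wildlife_bus_db(zoom: float) -> float:
    # Hypothetical curve for wildlife: deeper, and front-loaded.
    return zoom_cut_db(zoom, full_cut_db=-30.0, exponent=0.5)
```

At a quarter of the zoom range, this example has wildlife already down 15 dB while the general bus is down only 3 dB.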

And fourth: Foley combat state control. Our units share a lot of sounds by design — to keep everything performant and creation time under control. But that meant that some of our combat Foley was also reused for idles, so we didn't have mix control over them independently. We used a similar approach to continuous combat — an RTPC that lowers the volume of the Foley when the unit is in idle state versus when it's in combat.

第四：Foley战斗状态控制。我们的单位通过设计共享很多声音——为了让一切保持高性能和控制创建时间。但这意味着我们的一些战斗Foley也被重用于待机状态，所以我们没有独立的混音控制。我们使用了类似于持续战斗的方法——一个RTPC，当单位处于待机状态时降低Foley的音量，而当它处于战斗状态时保持正常音量。
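A per-unit RTPC for shared Foley can be sketched as a value that ramps between idle (0.0) and combat (1.0) and is mapped to a volume offset. The ramp time and the idle cut are hypothetical values, and the smoothing itself is an assumption added here to avoid audible jumps.

```python
def step_rtpc(current: float, in_combat: bool, dt: float, ramp_seconds: float = 0.5) -> float:
    """Move the unit's RTPC toward 1.0 (combat) or 0.0 (idle) at a fixed rate."""
    target = 1.0 if in_combat else 0.0
    max_step = dt / ramp_seconds
    delta = max(-max_step, min(max_step, target - current))
    return current + delta


def foley_offset_db(rtpc: float, idle_cut_db: float = -9.0) -> float:
    """Volume offset for shared Foley: full level in combat, cut when idle."""
    return (1.0 - rtpc) * idle_cut_db
```

Because the offset is driven per unit, the same Foley asset can play at full level on a unit that is fighting and at the reduced level on an idle unit standing next to it.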

Those are a few examples of the systems that we had a lot of fun designing and putting in place to achieve our mix.

这些是我们在设计和部署以实现我们的混音时非常享受的几个系统示例。

结语 · Closing

Thank you to Audio Canada for having us, and we also want to give a shout-out to Dolby for all the support we've gotten from them. Thank you to everyone who contributed to the creation of Civ VII's audio, and thank you all for being here today. We really appreciate it.

感谢Audio Canada的邀请，也要向Dolby给予我们的所有支持致谢。感谢所有为《文明VII》音频创作做出贡献的人，也感谢大家今天的到来。我们真的非常感激。