StudioTech TV

Helping create great internet video

TTFN Video System: Apple Mac Pro configuration – The Ultimate Telestream Wirecast System?

The decision on which video switching/mixing system to use at TTFN has been made, and we are going to build what we hope will be the Ultimate Telestream Wirecast system!

While we don’t have the Blackmagic Decklink Duo (twin SDI inputs on a single PCI card) yet, we have started commissioning the new Mac Pro. It will not only significantly reduce the time it takes to edit and compress TTFN videos but will also act as our live video mixing system using Telestream Wirecast 4.

For input we will initially use two Blackmagic Design Intensity Pro PCI adapters, each providing a single input that can be HDMI, component, S-Video or composite video.

To improve the overall performance of the system we will install two 4GB Kingston memory modules.

For storage we are adding a Western Digital 1TB Caviar Black drive. This is the same drive Apple supplies as the base drive in the Mac Pro (unless you opted for an SSD).

Opening the Mac Pro box revealed a single box of accessories containing the wired keyboard, a USB extension cable for the keyboard, a power cable, two DVDs (OS X Install and Applications) and a couple of small booklets, including ‘Everything Mac’, a basic introduction to the Mac Pro. I have opened and installed many different systems over the years, and Apple have certainly created an outstanding packaging system that is good for the user.

Apart from the packaging, the only other item in the box was the Mac Pro itself. First impressions, even before removing the wrapping, were of quality (the weight and feel of the handles) and that it was much bigger than expected!


The side panel is removed by lifting a lever on the back panel, revealing the power supply at the top, four removable 3.5″ drive holders, the PCI expansion card area (three spare slots, one already used by the ATI Radeon 5870 graphics adapter) and, at the bottom, the processor tray.

The processor tray is held in by two latches at the bottom of the case; once they are opened, the tray slides out easily. Depending on the Mac Pro configuration, the tray will have one or two processors (underneath the very large heatsinks). Each processor has a bank of four memory slots. Our system came with three 1GB memory modules per processor, a total of 6GB. We removed the first 1GB module on each processor and replaced it with a Kingston 4GB module, giving 6GB per processor, 12GB in all. We did not put the removed 1GB module into the spare slot, as the last two slots share a memory channel and this can reduce performance.
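The memory arithmetic above can be checked with a short Python sketch. The processor labels "CPU A" and "CPU B" are hypothetical names used only for illustration, and each list holds the module sizes (in GB) in one processor's occupied slots.

```python
# Module sizes in GB for each processor's occupied slots, as shipped.
original = {"CPU A": [1, 1, 1], "CPU B": [1, 1, 1]}

def upgrade(slots, new_size_gb):
    """Replace the first module with a larger one, leaving the others in place."""
    return [new_size_gb] + slots[1:]

# Swap the first 1GB module on each processor for a 4GB module.
upgraded = {cpu: upgrade(slots, 4) for cpu, slots in original.items()}

for name, config in (("original", original), ("upgraded", upgraded)):
    per_cpu = {cpu: sum(slots) for cpu, slots in config.items()}
    print(name, per_cpu, "total:", sum(per_cpu.values()), "GB")
```

Note that the upgraded lists mix 4GB and 1GB modules, which is the point the first comment below takes issue with.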

Next we unpacked the Intensity Pro cards. The large breakout cable allows you to connect analogue video sources such as component, S-Video and composite video to the card, and to output analogue signals as well.

The two cards were easy to install after loosening the thumbwheel screws and removing the locking plate. The trickiest bit of the whole configuration was putting the locking plate back! Even so, this only took a couple of minutes.

The final part of the configuration was to add the second 1TB disk drive. Having a fast drive to write video files to, especially when recording a live show, is very important. This drive should be fast enough; if not, we will add a faster drive, such as a Western Digital Velociraptor or even a solid-state drive, in one of the two spare bays.

The hardware configuration of this system was very straightforward, the easiest system I have ever configured (and I have done quite a few over the years!). The Apple Mac Pro is a quality, beautifully engineered, elegant system. The real proof will, of course, be in its performance!

Now to start installing software...

4 comments
Hilton Travis

G’day Mark,

I need to bring you up to speed on how RAM performance works with Intel chipsets and Nehalem/Westmere CPUs, as it seems you’re a little confused here.

Without knowing exactly which chipset (or which CPUs) is in your Mac Pro, as there’s no mention of this in your article, I’ll need to be a little more general than I could be if I had more information about your particular system.

The exact type of CPU (and chipset) determines whether the system can use single/dual or single/dual/triple channel RAM, with performance increasing as you go up. To reach maximum performance, the banks need to be filled with identical RAM modules (at least as far as timing and access patterns go; it is best and easiest to use identical modules). Now, as each CPU contains the memory controller that addresses its directly attached RAM, each CPU can effectively deliver single, dual or triple channel RAM performance, depending on how its own RAM is configured.

As the Mac Pro has 4 slots per CPU, and 4 doesn’t divide nicely into 3 channels, anyone who wants RAM performance over RAM size ends up ignoring the 4th slot. How does this work exactly?

We’ll work on a single CPU for the explanation below, because each CPU manages its own attached RAM; if you have a dual CPU system, you need to do the same for each CPU.

First, and most importantly, all of the information below involves *IDENTICAL* RAM modules. You cannot mix and match size, speed, refresh rates or anything else unless you want to drop back to single channel speeds. This is important.

So, assuming you install one 1GB module in the first bank, you’ll get 1X RAM speeds, i.e. the CPU can address the single RAM module at its maximum speed: Single Channel. This may sound ideal, however the CPU can issue data requests much faster than the RAM can handle them, so the CPU is bottlenecked by the RAM module. To go faster than this, we need Dual Channel.

To run RAM in Dual Channel mode, you need identical RAM modules in the first and second slots. The CPU will then access *each* RAM module at its maximum speed and interleave access requests, so it will access the first module, then the second, then the first, second, first... This results in RAM access speeds of around double those of a single module. (If you install 2 * non-identical modules, you’ll have 2 different banks of Single Channel RAM, which will be accessed at 1X RAM speeds.) To go faster again...

Triple Channel RAM is the next step after Dual Channel: there are 3 * identical RAM modules installed in the first 3 slots, so the CPU talks 1, 2, 3, 1, 2, 3, 1, 2, 3, 1... to the RAM modules, allowing it to access the whole RAM subsystem at around 3X the speed of a single module. This is nice! 🙂 (If the modules aren’t identical, then you have 3 banks of 1X RAM speed, not 1 bank of 3X RAM speed, resulting in an overall reduction in the performance of the RAM subsystem.)
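The 1, 2, 1, 2 and 1, 2, 3, 1, 2, 3 patterns described above can be modelled with a small Python sketch. This is a deliberate simplification: it assigns consecutive requests to channels round-robin, whereas a real memory controller maps physical addresses to channels using address bits. It only illustrates why identical channels can share the load evenly.

```python
from collections import Counter

def channel_sequence(num_requests, num_channels):
    """Channel (1-based) serving each of a run of consecutive requests,
    assuming a simple round-robin interleave across identical channels."""
    return [(i % num_channels) + 1 for i in range(num_requests)]

for channels in (1, 2, 3):
    seq = channel_sequence(12, channels)
    # Counter shows how evenly the 12 requests are spread across channels.
    print(channels, "channel(s):", seq, dict(Counter(seq)))
```

With three channels, each channel serves only a third of the requests, which is where the roughly 3X figure in the comment comes from, assuming each channel runs at full speed in parallel.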

Now, if you add a 4th identical module, the CPU will drop back to 2 * Dual Channel banks of RAM, maximizing the amount of installed RAM with a 33% sacrifice in overall speed, while still running at 2X the RAM speed of a single module. This obviously has its benefits (i.e. more RAM).

So, if in Triple Channel mode, you’re getting around 9600 MB/sec in RAM subsystem performance, you’ll get around 6200 MB/sec in Dual Channel mode (using either one pair or two pairs of identical modules) and around 3100 MB/sec in Single Channel mode.

Clearly, running 3 * identical RAM modules in Triple Channel mode gives the fastest possible RAM subsystem performance while sacrificing only a little (25%) of the maximum RAM that can be installed in the system.
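The capacity-versus-bandwidth trade-off can be worked through in Python using the approximate per-CPU figures quoted above (3100, 6200 and 9600 MB/sec). This takes the commenter's model at face value, namely that a 4th module drops the CPU to dual channel. Apple's own note, quoted in the reply below, describes the drop from populating slot 4 as only slight, so treat these as the commenter's estimates rather than measurements. The 4GB module size is an illustrative assumption.

```python
# Approximate per-CPU RAM bandwidth figures quoted in the comment (MB/sec).
BANDWIDTH_MB_S = {"single": 3100, "dual": 6200, "triple": 9600}

def tradeoff(module_gb):
    """Compare 3 identical modules (triple channel) with 4 (dual channel)."""
    three = {"capacity_gb": 3 * module_gb, "bandwidth": BANDWIDTH_MB_S["triple"]}
    four = {"capacity_gb": 4 * module_gb, "bandwidth": BANDWIDTH_MB_S["dual"]}
    capacity_given_up = 1 - three["capacity_gb"] / four["capacity_gb"]
    bandwidth_given_up = 1 - four["bandwidth"] / three["bandwidth"]
    return capacity_given_up, bandwidth_given_up

cap_loss, bw_loss = tradeoff(4)
print(f"Triple channel (3 modules) gives up {cap_loss:.0%} of capacity")
print(f"Four modules (dual channel) give up {bw_loss:.0%} of bandwidth")
```

With the quoted figures, the bandwidth loss works out to about 35%, close to the one-third that follows from going from three channels to two.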

Now, you *CAN* run 3 * 4GB modules on one CPU and 3 * 1GB modules on the second CPU and still have each CPU running in Triple Channel mode. However, this leaves the whole RAM subsystem unbalanced: the 2nd CPU will need to ask the 1st CPU for data more often than it would if the RAM were balanced (say 3 * 4GB or 3 * 2GB modules on each CPU), making the whole system a little slower in general, depending on what’s running on each CPU.
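One rough way to picture the imbalance is to compute what fraction of total RAM sits on each CPU. The following Python sketch assumes, as a simplification, that the operating system spreads data in proportion to installed capacity. How Mac OS X actually places memory is not described in this thread, and the CPU labels are hypothetical.

```python
def local_share(gb_per_cpu):
    """Fraction of total installed RAM attached to each CPU."""
    total = sum(gb_per_cpu.values())
    return {cpu: gb / total for cpu, gb in gb_per_cpu.items()}

unbalanced = local_share({"CPU 1": 3 * 4, "CPU 2": 3 * 1})
balanced = local_share({"CPU 1": 3 * 2, "CPU 2": 3 * 2})
print(unbalanced)  # {'CPU 1': 0.8, 'CPU 2': 0.2}
print(balanced)    # {'CPU 1': 0.5, 'CPU 2': 0.5}
```

Under that assumption, a thread running on CPU 2 in the 3 * 4GB / 3 * 1GB layout would find only about a fifth of the data in its own local RAM.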

Of course, RAM subsystem performance is only one component of overall system performance; however, optimizing each component to work well and in balance with the other subsystems gives the best performance at the best price point. For example, there is no point running a crazy fast HDD subsystem if you’re running only Single Channel RAM, as the RAM will bottleneck the fast HDD subsystem.

So, right now, as you’re running unmatched (non-identical) RAM modules in the first 2 banks, your Mac Pro’s RAM performance will suffer, as it will be running only at Single Channel speeds. You’ve done this on each CPU, so both CPUs are accessing their RAM at Single Channel speeds (around 3100 MB/sec in the example above, whereas they could be performing at around 9600 MB/sec in Triple Channel mode).

I hope this helps clear up how Intel-based Nehalem/Westmere systems access their RAM and enables you to get the maximum performance from your Mac Pro setup.

    Mark

    Hilton,

    many thanks for taking the time to post, and what a great and useful post!

    The Apple documentation talks about mixing DIMMs and the importance of putting them in the correct slots. It doesn’t talk anywhere about maximising memory performance; however, I did find this useful guide on the Apple website:

    Mac Pro (Early 2009 and Mid 2010) – Memory DIMMs – Replacement Instructions

    This states “Each processor’s memory controller has three memory channels. DIMM slots 1, 2, 5, and 6 have their own channels; slots 3 and 4 share a channel and slots 7 and 8 share a channel.” and “Note: Populating slot 4 or 8 slightly drops maximum memory bandwidth, but depending on the applications used, overall system performance may benefit from the larger amount of memory.” Hence my comment on not using the fourth slot. The document talks about mixing DIMM sizes and the need to populate slots in a certain order but does not mention any performance issues.
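The slot layout Apple describes can be written down as a lookup table, which makes it easy to check whether a given set of populated slots puts two DIMMs on one channel. In this Python sketch, the assignment of slots 1 to 4 to processor A and slots 5 to 8 to processor B, and the channel numbers 1 to 3, are assumptions for illustration. Only the "own channel" versus "shared channel" grouping comes from the quoted Apple text.

```python
# Grouping from the quoted Apple guide: slots 1, 2, 5 and 6 have their own
# channels; slots 3+4 share a channel and slots 7+8 share a channel.
# Assumed (hypothetical) labels: processor "A" = slots 1-4, "B" = slots 5-8.
SLOT_CHANNEL = {
    1: ("A", 1), 2: ("A", 2), 3: ("A", 3), 4: ("A", 3),
    5: ("B", 1), 6: ("B", 2), 7: ("B", 3), 8: ("B", 3),
}

def shared_channels(populated_slots):
    """Return the (processor, channel) pairs that carry more than one DIMM."""
    counts = {}
    for slot in populated_slots:
        key = SLOT_CHANNEL[slot]
        counts[key] = counts.get(key, 0) + 1
    return sorted(key for key, n in counts.items() if n > 1)

print(shared_channels([1, 2, 3, 5, 6, 7]))        # []
print(shared_channels([1, 2, 3, 4, 5, 6, 7, 8]))  # [('A', 3), ('B', 3)]
```

Three DIMMs per processor leave every channel with a single DIMM, while filling the fourth slot doubles up the shared channel, which matches Apple's note about slots 4 and 8.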

    From your useful post I can see that running 3 identical DIMMs per CPU should give the best memory performance. But the question is whether the overall performance of my 12GB (1×4, 2×1 per CPU) single channel mixed system is better than that of the original 6GB (3×1 per CPU) system. Of course, 24GB (3×4 per CPU) would be even better!

    From research on the Apple forums this morning, using the fourth slot is reported to reduce overall memory performance compared with 3 DIMMs in triple channel mode.

    I have installed and run Geekbench on the system to see what it tells me about system and memory performance.

    For the 12GB config (1×4GB + 2×1GB per CPU): http://browse.geekbench.ca/geekbench2/view/344196

    For the 6GB config (3×1GB per CPU): http://browse.geekbench.ca/geekbench2/view/344214

    While overall system performance was down a little with the 8GB configuration the memory performance increased from 4880 to 4985, about a 2% improvement. This seems very small for the move from single channel to triple channel operation.
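The "about 2%" figure can be checked in Python from the two Geekbench memory scores quoted above:

```python
def percent_change(before, after):
    """Percentage change from one benchmark score to another."""
    return (after - before) / before * 100

change = percent_change(4880, 4985)
print(f"{change:.1f}%")  # 2.2%
```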

    Any suggestions on a better way to measure memory performance?

    Thanks again for the post; this has been educational!

    Mark

Hilton Travis

G’day Mark,

You’ve not actually used a Triple Channel configuration to test here at all. You say you used your 12GB setup (4+1+1 per CPU) which is Single Channel and an 8GB setup (I’m assuming 4 per CPU) which is also Single Channel. That’s why the RAM speeds were within 2% of being identical.

What you need to do to test this properly is run 1+1+1 per CPU (6GB total, as originally supplied) and test that (Triple Channel), then remove the 3rd stick and run 1+1 per CPU (4GB total) and test that (Dual Channel). Yes, RAM size will be smaller, so overall system performance will be down, but we’re comparing *ONLY* the RAM speed here. There are no other configurations you can test with the RAM you have there (other than a single 1GB per CPU, which should give similar speeds to your 8GB test above) that will show you the difference between Single, Dual and Triple Channel RAM performance.
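The test plan above can be summarised with a small Python function. The function encodes the rules of thumb given in this thread: identical modules give 1-, 2- or 3-way interleave, 4 identical modules drop back to dual channel, and mixed modules fall back to single channel. It is not a specification of Intel's or Apple's documented behaviour. Describing modules as plain strings such as "1GB" is a hypothetical simplification; "identical" in practice also covers speed, timings and other parameters.

```python
def channel_mode(modules):
    """Classify one CPU's DIMMs (in slot order) using this thread's rules of thumb."""
    if not modules:
        return "empty"
    if len(modules) > 4:
        raise ValueError("a Mac Pro CPU has only 4 DIMM slots")
    if len(set(modules)) > 1:
        return "single"  # mixed modules: no interleaving under this model
    return {1: "single", 2: "dual", 3: "triple", 4: "dual"}[len(modules)]

test_plan = {
    "1+1+1 per CPU (6GB total)": ["1GB"] * 3,
    "1+1 per CPU (4GB total)": ["1GB"] * 2,
    "1 per CPU (2GB total)": ["1GB"],
    "4+1+1 per CPU (12GB total)": ["4GB", "1GB", "1GB"],
}
for label, modules in test_plan.items():
    print(f"{label}: {channel_mode(modules)} channel")
```

Running a memory benchmark on each configuration in this plan and comparing the results against the predicted modes is one way to check whether the model holds on this particular machine.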

And yes, the 3rd and 4th slots shouldn’t be used simultaneously *UNLESS* you can afford to sacrifice RAM speed for RAM size, as I mentioned in my original reply.

I don’t know of a better way to test RAM performance on a Mac as I’m not a Mac user, sorry. It is just that the Mac hardware is now so similar to standard PC hardware that the exact same rules apply here to RAM configuration and performance (as well as a great many other subsystems).

Now, if you can borrow an additional 4GB module for testing porpoises, that would be handy; then you could run 4+4+4 on CPU1 and 1+1+1 on CPU2 and see what that gives. With the Nehalem/Westmere CPUs (the new Core i7 2011 CPUs only run Dual Channel RAM, stupidly), when you buy RAM, you need to look at triples, not pairs, per CPU. With some of the lower-specced Nehalem/Westmere and all the new Core i7 2011 CPUs, buying in pairs is the way to go.

    Mark

    Hi Hilton,

    Apologies, there was a typo in my previous reply: I did test with 6GB, not 8GB.

    I was surprised that the memory test result did not improve significantly. The six 1GB modules are all identical, as supplied by Apple.

    Will have to dig into this a bit more. Thanks again.

    Mark