Voice Over IP for Linux Games? 147
fathom asks: "A few friends and I are attempting to move all of our gaming from the Windows platform to whatever distro of Linux we like. For the most part we've all had great success: just about everything we play is fine under Linux. However, there is one major drawback: we don't know of any software for Linux to do Voice Over IP like BattleCom, Roger Wilco, and GameVoice. Are there any programs out there like this for Linux?" Why limit this to Linux? What Voice Over IP software is out there for any Unix that's flexible enough to work for other applications as well as games? We ran a similar question about a year ago; has anything changed since then?
Speak Freely (Score:1)
Re:VoIP in Java (Score:1)
Re:You forget cost/profit analysis (Score:1)
1. Why would I care about portability? Aside from my own personal desire to run a game on Linux, there's no reason. It isn't going to increase my profits in any significant way, and it's going to take time away that I could be using to do other things (like fix outstanding bugs).
2. SDL doesn't really do anything 3D-wise as you imply. What support it does have is OpenGL-centric which means I can't use the latest DirectX 8 features. DirectX is pretty good these days. If you haven't messed with it recently you should take another look.
3. The argument about MFC/Qt is fairly moot. My company, for the most part, uses neither. Almost everything we'd use MFC for is available in the STL, and the rest (GUI stuff, such as our menus and interface code) is done in-house.
Overall the codebase I work on IS fairly portable. But there's more to putting out a port than just recompiling.
The whole argument is time vs. money. If the extra time spent will generate enough profits to offset it, then porting a game gets considered. In most cases, however, the extra time for porting it, testing the port, keeping the port synced with the Win32 version, and supporting the port is not worth it (with the notable exception of porting a game to a console such as the PS2 or Xbox).
Re:Responsibility of Game Publishers (Score:1)
I see you are another GNU/Linux zealot. I'm not sure if you realise a few things. First of all, if you speak/write like that to a game publisher and/or developer, you will piss them off and they'll just ignore you. I would just ignore you too. The publishers are not necessarily the ones that will provide a GNU/Linux version of a game. If it were profitable to have GNU/Linux ports of games on a regular basis, it would be done. Most game developers and publishers have nothing against GNU/Linux. A publisher wouldn't care if they published C64 games, as long as it made a profit. The problem is that not enough people will purchase them. Secondly, it can take a lot of work to port a game, depending on how heavily it depends on MS Windows APIs. Sure, they could use cross-platform libraries such as Qt, wxWindows, and WineLib, but those will generally slow down the development process and make the game run slower. The point of making a game is to make money, not to support some OS. I like GNU/Linux a lot, but from a game/profit point of view, GNU/Linux might as well be a TRS-80.
Re:VoIP in Java (Score:1)
I'm not so sure you know what a Java VM is, then. In Windows it's a collection of DLLs that bytecode runs on top of. This alone doesn't make it a memory hog. Go to the Tomcat [apache.org] website. Tomcat is a very nice web app server running on a Java VM with a memory footprint of around 10 megs when processing JSPs and servlets and serving content, and much less when sitting idle waiting for requests.
Of course UnrealScript doesn't run in a "Java"VM, but it does run in a virtual machine.
And your Python example is nice, but way too stripped down to realistically use for a VoIP app. Like the original post said, most of the necessary stuff is already in the existing Java APIs.
With your embedded VM, you would be starting from scratch.
Try these out (Score:3)
sipc [columbia.edu]
Video Conferencing for Linux [hananet.net]
Voice over IP technologies are the same as those used for video conferencing, but with audio codecs only. The two VoIP/VideoConf standards for call setup and control are H.323 and SIP.
Re:Speak Freely for Unix (Score:1)
H.323 (Score:2)
I was also able to use openmcu to provide rudimentary group conference services. This is also packaged for Debian.
BTW, with ohphone you can talk with Windows NetMeeting users.
Re:Responsibility of Game Publishers (Score:1)
Also, multiplatform programming subjects the development process to constraints that it might not otherwise have. These constraints ultimately benefit the development process by making it more structured.
Besides, a popular PC game is bound to be ported to "one of those other non-DirectX platforms" sooner or later.
In fact, EA's most profitable game platforms don't even run DirectX... (just read their annual reports)
The managers are the bottleneck, not the developers.
Re:Let's avoid being ANAL (Score:1)
Re:Have you considered... (Score:1)
^
|
(Brain cells overloading)
Game publishers need $ figures (Score:1)
True, almost all of these games had some weird circumstance that may have contributed to the poor sales figures (like release timing vs. win32 version, etc), but regardless of the circumstances, Linux games just don't sell.
And let's not forget the difficulty of deploying and supporting PC games on Linux. DirectX may not be perfect, but it's a huge, huge positive for Win32, especially in terms of installation and support -- and it's getting better all the time. OpenGL is nice as well, and cross platform to boot, but try providing a concise, clear set of instructions (or a single installer) that gets accelerated OpenGL running on Linux across a wide variety of consumer graphics cards. It's just not there. Also, I'm sure the various small differences between Linux distros can prove to be a headache as well.
A game platform needs some sort of central authority controlling the feature sets, the quality of drivers, and a consistent, usability tested installer. This central authority provides a lot of platform stability while sacrificing a small amount of absolute developer freedom. Microsoft provides this for Windows, and it works remarkably well considering the wide variety of hardware and software configurations out there. No one entity is centrally controlling and planning the gaming experience on Linux, and this is bad.
Re:Compression too! (Score:1)
I actually have used something like this, but with netcat execing an mp3 decoder with stdout going to a socket. The other end of the socket is on a computer too slow to decode MP3s in real time, running rawplay with stdin coming from a socket.
The slow machine (a P75) has bad memory bandwidth too, so the 1.5Mb/s data stream still slows it down noticeably. Fortunately, I'm shuffling around my computers, and I'm retiring the P75.
#define X(x,y) x##y
Re:VoIP in Java (Score:2)
Another reason for using Vorbis is that it will keep Fraunhofer from suing you for using an MP3 encoder.
#define X(x,y) x##y
Team up w/ Apple (OS X) - MORE BUYERS! (Score:1)
VoIP in Java (Score:4)
I wrote a trivial test app using the java sound API to make a VoIP program. It didn't implement any kind of standard, and it was completely insecure, but it worked after a relatively small amount of effort and it performed really well.
Java Sound passes just about everything through to the card, so Java vs. C didn't really come into play much. All I did was decide that one machine was going to play server; everyone who connected to that machine got their byte streams mixed using the Java Mixer, and the mixed stream was sent back.
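That mixing loop really is all the server has to do. Here's a rough sketch of the same idea in Python rather than Java (the port number and packet format are made up, and there's no timing or jitter buffering - just the core of it):

    # Hypothetical mixing server: take one audio packet from each client,
    # sum the 16-bit PCM streams, clip, and send the mix back to everyone.
    import socket
    import struct

    PORT = 9999                      # assumed port, not from the original test app
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))

    pending = {}                     # client address -> latest raw packet
    while True:
        data, addr = sock.recvfrom(4096)
        pending[addr] = data
        if len(pending) < 2:
            continue                 # wait until at least two clients have spoken
        # Decode packets as signed 16-bit little-endian PCM and sum them.
        length = min(len(p) for p in pending.values()) // 2
        mixed = [0] * length
        for packet in pending.values():
            samples = struct.unpack("<%dh" % length, packet[:length * 2])
            mixed = [m + s for m, s in zip(mixed, samples)]
        # Clip to the 16-bit range and send the mix back to every client.
        mixed = [max(-32768, min(32767, m)) for m in mixed]
        out = struct.pack("<%dh" % length, *mixed)
        for client in pending:
            sock.sendto(out, client)
        pending.clear()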
I'm up to my neck in projects right now, but if someone wanted to lead it, I'd submit code and experience. Then we wouldn't have to worry about platform at all.
Jason
Re:You forget cost/profit analysis (Score:2)
Also, porting to the Mac tends to be an afterthought. As in, let's worry about making the "real game" (on the primary platform) as freaknasty as possible, and think about possibly offloading the code on a Mac developer later.
And finally, regarding developers that wish they could be working in Linux, I presume you weren't referring to artists, level designers, testers or producers. ;^) Sorry to be brutally honest, but hey, remember the PS2 does run Linux kick-assedly (although it's only been available in Japan [playstation.com]... so far [fakeroot.net])!
Summary of current market conditions: (Score:2)
JOE LUNIX
You should make your upcoming game for Linux. It is technically
superior in many obscure ways.
PC GAME DEVELOPER
Sorry, we could only pick one operating system, and it turned out
to be Windows. Better luck next time.
Re:Let's avoid being ANAL (Score:1)
/Janne
Re:Let's avoid being ANAL (Score:1)
If you tell me to use 'file transfer protocol', I'll assume that you mean FTP (and that you are - for some reason - on a let's-expand-all-acronyms trip). If you say to use 'a file transfer protocol', I'll probably start by trying ssh, as it is safer, then FTP if ssh doesn't work (and rsh never, ever).
In the same vein, if you are talking about VoIP, you are talking about that specific protocol. If you are talking about 'voice over IP', that could mean anything (well, as long as it's about voice over IP, that is). This is even more so for 'voice over IP' than for 'file transfer protocol', as FTP is a far better known, more widely used term than VoIP.
So, VoIP is a specific protocol, 'voice over IP' is a technology application with many potential specific solutions.
Then we could get into the whole area of commoditisation of terminology (think thermos), but that would lead this post far beyond what is reasonable...
/Janne
Re:Wow! (Score:1)
Seriously, this is of course a rather pointless, unimportant argument, and (as a previous poster implied) this will all be sorted out through the normal language mechanisms of common usage. However, pointless, unimportant arguments can be rather fun (as nobody gets seriously offended), so here's my 2 cents:
The point made above that 'Voice over IP' was capitalized is an entirely valid one (and I totally missed that). When the poster talks of 'common usage', however, I feel it breaks down somewhat as (unlike FTP) the term simply isn't very common.
/Janne
Re:[OT] FFT. (BTW) (Score:1)
Likewise.
I should probably email you with my email address, as these articles will be archived soon, preventing further comments (especially the antimatter propulsion one). Is your listed email address accurate (modulo spam-removal)?
[OT] FFT. (Score:2)
Out of curiosity, why?
Dabbling in signal processing and audio/video compression is one of my hobbies, though I'm only an amateur.
Re:[OT] FFT. (Score:2)
I probably should have mentioned that I already know how the FFT works; I was just wondering why it would be unsuitable.
I'd get around it producing twice as much data by discarding half of it. The real and imaginary components of the spectrum of a purely real signal will be symmetric and antisymmetric, respectively, so I can discard half of each and then reconstruct them before I do the IFFT when unpacking the compressed data.
The DCT can be mathematically derived from the FFT by doing similar tricks, as you're probably already familiar with (you treat the input waveform as half of a symmetric signal, which has no imaginary component in its twice-as-long FFT).
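As a quick sanity check of that symmetry claim, here's a sketch using numpy, which packages exactly this trick as rfft/irfft: keep only the non-redundant half of the spectrum of a real signal and you still get the original samples back.

    # Sketch: the spectrum of a purely real signal is conjugate-symmetric,
    # so half of it (plus the DC and Nyquist bins) reconstructs the signal exactly.
    import numpy as np

    x = np.random.randn(1024)               # a real-valued "audio" block
    half = np.fft.rfft(x)                    # only N/2 + 1 complex bins kept
    x_back = np.fft.irfft(half, n=len(x))    # rebuild the full signal
    print(np.allclose(x, x_back))            # True: no information was lost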
Out of curiosity, what is the basis behind the MDCT? I've heard of it but haven't seen an explanation of how it works.
[OT: I replied to your antimatter post with some of the information you asked for. Short version: It's really, *really* expensive to produce, and solar radiation isn't energetic enough to help.]
Re:[OT] FFT. (Score:2)
Actually, the FFT is quite fast. It works in near-linear time on the data (O(n log n)). It's the DFT that's the slow version.
Re:[OT] FFT. (Score:2)
This would be the case if I was throwing away all of the real or all of the imaginary data, but I'm keeping half of each.
The spectrum really is perfectly symmetric for the real component and antisymmetric for the imaginary component. Deviations from this symmetry would cause an imaginary component to exist in the input signal, which can a priori be assumed not to be the case. Phase in the real signal is encoded in the ratio of real and imaginary spectrum components for a given frequency.
The handwaving argument from information theory is that because I'm only encoding N values for N samples (the real components of the samples), I only need N components out to retrieve all of the information (the real and imaginary components of half of the spectrum). The full spectrum has a factor of two redundancy.
Again, the DCT makes the same assumptions I do in throwing away this data (it just does some additional tweaking on top of it). I can show the derivation if you like, but you probably already have it on file.
Your signal processing library is more complete than mine.
BTW - I'm looking for a way of doing the equivalent of an FFT on a non-power-of-2 number of samples. Zero-padding produces an interpolated spectrum containing more information than I need. Any other scheme I can think of takes as much work as a DFT of the samples would. Any thoughts on this?
Tribes2 uses GSM (Score:5)
Loki had the same sort of problems when they ported Tribes 2. They switched over to a freely available GSM codec (from a university in Germany). It worked so well they're adding the code to the Windows version so you can chat between versions.
Re:sure it's been done ... (Score:2)
Re:*nix based Voice over IP is easy! (Score:1)
For the remote speaker, we tried running esdrec and then esdmon | lame ... | nc broadcaster-host 12345.
For the broadcaster, we joined the remote speaker into the stream by: nc -l 12345 | mpg123 -s - | esdcat.
The only problem was that it had a latency of about 5 seconds :)
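Most of that latency is buffering in the pipes and the MP3 framing rather than the network. A hedged sketch of the lower-latency route (raw PCM in small UDP packets, played as they arrive) - the device name, chunk size and port here are assumptions, not part of the original setup:

    # Sketch: low-latency raw-PCM streaming over UDP (OSS /dev/dsp assumed).
    # Small packets keep buffering - and therefore latency - down.
    import socket, sys

    CHUNK = 1024                                  # bytes per packet (made up)
    DEST = (sys.argv[1], 12345)                   # broadcaster host, same port as above

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with open("/dev/dsp", "rb", buffering=0) as mic:
        while True:
            sock.sendto(mic.read(CHUNK), DEST)    # no pipe, no MP3 framing delay

The receiving end just mirrors this: bind a UDP socket on port 12345 and write each packet straight to /dev/dsp opened for writing.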
Add voice to old games with viavoice (Score:2)
With a text-to-speech system, you can get voice output without having to worry about bandwidth issues, poor quality sound, or people without a microphone.
With Netrek's RCD macro system, it's pretty nifty the things you can do. For example, a player who is in a base is hurt, and pushes a single key for generic distress, causing everyone on their team to get a message like:
F0->FED Help(SB)! 0% shd, 50% dmg, 70% fuel, WTEMP! 9 armies!
But your client will speak, "Base hurt, weapon temped", because all those numbers are a pain to listen to. Later the base is ok, so he pushes the same key.
F0->FED Help(SB)! 99% shd, 0% dmg, 99% fuel, 1 army!
Now the client just speaks, "Base is ok". The macros can have "if" statements based on the relevant values, e.g. if(damage>50) "hurt" else "is ok". It's a lot faster to just push a key than to say the relevant information. And if you don't have all the noise, you don't lose text communication with your teammates.
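In case the macro idea isn't obvious, here's a rough sketch of the logic (the field names, thresholds and phrasing are made up for illustration; this isn't actual RCD syntax):

    # Sketch: condense a numeric distress report into a short spoken phrase.
    # Field names and thresholds are illustrative, not real Netrek RCD syntax.
    def summarize(report):
        if report["damage"] > 50:
            return "Base hurt" + (", weapons overheated" if report["wtemp"] else "")
        return "Base is ok"

    print(summarize({"damage": 60, "wtemp": True}))    # "Base hurt, weapons overheated"
    print(summarize({"damage": 0, "wtemp": False}))    # "Base is ok"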
BTW, if it wasn't obvious, this is a shameless plug.
Re:Let's avoid being ANAL (Score:1)
You'd probably get pissed off if I compared Visual Basic and C++, but to me they are more or less the same thing... so I avoid making those kinds of comparisons. Do the same for us.
VOCAL (Score:2)
If you search around this website, you should come across a program called VOCAL, which is known to compile under Linux and provide some degree of VoIP support.
Wow! (Score:3)
The comparison to ftp is entirely accurate.
I never said anyone was wrong, only that we should avoid confusion.
Let's avoid confusion.. (Score:5)
Re:Will this matter once games begin including voi (Score:2)
Re:Speak Freely for Unix (Score:1)
Re:Have you considered... (Score:2)
How lossy are you willing to have this?
Written English contains something like 1.6 bits of entropy per character, which at five or six characters per word works out to roughly 10 bits per word. That can be pumped into a speech synthesizer and result in a *very* lossy representation, yet it maintains almost 100% of the meaning. (Ignoring fine details like inflection, etc.)
The best way, from a bandwidth point of view, would be to detect phonemes and transmit only those, recreating the audio on the other side. This is a 'bit' CPU heavy for realtime...
At any rate, MP3 encoders might not be the best thing to use; they're tweaked to work well on music, not speech. Likely you'll create a larger file than you need. Simply mask out all frequencies below 500 Hz and above 3000 Hz (I think) and apply a simple logarithmic encoding, and you'll get fairly decent compression. It's also fast.
Nice to see you still posting to Slashdot, I had thought you weren't here anymore, being that you vanished in the middle of a thread... Or, do you not check your posts (slashdot.org/users.pl) to see if you've had any responses?
Re:Have you considered... (Score:2)
I really wish you'd read the posts before you reply to them, you'd be a lot more relevant.
I never said MP3 wasn't usable, I said *MP3 ENCODERS* weren't a good bet. The MP3 encoders ARE tweaked towards music, that's what people tend to use them for, and what they are engineered to do well.
You made a comment about just finding one, tweaking it a bit, and having a great solution. The only open-source ones I've seen have been designed with "MP3s", or 128kbit music... If you used one of these you'd have a lot of tweaking to do to get decent compression from it.
It also doesn't fit the problem, as I saw it. This thread is about communication for a game. Barry White tends to perform very few concerts via Quake3 server.
While I'm at it, I think I should mention that telephone calls are compressed, usually with hard arbitrary cutoffs and mu-law encoding. Also, the whole point of logarithmic encoding is to have more resolution for quiet sounds, instead of the loud sounds. What you describe is exactly backwards from the way it works.
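A minimal sketch of that idea, assuming the standard mu-law curve with mu = 255: the compressed value changes quickly for quiet samples and slowly for loud ones, which is where the extra low-level resolution comes from.

    # Sketch of mu-law style companding (mu = 255, the usual telephony value).
    # Quiet samples get most of the output range; loud ones get squeezed.
    import math

    MU = 255.0

    def compress(x):                 # x in [-1.0, 1.0]
        return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

    def expand(y):                   # inverse of compress
        return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

    for x in (0.01, 0.1, 0.5, 1.0):
        print("%.2f -> %.3f" % (x, compress(x)))
    # 0.01 -> 0.228, 0.10 -> 0.591: a 1% sample already uses ~23% of the output range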
I assume, then, that you find the quality of voice over the toll networks to be unbearably bad.
You know, I did say "I think" after the numbers 500 Hz and 3000 Hz; that sounds right, but I may have misremembered. I even marked it as such. However, arbitrary cutoffs sound only marginally worse than a smoother cutoff, and usually only on specific benchmarks.
If you'd prefer a touch more CPU, then by all means, use a smooth falloff. I assumed speed was all-important because of the gaming aspect... I wouldn't want a 20% speed hit from audio encoding.
When you're doing audio compression, delaying the signal for 50 or 100 ms is safe, and gives you all the context needed. I also understand perceptual encoding; it's not the huge technical achievement you make it out to be.
As with the gun control thing... I think you've got an agenda and you can't see that it's not mine. I don't give a rat's ass about gun control in your country. I do however find it annoying when someone throws around a bunch of misused statistics and emotional ploys to deceive people as to the severity of the problem. If you really had a strong point, you wouldn't need to use a bunch of tricks.
And you don't really seem bitchier, you were more insulting in the other thread and just as quick to anger, without reading the post.
Don't waste your precious time on my account, if I want to have this kind of technical discussion I'll go talk with the marketing team.
Re:Have you considered... (Score:2)
Many encoders do this (arbitrary cutoffs, etc) but they aren't designed from the same point of view as an average encoder on a PC...
The reason simple *Law encoding is so popular in telephony is that it takes very little CPU time, on embedded CPUs without an FPU. It also produces a constant-sized output. That's very handy in networking with technologies like ATM where you can reserve bandwidth.
If you have CPU time to burn, even 10% of a modern CPU, then doing a perceptual encoding of speech in realtime gets you better quality at comparable sizes. But if you're trying to do this with so little CPU as to be transparent to the primary application (usually, in this discussion, a game taking 100% of the CPU), or on a tiny 8-bit CPU with 64 bytes of RAM, FFTs (etc.) aren't an option.
The reason for the hard cutoffs is that they're usually done in hardware while the signal is still analog. I believe you can get current soundcards to do something like this while recording. If you can't, the work required to separate out the unwanted frequencies in software means you should probably do a more advanced encoding.
If you're curious though, try an 8000 Hz voice sample with cutoffs where Gordonjcp suggested, then, if your audio program supports it, try 8-bit logarithmic encoding. (Not just 8-bit linear encoding; it's much worse.) It's definitely not worth encoding music in, but it's not bad for voice. (After all, the telcos use it.)
What do you do for work? I have a feeling we're both in opposite ends of the same field.
Re:Impressive (Score:2)
Re:sure it's been done ... (Score:2)
Seriously, while it's a PITA for Linux, it will be great for 'Doze games.
Signalling + transport (Score:1)
Re:What about Macintosh? (Score:2)
UCL Conferencing tools (Score:1)
Some of the releases are a touch temperamental, but you should be able to make one of them work!
MacOS X could help. (Score:1)
Games that are intended to sell well on the Mac and Windows have to be somewhat portable. The OS X environment resembles its fellow Unices more than it does Windows. Of course, Microsoft can release DirectX libraries for OS X and not Linux...
I've seen a lot of posts to the effect that MacOS X is bad for Linux and will kill its mindshare. OS X is no more bad for Linux than Linux is bad for the BSDs. Sure, more people may use it as a desktop, but it could mean that more commercial software comes to Linux (and the BSDs). A rising tide floats everyone's boat.
Compression too! (Score:3)
I will point out that ssh compression works wonders on a VNC session if the server is running sshd. There is a nice howto on the VNC page for tunneling the connection over ssh. Secure and faster... bonus!
suggestions (Score:2)
How about using Speak Freely [speakfreely.org] or OpenPhone [openphone.org]?
Re:Tribes2 uses GSM (Score:2)
Re:Have you considered... (Score:2)
I'd like to step in and ask: where can I find that information?
Re:sure it's been done ... (Score:2)
Re:Tribes2 uses GSM (Score:1)
Re:Speak Freely for Unix (Score:1)
Re:HawkNL (Score:1)
Not only is this a cool project, they have good comparisons of open source implementations of telephony codecs.
Re:Will this matter once games begin including voi (Score:1)
Re:Responsibility of Game Publishers (Score:2)
Petition, schmetition.
There are Linux game publishers out there. If their unit sales numbers were comparable to numbers for Windows games, then you'd see more publishers writing to Linux.
All the petitions in the world can't make an unprofitable situation profitable.
Re:waste (Score:1)
*nix based Voice over IP is easy! (Score:5)
Well, for the year 2001 you may want to use `ssh' instead of `rsh', and /dev/dsp instead of /dev/audio for Linux, but the idea is still the same ...
Almost as much fun as making the Sun next to you belch while the newbie is using it!
Re:Responsibility of Game Publishers (Score:1)
Re:Back up those assertions, please (Score:1)
Not logged in, guess who.
Re:Back up those assertions, please (Score:1)
Re:Speak Freely for Unix (Score:2)
What else to say? It'll do multicast conversations, and coexist nicely in LPC10 mode with Quake on a 56K, so long as you don't mind not being able to hear Quake save the CD audio. The delay can be unbearable: if I yell into the mic, I can hear myself five seconds later some nights; other nights, it's nearly instantaneous.
Re:Speak Freely for Unix (Score:2)
Also, using H.323 as a carrier would mean you need, even at a 1/4 duty cycle (you are only sending or receiving data 1/4 of the time), peak throughput equivalent to a capped cable modem; I've used Speak Freely on as little as a 9600 baud modem at a 100% duty cycle.
Re:Let's make one... (Score:4)
There are also public domain encoders for the military voice standards, LPC and LPC10. Those are usable in as little as
For Half-Life engine fans... (Score:2)
Off-topic, this upcoming Spectator feature [gamespy.com] would be sweet as well.
Half-Life 1.1.0.7 will have it (Score:1)
I was reading today that the next patch for Half-Life will include funky little voice communication.
The article is on Gamespy, but b'dern it, Links is not letting me grab the text with my mouse. Um ... I think this is the link [gamespy.com].
Re:Let's avoid being ANAL (Score:1)
Re:Open H.323 (Score:2)
"...to any Blizzard product..." (Score:1)
Tribes 2 Seems to Incorporate One (Score:2)
Speak Freely for Unix (Score:5)
How about Speak Freely for Unix [fourmilab.ch]?
I have played with it a bit, and it seemed to work, but I haven't actually used it for gaming yet.  It didn't seem as simple to configure and use as some of the windoze voice comm programs, though.
Freshmeat (Score:3)
Re:VoIP in Java (Score:1)
There was an IP phone in the perl journal recently (Score:2)
Let's make one... (Score:1)
The central server would have to have some pretty serious processing power and bandwidth, but it'd work.
I can write the Linux server and client, but I don't know anything about programming with the soundcard in Windows.
sure it's been done ... (Score:2)
Roger Wilco [rogerwilco.com] is notorious for using up resources and being nearly unintelligible, and Battlecom [shadowfactor.com], while pretty solid, also has problems with static creeping into the transmission (interestingly, ShadowFactor is about to discontinue Battlecom).
One solution is simply to build Voice-over-IP into the app.
_f
Re:Let's avoid being ANAL (Score:2)
Re:Let's avoid being ANAL (Score:2)
Have you considered... (Score:2)
- Karen
Re:[OT] FFT. (Score:2)
The DCT and MDCT are transforms which use a waveform that is not symmetrical or antisymmetrical, and thus can represent an arbitrary signal. So, with a DCT or MDCT, you get the same amount of data back as you put in (albeit the data you get back is in floating-point format, so you still lose a little simply by storing it back in an integer representation via quantization, which is slightly lossy), but it works.
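A quick numerical check of that claim, sketched with scipy (an assumption - any DCT implementation would do): the type-II DCT of N real samples is N real coefficients, and the inverse gets the block back exactly, before any quantization.

    # Sketch: a DCT maps N real samples to N real coefficients and back,
    # unlike the complex FFT, so nothing extra has to be thrown away.
    import numpy as np
    from scipy.fft import dct, idct   # assumes scipy is available

    block = np.random.randn(512)                 # one block of real samples
    coeffs = dct(block, type=2, norm="ortho")    # 512 real coefficients out
    restored = idct(coeffs, type=2, norm="ortho")
    print(coeffs.dtype, coeffs.shape)            # float64 (512,)
    print(np.allclose(block, restored))          # True (lossless before quantization)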
- Rei
Re:[OT] FFT. (Score:2)
The real issue is that you can't just throw away half of the data. If all you care about is magnitudes, then yes, you can take the magnitude of the energies for a given frequency. However, if you want to reproduce the original signal, you can't. The existence of the separate components - real and imaginary - stores locational data. If you throw away where the peaks of the waves are located, in audio, cancellations with other waves don't work right, among other problems. In images it's even worse, as data doesn't appear at all like it should (locations are wrong, intensities, etc.).
I've never implemented the MDCT before (my library, when I last worked on it, covered DFTs, FFTs, DCTs, and wavelets (Haar and Daubechies), all of those in 1, 2, and 3D). I've only read a summary of it, but from what I read, it sounded like it works as follows: you start summing products from before the start of your block, and continue summing products past the end of your block (by 50%), normalizing appropriately. But you only compute enough frequencies, using this, to equal the number of elements in your block. Now, this has the net effect of giving data that's not as accurate for your particular block; however, when you reproduce the signal, since every location has an overlap, when you average the overlapped sections together, the errors cancel out (an error in one direction on the MDCT corresponds to an error in the opposite direction in the adjacent block). It helps remove blocking artifacts while still keeping optimally-sized blocks for compression.
- Rei
P.S. - I responded to your post
Re:Have you considered... (Score:3)
First, before we can go into the FFT, we need to discuss the DFT (Discrete Fourier Transform). Of course, the purpose of the DFT is to break down a signal into component waveforms - but how? Well, picture that you have a waveform that is just a sine, let's say, 5 Hz across the area you're looking at. Now, picture multiplying that waveform, at every location, by a sine waveform of zero Hz, summing those products together, and normalizing. What do you get? You get 0.
Now, as was discussed in another reply on this thread, to represent *any* signal, you can't just use sines at evenly spaced frequency steps. For that, you have to use the sum of a sine and a cosine. Due to a trick you can use involving imaginary exponents, you can get one part of the data to come out "real" and the other part "imaginary", corresponding to the individual components of your signal. But, since you get twice as much data back (real and imaginary components), it is a poor choice for compression. DCTs and MDCTs are briefly discussed in the other reply as well. The main difference between a DCT and MDCT is their use in block transforms. Also, the difference between a DFT and an FFT is that, in a DFT, you'll find that you're doing a lot of the same calculations multiple times, due to various properties of sines and cosines. FFTs are basically a reordering of the calculations so that redundant ones are done less often (optimally, once). There are several different FFT algorithms.
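To make the DFT/FFT distinction concrete, here's a sketch of a naive O(N^2) DFT next to numpy's FFT; they produce the same numbers, the FFT just reorganizes the arithmetic so the redundant products aren't recomputed.

    # Sketch: a naive O(N^2) DFT versus the FFT. Same output, the FFT just
    # avoids redoing the redundant sine/cosine products.
    import numpy as np

    def naive_dft(x):
        N = len(x)
        n = np.arange(N)
        # One output per frequency k: sum of x[n] * e^(-2*pi*i*k*n/N)
        return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N)) for k in range(N)])

    x = np.random.randn(256)
    print(np.allclose(naive_dft(x), np.fft.fft(x)))   # True, just much slower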
Block transforms are to get rid of a nasty side effect of transformed data. If a signal exists in one part of the data, but not another, frequency decomposition has trouble dealing with this. It generally causes "ripples" of energy to appear (the goal of doing transforms, compression-wise, is to concentrate the energy in certain frequencies, and then store them - so, ripples are bad). If you look at a very large sample, many frequencies will start and stop. So, you break it down into blocks - if there's a start or stop of a frequency, it only causes ripples in that section.
This works fine until you start throwing away data on a DCT. Because different data will get thrown away in different blocks, while they'll have the same overall level of quality, there will still be discontinuities between the blocks. The MDCT effectively halves the block size and vastly reduces discontinuities by including an overlapped area in its calculations.
Before I can discuss quantization, you first have to understand thresholding and the principles of compressing a transformed signal, which were briefly discussed in my original post. After you transform the signal, ideally, your energy is concentrated in specific frequencies. The effect is something like a starburst in the upper left-hand corner of each block that was transformed. Generally, you will still have *some* energy in weak frequencies, but not much. So, you kill them off - generally with a threshold that varies over the human hearing range, in the case of audio. Also in audio, you generally want to take masking effects into account when killing off weak signals. Once your energy is left in strong signals, you need to store how strong. However, while your input signal might have been composed of 8- or 16-bit integers, your output data will generally be high-resolution floating-point values. You need to get it back into integers. This is known as quantization. Some schemes simply convert the data back linearly. Some create a table of arbitrary endpoints for what-converts-to-what. Some use a smooth function. There is a lot of debate over what is the best method. I personally recommend, in this case, after seeing the tiny gains made by various other quantization methods for a huge CPU/complexity cost, using linear quantization.
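Here's a rough sketch of that pipeline on one block, with a flat threshold standing in for the hearing-range curve and masking analysis described above, and plain linear quantization:

    # Sketch of the threshold-then-quantize step on one transformed block.
    # A flat threshold stands in for the psychoacoustic curve described above.
    import numpy as np
    from scipy.fft import dct, idct

    block = np.random.randn(512)
    coeffs = dct(block, type=2, norm="ortho")

    THRESH = 0.5                                    # illustrative, not psychoacoustic
    coeffs[np.abs(coeffs) < THRESH] = 0.0           # kill the weak frequencies

    step = max(np.max(np.abs(coeffs)), 1e-9) / 127  # linear quantization to 8-bit ints
    quantized = np.round(coeffs / step).astype(np.int8)

    restored = idct(quantized * step, type=2, norm="ortho")   # lossy reconstruction
    print("nonzero coefficients kept:", np.count_nonzero(quantized), "of", len(quantized))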
Huffman encoding is typically used to losslessly encode the quantized data. Huffman encoding has proved attractive because you already know, for this kind of data, what sort of tree you can build to compress it well. However, I feel that using arithmetic encoding can give *huge* advantages, via frequency prediction. Because, not only do you know the signal density for a given location for an arbitrary signal, you know what it has been like for this *particular* signal in the past, and can scale your probabilities appropriately. (Oh, BTW, if you want info on how to use Huffman or arithmetic encoding, just ask.)
Anyways, I better get back to work. Ciao!
- Rei
Re:Have you considered... (Score:3)
- Rei
P.S. - For those who care, the CPU cost of having a varying threshold over various ranges, instead of a constant one, is negligible compared to the time it takes to do the MDCT, quantize, encode, etc.
P.P.S. - Any specific URLs from those organizations I should check out? I'm always looking for a good distraction from work.
Re:[OT] FFT. (Score:3)
e^(X*i) = cos(X) + i*sin(X). The 'i' merely acts as a placeholder; it doesn't actually mean that the frequencies themselves are imaginary. By using exponentials, we can simply add exponents instead of multiplying.
A sine which is contained completely in a certain frequency range, like a cosine, cannot store phase information - it requires both of them. Now, of course, you can extend the waveform in question so that it isn't completely contained in a certain frequency range - but that is no longer an FFT, but a DCT.
FFTs are useful because they evenly separate signals, and are quite fast. By computing the magnitude of a certain frequency's complex component, you can do windowing quite nicely to tell where your signals are. But, this magnitude alone is not enough to accurately reproduce the original signal with phase information. And, without phase information, cancellation effects can be very bad in the worst case, in fact, to the point of completely messing up your block.
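A small experiment along those lines, as a sketch: throw away the phase of an FFT (keep only the magnitudes) and the "reconstruction" no longer resembles the input, while keeping the full complex spectrum reproduces it exactly.

    # Sketch: magnitudes alone lose the phase, and the signal comes back wrong.
    import numpy as np

    t = np.linspace(0.0, 1.0, 512, endpoint=False)
    x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 13 * t + 1.0)

    spectrum = np.fft.fft(x)
    full = np.fft.ifft(spectrum).real              # exact reconstruction
    mag_only = np.fft.ifft(np.abs(spectrum)).real  # phase thrown away

    print(np.allclose(x, full))                    # True
    print(np.max(np.abs(x - mag_only)) > 0.5)      # True: badly wrong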
Your example was really a DCT, but using sines instead of cosines.
- Rei
Re:Have you considered... (Score:5)
When you do a block DCT or MDCT on an audio signal, you're not looking at a whole page of text's worth. You generally look at a fraction of a second. Speech has little redundancy at this level. However, that isn't what I was referring to. Do you have any background in audio compression? There are two keys to compressing audio using current methods: frequency masking and signal response. Frequency masking is the fact that when the human ear hears a strong signal, weak signals that are near it in frequency seem to "disappear" or "merge" with the stronger signal. Signal response (hearing response, frequency sensitivity, etc.) is how good, overall, the human mind/ears are at hearing weak signals at various frequencies across the spectrum. With careful knowledge of these, in music or voice, you can kill off many more frequencies than without it. However, it is also a big CPU consumer to do it very carefully. Cutting out some of the analysis can save you a good bit of CPU - and, in the case of human voice, which tends to be in a very audible range with few masking effects, won't affect your compression rate much.
Second, please, if you can create a good sounding speech synth - especially one that can give inflections, emotion, etc - please, please share it with us. Until then, good luck having something like this work (simply neglecting CPU issues) without sounding like a 50s robot that messes up once a second.
Oh, and to answer your theory about masking out frequencies below 500 Hz and above 3000 Hz: no. That will sound unbelievably awful. First off, let's neglect the fact that someone with a voice like Barry White would be inaudible, and that you'd never hear a 't' or an 's'. Ignoring that, it's still a silly way to do it. You need a simple curve, even just a simple line graph. It takes little CPU time, and will actually be able to reproduce the original sound well. Arbitrary truncation points are unbearably bad.
Next, you seem to be of the notion that MP3 encoders are "tweaked towards music". MP3 encoding is a fairly arbitrary term. MP3 is a specific format for encoding streamable, quantized, transformed data. You can use any truncation scheme you want - even the silly one you proposed. Most encoders you'll find are tweaked towards the human hearing range - an optimal choice for both voice and music (especially voice, though! Voice compresses very well, because, compared to music, it has most of its energy concentrated in a few signals at any given time).
Next, why use "logarithmic encoding" for compression? Logarithmic encoding is a (poor) way to store raw (uncompressed) audio data - it sacrifices low-level clarity for the ability to represent very loud signals - something seldom of use in normal audio compression applications (have you ever noticed how quiet signals on an 8-bit sound card are very crackly, but the loud ones are clear? That's the sort of effect logarithmic encoding gives to sound). It is useful in efficient Pulse Code Modulation (PCM) of data for maximizing the number of transmissions over a small number of physical channels, but doesn't even begin to apply as far as storing quantized data is concerned (that would be like using a bubble sort to compute Pi or something).
Please... if you're qualified to discuss audio compression, how about the basics? Do you know how to compute an FFT? Do you know why you wouldn't use an FFT for audio or video compression? What about a DCT? MDCT? What do you know about quantization schemes? The advantages/disadvantages of storing quantized data with Huffman encoding vs. arithmetic encoding? Have you ever written a single signal processing function? (I've written a whole library.) Do you know anything about the subject at all?
If you don't know what you're talking about, please don't suggest encoding schemes. There are enough bad ones out there already.
- Rei
P.S. - sorry if I seem a bit bitchy. For some reason, they decided to leave us without air conditioning today at work
Vocal and SIP (Score:2)
Will this matter once games begin including voice? (Score:2)
Granted, external programs like GameVoice and Roger Wilco offer some additional features, but will the average gamer care? I would expect every game that has a team play element to include its own voice technology within a year or two.
Re:Let's avoid being ANAL (Score:2)
I'd debate this point. First, the article text specifically said "Voice over IP" (with "Voice" capitalized for no other apparent reason). That, to me, implies a proper name rather than a general technology label, not unlike if someone were to say "File Transfer Protocol".
Second, the phrase "Voice over IP" is frequently used when referring to (what you call) "VoIP". For some reason, you seem to use "file transfer protocol" (FTP) versus "a file transfer protocol" (anything to move a file), but create an artificially more relaxed standard where the expansion of "VoIP" is the generic term. This simply isn't the case.
Also, while I hesitate to point to popular usage as a means of winning this argument (and I'm sure it'll come back and bite me when the next cracker vs. hacker argument pops up), a quick Google search on "voice over IP" appears to turn up links that're all about VoIP, rather than "use the Internet to talk to your friends". Again, while I'm normally loath to invoke the popular definition, it's worth pointing out that it seems to coincide with the technical one, providing a much more compelling case.
Overall, I suspect your problem is that you've fixated on the fact that "voice over IP" is an overly broad term that, when analyzed, could apply to a broader set of items than what it's actually intended for. On the other hand, we've already got the aforementioned example of "file transfer protocol"; we know that a "personal video recorder" is a TiVo-like device rather than just any video recorder owned by an individual; we know that a "television" (literally: far seeing) isn't just any device that lets you see far away -- telescopes and binoculars certainly aren't part of that group. When it comes down to it, a name is really just an arbitrary designator that just happens to usually have some relevance. If you want names that're complete descriptions, you might want to switch from English to German.
Searching freshmeat still seems to work ... (Score:2)
Don't know if it's really what the person is looking for, but it's worth a shot.
Re:Responsibility of Game Publishers (Score:2)
Are the constraints of multiplatform development beneficial? Most likely (looking at it structurally, and assuming that at least ONE profitable and worthwhile port will be made). However, the platform that makes up for this cost deficit ISN'T *nix.
The "bottlenecks" are managers - I guess you are correct in saying this, though I wouldn't use the term bottleneck. Programmers create a product; managers make money.
Scott
Re:Team up w/ Apple (OS X) - MORE BUYERS! (Score:2)
With SDL and OpenGL working on Mac, and Qt on its way there, companies may be very interested in utilizing these libraries/technologies for crossplatform Windows/Mac development. The kicker though is that these are also supported fully on Linux. If these libraries catch on in the Mac world, I'm fairly sure Linux will see lots of the same apps merely by side-effect.
-Justin
Re:You forget cost/profit analysis (Score:2)
1) Do you use Linux much? Have you used it for any game/non-game development? It is definitely capable of running games. Windows is only worthy of games because of the huge marketshare, not because it is a better gaming platform. It is also obvious that the Linux community wants games. Are you part of that community? Do you not agree?
2) Still, using OpenGL would make a Mac port easier. Choosing OpenGL over DirectX is mainly a portability decision. Do you not want your game to run on Mac either?
3. True. Any game worth its salt has its own GUI library. I meant to bring up Qt as a reference for application developers, broadening my argument for crossplatform programming to include all types of software. I must say Qt may be a good in-house tool at a game company though, for developing map editors, game editors, etc., if you have a heterogeneous development environment. It may encourage a crossplatform mindset in your company as well. Do any of your developers wish they could be working in Linux?
I understand what you mean though. It all comes down to whether or not you consider portability important, and whether or not it would turn a profit.
Re:You forget cost/profit analysis (Score:3)
I think the problem with Windows developers in general is that they don't think of coding crossplatform in the first place. It's easy to understand why: they are taught DirectX and MFC, and Windows has a huge percentage of the desktop market. Also, some games are coded so horribly (compare the duct-tape-that-is-EverQuest to any Blizzard product) that porting them looks like it would be a nightmare.
On the other hand, I think Linux developers are more trained to code portably. With all the unix flavors out there, source portability is already a must. It also seems that these developers care about porting to Windows. Many apps for X are available on Windows (like a lot of the Gtk stuff), but not the other way around.
So Linux developers actually care about portability, but Windows developers do not. Maybe we can convince them to change their ways?
Surely the Windows developers out there don't thoroughly enjoy Windows-only programming, do they? I've used DirectX, and it was ludicrous. It isn't direct at all (come on, DirectMusic? DirectPlay? "Direct" is just a buzzword...) and the classes are a mess. I haven't heard much good about MFC either, but I've heard only good about Qt [trolltech.com] (and I've used both).
Qt works on Windows. There's no reason to use MFC. Yes it does cost money, but aren't we talking about real game companies here? SDL works on Windows. There's no reason to use DirectX "directly" (whatever that means). You know how long it would take to port Windows apps/games to Linux that were all written in Qt and SDL? All of a recompile.
List of VoIP applications (Score:2)
Re:sure it's been done ... (Score:2)
For the same sort of thing, but cross-platform, there's HawkNL [hawksoft.com]
HawkNL (Score:3)
It's targeted at game programmers, to be integrated in-game, as a cross-platform alternative to Microsoft's DirectPlay and DirectPlay Voice, but could be used to do a stand-alone VoIP app as well (though I am not aware of any currently).
Just a little effort... (Score:2)
-Matt
easy problem--easy solution (Score:2)
Vovida (Score:2)
Responsibility of Game Publishers (Score:2)
This is the only way I can see to solve the problem of not having the "latest and greatest" Windows games on Linux. Also, by the time one completes porting a game from Windows to Linux, it is likely the game is passé (except maybe for Half-Life (that game will not die)).
Is there a petition I can sign? A list of game publisher email addresses to send an email to? I think part of the problem is that game publishers do not see a demand on Linux platforms. Perhaps if a significant interest were communicated to publishers, they would at least think about providing Linux versions.
You forget cost/profit analysis (Score:4)
Also, have you considered the expense of training all these developers in Linux? Remember, most of them do not have Linux experience.
Finally, when you consider that Windows controls 90something percent of the desktop gamer market, it just doesn't make sense for a company to pour massive resources into developing Linux and Windows games simultaneously that only a relatively small number of people would buy. At least a dedicated porting company like Loki doesn't have to worry about graphic artists, level designers, story writers, or game design as a whole.