AudRecordLib
Functions
FORCEINLINE void StoreConvertedSamples (short *pDstSamples, const __m128 &samples)
FORCEINLINE void StoreConvertedSamples (long *pDstSamples, const __m128 &samples)
FORCEINLINE __m128i LongShortToShorts (__m128 &samples)
FORCEINLINE void StoreConvertedSamples (short *&pDstSamples, __m128 samples[4])
FORCEINLINE void StoreConvertedSamples (long *&pDstSamples, __m128 samples[4])
FORCEINLINE void Get16ConvertedSamples (const float *&pSrcSamples, __m128 samplePacks[4], const __m128 &scaleFactors)
template<class DestSampleType> void Convert16SamplesLoop (const float *&pSrcSamples, DestSampleType *&pDstSamples, DWORD numPacks, const __m128 &scaleFactors)
SSE2-specific functions used in the sample conversion process.
template<class DestSampleType> void detail::SSE::Convert16SamplesLoop (const float *&pSrcSamples, DestSampleType *&pDstSamples, DWORD numPacks, const __m128 &scaleFactors)
Workhorse for converting 16 float samples at a time using SSE
Loops numPacks times; on each iteration it reads 16 samples through the pSrcSamples pointer, converts them to DestSampleType and stores them through the pDstSamples pointer.
DestSampleType | The integer type to convert the float samples to
pSrcSamples | Pointer to the 16 * numPacks float samples; the pointer is advanced past the samples it consumes
pDstSamples | Pointer to a buffer that receives the 16 * numPacks converted samples of type DestSampleType; the pointer is advanced past the samples it writes
numPacks | Number of 16-float 'packs' to convert
scaleFactors | Prefilled SSE variable containing four copies of the scale factor
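The loop can be sketched as follows. This is a minimal, self-contained approximation rather than the library's code, and it rests on several assumptions: std::uint32_t stands in for DWORD, std::int16_t and std::int32_t stand in for short and long, `inline` replaces the MSVC-specific FORCEINLINE, loads and stores are unaligned, and the two store helpers are simplified (the 16-bit one swaps in the saturating `_mm_packs_epi32` for the shuffle-based narrowing described for StoreConvertedSamples).

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in SSE's xmmintrin.h)
#include <cstdint>
#include <cassert>

// Simplified stand-in for Get16ConvertedSamples: load 16 floats, scale them,
// and advance the source pointer.
inline void Get16ConvertedSamples(const float*& pSrc, __m128 packs[4], const __m128& scale)
{
    for (int i = 0; i < 4; ++i)
        packs[i] = _mm_mul_ps(_mm_loadu_ps(pSrc + 4 * i), scale);
    pSrc += 16;
}

// Simplified 16-bit store: saturating narrow via _mm_packs_epi32 (swapped in for
// the shuffle-based version), two 128-bit stores, then advance the pointer.
inline void StoreConvertedSamples(std::int16_t*& pDst, __m128 packs[4])
{
    __m128i first  = _mm_packs_epi32(_mm_cvtps_epi32(packs[0]), _mm_cvtps_epi32(packs[1]));
    __m128i second = _mm_packs_epi32(_mm_cvtps_epi32(packs[2]), _mm_cvtps_epi32(packs[3]));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(pDst), first);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(pDst + 8), second);
    pDst += 16;
}

// Simplified 32-bit store: convert each pack and store it directly.
inline void StoreConvertedSamples(std::int32_t*& pDst, __m128 packs[4])
{
    for (int i = 0; i < 4; ++i)
        _mm_storeu_si128(reinterpret_cast<__m128i*>(pDst + 4 * i), _mm_cvtps_epi32(packs[i]));
    pDst += 16;
}

// The loop itself: one Get/Store pair per 16-sample pack. Both pointers are
// left pointing just past the data that was processed.
template <class DestSampleType>
void Convert16SamplesLoop(const float*& pSrcSamples, DestSampleType*& pDstSamples,
                          std::uint32_t numPacks, const __m128& scaleFactors)
{
    __m128 samplePacks[4];
    for (std::uint32_t pack = 0; pack < numPacks; ++pack)
    {
        Get16ConvertedSamples(pSrcSamples, samplePacks, scaleFactors);
        StoreConvertedSamples(pDstSamples, samplePacks);
    }
}
```

Because both pointers are passed by reference, a caller can convert the bulk of a buffer in 16-sample packs and then handle any leftover samples with scalar code, starting from the updated pointers.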
void detail::SSE::Get16ConvertedSamples (const float *&pSrcSamples, __m128 samplePacks[4], const __m128 &scaleFactors)
Retrieves and converts 16 samples
Loads 16 floats from a memory location into SSE variables and multiplies them all by a scale factor
[in,out] | pSrcSamples | The memory location containing the raw samples
[out] | samplePacks | Four SSE variables to store the processed samples in
scaleFactors | Contains four copies of the required scale factor
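A minimal sketch of this load-and-scale step follows. It assumes unaligned input (`_mm_loadu_ps`; the original may require 16-byte alignment and use `_mm_load_ps`), that the function advances pSrcSamples by the 16 floats it reads, and that `inline` stands in for FORCEINLINE.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_set1_ps
#include <cassert>

// Load 16 floats as four packs of four, multiply each pack by the scale
// factors, and advance the source pointer past the consumed floats.
inline void Get16ConvertedSamples(const float*& pSrcSamples,
                                  __m128 samplePacks[4],
                                  const __m128& scaleFactors)
{
    for (int i = 0; i < 4; ++i)
    {
        __m128 raw = _mm_loadu_ps(pSrcSamples + 4 * i);  // unaligned load (assumption)
        samplePacks[i] = _mm_mul_ps(raw, scaleFactors);  // premultiply by the scale
    }
    pSrcSamples += 16;
}
```

A caller would typically build scaleFactors once, for example with `_mm_set1_ps(32767.0f)` when targeting 16-bit output; that particular value is only an illustration.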
__m128i detail::SSE::LongShortToShorts (__m128 &samples)
Converts samples from packed floats to packed shorts
First, the samples are converted from float to long format. The packed longs are then treated as eight shorts with contents {x, 0, x, 0, x, 0, x, 0}, where each 0 marks a high word that is discarded (it is zero only for non-negative samples; negative samples carry sign bits there). The shorts are then shuffled so that the low words all occupy the first half of the register (i.e. the format is {x, x, x, x, 0, 0, 0, 0}), and the result is returned to the caller.
samples | The premultiplied samples in packed float format
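One way to express the described shuffle with SSE2 intrinsics is sketched below; it illustrates the technique and is not the library's implementation. It assumes `_mm_cvtps_epi32` for the float-to-long step (this rounds using the current MXCSR mode, round-to-nearest-even by default) and uses `_mm_move_epi64` to clear the upper half. In the comments, h denotes a discarded high word.

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics
#include <cassert>

// Convert four floats to int32, then gather the low 16-bit word of each
// int32 into the lower 64 bits of the register and zero the upper 64 bits.
inline __m128i LongShortToShorts(__m128& samples)
{
    __m128i v = _mm_cvtps_epi32(samples);                // {x0, h0, x1, h1, x2, h2, x3, h3}
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(3, 1, 2, 0)); // {x0, x1, h0, h1, x2, h2, x3, h3}
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(3, 1, 2, 0)); // {x0, x1, h0, h1, x2, x3, h2, h3}
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 1, 2, 0));   // {x0, x1, x2, x3, h0, h1, h2, h3}
    return _mm_move_epi64(v);                            // {x0, x1, x2, x3, 0, 0, 0, 0}
}
```

Because only the low word of each long is kept, values outside the 16-bit range wrap rather than saturate; SSE2's `_mm_packs_epi32` is a saturating alternative if clipping is preferred.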
void detail::SSE::StoreConvertedSamples (short *pDstSamples, const __m128 &samples)
Stores premultiplied samples to a memory location
Converts the premultiplied samples from packed float format to packed short format, then stores the four packed shorts into memory. This function could be reduced to a single call to the _mm_cvtps_pi16 intrinsic, but that intrinsic returns an MMX __m64 value and is not available when compiling for x64.
pDstSamples | Memory location to store the 4 samples
samples | The premultiplied samples in packed float format
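A minimal sketch of the single-vector 16-bit store follows. It swaps in SSE2's saturating `_mm_packs_epi32` for the shuffle described for LongShortToShorts, so out-of-range values clip instead of wrapping, and it writes only the low 64 bits with `_mm_storel_epi64` so that exactly four shorts reach memory. `inline` stands in for FORCEINLINE.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>

// Convert four premultiplied floats to shorts and store exactly four of them.
inline void StoreConvertedSamples(short* pDstSamples, const __m128& samples)
{
    __m128i longs  = _mm_cvtps_epi32(samples);       // 4 x int32, current rounding mode
    __m128i shorts = _mm_packs_epi32(longs, longs);  // 8 x int16 (saturated); low 4 are the results
    _mm_storel_epi64(reinterpret_cast<__m128i*>(pDstSamples), shorts);  // write 8 bytes only
}
```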
void detail::SSE::StoreConvertedSamples (long *pDstSamples, const __m128 &samples)
Stores premultiplied samples to a memory location
Converts the premultiplied samples from packed float format to packed long format before storing the packed longs into memory.
pDstSamples | Memory location to store the 4 samples
samples | The premultiplied samples in packed float format
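The 32-bit case needs only a conversion and one store. In the sketch below, std::int32_t replaces long: long is 32 bits on Windows (LLP64) but 64 bits on LP64 platforms such as 64-bit Linux, so a fixed-width type keeps the sketch portable. Unaligned storage and `inline` in place of FORCEINLINE are assumptions.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cassert>

// Convert four premultiplied floats to int32 and store all four.
inline void StoreConvertedSamples(std::int32_t* pDstSamples, const __m128& samples)
{
    __m128i longs = _mm_cvtps_epi32(samples);  // rounds with the current MXCSR mode
    _mm_storeu_si128(reinterpret_cast<__m128i*>(pDstSamples), longs);
}
```

With the default rounding mode, a value exactly halfway between two integers rounds to the even one, so 2.5f becomes 2.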
void detail::SSE::StoreConvertedSamples (short *&pDstSamples, __m128 samples[4])
Stores four sets of premultiplied samples to a memory location
First, the four sets of samples are converted from float to short format. They are then shuffled from four variables holding four short values each into two variables holding eight shorts each (i.e. from four {x, 0, x, 0, x, 0, x, 0} to two {x, x, x, x, x, x, x, x}), and the two shuffled variables are written to memory in one go. There are more straightforward ways of implementing this, but despite its length, this method minimizes loads and stores to memory.
[in,out] | pDstSamples | The memory location to store the 16 samples; the pointer is advanced by the call
samples | The four sets of premultiplied samples in packed float format
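The shuffle-and-combine approach can be sketched as below; this illustrates the described technique and is not the library's code. Each pack is converted to int32, its low words are gathered into the lower 64 bits, and pairs of packs are merged with `_mm_unpacklo_epi64` so that only two 128-bit stores are needed. LowWordsToLowHalf is a hypothetical helper name; unaligned stores, `_mm_cvtps_epi32` rounding, and a 16-element advance of pDstSamples are assumptions. As with any low-word truncation, out-of-range values wrap.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>

// Hypothetical helper: move the low 16-bit word of each int32 lane into the
// lower 64 bits; the upper 64 bits hold the (ignored) high words.
inline __m128i LowWordsToLowHalf(__m128i v)
{
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(3, 1, 2, 0));
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(3, 1, 2, 0));
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 1, 2, 0));
}

// Convert 16 premultiplied floats to shorts and store them with two writes.
inline void StoreConvertedSamples(short*& pDstSamples, __m128 samples[4])
{
    __m128i w0 = LowWordsToLowHalf(_mm_cvtps_epi32(samples[0]));
    __m128i w1 = LowWordsToLowHalf(_mm_cvtps_epi32(samples[1]));
    __m128i w2 = LowWordsToLowHalf(_mm_cvtps_epi32(samples[2]));
    __m128i w3 = LowWordsToLowHalf(_mm_cvtps_epi32(samples[3]));

    // Combine the lower halves: {w0 shorts, w1 shorts} and {w2 shorts, w3 shorts}.
    __m128i first  = _mm_unpacklo_epi64(w0, w1);
    __m128i second = _mm_unpacklo_epi64(w2, w3);

    _mm_storeu_si128(reinterpret_cast<__m128i*>(pDstSamples), first);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(pDstSamples + 8), second);
    pDstSamples += 16;
}
```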
void detail::SSE::StoreConvertedSamples (long *&pDstSamples, __m128 samples[4])
Stores four sets of premultiplied samples to a memory location
Four sets of samples are converted from packed float to packed long format and then copied to memory.
[in,out] | pDstSamples | The memory location to store the 16 samples; the pointer is advanced by the call
samples | The four sets of premultiplied samples in packed float format
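For 32-bit output no shuffling is required, as this minimal sketch shows: each pack is converted and stored in turn. It assumes std::int32_t in place of long (see the LLP64/LP64 difference), unaligned stores, `inline` in place of FORCEINLINE, and that pDstSamples is advanced by the 16 samples written.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cassert>

// Convert four packs of premultiplied floats to int32 and store all 16 values.
inline void StoreConvertedSamples(std::int32_t*& pDstSamples, __m128 samples[4])
{
    for (int i = 0; i < 4; ++i)
    {
        __m128i longs = _mm_cvtps_epi32(samples[i]);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(pDstSamples + 4 * i), longs);
    }
    pDstSamples += 16;  // leave the pointer just past the stored samples
}
```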