jidctint.c
/*
 * jidctint.c
 *
 * Copyright (C) 1991-1998, Thomas G. Lane.
 * Modification developed 2002-2009 by Guido Vollbeding.
 * This file is part of the Independent JPEG Group's software.
 * For conditions of distribution and use, see the accompanying README file.
 *
 * This file contains a slow-but-accurate integer implementation of the
 * inverse DCT (Discrete Cosine Transform).  In the IJG code, this routine
 * must also perform dequantization of the input coefficients.
 *
 * A 2-D IDCT can be done by 1-D IDCT on each column followed by 1-D IDCT
 * on each row (or vice versa, but it's more convenient to emit a row at
 * a time).  Direct algorithms are also available, but they are much more
 * complex and seem not to be any faster when reduced to code.
 *
 * This implementation is based on an algorithm described in
 *   C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT
 *   Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics,
 *   Speech, and Signal Processing 1989 (ICASSP '89), pp. 988-991.
 * The primary algorithm described there uses 11 multiplies and 29 adds.
 * We use their alternate method with 12 multiplies and 32 adds.
 * The advantage of this method is that no data path contains more than one
 * multiplication; this allows a very simple and accurate implementation in
 * scaled fixed-point arithmetic, with a minimal number of shifts.
 *
 * We also provide IDCT routines with various output sample block sizes for
 * direct resolution reduction or enlargement and for directly resolving the
 * common 2x1 and 1x2 subsampling cases without additional resampling: NxN
 * (N=1...16), 2NxN, and Nx2N (N=1...8) pixels for one 8x8 input DCT block.
 *
 * For N<8 we simply take the corresponding low-frequency coefficients of
 * the 8x8 input DCT block and apply an NxN point IDCT on the sub-block
 * to yield the downscaled outputs.
 * This can be seen as direct low-pass downsampling from the DCT domain
 * point of view rather than the usual spatial domain point of view,
 * yielding significant computational savings and results at least
 * as good as common bilinear (averaging) spatial downsampling.
 *
 * For N>8 we apply a partial NxN IDCT, treating the 8 input coefficients
 * as the lower frequencies and assuming the higher frequencies to be zero.
 * It turns out that the computational effort, relative to the output size,
 * is similar to that of the 8x8 IDCT.
 * Furthermore, the scaling and descaling are the same for all IDCT sizes.
 *
 * CAUTION: We rely on the FIX() macro except for the N=1,2,4,8 cases
 * since there would be too many additional constants to pre-calculate.
 */
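
The row-column decomposition mentioned in the header comment can be sketched with the smallest non-trivial block size. This is only an illustration, not code from this file: the names `ex_idct2_1d`, `ex_idct2x2_cols_then_rows` and `ex_idct2x2_rows_then_cols` are hypothetical, and the 2-point kernel is written in the sqrt(N)-scaled convention, where sqrt(2)*cos(pi/4) = 1 and the kernel reduces to a sum and a difference. Dequantization, descaling and range limiting are left out.

```c
#include <assert.h>

/* Hypothetical 2-point 1-D IDCT: y0 = x0 + x1, y1 = x0 - x1.
 * 'stride' selects a row (1) or a column (2) of a 2x2 block
 * stored in row-major order.
 */
static void ex_idct2_1d(const int *in, int *out, int stride)
{
  int x0, x1;

  assert(stride == 1 || stride == 2);
  x0 = in[0];
  x1 = in[stride];
  out[0] = x0 + x1;
  out[stride] = x0 - x1;
}

/* Columns first, then rows: the order used by the routines in this file. */
void ex_idct2x2_cols_then_rows(const int coef[4], int out[4])
{
  int ws[4];                        /* buffers data between passes */

  ex_idct2_1d(coef + 0, ws + 0, 2); /* column 0 */
  ex_idct2_1d(coef + 1, ws + 1, 2); /* column 1 */
  ex_idct2_1d(ws + 0, out + 0, 1);  /* row 0 */
  ex_idct2_1d(ws + 2, out + 2, 1);  /* row 1 */
}

/* Rows first, then columns: the "vice versa" order. */
void ex_idct2x2_rows_then_cols(const int coef[4], int out[4])
{
  int ws[4];

  ex_idct2_1d(coef + 0, ws + 0, 1); /* row 0 */
  ex_idct2_1d(coef + 2, ws + 2, 1); /* row 1 */
  ex_idct2_1d(ws + 0, out + 0, 2);  /* column 0 */
  ex_idct2_1d(ws + 1, out + 1, 2);  /* column 1 */
}
```

Both orders give the same 2x2 result; the column-first order is simply the one that lets the second pass write finished rows.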

#define JPEG_INTERNALS
#include "jinclude.h"
#include "jpeglib.h"
#include "jdct.h"       /* Private declarations for DCT subsystem */

#ifdef DCT_ISLOW_SUPPORTED


/*
 * This module is specialized to the case DCTSIZE = 8.
 */

#if DCTSIZE != 8
  Sorry, this code only copes with 8x8 DCT blocks. /* deliberate syntax err */
#endif


/*
 * The poop on this scaling stuff is as follows:
 *
 * Each 1-D IDCT step produces outputs which are a factor of sqrt(N)
 * larger than the true IDCT outputs.  The final outputs are therefore
 * a factor of N larger than desired; since N=8 this can be cured by
 * a simple right shift at the end of the algorithm.  The advantage of
 * this arrangement is that we save two multiplications per 1-D IDCT,
 * because the y0 and y4 inputs need not be divided by sqrt(N).
 *
 * We have to do addition and subtraction of the integer inputs, which
 * is no problem, and multiplication by fractional constants, which is
 * a problem to do in integer arithmetic.  We multiply all the constants
 * by CONST_SCALE and convert them to integer constants (thus retaining
 * CONST_BITS bits of precision in the constants).  After doing a
 * multiplication we have to divide the product by CONST_SCALE, with proper
 * rounding, to produce the correct output.  This division can be done
 * cheaply as a right shift of CONST_BITS bits.  We postpone shifting
 * as long as possible so that partial sums can be added together with
 * full fractional precision.
 *
 * The outputs of the first pass are scaled up by PASS1_BITS bits so that
 * they are represented to better-than-integral precision.  These outputs
 * require BITS_IN_JSAMPLE + PASS1_BITS + 3 bits; this fits in a 16-bit word
 * with the recommended scaling.  (To scale up 12-bit sample data further, an
 * intermediate INT32 array would be needed.)
 *
 * To avoid overflow of the 32-bit intermediate results in pass 2, we must
 * have BITS_IN_JSAMPLE + CONST_BITS + PASS1_BITS <= 26.  Error analysis
 * shows that the values given below are the most effective.
 */
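
The point about postponing the shift can be seen in a small sketch. Everything named `ex_...` / `EX_...` below is hypothetical and stands in for the real macros; the sketch assumes that a constant is scaled by 2**CONST_BITS and rounded to nearest, and it is restricted to non-negative values so that it does not depend on how a compiler shifts negative numbers.

```c
#include <assert.h>
#include <stdint.h>

#define EX_CONST_BITS 13   /* same value as CONST_BITS in this file */

/* Scale a non-negative fractional constant by 2**EX_CONST_BITS, rounding
 * to nearest (an assumed stand-in for the FIX() macro).
 */
int32_t ex_fix(double x)
{
  assert(x >= 0.0);
  return (int32_t) (x * (double) ((int32_t) 1 << EX_CONST_BITS) + 0.5);
}

/* Divide by 2**n with rounding: add half the divisor, then shift right. */
int32_t ex_descale(int32_t x, int n)
{
  assert(x >= 0 && n > 0);
  return (x + ((int32_t) 1 << (n - 1))) >> n;
}

/* Descale after each multiplication: two separate rounding errors. */
int32_t ex_sum_early(int32_t a, int32_t b)
{
  return ex_descale(a * ex_fix(0.541196100), EX_CONST_BITS)
       + ex_descale(b * ex_fix(0.765366865), EX_CONST_BITS);
}

/* Add the full-precision products first, then descale once. */
int32_t ex_sum_late(int32_t a, int32_t b)
{
  return ex_descale(a * ex_fix(0.541196100) + b * ex_fix(0.765366865),
                    EX_CONST_BITS);
}
```

For a = b = 1 the exact sum is about 1.3066. Descaling each product first rounds both terms up and yields 2, while the postponed shift yields 1.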

#if BITS_IN_JSAMPLE == 8
#define CONST_BITS  13
#define PASS1_BITS  2
#else
#define CONST_BITS  13
#define PASS1_BITS  1       /* lose a little precision to avoid overflow */
#endif

/* Some C compilers fail to reduce "FIX(constant)" at compile time, thus
 * causing a lot of useless floating-point operations at run time.
 * To get around this we use the following pre-calculated constants.
 * If you change CONST_BITS you may want to add appropriate values.
 * (With a reasonable C compiler, you can just rely on the FIX() macro...)
 */

#if CONST_BITS == 13
#define FIX_0_298631336  ((INT32)  2446)    /* FIX(0.298631336) */
#define FIX_0_390180644  ((INT32)  3196)    /* FIX(0.390180644) */
#define FIX_0_541196100  ((INT32)  4433)    /* FIX(0.541196100) */
#define FIX_0_765366865  ((INT32)  6270)    /* FIX(0.765366865) */
#define FIX_0_899976223  ((INT32)  7373)    /* FIX(0.899976223) */
#define FIX_1_175875602  ((INT32)  9633)    /* FIX(1.175875602) */
#define FIX_1_501321110  ((INT32)  12299)   /* FIX(1.501321110) */
#define FIX_1_847759065  ((INT32)  15137)   /* FIX(1.847759065) */
#define FIX_1_961570560  ((INT32)  16069)   /* FIX(1.961570560) */
#define FIX_2_053119869  ((INT32)  16819)   /* FIX(2.053119869) */
#define FIX_2_562915447  ((INT32)  20995)   /* FIX(2.562915447) */
#define FIX_3_072711026  ((INT32)  25172)   /* FIX(3.072711026) */
#else
#define FIX_0_298631336  FIX(0.298631336)
#define FIX_0_390180644  FIX(0.390180644)
#define FIX_0_541196100  FIX(0.541196100)
#define FIX_0_765366865  FIX(0.765366865)
#define FIX_0_899976223  FIX(0.899976223)
#define FIX_1_175875602  FIX(1.175875602)
#define FIX_1_501321110  FIX(1.501321110)
#define FIX_1_847759065  FIX(1.847759065)
#define FIX_1_961570560  FIX(1.961570560)
#define FIX_2_053119869  FIX(2.053119869)
#define FIX_2_562915447  FIX(2.562915447)
#define FIX_3_072711026  FIX(3.072711026)
#endif
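
As a cross-check of the pre-calculated table, the sketch below recomputes each constant for CONST_BITS == 13. It assumes that FIX(x) amounts to scaling by 2**13 and rounding to nearest; `EX_FIX13` and `ex_check_fix_table` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed behavior of FIX() for CONST_BITS == 13 (illustration only). */
#define EX_FIX13(x)  ((int32_t) ((x) * 8192.0 + 0.5))

/* Assert that every pre-calculated value equals the rounded product. */
void ex_check_fix_table(void)
{
  static const struct { double value; int32_t fixed; } table[] = {
    { 0.298631336,  2446 }, { 0.390180644,  3196 },
    { 0.541196100,  4433 }, { 0.765366865,  6270 },
    { 0.899976223,  7373 }, { 1.175875602,  9633 },
    { 1.501321110, 12299 }, { 1.847759065, 15137 },
    { 1.961570560, 16069 }, { 2.053119869, 16819 },
    { 2.562915447, 20995 }, { 3.072711026, 25172 }
  };
  size_t i;

  for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
    assert(EX_FIX13(table[i].value) == table[i].fixed);
}
```

If CONST_BITS were changed, the same loop with a different scale factor would give the values to add to the table.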


/* Multiply an INT32 variable by an INT32 constant to yield an INT32 result.
 * For 8-bit samples with the recommended scaling, all the variable
 * and constant values involved are no more than 16 bits wide, so a
 * 16x16->32 bit multiply can be used instead of a full 32x32 multiply.
 * For 12-bit samples, a full 32-bit multiplication will be needed.
 */

#if BITS_IN_JSAMPLE == 8
#define MULTIPLY(var,const)  MULTIPLY16C16(var,const)
#else
#define MULTIPLY(var,const)  ((var) * (const))
#endif


/* Dequantize a coefficient by multiplying it by the multiplier-table
 * entry; produce an int result.  In this module, both inputs and result
 * are 16 bits or less, so either int or short multiply will work.
 */

#define DEQUANTIZE(coef,quantval)  (((ISLOW_MULT_TYPE) (coef)) * (quantval))


/*
 * Perform dequantization and inverse DCT on one block of coefficients.
 */

GLOBAL(void)
jpeg_idct_islow (j_decompress_ptr cinfo, jpeg_component_info * compptr,
                 JCOEFPTR coef_block,
                 JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp0, tmp1, tmp2, tmp3;
  INT32 tmp10, tmp11, tmp12, tmp13;
  INT32 z1, z2, z3;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[DCTSIZE2];  /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */
  /* Note results are scaled up by sqrt(8) compared to a true IDCT; */
  /* furthermore, we scale the results by 2**PASS1_BITS. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = DCTSIZE; ctr > 0; ctr--) {
    /* Due to quantization, we will usually find that many of the input
     * coefficients are zero, especially the AC terms.  We can exploit this
     * by short-circuiting the IDCT calculation for any column in which all
     * the AC terms are zero.  In that case each output is equal to the
     * DC coefficient (with scale factor as needed).
     * With typical images and quantization tables, half or more of the
     * column DCT calculations can be simplified this way.
     */

    if (inptr[DCTSIZE*1] == 0 && inptr[DCTSIZE*2] == 0 &&
        inptr[DCTSIZE*3] == 0 && inptr[DCTSIZE*4] == 0 &&
        inptr[DCTSIZE*5] == 0 && inptr[DCTSIZE*6] == 0 &&
        inptr[DCTSIZE*7] == 0) {
      /* AC terms all zero */
      int dcval = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]) << PASS1_BITS;

      wsptr[DCTSIZE*0] = dcval;
      wsptr[DCTSIZE*1] = dcval;
      wsptr[DCTSIZE*2] = dcval;
      wsptr[DCTSIZE*3] = dcval;
      wsptr[DCTSIZE*4] = dcval;
      wsptr[DCTSIZE*5] = dcval;
      wsptr[DCTSIZE*6] = dcval;
      wsptr[DCTSIZE*7] = dcval;

      inptr++;          /* advance pointers to next column */
      quantptr++;
      wsptr++;
      continue;
    }

    /* Even part: reverse the even part of the forward DCT. */
    /* The rotator is sqrt(2)*c(-6). */

    z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

    z1 = MULTIPLY(z2 + z3, FIX_0_541196100);
    tmp2 = z1 + MULTIPLY(z2, FIX_0_765366865);
    tmp3 = z1 - MULTIPLY(z3, FIX_1_847759065);

    z2 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    z2 <<= CONST_BITS;
    z3 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    z2 += ONE << (CONST_BITS-PASS1_BITS-1);

    tmp0 = z2 + z3;
    tmp1 = z2 - z3;

    tmp10 = tmp0 + tmp2;
    tmp13 = tmp0 - tmp2;
    tmp11 = tmp1 + tmp3;
    tmp12 = tmp1 - tmp3;

    /* Odd part per figure 8; the matrix is unitary and hence its
     * transpose is its inverse.  i0..i3 are y7,y5,y3,y1 respectively.
     */

    tmp0 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
    tmp1 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    tmp2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    tmp3 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);

    z2 = tmp0 + tmp2;
    z3 = tmp1 + tmp3;

    z1 = MULTIPLY(z2 + z3, FIX_1_175875602); /* sqrt(2) * c3 */
    z2 = MULTIPLY(z2, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
    z3 = MULTIPLY(z3, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
    z2 += z1;
    z3 += z1;

    z1 = MULTIPLY(tmp0 + tmp3, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
    tmp0 = MULTIPLY(tmp0, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
    tmp3 = MULTIPLY(tmp3, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
    tmp0 += z1 + z2;
    tmp3 += z1 + z3;

    z1 = MULTIPLY(tmp1 + tmp2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
    tmp1 = MULTIPLY(tmp1, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
    tmp2 = MULTIPLY(tmp2, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
    tmp1 += z1 + z3;
    tmp2 += z1 + z2;

    /* Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 */

    wsptr[DCTSIZE*0] = (int) RIGHT_SHIFT(tmp10 + tmp3, CONST_BITS-PASS1_BITS);
    wsptr[DCTSIZE*7] = (int) RIGHT_SHIFT(tmp10 - tmp3, CONST_BITS-PASS1_BITS);
    wsptr[DCTSIZE*1] = (int) RIGHT_SHIFT(tmp11 + tmp2, CONST_BITS-PASS1_BITS);
    wsptr[DCTSIZE*6] = (int) RIGHT_SHIFT(tmp11 - tmp2, CONST_BITS-PASS1_BITS);
    wsptr[DCTSIZE*2] = (int) RIGHT_SHIFT(tmp12 + tmp1, CONST_BITS-PASS1_BITS);
    wsptr[DCTSIZE*5] = (int) RIGHT_SHIFT(tmp12 - tmp1, CONST_BITS-PASS1_BITS);
    wsptr[DCTSIZE*3] = (int) RIGHT_SHIFT(tmp13 + tmp0, CONST_BITS-PASS1_BITS);
    wsptr[DCTSIZE*4] = (int) RIGHT_SHIFT(tmp13 - tmp0, CONST_BITS-PASS1_BITS);

    inptr++;            /* advance pointers to next column */
    quantptr++;
    wsptr++;
  }

  /* Pass 2: process rows from work array, store into output array. */
  /* Note that we must descale the results by a factor of 8 == 2**3, */
  /* and also undo the PASS1_BITS scaling. */

  wsptr = workspace;
  for (ctr = 0; ctr < DCTSIZE; ctr++) {
    outptr = output_buf[ctr] + output_col;
    /* Rows of zeroes can be exploited in the same way as we did with columns.
     * However, the column calculation has created many nonzero AC terms, so
     * the simplification applies less often (typically 5% to 10% of the time).
     * On machines with very fast multiplication, it's possible that the
     * test takes more time than it's worth.  In that case this section
     * may be commented out.
     */

#ifndef NO_ZERO_ROW_TEST
    if (wsptr[1] == 0 && wsptr[2] == 0 && wsptr[3] == 0 && wsptr[4] == 0 &&
        wsptr[5] == 0 && wsptr[6] == 0 && wsptr[7] == 0) {
      /* AC terms all zero */
      JSAMPLE dcval = range_limit[(int) DESCALE((INT32) wsptr[0], PASS1_BITS+3)
                                  & RANGE_MASK];

      outptr[0] = dcval;
      outptr[1] = dcval;
      outptr[2] = dcval;
      outptr[3] = dcval;
      outptr[4] = dcval;
      outptr[5] = dcval;
      outptr[6] = dcval;
      outptr[7] = dcval;

      wsptr += DCTSIZE;     /* advance pointer to next row */
      continue;
    }
#endif

    /* Even part: reverse the even part of the forward DCT. */
    /* The rotator is sqrt(2)*c(-6). */

    z2 = (INT32) wsptr[2];
    z3 = (INT32) wsptr[6];

    z1 = MULTIPLY(z2 + z3, FIX_0_541196100);
    tmp2 = z1 + MULTIPLY(z2, FIX_0_765366865);
    tmp3 = z1 - MULTIPLY(z3, FIX_1_847759065);

    /* Add fudge factor here for final descale. */
    z2 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    z3 = (INT32) wsptr[4];

    tmp0 = (z2 + z3) << CONST_BITS;
    tmp1 = (z2 - z3) << CONST_BITS;

    tmp10 = tmp0 + tmp2;
    tmp13 = tmp0 - tmp2;
    tmp11 = tmp1 + tmp3;
    tmp12 = tmp1 - tmp3;

    /* Odd part per figure 8; the matrix is unitary and hence its
     * transpose is its inverse.  i0..i3 are y7,y5,y3,y1 respectively.
     */

    tmp0 = (INT32) wsptr[7];
    tmp1 = (INT32) wsptr[5];
    tmp2 = (INT32) wsptr[3];
    tmp3 = (INT32) wsptr[1];

    z2 = tmp0 + tmp2;
    z3 = tmp1 + tmp3;

    z1 = MULTIPLY(z2 + z3, FIX_1_175875602); /* sqrt(2) * c3 */
    z2 = MULTIPLY(z2, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
    z3 = MULTIPLY(z3, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
    z2 += z1;
    z3 += z1;

    z1 = MULTIPLY(tmp0 + tmp3, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
    tmp0 = MULTIPLY(tmp0, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
    tmp3 = MULTIPLY(tmp3, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
    tmp0 += z1 + z2;
    tmp3 += z1 + z3;

    z1 = MULTIPLY(tmp1 + tmp2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
    tmp1 = MULTIPLY(tmp1, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
    tmp2 = MULTIPLY(tmp2, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
    tmp1 += z1 + z3;
    tmp2 += z1 + z2;

    /* Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 */

    outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp3,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp3,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13 + tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp13 - tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];

    wsptr += DCTSIZE;       /* advance pointer to next row */
  }
}
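
The two zero-AC shortcuts in jpeg_idct_islow combine into a very short path when the DC term is the only nonzero coefficient of a block. The sketch below follows that path for 8-bit samples. It is an illustration under stated assumptions: `ex_floor_shift` and `ex_dc_only_sample` are hypothetical helpers, PASS1_BITS is taken as 2, and the `range_limit` lookup is modeled as "add 128 and clamp to 0..255", which is an assumption about the table rather than something this listing shows.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PASS1_BITS 2     /* PASS1_BITS for 8-bit samples */
#define EX_CENTER     128   /* assumed level shift applied by range_limit */

/* floor(x / 2**n), written so it does not rely on how the compiler
 * shifts negative values.
 */
static int32_t ex_floor_shift(int32_t x, int n)
{
  if (x >= 0)
    return x >> n;
  return -((-x + ((int32_t) 1 << n) - 1) >> n);
}

/* Output sample for a block whose only nonzero coefficient is DC. */
int ex_dc_only_sample(int dc_coef, int quantval)
{
  int32_t ws, v;

  assert(quantval > 0);

  /* Pass 1 shortcut: column 0 holds the dequantized DC value scaled up
   * by PASS1_BITS; all other columns are zero.
   */
  ws = (int32_t) dc_coef * quantval * (1 << EX_PASS1_BITS);

  /* Pass 2 shortcut: descale by PASS1_BITS+3 with rounding. */
  v = ex_floor_shift(ws + ((int32_t) 1 << (EX_PASS1_BITS + 2)),
                     EX_PASS1_BITS + 3);

  /* Stand-in for the range_limit table: re-center and clamp. */
  v += EX_CENTER;
  if (v < 0) v = 0;
  if (v > 255) v = 255;
  return (int) v;
}
```

Under these assumptions every one of the 64 output samples of such a block takes this single value, about 128 plus one eighth of the dequantized DC term.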

#ifdef IDCT_SCALING_SUPPORTED


/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a 7x7 output block.
 *
 * Optimized algorithm with 12 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/14).
 */

GLOBAL(void)
jpeg_idct_7x7 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
               JCOEFPTR coef_block,
               JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12, tmp13;
  INT32 z1, z2, z3;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[7*7];   /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 7; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    tmp13 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    tmp13 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    tmp13 += ONE << (CONST_BITS-PASS1_BITS-1);

    z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

    tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734));     /* c4 */
    tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123));     /* c6 */
    tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */
    tmp0 = z1 + z3;
    z2 -= tmp0;
    tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */
    tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536));  /* c2-c4-c6 */
    tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249));  /* c2+c4+c6 */
    tmp13 += MULTIPLY(z2, FIX(1.414213562));         /* c0 */

    /* Odd part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);

    tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347));      /* (c3+c1-c5)/2 */
    tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339));      /* (c3+c5-c1)/2 */
    tmp0 = tmp1 - tmp2;
    tmp1 += tmp2;
    tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276));    /* -c1 */
    tmp1 += tmp2;
    z2 = MULTIPLY(z1 + z3, FIX(0.613604268));        /* c5 */
    tmp0 += z2;
    tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693));     /* c3+c1-c5 */

    /* Final output stage */

    wsptr[7*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
    wsptr[7*6] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
    wsptr[7*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
    wsptr[7*5] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
    wsptr[7*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
    wsptr[7*4] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
    wsptr[7*3] = (int) RIGHT_SHIFT(tmp13, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 7 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 7; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    tmp13 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    tmp13 <<= CONST_BITS;

    z1 = (INT32) wsptr[2];
    z2 = (INT32) wsptr[4];
    z3 = (INT32) wsptr[6];

    tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734));     /* c4 */
    tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123));     /* c6 */
    tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */
    tmp0 = z1 + z3;
    z2 -= tmp0;
    tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */
    tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536));  /* c2-c4-c6 */
    tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249));  /* c2+c4+c6 */
    tmp13 += MULTIPLY(z2, FIX(1.414213562));         /* c0 */

    /* Odd part */

    z1 = (INT32) wsptr[1];
    z2 = (INT32) wsptr[3];
    z3 = (INT32) wsptr[5];

    tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347));      /* (c3+c1-c5)/2 */
    tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339));      /* (c3+c5-c1)/2 */
    tmp0 = tmp1 - tmp2;
    tmp1 += tmp2;
    tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276));    /* -c1 */
    tmp1 += tmp2;
    z2 = MULTIPLY(z1 + z3, FIX(0.613604268));        /* c5 */
    tmp0 += z2;
    tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693));     /* c3+c1-c5 */

    /* Final output stage */

    outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];

    wsptr += 7;     /* advance pointer to next row */
  }
}


/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a reduced-size 6x6 output block.
 *
 * Optimized algorithm with 3 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/12).
 */

GLOBAL(void)
jpeg_idct_6x6 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
               JCOEFPTR coef_block,
               JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12;
  INT32 z1, z2, z3;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[6*6];   /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 6; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    tmp0 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
    tmp2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    tmp10 = MULTIPLY(tmp2, FIX(0.707106781));   /* c4 */
    tmp1 = tmp0 + tmp10;
    tmp11 = RIGHT_SHIFT(tmp0 - tmp10 - tmp10, CONST_BITS-PASS1_BITS);
    tmp10 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    tmp0 = MULTIPLY(tmp10, FIX(1.224744871));   /* c2 */
    tmp10 = tmp1 + tmp0;
    tmp12 = tmp1 - tmp0;

    /* Odd part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
    tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);
    tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);
    tmp1 = (z1 - z2 - z3) << PASS1_BITS;

    /* Final output stage */

    wsptr[6*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
    wsptr[6*5] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
    wsptr[6*1] = (int) (tmp11 + tmp1);
    wsptr[6*4] = (int) (tmp11 - tmp1);
    wsptr[6*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
    wsptr[6*3] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 6 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 6; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    tmp0 <<= CONST_BITS;
    tmp2 = (INT32) wsptr[4];
    tmp10 = MULTIPLY(tmp2, FIX(0.707106781));   /* c4 */
    tmp1 = tmp0 + tmp10;
    tmp11 = tmp0 - tmp10 - tmp10;
    tmp10 = (INT32) wsptr[2];
    tmp0 = MULTIPLY(tmp10, FIX(1.224744871));   /* c2 */
    tmp10 = tmp1 + tmp0;
    tmp12 = tmp1 - tmp0;

    /* Odd part */

    z1 = (INT32) wsptr[1];
    z2 = (INT32) wsptr[3];
    z3 = (INT32) wsptr[5];
    tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
    tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);
    tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);
    tmp1 = (z1 - z2 - z3) << CONST_BITS;

    /* Final output stage */

    outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];

    wsptr += 6;     /* advance pointer to next row */
  }
}
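
To see what the 3-multiplication kernel of jpeg_idct_6x6 computes, it may help to compare it with a direct evaluation of the 6-point sum in floating point. The sketch below does that, assuming the sqrt(N)-scaled convention in which the DC term has weight 1 and term k has weight sqrt(2)*cos((2n+1)*k*pi/12). All `ex_...` names are hypothetical; dequantization, the CONST_BITS/PASS1_BITS scaling and range limiting are omitted, and the cosines are tabulated to nine decimals instead of calling the math library.

```c
#include <assert.h>

/* sqrt(2) * cos(m*pi/12) for m = 0..6 */
static const double ex_c12[7] = {
  1.414213562, 1.366025404, 1.224744871, 1.0,
  0.707106781, 0.366025404, 0.0
};

/* sqrt(2) * cos(m*pi/12) for any m >= 0, via the symmetries of cosine */
static double ex_cos12(int m)
{
  assert(m >= 0);
  m %= 24;                              /* period 2*pi */
  if (m > 12) m = 24 - m;               /* cos(2*pi - t) = cos(t) */
  if (m > 6) return -ex_c12[12 - m];    /* cos(pi - t) = -cos(t) */
  return ex_c12[m];
}

/* Direct evaluation of the scaled 6-point IDCT sum */
void ex_idct6_direct(const double x[6], double y[6])
{
  int n, k;

  for (n = 0; n < 6; n++) {
    y[n] = x[0];
    for (k = 1; k < 6; k++)
      y[n] += x[k] * ex_cos12((2 * n + 1) * k);
  }
}

/* Same transform, arranged like the kernel in jpeg_idct_6x6 */
void ex_idct6_kernel(const double x[6], double y[6])
{
  double tmp0, tmp1, tmp2, tmp10, tmp11, tmp12;

  /* Even part */
  tmp10 = x[4] * 0.707106781;           /* c4 */
  tmp1 = x[0] + tmp10;
  tmp11 = x[0] - tmp10 - tmp10;
  tmp0 = x[2] * 1.224744871;            /* c2 */
  tmp10 = tmp1 + tmp0;
  tmp12 = tmp1 - tmp0;

  /* Odd part */
  tmp1 = (x[1] + x[5]) * 0.366025404;   /* c5 */
  tmp0 = tmp1 + (x[1] + x[3]);
  tmp2 = tmp1 + (x[5] - x[3]);
  tmp1 = x[1] - x[3] - x[5];

  /* Final output stage */
  y[0] = tmp10 + tmp0;  y[5] = tmp10 - tmp0;
  y[1] = tmp11 + tmp1;  y[4] = tmp11 - tmp1;
  y[2] = tmp12 + tmp2;  y[3] = tmp12 - tmp2;
}

/* Largest absolute difference between the two evaluations */
double ex_idct6_max_diff(const double x[6])
{
  double a[6], b[6], d, worst = 0.0;
  int n;

  ex_idct6_direct(x, a);
  ex_idct6_kernel(x, b);
  for (n = 0; n < 6; n++) {
    d = a[n] - b[n];
    if (d < 0.0) d = -d;
    if (d > worst) worst = d;
  }
  return worst;
}
```

The saving comes from the even/odd split: outputs n and 5-n share the same even part and differ only in the sign of the odd part, and the weights 1 and 1+c5 in the odd part need no multiplication of their own.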
00687 

/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a reduced-size 5x5 output block.
 *
 * Optimized algorithm with 5 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/10).
 */

GLOBAL(void)
jpeg_idct_5x5 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
               JCOEFPTR coef_block,
               JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp0, tmp1, tmp10, tmp11, tmp12;
  INT32 z1, z2, z3;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[5*5];   /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 5; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    tmp12 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    tmp12 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    tmp12 += ONE << (CONST_BITS-PASS1_BITS-1);
    tmp0 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    tmp1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */
    z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */
    z3 = tmp12 + z2;
    tmp10 = z3 + z1;
    tmp11 = z3 - z1;
    tmp12 -= z2 << 2;

    /* Odd part */

    z2 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);

    z1 = MULTIPLY(z2 + z3, FIX(0.831253876));     /* c3 */
    tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148));   /* c1-c3 */
    tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899));   /* c1+c3 */

    /* Final output stage */

    wsptr[5*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
    wsptr[5*4] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
    wsptr[5*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
    wsptr[5*3] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
    wsptr[5*2] = (int) RIGHT_SHIFT(tmp12, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 5 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 5; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    tmp12 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    tmp12 <<= CONST_BITS;
    tmp0 = (INT32) wsptr[2];
    tmp1 = (INT32) wsptr[4];
    z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */
    z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */
    z3 = tmp12 + z2;
    tmp10 = z3 + z1;
    tmp11 = z3 - z1;
    tmp12 -= z2 << 2;

    /* Odd part */

    z2 = (INT32) wsptr[1];
    z3 = (INT32) wsptr[3];

    z1 = MULTIPLY(z2 + z3, FIX(0.831253876));     /* c3 */
    tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148));   /* c1-c3 */
    tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899));   /* c1+c3 */

    /* Final output stage */

    outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];

    wsptr += 5;     /* advance pointer to next row */
  }
}

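/*
 * Illustrative sketch, not part of the IJG sources: why the "fudge factor"
 * used in both passes of the scaled IDCT routines gives a rounded result.
 * Adding half of the final divisor before a right shift turns truncation
 * into round-to-nearest.  CONST_BITS = 13 and PASS1_BITS = 2 are assumed
 * here (the usual 8-bit sample setup); the SK_* macros and all sk_*
 * function names are hypothetical helpers introduced for this example.
 */

```c
#include <assert.h>

#define SK_CONST_BITS 13  /* assumed number of fractional bits in constants */
#define SK_PASS1_BITS 2   /* assumed extra precision kept after pass 1 */

/* Floor-dividing shift that avoids right-shifting a negative value. */
static long sk_floor_shift(long x, int n)
{
  assert(n >= 0 && n < 31);
  return (x >= 0) ? (x >> n) : ~((~x) >> n);
}

/* Pass 1 style: add 2^(n-1), then shift right by n. */
static long sk_descale_round(long x, int n)
{
  assert(n >= 1);
  return sk_floor_shift(x + (1L << (n - 1)), n);
}

/* Pass 2 style: the fudge ONE << (PASS1_BITS+2) is added to the DC term
 * before it is scaled by 2^CONST_BITS, so it ends up as half of the final
 * divisor 2^(CONST_BITS+PASS1_BITS+3). */
static long sk_pass2_round_dc(long ws0)
{
  long t = (ws0 + (1L << (SK_PASS1_BITS + 2))) * (1L << SK_CONST_BITS);
  return sk_floor_shift(t, SK_CONST_BITS + SK_PASS1_BITS + 3);
}
```

/*
 * Under these assumptions a DC-only workspace value of 16 rounds to 1 and
 * a value of 15 rounds to 0, i.e. the result is ws0/32 rounded to nearest.
 */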

/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a reduced-size 4x4 output block.
 *
 * Optimized algorithm with 3 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/16) [refers to 8-point IDCT].
 */

GLOBAL(void)
jpeg_idct_4x4 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
               JCOEFPTR coef_block,
               JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp0, tmp2, tmp10, tmp12;
  INT32 z1, z2, z3;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[4*4];   /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 4; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    tmp2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);

    tmp10 = (tmp0 + tmp2) << PASS1_BITS;
    tmp12 = (tmp0 - tmp2) << PASS1_BITS;

    /* Odd part */
    /* Same rotation as in the even part of the 8x8 LL&M IDCT */

    z2 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);

    z1 = MULTIPLY(z2 + z3, FIX_0_541196100);               /* c6 */
    /* Add fudge factor here for final descale. */
    z1 += ONE << (CONST_BITS-PASS1_BITS-1);
    tmp0 = RIGHT_SHIFT(z1 + MULTIPLY(z2, FIX_0_765366865), /* c2-c6 */
                       CONST_BITS-PASS1_BITS);
    tmp2 = RIGHT_SHIFT(z1 - MULTIPLY(z3, FIX_1_847759065), /* c2+c6 */
                       CONST_BITS-PASS1_BITS);

    /* Final output stage */

    wsptr[4*0] = (int) (tmp10 + tmp0);
    wsptr[4*3] = (int) (tmp10 - tmp0);
    wsptr[4*1] = (int) (tmp12 + tmp2);
    wsptr[4*2] = (int) (tmp12 - tmp2);
  }

  /* Pass 2: process 4 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 4; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    tmp2 = (INT32) wsptr[2];

    tmp10 = (tmp0 + tmp2) << CONST_BITS;
    tmp12 = (tmp0 - tmp2) << CONST_BITS;

    /* Odd part */
    /* Same rotation as in the even part of the 8x8 LL&M IDCT */

    z2 = (INT32) wsptr[1];
    z3 = (INT32) wsptr[3];

    z1 = MULTIPLY(z2 + z3, FIX_0_541196100);   /* c6 */
    tmp0 = z1 + MULTIPLY(z2, FIX_0_765366865); /* c2-c6 */
    tmp2 = z1 - MULTIPLY(z3, FIX_1_847759065); /* c2+c6 */

    /* Final output stage */

    outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];

    wsptr += 4;     /* advance pointer to next row */
  }
}

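/*
 * Illustrative sketch, not part of the IJG sources: the odd part of the
 * 4-point kernel is a plane rotation that needs four products when written
 * directly, but only three when (z2+z3)*c6 is shared.  The integer values
 * below are what the FIX_* constants are assumed to be for CONST_BITS = 13;
 * the SK_* macros and sk_rotate3()/sk_rotate4() are hypothetical names.
 */

```c
#include <assert.h>

#define SK_FIX_0_541196100  4433L   /* c6,    assumed value at 13 bits */
#define SK_FIX_0_765366865  6270L   /* c2-c6, assumed value at 13 bits */
#define SK_FIX_1_847759065  15137L  /* c2+c6, assumed value at 13 bits */

/* Three-multiply form: the product (z2+z3)*c6 feeds both outputs. */
static void sk_rotate3(long z2, long z3, long *tmp0, long *tmp2)
{
  long z1;
  assert(tmp0 != 0 && tmp2 != 0);
  z1 = (z2 + z3) * SK_FIX_0_541196100;
  *tmp0 = z1 + z2 * SK_FIX_0_765366865;  /* z2*c2 + z3*c6 */
  *tmp2 = z1 - z3 * SK_FIX_1_847759065;  /* z2*c6 - z3*c2 */
}

/* Four-multiply reference with the same constants expanded by hand. */
static void sk_rotate4(long z2, long z3, long *tmp0, long *tmp2)
{
  assert(tmp0 != 0 && tmp2 != 0);
  *tmp0 = z2 * (SK_FIX_0_541196100 + SK_FIX_0_765366865)
        + z3 * SK_FIX_0_541196100;
  *tmp2 = z2 * SK_FIX_0_541196100
        - z3 * (SK_FIX_1_847759065 - SK_FIX_0_541196100);
}
```

/*
 * Both forms agree exactly in integer arithmetic.  Note that the two
 * rounded constants imply slightly different values of c2 (10703 versus
 * 10704), which is a property of the rounding, not of the rotation.
 */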

/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a reduced-size 3x3 output block.
 *
 * Optimized algorithm with 2 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/6).
 */

GLOBAL(void)
jpeg_idct_3x3 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
               JCOEFPTR coef_block,
               JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp0, tmp2, tmp10, tmp12;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[3*3];   /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 3; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    tmp0 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
    tmp2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */
    tmp10 = tmp0 + tmp12;
    tmp2 = tmp0 - tmp12 - tmp12;

    /* Odd part */

    tmp12 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */

    /* Final output stage */

    wsptr[3*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
    wsptr[3*2] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
    wsptr[3*1] = (int) RIGHT_SHIFT(tmp2, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 3 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 3; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    tmp0 <<= CONST_BITS;
    tmp2 = (INT32) wsptr[2];
    tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */
    tmp10 = tmp0 + tmp12;
    tmp2 = tmp0 - tmp12 - tmp12;

    /* Odd part */

    tmp12 = (INT32) wsptr[1];
    tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */

    /* Final output stage */

    outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];

    wsptr += 3;     /* advance pointer to next row */
  }
}

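/*
 * Illustrative sketch, not part of the IJG sources: the 3-point 1-D kernel
 * in isolation, without dequantization, fudge factor, or descaling, so the
 * two multiplications are easy to see.  CONST_BITS = 13 is assumed, and
 * SK3_FIX() is a hypothetical stand-in for the FIX() macro; outputs stay
 * scaled by 2^13.
 */

```c
#include <assert.h>

#define SK3_CONST_BITS 13  /* assumed number of fractional bits */
#define SK3_FIX(x) ((long) ((x) * (1L << SK3_CONST_BITS) + 0.5))

/* in[0..2]: coefficients; out[0..2]: samples scaled by 2^SK3_CONST_BITS. */
static void sk3_idct3_kernel(const long in[3], long out[3])
{
  long tmp0, tmp2, tmp10, tmp12;
  assert(in != 0 && out != 0);

  /* Even part: one multiplication by c2. */
  tmp0 = in[0] * (1L << SK3_CONST_BITS);
  tmp12 = in[2] * SK3_FIX(0.707106781);  /* c2 */
  tmp10 = tmp0 + tmp12;
  tmp2 = tmp0 - tmp12 - tmp12;

  /* Odd part: one multiplication by c1. */
  tmp0 = in[1] * SK3_FIX(1.224744871);   /* c1 */

  /* Output stage: outer samples mirror each other around the centre. */
  out[0] = tmp10 + tmp0;
  out[2] = tmp10 - tmp0;
  out[1] = tmp2;
}
```

/*
 * A DC-only input produces three equal outputs, and an input with only the
 * first AC coefficient set produces an antisymmetric row with a zero in
 * the middle.
 */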

/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a reduced-size 2x2 output block.
 *
 * Multiplication-less algorithm.
 */

GLOBAL(void)
jpeg_idct_2x2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
               JCOEFPTR coef_block,
               JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp0, tmp1, tmp2, tmp3, tmp4, tmp5;
  ISLOW_MULT_TYPE * quantptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  SHIFT_TEMPS

  /* Pass 1: process columns from input. */

  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

  /* Column 0 */
  tmp4 = DEQUANTIZE(coef_block[DCTSIZE*0], quantptr[DCTSIZE*0]);
  tmp5 = DEQUANTIZE(coef_block[DCTSIZE*1], quantptr[DCTSIZE*1]);
  /* Add fudge factor here for final descale. */
  tmp4 += ONE << 2;

  tmp0 = tmp4 + tmp5;
  tmp2 = tmp4 - tmp5;

  /* Column 1 */
  tmp4 = DEQUANTIZE(coef_block[DCTSIZE*0+1], quantptr[DCTSIZE*0+1]);
  tmp5 = DEQUANTIZE(coef_block[DCTSIZE*1+1], quantptr[DCTSIZE*1+1]);

  tmp1 = tmp4 + tmp5;
  tmp3 = tmp4 - tmp5;

  /* Pass 2: process 2 rows, store into output array. */

  /* Row 0 */
  outptr = output_buf[0] + output_col;

  outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp0 + tmp1, 3) & RANGE_MASK];
  outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp0 - tmp1, 3) & RANGE_MASK];

  /* Row 1 */
  outptr = output_buf[1] + output_col;

  outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp2 + tmp3, 3) & RANGE_MASK];
  outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp2 - tmp3, 3) & RANGE_MASK];
}

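/*
 * Illustrative sketch, not part of the IJG sources: the 2x2 case as a pair
 * of butterflies on already dequantized coefficients.  The range_limit[]
 * lookup is replaced by an explicit "add 128 and clamp to 0..255" step,
 * which is an assumption that only holds for 8-bit samples and moderately
 * sized inputs; all sk2_* names are hypothetical.
 */

```c
#include <assert.h>

/* Floor-dividing shift that avoids right-shifting a negative value. */
static long sk2_floor_shift(long x, int n)
{
  assert(n >= 0 && n < 31);
  return (x >= 0) ? (x >> n) : ~((~x) >> n);
}

/* Stand-in for range_limit[]: re-centre by 128, clamp to 0..255. */
static int sk2_clamp(long v)
{
  v += 128;
  return (v < 0) ? 0 : (v > 255) ? 255 : (int) v;
}

/* c[row][col] holds the four lowest-frequency dequantized coefficients. */
static void sk2_idct2x2(long c[2][2], int out[2][2])
{
  long tmp0, tmp1, tmp2, tmp3;
  assert(c != 0 && out != 0);

  /* Pass 1: columns.  The fudge factor 1 << 2 rounds the final >> 3. */
  tmp0 = (c[0][0] + 4) + c[1][0];
  tmp2 = (c[0][0] + 4) - c[1][0];
  tmp1 = c[0][1] + c[1][1];
  tmp3 = c[0][1] - c[1][1];

  /* Pass 2: rows. */
  out[0][0] = sk2_clamp(sk2_floor_shift(tmp0 + tmp1, 3));
  out[0][1] = sk2_clamp(sk2_floor_shift(tmp0 - tmp1, 3));
  out[1][0] = sk2_clamp(sk2_floor_shift(tmp2 + tmp3, 3));
  out[1][1] = sk2_clamp(sk2_floor_shift(tmp2 - tmp3, 3));
}
```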

/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a reduced-size 1x1 output block.
 *
 * We hardly need an inverse DCT routine for this: just take the
 * average pixel value, which is one-eighth of the DC coefficient.
 */

GLOBAL(void)
jpeg_idct_1x1 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
               JCOEFPTR coef_block,
               JSAMPARRAY output_buf, JDIMENSION output_col)
{
  int dcval;
  ISLOW_MULT_TYPE * quantptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  SHIFT_TEMPS

  /* 1x1 is trivial: just take the DC coefficient divided by 8. */
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  dcval = DEQUANTIZE(coef_block[0], quantptr[0]);
  dcval = (int) DESCALE((INT32) dcval, 3);

  output_buf[0][output_col] = range_limit[dcval & RANGE_MASK];
}

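/*
 * Illustrative sketch, not part of the IJG sources: the 1x1 case reduced to
 * "DC / 8, rounded, re-centred, clamped".  It assumes that DESCALE(x, 3)
 * adds 4 before shifting right by 3 and that the range_limit[] lookup
 * behaves like "add 128 and clamp to 0..255" for 8-bit samples; both
 * sk1_* helpers are hypothetical.
 */

```c
#include <assert.h>

/* Assumed DESCALE(x, 3) behaviour: add half of 8, then floor-divide by 8. */
static long sk1_descale3(long x)
{
  long t = x + 4;
  return (t >= 0) ? (t >> 3) : ~((~t) >> 3);
}

/* Map a dequantized DC coefficient to a single 8-bit output sample. */
static int sk1_idct1x1(long dc)
{
  long v;
  assert(dc > -100000L && dc < 100000L);  /* keep the sketch far from overflow */
  v = sk1_descale3(dc) + 128;
  if (v < 0)
    return 0;
  if (v > 255)
    return 255;
  return (int) v;
}
```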

/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a 9x9 output block.
 *
 * Optimized algorithm with 10 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/18).
 */

GLOBAL(void)
jpeg_idct_9x9 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
               JCOEFPTR coef_block,
               JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13, tmp14;
  INT32 z1, z2, z3, z4;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[8*9];   /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    tmp0 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);

    z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

    tmp3 = MULTIPLY(z3, FIX(0.707106781));      /* c6 */
    tmp1 = tmp0 + tmp3;
    tmp2 = tmp0 - tmp3 - tmp3;

    tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */
    tmp11 = tmp2 + tmp0;
    tmp14 = tmp2 - tmp0 - tmp0;

    tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */
    tmp2 = MULTIPLY(z1, FIX(1.083350441));      /* c4 */
    tmp3 = MULTIPLY(z2, FIX(0.245575608));      /* c8 */

    tmp10 = tmp1 + tmp0 - tmp3;
    tmp12 = tmp1 - tmp0 + tmp2;
    tmp13 = tmp1 - tmp2 + tmp3;

    /* Odd part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

    z2 = MULTIPLY(z2, - FIX(1.224744871));           /* -c3 */

    tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955));      /* c5 */
    tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525));      /* c7 */
    tmp0 = tmp2 + tmp3 - z2;
    tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481));      /* c1 */
    tmp2 += z2 - tmp1;
    tmp3 += z2 + tmp1;
    tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */

    /* Final output stage */

    wsptr[8*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
    wsptr[8*8] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
    wsptr[8*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
    wsptr[8*7] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
    wsptr[8*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
    wsptr[8*6] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
    wsptr[8*3] = (int) RIGHT_SHIFT(tmp13 + tmp3, CONST_BITS-PASS1_BITS);
    wsptr[8*5] = (int) RIGHT_SHIFT(tmp13 - tmp3, CONST_BITS-PASS1_BITS);
    wsptr[8*4] = (int) RIGHT_SHIFT(tmp14, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 9 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 9; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    tmp0 <<= CONST_BITS;

    z1 = (INT32) wsptr[2];
    z2 = (INT32) wsptr[4];
    z3 = (INT32) wsptr[6];

    tmp3 = MULTIPLY(z3, FIX(0.707106781));      /* c6 */
    tmp1 = tmp0 + tmp3;
    tmp2 = tmp0 - tmp3 - tmp3;

    tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */
    tmp11 = tmp2 + tmp0;
    tmp14 = tmp2 - tmp0 - tmp0;

    tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */
    tmp2 = MULTIPLY(z1, FIX(1.083350441));      /* c4 */
    tmp3 = MULTIPLY(z2, FIX(0.245575608));      /* c8 */

    tmp10 = tmp1 + tmp0 - tmp3;
    tmp12 = tmp1 - tmp0 + tmp2;
    tmp13 = tmp1 - tmp2 + tmp3;

    /* Odd part */

    z1 = (INT32) wsptr[1];
    z2 = (INT32) wsptr[3];
    z3 = (INT32) wsptr[5];
    z4 = (INT32) wsptr[7];

    z2 = MULTIPLY(z2, - FIX(1.224744871));           /* -c3 */

    tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955));      /* c5 */
    tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525));      /* c7 */
    tmp0 = tmp2 + tmp3 - z2;
    tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481));      /* c1 */
    tmp2 += z2 - tmp1;
    tmp3 += z2 + tmp1;
    tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */

    /* Final output stage */

    outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13 + tmp3,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp13 - tmp3,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp14,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];

    wsptr += 8;     /* advance pointer to next row */
  }
}

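/*
 * Illustrative sketch, not part of the IJG sources: how the 8*9 workspace
 * of the 9x9 routine is addressed.  Pass 1 walks the 8 input columns and
 * writes 9 values per column at stride 8; pass 2 then reads 9 rows of 8
 * values each.  The stored numbers only encode (row, col) so the layout
 * can be checked; they are not IDCT results.  All SK9_* and sk9_* names
 * are hypothetical.
 */

```c
#include <assert.h>

#define SK9_COLS 8  /* input coefficient columns (DCTSIZE) */
#define SK9_ROWS 9  /* output rows produced by the 9-point column pass */

/* Pass 1 stand-in: one sweep per column, writing wsptr[8*row]. */
static void sk9_fill_workspace(int workspace[SK9_COLS * SK9_ROWS])
{
  int *wsptr = workspace;
  int col, row;
  assert(workspace != 0);
  for (col = 0; col < SK9_COLS; col++, wsptr++)
    for (row = 0; row < SK9_ROWS; row++)
      wsptr[SK9_COLS * row] = row * 100 + col;
}

/* Pass 2 stand-in: step to the requested row by advancing wsptr by 8 per
 * row, then read the entry the row kernel would see as wsptr[col]. */
static int sk9_row_entry(const int *workspace, int row, int col)
{
  const int *wsptr = workspace;
  int r;
  assert(row >= 0 && row < SK9_ROWS && col >= 0 && col < SK9_COLS);
  for (r = 0; r < row; r++)
    wsptr += SK9_COLS;  /* advance pointer to next row */
  return wsptr[col];
}
```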

/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a 10x10 output block.
 *
 * Optimized algorithm with 12 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/20).
 */

GLOBAL(void)
jpeg_idct_10x10 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
                 JCOEFPTR coef_block,
                 JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp10, tmp11, tmp12, tmp13, tmp14;
  INT32 tmp20, tmp21, tmp22, tmp23, tmp24;
  INT32 z1, z2, z3, z4, z5;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[8*10];  /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    z3 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    z3 += ONE << (CONST_BITS-PASS1_BITS-1);
    z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    z1 = MULTIPLY(z4, FIX(1.144122806));         /* c4 */
    z2 = MULTIPLY(z4, FIX(0.437016024));         /* c8 */
    tmp10 = z3 + z1;
    tmp11 = z3 - z2;

    tmp22 = RIGHT_SHIFT(z3 - ((z1 - z2) << 1),   /* c0 = (c4-c8)*2 */
                        CONST_BITS-PASS1_BITS);

    z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

    z1 = MULTIPLY(z2 + z3, FIX(0.831253876));    /* c6 */
    tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */
    tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */

    tmp20 = tmp10 + tmp12;
    tmp24 = tmp10 - tmp12;
    tmp21 = tmp11 + tmp13;
    tmp23 = tmp11 - tmp13;

    /* Odd part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

    tmp11 = z2 + z4;
    tmp13 = z2 - z4;

    tmp12 = MULTIPLY(tmp13, FIX(0.309016994));        /* (c3-c7)/2 */
    z5 = z3 << CONST_BITS;

    z2 = MULTIPLY(tmp11, FIX(0.951056516));           /* (c3+c7)/2 */
    z4 = z5 + tmp12;

    tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */
    tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */

    z2 = MULTIPLY(tmp11, FIX(0.587785252));           /* (c1-c9)/2 */
    z4 = z5 - tmp12 - (tmp13 << (CONST_BITS - 1));

    tmp12 = (z1 - tmp13 - z3) << PASS1_BITS;

    tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */
    tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */

    /* Final output stage */

    wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*9] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*8] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*2] = (int) (tmp22 + tmp12);
    wsptr[8*7] = (int) (tmp22 - tmp12);
    wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
    wsptr[8*6] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
    wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*5] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 10 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 10; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    z3 <<= CONST_BITS;
    z4 = (INT32) wsptr[4];
    z1 = MULTIPLY(z4, FIX(1.144122806));         /* c4 */
    z2 = MULTIPLY(z4, FIX(0.437016024));         /* c8 */
    tmp10 = z3 + z1;
    tmp11 = z3 - z2;

    tmp22 = z3 - ((z1 - z2) << 1);               /* c0 = (c4-c8)*2 */

    z2 = (INT32) wsptr[2];
    z3 = (INT32) wsptr[6];

    z1 = MULTIPLY(z2 + z3, FIX(0.831253876));    /* c6 */
    tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */
    tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */

    tmp20 = tmp10 + tmp12;
    tmp24 = tmp10 - tmp12;
    tmp21 = tmp11 + tmp13;
    tmp23 = tmp11 - tmp13;

    /* Odd part */

    z1 = (INT32) wsptr[1];
    z2 = (INT32) wsptr[3];
    z3 = (INT32) wsptr[5];
    z3 <<= CONST_BITS;
    z4 = (INT32) wsptr[7];

    tmp11 = z2 + z4;
    tmp13 = z2 - z4;

    tmp12 = MULTIPLY(tmp13, FIX(0.309016994));        /* (c3-c7)/2 */

    z2 = MULTIPLY(tmp11, FIX(0.951056516));           /* (c3+c7)/2 */
    z4 = z3 + tmp12;

    tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */
    tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */

    z2 = MULTIPLY(tmp11, FIX(0.587785252));           /* (c1-c9)/2 */
    z4 = z3 - tmp12 - (tmp13 << (CONST_BITS - 1));

    tmp12 = ((z1 - tmp13) << CONST_BITS) - z3;

    tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */
    tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */

    /* Final output stage */

    outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];
    outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
                                              CONST_BITS+PASS1_BITS+3)
                            & RANGE_MASK];

    wsptr += 8;     /* advance pointer to next row */
  }
}

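/*
 * Illustrative sketch, not part of the IJG sources: the even part of the
 * 10-point kernel forms its c0 term as (c4-c8)*2 from the two products z1
 * and z2 that are needed anyway, instead of multiplying by c0 directly.
 * With constants rounded to an assumed CONST_BITS = 13, the reused form
 * differs from a direct FIX(1.414213562) product by one unit per input
 * step.  SK10_FIX() and both sk10_* functions are hypothetical.
 */

```c
#include <assert.h>

#define SK10_CONST_BITS 13  /* assumed number of fractional bits */
#define SK10_FIX(x) ((long) ((x) * (1L << SK10_CONST_BITS) + 0.5))

/* c0 contribution built from the existing c4 and c8 products. */
static long sk10_c0_via_c4_c8(long z4)
{
  long z1, z2;
  assert(z4 > -100000L && z4 < 100000L);  /* keep products within long */
  z1 = z4 * SK10_FIX(1.144122806);        /* c4 */
  z2 = z4 * SK10_FIX(0.437016024);        /* c8 */
  return (z1 - z2) * 2;                   /* c0 = (c4-c8)*2 */
}

/* Reference: one extra multiplication by the rounded c0 itself. */
static long sk10_c0_direct(long z4)
{
  assert(z4 > -100000L && z4 < 100000L);
  return z4 * SK10_FIX(1.414213562);      /* c0 */
}
```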

/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing an 11x11 output block.
 *
 * Optimized algorithm with 24 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/22).
 */
01453 
01454 GLOBAL(void)
01455 jpeg_idct_11x11 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
01456          JCOEFPTR coef_block,
01457          JSAMPARRAY output_buf, JDIMENSION output_col)
01458 {
01459   INT32 tmp10, tmp11, tmp12, tmp13, tmp14;
01460   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;
01461   INT32 z1, z2, z3, z4;
01462   JCOEFPTR inptr;
01463   ISLOW_MULT_TYPE * quantptr;
01464   int * wsptr;
01465   JSAMPROW outptr;
01466   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
01467   int ctr;
01468   int workspace[8*11];  /* buffers data between passes */
01469   SHIFT_TEMPS
01470 
01471   /* Pass 1: process columns from input, store into work array. */
01472 
01473   inptr = coef_block;
01474   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
01475   wsptr = workspace;
01476   for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
01477     /* Even part */
01478 
01479     tmp10 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
01480     tmp10 <<= CONST_BITS;
01481     /* Add fudge factor here for final descale. */
01482     tmp10 += ONE << (CONST_BITS-PASS1_BITS-1);
01483 
01484     z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
01485     z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
01486     z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
01487 
01488     tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132));     /* c2+c4 */
01489     tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045));     /* c2-c6 */
01490     z4 = z1 + z3;
01491     tmp24 = MULTIPLY(z4, - FIX(1.155664402));        /* -(c2-c10) */
01492     z4 -= z2;
    tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976));  /* c2 */
    tmp21 = tmp20 + tmp23 + tmp25 -
        MULTIPLY(z2, FIX(1.821790775));          /* c2+c4+c10-c6 */
    tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */
    tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */
    tmp24 += tmp25;
    tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120));  /* c8+c10 */
    tmp24 += MULTIPLY(z2, FIX(1.944413522)) -        /* c2+c8 */
         MULTIPLY(z1, FIX(1.390975730));         /* c4+c10 */
    tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562));  /* c0 */

    /* Odd part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

    tmp11 = z1 + z2;
    tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */
    tmp11 = MULTIPLY(tmp11, FIX(0.887983902));           /* c3-c9 */
    tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295));         /* c5-c9 */
    tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */
    tmp10 = tmp11 + tmp12 + tmp13 -
        MULTIPLY(z1, FIX(0.923107866));              /* c7+c5+c3-c1-2*c9 */
    z1    = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */
    tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588));        /* c1+c7+3*c9-c3 */
    tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623));        /* c3+c5-c7-c9 */
    z1    = MULTIPLY(z2 + z4, - FIX(1.798248910));       /* -(c1+c9) */
    tmp11 += z1;
    tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632));        /* c1+c5+c9-c7 */
    tmp14 += MULTIPLY(z2, - FIX(1.467221301)) +          /* -(c5+c9) */
         MULTIPLY(z3, FIX(1.001388905)) -            /* c1-c9 */
         MULTIPLY(z4, FIX(1.684843907));             /* c3+c9 */

    /* Final output stage */

    wsptr[8*0]  = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*10] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*1]  = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*9]  = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*2]  = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
    wsptr[8*8]  = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
    wsptr[8*3]  = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
    wsptr[8*7]  = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
    wsptr[8*4]  = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*6]  = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*5]  = (int) RIGHT_SHIFT(tmp25, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 11 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 11; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    tmp10 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    tmp10 <<= CONST_BITS;

    z1 = (INT32) wsptr[2];
    z2 = (INT32) wsptr[4];
    z3 = (INT32) wsptr[6];

    tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132));     /* c2+c4 */
    tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045));     /* c2-c6 */
    z4 = z1 + z3;
    tmp24 = MULTIPLY(z4, - FIX(1.155664402));        /* -(c2-c10) */
    z4 -= z2;
    tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976));  /* c2 */
    tmp21 = tmp20 + tmp23 + tmp25 -
        MULTIPLY(z2, FIX(1.821790775));          /* c2+c4+c10-c6 */
    tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */
    tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */
    tmp24 += tmp25;
    tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120));  /* c8+c10 */
    tmp24 += MULTIPLY(z2, FIX(1.944413522)) -        /* c2+c8 */
         MULTIPLY(z1, FIX(1.390975730));         /* c4+c10 */
    tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562));  /* c0 */

    /* Odd part */

    z1 = (INT32) wsptr[1];
    z2 = (INT32) wsptr[3];
    z3 = (INT32) wsptr[5];
    z4 = (INT32) wsptr[7];

    tmp11 = z1 + z2;
    tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */
    tmp11 = MULTIPLY(tmp11, FIX(0.887983902));           /* c3-c9 */
    tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295));         /* c5-c9 */
    tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */
    tmp10 = tmp11 + tmp12 + tmp13 -
        MULTIPLY(z1, FIX(0.923107866));              /* c7+c5+c3-c1-2*c9 */
    z1    = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */
    tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588));        /* c1+c7+3*c9-c3 */
    tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623));        /* c3+c5-c7-c9 */
    z1    = MULTIPLY(z2 + z4, - FIX(1.798248910));       /* -(c1+c9) */
    tmp11 += z1;
    tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632));        /* c1+c5+c9-c7 */
    tmp14 += MULTIPLY(z2, - FIX(1.467221301)) +          /* -(c5+c9) */
         MULTIPLY(z3, FIX(1.001388905)) -            /* c1-c9 */
         MULTIPLY(z4, FIX(1.684843907));             /* c3+c9 */

    /* Final output stage */

    outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];

    wsptr += 8;     /* advance pointer to next row */
  }
}


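/*
 * Illustrative sketch, not part of the IJG source.  The "fudge factor" added
 * before each final descale in the routine above is half of the divisor that
 * the later right shift applies, so the shift rounds to nearest instead of
 * truncating.  The sketch replays the pass 2 arithmetic for a row whose only
 * nonzero workspace entry is wsptr[0].  CONST_BITS = 13 and PASS1_BITS = 2
 * are assumptions (both are defined outside this section), and the helper
 * names descale_dc_rounded and descale_dc_truncated are hypothetical.
 */

```c
#include <assert.h>

/* Assumed configuration; the real definitions live outside this section. */
#define CONST_BITS  13
#define PASS1_BITS  2
#define ONE         ((long) 1)

/* Pass 2 with every AC term zero: add the fudge factor, scale up by
 * CONST_BITS, then descale by CONST_BITS+PASS1_BITS+3.
 * Intended for ws0 >= 0 so that both shifts are fully defined in ISO C. */
long descale_dc_rounded(long ws0)
{
  long acc = ws0 + (ONE << (PASS1_BITS+2));  /* half of 1 << (PASS1_BITS+3) */
  acc <<= CONST_BITS;
  return acc >> (CONST_BITS+PASS1_BITS+3);
}

/* Same computation without the fudge factor: the shift simply truncates. */
long descale_dc_truncated(long ws0)
{
  long acc = ws0 << CONST_BITS;
  return acc >> (CONST_BITS+PASS1_BITS+3);
}

/* The total descale divides by 32: 112/32 = 3.5 and 111/32 is about 3.47. */
void descale_self_check(void)
{
  assert(descale_dc_rounded(112) == 4);    /* 3.5 rounds up */
  assert(descale_dc_truncated(112) == 3);  /* 3.5 truncates down */
  assert(descale_dc_rounded(111) == 3);    /* just below the midpoint */
  assert(descale_dc_rounded(100) == 3);    /* 3.125 rounds down */
}
```
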
/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a 12x12 output block.
 *
 * Optimized algorithm with 15 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/24).
 */

GLOBAL(void)
jpeg_idct_12x12 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
         JCOEFPTR coef_block,
         JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;
  INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;
  INT32 z1, z2, z3, z4;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[8*12];  /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    z3 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    z3 += ONE << (CONST_BITS-PASS1_BITS-1);

    z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */

    tmp10 = z3 + z4;
    tmp11 = z3 - z4;

    z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */
    z1 <<= CONST_BITS;
    z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
    z2 <<= CONST_BITS;

    tmp12 = z1 - z2;

    tmp21 = z3 + tmp12;
    tmp24 = z3 - tmp12;

    tmp12 = z4 + z2;

    tmp20 = tmp10 + tmp12;
    tmp25 = tmp10 - tmp12;

    tmp12 = z4 - z1 - z2;

    tmp22 = tmp11 + tmp12;
    tmp23 = tmp11 - tmp12;

    /* Odd part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

    tmp11 = MULTIPLY(z2, FIX(1.306562965));                  /* c3 */
    tmp14 = MULTIPLY(z2, - FIX_0_541196100);                 /* -c9 */

    tmp10 = z1 + z3;
    tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669));          /* c7 */
    tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384));       /* c5-c7 */
    tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716));  /* c1-c5 */
    tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580));           /* -(c7+c11) */
    tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */
    tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */
    tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) -        /* c7-c11 */
         MULTIPLY(z4, FIX(1.982889723));                 /* c5+c7 */

    z1 -= z4;
    z2 -= z3;
    z3 = MULTIPLY(z1 + z2, FIX_0_541196100);                 /* c9 */
    tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865);              /* c3-c9 */
    tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065);              /* c3+c9 */

    /* Final output stage */

    wsptr[8*0]  = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*11] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*1]  = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*10] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*2]  = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
    wsptr[8*9]  = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
    wsptr[8*3]  = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
    wsptr[8*8]  = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
    wsptr[8*4]  = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*7]  = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*5]  = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
    wsptr[8*6]  = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 12 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 12; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    z3 <<= CONST_BITS;

    z4 = (INT32) wsptr[4];
    z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */

    tmp10 = z3 + z4;
    tmp11 = z3 - z4;

    z1 = (INT32) wsptr[2];
    z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */
    z1 <<= CONST_BITS;
    z2 = (INT32) wsptr[6];
    z2 <<= CONST_BITS;

    tmp12 = z1 - z2;

    tmp21 = z3 + tmp12;
    tmp24 = z3 - tmp12;

    tmp12 = z4 + z2;

    tmp20 = tmp10 + tmp12;
    tmp25 = tmp10 - tmp12;

    tmp12 = z4 - z1 - z2;

    tmp22 = tmp11 + tmp12;
    tmp23 = tmp11 - tmp12;

    /* Odd part */

    z1 = (INT32) wsptr[1];
    z2 = (INT32) wsptr[3];
    z3 = (INT32) wsptr[5];
    z4 = (INT32) wsptr[7];

    tmp11 = MULTIPLY(z2, FIX(1.306562965));                  /* c3 */
    tmp14 = MULTIPLY(z2, - FIX_0_541196100);                 /* -c9 */

    tmp10 = z1 + z3;
    tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669));          /* c7 */
    tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384));       /* c5-c7 */
    tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716));  /* c1-c5 */
    tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580));           /* -(c7+c11) */
    tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */
    tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */
    tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) -        /* c7-c11 */
         MULTIPLY(z4, FIX(1.982889723));                 /* c5+c7 */

    z1 -= z4;
    z2 -= z3;
    z3 = MULTIPLY(z1 + z2, FIX_0_541196100);                 /* c9 */
    tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865);              /* c3-c9 */
    tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065);              /* c3+c9 */

    /* Final output stage */

    outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];

    wsptr += 8;     /* advance pointer to next row */
  }
}


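/*
 * Illustrative sketch, not part of the IJG source.  The 12x12 kernel above
 * writes its cosine constants as FIX(1.224744871) and also uses named
 * constants such as FIX_0_541196100.  Both forms stand for a real value
 * scaled to an integer with CONST_BITS fractional bits.  The sketch assumes
 * CONST_BITS = 13 and a round-to-nearest conversion; the FIX and MULTIPLY
 * macros themselves are defined outside this section, so fix_const and
 * scale_and_descale below are hypothetical stand-ins rather than the
 * library's definitions.
 */

```c
#include <assert.h>

#define CONST_BITS   13                  /* assumed, see the note above */
#define CONST_SCALE  (1L << CONST_BITS)

/* Stand-in for FIX(): scale by 2^CONST_BITS and round to nearest. */
long fix_const(double x)
{
  return (long) (x * CONST_SCALE + 0.5);
}

/* Stand-in for MULTIPLY() followed by a rounded descale by CONST_BITS.
 * Intended for non-negative products, which keeps the shift portable. */
long scale_and_descale(long v, long fixed)
{
  long prod = v * fixed;
  return (prod + (CONST_SCALE >> 1)) >> CONST_BITS;
}

void fix_self_check(void)
{
  /* Values obtained for the three named constants used by the kernel. */
  assert(fix_const(0.541196100) == 4433);
  assert(fix_const(0.765366865) == 6270);
  assert(fix_const(1.847759065) == 15137);
  /* c4 of the 12-point transform, and 100 * c4 = 122.47 rounded to 122. */
  assert(fix_const(1.224744871) == 10033);
  assert(scale_and_descale(100, fix_const(1.224744871)) == 122);
}
```
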
/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a 13x13 output block.
 *
 * Optimized algorithm with 29 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/26).
 */

GLOBAL(void)
jpeg_idct_13x13 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
         JCOEFPTR coef_block,
         JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;
  INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;
  INT32 z1, z2, z3, z4;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[8*13];  /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    z1 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    z1 += ONE << (CONST_BITS-PASS1_BITS-1);

    z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

    tmp10 = z3 + z4;
    tmp11 = z3 - z4;

    tmp12 = MULTIPLY(tmp10, FIX(1.155388986));                /* (c4+c6)/2 */
    tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1;           /* (c4-c6)/2 */

    tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13;   /* c2 */
    tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13;   /* c10 */

    tmp12 = MULTIPLY(tmp10, FIX(0.316450131));                /* (c8-c12)/2 */
    tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1;           /* (c8+c12)/2 */

    tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13;   /* c6 */
    tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */

    tmp12 = MULTIPLY(tmp10, FIX(0.435816023));                /* (c2-c10)/2 */
    tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1;           /* (c2+c10)/2 */

    tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */
    tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */

    tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1;      /* c0 */

    /* Odd part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

    tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651));     /* c3 */
    tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945));     /* c5 */
    tmp15 = z1 + z4;
    tmp13 = MULTIPLY(tmp15, FIX(0.937797057));       /* c7 */
    tmp10 = tmp11 + tmp12 + tmp13 -
        MULTIPLY(z1, FIX(2.020082300));          /* c7+c5+c3-c1 */
    tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458));   /* -c11 */
    tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */
    tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */
    tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945));   /* -c5 */
    tmp11 += tmp14;
    tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */
    tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813));   /* -c9 */
    tmp12 += tmp14;
    tmp13 += tmp14;
    tmp15 = MULTIPLY(tmp15, FIX(0.338443458));       /* c11 */
    tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */
        MULTIPLY(z2, FIX(0.466105296));          /* c1-c7 */
    z1    = MULTIPLY(z3 - z2, FIX(0.937797057));     /* c7 */
    tmp14 += z1;
    tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) -   /* c3-c7 */
         MULTIPLY(z4, FIX(1.742345811));         /* c1+c11 */

    /* Final output stage */

    wsptr[8*0]  = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*12] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*1]  = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*11] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*2]  = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
    wsptr[8*10] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
    wsptr[8*3]  = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
    wsptr[8*9]  = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
    wsptr[8*4]  = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*8]  = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*5]  = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
    wsptr[8*7]  = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
    wsptr[8*6]  = (int) RIGHT_SHIFT(tmp26, CONST_BITS-PASS1_BITS);
  }

  /* Pass 2: process 13 rows from work array, store into output array. */

  wsptr = workspace;
  for (ctr = 0; ctr < 13; ctr++) {
    outptr = output_buf[ctr] + output_col;

    /* Even part */

    /* Add fudge factor here for final descale. */
    z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
    z1 <<= CONST_BITS;

    z2 = (INT32) wsptr[2];
    z3 = (INT32) wsptr[4];
    z4 = (INT32) wsptr[6];

    tmp10 = z3 + z4;
    tmp11 = z3 - z4;

    tmp12 = MULTIPLY(tmp10, FIX(1.155388986));                /* (c4+c6)/2 */
    tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1;           /* (c4-c6)/2 */

    tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13;   /* c2 */
    tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13;   /* c10 */

    tmp12 = MULTIPLY(tmp10, FIX(0.316450131));                /* (c8-c12)/2 */
    tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1;           /* (c8+c12)/2 */

    tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13;   /* c6 */
    tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */

    tmp12 = MULTIPLY(tmp10, FIX(0.435816023));                /* (c2-c10)/2 */
    tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1;           /* (c2+c10)/2 */

    tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */
    tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */

    tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1;      /* c0 */

    /* Odd part */

    z1 = (INT32) wsptr[1];
    z2 = (INT32) wsptr[3];
    z3 = (INT32) wsptr[5];
    z4 = (INT32) wsptr[7];

    tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651));     /* c3 */
    tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945));     /* c5 */
    tmp15 = z1 + z4;
    tmp13 = MULTIPLY(tmp15, FIX(0.937797057));       /* c7 */
    tmp10 = tmp11 + tmp12 + tmp13 -
        MULTIPLY(z1, FIX(2.020082300));          /* c7+c5+c3-c1 */
    tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458));   /* -c11 */
    tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */
    tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */
    tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945));   /* -c5 */
    tmp11 += tmp14;
    tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */
    tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813));   /* -c9 */
    tmp12 += tmp14;
    tmp13 += tmp14;
    tmp15 = MULTIPLY(tmp15, FIX(0.338443458));       /* c11 */
    tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */
        MULTIPLY(z2, FIX(0.466105296));          /* c1-c7 */
    z1    = MULTIPLY(z3 - z2, FIX(0.937797057));     /* c7 */
    tmp14 += z1;
    tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) -   /* c3-c7 */
         MULTIPLY(z4, FIX(1.742345811));         /* c1+c11 */

    /* Final output stage */

    outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];
    outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp26,
                           CONST_BITS+PASS1_BITS+3)
                 & RANGE_MASK];

    wsptr += 8;     /* advance pointer to next row */
  }
}


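/*
 * Illustrative sketch, not part of the IJG source.  The trailing comments in
 * the 13x13 odd part, such as "c9-c11" or "c3+c5+c9-c7", name the cosine
 * combination that each literal stands for.  The sketch checks that the
 * literals agree with each other to within rounding of the ninth decimal.
 * Every number is copied from the kernel above; c1 never appears alone
 * there, so it is recovered from the "c1-c7" literal.  The helper name
 * nearly_equal and the 2e-9 tolerance are assumptions of this sketch.
 */

```c
#include <assert.h>

/* Single-cosine constants copied from the 13x13 kernel. */
#define C3   1.322312651
#define C5   1.163874945
#define C7   0.937797057
#define C9   0.657217813
#define C11  0.338443458
/* Recovered from the literal labelled c1-c7. */
#define C1   (0.466105296 + C7)

/* Nonzero if a and b agree to within 2e-9 (the literals carry 9 decimals). */
int nearly_equal(double a, double b)
{
  double d = a - b;
  if (d < 0)
    d = -d;
  return d < 2e-9;
}

void constants_self_check(void)
{
  assert(nearly_equal(C9 - C11, 0.318774355));            /* c9-c11 */
  assert(nearly_equal(C3 - C7, 0.384515595));             /* c3-c7 */
  assert(nearly_equal(C1 + C11, 1.742345811));            /* c1+c11 */
  assert(nearly_equal(C7 + C5 + C3 - C1, 2.020082300));   /* c7+c5+c3-c1 */
  assert(nearly_equal(C5 + C9 + C11 - C3, 0.837223564));  /* c5+c9+c11-c3 */
  assert(nearly_equal(C1 + C5 - C9 - C11, 1.572116027));  /* c1+c5-c9-c11 */
  assert(nearly_equal(C3 + C5 + C9 - C7, 2.205608352));   /* c3+c5+c9-c7 */
}
```
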
/*
 * Perform dequantization and inverse DCT on one block of coefficients,
 * producing a 14x14 output block.
 *
 * Optimized algorithm with 20 multiplications in the 1-D kernel.
 * cK represents sqrt(2) * cos(K*pi/28).
 */

GLOBAL(void)
jpeg_idct_14x14 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
         JCOEFPTR coef_block,
         JSAMPARRAY output_buf, JDIMENSION output_col)
{
  INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;
  INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;
  INT32 z1, z2, z3, z4;
  JCOEFPTR inptr;
  ISLOW_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[8*14];  /* buffers data between passes */
  SHIFT_TEMPS

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
    /* Even part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    z1 <<= CONST_BITS;
    /* Add fudge factor here for final descale. */
    z1 += ONE << (CONST_BITS-PASS1_BITS-1);
    z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    z2 = MULTIPLY(z4, FIX(1.274162392));         /* c4 */
    z3 = MULTIPLY(z4, FIX(0.314692123));         /* c12 */
    z4 = MULTIPLY(z4, FIX(0.881747734));         /* c8 */

    tmp10 = z1 + z2;
    tmp11 = z1 + z3;
    tmp12 = z1 - z4;

    tmp23 = RIGHT_SHIFT(z1 - ((z2 + z3 - z4) << 1), /* c0 = (c4+c12-c8)*2 */
            CONST_BITS-PASS1_BITS);

    z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

    z3 = MULTIPLY(z1 + z2, FIX(1.105676686));    /* c6 */

    tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */
    tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */
    tmp15 = MULTIPLY(z1, FIX(0.613604268)) -     /* c10 */
        MULTIPLY(z2, FIX(1.378756276));      /* c2 */

    tmp20 = tmp10 + tmp13;
    tmp26 = tmp10 - tmp13;
    tmp21 = tmp11 + tmp14;
    tmp25 = tmp11 - tmp14;
    tmp22 = tmp12 + tmp15;
    tmp24 = tmp12 - tmp15;

    /* Odd part */

    z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
    tmp13 = z4 << CONST_BITS;

    tmp14 = z1 + z3;
    tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607));           /* c3 */
    tmp12 = MULTIPLY(tmp14, FIX(1.197448846));             /* c5 */
    tmp10 = tmp11 + tmp12 + tmp13 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */
    tmp14 = MULTIPLY(tmp14, FIX(0.752406978));             /* c9 */
    tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426));        /* c9+c11-c13 */
    z1    -= z2;
    tmp15 = MULTIPLY(z1, FIX(0.467085129)) - tmp13;        /* c11 */
    tmp16 += tmp15;
    z1    += z4;
    z4    = MULTIPLY(z2 + z3, - FIX(0.158341681)) - tmp13; /* -c13 */
    tmp11 += z4 - MULTIPLY(z2, FIX(0.424103948));          /* c3-c9-c13 */
    tmp12 += z4 - MULTIPLY(z3, FIX(2.373959773));          /* c3+c5-c13 */
    z4    = MULTIPLY(z3 - z2, FIX(1.405321284));           /* c1 */
    tmp14 += z4 + tmp13 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */
    tmp15 += z4 + MULTIPLY(z2, FIX(0.674957567));          /* c1+c11-c5 */

    tmp13 = (z1 - z3) << PASS1_BITS;

    /* Final output stage */

    wsptr[8*0]  = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*13] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
    wsptr[8*1]  = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*12] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
    wsptr[8*2]  = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
    wsptr[8*11] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
    wsptr[8*3]  = (int) (tmp23 + tmp13);
    wsptr[8*10] = (int) (tmp23 - tmp13);
    wsptr[8*4]  = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*9]  = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
    wsptr[8*5]  = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
    wsptr[8*8]  = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
    wsptr[8*6]  = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);
    wsptr[8*7]  = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);
  }
02194 
02195   /* Pass 2: process 14 rows from work array, store into output array. */
02196 
02197   wsptr = workspace;
02198   for (ctr = 0; ctr < 14; ctr++) {
02199     outptr = output_buf[ctr] + output_col;
02200 
02201     /* Even part */
02202 
02203     /* Add fudge factor here for final descale. */
02204     z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
02205     z1 <<= CONST_BITS;
02206     z4 = (INT32) wsptr[4];
02207     z2 = MULTIPLY(z4, FIX(1.274162392));         /* c4 */
02208     z3 = MULTIPLY(z4, FIX(0.314692123));         /* c12 */
02209     z4 = MULTIPLY(z4, FIX(0.881747734));         /* c8 */
02210 
02211     tmp10 = z1 + z2;
02212     tmp11 = z1 + z3;
02213     tmp12 = z1 - z4;
02214 
02215     tmp23 = z1 - ((z2 + z3 - z4) << 1);          /* c0 = (c4+c12-c8)*2 */
02216 
02217     z1 = (INT32) wsptr[2];
02218     z2 = (INT32) wsptr[6];
02219 
02220     z3 = MULTIPLY(z1 + z2, FIX(1.105676686));    /* c6 */
02221 
02222     tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */
02223     tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */
02224     tmp15 = MULTIPLY(z1, FIX(0.613604268)) -     /* c10 */
02225         MULTIPLY(z2, FIX(1.378756276));      /* c2 */
02226 
02227     tmp20 = tmp10 + tmp13;
02228     tmp26 = tmp10 - tmp13;
02229     tmp21 = tmp11 + tmp14;
02230     tmp25 = tmp11 - tmp14;
02231     tmp22 = tmp12 + tmp15;
02232     tmp24 = tmp12 - tmp15;
02233 
02234     /* Odd part */
02235 
02236     z1 = (INT32) wsptr[1];
02237     z2 = (INT32) wsptr[3];
02238     z3 = (INT32) wsptr[5];
02239     z4 = (INT32) wsptr[7];
02240     z4 <<= CONST_BITS;
02241 
02242     tmp14 = z1 + z3;
02243     tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607));           /* c3 */
02244     tmp12 = MULTIPLY(tmp14, FIX(1.197448846));             /* c5 */
02245     tmp10 = tmp11 + tmp12 + z4 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */
02246     tmp14 = MULTIPLY(tmp14, FIX(0.752406978));             /* c9 */
02247     tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426));        /* c9+c11-c13 */
02248     z1    -= z2;
02249     tmp15 = MULTIPLY(z1, FIX(0.467085129)) - z4;           /* c11 */
02250     tmp16 += tmp15;
02251     tmp13 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - z4;    /* -c13 */
02252     tmp11 += tmp13 - MULTIPLY(z2, FIX(0.424103948));       /* c3-c9-c13 */
02253     tmp12 += tmp13 - MULTIPLY(z3, FIX(2.373959773));       /* c3+c5-c13 */
02254     tmp13 = MULTIPLY(z3 - z2, FIX(1.405321284));           /* c1 */
02255     tmp14 += tmp13 + z4 - MULTIPLY(z3, FIX(1.690643133));  /* c1+c9-c11 */
02256     tmp15 += tmp13 + MULTIPLY(z2, FIX(0.674957567));       /* c1+c11-c5 */
02257 
02258     tmp13 = ((z1 - z3) << CONST_BITS) + z4;
02259 
02260     /* Final output stage */
02261 
02262     outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
02263                            CONST_BITS+PASS1_BITS+3)
02264                  & RANGE_MASK];
02265     outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
02266                            CONST_BITS+PASS1_BITS+3)
02267                  & RANGE_MASK];
02268     outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
02269                            CONST_BITS+PASS1_BITS+3)
02270                  & RANGE_MASK];
02271     outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
02272                            CONST_BITS+PASS1_BITS+3)
02273                  & RANGE_MASK];
02274     outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
02275                            CONST_BITS+PASS1_BITS+3)
02276                  & RANGE_MASK];
02277     outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
02278                            CONST_BITS+PASS1_BITS+3)
02279                  & RANGE_MASK];
02280     outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
02281                            CONST_BITS+PASS1_BITS+3)
02282                  & RANGE_MASK];
02283     outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
02284                            CONST_BITS+PASS1_BITS+3)
02285                  & RANGE_MASK];
02286     outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
02287                            CONST_BITS+PASS1_BITS+3)
02288                  & RANGE_MASK];
02289     outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
02290                            CONST_BITS+PASS1_BITS+3)
02291                  & RANGE_MASK];
02292     outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
02293                            CONST_BITS+PASS1_BITS+3)
02294                  & RANGE_MASK];
02295     outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
02296                            CONST_BITS+PASS1_BITS+3)
02297                  & RANGE_MASK];
02298     outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,
02299                            CONST_BITS+PASS1_BITS+3)
02300                  & RANGE_MASK];
02301     outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,
02302                            CONST_BITS+PASS1_BITS+3)
02303                  & RANGE_MASK];
02304 
02305     wsptr += 8;     /* advance pointer to next row */
02306   }
02307 }
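
The "fudge factor" comments in both passes refer to rounding: half of the final divisor is added before the right shift, so the shift rounds to nearest instead of truncating. The sketch below traces only the DC term through the two passes, using the shift amounts from the listing. It assumes CONST_BITS is 13 and PASS1_BITS is 2 (the usual values for 8-bit samples; they are defined outside this excerpt). The helpers `shift_right`, `dc_pass1`, `dc_pass2` and `self_check` are hypothetical names for illustration, not part of the IJG code.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 13            /* assumed: fractional bits of the constants */
#define PASS1_BITS 2             /* assumed: extra precision kept after pass 1 */
#define ONE ((int32_t) 1)

/* Floor-rounding right shift that does not depend on how the compiler
 * shifts negative values (hypothetical stand-in for RIGHT_SHIFT). */
int32_t shift_right(int32_t x, int n)
{
  return x >= 0 ? x >> n : ~((~x) >> n);
}

/* Pass 1 for a column whose only nonzero coefficient is the DC term. */
int32_t dc_pass1(int32_t dc)
{
  int32_t z1 = dc * (ONE << CONST_BITS);          /* same as dc << CONST_BITS */
  z1 += ONE << (CONST_BITS - PASS1_BITS - 1);     /* half of the pass 1 divisor */
  return shift_right(z1, CONST_BITS - PASS1_BITS);
}

/* Pass 2 for a row whose only nonzero workspace entry is element 0. */
int32_t dc_pass2(int32_t ws0)
{
  int32_t z1 = ws0 + (ONE << (PASS1_BITS + 2));   /* half of the divisor, pre-scaling */
  z1 *= ONE << CONST_BITS;
  return shift_right(z1, CONST_BITS + PASS1_BITS + 3);
}

/* The two passes together divide the DC value by 8, rounding to nearest. */
void self_check(void)
{
  assert(dc_pass1(800) == 3200);                  /* 800 << PASS1_BITS */
  assert(dc_pass2(dc_pass1(800)) == 100);
  assert(dc_pass2(dc_pass1(4)) == 1);             /* 0.5 rounds up */
  assert(dc_pass2(dc_pass1(3)) == 0);
}
```

The value returned by `dc_pass2` corresponds to the index that the listing masks with RANGE_MASK before the `range_limit` lookup. Multiplication is used in place of `<<` in the sketch only to avoid left-shifting negative values.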
02308 
02309 
02310 /*
02311  * Perform dequantization and inverse DCT on one block of coefficients,
02312  * producing a 15x15 output block.
02313  *
02314  * Optimized algorithm with 22 multiplications in the 1-D kernel.
02315  * cK represents sqrt(2) * cos(K*pi/30).
02316  */
02317 
02318 GLOBAL(void)
02319 jpeg_idct_15x15 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
02320                  JCOEFPTR coef_block,
02321                  JSAMPARRAY output_buf, JDIMENSION output_col)
02322 {
02323   INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;
02324   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;
02325   INT32 z1, z2, z3, z4;
02326   JCOEFPTR inptr;
02327   ISLOW_MULT_TYPE * quantptr;
02328   int * wsptr;
02329   JSAMPROW outptr;
02330   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
02331   int ctr;
02332   int workspace[8*15];  /* buffers data between passes */
02333   SHIFT_TEMPS
02334 
02335   /* Pass 1: process columns from input, store into work array. */
02336 
02337   inptr = coef_block;
02338   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
02339   wsptr = workspace;
02340   for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
02341     /* Even part */
02342 
02343     z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
02344     z1 <<= CONST_BITS;
02345     /* Add fudge factor here for final descale. */
02346     z1 += ONE << (CONST_BITS-PASS1_BITS-1);
02347 
02348     z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
02349     z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
02350     z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
02351 
02352     tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */
02353     tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */
02354 
02355     tmp12 = z1 - tmp10;
02356     tmp13 = z1 + tmp11;
02357     z1 -= (tmp11 - tmp10) << 1;             /* c0 = (c6-c12)*2 */
02358 
02359     z4 = z2 - z3;
02360     z3 += z2;
02361     tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */
02362     tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */
02363     z2 = MULTIPLY(z2, FIX(1.439773946));    /* c4+c14 */
02364 
02365     tmp20 = tmp13 + tmp10 + tmp11;
02366     tmp23 = tmp12 - tmp10 + tmp11 + z2;
02367 
02368     tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */
02369     tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */
02370 
02371     tmp25 = tmp13 - tmp10 - tmp11;
02372     tmp26 = tmp12 + tmp10 - tmp11 - z2;
02373 
02374     tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */
02375     tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */
02376 
02377     tmp21 = tmp12 + tmp10 + tmp11;
02378     tmp24 = tmp13 - tmp10 + tmp11;
02379     tmp11 += tmp11;
02380     tmp22 = z1 + tmp11;                     /* c10 = c6-c12 */
02381     tmp27 = z1 - tmp11 - tmp11;             /* c0 = (c6-c12)*2 */
02382 
02383     /* Odd part */
02384 
02385     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
02386     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
02387     z4 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
02388     z3 = MULTIPLY(z4, FIX(1.224744871));                    /* c5 */
02389     z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
02390 
02391     tmp13 = z2 - z4;
02392     tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876));         /* c9 */
02393     tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148));         /* c3-c9 */
02394     tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899));      /* c3+c9 */
02395 
02396     tmp13 = MULTIPLY(z2, - FIX(0.831253876));               /* -c9 */
02397     tmp15 = MULTIPLY(z2, - FIX(1.344997024));               /* -c3 */
02398     z2 = z1 - z4;
02399     tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353));            /* c1 */
02400 
02401     tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */
02402     tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */
02403     tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3;            /* c5 */
02404     z2 = MULTIPLY(z1 + z4, FIX(0.575212477));               /* c11 */
02405     tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3;      /* c7-c11 */
02406     tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3;      /* c11+c13 */
02407 
02408     /* Final output stage */
02409 
02410     wsptr[8*0]  = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
02411     wsptr[8*14] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
02412     wsptr[8*1]  = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
02413     wsptr[8*13] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
02414     wsptr[8*2]  = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
02415     wsptr[8*12] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
02416     wsptr[8*3]  = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
02417     wsptr[8*11] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
02418     wsptr[8*4]  = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
02419     wsptr[8*10] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
02420     wsptr[8*5]  = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
02421     wsptr[8*9]  = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
02422     wsptr[8*6]  = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);
02423     wsptr[8*8]  = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);
02424     wsptr[8*7]  = (int) RIGHT_SHIFT(tmp27, CONST_BITS-PASS1_BITS);
02425   }
02426 
02427   /* Pass 2: process 15 rows from work array, store into output array. */
02428 
02429   wsptr = workspace;
02430   for (ctr = 0; ctr < 15; ctr++) {
02431     outptr = output_buf[ctr] + output_col;
02432 
02433     /* Even part */
02434 
02435     /* Add fudge factor here for final descale. */
02436     z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
02437     z1 <<= CONST_BITS;
02438 
02439     z2 = (INT32) wsptr[2];
02440     z3 = (INT32) wsptr[4];
02441     z4 = (INT32) wsptr[6];
02442 
02443     tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */
02444     tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */
02445 
02446     tmp12 = z1 - tmp10;
02447     tmp13 = z1 + tmp11;
02448     z1 -= (tmp11 - tmp10) << 1;             /* c0 = (c6-c12)*2 */
02449 
02450     z4 = z2 - z3;
02451     z3 += z2;
02452     tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */
02453     tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */
02454     z2 = MULTIPLY(z2, FIX(1.439773946));    /* c4+c14 */
02455 
02456     tmp20 = tmp13 + tmp10 + tmp11;
02457     tmp23 = tmp12 - tmp10 + tmp11 + z2;
02458 
02459     tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */
02460     tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */
02461 
02462     tmp25 = tmp13 - tmp10 - tmp11;
02463     tmp26 = tmp12 + tmp10 - tmp11 - z2;
02464 
02465     tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */
02466     tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */
02467 
02468     tmp21 = tmp12 + tmp10 + tmp11;
02469     tmp24 = tmp13 - tmp10 + tmp11;
02470     tmp11 += tmp11;
02471     tmp22 = z1 + tmp11;                     /* c10 = c6-c12 */
02472     tmp27 = z1 - tmp11 - tmp11;             /* c0 = (c6-c12)*2 */
02473 
02474     /* Odd part */
02475 
02476     z1 = (INT32) wsptr[1];
02477     z2 = (INT32) wsptr[3];
02478     z4 = (INT32) wsptr[5];
02479     z3 = MULTIPLY(z4, FIX(1.224744871));                    /* c5 */
02480     z4 = (INT32) wsptr[7];
02481 
02482     tmp13 = z2 - z4;
02483     tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876));         /* c9 */
02484     tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148));         /* c3-c9 */
02485     tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899));      /* c3+c9 */
02486 
02487     tmp13 = MULTIPLY(z2, - FIX(0.831253876));               /* -c9 */
02488     tmp15 = MULTIPLY(z2, - FIX(1.344997024));               /* -c3 */
02489     z2 = z1 - z4;
02490     tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353));            /* c1 */
02491 
02492     tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */
02493     tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */
02494     tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3;            /* c5 */
02495     z2 = MULTIPLY(z1 + z4, FIX(0.575212477));               /* c11 */
02496     tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3;      /* c7-c11 */
02497     tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3;      /* c11+c13 */
02498 
02499     /* Final output stage */
02500 
02501     outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
02502                            CONST_BITS+PASS1_BITS+3)
02503                  & RANGE_MASK];
02504     outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
02505                            CONST_BITS+PASS1_BITS+3)
02506                  & RANGE_MASK];
02507     outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
02508                            CONST_BITS+PASS1_BITS+3)
02509                  & RANGE_MASK];
02510     outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
02511                            CONST_BITS+PASS1_BITS+3)
02512                  & RANGE_MASK];
02513     outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
02514                            CONST_BITS+PASS1_BITS+3)
02515                  & RANGE_MASK];
02516     outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
02517                            CONST_BITS+PASS1_BITS+3)
02518                  & RANGE_MASK];
02519     outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
02520                            CONST_BITS+PASS1_BITS+3)
02521                  & RANGE_MASK];
02522     outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
02523                            CONST_BITS+PASS1_BITS+3)
02524                  & RANGE_MASK];
02525     outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
02526                            CONST_BITS+PASS1_BITS+3)
02527                  & RANGE_MASK];
02528     outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
02529                            CONST_BITS+PASS1_BITS+3)
02530                  & RANGE_MASK];
02531     outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
02532                            CONST_BITS+PASS1_BITS+3)
02533                  & RANGE_MASK];
02534     outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
02535                            CONST_BITS+PASS1_BITS+3)
02536                  & RANGE_MASK];
02537     outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,
02538                            CONST_BITS+PASS1_BITS+3)
02539                  & RANGE_MASK];
02540     outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,
02541                            CONST_BITS+PASS1_BITS+3)
02542                  & RANGE_MASK];
02543     outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp27,
02544                            CONST_BITS+PASS1_BITS+3)
02545                  & RANGE_MASK];
02546 
02547     wsptr += 8;     /* advance pointer to next row */
02548   }
02549 }
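
Each "Final output stage" above follows the same pattern: output k is the sum of an even-part and an odd-part term, and the mirrored output N-1-k is their difference. For odd N, such as the 15-point kernel, the centre output comes from the even part alone (tmp27 in the listing). A minimal sketch of that pattern, with the descale and range limiting left out; `combine_halves` and `self_check` are hypothetical names, not part of the IJG code:

```c
#include <assert.h>
#include <stdint.h>

/* Combine even-part and odd-part partial sums into n outputs.
 * even[] holds (n + 1) / 2 values, odd[] holds n / 2 values. */
void combine_halves(const int32_t *even, const int32_t *odd, int n, int32_t *out)
{
  int half = n / 2;
  int k;

  for (k = 0; k < half; k++) {
    out[k]         = even[k] + odd[k];   /* leading position: sum */
    out[n - 1 - k] = even[k] - odd[k];   /* mirrored position: difference */
  }
  if (n % 2 != 0)
    out[half] = even[half];              /* centre sample: even part only */
}

/* Small odd-length case, laid out like the 15-point stage in miniature. */
void self_check(void)
{
  int32_t even[3] = { 10, 20, 30 };
  int32_t odd[2]  = { 1, 2 };
  int32_t out[5];

  combine_halves(even, odd, 5, out);
  assert(out[0] == 11 && out[4] == 9);
  assert(out[1] == 22 && out[3] == 18);
  assert(out[2] == 30);
}
```

For the 15-point kernel, `even` would correspond to tmp20..tmp27 and `odd` to tmp10..tmp16.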
02550 
02551 
02552 /*
02553  * Perform dequantization and inverse DCT on one block of coefficients,
02554  * producing a 16x16 output block.
02555  *
02556  * Optimized algorithm with 28 multiplications in the 1-D kernel.
02557  * cK represents sqrt(2) * cos(K*pi/32).
02558  */
02559 
02560 GLOBAL(void)
02561 jpeg_idct_16x16 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
02562                  JCOEFPTR coef_block,
02563                  JSAMPARRAY output_buf, JDIMENSION output_col)
02564 {
02565   INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13;
02566   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;
02567   INT32 z1, z2, z3, z4;
02568   JCOEFPTR inptr;
02569   ISLOW_MULT_TYPE * quantptr;
02570   int * wsptr;
02571   JSAMPROW outptr;
02572   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
02573   int ctr;
02574   int workspace[8*16];  /* buffers data between passes */
02575   SHIFT_TEMPS
02576 
02577   /* Pass 1: process columns from input, store into work array. */
02578 
02579   inptr = coef_block;
02580   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
02581   wsptr = workspace;
02582   for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
02583     /* Even part */
02584 
02585     tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
02586     tmp0 <<= CONST_BITS;
02587     /* Add fudge factor here for final descale. */
02588     tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
02589 
02590     z1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
02591     tmp1 = MULTIPLY(z1, FIX(1.306562965));      /* c4[16] = c2[8] */
02592     tmp2 = MULTIPLY(z1, FIX_0_541196100);       /* c12[16] = c6[8] */
02593 
02594     tmp10 = tmp0 + tmp1;
02595     tmp11 = tmp0 - tmp1;
02596     tmp12 = tmp0 + tmp2;
02597     tmp13 = tmp0 - tmp2;
02598 
02599     z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
02600     z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
02601     z3 = z1 - z2;
02602     z4 = MULTIPLY(z3, FIX(0.275899379));        /* c14[16] = c7[8] */
02603     z3 = MULTIPLY(z3, FIX(1.387039845));        /* c2[16] = c1[8] */
02604 
02605     tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447);  /* (c6+c2)[16] = (c3+c1)[8] */
02606     tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223);  /* (c6-c14)[16] = (c3-c7)[8] */
02607     tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */
02608     tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */
02609 
02610     tmp20 = tmp10 + tmp0;
02611     tmp27 = tmp10 - tmp0;
02612     tmp21 = tmp12 + tmp1;
02613     tmp26 = tmp12 - tmp1;
02614     tmp22 = tmp13 + tmp2;
02615     tmp25 = tmp13 - tmp2;
02616     tmp23 = tmp11 + tmp3;
02617     tmp24 = tmp11 - tmp3;
02618 
02619     /* Odd part */
02620 
02621     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
02622     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
02623     z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
02624     z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
02625 
02626     tmp11 = z1 + z3;
02627 
02628     tmp1  = MULTIPLY(z1 + z2, FIX(1.353318001));   /* c3 */
02629     tmp2  = MULTIPLY(tmp11,   FIX(1.247225013));   /* c5 */
02630     tmp3  = MULTIPLY(z1 + z4, FIX(1.093201867));   /* c7 */
02631     tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586));   /* c9 */
02632     tmp11 = MULTIPLY(tmp11,   FIX(0.666655658));   /* c11 */
02633     tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528));   /* c13 */
02634     tmp0  = tmp1 + tmp2 + tmp3 -
02635         MULTIPLY(z1, FIX(2.286341144));        /* c7+c5+c3-c1 */
02636     tmp13 = tmp10 + tmp11 + tmp12 -
02637         MULTIPLY(z1, FIX(1.835730603));        /* c9+c11+c13-c15 */
02638     z1    = MULTIPLY(z2 + z3, FIX(0.138617169));   /* c15 */
02639     tmp1  += z1 + MULTIPLY(z2, FIX(0.071888074));  /* c9+c11-c3-c15 */
02640     tmp2  += z1 - MULTIPLY(z3, FIX(1.125726048));  /* c5+c7+c15-c3 */
02641     z1    = MULTIPLY(z3 - z2, FIX(1.407403738));   /* c1 */
02642     tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282));  /* c1+c11-c9-c13 */
02643     tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411));  /* c1+c5+c13-c7 */
02644     z2    += z4;
02645     z1    = MULTIPLY(z2, - FIX(0.666655658));      /* -c11 */
02646     tmp1  += z1;
02647     tmp3  += z1 + MULTIPLY(z4, FIX(1.065388962));  /* c3+c11+c15-c7 */
02648     z2    = MULTIPLY(z2, - FIX(1.247225013));      /* -c5 */
02649     tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809));  /* c1+c5+c9-c13 */
02650     tmp12 += z2;
02651     z2    = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */
02652     tmp2  += z2;
02653     tmp3  += z2;
02654     z2    = MULTIPLY(z4 - z3, FIX(0.410524528));   /* c13 */
02655     tmp10 += z2;
02656     tmp11 += z2;
02657 
02658     /* Final output stage */
02659 
02660     wsptr[8*0]  = (int) RIGHT_SHIFT(tmp20 + tmp0,  CONST_BITS-PASS1_BITS);
02661     wsptr[8*15] = (int) RIGHT_SHIFT(tmp20 - tmp0,  CONST_BITS-PASS1_BITS);
02662     wsptr[8*1]  = (int) RIGHT_SHIFT(tmp21 + tmp1,  CONST_BITS-PASS1_BITS);
02663     wsptr[8*14] = (int) RIGHT_SHIFT(tmp21 - tmp1,  CONST_BITS-PASS1_BITS);
02664     wsptr[8*2]  = (int) RIGHT_SHIFT(tmp22 + tmp2,  CONST_BITS-PASS1_BITS);
02665     wsptr[8*13] = (int) RIGHT_SHIFT(tmp22 - tmp2,  CONST_BITS-PASS1_BITS);
02666     wsptr[8*3]  = (int) RIGHT_SHIFT(tmp23 + tmp3,  CONST_BITS-PASS1_BITS);
02667     wsptr[8*12] = (int) RIGHT_SHIFT(tmp23 - tmp3,  CONST_BITS-PASS1_BITS);
02668     wsptr[8*4]  = (int) RIGHT_SHIFT(tmp24 + tmp10, CONST_BITS-PASS1_BITS);
02669     wsptr[8*11] = (int) RIGHT_SHIFT(tmp24 - tmp10, CONST_BITS-PASS1_BITS);
02670     wsptr[8*5]  = (int) RIGHT_SHIFT(tmp25 + tmp11, CONST_BITS-PASS1_BITS);
02671     wsptr[8*10] = (int) RIGHT_SHIFT(tmp25 - tmp11, CONST_BITS-PASS1_BITS);
02672     wsptr[8*6]  = (int) RIGHT_SHIFT(tmp26 + tmp12, CONST_BITS-PASS1_BITS);
02673     wsptr[8*9]  = (int) RIGHT_SHIFT(tmp26 - tmp12, CONST_BITS-PASS1_BITS);
02674     wsptr[8*7]  = (int) RIGHT_SHIFT(tmp27 + tmp13, CONST_BITS-PASS1_BITS);
02675     wsptr[8*8]  = (int) RIGHT_SHIFT(tmp27 - tmp13, CONST_BITS-PASS1_BITS);
02676   }
02677 
02678   /* Pass 2: process 16 rows from work array, store into output array. */
02679 
02680   wsptr = workspace;
02681   for (ctr = 0; ctr < 16; ctr++) {
02682     outptr = output_buf[ctr] + output_col;
02683 
02684     /* Even part */
02685 
02686     /* Add fudge factor here for final descale. */
02687     tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
02688     tmp0 <<= CONST_BITS;
02689 
02690     z1 = (INT32) wsptr[4];
02691     tmp1 = MULTIPLY(z1, FIX(1.306562965));      /* c4[16] = c2[8] */
02692     tmp2 = MULTIPLY(z1, FIX_0_541196100);       /* c12[16] = c6[8] */
02693 
02694     tmp10 = tmp0 + tmp1;
02695     tmp11 = tmp0 - tmp1;
02696     tmp12 = tmp0 + tmp2;
02697     tmp13 = tmp0 - tmp2;
02698 
02699     z1 = (INT32) wsptr[2];
02700     z2 = (INT32) wsptr[6];
02701     z3 = z1 - z2;
02702     z4 = MULTIPLY(z3, FIX(0.275899379));        /* c14[16] = c7[8] */
02703     z3 = MULTIPLY(z3, FIX(1.387039845));        /* c2[16] = c1[8] */
02704 
02705     tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447);  /* (c6+c2)[16] = (c3+c1)[8] */
02706     tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223);  /* (c6-c14)[16] = (c3-c7)[8] */
02707     tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */
02708     tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */
02709 
02710     tmp20 = tmp10 + tmp0;
02711     tmp27 = tmp10 - tmp0;
02712     tmp21 = tmp12 + tmp1;
02713     tmp26 = tmp12 - tmp1;
02714     tmp22 = tmp13 + tmp2;
02715     tmp25 = tmp13 - tmp2;
02716     tmp23 = tmp11 + tmp3;
02717     tmp24 = tmp11 - tmp3;
02718 
02719     /* Odd part */
02720 
02721     z1 = (INT32) wsptr[1];
02722     z2 = (INT32) wsptr[3];
02723     z3 = (INT32) wsptr[5];
02724     z4 = (INT32) wsptr[7];
02725 
02726     tmp11 = z1 + z3;
02727 
02728     tmp1  = MULTIPLY(z1 + z2, FIX(1.353318001));   /* c3 */
02729     tmp2  = MULTIPLY(tmp11,   FIX(1.247225013));   /* c5 */
02730     tmp3  = MULTIPLY(z1 + z4, FIX(1.093201867));   /* c7 */
02731     tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586));   /* c9 */
02732     tmp11 = MULTIPLY(tmp11,   FIX(0.666655658));   /* c11 */
02733     tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528));   /* c13 */
02734     tmp0  = tmp1 + tmp2 + tmp3 -
02735         MULTIPLY(z1, FIX(2.286341144));        /* c7+c5+c3-c1 */
02736     tmp13 = tmp10 + tmp11 + tmp12 -
02737         MULTIPLY(z1, FIX(1.835730603));        /* c9+c11+c13-c15 */
02738     z1    = MULTIPLY(z2 + z3, FIX(0.138617169));   /* c15 */
02739     tmp1  += z1 + MULTIPLY(z2, FIX(0.071888074));  /* c9+c11-c3-c15 */
02740     tmp2  += z1 - MULTIPLY(z3, FIX(1.125726048));  /* c5+c7+c15-c3 */
02741     z1    = MULTIPLY(z3 - z2, FIX(1.407403738));   /* c1 */
02742     tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282));  /* c1+c11-c9-c13 */
02743     tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411));  /* c1+c5+c13-c7 */
02744     z2    += z4;
02745     z1    = MULTIPLY(z2, - FIX(0.666655658));      /* -c11 */
02746     tmp1  += z1;
02747     tmp3  += z1 + MULTIPLY(z4, FIX(1.065388962));  /* c3+c11+c15-c7 */
02748     z2    = MULTIPLY(z2, - FIX(1.247225013));      /* -c5 */
02749     tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809));  /* c1+c5+c9-c13 */
02750     tmp12 += z2;
02751     z2    = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */
02752     tmp2  += z2;
02753     tmp3  += z2;
02754     z2    = MULTIPLY(z4 - z3, FIX(0.410524528));   /* c13 */
02755     tmp10 += z2;
02756     tmp11 += z2;
02757 
02758     /* Final output stage */
02759 
02760     outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp0,
02761                            CONST_BITS+PASS1_BITS+3)
02762                  & RANGE_MASK];
02763     outptr[15] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp0,
02764                            CONST_BITS+PASS1_BITS+3)
02765                  & RANGE_MASK];
02766     outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp1,
02767                            CONST_BITS+PASS1_BITS+3)
02768                  & RANGE_MASK];
02769     outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp1,
02770                            CONST_BITS+PASS1_BITS+3)
02771                  & RANGE_MASK];
02772     outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp2,
02773                            CONST_BITS+PASS1_BITS+3)
02774                  & RANGE_MASK];
02775     outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp2,
02776                            CONST_BITS+PASS1_BITS+3)
02777                  & RANGE_MASK];
02778     outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp3,
02779                            CONST_BITS+PASS1_BITS+3)
02780                  & RANGE_MASK];
02781     outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp3,
02782                            CONST_BITS+PASS1_BITS+3)
02783                  & RANGE_MASK];
02784     outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp10,
02785                            CONST_BITS+PASS1_BITS+3)
02786                  & RANGE_MASK];
02787     outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp10,
02788                            CONST_BITS+PASS1_BITS+3)
02789                  & RANGE_MASK];
02790     outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp11,
02791                            CONST_BITS+PASS1_BITS+3)
02792                  & RANGE_MASK];
02793     outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp11,
02794                            CONST_BITS+PASS1_BITS+3)
02795                  & RANGE_MASK];
02796     outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp12,
02797                            CONST_BITS+PASS1_BITS+3)
02798                  & RANGE_MASK];
02799     outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp12,
02800                            CONST_BITS+PASS1_BITS+3)
02801                  & RANGE_MASK];
02802     outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp27 + tmp13,
02803                            CONST_BITS+PASS1_BITS+3)
02804                  & RANGE_MASK];
02805     outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp27 - tmp13,
02806                            CONST_BITS+PASS1_BITS+3)
02807                  & RANGE_MASK];
02808 
02809     wsptr += 8;     /* advance pointer to next row */
02810   }
02811 }
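
Every output sample above is produced by `range_limit[value & RANGE_MASK]`: the descaled result is masked to a fixed number of bits and used as an index into a table that applies the level shift and clamps to the sample range. The real table is built elsewhere in the library and is not shown in this file, so the following is only a simplified model of the idea. It assumes 8-bit samples (MAXJSAMPLE 255, CENTERJSAMPLE 128, mask 1023); `model_table`, `build_model_table`, `limit_sample` and `self_check` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MAXJSAMPLE    255                    /* assumed: 8-bit samples */
#define CENTERJSAMPLE 128
#define RANGE_MASK    (MAXJSAMPLE * 4 + 3)   /* 1023: keeps the low 10 bits */

static uint8_t model_table[RANGE_MASK + 1];

/* Fill the table so that a masked index behaves like a signed 10-bit
 * value that is level-shifted by CENTERJSAMPLE and then clamped. */
void build_model_table(void)
{
  int i;

  for (i = 0; i <= RANGE_MASK; i++) {
    int x = (i < 512) ? i : i - 1024;        /* undo the wrap-around of the mask */
    int v = x + CENTERJSAMPLE;
    model_table[i] = (uint8_t) (v < 0 ? 0 : (v > MAXJSAMPLE ? MAXJSAMPLE : v));
  }
}

/* One masked lookup replaces two comparisons per output sample. */
uint8_t limit_sample(int32_t x)
{
  return model_table[x & RANGE_MASK];
}

void self_check(void)
{
  build_model_table();
  assert(limit_sample(0) == 128);            /* zero maps to mid-grey */
  assert(limit_sample(-300) == 0);           /* underflow clamps to 0 */
  assert(limit_sample(400) == 255);          /* overflow clamps to 255 */
}
```

The mask also keeps the index inside the table when corrupt input produces wildly out-of-range values, so the lookup never reads out of bounds even though the sample it returns is then meaningless.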
02812 
02813 
02814 /*
02815  * Perform dequantization and inverse DCT on one block of coefficients,
02816  * producing a 16x8 output block.
02817  *
02818  * 8-point IDCT in pass 1 (columns), 16-point in pass 2 (rows).
02819  */
02820 
02821 GLOBAL(void)
02822 jpeg_idct_16x8 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
02823                 JCOEFPTR coef_block,
02824                 JSAMPARRAY output_buf, JDIMENSION output_col)
02825 {
02826   INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13;
02827   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;
02828   INT32 z1, z2, z3, z4;
02829   JCOEFPTR inptr;
02830   ISLOW_MULT_TYPE * quantptr;
02831   int * wsptr;
02832   JSAMPROW outptr;
02833   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
02834   int ctr;
02835   int workspace[8*8];   /* buffers data between passes */
02836   SHIFT_TEMPS
02837 
02838   /* Pass 1: process columns from input, store into work array. */
02839   /* Note results are scaled up by sqrt(8) compared to a true IDCT; */
02840   /* furthermore, we scale the results by 2**PASS1_BITS. */
02841 
02842   inptr = coef_block;
02843   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
02844   wsptr = workspace;
02845   for (ctr = DCTSIZE; ctr > 0; ctr--) {
02846     /* Due to quantization, we will usually find that many of the input
02847      * coefficients are zero, especially the AC terms.  We can exploit this
02848      * by short-circuiting the IDCT calculation for any column in which all
02849      * the AC terms are zero.  In that case each output is equal to the
02850      * DC coefficient (with scale factor as needed).
02851      * With typical images and quantization tables, half or more of the
02852      * column DCT calculations can be simplified this way.
02853      */
02854     
02855     if (inptr[DCTSIZE*1] == 0 && inptr[DCTSIZE*2] == 0 &&
02856         inptr[DCTSIZE*3] == 0 && inptr[DCTSIZE*4] == 0 &&
02857         inptr[DCTSIZE*5] == 0 && inptr[DCTSIZE*6] == 0 &&
02858         inptr[DCTSIZE*7] == 0) {
02859       /* AC terms all zero */
02860       int dcval = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]) << PASS1_BITS;
02861       
02862       wsptr[DCTSIZE*0] = dcval;
02863       wsptr[DCTSIZE*1] = dcval;
02864       wsptr[DCTSIZE*2] = dcval;
02865       wsptr[DCTSIZE*3] = dcval;
02866       wsptr[DCTSIZE*4] = dcval;
02867       wsptr[DCTSIZE*5] = dcval;
02868       wsptr[DCTSIZE*6] = dcval;
02869       wsptr[DCTSIZE*7] = dcval;
02870       
02871       inptr++;          /* advance pointers to next column */
02872       quantptr++;
02873       wsptr++;
02874       continue;
02875     }
02876     
02877     /* Even part: reverse the even part of the forward DCT. */
02878     /* The rotator is sqrt(2)*c(-6). */
02879     
02880     z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
02881     z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
02882     
02883     z1 = MULTIPLY(z2 + z3, FIX_0_541196100);
02884     tmp2 = z1 + MULTIPLY(z2, FIX_0_765366865);
02885     tmp3 = z1 - MULTIPLY(z3, FIX_1_847759065);
02886     
02887     z2 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
02888     z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
02889     z2 <<= CONST_BITS;
02890     z3 <<= CONST_BITS;
02891     /* Add fudge factor here for final descale. */
02892     z2 += ONE << (CONST_BITS-PASS1_BITS-1);
02893 
02894     tmp0 = z2 + z3;
02895     tmp1 = z2 - z3;
02896     
02897     tmp10 = tmp0 + tmp2;
02898     tmp13 = tmp0 - tmp2;
02899     tmp11 = tmp1 + tmp3;
02900     tmp12 = tmp1 - tmp3;
02901     
02902     /* Odd part per figure 8; the matrix is unitary and hence its
02903      * transpose is its inverse.  i0..i3 are y7,y5,y3,y1 respectively.
02904      */
02905     
02906     tmp0 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
02907     tmp1 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
02908     tmp2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
02909     tmp3 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
02910     
02911     z2 = tmp0 + tmp2;
02912     z3 = tmp1 + tmp3;
02913 
02914     z1 = MULTIPLY(z2 + z3, FIX_1_175875602); /* sqrt(2) * c3 */
02915     z2 = MULTIPLY(z2, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
02916     z3 = MULTIPLY(z3, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
02917     z2 += z1;
02918     z3 += z1;
02919 
02920     z1 = MULTIPLY(tmp0 + tmp3, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
02921     tmp0 = MULTIPLY(tmp0, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
02922     tmp3 = MULTIPLY(tmp3, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
02923     tmp0 += z1 + z2;
02924     tmp3 += z1 + z3;
02925 
02926     z1 = MULTIPLY(tmp1 + tmp2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
02927     tmp1 = MULTIPLY(tmp1, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
02928     tmp2 = MULTIPLY(tmp2, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
02929     tmp1 += z1 + z3;
02930     tmp2 += z1 + z2;
02931     
02932     /* Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 */
02933     
02934     wsptr[DCTSIZE*0] = (int) RIGHT_SHIFT(tmp10 + tmp3, CONST_BITS-PASS1_BITS);
02935     wsptr[DCTSIZE*7] = (int) RIGHT_SHIFT(tmp10 - tmp3, CONST_BITS-PASS1_BITS);
02936     wsptr[DCTSIZE*1] = (int) RIGHT_SHIFT(tmp11 + tmp2, CONST_BITS-PASS1_BITS);
02937     wsptr[DCTSIZE*6] = (int) RIGHT_SHIFT(tmp11 - tmp2, CONST_BITS-PASS1_BITS);
02938     wsptr[DCTSIZE*2] = (int) RIGHT_SHIFT(tmp12 + tmp1, CONST_BITS-PASS1_BITS);
02939     wsptr[DCTSIZE*5] = (int) RIGHT_SHIFT(tmp12 - tmp1, CONST_BITS-PASS1_BITS);
02940     wsptr[DCTSIZE*3] = (int) RIGHT_SHIFT(tmp13 + tmp0, CONST_BITS-PASS1_BITS);
02941     wsptr[DCTSIZE*4] = (int) RIGHT_SHIFT(tmp13 - tmp0, CONST_BITS-PASS1_BITS);
02942     
02943     inptr++;            /* advance pointers to next column */
02944     quantptr++;
02945     wsptr++;
02946   }
02947 
02948   /* Pass 2: process 8 rows from work array, store into output array.
02949    * 16-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/32).
02950    */
02951   wsptr = workspace;
02952   for (ctr = 0; ctr < 8; ctr++) {
02953     outptr = output_buf[ctr] + output_col;
02954 
02955     /* Even part */
02956 
02957     /* Add fudge factor here for final descale. */
02958     tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
02959     tmp0 <<= CONST_BITS;
02960 
02961     z1 = (INT32) wsptr[4];
02962     tmp1 = MULTIPLY(z1, FIX(1.306562965));      /* c4[16] = c2[8] */
02963     tmp2 = MULTIPLY(z1, FIX_0_541196100);       /* c12[16] = c6[8] */
02964 
02965     tmp10 = tmp0 + tmp1;
02966     tmp11 = tmp0 - tmp1;
02967     tmp12 = tmp0 + tmp2;
02968     tmp13 = tmp0 - tmp2;
02969 
02970     z1 = (INT32) wsptr[2];
02971     z2 = (INT32) wsptr[6];
02972     z3 = z1 - z2;
02973     z4 = MULTIPLY(z3, FIX(0.275899379));        /* c14[16] = c7[8] */
02974     z3 = MULTIPLY(z3, FIX(1.387039845));        /* c2[16] = c1[8] */
02975 
02976     tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447);  /* (c6+c2)[16] = (c3+c1)[8] */
02977     tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223);  /* (c6-c14)[16] = (c3-c7)[8] */
02978     tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */
02979     tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */
02980 
02981     tmp20 = tmp10 + tmp0;
02982     tmp27 = tmp10 - tmp0;
02983     tmp21 = tmp12 + tmp1;
02984     tmp26 = tmp12 - tmp1;
02985     tmp22 = tmp13 + tmp2;
02986     tmp25 = tmp13 - tmp2;
02987     tmp23 = tmp11 + tmp3;
02988     tmp24 = tmp11 - tmp3;
02989 
02990     /* Odd part */
02991 
02992     z1 = (INT32) wsptr[1];
02993     z2 = (INT32) wsptr[3];
02994     z3 = (INT32) wsptr[5];
02995     z4 = (INT32) wsptr[7];
02996 
02997     tmp11 = z1 + z3;
02998 
02999     tmp1  = MULTIPLY(z1 + z2, FIX(1.353318001));   /* c3 */
03000     tmp2  = MULTIPLY(tmp11,   FIX(1.247225013));   /* c5 */
03001     tmp3  = MULTIPLY(z1 + z4, FIX(1.093201867));   /* c7 */
03002     tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586));   /* c9 */
03003     tmp11 = MULTIPLY(tmp11,   FIX(0.666655658));   /* c11 */
03004     tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528));   /* c13 */
03005     tmp0  = tmp1 + tmp2 + tmp3 -
03006         MULTIPLY(z1, FIX(2.286341144));        /* c7+c5+c3-c1 */
03007     tmp13 = tmp10 + tmp11 + tmp12 -
03008         MULTIPLY(z1, FIX(1.835730603));        /* c9+c11+c13-c15 */
03009     z1    = MULTIPLY(z2 + z3, FIX(0.138617169));   /* c15 */
03010     tmp1  += z1 + MULTIPLY(z2, FIX(0.071888074));  /* c9+c11-c3-c15 */
03011     tmp2  += z1 - MULTIPLY(z3, FIX(1.125726048));  /* c5+c7+c15-c3 */
03012     z1    = MULTIPLY(z3 - z2, FIX(1.407403738));   /* c1 */
03013     tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282));  /* c1+c11-c9-c13 */
03014     tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411));  /* c1+c5+c13-c7 */
03015     z2    += z4;
03016     z1    = MULTIPLY(z2, - FIX(0.666655658));      /* -c11 */
03017     tmp1  += z1;
03018     tmp3  += z1 + MULTIPLY(z4, FIX(1.065388962));  /* c3+c11+c15-c7 */
03019     z2    = MULTIPLY(z2, - FIX(1.247225013));      /* -c5 */
03020     tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809));  /* c1+c5+c9-c13 */
03021     tmp12 += z2;
03022     z2    = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */
03023     tmp2  += z2;
03024     tmp3  += z2;
03025     z2    = MULTIPLY(z4 - z3, FIX(0.410524528));   /* c13 */
03026     tmp10 += z2;
03027     tmp11 += z2;
03028 
03029     /* Final output stage */
03030 
03031     outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp0,
03032                            CONST_BITS+PASS1_BITS+3)
03033                  & RANGE_MASK];
03034     outptr[15] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp0,
03035                            CONST_BITS+PASS1_BITS+3)
03036                  & RANGE_MASK];
03037     outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp1,
03038                            CONST_BITS+PASS1_BITS+3)
03039                  & RANGE_MASK];
03040     outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp1,
03041                            CONST_BITS+PASS1_BITS+3)
03042                  & RANGE_MASK];
03043     outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp2,
03044                            CONST_BITS+PASS1_BITS+3)
03045                  & RANGE_MASK];
03046     outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp2,
03047                            CONST_BITS+PASS1_BITS+3)
03048                  & RANGE_MASK];
03049     outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp3,
03050                            CONST_BITS+PASS1_BITS+3)
03051                  & RANGE_MASK];
03052     outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp3,
03053                            CONST_BITS+PASS1_BITS+3)
03054                  & RANGE_MASK];
03055     outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp10,
03056                            CONST_BITS+PASS1_BITS+3)
03057                  & RANGE_MASK];
03058     outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp10,
03059                            CONST_BITS+PASS1_BITS+3)
03060                  & RANGE_MASK];
03061     outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp11,
03062                            CONST_BITS+PASS1_BITS+3)
03063                  & RANGE_MASK];
03064     outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp11,
03065                            CONST_BITS+PASS1_BITS+3)
03066                  & RANGE_MASK];
03067     outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp12,
03068                            CONST_BITS+PASS1_BITS+3)
03069                  & RANGE_MASK];
03070     outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp12,
03071                            CONST_BITS+PASS1_BITS+3)
03072                  & RANGE_MASK];
03073     outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp27 + tmp13,
03074                            CONST_BITS+PASS1_BITS+3)
03075                  & RANGE_MASK];
03076     outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp27 - tmp13,
03077                            CONST_BITS+PASS1_BITS+3)
03078                  & RANGE_MASK];
03079 
03080     wsptr += 8;     /* advance pointer to next row */
03081   }
03082 }
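/*
 * Editorial sketch, not part of the IJG source: how the scaled constants
 * and the "fudge factor" used in the passes above fit together.  It assumes
 * CONST_BITS = 13 and PASS1_BITS = 2, which are not shown in this excerpt,
 * and every sketch_* name is hypothetical.  The floor shift is written out
 * by hand so that it does not depend on how >> treats negative values.
 */

```c
#include <stdint.h>

#define SKETCH_CONST_BITS 13   /* assumed; not shown in this excerpt */
#define SKETCH_PASS1_BITS 2    /* assumed; not shown in this excerpt */

/* Convert a positive real constant to a scaled integer, rounding to nearest. */
static int32_t sketch_fix(double x)
{
  return (int32_t) (x * (1L << SKETCH_CONST_BITS) + 0.5);
}

/* Floor division by 2**n, independent of the compiler's >> on negatives. */
static int32_t sketch_floor_shift(int32_t x, int n)
{
  if (x >= 0)
    return x >> n;
  return -((-x + ((int32_t) 1 << n) - 1) >> n);
}

/* Pass 1 result for a column whose only nonzero coefficient is the DC term:
 * scale up by CONST_BITS, add half of the final divisor so that the shift
 * rounds to nearest, then drop CONST_BITS-PASS1_BITS bits.
 */
static int sketch_pass1_dc_only(int32_t dc)
{
  int32_t t = dc * ((int32_t) 1 << SKETCH_CONST_BITS);

  t += (int32_t) 1 << (SKETCH_CONST_BITS - SKETCH_PASS1_BITS - 1);
  return (int) sketch_floor_shift(t, SKETCH_CONST_BITS - SKETCH_PASS1_BITS);
}
```

/*
 * Under these assumptions sketch_fix(0.541196100) gives 4433, and
 * sketch_pass1_dc_only(10) gives 40, i.e. the DC value scaled up by
 * 2**PASS1_BITS for the second pass.
 */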
03083 
03084 
03085 /*
03086  * Perform dequantization and inverse DCT on one block of coefficients,
03087  * producing a 14x7 output block.
03088  *
03089  * 7-point IDCT in pass 1 (columns), 14-point in pass 2 (rows).
03090  */
03091 
03092 GLOBAL(void)
03093 jpeg_idct_14x7 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
03094         JCOEFPTR coef_block,
03095         JSAMPARRAY output_buf, JDIMENSION output_col)
03096 {
03097   INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;
03098   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;
03099   INT32 z1, z2, z3, z4;
03100   JCOEFPTR inptr;
03101   ISLOW_MULT_TYPE * quantptr;
03102   int * wsptr;
03103   JSAMPROW outptr;
03104   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
03105   int ctr;
03106   int workspace[8*7];   /* buffers data between passes */
03107   SHIFT_TEMPS
03108 
03109   /* Pass 1: process columns from input, store into work array.
03110    * 7-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/14).
03111    */
03112   inptr = coef_block;
03113   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
03114   wsptr = workspace;
03115   for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
03116     /* Even part */
03117 
03118     tmp23 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
03119     tmp23 <<= CONST_BITS;
03120     /* Add fudge factor here for final descale. */
03121     tmp23 += ONE << (CONST_BITS-PASS1_BITS-1);
03122 
03123     z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
03124     z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
03125     z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
03126 
03127     tmp20 = MULTIPLY(z2 - z3, FIX(0.881747734));       /* c4 */
03128     tmp22 = MULTIPLY(z1 - z2, FIX(0.314692123));       /* c6 */
03129     tmp21 = tmp20 + tmp22 + tmp23 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */
03130     tmp10 = z1 + z3;
03131     z2 -= tmp10;
03132     tmp10 = MULTIPLY(tmp10, FIX(1.274162392)) + tmp23; /* c2 */
03133     tmp20 += tmp10 - MULTIPLY(z3, FIX(0.077722536));   /* c2-c4-c6 */
03134     tmp22 += tmp10 - MULTIPLY(z1, FIX(2.470602249));   /* c2+c4+c6 */
03135     tmp23 += MULTIPLY(z2, FIX(1.414213562));           /* c0 */
03136 
03137     /* Odd part */
03138 
03139     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
03140     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
03141     z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
03142 
03143     tmp11 = MULTIPLY(z1 + z2, FIX(0.935414347));       /* (c3+c1-c5)/2 */
03144     tmp12 = MULTIPLY(z1 - z2, FIX(0.170262339));       /* (c3+c5-c1)/2 */
03145     tmp10 = tmp11 - tmp12;
03146     tmp11 += tmp12;
03147     tmp12 = MULTIPLY(z2 + z3, - FIX(1.378756276));     /* -c1 */
03148     tmp11 += tmp12;
03149     z2 = MULTIPLY(z1 + z3, FIX(0.613604268));          /* c5 */
03150     tmp10 += z2;
03151     tmp12 += z2 + MULTIPLY(z3, FIX(1.870828693));      /* c3+c1-c5 */
03152 
03153     /* Final output stage */
03154 
03155     wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
03156     wsptr[8*6] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
03157     wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
03158     wsptr[8*5] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
03159     wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
03160     wsptr[8*4] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
03161     wsptr[8*3] = (int) RIGHT_SHIFT(tmp23, CONST_BITS-PASS1_BITS);
03162   }
03163 
03164   /* Pass 2: process 7 rows from work array, store into output array.
03165    * 14-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/28).
03166    */
03167   wsptr = workspace;
03168   for (ctr = 0; ctr < 7; ctr++) {
03169     outptr = output_buf[ctr] + output_col;
03170 
03171     /* Even part */
03172 
03173     /* Add fudge factor here for final descale. */
03174     z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
03175     z1 <<= CONST_BITS;
03176     z4 = (INT32) wsptr[4];
03177     z2 = MULTIPLY(z4, FIX(1.274162392));         /* c4 */
03178     z3 = MULTIPLY(z4, FIX(0.314692123));         /* c12 */
03179     z4 = MULTIPLY(z4, FIX(0.881747734));         /* c8 */
03180 
03181     tmp10 = z1 + z2;
03182     tmp11 = z1 + z3;
03183     tmp12 = z1 - z4;
03184 
03185     tmp23 = z1 - ((z2 + z3 - z4) << 1);          /* c0 = (c4+c12-c8)*2 */
03186 
03187     z1 = (INT32) wsptr[2];
03188     z2 = (INT32) wsptr[6];
03189 
03190     z3 = MULTIPLY(z1 + z2, FIX(1.105676686));    /* c6 */
03191 
03192     tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */
03193     tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */
03194     tmp15 = MULTIPLY(z1, FIX(0.613604268)) -     /* c10 */
03195         MULTIPLY(z2, FIX(1.378756276));      /* c2 */
03196 
03197     tmp20 = tmp10 + tmp13;
03198     tmp26 = tmp10 - tmp13;
03199     tmp21 = tmp11 + tmp14;
03200     tmp25 = tmp11 - tmp14;
03201     tmp22 = tmp12 + tmp15;
03202     tmp24 = tmp12 - tmp15;
03203 
03204     /* Odd part */
03205 
03206     z1 = (INT32) wsptr[1];
03207     z2 = (INT32) wsptr[3];
03208     z3 = (INT32) wsptr[5];
03209     z4 = (INT32) wsptr[7];
03210     z4 <<= CONST_BITS;
03211 
03212     tmp14 = z1 + z3;
03213     tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607));           /* c3 */
03214     tmp12 = MULTIPLY(tmp14, FIX(1.197448846));             /* c5 */
03215     tmp10 = tmp11 + tmp12 + z4 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */
03216     tmp14 = MULTIPLY(tmp14, FIX(0.752406978));             /* c9 */
03217     tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426));        /* c9+c11-c13 */
03218     z1    -= z2;
03219     tmp15 = MULTIPLY(z1, FIX(0.467085129)) - z4;           /* c11 */
03220     tmp16 += tmp15;
03221     tmp13 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - z4;    /* -c13 */
03222     tmp11 += tmp13 - MULTIPLY(z2, FIX(0.424103948));       /* c3-c9-c13 */
03223     tmp12 += tmp13 - MULTIPLY(z3, FIX(2.373959773));       /* c3+c5-c13 */
03224     tmp13 = MULTIPLY(z3 - z2, FIX(1.405321284));           /* c1 */
03225     tmp14 += tmp13 + z4 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */
03226     tmp15 += tmp13 + MULTIPLY(z2, FIX(0.674957567));       /* c1+c11-c5 */
03227 
03228     tmp13 = ((z1 - z3) << CONST_BITS) + z4;
03229 
03230     /* Final output stage */
03231 
03232     outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
03233                            CONST_BITS+PASS1_BITS+3)
03234                  & RANGE_MASK];
03235     outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
03236                            CONST_BITS+PASS1_BITS+3)
03237                  & RANGE_MASK];
03238     outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
03239                            CONST_BITS+PASS1_BITS+3)
03240                  & RANGE_MASK];
03241     outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
03242                            CONST_BITS+PASS1_BITS+3)
03243                  & RANGE_MASK];
03244     outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
03245                            CONST_BITS+PASS1_BITS+3)
03246                  & RANGE_MASK];
03247     outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
03248                            CONST_BITS+PASS1_BITS+3)
03249                  & RANGE_MASK];
03250     outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
03251                            CONST_BITS+PASS1_BITS+3)
03252                  & RANGE_MASK];
03253     outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
03254                            CONST_BITS+PASS1_BITS+3)
03255                  & RANGE_MASK];
03256     outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
03257                            CONST_BITS+PASS1_BITS+3)
03258                  & RANGE_MASK];
03259     outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
03260                            CONST_BITS+PASS1_BITS+3)
03261                  & RANGE_MASK];
03262     outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
03263                            CONST_BITS+PASS1_BITS+3)
03264                  & RANGE_MASK];
03265     outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
03266                            CONST_BITS+PASS1_BITS+3)
03267                  & RANGE_MASK];
03268     outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,
03269                            CONST_BITS+PASS1_BITS+3)
03270                  & RANGE_MASK];
03271     outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,
03272                            CONST_BITS+PASS1_BITS+3)
03273                  & RANGE_MASK];
03274 
03275     wsptr += 8;     /* advance pointer to next row */
03276   }
03277 }
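/*
 * Editorial sketch, not part of the IJG source: the even part of the
 * 14-point kernel above forms the middle term with
 * tmp23 = z1 - ((z2 + z3 - z4) << 1), relying on c0 = (c4+c12-c8)*2.
 * The three products are needed anyway, so the c0 product costs only adds
 * and a doubling.  Numerically, 1.274162392 + 0.314692123 - 0.881747734
 * = 0.707106781, and twice that is sqrt(2).  The code below compares the
 * shortcut with an explicit multiplication; CONST_BITS = 13 is assumed and
 * all sketch14_* names are hypothetical.
 */

```c
#include <stdint.h>

#define SKETCH14_CONST_BITS 13   /* assumed; not shown in this excerpt */

/* Convert a positive real constant to a scaled integer, rounding to nearest. */
static int32_t sketch14_fix(double x)
{
  return (int32_t) (x * (1L << SKETCH14_CONST_BITS) + 0.5);
}

/* z1 - c0*w4 built from the three products the even part already has. */
static int32_t sketch14_center_shortcut(int32_t z1, int32_t w4)
{
  int32_t z2 = w4 * sketch14_fix(1.274162392);   /* c4 */
  int32_t z3 = w4 * sketch14_fix(0.314692123);   /* c12 */
  int32_t z4 = w4 * sketch14_fix(0.881747734);   /* c8 */

  return z1 - (z2 + z3 - z4) * 2;
}

/* The same quantity with an explicit multiplication by c0 = sqrt(2). */
static int32_t sketch14_center_direct(int32_t z1, int32_t w4)
{
  return z1 - w4 * sketch14_fix(1.414213562);    /* c0 */
}
```

/*
 * The two results can differ slightly because each constant is rounded
 * separately; the difference is bounded by a few units of w4 in the
 * CONST_BITS-scaled domain, which is far below one output step after the
 * final descale.
 */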
03278 
03279 
03280 /*
03281  * Perform dequantization and inverse DCT on one block of coefficients,
03282  * producing a 12x6 output block.
03283  *
03284  * 6-point IDCT in pass 1 (columns), 12-point in pass 2 (rows).
03285  */
03286 
03287 GLOBAL(void)
03288 jpeg_idct_12x6 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
03289         JCOEFPTR coef_block,
03290         JSAMPARRAY output_buf, JDIMENSION output_col)
03291 {
03292   INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;
03293   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;
03294   INT32 z1, z2, z3, z4;
03295   JCOEFPTR inptr;
03296   ISLOW_MULT_TYPE * quantptr;
03297   int * wsptr;
03298   JSAMPROW outptr;
03299   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
03300   int ctr;
03301   int workspace[8*6];   /* buffers data between passes */
03302   SHIFT_TEMPS
03303 
03304   /* Pass 1: process columns from input, store into work array.
03305    * 6-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/12).
03306    */
03307   inptr = coef_block;
03308   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
03309   wsptr = workspace;
03310   for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
03311     /* Even part */
03312 
03313     tmp10 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
03314     tmp10 <<= CONST_BITS;
03315     /* Add fudge factor here for final descale. */
03316     tmp10 += ONE << (CONST_BITS-PASS1_BITS-1);
03317     tmp12 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
03318     tmp20 = MULTIPLY(tmp12, FIX(0.707106781));   /* c4 */
03319     tmp11 = tmp10 + tmp20;
03320     tmp21 = RIGHT_SHIFT(tmp10 - tmp20 - tmp20, CONST_BITS-PASS1_BITS);
03321     tmp20 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
03322     tmp10 = MULTIPLY(tmp20, FIX(1.224744871));   /* c2 */
03323     tmp20 = tmp11 + tmp10;
03324     tmp22 = tmp11 - tmp10;
03325 
03326     /* Odd part */
03327 
03328     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
03329     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
03330     z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
03331     tmp11 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
03332     tmp10 = tmp11 + ((z1 + z2) << CONST_BITS);
03333     tmp12 = tmp11 + ((z3 - z2) << CONST_BITS);
03334     tmp11 = (z1 - z2 - z3) << PASS1_BITS;
03335 
03336     /* Final output stage */
03337 
03338     wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
03339     wsptr[8*5] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
03340     wsptr[8*1] = (int) (tmp21 + tmp11);
03341     wsptr[8*4] = (int) (tmp21 - tmp11);
03342     wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
03343     wsptr[8*3] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
03344   }
03345 
03346   /* Pass 2: process 6 rows from work array, store into output array.
03347    * 12-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/24).
03348    */
03349   wsptr = workspace;
03350   for (ctr = 0; ctr < 6; ctr++) {
03351     outptr = output_buf[ctr] + output_col;
03352 
03353     /* Even part */
03354 
03355     /* Add fudge factor here for final descale. */
03356     z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
03357     z3 <<= CONST_BITS;
03358 
03359     z4 = (INT32) wsptr[4];
03360     z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */
03361 
03362     tmp10 = z3 + z4;
03363     tmp11 = z3 - z4;
03364 
03365     z1 = (INT32) wsptr[2];
03366     z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */
03367     z1 <<= CONST_BITS;
03368     z2 = (INT32) wsptr[6];
03369     z2 <<= CONST_BITS;
03370 
03371     tmp12 = z1 - z2;
03372 
03373     tmp21 = z3 + tmp12;
03374     tmp24 = z3 - tmp12;
03375 
03376     tmp12 = z4 + z2;
03377 
03378     tmp20 = tmp10 + tmp12;
03379     tmp25 = tmp10 - tmp12;
03380 
03381     tmp12 = z4 - z1 - z2;
03382 
03383     tmp22 = tmp11 + tmp12;
03384     tmp23 = tmp11 - tmp12;
03385 
03386     /* Odd part */
03387 
03388     z1 = (INT32) wsptr[1];
03389     z2 = (INT32) wsptr[3];
03390     z3 = (INT32) wsptr[5];
03391     z4 = (INT32) wsptr[7];
03392 
03393     tmp11 = MULTIPLY(z2, FIX(1.306562965));                  /* c3 */
03394     tmp14 = MULTIPLY(z2, - FIX_0_541196100);                 /* -c9 */
03395 
03396     tmp10 = z1 + z3;
03397     tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669));          /* c7 */
03398     tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384));       /* c5-c7 */
03399     tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716));  /* c1-c5 */
03400     tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580));           /* -(c7+c11) */
03401     tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */
03402     tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */
03403     tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) -        /* c7-c11 */
03404          MULTIPLY(z4, FIX(1.982889723));                 /* c5+c7 */
03405 
03406     z1 -= z4;
03407     z2 -= z3;
03408     z3 = MULTIPLY(z1 + z2, FIX_0_541196100);                 /* c9 */
03409     tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865);              /* c3-c9 */
03410     tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065);              /* c3+c9 */
03411 
03412     /* Final output stage */
03413 
03414     outptr[0]  = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
03415                            CONST_BITS+PASS1_BITS+3)
03416                  & RANGE_MASK];
03417     outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
03418                            CONST_BITS+PASS1_BITS+3)
03419                  & RANGE_MASK];
03420     outptr[1]  = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
03421                            CONST_BITS+PASS1_BITS+3)
03422                  & RANGE_MASK];
03423     outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
03424                            CONST_BITS+PASS1_BITS+3)
03425                  & RANGE_MASK];
03426     outptr[2]  = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
03427                            CONST_BITS+PASS1_BITS+3)
03428                  & RANGE_MASK];
03429     outptr[9]  = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
03430                            CONST_BITS+PASS1_BITS+3)
03431                  & RANGE_MASK];
03432     outptr[3]  = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
03433                            CONST_BITS+PASS1_BITS+3)
03434                  & RANGE_MASK];
03435     outptr[8]  = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
03436                            CONST_BITS+PASS1_BITS+3)
03437                  & RANGE_MASK];
03438     outptr[4]  = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
03439                            CONST_BITS+PASS1_BITS+3)
03440                  & RANGE_MASK];
03441     outptr[7]  = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
03442                            CONST_BITS+PASS1_BITS+3)
03443                  & RANGE_MASK];
03444     outptr[5]  = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
03445                            CONST_BITS+PASS1_BITS+3)
03446                  & RANGE_MASK];
03447     outptr[6]  = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
03448                            CONST_BITS+PASS1_BITS+3)
03449                  & RANGE_MASK];
03450 
03451     wsptr += 8;     /* advance pointer to next row */
03452   }
03453 }
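/*
 * Editorial sketch, not part of the IJG source: what each line of the
 * final output stage above amounts to.  The sum is descaled by
 * CONST_BITS+PASS1_BITS+3 bits and then mapped through range_limit, which
 * is assumed here to recenter by 128 and clamp to 0..255 for 8-bit samples.
 * The sketch replaces the table lookup with an explicit clamp and does not
 * model the wraparound of "& RANGE_MASK" for far out-of-range values.
 * CONST_BITS = 13 and PASS1_BITS = 2 are assumed; all sketch_out_* names
 * are hypothetical.
 */

```c
#include <stdint.h>

#define SKETCH_OUT_CONST_BITS 13   /* assumed; not shown in this excerpt */
#define SKETCH_OUT_PASS1_BITS 2    /* assumed; not shown in this excerpt */
#define SKETCH_OUT_CENTER     128  /* assumed level shift for 8-bit samples */
#define SKETCH_OUT_MAX        255  /* assumed maximum 8-bit sample value */

/* Floor division by 2**n, independent of the compiler's >> on negatives. */
static int32_t sketch_out_floor_shift(int32_t x, int n)
{
  if (x >= 0)
    return x >> n;
  return -((-x + ((int32_t) 1 << n) - 1) >> n);
}

/* Stand-in for range_limit[... & RANGE_MASK]: descale, recenter, clamp. */
static unsigned char sketch_out_sample(int32_t sum)
{
  int32_t v = sketch_out_floor_shift(sum,
          SKETCH_OUT_CONST_BITS + SKETCH_OUT_PASS1_BITS + 3)
          + SKETCH_OUT_CENTER;

  if (v < 0)
    v = 0;
  if (v > SKETCH_OUT_MAX)
    v = SKETCH_OUT_MAX;
  return (unsigned char) v;
}

/* Pass 2 output for a row whose only nonzero workspace entry is ws0:
 * the rounding bias is added before scaling, as in the even part above.
 */
static unsigned char sketch_out_dc_only(int ws0)
{
  int32_t t = (int32_t) ws0 + ((int32_t) 1 << (SKETCH_OUT_PASS1_BITS + 2));

  t *= (int32_t) 1 << SKETCH_OUT_CONST_BITS;
  return sketch_out_sample(t);
}
```

/*
 * With these assumptions a workspace value of 320 (a dequantized DC term of
 * 80 after pass 1) gives sample 138, i.e. 80/8 above mid-gray, and values
 * far outside the sample range saturate at 0 or 255.
 */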
03454 
03455 
03456 /*
03457  * Perform dequantization and inverse DCT on one block of coefficients,
03458  * producing a 10x5 output block.
03459  *
03460  * 5-point IDCT in pass 1 (columns), 10-point in pass 2 (rows).
03461  */
03462 
03463 GLOBAL(void)
03464 jpeg_idct_10x5 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
03465         JCOEFPTR coef_block,
03466         JSAMPARRAY output_buf, JDIMENSION output_col)
03467 {
03468   INT32 tmp10, tmp11, tmp12, tmp13, tmp14;
03469   INT32 tmp20, tmp21, tmp22, tmp23, tmp24;
03470   INT32 z1, z2, z3, z4;
03471   JCOEFPTR inptr;
03472   ISLOW_MULT_TYPE * quantptr;
03473   int * wsptr;
03474   JSAMPROW outptr;
03475   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
03476   int ctr;
03477   int workspace[8*5];   /* buffers data between passes */
03478   SHIFT_TEMPS
03479 
03480   /* Pass 1: process columns from input, store into work array.
03481    * 5-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/10).
03482    */
03483   inptr = coef_block;
03484   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
03485   wsptr = workspace;
03486   for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
03487     /* Even part */
03488 
03489     tmp12 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
03490     tmp12 <<= CONST_BITS;
03491     /* Add fudge factor here for final descale. */
03492     tmp12 += ONE << (CONST_BITS-PASS1_BITS-1);
03493     tmp13 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
03494     tmp14 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
03495     z1 = MULTIPLY(tmp13 + tmp14, FIX(0.790569415)); /* (c2+c4)/2 */
03496     z2 = MULTIPLY(tmp13 - tmp14, FIX(0.353553391)); /* (c2-c4)/2 */
03497     z3 = tmp12 + z2;
03498     tmp10 = z3 + z1;
03499     tmp11 = z3 - z1;
03500     tmp12 -= z2 << 2;
03501 
03502     /* Odd part */
03503 
03504     z2 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
03505     z3 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
03506 
03507     z1 = MULTIPLY(z2 + z3, FIX(0.831253876));       /* c3 */
03508     tmp13 = z1 + MULTIPLY(z2, FIX(0.513743148));    /* c1-c3 */
03509     tmp14 = z1 - MULTIPLY(z3, FIX(2.176250899));    /* c1+c3 */
03510 
03511     /* Final output stage */
03512 
03513     wsptr[8*0] = (int) RIGHT_SHIFT(tmp10 + tmp13, CONST_BITS-PASS1_BITS);
03514     wsptr[8*4] = (int) RIGHT_SHIFT(tmp10 - tmp13, CONST_BITS-PASS1_BITS);
03515     wsptr[8*1] = (int) RIGHT_SHIFT(tmp11 + tmp14, CONST_BITS-PASS1_BITS);
03516     wsptr[8*3] = (int) RIGHT_SHIFT(tmp11 - tmp14, CONST_BITS-PASS1_BITS);
03517     wsptr[8*2] = (int) RIGHT_SHIFT(tmp12, CONST_BITS-PASS1_BITS);
03518   }
03519 
03520   /* Pass 2: process 5 rows from work array, store into output array.
03521    * 10-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/20).
03522    */
03523   wsptr = workspace;
03524   for (ctr = 0; ctr < 5; ctr++) {
03525     outptr = output_buf[ctr] + output_col;
03526 
03527     /* Even part */
03528 
03529     /* Add fudge factor here for final descale. */
03530     z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
03531     z3 <<= CONST_BITS;
03532     z4 = (INT32) wsptr[4];
03533     z1 = MULTIPLY(z4, FIX(1.144122806));         /* c4 */
03534     z2 = MULTIPLY(z4, FIX(0.437016024));         /* c8 */
03535     tmp10 = z3 + z1;
03536     tmp11 = z3 - z2;
03537 
03538     tmp22 = z3 - ((z1 - z2) << 1);               /* c0 = (c4-c8)*2 */
03539 
03540     z2 = (INT32) wsptr[2];
03541     z3 = (INT32) wsptr[6];
03542 
03543     z1 = MULTIPLY(z2 + z3, FIX(0.831253876));    /* c6 */
03544     tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */
03545     tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */
03546 
03547     tmp20 = tmp10 + tmp12;
03548     tmp24 = tmp10 - tmp12;
03549     tmp21 = tmp11 + tmp13;
03550     tmp23 = tmp11 - tmp13;
03551 
03552     /* Odd part */
03553 
03554     z1 = (INT32) wsptr[1];
03555     z2 = (INT32) wsptr[3];
03556     z3 = (INT32) wsptr[5];
03557     z3 <<= CONST_BITS;
03558     z4 = (INT32) wsptr[7];
03559 
03560     tmp11 = z2 + z4;
03561     tmp13 = z2 - z4;
03562 
03563     tmp12 = MULTIPLY(tmp13, FIX(0.309016994));        /* (c3-c7)/2 */
03564 
03565     z2 = MULTIPLY(tmp11, FIX(0.951056516));           /* (c3+c7)/2 */
03566     z4 = z3 + tmp12;
03567 
03568     tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */
03569     tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */
03570 
03571     z2 = MULTIPLY(tmp11, FIX(0.587785252));           /* (c1-c9)/2 */
03572     z4 = z3 - tmp12 - (tmp13 << (CONST_BITS - 1));
03573 
03574     tmp12 = ((z1 - tmp13) << CONST_BITS) - z3;
03575 
03576     tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */
03577     tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */
03578 
03579     /* Final output stage */
03580 
03581     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
03582                           CONST_BITS+PASS1_BITS+3)
03583                 & RANGE_MASK];
03584     outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
03585                           CONST_BITS+PASS1_BITS+3)
03586                 & RANGE_MASK];
03587     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
03588                           CONST_BITS+PASS1_BITS+3)
03589                 & RANGE_MASK];
03590     outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
03591                           CONST_BITS+PASS1_BITS+3)
03592                 & RANGE_MASK];
03593     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
03594                           CONST_BITS+PASS1_BITS+3)
03595                 & RANGE_MASK];
03596     outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
03597                           CONST_BITS+PASS1_BITS+3)
03598                 & RANGE_MASK];
03599     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
03600                           CONST_BITS+PASS1_BITS+3)
03601                 & RANGE_MASK];
03602     outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
03603                           CONST_BITS+PASS1_BITS+3)
03604                 & RANGE_MASK];
03605     outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
03606                           CONST_BITS+PASS1_BITS+3)
03607                 & RANGE_MASK];
03608     outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
03609                           CONST_BITS+PASS1_BITS+3)
03610                 & RANGE_MASK];
03611 
03612     wsptr += 8;     /* advance pointer to next row */
03613   }
03614 }
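/*
 * Editorial sketch, not part of the IJG source: the 8-point even part and
 * the 4-point odd part in this file rotate a pair (z2, z3) using three
 * multiplications: one shared product by c6, then corrections by c2-c6 and
 * c2+c6.  The code below compares that form with the four-multiplication
 * form.  The integer constants are the CONST_BITS = 13 values of
 * 0.541196100, 0.765366865 and 1.847759065 (an assumption about the build
 * settings); all sketch_rot* names are hypothetical.
 */

```c
#include <stdint.h>

#define SKETCH_ROT_C6           4433   /* 0.541196100 * 2**13, rounded */
#define SKETCH_ROT_C2_MINUS_C6  6270   /* 0.765366865 * 2**13, rounded */
#define SKETCH_ROT_C2_PLUS_C6  15137   /* 1.847759065 * 2**13, rounded */

typedef struct {
  int32_t a;   /* z2*c2 + z3*c6 */
  int32_t b;   /* z2*c6 - z3*c2 */
} sketch_rot_pair;

/* Three multiplications: the product by c6 is shared by both outputs. */
static sketch_rot_pair sketch_rot3(int32_t z2, int32_t z3)
{
  sketch_rot_pair r;
  int32_t z1 = (z2 + z3) * SKETCH_ROT_C6;

  r.a = z1 + z2 * SKETCH_ROT_C2_MINUS_C6;
  r.b = z1 - z3 * SKETCH_ROT_C2_PLUS_C6;
  return r;
}

/* Four multiplications, written out directly for comparison. */
static sketch_rot_pair sketch_rot4(int32_t z2, int32_t z3)
{
  sketch_rot_pair r;
  const int32_t c2 = SKETCH_ROT_C6 + SKETCH_ROT_C2_MINUS_C6;

  r.a = z2 * c2 + z3 * SKETCH_ROT_C6;
  r.b = z2 * SKETCH_ROT_C6 - z3 * c2;
  return r;
}
```

/*
 * The first outputs agree exactly.  The second outputs can differ by about
 * one unit of z3, because c2 derived from the rounded c2-c6 constant is not
 * identical to c2 derived from the rounded c2+c6 constant.
 */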
03615 
03616 
03617 /*
03618  * Perform dequantization and inverse DCT on one block of coefficients,
03619  * producing an 8x4 output block.
03620  *
03621  * 4-point IDCT in pass 1 (columns), 8-point in pass 2 (rows).
03622  */
03623 
03624 GLOBAL(void)
03625 jpeg_idct_8x4 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
03626            JCOEFPTR coef_block,
03627            JSAMPARRAY output_buf, JDIMENSION output_col)
03628 {
03629   INT32 tmp0, tmp1, tmp2, tmp3;
03630   INT32 tmp10, tmp11, tmp12, tmp13;
03631   INT32 z1, z2, z3;
03632   JCOEFPTR inptr;
03633   ISLOW_MULT_TYPE * quantptr;
03634   int * wsptr;
03635   JSAMPROW outptr;
03636   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
03637   int ctr;
03638   int workspace[8*4];   /* buffers data between passes */
03639   SHIFT_TEMPS
03640 
03641   /* Pass 1: process columns from input, store into work array.
03642    * 4-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/16).
03643    */
03644   inptr = coef_block;
03645   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
03646   wsptr = workspace;
03647   for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
03648     /* Even part */
03649 
03650     tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
03651     tmp2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
03652 
03653     tmp10 = (tmp0 + tmp2) << PASS1_BITS;
03654     tmp12 = (tmp0 - tmp2) << PASS1_BITS;
03655 
03656     /* Odd part */
03657     /* Same rotation as in the even part of the 8x8 LL&M IDCT */
03658 
03659     z2 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
03660     z3 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
03661 
03662     z1 = MULTIPLY(z2 + z3, FIX_0_541196100);               /* c6 */
03663     /* Add fudge factor here for final descale. */
03664     z1 += ONE << (CONST_BITS-PASS1_BITS-1);
03665     tmp0 = RIGHT_SHIFT(z1 + MULTIPLY(z2, FIX_0_765366865), /* c2-c6 */
03666                CONST_BITS-PASS1_BITS);
03667     tmp2 = RIGHT_SHIFT(z1 - MULTIPLY(z3, FIX_1_847759065), /* c2+c6 */
03668                CONST_BITS-PASS1_BITS);
03669 
03670     /* Final output stage */
03671 
03672     wsptr[8*0] = (int) (tmp10 + tmp0);
03673     wsptr[8*3] = (int) (tmp10 - tmp0);
03674     wsptr[8*1] = (int) (tmp12 + tmp2);
03675     wsptr[8*2] = (int) (tmp12 - tmp2);
03676   }
03677 
03678   /* Pass 2: process rows from work array, store into output array. */
03679   /* Note that we must descale the results by a factor of 8 == 2**3, */
03680   /* and also undo the PASS1_BITS scaling. */
03681 
03682   wsptr = workspace;
03683   for (ctr = 0; ctr < 4; ctr++) {
03684     outptr = output_buf[ctr] + output_col;
03685 
03686     /* Even part: reverse the even part of the forward DCT. */
03687     /* The rotator is sqrt(2)*c(-6). */
03688 
03689     z2 = (INT32) wsptr[2];
03690     z3 = (INT32) wsptr[6];
03691     
03692     z1 = MULTIPLY(z2 + z3, FIX_0_541196100);
03693     tmp2 = z1 + MULTIPLY(z2, FIX_0_765366865);
03694     tmp3 = z1 - MULTIPLY(z3, FIX_1_847759065);
03695     
03696     /* Add fudge factor here for final descale. */
03697     z2 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
03698     z3 = (INT32) wsptr[4];
03699     
03700     tmp0 = (z2 + z3) << CONST_BITS;
03701     tmp1 = (z2 - z3) << CONST_BITS;
03702     
03703     tmp10 = tmp0 + tmp2;
03704     tmp13 = tmp0 - tmp2;
03705     tmp11 = tmp1 + tmp3;
03706     tmp12 = tmp1 - tmp3;
03707 
03708     /* Odd part per figure 8 of the LL&M paper; the matrix is unitary and
03709      * hence its transpose is its inverse.  i0..i3 are y7,y5,y3,y1 respectively.
03710      */
03711 
03712     tmp0 = (INT32) wsptr[7];
03713     tmp1 = (INT32) wsptr[5];
03714     tmp2 = (INT32) wsptr[3];
03715     tmp3 = (INT32) wsptr[1];
03716 
03717     z2 = tmp0 + tmp2;
03718     z3 = tmp1 + tmp3;
03719 
03720     z1 = MULTIPLY(z2 + z3, FIX_1_175875602); /* sqrt(2) * c3 */
03721     z2 = MULTIPLY(z2, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
03722     z3 = MULTIPLY(z3, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
03723     z2 += z1;
03724     z3 += z1;
03725 
03726     z1 = MULTIPLY(tmp0 + tmp3, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
03727     tmp0 = MULTIPLY(tmp0, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
03728     tmp3 = MULTIPLY(tmp3, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
03729     tmp0 += z1 + z2;
03730     tmp3 += z1 + z3;
03731 
03732     z1 = MULTIPLY(tmp1 + tmp2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
03733     tmp1 = MULTIPLY(tmp1, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
03734     tmp2 = MULTIPLY(tmp2, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
03735     tmp1 += z1 + z3;
03736     tmp2 += z1 + z2;
03737 
03738     /* Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 */
03739 
03740     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp3,
03741                           CONST_BITS+PASS1_BITS+3)
03742                 & RANGE_MASK];
03743     outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp3,
03744                           CONST_BITS+PASS1_BITS+3)
03745                 & RANGE_MASK];
03746     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp2,
03747                           CONST_BITS+PASS1_BITS+3)
03748                 & RANGE_MASK];
03749     outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp2,
03750                           CONST_BITS+PASS1_BITS+3)
03751                 & RANGE_MASK];
03752     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp1,
03753                           CONST_BITS+PASS1_BITS+3)
03754                 & RANGE_MASK];
03755     outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp1,
03756                           CONST_BITS+PASS1_BITS+3)
03757                 & RANGE_MASK];
03758     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13 + tmp0,
03759                           CONST_BITS+PASS1_BITS+3)
03760                 & RANGE_MASK];
03761     outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp13 - tmp0,
03762                           CONST_BITS+PASS1_BITS+3)
03763                 & RANGE_MASK];
03764 
03765     wsptr += DCTSIZE;       /* advance pointer to next row */
03766   }
03767 }
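
The "fudge factor" comments in the routine above refer to a rounding trick: adding half of the final divisor before the right shift turns truncation into round-to-nearest. The following is a minimal standalone sketch of that idea, not code from the IJG library. The helper names floor_shift and descale_round are hypothetical, and floor_shift stands in for the RIGHT_SHIFT macro under the assumption that the macro rounds toward negative infinity.

```c
#include <stdint.h>

/* Hypothetical helper: shift right by n bits, rounding toward negative
 * infinity even where >> on negative values is not an arithmetic shift. */
static int32_t floor_shift(int32_t x, int n)
{
    if (x >= 0)
        return x >> n;
    /* floor(x / 2^n) for negative x equals -ceil(-x / 2^n) */
    return -(int32_t)(((-(int64_t)x) + (((int64_t)1 << n) - 1)) >> n);
}

/* Hypothetical helper: divide by 2^n with rounding to nearest, by adding
 * half of the divisor (the "fudge factor") before the shift. */
static int32_t descale_round(int32_t x, int n)
{
    return floor_shift(x + ((int32_t)1 << (n - 1)), n);
}
```

For example, descale_round(20, 3) yields 3 where a plain shift would yield 2. In pass 2 above the same effect is obtained by adding ONE << (PASS1_BITS+2) to the DC term before it is scaled up by CONST_BITS, which is half of the final divisor 2**(CONST_BITS+PASS1_BITS+3).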
03768 
03769 
03770 /*
03771  * Perform dequantization and inverse DCT on one block of coefficients,
03772  * producing a reduced-size 6x3 output block.
03773  *
03774  * 3-point IDCT in pass 1 (columns), 6-point in pass 2 (rows).
03775  */
03776 
03777 GLOBAL(void)
03778 jpeg_idct_6x3 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
03779            JCOEFPTR coef_block,
03780            JSAMPARRAY output_buf, JDIMENSION output_col)
03781 {
03782   INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12;
03783   INT32 z1, z2, z3;
03784   JCOEFPTR inptr;
03785   ISLOW_MULT_TYPE * quantptr;
03786   int * wsptr;
03787   JSAMPROW outptr;
03788   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
03789   int ctr;
03790   int workspace[6*3];   /* buffers data between passes */
03791   SHIFT_TEMPS
03792 
03793   /* Pass 1: process columns from input, store into work array.
03794    * 3-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/6).
03795    */
03796   inptr = coef_block;
03797   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
03798   wsptr = workspace;
03799   for (ctr = 0; ctr < 6; ctr++, inptr++, quantptr++, wsptr++) {
03800     /* Even part */
03801 
03802     tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
03803     tmp0 <<= CONST_BITS;
03804     /* Add fudge factor here for final descale. */
03805     tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
03806     tmp2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
03807     tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */
03808     tmp10 = tmp0 + tmp12;
03809     tmp2 = tmp0 - tmp12 - tmp12;
03810 
03811     /* Odd part */
03812 
03813     tmp12 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
03814     tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */
03815 
03816     /* Final output stage */
03817 
03818     wsptr[6*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
03819     wsptr[6*2] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
03820     wsptr[6*1] = (int) RIGHT_SHIFT(tmp2, CONST_BITS-PASS1_BITS);
03821   }
03822   
03823   /* Pass 2: process 3 rows from work array, store into output array.
03824    * 6-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/12).
03825    */
03826   wsptr = workspace;
03827   for (ctr = 0; ctr < 3; ctr++) {
03828     outptr = output_buf[ctr] + output_col;
03829 
03830     /* Even part */
03831 
03832     /* Add fudge factor here for final descale. */
03833     tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
03834     tmp0 <<= CONST_BITS;
03835     tmp2 = (INT32) wsptr[4];
03836     tmp10 = MULTIPLY(tmp2, FIX(0.707106781));   /* c4 */
03837     tmp1 = tmp0 + tmp10;
03838     tmp11 = tmp0 - tmp10 - tmp10;
03839     tmp10 = (INT32) wsptr[2];
03840     tmp0 = MULTIPLY(tmp10, FIX(1.224744871));   /* c2 */
03841     tmp10 = tmp1 + tmp0;
03842     tmp12 = tmp1 - tmp0;
03843 
03844     /* Odd part */
03845 
03846     z1 = (INT32) wsptr[1];
03847     z2 = (INT32) wsptr[3];
03848     z3 = (INT32) wsptr[5];
03849     tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
03850     tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);
03851     tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);
03852     tmp1 = (z1 - z2 - z3) << CONST_BITS;
03853 
03854     /* Final output stage */
03855 
03856     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
03857                           CONST_BITS+PASS1_BITS+3)
03858                 & RANGE_MASK];
03859     outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
03860                           CONST_BITS+PASS1_BITS+3)
03861                 & RANGE_MASK];
03862     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
03863                           CONST_BITS+PASS1_BITS+3)
03864                 & RANGE_MASK];
03865     outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
03866                           CONST_BITS+PASS1_BITS+3)
03867                 & RANGE_MASK];
03868     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
03869                           CONST_BITS+PASS1_BITS+3)
03870                 & RANGE_MASK];
03871     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
03872                           CONST_BITS+PASS1_BITS+3)
03873                 & RANGE_MASK];
03874 
03875     wsptr += 6;     /* advance pointer to next row */
03876   }
03877 }
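
The 3-point column kernel used in pass 1 of jpeg_idct_6x3 can be tried in isolation. The sketch below is an illustration rather than library code: the function name idct3_column is hypothetical, the input is assumed to be already dequantized, and CONST_BITS == 13 and PASS1_BITS == 2 are assumed (the values this file uses for 8-bit samples). It also assumes that >> on a negative int32_t is an arithmetic shift.

```c
#include <stdint.h>

#define CONST_BITS 13   /* assumed: value used for 8-bit samples */
#define PASS1_BITS 2    /* assumed: value used for 8-bit samples */
#define FIX(x) ((int32_t)((x) * (1L << CONST_BITS) + 0.5))

/* Hypothetical standalone 3-point column kernel.  in[] holds three
 * already-dequantized coefficients; out[] receives three values that
 * are still scaled up by 2^PASS1_BITS, as in pass 1 above. */
static void idct3_column(const int32_t in[3], int32_t out[3])
{
    int32_t tmp0, tmp2, tmp10, tmp12;

    /* Even part */
    tmp0 = in[0] * ((int32_t)1 << CONST_BITS);
    tmp0 += (int32_t)1 << (CONST_BITS - PASS1_BITS - 1);  /* rounding term */
    tmp12 = in[2] * FIX(0.707106781);                     /* c2 */
    tmp10 = tmp0 + tmp12;
    tmp2 = tmp0 - tmp12 - tmp12;

    /* Odd part */
    tmp0 = in[1] * FIX(1.224744871);                      /* c1 */

    /* Final output stage: descale by CONST_BITS-PASS1_BITS */
    out[0] = (tmp10 + tmp0) >> (CONST_BITS - PASS1_BITS);
    out[2] = (tmp10 - tmp0) >> (CONST_BITS - PASS1_BITS);
    out[1] = tmp2 >> (CONST_BITS - PASS1_BITS);
}
```

With a DC-only input such as {8, 0, 0}, all three outputs come out equal to 8 << PASS1_BITS, which is a quick sanity check on the scaling.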
03878 
03879 
03880 /*
03881  * Perform dequantization and inverse DCT on one block of coefficients,
03882  * producing a 4x2 output block.
03883  *
03884  * 2-point IDCT in pass 1 (columns), 4-point in pass 2 (rows).
03885  */
03886 
03887 GLOBAL(void)
03888 jpeg_idct_4x2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
03889            JCOEFPTR coef_block,
03890            JSAMPARRAY output_buf, JDIMENSION output_col)
03891 {
03892   INT32 tmp0, tmp2, tmp10, tmp12;
03893   INT32 z1, z2, z3;
03894   JCOEFPTR inptr;
03895   ISLOW_MULT_TYPE * quantptr;
03896   INT32 * wsptr;
03897   JSAMPROW outptr;
03898   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
03899   int ctr;
03900   INT32 workspace[4*2]; /* buffers data between passes */
03901   SHIFT_TEMPS
03902 
03903   /* Pass 1: process columns from input, store into work array. */
03904 
03905   inptr = coef_block;
03906   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
03907   wsptr = workspace;
03908   for (ctr = 0; ctr < 4; ctr++, inptr++, quantptr++, wsptr++) {
03909     /* Even part */
03910 
03911     tmp10 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
03912 
03913     /* Odd part */
03914 
03915     tmp0 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
03916 
03917     /* Final output stage */
03918 
03919     wsptr[4*0] = tmp10 + tmp0;
03920     wsptr[4*1] = tmp10 - tmp0;
03921   }
03922 
03923   /* Pass 2: process 2 rows from work array, store into output array.
03924    * 4-point IDCT kernel,
03925    * cK represents sqrt(2) * cos(K*pi/16) [refers to 8-point IDCT].
03926    */
03927   wsptr = workspace;
03928   for (ctr = 0; ctr < 2; ctr++) {
03929     outptr = output_buf[ctr] + output_col;
03930 
03931     /* Even part */
03932 
03933     /* Add fudge factor here for final descale. */
03934     tmp0 = wsptr[0] + (ONE << 2);
03935     tmp2 = wsptr[2];
03936 
03937     tmp10 = (tmp0 + tmp2) << CONST_BITS;
03938     tmp12 = (tmp0 - tmp2) << CONST_BITS;
03939 
03940     /* Odd part */
03941     /* Same rotation as in the even part of the 8x8 LL&M IDCT */
03942 
03943     z2 = wsptr[1];
03944     z3 = wsptr[3];
03945 
03946     z1 = MULTIPLY(z2 + z3, FIX_0_541196100);   /* c6 */
03947     tmp0 = z1 + MULTIPLY(z2, FIX_0_765366865); /* c2-c6 */
03948     tmp2 = z1 - MULTIPLY(z3, FIX_1_847759065); /* c2+c6 */
03949 
03950     /* Final output stage */
03951 
03952     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
03953                           CONST_BITS+3)
03954                 & RANGE_MASK];
03955     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
03956                           CONST_BITS+3)
03957                 & RANGE_MASK];
03958     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
03959                           CONST_BITS+3)
03960                 & RANGE_MASK];
03961     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
03962                           CONST_BITS+3)
03963                 & RANGE_MASK];
03964 
03965     wsptr += 4;     /* advance pointer to next row */
03966   }
03967 }
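
The odd part of the 4-point kernel above is a plane rotation computed with three multiplications instead of the four a direct evaluation would need: the product (z2 + z3) * c6 is shared by both outputs. A minimal sketch of just that step follows. The function name rotate_c6 is hypothetical, and the integer constants assume CONST_BITS == 13, so each is the named cosine expression multiplied by 8192 and rounded.

```c
#include <stdint.h>

/* Constants as integers scaled by 2^13 (CONST_BITS == 13 assumed). */
#define FIX_0_541196100 ((int32_t)4433)    /* c6 */
#define FIX_0_765366865 ((int32_t)6270)    /* c2-c6 */
#define FIX_1_847759065 ((int32_t)15137)   /* c2+c6 */

/* Hypothetical helper: rotate (z2, z3) using three multiplies.
 * On return, *r0 ~= (z2*c2 + z3*c6) * 2^13 and
 *            *r1 ~= (z2*c6 - z3*c2) * 2^13. */
static void rotate_c6(int32_t z2, int32_t z3, int32_t *r0, int32_t *r1)
{
    int32_t z1 = (z2 + z3) * FIX_0_541196100;   /* shared product */
    *r0 = z1 + z2 * FIX_0_765366865;            /* z1 + z2*(c2-c6) */
    *r1 = z1 - z3 * FIX_1_847759065;            /* z1 - z3*(c2+c6) */
}
```

Expanding the two expressions shows why this works: z1 + z2*(c2-c6) = z2*c2 + z3*c6, and z1 - z3*(c2+c6) = z2*c6 - z3*c2.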
03968 
03969 
03970 /*
03971  * Perform dequantization and inverse DCT on one block of coefficients,
03972  * producing a 2x1 output block.
03973  *
03974  * 1-point IDCT in pass 1 (columns), 2-point in pass 2 (rows).
03975  */
03976 
03977 GLOBAL(void)
03978 jpeg_idct_2x1 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
03979            JCOEFPTR coef_block,
03980            JSAMPARRAY output_buf, JDIMENSION output_col)
03981 {
03982   INT32 tmp0, tmp10;
03983   ISLOW_MULT_TYPE * quantptr;
03984   JSAMPROW outptr;
03985   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
03986   SHIFT_TEMPS
03987 
03988   /* Pass 1: empty. */
03989 
03990   /* Pass 2: process 1 row from input, store into output array. */
03991 
03992   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
03993   outptr = output_buf[0] + output_col;
03994 
03995   /* Even part */
03996 
03997   tmp10 = DEQUANTIZE(coef_block[0], quantptr[0]);
03998   /* Add fudge factor here for final descale. */
03999   tmp10 += ONE << 2;
04000 
04001   /* Odd part */
04002 
04003   tmp0 = DEQUANTIZE(coef_block[1], quantptr[1]);
04004 
04005   /* Final output stage */
04006 
04007   outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0, 3) & RANGE_MASK];
04008   outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0, 3) & RANGE_MASK];
04009 }
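
Because jpeg_idct_2x1 is so small, its arithmetic is easy to reproduce outside the library. The sketch below is a standalone approximation under stated assumptions, not the library routine: the names clamp_sample, floor_div8 and idct_2x1_sketch are hypothetical, the inputs are taken to be already dequantized, and clamp_sample stands in for the range_limit[] lookup, which is assumed here to re-centre on 128 and clamp to 0..255 for 8-bit samples.

```c
#include <stdint.h>

/* Hypothetical stand-in for the range_limit[] lookup: re-centre on 128
 * and clamp to 0..255 (8-bit samples assumed). */
static uint8_t clamp_sample(int32_t v)
{
    v += 128;
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical helper: floor(x / 8) without relying on >> for negatives. */
static int32_t floor_div8(int32_t x)
{
    return (x >= 0) ? (x / 8) : -((-x + 7) / 8);
}

/* 2-point row IDCT on two already-dequantized coefficients. */
static void idct_2x1_sketch(int32_t dc, int32_t ac, uint8_t out[2])
{
    int32_t tmp10 = dc + 4;   /* rounding term: half of the divisor 8 */
    out[0] = clamp_sample(floor_div8(tmp10 + ac));
    out[1] = clamp_sample(floor_div8(tmp10 - ac));
}
```

The two outputs are simply the rounded sum and difference of the two coefficients, divided by 8, which matches the "descale by a factor of 8" note used throughout this file.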
04010 
04011 
04012 /*
04013  * Perform dequantization and inverse DCT on one block of coefficients,
04014  * producing a 8x16 output block.
04015  *
04016  * 16-point IDCT in pass 1 (columns), 8-point in pass 2 (rows).
04017  */
04018 
04019 GLOBAL(void)
04020 jpeg_idct_8x16 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
04021         JCOEFPTR coef_block,
04022         JSAMPARRAY output_buf, JDIMENSION output_col)
04023 {
04024   INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13;
04025   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;
04026   INT32 z1, z2, z3, z4;
04027   JCOEFPTR inptr;
04028   ISLOW_MULT_TYPE * quantptr;
04029   int * wsptr;
04030   JSAMPROW outptr;
04031   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
04032   int ctr;
04033   int workspace[8*16];  /* buffers data between passes */
04034   SHIFT_TEMPS
04035 
04036   /* Pass 1: process columns from input, store into work array.
04037    * 16-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/32).
04038    */
04039   inptr = coef_block;
04040   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
04041   wsptr = workspace;
04042   for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
04043     /* Even part */
04044 
04045     tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
04046     tmp0 <<= CONST_BITS;
04047     /* Add fudge factor here for final descale. */
04048     tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
04049 
04050     z1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
04051     tmp1 = MULTIPLY(z1, FIX(1.306562965));      /* c4[16] = c2[8] */
04052     tmp2 = MULTIPLY(z1, FIX_0_541196100);       /* c12[16] = c6[8] */
04053 
04054     tmp10 = tmp0 + tmp1;
04055     tmp11 = tmp0 - tmp1;
04056     tmp12 = tmp0 + tmp2;
04057     tmp13 = tmp0 - tmp2;
04058 
04059     z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
04060     z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
04061     z3 = z1 - z2;
04062     z4 = MULTIPLY(z3, FIX(0.275899379));        /* c14[16] = c7[8] */
04063     z3 = MULTIPLY(z3, FIX(1.387039845));        /* c2[16] = c1[8] */
04064 
04065     tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447);  /* (c6+c2)[16] = (c3+c1)[8] */
04066     tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223);  /* (c6-c14)[16] = (c3-c7)[8] */
04067     tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */
04068     tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */
04069 
04070     tmp20 = tmp10 + tmp0;
04071     tmp27 = tmp10 - tmp0;
04072     tmp21 = tmp12 + tmp1;
04073     tmp26 = tmp12 - tmp1;
04074     tmp22 = tmp13 + tmp2;
04075     tmp25 = tmp13 - tmp2;
04076     tmp23 = tmp11 + tmp3;
04077     tmp24 = tmp11 - tmp3;
04078 
04079     /* Odd part */
04080 
04081     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
04082     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
04083     z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
04084     z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
04085 
04086     tmp11 = z1 + z3;
04087 
04088     tmp1  = MULTIPLY(z1 + z2, FIX(1.353318001));   /* c3 */
04089     tmp2  = MULTIPLY(tmp11,   FIX(1.247225013));   /* c5 */
04090     tmp3  = MULTIPLY(z1 + z4, FIX(1.093201867));   /* c7 */
04091     tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586));   /* c9 */
04092     tmp11 = MULTIPLY(tmp11,   FIX(0.666655658));   /* c11 */
04093     tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528));   /* c13 */
04094     tmp0  = tmp1 + tmp2 + tmp3 -
04095         MULTIPLY(z1, FIX(2.286341144));        /* c7+c5+c3-c1 */
04096     tmp13 = tmp10 + tmp11 + tmp12 -
04097         MULTIPLY(z1, FIX(1.835730603));        /* c9+c11+c13-c15 */
04098     z1    = MULTIPLY(z2 + z3, FIX(0.138617169));   /* c15 */
04099     tmp1  += z1 + MULTIPLY(z2, FIX(0.071888074));  /* c9+c11-c3-c15 */
04100     tmp2  += z1 - MULTIPLY(z3, FIX(1.125726048));  /* c5+c7+c15-c3 */
04101     z1    = MULTIPLY(z3 - z2, FIX(1.407403738));   /* c1 */
04102     tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282));  /* c1+c11-c9-c13 */
04103     tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411));  /* c1+c5+c13-c7 */
04104     z2    += z4;
04105     z1    = MULTIPLY(z2, - FIX(0.666655658));      /* -c11 */
04106     tmp1  += z1;
04107     tmp3  += z1 + MULTIPLY(z4, FIX(1.065388962));  /* c3+c11+c15-c7 */
04108     z2    = MULTIPLY(z2, - FIX(1.247225013));      /* -c5 */
04109     tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809));  /* c1+c5+c9-c13 */
04110     tmp12 += z2;
04111     z2    = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */
04112     tmp2  += z2;
04113     tmp3  += z2;
04114     z2    = MULTIPLY(z4 - z3, FIX(0.410524528));   /* c13 */
04115     tmp10 += z2;
04116     tmp11 += z2;
04117 
04118     /* Final output stage */
04119 
04120     wsptr[8*0]  = (int) RIGHT_SHIFT(tmp20 + tmp0,  CONST_BITS-PASS1_BITS);
04121     wsptr[8*15] = (int) RIGHT_SHIFT(tmp20 - tmp0,  CONST_BITS-PASS1_BITS);
04122     wsptr[8*1]  = (int) RIGHT_SHIFT(tmp21 + tmp1,  CONST_BITS-PASS1_BITS);
04123     wsptr[8*14] = (int) RIGHT_SHIFT(tmp21 - tmp1,  CONST_BITS-PASS1_BITS);
04124     wsptr[8*2]  = (int) RIGHT_SHIFT(tmp22 + tmp2,  CONST_BITS-PASS1_BITS);
04125     wsptr[8*13] = (int) RIGHT_SHIFT(tmp22 - tmp2,  CONST_BITS-PASS1_BITS);
04126     wsptr[8*3]  = (int) RIGHT_SHIFT(tmp23 + tmp3,  CONST_BITS-PASS1_BITS);
04127     wsptr[8*12] = (int) RIGHT_SHIFT(tmp23 - tmp3,  CONST_BITS-PASS1_BITS);
04128     wsptr[8*4]  = (int) RIGHT_SHIFT(tmp24 + tmp10, CONST_BITS-PASS1_BITS);
04129     wsptr[8*11] = (int) RIGHT_SHIFT(tmp24 - tmp10, CONST_BITS-PASS1_BITS);
04130     wsptr[8*5]  = (int) RIGHT_SHIFT(tmp25 + tmp11, CONST_BITS-PASS1_BITS);
04131     wsptr[8*10] = (int) RIGHT_SHIFT(tmp25 - tmp11, CONST_BITS-PASS1_BITS);
04132     wsptr[8*6]  = (int) RIGHT_SHIFT(tmp26 + tmp12, CONST_BITS-PASS1_BITS);
04133     wsptr[8*9]  = (int) RIGHT_SHIFT(tmp26 - tmp12, CONST_BITS-PASS1_BITS);
04134     wsptr[8*7]  = (int) RIGHT_SHIFT(tmp27 + tmp13, CONST_BITS-PASS1_BITS);
04135     wsptr[8*8]  = (int) RIGHT_SHIFT(tmp27 - tmp13, CONST_BITS-PASS1_BITS);
04136   }
04137   
04138   /* Pass 2: process rows from work array, store into output array. */
04139   /* Note that we must descale the results by a factor of 8 == 2**3, */
04140   /* and also undo the PASS1_BITS scaling. */
04141 
04142   wsptr = workspace;
04143   for (ctr = 0; ctr < 16; ctr++) {
04144     outptr = output_buf[ctr] + output_col;
04145     
04146     /* Even part: reverse the even part of the forward DCT. */
04147     /* The rotator is sqrt(2)*c(-6). */
04148     
04149     z2 = (INT32) wsptr[2];
04150     z3 = (INT32) wsptr[6];
04151     
04152     z1 = MULTIPLY(z2 + z3, FIX_0_541196100);
04153     tmp2 = z1 + MULTIPLY(z2, FIX_0_765366865);
04154     tmp3 = z1 - MULTIPLY(z3, FIX_1_847759065);
04155     
04156     /* Add fudge factor here for final descale. */
04157     z2 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
04158     z3 = (INT32) wsptr[4];
04159     
04160     tmp0 = (z2 + z3) << CONST_BITS;
04161     tmp1 = (z2 - z3) << CONST_BITS;
04162     
04163     tmp10 = tmp0 + tmp2;
04164     tmp13 = tmp0 - tmp2;
04165     tmp11 = tmp1 + tmp3;
04166     tmp12 = tmp1 - tmp3;
04167     
04168     /* Odd part per figure 8; the matrix is unitary and hence its
04169      * transpose is its inverse.  i0..i3 are y7,y5,y3,y1 respectively.
04170      */
04171     
04172     tmp0 = (INT32) wsptr[7];
04173     tmp1 = (INT32) wsptr[5];
04174     tmp2 = (INT32) wsptr[3];
04175     tmp3 = (INT32) wsptr[1];
04176     
04177     z2 = tmp0 + tmp2;
04178     z3 = tmp1 + tmp3;
04179 
04180     z1 = MULTIPLY(z2 + z3, FIX_1_175875602); /* sqrt(2) * c3 */
04181     z2 = MULTIPLY(z2, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
04182     z3 = MULTIPLY(z3, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
04183     z2 += z1;
04184     z3 += z1;
04185 
04186     z1 = MULTIPLY(tmp0 + tmp3, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
04187     tmp0 = MULTIPLY(tmp0, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
04188     tmp3 = MULTIPLY(tmp3, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
04189     tmp0 += z1 + z2;
04190     tmp3 += z1 + z3;
04191 
04192     z1 = MULTIPLY(tmp1 + tmp2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
04193     tmp1 = MULTIPLY(tmp1, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
04194     tmp2 = MULTIPLY(tmp2, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
04195     tmp1 += z1 + z3;
04196     tmp2 += z1 + z2;
04197     
04198     /* Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 */
04199     
04200     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp3,
04201                           CONST_BITS+PASS1_BITS+3)
04202                 & RANGE_MASK];
04203     outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp3,
04204                           CONST_BITS+PASS1_BITS+3)
04205                 & RANGE_MASK];
04206     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp2,
04207                           CONST_BITS+PASS1_BITS+3)
04208                 & RANGE_MASK];
04209     outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp2,
04210                           CONST_BITS+PASS1_BITS+3)
04211                 & RANGE_MASK];
04212     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp1,
04213                           CONST_BITS+PASS1_BITS+3)
04214                 & RANGE_MASK];
04215     outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp1,
04216                           CONST_BITS+PASS1_BITS+3)
04217                 & RANGE_MASK];
04218     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13 + tmp0,
04219                           CONST_BITS+PASS1_BITS+3)
04220                 & RANGE_MASK];
04221     outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp13 - tmp0,
04222                           CONST_BITS+PASS1_BITS+3)
04223                 & RANGE_MASK];
04224     
04225     wsptr += DCTSIZE;       /* advance pointer to next row */
04226   }
04227 }
04228 
04229 
04230 /*
04231  * Perform dequantization and inverse DCT on one block of coefficients,
04232  * producing a 7x14 output block.
04233  *
04234  * 14-point IDCT in pass 1 (columns), 7-point in pass 2 (rows).
04235  */
04236 
04237 GLOBAL(void)
04238 jpeg_idct_7x14 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
04239         JCOEFPTR coef_block,
04240         JSAMPARRAY output_buf, JDIMENSION output_col)
04241 {
04242   INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;
04243   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;
04244   INT32 z1, z2, z3, z4;
04245   JCOEFPTR inptr;
04246   ISLOW_MULT_TYPE * quantptr;
04247   int * wsptr;
04248   JSAMPROW outptr;
04249   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
04250   int ctr;
04251   int workspace[7*14];  /* buffers data between passes */
04252   SHIFT_TEMPS
04253 
04254   /* Pass 1: process columns from input, store into work array.
04255    * 14-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/28).
04256    */
04257   inptr = coef_block;
04258   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
04259   wsptr = workspace;
04260   for (ctr = 0; ctr < 7; ctr++, inptr++, quantptr++, wsptr++) {
04261     /* Even part */
04262 
04263     z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
04264     z1 <<= CONST_BITS;
04265     /* Add fudge factor here for final descale. */
04266     z1 += ONE << (CONST_BITS-PASS1_BITS-1);
04267     z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
04268     z2 = MULTIPLY(z4, FIX(1.274162392));         /* c4 */
04269     z3 = MULTIPLY(z4, FIX(0.314692123));         /* c12 */
04270     z4 = MULTIPLY(z4, FIX(0.881747734));         /* c8 */
04271 
04272     tmp10 = z1 + z2;
04273     tmp11 = z1 + z3;
04274     tmp12 = z1 - z4;
04275 
04276     tmp23 = RIGHT_SHIFT(z1 - ((z2 + z3 - z4) << 1), /* c0 = (c4+c12-c8)*2 */
04277             CONST_BITS-PASS1_BITS);
04278 
04279     z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
04280     z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
04281 
04282     z3 = MULTIPLY(z1 + z2, FIX(1.105676686));    /* c6 */
04283 
04284     tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */
04285     tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */
04286     tmp15 = MULTIPLY(z1, FIX(0.613604268)) -     /* c10 */
04287         MULTIPLY(z2, FIX(1.378756276));      /* c2 */
04288 
04289     tmp20 = tmp10 + tmp13;
04290     tmp26 = tmp10 - tmp13;
04291     tmp21 = tmp11 + tmp14;
04292     tmp25 = tmp11 - tmp14;
04293     tmp22 = tmp12 + tmp15;
04294     tmp24 = tmp12 - tmp15;
04295 
04296     /* Odd part */
04297 
04298     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
04299     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
04300     z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
04301     z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
04302     tmp13 = z4 << CONST_BITS;
04303 
04304     tmp14 = z1 + z3;
04305     tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607));           /* c3 */
04306     tmp12 = MULTIPLY(tmp14, FIX(1.197448846));             /* c5 */
04307     tmp10 = tmp11 + tmp12 + tmp13 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */
04308     tmp14 = MULTIPLY(tmp14, FIX(0.752406978));             /* c9 */
04309     tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426));        /* c9+c11-c13 */
04310     z1    -= z2;
04311     tmp15 = MULTIPLY(z1, FIX(0.467085129)) - tmp13;        /* c11 */
04312     tmp16 += tmp15;
04313     z1    += z4;
04314     z4    = MULTIPLY(z2 + z3, - FIX(0.158341681)) - tmp13; /* -c13 */
04315     tmp11 += z4 - MULTIPLY(z2, FIX(0.424103948));          /* c3-c9-c13 */
04316     tmp12 += z4 - MULTIPLY(z3, FIX(2.373959773));          /* c3+c5-c13 */
04317     z4    = MULTIPLY(z3 - z2, FIX(1.405321284));           /* c1 */
04318     tmp14 += z4 + tmp13 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */
04319     tmp15 += z4 + MULTIPLY(z2, FIX(0.674957567));          /* c1+c11-c5 */
04320 
04321     tmp13 = (z1 - z3) << PASS1_BITS;
04322 
04323     /* Final output stage */
04324 
04325     wsptr[7*0]  = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
04326     wsptr[7*13] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
04327     wsptr[7*1]  = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
04328     wsptr[7*12] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
04329     wsptr[7*2]  = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
04330     wsptr[7*11] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
04331     wsptr[7*3]  = (int) (tmp23 + tmp13);
04332     wsptr[7*10] = (int) (tmp23 - tmp13);
04333     wsptr[7*4]  = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
04334     wsptr[7*9]  = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
04335     wsptr[7*5]  = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
04336     wsptr[7*8]  = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
04337     wsptr[7*6]  = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);
04338     wsptr[7*7]  = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);
04339   }
04340 
04341   /* Pass 2: process 14 rows from work array, store into output array.
04342    * 7-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/14).
04343    */
04344   wsptr = workspace;
04345   for (ctr = 0; ctr < 14; ctr++) {
04346     outptr = output_buf[ctr] + output_col;
04347 
04348     /* Even part */
04349 
04350     /* Add fudge factor here for final descale. */
04351     tmp23 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
04352     tmp23 <<= CONST_BITS;
04353 
04354     z1 = (INT32) wsptr[2];
04355     z2 = (INT32) wsptr[4];
04356     z3 = (INT32) wsptr[6];
04357 
04358     tmp20 = MULTIPLY(z2 - z3, FIX(0.881747734));       /* c4 */
04359     tmp22 = MULTIPLY(z1 - z2, FIX(0.314692123));       /* c6 */
04360     tmp21 = tmp20 + tmp22 + tmp23 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */
04361     tmp10 = z1 + z3;
04362     z2 -= tmp10;
04363     tmp10 = MULTIPLY(tmp10, FIX(1.274162392)) + tmp23; /* c2 */
04364     tmp20 += tmp10 - MULTIPLY(z3, FIX(0.077722536));   /* c2-c4-c6 */
04365     tmp22 += tmp10 - MULTIPLY(z1, FIX(2.470602249));   /* c2+c4+c6 */
04366     tmp23 += MULTIPLY(z2, FIX(1.414213562));           /* c0 */
04367 
04368     /* Odd part */
04369 
04370     z1 = (INT32) wsptr[1];
04371     z2 = (INT32) wsptr[3];
04372     z3 = (INT32) wsptr[5];
04373 
04374     tmp11 = MULTIPLY(z1 + z2, FIX(0.935414347));       /* (c3+c1-c5)/2 */
04375     tmp12 = MULTIPLY(z1 - z2, FIX(0.170262339));       /* (c3+c5-c1)/2 */
04376     tmp10 = tmp11 - tmp12;
04377     tmp11 += tmp12;
04378     tmp12 = MULTIPLY(z2 + z3, - FIX(1.378756276));     /* -c1 */
04379     tmp11 += tmp12;
04380     z2 = MULTIPLY(z1 + z3, FIX(0.613604268));          /* c5 */
04381     tmp10 += z2;
04382     tmp12 += z2 + MULTIPLY(z3, FIX(1.870828693));      /* c3+c1-c5 */
04383 
04384     /* Final output stage */
04385 
04386     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
04387                           CONST_BITS+PASS1_BITS+3)
04388                 & RANGE_MASK];
04389     outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
04390                           CONST_BITS+PASS1_BITS+3)
04391                 & RANGE_MASK];
04392     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
04393                           CONST_BITS+PASS1_BITS+3)
04394                 & RANGE_MASK];
04395     outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
04396                           CONST_BITS+PASS1_BITS+3)
04397                 & RANGE_MASK];
04398     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
04399                           CONST_BITS+PASS1_BITS+3)
04400                 & RANGE_MASK];
04401     outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
04402                           CONST_BITS+PASS1_BITS+3)
04403                 & RANGE_MASK];
04404     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23,
04405                           CONST_BITS+PASS1_BITS+3)
04406                 & RANGE_MASK];
04407 
04408     wsptr += 7;     /* advance pointer to next row */
04409   }
04410 }
04411 
04412 
04413 /*
04414  * Perform dequantization and inverse DCT on one block of coefficients,
04415  * producing a 6x12 output block.
04416  *
04417  * 12-point IDCT in pass 1 (columns), 6-point in pass 2 (rows).
04418  */
04419 
04420 GLOBAL(void)
04421 jpeg_idct_6x12 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
04422         JCOEFPTR coef_block,
04423         JSAMPARRAY output_buf, JDIMENSION output_col)
04424 {
04425   INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;
04426   INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;
04427   INT32 z1, z2, z3, z4;
04428   JCOEFPTR inptr;
04429   ISLOW_MULT_TYPE * quantptr;
04430   int * wsptr;
04431   JSAMPROW outptr;
04432   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
04433   int ctr;
04434   int workspace[6*12];  /* buffers data between passes */
04435   SHIFT_TEMPS
04436 
04437   /* Pass 1: process columns from input, store into work array.
04438    * 12-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/24).
04439    */
04440   inptr = coef_block;
04441   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
04442   wsptr = workspace;
04443   for (ctr = 0; ctr < 6; ctr++, inptr++, quantptr++, wsptr++) {
04444     /* Even part */
04445 
04446     z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
04447     z3 <<= CONST_BITS;
04448     /* Add fudge factor here for final descale. */
04449     z3 += ONE << (CONST_BITS-PASS1_BITS-1);
04450 
04451     z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
04452     z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */
04453 
04454     tmp10 = z3 + z4;
04455     tmp11 = z3 - z4;
04456 
04457     z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
04458     z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */
04459     z1 <<= CONST_BITS;
04460     z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
04461     z2 <<= CONST_BITS;
04462 
04463     tmp12 = z1 - z2;
04464 
04465     tmp21 = z3 + tmp12;
04466     tmp24 = z3 - tmp12;
04467 
04468     tmp12 = z4 + z2;
04469 
04470     tmp20 = tmp10 + tmp12;
04471     tmp25 = tmp10 - tmp12;
04472 
04473     tmp12 = z4 - z1 - z2;
04474 
04475     tmp22 = tmp11 + tmp12;
04476     tmp23 = tmp11 - tmp12;
04477 
04478     /* Odd part */
04479 
04480     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
04481     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
04482     z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
04483     z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
04484 
04485     tmp11 = MULTIPLY(z2, FIX(1.306562965));                  /* c3 */
04486     tmp14 = MULTIPLY(z2, - FIX_0_541196100);                 /* -c9 */
04487 
04488     tmp10 = z1 + z3;
04489     tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669));          /* c7 */
04490     tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384));       /* c5-c7 */
04491     tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716));  /* c1-c5 */
04492     tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580));           /* -(c7+c11) */
04493     tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */
04494     tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */
04495     tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) -        /* c7-c11 */
04496              MULTIPLY(z4, FIX(1.982889723));                 /* c5+c7 */
04497 
04498     z1 -= z4;
04499     z2 -= z3;
04500     z3 = MULTIPLY(z1 + z2, FIX_0_541196100);                 /* c9 */
04501     tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865);              /* c3-c9 */
04502     tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065);              /* c3+c9 */
04503 
04504     /* Final output stage */
04505 
04506     wsptr[6*0]  = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
04507     wsptr[6*11] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
04508     wsptr[6*1]  = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
04509     wsptr[6*10] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
04510     wsptr[6*2]  = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
04511     wsptr[6*9]  = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
04512     wsptr[6*3]  = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
04513     wsptr[6*8]  = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
04514     wsptr[6*4]  = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
04515     wsptr[6*7]  = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
04516     wsptr[6*5]  = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
04517     wsptr[6*6]  = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
04518   }
04519 
04520   /* Pass 2: process 12 rows from work array, store into output array.
04521    * 6-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/12).
04522    */
04523   wsptr = workspace;
04524   for (ctr = 0; ctr < 12; ctr++) {
04525     outptr = output_buf[ctr] + output_col;
04526 
04527     /* Even part */
04528 
04529     /* Add fudge factor here for final descale. */
04530     tmp10 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
04531     tmp10 <<= CONST_BITS;
04532     tmp12 = (INT32) wsptr[4];
04533     tmp20 = MULTIPLY(tmp12, FIX(0.707106781));   /* c4 */
04534     tmp11 = tmp10 + tmp20;
04535     tmp21 = tmp10 - tmp20 - tmp20;
04536     tmp20 = (INT32) wsptr[2];
04537     tmp10 = MULTIPLY(tmp20, FIX(1.224744871));   /* c2 */
04538     tmp20 = tmp11 + tmp10;
04539     tmp22 = tmp11 - tmp10;
04540 
04541     /* Odd part */
04542 
04543     z1 = (INT32) wsptr[1];
04544     z2 = (INT32) wsptr[3];
04545     z3 = (INT32) wsptr[5];
04546     tmp11 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
04547     tmp10 = tmp11 + ((z1 + z2) << CONST_BITS);
04548     tmp12 = tmp11 + ((z3 - z2) << CONST_BITS);
04549     tmp11 = (z1 - z2 - z3) << CONST_BITS;
04550 
04551     /* Final output stage */
04552 
04553     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
04554                           CONST_BITS+PASS1_BITS+3)
04555                 & RANGE_MASK];
04556     outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
04557                           CONST_BITS+PASS1_BITS+3)
04558                 & RANGE_MASK];
04559     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
04560                           CONST_BITS+PASS1_BITS+3)
04561                 & RANGE_MASK];
04562     outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
04563                           CONST_BITS+PASS1_BITS+3)
04564                 & RANGE_MASK];
04565     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
04566                           CONST_BITS+PASS1_BITS+3)
04567                 & RANGE_MASK];
04568     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
04569                           CONST_BITS+PASS1_BITS+3)
04570                 & RANGE_MASK];
04571 
04572     wsptr += 6;     /* advance pointer to next row */
04573   }
04574 }
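
/*
 * Illustration only, not part of the IJG sources.  The "fudge factor"
 * added before each final descale is half of the divisor, so that the
 * following right shift rounds to nearest instead of truncating.  In
 * pass 2 above it is added as ONE << (PASS1_BITS+2) before the value is
 * scaled up by CONST_BITS, which makes it half of
 * 2**(CONST_BITS+PASS1_BITS+3).  The sketch below shows the idea with
 * hypothetical helper names; it avoids >> on negative values, whereas
 * the listing relies on RIGHT_SHIFT for that.
 */

```c
#include <assert.h>

/* Hypothetical helper: floor-divide x by 2**n without depending on how
 * the compiler shifts negative values. */
static long shift_floor(long x, int n)
{
  assert(n > 0 && n < 31);
  if (x >= 0)
    return x >> n;
  return -((-x + (1L << n) - 1) >> n);   /* floor for negative x */
}

/* Round-to-nearest descale: add half of the divisor (the fudge factor)
 * first, then floor-divide by 2**n.  Ties round upward. */
static long descale_round(long x, int n)
{
  return shift_floor(x + (1L << (n - 1)), n);
}
```

/*
 * With n = 3, the values 11 and 12 (1.375 and 1.5 after division by 8)
 * descale to 1 and 2 respectively.
 */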
04575 
04576 
04577 /*
04578  * Perform dequantization and inverse DCT on one block of coefficients,
04579  * producing a 5x10 output block.
04580  *
04581  * 10-point IDCT in pass 1 (columns), 5-point in pass 2 (rows).
04582  */
04583 
04584 GLOBAL(void)
04585 jpeg_idct_5x10 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
04586         JCOEFPTR coef_block,
04587         JSAMPARRAY output_buf, JDIMENSION output_col)
04588 {
04589   INT32 tmp10, tmp11, tmp12, tmp13, tmp14;
04590   INT32 tmp20, tmp21, tmp22, tmp23, tmp24;
04591   INT32 z1, z2, z3, z4, z5;
04592   JCOEFPTR inptr;
04593   ISLOW_MULT_TYPE * quantptr;
04594   int * wsptr;
04595   JSAMPROW outptr;
04596   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
04597   int ctr;
04598   int workspace[5*10];  /* buffers data between passes */
04599   SHIFT_TEMPS
04600 
04601   /* Pass 1: process columns from input, store into work array.
04602    * 10-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/20).
04603    */
04604   inptr = coef_block;
04605   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
04606   wsptr = workspace;
04607   for (ctr = 0; ctr < 5; ctr++, inptr++, quantptr++, wsptr++) {
04608     /* Even part */
04609 
04610     z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
04611     z3 <<= CONST_BITS;
04612     /* Add fudge factor here for final descale. */
04613     z3 += ONE << (CONST_BITS-PASS1_BITS-1);
04614     z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
04615     z1 = MULTIPLY(z4, FIX(1.144122806));         /* c4 */
04616     z2 = MULTIPLY(z4, FIX(0.437016024));         /* c8 */
04617     tmp10 = z3 + z1;
04618     tmp11 = z3 - z2;
04619 
04620     tmp22 = RIGHT_SHIFT(z3 - ((z1 - z2) << 1),   /* c0 = (c4-c8)*2 */
04621             CONST_BITS-PASS1_BITS);
04622 
04623     z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
04624     z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
04625 
04626     z1 = MULTIPLY(z2 + z3, FIX(0.831253876));    /* c6 */
04627     tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */
04628     tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */
04629 
04630     tmp20 = tmp10 + tmp12;
04631     tmp24 = tmp10 - tmp12;
04632     tmp21 = tmp11 + tmp13;
04633     tmp23 = tmp11 - tmp13;
04634 
04635     /* Odd part */
04636 
04637     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
04638     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
04639     z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
04640     z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
04641 
04642     tmp11 = z2 + z4;
04643     tmp13 = z2 - z4;
04644 
04645     tmp12 = MULTIPLY(tmp13, FIX(0.309016994));        /* (c3-c7)/2 */
04646     z5 = z3 << CONST_BITS;
04647 
04648     z2 = MULTIPLY(tmp11, FIX(0.951056516));           /* (c3+c7)/2 */
04649     z4 = z5 + tmp12;
04650 
04651     tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */
04652     tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */
04653 
04654     z2 = MULTIPLY(tmp11, FIX(0.587785252));           /* (c1-c9)/2 */
04655     z4 = z5 - tmp12 - (tmp13 << (CONST_BITS - 1));
04656 
04657     tmp12 = (z1 - tmp13 - z3) << PASS1_BITS;
04658 
04659     tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */
04660     tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */
04661 
04662     /* Final output stage */
04663 
04664     wsptr[5*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
04665     wsptr[5*9] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
04666     wsptr[5*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
04667     wsptr[5*8] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
04668     wsptr[5*2] = (int) (tmp22 + tmp12);
04669     wsptr[5*7] = (int) (tmp22 - tmp12);
04670     wsptr[5*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
04671     wsptr[5*6] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
04672     wsptr[5*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
04673     wsptr[5*5] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
04674   }
04675 
04676   /* Pass 2: process 10 rows from work array, store into output array.
04677    * 5-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/10).
04678    */
04679   wsptr = workspace;
04680   for (ctr = 0; ctr < 10; ctr++) {
04681     outptr = output_buf[ctr] + output_col;
04682 
04683     /* Even part */
04684 
04685     /* Add fudge factor here for final descale. */
04686     tmp12 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
04687     tmp12 <<= CONST_BITS;
04688     tmp13 = (INT32) wsptr[2];
04689     tmp14 = (INT32) wsptr[4];
04690     z1 = MULTIPLY(tmp13 + tmp14, FIX(0.790569415)); /* (c2+c4)/2 */
04691     z2 = MULTIPLY(tmp13 - tmp14, FIX(0.353553391)); /* (c2-c4)/2 */
04692     z3 = tmp12 + z2;
04693     tmp10 = z3 + z1;
04694     tmp11 = z3 - z1;
04695     tmp12 -= z2 << 2;
04696 
04697     /* Odd part */
04698 
04699     z2 = (INT32) wsptr[1];
04700     z3 = (INT32) wsptr[3];
04701 
04702     z1 = MULTIPLY(z2 + z3, FIX(0.831253876));       /* c3 */
04703     tmp13 = z1 + MULTIPLY(z2, FIX(0.513743148));    /* c1-c3 */
04704     tmp14 = z1 - MULTIPLY(z3, FIX(2.176250899));    /* c1+c3 */
04705 
04706     /* Final output stage */
04707 
04708     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp13,
04709                           CONST_BITS+PASS1_BITS+3)
04710                 & RANGE_MASK];
04711     outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp13,
04712                           CONST_BITS+PASS1_BITS+3)
04713                 & RANGE_MASK];
04714     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp14,
04715                           CONST_BITS+PASS1_BITS+3)
04716                 & RANGE_MASK];
04717     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp14,
04718                           CONST_BITS+PASS1_BITS+3)
04719                 & RANGE_MASK];
04720     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12,
04721                           CONST_BITS+PASS1_BITS+3)
04722                 & RANGE_MASK];
04723 
04724     wsptr += 5;     /* advance pointer to next row */
04725   }
04726 }
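
/*
 * Illustration only, not part of the IJG sources.  The FIX(x) constants
 * used above are real numbers stored as integers scaled by
 * 2**CONST_BITS.  The sketch below assumes CONST_BITS is 13 (the usual
 * setting for 8-bit samples); the SKETCH_ names are hypothetical.  Under
 * that assumption 0.541196100, 0.765366865 and 1.847759065 come out as
 * 4433, 6270 and 15137.
 */

```c
#include <assert.h>

/* Assumed value for 8-bit samples; treat it as an illustration parameter. */
#define SKETCH_CONST_BITS 13

/* Convert a positive real constant to scaled integer form, rounding to
 * nearest. */
#define SKETCH_FIX(x) ((long) ((x) * (1L << SKETCH_CONST_BITS) + 0.5))

/* Multiply an integer value by a scaled constant and scale the product
 * back down, rounding to nearest.  Restricted to non-negative operands so
 * that the plain >> is well defined. */
static long sketch_mul_descale(long v, long fixed)
{
  long product;

  assert(v >= 0 && fixed >= 0);
  product = v * fixed;
  return (product + (1L << (SKETCH_CONST_BITS - 1))) >> SKETCH_CONST_BITS;
}
```

/*
 * For example, sketch_mul_descale(100, SKETCH_FIX(0.831253876)) gives 83,
 * the rounded value of 83.125.  The listing itself postpones the descale
 * until several such products have been summed.
 */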
04727 
04728 
04729 /*
04730  * Perform dequantization and inverse DCT on one block of coefficients,
04731  * producing a 4x8 output block.
04732  *
04733  * 8-point IDCT in pass 1 (columns), 4-point in pass 2 (rows).
04734  */
04735 
04736 GLOBAL(void)
04737 jpeg_idct_4x8 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
04738            JCOEFPTR coef_block,
04739            JSAMPARRAY output_buf, JDIMENSION output_col)
04740 {
04741   INT32 tmp0, tmp1, tmp2, tmp3;
04742   INT32 tmp10, tmp11, tmp12, tmp13;
04743   INT32 z1, z2, z3;
04744   JCOEFPTR inptr;
04745   ISLOW_MULT_TYPE * quantptr;
04746   int * wsptr;
04747   JSAMPROW outptr;
04748   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
04749   int ctr;
04750   int workspace[4*8];   /* buffers data between passes */
04751   SHIFT_TEMPS
04752 
04753   /* Pass 1: process columns from input, store into work array. */
04754   /* Note results are scaled up by sqrt(8) compared to a true IDCT; */
04755   /* furthermore, we scale the results by 2**PASS1_BITS. */
04756 
04757   inptr = coef_block;
04758   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
04759   wsptr = workspace;
04760   for (ctr = 4; ctr > 0; ctr--) {
04761     /* Due to quantization, we will usually find that many of the input
04762      * coefficients are zero, especially the AC terms.  We can exploit this
04763      * by short-circuiting the IDCT calculation for any column in which all
04764      * the AC terms are zero.  In that case each output is equal to the
04765      * DC coefficient (with scale factor as needed).
04766      * With typical images and quantization tables, half or more of the
04767      * column DCT calculations can be simplified this way.
04768      */
04769 
04770     if (inptr[DCTSIZE*1] == 0 && inptr[DCTSIZE*2] == 0 &&
04771         inptr[DCTSIZE*3] == 0 && inptr[DCTSIZE*4] == 0 &&
04772         inptr[DCTSIZE*5] == 0 && inptr[DCTSIZE*6] == 0 &&
04773         inptr[DCTSIZE*7] == 0) {
04774       /* AC terms all zero */
04775       int dcval = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]) << PASS1_BITS;
04776 
04777       wsptr[4*0] = dcval;
04778       wsptr[4*1] = dcval;
04779       wsptr[4*2] = dcval;
04780       wsptr[4*3] = dcval;
04781       wsptr[4*4] = dcval;
04782       wsptr[4*5] = dcval;
04783       wsptr[4*6] = dcval;
04784       wsptr[4*7] = dcval;
04785 
04786       inptr++;          /* advance pointers to next column */
04787       quantptr++;
04788       wsptr++;
04789       continue;
04790     }
04791 
04792     /* Even part: reverse the even part of the forward DCT. */
04793     /* The rotator is sqrt(2)*c(-6). */
04794 
04795     z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
04796     z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
04797     
04798     z1 = MULTIPLY(z2 + z3, FIX_0_541196100);
04799     tmp2 = z1 + MULTIPLY(z2, FIX_0_765366865);
04800     tmp3 = z1 - MULTIPLY(z3, FIX_1_847759065);
04801     
04802     z2 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
04803     z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
04804     z2 <<= CONST_BITS;
04805     z3 <<= CONST_BITS;
04806     /* Add fudge factor here for final descale. */
04807     z2 += ONE << (CONST_BITS-PASS1_BITS-1);
04808 
04809     tmp0 = z2 + z3;
04810     tmp1 = z2 - z3;
04811     
04812     tmp10 = tmp0 + tmp2;
04813     tmp13 = tmp0 - tmp2;
04814     tmp11 = tmp1 + tmp3;
04815     tmp12 = tmp1 - tmp3;
04816 
04817     /* Odd part per figure 8; the matrix is unitary and hence its
04818      * transpose is its inverse.  i0..i3 are y7,y5,y3,y1 respectively.
04819      */
04820 
04821     tmp0 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
04822     tmp1 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
04823     tmp2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
04824     tmp3 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
04825 
04826     z2 = tmp0 + tmp2;
04827     z3 = tmp1 + tmp3;
04828 
04829     z1 = MULTIPLY(z2 + z3, FIX_1_175875602); /* sqrt(2) * c3 */
04830     z2 = MULTIPLY(z2, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
04831     z3 = MULTIPLY(z3, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
04832     z2 += z1;
04833     z3 += z1;
04834 
04835     z1 = MULTIPLY(tmp0 + tmp3, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
04836     tmp0 = MULTIPLY(tmp0, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
04837     tmp3 = MULTIPLY(tmp3, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
04838     tmp0 += z1 + z2;
04839     tmp3 += z1 + z3;
04840 
04841     z1 = MULTIPLY(tmp1 + tmp2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
04842     tmp1 = MULTIPLY(tmp1, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
04843     tmp2 = MULTIPLY(tmp2, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
04844     tmp1 += z1 + z3;
04845     tmp2 += z1 + z2;
04846 
04847     /* Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 */
04848 
04849     wsptr[4*0] = (int) RIGHT_SHIFT(tmp10 + tmp3, CONST_BITS-PASS1_BITS);
04850     wsptr[4*7] = (int) RIGHT_SHIFT(tmp10 - tmp3, CONST_BITS-PASS1_BITS);
04851     wsptr[4*1] = (int) RIGHT_SHIFT(tmp11 + tmp2, CONST_BITS-PASS1_BITS);
04852     wsptr[4*6] = (int) RIGHT_SHIFT(tmp11 - tmp2, CONST_BITS-PASS1_BITS);
04853     wsptr[4*2] = (int) RIGHT_SHIFT(tmp12 + tmp1, CONST_BITS-PASS1_BITS);
04854     wsptr[4*5] = (int) RIGHT_SHIFT(tmp12 - tmp1, CONST_BITS-PASS1_BITS);
04855     wsptr[4*3] = (int) RIGHT_SHIFT(tmp13 + tmp0, CONST_BITS-PASS1_BITS);
04856     wsptr[4*4] = (int) RIGHT_SHIFT(tmp13 - tmp0, CONST_BITS-PASS1_BITS);
04857 
04858     inptr++;            /* advance pointers to next column */
04859     quantptr++;
04860     wsptr++;
04861   }
04862 
04863   /* Pass 2: process 8 rows from work array, store into output array.
04864    * 4-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/16) [refers to 8-point IDCT].
04865    */
04866   wsptr = workspace;
04867   for (ctr = 0; ctr < 8; ctr++) {
04868     outptr = output_buf[ctr] + output_col;
04869 
04870     /* Even part */
04871 
04872     /* Add fudge factor here for final descale. */
04873     tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
04874     tmp2 = (INT32) wsptr[2];
04875 
04876     tmp10 = (tmp0 + tmp2) << CONST_BITS;
04877     tmp12 = (tmp0 - tmp2) << CONST_BITS;
04878 
04879     /* Odd part */
04880     /* Same rotation as in the even part of the 8x8 LL&M IDCT */
04881 
04882     z2 = (INT32) wsptr[1];
04883     z3 = (INT32) wsptr[3];
04884 
04885     z1 = MULTIPLY(z2 + z3, FIX_0_541196100);   /* c6 */
04886     tmp0 = z1 + MULTIPLY(z2, FIX_0_765366865); /* c2-c6 */
04887     tmp2 = z1 - MULTIPLY(z3, FIX_1_847759065); /* c2+c6 */
04888 
04889     /* Final output stage */
04890 
04891     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
04892                           CONST_BITS+PASS1_BITS+3)
04893                 & RANGE_MASK];
04894     outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
04895                           CONST_BITS+PASS1_BITS+3)
04896                 & RANGE_MASK];
04897     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
04898                           CONST_BITS+PASS1_BITS+3)
04899                 & RANGE_MASK];
04900     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
04901                           CONST_BITS+PASS1_BITS+3)
04902                 & RANGE_MASK];
04903     
04904     wsptr += 4;     /* advance pointer to next row */
04905   }
04906 }
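
/*
 * Illustration only, not part of the IJG sources.  The column shortcut
 * in pass 1 of jpeg_idct_4x8 can be isolated as below: if every AC term
 * of a column is zero, all eight outputs of that column equal the
 * dequantized DC term scaled by 2**PASS1_BITS.  The function name, the
 * plain int quantizer argument and PASS1_BITS == 2 are assumptions made
 * for this sketch.
 */

```c
#include <assert.h>

#define SKETCH_DCTSIZE 8      /* the coefficient block is 8x8 */
#define SKETCH_PASS1_BITS 2   /* assumed value for 8-bit samples */

/* Returns 1 and fills the eight workspace entries of this column if all
 * AC terms are zero; returns 0 and leaves the workspace untouched
 * otherwise.  'col' points at the DC term of one column, whose rows are
 * SKETCH_DCTSIZE apart; 'ws' points at the matching workspace column,
 * whose rows are 'ws_stride' apart. */
static int sketch_dc_only_column(const short *col, int quant_dc,
                                 int *ws, int ws_stride)
{
  int row, dcval;

  assert(ws_stride > 0);
  for (row = 1; row < SKETCH_DCTSIZE; row++)
    if (col[SKETCH_DCTSIZE * row] != 0)
      return 0;

  /* Multiplication is used instead of << so that a negative DC term
   * stays well defined. */
  dcval = (col[0] * quant_dc) * (1 << SKETCH_PASS1_BITS);
  for (row = 0; row < SKETCH_DCTSIZE; row++)
    ws[ws_stride * row] = dcval;
  return 1;
}
```

/*
 * A caller would try this first and run the full 8-point kernel only
 * when it returns 0.
 */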
04907 
04908 
04909 /*
04910  * Perform dequantization and inverse DCT on one block of coefficients,
04911  * producing a reduced-size 3x6 output block.
04912  *
04913  * 6-point IDCT in pass 1 (columns), 3-point in pass 2 (rows).
04914  */
04915 
04916 GLOBAL(void)
04917 jpeg_idct_3x6 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
04918            JCOEFPTR coef_block,
04919            JSAMPARRAY output_buf, JDIMENSION output_col)
04920 {
04921   INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12;
04922   INT32 z1, z2, z3;
04923   JCOEFPTR inptr;
04924   ISLOW_MULT_TYPE * quantptr;
04925   int * wsptr;
04926   JSAMPROW outptr;
04927   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
04928   int ctr;
04929   int workspace[3*6];   /* buffers data between passes */
04930   SHIFT_TEMPS
04931 
04932   /* Pass 1: process columns from input, store into work array.
04933    * 6-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/12).
04934    */
04935   inptr = coef_block;
04936   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
04937   wsptr = workspace;
04938   for (ctr = 0; ctr < 3; ctr++, inptr++, quantptr++, wsptr++) {
04939     /* Even part */
04940 
04941     tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
04942     tmp0 <<= CONST_BITS;
04943     /* Add fudge factor here for final descale. */
04944     tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
04945     tmp2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
04946     tmp10 = MULTIPLY(tmp2, FIX(0.707106781));   /* c4 */
04947     tmp1 = tmp0 + tmp10;
04948     tmp11 = RIGHT_SHIFT(tmp0 - tmp10 - tmp10, CONST_BITS-PASS1_BITS);
04949     tmp10 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
04950     tmp0 = MULTIPLY(tmp10, FIX(1.224744871));   /* c2 */
04951     tmp10 = tmp1 + tmp0;
04952     tmp12 = tmp1 - tmp0;
04953 
04954     /* Odd part */
04955 
04956     z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
04957     z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
04958     z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
04959     tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
04960     tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);
04961     tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);
04962     tmp1 = (z1 - z2 - z3) << PASS1_BITS;
04963 
04964     /* Final output stage */
04965 
04966     wsptr[3*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
04967     wsptr[3*5] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
04968     wsptr[3*1] = (int) (tmp11 + tmp1);
04969     wsptr[3*4] = (int) (tmp11 - tmp1);
04970     wsptr[3*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
04971     wsptr[3*3] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
04972   }
04973 
04974   /* Pass 2: process 6 rows from work array, store into output array.
04975    * 3-point IDCT kernel, cK represents sqrt(2) * cos(K*pi/6).
04976    */
04977   wsptr = workspace;
04978   for (ctr = 0; ctr < 6; ctr++) {
04979     outptr = output_buf[ctr] + output_col;
04980 
04981     /* Even part */
04982 
04983     /* Add fudge factor here for final descale. */
04984     tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
04985     tmp0 <<= CONST_BITS;
04986     tmp2 = (INT32) wsptr[2];
04987     tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */
04988     tmp10 = tmp0 + tmp12;
04989     tmp2 = tmp0 - tmp12 - tmp12;
04990 
04991     /* Odd part */
04992 
04993     tmp12 = (INT32) wsptr[1];
04994     tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */
04995 
04996     /* Final output stage */
04997 
04998     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
04999                           CONST_BITS+PASS1_BITS+3)
05000                 & RANGE_MASK];
05001     outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
05002                           CONST_BITS+PASS1_BITS+3)
05003                 & RANGE_MASK];
05004     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp2,
05005                           CONST_BITS+PASS1_BITS+3)
05006                 & RANGE_MASK];
05007 
05008     wsptr += 3;     /* advance pointer to next row */
05009   }
05010 }
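
/*
 * Illustration only, not part of the IJG sources.  Stripped of scaling,
 * rounding and range limiting, the 3-point kernel of pass 2 above
 * reduces to the floating-point sketch below, with
 * cK = sqrt(2) * cos(K*pi/6) written out as literals.  The names are
 * hypothetical.
 */

```c
/* c1 and c2 of the 3-point kernel. */
static const double SK_C1 = 1.224744871;
static const double SK_C2 = 0.707106781;

/* in[0..2] are the DC, first and second AC terms of one row. */
static void sketch_idct3(const double in[3], double out[3])
{
  double even = in[0] + SK_C2 * in[2];   /* shared by out[0] and out[2] */
  double odd  = SK_C1 * in[1];

  out[0] = even + odd;
  out[2] = even - odd;
  out[1] = in[0] - 2.0 * SK_C2 * in[2];  /* tmp0 - tmp12 - tmp12 above */
}
```

/*
 * A row holding only a DC term yields three equal outputs, and the
 * middle output never depends on in[1].
 */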
05011 
05012 
05013 /*
05014  * Perform dequantization and inverse DCT on one block of coefficients,
05015  * producing a 2x4 output block.
05016  *
05017  * 4-point IDCT in pass 1 (columns), 2-point in pass 2 (rows).
05018  */
05019 
05020 GLOBAL(void)
05021 jpeg_idct_2x4 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
05022            JCOEFPTR coef_block,
05023            JSAMPARRAY output_buf, JDIMENSION output_col)
05024 {
05025   INT32 tmp0, tmp2, tmp10, tmp12;
05026   INT32 z1, z2, z3;
05027   JCOEFPTR inptr;
05028   ISLOW_MULT_TYPE * quantptr;
05029   INT32 * wsptr;
05030   JSAMPROW outptr;
05031   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
05032   int ctr;
05033   INT32 workspace[2*4]; /* buffers data between passes */
05034   SHIFT_TEMPS
05035 
05036   /* Pass 1: process columns from input, store into work array.
05037    * 4-point IDCT kernel,
05038    * cK represents sqrt(2) * cos(K*pi/16) [refers to 8-point IDCT].
05039    */
05040   inptr = coef_block;
05041   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
05042   wsptr = workspace;
05043   for (ctr = 0; ctr < 2; ctr++, inptr++, quantptr++, wsptr++) {
05044     /* Even part */
05045 
05046     tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
05047     tmp2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
05048 
05049     tmp10 = (tmp0 + tmp2) << CONST_BITS;
05050     tmp12 = (tmp0 - tmp2) << CONST_BITS;
05051 
05052     /* Odd part */
05053     /* Same rotation as in the even part of the 8x8 LL&M IDCT */
05054 
05055     z2 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
05056     z3 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
05057 
05058     z1 = MULTIPLY(z2 + z3, FIX_0_541196100);   /* c6 */
05059     tmp0 = z1 + MULTIPLY(z2, FIX_0_765366865); /* c2-c6 */
05060     tmp2 = z1 - MULTIPLY(z3, FIX_1_847759065); /* c2+c6 */
05061 
05062     /* Final output stage */
05063 
05064     wsptr[2*0] = tmp10 + tmp0;
05065     wsptr[2*3] = tmp10 - tmp0;
05066     wsptr[2*1] = tmp12 + tmp2;
05067     wsptr[2*2] = tmp12 - tmp2;
05068   }
05069 
05070   /* Pass 2: process 4 rows from work array, store into output array. */
05071 
05072   wsptr = workspace;
05073   for (ctr = 0; ctr < 4; ctr++) {
05074     outptr = output_buf[ctr] + output_col;
05075 
05076     /* Even part */
05077 
05078     /* Add fudge factor here for final descale. */
05079     tmp10 = wsptr[0] + (ONE << (CONST_BITS+2));
05080 
05081     /* Odd part */
05082 
05083     tmp0 = wsptr[1];
05084 
05085     /* Final output stage */
05086 
05087     outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS+3)
05088                 & RANGE_MASK];
05089     outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS+3)
05090                 & RANGE_MASK];
05091 
05092     wsptr += 2;     /* advance pointer to next row */
05093   }
05094 }
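
/*
 * Illustration only, not part of the IJG sources.  The odd part of the
 * 4-point kernel above computes a two-input rotation with three
 * multiplications instead of four by sharing the product (z2 + z3) * c6.
 * The sketch below places the three-multiply form next to a direct
 * four-multiply form; the K_ names are hypothetical and the integer
 * values assume constants scaled by 2**13.
 */

```c
/* Scaled constants (value * 2**13, rounded). */
#define K_C6        4433L    /* 0.541196100 */
#define K_C2_SUB_C6 6270L    /* 0.765366865 */
#define K_C2_ADD_C6 15137L   /* 1.847759065 */

/* Three-multiply form, following the listing above. */
static void sketch_rotate3(long z2, long z3, long *t0, long *t2)
{
  long z1 = (z2 + z3) * K_C6;

  *t0 = z1 + z2 * K_C2_SUB_C6;
  *t2 = z1 - z3 * K_C2_ADD_C6;
}

/* Four-multiply form of the same rotation, for comparison.  The combined
 * constants are derived from the three above so that both forms agree
 * exactly in integer arithmetic. */
static void sketch_rotate4(long z2, long z3, long *t0, long *t2)
{
  *t0 = z2 * (K_C2_SUB_C6 + K_C6) + z3 * K_C6;
  *t2 = z2 * K_C6 - z3 * (K_C2_ADD_C6 - K_C6);
}
```

/*
 * Both functions return identical results for any inputs that do not
 * overflow; the saving is one multiplication per rotation.
 */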
05095 
05096 
05097 /*
05098  * Perform dequantization and inverse DCT on one block of coefficients,
05099  * producing a 1x2 output block.
05100  *
05101  * 2-point IDCT in pass 1 (columns), 1-point in pass 2 (rows).
05102  */
05103 
05104 GLOBAL(void)
05105 jpeg_idct_1x2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
05106            JCOEFPTR coef_block,
05107            JSAMPARRAY output_buf, JDIMENSION output_col)
05108 {
05109   INT32 tmp0, tmp10;
05110   ISLOW_MULT_TYPE * quantptr;
05111   JSAMPLE *range_limit = IDCT_range_limit(cinfo);
05112   SHIFT_TEMPS
05113 
05114   /* Process 1 column from input, store into output array. */
05115 
05116   quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
05117 
05118   /* Even part */
05119     
05120   tmp10 = DEQUANTIZE(coef_block[DCTSIZE*0], quantptr[DCTSIZE*0]);
05121   /* Add fudge factor here for final descale. */
05122   tmp10 += ONE << 2;
05123 
05124   /* Odd part */
05125 
05126   tmp0 = DEQUANTIZE(coef_block[DCTSIZE*1], quantptr[DCTSIZE*1]);
05127 
05128   /* Final output stage */
05129 
05130   output_buf[0][output_col] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0, 3)
05131                       & RANGE_MASK];
05132   output_buf[1][output_col] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0, 3)
05133                       & RANGE_MASK];
05134 }
05135 
05136 #endif /* IDCT_SCALING_SUPPORTED */
05137 #endif /* DCT_ISLOW_SUPPORTED */
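
/*
 * Illustration only, not part of the IJG sources.  jpeg_idct_1x2 is
 * small enough to restate without the library types.  The sketch below
 * assumes 8-bit samples and assumes that the range_limit[] lookup acts
 * as "add 128, then clamp to 0..255"; it takes already dequantized
 * inputs and leaves out the "& RANGE_MASK" index wrap.  All names are
 * hypothetical.
 */

```c
/* Stand-in for the range_limit[] lookup: re-center around 128 and clamp
 * to the 8-bit sample range (assumed behaviour). */
static unsigned char sketch_limit(long v)
{
  v += 128;
  if (v < 0) return 0;
  if (v > 255) return 255;
  return (unsigned char) v;
}

/* Floor division by 8, written without >> on negative values. */
static long sketch_div8_floor(long v)
{
  return (v >= 0) ? v / 8 : -((-v + 7) / 8);
}

/* 2-point inverse transform of the dequantized coefficients dc and ac:
 * out[0] = (dc + ac) / 8 and out[1] = (dc - ac) / 8, rounded to nearest. */
static void sketch_idct_1x2(long dc, long ac, unsigned char out[2])
{
  long t = dc + 4;   /* fudge factor: half of the divisor 8 */

  out[0] = sketch_limit(sketch_div8_floor(t + ac));
  out[1] = sketch_limit(sketch_div8_floor(t - ac));
}
```

/*
 * With dc = 80 and ac = 16 the two samples are 140 and 136; large
 * positive or negative inputs saturate at 255 or 0.
 */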

Generated on Mon May 28 2012 04:19:13 for ReactOS by doxygen 1.7.6.1
