Top: Computers: Hardware:
» Hardware Central - Community and forum providing news, reviews, reports, and editorials on computer hardware.
» VLSI Discussion Forum - Covers many related topics: EDA tools, VHDL, Verilog; fabrication; FPGAs, ASICs, microprocessors, semiconductors, CMOS.
- Re: Renamer Port Reduction
True. You have to trade off the increased leakage due to the decreased
space efficiency (which Mitch mentioned in another post). You leak (most)
all the time, while clock gating often means you only pay dynamic power as
needed...
Ned
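The trade-off Ned describes can be put into a rough energy model. This is only a sketch under simple assumptions: leakage is charged every cycle, clock gating is ideal so dynamic energy is charged only on cycles with an access, and the function and parameter names are hypothetical, not from the thread.

```python
def total_energy(leak_per_cycle: float, dyn_per_access: float,
                 cycles: int, accesses: int) -> float:
    """Energy over an interval: leakage every cycle, dynamic only on accesses.

    Assumes ideal clock gating (no dynamic cost on idle cycles).
    """
    return leak_per_cycle * cycles + dyn_per_access * accesses


def breakeven_activity(leak_a: float, dyn_a: float,
                       leak_b: float, dyn_b: float) -> float:
    """Accesses per cycle at which designs A and B cost the same energy.

    Derived from leak_a + dyn_a * x == leak_b + dyn_b * x.
    """
    if dyn_a == dyn_b:
        raise ValueError("equal dynamic cost: leakage alone decides")
    return (leak_b - leak_a) / (dyn_a - dyn_b)
```

With made-up units, a design B that leaks twice as much as A but costs less per access only wins once the structure is busy more often than the break-even activity; below that, the always-on leakage dominates.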
- Re: Renamer Port Reduction
I read comments like this in discussions like this and I wonder: why
don't we program the way that computer architects think?
I'm sure that, to understand the real answer to a question like that,
I'd have to go through the experience of trying to be an architect
myself. There isn't enough time left, and I probably never had the
- Re: Renamer Port Reduction
The prior discussion was about whether multiple architected
register files were good for ILP, and how the wires (and wiring
density) between the register file and the calculation units were
dense and difficult.
I brought up the point that the renamer, and its ports on the path
towards the instruction queues, is made worse by multiple register
- Re: Where is Bulldozer's renamer?
Thanks, David (or should I say, thanks RWT.com). You're allowed to toot
your own horn, in moderation. RWT.com is one of the best tech sites.
By the way, I encourage you to clip quotes, rather than including all of
my very long posts.
Q: where did you get this information about the renamer position? I did
- Re: Renamer Port Reduction
Yep.
I originally thought that you needed a trace cache to fetch more than
one basic block per cycle.
With BTB unrolling - trace BTBs, rather than a trace instruction cache
- and/or the Jourdan/Seznec N-ahead BTB, you don't need that. You can
fetch multiple discontiguous basic blocks out of a multiported (e.g.
- Re: Renamer Port Reduction
Exactly.
You can choose how complete or incomplete you want to make the
comparator network.
E.g. if you have 2 inputs per operand, you might compare the first input
to all preceding outputs, but use a sparse network for the second.
(This is, by the way, the sort of trick that is played with incomplete
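A minimal sketch of the complete-versus-sparse comparator idea from the post above. The assumptions are mine, not the poster's: each instruction is a (dest, src1, src2) tuple of architectural register numbers, the sparse network for the second input covers only the nearest `window2` preceding outputs, and a dependence the sparse network cannot see is handled by cutting the rename group short. All names are hypothetical.

```python
from typing import List, Tuple

# (dest, src1, src2) as architectural register numbers.
Instr = Tuple[int, int, int]


def comparator_count(width: int, window2: int) -> int:
    """Comparators needed for a rename group of `width` instructions.

    src1 is compared against every preceding dest (complete network);
    src2 only against the nearest `window2` preceding dests (sparse).
    """
    full = sum(i for i in range(width))
    sparse = sum(min(i, window2) for i in range(width))
    return full + sparse


def group_length(group: List[Instr], window2: int) -> int:
    """Number of leading instructions that can be renamed together.

    The group is cut before the first instruction whose src2 was most
    recently written by a dest outside the sparse network's reach.
    """
    for i, (_, _, src2) in enumerate(group):
        # Walk backwards to find the most recent in-group writer of src2.
        for j in range(i - 1, -1, -1):
            if group[j][0] == src2:
                if i - j > window2:
                    return i  # dependence not covered: split here
                break
    return len(group)
```

For a 4-wide group, a full network on both inputs needs 12 comparators, while limiting the second input to a window of 1 needs 9. The cost is that some groups get cut short when a second-input dependence falls outside the window.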
- Re: Renamer Port Reduction
I think if we ever see trace cache again, it will have to be justified using
these sorts of techniques to reduce power.
Ned
- Re: Where is Bulldozer's renamer?
If I might toot my own horn here:
[link]
Retirement tracking is handled per core. Renaming for integer and
memory ops is done in each core. Renaming for FP/SIMD is done within
the shared FP unit.
David
- Re: What will Microsoft use its ARM license for?
I'm wondering if it might not be a win to have both:
I.e. predecode bits in the I$ save power because you can skip
the power-hungry parallel decoder when you have a hit.
This presupposes that you can easily separate out and power off that
part of the decoder which figures out instruction boundaries for virgin
- Re: Renamer Port Reduction
[snipped lots of interesting stuff]
Ouch!
That sounds like hw which will stall (at least some of the time) on
carefully scheduled (Pentium-style) code, with interleaved instruction
streams and maximum distance between load and use, right?
OTOH it will run naive compiler-generated code, with lots and lots of
- Re: What will Microsoft use its ARM license for?
I think Brad Burgess said in his Hot Chips presentation that
AMD Bobcat did not bother with predecode bits - they were a power
waster. As in, they may save power when they can be used, but they
waste power when you have to recycle instructions that do not have the
predecode bits set. I.e. when missing the caches that hold the
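The "save power when usable, waste power otherwise" argument can be sketched as an expected-energy comparison. This is an illustrative model only: the parameter names and the notion of a single "recycle" penalty are assumptions, not figures from the presentation mentioned above.

```python
def energy_with_predecode(hit_rate: float, e_fast: float,
                          e_full: float, e_recycle: float) -> float:
    """Expected decode energy per instruction when predecode bits exist.

    hit_rate  : fraction of fetches that find valid predecode bits
    e_fast    : decode energy when the bits can be used
    e_full    : energy of the full parallel decode
    e_recycle : extra energy to recycle an instruction lacking the bits
    """
    return hit_rate * e_fast + (1.0 - hit_rate) * (e_full + e_recycle)


def breakeven_hit_rate(e_fast: float, e_full: float,
                       e_recycle: float) -> float:
    """Hit rate above which predecode bits beat always doing a full decode.

    Solves energy_with_predecode(h, ...) == e_full for h.
    """
    return e_recycle / (e_full - e_fast + e_recycle)
```

Under this model, predecode bits only pay off when the hit rate exceeds the break-even value; a large recycle penalty or a small saving on hits pushes that threshold toward 1.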
- Where is Bulldozer's renamer?
By now, most of you will be familiar with Bulldozer and/or the
multicluster multithreading concept:
Shared front end
* branch prediction
* instruction cache
* decode
Separate clusters of the tight loop (AMD calls these cores)
* scheduler
* execution units
* L1 data cache
Shared
* L2 cache
* and in Bulldozer's case, floating point.
- Renamer Port Reduction
I tried to find Mitch's post to reply to, but my newsreader,
Thunderbird, is not cooperative. So I'll have to swag it:
Mitch Alsup a little while back said something like "It isn't the ports
on the register file that are a problem, it's the ports on the renamer."
With, IIRC, a comment that you had to squeeze the renamer into a