Baixe o app para aproveitar ainda mais
Prévia do material em texto
1 Ch6.a-11998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline Ch6.a-21998 Morgan Kaufmann PublishersPaulo C. Centoducatte Lavanderia – analogia com o pipelining Time 76 PM 8 9 10 11 12 1 2 AM A B C D Task� order Time 76 PM 8 9 10 11 12 1 2 AM A B C D Task� order 2 Ch6.a-31998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline de instruções no MIPS • Fetch da instrução • Leitura dos registradores e decodificação • Execução da operação ou cálculo de endereço • Acesso ao operando na memória • Escrita do resultado em um registrador Ch6.a-41998 Morgan Kaufmann PublishersPaulo C. Centoducatte Exemplo • Compare o tempo médio entre instruções da implementação em single-cycle (uma instrução por ciclo) com uma implementação com pipeline. Supor maior tempo de operação para acesso à memória = 2ns, operação da ULA = 2ns e acesso ao register file = 1ns. (Instrs lw, sw, add, sub, and, or slt e beq). • Inicialmente suponha a execução de 3 instruções lw (tempo entre o início da 1ª instrução e o início da 4ª instrução) 3 Ch6.a-51998 Morgan Kaufmann PublishersPaulo C. Centoducatte Tempo total para as oito instruções calculado a partir do tempo de cada componente Ch6.a-61998 Morgan Kaufmann PublishersPaulo C. Centoducatte Execução não-pipeline X pipeline Instruction fetch Reg ALU Data access Reg 8 ns Instruction fetch Reg ALU Data access Reg 8 ns Instruction fetch 8 ns Time lw $1, 100($0) lw $2, 200($0) lw $3, 300($0) 2 4 6 8 10 12 14 16 18 2 4 6 8 10 12 14 ... Program execution order (in instructions) Instruction fetch Reg ALU Data access Reg Time lw $1, 100($0) lw $2, 200($0) lw $3, 300($0) 2 ns Instruction fetch Reg ALU Data access Reg 2 ns Instruction fetch Reg ALU Data access Reg 2 ns 2 ns 2 ns 2 ns 2 ns Program execution order (in instructions) 4 Ch6.a-71998 Morgan Kaufmann PublishersPaulo C. Centoducatte OBS.: • Sob condições ideais, com estágios balanceados, o speedup do pipeline é igual ao número de estágios do pipeline ( 5 estágios , 5 vezes mais rápido) • Na realidade o tempo de execução de uma instrução é um pouco superior (overheads) � speedup é menor que o número de estágios do pipeline Suponha a execução de 1003 instruções com pipeline ���� 1000 X 2ns + 14 = 2014 (para cada instrução adiciono 2ns) sem pipeline ���� 1000 X 8ns + 24 = 8024 spedup = 8024 / 2014 = 3.98 ~~ 8 / 2 Desempenho do pipeline é devido ao aumento do throughput. Ch6.a-81998 Morgan Kaufmann PublishersPaulo C. Centoducatte Projeto de um conjunto de instruções para pipeline • O que torna a implementação mais fácil – Instruções de mesmo tamanho – Poucos formatos, com campos de registradores sempre dispostos no mesmo lugar (Simetria, no 2º estágio podemos ler registradores e decodificar ao mesmo tempo). – Acesso à memória apenas com as instruções lw e sw. – Operandos alinhados na memória: o dado pode ser transferido da memória para a CPU e CPU para a memória em um único estágio do pipeline. 5 Ch6.a-91998 Morgan Kaufmann PublishersPaulo C. Centoducatte Projeto de um conjunto de instruções para pipeline • O que torna a implementação mais dificil – Hazard • Hazard Estrural • Hazard de Controle • Hazard de Dados Ch6.a-101998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline Hazards • Hazard Estrutural – O hardware não suporta uma combinação de instruções que queremos executar em um único período de clock • Ex.: escrever e ler da memória em um mesmo ciclo 6 Ch6.a-111998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline Hazards • Hazard de Controle – Problemas devido à execução de instruções de desvio • Ex.: Quando um branch é tomado, como tratar a(s) instruções que seguem (fisicamente) o branch no programa e que já estão no pipeline Ch6.a-121998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelining stalling para instruções branch Instruction fetch Reg ALU Data access Reg Time beq $1, $2, 40 add $4, $5, $6 lw $3, 300($0) 4 ns Instruction fetch Reg ALU Data access Reg 2ns Instruction fetch Reg ALU Data access Reg 2ns 2 4 6 8 10 12 14 16 Program execution order (in instructions) 7 Ch6.a-131998 Morgan Kaufmann PublishersPaulo C. Centoducatte Branch prediction: Tentar “adivinhar” qual dos caminhos do branch será tomado Instruction fetch Reg ALU Data access Reg Time beq $1, $2, 40 add $4, $5, $6 lw $3, 300($0) Instruction fetch Reg ALU Data access Reg 2 ns Instruction fetch Reg ALU Data access Reg 2 ns Program execution order (in instructions) Instruction fetch Reg ALU Data access Reg Time beq $1, $2, 40 add $4, $5 ,$6 or $7, $8, $9 Instruction fetch Reg ALU Data access Reg 2 4 6 8 10 12 14 2 4 6 8 10 12 14 Instruction fetch Reg ALU Data access Reg 2 ns 4 ns bubble bubble bubble bubble bubble Program execution order (in instructions) O branch não será tomado Ch6.a-141998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline delayed branch Instruction fetch Reg ALU Data access Reg Time beq $1, $2, 40 add $4, $5, $6 lw $3, 300($0) Instruction fetch Reg ALU Data access Reg 2 ns Instruction fetch Reg ALU Data access Reg 2 ns 2 4 6 8 10 12 14 2 ns (Delayed branch slot) Program execution order (in instructions) 8 Ch6.a-151998 Morgan Kaufmann PublishersPaulo C. Centoducatte Hazard de Dados • Quando uma instrução necessita de um dado que ainda não foi calculado – Ex.: add $s0,$t0,$t1 sub $t2,$s0,$t3 Soluções : Compilador (programador) gera código livre de data hazard (introduzindo, por ex., instruções nop no código; alterando a ordem das instruções; ...) Stall; Forwarding ou bypassing Ch6.a-161998 Morgan Kaufmann PublishersPaulo C. Centoducatte Hazard de Dados Time 2 4 6 8 10 add $s0, $t0, $t1 IF ID WBEX MEM add $s0, $t0, $t1 sub $t2, $s0, $t3 Program execution order (in instructions) IF ID WBEX IF ID MEMEX Time 2 4 6 8 10 MEM WBMEM 9 Ch6.a-171998 Morgan Kaufmann PublishersPaulo C. Centoducatte Hazard de Dados Time 2 4 6 8 10 12 14 lw $s0, 20($t1) sub $t2, $s0, $t3 Program execution order (in instructions) IF ID WBMEMEX IF ID WBMEMEX bubble bubble bubble bubble bubble Stall Ch6.a-181998 Morgan Kaufmann PublishersPaulo C. Centoducatte Exemplo • Encontre o hazard no código abaixo e resolva-o: # $t1 tem o end. de v[k] lw $t0, 0($t1) # $t0 = v[k] lw $t2,4($t1) # $t2 = v[k+1] sw $t2, 0($t1) # v[k] = $t2 sw $t0, 4($t1) # v[k+1] = $t0 Solução: # $t1 tem o end. de v[k] lw $t0, 0($t1) # $t0 = v[k] lw $t2,4($t1) # $t2 = v[k+1] sw $t0, 4($t1) # v[k+1] = $t0 sw $t2, 0($t1) # v[k] = $t2 10 Ch6.a-191998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline: Idéia Básica • 5 estágios: Fetch; Decodificação e leitura dos regs; execução ou cálculo de end. ; acesso à memória; escrita no reg. destino Instruction memory Address 4 32 0 Add Addresult Shift left 2 Instruction M u x 0 1 Add PC 0 Write data M u x 1 Registers Read data 1 Read data 2 Read register 1 Read register 2 16 Sign extend Write register Write data Read dataAddress Data memory 1 ALU result M u x ALU Zero IF: Instruction fetch ID: Instruction decode/ register file read EX: Execute/ address calculation MEM: Memory access WB: Write back O que é necessário para tornar cada divisão em estágios? Ch6.a-201998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruções sendo executadas pelo datapath IM Reg DM RegALU IM Reg DM RegALU CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 Time (in clock cycles) lw $2, 200($0) lw $3, 300($0) Program execution order (in instructions) lw $1, 100($0) IM Reg DM RegALU 11 Ch6.a-211998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction memory Address 4 32 0 Add Add result Shift left 2 In s t r u c tio n IF/ID EX/MEM MEM/WB M u x 0 1 Add PC 0 Write data M u x 1 Registers Read data 1 Read data 2 Read register 1 Read register 2 16 Sign extend Writeregister Write data Read data 1 ALU result M u x ALU Zero ID/EX Data memory Address Pode acontecer algum problema nesta solução se não existir dependência de dados? A execução de qual instrução causa o problema? Ch6.a-221998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction� memory Address 4 32 0 Add Add� result Shift� left 2 In s tr u ct io n IF/ID EX/MEM MEM/WB M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� register Write� data Read� data 1 ALU� result M� u� x ALU Zero ID/EX Instruction fetch lw Address Data� memory 12 Ch6.a-231998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction� memory Address 4 32 0 Add Add� result Shift� left 2 In s tr u ct io n IF/ID EX/MEM M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� register Write� data Read� data 1 ALU� result M� u� x ALU Zero ID/EX MEM/WB Instruction decode lw Address Data� memory Ch6.a-241998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction memory Address 4 32 0 Add Add result Shift left 2 In s tr u ct io n IF/ID EX/MEM M u x 0 1 Add PC 0 Write data M u x 1 Registers Read data 1 Read data 2 Read register 1 Read register 2 16 Sign extend Write register Write data Read data 1 ALU result M u x ALU Zero ID/EX MEM/WB Execution lw Address D ata memory 13 Ch6.a-251998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction� memory Address 4 32 0 Add Add� result Shift� left 2 In st ru c tio n IF/ID EX/MEM M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� register Write� data Read� data Data� memory 1 ALU� result M� u� x ALU Zero ID/EX MEM/WB Memory lw Address Ch6.a-261998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction� memory Address 4 32 0 Add Add� result Shift� left 2 In s tr u ct io n IF/ID EX/MEM M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� data Read� dataData� memory 1 ALU� result M� u� x ALU Zero ID/EX MEM/WB Write back lw Write� register Address 14 Ch6.a-271998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction memory Address 4 32 0 Add Add result Shift left 2 In st ru c tio n IF/ID EX/MEM M u x 0 1 Add PC 0 Write data M u x 1 Registers Read data 1 Read data 2 Read register 1 Read register 2 16 Sign extend Write register Write data Read data Data memory 1 ALU result M u x ALU Zero ID/EX MEM/WB Execution sw Address Ch6.a-281998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction� memory Address 4 32 0 Add Add� result Shift� left 2 In st ru c tio n IF/ID EX/MEM M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� register Write� data Read� data Data� memory 1 ALU� result M� u� x ALU Zero ID/EX MEM/WB Memory sw Address 15 Ch6.a-291998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipelined Datapath Instruction� memory Address 4 32 0 Add Add� result Shift� left 2 In s tr u ct io n IF/ID EX/MEM M� u� x 0 1 Add PC 0 Address Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� register Write� data Read� data Data� memory 1 ALU� result M� u� x ALU Zero ID/EX MEM/WB Write back sw Ch6.a-301998 Morgan Kaufmann PublishersPaulo C. Centoducatte Datapath Correto Instruction� memory Address 4 32 0 Add Add� result Shift� left 2 In s tr u ct io n IF/ID EX/MEM MEM/WB M� u� x 0 1 Add PC 0 Address Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� register Write� data Read� data Data� memory 1 ALU� result M� u� x ALU Zero ID/EX Problema na execução da instrução load. Qual é o erro? 16 Ch6.a-311998 Morgan Kaufmann PublishersPaulo C. Centoducatte Datapath com os estágios usados para um instrução lw Instruction memory Address 4 32 0 Add Add result Shift left 2 In s tr u c t io n IF/ID EX/MEM MEM/WB M u x 0 1 Add PC 0 Write data M u x 1 Registers Read data 1 Read data 2 Read register 1 Read register 2 16 Sign extend Write register Write data Read data 1 ALU result M u x ALU Zero ID/EX Address Data memory Ch6.a-321998 Morgan Kaufmann PublishersPaulo C. Centoducatte Representações Gráfica do Pipeline IM R eg D M R e g IM R e g D M R eg C C 1 C C 2 C C 3 C C 4 C C 5 C C 6 T im e (in c lo c k c y cles ) lw $1 0 , 2 0 ( $1 ) P rog ra m e x e c u t io n o rd e r ( in in s tr uc tio n s ) s ub $ 11 , $ 2 , $3 A L U A L U Ajuda a responder perguntas como: Quantos ciclos são gasto para executar este código? O que a ALU está fazendo durante o ciclo 10? Ajuda a entender os datapaths 17 Ch6.a-331998 Morgan Kaufmann PublishersPaulo C. Centoducatte Representações Gráfica do Pipeline Program execution order (in instructions) Time ( in clock cycles) CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 Instruction fetch Instruction decode Instruction fetch Instruction decode Execution Write back Execution Data access Data access Write back lw $10, $20($1) sub $11, $2, $3 Ch6.a-341998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address 4 32 0 Add Add�result Shift� left 2 In st ru ct io n IF/ID EX/MEM MEM/WB M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� register Write� data Read� data 1 ALU� result M� u� x ALU Zero ID/EX Instruction fetch lw $10, 20($1) Address Data� memory Clock 1 18 Ch6.a-351998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address 4 32 0 Add Add� result Shift� left 2 In s tr u ct io n IF/ID EX/MEM MEM/WB M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend Write� register Write� data Read� data 1 ALU� result M� u� x ALU Zero ID/EX Instruction decode lw $10, 20($1) Instruction fetch sub $11, $2, $3 Address Data� memory Clock 2 Ch6.a-361998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address 4 0 Add Add� result Shift� left 2 In st ru ct io n IF/ID EX/MEM MEM/WB M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 Write� register Write� data Read� data 1 ALU� result M� u� x ALU Zero ID/EX Execution lw $10, 20($1) Instruction decode sub $11, $2, $3 3216 Sign� extend Address Data� memory Clock 3 19 Ch6.a-371998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address 4 0 Add Add� result Shift� left 2 In s tr u ct io n IF/ID EX/MEM MEM/WB M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 3216 Sign� extend Write� register Write� data Memory lw $10, 20($1) Read� data 1 ALU� result M� u� x ALU Zero ID/EX Execution sub $11, $2, $3 Data� memory Address Clock 4 Ch6.a-381998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address 4 32 0 Add Add� result 1 ALU� result Zero Shift� left 2 In st ru ct io n IF/ID EX/MEMID/EX MEM/WB Write back M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read�data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend M� u� x ALU Read� data Write� register Write� data lw $10, 20($1) Memory sub $11, $2, $3 Address Data� memory Clock 5 20 Ch6.a-391998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address 4 32 0 Add Add� result 1 ALU� result Zero Shift� left 2 In s tr u ct io n IF/ID EX/MEMID/EX MEM/WB Write backM�u� x 0 1 Add PC 0 Write� data M� u� x 1 Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 16 Sign� extend M� u� x ALU Read� data Write� register Write� data sub $11, $2, $3 Address Data� memory Clock 6 Ch6.a-401998 Morgan Kaufmann PublishersPaulo C. Centoducatte Controle do Pipeline PC Instruction memory Address In st ru ct io n Instruction [20– 16] MemtoReg ALUOp Branch RegDst ALUSrc 4 16 32 Instruction [15– 0] 0 0 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 Sign extend M u x 1 Write data Read data M u x 1 ALU control RegWrite MemRead Instruction [15– 11] 6 IF/ID ID/EX EX/MEM MEM/WB MemWrite Address Data memory PCSrc Zero Add Add result Shift left 2 ALU result ALU Zero Add 0 1 M u x 0 1 M u x 21 Ch6.a-411998 Morgan Kaufmann PublishersPaulo C. Centoducatte Controle do Pipeline • 5 estágios. O que deve ser controlado em cada estágio? – 10: Fetch da instrução e incremento do PC – 20: Decodificação da instrução e Fetch dos registradores – 30: Execução – 40: Acesso à memória – 50: Write back Ch6.a-421998 Morgan Kaufmann PublishersPaulo C. Centoducatte Sinais de controle 22 Ch6.a-431998 Morgan Kaufmann PublishersPaulo C. Centoducatte Sinais de controle Ch6.a-441998 Morgan Kaufmann PublishersPaulo C. Centoducatte Sinais de controle Execution/Address Calculation stage control lines Memory access stage control lines Write-back stage control lines Instruction Reg Dst ALU Op1 ALU Op0 ALU Src Branch Mem Read Mem Write Reg write Mem to Reg R-format 1 1 0 0 0 0 0 1 0 lw 0 0 0 1 0 1 0 1 1 sw X 0 0 1 0 0 1 0 X beq X 0 1 0 1 0 0 0 X 23 Ch6.a-451998 Morgan Kaufmann PublishersPaulo C. Centoducatte Control EX M WB M WB WB IF/ID ID/EX EX/MEM MEM/WB Instruction Ch6.a-461998 Morgan Kaufmann PublishersPaulo C. Centoducatte Datapath com Controle PC Instruc tion memory In s tr u c ti o n Add Instruction [20– 16] M e m t o R e g ALU Op Branch RegD st ALUSrc 4 16 32Instruction [15– 0] 0 0 M u x 0 1 Add Add result R egisters W rite register W rite data Read data 1 Read data 2 Read register 1 Read register 2 Sign extend M u x 1 ALU result Zero Write data Read data M u x 1 ALU control Shift left 2 R e gW r i te M emRead Control ALU Instruction [15– 11] 6 EX M W B M WB WB IF/ID PCSrc ID/EX EX/MEM MEM/WB M u x 0 1 M e m W ri te Address Data memory Address 24 Ch6.a-471998 Morgan Kaufmann PublishersPaulo C. Centoducatte lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 Instruction� memory Instruction� [20–16] M e m to R e g ALUOp Branch RegDst ALUSrc 4 Instruction� [15– 0] 0 M� u� x 0 1 Add Add� result Registers Write� register Write� data Read� data 1 Read� data 2 Read� register 1 Read� register 2 Sign� extend M� u� x 1 ALU� result Zero ALU� control Shift� left 2 R e g W ri te MemRead Control ALU Instruction� [15–11] EX M WB M WB WB In s tr u ct io n IF/ID EX/MEMID/EX ID: before<1> EX: before<2> MEM: before<3> WB: before<4> MEM/WB IF: lw $10, 20($1) 000 00 0000 000 00 00 0 0 00 0 0 0 0 0 M� u� x 0 1 Add PC 0 Data� memory Address Write� data Read� data M� u� x 1 M e m W ri te Address Clock 1 1o ciclo Ch6.a-481998 Morgan Kaufmann PublishersPaulo C. Centoducatte lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 WB EX M Instruction� memory M e m to R e g ALUOp Branch RegDst ALUSrc 4 0 M� u� x 0 1 Add Add�result Write� register Write� data M� u� x 1 ALU� result Zero ALU� control Shift� left 2 R e g W ri te ALU M WB WB In s tr u ct io n IF/ID EX/MEMID/EX ID: lw $10, 20($1) EX: before<1> MEM: before<2> WB: before<3> MEM/WB IF: sub $11, $2, $3 010 11 0001 000 00 00 0 0 00 0 0 0 0 0 M� u� x 0 1 Add PC 0 Write� data Read� data M� u� x 1 lw Control Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 X 10 20 X 1 Instruction� [20– 16] Instruction� [15– 0] Sign� extend Instruction� [15– 11] 20 $X $1 10 X MemRead M e m W ri te Data� memory Address Address Clock 2 2o ciclo 25 Ch6.a-491998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address Instruction� [20– 16] M e m to R e g Branch ALUSrc 4 Instruction� [15– 0] 0 1 Add Add� result Registers Write� register Write� data Read� data 1 Read� data 2 Read� register 1 Read� register 2 ALU� result Shift� left 2 R e g W ri te MemRead Control ALU Instruction� [15– 11] EX M WB WB In s tr u ct io n IF/ID EX/MEMID/EX ID: sub $11, $2, $3 EX: lw $10, . . . MEM: before<1> WB: before<2> MEM/WB IF: and $12, $4, $5 000 10 1100 010 11 00 0 1 00 0 0 0 0 0 M� u� x 0 1 Add PC 0 Write� data Read� data M� u� x 1 M e m W ri te sub 11 X X 3 2 X $3 $2 X 11 $1 20 10 M� u� x 0 M� u� x 1 ALUOp RegDst ALU� control M WB Zero Sign� extend Data� memory Address Clock 3 lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 3o ciclo Ch6.a-501998 Morgan Kaufmann PublishersPaulo C. Centoducatte WB EX M Instruction� memory Address M e m to R e g ALUOp Branch RegDst ALUSrc 4 0 0 1 Add Add�result Write� register Write� data 1 ALU� result ALU� control Shift� left 2 R e g W ri te M WB In st ru ct io n IF/ID EX/MEMID/EX ID: and $12, $2, $3 EX: sub $11, . . . MEM: lw $10, . . . WB: before<1> MEM/WB IF: or $13, $6, $7 000 10 1100 000 10 10 1 0 11 1 0 0 0 0 M� u� x 0 1 Add PC 0 Write� data M� u� x 1 and Control Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 12 X X 5 4 Instruction� [20– 16] Instruction� [15– 0] Instruction� [15– 11] X $5 $4 X 12 MemRead M e m W ri te $3 $2 11 M� u� x M� u� x ALU Address Read� data Data� memory 10 WB Zero Sign� extend Clock 4 lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 4o ciclo 26 Ch6.a-511998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address Instruction� [20– 16] Branch ALUSrc 4 Instruction� [15– 0] 0 1 Add Add� result Registers Write� register Write� data Read� data 1 Read� data 2 Read� register 1 Read� register 2 ALU� result Shift� left 2 R e g W ri te MemRead Control ALU Instruction� [15– 11] EX M WB In s tr u ct io n IF/ID EX/MEMID/EX ID: or $13, $6, $7 EX: and $12, . . . MEM: sub $11, . . . WB: lw $10, . . . MEM/WB IF: add $14, $8, $9 000 10 1100 000 10 10 1 0 10 0 0 0 M� u� x 0 1 Add PC 0 Write� data Read� data M� u� x 1 M e m W ri te or 13 X X 7 6 X $7 $6 X 13 $4 M� u� x 0 M� u� x 1 ALUOp RegDst ALU� control M WB 11 10 10 $5 12 WB M e m to R e g 1 1 Zero Data� memory Address Sign� extend Clock 5 lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 5o ciclo Ch6.a-521998 Morgan Kaufmann PublishersPaulo C. Centoducatte WB EX M Instruction� memory Address M e m to R e g ALUOp Branch RegDst ALUSrc 4 0 0 1 Add Add�result 1 ALU� result ALU� control Shift� left 2 R e g W ri te M WB In st ru ct io n IF/ID EX/MEMID/EX ID: add $14, $8, $9 EX: or $13, . . . MEM: and $12, . . . WB: sub $11, . . . MEM/WB IF:after<1> 000 10 1100 000 10 10 1 0 10 0 0 0 1 0 M� u� x 0 1 Add PC 0 Write� data M� u� x 1 add Control Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 14 X X 9 8 Instruction� [20– 16] Instruction� [15– 0] Instruction� [15– 11] X $9 $8 X 14 MemRead M e m W ri te $7 $6 13 M� u� x M� u� x ALU Read� data 12 WB 11 11 Write� register Write� data Zero Data� memory Address Sign� extend Clock 6 lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 6o ciclo 27 Ch6.a-531998 Morgan Kaufmann PublishersPaulo C. Centoducatte Instruction� memory Address Instruction� [20–16] Branch ALUSrc 4 Instruction� [15–0] 0 1 Add Add� result Registers Write� register Write� data ALU� result Shift� left 2 R e g W ri te MemRead Control ALU Instruction� [15–11] Sign� extend EX M WB In s tr u ct io n IF/ID EX/MEMID/EX ID: after<1> EX: add $14, . . . MEM: or $13, . . . WB: and $12, . . . MEM/WB IF: after<2> 000 00 0000 000 10 10 1 0 10 0 0 0 M� u� x 0 1 Add PC 0 Write� data Read� data M� u� x 1 ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . .IF: after<3> M e m W ri te $8 M� u� x 0 M� u� x 1 ALUOp RegDst ALU� control M WB 13 12 12 $9 14 WB M e m to R e g 1 0 Read� data 1 Read� data 2 Read� register 1 Read� register 2 Zero Data� memory Address Clock 7 lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 7o ciclo Ch6.a-541998 Morgan Kaufmann PublishersPaulo C. Centoducatte WB EX M Instruction� memory Address M e m to R e g ALUOp Branch RegDst ALUSrc 4 0 0 1 Add Add�result 1 ALU� result Zero ALU� control Shift� left 2 R e g W ri te M WB In st ru ct io n IF/ID EX/MEMID/EX ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . . MEM/WB IF: after<3> 000 00 0000 000 00 00 0 0 10 0 0 0 1 0 M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Control Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 Instruction� [20–16] Instruction� [15–0] Sign� extend Instruction� [15–11] MemRead M e m W ri te M� u� x M� u� x ALU Read� data 14 WB 13 13 Write� register Write� data Data� memory Address Clock 8 lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 8o ciclo 28 Ch6.a-551998 Morgan Kaufmann PublishersPaulo C. Centoducatte WB EX M Instruction� memory Address M e m to R e g ALUOp Branch RegDst ALUSrc 4 0 0 1 Add Add� result 1 ALU� result Zero ALU� control Shift� left 2 R e g W ri te M WB In s tr u ct io n IF/ID EX/MEMID/EX ID: after<3> EX: after<2> MEM: after<1> WB: add $14, . . . MEM/WB IF: after<4> 000 00 0000 000 00 00 0 0 00 0 0 0 1 0 M� u� x 0 1 Add PC 0 Write� data M� u� x 1 Control Registers Read� data 1 Read� data 2 Read� register 1 Read� register 2 Instruction� [20– 16] Instruction� [15– 0] Sign� extend Instruction� [15– 11] MemRead M e m W ri te M� u� x M� u� x ALU Read� data WB 14 14 Write� register Write� data Data� memory Address Clock 9 lw $10, 20 ($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9 9o ciclo Ch6.a-561998 Morgan Kaufmann PublishersPaulo C. Centoducatte Dependências • Problema com iniciar a execução de uma instrução antes do término da anterior – Exemplo: Instruções com dependências sub $2, $1, $3 # reg $2 modificado and $12, $2, $5 # valor (1º operando) de $2 # depende do sub or $13, $6, $2 # idem (2º operando) add $14, $2, $2 # idem (1º e 2º operando) sw $15, 100($2) # idem (base do endereçamento) 29 Ch6.a-571998 Morgan Kaufmann PublishersPaulo C. Centoducatte IM Reg IM Reg CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 Time (in clock cycles) sub $2, $1, $3 Program execution order (in instructions) and $12, $2, $5 IM Reg DM Reg IM DM Reg IM DM Reg CC 7 CC 8 CC 9 10 10 10 10 10/–20 –20 –20 –20 –20 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Value of register $2: DM Reg Reg Reg Reg DM Se a escrita no banco de registradores é feita no 1o semi-ciclo e a leitura no 2o, em CC5 não há hazard. CC4 CC5 CC6 Reg WR Reg RD Ch6.a-581998 Morgan Kaufmann PublishersPaulo C. Centoducatte Solução por Software • Compilador garante um código sem Hazard inserindo nops – Onde inserir os nops? sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) sub $2, $1, $3 nop nop and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Problema: reduz o desempenho 30 Ch6.a-591998 Morgan Kaufmann PublishersPaulo C. Centoducatte Solução por Hardware • Usar o resultado assim que calculado, não esperar que ele seja escrito. • Neste caso é necessário mecanismo para detectar o hazard. Qual a condição para que haja hazard? Quando uma instrução tenta ler um registrador (estágio EX) e esse registrador será escrito por uma instrução anterior no estágio WB Ch6.a-601998 Morgan Kaufmann PublishersPaulo C. Centoducatte Forwarding O que ocorre se $13 no lugar de $2 IM Reg IM Reg CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 Time (in clock cycles) sub $2, $1, $3 Program execution order (in instructions) and $12, $2, $5 IM Reg DM Reg IM DM Reg IM DM Reg CC 7 CC 8 CC 9 10 10 10 10 10/– 20 – 20 – 20 – 20 – 20 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Value of register $2 : DM Reg Reg Reg Reg X X X – 20 X X X X XValue of EX/MEM : X X X X – 20 X X X XValue of MEM/W B : DM 31 Ch6.a-611998 Morgan Kaufmann PublishersPaulo C. Centoducatte Condições que determinam Hazards de dados • EX/MEM – EX/MEM.RegisterRd = ID/EX.RegisterRs – EX/MEM.RegisterRd = ID/EX.RegisterRt • MEM/WB – MEM/WB.RegisterRd = ID/EX.RegisterRs – MEM/WB.RegisterRd = ID/EX.RegisterRt Ch6.a-621998 Morgan Kaufmann PublishersPaulo C. Centoducatte Condições que determinam Hazard de dados • Exemplo: – sub–and: EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 – sub-or: MEM/WB.RegisterRd = ID/EX.RegisterRt = $2 – sub-add: não tem hazard – sub-sw: não tem hazard sub $2, $1, $3 # reg $2 modificado and $12, $2, $5 # valor de $2 depende do sub or $13, $6, $2 # idem (2º operando) add $14,$2, $2 # idem (1º e 2º operandos) sw $15, 100($2) # idem (base do endereçamento) 32 Ch6.a-631998 Morgan Kaufmann PublishersPaulo C. Centoducatte Condições que determinam Hazard de dados • Esta não é uma política precisa, pois existem instruções que não escrevem no register file. – solução seria verificar o sinal RegWrite • MIPS usa $0 para operandos de valor 0. Para instruções onde $0 é destino? • sll $0, $1, 2 • valor diferente de zero nos regs de pipeline • add $3, $0, $2 • se houver fowarding, $3 terá valor errado no fim da instrução add • Para isto temos que incluir as condições • EX/MEM.registerRd <> 0 (1º hazard) • MEM/WB.registerRd <> 0 (2º hazard) Ch6.a-641998 Morgan Kaufmann PublishersPaulo C. Centoducatte Datapath sem Forwarding M� u� x ALU ID/EX MEM/WB Data� memory EX/MEM a. No forwarding Registers 33 Ch6.a-651998 Morgan Kaufmann PublishersPaulo C. Centoducatte Datapath com Forwarding ( para add, sub, and e or ) Registers M� u� x M� u� x ALU ID/EX MEM/WB Data� memory M� u� x Forwarding� unit EX/MEM b. With forwarding ForwardB Rd EX/MEM.RegisterRd MEM/WB.RegisterRd Rt Rt Rs ForwardA M� u� x Ch6.a-661998 Morgan Kaufmann PublishersPaulo C. Centoducatte Valores dos sinais ForwardA e ForwardB Mux Control Source Explanação ForwardA = 00 ID/EX Primeiro operando da ULA ���� register file ForwardA = 10 EX/MEM Primeiro operando da ULA ���� resultado anteiror da ULA ForwardA = 01 MEM/WB Primeiro operando da ULA é antecipado da memória de dados ou um resultado anterior da ULA ForwardB = 00 ID/EX Segundo operando da ULA ���� register file ForwardB = 10 EX/MEM Segundooperando da ULA ���� resultado anteiror da ULA ForwardB = 01 MEM/WB Segundo operando da ULA é antecipado da memória de dados ou um resultado anterior da ULA 34 Ch6.a-671998 Morgan Kaufmann PublishersPaulo C. Centoducatte Detecção e Controle de hazard • O controle de fowarding será no estágio EX, pois é neste estágio que se encontram os multiplexadores de fowarding da ULA. Portanto devemos passar o número do registrador operando do estágio ID via registrador de pipeline ID/EX ���� adicionar campo rs (bits 25-21). Ch6.a-681998 Morgan Kaufmann PublishersPaulo C. Centoducatte Condições para detecção de hazard EX hazard if (EX/MEM.RegWrite and (EX/MEM.RegisterRd <> 0 ) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) FowardA = 10 if (EX/MEM.RegWrite and (EX/MEM.RegisterRd <> 0 ) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) FowardB = 10 MEM hazard if (MEM/WB.RegWrite and (MEM/WB.RegisterRd <> 0 ) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) FowardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd <> 0 ) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) FowardB = 01 35 Ch6.a-691998 Morgan Kaufmann PublishersPaulo C. Centoducatte Observações: • Não existe data hazard no estágio WB, pois estamos assumindo que o register file supre o resultado correto se a instrução no estágio ID é o mesmo registrador escrito pela instrução no estágio WB ���� forwarding no register file – Na 1a edição do livro P&H tem hazard Ch6.a-701998 Morgan Kaufmann PublishersPaulo C. Centoducatte Observações: • Um data hazard potencial pode ocorrer entre o resultado de uma instrução em WB, o resultado de uma instrução em MEM e o operando fonte da instrução no estágio da ULA • Neste caso o resultado é antecipado do estágio MEM porque o resultado neste estágio é mais recente add $1, $1, $2 add $1, $1, $3 add $1, $1, $4 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd <> 0 ) and (EX/MEM.RegisterRd <> ID/EX.registerRs) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) FowardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd <> 0 ) and (EX/MEM.RegisterRd <> ID/EX.registerRt) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) FowardB = 01 36 Ch6.a-711998 Morgan Kaufmann PublishersPaulo C. Centoducatte Datapath modificado para fowarding PC Instruction memory Registers M u x M u x Control ALU EX M WB M WB WB ID/EX EX/MEM MEM/WB Data memory M u x Forwarding unit IF/ID In s tr u c tio n M u x Rd EX/MEM.RegisterRd MEM/WB.RegisterRd Rt Rt Rs IF/ID.RegisterRd IF/ID.RegisterRt IF/ID.RegisterRt IF/ID.RegisterRs Ch6.a-721998 Morgan Kaufmann PublishersPaulo C. Centoducatte Forwarding PC Instruction� memory Registers M� u� x M� u� x M� u� x EX M WB WB Data� memory M� u� x Forwarding� unit In st ru ct io n IF/ID and $4, $2, $5 sub $2, $1, $3 ID/EX before<1> EX/MEM before<2> MEM/WB or $4, $4, $2 Clock 3 2 5 10 10 $2 $5 5 2 4 $1 $3 3 1 2 Control ALU M WB sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 37 Ch6.a-731998 Morgan Kaufmann PublishersPaulo C. Centoducatte Forwarding PC Instruction� memory Registers M� u� x M� u� x M� u� x EX M WB M WB Data� memory M� u� x Forwarding� unit In st ru c tio n IF/ID or $4, $4, $2 and $4, $2, $5 ID/EX sub $2, . . . EX/MEM before<1> MEM/WB add $9, $4, $2 Clock 4 4 6 10 10 $4 $2 6 2 4 $2 $5 5 2 4 Control ALU 10 2 WB sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Ch6.a-741998 Morgan Kaufmann PublishersPaulo C. Centoducatte Forwarding PC Instruction� memory Registers M� u� x M� u� x M� u� x EX M WB M WB Data� memory M� u� x Forwarding� unit In st ru ct io n IF/ID add $9, $4, $2 or $4, $4, $2 ID/EX and $4, . . . EX/MEM sub $2, . . . MEM/WB after<1> Clock 5 4 2 10 10 $4 $2 2 4 9 $4 $2 4 2 2 4 Control ALU 10 WB 2 1 4 sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 38 Ch6.a-751998 Morgan Kaufmann PublishersPaulo C. Centoducatte Forwarding PC Instruction� memory M� u� x M� u� x M� u� x EX M WB M WB Data� memory M� u� x Forwarding� unit after<1>after<2> add $9, $4, $2 or $4, . . . EX/MEM and $4, . . . MEM/WB Clock 6 10 $4 $2 2 4 9 ALU 10 4 WB 4 1 Registers In st ru c tio n IF/ID ID/EX 4 Control sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Ch6.a-761998 Morgan Kaufmann PublishersPaulo C. Centoducatte Forwarding: Datapath para entrada signed-imediate necessária para lw e sw ALUSrc Registers M u x M u x M u x ALU ID/EX MEM/WB Data memory M u x Forwarding unit EX/MEM M u x 39 Ch6.a-771998 Morgan Kaufmann PublishersPaulo C. Centoducatte Data Hazard e Stalls Reg IM Reg Reg IM CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 Time (in clock cycles) lw $2, 20($1) Program� execution� order� (in instructions) and $4, $2, $5 IM Reg DM Reg IM DM Reg IM DM Reg CC 7 CC 8 CC 9 or $8, $2, $6 add $9, $4, $2 slt $1, $6, $7 DM Reg Reg Reg DM Ch6.a-781998 Morgan Kaufmann PublishersPaulo C. Centoducatte Data Hazards e Stalls • Quando uma instrução tenta ler um registrador precedida por uma instrução de load, que escreve no mesmo registrador ���� o dado tem que ser mantido (ciclo 4) enquanto a ULA executa a operação ���� atrasar o pipeline para que a instrução leia o valor correto. • Condição de detecção de hazard para atraso no pipeline # testa se é um load: if (ID/EX.MemRead and # verifica se registrador destino da instrução load em EX é o registrador fonte da instrução em ID: ((ID/EX.RegisterRt = IF/ID.RegisterRs) or ( ID/EX.RegisterRt = IF/ID.RegisterRt))) stall pipeline 40 Ch6.a-791998 Morgan Kaufmann PublishersPaulo C. Centoducatte Data Hazards e Stalls lw $2, 20($1) Program� execution� order� (in instructions) and $4, $2, $5 or $8, $2, $6 add $9, $4, $2 slt $1, $6, $7 Reg IM Reg Reg IM DM CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 Time (in clock cycles) IM Reg DM RegIM IM DM Reg IM DM Reg CC 7 CC 8 CC 9 CC 10 DM Reg RegReg Reg bubble Ch6.a-801998 Morgan Kaufmann PublishersPaulo C. Centoducatte Data Hazards e Stalls • Se a instrução no estágio ID é atrasada, a instrução no estágio IF também deve ser atrasada • Como fazer? Impedir a mudança no PC e no registrador IF/IF. A instrução em IF continua sendo lida e em ID continuam sendo lido os mesmos campos da instrução. • Stall pipeline: mesmo efeito da instrução nop começando pelo estágio EX ���� desativar os 9 sinais de controle dos estágios EX, MEM e WB 41 Ch6.a-811998 Morgan Kaufmann PublishersPaulo C. Centoducatte Datapath com forwarding e data hazard detection PC Instruction memory Registers M u x M u x M u x Control ALU EX M WB M WB WB ID/EX EX/MEM MEM/WB Data memory M u x Hazard detection unit Forwarding unit 0 M u x IF/ID In s tr u ct io n ID/EX.MemRead IF /I D W r i te P C W ri te ID/EX.RegisterRt IF/ID.RegisterRd IF/ID.RegisterRt IF/ID.RegisterRt IF/ID.RegisterRs Rt Rs Rd Rt EX/MEM.RegisterRd MEM/WB.RegisterRd Ch6.a-821998 Morgan Kaufmann PublishersPaulo C. Centoducatte Seqüência de execução do exemplo anterior Hazard� detection� unit 0 M� u� x IF /I D W ri te P C W ri te ID/EX.RegisterRt ID/EX.MemRead M WB $1 $X X 1 2 before<3> PC Instruction� memory Registers M� u� x M� u� x M� u� x EX WB Data� memory M� u� x Forwarding� unit In st ru ct io n IF/ID ID/EX EX/MEM MEM/WB and $4, $2, $5 lw $2, 20($1) before<1> before<2> Clock 2 1 1 X X 11 Control ALU M WB 42 Ch6.a-831998 Morgan Kaufmann PublishersPaulo C. Centoducatte Seqüência de execução do exemplo anterior Hazard� detection� unit 0 M� u� x IF /I D W ri te P C W ri te ID/EX.RegisterRt lw $2, 20($1) PC Instruction� memory Registers M� u� x M� u� x M� u� x EX M WB WB Data� memory M� u� x Forwarding� unit In st ru c tio n IF/ID and $4, $2, $5 ID/EX before<1> EX/MEM before<2> MEM/WB or $4, $4, $2 Clock 3 2 5 2 5 00 11 $2$5 5 2 4 $1 $X X 1 2 Control ALU M WB ID/EX.MemRead Ch6.a-841998 Morgan Kaufmann PublishersPaulo C. Centoducatte Seqüência de execução do exemplo anterior $2 $5 5 2 2 4 WB Hazard� detection� unit 0 M� u� x IF /I D W ri te P C W ri te ID/EX.RegisterRt PC Instruction� memory Registers M� u� x M� u� x M� u� x EX M WB Data� memory M� u� x In s tr u c tio n IF/ID and $4, $2, $5 bubble ID/EX lw $2, . . . EX/MEM before<1> MEM/WB Clock 4 2 2 5 5 10 11 00 $2 $5 5 2 4 Control ALU M WB Forwarding� unit ID/EX.MemRead or $4, $4, $2 43 Ch6.a-851998 Morgan Kaufmann PublishersPaulo C. Centoducatte Seqüência de execução do exemplo anterior Hazard� detection� unit 0 M� u� x IF /I D W ri te P C W ri te ID/EX.RegisterRt 2 bubble lw $2, . . . PC Instruction� memory Registers M� u� x M� u� x M� u� x EX M WB M WB Data� memory M� u� x Forwarding� unit In s tr u ct io n IF/ID and $4, $2, $5 ID/EX EX/MEM MEM/WB add $9, $4, $2 Clock 5 2 2 10 10 11 $4 $2 2 4 4 4 2 4 $2 $5 5 2 4 Control ALU 0 WB ID/EX.MemRead or $4, $4, $2 Ch6.a-861998 Morgan Kaufmann PublishersPaulo C. Centoducatte Seqüência de execução do exemplo anterior PC Instruction� memory Hazard� detection� unit 0 M� u� x IF /I D W ri te P C W ri te ID/EX.RegisterRt bubble Registers M� u� x M� u� x EX M WB M WB Data� memory M� u� x In st ru c tio n IF/ID add $9, $4, $2 ID/EX and $4, . . . EX/MEM MEM/WB Clock 6 4 4 2 2 10 10 $4 $2 2 4 49 $2 2 Control ALU 10 WB 0 after<1> Forwarding� unit $4 4 4 or $4, $4, $2 ID/EX.MemRead M� u� x 44 Ch6.a-871998 Morgan Kaufmann PublishersPaulo C. Centoducatte Seqüência de execução do exemplo anterior Registers In s tr u ct io n ID/EX 4 Control PC Instruction� memory IF /I D W ri te P C W ri te add $9, $4, $2 or $4, . . . and $4, . . .after<2> after<1> Clock 7 M� u� x M� u� x M� u� x EX M WB M WB Data� memory M� u� x Forwarding� unit EX/MEM MEM/WB 10 10 $4 $2 2 4 9 ALU 10 WB 44 1 Hazard� detection� unit 0 M� u� x ID/EX.RegisterRt ID/EX.MemRead IF/ID Ch6.a-881998 Morgan Kaufmann PublishersPaulo C. Centoducatte Branch Hazards • Devemos ter um fetch de instrução por ciclo de clock, decisão: qual caminho de um branch deve ocorrer até o estágio MEM. Reg Reg CC 1 Time (in clock cycles) 40 beq $1, $3, 7 Program� execution� order� (in instructions) IM Reg IM DM IM DM IM DM DM DM Reg Reg Reg Reg Reg RegIM 44 and $12, $2, $5 48 or $13, $6, $2 52 add $14, $2, $2 72 lw $4, 50($7) CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9 Reg 45 Ch6.a-891998 Morgan Kaufmann PublishersPaulo C. Centoducatte Branch Hazards • Tês esquemas para resolver control hazard : • Branch-delay Slots • Assume Branch Not Taken • Dynamic Branch Prediction • Assume Branch Not Taken • continua a execução seqüencialmente e se o branch for tomado, descarta as instruções entre a instrução de branch e a instrução no endereço alvo, fazendo seus sinais de controle iguais a zero Ch6.a-901998 Morgan Kaufmann PublishersPaulo C. Centoducatte Branch Hazards • Redução do atraso de branches • reduzir o custo se o branch for tomado • adiantar a execução da instrução de branch. • O next PC para uma instrução de branch é selecionado no estágio MEM • executar o branch no estágio ID (apenas uma instrução será descartada) • deslocar o cálculo do endereço de branch (branch adder) do MEM para o ID e comparando os registradores lidos do register file. 46 Ch6.a-911998 Morgan Kaufmann PublishersPaulo C. Centoducatte Branch Hazards PC Instruction memory 4 Registers M u x M u x M u x ALU EX M WB M WB WB ID/EX 0 EX/MEM MEM/WB Data memory M u x Hazard detection unit Forwarding unit IF.Flush IF/ID Sign extend Control M u x = Shift left 2 M u x Ch6.a-921998 Morgan Kaufmann PublishersPaulo C. Centoducatte Branch Hazards (exemplo) 36 sub $10, $4, $8 40 beq $1, $3, 7 # PC 40 + 4 +7*4 = 72 44 and $12, $2, $5 48 or $13, $2, $6 52 add $14, $4, $2 56 slt $15, $6, $7 .... ............ .... ............ 72 lw $4, 50($7) 47 Ch6.a-931998 Morgan Kaufmann PublishersPaulo C. Centoducatte Branch Hazards (exemplo) PC Instruction� memory 4 Registers Sign� extend M� u� x M� u� x Control EX M WB M WB WB M� u� x Hazard� detection� unit Forwarding� unit M� u� x IF.Flush IF/ID and $12, $2, $5 beq $1, $3, 7 sub $10, $4, $8 MEM/WB EX/MEM ID/EX Clock 3 72 44 48 44 28 7 � $1 $3 10 48 72 72 0 $4 $8 � ALU Data� memory M� u� x Shift� left 2 � before<1> before<2> = Ch6.a-941998 Morgan Kaufmann PublishersPaulo C. Centoducatte Branch Hazards (exemplo) M� u� x 0 bubble (nop)lw $4, 50($7) Clock 4 � beq $1, $3, 7 sub $10, . . . before<1> PC Instruction� memory 4 Registers Sign� extend M� u� x M� u� x Control EX M WB M WB WB M� u� x Hazard� detection� unit Forwarding� unit IF.Flush IF/ID MEM/WB EX/MEM ID/EX 76 72 76 72 � � � $1 $3 10 76 � � � ALU Data� memory M� u� x Shift� left 2 = 48 Ch6.a-951998 Morgan Kaufmann PublishersPaulo C. Centoducatte Dymanic Branch Prediction • Branch not taken • forma de branch predicton que assume que o branch não será tomado. • Dymanic Branch Prediction • descobre se o branch foi tomado ou não na última vez que foi executado e faz o fetch das instruções pelo mesmo local da última vez. Ch6.a-961998 Morgan Kaufmann PublishersPaulo C. Centoducatte Dymanic Branch Prediction • Implementação • branch prediction buffer • branch history table • Branch prediction buffer: pequena memória indexada por bits menos significativos do endereço da instrução de branch. Ela contém um bit que diz se o branch foi recentemente tomado ou não (Neste esquema não sabemos se a previsão é correta ou não, pois este buffer pode ser alterado por outra instrução de branch que tem os mesmos bits menos significativos de endereço). 49 Ch6.a-971998 Morgan Kaufmann PublishersPaulo C. Centoducatte Tratamento de Exceção PC Instruction� memory 4 Registers Sign� extend M� u� x M� u� x M� u� x Control ALU EX M WB M WB WB ID/EX EX/MEM MEM/WB M� u� x Data� memory M� u� x Hazard� detection� unit Forwarding� unit IF.Flush IF/ID = Except� PC 40000040 0 M� u� x 0 M� u� x 0 M� u� x ID.Flush EX.Flush Cause Shift� left 2 Ch6.a-981998 Morgan Kaufmann PublishersPaulo C. Centoducatte Tratamento de Exceção Exemplo: Dado a seqüência abaixo: 40hex sub $11, $2, $4 44hex and $12, $2, $5 48hex or $13, $2, $6 4Chex add $1, $2, $1 50hex slt $15, $6, $7 54hex lw $16, 50($7) Assumir que as instruções a serem chamadas em um tratamento de exceção comecem com: 40000040hex sw $25, 1000($0) 40000044hex sw $26, 1004($0) Mostre o que acontece no pipeline se uma exceção de overflow ocorre na instrução add. 50 Ch6.a-991998 Morgan Kaufmann PublishersPaulo C. Centoducatte Tratamento de Exceção - A figura abaixo mostra o que acontece a partir da instrução add em EX (clock 5) s lt $ 1 5 , $ 6 , $ 7lw $ 1 6 , 5 0 ($ 7 ) a d d $ 1 , $ 2 , $ 1 o r $ 1 3 , . . . a n d $ 1 2 , . . . C lo c k 5 4 0 0 0 0 0 4 0 0 0 0 0 1 0 1 0 0 0 1 5 8 5 4 5 4 1 2 $ 6 $ 7 1 5 5 0 $ 2 $ 1 $ 1 1 3 1 2 D a t a� m e m o r y b u b b l e ( n o p )s w $ 2 5 , 1 0 0 0 ( $ 0 ) b u b b le b u b b le o r $ 1 3 , . . . C lo c k 6 4 0 0 0 0 0 4 4 4 0 0 0 0 0 4 4 1 3 0 0 0 0 0 0 0 0 0 0 0 0 1 1 3 D a ta� m e m o r y 4 0 0 0 0 0 4 0 P C 4 R e g is t e r s S ig n � e x te n d M � u � x M � u � x M � u � x A L U E X M W B M W B W B ID / E X E X / M E M M E M / W B M � u� x M � u � x H a z a rd� d e te c t i o n � u n i t F o r w a r d in g� u n it IF .F l u s h IF / ID = E x c e p t� P C 4 0 0 0 0 0 4 0 0 M � u� x 0 M � u� x 0 M � u� x I D . F lus h E X . F lu s h C a u s e S h ift� le ft 2 P C 4 R e g is t e r s S i g n � e x te n d M � u � x M � u � x M � u � x C o n t ro l A L U E X M W B M W B W B ID / E X E X / M E M M E M / W B M � u� x M � u � x H a z a rd� d e te c t i o n � u n i t F o r w a r d in g� u n it IF .F l u s h IF / ID = E x c e p t� P C 4 0 0 0 0 0 4 0 0 M � u� x 0 M � u� x 0 M � u� x ID .F lu s h E X . F lu s h C a u s e S h i ft� le f t 2 In s t r u c t io n � m e m o r y � In s t r u c t io n � m e m o r y C o n tr o l Ch6.a-1001998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline Super Escalar PC Instruction� memory 4 Registers M� u� x M� u� x ALU M� u� x Data� memory M� u� x 40000040 Sign� extend Sign� extend ALU Address Write� data 51 Ch6.a-1011998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline Super Escalar Exemplo: Como o loop abaixo será escalonado em um MIPS superscalar: Loop: lw $t0, $0($s2) # t0= elemento do array addu$t0, $t0, $t2 # add escalar em $s2 sw $t0, 0($s2) # armazena o resultado addi $s1, $s1,-4 # decrementa o ponteiro bne $s1, $zero, Loop # branch se $s1 != 0 Reordenar as instruções para evitar tantos pipelines stalls quanto possível Ch6.a-1021998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline Super Escalar Solução: A primeira com a terceira e as duas últimas instruções tem dependência de dados. 52 Ch6.a-1031998 Morgan Kaufmann PublishersPaulo C. Centoducatte Pipeline Super Escalar Loop unrolling para pipelines escalares ���� técnica para aumentar o desempenho para loops que acessam arrays ���� múltiplas cópias do corpo do loop são feitas e instruções de diferentes iterações são escalonadas juntas Exemplo ���� supor exemplo anterior Solução ���� 4 cópias do corpo do loop Ch6.a-1041998 Morgan Kaufmann PublishersPaulo C. Centoducatte Datapath final PC Instruction memory 4 Registers Sign extend M u x M u x M u x Control ALU EX M WB M WB WB ID/EX EX/MEM MEM/WB M u x Data memory M u x Hazard detection unit Forwarding unit IF.Flush IF/ID M u x Except PC 40000040 0 M u x 0 M u x 0 M u x ID.Flush EX.Flush Cause Shift left 2 Write data Read data Address Read data Address Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 ALU control 3216 I ns tr uc ti o n Instruction [15–11] Instruction [20–16] Instruction [20–16] Instruction [25–21] Re g Wri t e ALUOp ALUSrc RegDst M e m Wr it e MemRead M e mt o R e g Branch =
Compartilhar