IBM 440


Loosely related remarks

Up to 64GB address space (36-bit) and 2GB RAM - that's interesting. I am thinking about straightforward routing where, instead of 4 or 6 byte IP addresses, nodes use dynamically generated 128-bit keys. The packet forwarder in this case can look like a giant hash table.
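To make the hash table idea concrete, here is a minimal sketch of such a forwarder, assuming the 128-bit keys are already close to random and that entries are never removed. All names (fwd_table, fwd_add, fwd_lookup, the table size) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FWD_SLOTS   1024u          /* hypothetical table size, must be a power of two */
#define FWD_NO_PORT (-1)

/* A 128-bit node key stored as two 64-bit halves. */
typedef struct { uint64_t hi, lo; } node_key;

typedef struct {
    node_key key;
    int      port;                 /* output port for packets addressed to key */
    int      used;
} fwd_slot;

typedef struct { fwd_slot slots[FWD_SLOTS]; } fwd_table;

static void fwd_init(fwd_table *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
}

/* Fold both halves of the key into a slot index. */
static uint32_t fwd_hash(node_key k)
{
    uint64_t h = k.hi ^ (k.lo * 0x9E3779B97F4A7C15ull);
    h ^= h >> 32;
    return (uint32_t)h & (FWD_SLOTS - 1u);
}

static int key_eq(node_key a, node_key b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

/* Insert or update an entry; returns 0 on success, -1 if the table is full. */
static int fwd_add(fwd_table *t, node_key k, int port)
{
    uint32_t i = fwd_hash(k);
    for (uint32_t n = 0; n < FWD_SLOTS; n++, i = (i + 1u) & (FWD_SLOTS - 1u)) {
        fwd_slot *s = &t->slots[i];
        if (!s->used || key_eq(s->key, k)) {
            s->key = k;
            s->port = port;
            s->used = 1;
            return 0;
        }
    }
    return -1;
}

/* Linear-probing lookup; returns the port or FWD_NO_PORT if the key is unknown. */
static int fwd_lookup(const fwd_table *t, node_key k)
{
    uint32_t i = fwd_hash(k);
    for (uint32_t n = 0; n < FWD_SLOTS; n++, i = (i + 1u) & (FWD_SLOTS - 1u)) {
        const fwd_slot *s = &t->slots[i];
        if (!s->used) return FWD_NO_PORT;
        if (key_eq(s->key, k)) return s->port;
    }
    return FWD_NO_PORT;
}
```

With 2GB of RAM the table could be far larger than this; the point is only that forwarding becomes one hash plus a short probe, with no longest-prefix match.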

Advanced branch prediction algorithms and single cycle multiply. Throw in a 512K secondary cache and we get an excellent CPU for C++ code.

MMU supports 8 page sizes. The top four are 256K, 1MB, 16MB, 256MB.

What is this scan-flush initialization thing ?

Choice between strapping pins and IIC interface - nice.

...software device driver uses the buffer descriptor structure to inform MAL about buffer locations and packet or buffer status

The application allocates a block, fills it with data, and calls send(). The driver calls the application to allocate a block for receive(). When data is received, the driver notifies the application about the new packet. When a packet has been transmitted, the driver notifies the application; this can be done in the context of send().

Software can generate interrupts to simplify software development and for diagnostics

Does it mean that I can simulate a HW interrupt instead of enabling a real HW device?

Decoupled address and data buses support split-bus transaction capability for improved bandwidth

What does it mean ?

The contents of the ISR can be accessed by using the move from device control register (mfdcr) and move to device control register (mtdcr) instructions.

Does it mean that clearing an interrupt requires assembler? Is the move device control register instruction single cycle? Can I move data from a control register to RAM in a single atomic operation, or do I have to disable interrupts?

There are four instruction storage addressing modes supported by the PPC440GP:

I-form branch instructions (unconditional):
The 24-bit LI field is ... added to the address of the branch instruction
Taken B-form branch instructions:
The 14-bit BD field is ... added to the address of the branch instruction
Taken XL-form branch instructions:
The contents of bits 0:29 of the Link Register (LR) or the Count Register (CTR) are concatenated on the right with 0b00 to form the 32-bit effective address of the next instruction.
Next sequential instruction fetching

The 14-bit BD field is a word offset: with 0b00 appended and sign extension it reaches ±32KB, or ±8K opcodes, since one opcode is 4 bytes long. This is the most 'local' branch. In the case of virtual methods, is the C++ compiler going to use XL? The shortest data storage addressing mode is 16 bits. All instructions are a single word and implicitly word aligned.
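The relative branch arithmetic can be checked with a small sketch. It assumes the usual PowerPC encoding, in which the displacement field is concatenated on the right with 0b00 and sign-extended before being added to the address of the branch instruction (the part the quote above elides). The function names are mine, not from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v to a 32-bit signed value. */
static int32_t sign_extend(uint32_t v, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    uint32_t sign = 1u << (bits - 1);
    v &= (1u << bits) - 1u;
    return (int32_t)((v ^ sign) - sign);
}

/* B-form, relative: target = branch address + sign_extend(BD || 0b00). */
static uint32_t bform_target(uint32_t branch_addr, uint32_t bd)
{
    return branch_addr + (uint32_t)sign_extend(bd << 2, 16);
}

/* I-form, relative: target = branch address + sign_extend(LI || 0b00). */
static uint32_t iform_target(uint32_t branch_addr, uint32_t li)
{
    return branch_addr + (uint32_t)sign_extend(li << 2, 26);
}
```

For example, bform_target(0x10000, 0x1FFF) is 0x17FFC (the farthest forward conditional branch) and bform_target(0x10000, 0x2000) is 0x8000 (the farthest backward one), so a B-form branch spans 64KB in total. The I-form reaches ±32MB the same way.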

This ordering is called big endian because the 'big end' (most-significant end) of the scalar, considered as a binary number, comes first in storage.

I did not know this.

Which byte ordering is used is controlled on a memory page basis by the endian (E) storage attribute, which is a field within the TLB entry for the page

Can be useful for network protocols, like setting up the IP address in the IP packet header. Naturally it works for instructions too.
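To make the byte order concrete: without help from the E attribute, code that must work on any host stores multi-byte header fields byte by byte, most-significant byte first. A minimal sketch with hypothetical helper names put_be32 and get_be32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value big endian: the most-significant byte comes first. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Read a 32-bit big-endian value back, independent of host byte order. */
static uint32_t get_be32(const uint8_t *src)
{
    assert(src != NULL);
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}
```

Writing 192.168.0.1 as put_be32(buf, 0xC0A80001) leaves the bytes C0 A8 00 01 in memory, which is the order an IP header expects. On a page marked big endian, a plain 32-bit store produces the same layout.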

The icbt instruction is typically used as a 'hint' to the processor that a particular block of instructions is likely to be executed in the near future. Thus the processor can begin filling that block into the instruction cache, so that when the executing program eventually branches there the instructions will already be present in the cache, thereby improving performance.

This one is strong. Just think about the following scenario: the HW interrupt handler starts loading the code of the sendSignal() routine into the cache, then handles the interrupt, then calls sendSignal(). Or, in another scenario, before entering a loop containing memcpy() we can load the memcpy() code into the cache, this way avoiding a cache miss.

Place BDs (DMA buffer descriptors) in non-cached memory and data buffers in cacheable memory. Max packet size is (4KB - 16) bytes.

Up to 256 descriptors in the buffer descriptor table per channel... It is suggested that each data buffer start on a cache line boundary and be a multiple of a cache line in size if it resides in cachable memory. (The cache line size in the PPC440GP processor core is 32 bytes.)... During retransmission of a backed-up packet, MAL may use descriptors on which the Ready bit was already cleared. Therefore, the device driver should not reuse descriptors before the Ready bit of the last descriptor is cleared.
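The alignment suggestion is easy to follow in code. A minimal sketch, assuming the data buffers are carved out of one pool that already starts on a cache line boundary; the helper names and the pool layout are hypothetical, only the 32-byte line size comes from the quote above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* PPC440GP cache line size in bytes */

/* Round a buffer size up to a whole number of cache lines. */
static size_t round_up_to_cache_line(size_t n)
{
    return (n + (CACHE_LINE - 1u)) & ~(size_t)(CACHE_LINE - 1u);
}

static int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p & (CACHE_LINE - 1u)) == 0;
}

/* Return the index-th buffer of an aligned pool, or NULL if it does not fit.
 * Every buffer starts on a cache line boundary and spans whole cache lines. */
static void *pool_buffer(void *pool, size_t pool_size, size_t buf_size, size_t index)
{
    assert(is_cache_aligned(pool));
    size_t stride = round_up_to_cache_line(buf_size);
    if (stride == 0 || index >= pool_size / stride)
        return NULL;
    return (uint8_t *)pool + index * stride;
}
```

A 1518-byte Ethernet frame gets a 1536-byte slot this way, and the (4KB - 16) maximum packet gets a full 4096 bytes, so invalidating or flushing one buffer never touches a cache line shared with its neighbour.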

In the case of short packets (64 bytes), 256 BDs give us a burst size of up to 16K, or ~130 microseconds on a 1G interface (to be accurate, there is another 4K in the Rx/Tx FIFOs). Apparently pure polling mode is unacceptable in most applications. It would be interesting to know the interrupt latency. It also means that Tx requires an interrupt as well. I would consider polling driven by a 100 microsecond timer interrupt: not directly from the interrupt context, but triggered by the interrupt. The Rx thread loops until all BDs are handled, then enables the timer interrupt and goes to sleep. The interrupt wakes the thread up, the thread disables the interrupt and handles received packets, if any, and so on. This scheme will work if 100 microseconds is an acceptable latency. Unfortunately this is not the case for many devices.
Bottom line: Motorola parts are better controllers.
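The burst estimate above is plain arithmetic and can be reproduced with a small helper. The function name and the per-frame overhead parameter are my additions; the estimate in the text corresponds to an overhead of 0, while 20 bytes accounts for the Ethernet preamble and inter-frame gap.

```c
#include <assert.h>
#include <stdint.h>

/* Time in nanoseconds for a burst of `frames` frames of `frame_bytes` each,
 * plus `overhead_bytes` of per-frame wire overhead, at `bits_per_sec`. */
static uint64_t frame_burst_ns(uint64_t frames, uint64_t frame_bytes,
                               uint64_t overhead_bytes, uint64_t bits_per_sec)
{
    assert(bits_per_sec > 0);
    uint64_t bits = frames * (frame_bytes + overhead_bytes) * 8u;
    return bits * 1000000000ull / bits_per_sec;
}
```

frame_burst_ns(256, 64, 0, 1000000000) gives 131072 ns, the ~130 microseconds quoted above. With the 20 bytes of wire overhead per frame it becomes 172032 ns, so the descriptor table buys somewhat more time than the raw byte count suggests, but the order of magnitude stays the same.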

Supply voltages required: 3.3V, 2.5V, 1.5V

Rates for PPC440GX are 500MHz, 533MHz, 667MHz, 800MHz.

The 440GX processor offers 440GP customers the advantage of enhanced performance in speed and communication, as outlined in Table 1. No circuitboard modifications are required to accommodate the 440GX package size, which is the same as the 440GP package size. All the 440GX signal locations are the same as the 440GP signal locations.

Adds among other things

2x 10/100/gigabit ethernets (EMAC4)
2x TCP/IP acceleration hardware
Level 2 cache
PCI 2.3 compliant
256 Kbytes packet/code store SRAM
Updated PowerPC 440 core with parity

Contact e-mail larytet.5870194@bloglines.com

Some links: Building and Testing gcc/glibc cross toolchains, http://www.kegel.com/crosstool/