Lab 5

Hey there everyone,

It's me, Anh, again, sharing my experience with Lab 5, a lab that was frankly quite hard for me. Throughout this post I will talk about how I resolved my problems. For this lab I had to reuse some of the Lab 4 files, which was somewhat convenient.

Part 1: Auto-Vectorization

First, I changed the Makefile by adding the -fopt-info-vec-all option to the compiler flags.

Then, after running the command gcc -g -O3 -fopt-info-vec-all vol1.c -o vol1, I saw that the loop at line 32 was vectorized.


However, the loop at line 38 was not vectorized.
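For illustration, here is a minimal sketch (not the lab's vol1.c, whose code isn't reproduced in this post) of the kind of loop the vectorizer usually accepts: a countable loop over int16_t arrays where each iteration only reads in[i] and writes out[i]. The function name scale_all and the 3/4 factor are hypothetical. Putting it in a file and compiling with gcc -g -O3 -fopt-info-vec-all should make GCC print a note saying whether this loop was vectorized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: scale every sample by 3/4 using integer math.
 * Iterations are independent (no loop-carried dependency), and restrict
 * promises in and out don't overlap, so GCC at -O3 can usually vectorize it. */
void scale_all(const int16_t *restrict in, int16_t *restrict out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* in[i] is promoted to int, so 3 * sample cannot overflow. */
        out[i] = (int16_t)((in[i] * 3) / 4);
    }
}
```

Calling scale_all on {0, 4, -4, 1000, 32767} gives {0, 3, -3, 750, 24575}, since the integer division truncates toward zero.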

So, to vectorize another part of the code, I decided to change the part that sums up the data from this:

To this:

After running the command again, that loop was vectorized, so I successfully vectorized one more loop.
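The before/after code for this step isn't reproduced in the post. As a hedged illustration, assume the summing loop applied % 1000 on every iteration. That makes each step depend on the previous result through an operation that is not a simple reduction, which typically blocks vectorization. A common rewrite is to accumulate a plain sum in a wider type and apply the modulo once at the end. Both function names are hypothetical. Note that the two versions agree modulo 1000 but can differ in sign, because C's % keeps the sign of the dividend.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before": the modulo inside the loop creates a
 * loop-carried dependency that is not a plain sum reduction. */
int sum_mod_each_step(const int16_t *data, size_t n)
{
    int ttl = 0;
    for (size_t i = 0; i < n; i++) {
        ttl = (ttl + data[i]) % 1000;
    }
    return ttl;
}

/* Hypothetical "after": a plain += reduction, which GCC can usually
 * vectorize at -O3. int64_t avoids overflow for millions of samples. */
int sum_mod_at_end(const int16_t *data, size_t n)
{
    int64_t ttl = 0;
    for (size_t i = 0; i < n; i++) {
        ttl += data[i];
    }
    return (int)(ttl % 1000);
}
```

For example, with data {500, 600, -200} the first version returns -100 and the second returns 900: congruent modulo 1000, but not identical.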

Part 2: Inline Assembler

For the next part, I needed to study add.c and make sure I understood how its inline assembler code works and why, then modify it to calculate b mod a using inline assembler and print the result. So I made the necessary changes to add.c.
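My modified add.c isn't shown here, so this is a hedged sketch of one way to compute b mod a with AArch64 inline assembler: UDIV produces the quotient, and MSUB computes b - quotient × a. The function name mod_asm is hypothetical, and the #else branch is a plain C fallback so the file also builds on non-AArch64 machines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: return b mod a for unsigned 32-bit values.
 * Requires a != 0. */
uint32_t mod_asm(uint32_t b, uint32_t a)
{
    assert(a != 0);
#if defined(__aarch64__)
    uint32_t q, r;
    __asm__ ("udiv %w[q], %w[b], %w[a]\n\t"      /* q = b / a */
             "msub %w[r], %w[q], %w[a], %w[b]"   /* r = b - q * a */
             /* q is early-clobber ("=&r"): it is written before a and b are last read */
             : [q] "=&r"(q), [r] "=r"(r)
             : [a] "r"(a), [b] "r"(b));
    return r;
#else
    return b % a;   /* portable fallback */
#endif
}
```

For example, mod_asm(19, 3) returns 1, which can then be printed with printf("%u\n", ...).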

After that, I used the time command to measure the runtime.

The next objective: the file vol_inline.c contains a version of the volume scaling program that uses inline assembler and the SQDMULH instruction, and I had to copy, build, and verify its operation on an AArch64 system.
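To see what SQDMULH does to each 16-bit lane, here is a portable C model of a single lane. It is a sketch for understanding, not code from vol_inline.c. The two operands are multiplied, the product is doubled, and the high 16 bits are kept, saturating the one overflow case (-32768 × -32768). The function name sqdmulh_lane is hypothetical, and the code relies on >> of a negative int64_t being an arithmetic shift, which is what GCC and Clang do.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of SQDMULH:
 * (2 * a * b) >> 16, saturated to the int16_t range. */
int16_t sqdmulh_lane(int16_t a, int16_t b)
{
    int64_t doubled = 2 * (int64_t)a * (int64_t)b;  /* exact in 64 bits */
    int64_t high = doubled >> 16;                   /* keep the high half */
    if (high > INT16_MAX) {
        high = INT16_MAX;                           /* only -32768 * -32768 gets here */
    }
    return (int16_t)high;
}
```

With a Q15 volume factor of 24575 (about 0.75), sqdmulh_lane(1000, 24575) gives 749, which is roughly 1000 × 0.75.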

As we can see, the default number of samples is defined in vol.h:

#define SAMPLES 5000000

With a simple time command, we can see how long it takes to run.

If I increase SAMPLES, the runtime increases, and if I lower it, the runtime decreases too.
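The time command measures the whole program, including filling the arrays with random data. To check how the scaling work alone grows with the sample count, one option is a small harness around the loop using the standard clock() function. This is a hedged sketch: time_scaling and its 3/4 integer scaling are hypothetical stand-ins, not the lab's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical harness: fill n samples with pseudo-random data, then
 * measure the processor time of one scaling pass over them.
 * Returns elapsed seconds, or -1.0 if allocation fails. */
double time_scaling(size_t n)
{
    int16_t *in = calloc(n, sizeof *in);
    int16_t *out = calloc(n, sizeof *out);
    if (in == NULL || out == NULL) {
        free(in);
        free(out);
        return -1.0;
    }
    srand(1);
    for (size_t i = 0; i < n; i++) {
        in[i] = (int16_t)((rand() % 65536) - 32768);
    }

    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        out[i] = (int16_t)((in[i] * 3) / 4);   /* hypothetical 3/4 scaling */
    }
    clock_t end = clock();

    /* Use the output so the compiler cannot drop the timed loop. */
    volatile int64_t checksum = 0;
    for (size_t i = 0; i < n; i++) {
        checksum += out[i];
    }

    free(in);
    free(out);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with, say, 1000000 and then 5000000 samples should show the elapsed time growing roughly in proportion to n, though small runs are dominated by timer resolution.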

Now I will try to answer the questions in this lab.

Question 1: What is an alternate approach?
I would let the compiler choose which registers to use for the variables instead of doing it myself.

Question 2: Should we use 32767 or 32768 in the next line? Why?
I use 32767, since the maximum value of an int16_t is 32767. 32768 does not fit in an int16_t, so using it would overflow the type.
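A small worked example of why 32767 is the safe choice. The volume factor is converted to Q15 fixed point by multiplying by 32767 (INT16_MAX), so even a factor of 1.0 still fits in an int16_t. The helper name volume_to_q15 and the clamping to [0.0, 1.0] are assumptions for illustration, not code from the lab.

```c
#include <stdint.h>

/* Hypothetical helper: convert a volume factor to Q15 fixed point.
 * Multiplying by 32767 (INT16_MAX) keeps 1.0 inside int16_t;
 * 1.0 * 32768 would not fit. */
int16_t volume_to_q15(float volume)
{
    if (volume < 0.0f) volume = 0.0f;   /* clamp to the valid range */
    if (volume > 1.0f) volume = 1.0f;
    return (int16_t)(volume * 32767.0f);
}
```

For example, volume_to_q15(0.75f) gives 24575 (0.75 × 32767 = 24575.25, truncated).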

Question 3: What does it mean to “duplicate” values in the next line?
It means copying the volume factor into every lane of the SIMD register: with 8 lanes of 16 bits each, the same value is stored 8 times.

Question 4: Why is #16 included in the str line but not in the ldr line?
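I did not want to advance the cursor in the ldr, because the str needs the same cursor position to store the scaled values back. The str then post-increments the cursor by 16 bytes (8 int16_t samples), ready for the next iteration.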

Question 5: What do these next three lines do?

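The 1st line lists the output operands. “+r” means each one is a general-purpose register that is both read and written.

The 2nd line declares the input operands.

The 3rd line declares the clobbers: “memory” tells the compiler that the asm reads or writes memory, so it must reload data from memory after the asm executes instead of relying on values cached in registers.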
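To tie these answers together, here is a hedged sketch of an in-place scaling loop in the same style. It is not the lab's vol_inline.c. The names scale_in_place and vol_q15 are hypothetical, and n is assumed to be a multiple of 8. It lets the compiler pick registers through constraints (the alternate approach from Question 1), duplicates the factor into all 8 lanes with DUP, loads without post-increment and stores with #16, and uses the output/input/clobber lists discussed above. On non-AArch64 machines it falls back to the same arithmetic in plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: scale n int16_t samples in place by a Q15 factor. */
void scale_in_place(int16_t *samples, size_t n, int16_t vol_q15)
{
    assert(n % 8 == 0);                 /* 8 lanes per 128-bit vector */
    int16_t *cursor = samples;
    int16_t *end = samples + n;
#if defined(__aarch64__)
    while (cursor < end) {
        __asm__ ("dup v1.8h, %w[vol]\n\t"          /* factor into all 8 lanes of v1 */
                 "ldr q0, [%[cur]]\n\t"             /* load 8 samples, cursor unchanged */
                 "sqdmulh v0.8h, v0.8h, v1.8h\n\t"  /* scale each lane */
                 "str q0, [%[cur]], #16"            /* store back, then cursor += 16 bytes */
                 : [cur] "+r"(cursor)                /* output: read/write pointer */
                 : [vol] "r"(vol_q15)                /* input: volume factor */
                 : "v0", "v1", "memory");           /* clobbers */
    }
#else
    /* Portable fallback: same per-lane arithmetic as SQDMULH. */
    for (; cursor < end; cursor++) {
        int64_t high = (2 * (int64_t)*cursor * vol_q15) >> 16;
        *cursor = (int16_t)(high > INT16_MAX ? INT16_MAX : high);
    }
#endif
}
```

The DUP is repeated inside every asm statement on purpose. The compiler does not promise that v1 survives between separate asm statements, so each statement sets up everything it uses.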

Question 6: Are the results usable? Are they correct?
It does not return the same number, so I would say no.
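One way to see how far apart the results are is to compare, for every possible int16_t sample, a floating-point reference against the fixed-point SQDMULH-style result. This is a hedged sketch. It assumes the reference multiplies by 0.75 in float and truncates, and that the SIMD versions use the Q15 factor 24575. The function name max_scaling_error is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical check: largest absolute difference over all int16_t
 * samples between (int16_t)(0.75f * s) and the SQDMULH-style result. */
int max_scaling_error(void)
{
    const int16_t vol_q15 = 24575;   /* 0.75 * 32767, truncated */
    int worst = 0;
    for (int32_t s = INT16_MIN; s <= INT16_MAX; s++) {
        int reference = (int16_t)(0.75f * (float)s);       /* truncates toward zero */
        int64_t high = (2 * (int64_t)s * vol_q15) >> 16;   /* floors */
        int fixed = (int)(high > INT16_MAX ? INT16_MAX : high);
        int diff = abs(reference - fixed);
        if (diff > worst) {
            worst = diff;
        }
    }
    return worst;
}
```

Under these assumptions the two methods differ by at most one unit per sample, from the slightly smaller factor and the different rounding directions, but those small differences are enough to change a printed sum.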

Part 3: C Intrinsics

For this part, I needed to run the vol_intrinsics program, and after running it with the time command, this is how long it took.

As with the inline assembler version, increasing SAMPLES increases the runtime, and lowering it decreases the runtime.

Question 1: What do these intrinsic functions do?
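vst1q_s16 stores a single vector of 8 int16_t values to memory.
vqdmulhq_s16 multiplies 2 vectors lane by lane, doubling each product and keeping the saturated high 16 bits (the intrinsic form of SQDMULH).
vdupq_n_s16 sets all vector lanes to the same value.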
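Here is a hedged sketch of how these intrinsics fit together in a scaling loop, together with vld1q_s16, the matching load intrinsic. It is not the lab's vol_intrinsics.c. The names scale_intrinsics and vol_q15 are hypothetical, and n is assumed to be a multiple of 8. It builds only the NEON path on AArch64 and otherwise falls back to the same arithmetic in plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical sketch: out[i] = SQDMULH(in[i], vol_q15) for n samples. */
void scale_intrinsics(const int16_t *in, int16_t *out, size_t n, int16_t vol_q15)
{
    assert(n % 8 == 0);
#if defined(__aarch64__)
    int16x8_t vol = vdupq_n_s16(vol_q15);              /* all 8 lanes = vol_q15 */
    for (size_t i = 0; i < n; i += 8) {                /* 8 int16_t lanes per vector */
        int16x8_t samples = vld1q_s16(in + i);         /* load 8 samples */
        vst1q_s16(out + i, vqdmulhq_s16(samples, vol)); /* scale and store 8 */
    }
#else
    for (size_t i = 0; i < n; i++) {
        int64_t high = (2 * (int64_t)in[i] * vol_q15) >> 16;
        out[i] = (int16_t)(high > INT16_MAX ? INT16_MAX : high);
    }
#endif
}
```

Here the vdupq_n_s16 call is hoisted out of the loop because the factor never changes. Unlike registers in separate inline asm statements, an int16x8_t variable is tracked by the compiler, so this is safe.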

Question 2: Why is the increment below 8 instead of 16 or some other value?
We are using int16_t, so a 128-bit vector register has 8 lanes (128 / 16 = 8). We increment by 8 to get to the next 8 values.

Question 3: Why is this line not needed in the inline assembler version of this program?
Because the vectors are set up as 8 lanes and advancing the cursor is handled inside the inline assembler, while with intrinsics it is not done for us, so the C code has to do it.

Question 4: Are the results usable? Are they correct?
It does not return the same result.

This lab took longer than I expected, so I may come back and update this post if I need to. Until then, see you in my next post.
