Assembly

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

How would I get started in Assembly reverse engineering? debugging things and finding out how they work and stuff? Where'd you start aluigi?

aluigi · **Posted:** 10 May 2008 13:11

Personally, I started by reading a bit about the basic assembly instructions and the "Intel syntax" (the default on Windows; the other is AT&T syntax, which is generally the default on *nix), just enough to learn some theory.
Then I started using a disassembler and, above all, a debugger, where you can see all the "theory" in action.
For example, I have never written an assembly program because I was never interested in doing so; I was only interested in reading and understanding the assembly for the portions of code I needed to know.

So, as a first step, look for a quick and simple introduction to assembly in your native language. The rest will come from practice and from the "Intel 64 and IA-32 Architectures Software Developer's Manual", which documents all the available assembly instructions in detail (except vendor-specific ones that are not Intel's, such as 3DNow!).

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

I see, wow, thanks a lot aluigi. So you pretty much just get a general understanding of assembly language and then get into the x86 instruction set and learn what each instruction does? Is there a specific website that has some really useful references?

So you normally just disassemble things, find key instructions in the assembly and determine what the program is doing? Then rewrite it in C?

aluigi · **Posted:** 10 May 2008 18:33

In reversing, the only thing you must do is understand what happens, that is, what the reversed program does.
Then with a bit of practice you will recognize the groups of instructions which are usually used to do something, for example rep movsd/movsb for memcpy and memmove, repne scasb (to find the length of the string) before the previous instructions for strcpy, and so on.

Translating the assembly into a high-level language such as C is done to understand the code better and (as in my case) to produce working code which emulates the assembly part you reversed.

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

Cool, I understand. Would making a small C program, then disassembling it and viewing it in ASM, give me a better understanding of reverse engineering? Could that be a good technique?

Oh yeah, last question... What disassembler do you use? Thanks for all your help man.

aluigi · **Posted:** 12 May 2008 12:17

Yes, writing a small C example program and reversing it is a very good way to start.

The best disassembler is IDA, without a doubt, but I like to use the one included in OllyDbg, or W32Dasm (an old and buggy disassembler, but very easy to use).

ratsoul · **Joined:** 14 Aug 2007 11:17 **Posts:** 24

Speedy if you want to try IDA, there is an IDA freeware version... search on google.

Sethioz · **Posted:** 12 May 2008 15:39

http://www.hex-rays.com/idapro/idadownfreeware.htm
this i think ...

ratsoul · **Joined:** 14 Aug 2007 11:17 **Posts:** 24

yep ... thx Sethioz... I'm lazy ;)

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

Alrighty. I was wondering if you need a lot of math skills to do reverse engineering. I know that Luigi is good with algorithms and such. Do you use a lot of math skills for it? Or just basic math? Thanks a lot guys!

AnonymousCoward · **Joined:** 24 Feb 2008 08:31 **Posts:** 10

All you really need to know is binary math (bitwise operators, bit shifting, etc.). Now if you are trying to reverse engineer more complex algorithms (such as encryption/decryption), higher math skills are useful for simplifying the analysis, but not necessarily required (as long as you know binary math, assembly opcodes, and general programming concepts). And usually you can Google search some of the "magic" numbers (the ones that look random) used in assembly to figure out which encryption algorithm is being used.

aluigi · **Posted:** 13 May 2008 11:59

To detect the presence of known algorithms it's also possible to use signature-based scanners. I created signsrch just for this job and it works very well:

http://aluigi.org/mytoolz.htm#signsrch

ratsoul has also created a plugin for immunity debugger which does the same:

http://www.autistici.org/ratsoul/iss.html

Luckily the best-known and most used algorithms rely on tables and constants, so identifying them is trivial, but there are also less-known algorithms which can't be identified through signatures.

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

Thanks a lot man!

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

I've got another question... When you guys are debugging, do you normally search for key strings and then look for opcodes? Or does it depend on what you're trying to do?

I had another question that I've always been trying to figure out. What are all those numbers like this: 0x8800000... Is that like a memory address or something? So confusing. Is that how code interacts with hardware or something? I don't really know if I'm right or wrong.

OK, well, I just did a little bit of research and it looks like they're memory addresses. But now I need to know if they interact with hardware and stuff, because I've seen them in many C sources and always got confused about what they really do.

Sethioz · **Posted:** 14 May 2008 21:28

As far as I know, it is a memory address. It is how a program maps itself, or however to put it: like a memory map, so the program knows where everything is. This is also why most error messages include the offset where the error occurred.
For example, addresses start with:
0x0000
0x0001 ..etc.
http://en.wikipedia.org/wiki/Memory_address
http://en.wikipedia.org/wiki/Offset_%28 ... science%29
Read those. They should give you an exact picture of it.

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

Ok I understand that now, thanks! I need to know what an offset and segments are really for. What do they do exactly?

aluigi · **Posted:** 15 May 2008 10:34

The classic segment:offset notation was used in the past with 16-bit code to get past the 64 KB limit on memory.
Nowadays the only things you need to consider are the offsets, which are the virtual addresses of the data in memory.
So when you launch an executable it is loaded in memory at a certain offset (the base address). To understand this better, take a look at the M button in OllyDbg, which shows all the mapped memory of the process.

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

wow thanks again aluigi. When you explain things, I understand it very well. I really appreciate all your guys helps here.

Ok so I've been wondering... Inside C, people make all these libraries and create some really exceptionally advanced functions like "sockets"(which uses a network card) or other functions that have interactions with hardware. How do they do this?

Are they using memory addresses. Like a certain memory address that would hold information coming in from the network card, such as "TCP/UDP Packets"? I would really love to know this bit, I find it really interesting.

I assumed that a hardware device such as a network card/graphics card/etc would receive information and send it into memory, and then your processor gets the memory from the memory address, then into a program, and it interacts from program to hardware. Is that how it works? I may be wrong.

The main thing I'd like to know is how they would go from a few instructions in C to all of that(Using sockets functions from libraries or reading text files). Is it mainly pointers that make the interactions with memory? IDK, but this seems to be a pretty cool thing I'd like to know. I've done some research, and no website has explained it. I think one of you guys would know the answer to this question. :)

I have read the whole basic C reference guide, without the libraries, so just the core language. In all of that, I can't find anything besides pointers that would interact with memory. The rest are just simple statements and functions.

Sorry for the long read!

Thanks again!

aluigi · **Posted:** 15 May 2008 20:48

Speedy wrote:

Ok so I've been wondering... Inside C, people make all these libraries and create some really exceptionally advanced functions like "sockets" or interactions with hardware. How do they do this?

Uhmm, libraries are collections of ready-made functions that you can add to your programs easily, while "sockets" in C usually refers to network sockets, which in short are the identifiers of connections.
When you want to listen on a port you must create a socket, and you must also create one to make an outgoing connection, and so on.
If you want to look at sockets implemented in C, take a look at one of my proof-of-concepts for network vulnerabilities; some of them are very simple to understand.

Quote:

Are they using memory addresses. Like a certain memory address would hold information coming in from the network card, such as "TCP/UDP Packets"? I would really love to know this bit, I find it really interesting.

Access to a device and access to the network functions are on two different levels. The first is very low-level and requires specific instructions (this is not my field, so I won't go deeper to avoid saying something wrong), while for the second it is enough to use the needed functions/API (on Windows, sockets are handled by a specific API called Winsock; that's why there is always a WSAStartup call in my code).
Another example is packet sniffing, which on Windows is handled by the WinPcap driver: it interacts at a low level with the Windows network layer and gives you an easy-to-use API for programming it.

Quote:

I assumed that a hardware device such as a network card/graphics card/etc would receive information and send it into memory, and then your processor gets the memory address from a program, and then it interacts.

Usually the rule is that you work with the API and the API does the raw work; that's why graphics are handled through OpenGL and/or DirectX.

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

Thanks again, but I don't think I explained it properly. I know what sockets are for.

Edit: OK wait, I think I know where I went wrong. Are the library files with the extension .a/.lib different from the include files with the extension .h?

Just wondering, luigi... Do you know all the codes & statements that are able to be read by the G++.exe, C compiler? Is it on some website? Is it that website you gave me?

aluigi · **Posted:** 15 May 2008 22:12

In short, the .a and .lib files are functions (like a .c file) that have already been compiled, while the .h files usually contain only the function prototypes and some #defines.

So when you need to use a library you need the .h, which describes the format of the functions you will call, and the .a/.lib, which contains the compiled code of those functions.

By "codes and statements" I think you mean the syntax and the preprocessor directives, right?
I mean #define, #include, typedef, #pragma, char, int, long and so on?
This information is available in the C guides.

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

sorry about that guys, libws2_32.a. Instead of using header files, is there a way I can get the function prototypes from that library and possibly get a better understanding?

Ok, I think Luigi was right. I have googled function prototypes for a specific library and I got an API documentation of it.

I think that's all I need for right now, thanks a lot guys!

aluigi · **Posted:** 15 May 2008 22:22

It's not possible to get the prototypes from precompiled libraries because the prototypes are not stored in them. In some cases, though, the symbols use mangled names, which encode many details of the function:

http://en.wikipedia.org/wiki/Name_mangling

If name mangling is not available, you need to recover the prototype of the function by disassembling and/or reversing it.

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

Yes, I just opened it up in IDA Pro. I'm kind of new to reverse engineering so IDK what to do, but I understood a few things.

I found something interesting online, I was looking for something in relation to the WS2_32.lib and I got this:
http://msdn.microsoft.com/en-us/library/ms740506(VS.85).aspx

Do you think it has to do with the function prototypes the library provides?

I'll look into this name mangling. Looks interesting

aluigi · **Posted:** 15 May 2008 23:21

Yes, socket() is one of the functions contained in *ws2_32.*; it creates sockets.
Anyway, strictly speaking it's not correct to say that it "contains" them, because it's only an interface to the functions in ws2_32.dll. That's the difference between static and shared libraries.
Static libraries let you include all the functions without an external dependency (like a DLL), while shared libraries do the opposite.
Static linking has the advantage that everything is in the executable, which can run on any PC without additional files; the disadvantages are a bigger executable and the need to rebuild the program whenever the library is updated.
Shared libraries give you a smaller executable, but it depends on the DLL, and in some cases the DLL must be the same version of the library used by whoever compiled the program, since the function prototypes could have changed.

Speedy · **Joined:** 10 May 2008 01:24 **Posts:** 12

alright, thanks a lot luigi

djeZo · **Joined:** 22 Dec 2007 15:57 **Posts:** 10

Speedy wrote:

wow thanks again aluigi. When you explain things, I understand it very well. I really appreciate all your guys helps here.

Ok so I've been wondering... Inside C, people make all these libraries and create some really exceptionally advanced functions like "sockets"(which uses a network card) or other functions that have interactions with hardware. How do they do this?

Are they using memory addresses. Like a certain memory address that would hold information coming in from the network card, such as "TCP/UDP Packets"? I would really love to know this bit, I find it really interesting.

I assumed that a hardware device such as a network card/graphics card/etc would receive information and send it into memory, and then your processor gets the memory from the memory address, then into a program, and it interacts from program to hardware. Is that how it works? I may be wrong.

You have to understand one very important thing: x86 CPUs have privilege levels called rings. Ever heard of ring0 and ring3? Current operating systems are therefore split into two parts: kernel mode, which runs in ring0, and user mode, which runs in ring3. Code executing in ring0 can use additional, privileged instructions that are not available (or fault) in ring3, for example the ones that control interrupts or talk to hardware directly. So you can try whatever you want, but from ring3 you won't be able to communicate with hardware directly; it all goes through kernel mode. That's why OSes have so-called "drivers", which take care of the low-level ring0 hardware communication.

In the case you are asking about: when you want to send a packet and you use the send() function (on a Windows NT system), the path is as follows. send() is located at a certain offset in ws2_32.dll (this is user mode, or ring3) and calls other ring3 functions that check various things. Several DLLs are involved, but at the end there is a switch to kernel mode. It looks something like this (this is just some arbitrary function):

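in ntdll.dll:
first move the system call number into eax
MOV EAX,0AD
then move the address that holds the pointer to the system call stub into edx
MOV EDX,7FFE0300
and call through the pointer stored at that address
CALL DWORD PTR DS:[EDX]

when we follow that call, we see the following:
MOV EDX,ESP
SYSENTER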

SYSENTER switches from ring3 to ring0. From there on, a user-mode debugger such as OllyDbg can't follow the code anymore, and the kernel takes over.

BTW: I started with assembly a few months back. First I read the article "Smashing the Stack for Fun and Profit". Then I got a lot of help from aluigi on exploiting a certain buffer overflow. While he was helping me, it clicked, and everything became logical and understandable.

Luigi Auriemma
