Sqlite Data Carving -A way to trace

Spread The Knowledge 😌

1.0 Introduction:

SQL or Structured Query language is the language used to communicate with relational databases. What are relational databases? Well, most of the popular database systems you may know, such as MS Access, MySQL or SQLite, are all relational. That is, they all use a relational model, which, it turns out, can be described much like a spreadsheet:

  • Data are organized into tables (relations) that represent a collection of similar objects (e.g. contributors).
  • The columns of the table represent the attributes that members of the collection share (last name, home address, amount of contribution).
  • Each row in the table represents an individual member of the collection (one contributor).
  • And the values in the row represent the attributes of that individual (Smith, 1228 Laurel St., $250).

Much of the power of a relational database lies in the ability to query these relations, both within a table (give me all contributors who donated at least $500 and who live in Wyoming) and among tables (from the contributors, judges and litigants tables, give me all contributors who donated at least $1000 to Judge Crawford and who also had legal cases over which Judge Crawford presided). SQL is the powerful and rather minimalist language we use to ask such questions of our data in a relational database. How minimalist is SQL? The basic vocabulary for querying data comes down to a few main verbs:

SELECT

INSERT

UPDATE

DELETE

To create and change the structure of tables in the database, there are a few other verbs to use:

CREATE

DROP

ALTER

As stated above one can perform all relational data base activities on SQLite databases as well but there is one huge difference between other database engine and SQLite databases.

Difference between SQlite Database and Other database engines:

SQLiteDatabase
Light weighted databaseHeavy database, occupy more space
Require less time in query processingRequire more time in query processing
no external dependencies.External dependencies exist
Rollback is not possibleRollback is possible
User activity tracing is not possibleActivity tracing is possible

Data recovery in SQLite browser differs from all other databases like MSSQL or Mysql or oracle. So before we proceed need to understand SQLite database internal architecture:

1.1 File System in SQLite database file:

Internally an Sqlite file comprise of Pages ,table and index and most important tree

1.1.1 Pages:

  • SQLite database is made up of no of fix pages,each page has fixed size like 1024 ,2048,4096,8192,16384,32768 or 65536 bytes.
  • The first 100 bytes of the first page contain a header the remainder of this page follows the same structure as every other page (although the ‘data’ area is reduced by 100 bytes).
  • A database consist of multiple tables/indices , each table or idnex uses one page, so no page can contain more than one table/index.
  • Each table or index is stored as a B-Tree (Balanced tree) with each B-Tree occupying one or more pages.
  • There is a master table (sqlite_master) that resides (has its root) in the first page of the database and may overflow into additional pages. The master table tells us where the root of every user  created table and index can be found.

1.1.2 Internal tree and leaf pages:

SQLite table with in B-tree(Balance tree) consist of either internal pages or leaf pages.

1. Internal Pages:

Internal pages contains pointer to other internal pages.

2. Leaf pages:

Leaf page contain data for table

  • Larger table or index contains root page which points to internal pages ,and internal pages contains pointer to other internal pages or leaf pages. For small tables leaf page can double itself as leaf pages. We can say root page is a internal page without parent.

1

When ever any table is modified(data inserted,updated or deleted) , table can shrink in size or can grow bigger and records can move between pages.

“A list of pages that are not currently in use is held in the freelist, the freelist itself is stored as one or more pages.”

1.2 Page Types:

There are three main types of pages

  • Leaf
  • Internal
  • Overflow
  • leaf and internal pages for both tables and indexes and overflow pages for entries that cannot fit into a single page (Binary Large Objects (BLOBs) such as pictures).
  • The pages which starts with 12 byte header are Internal pages and the pages which starts with 8 byte header are known as leaf page.

1.2.4  SQLite Page Types:

  • The lockbyte page
  • A Freelist page
    • A Freelist trunk page
    • A Freelist leaf page
  • A Btree page
    • A table B-Tree interior page
    • A table B-Tree leaf page
    • An index B-Tree interior page
    • An index B-Tree leaf page
  • A payload overflow page
  • A pointer map page

1.2.5 SQLite page header first 100 bytes entails:

Offset SizeDescription
016The header string: “SQLite format 3\000”
162The database page size in bytes. Must be a power of two between 512 and 32768 inclusive, or the value 1 representing a page size of 65536
181File format write version. 1 for legacy; 2 for WAL
191File format read version. 1 for legacy; 2 for WAL.
201Bytes of unused “reserved” space at the end of each page. Usually 0
211Maximum embedded payload fraction. Must be 64.
221Minimum embedded payload fraction. Must be 32.
231Leaf payload fraction. Must be 32.
244File change counter
284Size of the database file in pages. The “inheader database size”.
324Page number of the first freelist trunk page.
364Total number of freelist pages.
404The schema cookie.
444The schema format number. Supported schema formats are 1, 2, 3, and 4.
484Default page cache size
524The page number of the largest root btree Page when in auto vacuum or incremental vacuum modes, or zero otherwise.
564The database text encoding. A value of 1 means UTF8 A value of 2 means UTF16 i.e A value of 3 means UTF16 be.
604The “user version” as read and set by the user_version pragma.
644True (nonzero) for incremental vacuum mode. False (zero) otherwise.
684The “Application ID” set by PRAGMA application_id.
7220Reserved for expansion. Must be zero.
924The version valid for number.
964SQLITE version

1.2.6 B-Tree Pages:

This B-Tree pages works on B-Tree algorithm,this algo. provides Key/Data storage with Unique and ordered key on page oriented storage device, which is made by knuth.

  • The algorithm that Knuth calls “B*Tree” stores all data in the leaves of the tree. SQLite calls this variety of BTree a “Table B-tree“.
  • The algorithm which is known as Knuth “B-Tree” stores both the key and the data together in both leaves and in interior pages.
  • In the SQLite implementation, the original BTree algorithm stores keys only, excluding the data entirely, and is called an “Index Btree“.
  • A Btree page is either an interior page or a leaf page. A leaf page contains keys .
  • In the case of a “Table B-Tree” each key has associated data. An interior page contains K keys together with K+1 pointers to child btree pages. A “pointer” in an interior Btree page is just the 31bit integer page number of the child page.
  • A B-Tree page is either a table btree page or an index btree page. All pages within each complete btree are of the same type: either table or index. There is a one table btrees in the database file for each rowid table in the database schema, including system tables such as sqlite_master. There is one index btrees in the database file for each index in the schema, including implied indexes created by uniqueness constraints.

“WITHOUT ROWID” table uses index Btree so there is one index btree table in   the database for each WITHOUT ROWID table

2

 

1.2.7 B-Tree Header

The b-tree page header is formatted as follows:

  1. Byte 0: A flag indicating the type of b-tree page: 0x02 indicates interior index page, 0x05 indicates interior table page, 0x0A indicates a leaf index page, 0x0D indicates a leaf table page.
  2. Bytes 1-2: Byte offset into first freeblock.
  3. Bytes 3-4: Number of cells on page.
  4. Bytes 5-6: Offset into first byte of the cell content area.
  5. Byte 7: Number of fragmented free bytes within cell area content
  6. Bytes 8-11: Right most pointer (interior b-tree pages only)

2.0 Logic:

Step 1: Open File in rb mode.

Store it in a variable string f=read(‘abc.sqlite’ , rb)

Step 2 : Find File Size by counting length of all bytes. File length is 10,485,760 bytes.

store this value in a variable “filesize”

Step 3: Go to the starting position of file i.e Offset 0.

Step 4: Read from offset 0 and go to offset 15 .These 16 byte contains file signature of SQLite file.which further will be used for file verification.

1

Fig 1:File Signature

Step 5: Verify file file signature whether it is SQLITE file or not.

Step 6: As previously stated that each and every page in SQLite are of equal size, so to find page size we have to read offset 16-17 and convert value into decimal into form.

1

Fig 2:Page Size

as shown figure above offset 16 and 17 has hex value “0x8000” if we convert it into decimal then value would be 32768. Store this value into variable pagesize.

Step 7: Initialize a variable “offset” initialize it with value zero(0).

         offset=0

Step8: Move again to the beginning of page offset 0.

         move=0

Step 9: Now initialize a loop start reading bytes from offset zero untill offset size is less than file-size.

while(offset<pagesize)

{

       1. Read first byte(byte zero) and store its value in a variable called flag.

2. Comapare flag value whether it is 13 or not .(value 13 means current page is leaf table B-tree ,and as mentioned above only leaf table B-tree contains data).

if(flag==13)

13. If condition if (flag==13) is true then read next 7 consecutive bytes as already mentioned in B-tree structure in section 1.2.7 each set of bytes has some meaning

I. Read first byte at offset 1-2 and convert it to ASCII and store value in variable “first_fb_offset”.

a). This value locates offset number of first free block offset.

1Fig 3: Free Block Offset

Here value is zero it means no free block is available.
So first_fb_offset=0

II. Read offset 3-4 which gives number of cells on present on the page store this value in variable num_of_cells variable. And also convert it into ASCII

1Fig4: Number of cells present

Here 0x024A so value is 586(number of cells present)

num_of_cells=586

III. Byte 5-6 shows offset number of first cell ,convert this value to ASCII and store in variable first_cell_offset_num.

1Fig5:Offset number of first cell

Here value of byte 5-6 is 0x5061 the offset number would be 20577. If this value is zero then offset value would be 65536

first_cell_offset_num=20577

IV. Byte 7 explains number of free bytes ,convert this value in ASCII and store in num_free_bytes variable .

1fig 6:Free bytes

num_free_bytes=0

Here byte 7 shows zero so there is no free bytes are available

4. Now we have parsed first B-tree header now to extract deleted data we have to identify following .

a). Un-allocated space

b). Length of Un-allocated space

c). Size of cell pointer array

To understand concept more precisely we have to understand B-tree header layout

1

Fig 6:B-tree layout

I. Find the starting occupied area of B-tree page

a). Header(first 8 bytes) + Number of cells(num_of_cells)
here value of number of cells are counted as 586 and each cell is of 2 bytes so total size of all cells is: num_of_cells= 586*2=1172

b). Now we can calculate the starting occupied bytes of B-tree header starting_occ_bytes=1172+8=1180

starting_occ_Bytes is a variable which stores length of starting bytes of page

In our example below screen shots show cell content area after 8th byte of header. whole cell content area was not able to fit into image so i am putting a part of the image

So length of Un-allocated space is difference between first cell offset and starting occupied bytes i.e

Unallocated=first_cell_offset_num-starting_occ_bytes

[Unallocated = 20577-1180]

Unallocated_length=19397

Our current pointer is at end of cell header ie byte 8

1

Fig7:Cell Content Area

Note: Always remember that in SQLite browser page start filling data from bottom of the page

Now move control to the end cell pointer array i.e read next 1172 bytes from current location:

read( num_of_cells) or

read( 1172)

Previously we had already read fist 8 byte of header now after reading 19397 bytes we will           be at 66716 location

5. Now we are on ending of cell content area which is starting of un-allocated space,to extract data we just need to read whole area Read(Unallocated_length)

6. Remove Ascii character.

7. Although this is end of extraction but if in sqlite file page contain some free block then        we have to follow following steps if first_fb_offset variable doesn’t have value zero

Then we have to follow following steps.

I. Read fist free block location offset no anf move there

a). Move(irst_fb_offset)

II. Now we on free block free block consist of following structure

First 2 byte

(Offset number of next free block)

Subsequent 4 byte

(Size of current free block)

III. Read first two bytes and convert to decimal value its offset of next free block if it is zero there is no free block offset save this value to nxt_free_block-offset.

IV. Now calculate size of current free block by reading next four byes and store it       into variable free_block_size.

V. Read this free block from first offset of free block location to end of free block.

VI. Remove ascii characters and print the output

VII. offset = offset + pagesize

End of while

Note: when table B-tee header contains value 0x05 then it is interior table btree page some time this page also contains deleted so to extract data from this part i need more time to do research data

Leave a Reply