Bitmessage is an encrypted, peer-to-peer messaging protocol. It provides strong privacy guarantees by encrypting each message with the recipient’s public key and then distributing it through the network without a recipient address. Every node receives every message and tries to decrypt it with its own private keys. If a message cannot be decrypted, it wasn’t intended for that node, but no one else can tell, and that is how receiver anonymity is achieved. The protocol also employs proof of work to prevent spam. With all this overhead and latency, Bitmessage is closer to email than to instant messaging.

The reference implementation of Bitmessage is called PyBitmessage. It stores its data in an SQLite file called messages.dat, which may be found in a number of places including $BITMESSAGE_HOME, $XDG_CONFIG_HOME/PyBitmessage/ or $HOME/PyBitmessage/ (PyBitmessage searches these paths in this order at startup).
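
A small helper can walk those locations in the same order. This is only a sketch of the search order described above, not PyBitmessage's own lookup code, and the function name find_messages_dat is made up for illustration.

```shell
# Hypothetical helper: print the path of the first messages.dat found,
# checking the locations in the order described above.
find_messages_dat() {
  for dir in "${BITMESSAGE_HOME:-}" \
             "${XDG_CONFIG_HOME:+$XDG_CONFIG_HOME/PyBitmessage}" \
             "${HOME:+$HOME/PyBitmessage}"; do
    # Skip candidates whose environment variable is unset or empty.
    [ -n "$dir" ] || continue
    if [ -f "$dir/messages.dat" ]; then
      # Strip a trailing slash so the printed path has no "//".
      printf '%s\n' "${dir%/}/messages.dat"
      return 0
    fi
  done
  return 1
}
```

It returns a non-zero status when nothing is found, so it can be used as db=$(find_messages_dat) || exit 1.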

† sqlite3 messages.dat '.schema'
CREATE TABLE inbox (msgid blob, toaddress text, fromaddress text, subject text, received text, message text, folder text, encodingtype int, read bool, sighash blob, UNIQUE(msgid) ON CONFLICT REPLACE);
CREATE TABLE sent (msgid blob, toaddress text, toripe blob, fromaddress text, subject text, message text, ackdata blob, senttime integer, lastactiontime integer, sleeptill integer, status text, retrynumber integer, folder text, encodingtype int, ttl int);
CREATE TABLE subscriptions (label text, address text, enabled bool);
CREATE TABLE addressbook (label text, address text);
CREATE TABLE blacklist (label text, address text, enabled bool);
CREATE TABLE whitelist (label text, address text, enabled bool);
CREATE TABLE pubkeys (address text, addressversion int, transmitdata blob, time int, usedpersonally text, UNIQUE(address) ON CONFLICT REPLACE);
CREATE TABLE inventory (hash blob, objecttype int, streamnumber int, payload blob, expirestime integer, tag blob, UNIQUE(hash) ON CONFLICT REPLACE);
CREATE TABLE settings (key blob, value blob, UNIQUE(key) ON CONFLICT REPLACE);
CREATE TABLE objectprocessorqueue (objecttype int, data blob, UNIQUE(objecttype, data) ON CONFLICT REPLACE);

Yes, many of these fields are binary blobs, including msgid, the column that serves as the unique key. This seems like a stupid design decision but we can deal with it. SQLite includes a function called hex() that turns these fields into printable ASCII.

† sqlite3 messages.dat 'select hex(msgid) from inbox;' | head

More importantly we can invert this process to do lookups on the database by adding where hex(msgid) = to our queries. Note that hex() returns uppercase digits, so the value being compared must be uppercase too.

† sqlite3 messages.dat "select subject from inbox where hex(msgid)='0002AD7D1668A45E3893EB57F5D8B445CEA8459B1BDEF301ED8DEFC08C8C6238';"
Re: Be advised that Protonmail's web client is only open source. Read the link below with the admonition,
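
An alternative to wrapping the column in hex() is to compare it against an SQLite blob literal, written x'...'. The sketch below only builds the query string. The function name subject_query is made up for illustration, and it assumes msgid values really are stored as blobs rather than text.

```shell
# Hypothetical helper: build a subject lookup that compares msgid against a
# blob literal instead of calling hex() on every row.
subject_query() {
  # Accept only hex digits so the value is safe to splice into SQL.
  case "$1" in
    ''|*[!0-9A-Fa-f]*) echo "not a hex msgid: $1" >&2; return 1 ;;
  esac
  # A blob literal needs an even number of hex digits.
  if [ $(( ${#1} % 2 )) -ne 0 ]; then
    echo "odd number of hex digits: $1" >&2
    return 1
  fi
  printf "select subject from inbox where msgid = x'%s';\n" "$1"
}

# Usage (assumes messages.dat is in the current directory):
#   sqlite3 messages.dat "$(subject_query "$msgid")"
```

Because the comparison is on the raw column, SQLite should be able to use the index behind UNIQUE(msgid), and the hex check means a malformed id is rejected before it reaches the database.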

To extract the data, I want the message contents in one file, the metadata in another, and any embedded files extracted too.

Metadata can be extracted into shell variables like this

row=$(sqlite3 messages.dat 'select toaddress, fromaddress, received, folder, encodingtype, read, hex(sighash) from inbox where hex(msgid)='"'$msgid'"';')

toaddress=$(echo "$row" | awk -F'|' '{print $1}')
fromaddress=$(echo "$row" | awk -F'|' '{print $2}')
received=$(echo "$row" | awk -F'|' '{print $3}')
folder=$(echo "$row" | awk -F'|' '{print $4}')
encodingtype=$(echo "$row" | awk -F'|' '{print $5}')
wasread=$(echo "$row" | awk -F'|' '{print $6}')
sighash=$(echo "$row" | awk -F'|' '{print $7}')
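
The same split can be done without spawning awk seven times by letting the shell's read builtin do the field splitting. This is just a sketch of that alternative. The function name parse_inbox_row is made up, and like the awk version it assumes none of the selected fields contains a | character.

```shell
# Hypothetical helper: split one pipe-delimited row into the same variables
# as above using the shell's own field splitting.
parse_inbox_row() {
  # IFS is changed only for this one read; -r keeps backslashes literal.
  IFS='|' read -r toaddress fromaddress received folder encodingtype wasread sighash <<EOF
$1
EOF
}

# Usage: parse_inbox_row "$row"
```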

The subject and contents can be extracted like this

mkdir -p output
sqlite3 messages.dat 'select subject from inbox where hex(msgid)='"'$msgid'"';' > "output/${msgid}.message"
sqlite3 messages.dat 'select message from inbox where hex(msgid)='"'$msgid'"';' >> "output/${msgid}.message"

I decided to use the msgid for the filenames instead of the subject because the latter could contain anything and escaping all the potential badness does not sound like fun.

Even though messages are text only, the PyBitmessage client supports displaying files embedded using HTML. Bitmessage does seem successful in providing some level of sender anonymity, and its design makes it impossible for the network to censor messages based on content, so you may want to be careful when extracting any images.

Since we’ve already extracted all messages into text files, we can leverage that to pick out the ones with embedded files without going through the database:

grep -l '<img' output/*.message > files

while read -r file; do
  : # the per-message steps below run here, once for each "$file"
done < files

Some users will use mail clients to send HTML messages through their Bitmessage daemon and some mail clients will wrap messages into 80-column lines. An easy way to undo this formatting is to remove all whitespace from the message.

# tr is used because sed works line by line and would leave the newlines in place
tr -d '[:space:]' < "$file" > flattened

Each message may have multiple embedded files so we can extract them like this

grep -oP 'data:image/[^;]+;base64,[a-zA-Z0-9+/=]+' flattened > imgs

It is then just a matter of extracting the base64 encoded data from the html tags and decoding it.

msgid=$(echo "$file" | cut -d/ -f2 | cut -d. -f1)
i=0
while read -r j; do
  ext=$(echo "$j" | grep -oP 'image/[^;]+' | cut -d/ -f2)
  # naming scheme is an assumption: <msgid>_<counter>.<ext>
  filename="${msgid}_${i}.${ext}"
  echo "$j" | cut -d, -f2 | base64 -d > "output/$filename"
  echo "output/$filename"
  i=$((i + 1))
done < imgs
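
The flattening, matching and decoding steps above can also be rolled into one function that needs no temporary files. Treat this as a sketch rather than a finished tool: the function name extract_images and the <msgid>_<n>.<ext> naming scheme are my own assumptions, and it relies on a base64 that accepts -d (as GNU coreutils does). Since the image subtype comes from untrusted message content, anything that is not purely alphanumeric is replaced with bin before it ends up in a filename.

```shell
# Hypothetical helper: decode every embedded image in one message file into
# outdir, naming files <msgid>_<n>.<ext> (the naming scheme is an assumption).
extract_images() {
  file=$1
  outdir=$2
  msgid=$(basename "$file" .message)
  i=0
  # Remove all whitespace, newlines included, so wrapped base64 is contiguous.
  tr -d '[:space:]' < "$file" |
    grep -oE 'data:image/[^;]+;base64,[a-zA-Z0-9+/=]+' |
    while read -r uri; do
      ext=${uri#data:image/}   # drop the leading "data:image/"
      ext=${ext%%;*}           # keep only the subtype, e.g. png
      # The subtype is untrusted input: fall back to "bin" if it is odd.
      case "$ext" in
        ''|*[!A-Za-z0-9]*) ext=bin ;;
      esac
      # Everything after the first comma is the base64 payload.
      printf '%s' "${uri#*,}" | base64 -d > "$outdir/${msgid}_$i.$ext"
      printf '%s\n' "$outdir/${msgid}_$i.$ext"
      i=$((i + 1))
    done
}

# Usage: extract_images "output/$msgid.message" output
```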

There is nothing extra to know about unpacking the sent table, except that it has some additional fields such as ackdata and senttime.