Quick start

This notebook shows how to use this SDK to upload a file, start its analysis, and get the analysis results.

Initialize

To initialize the SDK, you need a Secret ID and a Secret Key. If you don't have them yet, please apply to us for a pair.

The Secret ID and Secret Key are the only credentials for accessing the API, so keep them safe. We recommend reading your keys into environment variables instead of saving them in your code:

$ read BINARYAI_SECRET_ID
#(enter your secret id)
$ read BINARYAI_SECRET_KEY
#(enter your secret key)
$ export BINARYAI_SECRET_ID
$ export BINARYAI_SECRET_KEY

Once those environment variables are set, our SDK can read them directly.
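Before creating the client, it can help to confirm that both variables are actually set, so a missing credential is reported clearly instead of surfacing as an SDK error. The sketch below is a hypothetical helper (`missing_credentials` is not part of the SDK); it only assumes the two variable names from the shell example above.

```python
import os

# Variable names taken from the shell example above.
REQUIRED_VARS = ("BINARYAI_SECRET_ID", "BINARYAI_SECRET_KEY")


def missing_credentials(env=None):
    """Return the names of required credential variables that are unset or empty.

    Hypothetical helper for illustration; not part of the BinaryAI SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
```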

To initialize the SDK:

[1]:
# Uncomment to get more logs
# import logging
# import sys
# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logger = logging.getLogger("binaryai_sdk")

from binaryai import BinaryAI

bai = BinaryAI() # Initialize the client

Great! If no exception is raised, the client is initialized.

Upload and analyze a file

Note: an upload may be rejected if the file is too large or if uploads are made too frequently.
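If an upload is rejected because requests arrive too quickly, waiting and retrying is a common remedy. The sketch below is a hypothetical wrapper (`upload_with_retry` is not an SDK function) that retries with exponential backoff. Because the SDK's exception types are not documented here, it assumes any `Exception` may be transient; retrying will not help when the file itself is too large.

```python
import time


def upload_with_retry(upload, path, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call upload(path), retrying with exponential backoff on failure.

    Hypothetical helper: pass the client's upload method, e.g. bai.upload.
    Any Exception triggers a retry because the SDK's error types are not assumed.
    """
    for attempt in range(attempts):
        try:
            return upload(path)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))


# Usage with the client from the previous cell (not run here):
# sha256 = upload_with_retry(bai.upload, "/bin/echo")
```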

Now you can upload a file by its path:

[2]:
# if the upload succeeds, the file hash is returned
sha256 = bai.upload("/bin/echo")

# wait until the analysis is done; timeout=-1 means wait indefinitely
bai.wait_until_analysis_done(sha256, timeout=-1)

print("analysis succeeded")
analysis succeeded
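The `timeout=-1` convention above can be illustrated with a generic polling loop: with `-1` the loop never gives up, while a positive timeout raises once the deadline passes. This is a hypothetical sketch of those semantics (`wait_until` is not an SDK function), not the SDK's actual implementation of `wait_until_analysis_done`.

```python
import time


def wait_until(predicate, timeout=-1, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True.

    timeout=-1 waits indefinitely; otherwise raise TimeoutError after
    `timeout` seconds. Hypothetical helper, not the SDK's implementation.
    """
    deadline = None if timeout == -1 else clock() + timeout
    while not predicate():
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)
    return True
```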

Get analysis results

Each method takes a file's hash and returns the corresponding analysis result:

[3]:
bai.get_overview(sha256)
[3]:
{'fileType': 'ELF64',
 'machine': 'AMD64',
 'platform': 'LINUX',
 'endian': 'LITTLE_ENDIAN',
 'loader': 'x86:LE:64:default',
 'entryPoint': 1059200,
 'baseAddress': 1048576}
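The overview reports addresses as decimal integers. Printing them in hexadecimal makes them easier to match against disassembler output such as the `FUN_00102020` name listed below. The sketch copies the two values from the output above and, assuming both are addresses in the same address space, computes the entry point's offset from the base address.

```python
# Values copied from the get_overview() output above.
overview = {"entryPoint": 1059200, "baseAddress": 1048576}

entry = overview["entryPoint"]
base = overview["baseAddress"]

print(f"base address: {base:#x}")          # base address: 0x100000
print(f"entry point:  {entry:#x}")         # entry point:  0x102980
print(f"entry offset: {entry - base:#x}")  # entry offset: 0x2980
```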
[4]:
funcs = bai.list_funcs(sha256)
for i, f in enumerate(funcs):
    print("[{}: {}]".format(i+1, f.name))
    if i > 10:
        break
[1: _DT_INIT]
[2: FUN_00102020]
[3: <EXTERNAL>::getenv]
[4: <EXTERNAL>::free]
[5: <EXTERNAL>::abort]
[6: <EXTERNAL>::__errno_location]
[7: <EXTERNAL>::strncmp]
[8: <EXTERNAL>::_exit]
[9: <EXTERNAL>::__fpending]
[10: <EXTERNAL>::textdomain]
[11: <EXTERNAL>::fclose]
[12: <EXTERNAL>::bindtextdomain]
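The loop above stops by counting manually. `itertools.islice` expresses "the first n items" directly and works whether `list_funcs` returns a list or a lazy iterator. The sketch assumes only that each item has a `.name` attribute, as in the loop above; `first_names` is a hypothetical helper.

```python
from itertools import islice


def first_names(funcs, n=12):
    """Return the names of the first n items of any iterable of function objects.

    Hypothetical helper; assumes each item has a .name attribute.
    """
    return [f.name for f in islice(funcs, n)]


# Usage with the client from earlier cells (not run here):
# for i, name in enumerate(first_names(bai.list_funcs(sha256)), start=1):
#     print(f"[{i}: {name}]")
```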

Alternatively, create a file object and call its methods:

[5]:
from binaryai import BinaryAIFile
# Both hashes identify the same file; either one can be used to look it up
sha256 = "289616b59a145e2033baddb8a8a9b5a8fb01bdbba1b8cf9acadcdd92e6cc0562"
md5 = "c3366c6b688a5b5fa4451fec09930e06"
bai_file = BinaryAIFile(bai, md5=md5)
for component in bai_file.get_sca_result():
    print(component.name)
    print("----")
reptile
----
tsh
----

You can also get a file's KHash, which can be used to measure how similar two files are:

[6]:
from binaryai import BinaryAIFile

fileA = BinaryAIFile(bai, md5="346136457e1eb6eca44a06bb55f93284").get_khash_info()
fileB = BinaryAIFile(bai, sha256="841de34799fc46bf4b926559e4e7a70e0cc386050963978d5081595e9a280ae1").get_khash_info()
fileC = BinaryAIFile(bai, sha256="9b53a3936c8c4202e418c37cbadeaef7cc7471f6a6522f6ead1a19b31831f4a1").get_khash_info()
# The second element of each KHash info must match before the hashes are compared
assert fileA[1] == fileB[1]
assert fileB[1] == fileC[1]

# Similarity = 1 - normalized Hamming distance between the KHash bit strings
def khash_similarity(khash_a: str, khash_b: str) -> float:
    from scipy.spatial import distance
    # Zero-pad to 4 bits per hex digit so leading zeros are kept
    # and both bit lists have the same length
    nbits = len(khash_a) * 4
    bits_a = list(format(int(khash_a, 16), f"0{nbits}b"))
    bits_b = list(format(int(khash_b, 16), f"0{nbits}b"))
    return 1 - distance.hamming(bits_a, bits_b)

print(f"A<->B: {khash_similarity(fileA[0].hex(), fileB[0].hex())}")
print(f"A<->C: {khash_similarity(fileA[0].hex(), fileC[0].hex())}")
print(f"B<->C: {khash_similarity(fileB[0].hex(), fileC[0].hex())}")

A<->B: 0.958984375
A<->C: 0.583984375
B<->C: 0.580078125
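With more than a handful of files, comparing every pair by hand gets tedious. The sketch below computes the same similarity without SciPy (XOR the two values and count the differing bits) and reports every pair at or above a threshold. The `0.9` threshold and the `similar_pairs` helper are hypothetical choices for illustration, not values recommended by the SDK.

```python
from itertools import combinations


def khash_similarity(khash_a: str, khash_b: str) -> float:
    """1 - normalized Hamming distance between two equal-length hex KHashes."""
    if len(khash_a) != len(khash_b):
        raise ValueError("KHashes must have the same length")
    nbits = len(khash_a) * 4  # each hex digit encodes 4 bits
    differing = bin(int(khash_a, 16) ^ int(khash_b, 16)).count("1")
    return 1 - differing / nbits


def similar_pairs(khashes: dict, threshold: float = 0.9):
    """Yield (name_a, name_b, score) for every pair scoring at or above threshold."""
    for (a, ha), (b, hb) in combinations(khashes.items(), 2):
        score = khash_similarity(ha, hb)
        if score >= threshold:
            yield a, b, score


# Usage with the results from the previous cell (not run here):
# hashes = {"A": fileA[0].hex(), "B": fileB[0].hex(), "C": fileC[0].hex()}
# for a, b, score in similar_pairs(hashes):
#     print(f"{a}<->{b}: {score}")
```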

As shown above, you can pass either of a file's hashes (MD5 or SHA-256) to retrieve its analysis results.
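When a hash comes from user input or another tool, you may not know in advance whether it is an MD5 or a SHA-256 digest. The hypothetical helper below picks the matching keyword from the digest length (32 hex characters for MD5, 64 for SHA-256), so the result can be unpacked into `BinaryAIFile`, which accepts `md5=` and `sha256=` as shown in the cells above.

```python
import re

_HEX = re.compile(r"[0-9a-fA-F]+")


def hash_kwargs(file_hash: str) -> dict:
    """Map a hex digest to the keyword BinaryAIFile accepts: md5= or sha256=.

    Hypothetical helper: chooses the keyword by digest length.
    """
    if not _HEX.fullmatch(file_hash):
        raise ValueError("not a hex digest")
    if len(file_hash) == 32:
        return {"md5": file_hash}
    if len(file_hash) == 64:
        return {"sha256": file_hash}
    raise ValueError(f"unexpected digest length: {len(file_hash)}")


# Usage with the client from earlier cells (not run here):
# bai_file = BinaryAIFile(bai, **hash_kwargs("c3366c6b688a5b5fa4451fec09930e06"))
```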

For more information, see the examples/ directory in the SDK repository or the SDK API documentation.