{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quick start" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook gives an example on how to use this SDK to upload, start analysis and get the analysis result of a file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize\n", "\n", "To initialize the SDK, please prepare your Secret ID and Secret Key. Please [apply from us](https://www.binaryai.cn/doc/) if\n", "you don't have one.\n", "\n", "The Secret ID & Key is the *only* credential to access API, so please keep it safely. We recommend you read your keys to\n", "environment variable, instead of saving in your code:\n", "\n", "```bash\n", "$ read BINARYAI_SECRET_ID\n", "#(enter your secret id)\n", "$ read BINARYAI_SECRET_KEY\n", "#(enter your secret key)\n", "$ export BINARYAI_SECRET_ID\n", "$ export BINARYAI_SECRET_KEY\n", "```\n", "\n", "Once those environment variables are set, our SDK can read them directly.\n", "\n", "To initialize the SDK:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "metadata": {} }, "outputs": [], "source": [ "# Uncomment to get more logs\n", "# import logging\n", "# logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n", "# logger = logging.getLogger(\"binaryai_sdk\")\n", "\n", "from binaryai import BinaryAI\n", "\n", "bai = BinaryAI() # Initialize the client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! If no exceptions raised, the client is initialized." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Upload and analyze file\n", "\n", "Note: file upload might be rejected if file is too big or upload is too quick.\n", "\n", "Now you can upload by the file path:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "analysis succeed\n" ] } ], "source": [ " # if upload succeed, file hash is returned\n", "sha256 = bai.upload(\"/bin/echo\")\n", "\n", "# wait until done. timeout=-1 means wait forever\n", "bai.wait_until_analysis_done(sha256, timeout=-1)\n", "\n", "print(\"analysis succeed\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get analysis result\n", "\n", "You can get analysis result by giving hash of a file for each method:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "metadata": {} }, "outputs": [ { "data": { "text/plain": [ "{'fileType': 'ELF64',\n", " 'machine': 'AMD64',\n", " 'platform': 'LINUX',\n", " 'endian': 'LITTLE_ENDIAN',\n", " 'loader': 'x86:LE:64:default',\n", " 'entryPoint': 1059200,\n", " 'baseAddress': 1048576}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bai.get_overview(sha256)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1: _DT_INIT]\n", "[2: FUN_00102020]\n", "[3: ::getenv]\n", "[4: ::free]\n", "[5: ::abort]\n", "[6: ::__errno_location]\n", "[7: ::strncmp]\n", "[8: ::_exit]\n", "[9: ::__fpending]\n", "[10: ::textdomain]\n", "[11: ::fclose]\n", "[12: ::bindtextdomain]\n" ] } ], "source": [ "funcs = bai.list_funcs(sha256)\n", "for i, f in enumerate(funcs):\n", " print(\"[{}: {}]\".format(i+1, f.name))\n", " if i > 10:\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or initialize a file object and call it:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reptile\n", "----\n", "tsh\n", "----\n" ] } ], "source": [ "from binaryai import BinaryAIFile\n", "# This pair of hash is the same file\n", "sha256 = \"289616b59a145e2033baddb8a8a9b5a8fb01bdbba1b8cf9acadcdd92e6cc0562\"\n", "md5 = \"c3366c6b688a5b5fa4451fec09930e06\"\n", "bai_file = BinaryAIFile(bai, md5=md5)\n", "for component in bai_file.get_sca_result():\n", " print(component.name)\n", " print(\"----\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also get a file's KHash, which can be used to compare similarities:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A<->B: 0.9716796875\n", "A<->C: 0.583984375\n", "B<->C: 0.5888671875\n" ] } ], "source": [ "from binaryai import BinaryAIFile\n", "\n", "fileA = BinaryAIFile(bai, md5=\"346136457e1eb6eca44a06bb55f93284\").get_khash_info()\n", "fileB = BinaryAIFile(bai, sha256=\"841de34799fc46bf4b926559e4e7a70e0cc386050963978d5081595e9a280ae1\").get_khash_info()\n", "fileC = BinaryAIFile(bai, sha256=\"9b53a3936c8c4202e418c37cbadeaef7cc7471f6a6522f6ead1a19b31831f4a1\").get_khash_info()\n", "assert fileA[1] == fileB[1]\n", "assert fileB[1] == fileC[1]\n", "\n", "# calculate hamming distance\n", "def khash_similarity(khash_a: str, khash_b: str) -> float:\n", " def khash_str_to_list(khash: str) -> list:\n", " return list(bin(int(khash, 16))[2:].zfill(1024))\n", " from scipy.spatial import distance\n", " khash_a, khash_b = khash_str_to_list(khash_a), khash_str_to_list(khash_b)\n", " return 1 - distance.hamming(khash_a, khash_b)\n", "print(f\"A<->B: {khash_similarity(fileA[0].hex(), fileB[0].hex())}\")\n", "print(f\"A<->C: {khash_similarity(fileA[0].hex(), fileC[0].hex())}\")\n", "print(f\"B<->C: {khash_similarity(fileB[0].hex(), fileC[0].hex())}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In August 2024, we introduced a new feature to calculate a file's risky probability. A value ranged at `[0, 1]` might returned." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "libidn: 0.0003542900085449219\n", "tshd: 0.9892578125\n" ] } ], "source": [ "print(f\"libidn: {BinaryAIFile(bai, sha256='fed32e9a49717eacd2b2ff73ce22a6140a3b814805a089ca6c4dd09befae0d36').get_malware_probability()}\")\n", "print(f\"tshd: {BinaryAIFile(bai, sha256='289616b59a145e2033baddb8a8a9b5a8fb01bdbba1b8cf9acadcdd92e6cc0562').get_malware_probability()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown above, you can always give a file hash (md5 or sha256) to get its analysis result.\n", "\n", "Read `examples/` in SDK repository or read the SDK API document for more info." ] } ], "metadata": { "kernelspec": { "display_name": "binaryai-YJgBNhjL-py3.9", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }