
H21lab/json2pcap


json2pcap

A script that reconstructs a pcap from tshark JSON output and allows packet modifications during the reconstruction. The script can also perform pcap masking or anonymization.

This repository contains more recent and experimental changes compared to the version shipped with Wireshark (https://github.com/wireshark/wireshark/tree/master/tools/json2pcap).

The command tshark -T json -x (or tshark -T jsonraw) adds extra information to the hex-data JSON output for each field: the position at which the field is dissected in the original frame, the field length, the bitmask (for fields that are not byte aligned) and the field type. This information can be used for later processing. One use case is the json2pcap script included in Wireshark, which assembles the protocol layers back together from upper to lower layers. This allows reverse JSON-to-pcap conversion as well as packet modification, editing and rewriting.
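As a minimal sketch of how this positional information can be consumed, the snippet below assumes that each *_raw entry is a list of the form [hex value, byte offset, length in bytes, bitmask, type] and that offsets count from the start of frame_raw. RawField, parse_raw and field_bytes are hypothetical helpers, not part of json2pcap, and the example values are made up.

```python
from typing import NamedTuple


class RawField(NamedTuple):
    """One tshark *_raw entry (assumed layout: [hex, pos, len, bitmask, type])."""
    hex_value: str  # field bytes as a hex string
    pos: int        # byte offset inside the frame
    length: int     # field length in bytes
    bitmask: int    # 0 for byte-aligned fields
    ftype: int      # numeric field type as emitted by tshark


def parse_raw(entry: list) -> RawField:
    """Convert a raw JSON list into a RawField (hypothetical helper)."""
    hex_value, pos, length, bitmask, ftype = entry
    return RawField(str(hex_value), int(pos), int(length), int(bitmask), int(ftype))


def field_bytes(frame_hex: str, field: RawField) -> str:
    """Cut the field's bytes out of frame_raw; one byte is two hex digits."""
    return frame_hex[field.pos * 2:(field.pos + field.length) * 2]


# Made-up frame: 26 zero bytes followed by two IPv4 addresses.
frame_hex = "00" * 26 + "c0a80001" + "0a000001"
src = parse_raw(["c0a80001", 26, 4, 0, 0])  # illustrative values only
print(field_bytes(frame_hex, src))
```

Because position and length are stored next to the value, a field can be located in (or written back into) the frame without re-dissecting it.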

/wireshark/tools/json2pcap/json2pcap.py

Prerequisites

pip install scapy
pip install ijson
pip install bitstring

Usage

usage: json2pcap.py [-h] [--version] [-i [INFILE]] -o OUTFILE [-p] [-m MASKED_FIELD] [-a ANONYMIZED_FIELD] [-s SALT] [-v]

json2pcap 1.3

Utility to generate pcap from json format.

Packet modification:
In the input JSON it is possible to modify the raw values of decoded fields.
The output pcap will include the modified values. The algorithm that
generates the output pcap takes all raw hex fields from the input JSON and
then assembles them by layering from the longest (less decoded) fields to
the shortest (more decoded) fields. This means that if a modified raw field
is a shorter (more decoded) field, it takes precedence over a modification
in a longer (less decoded) field. If the JSON includes duplicated raw fields
with the same position and length, the behavior is not deterministic.
For manual packet editing it is always possible to remove any raw fields
that are not required from the JSON; only frame_raw is mandatory for
reconstruction.
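The layering rule can be illustrated with a simplified sketch. It is not json2pcap's actual code: it assumes the [hex, pos, len, bitmask, type] layout for raw entries, treats positions as absolute byte offsets into frame_raw, and skips bitmask fields entirely. The name assemble is hypothetical.

```python
def assemble(frame_raw: list, raw_fields: list) -> bytes:
    """Overlay raw fields onto frame_raw, longest (least decoded) first.

    Entries are assumed to use the [hex, pos, len, bitmask, type] layout.
    Fields with a non-zero bitmask are skipped in this simplified sketch.
    """
    buf = bytearray.fromhex(frame_raw[0])
    # sorted() is stable even with reverse=True, so equal-length duplicates
    # keep their input order here; the README notes json2pcap is not
    # deterministic in that case.
    for hex_value, pos, length, bitmask, _ftype in sorted(
        raw_fields, key=lambda f: f[2], reverse=True
    ):
        if bitmask:
            continue
        data = bytes.fromhex(hex_value)
        if len(data) != length:
            raise ValueError(f"field at {pos} has {len(data)} bytes, expected {length}")
        buf[pos:pos + length] = data  # shorter fields are written later and win
    return bytes(buf)


frame = ["aabbccddeeff0011", 0, 8, 0, 1]
fields = [
    ["00", 1, 1, 0, 1],        # short, more decoded field (edited)
    ["ffffffff", 0, 4, 0, 1],  # long, less decoded field (edited)
]
print(assemble(frame, fields).hex())  # ff00ffffeeff0011
```

The 1-byte edit survives because it is applied after the 4-byte edit that covers it, which is the precedence described above.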

Packet modification with -p switch:
A Python script is generated instead of a pcap. When executed, this script
generates the pcap of the 1st packet from the input JSON. The generated code
includes the decoded fields and the function that assembles the packet.
This makes it possible to modify the script and programmatically edit or
encode the packet variables. The assembling algorithm is different, because
the decoded packet fields are relative and point to their parent node with
their position (whereas the input JSON uses absolute positions).
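The difference between absolute and relative positions can be sketched as follows. This is only an illustration under the assumption that raw entries use the [hex, pos, len, ...] layout; relative_offset and splice are hypothetical names and do not reproduce the code that the -p switch actually generates.

```python
def relative_offset(parent: list, child: list) -> int:
    """Byte offset of child inside parent, both given with absolute positions.

    Entries are assumed to use the [hex, pos, len, ...] layout of tshark *_raw lists.
    """
    rel = child[1] - parent[1]
    if rel < 0 or rel + child[2] > parent[2]:
        raise ValueError("child field is not contained in parent field")
    return rel


def splice(parent_hex: str, rel: int, child_hex: str) -> str:
    """Write child_hex into parent_hex at a relative byte offset."""
    start = rel * 2  # two hex digits per byte
    return parent_hex[:start] + child_hex + parent_hex[start + len(child_hex):]


# Made-up example: a 2-byte field starting 2 bytes into an 8-byte parent at offset 14.
parent = ["4500001400000000", 14, 8, 0, 1]
child = ["abcd", 16, 2, 0, 1]
rel = relative_offset(parent, child)
print(rel, splice(parent[0], rel, child[0]))  # 2 4500abcd00000000
```

With relative offsets, a parent node can be moved or resized without recomputing the positions of its children, which is what makes programmatic editing convenient.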

Pcap masking and anonymization with -m and -a switches:
The script allows masking or anonymizing the selected JSON raw fields. If
the selected fields are located on lower protocol layers, they are not
overwritten by upper-layer fields that are not marked by these switches.
Pcap masking and anonymization can be performed in the following way:

tshark -r orig.pcap -T json -x --no-duplicate-keys | \
  python json2pcap.py -m "ip.src_raw" -a "ip.dst_raw" -o anonymized.pcap

In this example the ip.src_raw field is masked with ff byte values
(ffffffff) and ip.dst_raw is hashed with a randomly generated salt.

Additionally, the following syntax is valid to anonymize a portion of a field:

tshark -r orig.pcap -T json -x --no-duplicate-keys | \
  python json2pcap.py -m "ip.src_raw[2:]" -a "ip.dst_raw[:-2]" -o anonymized.pcap

Here the first byte of the source IP and the last byte of the destination
IP are preserved (the slice indices count hex digits, so 2 digits = 1 byte).
The same can be achieved with:

tshark -r orig.pcap -T json -x --no-duplicate-keys | \
  python json2pcap.py -m "ip.src_raw[2:8]" -a "ip.dst_raw[0:6]" -o anonymized.pcap
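The field syntax can be sketched in Python. This sketch infers from the examples above that the slice indexes hex digits, so for a 4-byte IPv4 value (8 digits) [2:] and [2:8] select the same digits. json2pcap's exact hashing scheme is not described here; the salted SHA-256 below is an assumption chosen for illustration, and parse_spec, mask and anonymize are hypothetical names.

```python
import hashlib
import re

# "name" or "name[start:end]", where start/end are optional (possibly negative) ints.
FIELD_SPEC = re.compile(r"^(?P<name>[\w.]+)(?:\[(?P<start>-?\d*):(?P<end>-?\d*)\])?$")


def parse_spec(spec: str) -> tuple:
    """Split a -m/-a argument into (field name, slice over hex digits)."""
    m = FIELD_SPEC.match(spec)
    if m is None:
        raise ValueError(f"invalid field spec: {spec!r}")
    start = int(m["start"]) if m["start"] else None
    end = int(m["end"]) if m["end"] else None
    return m["name"], slice(start, end)


def mask(hex_value: str, sl: slice) -> str:
    """Replace the selected hex digits with 'f'."""
    digits = list(hex_value)
    for i in range(*sl.indices(len(digits))):
        digits[i] = "f"
    return "".join(digits)


def anonymize(hex_value: str, sl: slice, salt: str) -> str:
    """Replace the selected hex digits with a salted SHA-256 digest (assumed scheme)."""
    digits = list(hex_value)
    idx = range(*sl.indices(len(digits)))
    selected = "".join(digits[i] for i in idx)
    digest = hashlib.sha256((salt + selected).encode()).hexdigest()
    while len(digest) < len(idx):  # extend the digest for long fields
        digest += hashlib.sha256(digest.encode()).hexdigest()
    for i, d in zip(idx, digest):
        digits[i] = d
    return "".join(digits)


name, sl = parse_spec("ip.src_raw[2:]")
print(name, mask("c0a80001", sl))  # ip.src_raw c0ffffff
```

Hashing only the selected digits with a fixed salt maps equal inputs to equal outputs, so anonymized addresses stay consistent across packets.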

Masking and anonymization limitations are mainly the following:
- If tshark reassembles data from multiple frames, the backward pcap
  reconstruction is not performed properly and can result in malformed
  frames.
- The new values in the fields could violate the field format, as json2pcap
  does not perform correct protocol encoding with respect to the allowed
  values of the target field and the field encoding.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -i [INFILE], --infile [INFILE]
                        json generated by tshark -T json -x
                        or by tshark -T jsonraw (not preserving frame timestamps).
                        If no input file is specified, the script reads from stdin.
  -o OUTFILE, --outfile OUTFILE
                        output pcap filename
  -p, --python          generate python payload instead of pcap (only 1st packet)
  -m MASKED_FIELD, --mask MASKED_FIELD
                        mask the specific raw field (e.g. -m "ip.src_raw" -m "ip.dst_raw[2:6]")
  -a ANONYMIZED_FIELD, --anonymize ANONYMIZED_FIELD
                        anonymize the specific raw field (e.g. -a "ip.src_raw[2:]" -a "ip.dst_raw[:-2]")
  -s SALT, --salt SALT  salt used for anonymization. If no value is provided, it is randomized.
  -v, --verbose         verbose output

Pcap anonymization

Pcap anonymization can be performed in the following way:

tshark -r original.pcap -T json -x --no-duplicate-keys | \
python json2pcap.py -a "ip.src_raw" -a "ip.dst_raw" -o anonymized.pcap

All fields that require anonymization should be specified with the -a switch, which can be repeated once per field.
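When anonymization is scripted, the repeated -a switches can be generated from a list of field names. The sketch below builds the two argument lists and can run them as a pipe; it assumes tshark is on PATH and json2pcap.py is in the current directory. anonymization_commands and run_pipeline are hypothetical helper names.

```python
import shlex
import subprocess
from typing import Optional


def anonymization_commands(pcap_in: str, pcap_out: str, fields: list,
                           salt: Optional[str] = None):
    """Build the tshark and json2pcap argument lists for the pipeline above."""
    tshark = ["tshark", "-r", pcap_in, "-T", "json", "-x", "--no-duplicate-keys"]
    json2pcap = ["python", "json2pcap.py"]
    for field in fields:
        json2pcap += ["-a", field]  # one -a switch per field
    if salt is not None:
        json2pcap += ["-s", salt]  # fixed salt; otherwise json2pcap randomizes it
    json2pcap += ["-o", pcap_out]
    return tshark, json2pcap


def run_pipeline(tshark: list, json2pcap: list) -> None:
    """Run the equivalent of `tshark ... | python json2pcap.py ...`."""
    with subprocess.Popen(tshark, stdout=subprocess.PIPE) as producer:
        subprocess.run(json2pcap, stdin=producer.stdout, check=True)
    if producer.returncode != 0:
        raise subprocess.CalledProcessError(producer.returncode, tshark)


t, j = anonymization_commands("original.pcap", "anonymized.pcap",
                              ["ip.src_raw", "ip.dst_raw"])
print(shlex.join(t), "|", shlex.join(j))
```

Passing a fixed salt with -s makes repeated runs produce the same anonymized values, which can help when several captures must stay consistent.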

Limitations

If tshark performs reassembly from multiple frames, the backward pcap reconstruction performed by json2pcap does not properly recover the original frames.

To overcome this limitation, it is possible to run tshark with packet reassembly suppressed. To disable reassembly for a specific protocol, use tshark -o <SELECTED_REASSEMBLY_OPTION>:FALSE; for the available values of <SELECTED_REASSEMBLY_OPTION>, see tshark -G defaultprefs (for example, tcp.desegment_tcp_streams). After disabling packet reassembly, the protocol frames should be assembled correctly by json2pcap. However, masking/anonymization will not be performed for fragmented protocols.
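Candidate preferences can be picked out of the tshark -G defaultprefs listing with a small filter. This is a heuristic sketch: it assumes preference lines look like "#name: TRUE", and its keyword list is a guess rather than an exhaustive list of reassembly options. tcp.desegment_tcp_streams and ip.defragment are real Wireshark preferences, but the sample text is a made-up excerpt, so check your own tshark output. reassembly_prefs and tshark_json_cmd are hypothetical names.

```python
import re

# Assumed format of `tshark -G defaultprefs` lines, e.g. "#tcp.desegment_tcp_streams: TRUE".
PREF_LINE = re.compile(r"^#?\s*(?P<name>[\w.]+):\s*TRUE\s*$")
# Heuristic keywords for reassembly-related preferences (not exhaustive).
KEYWORDS = ("desegment", "defragment", "reassembl")


def reassembly_prefs(defaultprefs_text: str) -> list:
    """Return boolean prefs that look reassembly-related and default to TRUE."""
    names = []
    for line in defaultprefs_text.splitlines():
        m = PREF_LINE.match(line)
        if m and any(k in m["name"] for k in KEYWORDS):
            names.append(m["name"])
    return names


def tshark_json_cmd(pcap_in: str, disabled_prefs: list) -> list:
    """tshark command producing json2pcap input with the given prefs set to FALSE."""
    cmd = ["tshark", "-r", pcap_in]
    for name in disabled_prefs:
        cmd += ["-o", f"{name}:FALSE"]
    return cmd + ["-T", "json", "-x", "--no-duplicate-keys"]


sample = """# Whether subdissector can request TCP streams to be reassembled
#tcp.desegment_tcp_streams: TRUE
#tcp.analyze_sequence_numbers: TRUE
#ip.defragment: TRUE
"""
print(tshark_json_cmd("orig.pcap", reassembly_prefs(sample)))
```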

Fields that use a bitmask could be incorrectly re-encoded: from the tshark JSON raw output it is ambiguous whether the field is encoded in little endian or big endian.
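The ambiguity can be demonstrated with a hypothetical 2-byte field and bitmask 0x0ff0. Depending on which byte order is assumed, the same two bytes yield a different value, and the same edit produces different bytes. The helpers below are illustrative only and are not json2pcap code.

```python
def _shift(bitmask: int) -> int:
    """Position of the lowest set bit of the mask."""
    return (bitmask & -bitmask).bit_length() - 1


def extract(raw: bytes, bitmask: int, byteorder: str) -> int:
    """Read a bitmask field from raw bytes using the given byte order."""
    word = int.from_bytes(raw, byteorder)
    return (word & bitmask) >> _shift(bitmask)


def insert(raw: bytes, bitmask: int, value: int, byteorder: str) -> bytes:
    """Write value into the bitmask field and re-encode with the given byte order."""
    word = int.from_bytes(raw, byteorder)
    word = (word & ~bitmask) | ((value << _shift(bitmask)) & bitmask)
    return word.to_bytes(len(raw), byteorder)


raw = bytes.fromhex("1234")
bitmask = 0x0FF0
for order in ("big", "little"):
    print(order, hex(extract(raw, bitmask, order)), insert(raw, bitmask, 0xAB, order).hex())
# big 0x23 1ab4
# little 0x41 b23a
```

Since the raw JSON carries the bytes and the bitmask but not the byte order, a re-encoder has to pick one, and a wrong guess silently corrupts the neighbouring bits.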

Attribution

Copyright 2020, Martin Kacer <kacer.martin[AT]gmail.com> and contributors

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.