SCoA Printers #33

Open
ra1nst0rm3d opened this issue Nov 8, 2020 · 64 comments
Comments

@ra1nst0rm3d

When are you going to implement support for SCoA printers?

@agalakhov
Owner

I need a volunteer who has a SCoA printer and can program for that.

@ra1nst0rm3d
Author

I have an LBP1120 and can code in C/C++.

@mounaiban
Contributor

Just leaving this link here in case it is useful to anyone else who wants to have a go at implementing SCoA: https://github.com/caxapyk/capt_lbp810-1120

It's a fork of the original Nicolas Boichat driver for the LBP-810, which contains an SCoA implementation.

@ra1nst0rm3d
Author

No; as @agalakhov says, it doesn't implement SCoA compression.

@mounaiban
Contributor

mounaiban commented Nov 21, 2020

No; as @agalakhov says, it doesn't implement SCoA compression.

Oh, I didn't know that Nicolas' driver wasn't complete 😞

For those coming from Google or the future: see also #20

@agalakhov
Owner

agalakhov commented Nov 21, 2020

You're wrong. It DOES implement SCoA compression. But it does not handle control commands correctly. It just sends "magic sequences of bytes" without knowing anything about their meaning, which is not always correct. To get an idea:

	{
		unsigned char buf[] = {
			0x00, 0x00, 0xa4, 0x01, 0x02, 0x01, 0x00, 0x00, 0x1f, 0x1f, 
			0x1f, 0x1f,	0x00, 0x11, 0x03, 0x01, 0x01, 0x01, 0x02, 0x00, 
			0x00, 0x00, 0x70, 0x00, 0x78, 0x00, 0x50, 0x02, 0x7a, 0x1a, 
			0x60, 0x13, 0x67, 0x1b};
		write_command_packet_buf(0xa0, 0xd0, 0, 0, (unsigned char*)&buf, 34);
	}

Now we know that this configures paper dimensions, toner saving mode and paper thickness. Nicolas did not know that and just hardcoded the values he had on his configuration. They're wrong for some users.

The actual documentation of SCoA compression is in the SPECS file and seems to be complete. The driver contains a working SCoA compression example (but please don't just copy and paste this code; it requires huge refactoring).

@ra1nst0rm3d
Author

Very useful info, thx

@mounaiban
Contributor

mounaiban commented Apr 11, 2022

👉 EDIT: the SCoA specs now have a page in my wiki, please check it out for more info.

@agalakhov I have been trying to understand SCoA for the past week, and I think I am beginning to get it. From what I figure:

  • SCoA is a Run-Length Encoding (RLE) scheme
  • It uses bit-sized opcodes and parameters with a format like opcode param_a param_b operand_a [operand_b]
  • Its 'instruction set' is as follows, according to Nicolas Boichat's SPECS, and the conversation in LBP-1120 not supported #20:
| Command | Opcode (bits) | Param A | Param B | Operand A | Operand B | Illustration |
|---|---|---|---|---|---|---|
| Copy | 0b00 | bytes to copy (uint3_t) | 0 | uncompressed bytes | -- | 00aaa000 🅰️ |
| Repeat | 0b01 | reps (uint3_t) | 0 | single repeated byte | -- | 01aaa000 🅰️ |
| Repeat and Copy | 0b11 | reps (uint3_t) | bytes to copy (uint3_t) | single repeated byte | uncompressed bytes to copy | 11aaabbb 🅰️ 🅱️ |
| Enter long command | 0b101 | length/reps (higher 5 bits) | -- | Long command | -- | 101aaaaa 🅰️ |
| Long Copy | 0b11 [1] (only as operand of 'Enter long command') | bytes to copy (lower 3 bits) | 0 | uncompressed bytes to copy | -- | 11aaa000 🅰️ |
| Long Repeat and Short Copy | 0b00 (only as operand of 'Enter long command') | reps (lower 3 bits) | bytes to copy (uint3_t) | single repeated byte | uncompressed bytes to copy | 00aaabbb [2] 🅰️ 🅱️ |
| Repeat last line or sub-buffer; End of Line, dump buffer to output | 0x41 | -- | -- | -- | -- | 01000001 |

Don't judge me, uint3_t and uint5_t is the best C-esque name I can think of! 😅

Is my understanding correct, or am I way off?

Footnotes

  1. This bothers me a little, I think Long copy should have been 0b00 to be in line with short copy

  2. SPECS only mentions Long repeat with short copy where bbb is 001. Other values have not been tested.
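For anyone following along in code, the illustration column can be read as three bit fields. This is only a minimal Python sketch that assumes the 2+3+3 layout `ooaaabbb` from the table above; the function name `split_short_opcode` is made up for illustration, and long commands (prefix 0b101) are not handled.

```python
def split_short_opcode(byte):
    """Split one opcode byte into (opcode, param_a, param_b).

    Assumes the 'ooaaabbb' layout from the table above; this is a reading
    of the table, not a confirmed description of the format.
    """
    opcode = (byte >> 6) & 0b11     # top two bits
    param_a = (byte >> 3) & 0b111   # middle three bits
    param_b = byte & 0b111          # bottom three bits
    return opcode, param_a, param_b


# 0x20 == 0b00100000: opcode 0b00 (Copy) with param A = 4 bytes to copy
print(split_short_opcode(0x20))
```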

@ra1nst0rm3d
Author

@mounaiban Going by Nicolas' code, it looks like you're on the right track. This will be very useful. Thanks for your research!

@agalakhov
Owner

It looks more or less correct. SCoA commands are not very consistent.

There is a "simple" possibility to check how it works. There is a binary in the stock closed-source Linux driver called "captfilter". Calling it with correct parameters generates a print job with some (not all) A0 commands and with correct compressed pages. You can then try to decompress the output based on your guesses and check if it is decompressed correctly. Look at https://github.com/agalakhov/anticapt/blob/master/filter.sh

@ra1nst0rm3d
Author

ra1nst0rm3d commented Apr 11, 2022

@mounaiban I have some questions for you:

  1. What do you mean by length in the Copy command?
  2. Would the illustration for a length of 7 (decimal) look like 00111000 00000111?

@mounaiban
Contributor

mounaiban commented Apr 12, 2022

@ra1nst0rm3d In the Copy command, length means the number of bytes in the input, after the opcode, to copy to the output. Sorry for the unclear wording; I'm trying to figure out clear names for these things.
For your second question: are you trying to copy 7 bytes after the opcode in the output?

@agalakhov Thanks for the tip on the captfilter command. Now if only Canon had left behind some man pages on how to use that thing... 🤷‍♂️
Well, that doesn't stop me from going strings /usr/bin/captfilter though 😈

@ra1nst0rm3d
Author

No, I understood you, thx.

@ra1nst0rm3d
Author

@mounaiban I found some interesting commands in the disassembled code of captfilter.


fcn.0804ec34("NOP_Command\n");
fcn.0804ec3c(var_1ch, 0x40);

fcn.0804ec34("EOP_Command\n");
fcn.0804ec3c(var_1ch, 0x42);

fcn.0804ec34("RepeatThenRaw_Command\n");
fcn.0804ec3c(var_1ch, (uVar8 & 7) << 3 | 0xc0 | uVar10 & 7);
fcn.0804ec3c(var_1ch, (uint32_t)(uint8_t)var_440h);
fcn.0804ec64(var_1ch, (int32_t)&var_440h + 1, uVar10);

fcn.0804ec34("RepeatX_Command\n");
fcn.0804ec3c(var_1ch, (uVar8 & 7) * 8 | 0xc0);

fcn.0804ec34("CopyLong_Command\n");
fcn.0804ec3c(var_1ch, iVar4 >> 3 & 0x1fU | 0x80);

fcn.0804ec34("CopyShort_Command\n");
fcn.0804ec3c(var_1ch, uVar8 & 7 | 0xc0);

fcn.0804ec34("EOL_Command\n");
fcn.0804ec3c(var_1ch, 0x41);

fcn.0804ec34("CopyThenRawLong_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (arg_10h & 7) << 3 | 0xc0 | in_EDX & 7);
fcn.0804ec64(*(int32_t *)(in_EAX + 0x42c), in_EAX + 9, arg_10h);

fcn.0804ec34("RepeatThenRawLong_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (in_EDX & 7) << 3 | 0x40 | arg_8h & 7U);
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (uint32_t)*(uint8_t *)(in_EAX + 8));
fcn.0804ec64(*(int32_t *)(in_EAX + 0x42c), in_EAX + 9, var_10h);

fcn.0804ec34("CopyThenRepeatLong_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (uVar1 & 7) << 3 | 0x80 | in_EDX & 7);
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (uint32_t)*(uint8_t *)(in_EAX + 8));

fcn.0804ec34("RepeatXLong_Command\n");
fcn.0804ec3c(*(int32_t *)(param_1 + 0x42c), (uVar1 & 7) << 3);

fcn.0804ec34("Extend_Comman\n");
fcn.0804ec3c(*(int32_t *)(param_1 + 0x42c), param_2 & 0x1f | 0xa0);

fcn.0804ec34("CopyThenRaw_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (arg_8h & 7U) << 3 | in_EDX & 7);
fcn.0804ec64(*(int32_t *)(in_EAX + 0x42c), in_EAX + 9, arg_8h);

fcn.0804ec34("CopyThenRepeat_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (arg_8h & 7U) << 3 | 0x40 | in_EDX & 7);
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (uint32_t)*(uint8_t *)(in_EAX + 8));

Some of these may be Hi-SCoA commands.
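To make the bit arithmetic in the disassembly easier to read, here is a rough Python transcription of three of the opcode-byte expressions. The function and argument names are my own guesses (the disassembly only gives names like `uVar8`), so treat the meaning of each argument as an assumption; only the masks and shifts come from the listing.

```python
def repeat_then_raw_byte(reps, raw_len):
    # Mirrors: (uVar8 & 7) << 3 | 0xc0 | uVar10 & 7
    # Assumption: uVar8 is the repeat count, uVar10 the number of raw bytes.
    return ((reps & 7) << 3) | 0xC0 | (raw_len & 7)


def copy_then_repeat_byte(arg_hi, arg_lo):
    # Mirrors: (arg_8h & 7U) << 3 | 0x40 | in_EDX & 7
    # Which argument is the copy count and which the repeat count is not
    # stated in the listing, so the names stay neutral here.
    return ((arg_hi & 7) << 3) | 0x40 | (arg_lo & 7)


def extend_byte(value):
    # Mirrors: param_2 & 0x1f | 0xa0
    return (value & 0x1F) | 0xA0


EOL_BYTE = 0x41  # written after "EOL_Command"
EOP_BYTE = 0x42  # written after "EOP_Command"

print(f"{repeat_then_raw_byte(1, 2):08b}")  # bit pattern 11aaabbb
print(f"{extend_byte(0x1F):08b}")           # bit pattern 101aaaaa
```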

@mounaiban
Contributor

mounaiban commented Apr 17, 2022

I'll just document this here before writing up a wiki page:

Generating SCoA/CAPT print data using captfilter

To generate a ready-to-send SCoA-compressed raster from a Portable Grey Map (PGM) file run:

captfilter --CNTblHalftone=0 --CNTblModel=0 input.pgm > out.scoa.capt

The --CNTblModel=0 is the part that selects SCoA compression.

Format: captfilter $OPTIONS $INPUT_FILE > $OUTPUT_FILE

All captfilter output is redirected to standard output, so you need to use the > to redirect the output to a file, or your terminal will be flooded with binary gunk. 🌊

Input formats accepted by captfilter

When you type captfilter --help you get a message that says something like

Usage: captfilter [switches] [pgm file name]

This implies that all Portable Gray Map images are accepted. I have only tested PGM P5 images, YMMV with P2 images. PBM P4 images seem to be accepted too, but I have not been able to confirm that P4s are handled correctly.
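If you need a small P5 test image to feed in, one can be built by hand. This is a minimal Python sketch of the binary PGM layout (magic number, dimensions, maximum value, then one byte per pixel); the helper name `make_pgm_p5` is made up for illustration, and I have not checked how captfilter treats very small images.

```python
def make_pgm_p5(width, height, pixels, maxval=255):
    """Return the bytes of a binary (P5) Portable Gray Map.

    pixels is a flat, row-major sequence of width * height values,
    where 0 is black and maxval is white.
    """
    if len(pixels) != width * height:
        raise ValueError("pixels must contain width * height values")
    header = f"P5\n{width} {height}\n{maxval}\n".encode("ascii")
    return header + bytes(pixels)


# A 4x2 image: top row white, bottom row black
data = make_pgm_p5(4, 2, [255] * 4 + [0] * 4)
print(data)
```

Writing `data` to a file opened in binary mode gives something that can be passed as the input file.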

Lack of error messages

captfilter doesn't seem to be able to inform you when any switch/option is used incorrectly. Instead, empty files are written as output, or captfilter freezes until you press Enter.

Invalid options, including valid ones spelled with a different case (e.g. CNTblModel misspelled as CnTblModel), are ignored.

@mounaiban
Contributor

@agalakhov Just a quick question: do you flip bytes in the SCoA bitstream? Or is it just the command and the packet size that's flipped to little endian?

For example, in the output of captfilter if you see:
a0 c0 06 01 20 53 0f 00 ff

Do you read it as
Command: 0xC0A0
Packet Size: 0x0106
Bitstream: 20 53 0f 00 ff => 00100000 01010011 00001111 00000000 11111111 (aka copy 53 0f 00 ff to output)
?

@agalakhov
Owner

The stream is little-endian. My code tries to be machine-endianness-agnostic. The stream is always read byte-by-byte. Then, if we need an integer value of more than one byte, it is assembled in little-endian byte order.

That is, if we have

0x01 0x02 0x03 0x04 0x05 0x06

and have to interpret it as

byte, int16, byte, byte, byte

we get

0x01 0x0302 0x04 0x05 0x06

on both little-endian and big-endian machines.
Hope this answered your question.
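The same thing can be tried out with Python's `struct` module, where the `<` prefix forces little-endian assembly regardless of the host machine. This is just an illustrative sketch; reading the first four bytes of the captfilter sample as two 16-bit fields follows the command/size interpretation from the question above and is an assumption on my part.

```python
import struct

data = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06])

# '<' = little-endian, B = unsigned byte, H = unsigned 16-bit integer
fields = struct.unpack("<BHBBB", data)
print([hex(f) for f in fields])

# The sample from the question, read as command + packet size
packet = bytes.fromhex("a0 c0 06 01 20 53 0f 00 ff")
command, size = struct.unpack_from("<HH", packet, 0)
print(hex(command), hex(size))
```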

@mounaiban
Contributor

We may have discovered a few more things about the SCoA format, particularly a few new opcodes and the usage of the line buffer.

Long Repeat only Command

101aaaaa 10aaa000 🅰️

This command appears to repeat operand A into the line buffer, without any uncompressed suffix. It might have the same effect as Long Repeat and Short Copy 101aaaaa 00aaabbb 🅰️ 🅱️ where bbb == 0b000.

Line Buffer Usage

Boichat's SPECS document doesn't contain a detailed explanation of how the line buffer works, but it mentions 0x41 as End of Line (don't change the buffer). This is also confirmed in @ra1nst0rm3d's disassembly (see previous comments).

Here's how I think the SCoA line buffer works:

  1. There is a single buffer of N bytes, where N is declared in bytes 26 and 27 of the very last 0xD0A0 command.

  2. Opcodes can only manipulate the buffer directly.

  3. The buffer is not set to an initial value at the start of a page, just allocated. An empty line must be manually zeroed with a series of Long Repeat 0x00's. Just three opcodes can cover a line on A4 😃

  4. A 0x41 command dumps the line buffer into the output, thus issuing multiple 0x41's in a row repeats a line.

  5. You don't have to describe the whole line, just end it early with a 0x41 if nothing has changed past the current pixel column in any previous lines 😃

  6. You can seek to a position in the buffer with special opcodes.

Here are some things I don't know:

a. What is printed when you do a 0x41 before zeroing out the buffer?

b. What happens if one fails to 0x41 before inserting too many bytes into the line buffer?

c. Does the buffer pointer reset to zero after a 0x41?

Line Buffer Pointer Opcodes and Delta Encoding

There are a few extra opcodes that cause the pointer to skip towards the right, but I don't yet know the exact format of the opcodes.

As far as I can tell, there are two "skip right and place raw byte" opcodes (one short, one long) and one "skip right only" opcode.

These opcodes are used to implement some kind of delta encoding when a line is similar to the previous one; instead of encoding the entire line, you can seek to where the line changed and patch.

@mounaiban
Contributor

@agalakhov Thanks, I understand it now that there's nothing to flip in the bitstream/payload.

@ra1nst0rm3d
Author

@mounaiban About the three opcodes for an A4 line buffer: does it look like this?

bf b8 00
bf b8 00
aa b0 00

@mounaiban
Contributor

@ra1nst0rm3d Well done, you have just drawn a complete blank line across an A4 sheet! 💯

Also:
bf b8 ff bf b8 ff aa b0 ff
41 41 41 41 41 41 41 (and so on for 7016 times)
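To check the arithmetic, here is a tiny Python sketch that decodes only the two opcodes used in this example: Long Repeat (`101aaaaa 10aaa000 X`) and `0x41`. It assumes the buffer pointer returns to zero after each `0x41` and that the buffer contents persist between lines, which is my current reading rather than a confirmed fact. The name `decode_long_repeats` is made up.

```python
def decode_long_repeats(stream):
    """Decode a stream that contains only Long Repeat and 0x41 opcodes.

    Returns one bytes object per 0x41 (end of line).
    """
    lines = []
    line = bytearray()  # line buffer; contents persist across lines
    pos = 0             # assumed to return to zero after each 0x41
    i = 0
    while i < len(stream):
        op = stream[i]
        if op == 0x41:
            lines.append(bytes(line))
            pos = 0
            i += 1
        elif (op >> 5) == 0b101 and (stream[i + 1] >> 6) == 0b10 \
                and (stream[i + 1] & 0b111) == 0:
            # 101aaaaa 10aaa000 X: the count is the 8-bit value aaaaa:aaa
            count = ((op & 0x1F) << 3) | ((stream[i + 1] >> 3) & 0b111)
            line[pos:pos + count] = bytes([stream[i + 2]]) * count
            pos += count
            i += 3
        else:
            raise ValueError(f"opcode 0x{op:02x} is outside this sketch")
    return lines


stream = bytes.fromhex("bf b8 00 bf b8 00 aa b0 00") + b"\x41" * 3
lines = decode_long_repeats(stream)
print(len(lines), len(lines[0]))
```

The three opcodes add up to 255 + 255 + 86 = 596 bytes per line, and each extra `0x41` emits the same line again.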

@mounaiban
Contributor

mounaiban commented Apr 26, 2022

I have confirmed the existence of the buffer pointer seek/skip opcode. There are three forms:

Short seek: skip right up to 7 bytes, place a single byte A

0b00001aaa 🅰️

Long seek: skip up to 255 bytes, place a single byte A

0b100aaaaa 0b00001aaa 🅰️

Extra long seek: skip 256 or more bytes, place a single byte A

Add 0x9F one or more times before 0b100aaaaa 0b00001aaa 🅰️ or 0b00001aaa 🅰️

Examples:
0x9F 0x9F 0b10010101 0b00001111 🅰️
0x9F 0b00001000 🅰️

Each 0x9F adds 248 bytes to the skip (originally stated as 255; see later comments).
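The three forms can be folded into one small decoding routine. This is a hedged Python sketch of my current understanding (248 per `0x9F`, then an optional `0b100aaaaa` byte carrying the upper five bits, then `0b00001aaa` and the byte to place); `decode_seek` is a made-up name and the exact pointer behaviour is still an open question.

```python
def decode_seek(stream):
    """Return (skip, byte_to_place, bytes_consumed) for one seek opcode."""
    i = 0
    skip = 0
    while stream[i] == 0x9F:          # each 0x9F prefix adds 248
        skip += 248
        i += 1
    if (stream[i] >> 5) == 0b100:     # long form: upper five bits of the skip
        skip += (stream[i] & 0x1F) << 3
        i += 1
    if (stream[i] >> 3) != 0b00001:   # final byte must look like 0b00001aaa
        raise ValueError("expected 0b00001aaa")
    skip += stream[i] & 0b111         # lower three bits of the skip
    return skip, stream[i + 1], i + 2


print(decode_seek(bytes([0b00001011, 0xFF])))                      # short seek
print(decode_seek(bytes([0b10010101, 0b00001111, 0xAA])))          # long seek
print(decode_seek(bytes([0x9F, 0x9F, 0b10010101, 0b00001111, 0xAA])))
```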

Master Samples

The use of the seek opcodes can be clearly observed in test pages that look like these:

half-diagonal-preview
Half diagonal filled

quarter-diagonal-preview
Half horizontal with quarter diagonal filled

My master samples were A4 x 600dpi, which works out to around 4958 x 7016 px. Please be aware that captfilter crops things as it sees fit.

Compressed Raster Preparation

I compressed the rasters to CAPT format with the following command on an Ubuntu 14.04 system with the Canon driver installed:

captfilter --CNTblModel=0 --Resolution=600 --PageSize=A4 --MediaType=PlainPaper $INPUT_FILE > $OUTPUT_FILE

PROTIP: switches and arguments are case-sensitive

Substitute $INPUT_FILE with the raw PBM P4 image and $OUTPUT_FILE with the name of the output file (taking care not to overwrite anything precious!)

Other Stuff

The exact behaviour of the seek opcode is not yet fully figured out. The following questions remain:

  • Does the seek put the pointer on or after the last skipped byte?

  • Which opcodes are allowed after a seek? Must it be a single byte, or can a repeat or copy opcode begin right afterwards?

I am starting to like how the 0x41 opcode coincides with the ASCII letter 'A', because it makes it very easy to see on the text view in hex dumps.

@ra1nst0rm3d
Author

ra1nst0rm3d commented Apr 26, 2022

@mounaiban Can you rebuild your opcode table with the new information? It would help me write a compressor.

P.S. Hmm... Can we just build the first line, push it to the printer, then take the next line and check the differences between them?

P.P.S. Can we just initialise the buffer for every band and push data into the initialised buffer?

P.P.P.S. This looks strange...

static void write_simple_byte(struct state *state)
{
	unsigned i = 1;

	while(state->input_buf[state->input_pos] != state->input_buf[state->input_pos + i] && i < 7) {
		i++;
	}

	push_byte(state, (i << 3), "simple_byte");

	for(; state->input_pos < state->input_pos + i; state->input_pos++) {
		push_byte(state, state->input_buf[state->input_pos], "simple_byte");
	}
}

@mounaiban
Contributor

@ra1nst0rm3d I have compiled all known SCoA opcodes into a single document in my wiki, have fun!

I think you are meant to work on it like they do with video compression: start with a "key" line that encodes the whole line, then make "delta" lines that take the previous line and change it.
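The key/delta idea itself can be sketched without any SCoA opcodes: compare the new line with the previous one and keep only the spans that changed. This Python sketch is just that comparison step, not an encoder; `line_delta` is a made-up name, and how captfilter actually picks its spans is unknown to me.

```python
def line_delta(prev, cur):
    """List (offset, new_bytes) patches that turn prev into cur.

    Both lines must have the same length. An empty result means the
    line is identical to the previous one.
    """
    if len(prev) != len(cur):
        raise ValueError("lines must have the same length")
    patches = []
    i = 0
    n = len(cur)
    while i < n:
        if prev[i] == cur[i]:
            i += 1                    # unchanged byte: would be skipped over
            continue
        start = i
        while i < n and prev[i] != cur[i]:
            i += 1                    # extend the run of changed bytes
        patches.append((start, bytes(cur[start:i])))
    return patches


prev = bytes(8)
cur = bytes([0, 0, 0xFF, 0xFE, 0, 0, 0, 1])
print(line_delta(prev, cur))
```

Each offset would then become a skip over old bytes, and each patch a run of new or repeated bytes.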

Also try compressing a sample that looks like this, using captfilter:

circle-preview

The sample makes use of all the opcodes we know so far...

P.S. I will look at the code later, I'm only getting started with this thing 😅

@mounaiban
Contributor

mounaiban commented May 5, 2022

This is an urgent update about the 0x9f opcode, especially for @ra1nst0rm3d:

0x9f adds 248 bytes to the skip, not the other number as previously thought.

This incidentally is an answer to the contradiction that arises under the initial understanding of 0x9f. Recall the long seek opcode 0b100aaaaa 0b00001aaa skipping 0baaaaaaaa bytes.

If 0x9f meant adding 256 bytes, 0x9f 0x09 (0b10011111 0b00001001) would mean both "skip 256+1 bytes" and "skip 248+1 bytes". 🤯

Sorry if this misunderstanding is why you got stuck. I will update the wiki and my previous significant comments soon.
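With 248 per `0x9f`, the encoding side becomes unambiguous: whatever is left after the `0x9f` prefixes is below 248, so its upper five bits never reach 0b11111 and cannot be mistaken for another `0x9f`. Here is a Python sketch under that assumption. `encode_seek` is a made-up name, and whether captfilter emits exactly this decomposition has not been verified.

```python
def encode_seek(skip, new_byte):
    """Encode 'skip `skip` bytes, then place `new_byte`'."""
    if skip < 0:
        raise ValueError("skip must not be negative")
    out = bytearray()
    n_ext, rest = divmod(skip, 248)
    out += b"\x9f" * n_ext             # each 0x9f stands for 248 skipped bytes
    hi, lo = rest >> 3, rest & 0b111   # rest < 248, so hi is at most 30
    if hi:
        out.append(0b10000000 | hi)    # 0b100aaaaa: upper five bits
    out.append(0b00001000 | lo)        # 0b00001aaa: lower three bits
    out.append(new_byte)
    return bytes(out)


print(encode_seek(249, 0xAA).hex(" "))  # the 0x9f 0x09 case from above
print(encode_seek(671, 0xAA).hex(" "))
```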

Notes about the function names found in the captfilter disassembly

Also, I think I am beginning to understand the names found in the disassembly for captfilter earlier in this thread.

Canon uses the words 'Raw', 'Repeat' and 'Copy'. While 'Repeat' is obvious, 'Raw' could mean 'uncompressed bytes' and 'Copy' could mean 'copy from the line buffer', contrary to the terminology I have been using so far.

To avoid potentially contradicting the original terms, I'll just use repeat for compressed bytes, and new for uncompressed bytes not yet written out, old for bytes in the buffer.

From my understanding of Hi-SCoA, I think there's just one big adaptive uncompress command. LZ77 (which Hi-SCoA is a specialisation of) attempts to compress everything; repeated segments don't have to be contiguous in LZ77, so there is no need for separate uncompressed and compressed segments and their separate commands.

@ra1nst0rm3d
Author

@mounaiban captdriver works on FreeBSD with CUPS: the printer responds to the commands sent to it. Nicolas' driver doesn't work because of a bug in the kernel.

@ra1nst0rm3d
Author

@mounaiban Can you generate a PPD file for the LBP1120? I think I have some errors in mine.

@mounaiban
Contributor

@ra1nst0rm3d Thanks for testing the driver on FreeBSD. It's really great to know we can run on BSDs 👍
Which FreeBSD version, CUPS/libusb (or equivalent) version and architecture did you run it on? I'm going to update my wiki to document this.

As for the PPD, what errors are you getting? The PPD from your fork compiles on my test Ubuntu (14.04.6) system just fine.
I used the command ppdc -vd . src/canon-lbp.drv

@mounaiban
Contributor

mounaiban commented May 9, 2022

⚠️ The table below may be out of date by the time you read this, please check out the SCoA Specifications on the captdriver wiki for the latest version

In the meantime I think I might have discovered more opcodes (or rather, new ways to use the ones we already know).
Here's what I have observed so far:

This is just a provisional list, we will find out for sure only once the SCoA decoder is done.

| Opcode | My Name | (Alleged) Canon [1] Name | Comment |
|---|---|---|---|
| 0b01BBBAAA X | old + repeat | CopyThenRepeat | A (1-7) bytes from previous line then B (1-7) repeated bytes. 🆕 |
| 0b00BBBAAA X0..Xn | old + new | CopyThenRaw | A (1-7) bytes from previous line then B (1-7) uncompressed bytes. 🆕 |
| 0x9f | extend_old_Long | Extend | Add 248 to the old byte count for old_Long+new and old_Long+repeat commands |
| 0b100AAAAA 0b00BBBAAA X0..Xn | old_Long + new | CopyThenRawLong | A (8-255) bytes from previous line then B (1-7) uncompressed bytes. Re-interpretation of recently known long seek command 🆕 |
| 0b100AAAAA 0b10BBBAAA 0b01BBBAAA X | old_Long + repeat | CopyThenRepeatLong | A (8-255) bytes from previous line then B (1-7) repeated bytes 🆕 |
| 0b01AAA000 X | repeat | RepeatX | A (1-7) repeated bytes. Could be "zero old + repeat" bytes. |
| 0b11AAABBB X Y0..Yn | repeat + new | RepeatThenRaw | A (1-7) repeated bytes, then B (1-7) uncompressed bytes [2] |
| 0b101AAAAA 0b10AAA000 X | repeat_Long | RepeatXLong | A (8-255) repeated bytes only |
| 0b101AAAAA 0b00AAABBB X Y0..Yn | repeat_Long + new | RepeatThenLong | A (8-255) repeated bytes, then B (1-7) uncompressed bytes |
| 0b101AAAAA 0b11AAA000 X0..Xn | new_Long | CopyLong [3] | A (8-255) uncompressed bytes only |
| 0x40 | NOP | NOP | Dummy non-op. Not seen during tests but found in disassembly of captfilter |
| 0x41 | EOL | EOL | End of line. Dump line buffer to output, return line buffer pointer to zero. |
| 0x42 | EOP | EOP | End of page/picture. Don't decompress anything past this point. |

Watch out for the A's and B's... in some opcodes B comes first!

Now that we have an interpretation for 13/14 opcodes, it's time to test the living daylights out of this thing... we still don't have a place for CopyShort though.

Footnotes

  1. pun intended 🤓

  2. Nicolas' original SPECS file mentions a value of 0b001 for B only.

  3. this would contradict Canon's definition of "Copy" so far, here it means "Copy from uncompressed part of input".

@ra1nst0rm3d
Author

@mounaiban Your new table confirms my write_simple_byte() and try_write_byterepeat() implementation.

@mounaiban
Contributor

@ra1nst0rm3d I saw your commits over the past week, and it looks like it has a working SCoA and CAPT 1 implementation. Did you manage to print?

And what the hell is Numba? It's possessed! 🐍 💨
Tried it and it made my code 3x faster, and that's just using @jit. I wonder how fast I can go with the parallelisation and an Nvidia GPU/processor card. Thanks for the tip! 👍

@ra1nst0rm3d
Author

ra1nst0rm3d commented May 16, 2022

@mounaiban I tried, but it doesn't work at all.
IMG_20220516_220411.jpg
This is the CUPS test page.

@mounaiban
Contributor

@ra1nst0rm3d is the upper left corner of the picture the upper right of the page?

Maybe the printer is not getting the correct line width or image size? That's my wild guess seeing how the pixels are staying on one side of the page and going off the bottom while staying the same width throughout...

My other guess is that the printer is not getting the correct code for long runs, but I think this is less likely 🤔

I am also questioning my understanding of old_Long + new and old_Long + repeat; I suspect that

  • old_Long+new could be 0b100PPPPP 0b00PPPQQQ X0..Xn: 0bPPPPPPPP bytes from last line, then QQQ uncompressed bytes

  • old_Long+repeat could be 0b100PPPPP 0b10PPPQQQ X: 0bPPPPPPPP bytes from last line, then QQQ repeated bytes

contrary to the table above. Try the above interpretations if the current table doesn't seem right...

@mounaiban
Contributor

I have just added a SCoA decoder to the studycapt repo, but it's not quite working correctly yet. I'm trying to figure out if there was an error in the data extraction from the captfilter job files or if our understanding of the algorithm is still not 100% correct.

This is an attempt to decompress the circle sample page:

Screenshot from 2022-05-19 23-23-02

Note that the circle has been squashed, and the glitched lines are remarkably consistent. The black region at the bottom indicates missing pixels. Were long runs cut short? I don't know yet, but I may have made my first NFT 🤑

@ra1nst0rm3d
Author

@mounaiban Thanks for your SCoA decompressor. This will have a huge impact on development.

@mounaiban
Contributor

@ra1nst0rm3d You're welcome 😄
Remember to pull the latest fix, there's at least one typo that is messing up the decompression.

scoa.SCoADecoder.decode() git:ef9c0d output

The latest fix as of mounaiban/studycapt@ef9c0d1 improves the accuracy of the decompression, but we're still not 100% (which we need, because lossless)

There is one opcode that's eluding me: in the A4-sized circle sample from sample_blots.py, the SCoA-compressed version of the sixth line of the circle reads: 9f 83 a4 9e ff 08 fe 41. The two bytes that I suspect to contain pixel data are ff and fe. On the circle sample, bytes containing both set and unset pixels can only occur up to twice per line, and set and cleared pixels are contiguous.

If that is the case, everything I thought I knew about old_Long or 0b100PPPPP 0b10QQQPPP could be wrong.

@ra1nst0rm3d
Author

Isn't the circle fully black? If it is, the pixel data would be 'ff'.

@mounaiban
Contributor

mounaiban commented May 21, 2022

This is how we unpack 9f 83 a4 9e ff 08 fe 41:

Update: what I said earlier, I take that back. Check this post history for the original version.

9f => 0b10011111 (old_Long, add 248 to count)
83 => 0b10000011 (old_Long, add 0b00011 << 3 (24))
a4 => 0b10100100 (old_Long: add 0b100 (4) and repeat 0b100 (4) bytes)
Dump a total of 276 bytes from the previous line, repeat 4 bytes
9e => 0b10011110 (mystery opcode 🕵️‍♂️, some kind of modifier?)
ff => (byte to be repeated)
It looks like 0x9f has to be interpreted as a separate opcode. I would call it a form of old_Long.
All of the drama above presumably fills the inside of the circle.

08 => 0b00001000 (put just one byte)
fe => (byte to be placed)
This would patch the right edge of the circle

41 => end of line

Our mystery of the week (hopefully we can solve it in a week) is: what does 0x9e do, and when does Canon use it?

@mounaiban
Contributor

@ra1nst0rm3d The circle is fully filled black. The test page is generated by sample_blots.py:

./sample_blots.py --mode circle --size a4 --resolution 600 --format p4 --out_file circle-a4.pbm

The sample is then compressed with:
captfilter --CNTblModel=0 --Resolution=600 --PageSize=A4 circle-a4.pbm > circle-a4.capt

Currently at a loss trying to figure out these mystery opcodes that captfilter shoves, when it feels like it, between the opcode and the operands, like in the example above.

Because my decompressor doesn't handle them, they cause the wrong bytes to be repeated or passed, and the remaining data gets misinterpreted as opcodes.

The mystery opcodes aren't always 0x9f, so we have to figure out another way to detect them. Wish me luck! 👼

I hope these mystery codes aren't needed to make the printer usable...

@ra1nst0rm3d
Author

ra1nst0rm3d commented May 23, 2022

Sorry, but I'm suspending development of captdriver until around the middle of July, because I'm taking my final exams at school and I'm going to university this autumn :) I may push some of my solutions to my fork, but it will be a bit unstable.

P.S. Good luck, @mounaiban. I think you'll discover all the opcodes and I'll write the final version of the SCoA compressor.

@mounaiban
Contributor

@ra1nst0rm3d All the best for your exams! 🖖

@mounaiban
Contributor

mounaiban commented May 29, 2022

I believe we have found another opcode, and it's the longest one so far 🦒 :
0b100UUUUU 0b101YYYYY 0b10ZZZWWW P
and it means 0bUUUUUWWW old bytes from the previous line followed by 0bYYYYYZZZ repeats of byte P.

I'm inclined to call it old_Long + repeat_Long in the meantime. Its Canon name might be CopyThenRepeatLong, presumably with an 8-bit argument value.

This also solves this week's Mystery Opcode: 9f 83 a4 9e ff
9f => 248 old bytes from the previous line
83 a4 9e => 0b10000011 0b10100100 0b10011110, 0b00011110 (30) more old bytes + 0b00100011 (35) repeated bytes
ff => byte to be repeated

This has been verified with the original uncompressed A4 600dpi circle sample. The sixth line of the circle is 287px long, and from our mystery opcode, 35 * 8 == 280. There are seven more pixels from the following opcode which places a single fe byte, bringing up the count to a matching 287.
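The unpacking above can be reproduced in a few lines of Python. This is only a sketch of the bit layout as I understand it right now (`0b100UUUUU 0b101YYYYY 0b10ZZZWWW P`, with optional `0x9f` prefixes worth 248 each); the function name is made up and it handles this one opcode form only.

```python
def decode_old_long_repeat_long(stream):
    """Return (old_bytes, repeats, repeated_byte) for one opcode."""
    i = 0
    old = 0
    while stream[i] == 0x9F:           # each 0x9f adds 248 old bytes
        old += 248
        i += 1
    b0, b1, b2, value = stream[i:i + 4]
    if (b0 >> 5) != 0b100 or (b1 >> 5) != 0b101 or (b2 >> 6) != 0b10:
        raise ValueError("not an old_Long + repeat_Long opcode")
    old += ((b0 & 0x1F) << 3) | (b2 & 0b111)          # 0bUUUUUWWW
    reps = ((b1 & 0x1F) << 3) | ((b2 >> 3) & 0b111)   # 0bYYYYYZZZ
    return old, reps, value


old, reps, value = decode_old_long_repeat_long(bytes.fromhex("9f 83 a4 9e ff"))
print(old, reps, hex(value))

# Set pixels on this line: the repeated 0xff bytes plus the set bits of 0xfe
print(reps * 8 + bin(0xFE).count("1"))
```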

SCoA Decoder WIP

UPDATE: We have successfully decompressed test samples. The SCoA decoder now works correctly on test samples at time of writing. Further work on the decompressor is expected to be mostly validation! validation! validation!👏 and will continue for as long as this driver is relevant.

scoa-decoder-circle-success
This is a decompressed image, I swear! The black bar at the bottom is a stand-in for missing pixels due to an incorrect height declaration in the output P4 bitmap.

@ra1nst0rm3d
Author

I passed my first Unified State Exam, in the Russian language. It was very easy!
Anyway, good job @mounaiban. I'm taking the next two exams in June, and then I'm getting back to work!

@mounaiban
Contributor

@ra1nst0rm3d Well done, sounds like you were well prepared for your exams! ✌️

There's yet another opcode, and I'd call it old_Long + new_Long. It makes perfect sense to have an uncompressed counterpart to old_Long + repeat_Long, I guess 👯

The instruction bit layout is like:
0b100UUUUU 0b101YYYYY 0b11ZZZWWW P0..Pn
and it means 0bUUUUUWWW old bytes from the previous line followed by uncompressed bytes P0..Pn of length 0bYYYYYZZZ. The first two bits of the third byte are 11 instead of 10.
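Going the other way, a header for this opcode could be assembled like this. It is a Python sketch under the layout just described; `encode_old_long_new_long` is a made-up name, the `0x9f` prefix for larger old counts is deliberately left out, and nothing here has been tested against a printer.

```python
def encode_old_long_new_long(old, new_bytes):
    """Build 0b100UUUUU 0b101YYYYY 0b11ZZZWWW followed by the new bytes."""
    n = len(new_bytes)
    # old is capped at 247 so the first byte can never be 0x9f
    if not 0 <= old <= 247:
        raise ValueError("old must be 0..247 in this sketch")
    if not 1 <= n <= 255:
        raise ValueError("need 1..255 new bytes")
    b0 = 0b10000000 | (old >> 3)                          # UUUUU
    b1 = 0b10100000 | (n >> 3)                            # YYYYY
    b2 = 0b11000000 | ((n & 0b111) << 3) | (old & 0b111)  # ZZZ, WWW
    return bytes([b0, b1, b2]) + bytes(new_bytes)


packet = encode_old_long_new_long(30, bytes(range(35)))
print(packet[:3].hex(" "))
```

With 30 old bytes and 35 new bytes, the header differs from the earlier `83 a4 9e` sample only in the top two bits of the third byte.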

I will post the updates when I get the chance...

@ra1nst0rm3d
Author

Can we assume that the second bit of the third byte switches between new and repeat?

@mounaiban
Contributor

If you are thinking one bit at a time and in terms of binary trees, yes.
But if you are reading multiple bits and treating it as a flag, that works in this case but it actually isn't the correct way to read it (even if it is more efficient).

SCoA appears to be a variable-length prefix code. The funky opcode numbers might be a way of making sure the opcodes are easy to process, without confusing them for another code, when reading one bit at a time.

I didn't use the "correct" approach with my SCoA decoder. When decoding, I think the decompressor is supposed to use a function pointer binary tree to switch between decompression operations. Reading one bit at a time, the bits determine how the tree is navigated. I might do that for a C implementation someday...

@mounaiban
Contributor

mounaiban commented Jun 13, 2022

Just when I thought we knew all the opcodes, another new one turns up...

I'll call this one repeat + new_Long, and it looks like 0b101WWWWW 0b01XXXYYY R Z0..Zn
Repeat R 0bXXX times, then dump 0bWWWWWYYY uncompressed bytes Z0..Zn

But wait, there's more!

The new_Long and repeat_Long opcodes were in fact:

old + new_Long: 0b101WWWWW 0b11XXXYYY Z0..Zn, 0bYYY bytes from previous line, then 0bWWWWWXXX uncompressed bytes Z0..Zn

old + repeat_Long: 0b101WWWWW 0b10XXXYYY R, 0bYYY bytes from the previous line, then 0bWWWWWXXX repeats of byte R
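Here is how I would read the second byte of these three forms in Python. It is a sketch of the layouts listed in this comment only (`decode_long_header` is a made-up name); the 0b00 case from the older table is left out because this comment does not redefine it.

```python
def decode_long_header(b0, b1):
    """Interpret 0b101WWWWW followed by 0bKKXXXYYY."""
    if (b0 >> 5) != 0b101:
        raise ValueError("first byte must be 0b101WWWWW")
    w = b0 & 0x1F
    kind = b1 >> 6
    x = (b1 >> 3) & 0b111
    y = b1 & 0b111
    if kind == 0b01:
        # repeat R 0bXXX times, then 0bWWWWWYYY uncompressed bytes
        return {"op": "repeat + new_Long", "repeat": x, "new": (w << 3) | y}
    if kind == 0b11:
        # 0bYYY old bytes, then 0bWWWWWXXX uncompressed bytes
        return {"op": "old + new_Long", "old": y, "new": (w << 3) | x}
    if kind == 0b10:
        # 0bYYY old bytes, then 0bWWWWWXXX repeats of R
        return {"op": "old + repeat_Long", "old": y, "repeat": (w << 3) | x}
    raise ValueError("second byte 0b00XXXYYY is not covered by this sketch")


# The blank A4 line from earlier in the thread: bf b8 00, aa b0 00
print(decode_long_header(0xBF, 0xB8))
print(decode_long_header(0xAA, 0xB0))
```

Under this reading, the earlier "repeat_Long only" examples are simply old + repeat_Long with zero old bytes.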

UPDATE: the SCoA decoder is now pretty much beta-quality AFAIK, I have been able to decompress all of my test samples so far.
Check out this commit: mounaiban/studycapt@f49052ead12a4f7cc5331d495f7365c7fc143561.

The SCoA specs on my wiki have also been updated. Hopefully, we have completely charted the SCoA and preserved the specs for future generations, like just in case someone stumbles upon an LBP800 in 2077...

@ValdikSS

Is anyone still interested in adding LBP-810/1120 support to this driver? @mounaiban, maybe?
I have an LBP-1120 in good condition which I can share over the network if anyone is willing to work on it.

@mounaiban
Contributor

mounaiban commented Nov 28, 2023

Thanks for reaching out @ValdikSS; LBP-810 and 1120 (and possibly LBP3200) support is still part of the master plan, but we're just short on time. I don't know when we will be able to get back onto working on the driver, but in the meantime, I have updated the SCoA Specifications, particularly on the treatment of previous lines, to hopefully be easier to read.

@ra1nst0rm3d has made a fork specially for working on SCoA support. The driver was reportedly able to communicate with the printer, but the output was not correct. Maybe give it a try if you know C programming?

@ValdikSS

ValdikSS commented Nov 28, 2023

I ended up writing my own pbmraw → CAPTv1 converter as a replacement for the proprietary captfilter. My goal is to run it on an ARM single-board computer; captfilter under qemu-i386 was too slow, at about 30 seconds per page. Big thanks for your specification on the wiki, it helped a lot, though I also had to reverse-engineer the binary.

By the way, the CAPT implementation from Nicolas Boichat produces an incorrectly encoded stream (sporadic black lines, also detected by your decoder). I haven't tried @ra1nst0rm3d's encoder or the lbp800 encoder.

@ValdikSS

@mounaiban, how could I contact you in private? Your email is hidden in the commits. Please write me at [email protected]

@ValdikSS

ValdikSS commented Jan 3, 2024

@mounaiban
https://github.com/mounaiban/captdriver/wiki/SPECS:-SCoA-Specifications
CopyThenRepeat (0b01YYYXXX C) accepts 2-7 repeats of C, not 1. Otherwise the printout is garbled.
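
For anyone implementing this, here is a minimal C sketch of how the `0b01YYYXXX` pattern splits into its two 3-bit fields, plus the 2-7 range check on the repeat count. All names here are hypothetical and not taken from captdriver. Which of the two fields carries the repeat count is deliberately left open, so check the SCoA Specifications before wiring this into a decoder. The prefix test also only checks the top two bits and may not be enough on its own to identify the opcode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: these names are hypothetical and do not come from
 * captdriver or the SCoA wiki. */
struct scoa_fields {
    uint8_t yyy; /* bits 5..3 of the opcode byte */
    uint8_t xxx; /* bits 2..0 of the opcode byte */
};

/* True when the top two bits match the 0b01YYYXXX pattern. */
static bool scoa_has_01_prefix(uint8_t opcode)
{
    return (opcode & 0xC0u) == 0x40u;
}

/* Split the opcode byte into its two 3-bit fields. */
static struct scoa_fields scoa_split_fields(uint8_t opcode)
{
    struct scoa_fields f;
    f.yyy = (uint8_t)((opcode >> 3) & 0x07u);
    f.xxx = (uint8_t)(opcode & 0x07u);
    return f;
}

/* Range check from the comment above: 2 to 7 repeats of C.
 * The caller decides which field holds the repeat count (assumption:
 * consult the spec for that mapping). */
static bool scoa_repeat_count_ok(uint8_t repeats)
{
    return repeats >= 2 && repeats <= 7;
}
```

A decoder could call `scoa_split_fields()` once it has matched the prefix, then reject the stream early if `scoa_repeat_count_ok()` fails for the repeat field, instead of printing a garbled page.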

@mounaiban
Contributor

@ValdikSS thanks, I've updated the specs. Sorry I haven't had the time to open up an email for inquiries for my projects, but I feel that I will have to do so pretty soon; there are some issues that aren't really meant to be publicly discussed... 🙊

@ValdikSS

@mounaiban, I'd like to offer you (or any other developer willing to take the job) paid work on improving captdriver.
I'm building a print server that I want to have good support for obsolete hardware.

@mounaiban
Contributor

@ValdikSS to go faster than 30s/page, you might have to encode the page at the rasteriser as the pixels are being output, which means writing a Ghostscript device.

As for payment, I don't feel it's the right time to accept payment or donations as the driver isn't quite consumer-friendly yet. Payments may be accepted in the future as reimbursement for development costs (and toner!), but not before we achieve 100% layout accuracy (enough for cutting and drilling templates) and at least 75% the speed of the Canon drivers.

@ValdikSS

> to go faster than 30s/page

I've switched to box86 for x86 emulation; it's about 4 times faster than qemu, so the original driver now takes 6-7 seconds per page. I've dropped my encoder implementation (although it worked fine).

> As for payment, I don't feel it's the right time to accept payment or donations as the driver isn't quite consumer-friendly yet.

I'm not offering a donation, quite the contrary: I'm willing to sponsor the development.

@ValdikSS

Here is my converter implementation, in case it is of any help. It comes with some comments and headers.
It's battle-tested on many documents, but only in A4 format.
captfilter.zip

@RWAP

RWAP commented Aug 1, 2024

> @ValdikSS to go faster than 30s/page, you might have to encode the page at the rasteriser as the pixels are being output, which means writing a Ghostscript device.
>
> As for payment, I don't feel it's the right time to accept payment or donations as the driver isn't quite consumer-friendly yet. Payments may be accepted in the future as reimbursement for development costs (and toner!), but not before we achieve 100% layout accuracy (enough for cutting and drilling templates) and at least 75% the speed of the Canon drivers.

I am also interested in developing a printer emulation for the Canon CAPT-based printers (yes, I am awkward). It works the other way around: it interprets the data sent to a parallel port (intended for an early Lasershot LBP-1310) and then creates a PDF of the printed page.

Is anyone interested in this type of code? It would be a good way of testing the printer driver too.

@mounaiban
Contributor

@RWAP The LBP-1310 uses LIPS, not CAPT. LIPS is supported by Ghostscript in most Linux installations. On an up-to-date GNU/Linux system with CUPS+GS, when you type gs --help into your terminal you should see lips2p through lips4 and lips4v in the list of devices. GS LIPS output is ready for use with the LBP-1310; all you need is a PPD, or IPP adaptor software.

On the other hand, the much smaller LBP-1210 is a CAPT device.

On the topic of CAPT-to-PDF: all you will get is a PDF with a bitmap for every page, because CAPT doesn't have a page description language. Think of CAPT like fax on USB or IP.

Meanwhile, a LIPS-to-PDF routine would have more utility, but I don't know of any tool that does such a thing. There's a Japanese document for LIPS on the LBP-1310, but I can't read it! 🙈

For details, please check out this discussion on my repo: mounaiban#37
