edit: get the script for yourself here https://codeberg.org/CritBase111/Comicify
Someone sent me the comic you see attached to the post, and I thought, hey, that seems like something you could automate. Put 4 images in a folder, run a script, and it automatically lays them out in a 2x2 grid.
With Crush and deepseek it basically writes itself. I'm actually kinda surprised at how well it's doing. There was just one bug that I sent to Claude to fix.
I'm putting the examples in an imgur link because they're still quite heavy on disk space and I don't want to overwhelm the server: https://imgur.com/a/bTiHV4b (yes the pictures make it kinda hard to understand but I'll try to make a real comic with it eventually)
It's difficult to get other people hyped about a work in progress lol but hopefully this makes sense.
It's a python script and this is how you'd run it:
options:
  -h, --help            show this help message and exit
  --folder FOLDER       Path to folder containing comic panels
  --panels {1,2,3,4}    Number of panels to arrange (1-4). Not required when using --all
  --gutter GUTTER       Gutter size between panels in pixels (default: 15)
  --border BORDER       Border size around panels in pixels (default: 0)
  --border-color BORDER_COLOR
                        Border color (CSS color name or hex code, default: black)
  --gutter-color GUTTER_COLOR
                        Background/gutter color (CSS color name, hex code, or "transparent"|"none", default: white)
  --margin MARGIN       Margin multiplier relative to gutter size (default: 2.0, meaning 2x gutter)
  --layout {horizontal,vertical,h,v}
                        Layout direction for 2 and 3-panel comics (default: horizontal). Use "h" or "v" as shorthand.
                        For 4-panel, use "h" or "v" for linear layouts, omit for 2x2 grid.
  --all                 Generate all possible combinations (1, 2, 3, 4 panels with all layouts)

Examples:
  python Comicify.py --folder "Comics/My first comic" --panels 4
  python Comicify.py --folder "Comics/My first comic" --panels 4 --gutter 20
  python Comicify.py --folder "Comics/My first comic" --panels 4 --layout h
  python Comicify.py --folder "Comics/My first comic" --panels 2 --layout v
  python Comicify.py --folder "Comics/My first comic" --all
  python Comicify.py --folder "Comics/My first comic" --panels 3 --gutter-color none
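To give a rough idea of what --gutter and --margin do to a 2x2 layout, here's a minimal sketch of the geometry. It is not taken from Comicify.py: the function name grid_2x2, the Box class and the 800x600 panel size are made up for illustration, and it assumes the outer margin is simply the multiplier times the gutter, as the help text says. The real script may round or place things differently.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: int
    top: int
    width: int
    height: int


def grid_2x2(panel_w, panel_h, gutter=15, margin_mult=2.0):
    """Return (canvas_size, boxes) for four equal panels in a 2x2 grid.

    Assumption: outer margin = margin_mult * gutter, per the --margin help text.
    """
    margin = round(gutter * margin_mult)
    # Two panels plus one gutter between them, with a margin on each side.
    canvas_w = 2 * margin + 2 * panel_w + gutter
    canvas_h = 2 * margin + 2 * panel_h + gutter
    boxes = []
    for row in range(2):
        for col in range(2):
            boxes.append(Box(
                left=margin + col * (panel_w + gutter),
                top=margin + row * (panel_h + gutter),
                width=panel_w,
                height=panel_h,
            ))
    return (canvas_w, canvas_h), boxes


size, boxes = grid_2x2(800, 600)
print(size)  # (1675, 1275)
for box in boxes:
    print(box.left, box.top)  # 30 30 / 845 30 / 30 645 / 845 645
```

Boxes come out in reading order (top-left, top-right, bottom-left, bottom-right), so pasting panels in sorted filename order would give a left-to-right comic.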
If this doesn't speak to you, basically you would download the .py file and run one of the examples above. The --all flag generates every possible output (1, 2, 3 or 4 panels in all layouts). I also plan on adding "--reverse" to handle right-to-left order. You can see the examples in the imgur link; I included the sample code for some of them.
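Going by the help text, --all should cover something like the list below. This is only a sketch of the counting: the function name and the labels "single" and "grid" are placeholders, and the script itself may name or order things differently.

```python
def all_combinations():
    """List the (panels, layout) pairs that --all would plausibly generate."""
    combos = [(1, "single")]  # one panel has nothing to arrange
    for panels in (2, 3):
        combos += [(panels, "horizontal"), (panels, "vertical")]
    # Four panels: the default 2x2 grid, plus the two linear layouts.
    combos += [(4, "grid"), (4, "horizontal"), (4, "vertical")]
    return combos


for panels, layout in all_combinations():
    print(panels, layout)
print(len(all_combinations()))  # 8
```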
Deepseek wrote all of this, and it only costs like... a dollar or so. It's not the best at coding but it's so, so cheap that I fear I may spend all my savings on it lol. I didn't write a single line of code, just told it what I wanted (with more or less details) and it got it right on the first try. Now I'm just adding new functionalities to it.
I think one HUGE thing with LLM coding is it can go beyond what you know, so even if you can code it can do things you wouldn't necessarily do or know about.
I think this has uses for agitprop, if you want to make quick memes in a comic format for online or offline use. This is just one part of potentially a much bigger pipeline. I was thinking of having a simple GUI afterwards that you can send the output to, where you can add speech bubble stickers and write into them too, or for one-panel comics just write under the panel. Also, with LLMs you can easily port this code to JavaScript instead, or whatever else you want.
Oh if you send me a few pictures to comicify I can run them and show you the results. Script will not be available for download until it's fully ready.
Yeah I'm not sure either. It def searches Wikipedia, which eventually led it to marxists.org for some questions. I wonder how hard it would be to build search tools for other sites. Could make a prolewiki tool. I've already spent $0.50 fucking around. I have some other ideas to try out today. Cleaning up OCR text from a PDF, for example. The text is normally too large to upload, but with an agent it could break the task up. I've also tried using LLMs to jump-start an SQL query in a DB I'm not fully versed in but have a data dictionary for. Since this thing can grep text files, it can scan a massive data dictionary for table names and field names. Just giving it a prompt like "tell me about this project" is interesting to watch. It gets a directory listing, reads files, and summarizes the whole thing. With the project I was tinkering with, I had it periodically keep its own notes in an MD file to reference in the next session.
It's way more useful than just interacting with a chat window. I wish I had a web interface though, just for spell checking lol.
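The data dictionary scan is basically a keyword search over a text file. A minimal sketch of that idea, not what the agent actually runs: the function name find_mentions and the sample dictionary text are invented for illustration.

```python
import re


def find_mentions(dictionary_text, keyword):
    """Return (line_number, line) pairs whose text mentions the keyword."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [
        (number, line.strip())
        for number, line in enumerate(dictionary_text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Made-up data dictionary excerpt, purely for illustration.
sample = """\
TABLE orders: one row per customer order
  order_id      integer   primary key
  customer_id   integer   references customers
TABLE customers: one row per customer
  customer_id   integer   primary key
"""

for number, line in find_mentions(sample, "customer_id"):
    print(number, line)
```

The matching lines are small enough to paste into a prompt, which is the point: the model only sees the handful of tables and fields that matter for the query.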
I just made a community earlier today to share stuff about using crush: https://lemmygrad.ml/c/crushagent
There are MCPs you can install, and there's a whole list here https://mcpservers.org/ (both proprietary and local), but the one I tried to install didn't really work and I'm not sure what the problem is lol. MCPs allow your agent to communicate with specific websites in specific ways. There might be one for the MIA; I got one for arxiv, for example, and I'm sure there's bound to be a Wikipedia one. It can help enhance what the LLM is able to scrape since it provides a structured way to access it.
Lol I hate seeing that number go up but imagine how much you'd have spent with GPT or Claude! You'd have instantly blown through 10 dollars or more with those two.
Actually for the PDF, try the Mistral API. It has a free option (but they train on your inputs). I could send you my script. It's meant for translation, but all it does is break a text file into chunks, send each chunk to the Mistral API with a custom prompt attached, then save the result to an output text file. With crush you could have it repurpose the script for your needs. Or ABBYY OCR if you have Windows (I can also send it to you); it's probably the best OCR tool currently.
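The chunk-and-send loop described above looks roughly like this. It's a sketch and not the actual script: all the names are made up, the 4000-character limit is arbitrary, and send is a hypothetical stand-in for whatever function makes the real API call (here replaced by a dummy that returns the chunk unchanged).

```python
def split_into_chunks(text, max_chars=4000):
    """Split text on blank lines into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def process_file(in_path, out_path, prompt, send, max_chars=4000):
    """Run every chunk of in_path through send() and write results to out_path."""
    with open(in_path, encoding="utf-8") as f:
        chunks = split_into_chunks(f.read(), max_chars)
    with open(out_path, "w", encoding="utf-8") as out:
        for chunk in chunks:
            out.write(send(prompt, chunk) + "\n\n")
    return len(chunks)


def fake_send(prompt, chunk):
    # Placeholder for the real API call; it just echoes the chunk back.
    return chunk


print(split_into_chunks("a\n\nb\n\nc", max_chars=4))  # ['a\n\nb', 'c']
```

Splitting on blank lines keeps paragraphs intact, which matters for both translation and OCR cleanup, since the model never gets a sentence cut in half.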