Posted | Modified

Back in May 27, 2020 Microsoft published a blog post describing a new feature in Notepad:

when you see an asterisk in the title bar you’ll know you have unsaved changes

I’m using Windows 10 20H2 (OS Build 19042.928) and most of the time this feature works as described. For example, when you open a text document in Notepad and change the text, an asterisk appears in the title bar to indicate there are unsaved changes.

However, when you open a text document and change the text by using the Delete key to delete the character ahead of the cursor, the asterisk does not appear in the title bar. If you delete the character using the Backspace the asterisk appears though.

Notepad adds the asterisk to the title bar via the notepad!UpdateTitle function. This function is not being called when pressing the Delete key. However, it is being called when pressing other keys, for example, the Backspace key.

notepad!UpdateTitle can be triggered by keys that do not edit text, for example by pressing the the arrow keys. When you press Delete key the asterisk does not appear, but when you press an arrow key afterwards the asterisk in the title bar now does appear.

Posted | Modified

Why Do Conditional Breakpoints Slow Down Debugging?

When a breakpoint is hit a context switch between the target process and the debugger takes place. The debugger evaluates the expression of the conditional breakpoint, and based on the result of evaluation, the execution of target either continues or halts.

If the debugger keeps evaluating the expression of the conditional breakpoint to continue execution because of too many breakpoint hits, this will result in too many context switches between the target process and the debugeer. The cost of the switches add up, leading to slow execution of target.

Besides nuisance, the slowdown can be extremely inefficient, often to the point that makes the approach of using conditional breakpoint entirely unpractical.

Unless you can change the use of the conditional breakpoints towards acceptable performance. That is to reduce the number of context switches, which can be done by minimizing the number of breakpoint hits.

Are There Redundant Breakpoints That Can Be Removed?

Breakpoints that used to be useful but which now are no longer useful may have been accidentally left there. These unneeded breakpoints can cause unnecessary hits when running the target in the debugger and so causing slow execution. While it is obvious to remove or disable unneeded breakpoints, it is sometimes unnoticed.

Can the Conditional Breakpoint Be Replaced With One or More Non-Conditional Breakpoints?

Practically, there are numerous locations to place breakpoints to break near the desired program state. However, the location which more often hit is less advantageous than the location which less often hit during execution. If you can discover suitable location which less often hit, and put the breakpoint there, you can achieve better or perhaps significantly better performance.

You can replace a conditional breakpoint with one or more non-conditional breakpoints, or with another conditional breakpoint at a location which is less often hit during execution.

For example, if you want to place a conditional breakpoint which is using the return value of the function but the function is called way too many times before the desired value is returned, and this causes performance hit you can try with the followings. Find the location in the code where the desired return value is set, and place a simple non-conditional breakpoint there, rather than where the function returns the value. If there are few of those locations that should not be an issue either because you can place non-conditional breakpoints at each location.

Can the Breakpoint Be Split Into Two Breakpoints?

Another way to reduce the unnecessary breakpoint hits is to issue two breakpoints from within different program states, in two steps. That is to place the second conditional breakpoint, after the first conditional breakpoint has been reached.

For example, when there is a loop inside another loop, and you want to place the breakpoint inside the inner loop, but by doing so the breakpoint gets too many hits leading to excess slowdown that makes debugging unpractical, consider to use two breakpoints. The first breakpoint is placed inside the outer loop, before the entry point of inner loop with the conditions to halt execution that we reached the right inner loop. Additional research might be needed to what are the right conditions for the breakpoint. When the breakpoint is hit, from this place we issue the second breakpoint for the inner loop. And the next breakpoint hit we should reach the right program state within the inner loop.

Why does it work? Image if the outer loop has 1000 iterations and the inner loop has 1000 iteration as well. Imagine the wanted breakpoint is at outer loop 500 and inner loop 500. If we only put breakpoint in the inner loop that costs us 500500 condition checks or context switches to reach the right program state. However, if we use two breakpoints, that will costs us only 1000 condition checks or context switches. That is more than 500 times more efficient.

Can the Breakpoint Be Limited to a Specific Thread?

The code location of the breakpoint may or may not be executed by different threads.

If the code location of the breakpoint is executed by different threads and it can be determined that the breakpoint should be applied to one specific thread then it is reasonable to apply the breakpoint to that specific thread rather than to all threads which is normally done by debuggers unless specified.

Can the Breakpoint Be Replaced With Data Access Breakpoint?

Data access breakpoint or hardware breakpoint is supported by processor and so it allows to break into the debugger not just on execution but also when a specific address is being read or written.

If the execution breakpoint or the traditional INT 3 breakpoint can be ditched for a data access breakpoint that might be something worth considering to reduce the number of breakpoint hits.

For example, let’s assume there is a loop, and the loop incrementally iterates over a memory range to read every single byte of that range to do some computation on the bytes. For whatever reason, you want the debugger to break-in when the 10000th byte is being read. You can put a conditional breakpoint inside the loop that would monitor each read and compare the iteration count, which is sub-optimal. Instead a more optimal solution would be to put a data access breakpoint on the right memory address that is the address of the 10000th byte. This solution is more optimal because the breakpoint would only hit when needed.

Do You Consider Setting a Breakpoint on Function Entry Point?

Sometimes it is useful to think about why you want to set the conditional breakpoint on the entry point of the function. This is because setting the breakpoints on the callers rather than on the entry point of calle is advisable to reduce unnecessary breakpoint hits. And if you can narrow down the callers you might be able to reduce the unnecessary breakpoint hits.

Is Modifying the Binary Code in Memory a Viable Option?

The idea is to inject code which performs the condition check. If the conditions meet the debugger breaks in.

This approach involves to find a location for the condition check at which point the code needs to be patched to redirect execution to a yet unused executable region. The patch should not cross basic block boundaries for stability reasons. The executable region should contain the condition check and of course the original instructions overwritten by the patch as well as the restoration of register context. The non-conditional breakpoint should be put on the code path which is reached when the conditions meet.

Since the condition check is being done by the target, there are no unnecessary breakpoint hits and context switches, and so no significant slowdown.

Is the Source Code of the Target Available?

If the source code of the target is available to modify and the debugging can be performed on the modified source code, that can be advantageous.

Simply, add the break condition to the source code, rebuild the code, and when launching the debugger just add the non-conditional breakpoint to the location which is reached when the condition evaluates true.

Can Changing the Input Speed Up Execution?

If you understand the input data at some level, by modifying it, you may or may not be able to influence that how many times the location of conditional breakpoint is being executed.

For example, if normally the conditional breakpoint would hit 10000 times, and you may be able to modify the input in a way that after the modification the breakpoint would hit only 10 times that might be a great news.

Is Debugging From Within a Virtual Machine Is a Viable Option?

Being able to perform debugging from within a virtual machine has its advantages.

Virtual machines allow to take and restore snapshots. If a program state is reached on the expense of performance, it might be a good idea to take a snapshot of that program state. The snapshot will allow to restore the same program state without performance bottleneck when it is needed.

Can You Ditch the Debugger for a DBI Framework?

If the workaround does not provide acceptable debugging performance, which can happen specially when automating debugging tasks, you might want to consider ditching the debugger for a DBI framework. Pin, DynamoRIO and Frida are the most well-known DBI frameworks in the industry and the latter is gaining popularity recently due to its usage via scripts.

Final Note

There is no silver bullet to find the right approach, and often the combination of ideas work better than a single idea.

Identifying great breakpoint locations can be time consuming. It is up to the researcher’s discretion to invest further time into researching better breakpoint locations and conditions, or bear the longer execution. It is possible that sometimes it is worth bearing the one-off slow execution for an overall faster project completion.

Posted | Modified


Compression algorithms benefit from delta encoding, which makes the data more compressible. Delta encoding of correlated ASCII decimal strings improves the overall compression ratio, thereby providing a considerable benefit to compression algorithms.

ASCII Decimal String

An ASCII decimal string consists of a set of printable characters, which range from codes 0×30 to 0×39 that represent the digits from 0 to 9.

Delta Encoding ASCII Decimal Strings

Textual and semi-textual data can contain ASCII decimal strings that have the same digit lengths and are correlated. For example, the 8-digit decimal strings are in an incremental order as shown in Figure 1.

12345678 12345679 12345680 12345681

Figure 1: Correlated Decimal Strings

This understanding is useful for delta encoding decimal strings. The first decimal string has the base value, and each subsequent decimal string has the value difference from the previous value. The encoding is performed in a digit-by-digit manner rather than as a whole integer. The encoding can produce results similar to the one shown in Figure 2.

12345678 00000001 00000011 00000001

Figure 2: Delta-encoded Decimal Strings

In real-world data, there may be other data between the decimal strings, such as strings as shown in Figure 3.

ID 12345678 ID 12345679 ID 12345680 ID 12345681

Figure 3: Correlated Decimal Strings with Strings In Between

The delta coder recognizes the decimal strings regardless of location. The coder ignores the other strings during the encoding process. The encoding can result in a display similar to the one shown in Figure 4.

ID 12345678 ID 00000001 ID 00000011 ID 00000001

Figure 4: Delta-encoded Decimal Strings with Strings In Between

In real-world data, there may be other data between the decimal strings, such as decimal strings with different lengths as shown in Figure 5.

ID99 12345678 ID100 12345679 ID101 12345680 ID102 12345681

Figure 5: Correlated Decimal Strings with Decimal Strings In Between

The delta coder performs multiple passes on the data and encodes decimal strings with different lengths separately. In Figure 6, the coder has encoded decimal strings with 8 digits. Similarly, decimal strings with different lengths are encoded independently.

ID99 12345678 ID100 00000001 ID101 00000011 ID102 00000001

Figure 6: Delta-encoded Decimal Strings with Strings and Decimal Strings In Between

Delta encoding is beneficial for compression algorithms because it increases the occurrence of the same characters. In this example, it is the zero character. For this reason, many compression algorithms can compress data more efficiently. However, if the decimal strings are not correlated, the delta encoding process may not improve the compression ratio, or it could even make it worse.

Experimental Analysis

In this experiment, the efficiency of delta encoding is tested using a real-world PDF (Portable Document Format) file, which is a semi-textual format and represents the varying degrees of correlated ASCII decimal strings.

Ten delta encoding methods are applied to the sample. Delta encoding procedures range from encoding 1-digit decimal strings to 10-digit decimal strings. One combined delta encoding method is applied to the sample. It combines the five delta encoding methods that are the strongest individually. Seven compression methods are applied to the original and delta-encoded files. Each method is set to the maximum compression ratio.


Combined delta encoding contributes to varying degrees of compression improvements ranging from 3.5 percent to 18.8 percent as shown in Table 1. PPMd, BZip2, and RAR achieve 18.8 percent, 15.8 percent, and 14.7 percent improvements respectively. ZPAQ achieves the least, but still the 3.5 percent result is noteworthy. Without delta encoding, ZPAQ produces the smallest file, followed by LZMA and Brotli. With combined delta encoding, ZPAQ produces the smallest file, but Brotli achieves a better result than LZMA. Delta encoding with 2, 4, 5, 6 and 10 digits improves the compression ratio. However, delta encoding with 1 and 9 digits deteriorates the compression ratio.

Table 1: PDF Sample Results
Delta encoding method ZIP:Deflate BZip2 7z:PPMd 7z:LZMA 7z:Brotli RAR ZPAQ
None (original sample) 6,798,779 6,769,657 7,013,053 6,040,012 6,151,065 6,750,897 5,692,034
1-digit 6,804,274 6,788,146 7,042,361 6,057,956 6,167,812 6,773,170 5,704,985
2-digit 6,753,716 6,721,548 6,977,163 6,034,762 6,134,957 6,722,578 5,680,818
3-digit 6,807,662 6,765,354 7,029,890 6,076,672 6,170,911 6,808,052 5,718,458
4-digit 6,747,452 6,734,268 6,942,803 6,032,576 6,134,263 6,708,992 5,685,910
5-digit 6,369,822 6,252,086 6,397,184 5,863,166 5,884,288 6,283,388 5,631,217
6-digit 6,649,723 6,609,305 6,804,199 5,975,727 6,055,567 6,596,946 5,668,890
7-digit 6,798,901 6,769,197 7,012,649 6,039,961 6,150,558 6,751,004 5,692,130
8-digit 6,798,790 6,769,490 7,012,362 6,041,421 6,150,680 6,750,935 5,692,051
9-digit 6,798,795 6,769,657 7,013,069 6,040,028 6,151,081 6,750,913 5,692,042
10-digit 6,591,545 6,456,421 6,579,591 5,943,496 5,987,127 6,452,858 5,598,654
Combined of 2, 4, 5, 6, 10 digits 5,920,912 (12.9%) 5,699,267 (15.8%) 5,696,300 (18.8%) 5,668,295 (6.2%) 5,580,474 (9.3%) 5,760,317 (14.7%) 5,494,643 (3.5%)

The size of the original PDF file is 20,642,355 bytes. The file size in yellow indicates that the compression ratio is better after individual delta encoding.


The delta encoding presented in this analysis is lossless and reversible. The delta encoder used in this experiment can undo the delta encoding to restore the original data.


The experiment provides conclusive evidence for the benefits of delta encoding correlated ASCII decimal strings, which increases the occurrence of the same decimal characters and improves the overall compression ratio.


Posted | Modified


Compression algorithms benefit from pre-processing, aka filtering, which makes data more compressible. Pre-processing printable Unicode text measurably improves the overall compression ratio, thereby providing a significant benefit to compression algorithms.

Pre-processing Unicode Text

Unicode text is characterized by a pattern, which can be visualized on a Hex editor as shown in Figure 1. A printable character follows each zero (non-printable) character. Thus, even-offset characters are printable characters, while odd-offset characters are zeroes.

48 00 65 00 6C 00 6C 00 6F 00 20 00 77 00 6F 00  H.e.l.l.o. .w.o. 
72 00 6C 00 64 00 21 00                          r.l.d.!.

Figure 1: Unicode Text

This understanding is useful for organizing textual data. Bytes at even and odd offsets could be presented sequentially. Placing printable characters before non-printable characters could result in a display shown in Figure 2.

48 65 6C 6C 6F 20 77 6F 72 6C 64 21 00 00 00 00  Hello world!.... 
00 00 00 00 00 00 00 00                          ........

Figure 2: Pre-processed Unicode Text

Compression algorithms benefit from the better organization of data. Algorithms including the LZ (Lempel-Ziv) algorithm can find matches easily, for instance, while the zeroes are more compressible because the RLE (Run-Length Encoding) algorithm can be effectively applied on them.

Experimental Analysis

In this experiment, the efficiency of the pre-processor is tested using real-world and random data. Real-world data includes data from a Windows registry (.REG) file, while random data includes random printable characters. Both files have the same sizes. Six compression methods are applied to the original and pre-processed samples. Each method is manually set to the maximum compression ratio.

Real-world Data Results

Pre-processing contributes to varying degrees of compression improvements ranging from 3.72 percent to 43.87 percent as shown in Table 1. The RAR algorithm outperforms all the other algorithms for the pre-processed data. ZPAQ and ZIP:Deltate also achieve very good results. The 7z:LZMA algorithm is the most effective for the original data and still achieves good result for the pre-processed data.

Table 1: Real-world Data Results
Uncompressed ZIP:Deflate BZip2 7z:PPMd 7z:LZMA RAR ZPAQ
Original 82,097,412 2,923,818 1,926,411 1,607,819 1,116,793 2,080,856 1,205,937
Pre-processed 82,097,412 2,328,555 1,854,655 1,469,189 1,007,911 1,167,924 694,266
Gain 20.36% 3.72% 8.62% 9.75% 43.87% 42.43%

Random Data Results

Findings for the random data show that ZIP and RAR achieve 16.4 percent and 11.53 percent respectively, while the pre-processor for BZip2 and ZPAQ provide negligible improvements.

Table 2: Random Data Results
Uncompressed ZIP:Deflate BZip2 7z:PPMd 7z:LZMA RAR ZPAQ
Original 82,097,412 40,812,580 33,951,889 34,784,486 35,666,811 38,877,871 33,679,718
Pre-processed 82,097,412 34,102,961 33,947,916 34,593,752 34,268,387 34,394,031 33,661,096
Gain 16.4% 0.01% 0.55% 3.92% 11.53% 0.06%


The experiment provides conclusive evidence for the benefits of pre-processing printable Unicode text, which makes the data better organized and improves the overall compression ratio.


Posted | Modified

Numerous data formats specify CRC32 (Cyclic Redundancy Check 32-bit) checksum for some part of the data for detecting accidental data corruption. For example, the PNG (Portable Network Graphics) file type is such a format. In addition to documented data formats, undocumented formats can also contain CRC32 checksum of some kind.

These data formats, among many constructs, contain a block of data and a field for the CRC32 value for that block of data.

One of the recognizers implemented in BinCovery is to find CRC32 value-block pairs.

This is the description of the algorithm for finding CRC32 value-block pairs.

The recognizer starts reading UInt32 values from the binary data from offset 0. After each read, it moves the data pointer towards the end of window. It creates a list and adds each UInt32 value to that list.

The recognizer calculates the CRC32 checksum for all continuous blocks within the current window, starting from data offset 0. Then it starts matching each of these CRC32 checksums to each UInt32 value from the list. If there is a match, we have found a CRC32 value-block pair. The result will be saved, and the matching resumes to find further pairs.

The recognizer can be configured to find CRC32 values on data offset that is a multiple of four (alignment). Also, it can be configured to calculate the checksum for continuous blocks that match minimum and maximum size criteria. Using these criteria improves the processing speed and reduces the false positive result.

When the scan is finished in the current window, the window is moved forward to continue processing the rest of the data.

The recognizer can be configured to find little and big endian CRC32 values. Also, it can perform CRC32 calculation with various polynomials.

Practical Importance

Reverse engineering and digital forensics. Numerous data formats guard an internal header by a checksum. A header usually carries key information regarding the way to access the data. Examples of such information include offset and size. When you want to locate key information, you may look for a header in the data. You can try looking for such headers by finding CRC32 value-block pairs.

Data mining. You can run the recognizer on a set of files to separate files with CRC32 blocks from files without such blocks. When necessary, you can set the recognizer to use a particular CRC polynomial. You can specify the location, minimum and maximum length of CRC block to find the files with the best matches.

Data compression. The data guarded by a checksum is likely to be different than other parts of the data. Separating different kinds of data allows the best compression method to be applied for each part. This approach can lead to a better overall compression ratio. Additionally, running a pre-processor on certain parts of the data — rather than on the whole data — can also improve the compression ratio.

Fuzz testing. Mutating the checksum-guarded-data can be a good idea when you update the checksum field in the data. Excluding the data guarded by checksum from fuzzing and focusing on a different part of the data is also an approach one may consider.