• 0

2 Questions about Hash Strings


Question

When I generate a hash (say SHA256 or SHA512), the hash string is composed of a combination of a-f and 0-9.

 

  1. Is there a way to generate a hash string that is composed of a-z, A-Z and 0-9? 
  2. Is there a way to control what characters are used so if I only wanted m-z, A-L, 0-9 and "-_=*^#@!()[]{}<>;:,.?" that would be a possibility?
https://www.neowin.net/forum/topic/1369666-2-questions-about-hash-strings/

11 answers to this question

  • 0

You could base36 encode the hash output to give you a string composed of a-z0-9 (or write a simple custom cipher to map to whatever set of characters you want) but I can't think of a reason why you would want to do this?
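A minimal sketch of that idea in C, assuming the digest is already available as raw bytes (computing the SHA-256/512 itself is left to whatever library you use). The function name `encode_custom_base` and its parameters are made up for illustration. It treats the digest as one big number and repeatedly divides it by the alphabet size, so any alphabet of 2 to 256 distinct characters works, including base36 or a hand-picked symbol set.

```c
#include <stddef.h>
#include <string.h>

/*
 * Encode raw digest bytes (big-endian) as a number written in the base given
 * by strlen(alphabet). The digest buffer is used as scratch space and ends up
 * zeroed. Returns the number of characters written (excluding the '\0'), or
 * 0 if the alphabet size is unusable or the output buffer is too small.
 */
size_t encode_custom_base(unsigned char *digest, size_t len,
                          const char *alphabet, char *out, size_t outcap)
{
    unsigned int base = (unsigned int)strlen(alphabet);
    size_t n = 0;

    if (base < 2 || base > 256 || outcap < 2)
        return 0;

    for (;;) {
        unsigned int rem = 0;
        int nonzero = 0;

        /* Long division of the whole byte string by base. */
        for (size_t i = 0; i < len; i++) {
            unsigned int cur = (rem << 8) | digest[i];
            digest[i] = (unsigned char)(cur / base);
            rem = cur % base;
            if (digest[i] != 0)
                nonzero = 1;
        }

        if (n + 1 >= outcap)
            return 0;               /* no room for this digit plus '\0' */
        out[n++] = alphabet[rem];

        if (!nonzero)
            break;                  /* quotient reached zero */
    }

    /* Digits come out least significant first, so reverse them. */
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        char tmp = out[i];
        out[i] = out[j];
        out[j] = tmp;
    }
    out[n] = '\0';
    return n;
}
```

Note that leading zero bytes in the digest do not produce leading characters here, so the output length can vary slightly between hashes; pad on the left with the alphabet's first character if a fixed width is needed.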

  • 0

Understanding why it's needed is my business. But thank you.

 

Secondly, PHP does this when creating a session. You are able to set its sid_bits_per_character to 6, which gives a-z, A-Z and 0-9; thus I assumed there is a method to specify which characters are included as the components of the hash.

  • 0

You assumed wrong. What PHP is doing is completely independent of the hashing method: it simply takes the bits returned from (any) hashing method and, rather than displaying them in hexadecimal, encodes them into a string using a character set of its choosing, just as I said you could do:

 


static char hexconvtab[] = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,-";

static void bin_to_readable(unsigned char *in, size_t inlen, char *out, size_t outlen, char nbits) /* {{{ */
{
	unsigned char *p, *q;
	unsigned short w;
	int mask;
	int have;

	p = (unsigned char *)in;
	q = (unsigned char *)in + inlen;

	w = 0;
	have = 0;
	mask = (1 << nbits) - 1;

	while (outlen--) {
		if (have < nbits) {
			if (p < q) {
				w |= *p++ << have;
				have += 8;
			} else {
				/* Should never happen. Input must be large enough. */
				ZEND_ASSERT(0);
				break;
			}
		}

		/* consume nbits */
		*out++ = hexconvtab[w & mask];
		w >>= nbits;
		have -= nbits;
	}

	*out = '\0';
}

https://github.com/php/php-src/blob/master/ext/session/session.c#L269
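To try the same bit-packing idea outside of PHP, here is a standalone sketch. It is an adaptation, not PHP's code: the names `alphabet64`, `chars_needed` and `bits_to_text` are invented for this example, and unlike the snippet above it sizes the output itself and pads the last character with zero bits instead of relying on a caller-supplied `outlen`.

```c
#include <stddef.h>

/* 64-symbol table in the same order as the snippet above. */
static const char alphabet64[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,-";

/* Characters needed to cover nbytes bytes at nbits bits per character. */
size_t chars_needed(size_t nbytes, unsigned nbits)
{
    return (nbytes * 8 + nbits - 1) / nbits;
}

/*
 * Pack `in` into text, taking nbits (4, 5 or 6) bits per character, lowest
 * bits first. `out` must hold chars_needed(inlen, nbits) + 1 bytes.
 * Returns the number of characters written.
 */
size_t bits_to_text(const unsigned char *in, size_t inlen,
                    char *out, unsigned nbits)
{
    unsigned int acc = 0;               /* bit accumulator */
    unsigned have = 0;                  /* valid bits currently in acc */
    unsigned mask = (1u << nbits) - 1;
    size_t i = 0, n = 0;

    while (i < inlen || have > 0) {
        if (have < nbits && i < inlen) {
            acc |= (unsigned int)in[i++] << have;   /* refill with one byte */
            have += 8;
        }
        out[n++] = alphabet64[acc & mask];          /* emit one character */
        acc >>= nbits;
        have = have > nbits ? have - nbits : 0;
    }
    out[n] = '\0';
    return n;
}
```

With 4 bits per character only the first 16 symbols of the table are ever used, which is why that setting looks like ordinary hex; 5 bits reaches the first 32 symbols and 6 bits uses all 64.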

 

 

I asked why because I hope you're not using this for security purposes. Based on the fact that you had to ask this question in the first place, you're more likely to end up reducing security rather than increasing it.

Edited by ZakO
  • 0

1. No you can only have one of A-Z or a-z.  This can be done by encoding the string that your hashing algorithm to something that is hex.

2. No, this defeats the point of a hash.  I can't think of a good reason for doing this.

  • 0
16 hours ago, Fahim S. said:

1. No you can only have one of A-Z or a-z.  This can be done by encoding the string that your hashing algorithm to something that is hex.

2. No, this defeats the point of a hash.  I can't think of a good reason for doing this.

Thanks.

It's funny when I make inquiries, rather than answering I am offered personal opinions of understanding.  

 

While you offered some answers, you added your ego (or lack of worldly experience) into the mix.  You do not need to know why I want something.  The comment of "I can't think of a good reason for doing this" is naive and immature.  Of course you cannot think of a good reason to do this; it's because you haven't lived my life; surely you understand that. But mostly that comment is completely irrelevant.

 

In future, just answer the question and don't interject your immaturity into your response.

 

Cheers mate.

Edited by Brian Miller
  • 0
16 hours ago, ZakO said:

You assumed wrong. What PHP is doing is completely independent of the hashing method, it's simply taking the bits returned from (any) hashing method and rather than displaying them as a hexadecimal representation it's encoding them into a string using a character set of their choosing, just as I said you could do:

  




 

 

I asked why because I hope you're not using this for security purposes, based on the fact that you had to ask this question in the first place you're more likely to end up reducing security rather than increasing it. 

 

Thanks dude, that's what I thought too.  I like your idea of base encoding it; I may use Base56.

 

The reason I asked is that I wanted to learn about forming such strings, and not necessarily for any foolish attempt at security.  The formation of Bitcoin addresses such as "1BoNtSLRHtKNngkdx3e0bR7gb53L3TtpYt" first piqued my curiosity; then discovering that PHP sessions can also include a-z, A-Z and 0-9 when sid_bits_per_character is set to 6 prompted me to enquire with learned people here.

 

  • 0
6 hours ago, Brian Miller said:

Thanks.

It's funny when I make inquiries, rather than answering I am offered personal opinions of understanding.  

 

While you offered some answers, you added your ego (or lack of worldly experience) into the mix.  You do not need to know why I want something.  The comment of "I can't think of a good reason for doing this" is naive and immature.  Of course you cannot think of a good reason to do this; it's because you haven't lived my life; surely you understand that. But mostly that comment is completely irrelevant.

 

In future, just answer the question and don't interject your immaturity into your response.

 

Cheers mate.

Err... ok. 

 

As we are offering tips to one another, let me give you one: a bit of context can go a long way in getting an answer.  Developers like solving problems, and without detail of the underlying problem it is difficult to help.

 

The comment was intended to probe for context (suggest you read about the 5 whys) so that I can try my best (within the bounds of my knowledge) to help you come to an answer quicker.  I apologise for the negative impression that you drew from it.

  • 0
7 hours ago, Brian Miller said:

Thanks.

It's funny when I make inquiries, rather than answering I am offered personal opinions of understanding.  

 

While you offered some answers, you added your ego (or lack of worldly experience) into the mix.  You do not need to know why I want something.  The comment of "I can't think of a good reason for doing this" is naive and immature.  Of course you cannot think of a good reason to do this; it's because you haven't lived my life; surely you understand that. But mostly that comment is completely irrelevant.

 

In future, just answer the question and don't interject your immaturity into your response.

 

Cheers mate.

 

Mate, I see your post count and reputation, but this doesn't mean you should act like an a**hole. You are not paying those people, so you shouldn't have such expectations for their answers. From my point of view those were relevant and polite answers, and they were doing their best to help you.

 

...and yes adding my ego is perfectly fine on public forum.

 

Have a nice day!

  • 0

Hello,

 

SHA-256 and SHA-512 output their results in hexadecimal notation, which is why you see 0-9 and a-f used in the results--those are the sixteen digits which compose hexadecimal notation.

 

Instead of having to re-write the hashing algorithms to provide your own numbering system, perhaps it would be better to use something like SSDeep, instead, which supports a larger encoding set?

 

Regards,

 

Aryeh Goretsky

 

  • 0
On 8/5/2018 at 10:26 PM, Brian Miller said:

You do not need to know why I want something.

You clearly do not understand how forums work... That you got the answers you got is way more than I would ever in a million years have given you, with such a comment when asked why.

  • 0

Just encode the hash, shortest practical encoding I can find is base85.

 

Encoding: input -> SHA256/512 -> base85

Decoding: base85 -> SHA256/512 -> Find input data with hash

 

There are multiple common base encodings: base2(1), base10(2), base16(3), base32, base36, base58, base64(4), base85, base91(5), base128(6)

 

Above base encodings have a default character set of X characters that are being used for encoding, but it's possible to replace those with your own character set.

 

  1. Base encoding of binary data (0 & 1)
  2. Base encoding of a decimal number (0-9)
  3. Base encoding of a hexadecimal string like a SHA512 hash (A-F0-9)
  4. Base encoding commonly used for encoding binary data to a string to embed it in websites
  5. Base encoding with most printable characters
  6. Base encoding of a byte and ascii string

 

BUT

If you actually try to encode your hash you will find the string doesn't become shorter.

 

Original:

seahorsepip

SHA512: 

F9AA2F6D639C026E3325F31247E8253987D6EC6EEC7E93764F9F3CC25D08FABA7DF95FAF94779CACF22D72F96EEE88D46C90A8CE727944218A1DC272EDA29084

base85: 

mMA+2gdBe&hzWuXfFUKPgCZ^zmL=+Og=E&.gbQ<Fi5:k-mme$4mmf15iwSPLg!6F{gEBI+hafbTmNovbh:*a}hax(3iw-VNiyu81mLV.!h.)&.hBQ56i5<[email protected].)rLg=c%xi6/n!lOZOQmmoA%iwrAH

 

Why doesn't it become shorter?

When you create a hash from data it returns a hash string in the hexadecimal format, also known as base16.

So when you encode the hash as a string using base85 you actually tell the base85 encoder that the input is an ascii string (base128), so that means you're encoding a base128 string to a base85 string which results in a longer instead of shorter string!

 

How to fix this?

Make sure to actually let the base85 encoder know that the input format is base16.

So to do that you can convert the hex string to raw bytes (base256, since a byte has 256 possible values), with hex2bin in PHP for example, and use those bytes as input for the base85 encoder.

This means the encoding would be: input -> SHA256/512 (base16) -> bytes (base256) -> base85
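For anyone not using PHP, the hex-to-bytes step is easy to write by hand. A minimal sketch in C follows; `hex_value` and `hex_to_bytes` are names made up for this example, not library functions. Each pair of hex characters becomes one byte, so a 128-character SHA512 hex string turns into 64 bytes before it goes into the encoder.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Convert a hex string to raw bytes. Returns the number of bytes written,
 * or 0 if the string has odd length, contains a non-hex character, or does
 * not fit in outcap bytes.
 */
size_t hex_to_bytes(const char *hex, unsigned char *out, size_t outcap)
{
    size_t n = 0;

    while (hex[0] != '\0') {
        int hi = hex_value(hex[0]);
        int lo = hex[1] != '\0' ? hex_value(hex[1]) : -1;

        if (hi < 0 || lo < 0 || n >= outcap)
            return 0;
        out[n++] = (unsigned char)((hi << 4) | lo);   /* two digits, one byte */
        hex += 2;
    }
    return n;
}
```

Feeding the resulting bytes (rather than the hex text) to the encoder is what halves the input size and lets a larger base actually shorten the string.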

 

Original:

seahorsepip

SHA512: 

F9AA2F6D639C026E3325F31247E8253987D6EC6EEC7E93764F9F3CC25D08FABA7DF95FAF94779CACF22D72F96EEE88D46C90A8CE727944218A1DC272EDA29084

base85: 

}kM8+w1iQrgBqS9n9zlVHT$*f)0+dwpOf+^t)QqrEFElNLY@U0[?1[TzTJtgy(>QvA^p4@IxfMO)v]X}

And even shorter base91 (only 1 char shorter in this example):

?e#a_*!Og$d0Rh"Y4Qx.=8}^zpmb~^B4aGWI;`W=?}5&b`B3w0Exl`S[GYF#fG9.1,vcLH]LR%LxhzQ

 

And the encoded string is now shorter :D

 

PHP libraries to do this:

hex2bin: http://php.net/hex2bin

base85: https://github.com/tuupola/base85

 

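So PHP code to create a shorter hash of a file (hash() needs the algorithm name as its first argument, and hash_file() hashes a file's contents):

$base85 = new Tuupola\Base85;
$shorterHash = $base85->encode(hex2bin(hash_file('sha512', $file)));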

 

Update:

Seems like you wanted to create a custom base56 encoding. To do that we could manually create functions to encode and decode it:

$base56_digits = '0123456789ABCDEFGHIJKLMNOPQRSTVWXYZabcdefghijklmnopqrstv';
$custom_digits = 'mnopqrstvwxyzABCDEFGHIJKL0123456789-_=*^#@!()[]{}<>;:,.?';

function encode($base16) {
    global $base56_digits, $custom_digits;

    $base56 = base_convert($base16, 16, 56);
    $custom = strtr($base56, $base56_digits, $custom_digits);

    return $custom;
}

function decode($custom) {
    global $base56_digits, $custom_digits;

    $base56 = strtr($custom, $custom_digits, $base56_digits);
    $base16 = base_convert($base56, 56, 16);

    return $base16;
}

But the above doesn't work, since PHP's base_convert is limited to base 36 (and it can also lose precision on numbers as large as a hash) :(

Instead you can use a magnificent 3rd party library: https://github.com/ArtBIT/base_convert

 

And then you have:

$custom_digits = 'mnopqrstvwxyzABCDEFGHIJKL0123456789-_=*^#@!()[]{}<>;:,.?';

function encode($base16) {
    global $custom_digits;

    return math\base_convert($base16, 16, $custom_digits);
}

function decode($custom) {
    global $custom_digits;

    return math\base_convert($custom, $custom_digits, 16);
}
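For comparison, the decode direction can also be done without a library. The following is a rough sketch in C under the assumption that you know the digest width in advance (64 bytes for SHA512); `decode_custom_base` is a hypothetical name and this is not how the library above is implemented. It multiplies the running value by the alphabet size and adds each character's index, keeping the number as a fixed-width byte array.

```c
#include <stddef.h>
#include <string.h>

/*
 * Decode text written in a custom alphabet back into a fixed-width,
 * big-endian byte string of `len` bytes. Returns 0 on success, or -1 if the
 * text contains a character outside the alphabet, the alphabet size is
 * unusable, or the value does not fit in `len` bytes.
 */
int decode_custom_base(const char *text, const char *alphabet,
                       unsigned char *out, size_t len)
{
    size_t base = strlen(alphabet);

    if (base < 2 || base > 256)
        return -1;
    memset(out, 0, len);

    for (; *text != '\0'; text++) {
        const char *pos = strchr(alphabet, *text);
        unsigned int carry;

        if (pos == NULL)
            return -1;                      /* character not in alphabet */
        carry = (unsigned int)(pos - alphabet);

        /* out = out * base + digit, from the least significant byte up. */
        for (size_t i = len; i-- > 0; ) {
            unsigned int cur = out[i] * (unsigned int)base + carry;
            out[i] = (unsigned char)(cur & 0xff);
            carry = cur >> 8;
        }
        if (carry != 0)
            return -1;                      /* value wider than len bytes */
    }
    return 0;
}
```

This only turns the text back into the hash bytes; as with any encoding of a hash, it does not recover the original input.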

Original:

seahorsepip

SHA512: 

F9AA2F6D639C026E3325F31247E8253987D6EC6EEC7E93764F9F3CC25D08FABA7DF95FAF94779CACF22D72F96EEE88D46C90A8CE727944218A1DC272EDA29084

Custom base56: 

n<^(q8}=_G@x0;B1]K6zD-DF*96yE-6L#_>K8vJ},vCz02m,8yB][4qA^12>.pw>2-?_m,{0L<qFCK:K,2@04)3s:

 

TL;DR

All data is encoded in a specific base. Data can be represented as a shorter string by increasing its base, and can be represented with a smaller character dictionary by decreasing its base.

 

Off-topic:

Quote

Stop the bickering back and forth, we're here to learn things and help each other, if someone doesn't want to share why he wants to do something then that's his right.

Though that doesn't mean that you have to be rude about it, if you don't want to share the why, let others know in a respectful manner.

 
