Length of Human DNA in bytes



It's a very valid question and I have a very good reason for asking it. I will post more about my reasoning in another thread later on.

I had a look on Yahoo Answers and I read this: http://answers.yahoo.com/question/index?qi...25135324AApHMlU

I wonder if the Human Genome Project has the full file for us to download? If they RAR it up, the file should compress nicely to about 50MB.

Why would you need 12 bytes per base pair? There can be only four types of pairing (AT, TA, GC, CG), so one pair can be encoded in two bits (say, 00=AT, 01=TA, 10=GC, 11=CG). Since a male human has 3,080 million base pairs in his DNA, that translates to 6,160,000,000 bits, or 770,000,000 bytes, which is a little more than 734 MB (counting 1 MB as 2^20 bytes). For the human female, with 3,022 million base pairs in her DNA, it's 6,044,000,000 bits, or 755,500,000 bytes, which is a little more than 720 MB.
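To make the arithmetic above easy to check, here is a minimal Python sketch. The base-pair counts are the figures from this post (taking the female figure as 3,022 million), the 2-bit codes are the ones suggested above, and the function names `packed_size_bytes` and `pack_pairs` are made up for illustration.

```python
# 2-bit codes as suggested in the post; any fixed assignment would work.
PAIR_CODES = {"AT": 0b00, "TA": 0b01, "GC": 0b10, "CG": 0b11}


def packed_size_bytes(base_pairs: int, bits_per_pair: int = 2) -> int:
    """Bytes needed to store `base_pairs` pairs, rounded up to a whole byte."""
    return (base_pairs * bits_per_pair + 7) // 8


def pack_pairs(pairs: list[str]) -> bytes:
    """Pack pair labels such as ["AT", "GC"] into bytes, four pairs per byte."""
    out = bytearray()
    for i in range(0, len(pairs), 4):
        chunk = pairs[i:i + 4]
        byte = 0
        for pair in chunk:
            byte = (byte << 2) | PAIR_CODES[pair]
        # Left-align a short final chunk so unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


MALE_BP = 3_080_000_000    # figure quoted in the post
FEMALE_BP = 3_022_000_000  # figure quoted in the post, read as millions

for label, bp in (("male", MALE_BP), ("female", FEMALE_BP)):
    size = packed_size_bytes(bp)
    print(f"{label}: {size} bytes = {size / 2**20:.1f} MB (2^20 bytes per MB)")
```

Note that this only counts the raw sequence. A real file would also need some way to mark chromosome boundaries and unknown bases, which this sketch ignores.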

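That's of course assuming it is written in binary. Writing it as ASCII text, at one 8-bit character per base pair, will be 4 times larger (8 times if both letters of each pair are written out), and using Unicode in UTF-16 or UTF-32 will be two or four times larger than the ASCII text file. I don't know how well it will get compressed.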
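A rough way to compare the encodings is to let Python do the counting. This sketch assumes one letter per base pair (A, C, G or T, with the partner implied); if both letters of each pair are written out, the text sizes double. The function name `encoded_sizes` is made up for illustration, and the "-le" codec variants are used so that no byte-order mark is added to the count.

```python
def encoded_sizes(bases: str) -> dict[str, int]:
    """Size in bytes of the same sequence under several encodings."""
    return {
        "packed_2bit": (2 * len(bases) + 7) // 8,    # 2 bits per letter
        "ascii": len(bases.encode("ascii")),         # 1 byte per letter
        "utf-16": len(bases.encode("utf-16-le")),    # 2 bytes per letter
        "utf-32": len(bases.encode("utf-32-le")),    # 4 bytes per letter
    }


sizes = encoded_sizes("ACGT" * 1000)  # toy sequence of 4,000 letters
for name, size in sizes.items():
    ratio = size / sizes["packed_2bit"]
    print(f"{name:12s} {size:6d} bytes  ({ratio:.0f}x the packed size)")
```

UTF-8 is left out because these four letters take one byte each in UTF-8, the same as ASCII.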

Edit: Wikipedia gets a bit larger file size.

Edit 2: While I was calculating, virtorio beat me with the WP link.

Also, does that sequence include the RNA sequence too?

Central dogma of genetics: DNA -> RNA -> Protein. The short story is that RNA is transcribed from DNA (the genome).

So what are you asking for exactly? The entire genome as in all DNA contained in a human cell's chromosomes? Total coding regions? Total protein-encoding genes? Total DNA and products such as mRNA, siRNA, miRNA, tRNA, rRNA?

Say, 00=AT, 01=TA, 10=GC, 11=CG

If you're willing to simplify and assume Watson-Crick base-pairing, then you may as well take only the total genome length (as opposed to total genome length x 2 to account for dsDNA), since A "always" pairs with T, and C "always" pairs with G. If you want to get more complicated, you will have modified bases, ranging from methylation and acetylation to a few other base analogues, as well as non-W/C pairing.
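The reason one strand is enough under that simplification can be shown in a few lines: given one strand, the other is fully determined. This is a minimal sketch that assumes strict Watson-Crick pairing and ignores modified bases entirely; the function name `reverse_complement` is just a conventional label, not something from this thread.

```python
# Strict Watson-Crick pairing only; modified bases are not handled.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


strand = "ATGC"
partner = reverse_complement(strand)
print(strand, "pairs with", partner)

# Applying it twice gives back the original, so storing both strands
# adds no information under this assumption.
assert reverse_complement(partner) == strand
```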

Also note the presence of mobile DNA elements such as (retro)transposons. They can move or replicate and insert themselves back into the genome. You also get all sorts of other crazy events like site-specific recombination that can result in duplication or excision of a stretch of DNA from a replicating chromosome, so really, the take-home message is that it is a freaking miracle that anything manages to live.

the take-home message is that it is a freaking miracle that anything manages to live.

lol, this^

I think there are quite large issues with the HGP (or anyone else) publishing somebody's genome; I doubt they'd allow any Joe Bloggs to download it.

edit: "All genome sequence generated by the Human Genome Project has been deposited into GenBank, a public database freely accessible by anyone with a connection to the Internet."

This topic is now closed to further replies.