Length of Human DNA in bytes


Recommended Posts

It's a very valid question and I have a very good reason for asking it. I will post more about my reasoning in another tread later on.

I had a look on Yahoo Answers and I read this: http://answers.yahoo.com/question/index?qi...25135324AApHMlU

I wonder if the Human Genome Project have the full file for us to download? If they RAR it up, the file should compress nicely for about 50MB.

Why would you need 12 bytes per base pair? There can be only four types of pairing (AT, TA, GC, CG). So, one pair can be encoded in two bits. [say, 00=AT, 01=TA, 10=GC, 11=CG]. Since a male human has 3080 Million base pairs in his DNA, it translates to 6160000000 bits or a little more than 734 MB. And for the human female with 3022 base pairs in her DNA, its 6044000000 bits or a little more than 720 MB.

That's of course assuming writing it in binary. Writing it in ASCII will be 8 times larger, and using Unicode will be two or four times larger than the ASCII text file. I don't know how well it will get compressed.

Edit: Wikipedia gets a big larger file size.

Edit 2: While I was calculating, virtorio beat me with the WP link.

Also, does that sequence include the RNA sequence too?

Central dogma of genetics: DNA -> RNA -> Protein. The short story is that RNA is transcribed from DNA (the genome).

So what are you asking for exactly? The entire genome as in all DNA contained in a human cell's chromosomes? Total coding regions? Total protein-encoding genes? Total DNA and products such as mRNA, siRNA, miRNA, tRNA, rRNA?

Say, 00=AT, 01=TA, 10=GC, 11=CG

If you're willing to simplify and assume Watson-Crick base-pairing, then you may as well take only the total genome length (as opposed to total genome length x 2 to account for dsDNA), since A "always" pairs with T, and C "always" base pairs with G. If you want to get more complicated, you will have modified bases, ranging from methylation, acetylation, and a few other base analogues, as well as non-W/C pairing.

Also note the presence of mobile DNA elements such as (retro)transposons. They can move or replicate and insert themselves back into the genome. You also get all sorts of other crazy events like site-specific recombination that can result in duplication or excision of a stretch of DNA from a replicating chromosome, so really, the take-home message is that it is a freaking miracle that anything manages to live.

the take-home message is that it is a freaking miracle that anything manages to live.

lol, this^

I think there are quite large issues with the HGP (or anyone else) publishing somebody's genome, I doubt they'd allow any Joe Bloggs to download it.

edit: "All genome sequence generated by the Human Genome Project has been deposited into GenBank, a public database freely accessible by anyone with a connection to the Internet."

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Linux 7.2's first release candidate gets off to a good start by Paul Hill Credit: Larry Ewing It has been a few weeks since the release of Linux 7.1, and in that time, the Linux 7.2 merge window has been open, where developers can submit their features and patches ready for the upcoming release. That window is now shut, and the release candidate phase has begun so that new features can be tested and further fixes applied. According to the founder of Linux, Linus Torvalds, this week’s release candidate looks “reasonably normal”. Although we are super early in the release candidates, this is a good sign as it makes it more likely that an eighth release candidate will not be needed. Torvalds even mentioned that the update’s stats are only larger than they really are because there was another AMD header drop with a third of the patch just being AMD GPU register definitions, which aren’t big changes but make the code contributed look larger overall. In addition to this, he noted that just over half the patch is drivers, even when excluding the AMD register dump. The rest of the changes are spread out over architecture updates, tooling, documentation, and core kernel updates. In the next week, Torvalds says that he will be chilling out, taking the week “mostly off”. Despite this, he will be reading emails and keeping up with things, so if he is slow responding, now you know why. He said he is hoping for a calm week, but we will just have to see if the second release candidate is actually like that. We should expect seven or eight release candidates before Linux 7.2 is released, so expect it around the end of August. If you missed it a few weeks ago, be sure to check out our coverage of Linux 7.1's release.
    • Ridiculous claim that the labor cost difference of $6000 annually would increase cost per phone by $200. The employees produce 3 phones per month or what?
    • Sparkle 2.20.1 by Razvan Serea Sparkle is a free, open-source Windows optimization tool designed to make your PC faster, cleaner, and more private. With Sparkle, you can easily debloat Windows by removing unnecessary apps and services, disable Microsoft tracking to enhance privacy, and apply performance tweaks to boost speed. Its cleaner removes junk and temporary files, while every change is safe and fully reversible. Sparkle also features a modern, user-friendly interface with automatic updates, making system maintenance simple. Explore over 39 tweaks, from disabling telemetry and hibernation to optimizing network and game settings, all aimed at customizing and enhancing your Windows experience. Sparkle supports Windows 10 and 11. Sparkle 2.20.1 changelog: You can now change the Animation Direction from Up, Left, or Off. Added configurable animation direction (Up, Left, Off) for improved accessibility Added TTL caching to the system info backend Refactored tweak application flow to await NvidiaProfileInspector Improved IPC listener cleanup to correctly remove specific listeners Fixed online status not updating after successful network requests Updated system info tests to support backend caching Removed electron-toolkit utils dependency in favor of internal is.dev helper Fixed unwanted files and folders being included in application bundles Download: Sparkle 2.20.1 | Portable | ~100.0 MB (Open Source) Links: Sparkle Website | Github | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • Never used the G7 Pro, but I've never had a good experience with that style of d-pad and fighting games.
    • And I just bought a seat cushion for my mesh chair. The chair feels nice but the first time I sat in it with boxers, I realized I don't like the feel of mesh on my legs. 😂
  • Recent Achievements

    • One Month Later
      JKR earned a badge
      One Month Later
    • Dedicated
      Asgardi earned a badge
      Dedicated
    • Conversation Starter
      jessse3334 earned a badge
      Conversation Starter
    • Reacting Well
      JuvenileDelinquent earned a badge
      Reacting Well
    • One Month Later
      Excellence2025 earned a badge
      One Month Later
  • Popular Contributors

    1. 1
      +primortal
      496
    2. 2
      +Edouard
      247
    3. 3
      PsYcHoKiLLa
      154
    4. 4
      Steven P.
      86
    5. 5
      macoman
      65
  • Tell a friend

    Love Neowin? Tell a friend!