Added a subset of Crypto++ 5.6.0 with a 48% faster ASM SHA-256, for a combined 2.5x speedup over 0.3.3. Thanks to BlackEye for figuring out the alignment problem.
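
The commit message credits an alignment fix without saying what it was. As a rough illustration only, and assuming the issue concerned the 16-byte alignment that SSE2 loads and stores typically expect, the sketch below shows how an alignment macro shaped like the CRYPTOPP_ALIGN_DATA definition in config.h can be applied to a buffer, and how the result can be checked at run time. ALIGN_DATA, IsAligned, and AlignedBufferDemo are hypothetical names introduced here; they are not part of Crypto++ or of this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in that mirrors the shape of CRYPTOPP_ALIGN_DATA in config.h.
#if defined(_MSC_VER)
#  define ALIGN_DATA(x) __declspec(align(x))
#elif defined(__GNUC__)
#  define ALIGN_DATA(x) __attribute__((aligned(x)))
#else
#  define ALIGN_DATA(x)  // no alignment guarantee on unknown compilers
#endif

// Returns true if p lies on a multiple of 'boundary' bytes.
// 'boundary' must be a non-zero power of two.
bool IsAligned(const void *p, std::size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (boundary - 1)) == 0;
}

// Declares a 32-byte state buffer (eight 32-bit words, the size of a
// SHA-256 state) with a 16-byte alignment request and reports whether
// the compiler honoured it.
bool AlignedBufferDemo()
{
    static ALIGN_DATA(16) std::uint32_t state[8] = {0};
    return IsAligned(state, 16);
}
```

Calling AlignedBufferDemo() from a test or from main is a cheap way to confirm that a given compiler and set of flags actually honours the alignment request before handing such a buffer to assembly code.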

2010-07-27 20:43:55 +00:00
commit 3dd20ff2f8
parent 9f35575ca3
26 changed files with 6093 additions and 895 deletions
--- a/cryptopp/License.txt
+++ b/cryptopp/License.txt
@@ -0,0 +1,67 @@
+Compilation Copyright (c) 1995-2009 by Wei Dai.  All rights reserved.
+This copyright applies only to this software distribution package 
+as a compilation, and does not imply a copyright on any particular 
+file in the package.
+
+The following files are copyrighted by their respective original authors,
+and their use is subject to additional licenses included in these files.
+
+mars.cpp - Copyright 1998 Brian Gladman.
+
+All other files in this compilation are placed in the public domain by
+Wei Dai and other contributors.
+
+I would like to thank the following authors for placing their works into
+the public domain:
+
+Joan Daemen - 3way.cpp
+Leonard Janke - cast.cpp, seal.cpp
+Steve Reid - cast.cpp
+Phil Karn - des.cpp
+Andrew M. Kuchling - md2.cpp, md4.cpp
+Colin Plumb - md5.cpp
+Seal Woods - rc6.cpp
+Chris Morgan - rijndael.cpp
+Paulo Baretto - rijndael.cpp, skipjack.cpp, square.cpp
+Richard De Moliner - safer.cpp
+Matthew Skala - twofish.cpp
+Kevin Springle - camellia.cpp, shacal2.cpp, ttmac.cpp, whrlpool.cpp, ripemd.cpp
+
+Permission to use, copy, modify, and distribute this compilation for
+any purpose, including commercial applications, is hereby granted
+without fee, subject to the following restrictions:
+
+1. Any copy or modification of this compilation in any form, except
+in object code form as part of an application software, must include
+the above copyright notice and this license.
+
+2. Users of this software agree that any modification or extension
+they provide to Wei Dai will be considered public domain and not
+copyrighted unless it includes an explicit copyright notice.
+
+3. Wei Dai makes no warranty or representation that the operation of the
+software in this compilation will be error-free, and Wei Dai is under no
+obligation to provide any services, by way of maintenance, update, or
+otherwise.  THE SOFTWARE AND ANY DOCUMENTATION ARE PROVIDED "AS IS"
+WITHOUT EXPRESS OR IMPLIED WARRANTY INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. IN NO EVENT WILL WEI DAI OR ANY OTHER CONTRIBUTOR BE LIABLE FOR
+DIRECT, INCIDENTAL OR CONSEQUENTIAL DAMAGES, EVEN IF
+ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+4. Users will not use Wei Dai or any other contributor's name in any 
+publicity or advertising, without prior written consent in each case.
+
+5. Export of this software from the United States may require a
+specific license from the United States Government.  It is the
+responsibility of any person or organization contemplating export
+to obtain such a license before exporting.
+
+6. Certain parts of this software may be protected by patents.  It
+is the users' responsibility to obtain the appropriate
+licenses before using those parts.
+
+If this compilation is used in object code form in an application
+software, acknowledgement of the author is not required but would be
+appreciated. The contribution of any useful modifications or extensions
+to Wei Dai is not required but would also be appreciated.
--- a/cryptopp/Readme.txt
+++ b/cryptopp/Readme.txt
@@ -0,0 +1,429 @@
+Crypto++: a C++ Class Library of Cryptographic Schemes
+Version 5.6.0 (3/15/2009)
+
+Crypto++ Library is a free C++ class library of cryptographic schemes.
+Currently the library contains the following algorithms:
+
+                   algorithm type  name
+
+ authenticated encryption schemes  GCM, CCM, EAX
+ 
+        high speed stream ciphers  Panama, Sosemanuk, Salsa20, XSalsa20
+
+           AES and AES candidates  AES (Rijndael), RC6, MARS, Twofish, Serpent,
+                                   CAST-256
+
+                                   IDEA, Triple-DES (DES-EDE2 and DES-EDE3),
+              other block ciphers  Camellia, SEED, RC5, Blowfish, TEA, XTEA,
+                                   Skipjack, SHACAL-2
+
+  block cipher modes of operation  ECB, CBC, CBC ciphertext stealing (CTS),
+                                   CFB, OFB, counter mode (CTR)
+
+     message authentication codes  VMAC, HMAC, CMAC, CBC-MAC, DMAC, 
+                                   Two-Track-MAC
+
+                                   SHA-1, SHA-2 (SHA-224, SHA-256, SHA-384, and
+                   hash functions  SHA-512), Tiger, WHIRLPOOL, RIPEMD-128,
+                                   RIPEMD-256, RIPEMD-160, RIPEMD-320
+
+                                   RSA, DSA, ElGamal, Nyberg-Rueppel (NR),
+          public-key cryptography  Rabin, Rabin-Williams (RW), LUC, LUCELG,
+                                   DLIES (variants of DHAES), ESIGN
+
+   padding schemes for public-key  PKCS#1 v2.0, OAEP, PSS, PSSR, IEEE P1363
+                          systems  EMSA2 and EMSA5
+
+                                   Diffie-Hellman (DH), Unified Diffie-Hellman
+            key agreement schemes  (DH2), Menezes-Qu-Vanstone (MQV), LUCDIF,
+                                   XTR-DH
+
+      elliptic curve cryptography  ECDSA, ECNR, ECIES, ECDH, ECMQV
+
+          insecure or obsolescent  MD2, MD4, MD5, Panama Hash, DES, ARC4, SEAL
+algorithms retained for backwards  3.0, WAKE, WAKE-OFB, DESX (DES-XEX3), RC2,
+     compatibility and historical  SAFER, 3-WAY, GOST, SHARK, CAST-128, Square
+                            value
+
+Other features include:
+
+  * pseudo random number generators (PRNG): ANSI X9.17 appendix C, RandomPool
+  * password based key derivation functions: PBKDF1 and PBKDF2 from PKCS #5,
+    PBKDF from PKCS #12 appendix B
+  * Shamir's secret sharing scheme and Rabin's information dispersal algorithm
+    (IDA)
+  * fast multi-precision integer (bignum) and polynomial operations
+  * finite field arithmetics, including GF(p) and GF(2^n)
+  * prime number generation and verification
+  * useful non-cryptographic algorithms
+      + DEFLATE (RFC 1951) compression/decompression with gzip (RFC 1952) and
+        zlib (RFC 1950) format support
+      + hex, base-32, and base-64 coding/decoding
+      + 32-bit CRC and Adler32 checksum
+  * class wrappers for these operating system features (optional):
+      + high resolution timers on Windows, Unix, and Mac OS
+      + Berkeley and Windows style sockets
+      + Windows named pipes
+      + /dev/random, /dev/urandom, /dev/srandom
+      + Microsoft's CryptGenRandom on Windows
+  * A high level interface for most of the above, using a filter/pipeline
+    metaphor
+  * benchmarks and validation testing
+  * x86, x86-64 (x64), MMX, and SSE2 assembly code for the most commonly used
+    algorithms, with run-time CPU feature detection and code selection
+  * some versions are available in FIPS 140-2 validated form
+
+You are welcome to use it for any purpose without paying me, but see
+License.txt for the fine print.
+
+The following compilers are supported for this release. Please visit
+http://www.cryptopp.com for the most up to date build instructions and porting notes.
+
+  * MSVC 6.0 - 2008
+  * GCC 3.3 - 4.3
+  * C++Builder 2009
+  * Intel C++ Compiler 9 - 11
+  * Sun Studio 12 (CC 5.9)
+
+*** Important Usage Notes ***
+
+1. If a constructor for A takes a pointer to an object B (except primitive
+types such as int and char), then A owns B and will delete B at A's
+destruction.  If a constructor for A takes a reference to an object B,
+then the caller retains ownership of B and should not destroy it until
+A no longer needs it. 
+
+2. Crypto++ is thread safe at the class level. This means you can use
+Crypto++ safely in a multithreaded application, but you must provide
+synchronization when multiple threads access a common Crypto++ object.
+
+*** MSVC-Specific Information ***
+
+On Windows, Crypto++ can be compiled into 3 forms: a static library
+including all algorithms, a DLL with only FIPS Approved algorithms, and
+a static library with only algorithms not in the DLL.
+(FIPS Approved means Approved according to the FIPS 140-2 standard.)
+The DLL may be used by itself, or it may be used together with the second
+form of the static library. MSVC project files are included to build
+all three forms, and sample applications using each of the three forms
+are also included.
+
+To compile Crypto++ with MSVC, open the "cryptest.dsw" (for MSVC 6 and MSVC .NET 
+2003) or "cryptest.sln" (for MSVC .NET 2005) workspace file and build one or 
+more of the following projects:
+
+cryptdll - This builds the DLL. Please note that if you wish to use Crypto++
+  as a FIPS validated module, you must use a pre-built DLL that has undergone
+  the FIPS validation process instead of building your own.
+dlltest - This builds a sample application that only uses the DLL.
+cryptest Non-DLL-Import Configuration - This builds the full static library
+  along with a full test driver.
+cryptest DLL-Import Configuration - This builds a static library containing
+  only algorithms not in the DLL, along with a full test driver that uses
+  both the DLL and the static library.
+
+To use the Crypto++ DLL in your application, #include "dll.h" before including
+any other Crypto++ header files, and place the DLL in the same directory as
+your .exe file. dll.h includes the line #pragma comment(lib, "cryptopp")
+so you don't have to explicitly list the import library in your project
+settings. To use a static library form of Crypto++, specify it as
+an additional library to link with in your project settings.
+In either case you should check the compiler options to
+make sure that the library and your application are using the same C++
+run-time libraries and calling conventions.
+
+*** DLL Memory Management ***
+
+Because it's possible for the Crypto++ DLL to delete objects allocated 
+by the calling application, they must use the same C++ memory heap. Three 
+methods are provided to achieve this.
+1.  The calling application can tell Crypto++ what heap to use. This method 
+    is required when the calling application uses a non-standard heap.
+2.  Crypto++ can tell the calling application what heap to use. This method 
+    is required when the calling application uses a statically linked C++ Run 
+    Time Library. (Method 1 does not work in this case because the Crypto++ DLL 
+    is initialized before the calling application's heap is initialized.)
+3.  Crypto++ can automatically use the heap provided by the calling application's 
+    dynamically linked C++ Run Time Library. The calling application must
+    make sure that the dynamically linked C++ Run Time Library is initialized
+    before Crypto++ is loaded. (At this time it is not clear if it is possible
+    to control the order in which DLLs are initialized on Windows 9x machines,
+    so it might be best to avoid using this method.)
+
+When Crypto++ attaches to a new process, it searches all modules loaded 
+into the process space for exported functions "GetNewAndDeleteForCryptoPP" 
+and "SetNewAndDeleteFromCryptoPP". If one of these functions is found, 
+Crypto++ uses methods 1 or 2, respectively, by calling the function. 
+Otherwise, method 3 is used. 
+
+*** GCC-Specific Information ***
+
+A makefile is included for you to compile Crypto++ with GCC. Make sure
+you are using GNU Make and GNU ld. The make process will produce two files,
+libcryptopp.a and cryptest.exe. Run "cryptest.exe v" for the validation
+suite.
+
+*** Documentation and Support ***
+
+Crypto++ is documented through inline comments in header files, which are
+processed through Doxygen to produce an HTML reference manual. You can find
+a link to the manual from http://www.cryptopp.com. Also at that site is
+the Crypto++ FAQ, which you should browse through before attempting to 
+use this library, because it will likely answer many of the questions that
+may come up.
+
+If you run into any problems, please try the Crypto++ mailing list.
+The subscription information and the list archive are available on
+http://www.cryptopp.com. You can also email me directly by visiting
+http://www.weidai.com, but you will probably get a faster response through
+the mailing list.
+
+*** History ***
+
+1.0 - First public release.  Withdrawn at the request of RSA DSI.
+    - included Blowfish, BBS, DES, DH, Diamond, DSA, ElGamal, IDEA,
+      MD5, RC4, RC5, RSA, SHA, WAKE, secret sharing, DEFLATE compression
+    - had a serious bug in the RSA key generation code.
+
+1.1 - Removed RSA, RC4, RC5
+    - Disabled calls to RSAREF's non-public functions
+    - Minor bugs fixed
+
+2.0 - a completely new, faster multiprecision integer class
+    - added MD5-MAC, HAVAL, 3-WAY, TEA, SAFER, LUC, Rabin, BlumGoldwasser,
+      elliptic curve algorithms
+    - added the Lucas strong probable primality test
+    - ElGamal encryption and signature schemes modified to avoid weaknesses
+    - Diamond changed to Diamond2 because of key schedule weakness
+    - fixed bug in WAKE key setup
+    - SHS class renamed to SHA
+    - lots of miscellaneous optimizations
+
+2.1 - added Tiger, HMAC, GOST, RIPE-MD160, LUCELG, LUCDIF, XOR-MAC,
+      OAEP, PSSR, SHARK
+    - added precomputation to DH, ElGamal, DSA, and elliptic curve algorithms
+    - added back RC5 and a new RSA
+    - optimizations in elliptic curves over GF(p)
+    - changed Rabin to use OAEP and PSSR
+    - changed many classes to allow copy constructors to work correctly
+    - improved exception generation and handling
+
+2.2 - added SEAL, CAST-128, Square
+    - fixed bug in HAVAL (padding problem)
+    - fixed bug in triple-DES (decryption order was reversed)
+    - fixed bug in RC5 (couldn't handle key length not a multiple of 4)
+    - changed HMAC to conform to RFC-2104 (which is not compatible
+      with the original HMAC)
+    - changed secret sharing and information dispersal to use GF(2^32)
+      instead of GF(65521)
+    - removed zero knowledge prover/verifier for graph isomorphism
+    - removed several utility classes in favor of the C++ standard library
+
+2.3 - ported to EGCS
+    - fixed incomplete workaround of min/max conflict in MSVC
+
+3.0 - placed all names into the "CryptoPP" namespace
+    - added MD2, RC2, RC6, MARS, RW, DH2, MQV, ECDHC, CBC-CTS
+    - added abstract base classes PK_SimpleKeyAgreementDomain and
+      PK_AuthenticatedKeyAgreementDomain
+    - changed DH and LUCDIF to implement the PK_SimpleKeyAgreementDomain
+      interface and to perform domain parameter and key validation
+    - changed interfaces of PK_Signer and PK_Verifier to sign and verify
+      messages instead of message digests
+    - changed OAEP to conform to PKCS#1 v2.0
+    - changed benchmark code to produce HTML tables as output
+    - changed PSSR to track IEEE P1363a
+    - renamed ElGamalSignature to NR and changed it to track IEEE P1363
+    - renamed ECKEP to ECMQVC and changed it to track IEEE P1363
+    - renamed several other classes for clarity
+    - removed support for calling RSAREF
+    - removed option to compile old SHA (SHA-0)
+    - removed option not to throw exceptions
+
+3.1 - added ARC4, Rijndael, Twofish, Serpent, CBC-MAC, DMAC
+    - added interface for querying supported key lengths of symmetric ciphers
+      and MACs
+    - added sample code for RSA signature and verification
+    - changed CBC-CTS to be compatible with RFC 2040
+    - updated SEAL to version 3.0 of the cipher specification
+    - optimized multiprecision squaring and elliptic curves over GF(p)
+    - fixed bug in MARS key setup
+    - fixed bug with attaching objects to Deflator
+
+3.2 - added DES-XEX3, ECDSA, DefaultEncryptorWithMAC
+    - renamed DES-EDE to DES-EDE2 and TripleDES to DES-EDE3
+    - optimized ARC4
+    - generalized DSA to allow keys longer than 1024 bits
+    - fixed bugs in GF2N and ModularArithmetic that can cause calculation errors
+    - fixed crashing bug in Inflator when given invalid inputs
+    - fixed endian bug in Serpent
+    - fixed padding bug in Tiger
+
+4.0 - added Skipjack, CAST-256, Panama, SHA-2 (SHA-256, SHA-384, and SHA-512),
+      and XTR-DH
+    - added a faster variant of Rabin's Information Dispersal Algorithm (IDA)
+    - added class wrappers for these operating system features:
+      - high resolution timers on Windows, Unix, and MacOS
+      - Berkeley and Windows style sockets
+      - Windows named pipes
+      - /dev/random and /dev/urandom on Linux and FreeBSD
+      - Microsoft's CryptGenRandom on Windows
+    - added support for SEC 1 elliptic curve key format and compressed points
+    - added support for X.509 public key format (subjectPublicKeyInfo) for
+      RSA, DSA, and elliptic curve schemes
+    - added support for DER and OpenPGP signature format for DSA
+    - added support for ZLIB compressed data format (RFC 1950)
+    - changed elliptic curve encryption to use ECIES (as defined in SEC 1)
+    - changed MARS key schedule to reflect the latest specification
+    - changed BufferedTransformation interface to support multiple channels
+      and messages
+    - changed CAST and SHA-1 implementations to use public domain source code
+    - fixed bug in StringSource
+    - optimized multi-precision integer code for better performance
+
+4.1 - added more support for the recommended elliptic curve parameters in SEC 2
+    - added Panama MAC, MARC4
+    - added IV stealing feature to CTS mode
+    - added support for PKCS #8 private key format for RSA, DSA, and elliptic
+      curve schemes
+    - changed Deflate, MD5, Rijndael, and Twofish to use public domain code
+    - fixed a bug with flushing compressed streams
+    - fixed a bug with decompressing stored blocks
+    - fixed a bug with EC point decompression using non-trinomial basis
+    - fixed a bug in NetworkSource::GeneralPump()
+    - fixed a performance issue with EC over GF(p) decryption
+    - fixed syntax to allow GCC to compile without -fpermissive
+    - relaxed some restrictions in the license
+
+4.2 - added support for longer HMAC keys
+    - added MD4 (which is not secure so use for compatibility purposes only)
+    - added compatibility fixes/workarounds for STLport 4.5, GCC 3.0.2,
+      and MSVC 7.0
+    - changed MD2 to use public domain code
+    - fixed a bug with decompressing multiple messages with the same object
+    - fixed a bug in CBC-MAC with MACing multiple messages with the same object
+    - fixed a bug in RC5 and RC6 with zero-length keys
+    - fixed a bug in Adler32 where incorrect checksum may be generated
+
+5.0 - added ESIGN, DLIES, WAKE-OFB, PBKDF1 and PBKDF2 from PKCS #5
+    - added key validation for encryption and signature public/private keys
+    - renamed StreamCipher interface to SymmetricCipher, which is now implemented
+      by both stream ciphers and block cipher modes including ECB and CBC
+    - added keying interfaces to support resetting of keys and IVs without
+      having to destroy and recreate objects
+    - changed filter interface to support non-blocking input/output
+    - changed SocketSource and SocketSink to use overlapped I/O on Microsoft Windows
+    - grouped related classes inside structs to help templates, for example
+      AESEncryption and AESDecryption are now AES::Encryption and AES::Decryption
+    - where possible, typedefs have been added to improve backwards 
+      compatibility when the CRYPTOPP_MAINTAIN_BACKWARDS_COMPATIBILITY macro is defined
+    - changed Serpent, HAVAL and IDEA to use public domain code
+    - implemented SSE2 optimizations for Integer operations
+    - fixed a bug in HMAC::TruncatedFinal()
+    - fixed SKIPJACK byte ordering following NIST clarification dated 5/9/02
+
+5.01 - added known answer test for X9.17 RNG in FIPS 140 power-up self test
+     - submitted to NIST/CSE, but not publicly released
+
+5.02 - changed EDC test to MAC integrity check using HMAC/SHA1
+     - improved performance of integrity check
+     - added blinding to defend against RSA timing attack
+
+5.03 - created DLL version of Crypto++ for FIPS 140-2 validation
+     - fixed vulnerabilities in GetNextIV for CTR and OFB modes
+
+5.0.4 - Removed DES, SHA-256, SHA-384, SHA-512 from DLL
+
+5.1 - added PSS padding and changed PSSR to track IEEE P1363a draft standard
+    - added blinding for RSA and Rabin to defend against timing attacks
+      on decryption operations
+    - changed signing and decryption APIs to support the above
+    - changed WaitObjectContainer to allow waiting for more than 64
+      objects at a time on Win32 platforms
+    - fixed a bug in CBC and ECB modes with processing non-aligned data
+    - fixed standard conformance bugs in DLIES (DHAES mode) and RW/EMSA2
+      signature scheme (these fixes are not backwards compatible)
+    - fixed a number of compiler warnings, minor bugs, and portability problems
+    - removed Sapphire
+
+5.2 - merged in changes for 5.01 - 5.0.4
+    - added support for using encoding parameters and key derivation parameters
+      with public key encryption (implemented by OAEP and DL/ECIES)
+    - added Camellia, SHACAL-2, Two-Track-MAC, Whirlpool, RIPEMD-320,
+      RIPEMD-128, RIPEMD-256, Base-32 coding, FIPS variant of CFB mode
+    - added ThreadUserTimer for timing thread CPU usage
+    - added option for password-based key derivation functions
+      to iterate until a minimum elapsed thread CPU time is reached
+    - added option (on by default) for DEFLATE compression to detect
+      uncompressible files and process them more quickly
+    - improved compatibility and performance on 64-bit platforms,
+      including Alpha, IA-64, x86-64, PPC64, Sparc64, and MIPS64
+    - fixed ONE_AND_ZEROS_PADDING to use 0x80 instead of 0x01 as padding.
+    - fixed encoding/decoding of PKCS #8 privateKeyInfo to properly
+      handle optional attributes
+
+5.2.1 - fixed bug in the "dlltest" DLL testing program
+      - fixed compiling with STLport using VC .NET
+      - fixed compiling with -fPIC using GCC
+      - fixed compiling with -msse2 on systems without memalign()
+      - fixed inability to instantiate PanamaMAC
+      - fixed problems with inline documentation
+
+5.2.2 - added SHA-224
+      - put SHA-256, SHA-384, SHA-512, RSASSA-PSS into DLL
+      
+5.2.3 - fixed issues with FIPS algorithm test vectors
+      - put RSASSA-ISO into DLL
+
+5.3 - ported to MSVC 2005 with support for x86-64
+    - added defense against AES timing attacks, and more AES test vectors
+    - changed StaticAlgorithmName() of Rijndael to "AES", CTR to "CTR"
+
+5.4 - added Salsa20
+    - updated Whirlpool to version 3.0
+    - ported to GCC 4.1, Sun C++ 5.8, and Borland C++Builder 2006
+
+5.5 - added VMAC and Sosemanuk (with x86-64 and SSE2 assembly)
+    - improved speed of integer arithmetic, AES, SHA-512, Tiger, Salsa20,
+      Whirlpool, and PANAMA cipher using assembly (x86-64, MMX, SSE2)
+    - optimized Camellia and added defense against timing attacks
+    - updated benchmarks code to show cycles per byte and to time key/IV setup
+    - started using OpenMP for increased multi-core speed
+    - enabled GCC optimization flags by default in GNUmakefile
+    - added blinding and computational error checking for RW signing
+    - changed RandomPool, X917RNG, GetNextIV, DSA/NR/ECDSA/ECNR to reduce
+      the risk of reusing random numbers and IVs after virtual machine state
+      rollback
+    - changed default FIPS mode RNG from AutoSeededX917RNG<DES_EDE3> to
+      AutoSeededX917RNG<AES>
+    - fixed PANAMA cipher interface to accept 256-bit key and 256-bit IV
+    - moved MD2, MD4, MD5, PanamaHash, ARC4, WAKE_CFB into the namespace "Weak"
+    - removed HAVAL, MD5-MAC, XMAC
+
+5.5.1 - fixed VMAC validation failure on 32-bit big-endian machines
+
+5.5.2 - ported x64 assembly language code for AES, Salsa20, Sosemanuk, and Panama
+        to MSVC 2005 (using MASM since MSVC doesn't support inline assembly on x64)
+      - fixed Salsa20 initialization crash on non-SSE2 machines
+      - fixed Whirlpool crash on Pentium 2 machines
+      - fixed possible branch prediction analysis (BPA) vulnerability in
+        MontgomeryReduce(), which may affect security of RSA, RW, LUC
+      - fixed link error with MSVC 2003 when using "debug DLL" form of runtime library
+      - fixed crash in SSE2_Add on P4 machines when compiled with 
+        MSVC 6.0 SP5 with Processor Pack
+      - ported to MSVC 2008, GCC 4.2, Sun CC 5.9, Intel C++ Compiler 10.0, 
+        and Borland C++Builder 2007
+
+5.6 - added AuthenticatedSymmetricCipher interface class and Filter wrappers
+    - added CCM, GCM (with SSE2 assembly), EAX, CMAC, XSalsa20, and SEED
+    - added support for variable length IVs
+    - improved AES and SHA-256 speed on x86 and x64
+    - fixed incorrect VMAC computation on message lengths 
+      that are >64 mod 128 (x86 assembly version is not affected)
+    - fixed compiler error in vmac.cpp on x86 with GCC -fPIC
+    - fixed run-time validation error on x86-64 with GCC 4.3.2 -O2
+    - fixed HashFilter bug when putMessage=true
+    - removed WORD64_AVAILABLE; compiler support for 64-bit int is now required
+    - ported to GCC 4.3, C++Builder 2009, Sun CC 5.10, Intel C++ Compiler 11
+
+Written by Wei Dai
--- a/cryptopp/config.h
+++ b/cryptopp/config.h
@@ -0,0 +1,455 @@
+#ifndef CRYPTOPP_CONFIG_H
+#define CRYPTOPP_CONFIG_H
+
+// ***************** Important Settings ********************
+
+// define this if running on a big-endian CPU
+#if !defined(IS_LITTLE_ENDIAN) && (defined(__BIG_ENDIAN__) || defined(__sparc) || defined(__sparc__) || defined(__hppa__) || defined(__mips__) || (defined(__MWERKS__) && !defined(__INTEL__)))
+#	define IS_BIG_ENDIAN
+#endif
+
+// define this if running on a little-endian CPU
+// big endian will be assumed if IS_LITTLE_ENDIAN is not defined
+#ifndef IS_BIG_ENDIAN
+#	define IS_LITTLE_ENDIAN
+#endif
+
+// define this if you want to disable all OS-dependent features,
+// such as sockets and OS-provided random number generators
+// #define NO_OS_DEPENDENCE
+
+// Define this to use features provided by Microsoft's CryptoAPI.
+// Currently the only feature used is random number generation.
+// This macro will be ignored if NO_OS_DEPENDENCE is defined.
+#define USE_MS_CRYPTOAPI
+
+// Define this to 1 to enforce the requirement in FIPS 186-2 Change Notice 1 that only 1024 bit moduli be used
+#ifndef DSA_1024_BIT_MODULUS_ONLY
+#	define DSA_1024_BIT_MODULUS_ONLY 1
+#endif
+
+// ***************** Less Important Settings ***************
+
+// define this to retain (as much as possible) old deprecated function and class names
+// #define CRYPTOPP_MAINTAIN_BACKWARDS_COMPATIBILITY
+
+#define GZIP_OS_CODE 0
+
+// Try this if your CPU has 256K internal cache or a slow multiply instruction
+// and you want a (possibly) faster IDEA implementation using log tables
+// #define IDEA_LARGECACHE
+
+// Define this if, for the linear congruential RNG, you want to use
+// the original constants as specified in S.K. Park and K.W. Miller's
+// CACM paper.
+// #define LCRNG_ORIGINAL_NUMBERS
+
+// choose which style of sockets to wrap (mostly useful for cygwin which has both)
+#define PREFER_BERKELEY_STYLE_SOCKETS
+// #define PREFER_WINDOWS_STYLE_SOCKETS
+
+// set the name of Rijndael cipher, was "Rijndael" before version 5.3
+#define CRYPTOPP_RIJNDAEL_NAME "AES"
+
+// ***************** Important Settings Again ********************
+// But the defaults should be ok.
+
+// namespace support is now required
+#ifdef NO_NAMESPACE
+#	error namespace support is now required
+#endif
+
+// Define this to workaround a Microsoft CryptoAPI bug where
+// each call to CryptAcquireContext causes a 100 KB memory leak.
+// Defining this will cause Crypto++ to make only one call to CryptAcquireContext.
+#define WORKAROUND_MS_BUG_Q258000
+
+#ifdef CRYPTOPP_DOXYGEN_PROCESSING
+// Avoid putting "CryptoPP::" in front of everything in Doxygen output
+#	define CryptoPP
+#	define NAMESPACE_BEGIN(x)
+#	define NAMESPACE_END
+// Get Doxygen to generate better documentation for these typedefs
+#	define DOCUMENTED_TYPEDEF(x, y) class y : public x {};
+#else
+#	define NAMESPACE_BEGIN(x) namespace x {
+#	define NAMESPACE_END }
+#	define DOCUMENTED_TYPEDEF(x, y) typedef x y;
+#endif
+#define ANONYMOUS_NAMESPACE_BEGIN namespace {
+#define USING_NAMESPACE(x) using namespace x;
+#define DOCUMENTED_NAMESPACE_BEGIN(x) namespace x {
+#define DOCUMENTED_NAMESPACE_END }
+
+// What is the type of the third parameter to bind?
+// For Unix, the new standard is ::socklen_t (typically unsigned int), and the old standard is int.
+// Unfortunately there is no way to tell whether or not socklen_t is defined.
+// To work around this, TYPE_OF_SOCKLEN_T is a macro so that you can change it from the makefile.
+#ifndef TYPE_OF_SOCKLEN_T
+#	if defined(_WIN32) || defined(__CYGWIN__)
+#		define TYPE_OF_SOCKLEN_T int
+#	else
+#		define TYPE_OF_SOCKLEN_T ::socklen_t
+#	endif
+#endif
+
+#if defined(__CYGWIN__) && defined(PREFER_WINDOWS_STYLE_SOCKETS)
+#	define __USE_W32_SOCKETS
+#endif
+
+typedef unsigned char byte;		// put in global namespace to avoid ambiguity with other byte typedefs
+
+NAMESPACE_BEGIN(CryptoPP)
+
+typedef unsigned short word16;
+typedef unsigned int word32;
+
+#if defined(_MSC_VER) || defined(__BORLANDC__)
+	typedef unsigned __int64 word64;
+	#define W64LIT(x) x##ui64
+#else
+	typedef unsigned long long word64;
+	#define W64LIT(x) x##ULL
+#endif
+
+// define large word type, used for file offsets and such
+typedef word64 lword;
+const lword LWORD_MAX = W64LIT(0xffffffffffffffff);
+
+#ifdef __GNUC__
+	#define CRYPTOPP_GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+// define hword, word, and dword. these are used for multiprecision integer arithmetic
+// Intel compiler won't have _umul128 until version 10.0. See http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30231625.aspx
+#if (defined(_MSC_VER) && (!defined(__INTEL_COMPILER) || __INTEL_COMPILER >= 1000) && (defined(_M_X64) || defined(_M_IA64))) || (defined(__DECCXX) && defined(__alpha__)) || (defined(__INTEL_COMPILER) && defined(__x86_64__)) || (defined(__SUNPRO_CC) && defined(__x86_64__))
+	typedef word32 hword;
+	typedef word64 word;
+#else
+	#define CRYPTOPP_NATIVE_DWORD_AVAILABLE
+	#if defined(__alpha__) || defined(__ia64__) || defined(_ARCH_PPC64) || defined(__x86_64__) || defined(__mips64) || defined(__sparc64__)
+		#if defined(__GNUC__) && !defined(__INTEL_COMPILER) && !(CRYPTOPP_GCC_VERSION == 40001 && defined(__APPLE__)) && CRYPTOPP_GCC_VERSION >= 30400
+			// GCC 4.0.1 on MacOS X is missing __umodti3 and __udivti3
+			// mode(TI) division broken on amd64 with GCC earlier than GCC 3.4
+			typedef word32 hword;
+			typedef word64 word;
+			typedef __uint128_t dword;
+			typedef __uint128_t word128;
+			#define CRYPTOPP_WORD128_AVAILABLE
+		#else
+			// if we're here, it means we're on a 64-bit CPU but we don't have a way to obtain 128-bit multiplication results
+			typedef word16 hword;
+			typedef word32 word;
+			typedef word64 dword;
+		#endif
+	#else
+		// being here means the native register size is probably 32 bits or less
+		#define CRYPTOPP_BOOL_SLOW_WORD64 1
+		typedef word16 hword;
+		typedef word32 word;
+		typedef word64 dword;
+	#endif
+#endif
+#ifndef CRYPTOPP_BOOL_SLOW_WORD64
+	#define CRYPTOPP_BOOL_SLOW_WORD64 0
+#endif
+
+const unsigned int WORD_SIZE = sizeof(word);
+const unsigned int WORD_BITS = WORD_SIZE * 8;
+
+NAMESPACE_END
+
+#ifndef CRYPTOPP_L1_CACHE_LINE_SIZE
+	// This should be a lower bound on the L1 cache line size. It's used for defense against timing attacks.
+	#if defined(_M_X64) || defined(__x86_64__)
+		#define CRYPTOPP_L1_CACHE_LINE_SIZE 64
+	#else
+		// L1 cache line size is 32 on Pentium III and earlier
+		#define CRYPTOPP_L1_CACHE_LINE_SIZE 32
+	#endif
+#endif
+
+#if defined(_MSC_VER)
+	#if _MSC_VER == 1200
+		#include <malloc.h>
+	#endif
+	#if _MSC_VER > 1200 || defined(_mm_free)
+		#define CRYPTOPP_MSVC6PP_OR_LATER		// VC 6 processor pack or later
+	#else
+		#define CRYPTOPP_MSVC6_NO_PP			// VC 6 without processor pack
+	#endif
+#endif
+
+#ifndef CRYPTOPP_ALIGN_DATA
+	#if defined(CRYPTOPP_MSVC6PP_OR_LATER)
+		#define CRYPTOPP_ALIGN_DATA(x) __declspec(align(x))
+	#elif defined(__GNUC__)
+		#define CRYPTOPP_ALIGN_DATA(x) __attribute__((aligned(x)))
+	#else
+		#define CRYPTOPP_ALIGN_DATA(x)
+	#endif
+#endif
+
+#ifndef CRYPTOPP_SECTION_ALIGN16
+	#if defined(__GNUC__) && !defined(__APPLE__)
+		// the alignment attribute doesn't seem to work without this section attribute when -fdata-sections is turned on
+		#define CRYPTOPP_SECTION_ALIGN16 __attribute__((section ("CryptoPP_Align16")))
+	#else
+		#define CRYPTOPP_SECTION_ALIGN16
+	#endif
+#endif
+
+#if defined(_MSC_VER) || defined(__fastcall)
+	#define CRYPTOPP_FASTCALL __fastcall
+#else
+	#define CRYPTOPP_FASTCALL
+#endif
+
+// VC60 workaround: it doesn't allow typename in some places
+#if defined(_MSC_VER) && (_MSC_VER < 1300)
+#define CPP_TYPENAME
+#else
+#define CPP_TYPENAME typename
+#endif
+
+// VC60 workaround: can't cast unsigned __int64 to float or double
+#if defined(_MSC_VER) && !defined(CRYPTOPP_MSVC6PP_OR_LATER)
+#define CRYPTOPP_VC6_INT64 (__int64)
+#else
+#define CRYPTOPP_VC6_INT64
+#endif
+
+#ifdef _MSC_VER
+#define CRYPTOPP_NO_VTABLE __declspec(novtable)
+#else
+#define CRYPTOPP_NO_VTABLE
+#endif
+
+#ifdef _MSC_VER
+	// 4231: nonstandard extension used : 'extern' before template explicit instantiation
+	// 4250: dominance
+	// 4251: member needs to have dll-interface
+	// 4275: base needs to have dll-interface
+	// 4660: explicitly instantiating a class that's already implicitly instantiated
+	// 4661: no suitable definition provided for explicit template instantiation request
+	// 4786: identifier was truncated in debug information
+	// 4355: 'this' : used in base member initializer list
+	// 4910: '__declspec(dllexport)' and 'extern' are incompatible on an explicit instantiation
+#	pragma warning(disable: 4231 4250 4251 4275 4660 4661 4786 4355 4910)
+#endif
+
+#ifdef __BORLANDC__
+// 8037: non-const function called for const object. needed to work around BCB2006 bug
+#	pragma warn -8037
+#endif
+
+#if (defined(_MSC_VER) && _MSC_VER <= 1300) || defined(__MWERKS__) || defined(_STLPORT_VERSION)
+#define CRYPTOPP_DISABLE_UNCAUGHT_EXCEPTION
+#endif
+
+#ifndef CRYPTOPP_DISABLE_UNCAUGHT_EXCEPTION
+#define CRYPTOPP_UNCAUGHT_EXCEPTION_AVAILABLE
+#endif
+
+#ifdef CRYPTOPP_DISABLE_X86ASM		// for backwards compatibility: this macro had both meanings
+#define CRYPTOPP_DISABLE_ASM
+#define CRYPTOPP_DISABLE_SSE2
+#endif
+
+#if !defined(CRYPTOPP_DISABLE_ASM) && ((defined(_MSC_VER) && defined(_M_IX86)) || (defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))))
+	#define CRYPTOPP_X86_ASM_AVAILABLE
+
+	#if !defined(CRYPTOPP_DISABLE_SSE2) && (defined(CRYPTOPP_MSVC6PP_OR_LATER) || CRYPTOPP_GCC_VERSION >= 30300)
+		#define CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE 1
+	#else
+		#define CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE 0
+	#endif
+
+	// SSSE3 was actually introduced in GNU as 2.17, which was released 6/23/2006, but we can't tell what version of binutils is installed.
+	// GCC 4.1.2 was released on 2/13/2007, so we'll use that as a proxy for the binutils version.
+	#if !defined(CRYPTOPP_DISABLE_SSSE3) && (_MSC_VER >= 1400 || CRYPTOPP_GCC_VERSION >= 40102)
+		#define CRYPTOPP_BOOL_SSSE3_ASM_AVAILABLE 1
+	#else
+		#define CRYPTOPP_BOOL_SSSE3_ASM_AVAILABLE 0
+	#endif
+#endif
+
+#if !defined(CRYPTOPP_DISABLE_ASM) && defined(_MSC_VER) && defined(_M_X64)
+	#define CRYPTOPP_X64_MASM_AVAILABLE
+#endif
+
+#if !defined(CRYPTOPP_DISABLE_ASM) && defined(__GNUC__) && defined(__x86_64__)
+	#define CRYPTOPP_X64_ASM_AVAILABLE
+#endif
+
+#if !defined(CRYPTOPP_DISABLE_SSE2) && (defined(CRYPTOPP_MSVC6PP_OR_LATER) || defined(__SSE2__))
+	#define CRYPTOPP_BOOL_SSE2_INTRINSICS_AVAILABLE 1
+#else
+	#define CRYPTOPP_BOOL_SSE2_INTRINSICS_AVAILABLE 0
+#endif
+
+#if CRYPTOPP_BOOL_SSE2_INTRINSICS_AVAILABLE || CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE || defined(CRYPTOPP_X64_MASM_AVAILABLE)
+	#define CRYPTOPP_BOOL_ALIGN16_ENABLED 1
+#else
+	#define CRYPTOPP_BOOL_ALIGN16_ENABLED 0
+#endif
+
+// how to allocate 16-byte aligned memory (for SSE2)
+#if defined(CRYPTOPP_MSVC6PP_OR_LATER)
+	#define CRYPTOPP_MM_MALLOC_AVAILABLE
+#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
+	#define CRYPTOPP_MALLOC_ALIGNMENT_IS_16
+#elif defined(__linux__) || defined(__sun__) || defined(__CYGWIN__)
+	#define CRYPTOPP_MEMALIGN_AVAILABLE
+#else
+	#define CRYPTOPP_NO_ALIGNED_ALLOC
+#endif
+
+// how to disable inlining
+#if defined(_MSC_VER) && _MSC_VER >= 1300
+#	define CRYPTOPP_NOINLINE_DOTDOTDOT
+#	define CRYPTOPP_NOINLINE __declspec(noinline)
+#elif defined(__GNUC__)
+#	define CRYPTOPP_NOINLINE_DOTDOTDOT
+#	define CRYPTOPP_NOINLINE __attribute__((noinline))
+#else
+#	define CRYPTOPP_NOINLINE_DOTDOTDOT ...
+#	define CRYPTOPP_NOINLINE 
+#endif
+
+// how to declare class constants
+#if (defined(_MSC_VER) && _MSC_VER <= 1300) || defined(__INTEL_COMPILER)
+#	define CRYPTOPP_CONSTANT(x) enum {x};
+#else
+#	define CRYPTOPP_CONSTANT(x) static const int x;
+#endif
+
+#if defined(_M_X64) || defined(__x86_64__)
+	#define CRYPTOPP_BOOL_X64 1
+#else
+	#define CRYPTOPP_BOOL_X64 0
+#endif
+
+// see http://predef.sourceforge.net/prearch.html
+#if defined(_M_IX86) || defined(__i386__) || defined(__i386) || defined(_X86_) || defined(__I86__) || defined(__INTEL__)
+	#define CRYPTOPP_BOOL_X86 1
+#else
+	#define CRYPTOPP_BOOL_X86 0
+#endif
+
+#if CRYPTOPP_BOOL_X64 || CRYPTOPP_BOOL_X86 || defined(__powerpc__)
+	#define CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
+#endif
+
+#define CRYPTOPP_VERSION 560
+
+// ***************** determine availability of OS features ********************
+
+#ifndef NO_OS_DEPENDENCE
+
+#if defined(_WIN32) || defined(__CYGWIN__)
+#define CRYPTOPP_WIN32_AVAILABLE
+#endif
+
+#if defined(__unix__) || defined(__MACH__) || defined(__NetBSD__) || defined(__sun)
+#define CRYPTOPP_UNIX_AVAILABLE
+#endif
+
+#if defined(CRYPTOPP_WIN32_AVAILABLE) || defined(CRYPTOPP_UNIX_AVAILABLE)
+#	define HIGHRES_TIMER_AVAILABLE
+#endif
+
+#ifdef CRYPTOPP_UNIX_AVAILABLE
+#	define HAS_BERKELEY_STYLE_SOCKETS
+#endif
+
+#ifdef CRYPTOPP_WIN32_AVAILABLE
+#	define HAS_WINDOWS_STYLE_SOCKETS
+#endif
+
+#if defined(HIGHRES_TIMER_AVAILABLE) && (defined(HAS_BERKELEY_STYLE_SOCKETS) || defined(HAS_WINDOWS_STYLE_SOCKETS))
+#	define SOCKETS_AVAILABLE
+#endif
+
+#if defined(HAS_WINDOWS_STYLE_SOCKETS) && (!defined(HAS_BERKELEY_STYLE_SOCKETS) || defined(PREFER_WINDOWS_STYLE_SOCKETS))
+#	define USE_WINDOWS_STYLE_SOCKETS
+#else
+#	define USE_BERKELEY_STYLE_SOCKETS
+#endif
+
+#if defined(HIGHRES_TIMER_AVAILABLE) && defined(CRYPTOPP_WIN32_AVAILABLE) && !defined(USE_BERKELEY_STYLE_SOCKETS)
+#	define WINDOWS_PIPES_AVAILABLE
+#endif
+
+#if defined(CRYPTOPP_WIN32_AVAILABLE) && defined(USE_MS_CRYPTOAPI)
+#	define NONBLOCKING_RNG_AVAILABLE
+#	define OS_RNG_AVAILABLE
+#endif
+
+#if defined(CRYPTOPP_UNIX_AVAILABLE) || defined(CRYPTOPP_DOXYGEN_PROCESSING)
+#	define NONBLOCKING_RNG_AVAILABLE
+#	define BLOCKING_RNG_AVAILABLE
+#	define OS_RNG_AVAILABLE
+#	define HAS_PTHREADS
+#	define THREADS_AVAILABLE
+#endif
+
+#ifdef CRYPTOPP_WIN32_AVAILABLE
+#	define HAS_WINTHREADS
+#	define THREADS_AVAILABLE
+#endif
+
+#endif	// NO_OS_DEPENDENCE
+
+// ***************** DLL related ********************
+
+#ifdef CRYPTOPP_WIN32_AVAILABLE
+
+#ifdef CRYPTOPP_EXPORTS
+#define CRYPTOPP_IS_DLL
+#define CRYPTOPP_DLL __declspec(dllexport)
+#elif defined(CRYPTOPP_IMPORTS)
+#define CRYPTOPP_IS_DLL
+#define CRYPTOPP_DLL __declspec(dllimport)
+#else
+#define CRYPTOPP_DLL
+#endif
+
+#define CRYPTOPP_API __cdecl
+
+#else	// CRYPTOPP_WIN32_AVAILABLE
+
+#define CRYPTOPP_DLL
+#define CRYPTOPP_API
+
+#endif	// CRYPTOPP_WIN32_AVAILABLE
+
+#if defined(__MWERKS__)
+#define CRYPTOPP_EXTERN_DLL_TEMPLATE_CLASS extern class CRYPTOPP_DLL
+#elif defined(__BORLANDC__) || defined(__SUNPRO_CC)
+#define CRYPTOPP_EXTERN_DLL_TEMPLATE_CLASS template class CRYPTOPP_DLL
+#else
+#define CRYPTOPP_EXTERN_DLL_TEMPLATE_CLASS extern template class CRYPTOPP_DLL
+#endif
+
+#if defined(CRYPTOPP_MANUALLY_INSTANTIATE_TEMPLATES) && !defined(CRYPTOPP_IMPORTS)
+#define CRYPTOPP_DLL_TEMPLATE_CLASS template class CRYPTOPP_DLL
+#else
+#define CRYPTOPP_DLL_TEMPLATE_CLASS CRYPTOPP_EXTERN_DLL_TEMPLATE_CLASS
+#endif
+
+#if defined(__MWERKS__)
+#define CRYPTOPP_EXTERN_STATIC_TEMPLATE_CLASS extern class
+#elif defined(__BORLANDC__) || defined(__SUNPRO_CC)
+#define CRYPTOPP_EXTERN_STATIC_TEMPLATE_CLASS template class
+#else
+#define CRYPTOPP_EXTERN_STATIC_TEMPLATE_CLASS extern template class
+#endif
+
+#if defined(CRYPTOPP_MANUALLY_INSTANTIATE_TEMPLATES) && !defined(CRYPTOPP_EXPORTS)
+#define CRYPTOPP_STATIC_TEMPLATE_CLASS template class
+#else
+#define CRYPTOPP_STATIC_TEMPLATE_CLASS CRYPTOPP_EXTERN_STATIC_TEMPLATE_CLASS
+#endif
+
+#endif
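Note on the `CRYPTOPP_NO_ALIGNED_ALLOC` branch above: when no platform call returns 16-byte-aligned memory, the allocator in secblock.h over-allocates by 16 bytes, advances the pointer to the next 16-byte boundary, and stores the adjustment in the byte just before the returned pointer so it can be undone on free. The following is a minimal standalone sketch of that idea, not the library's code. The names `DemoAlignedAlloc16` and `DemoAlignedFree16` are hypothetical, and the sketch assumes a C++11 compiler for `<cstdint>` and `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper, for illustration only: returns a 16-byte-aligned
// block of at least `bytes` bytes, or nullptr if malloc fails.
inline void *DemoAlignedAlloc16(std::size_t bytes)
{
    unsigned char *raw = static_cast<unsigned char *>(std::malloc(bytes + 16));
    if (!raw)
        return nullptr;

    // The adjustment is always in the range 1..16, so there is always at
    // least one spare byte in front of the aligned pointer.
    const std::size_t adjustment = 16 - (reinterpret_cast<std::uintptr_t>(raw) % 16);
    unsigned char *aligned = raw + adjustment;

    // Remember how far we moved so the original pointer can be recovered.
    aligned[-1] = static_cast<unsigned char>(adjustment);
    return aligned;
}

// Hypothetical helper: releases a block obtained from DemoAlignedAlloc16.
inline void DemoAlignedFree16(void *p)
{
    if (!p)
        return;
    unsigned char *aligned = static_cast<unsigned char *>(p);
    std::free(aligned - aligned[-1]);
}
```

The bookkeeping byte costs nothing extra because the 16 bytes of slack are already reserved. The sketch omits the overflow check and the zeroing on free that the real allocator performs.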
--- a/cryptopp/cpu.cpp
+++ b/cryptopp/cpu.cpp
@ -0,0 +1,199 @@
+// cpu.cpp - written and placed in the public domain by Wei Dai
+
+#include "pch.h"
+
+#ifndef CRYPTOPP_IMPORTS
+
+#include "cpu.h"
+#include "misc.h"
+#include <algorithm>
+
+#ifdef __GNUC__
+#include <signal.h>
+#include <setjmp.h>
+#endif
+
+#ifdef CRYPTOPP_MSVC6PP_OR_LATER
+#include <emmintrin.h>
+#endif
+
+NAMESPACE_BEGIN(CryptoPP)
+
+#ifdef CRYPTOPP_X86_ASM_AVAILABLE
+
+#ifndef _MSC_VER
+typedef void (*SigHandler)(int);
+
+static jmp_buf s_jmpNoCPUID;
+static void SigIllHandlerCPUID(int)
+{
+	longjmp(s_jmpNoCPUID, 1);
+}
+#endif
+
+bool CpuId(word32 input, word32 *output)
+{
+#ifdef _MSC_VER
+    __try
+	{
+		__asm
+		{
+			mov eax, input
+			cpuid
+			mov edi, output
+			mov [edi], eax
+			mov [edi+4], ebx
+			mov [edi+8], ecx
+			mov [edi+12], edx
+		}
+	}
+    __except (1)
+	{
+		return false;
+    }
+	return true;
+#else
+	SigHandler oldHandler = signal(SIGILL, SigIllHandlerCPUID);
+	if (oldHandler == SIG_ERR)
+		return false;
+
+	bool result = true;
+	if (setjmp(s_jmpNoCPUID))
+		result = false;
+	else
+	{
+		__asm__
+		(
+			// save ebx in case -fPIC is being used
+#if CRYPTOPP_BOOL_X86
+			"push %%ebx; cpuid; mov %%ebx, %%edi; pop %%ebx"
+#else
+			"pushq %%rbx; cpuid; mov %%ebx, %%edi; popq %%rbx"
+#endif
+			: "=a" (output[0]), "=D" (output[1]), "=c" (output[2]), "=d" (output[3])
+			: "a" (input)
+		);
+	}
+
+	signal(SIGILL, oldHandler);
+	return result;
+#endif
+}
+
+#ifndef _MSC_VER
+static jmp_buf s_jmpNoSSE2;
+static void SigIllHandlerSSE2(int)
+{
+	longjmp(s_jmpNoSSE2, 1);
+}
+#endif
+
+#elif _MSC_VER >= 1400 && CRYPTOPP_BOOL_X64
+
+bool CpuId(word32 input, word32 *output)
+{
+	__cpuid((int *)output, input);
+	return true;
+}
+
+#endif
+
+#ifdef CRYPTOPP_CPUID_AVAILABLE
+
+static bool TrySSE2()
+{
+#if CRYPTOPP_BOOL_X64
+	return true;
+#elif defined(_MSC_VER)
+    __try
+	{
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+        AS2(por xmm0, xmm0)        // executing SSE2 instruction
+#elif CRYPTOPP_BOOL_SSE2_INTRINSICS_AVAILABLE
+		__m128i x = _mm_setzero_si128();
+		return _mm_cvtsi128_si32(x) == 0;
+#endif
+	}
+    __except (1)
+	{
+		return false;
+    }
+	return true;
+#elif defined(__GNUC__)
+	SigHandler oldHandler = signal(SIGILL, SigIllHandlerSSE2);
+	if (oldHandler == SIG_ERR)
+		return false;
+
+	bool result = true;
+	if (setjmp(s_jmpNoSSE2))
+		result = false;
+	else
+	{
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+		__asm __volatile ("por %xmm0, %xmm0");
+#elif CRYPTOPP_BOOL_SSE2_INTRINSICS_AVAILABLE
+		__m128i x = _mm_setzero_si128();
+		result = _mm_cvtsi128_si32(x) == 0;
+#endif
+	}
+
+	signal(SIGILL, oldHandler);
+	return result;
+#else
+	return false;
+#endif
+}
+
+bool g_x86DetectionDone = false;
+bool g_hasISSE = false, g_hasSSE2 = false, g_hasSSSE3 = false, g_hasMMX = false, g_isP4 = false;
+word32 g_cacheLineSize = CRYPTOPP_L1_CACHE_LINE_SIZE;
+
+void DetectX86Features()
+{
+	word32 cpuid[4], cpuid1[4];
+	if (!CpuId(0, cpuid))
+		return;
+	if (!CpuId(1, cpuid1))
+		return;
+
+	g_hasMMX = (cpuid1[3] & (1 << 23)) != 0;
+	if ((cpuid1[3] & (1 << 26)) != 0)
+		g_hasSSE2 = TrySSE2();
+	g_hasSSSE3 = g_hasSSE2 && (cpuid1[2] & (1<<9));
+
+	if ((cpuid1[3] & (1 << 25)) != 0)
+		g_hasISSE = true;
+	else
+	{
+		word32 cpuid2[4];
+		CpuId(0x080000000, cpuid2);
+		if (cpuid2[0] >= 0x080000001)
+		{
+			CpuId(0x080000001, cpuid2);
+			g_hasISSE = (cpuid2[3] & (1 << 22)) != 0;
+		}
+	}
+
+	std::swap(cpuid[2], cpuid[3]);
+	if (memcmp(cpuid+1, "GenuineIntel", 12) == 0)
+	{
+		g_isP4 = ((cpuid1[0] >> 8) & 0xf) == 0xf;
+		g_cacheLineSize = 8 * GETBYTE(cpuid1[1], 1);
+	}
+	else if (memcmp(cpuid+1, "AuthenticAMD", 12) == 0)
+	{
+		CpuId(0x80000005, cpuid);
+		g_cacheLineSize = GETBYTE(cpuid[2], 0);
+	}
+
+	if (!g_cacheLineSize)
+		g_cacheLineSize = CRYPTOPP_L1_CACHE_LINE_SIZE;
+
+	g_x86DetectionDone = true;
+}
+
+#endif
+
+NAMESPACE_END
+
+#endif
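Note on `DetectX86Features()` above: the feature flags come from CPUID leaf 1, where EDX bit 23 reports MMX, EDX bit 25 SSE, EDX bit 26 SSE2, and ECX bit 9 SSSE3. The sketch below isolates only that bit decoding as a pure function so it can be checked without executing `cpuid`. The struct and function names are hypothetical. Unlike the real code, it does not run a trial SSE2 instruction (`TrySSE2`) to confirm operating-system support, and it does not query the extended 0x80000001 leaf.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type, for illustration only.
struct DemoX86Features
{
    bool mmx;
    bool sse;
    bool sse2;
    bool ssse3;
};

// Decode the ECX/EDX values returned by CPUID leaf 1.
// Mirrors the bit tests in DetectX86Features(), minus the runtime probe.
inline DemoX86Features DecodeLeaf1(std::uint32_t ecx, std::uint32_t edx)
{
    DemoX86Features f;
    f.mmx   = (edx & (1u << 23)) != 0;
    f.sse   = (edx & (1u << 25)) != 0;
    f.sse2  = (edx & (1u << 26)) != 0;
    // As in the source, SSSE3 is only reported when SSE2 is also present.
    f.ssse3 = f.sse2 && (ecx & (1u << 9)) != 0;
    return f;
}
```

Keeping the decoding separate from the instruction that fetches the registers makes the logic testable on any host, including non-x86 build machines.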
--- a/cryptopp/cpu.h
+++ b/cryptopp/cpu.h
@ -0,0 +1,263 @@
+#ifndef CRYPTOPP_CPU_H
+#define CRYPTOPP_CPU_H
+
+#ifdef CRYPTOPP_GENERATE_X64_MASM
+
+#define CRYPTOPP_X86_ASM_AVAILABLE
+#define CRYPTOPP_BOOL_X64 1
+#define CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE 1
+#define NAMESPACE_END
+
+#else
+
+#include "config.h"
+
+#ifdef CRYPTOPP_MSVC6PP_OR_LATER
+	#include <emmintrin.h>
+#endif
+
+NAMESPACE_BEGIN(CryptoPP)
+
+#if defined(CRYPTOPP_X86_ASM_AVAILABLE) || (_MSC_VER >= 1400 && CRYPTOPP_BOOL_X64)
+
+#define CRYPTOPP_CPUID_AVAILABLE
+
+// these should not be used directly
+extern CRYPTOPP_DLL bool g_x86DetectionDone;
+extern CRYPTOPP_DLL bool g_hasSSE2;
+extern CRYPTOPP_DLL bool g_hasISSE;
+extern CRYPTOPP_DLL bool g_hasMMX;
+extern CRYPTOPP_DLL bool g_hasSSSE3;
+extern CRYPTOPP_DLL bool g_isP4;
+extern CRYPTOPP_DLL word32 g_cacheLineSize;
+CRYPTOPP_DLL void CRYPTOPP_API DetectX86Features();
+
+CRYPTOPP_DLL bool CRYPTOPP_API CpuId(word32 input, word32 *output);
+
+#if CRYPTOPP_BOOL_X64
+inline bool HasSSE2()	{return true;}
+inline bool HasISSE()	{return true;}
+inline bool HasMMX()	{return true;}
+#else
+
+inline bool HasSSE2()
+{
+	if (!g_x86DetectionDone)
+		DetectX86Features();
+	return g_hasSSE2;
+}
+
+inline bool HasISSE()
+{
+	if (!g_x86DetectionDone)
+		DetectX86Features();
+	return g_hasISSE;
+}
+
+inline bool HasMMX()
+{
+	if (!g_x86DetectionDone)
+		DetectX86Features();
+	return g_hasMMX;
+}
+
+#endif
+
+inline bool HasSSSE3()
+{
+	if (!g_x86DetectionDone)
+		DetectX86Features();
+	return g_hasSSSE3;
+}
+
+inline bool IsP4()
+{
+	if (!g_x86DetectionDone)
+		DetectX86Features();
+	return g_isP4;
+}
+
+inline int GetCacheLineSize()
+{
+	if (!g_x86DetectionDone)
+		DetectX86Features();
+	return g_cacheLineSize;
+}
+
+#else
+
+inline int GetCacheLineSize()
+{
+	return CRYPTOPP_L1_CACHE_LINE_SIZE;
+}
+
+inline bool HasSSSE3()	{return false;}
+inline bool IsP4()		{return false;}
+
+// assume MMX and SSE2 if intrinsics are enabled
+#if CRYPTOPP_BOOL_SSE2_INTRINSICS_AVAILABLE || CRYPTOPP_BOOL_X64
+inline bool HasSSE2()	{return true;}
+inline bool HasISSE()	{return true;}
+inline bool HasMMX()	{return true;}
+#else
+inline bool HasSSE2()	{return false;}
+inline bool HasISSE()	{return false;}
+inline bool HasMMX()	{return false;}
+#endif
+
+#endif		// #ifdef CRYPTOPP_X86_ASM_AVAILABLE || _MSC_VER >= 1400
+
+#endif
+
+#ifdef CRYPTOPP_GENERATE_X64_MASM
+	#define AS1(x) x*newline*
+	#define AS2(x, y) x, y*newline*
+	#define AS3(x, y, z) x, y, z*newline*
+	#define ASS(x, y, a, b, c, d) x, y, a*64+b*16+c*4+d*newline*
+	#define ASL(x) label##x:*newline*
+	#define ASJ(x, y, z) x label##y*newline*
+	#define ASC(x, y) x label##y*newline*
+	#define AS_HEX(y) 0##y##h
+#elif defined(__GNUC__)
+	// define these in two steps to allow arguments to be expanded
+	#define GNU_AS1(x) #x ";"
+	#define GNU_AS2(x, y) #x ", " #y ";"
+	#define GNU_AS3(x, y, z) #x ", " #y ", " #z ";"
+	#define GNU_ASL(x) "\n" #x ":"
+	#define GNU_ASJ(x, y, z) #x " " #y #z ";"
+	#define AS1(x) GNU_AS1(x)
+	#define AS2(x, y) GNU_AS2(x, y)
+	#define AS3(x, y, z) GNU_AS3(x, y, z)
+	#define ASS(x, y, a, b, c, d) #x ", " #y ", " #a "*64+" #b "*16+" #c "*4+" #d ";"
+	#define ASL(x) GNU_ASL(x)
+	#define ASJ(x, y, z) GNU_ASJ(x, y, z)
+	#define ASC(x, y) #x " " #y ";"
+	#define CRYPTOPP_NAKED
+	#define AS_HEX(y) 0x##y
+#else
+	#define AS1(x) __asm {x}
+	#define AS2(x, y) __asm {x, y}
+	#define AS3(x, y, z) __asm {x, y, z}
+	#define ASS(x, y, a, b, c, d) __asm {x, y, _MM_SHUFFLE(a, b, c, d)}
+	#define ASL(x) __asm {label##x:}
+	#define ASJ(x, y, z) __asm {x label##y}
+	#define ASC(x, y) __asm {x label##y}
+	#define CRYPTOPP_NAKED __declspec(naked)
+	#define AS_HEX(y) 0x##y
+#endif
+
+#define IF0(y)
+#define IF1(y) y
+
+#ifdef CRYPTOPP_GENERATE_X64_MASM
+#define ASM_MOD(x, y) ((x) MOD (y))
+#define XMMWORD_PTR XMMWORD PTR
+#else
+// GNU assembler doesn't seem to have mod operator
+#define ASM_MOD(x, y) ((x)-((x)/(y))*(y))
+// GAS 2.15 doesn't support XMMWORD PTR. it seems necessary only for MASM
+#define XMMWORD_PTR
+#endif
+
+#if CRYPTOPP_BOOL_X86
+	#define AS_REG_1 ecx
+	#define AS_REG_2 edx
+	#define AS_REG_3 esi
+	#define AS_REG_4 edi
+	#define AS_REG_5 eax
+	#define AS_REG_6 ebx
+	#define AS_REG_7 ebp
+	#define AS_REG_1d ecx
+	#define AS_REG_2d edx
+	#define AS_REG_3d esi
+	#define AS_REG_4d edi
+	#define AS_REG_5d eax
+	#define AS_REG_6d ebx
+	#define AS_REG_7d ebp
+	#define WORD_SZ 4
+	#define WORD_REG(x)	e##x
+	#define WORD_PTR DWORD PTR
+	#define AS_PUSH_IF86(x) AS1(push e##x)
+	#define AS_POP_IF86(x) AS1(pop e##x)
+	#define AS_JCXZ jecxz
+#elif CRYPTOPP_BOOL_X64
+	#ifdef CRYPTOPP_GENERATE_X64_MASM
+		#define AS_REG_1 rcx
+		#define AS_REG_2 rdx
+		#define AS_REG_3 r8
+		#define AS_REG_4 r9
+		#define AS_REG_5 rax
+		#define AS_REG_6 r10
+		#define AS_REG_7 r11
+		#define AS_REG_1d ecx
+		#define AS_REG_2d edx
+		#define AS_REG_3d r8d
+		#define AS_REG_4d r9d
+		#define AS_REG_5d eax
+		#define AS_REG_6d r10d
+		#define AS_REG_7d r11d
+	#else
+		#define AS_REG_1 rdi
+		#define AS_REG_2 rsi
+		#define AS_REG_3 rdx
+		#define AS_REG_4 rcx
+		#define AS_REG_5 r8
+		#define AS_REG_6 r9
+		#define AS_REG_7 r10
+		#define AS_REG_1d edi
+		#define AS_REG_2d esi
+		#define AS_REG_3d edx
+		#define AS_REG_4d ecx
+		#define AS_REG_5d r8d
+		#define AS_REG_6d r9d
+		#define AS_REG_7d r10d
+	#endif
+	#define WORD_SZ 8
+	#define WORD_REG(x)	r##x
+	#define WORD_PTR QWORD PTR
+	#define AS_PUSH_IF86(x)
+	#define AS_POP_IF86(x)
+	#define AS_JCXZ jrcxz
+#endif
+
+// helper macro for stream cipher output
+#define AS_XMM_OUTPUT4(labelPrefix, inputPtr, outputPtr, x0, x1, x2, x3, t, p0, p1, p2, p3, increment)\
+	AS2(	test	inputPtr, inputPtr)\
+	ASC(	jz,		labelPrefix##3)\
+	AS2(	test	inputPtr, 15)\
+	ASC(	jnz,	labelPrefix##7)\
+	AS2(	pxor	xmm##x0, [inputPtr+p0*16])\
+	AS2(	pxor	xmm##x1, [inputPtr+p1*16])\
+	AS2(	pxor	xmm##x2, [inputPtr+p2*16])\
+	AS2(	pxor	xmm##x3, [inputPtr+p3*16])\
+	AS2(	add		inputPtr, increment*16)\
+	ASC(	jmp,	labelPrefix##3)\
+	ASL(labelPrefix##7)\
+	AS2(	movdqu	xmm##t, [inputPtr+p0*16])\
+	AS2(	pxor	xmm##x0, xmm##t)\
+	AS2(	movdqu	xmm##t, [inputPtr+p1*16])\
+	AS2(	pxor	xmm##x1, xmm##t)\
+	AS2(	movdqu	xmm##t, [inputPtr+p2*16])\
+	AS2(	pxor	xmm##x2, xmm##t)\
+	AS2(	movdqu	xmm##t, [inputPtr+p3*16])\
+	AS2(	pxor	xmm##x3, xmm##t)\
+	AS2(	add		inputPtr, increment*16)\
+	ASL(labelPrefix##3)\
+	AS2(	test	outputPtr, 15)\
+	ASC(	jnz,	labelPrefix##8)\
+	AS2(	movdqa	[outputPtr+p0*16], xmm##x0)\
+	AS2(	movdqa	[outputPtr+p1*16], xmm##x1)\
+	AS2(	movdqa	[outputPtr+p2*16], xmm##x2)\
+	AS2(	movdqa	[outputPtr+p3*16], xmm##x3)\
+	ASC(	jmp,	labelPrefix##9)\
+	ASL(labelPrefix##8)\
+	AS2(	movdqu	[outputPtr+p0*16], xmm##x0)\
+	AS2(	movdqu	[outputPtr+p1*16], xmm##x1)\
+	AS2(	movdqu	[outputPtr+p2*16], xmm##x2)\
+	AS2(	movdqu	[outputPtr+p3*16], xmm##x3)\
+	ASL(labelPrefix##9)\
+	AS2(	add		outputPtr, increment*16)
+
+NAMESPACE_END
+
+#endif
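Note on the GNU branch of the `AS1`/`AS2`/`AS3` macros above: they are defined in two steps (`AS2` forwards to `GNU_AS2`) because the `#` operator stringifies its argument before macro expansion. Without the forwarding layer, register macros such as `AS_REG_1` would appear literally in the assembly string. A small sketch of the difference follows, using hypothetical `DEMO_` names rather than the library's macros.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical register alias, standing in for something like AS_REG_1.
#define DEMO_REG ecx

// One step: '#' is applied to the unexpanded argument.
#define DEMO_ONE_STEP(x, y) #x ", " #y ";"

// Two steps: the outer macro expands its arguments, then the inner
// macro stringifies the already-expanded text.
#define DEMO_INNER(x, y) #x ", " #y ";"
#define DEMO_TWO_STEP(x, y) DEMO_INNER(x, y)

inline const char *OneStep() { return DEMO_ONE_STEP(mov DEMO_REG, eax); }
inline const char *TwoStep() { return DEMO_TWO_STEP(mov DEMO_REG, eax); }
```

Here `OneStep()` yields `"mov DEMO_REG, eax;"` while `TwoStep()` yields `"mov ecx, eax;"`, which is the form an assembler can accept.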
--- a/cryptopp/cryptlib.h
+++ b/cryptopp/cryptlib.h
--- a/cryptopp/iterhash.h
+++ b/cryptopp/iterhash.h
@ -0,0 +1,29 @@
+#ifndef CRYPTOPP_ITERHASH_H
+#define CRYPTOPP_ITERHASH_H
+
+#include "secblock.h"
+
+NAMESPACE_BEGIN(CryptoPP)
+
+// *** trimmed down dependency from iterhash.h ***
+template <class T_HashWordType, class T_Endianness, unsigned int T_BlockSize, unsigned int T_StateSize, class T_Transform, unsigned int T_DigestSize = 0, bool T_StateAligned = false>
+class CRYPTOPP_NO_VTABLE IteratedHashWithStaticTransform
+{
+public:
+	CRYPTOPP_CONSTANT(DIGESTSIZE = T_DigestSize ? T_DigestSize : T_StateSize)
+	unsigned int DigestSize() const {return DIGESTSIZE;}
+    typedef T_HashWordType HashWordType;
+    CRYPTOPP_CONSTANT(BLOCKSIZE = T_BlockSize)
+
+protected:
+	IteratedHashWithStaticTransform() {this->Init();}
+	void HashEndianCorrectedBlock(const T_HashWordType *data) {T_Transform::Transform(this->m_state, data);}
+	void Init() {T_Transform::InitState(this->m_state);}
+
+	T_HashWordType* StateBuf() {return this->m_state;}
+	FixedSizeAlignedSecBlock<T_HashWordType, T_BlockSize/sizeof(T_HashWordType), T_StateAligned> m_state;
+};
+
+NAMESPACE_END
+
+#endif
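Note on `IteratedHashWithStaticTransform` above: the class takes its compression function as a template parameter (`T_Transform`) and calls its static `InitState` and `Transform` members, so no virtual dispatch is involved. The sketch below shows that pattern in isolation. `ToySum` and `StaticTransformHash` are hypothetical names, and the "transform" is a plain word sum chosen only to keep the example checkable; it is not a hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy transform: adds the four block words into state[0].
struct ToySum
{
    static void InitState(std::uint32_t *state) { state[0] = 0; }
    static void Transform(std::uint32_t *state, const std::uint32_t *block)
    {
        for (std::size_t i = 0; i < 4; ++i)
            state[0] += block[i];
    }
};

// Minimal analogue of the static-transform wrapper: the transform is
// resolved at compile time through the template parameter.
template <class T_Word, unsigned int T_StateWords, class T_Transform>
class StaticTransformHash
{
public:
    StaticTransformHash() { T_Transform::InitState(m_state); }
    void HashBlock(const T_Word *block) { T_Transform::Transform(m_state, block); }
    const T_Word *State() const { return m_state; }

private:
    T_Word m_state[T_StateWords];
};

// Feeds the same 4-word block twice and returns the accumulated state.
inline std::uint32_t DemoToySum()
{
    const std::uint32_t block[4] = {1, 2, 3, 4};
    StaticTransformHash<std::uint32_t, 1, ToySum> h;
    h.HashBlock(block);
    h.HashBlock(block);
    return h.State()[0];
}
```

The real class stores its state in a `FixedSizeAlignedSecBlock` rather than a plain array, so the state buffer can be 16-byte aligned for the SSE2 code paths.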
--- a/cryptopp/misc.h
+++ b/cryptopp/misc.h
--- a/cryptopp/obj/.gitignore
+++ b/cryptopp/obj/.gitignore
@ -0,0 +1,2 @@
+*
+!.gitignore
--- a/cryptopp/pch.h
+++ b/cryptopp/pch.h
@ -0,0 +1,21 @@
+#ifndef CRYPTOPP_PCH_H
+#define CRYPTOPP_PCH_H
+
+#ifdef CRYPTOPP_GENERATE_X64_MASM
+
+	#include "cpu.h"
+
+#else
+
+	#include "config.h"
+
+	#ifdef USE_PRECOMPILED_HEADERS
+		#include "simple.h"
+		#include "secblock.h"
+		#include "misc.h"
+		#include "smartptr.h"
+	#endif
+
+#endif
+
+#endif
--- a/cryptopp/secblock.h
+++ b/cryptopp/secblock.h
@ -0,0 +1,500 @@
+// secblock.h - written and placed in the public domain by Wei Dai
+
+#ifndef CRYPTOPP_SECBLOCK_H
+#define CRYPTOPP_SECBLOCK_H
+
+#include "config.h"
+#include "misc.h"
+#include <assert.h>
+
+#if defined(CRYPTOPP_MEMALIGN_AVAILABLE) || defined(CRYPTOPP_MM_MALLOC_AVAILABLE) || defined(QNX)
+	#include <malloc.h>
+#else
+	#include <stdlib.h>
+#endif
+
+NAMESPACE_BEGIN(CryptoPP)
+
+// ************** secure memory allocation ***************
+
+template<class T>
+class AllocatorBase
+{
+public:
+	typedef T value_type;
+	typedef size_t size_type;
+#ifdef CRYPTOPP_MSVCRT6
+	typedef ptrdiff_t difference_type;
+#else
+	typedef std::ptrdiff_t difference_type;
+#endif
+	typedef T * pointer;
+	typedef const T * const_pointer;
+	typedef T & reference;
+	typedef const T & const_reference;
+
+	pointer address(reference r) const {return (&r);}
+	const_pointer address(const_reference r) const {return (&r); }
+	void construct(pointer p, const T& val) {new (p) T(val);}
+	void destroy(pointer p) {p->~T();}
+	size_type max_size() const {return ~size_type(0)/sizeof(T);}	// switch to std::numeric_limits<T>::max later
+
+protected:
+	static void CheckSize(size_t n)
+	{
+		if (n > ~size_t(0) / sizeof(T))
+			throw InvalidArgument("AllocatorBase: requested size would cause integer overflow");
+	}
+};
+
+#define CRYPTOPP_INHERIT_ALLOCATOR_TYPES	\
+typedef typename AllocatorBase<T>::value_type value_type;\
+typedef typename AllocatorBase<T>::size_type size_type;\
+typedef typename AllocatorBase<T>::difference_type difference_type;\
+typedef typename AllocatorBase<T>::pointer pointer;\
+typedef typename AllocatorBase<T>::const_pointer const_pointer;\
+typedef typename AllocatorBase<T>::reference reference;\
+typedef typename AllocatorBase<T>::const_reference const_reference;
+
+#if defined(_MSC_VER) && (_MSC_VER < 1300)
+// this pragma causes an internal compiler error if placed immediately before std::swap(a, b)
+#pragma warning(push)
+#pragma warning(disable: 4700)	// VC60 workaround: don't know how to get rid of this warning
+#endif
+
+template <class T, class A>
+typename A::pointer StandardReallocate(A& a, T *p, typename A::size_type oldSize, typename A::size_type newSize, bool preserve)
+{
+	if (oldSize == newSize)
+		return p;
+
+	if (preserve)
+	{
+		typename A::pointer newPointer = a.allocate(newSize, NULL);
+		memcpy_s(newPointer, sizeof(T)*newSize, p, sizeof(T)*STDMIN(oldSize, newSize));
+		a.deallocate(p, oldSize);
+		return newPointer;
+	}
+	else
+	{
+		a.deallocate(p, oldSize);
+		return a.allocate(newSize, NULL);
+	}
+}
+
+#if defined(_MSC_VER) && (_MSC_VER < 1300)
+#pragma warning(pop)
+#endif
+
+template <class T, bool T_Align16 = false>
+class AllocatorWithCleanup : public AllocatorBase<T>
+{
+public:
+	CRYPTOPP_INHERIT_ALLOCATOR_TYPES
+
+	pointer allocate(size_type n, const void * = NULL)
+	{
+		CheckSize(n);
+		if (n == 0)
+			return NULL;
+
+		if (CRYPTOPP_BOOL_ALIGN16_ENABLED && T_Align16 && n*sizeof(T) >= 16)
+		{
+			byte *p;
+		#ifdef CRYPTOPP_MM_MALLOC_AVAILABLE
+			while (!(p = (byte *)_mm_malloc(sizeof(T)*n, 16)))
+		#elif defined(CRYPTOPP_MEMALIGN_AVAILABLE)
+			while (!(p = (byte *)memalign(16, sizeof(T)*n)))
+		#elif defined(CRYPTOPP_MALLOC_ALIGNMENT_IS_16)
+			while (!(p = (byte *)malloc(sizeof(T)*n)))
+		#else
+			while (!(p = (byte *)malloc(sizeof(T)*n + 16)))
+		#endif
+				CallNewHandler();
+
+		#ifdef CRYPTOPP_NO_ALIGNED_ALLOC
+			size_t adjustment = 16-((size_t)p%16);
+			p += adjustment;
+			p[-1] = (byte)adjustment;
+		#endif
+
+			assert(IsAlignedOn(p, 16));
+			return (pointer)p;
+		}
+
+		pointer p;
+		while (!(p = (pointer)malloc(sizeof(T)*n)))
+			CallNewHandler();
+		return p;
+	}
+
+	void deallocate(void *p, size_type n)
+	{
+		memset_z(p, 0, n*sizeof(T));
+
+		if (CRYPTOPP_BOOL_ALIGN16_ENABLED && T_Align16 && n*sizeof(T) >= 16)
+		{
+		#ifdef CRYPTOPP_MM_MALLOC_AVAILABLE
+			_mm_free(p);
+		#elif defined(CRYPTOPP_NO_ALIGNED_ALLOC)
+			p = (byte *)p - ((byte *)p)[-1];
+			free(p);
+		#else
+			free(p);
+		#endif
+			return;
+		}
+
+		free(p);
+	}
+
+	pointer reallocate(T *p, size_type oldSize, size_type newSize, bool preserve)
+	{
+		return StandardReallocate(*this, p, oldSize, newSize, preserve);
+	}
+
+	// VS.NET STL enforces the policy of "All STL-compliant allocators have to provide a
+	// template class member called rebind".
+    template <class U> struct rebind { typedef AllocatorWithCleanup<U, T_Align16> other; };
+#if _MSC_VER >= 1500
+	AllocatorWithCleanup() {}
+	template <class U, bool A> AllocatorWithCleanup(const AllocatorWithCleanup<U, A> &) {}
+#endif
+};
+
+CRYPTOPP_DLL_TEMPLATE_CLASS AllocatorWithCleanup<byte>;
+CRYPTOPP_DLL_TEMPLATE_CLASS AllocatorWithCleanup<word16>;
+CRYPTOPP_DLL_TEMPLATE_CLASS AllocatorWithCleanup<word32>;
+CRYPTOPP_DLL_TEMPLATE_CLASS AllocatorWithCleanup<word64>;
+#if CRYPTOPP_BOOL_X86
+CRYPTOPP_DLL_TEMPLATE_CLASS AllocatorWithCleanup<word, true>;	// for Integer
+#endif
+
+template <class T>
+class NullAllocator : public AllocatorBase<T>
+{
+public:
+	CRYPTOPP_INHERIT_ALLOCATOR_TYPES
+
+	pointer allocate(size_type n, const void * = NULL)
+	{
+		assert(false);
+		return NULL;
+	}
+
+	void deallocate(void *p, size_type n)
+	{
+		assert(false);
+	}
+
+	size_type max_size() const {return 0;}
+};
+
+// This allocator can't be used with standard collections because
+// they require that all objects of the same allocator type are equivalent.
+// So this is for use with SecBlock only.
+template <class T, size_t S, class A = NullAllocator<T>, bool T_Align16 = false>
+class FixedSizeAllocatorWithCleanup : public AllocatorBase<T>
+{
+public:
+	CRYPTOPP_INHERIT_ALLOCATOR_TYPES
+
+	FixedSizeAllocatorWithCleanup() : m_allocated(false) {}
+
+	pointer allocate(size_type n)
+	{
+		assert(IsAlignedOn(m_array, 8));
+
+		if (n <= S && !m_allocated)
+		{
+			m_allocated = true;
+			return GetAlignedArray();
+		}
+		else
+			return m_fallbackAllocator.allocate(n);
+	}
+
+	pointer allocate(size_type n, const void *hint)
+	{
+		if (n <= S && !m_allocated)
+		{
+			m_allocated = true;
+			return GetAlignedArray();
+		}
+		else
+			return m_fallbackAllocator.allocate(n, hint);
+	}
+
+	void deallocate(void *p, size_type n)
+	{
+		if (p == GetAlignedArray())
+		{
+			assert(n <= S);
+			assert(m_allocated);
+			m_allocated = false;
+			memset(p, 0, n*sizeof(T));
+		}
+		else
+			m_fallbackAllocator.deallocate(p, n);
+	}
+
+	pointer reallocate(pointer p, size_type oldSize, size_type newSize, bool preserve)
+	{
+		if (p == GetAlignedArray() && newSize <= S)
+		{
+			assert(oldSize <= S);
+			if (oldSize > newSize)
+				memset(p + newSize, 0, (oldSize-newSize)*sizeof(T));
+			return p;
+		}
+
+		pointer newPointer = allocate(newSize, NULL);
+		if (preserve)
+			memcpy(newPointer, p, sizeof(T)*STDMIN(oldSize, newSize));
+		deallocate(p, oldSize);
+		return newPointer;
+	}
+
+	size_type max_size() const {return STDMAX(m_fallbackAllocator.max_size(), S);}
+
+private:
+#ifdef __BORLANDC__
+	T* GetAlignedArray() {return m_array;}
+	T m_array[S];
+#else
+	T* GetAlignedArray() {return (CRYPTOPP_BOOL_ALIGN16_ENABLED && T_Align16) ? (T*)(((byte *)m_array) + (0-(size_t)m_array)%16) : m_array;}
+	CRYPTOPP_ALIGN_DATA(8) T m_array[(CRYPTOPP_BOOL_ALIGN16_ENABLED && T_Align16) ? S+8/sizeof(T) : S];
+#endif
+	A m_fallbackAllocator;
+	bool m_allocated;
+};
+
+//! a block of memory allocated using A
+template <class T, class A = AllocatorWithCleanup<T> >
+class SecBlock
+{
+public:
+	typedef typename A::value_type value_type;
+	typedef typename A::pointer iterator;
+	typedef typename A::const_pointer const_iterator;
+	typedef typename A::size_type size_type;
+
+	explicit SecBlock(size_type size=0)
+		: m_size(size) {m_ptr = m_alloc.allocate(size, NULL);}
+	SecBlock(const SecBlock<T, A> &t)
+		: m_size(t.m_size) {m_ptr = m_alloc.allocate(m_size, NULL); memcpy_s(m_ptr, m_size*sizeof(T), t.m_ptr, m_size*sizeof(T));}
+	SecBlock(const T *t, size_type len)
+		: m_size(len)
+	{
+		m_ptr = m_alloc.allocate(len, NULL);
+		if (t == NULL)
+			memset_z(m_ptr, 0, len*sizeof(T));
+		else
+			memcpy(m_ptr, t, len*sizeof(T));
+	}
+
+	~SecBlock()
+		{m_alloc.deallocate(m_ptr, m_size);}
+
+#ifdef __BORLANDC__
+	operator T *() const
+		{return (T*)m_ptr;}
+#else
+	operator const void *() const
+		{return m_ptr;}
+	operator void *()
+		{return m_ptr;}
+
+	operator const T *() const
+		{return m_ptr;}
+	operator T *()
+		{return m_ptr;}
+#endif
+
+//	T *operator +(size_type offset)
+//		{return m_ptr+offset;}
+
+//	const T *operator +(size_type offset) const
+//		{return m_ptr+offset;}
+
+//	T& operator[](size_type index)
+//		{assert(index >= 0 && index < m_size); return m_ptr[index];}
+
+//	const T& operator[](size_type index) const
+//		{assert(index >= 0 && index < m_size); return m_ptr[index];}
+
+	iterator begin()
+		{return m_ptr;}
+	const_iterator begin() const
+		{return m_ptr;}
+	iterator end()
+		{return m_ptr+m_size;}
+	const_iterator end() const
+		{return m_ptr+m_size;}
+
+	typename A::pointer data() {return m_ptr;}
+	typename A::const_pointer data() const {return m_ptr;}
+
+	size_type size() const {return m_size;}
+	bool empty() const {return m_size == 0;}
+
+	byte * BytePtr() {return (byte *)m_ptr;}
+	const byte * BytePtr() const {return (const byte *)m_ptr;}
+	size_type SizeInBytes() const {return m_size*sizeof(T);}
+
+	//! set contents and size
+	void Assign(const T *t, size_type len)
+	{
+		New(len);
+		memcpy_s(m_ptr, m_size*sizeof(T), t, len*sizeof(T));
+	}
+
+	//! copy contents and size from another SecBlock
+	void Assign(const SecBlock<T, A> &t)
+	{
+		New(t.m_size);
+		memcpy_s(m_ptr, m_size*sizeof(T), t.m_ptr, m_size*sizeof(T));
+	}
+
+	SecBlock<T, A>& operator=(const SecBlock<T, A> &t)
+	{
+		Assign(t);
+		return *this;
+	}
+
+	// append to this object
+	SecBlock<T, A>& operator+=(const SecBlock<T, A> &t)
+	{
+		size_type oldSize = m_size;
+		Grow(m_size+t.m_size);
+		memcpy_s(m_ptr+oldSize, (m_size-oldSize)*sizeof(T), t.m_ptr, t.m_size*sizeof(T));
+		return *this;
+	}
+
+	// append operator
+	SecBlock<T, A> operator+(const SecBlock<T, A> &t)
+	{
+		SecBlock<T, A> result(m_size+t.m_size);
+		memcpy_s(result.m_ptr, result.m_size*sizeof(T), m_ptr, m_size*sizeof(T));
+		memcpy_s(result.m_ptr+m_size, t.m_size*sizeof(T), t.m_ptr, t.m_size*sizeof(T));
+		return result;
+	}
+
+	bool operator==(const SecBlock<T, A> &t) const
+	{
+		return m_size == t.m_size && VerifyBufsEqual(m_ptr, t.m_ptr, m_size*sizeof(T));
+	}
+
+	bool operator!=(const SecBlock<T, A> &t) const
+	{
+		return !operator==(t);
+	}
+
+	//! change size, without preserving contents
+	void New(size_type newSize)
+	{
+		m_ptr = m_alloc.reallocate(m_ptr, m_size, newSize, false);
+		m_size = newSize;
+	}
+
+	//! change size and set contents to 0
+	void CleanNew(size_type newSize)
+	{
+		New(newSize);
+		memset_z(m_ptr, 0, m_size*sizeof(T));
+	}
+
+	//! change size only if newSize > current size. contents are preserved
+	void Grow(size_type newSize)
+	{
+		if (newSize > m_size)
+		{
+			m_ptr = m_alloc.reallocate(m_ptr, m_size, newSize, true);
+			m_size = newSize;
+		}
+	}
+
+	//! change size only if newSize > current size. contents are preserved and additional area is set to 0
+	void CleanGrow(size_type newSize)
+	{
+		if (newSize > m_size)
+		{
+			m_ptr = m_alloc.reallocate(m_ptr, m_size, newSize, true);
+			memset(m_ptr+m_size, 0, (newSize-m_size)*sizeof(T));
+			m_size = newSize;
+		}
+	}
+
+	//! change size and preserve contents
+	void resize(size_type newSize)
+	{
+		m_ptr = m_alloc.reallocate(m_ptr, m_size, newSize, true);
+		m_size = newSize;
+	}
+
+	//! swap contents and size with another SecBlock
+	void swap(SecBlock<T, A> &b)
+	{
+		std::swap(m_alloc, b.m_alloc);
+		std::swap(m_size, b.m_size);
+		std::swap(m_ptr, b.m_ptr);
+	}
+
+//private:
+	A m_alloc;
+	size_type m_size;
+	T *m_ptr;
+};
+
+typedef SecBlock<byte> SecByteBlock;
+typedef SecBlock<byte, AllocatorWithCleanup<byte, true> > AlignedSecByteBlock;
+typedef SecBlock<word> SecWordBlock;
+
+//! a SecBlock with fixed size, allocated statically
+template <class T, unsigned int S, class A = FixedSizeAllocatorWithCleanup<T, S> >
+class FixedSizeSecBlock : public SecBlock<T, A>
+{
+public:
+	explicit FixedSizeSecBlock() : SecBlock<T, A>(S) {}
+};
+
+template <class T, unsigned int S, bool T_Align16 = true>
+class FixedSizeAlignedSecBlock : public FixedSizeSecBlock<T, S, FixedSizeAllocatorWithCleanup<T, S, NullAllocator<T>, T_Align16> >
+{
+};
+
+//! a SecBlock that preallocates size S statically, and uses the heap when this size is exceeded
+template <class T, unsigned int S, class A = FixedSizeAllocatorWithCleanup<T, S, AllocatorWithCleanup<T> > >
+class SecBlockWithHint : public SecBlock<T, A>
+{
+public:
+	explicit SecBlockWithHint(size_t size) : SecBlock<T, A>(size) {}
+};
+
+template<class T, bool A, class U, bool B>
+inline bool operator==(const CryptoPP::AllocatorWithCleanup<T, A>&, const CryptoPP::AllocatorWithCleanup<U, B>&) {return (true);}
+template<class T, bool A, class U, bool B>
+inline bool operator!=(const CryptoPP::AllocatorWithCleanup<T, A>&, const CryptoPP::AllocatorWithCleanup<U, B>&) {return (false);}
+
+NAMESPACE_END
+
+NAMESPACE_BEGIN(std)
+template <class T, class A>
+inline void swap(CryptoPP::SecBlock<T, A> &a, CryptoPP::SecBlock<T, A> &b)
+{
+	a.swap(b);
+}
+
+#if defined(_STLP_DONT_SUPPORT_REBIND_MEMBER_TEMPLATE) || (defined(_STLPORT_VERSION) && !defined(_STLP_MEMBER_TEMPLATE_CLASSES))
+// working for STLport 5.1.3 and MSVC 6 SP5
+template <class _Tp1, class _Tp2>
+inline CryptoPP::AllocatorWithCleanup<_Tp2>&
+__stl_alloc_rebind(CryptoPP::AllocatorWithCleanup<_Tp1>& __a, const _Tp2*)
+{
+	return (CryptoPP::AllocatorWithCleanup<_Tp2>&)(__a);
+}
+#endif
+
+NAMESPACE_END
+
+#endif
--- a/cryptopp/sha.cpp
+++ b/cryptopp/sha.cpp
@ -0,0 +1,899 @@
+// sha.cpp - modified by Wei Dai from Steve Reid's public domain sha1.c
+
+// Steve Reid implemented SHA-1. Wei Dai implemented SHA-2.
+// Both are in the public domain.
+
+// use "cl /EP /P /DCRYPTOPP_GENERATE_X64_MASM sha.cpp" to generate MASM code
+
+#include "pch.h"
+
+#ifndef CRYPTOPP_IMPORTS
+#ifndef CRYPTOPP_GENERATE_X64_MASM
+
+#include "sha.h"
+#include "misc.h"
+#include "cpu.h"
+
+NAMESPACE_BEGIN(CryptoPP)
+
+// start of Steve Reid's code
+
+#define blk0(i) (W[i] = data[i])
+#define blk1(i) (W[i&15] = rotlFixed(W[(i+13)&15]^W[(i+8)&15]^W[(i+2)&15]^W[i&15],1))
+
+void SHA1::InitState(HashWordType *state)
+{
+	state[0] = 0x67452301L;
+	state[1] = 0xEFCDAB89L;
+	state[2] = 0x98BADCFEL;
+	state[3] = 0x10325476L;
+	state[4] = 0xC3D2E1F0L;
+}
+
+#define f1(x,y,z) (z^(x&(y^z)))
+#define f2(x,y,z) (x^y^z)
+#define f3(x,y,z) ((x&y)|(z&(x|y)))
+#define f4(x,y,z) (x^y^z)
+
+/* (R0+R1), R2, R3, R4 are the different operations used in SHA1 */
+#define R0(v,w,x,y,z,i) z+=f1(w,x,y)+blk0(i)+0x5A827999+rotlFixed(v,5);w=rotlFixed(w,30);
+#define R1(v,w,x,y,z,i) z+=f1(w,x,y)+blk1(i)+0x5A827999+rotlFixed(v,5);w=rotlFixed(w,30);
+#define R2(v,w,x,y,z,i) z+=f2(w,x,y)+blk1(i)+0x6ED9EBA1+rotlFixed(v,5);w=rotlFixed(w,30);
+#define R3(v,w,x,y,z,i) z+=f3(w,x,y)+blk1(i)+0x8F1BBCDC+rotlFixed(v,5);w=rotlFixed(w,30);
+#define R4(v,w,x,y,z,i) z+=f4(w,x,y)+blk1(i)+0xCA62C1D6+rotlFixed(v,5);w=rotlFixed(w,30);
+
+void SHA1::Transform(word32 *state, const word32 *data)
+{
+	word32 W[16];
+    /* Copy context->state[] to working vars */
+    word32 a = state[0];
+    word32 b = state[1];
+    word32 c = state[2];
+    word32 d = state[3];
+    word32 e = state[4];
+    /* 4 rounds of 20 operations each. Loop unrolled. */
+    R0(a,b,c,d,e, 0); R0(e,a,b,c,d, 1); R0(d,e,a,b,c, 2); R0(c,d,e,a,b, 3);
+    R0(b,c,d,e,a, 4); R0(a,b,c,d,e, 5); R0(e,a,b,c,d, 6); R0(d,e,a,b,c, 7);
+    R0(c,d,e,a,b, 8); R0(b,c,d,e,a, 9); R0(a,b,c,d,e,10); R0(e,a,b,c,d,11);
+    R0(d,e,a,b,c,12); R0(c,d,e,a,b,13); R0(b,c,d,e,a,14); R0(a,b,c,d,e,15);
+    R1(e,a,b,c,d,16); R1(d,e,a,b,c,17); R1(c,d,e,a,b,18); R1(b,c,d,e,a,19);
+    R2(a,b,c,d,e,20); R2(e,a,b,c,d,21); R2(d,e,a,b,c,22); R2(c,d,e,a,b,23);
+    R2(b,c,d,e,a,24); R2(a,b,c,d,e,25); R2(e,a,b,c,d,26); R2(d,e,a,b,c,27);
+    R2(c,d,e,a,b,28); R2(b,c,d,e,a,29); R2(a,b,c,d,e,30); R2(e,a,b,c,d,31);
+    R2(d,e,a,b,c,32); R2(c,d,e,a,b,33); R2(b,c,d,e,a,34); R2(a,b,c,d,e,35);
+    R2(e,a,b,c,d,36); R2(d,e,a,b,c,37); R2(c,d,e,a,b,38); R2(b,c,d,e,a,39);
+    R3(a,b,c,d,e,40); R3(e,a,b,c,d,41); R3(d,e,a,b,c,42); R3(c,d,e,a,b,43);
+    R3(b,c,d,e,a,44); R3(a,b,c,d,e,45); R3(e,a,b,c,d,46); R3(d,e,a,b,c,47);
+    R3(c,d,e,a,b,48); R3(b,c,d,e,a,49); R3(a,b,c,d,e,50); R3(e,a,b,c,d,51);
+    R3(d,e,a,b,c,52); R3(c,d,e,a,b,53); R3(b,c,d,e,a,54); R3(a,b,c,d,e,55);
+    R3(e,a,b,c,d,56); R3(d,e,a,b,c,57); R3(c,d,e,a,b,58); R3(b,c,d,e,a,59);
+    R4(a,b,c,d,e,60); R4(e,a,b,c,d,61); R4(d,e,a,b,c,62); R4(c,d,e,a,b,63);
+    R4(b,c,d,e,a,64); R4(a,b,c,d,e,65); R4(e,a,b,c,d,66); R4(d,e,a,b,c,67);
+    R4(c,d,e,a,b,68); R4(b,c,d,e,a,69); R4(a,b,c,d,e,70); R4(e,a,b,c,d,71);
+    R4(d,e,a,b,c,72); R4(c,d,e,a,b,73); R4(b,c,d,e,a,74); R4(a,b,c,d,e,75);
+    R4(e,a,b,c,d,76); R4(d,e,a,b,c,77); R4(c,d,e,a,b,78); R4(b,c,d,e,a,79);
+    /* Add the working vars back into context.state[] */
+    state[0] += a;
+    state[1] += b;
+    state[2] += c;
+    state[3] += d;
+    state[4] += e;
+}
+
+// end of Steve Reid's code
+
+// *************************************************************
+
+void SHA224::InitState(HashWordType *state)
+{
+	static const word32 s[8] = {0xc1059ed8, 0x367cd507, 0x3070dd17, 0xf70e5939, 0xffc00b31, 0x68581511, 0x64f98fa7, 0xbefa4fa4};
+	memcpy(state, s, sizeof(s));
+}
+
+void SHA256::InitState(HashWordType *state)
+{
+	static const word32 s[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
+	memcpy(state, s, sizeof(s));
+}
+
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+CRYPTOPP_ALIGN_DATA(16) extern const word32 SHA256_K[64] CRYPTOPP_SECTION_ALIGN16 = {
+#else
+extern const word32 SHA256_K[64] = {
+#endif
+	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
+	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
+	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
+	0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
+	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
+	0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
+	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
+	0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
+	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
+	0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
+	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
+	0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
+	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
+	0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
+	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
+	0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+};
+
+#endif // #ifndef CRYPTOPP_GENERATE_X64_MASM
+
+#if defined(CRYPTOPP_X86_ASM_AVAILABLE) || defined(CRYPTOPP_GENERATE_X64_MASM)
+
+#pragma warning(disable: 4731)	// frame pointer register 'ebp' modified by inline assembly code
+
+static void CRYPTOPP_FASTCALL X86_SHA256_HashBlocks(word32 *state, const word32 *data, size_t len
+#if defined(_MSC_VER) && (_MSC_VER == 1200)
+	, ...	// VC60 workaround: prevent VC 6 from inlining this function
+#endif
+	)
+{
+#if defined(_MSC_VER) && (_MSC_VER == 1200)
+	AS2(mov ecx, [state])
+	AS2(mov edx, [data])
+#endif
+
+	#define LOCALS_SIZE	8*4 + 16*4 + 4*WORD_SZ
+	#define H(i)		[BASE+ASM_MOD(1024+7-(i),8)*4]
+	#define G(i)		H(i+1)
+	#define F(i)		H(i+2)
+	#define E(i)		H(i+3)
+	#define D(i)		H(i+4)
+	#define C(i)		H(i+5)
+	#define B(i)		H(i+6)
+	#define A(i)		H(i+7)
+	#define Wt(i)		BASE+8*4+ASM_MOD(1024+15-(i),16)*4
+	#define Wt_2(i)		Wt((i)-2)
+	#define Wt_15(i)	Wt((i)-15)
+	#define Wt_7(i)		Wt((i)-7)
+	#define K_END		[BASE+8*4+16*4+0*WORD_SZ]
+	#define STATE_SAVE	[BASE+8*4+16*4+1*WORD_SZ]
+	#define DATA_SAVE	[BASE+8*4+16*4+2*WORD_SZ]
+	#define DATA_END	[BASE+8*4+16*4+3*WORD_SZ]
+	#define Kt(i)		WORD_REG(si)+(i)*4
+#if CRYPTOPP_BOOL_X86
+	#define BASE		esp+4
+#elif defined(__GNUC__)
+	#define BASE		r8
+#else
+	#define BASE		rsp
+#endif
+
+#define RA0(i, edx, edi)		\
+	AS2(	add edx, [Kt(i)]	)\
+	AS2(	add edx, [Wt(i)]	)\
+	AS2(	add edx, H(i)		)\
+
+#define RA1(i, edx, edi)
+
+#define RB0(i, edx, edi)
+
+#define RB1(i, edx, edi)	\
+	AS2(	mov AS_REG_7d, [Wt_2(i)]	)\
+	AS2(	mov edi, [Wt_15(i)])\
+	AS2(	mov ebx, AS_REG_7d	)\
+	AS2(	shr AS_REG_7d, 10		)\
+	AS2(	ror ebx, 17		)\
+	AS2(	xor AS_REG_7d, ebx	)\
+	AS2(	ror ebx, 2		)\
+	AS2(	xor ebx, AS_REG_7d	)/* s1(W_t-2) */\
+	AS2(	add ebx, [Wt_7(i)])\
+	AS2(	mov AS_REG_7d, edi	)\
+	AS2(	shr AS_REG_7d, 3		)\
+	AS2(	ror edi, 7		)\
+	AS2(	add ebx, [Wt(i)])/* s1(W_t-2) + W_t-7 + W_t-16 */\
+	AS2(	xor AS_REG_7d, edi	)\
+	AS2(	add edx, [Kt(i)])\
+	AS2(	ror edi, 11		)\
+	AS2(	add edx, H(i)	)\
+	AS2(	xor AS_REG_7d, edi	)/* s0(W_t-15) */\
+	AS2(	add AS_REG_7d, ebx	)/* W_t = s1(W_t-2) + W_t-7 + s0(W_t-15) + W_t-16 */\
+	AS2(	mov [Wt(i)], AS_REG_7d)\
+	AS2(	add edx, AS_REG_7d	)\
+
+#define ROUND(i, r, eax, ecx, edi, edx)\
+	/* in: edi = E	*/\
+	/* unused: eax, ecx, temp: ebx, AS_REG_7d, out: edx = T1 */\
+	AS2(	mov edx, F(i)	)\
+	AS2(	xor edx, G(i)	)\
+	AS2(	and edx, edi	)\
+	AS2(	xor edx, G(i)	)/* Ch(E,F,G) = (G^(E&(F^G))) */\
+	AS2(	mov AS_REG_7d, edi	)\
+	AS2(	ror edi, 6		)\
+	AS2(	ror AS_REG_7d, 25		)\
+	RA##r(i, edx, edi		)/* H + Wt + Kt + Ch(E,F,G) */\
+	AS2(	xor AS_REG_7d, edi	)\
+	AS2(	ror edi, 5		)\
+	AS2(	xor AS_REG_7d, edi	)/* S1(E) */\
+	AS2(	add edx, AS_REG_7d	)/* T1 = S1(E) + Ch(E,F,G) + H + Wt + Kt */\
+	RB##r(i, edx, edi		)/* H + Wt + Kt + Ch(E,F,G) */\
+	/* in: ecx = A, eax = B^C, edx = T1 */\
+	/* unused: edx, temp: ebx, AS_REG_7d, out: eax = A, ecx = B^C, edx = E */\
+	AS2(	mov ebx, ecx	)\
+	AS2(	xor ecx, B(i)	)/* A^B */\
+	AS2(	and eax, ecx	)\
+	AS2(	xor eax, B(i)	)/* Maj(A,B,C) = B^((A^B)&(B^C)) */\
+	AS2(	mov AS_REG_7d, ebx	)\
+	AS2(	ror ebx, 2		)\
+	AS2(	add eax, edx	)/* T1 + Maj(A,B,C) */\
+	AS2(	add edx, D(i)	)\
+	AS2(	mov D(i), edx	)\
+	AS2(	ror AS_REG_7d, 22		)\
+	AS2(	xor AS_REG_7d, ebx	)\
+	AS2(	ror ebx, 11		)\
+	AS2(	xor AS_REG_7d, ebx	)\
+	AS2(	add eax, AS_REG_7d	)/* T1 + S0(A) + Maj(A,B,C) */\
+	AS2(	mov H(i), eax	)\
+
+#define SWAP_COPY(i)		\
+	AS2(	mov		WORD_REG(bx), [WORD_REG(dx)+i*WORD_SZ])\
+	AS1(	bswap	WORD_REG(bx))\
+	AS2(	mov		[Wt(i*(1+CRYPTOPP_BOOL_X64)+CRYPTOPP_BOOL_X64)], WORD_REG(bx))
+
+#if defined(__GNUC__)
+	#if CRYPTOPP_BOOL_X64
+		FixedSizeAlignedSecBlock<byte, LOCALS_SIZE> workspace;
+	#endif
+	__asm__ __volatile__
+	(
+	#if CRYPTOPP_BOOL_X64
+		"lea %4, %%r8;"
+	#endif
+	".intel_syntax noprefix;"
+#elif defined(CRYPTOPP_GENERATE_X64_MASM)
+		ALIGN   8
+	X86_SHA256_HashBlocks	PROC FRAME
+		rex_push_reg rsi
+		push_reg rdi
+		push_reg rbx
+		push_reg rbp
+		alloc_stack(LOCALS_SIZE+8)
+		.endprolog
+		mov rdi, r8
+		lea rsi, [?SHA256_K@CryptoPP@@3QBIB + 48*4]
+#endif
+
+#if CRYPTOPP_BOOL_X86
+	#ifndef __GNUC__
+		AS2(	mov		edi, [len])
+		AS2(	lea		WORD_REG(si), [SHA256_K+48*4])
+	#endif
+	#if !defined(_MSC_VER) || (_MSC_VER < 1400)
+		AS_PUSH_IF86(bx)
+	#endif
+
+	AS_PUSH_IF86(bp)
+	AS2(	mov		ebx, esp)
+	AS2(	and		esp, -16)
+	AS2(	sub		WORD_REG(sp), LOCALS_SIZE)
+	AS_PUSH_IF86(bx)
+#endif
+	AS2(	mov		STATE_SAVE, WORD_REG(cx))
+	AS2(	mov		DATA_SAVE, WORD_REG(dx))
+	AS2(	add		WORD_REG(di), WORD_REG(dx))
+	AS2(	mov		DATA_END, WORD_REG(di))
+	AS2(	mov		K_END, WORD_REG(si))
+
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+#if CRYPTOPP_BOOL_X86
+	AS2(	test	edi, 1)
+	ASJ(	jnz,	2, f)
+#endif
+	AS2(	movdqa	xmm0, XMMWORD_PTR [WORD_REG(cx)+0*16])
+	AS2(	movdqa	xmm1, XMMWORD_PTR [WORD_REG(cx)+1*16])
+#endif
+
+#if CRYPTOPP_BOOL_X86
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+	ASJ(	jmp,	0, f)
+#endif
+	ASL(2)	// non-SSE2
+	AS2(	mov		esi, ecx)
+	AS2(	lea		edi, A(0))
+	AS2(	mov		ecx, 8)
+	AS1(	rep movsd)
+	AS2(	mov		esi, K_END)
+	ASJ(	jmp,	3, f)
+#endif
+
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+	ASL(0)
+	AS2(	movdqa	E(0), xmm1)
+	AS2(	movdqa	A(0), xmm0)
+#endif
+#if CRYPTOPP_BOOL_X86
+	ASL(3)
+#endif
+	AS2(	sub		WORD_REG(si), 48*4)
+	SWAP_COPY(0)	SWAP_COPY(1)	SWAP_COPY(2)	SWAP_COPY(3)
+	SWAP_COPY(4)	SWAP_COPY(5)	SWAP_COPY(6)	SWAP_COPY(7)
+#if CRYPTOPP_BOOL_X86
+	SWAP_COPY(8)	SWAP_COPY(9)	SWAP_COPY(10)	SWAP_COPY(11)
+	SWAP_COPY(12)	SWAP_COPY(13)	SWAP_COPY(14)	SWAP_COPY(15)
+#endif
+	AS2(	mov		edi, E(0))	// E
+	AS2(	mov		eax, B(0))	// B
+	AS2(	xor		eax, C(0))	// B^C
+	AS2(	mov		ecx, A(0))	// A
+
+	ROUND(0, 0, eax, ecx, edi, edx)
+	ROUND(1, 0, ecx, eax, edx, edi)
+	ROUND(2, 0, eax, ecx, edi, edx)
+	ROUND(3, 0, ecx, eax, edx, edi)
+	ROUND(4, 0, eax, ecx, edi, edx)
+	ROUND(5, 0, ecx, eax, edx, edi)
+	ROUND(6, 0, eax, ecx, edi, edx)
+	ROUND(7, 0, ecx, eax, edx, edi)
+	ROUND(8, 0, eax, ecx, edi, edx)
+	ROUND(9, 0, ecx, eax, edx, edi)
+	ROUND(10, 0, eax, ecx, edi, edx)
+	ROUND(11, 0, ecx, eax, edx, edi)
+	ROUND(12, 0, eax, ecx, edi, edx)
+	ROUND(13, 0, ecx, eax, edx, edi)
+	ROUND(14, 0, eax, ecx, edi, edx)
+	ROUND(15, 0, ecx, eax, edx, edi)
+
+	ASL(1)
+	AS2(add WORD_REG(si), 4*16)
+	ROUND(0, 1, eax, ecx, edi, edx)
+	ROUND(1, 1, ecx, eax, edx, edi)
+	ROUND(2, 1, eax, ecx, edi, edx)
+	ROUND(3, 1, ecx, eax, edx, edi)
+	ROUND(4, 1, eax, ecx, edi, edx)
+	ROUND(5, 1, ecx, eax, edx, edi)
+	ROUND(6, 1, eax, ecx, edi, edx)
+	ROUND(7, 1, ecx, eax, edx, edi)
+	ROUND(8, 1, eax, ecx, edi, edx)
+	ROUND(9, 1, ecx, eax, edx, edi)
+	ROUND(10, 1, eax, ecx, edi, edx)
+	ROUND(11, 1, ecx, eax, edx, edi)
+	ROUND(12, 1, eax, ecx, edi, edx)
+	ROUND(13, 1, ecx, eax, edx, edi)
+	ROUND(14, 1, eax, ecx, edi, edx)
+	ROUND(15, 1, ecx, eax, edx, edi)
+	AS2(	cmp		WORD_REG(si), K_END)
+	ASJ(	jne,	1, b)
+
+	AS2(	mov		WORD_REG(dx), DATA_SAVE)
+	AS2(	add		WORD_REG(dx), 64)
+	AS2(	mov		AS_REG_7, STATE_SAVE)
+	AS2(	mov		DATA_SAVE, WORD_REG(dx))
+
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+#if CRYPTOPP_BOOL_X86
+	AS2(	test	DWORD PTR DATA_END, 1)
+	ASJ(	jnz,	4, f)
+#endif
+	AS2(	movdqa	xmm1, XMMWORD_PTR [AS_REG_7+1*16])
+	AS2(	movdqa	xmm0, XMMWORD_PTR [AS_REG_7+0*16])
+	AS2(	paddd	xmm1, E(0))
+	AS2(	paddd	xmm0, A(0))
+	AS2(	movdqa	[AS_REG_7+1*16], xmm1)
+	AS2(	movdqa	[AS_REG_7+0*16], xmm0)
+	AS2(	cmp		WORD_REG(dx), DATA_END)
+	ASJ(	jl,		0, b)
+#endif
+
+#if CRYPTOPP_BOOL_X86
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+	ASJ(	jmp,	5, f)
+	ASL(4)	// non-SSE2
+#endif
+	AS2(	add		[AS_REG_7+0*4], ecx)	// A
+	AS2(	add		[AS_REG_7+4*4], edi)	// E
+	AS2(	mov		eax, B(0))
+	AS2(	mov		ebx, C(0))
+	AS2(	mov		ecx, D(0))
+	AS2(	add		[AS_REG_7+1*4], eax)
+	AS2(	add		[AS_REG_7+2*4], ebx)
+	AS2(	add		[AS_REG_7+3*4], ecx)
+	AS2(	mov		eax, F(0))
+	AS2(	mov		ebx, G(0))
+	AS2(	mov		ecx, H(0))
+	AS2(	add		[AS_REG_7+5*4], eax)
+	AS2(	add		[AS_REG_7+6*4], ebx)
+	AS2(	add		[AS_REG_7+7*4], ecx)
+	AS2(	mov		ecx, AS_REG_7d)
+	AS2(	cmp		WORD_REG(dx), DATA_END)
+	ASJ(	jl,		2, b)
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
+	ASL(5)
+#endif
+#endif
+
+	AS_POP_IF86(sp)
+	AS_POP_IF86(bp)
+	#if !defined(_MSC_VER) || (_MSC_VER < 1400)
+		AS_POP_IF86(bx)
+	#endif
+
+#ifdef CRYPTOPP_GENERATE_X64_MASM
+	add		rsp, LOCALS_SIZE+8
+	pop		rbp
+	pop		rbx
+	pop		rdi
+	pop		rsi
+	ret
+	X86_SHA256_HashBlocks ENDP
+#endif
+
+#ifdef __GNUC__
+	".att_syntax prefix;"
+	: 
+	: "c" (state), "d" (data), "S" (SHA256_K+48), "D" (len)
+	#if CRYPTOPP_BOOL_X64
+		, "m" (workspace[0])
+	#endif
+	: "memory", "cc", "%eax"
+	#if CRYPTOPP_BOOL_X64
+		, "%rbx", "%r8"
+	#endif
+	);
+#endif
+}
+
+#endif	// #if defined(CRYPTOPP_X86_ASM_AVAILABLE) || defined(CRYPTOPP_GENERATE_X64_MASM)
+
+#ifndef CRYPTOPP_GENERATE_X64_MASM
+
+#ifdef CRYPTOPP_X64_MASM_AVAILABLE
+extern "C" {
+void CRYPTOPP_FASTCALL X86_SHA256_HashBlocks(word32 *state, const word32 *data, size_t len);
+}
+#endif
+
+#if defined(CRYPTOPP_X86_ASM_AVAILABLE) || defined(CRYPTOPP_X64_MASM_AVAILABLE)
+
+size_t SHA256::HashMultipleBlocks(const word32 *input, size_t length)
+{
+	X86_SHA256_HashBlocks(m_state, input, (length&(size_t(0)-BLOCKSIZE)) - !HasSSE2());
+	return length % BLOCKSIZE;
+}
+
+size_t SHA224::HashMultipleBlocks(const word32 *input, size_t length)
+{
+	X86_SHA256_HashBlocks(m_state, input, (length&(size_t(0)-BLOCKSIZE)) - !HasSSE2());
+	return length % BLOCKSIZE;
+}
+
+#endif
+
+#define blk2(i) (W[i&15]+=s1(W[(i-2)&15])+W[(i-7)&15]+s0(W[(i-15)&15]))
+
+#define Ch(x,y,z) (z^(x&(y^z)))
+#define Maj(x,y,z) (y^((x^y)&(y^z)))
+
+#define a(i) T[(0-i)&7]
+#define b(i) T[(1-i)&7]
+#define c(i) T[(2-i)&7]
+#define d(i) T[(3-i)&7]
+#define e(i) T[(4-i)&7]
+#define f(i) T[(5-i)&7]
+#define g(i) T[(6-i)&7]
+#define h(i) T[(7-i)&7]
+
+#define R(i) h(i)+=S1(e(i))+Ch(e(i),f(i),g(i))+SHA256_K[i+j]+(j?blk2(i):blk0(i));\
+	d(i)+=h(i);h(i)+=S0(a(i))+Maj(a(i),b(i),c(i))
+
+// for SHA256
+#define S0(x) (rotrFixed(x,2)^rotrFixed(x,13)^rotrFixed(x,22))
+#define S1(x) (rotrFixed(x,6)^rotrFixed(x,11)^rotrFixed(x,25))
+#define s0(x) (rotrFixed(x,7)^rotrFixed(x,18)^(x>>3))
+#define s1(x) (rotrFixed(x,17)^rotrFixed(x,19)^(x>>10))
+
+void SHA256::Transform(word32 *state, const word32 *data)
+{
+	word32 W[16];
+#if defined(CRYPTOPP_X86_ASM_AVAILABLE) || defined(CRYPTOPP_X64_MASM_AVAILABLE)
+	// this byte reverse is a waste of time, but this function is only called by MDC
+	ByteReverse(W, data, BLOCKSIZE);
+	X86_SHA256_HashBlocks(state, W, BLOCKSIZE - !HasSSE2());
+#else
+	word32 T[8];
+    /* Copy context->state[] to working vars */
+	memcpy(T, state, sizeof(T));
+    /* 64 operations, partially loop unrolled */
+	for (unsigned int j=0; j<64; j+=16)
+	{
+		R( 0); R( 1); R( 2); R( 3);
+		R( 4); R( 5); R( 6); R( 7);
+		R( 8); R( 9); R(10); R(11);
+		R(12); R(13); R(14); R(15);
+	}
+    /* Add the working vars back into context.state[] */
+    state[0] += a(0);
+    state[1] += b(0);
+    state[2] += c(0);
+    state[3] += d(0);
+    state[4] += e(0);
+    state[5] += f(0);
+    state[6] += g(0);
+    state[7] += h(0);
+#endif
+}
+
+/* 
+// smaller but slower
+void SHA256::Transform(word32 *state, const word32 *data)
+{
+	word32 T[20];
+	word32 W[32];
+	unsigned int i = 0, j = 0;
+	word32 *t = T+8;
+
+	memcpy(t, state, 8*4);
+	word32 e = t[4], a = t[0];
+
+	do 
+	{
+		word32 w = data[j];
+		W[j] = w;
+		w += SHA256_K[j];
+		w += t[7];
+		w += S1(e);
+		w += Ch(e, t[5], t[6]);
+		e = t[3] + w;
+		t[3] = t[3+8] = e;
+		w += S0(t[0]);
+		a = w + Maj(a, t[1], t[2]);
+		t[-1] = t[7] = a;
+		--t;
+		++j;
+		if (j%8 == 0)
+			t += 8;
+	} while (j<16);
+
+	do
+	{
+		i = j&0xf;
+		word32 w = s1(W[i+16-2]) + s0(W[i+16-15]) + W[i] + W[i+16-7];
+		W[i+16] = W[i] = w;
+		w += SHA256_K[j];
+		w += t[7];
+		w += S1(e);
+		w += Ch(e, t[5], t[6]);
+		e = t[3] + w;
+		t[3] = t[3+8] = e;
+		w += S0(t[0]);
+		a = w + Maj(a, t[1], t[2]);
+		t[-1] = t[7] = a;
+
+		w = s1(W[(i+1)+16-2]) + s0(W[(i+1)+16-15]) + W[(i+1)] + W[(i+1)+16-7];
+		W[(i+1)+16] = W[(i+1)] = w;
+		w += SHA256_K[j+1];
+		w += (t-1)[7];
+		w += S1(e);
+		w += Ch(e, (t-1)[5], (t-1)[6]);
+		e = (t-1)[3] + w;
+		(t-1)[3] = (t-1)[3+8] = e;
+		w += S0((t-1)[0]);
+		a = w + Maj(a, (t-1)[1], (t-1)[2]);
+		(t-1)[-1] = (t-1)[7] = a;
+
+		t-=2;
+		j+=2;
+		if (j%8 == 0)
+			t += 8;
+	} while (j<64);
+
+    state[0] += a;
+    state[1] += t[1];
+    state[2] += t[2];
+    state[3] += t[3];
+    state[4] += e;
+    state[5] += t[5];
+    state[6] += t[6];
+    state[7] += t[7];
+}
+*/
+
+#undef S0
+#undef S1
+#undef s0
+#undef s1
+#undef R
+
+// *************************************************************
+
+void SHA384::InitState(HashWordType *state)
+{
+	static const word64 s[8] = {
+		W64LIT(0xcbbb9d5dc1059ed8), W64LIT(0x629a292a367cd507),
+		W64LIT(0x9159015a3070dd17), W64LIT(0x152fecd8f70e5939),
+		W64LIT(0x67332667ffc00b31), W64LIT(0x8eb44a8768581511),
+		W64LIT(0xdb0c2e0d64f98fa7), W64LIT(0x47b5481dbefa4fa4)};
+	memcpy(state, s, sizeof(s));
+}
+
+void SHA512::InitState(HashWordType *state)
+{
+	static const word64 s[8] = {
+		W64LIT(0x6a09e667f3bcc908), W64LIT(0xbb67ae8584caa73b),
+		W64LIT(0x3c6ef372fe94f82b), W64LIT(0xa54ff53a5f1d36f1),
+		W64LIT(0x510e527fade682d1), W64LIT(0x9b05688c2b3e6c1f),
+		W64LIT(0x1f83d9abfb41bd6b), W64LIT(0x5be0cd19137e2179)};
+	memcpy(state, s, sizeof(s));
+}
+
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE && CRYPTOPP_BOOL_X86
+CRYPTOPP_ALIGN_DATA(16) static const word64 SHA512_K[80] CRYPTOPP_SECTION_ALIGN16 = {
+#else
+static const word64 SHA512_K[80] = {
+#endif
+	W64LIT(0x428a2f98d728ae22), W64LIT(0x7137449123ef65cd),
+	W64LIT(0xb5c0fbcfec4d3b2f), W64LIT(0xe9b5dba58189dbbc),
+	W64LIT(0x3956c25bf348b538), W64LIT(0x59f111f1b605d019),
+	W64LIT(0x923f82a4af194f9b), W64LIT(0xab1c5ed5da6d8118),
+	W64LIT(0xd807aa98a3030242), W64LIT(0x12835b0145706fbe),
+	W64LIT(0x243185be4ee4b28c), W64LIT(0x550c7dc3d5ffb4e2),
+	W64LIT(0x72be5d74f27b896f), W64LIT(0x80deb1fe3b1696b1),
+	W64LIT(0x9bdc06a725c71235), W64LIT(0xc19bf174cf692694),
+	W64LIT(0xe49b69c19ef14ad2), W64LIT(0xefbe4786384f25e3),
+	W64LIT(0x0fc19dc68b8cd5b5), W64LIT(0x240ca1cc77ac9c65),
+	W64LIT(0x2de92c6f592b0275), W64LIT(0x4a7484aa6ea6e483),
+	W64LIT(0x5cb0a9dcbd41fbd4), W64LIT(0x76f988da831153b5),
+	W64LIT(0x983e5152ee66dfab), W64LIT(0xa831c66d2db43210),
+	W64LIT(0xb00327c898fb213f), W64LIT(0xbf597fc7beef0ee4),
+	W64LIT(0xc6e00bf33da88fc2), W64LIT(0xd5a79147930aa725),
+	W64LIT(0x06ca6351e003826f), W64LIT(0x142929670a0e6e70),
+	W64LIT(0x27b70a8546d22ffc), W64LIT(0x2e1b21385c26c926),
+	W64LIT(0x4d2c6dfc5ac42aed), W64LIT(0x53380d139d95b3df),
+	W64LIT(0x650a73548baf63de), W64LIT(0x766a0abb3c77b2a8),
+	W64LIT(0x81c2c92e47edaee6), W64LIT(0x92722c851482353b),
+	W64LIT(0xa2bfe8a14cf10364), W64LIT(0xa81a664bbc423001),
+	W64LIT(0xc24b8b70d0f89791), W64LIT(0xc76c51a30654be30),
+	W64LIT(0xd192e819d6ef5218), W64LIT(0xd69906245565a910),
+	W64LIT(0xf40e35855771202a), W64LIT(0x106aa07032bbd1b8),
+	W64LIT(0x19a4c116b8d2d0c8), W64LIT(0x1e376c085141ab53),
+	W64LIT(0x2748774cdf8eeb99), W64LIT(0x34b0bcb5e19b48a8),
+	W64LIT(0x391c0cb3c5c95a63), W64LIT(0x4ed8aa4ae3418acb),
+	W64LIT(0x5b9cca4f7763e373), W64LIT(0x682e6ff3d6b2b8a3),
+	W64LIT(0x748f82ee5defb2fc), W64LIT(0x78a5636f43172f60),
+	W64LIT(0x84c87814a1f0ab72), W64LIT(0x8cc702081a6439ec),
+	W64LIT(0x90befffa23631e28), W64LIT(0xa4506cebde82bde9),
+	W64LIT(0xbef9a3f7b2c67915), W64LIT(0xc67178f2e372532b),
+	W64LIT(0xca273eceea26619c), W64LIT(0xd186b8c721c0c207),
+	W64LIT(0xeada7dd6cde0eb1e), W64LIT(0xf57d4f7fee6ed178),
+	W64LIT(0x06f067aa72176fba), W64LIT(0x0a637dc5a2c898a6),
+	W64LIT(0x113f9804bef90dae), W64LIT(0x1b710b35131c471b),
+	W64LIT(0x28db77f523047d84), W64LIT(0x32caab7b40c72493),
+	W64LIT(0x3c9ebe0a15c9bebc), W64LIT(0x431d67c49c100d4c),
+	W64LIT(0x4cc5d4becb3e42b6), W64LIT(0x597f299cfc657e2a),
+	W64LIT(0x5fcb6fab3ad6faec), W64LIT(0x6c44198c4a475817)
+};
+
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE && CRYPTOPP_BOOL_X86
+// put assembly version in separate function, otherwise MSVC 2005 SP1 doesn't generate correct code for the non-assembly version
+CRYPTOPP_NAKED static void CRYPTOPP_FASTCALL SHA512_SSE2_Transform(word64 *state, const word64 *data)
+{
+#ifdef __GNUC__
+	__asm__ __volatile__
+	(
+		".intel_syntax noprefix;"
+	AS1(	push	ebx)
+	AS2(	mov		ebx, eax)
+#else
+	AS1(	push	ebx)
+	AS1(	push	esi)
+	AS1(	push	edi)
+	AS2(	lea		ebx, SHA512_K)
+#endif
+
+	AS2(	mov		eax, esp)
+	AS2(	and		esp, 0xfffffff0)
+	AS2(	sub		esp, 27*16)				// 17*16 for expanded data, 20*8 for state
+	AS1(	push	eax)
+	AS2(	xor		eax, eax)
+	AS2(	lea		edi, [esp+4+8*8])		// start at middle of state buffer. will decrement pointer each round to avoid copying
+	AS2(	lea		esi, [esp+4+20*8+8])	// 16-byte alignment, then add 8
+
+	AS2(	movdqa	xmm0, [ecx+0*16])
+	AS2(	movdq2q	mm4, xmm0)
+	AS2(	movdqa	[edi+0*16], xmm0)
+	AS2(	movdqa	xmm0, [ecx+1*16])
+	AS2(	movdqa	[edi+1*16], xmm0)
+	AS2(	movdqa	xmm0, [ecx+2*16])
+	AS2(	movdq2q	mm5, xmm0)
+	AS2(	movdqa	[edi+2*16], xmm0)
+	AS2(	movdqa	xmm0, [ecx+3*16])
+	AS2(	movdqa	[edi+3*16], xmm0)
+	ASJ(	jmp,	0, f)
+
+#define SSE2_S0_S1(r, a, b, c)	\
+	AS2(	movq	mm6, r)\
+	AS2(	psrlq	r, a)\
+	AS2(	movq	mm7, r)\
+	AS2(	psllq	mm6, 64-c)\
+	AS2(	pxor	mm7, mm6)\
+	AS2(	psrlq	r, b-a)\
+	AS2(	pxor	mm7, r)\
+	AS2(	psllq	mm6, c-b)\
+	AS2(	pxor	mm7, mm6)\
+	AS2(	psrlq	r, c-b)\
+	AS2(	pxor	r, mm7)\
+	AS2(	psllq	mm6, b-a)\
+	AS2(	pxor	r, mm6)
+
+#define SSE2_s0(r, a, b, c)	\
+	AS2(	movdqa	xmm6, r)\
+	AS2(	psrlq	r, a)\
+	AS2(	movdqa	xmm7, r)\
+	AS2(	psllq	xmm6, 64-c)\
+	AS2(	pxor	xmm7, xmm6)\
+	AS2(	psrlq	r, b-a)\
+	AS2(	pxor	xmm7, r)\
+	AS2(	psrlq	r, c-b)\
+	AS2(	pxor	r, xmm7)\
+	AS2(	psllq	xmm6, c-a)\
+	AS2(	pxor	r, xmm6)
+
+#define SSE2_s1(r, a, b, c)	\
+	AS2(	movdqa	xmm6, r)\
+	AS2(	psrlq	r, a)\
+	AS2(	movdqa	xmm7, r)\
+	AS2(	psllq	xmm6, 64-c)\
+	AS2(	pxor	xmm7, xmm6)\
+	AS2(	psrlq	r, b-a)\
+	AS2(	pxor	xmm7, r)\
+	AS2(	psllq	xmm6, c-b)\
+	AS2(	pxor	xmm7, xmm6)\
+	AS2(	psrlq	r, c-b)\
+	AS2(	pxor	r, xmm7)
+
+	ASL(SHA512_Round)
+	// k + w is in mm0, a is in mm4, e is in mm5
+	AS2(	paddq	mm0, [edi+7*8])		// h
+	AS2(	movq	mm2, [edi+5*8])		// f
+	AS2(	movq	mm3, [edi+6*8])		// g
+	AS2(	pxor	mm2, mm3)
+	AS2(	pand	mm2, mm5)
+	SSE2_S0_S1(mm5,14,18,41)
+	AS2(	pxor	mm2, mm3)
+	AS2(	paddq	mm0, mm2)			// h += Ch(e,f,g)
+	AS2(	paddq	mm5, mm0)			// h += S1(e)
+	AS2(	movq	mm2, [edi+1*8])		// b
+	AS2(	movq	mm1, mm2)
+	AS2(	por		mm2, mm4)
+	AS2(	pand	mm2, [edi+2*8])		// c
+	AS2(	pand	mm1, mm4)
+	AS2(	por		mm1, mm2)
+	AS2(	paddq	mm1, mm5)			// temp = h + Maj(a,b,c)
+	AS2(	paddq	mm5, [edi+3*8])		// e = d + h
+	AS2(	movq	[edi+3*8], mm5)
+	AS2(	movq	[edi+11*8], mm5)
+	SSE2_S0_S1(mm4,28,34,39)			// S0(a)
+	AS2(	paddq	mm4, mm1)			// a = temp + S0(a)
+	AS2(	movq	[edi-8], mm4)
+	AS2(	movq	[edi+7*8], mm4)
+	AS1(	ret)
+
+	// first 16 rounds
+	ASL(0)
+	AS2(	movq	mm0, [edx+eax*8])
+	AS2(	movq	[esi+eax*8], mm0)
+	AS2(	movq	[esi+eax*8+16*8], mm0)
+	AS2(	paddq	mm0, [ebx+eax*8])
+	ASC(	call,	SHA512_Round)
+	AS1(	inc		eax)
+	AS2(	sub		edi, 8)
+	AS2(	test	eax, 7)
+	ASJ(	jnz,	0, b)
+	AS2(	add		edi, 8*8)
+	AS2(	cmp		eax, 16)
+	ASJ(	jne,	0, b)
+
+	// rest of the rounds
+	AS2(	movdqu	xmm0, [esi+(16-2)*8])
+	ASL(1)
+	// data expansion, W[i-2] already in xmm0
+	AS2(	movdqu	xmm3, [esi])
+	AS2(	paddq	xmm3, [esi+(16-7)*8])
+	AS2(	movdqa	xmm2, [esi+(16-15)*8])
+	SSE2_s1(xmm0, 6, 19, 61)
+	AS2(	paddq	xmm0, xmm3)
+	SSE2_s0(xmm2, 1, 7, 8)
+	AS2(	paddq	xmm0, xmm2)
+	AS2(	movdq2q	mm0, xmm0)
+	AS2(	movhlps	xmm1, xmm0)
+	AS2(	paddq	mm0, [ebx+eax*8])
+	AS2(	movlps	[esi], xmm0)
+	AS2(	movlps	[esi+8], xmm1)
+	AS2(	movlps	[esi+8*16], xmm0)
+	AS2(	movlps	[esi+8*17], xmm1)
+	// 2 rounds
+	ASC(	call,	SHA512_Round)
+	AS2(	sub		edi, 8)
+	AS2(	movdq2q	mm0, xmm1)
+	AS2(	paddq	mm0, [ebx+eax*8+8])
+	ASC(	call,	SHA512_Round)
+	// update indices and loop
+	AS2(	add		esi, 16)
+	AS2(	add		eax, 2)
+	AS2(	sub		edi, 8)
+	AS2(	test	eax, 7)
+	ASJ(	jnz,	1, b)
+	// do housekeeping every 8 rounds
+	AS2(	mov		esi, 0xf)
+	AS2(	and		esi, eax)
+	AS2(	lea		esi, [esp+4+20*8+8+esi*8])
+	AS2(	add		edi, 8*8)
+	AS2(	cmp		eax, 80)
+	ASJ(	jne,	1, b)
+
+#define SSE2_CombineState(i)	\
+	AS2(	movdqa	xmm0, [edi+i*16])\
+	AS2(	paddq	xmm0, [ecx+i*16])\
+	AS2(	movdqa	[ecx+i*16], xmm0)
+
+	SSE2_CombineState(0)
+	SSE2_CombineState(1)
+	SSE2_CombineState(2)
+	SSE2_CombineState(3)
+
+	AS1(	pop		esp)
+	AS1(	emms)
+
+#if defined(__GNUC__)
+	AS1(	pop		ebx)
+	".att_syntax prefix;"
+		:
+		: "a" (SHA512_K), "c" (state), "d" (data)
+		: "%esi", "%edi", "memory", "cc"
+	);
+#else
+	AS1(	pop		edi)
+	AS1(	pop		esi)
+	AS1(	pop		ebx)
+	AS1(	ret)
+#endif
+}
+#endif	// #if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE && CRYPTOPP_BOOL_X86
+
+void SHA512::Transform(word64 *state, const word64 *data)
+{
+#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE && CRYPTOPP_BOOL_X86
+	if (HasSSE2())
+	{
+		SHA512_SSE2_Transform(state, data);
+		return;
+	}
+#endif
+
+#define S0(x) (rotrFixed(x,28)^rotrFixed(x,34)^rotrFixed(x,39))
+#define S1(x) (rotrFixed(x,14)^rotrFixed(x,18)^rotrFixed(x,41))
+#define s0(x) (rotrFixed(x,1)^rotrFixed(x,8)^(x>>7))
+#define s1(x) (rotrFixed(x,19)^rotrFixed(x,61)^(x>>6))
+
+#define R(i) h(i)+=S1(e(i))+Ch(e(i),f(i),g(i))+SHA512_K[i+j]+(j?blk2(i):blk0(i));\
+	d(i)+=h(i);h(i)+=S0(a(i))+Maj(a(i),b(i),c(i))
+
+	word64 W[16];
+	word64 T[8];
+    /* Copy context->state[] to working vars */
+	memcpy(T, state, sizeof(T));
+    /* 80 operations, partially loop unrolled */
+	for (unsigned int j=0; j<80; j+=16)
+	{
+		R( 0); R( 1); R( 2); R( 3);
+		R( 4); R( 5); R( 6); R( 7);
+		R( 8); R( 9); R(10); R(11);
+		R(12); R(13); R(14); R(15);
+	}
+    /* Add the working vars back into context.state[] */
+    state[0] += a(0);
+    state[1] += b(0);
+    state[2] += c(0);
+    state[3] += d(0);
+    state[4] += e(0);
+    state[5] += f(0);
+    state[6] += g(0);
+    state[7] += h(0);
+}
+
+NAMESPACE_END
+
+#endif	// #ifndef CRYPTOPP_GENERATE_X64_MASM
+#endif	// #ifndef CRYPTOPP_IMPORTS
--- a/cryptopp/sha.h
+++ b/cryptopp/sha.h
@ -0,0 +1,63 @@
+#ifndef CRYPTOPP_SHA_H
+#define CRYPTOPP_SHA_H
+
+#include "iterhash.h"
+
+NAMESPACE_BEGIN(CryptoPP)
+
+/// <a href="http://www.weidai.com/scan-mirror/md.html#SHA-1">SHA-1</a>
+class CRYPTOPP_DLL SHA1 : public IteratedHashWithStaticTransform<word32, BigEndian, 64, 20, SHA1>
+{
+public:
+	static void CRYPTOPP_API InitState(HashWordType *state);
+	static void CRYPTOPP_API Transform(word32 *digest, const word32 *data);
+	static const char * CRYPTOPP_API StaticAlgorithmName() {return "SHA-1";}
+};
+
+typedef SHA1 SHA;	// for backwards compatibility
+
+//! implements the SHA-256 standard
+class CRYPTOPP_DLL SHA256 : public IteratedHashWithStaticTransform<word32, BigEndian, 64, 32, SHA256, 32, true>
+{
+public:
+#if defined(CRYPTOPP_X86_ASM_AVAILABLE) || defined(CRYPTOPP_X64_MASM_AVAILABLE)
+	size_t HashMultipleBlocks(const word32 *input, size_t length);
+#endif
+	static void CRYPTOPP_API InitState(HashWordType *state);
+	static void CRYPTOPP_API Transform(word32 *digest, const word32 *data);
+	static const char * CRYPTOPP_API StaticAlgorithmName() {return "SHA-256";}
+};
+
+//! implements the SHA-224 standard
+class CRYPTOPP_DLL SHA224 : public IteratedHashWithStaticTransform<word32, BigEndian, 64, 32, SHA224, 28, true>
+{
+public:
+#if defined(CRYPTOPP_X86_ASM_AVAILABLE) || defined(CRYPTOPP_X64_MASM_AVAILABLE)
+	size_t HashMultipleBlocks(const word32 *input, size_t length);
+#endif
+	static void CRYPTOPP_API InitState(HashWordType *state);
+	static void CRYPTOPP_API Transform(word32 *digest, const word32 *data) {SHA256::Transform(digest, data);}
+	static const char * CRYPTOPP_API StaticAlgorithmName() {return "SHA-224";}
+};
+
+//! implements the SHA-512 standard
+class CRYPTOPP_DLL SHA512 : public IteratedHashWithStaticTransform<word64, BigEndian, 128, 64, SHA512, 64, CRYPTOPP_BOOL_X86>
+{
+public:
+	static void CRYPTOPP_API InitState(HashWordType *state);
+	static void CRYPTOPP_API Transform(word64 *digest, const word64 *data);
+	static const char * CRYPTOPP_API StaticAlgorithmName() {return "SHA-512";}
+};
+
+//! implements the SHA-384 standard
+class CRYPTOPP_DLL SHA384 : public IteratedHashWithStaticTransform<word64, BigEndian, 128, 64, SHA384, 48, CRYPTOPP_BOOL_X86>
+{
+public:
+	static void CRYPTOPP_API InitState(HashWordType *state);
+	static void CRYPTOPP_API Transform(word64 *digest, const word64 *data) {SHA512::Transform(digest, data);}
+	static const char * CRYPTOPP_API StaticAlgorithmName() {return "SHA-384";}
+};
+
+NAMESPACE_END
+
+#endif
--- a/cryptopp/simple.h
+++ b/cryptopp/simple.h
@ -0,0 +1 @@
+
--- a/cryptopp/smartptr.h
+++ b/cryptopp/smartptr.h
@ -0,0 +1,223 @@
+#ifndef CRYPTOPP_SMARTPTR_H
+#define CRYPTOPP_SMARTPTR_H
+
+#include "config.h"
+#include <algorithm>
+
+NAMESPACE_BEGIN(CryptoPP)
+
+template <class T> class simple_ptr
+{
+public:
+	simple_ptr() : m_p(NULL) {}
+	~simple_ptr() {delete m_p;}
+	T *m_p;
+};
+
+template <class T> class member_ptr
+{
+public:
+	explicit member_ptr(T *p = NULL) : m_p(p) {}
+
+	~member_ptr();
+
+	const T& operator*() const { return *m_p; }
+	T& operator*() { return *m_p; }
+
+	const T* operator->() const { return m_p; }
+	T* operator->() { return m_p; }
+
+	const T* get() const { return m_p; }
+	T* get() { return m_p; }
+
+	T* release()
+	{
+		T *old_p = m_p;
+		m_p = 0;
+		return old_p;
+	} 
+
+	void reset(T *p = 0);
+
+protected:
+	member_ptr(const member_ptr<T>& rhs);		// copy not allowed
+	void operator=(const member_ptr<T>& rhs);	// assignment not allowed
+
+	T *m_p;
+};
+
+template <class T> member_ptr<T>::~member_ptr() {delete m_p;}
+template <class T> void member_ptr<T>::reset(T *p) {delete m_p; m_p = p;}
+
+// ********************************************************
+
+template<class T> class value_ptr : public member_ptr<T>
+{
+public:
+	value_ptr(const T &obj) : member_ptr<T>(new T(obj)) {}
+	value_ptr(T *p = NULL) : member_ptr<T>(p) {}
+	value_ptr(const value_ptr<T>& rhs)
+		: member_ptr<T>(rhs.m_p ? new T(*rhs.m_p) : NULL) {}
+
+	value_ptr<T>& operator=(const value_ptr<T>& rhs);
+	bool operator==(const value_ptr<T>& rhs)
+	{
+		return (!this->m_p && !rhs.m_p) || (this->m_p && rhs.m_p && *this->m_p == *rhs.m_p);
+	}
+};
+
+template <class T> value_ptr<T>& value_ptr<T>::operator=(const value_ptr<T>& rhs)
+{
+	T *old_p = this->m_p;
+	this->m_p = rhs.m_p ? new T(*rhs.m_p) : NULL;
+	delete old_p;
+	return *this;
+}
+
+// ********************************************************
+
+template<class T> class clonable_ptr : public member_ptr<T>
+{
+public:
+	clonable_ptr(const T &obj) : member_ptr<T>(obj.Clone()) {}
+	clonable_ptr(T *p = NULL) : member_ptr<T>(p) {}
+	clonable_ptr(const clonable_ptr<T>& rhs)
+		: member_ptr<T>(rhs.m_p ? rhs.m_p->Clone() : NULL) {}
+
+	clonable_ptr<T>& operator=(const clonable_ptr<T>& rhs);
+};
+
+template <class T> clonable_ptr<T>& clonable_ptr<T>::operator=(const clonable_ptr<T>& rhs)
+{
+	T *old_p = this->m_p;
+	this->m_p = rhs.m_p ? rhs.m_p->Clone() : NULL;
+	delete old_p;
+	return *this;
+}
+
+// ********************************************************
+
+template<class T> class counted_ptr
+{
+public:
+	explicit counted_ptr(T *p = 0);
+	counted_ptr(const T &r) : m_p(0) {attach(r);}
+	counted_ptr(const counted_ptr<T>& rhs);
+
+	~counted_ptr();
+
+	const T& operator*() const { return *m_p; }
+	T& operator*() { return *m_p; }
+
+	const T* operator->() const { return m_p; }
+	T* operator->() { return get(); }
+
+	const T* get() const { return m_p; }
+	T* get();
+
+	void attach(const T &p);
+
+	counted_ptr<T> & operator=(const counted_ptr<T>& rhs);
+
+private:
+	T *m_p;
+};
+
+template <class T> counted_ptr<T>::counted_ptr(T *p)
+	: m_p(p) 
+{
+	if (m_p)
+		m_p->m_referenceCount = 1;
+}
+
+template <class T> counted_ptr<T>::counted_ptr(const counted_ptr<T>& rhs)
+	: m_p(rhs.m_p)
+{
+	if (m_p)
+		m_p->m_referenceCount++;
+}
+
+template <class T> counted_ptr<T>::~counted_ptr()
+{
+	if (m_p && --m_p->m_referenceCount == 0)
+		delete m_p;
+}
+
+template <class T> void counted_ptr<T>::attach(const T &r)
+{
+	if (m_p && --m_p->m_referenceCount == 0)
+		delete m_p;
+	if (r.m_referenceCount == 0)
+	{
+		m_p = r.clone();
+		m_p->m_referenceCount = 1;
+	}
+	else
+	{
+		m_p = const_cast<T *>(&r);
+		m_p->m_referenceCount++;
+	}
+}
+
+template <class T> T* counted_ptr<T>::get()
+{
+	if (m_p && m_p->m_referenceCount > 1)
+	{
+		T *temp = m_p->clone();
+		m_p->m_referenceCount--;
+		m_p = temp;
+		m_p->m_referenceCount = 1;
+	}
+	return m_p;
+}
+
+template <class T> counted_ptr<T> & counted_ptr<T>::operator=(const counted_ptr<T>& rhs)
+{
+	if (m_p != rhs.m_p)
+	{
+		if (m_p && --m_p->m_referenceCount == 0)
+			delete m_p;
+		m_p = rhs.m_p;
+		if (m_p)
+			m_p->m_referenceCount++;
+	}
+	return *this;
+}
+
+// ********************************************************
+
+template <class T> class vector_member_ptrs
+{
+public:
+	vector_member_ptrs(size_t size=0)
+		: m_size(size), m_ptr(new member_ptr<T>[size]) {}
+	~vector_member_ptrs()
+		{delete [] this->m_ptr;}
+
+	member_ptr<T>& operator[](size_t index)
+		{assert(index<this->m_size); return this->m_ptr[index];}
+	const member_ptr<T>& operator[](size_t index) const
+		{assert(index<this->m_size); return this->m_ptr[index];}
+
+	size_t size() const {return this->m_size;}
+	void resize(size_t newSize)
+	{
+		member_ptr<T> *newPtr = new member_ptr<T>[newSize];
+		for (size_t i=0; i<this->m_size && i<newSize; i++)
+			newPtr[i].reset(this->m_ptr[i].release());
+		delete [] this->m_ptr;
+		this->m_size = newSize;
+		this->m_ptr = newPtr;
+	}
+
+private:
+	vector_member_ptrs(const vector_member_ptrs<T> &c);	// copy not allowed
+	void operator=(const vector_member_ptrs<T> &x);		// assignment not allowed
+
+	size_t m_size;
+	member_ptr<T> *m_ptr;
+};
+
+NAMESPACE_END
+
+#endif
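
Note on smartptr.h: `vector_member_ptrs::resize()` above moves ownership one element at a time with `release()` and `reset()`, then frees the old array. The following is a minimal, self-contained sketch of that pattern and is not part of the commit. `owner_ptr`, `Tracked`, `resized` and `live_after_shrink` are hypothetical names introduced only for illustration; `owner_ptr` is a stand-in that assumes the same sole-owner semantics as `member_ptr`.

```cpp
#include <cstddef>

// Hypothetical helper type: counts live instances so ownership transfer is observable.
struct Tracked {
    static int live;
    int id;
    explicit Tracked(int i) : id(i) { ++live; }
    ~Tracked() { --live; }
private:
    Tracked(const Tracked&);            // not copyable, keeps the count exact
    Tracked& operator=(const Tracked&);
};
int Tracked::live = 0;

// Hypothetical stand-in for member_ptr<T>: sole owner, copying disabled.
template <class T> class owner_ptr {
public:
    explicit owner_ptr(T* p = 0) : m_p(p) {}
    ~owner_ptr() { delete m_p; }
    T* get() const { return m_p; }
    T* release() { T* old = m_p; m_p = 0; return old; }  // give up ownership
    void reset(T* p = 0) { delete m_p; m_p = p; }         // take ownership of p
private:
    owner_ptr(const owner_ptr&);
    void operator=(const owner_ptr&);
    T* m_p;
};

// Same loop shape as resize(): hand the first min(oldSize, newSize) pointees
// to a fresh array, then delete[] the old array, which destroys whatever
// was not moved.
template <class T>
owner_ptr<T>* resized(owner_ptr<T>* old, std::size_t oldSize, std::size_t newSize)
{
    owner_ptr<T>* fresh = new owner_ptr<T>[newSize];
    for (std::size_t i = 0; i < oldSize && i < newSize; i++)
        fresh[i].reset(old[i].release());
    delete [] old;
    return fresh;
}

// Shrinks a 3-element array to 2 elements and returns how many Tracked
// objects are still alive right after the shrink.
int live_after_shrink()
{
    owner_ptr<Tracked>* v = new owner_ptr<Tracked>[3];
    for (int i = 0; i < 3; i++)
        v[i].reset(new Tracked(i));
    v = resized(v, 3, 2);       // element 2 is destroyed with the old array
    int live = Tracked::live;
    delete [] v;                // destroys the remaining two
    return live;
}
```

Because each pointee is released before the old array is deleted, no object is deleted twice, and the elements that do not fit in the smaller array are destroyed exactly once by the old array's destructors.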
--- a/cryptopp/stdcpp.h
+++ b/cryptopp/stdcpp.h
@ -0,0 +1,27 @@
+#ifndef CRYPTOPP_STDCPP_H
+#define CRYPTOPP_STDCPP_H
+
+#include <stddef.h>
+#include <assert.h>
+#include <limits.h>
+#include <memory>
+#include <string>
+#include <exception>
+#include <typeinfo>
+
+
+#ifdef _MSC_VER
+#include <string.h>	// CodeWarrior doesn't have memory.h
+#include <algorithm>
+#include <map>
+#include <vector>
+
+// re-disable warning C4231 after the standard headers above
+#pragma warning(disable: 4231)
+#endif
+
+#if defined(_MSC_VER) && defined(_CRTAPI1)
+#define CRYPTOPP_MSVCRT6
+#endif
+
+#endif
--- a/main.cpp
+++ b/main.cpp
@ -3,7 +3,7 @@
 // file license.txt or http://www.opensource.org/licenses/mit-license.php.

 #include "headers.h"
-#include "sha.h"
+#include "cryptopp/sha.h"



@ -1369,6 +1369,8 @@ bool CBlock::AcceptBlock()
        return error("AcceptBlock() : rejected by checkpoint lockin at 33333");
    if (pindexPrev->nHeight+1 == 68555 && hash != uint256("0x00000000001e1b4903550a0b96e9a9405c8a95f387162e4944e8d9fbe501cd6a"))
        return error("AcceptBlock() : rejected by checkpoint lockin at 68555");
+    if (pindexPrev->nHeight+1 == 70567 && hash != uint256("0x00000000006a49b14bcf27462068f1264c961f11fa2e0eddd2be0791e1d4124a"))
+        return error("AcceptBlock() : rejected by checkpoint lockin at 70567");

    // Write block to history file
    if (!CheckDiskSpace(::GetSerializeSize(*this, SER_DISK)))
@ -2551,82 +2553,11 @@ using CryptoPP::ByteReverse;
 static const unsigned int pSHA256InitState[8] =
 {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};

-static const unsigned int SHA256_K[64] = {
-    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
-    0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
-    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
-    0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
-    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
-    0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
-    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
-    0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
-    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
-    0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
-    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
-    0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
-    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
-    0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
-    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
-    0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
-};
-
-#define blk0(i) (W[i] = dat[i])
-
-#define blk2(i) (W[i&15]+=s1(W[(i-2)&15])+W[(i-7)&15]+s0(W[(i-15)&15]))
-
-#define Ch(x,y,z) (z^(x&(y^z)))
-#define Maj(x,y,z) ((x&y)|(z&(x|y)))
-
-#define a(i) T[(0-i)&7]
-#define b(i) T[(1-i)&7]
-#define c(i) T[(2-i)&7]
-#define d(i) T[(3-i)&7]
-#define e(i) T[(4-i)&7]
-#define f(i) T[(5-i)&7]
-#define g(i) T[(6-i)&7]
-#define h(i) T[(7-i)&7]
-
-#define R(i,j) h(i)+=S1(e(i))+Ch(e(i),f(i),g(i))+SHA256_K[i+j]+(j?blk2(i):blk0(i));\
-                                         d(i)+=h(i);h(i)+=S0(a(i))+Maj(a(i),b(i),c(i))
-
-#define rotrFixed(x,y) ((x>>y) | (x<<(sizeof(unsigned int)*8-y)))
-
-// for SHA256
-#define S0(x) (rotrFixed(x,2)^rotrFixed(x,13)^rotrFixed(x,22))
-#define S1(x) (rotrFixed(x,6)^rotrFixed(x,11)^rotrFixed(x,25))
-#define s0(x) (rotrFixed(x,7)^rotrFixed(x,18)^(x>>3))
-#define s1(x) (rotrFixed(x,17)^rotrFixed(x,19)^(x>>10))
-
-#if 1
-inline void SHA256Transform(void* pout, const void* pin, const void* pinit)
-{
-    memcpy(pout, pinit, 32);
-    unsigned int* dat = (unsigned int*)pin;
-    unsigned int* T = (unsigned int*)pout;
-    unsigned int* initstate = (unsigned int*)pinit;
-    unsigned int W[16];
-
-    R( 0,  0); R( 1,  0); R( 2,  0); R( 3,  0); R( 4,  0); R( 5,  0); R( 6,  0); R( 7,  0); R( 8,  0); R( 9,  0); R(10,  0); R(11,  0); R(12,  0); R(13,  0); R(14,  0); R(15,  0);
-    R( 0, 16); R( 1, 16); R( 2, 16); R( 3, 16); R( 4, 16); R( 5, 16); R( 6, 16); R( 7, 16); R( 8, 16); R( 9, 16); R(10, 16); R(11, 16); R(12, 16); R(13, 16); R(14, 16); R(15, 16);
-    R( 0, 32); R( 1, 32); R( 2, 32); R( 3, 32); R( 4, 32); R( 5, 32); R( 6, 32); R( 7, 32); R( 8, 32); R( 9, 32); R(10, 32); R(11, 32); R(12, 32); R(13, 32); R(14, 32); R(15, 32);
-    R( 0, 48); R( 1, 48); R( 2, 48); R( 3, 48); R( 4, 48); R( 5, 48); R( 6, 48); R( 7, 48); R( 8, 48); R( 9, 48); R(10, 48); R(11, 48); R(12, 48); R(13, 48); R(14, 48); R(15, 48);
-
-    T[0] += initstate[0];
-    T[1] += initstate[1];
-    T[2] += initstate[2];
-    T[3] += initstate[3];
-    T[4] += initstate[4];
-    T[5] += initstate[5];
-    T[6] += initstate[6];
-    T[7] += initstate[7];
-}
-#else
 inline void SHA256Transform(void* pstate, void* pinput, const void* pinit)
 {
    memcpy(pstate, pinit, 32);
    CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput);
 }
-#endif



@ -2645,7 +2576,7 @@ void BitcoinMiner()
        Sleep(50);
        if (fShutdown)
            return;
-        while (vNodes.empty())
+        while (vNodes.empty() || IsInitialBlockDownload())
        {
            Sleep(1000);
            if (fShutdown)
@ -2728,7 +2659,7 @@ void BitcoinMiner()
        //
        // Prebuild hash buffer
        //
-        struct unnamed1
+        struct tmpworkspace
        {
            struct unnamed2
            {
@ -2743,8 +2674,9 @@ void BitcoinMiner()
            unsigned char pchPadding0[64];
            uint256 hash1;
            unsigned char pchPadding1[64];
-        }
-        tmp;
+        };
+        char tmpbuf[sizeof(tmpworkspace)+16];
+        tmpworkspace& tmp = *(tmpworkspace*)alignup<16>(tmpbuf);

        tmp.block.nVersion       = pblock->nVersion;
        tmp.block.hashPrevBlock  = pblock->hashPrevBlock  = (pindexPrev ? pindexPrev->GetBlockHash() : 0);
@ -2761,7 +2693,8 @@ void BitcoinMiner()
            ((unsigned int*)&tmp)[i] = ByteReverse(((unsigned int*)&tmp)[i]);

        // Precalc the first half of the first hash, which stays constant
-        uint256 midstate;
+        uint256 midstatebuf[2];
+        uint256& midstate = *alignup<16>(midstatebuf);
        SHA256Transform(&midstate, &tmp.block, pSHA256InitState);


@ -2770,7 +2703,8 @@ void BitcoinMiner()
        //
        int64 nStart = GetTime();
        uint256 hashTarget = CBigNum().SetCompact(pblock->nBits).getuint256();
-        uint256 hash;
+        uint256 hashbuf[2];
+        uint256& hash = *alignup<16>(hashbuf);
        loop
        {
            SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate);
@ -2868,25 +2802,7 @@ void BitcoinMiner()
                if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60)
                    break;
                if (pindexPrev != pindexBest)
-                {
-                    // Pause generating during initial download
-                    if (GetTime() - nStart < 20)
-                    {
-                        CBlockIndex* pindexTmp;
-                        do
-                        {
-                            pindexTmp = pindexBest;
-                            for (int i = 0; i < 10; i++)
-                            {
-                                Sleep(1000);
-                                if (fShutdown)
-                                    return;
-                            }
-                        }
-                        while (pindexTmp != pindexBest);
-                    }
                    break;
-                }

                pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime());
                tmp.block.nTime = ByteReverse(pblock->nTime);
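
Note on the main.cpp hunks above: the miner buffers (`tmp`, `midstate`, `hash`) are now over-allocated and accessed through `alignup<16>(...)`, so the data handed to `SHA256Transform` starts on a 16-byte boundary. The definition of `alignup` is not shown in this section, so the following is only a sketch of how such a helper could work, assuming a power-of-two alignment. `align_up_sketch`, `Word256` and `aligned_slot_fits` are hypothetical names, and `Word256` merely stands in for a 32-byte type such as `uint256`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the alignup<N>() helper used in the diff:
// rounds a pointer up to the next multiple of N (N must be a power of two).
template <std::size_t N, typename T>
T* align_up_sketch(T* p)
{
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    addr = (addr + (N - 1)) & ~static_cast<std::uintptr_t>(N - 1);
    return reinterpret_cast<T*>(addr);
}

// Hypothetical 32-byte payload standing in for uint256.
struct Word256 { unsigned char bytes[32]; };

// Returns true when the slot obtained from an over-allocated buffer is
// 16-byte aligned and one whole Word256 still fits inside the buffer.
bool aligned_slot_fits()
{
    Word256 buf[2];  // twice the needed size, in the spirit of midstatebuf[2]
    Word256* slot = align_up_sketch<16>(buf);
    std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(buf);
    std::uintptr_t addr  = reinterpret_cast<std::uintptr_t>(slot);
    return addr % 16 == 0
        && addr >= begin
        && addr + sizeof(Word256) <= begin + sizeof(buf);
}
```

Rounding up moves the pointer forward by at most N-1 bytes, which is why the buffers in the diff reserve extra room (`sizeof(tmpworkspace)+16` bytes, or two `uint256` slots for one value).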
--- a/makefile.mingw
+++ b/makefile.mingw
@ -32,20 +32,7 @@ DEFS=-DWIN32 -D__WXMSW__ -D_WINDOWS -DNOPCH
 DEBUGFLAGS=-g -D__WXDEBUG__
 CFLAGS=-mthreads -O2 -w -Wno-invalid-offsetof -Wformat $(DEBUGFLAGS) $(DEFS) $(INCLUDEPATHS)
 HEADERS=headers.h strlcpy.h serialize.h uint256.h util.h key.h bignum.h base58.h \
-    script.h db.h net.h irc.h main.h rpc.h uibase.h ui.h noui.h init.h sha.h
-
-
-all: bitcoin.exe
-
-
-obj/%.o: %.cpp $(HEADERS)
-	g++ -c $(CFLAGS) -DGUI -o $@ $<
-
-obj/sha.o: sha.cpp sha.h
-	g++ -c $(CFLAGS) -O3 -o $@ $<
-
-obj/ui_res.o: ui.rc  rc/bitcoin.ico rc/check.ico rc/send16.bmp rc/send16mask.bmp rc/send16masknoshadow.bmp rc/send20.bmp rc/send20mask.bmp rc/addressbook16.bmp rc/addressbook16mask.bmp rc/addressbook20.bmp rc/addressbook20mask.bmp
-	windres $(DEFS) $(INCLUDEPATHS) -o $@ -i $<
+    script.h db.h net.h irc.h main.h rpc.h uibase.h ui.h noui.h init.h

 OBJS= \
    obj/util.o \
@ -55,20 +42,36 @@ OBJS= \
    obj/irc.o \
    obj/main.o \
    obj/rpc.o \
-    obj/init.o
+    obj/init.o \
+    cryptopp/obj/sha.o \
+    cryptopp/obj/cpu.o

-bitcoin.exe: $(OBJS) obj/ui.o obj/uibase.o obj/sha.o obj/ui_res.o
+
+all: bitcoin.exe
+
+
+obj/%.o: %.cpp $(HEADERS)
+	g++ -c $(CFLAGS) -DGUI -o $@ $<
+
+cryptopp/obj/%.o: cryptopp/%.cpp
+	g++ -c $(CFLAGS) -O3 -DCRYPTOPP_X86_ASM_AVAILABLE -o $@ $<
+
+obj/ui_res.o: ui.rc  rc/bitcoin.ico rc/check.ico rc/send16.bmp rc/send16mask.bmp rc/send16masknoshadow.bmp rc/send20.bmp rc/send20mask.bmp rc/addressbook16.bmp rc/addressbook16mask.bmp rc/addressbook20.bmp rc/addressbook20mask.bmp
+	windres $(DEFS) $(INCLUDEPATHS) -o $@ -i $<
+
+bitcoin.exe: $(OBJS) obj/ui.o obj/uibase.o obj/ui_res.o
 	g++ $(CFLAGS) -mwindows -Wl,--subsystem,windows -o $@ $(LIBPATHS) $^ $(WXLIBS) $(LIBS)


 obj/nogui/%.o: %.cpp $(HEADERS)
 	g++ -c $(CFLAGS) -o $@ $<

-bitcoind.exe: $(OBJS:obj/%=obj/nogui/%) obj/sha.o obj/ui_res.o
+bitcoind.exe: $(OBJS:obj/%=obj/nogui/%) obj/ui_res.o
 	g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(LIBS)


 clean:
 	-del /Q obj\*
 	-del /Q obj\nogui\*
+	-del /Q cryptopp\obj\*
 	-del /Q headers.h.gch
--- a/makefile.osx
+++ b/makefile.osx
@ -29,17 +29,7 @@ DEBUGFLAGS=-g -DwxDEBUG_LEVEL=0
 # ppc doesn't work because we don't support big-endian
 CFLAGS=-mmacosx-version-min=10.5 -arch i386 -arch x86_64 -O2 -Wno-invalid-offsetof -Wformat $(DEBUGFLAGS) $(DEFS) $(INCLUDEPATHS)
 HEADERS=headers.h strlcpy.h serialize.h uint256.h util.h key.h bignum.h base58.h \
-    script.h db.h net.h irc.h main.h rpc.h uibase.h ui.h noui.h init.h sha.h
-
-
-all: bitcoin
-
-
-obj/%.o: %.cpp $(HEADERS)
-	g++ -c $(CFLAGS) -DGUI -o $@ $<
-
-obj/sha.o: sha.cpp sha.h
-	g++ -c $(CFLAGS) -O3 -o $@ $<
+    script.h db.h net.h irc.h main.h rpc.h uibase.h ui.h noui.h init.h

 OBJS= \
    obj/util.o \
@ -49,16 +39,28 @@ OBJS= \
    obj/irc.o \
    obj/main.o \
    obj/rpc.o \
-    obj/init.o
+    obj/init.o \
+    cryptopp/obj/sha.o \
+    cryptopp/obj/cpu.o
 	
-bitcoin: $(OBJS) obj/ui.o obj/uibase.o obj/sha.o
+
+all: bitcoin
+
+
+obj/%.o: %.cpp $(HEADERS)
+	g++ -c $(CFLAGS) -DGUI -o $@ $<
+
+cryptopp/obj/%.o: cryptopp/%.cpp
+	g++ -c $(CFLAGS) -O3 -o $@ $<
+
+bitcoin: $(OBJS) obj/ui.o obj/uibase.o
 	g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(WXLIBS) $(LIBS)


 obj/nogui/%.o: %.cpp $(HEADERS)
 	g++ -c $(CFLAGS) -o $@ $<

-bitcoind: $(OBJS:obj/%=obj/nogui/%) obj/sha.o
+bitcoind: $(OBJS:obj/%=obj/nogui/%)
 	g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(LIBS)


--- a/makefile.unix
+++ b/makefile.unix
@ -33,7 +33,19 @@ DEFS=-D__WXGTK__ -DNOPCH
 DEBUGFLAGS=-g -D__WXDEBUG__
 CFLAGS=-O2 -Wno-invalid-offsetof -Wformat $(DEBUGFLAGS) $(DEFS) $(INCLUDEPATHS)
 HEADERS=headers.h strlcpy.h serialize.h uint256.h util.h key.h bignum.h base58.h \
-    script.h db.h net.h irc.h main.h rpc.h uibase.h ui.h noui.h init.h sha.h
+    script.h db.h net.h irc.h main.h rpc.h uibase.h ui.h noui.h init.h
+
+OBJS= \
+    obj/util.o \
+    obj/script.o \
+    obj/db.o \
+    obj/net.o \
+    obj/irc.o \
+    obj/main.o \
+    obj/rpc.o \
+    obj/init.o \
+    cryptopp/obj/sha.o \
+    cryptopp/obj/cpu.o


 all: bitcoin
@ -45,31 +57,22 @@ headers.h.gch: headers.h $(HEADERS)
 obj/%.o: %.cpp $(HEADERS) headers.h.gch
 	g++ -c $(CFLAGS) -DGUI -o $@ $<

-obj/sha.o: sha.cpp sha.h
+cryptopp/obj/%.o: cryptopp/%.cpp
 	g++ -c $(CFLAGS) -O3 -o $@ $<

-OBJS= \
-    obj/util.o \
-    obj/script.o \
-    obj/db.o \
-    obj/net.o \
-    obj/irc.o \
-    obj/main.o \
-    obj/rpc.o \
-    obj/init.o
-
-bitcoin: $(OBJS) obj/ui.o obj/uibase.o obj/sha.o
+bitcoin: $(OBJS) obj/ui.o obj/uibase.o
 	g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(WXLIBS) $(LIBS)


 obj/nogui/%.o: %.cpp $(HEADERS)
 	g++ -c $(CFLAGS) -o $@ $<

-bitcoind: $(OBJS:obj/%=obj/nogui/%) obj/sha.o
+bitcoind: $(OBJS:obj/%=obj/nogui/%)
 	g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(LIBS)


 clean:
 	-rm -f obj/*.o
 	-rm -f obj/nogui/*.o
+	-rm -f cryptopp/obj/*.o
 	-rm -f headers.h.gch
--- a/makefile.vc
+++ b/makefile.vc
@ -28,17 +28,29 @@ LIBS= \
  kernel32.lib user32.lib gdi32.lib comdlg32.lib winspool.lib winmm.lib shell32.lib comctl32.lib ole32.lib oleaut32.lib uuid.lib rpcrt4.lib advapi32.lib ws2_32.lib shlwapi.lib

 DEFS=/DWIN32 /D__WXMSW__ /D_WINDOWS /DNOPCH
-DEBUGFLAGS=/Zi /Od /D__WXDEBUG__
+DEBUGFLAGS=/Zi /D__WXDEBUG__
 CFLAGS=/c /nologo /MDd /EHsc /GR /Zm300 $(DEBUGFLAGS) $(DEFS) $(INCLUDEPATHS)
 HEADERS=headers.h strlcpy.h serialize.h uint256.h util.h key.h bignum.h base58.h \
-    script.h db.h net.h irc.h main.h rpc.h uibase.h ui.h noui.h init.h sha.h
+    script.h db.h net.h irc.h main.h rpc.h uibase.h ui.h noui.h init.h
+
+OBJS= \
+    obj\util.obj \
+    obj\script.obj \
+    obj\db.obj \
+    obj\net.obj \
+    obj\irc.obj \
+    obj\main.obj \
+    obj\rpc.obj \
+    obj\init.obj \
+    cryptopp\obj\sha.obj \
+    cryptopp\obj\cpu.obj


 all: bitcoin.exe


 .cpp{obj}.obj:
-	cl $(CFLAGS) /DGUI /Fo$@ %s
+    cl $(CFLAGS) /DGUI /Fo$@ %s

 obj\util.obj: $(HEADERS)

@ -60,28 +72,21 @@ obj\ui.obj: $(HEADERS)

 obj\uibase.obj: $(HEADERS)

-obj\sha.obj: sha.cpp sha.h
-	cl $(CFLAGS) /O2 /Fo$@ %s
+cryptopp\obj\sha.obj: cryptopp\sha.cpp
+    cl $(CFLAGS) /O2 /Fo$@ %s
+
+cryptopp\obj\cpu.obj: cryptopp\cpu.cpp
+    cl $(CFLAGS) /O2 /Fo$@ %s

 obj\ui.res: ui.rc  rc/bitcoin.ico rc/check.ico rc/send16.bmp rc/send16mask.bmp rc/send16masknoshadow.bmp rc/send20.bmp rc/send20mask.bmp rc/addressbook16.bmp rc/addressbook16mask.bmp rc/addressbook20.bmp rc/addressbook20mask.bmp
-	rc $(INCLUDEPATHS) $(DEFS) /Fo$@ %s
+    rc $(INCLUDEPATHS) $(DEFS) /Fo$@ %s

-OBJS= \
-    obj\util.obj \
-    obj\script.obj \
-    obj\db.obj \
-    obj\net.obj \
-    obj\irc.obj \
-    obj\main.obj \
-    obj\rpc.obj \
-    obj\init.obj
-
-bitcoin.exe: $(OBJS) obj\ui.obj obj\uibase.obj obj\sha.obj obj\ui.res
+bitcoin.exe: $(OBJS) obj\ui.obj obj\uibase.obj obj\ui.res
    link /nologo /DEBUG /SUBSYSTEM:WINDOWS /OUT:$@ $(LIBPATHS) $** $(WXLIBS) $(LIBS)


 .cpp{obj\nogui}.obj:
-	cl $(CFLAGS) /Fo$@ %s
+    cl $(CFLAGS) /Fo$@ %s

 obj\nogui\util.obj: $(HEADERS)

@ -99,11 +104,13 @@ obj\nogui\rpc.obj: $(HEADERS)

 obj\nogui\init.obj: $(HEADERS)

-bitcoind.exe: $(OBJS:obj\=obj\nogui\) obj\sha.obj obj\ui.res
-	link /nologo /DEBUG /OUT:$@ $(LIBPATHS) $** $(LIBS)
+bitcoind.exe: $(OBJS:obj\=obj\nogui\) obj\ui.res
+    link /nologo /DEBUG /OUT:$@ $(LIBPATHS) $** $(LIBS)


 clean:
-	-del /Q obj\*
-	-del *.ilk
-	-del *.pdb
+    -del /Q obj\*
+    -del /Q obj\nogui\*
+    -del /Q cryptopp\obj\*
+    -del /Q *.ilk
+    -del /Q *.pdb
--- a/obj/.gitignore
+++ b/obj/.gitignore
@ -0,0 +1,2 @@
+*
+!.gitignore
--- a/obj/nogui/.gitignore
+++ b/obj/nogui/.gitignore
@ -0,0 +1,2 @@
+*
+!.gitignore
--- a/sha.cpp
+++ b/sha.cpp
@ -1,554 +0,0 @@
-// This file is public domain
-// SHA routines extracted as a standalone file from:
-// Crypto++: a C++ Class Library of Cryptographic Schemes
-// Version 5.5.2 (9/24/2007)
-// http://www.cryptopp.com
-
-// sha.cpp - modified by Wei Dai from Steve Reid's public domain sha1.c
-
-// Steve Reid implemented SHA-1. Wei Dai implemented SHA-2.
-// Both are in the public domain.
-
-#include <assert.h>
-#include <memory.h>
-#include "sha.h"
-
-namespace CryptoPP
-{
-
-// start of Steve Reid's code
-
-#define blk0(i) (W[i] = data[i])
-#define blk1(i) (W[i&15] = rotlFixed(W[(i+13)&15]^W[(i+8)&15]^W[(i+2)&15]^W[i&15],1))
-
-void SHA1::InitState(HashWordType *state)
-{
-    state[0] = 0x67452301L;
-    state[1] = 0xEFCDAB89L;
-    state[2] = 0x98BADCFEL;
-    state[3] = 0x10325476L;
-    state[4] = 0xC3D2E1F0L;
-}
-
-#define f1(x,y,z) (z^(x&(y^z)))
-#define f2(x,y,z) (x^y^z)
-#define f3(x,y,z) ((x&y)|(z&(x|y)))
-#define f4(x,y,z) (x^y^z)
-
-/* (R0+R1), R2, R3, R4 are the different operations used in SHA1 */
-#define R0(v,w,x,y,z,i) z+=f1(w,x,y)+blk0(i)+0x5A827999+rotlFixed(v,5);w=rotlFixed(w,30);
-#define R1(v,w,x,y,z,i) z+=f1(w,x,y)+blk1(i)+0x5A827999+rotlFixed(v,5);w=rotlFixed(w,30);
-#define R2(v,w,x,y,z,i) z+=f2(w,x,y)+blk1(i)+0x6ED9EBA1+rotlFixed(v,5);w=rotlFixed(w,30);
-#define R3(v,w,x,y,z,i) z+=f3(w,x,y)+blk1(i)+0x8F1BBCDC+rotlFixed(v,5);w=rotlFixed(w,30);
-#define R4(v,w,x,y,z,i) z+=f4(w,x,y)+blk1(i)+0xCA62C1D6+rotlFixed(v,5);w=rotlFixed(w,30);
-
-void SHA1::Transform(word32 *state, const word32 *data)
-{
-    word32 W[16];
-    /* Copy context->state[] to working vars */
-    word32 a = state[0];
-    word32 b = state[1];
-    word32 c = state[2];
-    word32 d = state[3];
-    word32 e = state[4];
-    /* 4 rounds of 20 operations each. Loop unrolled. */
-    R0(a,b,c,d,e, 0); R0(e,a,b,c,d, 1); R0(d,e,a,b,c, 2); R0(c,d,e,a,b, 3);
-    R0(b,c,d,e,a, 4); R0(a,b,c,d,e, 5); R0(e,a,b,c,d, 6); R0(d,e,a,b,c, 7);
-    R0(c,d,e,a,b, 8); R0(b,c,d,e,a, 9); R0(a,b,c,d,e,10); R0(e,a,b,c,d,11);
-    R0(d,e,a,b,c,12); R0(c,d,e,a,b,13); R0(b,c,d,e,a,14); R0(a,b,c,d,e,15);
-    R1(e,a,b,c,d,16); R1(d,e,a,b,c,17); R1(c,d,e,a,b,18); R1(b,c,d,e,a,19);
-    R2(a,b,c,d,e,20); R2(e,a,b,c,d,21); R2(d,e,a,b,c,22); R2(c,d,e,a,b,23);
-    R2(b,c,d,e,a,24); R2(a,b,c,d,e,25); R2(e,a,b,c,d,26); R2(d,e,a,b,c,27);
-    R2(c,d,e,a,b,28); R2(b,c,d,e,a,29); R2(a,b,c,d,e,30); R2(e,a,b,c,d,31);
-    R2(d,e,a,b,c,32); R2(c,d,e,a,b,33); R2(b,c,d,e,a,34); R2(a,b,c,d,e,35);
-    R2(e,a,b,c,d,36); R2(d,e,a,b,c,37); R2(c,d,e,a,b,38); R2(b,c,d,e,a,39);
-    R3(a,b,c,d,e,40); R3(e,a,b,c,d,41); R3(d,e,a,b,c,42); R3(c,d,e,a,b,43);
-    R3(b,c,d,e,a,44); R3(a,b,c,d,e,45); R3(e,a,b,c,d,46); R3(d,e,a,b,c,47);
-    R3(c,d,e,a,b,48); R3(b,c,d,e,a,49); R3(a,b,c,d,e,50); R3(e,a,b,c,d,51);
-    R3(d,e,a,b,c,52); R3(c,d,e,a,b,53); R3(b,c,d,e,a,54); R3(a,b,c,d,e,55);
-    R3(e,a,b,c,d,56); R3(d,e,a,b,c,57); R3(c,d,e,a,b,58); R3(b,c,d,e,a,59);
-    R4(a,b,c,d,e,60); R4(e,a,b,c,d,61); R4(d,e,a,b,c,62); R4(c,d,e,a,b,63);
-    R4(b,c,d,e,a,64); R4(a,b,c,d,e,65); R4(e,a,b,c,d,66); R4(d,e,a,b,c,67);
-    R4(c,d,e,a,b,68); R4(b,c,d,e,a,69); R4(a,b,c,d,e,70); R4(e,a,b,c,d,71);
-    R4(d,e,a,b,c,72); R4(c,d,e,a,b,73); R4(b,c,d,e,a,74); R4(a,b,c,d,e,75);
-    R4(e,a,b,c,d,76); R4(d,e,a,b,c,77); R4(c,d,e,a,b,78); R4(b,c,d,e,a,79);
-    /* Add the working vars back into context.state[] */
-    state[0] += a;
-    state[1] += b;
-    state[2] += c;
-    state[3] += d;
-    state[4] += e;
-}
-
-// end of Steve Reid's code
-
-// *************************************************************
-
-void SHA224::InitState(HashWordType *state)
-{
-    static const word32 s[8] = {0xc1059ed8, 0x367cd507, 0x3070dd17, 0xf70e5939, 0xffc00b31, 0x68581511, 0x64f98fa7, 0xbefa4fa4};
-    memcpy(state, s, sizeof(s));
-}
-
-void SHA256::InitState(HashWordType *state)
-{
-    static const word32 s[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
-    memcpy(state, s, sizeof(s));
-}
-
-static const word32 SHA256_K[64] = {
-    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
-    0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
-    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
-    0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
-    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
-    0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
-    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
-    0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
-    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
-    0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
-    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
-    0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
-    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
-    0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
-    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
-    0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
-};
-
-#define blk2(i) (W[i&15]+=s1(W[(i-2)&15])+W[(i-7)&15]+s0(W[(i-15)&15]))
-
-#define Ch(x,y,z) (z^(x&(y^z)))
-#define Maj(x,y,z) ((x&y)|(z&(x|y)))
-
-#define a(i) T[(0-i)&7]
-#define b(i) T[(1-i)&7]
-#define c(i) T[(2-i)&7]
-#define d(i) T[(3-i)&7]
-#define e(i) T[(4-i)&7]
-#define f(i) T[(5-i)&7]
-#define g(i) T[(6-i)&7]
-#define h(i) T[(7-i)&7]
-
-#define R(i) h(i)+=S1(e(i))+Ch(e(i),f(i),g(i))+SHA256_K[i+j]+(j?blk2(i):blk0(i));\
-    d(i)+=h(i);h(i)+=S0(a(i))+Maj(a(i),b(i),c(i))
-
-// for SHA256
-#define S0(x) (rotrFixed(x,2)^rotrFixed(x,13)^rotrFixed(x,22))
-#define S1(x) (rotrFixed(x,6)^rotrFixed(x,11)^rotrFixed(x,25))
-#define s0(x) (rotrFixed(x,7)^rotrFixed(x,18)^(x>>3))
-#define s1(x) (rotrFixed(x,17)^rotrFixed(x,19)^(x>>10))
-
-void SHA256::Transform(word32 *state, const word32 *data)
-{
-    word32 W[16];
-    word32 T[8];
-    /* Copy context->state[] to working vars */
-    memcpy(T, state, sizeof(T));
-    /* 64 operations, partially loop unrolled */
-    for (unsigned int j=0; j<64; j+=16)
-    {
-        R( 0); R( 1); R( 2); R( 3);
-        R( 4); R( 5); R( 6); R( 7);
-        R( 8); R( 9); R(10); R(11);
-        R(12); R(13); R(14); R(15);
-    }
-    /* Add the working vars back into context.state[] */
-    state[0] += a(0);
-    state[1] += b(0);
-    state[2] += c(0);
-    state[3] += d(0);
-    state[4] += e(0);
-    state[5] += f(0);
-    state[6] += g(0);
-    state[7] += h(0);
-}
-
-/*
-// smaller but slower
-void SHA256_Transform(word32 *state, const word32 *data)
-{
-    word32 T[20];
-    word32 W[32];
-    unsigned int i = 0, j = 0;
-    word32 *t = T+8;
-
-    memcpy(t, state, 8*4);
-    word32 e = t[4], a = t[0];
-
-    do
-    {
-        word32 w = data[j];
-        W[j] = w;
-        w += K[j];
-        w += t[7];
-        w += S1(e);
-        w += Ch(e, t[5], t[6]);
-        e = t[3] + w;
-        t[3] = t[3+8] = e;
-        w += S0(t[0]);
-        a = w + Maj(a, t[1], t[2]);
-        t[-1] = t[7] = a;
-        --t;
-        ++j;
-        if (j%8 == 0)
-            t += 8;
-    } while (j<16);
-
-    do
-    {
-        i = j&0xf;
-        word32 w = s1(W[i+16-2]) + s0(W[i+16-15]) + W[i] + W[i+16-7];
-        W[i+16] = W[i] = w;
-        w += K[j];
-        w += t[7];
-        w += S1(e);
-        w += Ch(e, t[5], t[6]);
-        e = t[3] + w;
-        t[3] = t[3+8] = e;
-        w += S0(t[0]);
-        a = w + Maj(a, t[1], t[2]);
-        t[-1] = t[7] = a;
-
-        w = s1(W[(i+1)+16-2]) + s0(W[(i+1)+16-15]) + W[(i+1)] + W[(i+1)+16-7];
-        W[(i+1)+16] = W[(i+1)] = w;
-        w += K[j+1];
-        w += (t-1)[7];
-        w += S1(e);
-        w += Ch(e, (t-1)[5], (t-1)[6]);
-        e = (t-1)[3] + w;
-        (t-1)[3] = (t-1)[3+8] = e;
-        w += S0((t-1)[0]);
-        a = w + Maj(a, (t-1)[1], (t-1)[2]);
-        (t-1)[-1] = (t-1)[7] = a;
-
-        t-=2;
-        j+=2;
-        if (j%8 == 0)
-            t += 8;
-    } while (j<64);
-
-    state[0] += a;
-    state[1] += t[1];
-    state[2] += t[2];
-    state[3] += t[3];
-    state[4] += e;
-    state[5] += t[5];
-    state[6] += t[6];
-    state[7] += t[7];
-}
-*/
-
-#undef S0
-#undef S1
-#undef s0
-#undef s1
-#undef R
-
-// *************************************************************
-
-#ifdef WORD64_AVAILABLE
-
-void SHA384::InitState(HashWordType *state)
-{
-    static const word64 s[8] = {
-        W64LIT(0xcbbb9d5dc1059ed8), W64LIT(0x629a292a367cd507),
-        W64LIT(0x9159015a3070dd17), W64LIT(0x152fecd8f70e5939),
-        W64LIT(0x67332667ffc00b31), W64LIT(0x8eb44a8768581511),
-        W64LIT(0xdb0c2e0d64f98fa7), W64LIT(0x47b5481dbefa4fa4)};
-    memcpy(state, s, sizeof(s));
-}
-
-void SHA512::InitState(HashWordType *state)
-{
-    static const word64 s[8] = {
-        W64LIT(0x6a09e667f3bcc908), W64LIT(0xbb67ae8584caa73b),
-        W64LIT(0x3c6ef372fe94f82b), W64LIT(0xa54ff53a5f1d36f1),
-        W64LIT(0x510e527fade682d1), W64LIT(0x9b05688c2b3e6c1f),
-        W64LIT(0x1f83d9abfb41bd6b), W64LIT(0x5be0cd19137e2179)};
-    memcpy(state, s, sizeof(s));
-}
-
-CRYPTOPP_ALIGN_DATA(16) static const word64 SHA512_K[80] CRYPTOPP_SECTION_ALIGN16 = {
-    W64LIT(0x428a2f98d728ae22), W64LIT(0x7137449123ef65cd),
-    W64LIT(0xb5c0fbcfec4d3b2f), W64LIT(0xe9b5dba58189dbbc),
-    W64LIT(0x3956c25bf348b538), W64LIT(0x59f111f1b605d019),
-    W64LIT(0x923f82a4af194f9b), W64LIT(0xab1c5ed5da6d8118),
-    W64LIT(0xd807aa98a3030242), W64LIT(0x12835b0145706fbe),
-    W64LIT(0x243185be4ee4b28c), W64LIT(0x550c7dc3d5ffb4e2),
-    W64LIT(0x72be5d74f27b896f), W64LIT(0x80deb1fe3b1696b1),
-    W64LIT(0x9bdc06a725c71235), W64LIT(0xc19bf174cf692694),
-    W64LIT(0xe49b69c19ef14ad2), W64LIT(0xefbe4786384f25e3),
-    W64LIT(0x0fc19dc68b8cd5b5), W64LIT(0x240ca1cc77ac9c65),
-    W64LIT(0x2de92c6f592b0275), W64LIT(0x4a7484aa6ea6e483),
-    W64LIT(0x5cb0a9dcbd41fbd4), W64LIT(0x76f988da831153b5),
-    W64LIT(0x983e5152ee66dfab), W64LIT(0xa831c66d2db43210),
-    W64LIT(0xb00327c898fb213f), W64LIT(0xbf597fc7beef0ee4),
-    W64LIT(0xc6e00bf33da88fc2), W64LIT(0xd5a79147930aa725),
-    W64LIT(0x06ca6351e003826f), W64LIT(0x142929670a0e6e70),
-    W64LIT(0x27b70a8546d22ffc), W64LIT(0x2e1b21385c26c926),
-    W64LIT(0x4d2c6dfc5ac42aed), W64LIT(0x53380d139d95b3df),
-    W64LIT(0x650a73548baf63de), W64LIT(0x766a0abb3c77b2a8),
-    W64LIT(0x81c2c92e47edaee6), W64LIT(0x92722c851482353b),
-    W64LIT(0xa2bfe8a14cf10364), W64LIT(0xa81a664bbc423001),
-    W64LIT(0xc24b8b70d0f89791), W64LIT(0xc76c51a30654be30),
-    W64LIT(0xd192e819d6ef5218), W64LIT(0xd69906245565a910),
-    W64LIT(0xf40e35855771202a), W64LIT(0x106aa07032bbd1b8),
-    W64LIT(0x19a4c116b8d2d0c8), W64LIT(0x1e376c085141ab53),
-    W64LIT(0x2748774cdf8eeb99), W64LIT(0x34b0bcb5e19b48a8),
-    W64LIT(0x391c0cb3c5c95a63), W64LIT(0x4ed8aa4ae3418acb),
-    W64LIT(0x5b9cca4f7763e373), W64LIT(0x682e6ff3d6b2b8a3),
-    W64LIT(0x748f82ee5defb2fc), W64LIT(0x78a5636f43172f60),
-    W64LIT(0x84c87814a1f0ab72), W64LIT(0x8cc702081a6439ec),
-    W64LIT(0x90befffa23631e28), W64LIT(0xa4506cebde82bde9),
-    W64LIT(0xbef9a3f7b2c67915), W64LIT(0xc67178f2e372532b),
-    W64LIT(0xca273eceea26619c), W64LIT(0xd186b8c721c0c207),
-    W64LIT(0xeada7dd6cde0eb1e), W64LIT(0xf57d4f7fee6ed178),
-    W64LIT(0x06f067aa72176fba), W64LIT(0x0a637dc5a2c898a6),
-    W64LIT(0x113f9804bef90dae), W64LIT(0x1b710b35131c471b),
-    W64LIT(0x28db77f523047d84), W64LIT(0x32caab7b40c72493),
-    W64LIT(0x3c9ebe0a15c9bebc), W64LIT(0x431d67c49c100d4c),
-    W64LIT(0x4cc5d4becb3e42b6), W64LIT(0x597f299cfc657e2a),
-    W64LIT(0x5fcb6fab3ad6faec), W64LIT(0x6c44198c4a475817)
-};
-
-#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE && CRYPTOPP_BOOL_X86
-// put assembly version in separate function, otherwise MSVC 2005 SP1 doesn't generate correct code for the non-assembly version
-CRYPTOPP_NAKED static void CRYPTOPP_FASTCALL SHA512_SSE2_Transform(word64 *state, const word64 *data)
-{
-#ifdef __GNUC__
-    __asm__ __volatile__
-    (
-        ".intel_syntax noprefix;"
-    AS1(    push    ebx)
-    AS2(    mov     ebx, eax)
-#else
-    AS1(    push    ebx)
-    AS1(    push    esi)
-    AS1(    push    edi)
-    AS2(    lea     ebx, SHA512_K)
-#endif
-
-    AS2(    mov     eax, esp)
-    AS2(    and     esp, 0xfffffff0)
-    AS2(    sub     esp, 27*16)             // 17*16 for expanded data, 20*8 for state
-    AS1(    push    eax)
-    AS2(    xor     eax, eax)
-    AS2(    lea     edi, [esp+4+8*8])       // start at middle of state buffer. will decrement pointer each round to avoid copying
-    AS2(    lea     esi, [esp+4+20*8+8])    // 16-byte alignment, then add 8
-
-    AS2(    movq    mm4, [ecx+0*8])
-    AS2(    movq    [edi+0*8], mm4)
-    AS2(    movq    mm0, [ecx+1*8])
-    AS2(    movq    [edi+1*8], mm0)
-    AS2(    movq    mm0, [ecx+2*8])
-    AS2(    movq    [edi+2*8], mm0)
-    AS2(    movq    mm0, [ecx+3*8])
-    AS2(    movq    [edi+3*8], mm0)
-    AS2(    movq    mm5, [ecx+4*8])
-    AS2(    movq    [edi+4*8], mm5)
-    AS2(    movq    mm0, [ecx+5*8])
-    AS2(    movq    [edi+5*8], mm0)
-    AS2(    movq    mm0, [ecx+6*8])
-    AS2(    movq    [edi+6*8], mm0)
-    AS2(    movq    mm0, [ecx+7*8])
-    AS2(    movq    [edi+7*8], mm0)
-    ASJ(    jmp,    0, f)
-
-#define SSE2_S0_S1(r, a, b, c)  \
-    AS2(    movq    mm6, r)\
-    AS2(    psrlq   r, a)\
-    AS2(    movq    mm7, r)\
-    AS2(    psllq   mm6, 64-c)\
-    AS2(    pxor    mm7, mm6)\
-    AS2(    psrlq   r, b-a)\
-    AS2(    pxor    mm7, r)\
-    AS2(    psllq   mm6, c-b)\
-    AS2(    pxor    mm7, mm6)\
-    AS2(    psrlq   r, c-b)\
-    AS2(    pxor    r, mm7)\
-    AS2(    psllq   mm6, b-a)\
-    AS2(    pxor    r, mm6)
-
-#define SSE2_s0(r, a, b, c) \
-    AS2(    movdqa  xmm6, r)\
-    AS2(    psrlq   r, a)\
-    AS2(    movdqa  xmm7, r)\
-    AS2(    psllq   xmm6, 64-c)\
-    AS2(    pxor    xmm7, xmm6)\
-    AS2(    psrlq   r, b-a)\
-    AS2(    pxor    xmm7, r)\
-    AS2(    psrlq   r, c-b)\
-    AS2(    pxor    r, xmm7)\
-    AS2(    psllq   xmm6, c-a)\
-    AS2(    pxor    r, xmm6)
-
-#define SSE2_s1(r, a, b, c) \
-    AS2(    movdqa  xmm6, r)\
-    AS2(    psrlq   r, a)\
-    AS2(    movdqa  xmm7, r)\
-    AS2(    psllq   xmm6, 64-c)\
-    AS2(    pxor    xmm7, xmm6)\
-    AS2(    psrlq   r, b-a)\
-    AS2(    pxor    xmm7, r)\
-    AS2(    psllq   xmm6, c-b)\
-    AS2(    pxor    xmm7, xmm6)\
-    AS2(    psrlq   r, c-b)\
-    AS2(    pxor    r, xmm7)
-
-    ASL(SHA512_Round)
-    // k + w is in mm0, a is in mm4, e is in mm5
-    AS2(    paddq   mm0, [edi+7*8])     // h
-    AS2(    movq    mm2, [edi+5*8])     // f
-    AS2(    movq    mm3, [edi+6*8])     // g
-    AS2(    pxor    mm2, mm3)
-    AS2(    pand    mm2, mm5)
-    SSE2_S0_S1(mm5,14,18,41)
-    AS2(    pxor    mm2, mm3)
-    AS2(    paddq   mm0, mm2)           // h += Ch(e,f,g)
-    AS2(    paddq   mm5, mm0)           // h += S1(e)
-    AS2(    movq    mm2, [edi+1*8])     // b
-    AS2(    movq    mm1, mm2)
-    AS2(    por     mm2, mm4)
-    AS2(    pand    mm2, [edi+2*8])     // c
-    AS2(    pand    mm1, mm4)
-    AS2(    por     mm1, mm2)
-    AS2(    paddq   mm1, mm5)           // temp = h + Maj(a,b,c)
-    AS2(    paddq   mm5, [edi+3*8])     // e = d + h
-    AS2(    movq    [edi+3*8], mm5)
-    AS2(    movq    [edi+11*8], mm5)
-    SSE2_S0_S1(mm4,28,34,39)            // S0(a)
-    AS2(    paddq   mm4, mm1)           // a = temp + S0(a)
-    AS2(    movq    [edi-8], mm4)
-    AS2(    movq    [edi+7*8], mm4)
-    AS1(    ret)
-
-    // first 16 rounds
-    ASL(0)
-    AS2(    movq    mm0, [edx+eax*8])
-    AS2(    movq    [esi+eax*8], mm0)
-    AS2(    movq    [esi+eax*8+16*8], mm0)
-    AS2(    paddq   mm0, [ebx+eax*8])
-    ASC(    call,   SHA512_Round)
-    AS1(    inc     eax)
-    AS2(    sub     edi, 8)
-    AS2(    test    eax, 7)
-    ASJ(    jnz,    0, b)
-    AS2(    add     edi, 8*8)
-    AS2(    cmp     eax, 16)
-    ASJ(    jne,    0, b)
-
-    // rest of the rounds
-    AS2(    movdqu  xmm0, [esi+(16-2)*8])
-    ASL(1)
-    // data expansion, W[i-2] already in xmm0
-    AS2(    movdqu  xmm3, [esi])
-    AS2(    paddq   xmm3, [esi+(16-7)*8])
-    AS2(    movdqa  xmm2, [esi+(16-15)*8])
-    SSE2_s1(xmm0, 6, 19, 61)
-    AS2(    paddq   xmm0, xmm3)
-    SSE2_s0(xmm2, 1, 7, 8)
-    AS2(    paddq   xmm0, xmm2)
-    AS2(    movdq2q mm0, xmm0)
-    AS2(    movhlps xmm1, xmm0)
-    AS2(    paddq   mm0, [ebx+eax*8])
-    AS2(    movlps  [esi], xmm0)
-    AS2(    movlps  [esi+8], xmm1)
-    AS2(    movlps  [esi+8*16], xmm0)
-    AS2(    movlps  [esi+8*17], xmm1)
-    // 2 rounds
-    ASC(    call,   SHA512_Round)
-    AS2(    sub     edi, 8)
-    AS2(    movdq2q mm0, xmm1)
-    AS2(    paddq   mm0, [ebx+eax*8+8])
-    ASC(    call,   SHA512_Round)
-    // update indices and loop
-    AS2(    add     esi, 16)
-    AS2(    add     eax, 2)
-    AS2(    sub     edi, 8)
-    AS2(    test    eax, 7)
-    ASJ(    jnz,    1, b)
-    // do housekeeping every 8 rounds
-    AS2(    mov     esi, 0xf)
-    AS2(    and     esi, eax)
-    AS2(    lea     esi, [esp+4+20*8+8+esi*8])
-    AS2(    add     edi, 8*8)
-    AS2(    cmp     eax, 80)
-    ASJ(    jne,    1, b)
-
-#define SSE2_CombineState(i)    \
-    AS2(    movq    mm0, [edi+i*8])\
-    AS2(    paddq   mm0, [ecx+i*8])\
-    AS2(    movq    [ecx+i*8], mm0)
-
-    SSE2_CombineState(0)
-    SSE2_CombineState(1)
-    SSE2_CombineState(2)
-    SSE2_CombineState(3)
-    SSE2_CombineState(4)
-    SSE2_CombineState(5)
-    SSE2_CombineState(6)
-    SSE2_CombineState(7)
-
-    AS1(    pop     esp)
-    AS1(    emms)
-
-#if defined(__GNUC__)
-    AS1(    pop     ebx)
-    ".att_syntax prefix;"
-        :
-        : "a" (SHA512_K), "c" (state), "d" (data)
-        : "%esi", "%edi", "memory", "cc"
-    );
-#else
-    AS1(    pop     edi)
-    AS1(    pop     esi)
-    AS1(    pop     ebx)
-    AS1(    ret)
-#endif
-}
-#endif  // #if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE
-
-void SHA512::Transform(word64 *state, const word64 *data)
-{
-#if CRYPTOPP_BOOL_SSE2_ASM_AVAILABLE && CRYPTOPP_BOOL_X86
-    if (HasSSE2())
-    {
-        SHA512_SSE2_Transform(state, data);
-        return;
-    }
-#endif
-
-#define S0(x) (rotrFixed(x,28)^rotrFixed(x,34)^rotrFixed(x,39))
-#define S1(x) (rotrFixed(x,14)^rotrFixed(x,18)^rotrFixed(x,41))
-#define s0(x) (rotrFixed(x,1)^rotrFixed(x,8)^(x>>7))
-#define s1(x) (rotrFixed(x,19)^rotrFixed(x,61)^(x>>6))
-
-#define R(i) h(i)+=S1(e(i))+Ch(e(i),f(i),g(i))+SHA512_K[i+j]+(j?blk2(i):blk0(i));\
-    d(i)+=h(i);h(i)+=S0(a(i))+Maj(a(i),b(i),c(i))
-
-    word64 W[16];
-    word64 T[8];
-    /* Copy context->state[] to working vars */
-    memcpy(T, state, sizeof(T));
-    /* 80 operations, partially loop unrolled */
-    for (unsigned int j=0; j<80; j+=16)
-    {
-        R( 0); R( 1); R( 2); R( 3);
-        R( 4); R( 5); R( 6); R( 7);
-        R( 8); R( 9); R(10); R(11);
-        R(12); R(13); R(14); R(15);
-    }
-    /* Add the working vars back into context.state[] */
-    state[0] += a(0);
-    state[1] += b(0);
-    state[2] += c(0);
-    state[3] += d(0);
-    state[4] += e(0);
-    state[5] += f(0);
-    state[6] += g(0);
-    state[7] += h(0);
-}
-
-#endif
-
-}
--- a/sha.h
+++ b/sha.h
@ -1,177 +0,0 @@
-// This file is public domain
-// SHA routines extracted as a standalone file from:
-// Crypto++: a C++ Class Library of Cryptographic Schemes
-// Version 5.5.2 (9/24/2007)
-// http://www.cryptopp.com
-#ifndef CRYPTOPP_SHA_H
-#define CRYPTOPP_SHA_H
-#include <stdlib.h>
-
-namespace CryptoPP
-{
-
-//
-// Dependencies
-//
-
-typedef unsigned char byte;
-typedef unsigned short word16;
-typedef unsigned int word32;
-#if defined(_MSC_VER) || defined(__BORLANDC__)
-typedef unsigned __int64 word64;
-#else
-typedef unsigned long long word64;
-#endif
-
-template <class T> inline T rotlFixed(T x, unsigned int y)
-{
-    assert(y < sizeof(T)*8);
-    return T((x<<y) | (x>>(sizeof(T)*8-y)));
-}
-
-template <class T> inline T rotrFixed(T x, unsigned int y)
-{
-    assert(y < sizeof(T)*8);
-    return T((x>>y) | (x<<(sizeof(T)*8-y)));
-}
-
-// ************** endian reversal ***************
-
-#ifdef _MSC_VER
-    #if _MSC_VER >= 1400
-        #define CRYPTOPP_FAST_ROTATE(x) 1
-    #elif _MSC_VER >= 1300
-        #define CRYPTOPP_FAST_ROTATE(x) ((x) == 32 | (x) == 64)
-    #else
-        #define CRYPTOPP_FAST_ROTATE(x) ((x) == 32)
-    #endif
-#elif (defined(__MWERKS__) && TARGET_CPU_PPC) || \
-    (defined(__GNUC__) && (defined(_ARCH_PWR2) || defined(_ARCH_PWR) || defined(_ARCH_PPC) || defined(_ARCH_PPC64) || defined(_ARCH_COM)))
-    #define CRYPTOPP_FAST_ROTATE(x) ((x) == 32)
-#elif defined(__GNUC__) && (CRYPTOPP_BOOL_X64 || CRYPTOPP_BOOL_X86) // depend on GCC's peephole optimization to generate rotate instructions
-    #define CRYPTOPP_FAST_ROTATE(x) 1
-#else
-    #define CRYPTOPP_FAST_ROTATE(x) 0
-#endif
-
-inline byte ByteReverse(byte value)
-{
-    return value;
-}
-
-inline word16 ByteReverse(word16 value)
-{
-#ifdef CRYPTOPP_BYTESWAP_AVAILABLE
-    return bswap_16(value);
-#elif defined(_MSC_VER) && _MSC_VER >= 1300
-    return _byteswap_ushort(value);
-#else
-    return rotlFixed(value, 8U);
-#endif
-}
-
-inline word32 ByteReverse(word32 value)
-{
-#if defined(__GNUC__)
-    __asm__ ("bswap %0" : "=r" (value) : "0" (value));
-    return value;
-#elif defined(CRYPTOPP_BYTESWAP_AVAILABLE)
-    return bswap_32(value);
-#elif defined(__MWERKS__) && TARGET_CPU_PPC
-    return (word32)__lwbrx(&value,0);
-#elif _MSC_VER >= 1400 || (_MSC_VER >= 1300 && !defined(_DLL))
-    return _byteswap_ulong(value);
-#elif CRYPTOPP_FAST_ROTATE(32)
-    // 5 instructions with rotate instruction, 9 without
-    return (rotrFixed(value, 8U) & 0xff00ff00) | (rotlFixed(value, 8U) & 0x00ff00ff);
-#else
-    // 6 instructions with rotate instruction, 8 without
-    value = ((value & 0xFF00FF00) >> 8) | ((value & 0x00FF00FF) << 8);
-    return rotlFixed(value, 16U);
-#endif
-}
-
-#ifdef WORD64_AVAILABLE
-inline word64 ByteReverse(word64 value)
-{
-#if defined(__GNUC__) && defined(__x86_64__)
-    __asm__ ("bswap %0" : "=r" (value) : "0" (value));
-    return value;
-#elif defined(CRYPTOPP_BYTESWAP_AVAILABLE)
-    return bswap_64(value);
-#elif defined(_MSC_VER) && _MSC_VER >= 1300
-    return _byteswap_uint64(value);
-#elif defined(CRYPTOPP_SLOW_WORD64)
-    return (word64(ByteReverse(word32(value))) << 32) | ByteReverse(word32(value>>32));
-#else
-    value = ((value & W64LIT(0xFF00FF00FF00FF00)) >> 8) | ((value & W64LIT(0x00FF00FF00FF00FF)) << 8);
-    value = ((value & W64LIT(0xFFFF0000FFFF0000)) >> 16) | ((value & W64LIT(0x0000FFFF0000FFFF)) << 16);
-    return rotlFixed(value, 32U);
-#endif
-}
-#endif
-
-
-//
-// SHA
-//
-
-// http://www.weidai.com/scan-mirror/md.html#SHA-1
-class SHA1
-{
-public:
-    typedef word32 HashWordType;
-    static void InitState(word32 *state);
-    static void Transform(word32 *digest, const word32 *data);
-    static const char * StaticAlgorithmName() {return "SHA-1";}
-};
-
-typedef SHA1 SHA;   // for backwards compatibility
-
-// implements the SHA-256 standard
-class SHA256
-{
-public:
-    typedef word32 HashWordType;
-    static void InitState(word32 *state);
-    static void Transform(word32 *digest, const word32 *data);
-    static const char * StaticAlgorithmName() {return "SHA-256";}
-};
-
-// implements the SHA-224 standard
-class SHA224
-{
-public:
-    typedef word32 HashWordType;
-    static void InitState(word32 *state);
-    static void Transform(word32 *digest, const word32 *data) {SHA256::Transform(digest, data);}
-    static const char * StaticAlgorithmName() {return "SHA-224";}
-};
-
-#ifdef WORD64_AVAILABLE
-
-// implements the SHA-512 standard
-class SHA512
-{
-public:
-    typedef word64 HashWordType;
-    static void InitState(word64 *state);
-    static void Transform(word64 *digest, const word64 *data);
-    static const char * StaticAlgorithmName() {return "SHA-512";}
-};
-
-// implements the SHA-384 standard
-class SHA384
-{
-public:
-    typedef word64 HashWordType;
-    static void InitState(word64 *state);
-    static void Transform(word64 *digest, const word64 *data) {SHA512::Transform(digest, data);}
-    static const char * StaticAlgorithmName() {return "SHA-384";}
-};
-
-#endif
-
-}
-
-#endif
--- a/util.h
+++ b/util.h
@ -54,6 +54,20 @@ inline T& REF(const T& val)
    return (T&)val;
 }

+// Align by increasing pointer, must have extra space at end of buffer
+template <size_t nBytes, typename T>
+T* alignup(T* p)
+{
+    union
+    {
+        T* ptr;
+        size_t n;
+    } u;
+    u.ptr = p;
+    u.n = (u.n + (nBytes-1)) & ~(nBytes-1);
+    return u.ptr;
+}
+
 #ifdef __WXMSW__
 #define MSG_NOSIGNAL        0
 #define MSG_DONTWAIT        0
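
Reviewer note on the `alignup` hunk above: its comment says the buffer "must have extra space at end", which in practice means the caller over-allocates by `nBytes-1` bytes before rounding the pointer up. The following is only a minimal sketch of that calling pattern, not code from this commit. `alignup_sketch`, `is_aligned`, and `ScratchDemo` are hypothetical names, and the sketch assumes `nBytes` is a power of two and uses `uintptr_t` in place of the union from the hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the alignup() added in util.h: same rounding
// arithmetic, but the pointer/integer conversion goes through uintptr_t
// instead of a union. Assumption: nBytes is a power of two.
template <std::size_t nBytes, typename T>
T* alignup_sketch(T* p)
{
    static_assert(nBytes != 0 && (nBytes & (nBytes - 1)) == 0,
                  "nBytes must be a power of two");
    std::uintptr_t n = reinterpret_cast<std::uintptr_t>(p);
    // Round up to the next multiple of nBytes (no change if already aligned).
    n = (n + (nBytes - 1)) & ~static_cast<std::uintptr_t>(nBytes - 1);
    return reinterpret_cast<T*>(n);
}

// Hypothetical helper: true if p is a multiple of n bytes.
inline bool is_aligned(const void* p, std::size_t n)
{
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}

const std::size_t kAlign = 16;    // e.g. what an aligned SSE2 load expects
const std::size_t kPayload = 64;  // bytes the caller actually needs

// Hypothetical caller: the raw buffer carries kAlign-1 spare bytes so that
// the rounded-up pointer still has kPayload usable bytes behind it.
struct ScratchDemo
{
    unsigned char raw[kPayload + kAlign - 1];

    unsigned char* aligned() { return alignup_sketch<kAlign>(raw); }
};
```

The pointer moves forward by at most `kAlign-1` bytes, so without the spare bytes the tail of the payload could run past the end of the buffer.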