
Total: 397

  • Weak Isolation in Relational Databases (evanjones.ca)
    to the reads holding locks, or by aborting the incorrectly ordered transaction. Locking, used by MySQL when using the SERIALIZABLE isolation level, will cause a deadlock to be detected in this example, aborting one transaction. Snapshot isolation does detect write-write conflicts. If both updates above were for id 0, the database will detect the conflict and abort one of them. MySQL with REPEATABLE READ will block. However, that may not work for applications that rely on the semantics of values from multiple rows. As another example, consider a transaction that counts the number of rows in a table, then inserts a row with that value. With serializable semantics, the table will have incrementing values (e.g. 0, 1, 2). With snapshot isolation, it is possible that some values will be duplicated and some will be skipped (e.g. 0, 1, 1, 3, 3, 5). One way to resolve this problem is to use SELECT FOR UPDATE when needed, which acquires write locks on the accessed rows. A good explanation of snapshot isolation anomalies can be found in "A Read-Only Transaction Anomaly Under Snapshot Isolation" (PDF).
    InnoDB Repeatable Read: No Phantoms
    By default, MySQL with InnoDB runs transactions at the REPEATABLE READ isolation level (see the InnoDB manual). This is supposed to mean that a transaction only reads committed values, and once it has read a value, future reads will see the same value. However, the actual implementation seems equivalent to Postgres's default READ COMMITTED isolation level, so this discussion applies to both. According to the SQL standard, this isolation level permits phantom reads. However, phantom reads will not happen. For example, executing the following transactions does what you expect:
    Connection 1 / Connection 2: BEGIN; BEGIN; SELECT * FROM table00 WHERE id < 100; INSERT INTO table00 VALUES (98, 0), (99, 1); SELECT * FROM table00 WHERE id < 100; COMMIT; SELECT * FROM table00 WHERE id < 100; COMMIT; SELECT * FROM table00 WHERE id < 100;
    In this case, the first connection always sees the original state of the database. After both transactions commit, the inserted rows are visible. This is stronger than required for REPEATABLE READ, which permits the third SELECT to return the newly inserted rows.
    Write Skew
    In the default REPEATABLE READ isolation level, InnoDB does not detect read-write conflicts. This is the same problem as mentioned above for snapshot isolation: write skew. However, MySQL takes a unique approach to resolving write-write conflicts. Traditional snapshot isolation systems (e.g. Postgres and Oracle) will abort the second writer once the first writer commits, as described above. InnoDB instead performs the update on the most recent version of the row. This can lead to surprising results. Consider a transaction that implements an increment by first reading the value with a SELECT, computing the new value, then writing it with UPDATE. This works with traditional snapshot isolation, since a conflicting UPDATE will be aborted. With InnoDB, this can miss increments, because the UPDATE will happily overwrite the most recent value. As before, using SELECT FOR

    Original URL path: http://www.evanjones.ca/db-isolation-semantics.html (2016-04-30)
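
The increment example in the snippet above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyStore`, `ConflictError`, and `run_two_increments` are hypothetical names, the store holds a single value, and no real database (InnoDB, Postgres, or Oracle) is involved. The flag `abort_on_conflict` switches between the two policies the snippet contrasts: aborting the second writer, or applying its write to the most recent version of the row.

```python
class ConflictError(Exception):
    """Raised when a toy transaction is aborted by a write-write conflict."""


class ToyStore:
    """Single-value store with a version counter (a toy model, not a real database)."""

    def __init__(self, value=0, abort_on_conflict=True):
        self.value = value
        self.version = 0
        self.abort_on_conflict = abort_on_conflict

    def begin(self):
        # The snapshot records what the transaction saw when it started.
        return {"value": self.value, "version": self.version}

    def commit_write(self, snapshot, new_value):
        # The row is stale if another transaction committed after our snapshot.
        stale = snapshot["version"] != self.version
        if stale and self.abort_on_conflict:
            raise ConflictError("row changed since snapshot")
        # Otherwise overwrite the most recent version, even if it is newer.
        self.value = new_value
        self.version += 1


def run_two_increments(abort_on_conflict):
    """Two transactions read the same snapshot, then each writes value + 1."""
    store = ToyStore(0, abort_on_conflict)
    snapshots = [store.begin(), store.begin()]
    aborted = 0
    for snap in snapshots:
        try:
            store.commit_write(snap, snap["value"] + 1)
        except ConflictError:
            aborted += 1
    return store.value, aborted


if __name__ == "__main__":
    print(run_two_increments(abort_on_conflict=True))
    print(run_two_increments(abort_on_conflict=False))
```

With the abort policy, the second transaction fails visibly, so the application can retry and the increment is not lost. With the overwrite policy, both transactions report success but the final value reflects only one increment.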


  • Java String Encoding Performance (evanjones.ca)
    ran my experiments and produced an improved implementation. He found that with JDK7, there is very little difference between String.getBytes() and my faster version. However, he created a version that uses the Unsafe class to get access to String internals, which is still faster.
    String.getBytes()
    The standard approach is to call String.getBytes() to get a temporary byte array, then copy it into the destination buffer with System.arraycopy(). This works pretty well, particularly if there are only a few strings to encode. The implementation uses some private APIs to make this fast. Unfortunately, there is an extra memory allocation and copy for the temporary byte array.
    CharsetEncoder
    CharsetEncoder takes a CharBuffer and encodes it into a ByteBuffer. A CharBuffer backed by the string can be allocated by calling CharBuffer.wrap(), and a ByteBuffer backed by the destination byte array can be allocated with ByteBuffer.wrap(). This seems perfect: it converts the String into an existing buffer. Unfortunately, it turns out that this is quite slow. It seems that accessing the characters of the String via the CharBuffer, which in turn uses the CharSequence interface, is slow. According to my benchmarks, this approach is always slower than using String.getBytes(), so you should never use it.
    CharsetEncoder With a char[] Buffer
    To avoid the slow access to the individual characters of the String, we can copy them in bulk using String.getChars() into a char array, which is wrapped by a CharBuffer. Then we can use the CharsetEncoder to encode the characters. Amazingly, despite the copy from the String into the char array, this is faster than String.getBytes(), as long as the encoder and temporary arrays are reused. If the encoder is only used once, then the overhead of allocating the temporary objects outweighs the advantages.
    Performance

    Original URL path: http://www.evanjones.ca/software/java-string-encoding.html (2016-04-30)

  • Incomplete Index of Serialization Formats (evanjones.ca)
    this up to date.
    Somewhat Broadly Used
    XDR (eXternal Data Representation): An IETF standard. Define messages using a language that looks like C structures. Compile that to generate serialization/deserialization code. Used for NFS. Python includes xdrlib. A GPL C implementation is included in sfslite. The GNU C library includes a version, but apparently it is very bad, so Portable XDR is a re-implementation under an LGPL/GPL license. FreeBSD includes a version somewhere.
    Abstract Syntax Notation One (ASN.1): An ITU standard that is used for cryptographic and telecom standards, as well as LDAP. Apparently has poor open source tool support, although asn1c, asnparser, Erlang's Asn1, and a Java ASN.1 framework exist, among others. I've never used this.
    SOAP: A complex XML-based format. Huge wad of specifications standardized by the W3C. I've never used this, and recommend avoiding it unless you absolutely must. Used by Salesforce.com's Web Services API.
    XML-RPC: A simple XML-based format. There is some confusion about using Unicode strings. Python includes the xmlrpclib module. Used by Sun's Storage appliance (code-named Fishworks), among others.
    JavaScript Object Notation (JSON): Originally used to communicate with JavaScript programs, but is now used for other applications since it is a simple text-based format. There are two incompatible RPC specifications: JSON-RPC 1.0 and 2.0.
    Google Protocol Buffers: Uses variable-length encoding for space efficiency. Supports optional fields to permit upgrades. Messages are defined in a .proto file, which is used to generate parsers. Provides an RPC interface but no implementation.
    Facebook/Apache Thrift: Very similar design to Google protocol buffers. Provides an RPC implementation. Probably more widely used.
    Not Widely Used
    Hadoop Record: Used to implement Hadoop. A compiler generates parsers.
    Apache Avro: A Hadoop subproject

    Original URL path: http://www.evanjones.ca/software/serialization-index.html (2016-04-30)
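
The "variable-length encoding" mentioned for Protocol Buffers in the snippet above can be illustrated with a base-128 varint, the scheme Protocol Buffers uses for unsigned integers. This is a minimal sketch, not the library's own code: `encode_varint` and `decode_varint` are hypothetical helper names, and only non-negative integers are handled. Each byte carries 7 bits of the value, and the high bit signals that more bytes follow, so small numbers take a single byte.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode one varint from the start of data; return (value, bytes_consumed)."""
    result = 0
    shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")


if __name__ == "__main__":
    encoded = encode_varint(300)
    print(encoded.hex(), decode_varint(encoded))
```

The space saving comes from the common case: values below 128 fit in one byte instead of a fixed four or eight.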



  • Building Reliable Storage on Virtual Infrastructure (evanjones.ca)
    around this limitation, many servers use a RAID array so that the data survives at least one hard drive failure. If the server itself fails, the disks must be attached to a different system in order to recover the data. If multiple disks fail, then the data can still be lost. Therefore, critical data must still be backed up.
    Running on Virtual Hardware
    When deploying a service on virtual infrastructure, the same kinds of problems can occur. The difference is that because you no longer have access to the physical hardware, you can't move hard drives from a failed server to a working backup. Rather, if the server dies, your data is gone. This is not really a new type of failure, and services should already be designed to handle it. However, this type of failure might be more common on virtual hardware, since the data center operator may need to bring a physical host server down for many reasons, such as software upgrades. Note that a failure does not mean a reboot. In case the physical host reboots, the virtual machine will still see the data on its hard drive. Amazon's EC2 documentation explains this as follows: "If an instance reboots (intentionally or unintentionally), the data on the instance store will survive. If the underlying drive fails or the instance is terminated, the data will be lost."
    Replication on Virtual Infrastructure
    It is still possible to reliably store data on virtual infrastructure. The answer is replication. The data must be stored multiple times, on systems that fail independently. On Amazon's EC2, you can use their Elastic Block Store service to get RAID-like reliability guarantees with a disk interface. An alternative approach is to replicate the entire service across multiple virtual machines. In this case, if

    Original URL path: http://www.evanjones.ca/virtual-infrastructure-replication.html (2016-04-30)

  • Open Cirrus: Open Source Cloud Infrastructure (evanjones.ca)
    Chein, the VP of research at Intel, titled "Seizing the Open Source Cloud Stack Opportunity". The presentation seems to argue that we really need some common consensus on the software infrastructure: a unified stack. The presentation points out that there are a huge number of projects all duplicating the same functionality, and we really need only one or two pieces of software for each piece of the stack: monitoring, job

    Original URL path: http://www.evanjones.ca/opencirrus.html (2016-04-30)

  • State of Toronto Espresso (evanjones.ca)
    bad/acceptable/good scale, their espresso was only acceptable. Mind you, I am picky, so acceptable is better than 90% of the establishments out there, but this does not compare to New York or Vancouver, where I have had great espresso from multiple cafés. Maybe this is why Espresso Map, which I find very reliable, only lists one location in Toronto, which I did not get the opportunity to try.

    Original URL path: http://www.evanjones.ca/toronto-espresso.html (2016-04-30)

  • Deploying Distributed Applications (evanjones.ca)
    ideal system does need some sort of isolation, so that what one application is doing does not affect other applications. This isolation can be efficiently provided by container solutions, such as Solaris Zones, lxc, OpenVZ, or vservers. I think it is more appropriate to virtualize and isolate applications at the system call interface, which is effectively what these solutions do. Each application is provided a private view of the operating system's resources, without needing its own copy of the operating system and its configuration. Being able to selectively share or not share resources is also useful. For example, for the kind of applications I develop, I don't need my applications to each have their own unique IP address, as long as I can assign them unique ports. There are two projects I know of that manage the allocation of containers. PlanetLab is widely used research infrastructure built on Linux vservers. Unfortunately, it is designed so users can create containers on specific machines of their choosing, and when a container is allocated, it looks like a bare-bones Linux system. Thus, it basically looks like whole operating system virtualization, and is also too low-level. However, there are third-party PlanetLab tools that seem similar to what I want. For example, the PlanetLab Application Manager is close to what I want, but it doesn't help find and allocate resources. Sun's Project Caroline looks to be very similar to what I want. Unfortunately, it appears to require Solaris, the machines must have access to a shared ZFS storage pool, and it is designed to deploy Java applications. This means I don't have the hardware to use it, and it won't run my applications, which are written in C or Python. However, none of these are

    Original URL path: http://www.evanjones.ca/running-distributed-apps.html (2016-04-30)

  • The Datacenter as a Computer (evanjones.ca)
    big Internet companies. It discusses everything from facilities issues, through the computing hardware, through to the software infrastructure. This is an excellent design guide about how everyone should be designing data centers of all sizes, not just huge facilities. Don't be intimidated by its length: it is very easy to read. Just browse the table of contents and pick and choose the sections that interest you. I particularly enjoyed

    Original URL path: http://www.evanjones.ca/researchpapers/datacenter-computer.html (2016-04-30)