
  • Debugging Java Native Memory Leaks (evanjones.ca)
    …spent a bunch of time debugging this over the course of multiple months. Each time we got paged, we put additional workarounds in place to make it more difficult to trigger in the future. We also spent more time trying to find the real bug, without much success. This particular problem was easy to reproduce: just send many concurrent requests to a test instance. Unfortunately, slowing the instance down (e.g. by using strace) would make the problem go away. Since it appeared that our mitigations were working, we moved on to higher priority issues. A few weeks ago, someone reported a problem that seemed similar to ours. Kirandeep Paul suggested using jemalloc's built-in allocation profiling, which traces what is calling malloc. This is relatively low cost, since it only samples a fraction of the calls to malloc. It is also easy to try, since it can be turned on by setting the MALLOC_CONF environment variable without any code changes. This did the trick. I set MALLOC_CONF=prof:true,lg_prof_interval:30,lg_prof_sample:17, which writes a profile to disk after every 1 GB of allocations and records a stack trace every 128 kB. This file can be converted to a graph using jeprof, part of jemalloc. This created the following graph for our service. It shows that 94% of the live blocks were allocated by Java_java_util_zip_Deflater_init and deflateInit2 (part of zlib), while only 4% are from os::malloc and therefore the JVM itself. I found it surprising that this program was compressing anything, so I mentioned it to Carsten Varming, Twitter NYC's local JVM expert. He had seen Deflater cause memory problems at Twitter before: it uses the native zlib library, which allocates a small buffer using… (a short sketch of this leak pattern follows this entry)

    Original URL path: http://www.evanjones.ca/java-native-leak-bug.html (2016-04-30)
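
    The excerpt above is cut off just as it reaches the cause, so here is a minimal Java sketch (not the article's own code) of the kind of leak it describes: each java.util.zip.Deflater allocates a native buffer through zlib's deflateInit2, and if Deflater.end() is never called that native memory stays around until the object is eventually collected, which heavy concurrent use can outrun. The profiling step itself needs no code at all, assuming the process is already running under jemalloc: set MALLOC_CONF as described above and feed the dumped profiles to jeprof.

      import java.util.zip.Deflater;

      public class DeflaterLeak {
          // Leaky pattern: the native zlib buffer behind this Deflater is not
          // released until the object is eventually garbage collected and cleaned
          // up, so calling this in a hot path can exhaust native memory while the
          // Java heap looks perfectly healthy.
          static byte[] compressLeaky(byte[] input) {
              Deflater deflater = new Deflater();
              deflater.setInput(input);
              deflater.finish();
              byte[] out = new byte[input.length * 2 + 64];
              int n = deflater.deflate(out);
              byte[] result = new byte[n];
              System.arraycopy(out, 0, result, 0, n);
              return result; // the native buffer is still held here
          }

          // Safer pattern: release the native buffer deterministically.
          static byte[] compress(byte[] input) {
              Deflater deflater = new Deflater();
              try {
                  deflater.setInput(input);
                  deflater.finish();
                  byte[] out = new byte[input.length * 2 + 64];
                  int n = deflater.deflate(out);
                  byte[] result = new byte[n];
                  System.arraycopy(out, 0, result, 0, n);
                  return result;
              } finally {
                  deflater.end(); // frees the zlib-allocated native memory
              }
          }

          public static void main(String[] args) {
              byte[] data = "hello hello hello hello".getBytes();
              System.out.println(compress(data).length + " compressed bytes");
          }
      }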


  • SQLite: A surprisingly good data analysis tool (evanjones.ca)
    …workflow is to write a Scalding job to extract the subset of data that I want from terabytes of logs using Hadoop, then import a few GB of data into SQLite to be able to quickly and interactively compute aggregate statistics over subsets of the data. This is many, many times faster than my previous approach, which was to write a one-off script to filter and aggregate the data in the way I wanted. Here are some random tips if you are working with the sqlite3 command line. Create your table before calling .import, since by default SQLite uses the first row as column names. Use .separator "\t" (or whatever) to set the correct separator. .mode column makes the output a bit more readable. Use a recent version of sqlite3 (3.7 or later seems good); I encountered an ancient version (3.3) and it imports data painfully slowly. Tuning: execute the following before creating your table or importing data. This sets the on-disk page size to match the operating system, uses mmap to cache the whole DB in RAM, and saves temporary tables in memory: PRAGMA page_size = 4096; PRAGMA mmap_size = 4294967296; PRAGMA temp_store = MEMORY;… (a rough programmatic equivalent is sketched after this entry)

    Original URL path: http://www.evanjones.ca/sqlite-for-data-analysis.html (2016-04-30)
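
    The tips above target the sqlite3 command line; purely as a companion sketch (not from the article), the same tuning pragmas can also be applied from code over JDBC. This assumes a SQLite JDBC driver such as org.xerial's sqlite-jdbc on the classpath, and the table and query here are invented for illustration.

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      public class SqliteTuning {
          public static void main(String[] args) throws Exception {
              // Open (or create) the analysis database.
              try (Connection conn = DriverManager.getConnection("jdbc:sqlite:analysis.db");
                   Statement st = conn.createStatement()) {
                  // The same tuning pragmas as above: match the OS page size, mmap
                  // the whole DB, keep temporary tables in memory. Note that
                  // page_size only takes effect on a new database (or after VACUUM).
                  st.execute("PRAGMA page_size = 4096");
                  st.execute("PRAGMA mmap_size = 4294967296");
                  st.execute("PRAGMA temp_store = MEMORY");

                  st.execute("CREATE TABLE IF NOT EXISTS events (ts INTEGER, name TEXT, value REAL)");
                  st.execute("INSERT INTO events VALUES (1234567890, 'request_latency', 42.5)");

                  // Interactive-style aggregate query over the imported data.
                  try (ResultSet rs = st.executeQuery(
                          "SELECT name, COUNT(*), AVG(value) FROM events GROUP BY name")) {
                      while (rs.next()) {
                          System.out.printf("%s count=%d avg=%.2f%n",
                              rs.getString(1), rs.getLong(2), rs.getDouble(3));
                      }
                  }
              }
          }
      }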

  • Frameworks: Necessary for large-scale software (evanjones.ca)
    …once and reused everywhere. For large-scale software engineering, you need a framework. What is a framework, anyway? One cause for disagreement is that "framework" is an imprecise term, and people have different definitions. My definition is fairly broad: a framework is a library that is used by writing code that is called by the framework, e.g. via callbacks or interfaces. This is sometimes called Inversion of Control (see the small sketch after this entry). Frequently, frameworks also connect related components together in a sensible way, but that isn't a requirement. This broad definition comes from looking at a list of projects that call themselves frameworks, ranging from low-level network plumbing through to complex distributed systems: Ruby on Rails (Ruby), Django (Python), Flask (Python), Hadoop (Java), Spring (Java), Guice (Java), Dropwizard (Java), Play (Java/Scala), Netty (Java), AngularJS (Javascript), Revel (Go). Good frameworks hide lots of complexity behind simple, flexible abstractions. When I look at that list, the ones that I think of as good frameworks put a large amount of code and complexity behind a substantially simpler API. This allows the developer to learn the surface of the framework much more easily than re-implementing the subset that they need. For example, writing a new Hadoop task is not trivial if you've never done it before; however, while it may take a day to learn the basic commands and APIs, there is a ton of magic behind it, and it can be reused in many different ways. The challenge for framework designers is finding the right abstractions. The right approach is probably to write a few applications in order to deeply understand the common problems, then pull those out in a reusable way. Martin Fowler calls this harvesting a framework. The best frameworks also tend to be modular, allowing users to…

    Original URL path: http://www.evanjones.ca/frameworks-necessary-for-large-scale-software.html (2016-04-30)
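
    A minimal sketch of the Inversion of Control definition above: the application only supplies code behind an interface, and the framework owns the control flow and calls back into it. The names here are made up for illustration; this is not any of the frameworks listed in the article.

      import java.util.List;

      interface Handler {
          String handle(String request);
      }

      // The "framework": it owns the control flow and calls back into user code,
      // rather than the application calling into a plain library.
      class TinyFramework {
          private final Handler handler;

          TinyFramework(Handler handler) {
              this.handler = handler;
          }

          void run(List<String> requests) {
              for (String request : requests) {
                  // Inversion of Control: the framework decides when the app runs.
                  System.out.println(handler.handle(request));
              }
          }
      }

      public class FrameworkDemo {
          public static void main(String[] args) {
              // The application writes only the callback; the framework drives it.
              TinyFramework app = new TinyFramework(req -> "handled: " + req);
              app.run(List.of("a", "b", "c"));
          }
      }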



  • Dotfiles: Automating my software setup (evanjones.ca)
    …notice something broken many days later, or when someone asks you how you set up Sublime or your shell to do that cool thing. I also just like being able to run git diff and see what changes I made. You won't really configure things that many times, so don't waste the time I did: you'll get 99% of the benefit by just keeping the easy things in a single repository. I like moving the real configuration files into a Git repository, then replacing the original file with a symlink. If you do this manually, you'll get most of the benefit. Github's dotfiles guide has some more reasonable suggestions for getting started. You can browse my dotfiles repository if you are curious. However, if you do decide to go crazy and save nearly everything in your repository, hopefully these notes will save you some time. Make it idempotent: it is really annoying when running your auto-configure command twice in a row screws something up. For example, to symlink and copy my configuration files, I wrote a program that checks if the links already exist and skips them if they do (a small sketch of this check follows this entry). Now if I add a new file, I change the configuration at the top of the file and re-run it. Mac OS X plists are weird: Mac OS X applications usually store their configuration in a plist file in the Library/Preferences directory. These used to be XML files but became a binary format. XCode comes with a GUI browser, and you can also use defaults, PlistBuddy, or plutil to read and write them on the command line (see the man pages). As of Mavericks (10.9) there is a daemon called cfprefsd that caches these files, so just editing the file is…

    Original URL path: http://www.evanjones.ca/dotfiles-personal-software-configuration.html (2016-04-30)
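
    The article's own tool is not shown here; as a hedged sketch, this is roughly what the "check if the link already exists and skip it" idempotency check looks like in Java. The repository layout and file names are hypothetical.

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;

      public class LinkDotfiles {
          // Create link -> target, but do nothing if a previous run already set it
          // up, so running the auto-configure step twice never breaks anything.
          static void ensureSymlink(Path link, Path target) throws IOException {
              if (Files.isSymbolicLink(link)) {
                  if (Files.readSymbolicLink(link).equals(target)) {
                      return; // already configured: skip
                  }
                  throw new IOException(link + " is a symlink to something else");
              }
              if (Files.exists(link)) {
                  throw new IOException(link + " exists and is not a symlink; move it into the repo first");
              }
              Files.createSymbolicLink(link, target);
          }

          public static void main(String[] args) throws IOException {
              Path home = Path.of(System.getProperty("user.home"));
              // Hypothetical layout: the dotfiles repository checked out at ~/dotfiles.
              ensureSymlink(home.resolve(".gitconfig"), home.resolve("dotfiles/gitconfig"));
              ensureSymlink(home.resolve(".zshrc"), home.resolve("dotfiles/zshrc"));
          }
      }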

  • Visualizing Vector Clocks (evanjones.ca)
    …a bit about graph algorithms. Let's start with an example. Consider two independent threads, T1 and T2, that perform three operations each: a, b, c and x, y, z. On each thread, I know the operations are in their program order, e.g. T1: a → b → c and T2: x → y → z. However, I can't say anything about the relative order of the events between the threads; e.g. a and z could happen in any order. However, if there is some sort of synchronization, then I can reason about the order. For example, consider that T1's first operation is actually sending a message, and T2's second operation is receiving that message. In this case, I know that T2's remaining events must happen after T1 sends the message. Here is a diagram which helps make this clear. My first attempt to automatically generate these diagrams was not very helpful: I compared each event to every other event and outputted an edge if one happens before the other. This results in far too many edges. This is more cluttered and doesn't add any additional information, so I needed to remove these excess edges. The idea is to remove the edges that are implied by transitivity; in other words, if a → b and b → c, then I can infer that a → c, so I don't need that edge. It turns out that this is called transitive reduction, and graphviz includes a program called tred to do it (a small sketch of the idea follows this entry). However, I need this information anyway: the events that happen immediately before are the ones I need to examine with my model checker in order to determine if the correct result was produced. Calling this transitive reduction makes it sound complicated, but if you don't care about…

    Original URL path: http://www.evanjones.ca/visualizing-vector-clocks.html (2016-04-30)
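
    The article leans on graphviz's tred for this; purely as an illustration of the idea (not the author's code), here is a small Java sketch that drops every happens-before edge already implied by a longer path, assuming the graph is a DAG.

      import java.util.ArrayDeque;
      import java.util.Deque;
      import java.util.HashMap;
      import java.util.HashSet;
      import java.util.List;
      import java.util.Map;
      import java.util.Set;

      public class TransitiveReduction {
          // Remove every edge u -> v that is implied by transitivity: if some other
          // successor w of u already reaches v, the direct edge u -> v is redundant.
          static Map<String, Set<String>> reduce(Map<String, Set<String>> graph) {
              Map<String, Set<String>> reduced = new HashMap<>();
              for (Map.Entry<String, Set<String>> e : graph.entrySet()) {
                  reduced.put(e.getKey(), new HashSet<>(e.getValue()));
              }
              for (String u : graph.keySet()) {
                  for (String v : graph.get(u)) {
                      for (String w : graph.get(u)) {
                          if (!w.equals(v) && reaches(graph, w, v)) {
                              reduced.get(u).remove(v); // implied by u -> w -> ... -> v
                              break;
                          }
                      }
                  }
              }
              return reduced;
          }

          // Simple depth-first search: can we get from start to target?
          static boolean reaches(Map<String, Set<String>> graph, String start, String target) {
              Deque<String> stack = new ArrayDeque<>(List.of(start));
              Set<String> seen = new HashSet<>();
              while (!stack.isEmpty()) {
                  String node = stack.pop();
                  if (!seen.add(node)) continue;
                  for (String next : graph.getOrDefault(node, Set.of())) {
                      if (next.equals(target)) return true;
                      stack.push(next);
                  }
              }
              return false;
          }

          public static void main(String[] args) {
              // a -> b -> c plus the redundant "compare every pair" edge a -> c.
              Map<String, Set<String>> g = Map.of(
                  "a", Set.of("b", "c"),
                  "b", Set.of("c"),
                  "c", Set.of());
              // Prints the graph with the redundant edge a -> c removed.
              System.out.println(reduce(g));
          }
      }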

  • Correct incremental builds with Makefiles (evanjones.ca)
    …included when they compile a file. For GCC and Clang, this gets output as a Makefile fragment, so we can easily include it in our Makefile. We can instruct GCC to output this information by adding the -MMD flag, and use -include to include the output in our Makefile (complete example on github): the rule becomes exe: exe.c, with a compile command of gcc -MMD -MF exe.d … -o …, followed by -include exe.d. The file exe.d itself contains the following after compiling exe: exe: exe.c hello.h other.h. This is a Makefile that specifies additional dependencies for the target exe. When Make encounters the same target multiple times, it merges the dependency lists. This means that whenever exe.c, or any of the header files it includes, is updated, exe will be rebuilt; as a side effect, it will also update the dependency list. This now has only one tiny flaw: if we remove or rename a header file that was included, we get the following friendly error: make: No rule to make target 'other.h', needed by 'exe'. Stop. Automatic dependencies with deleted files: the generated dependency list tells make that exe requires other.h to be built. Since other.h does not exist, make can't build exe. We need to instead tell make that if the header files do not exist, it should try to rebuild the program anyway. This lets GCC check whether the missing file is an error (e.g. if I deleted other.h but forgot to remove the #include), or update the dependency list if the header is no longer needed. A make rule with a target but no commands or dependencies does exactly what we want: Make considers the target to be updated, causing anything that depends on it to be rebuilt. The really good news is that GCC has…

    Original URL path: http://www.evanjones.ca/makefile-dependencies.html (2016-04-30)

  • JavaScript modules in the browser, Node, and Closure compiler (evanjones.ca)
    …object. The module above could look like this (complete example): var privateAddFive = function(a) { return 5 + a; }; module.exports.publicAddSix = function(b) { return privateAddFive(b) + 1; }; Making it work in both: to make the module work in both, we define namespace objects for the browser. We detect Node at runtime by checking if module.exports exists, using typeof, which doesn't throw if the variable is undefined. If it exists, we replace it with the namespace object we defined. To import other modules, we declare the variable and call require if it is not already initialized (complete example): var mylib = {}; /* new namespace object defined by this file */ var dependency = dependency || require('dependency'); /* import used by this file */ /* functions that define properties on mylib and use dependency go here */ if (typeof module !== 'undefined' && module.exports) { module.exports = mylib; } /* export the namespace object */ There are some important limitations. Browser dependencies must be included before they are referenced (the order of script tags matters), since otherwise the browser will try to call the non-existent require function and throw an Error. Modules must always be referenced using the same name; this is different from Node, where the imports can be referenced using any local name. One file, one namespace: in the browser it is easy to define some properties of a namespace in one file and others in a second file; it is possible, but tricky, to make this work in Node. Avoid hierarchical namespaces: again, it is possible but tricky to use them in Node. Automated unit tests: this approach can be used to write unit tests that work in the browser and in Node. Both Jasmine and Mocha work, but I'm using Jasmine since it seems to be more widely used. To run them in the browser, you need to add the appropriate script tags…

    Original URL path: http://www.evanjones.ca/js-modules.html (2016-04-30)

  • Go: It's about engineering, not type systems (evanjones.ca)
    …notice, because a type check will fail when I try and use the variable. Fast compiles: in JavaScript and Python, it is awesome that you can edit and re-run your program instantly. In Java and C++ you fix one compiler error, then wait for a few seconds or a few minutes to get the next one. Go's compiler is so fast it is nearly as convenient as Python. This is not an accident: it was an explicit design goal. Interfaces are satisfied implicitly: any struct with the same methods as an interface can be used as that type. This feels similar to duck typing, although you need to manually define the interface. I am still undecided if this is a big win, but it has occasionally been convenient to use an interface with types from a third-party package (e.g. the standard library). In Java or C++, I would have needed to manually write a wrapper class (a small illustration follows this entry). Easy to use third-party code: using third-party code is annoying in every language. You need to decide if the effort required to use the code is less than the time savings you will get from not implementing it yourself. This process is absolutely horrible in C++, where you need to deal with configuration headers, compiler flags, and include paths. It's much easier in Java, but you still need to either manually download a massive collection of JAR files or understand Maven or Ivy. Go isn't perfect, but it does reduce the required effort. One build tool, go build, is used for nearly everything, and it doesn't require configuration because the code has all the required information. This eliminates the duplication where you add an import to the source file and also need to add the dependency in some…

    Original URL path: http://www.evanjones.ca/go-engineering-not-types.html (2016-04-30)
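
    To illustrate the wrapper-class point above (this is not from the article, and all names are invented): in Java a type must explicitly declare the interfaces it implements, so when a third-party class has the right method but does not declare your interface, you end up writing an adapter by hand. In Go, the third-party type's method set would satisfy the interface implicitly, with no extra code.

      interface Greeter {
          String greet(String name);
      }

      // Imagine this class lives in a third-party JAR you cannot modify.
      final class ThirdPartyGreeter {
          String greet(String name) {
              return "hello, " + name;
          }
      }

      // The manual wrapper Java requires to use ThirdPartyGreeter as a Greeter.
      final class ThirdPartyGreeterAdapter implements Greeter {
          private final ThirdPartyGreeter delegate = new ThirdPartyGreeter();

          @Override
          public String greet(String name) {
              return delegate.greet(name);
          }
      }

      public class ImplicitVsExplicit {
          public static void main(String[] args) {
              Greeter g = new ThirdPartyGreeterAdapter();
              System.out.println(g.greet("gopher"));
          }
      }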