Research Blog

I am currently focusing on the storage fabric design of Resource Disaggregation. I am delighted to be a part of the CREU program for the 2017-2018 academic year.

Week 35: April 30 - May 6

Work Accomplished: This week I worked on the final report for the CREU project. I thought about where I started and where I ended and all of the work I put in over the course of the program. I thought about how this experience affected me as a researcher and how I would carry the skills I gained throughout the rest of my academic career.

Outcomes: After reflecting on this past year's work I am happy with the result of the work I put in. This program was a great introduction to the field of research and taught me a lot of lessons. This year made me more confident in my ability to lead a research project from the front and I think this is invaluable for an undergraduate who intends to pursue academic research as a career. This project also taught me that you have to be willing to persevere when the results of your work are not exactly as expected. Research does not always go the way you plan so you have to be flexible and consistent. I believe that my participation in this project will have a lasting effect on me academically.

Week 34: April 23 - April 29

Work Accomplished: This week I started to merge together the different implementations of the data structures I decided would be useful for databases. I thought about the testing suites of the implementations and decided to conform them to the one that tested in an environment most similar to real databases. I also looked at the testing results and tried to begin implementation with the knowledge of the implementations that provide the best results.

Outcomes: By leveraging the positives of these different implementations I will be able to create a testing environment that is indicative of the real-world environment. Testing results are meaningless if they bear little relation to the actual environment. Once this work is complete I will be able to evaluate machine learning models that provide the same properties as these data structures.

Goals: Next week I plan to work on the final report for the CREU program.

Week 33: April 16 - April 22

Work Accomplished: This week I finalized my decision on what I think are the data structures that provide the basic operations needed in databases. These data structures are space-efficient and have reasonable performance. Given these characteristics I believe that these data structures will be competitive with any machine learning models that serve to provide the same functionality. I also took a look at a couple of implementations of these data structures.

Outcomes: The work these past two weeks has given me tools whose performance I believe in. In the future when I need to compare potential machine learning models with their peers that perform the same task I can be confident in the contrasts the performances give me. This will allow me to generalize with confidence the usefulness of any given model. Looking at the different implementations of these data structures will allow me to eventually combine them and have a central place where I can evaluate performance.

Goals: Next week I plan to get started on combining these various implementations together.

Week 32: April 9 - April 15

Work Accomplished: This week I did an extensive survey of the different data structures that are used to perform the basic tasks needed within a database. I performed this work in a similar manner to the exploration I did recently. I thought about the basic guarantees that a database needs to provide and thought about the data structures that were paramount to those tasks. I also made sure to look at different variants that perform the same tasks so as to choose the most effective structure.

Outcomes: The ultimate goal of this work is to provide an understanding of the performance of different data structures and relate it to machine learning models that provide the same functionality. By choosing the best data structures I can be sure that my implementation is competitive. This week gave me a wealth of understanding about the preliminary mechanisms considered as well as variants that may be more effective.

Goals: Next week my goal is to finalize my decision on my choices of structures and then begin an implementation that allows me to evaluate performance.

Week 32: April 2 - April 8

Goals: This week is spring break so I will be away from Cornell for the time being.

Week 31: March 26 - April 1

Work Accomplished: This week I went through the potential research directions and made sense of the ideas I have in mind. After looking at the pairings I came up with what I think were the best possibilities. I made sure to think about the unique usefulness that the data structure provides and also how the machine learning model I am thinking about can make it even better. I also considered how this connection would perform on various types of datasets with various types of access patterns.

Outcomes: Now that I have settled on a more concrete problem I have a good understanding of the work that needs to be done in the near future. With this problem in mind I will start out trying to recreate results for similar papers and leverage the understanding I get from this work to improve my research in this area.

Goals: Next week is spring break so I will be away from Cornell for the time being.

Week 30: March 19 - March 25

Work Accomplished: This week I looked deeper into the replacement of traditional data structures using machine learning models and came up with some preliminary ideas for a research direction. I thought about the data structures that are known to be lacking in some areas and thought about the reason behind their poor performance in those situations. I then thought about the strengths and weaknesses of other machine learning models and tried to pair these data structures and machine learning models in order to mitigate any weaknesses.

Outcomes: This has given me a list of potential pairings to choose from which will allow me to choose a more fruitful research direction. It has also given me a view of where data structures and machine learning models thrive holistically.

Goals: Next week I will think more about these pairings and try to pick a couple to focus on for the near future.

Week 29: March 12 - March 18

Work Accomplished: This week I read more literature in the area of machine learning. I took a look at an interesting way to use machine learning in order to improve the functionality that hardware provides. I also finished looking at the data structures I started to study recently. I made sure to understand their basic characteristics and think about the guarantees we would like to provide if we were to replace them with a machine learning model.

Outcomes: This week's work has given me a great understanding of the possible data structures I would like to replace. I feel confident in deriving some constraints for a possible machine learning algorithm. The innovative machine learning work I studied has also inspired me to pursue this direction. I am excited to explore the possibilities.

Goals: Next week I will focus on the data structure I think would best benefit from a machine learning algorithm and will use recent work to inspire a research direction.

Week 28: March 5 - March 11

Work Accomplished: This week I did a survey of a couple of different data structures relating to indexing. I first took a look at their base implementations and then I compared them to their more optimized versions. While examining these implementations I thought about the performance guarantees each of these data structures provide. I also made sure to think about how a machine learning model might improve upon or worsen the performance. The last thing I did was to determine the characteristics that would need to be kept in a switch to a machine learning model.
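To make the idea of "characteristics that would need to be kept" concrete, here is a minimal sketch of one way a model could stand in for an index over sorted keys. It is an illustration under my own assumptions, not the design from any particular paper: a least-squares line predicts a key's position, and the worst prediction error seen while building bounds the search window, so a lookup still always finds a key that is present. The class name LearnedIndex and its methods are hypothetical.

```python
import bisect


class LearnedIndex:
    """Sorted array plus a linear model that predicts a key's position.

    Hypothetical sketch: assumes a non-empty list of numeric keys.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (p - mean_pos) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # The worst error over the stored keys bounds the search window,
        # which is what preserves the "always finds a present key" guarantee.
        self.max_err = max(abs(p - self._predict(k)) for p, k in enumerate(self.keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(n, max(0, guess - self.max_err))
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

The better the line fits the key distribution, the smaller the window and the closer a lookup gets to constant time; on a poorly fitting distribution the window grows and the lookup degrades toward an ordinary binary search.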

Outcomes: As a result of my work this week I have a much better understanding of these various data structures. I took the time to understand the intuition behind them as well as when they perform well. I also have a much better understanding of the work that has been done in the field to optimize them. Now that I have a sense of how people are using them today I am better equipped to determine how a machine learning model could replace or enhance them.

Goals: Next week I will finish up my analysis of these data structures and look at more literature on related machine learning models.

Week 27: February 26 - March 4

Work Accomplished: This week I started to pursue a different perspective; new directions have emerged on using machine learning models to more efficiently use memory. I read a paper discussing this idea as an alternative to traditional data structures such as hash maps. This leads to interesting questions about the true strength of machine learning models and how to leverage their unique predictive abilities. I also thought about the weaknesses of traditional data structures and where machine learning models could improve upon them.

Outcomes: Given my survey of this topic I found I was interested in this research area. I plan to delve deeper into the questions I discussed and to get a more holistic view of machine learning models as a whole.

Goals: Next week I will continue to learn more about traditional data structures and the work that has been done to optimize them. With these optimizations in mind I will then consider the pros and cons of their partial or full replacement by a machine learning model.

Week 26: February 19 - February 25

Work Accomplished: This week I began to work on the design for the scheme I discussed last week. I looked at the Redis code and thought about where it would be efficient to place the per-user information I need to store. I also thought about how I would integrate this information with decisions the server will need to make, in order to decide whether or not to service particular commands. I also thought about the implications this would have on the performance of individual commands and how best to keep performance similar to the original case.

Outcomes: Given this work it is likely I will have to keep this information global and edit it upon each new connection. As a user accesses data I will need to update its respective pair. I also need to decide what to do with cache used up by users no longer in the system. If I choose to relinquish this space then I will need to keep track of what data each user owns, which could negatively affect performance.
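The bookkeeping described here can be sketched roughly as follows. This is a standalone Python illustration rather than Redis code, and every name in it (UserCacheTable, per_user_limit, the on_* hooks) is a hypothetical stand-in. The per-user byte limit in particular is my own assumption, added only so the admission check has something to compare against.

```python
class UserCacheTable:
    """Global table of (user id -> bytes of cache owned).

    Hypothetical sketch: the per-user limit and hook names are assumptions.
    """

    def __init__(self, per_user_limit):
        self.per_user_limit = per_user_limit  # assumed cap, in bytes
        self.owned_bytes = {}  # user id -> bytes currently owned
        # Only needed if space is reclaimed when a user leaves; this is the
        # extra per-key state that could hurt performance.
        self.owned_keys = {}  # user id -> {key: size}

    def on_connect(self, user_id):
        # Edit the global table upon each new connection.
        self.owned_bytes.setdefault(user_id, 0)
        self.owned_keys.setdefault(user_id, {})

    def should_service_set(self, user_id, key, size):
        # Overwriting a key the user already owns replaces its old size.
        previous = self.owned_keys[user_id].get(key, 0)
        return self.owned_bytes[user_id] - previous + size <= self.per_user_limit

    def on_set(self, user_id, key, size):
        # Update the user's pair as it stores data.
        previous = self.owned_keys[user_id].get(key, 0)
        self.owned_keys[user_id][key] = size
        self.owned_bytes[user_id] += size - previous

    def on_disconnect(self, user_id):
        """Drop the user's pair and return the keys whose space can be reclaimed."""
        self.owned_bytes.pop(user_id, None)
        return list(self.owned_keys.pop(user_id, {}))
```

Dropping owned_keys would leave only one integer per connected user, at the cost of never being able to reclaim a departed user's space.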

Goals: Next week I will work to outline the details of this design, and think about its performance implications.

Week 25: February 12 - February 18

Work Accomplished: This week I decided that the only way to sensibly store information about cache usage, by user, is to keep pairs of user id and cache owned. After settling on this scheme I thought about how it might be implemented and the possible consequences of certain implementations. I also thought about how this scheme would perform in the common case of many users.

Outcomes: Having settled on a scheme it became easier to think about the effects of my design choices. Since I would be storing data proportional to the number of users I had to make sure that these pairings would not take up too much space. Also since the number of users who have touched the system at any point before a particular time could be indefinitely large, it made sense to throw out pairings of users no longer in the system.

Goals: Next week I will put more work towards this implementation and make sure to implement it with scalability in mind.

Week 24: February 5 - February 11

Work Accomplished: This week I began to think about possible ways to store information relating to users' accesses to the database. I assessed the pros and cons of a couple of potential schemes. I made sure to think about the scalability of each solution given that we can expect to see a large number of users at any given time. I also thought about the potential effects on the responsiveness of the server given the different potential update times.

Outcomes: This work helped me understand the different requirements for an effective means of keeping track of transactions with users. First, the data stored should not be unbearably large given a large number of users. Second, updating this data should not have a significant effect on the speed with which the server interacts with its clients.

Goals: Next week I plan to settle on a scheme and begin its implementation. After the implementation is finished I plan on using the load tester previously developed to ensure the model does not harm performance.

Week 23: January 29 - February 4

Work Accomplished: This week I studied the characteristics of the server we are using to cache data in order to benchmark performance of each user. I started by looking at how the server views each connection to the different clients. I then took a look at the mechanism used to allow clients to store data.

Outcomes: Given this work I was able to develop a better understanding of the information that the server maintains on a per client basis. I also learned how the server deals with each request and was able to think about how we might determine whether or not we want to fulfill each request.

Goals: The ultimate goal behind this work is to recreate results of papers that discuss efficient caching by using information such as the user wanting to store data, or how much space that user already has allotted to them.

Week 22: January 22 - January 28

This week is winter break so I will be away from research for the time being.

Week 21: January 15 - January 21

This week is winter break so I will be away from research for the time being.

Week 20: January 8 - January 14

This week is winter break so I will be away from research for the time being.

Week 19: January 1 - January 7

This week is winter break so I will be away from research for the time being.

Week 18: December 25 - December 31

This week is winter break so I will be away from research for the time being.

Week 17: December 18 - December 24

This week is the beginning of winter break so I will be away from research for the time being.

Week 16: December 11 - December 17

This week is the end of finals week so I will be away from research for the time being.

Week 15: December 4 - December 10

This week is the beginning of finals week so I will be away from research for the time being.

Week 14: November 27 - December 3

Work accomplished: This week I took the time to meet with Alana to discuss the overall results of the work accomplished this semester. We discussed our starting point, motivated the shifts in perspective we came up with during the semester, and discussed our current progress. We took a look at some of our current problems with our algorithm and discussed possible solutions. We also briefly discussed some of the challenges with the implementation of our caching mechanism.

Outcomes: Alana and I were able to formalize the work we accomplished which gives us a better understanding of the approaches we have tried and characteristics we need to consider when determining the optimality of future solutions. Our discussion of the current problems gives us a clear understanding of the specific problem that needs to be solved and will allow a more focused effort to make this algorithm optimal.

Goals: This week brings us to the end of our academic semester, which makes it a good time for us to think about our research strategy. By assessing what worked well we will be able to develop more efficient research methods to employ in the future.

Week 13: November 20 - November 26

Work accomplished: This week was Thanksgiving so I spent the week away from Cornell.

Goals: Next week I plan to analyze the results given to me from the load-balance tester in order to determine any shortcomings with the basic version of our caching mechanism.

Week 12: November 13 - November 19

Work accomplished: This week I was able to obtain workload testing results for our caching mechanism. This gives a lot of insight into the health of our strategy. I also took time to understand if our idea gave a maximum efficiency answer. If this is not true then our caching mechanism is not strong enough to be used in an industrial setting.

Outcomes: Because of the results I obtained I was able to identify some of the more minor problems that existed in the connection between client and server. Given these leads I will be able to achieve a better implementation.

Goals: Next week is Thanksgiving, so I will provide updates the week after.

Week 11: November 6 - November 12

Work accomplished: This week my work primarily surrounded trying to put together workload testing for our basic caching mechanism. The idea was to be able to run intensive workloads remotely to gain a sense of performance in datacenter scale needs. I was not able to completely finish this work this week, but given that the server is already up and running, once I figure out the workload generator the data should come quickly.

Outcomes: I have a better understanding now of what needs to be accomplished before I can benchmark our performance. Once I fix a few issues we will be able to get a lot of useful information about our performance across a lot of different metrics.

Goals: For the following week I plan to finalize the work needed on the workload generator and begin running extensive tests.

Week 10: October 30 - November 5

Work accomplished: This week I was able to run remote testing for the basic version of our caching implementation. Utilizing some of Cornell's machines I collected data on the speed with which operations completed and their respective latency. I then met with my PI to discuss this data and its implications.
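As a rough illustration of the kind of measurement involved, the sketch below times each operation individually and reports throughput alongside median and 99th-percentile latency. It is a generic Python harness written for this post, not the benchmarking code we actually used; run_workload, summarize, and the op callback are hypothetical names.

```python
import statistics
import time


def run_workload(op, n_ops):
    """Call op(i) n_ops times; return per-operation latencies and total time (seconds)."""
    latencies = []
    start = time.perf_counter()
    for i in range(n_ops):
        t0 = time.perf_counter()
        op(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return latencies, elapsed


def summarize(latencies, elapsed):
    """Reduce raw measurements to throughput and latency statistics."""
    ordered = sorted(latencies)
    # Nearest-rank style index for the 99th percentile, clamped to the last sample.
    p99_index = min(len(ordered) - 1, (99 * len(ordered)) // 100)
    return {
        "ops_per_sec": len(ordered) / elapsed,
        "median_s": statistics.median(ordered),
        "p99_s": ordered[p99_index],
    }
```

Looking at the tail (p99) as well as the median matters because a handful of slow requests can hide behind a healthy average.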

Outcomes: Collecting this data and discussing it showed that our implementation was not optimal. The problem was fairly obvious but hard to see without concrete evidence.

Goals: Next week I plan to fix these problems and then update the implementation to better reflect the use cases we envision for our caching mechanism.

Week 9: October 23 - October 29

Work accomplished: This week I began running tests in a local environment to determine if the binding I created worked efficiently. After logging some of the data it seems as though the binding is working as intended.

Outcomes: This data will be used as a baseline for our remote testing in the future. Utilizing it in this way will help us analyze shortcomings of our implementation. Most of the challenges in putting this together came from becoming familiar with the two main databases, as well as combining them into a single idea in the benchmarking code we are using. As a result I've become more familiar with each of these domains.

Goals: For the following week I plan to work on understanding how to set up the remote environment as well as actually testing it. After I get results back it will be easy to compare this data to work that has already been done in this area to see if my implementation has any shortcomings.

Week 8: October 16 - October 22

Work accomplished: This week I finished the binding for a basic version of our caching implementation. After making sure all the correct packages were installed it was simply a matter of connecting existing code for both of the databases we are using.

Outcomes: This affords us the ability to measure performance of our idea on different workloads. As we change our policy I can now reflect those changes in this code and quickly measure its performance. The performance will also give us a good idea of whether or not our implementation is working as intended.

Goals: For the following week I plan to adapt the current code to better reflect the uses we envision for our algorithm.

Week 7: October 9 - October 15

Work accomplished: This week I met with Alana along with our PI and talked about the rules I had developed along with the largest potential problem. We first worked through my current understanding of the parameters of the problem, and then began to highlight a few key examples that truly illustrate the difficult parts of gathering all of the properties we want in the algorithm.

Outcomes: In doing so we thought about the problem in a new way and were able to apply it to another area of reasoning. Doing so affords us a wealth of properties we can expect to hold, and makes it much easier to reason about the problem.

Goals: For the following week I plan to put together a simple implementation for caching that we will later use to benchmark our idea.

Week 6: October 2 - October 8

Work accomplished: This week I developed a concrete set of two rules that will ensure we can achieve the desirable properties of an efficient caching mechanism. After coming up with these two rules I worked to put together a representation of it via an algorithm and listed out potential problems.

Outcomes: The way that we handle files being stored leads to paths of completely different thinking, so it will take some care to decide which one is the most elegant. By doing this I was able to tackle the problem from a different angle and gain more insight into its properties.

Goals: For the following week I plan to meet with my PI and Alana to discuss my direction and determine if it is feasible.

Week 5: September 24 - October 1

Work accomplished: This week Alana and I met to consolidate the work that we had been doing individually and to talk about the different pitfalls we found. We were able to put both of our ideas together into one explicit algorithm and we met with our PI Rachit Agarwal to discuss this algorithm.

Outcomes: Trying to explain the algorithm to someone who did not come up with it was a useful step in understanding not only the work we have already done but the premise of the problem itself. With the suggestions he came up with we decided to revisit a case we had not considered as deeply initially.

Goals: For the following week I plan to meet with Alana to think about more specific cases where we can provide the characteristics a caching policy would need to be relevant to real-world contexts.

Week 4: September 17 - September 23

Work accomplished: This week I attended the ACM Richard Tapia Conference in Atlanta, Georgia so I did not have as much time to dedicate to research. Nonetheless I decided to start explicitly writing down the different steps of my algorithm in order to have a better grasp on the effect my policy would have on a system.

Outcomes: While doing this I recognized a lot of the flaws of my original idea and I was able to mitigate the issues I recognized.

Goals: For the following week I plan to meet with Alana to discuss the changes I made to our original idea and determine whether or not this aligns with the premise of our research problem.

Week 3: September 10 - September 16

Work accomplished: This week I met with Alana to discuss the approach we wanted to take to designing our own caching mechanism. We decided to start with some existing policies as a springboard and from there came up with situations in which each policy had downfalls. From there we would modify the policy we were looking at, fixing that specific problem without introducing new ones.

Outcomes: Doing this gave us some insights on the necessary properties that needed to hold as well as an idea of some of the more challenging situations. After discussing this with Professor Agarwal we decided it would be worthwhile to design an algorithm demonstrating our policy and then pick apart that more concrete definition.

Goals: For the following week, I will focus on getting the logic of the algorithm down on paper so that we can reason about the correctness of our algorithm more effectively.

Week 2: September 3 - September 9

Work Accomplished: This week during our group meeting we discussed another alternative to caching. This exercise gave us an opportunity to evaluate current caching policies and see where we could improve upon them.

Outcomes: Given how interesting this space is, I plan to work in this area for the foreseeable future. I also met with Alana to discuss the shortcomings of current cache allocation policies and see if we could come up with unique examples that exemplify their weaknesses.

Goals: For the following week, I will work to finalize a more concrete understanding of the work that has already been done in this area in order to provide novel developments.

Week 1: August 27 - September 2

Work Accomplished: This week I met with my PI Rachit Agarwal, and two Ph.D. students Justin Miron and Saksham Agarwal and discussed a caching policy intended to disincentivize cheating.

Outcomes: This work brought out an interesting question: how does resource disaggregation interact with the problem described? Unsurprisingly, it exacerbates it and we will need to rethink optimal caching to efficiently utilize caches in a disaggregated context. I also took a look at Dominant Resource Fairness; this approach aims to apply a more holistic view to fairness across multiple resources. Looking at these two different views may not have provided the answer to our questions but it shows the kind of thinking needed to solve the problem.
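For readers unfamiliar with Dominant Resource Fairness, the core idea can be sketched in a few lines. A user's dominant share is the largest fraction of any single resource that the user holds, and the allocator repeatedly hands one more task to whichever user currently has the smallest dominant share. The function below is a simplified sketch of that progressive-filling loop; the name drf_allocate and the capacities and demands in the usage example are illustrative assumptions, not numbers from our project.

```python
def drf_allocate(capacity, demands):
    """Simplified Dominant Resource Fairness by progressive filling.

    capacity: total amount of each resource, e.g. (cpus, memory_gb)
    demands:  user -> per-task demand vector in the same resource order
    Returns user -> number of tasks allocated.
    """
    used = [0.0] * len(capacity)
    tasks = {user: 0 for user in demands}
    dominant_share = {user: 0.0 for user in demands}
    while True:
        # Serve the user whose dominant share is currently smallest.
        user = min(dominant_share, key=dominant_share.get)
        demand = demands[user]
        # Stop once that user's next task no longer fits in the cluster.
        if any(u + d > c for u, d, c in zip(used, demand, capacity)):
            break
        used = [u + d for u, d in zip(used, demand)]
        tasks[user] += 1
        dominant_share[user] = max(
            tasks[user] * d / c for d, c in zip(demand, capacity)
        )
    return tasks


# Illustrative example: 9 CPUs and 18 GB of memory shared by two users.
allocation = drf_allocate(
    capacity=(9, 18),
    demands={"A": (1, 4), "B": (3, 1)},  # A is memory-heavy, B is CPU-heavy
)
print(allocation)  # {'A': 3, 'B': 2}
```

In this example both users end with a dominant share of two thirds: A holds 12 of 18 GB of memory and B holds 6 of 9 CPUs, even though their raw allocations look very different.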

Goals: For the following week, I will work to determine if this is a direction I want to pursue. This work would fit well into the overarching effort, since every resource will need some form of load balancing in order to deliver on what resource disaggregation promises.